This script uses the BeautifulSoup library to scrape data from the Central Bank of Russia's website. It builds a search URL whose query string contains the organization's Taxpayer Identification Number (hereinafter INN), which the code passes around as the gene parameter.

The numpy library is used to read a CSV file containing a list of INNs, each of which is then passed to the build_url() function to construct a list of URLs.
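As an illustration of the URL-building step, here is a minimal sketch that produces the same query string with urllib.parse.urlencode instead of string concatenation. The function name build_url_encoded, the BASE_URL constant, and the sample INN '1234567890' are made up for this example and are not part of the script.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.cbr.ru/finorg/'  # assumed split of the URL used by build_url()


def build_url_encoded(inn):
    """Hypothetical variant of build_url(): same query keys, encoded by urlencode."""
    params = {
        'UniDbQuery2.Posted': 'True',
        'UniDbQuery2.SearchPrase': '',
        'UniDbQuery2.SearchOGRN': '',
        'UniDbQuery2.SearchINN': inn.strip(),  # drop stray whitespace from the CSV
        'UniDbQuery2.SearchREGN': '',
        'UniDbQuery2.SearchADR': '',
        'UniDbQuery2.Lic': '',
        'UniDbQuery2.ViewMode': '0',
        'UniDbQuery2.orgType': '',
        'UniDbQuery2.foStatus': '1',
        'UniDbQuery2.foOkato': '',
    }
    # dicts keep insertion order, so the keys appear in the order listed above
    return BASE_URL + '?' + urlencode(params)


url = build_url_encoded(' 1234567890 ')  # made-up INN, for illustration only
print(url)
```

Keeping the parameters in a dictionary makes it easier to see which field carries the INN, and urlencode escapes any characters that are not safe in a URL.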

The requests library is used to fetch the HTML content of each URL, and the resulting HTML is passed to BeautifulSoup, which finds the tables with the class data.

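Finally, the pandas library is used to turn the tables found on each page into a DataFrame and check whether it is empty. The script prints '1' if the DataFrame is not empty and '0' if it is. The 'error' output is intended for lookups that raise an exception, but the code does not catch exceptions, so a failed request stops the script instead.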
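The three outcomes described above can be sketched with explicit exception handling. This is only one possible shape, not the script's own code: table_status and its find_tables argument are hypothetical names, and the lookups below are simulated so the example runs without network access.

```python
def table_status(find_tables):
    """Return '1', '0' or 'error' for one lookup.

    find_tables is a hypothetical zero-argument callable that fetches a page
    and returns the list of matching tables; it is not part of the script.
    """
    try:
        tables = find_tables()
    except Exception:
        # Network failures, bad responses and parse errors all end up here
        return 'error'
    return '1' if len(tables) > 0 else '0'


def failing_lookup():
    """Simulate a request that fails."""
    raise ConnectionError('simulated network failure')


statuses = [
    table_status(lambda: ['<table>']),  # one table found
    table_status(lambda: []),           # page loaded, no table
    table_status(failing_lookup),       # lookup raised an exception
]
print(statuses)
```

In the script, the callable could wrap the requests.get() call and the find_all('table', {'class': 'data'}) call for a single URL, so that one failing INN does not abort the whole run.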

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import numpy as np


def build_url(gene):
    """Return the finorg search URL for one INN (passed as `gene`)."""
    return 'https://www.cbr.ru/finorg/?UniDbQuery2.Posted=True&UniDbQuery2.SearchPrase=&UniDbQuery2.SearchOGRN=&UniDbQuery2.SearchINN=' + gene + '&UniDbQuery2.SearchREGN=&UniDbQuery2.SearchADR=&UniDbQuery2.Lic=&UniDbQuery2.ViewMode=0&UniDbQuery2.orgType=&UniDbQuery2.foStatus=1&UniDbQuery2.foOkato='


# Read the list of INNs from the CSV file
csv_file = np.genfromtxt('DM_cbr_finorg_parse_data.csv',
                         delimiter=',', dtype=str)
genes = csv_file.tolist()

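# Build the URLs with a regular for loop
genes_urls = []
for gene in genes:
    genes_urls.append(build_url(gene))

# Fetch every page with a list comprehension
res = [requests.get(url) for url in genes_urls]

# Name the parser explicitly so bs4 does not have to guess one
soup = [BeautifulSoup(tab.text, 'html.parser') for tab in res]

# Keep only the tables whose class is 'data'
tables = [tablesoup.find_all('table', {'class': 'data'}) for tablesoup in soup]

# One DataFrame per page; it is empty when no table was found
result = [pd.DataFrame(found) for found in tables]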

for resss in result:
    # DataFrame.empty is always a bool, so two branches cover every case
    if resss.empty:
        print('0')
    else:
        print('1')

1
1
0
1
0
0
0


This Python script uses the BeautifulSoup library and the requests module to scrape data from the Central Bank of Russia's website. Here the user enters one or more Primary State Registration Numbers (hereinafter OGRN), separated by semicolons, and a URL is constructed for each OGRN. The script then uses requests to access the URL and retrieve the HTML content.

The script then uses BeautifulSoup to parse the HTML and extract the table identified by its class attribute (data). A custom function called tableDataText converts that table into a list of rows, from which a pandas DataFrame is created. The DataFrame is then shown using the display() function.
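To make the extraction step concrete, here is a small sketch that runs on a made-up HTML fragment rather than on a live page. It is a simplified stand-in for tableDataText, not a copy of it: it reads th and td cells in one pass and assumes the first row is the header. It also uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a page returned by the site
html = """
<table class="layout"><tr><td>ignored</td></tr></table>
<table class="data">
  <tr><th>No.</th><th>Status</th></tr>
  <tr><td> 1943 </td><td>Revoked</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# Select the table by its class attribute; the 'layout' table is skipped
table = soup.find('table', {'class': 'data'})

rows = []
for tr in table.find_all('tr'):
    cells = tr.find_all(['th', 'td'])  # header or data cells, in document order
    rows.append([cell.get_text(strip=True) for cell in cells])

header, data = rows[0], rows[1:]  # assumes the first row holds the column names
print(header)
print(data)
```

The header and data lists have the shape that pd.DataFrame(data, columns=header) expects, which is how the script builds its DataFrame from the list of rows.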

In [48]:
from bs4 import BeautifulSoup
import pandas as pd
import requests


def tableDataText(table):
    """Parse an HTML <table> element into a list of rows.

    Each row is a list of cell strings. If the first <tr> contains <th>
    cells, they are returned as the first row (the header).
    """
    def rowgetDataText(tr, coltag='td'):  # 'td' (data) or 'th' (header)
        return [td.get_text(strip=True) for td in tr.find_all(coltag)]
    rows = []
    trs = table.find_all('tr')
    headerow = rowgetDataText(trs[0], 'th')
    if headerow:  # if there is a header row, include it first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs:  # for every remaining table row
        rows.append(rowgetDataText(tr, 'td'))  # data row
    return rows


# Read OGRNs separated by ';' and process each one in turn
OGRN = input().split(';')
for i in OGRN:
    URL = 'https://www.cbr.ru/finorg/foinfo/?ogrn=' + i.strip()
    html = requests.get(URL)
    soup = BeautifulSoup(html.text, 'lxml')  # the parser is the second argument

    # the dictionary specifies the attributes that identify the <table> tag
    htmltable = soup.find('table', {'class': 'data'})
    if htmltable is None:  # no matching table on this page
        print('No table found for OGRN ' + i)
        continue

    list_table = tableDataText(htmltable)
    dftable = pd.DataFrame(list_table[1:], columns=list_table[0])
    display(dftable)

Unnamed: 0,№ лицензии,Статус лицензии,Тип лицензии,Подтип лицензии,Дата выдачи лицензии,Срок действия лицензии,Дата прекращения действия лицензии,Файл лицензии
0,1943,Отозванная,Лицензия на привлечение во вклады денежных сре...,,25.09.2013,Без ограничения срока действия,,
1,1943,Отозванная,Лицензия на осуществление банковских операций ...,,25.09.2013,Без ограничения срока действия,,
