The data comes from Eurostat. A detailed description of the `eurostat` library:
https://pypi.org/project/eurostat/
<br>
First step: install the library:
`$ python -m pip install eurostat`


In [107]:
import eurostat
import pandas as pd

This follows the `read a dataset from the main database` section of the library description.
<br> The dataset I want to read: _sbs_na_ind_r2_

In [298]:
df = eurostat.get_data_df('sbs_na_ind_r2')
df

,nace_r2,indic_sb,geo\time,2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005
0,B,V11110,AT,298.0,304.0,341.0,348.0,348.0,349.0,353.0,355.0,355.0,358.0,357.0,349.0,345.0,352.0,358.0
1,B,V11110,BA,178.0,182.0,197.0,199.0,196.0,188.0,191.0,194.0,223.0,,,,,,
2,B,V11110,BE,174.0,147.0,218.0,192.0,213.0,192.0,214.0,203.0,274.0,293.0,,,224.0,208.0,219.0
3,B,V11110,BG,304.0,326.0,342.0,357.0,374.0,359.0,386.0,382.0,379.0,378.0,373.0,353.0,304.0,266.0,243.0
4,B,V11110,CH,,209.0,205.0,212.0,212.0,211.0,216.0,193.0,195.0,199.0,203.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
758987,E3900,V94415,SE,12.8,11.9,5.7,8.9,11.2,6.9,5.7,8.9,9.8,8.3,11.3,22.0,,,
758988,E3900,V94415,SI,26.1,15.1,30.7,13.1,5.7,,12.2,21.6,,,,,,,
758989,E3900,V94415,SK,9.0,11.0,2.9,8.1,16.5,3.2,5.0,2.4,4.6,38.0,4.0,24.6,,,
758990,E3900,V94415,TR,,,,,,4.6,,,,,,,,,


Since I only want to plot the last 5 years (2015 to 2019), I drop the data for 2014 and earlier.

In [299]:
# Drop the year columns 2005..2014 in a single call
df = df.drop(columns=list(range(2005, 2015)))
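
The year columns can also be dropped by a cutoff instead of an explicit range. A minimal sketch on a toy frame, assuming the year columns are integers as in the output above; the `toy` data and the `first_year` name are illustrative and not part of the original notebook:

```python
import pandas as pd

# Toy frame imitating the layout above: identifier columns plus integer year columns.
toy = pd.DataFrame({
    "nace_r2": ["B", "B"],
    "geo\\time": ["AT", "BE"],
    2019: [298.0, 174.0],
    2015: [348.0, 213.0],
    2014: [349.0, 192.0],
    2005: [358.0, 219.0],
})

first_year = 2015  # assumed cutoff: keep this year and everything after it

# Collect every integer column label older than the cutoff, then drop them together.
old_years = [c for c in toy.columns if isinstance(c, int) and c < first_year]
recent = toy.drop(columns=old_years)

print(list(recent.columns))
```

With a cutoff, the code keeps working if the dataset later starts earlier than 2005 or gains new years.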

Rename the columns.

In [300]:
df = df.rename(columns={'indic_sb': 'indicator', 'geo\\time': 'country_code'})
df

,nace_r2,indicator,country_code,2019,2018,2017,2016,2015
0,B,V11110,AT,298.0,304.0,341.0,348.0,348.0
1,B,V11110,BA,178.0,182.0,197.0,199.0,196.0
2,B,V11110,BE,174.0,147.0,218.0,192.0,213.0
3,B,V11110,BG,304.0,326.0,342.0,357.0,374.0
4,B,V11110,CH,,209.0,205.0,212.0,212.0
...,...,...,...,...,...,...,...,...
758987,E3900,V94415,SE,12.8,11.9,5.7,8.9,11.2
758988,E3900,V94415,SI,26.1,15.1,30.7,13.1,5.7
758989,E3900,V94415,SK,9.0,11.0,2.9,8.1,16.5
758990,E3900,V94415,TR,,,,,


I only want European countries, so I drop the rest, along with the EU-level aggregates.

In [301]:
df["country_code"].unique()

array(['AT', 'BA', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
       'ES', 'EU27_2007', 'EU27_2020', 'EU28', 'FI', 'FR', 'HR', 'HU',
       'IE', 'IS', 'IT', 'LI', 'LT', 'LU', 'LV', 'MK', 'MT', 'NL', 'NO',
       'PL', 'PT', 'RO', 'RS', 'SE', 'SI', 'SK', 'TR', 'UK', 'TOTAL'],
      dtype=object)

In [302]:
df = df.set_index('country_code').drop(['EU27_2007', 'EU27_2020', 'EU28', 'TR', 'TOTAL', 'LI'],axis=0).reset_index()
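
The same row filter can be written as a boolean mask. A minimal sketch on toy data (the `toy` frame and its values are made up for illustration); unlike `drop`, `isin` does not raise a `KeyError` when one of the listed codes is absent from the data, and it leaves the column order unchanged:

```python
import pandas as pd

# Toy data: two countries, one non-European entry and two aggregate rows.
toy = pd.DataFrame({
    "country_code": ["AT", "EU28", "TR", "HU", "TOTAL"],
    "value": [298.0, 1000.0, 4.6, 120.0, 2000.0],
})

excluded = ["EU27_2007", "EU27_2020", "EU28", "TR", "TOTAL", "LI"]

# Keep only the rows whose country code is NOT in the exclusion list.
kept = toy[~toy["country_code"].isin(excluded)].reset_index(drop=True)

print(kept)
```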

I also need the names of the NACE codes and the names belonging to the `indic_sb` field.

The `eurostat` library supports this too: get an Eurostat dictionary => `eurostat.get_dic(code)`, where `code` is the column name.

_There is no point downloading the NACE codes from here, because they would be in English and I need them in Hungarian._
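
One possible way to attach Hungarian names is a hand-made lookup applied with `Series.map`. A minimal sketch, assuming a hypothetical `nace_hu` dictionary typed in by hand (only two entries here; the labels should be checked against an official Hungarian NACE listing before use):

```python
import pandas as pd

# Hypothetical hand-made lookup, NOT downloaded from Eurostat; two entries for illustration.
nace_hu = {
    "B": "Bányászat, kőfejtés",
    "C": "Feldolgozóipar",
}

# Toy frame with one code that is deliberately missing from the lookup.
toy = pd.DataFrame({
    "nace_r2": ["B", "C", "E3900"],
    "value": [298.0, 25000.0, 12.8],
})

# Codes without a translation become NaN, which makes gaps easy to list.
toy["nace_name"] = toy["nace_r2"].map(nace_hu)
missing = toy.loc[toy["nace_name"].isna(), "nace_r2"].tolist()

print(missing)
```

Listing the untranslated codes first shows how much of the lookup still has to be filled in.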


Download all indicators available on Eurostat.

In [131]:
indicators = pd.Series(eurostat.get_dic('indic_sb'))
indicators

V11110                                   Enterprises - number
V11111       Enterprises broken down by legal status - number
V11112      Enterprises broken down by size class of gross...
V11113      Enterprises broken down by size class of gross...
V11114_C    Enterprises broken down by residence of the pa...
                                  ...                        
V97461      Employment share of high growth enterprises me...
V97462      Average size of high growth enterprises measur...
V99110             Turnover at constant prices - million euro
V99120      Production value at constant prices - million ...
V99150      Value added at factor cost at constant prices ...
Length: 478, dtype: object

Select the indicators I will need.

In [132]:
# first, select them from the downloaded indicators
indicators = indicators.loc[['V11110', 'V12150', 'V12110']]
indicators

V11110                                 Enterprises - number
V12150            Value added at factor cost - million euro
V12110    Turnover or gross premiums written - million euro
dtype: object
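
In recent pandas versions, selecting with `.loc` and a list of labels raises a `KeyError` if any label is missing. A minimal sketch of checking the wanted codes first, on a toy stand-in for the dictionary (the `wanted` list and the code `V00000` are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the full indicator dictionary.
indicators = pd.Series({
    "V11110": "Enterprises - number",
    "V12110": "Turnover or gross premiums written - million euro",
    "V12150": "Value added at factor cost - million euro",
})

wanted = ["V11110", "V12150", "V00000"]  # V00000 is a made-up, non-existent code

# Report the codes that are not in the dictionary instead of failing on them.
missing = [code for code in wanted if code not in indicators.index]

# Keep only the entries that actually exist.
selected = indicators[indicators.index.isin(wanted)]

print(missing)
print(selected)
```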

In [304]:
# then filter the resulting DataFrame to them as well
df = df[df.indicator.isin(indicators.index)]
df

,country_code,nace_r2,indicator,2019,2018,2017,2016,2015
0,AT,B,V11110,298.0,304.0,341.0,348.0,348.0
1,BA,B,V11110,178.0,182.0,197.0,199.0,196.0
2,BE,B,V11110,174.0,147.0,218.0,192.0,213.0
3,BG,B,V11110,304.0,326.0,342.0,357.0,374.0
4,CH,B,V11110,,209.0,205.0,212.0,212.0
...,...,...,...,...,...,...,...,...
700236,RS,E3900,V12150,0.1,0.0,1.1,0.4,
700237,SE,E3900,V12150,34.7,34.0,38.2,33.6,31.1
700238,SI,E3900,V12150,10.0,10.4,9.0,12.1,14.3
700239,SK,E3900,V12150,3.4,4.1,4.3,5.9,2.1


I need the data broken down by country, sector, indicator, and year.

In [315]:
df = df.set_index(['country_code', 'nace_r2', 'indicator']).stack().reset_index()
df.columns=['country_code', 'nace_r2', 'indicator', 'year', 'value']
df

,country_code,nace_r2,indicator,year,value
0,AT,B,V11110,2019,298.0
1,AT,B,V11110,2018,304.0
2,AT,B,V11110,2017,341.0
3,AT,B,V11110,2016,348.0
4,AT,B,V11110,2015,348.0
...,...,...,...,...,...
164964,SK,E3900,V12150,2015,2.1
164965,UK,E3900,V12150,2018,530.5
164966,UK,E3900,V12150,2017,470.2
164967,UK,E3900,V12150,2016,523.6
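
The same wide-to-long reshape can be sketched with `DataFrame.melt`. This is an alternative shown on toy data, not the method used above. Note that `melt` keeps missing values, so an explicit `dropna` is added to mimic the output above, where rows without a value (for example UK in 2019) are absent:

```python
import pandas as pd

# Toy wide frame: one row per (country, sector, indicator), one column per year.
toy = pd.DataFrame({
    "country_code": ["AT", "CH"],
    "nace_r2": ["B", "B"],
    "indicator": ["V11110", "V11110"],
    2019: [298.0, float("nan")],
    2018: [304.0, 209.0],
})

id_cols = ["country_code", "nace_r2", "indicator"]

long_df = (
    toy.melt(id_vars=id_cols, var_name="year", value_name="value")
       .dropna(subset=["value"])   # discard (row, year) pairs that have no value
       .astype({"year": int})      # make sure the year labels are plain integers
       .reset_index(drop=True)
)

print(long_df)
```

An advantage of `melt` is that the `year` and `value` column names are set in the same call, so no separate assignment to `df.columns` is needed.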


Export to JSON.

In [316]:
open("data.json", "w").write(df.to_json())

12132518
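
By default `DataFrame.to_json()` uses the column-oriented layout (`orient="columns"`), which nests values as column name, then row index. If whoever consumes `data.json` prefers one object per row, `orient="records"` may be a better fit. A minimal sketch comparing the two on toy data (the `toy` frame is illustrative):

```python
import json
import pandas as pd

toy = pd.DataFrame({
    "country_code": ["AT", "AT"],
    "year": [2019, 2018],
    "value": [298.0, 304.0],
})

# Default layout: {column -> {row index -> value}}
by_column = json.loads(toy.to_json())

# Row-oriented layout: [{column -> value}, ...]
by_record = json.loads(toy.to_json(orient="records"))

print(by_column["year"])
print(by_record[0])
```

The row-oriented form drops the index and repeats the column names in every object, so the file is usually larger but easier to iterate over.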