Merge branch 'dev' into master
epogrebnyak committed Dec 12, 2017
2 parents c87c6ba + 0ce363e commit cc0f928
Showing 337 changed files with 53,528 additions and 25,976 deletions.
17 changes: 10 additions & 7 deletions README.md
@@ -6,17 +6,19 @@
Concept
-------

This code allows to extract pandas dataframes from MS Word files from [Rosstat KEP publication][Rosstat] and save them as [CSV files][backend].
The task is to extract pandas dataframes from MS Word files from [Rosstat KEP publication][Rosstat] and save them as [CSV files][backend].

This code replaces a predecessor repo, [data-rosstat-kep](https://github.com/epogrebnyak/data-rosstat-kep), which could not handle vintages of macroeconomic data.

Windows and Word are required to create table dumps from .doc files, but once that is done, the CSV files can be parsed on a Linux machine.

Interface
---------
[manage.py](https://github.com/mini-kep/parser-rosstat-kep/blob/master/src/manage.py) does the following job:
- download MS Word files from Rostst
- extracts tables from Word files and assigns variable names
- creates pandas dataframes with time series (at annual, quarterly and monthly frequency)
- saves dataframes as [CSV files at stable URL][backend].
- download and unpack MS Word files from Rosstat
- extract tables from Word files and assign variable names
- create pandas dataframes with time series (at annual, quarterly and monthly frequency)
- save dataframes as [CSV files at stable URL][backend]

[kep]: https://github.com/mini-kep/parser-rosstat-kep
[Rosstat]: http://www.gks.ru/wps/wcm/connect/rosstat_main/rosstat/ru/statistics/publications/catalog/doc_1140080765391
@@ -73,14 +75,15 @@ invoke add <year> <month>

and commit to this repo.

Basically this command:
This command:
- downloads a rar file from Rosstat,
- unpacks MS Word files,
- dumps all tables from MS Word files to an interim CSV file,
- parses interim CSV file to three dataframes by frequency
- validates parsing result
- transforms some variables
- transforms some variables (e.g. deaccumulates government expenditures)
- saves dataframes as processed CSV files
- saves a CSV file for the latest date
- saves an Excel file for the latest date.

The same job is done by [manage.py](https://github.com/mini-kep/parser-rosstat-kep/blob/master/src/manage.py)
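
The "deaccumulates government expenditures" step above can be illustrated with a minimal sketch. It assumes the source series is a year-to-date running total that resets at the start of each calendar year, so the value for a period is the difference from the previous period of the same year. The function name `deaccumulate` and the sample numbers are made up for illustration; the repository's actual transformation may be implemented differently.

```python
from typing import List, Tuple

# (year, period, value); period is a quarter or month number within the year
Observation = Tuple[int, int, float]


def deaccumulate(accum: List[Observation]) -> List[Observation]:
    """Turn year-to-date totals into per-period values.

    Assumption: the running total restarts from zero every calendar year
    and no period is missing inside a year.
    """
    result: List[Observation] = []
    prev_year = None
    prev_value = 0.0
    for year, period, value in sorted(accum):
        if year != prev_year:
            # first period of a new year: nothing has accumulated yet
            prev_value = 0.0
        result.append((year, period, value - prev_value))
        prev_year, prev_value = year, value
    return result


# Made-up quarterly year-to-date totals, not Rosstat data
sample = [
    (2016, 1, 100.0),
    (2016, 2, 250.0),
    (2016, 3, 450.0),
    (2016, 4, 700.0),
    (2017, 1, 120.0),
]
flows = deaccumulate(sample)
```

With these inputs the quarterly flows for 2016 are 100, 150, 200 and 250, and the first quarter of 2017 starts again from its own total of 120.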
6,974 changes: 6,974 additions & 0 deletions data/interim/2016/02/tab.csv

Large diffs are not rendered by default.

6,974 changes: 6,974 additions & 0 deletions data/interim/2016/03/tab.csv

Large diffs are not rendered by default.

6,974 changes: 6,974 additions & 0 deletions data/interim/2016/06/tab.csv

Large diffs are not rendered by default.

6,974 changes: 6,974 additions & 0 deletions data/interim/2016/07/tab.csv

Large diffs are not rendered by default.

22 changes: 11 additions & 11 deletions data/processed/2009/04/dfa.csv
@@ -1,11 +1,11 @@
time_index,year,EXPORT_GOODS_TOTAL_bln_usd,GDP_bln_rub,GDP_yoy,GOV_EXPENSE_ACCUM_CONSOLIDATED_bln_rub,GOV_EXPENSE_ACCUM_FEDERAL_bln_rub,GOV_EXPENSE_ACCUM_SUBFEDERAL_bln_rub,GOV_REVENUE_ACCUM_CONSOLIDATED_bln_rub,GOV_REVENUE_ACCUM_FEDERAL_bln_rub,GOV_REVENUE_ACCUM_SUBFEDERAL_bln_rub,GOV_SURPLUS_ACCUM_FEDERAL_bln_rub,GOV_SURPLUS_ACCUM_SUBFEDERAL_bln_rub,IMPORT_GOODS_TOTAL_bln_usd,IND_PROD_yoy,RETAIL_SALES_FOOD_bln_rub,RETAIL_SALES_FOOD_yoy,RETAIL_SALES_NONFOODS_bln_rub,RETAIL_SALES_NONFOODS_yoy,RETAIL_SALES_bln_rub,RETAIL_SALES_yoy
1999-12-31,1999,75.6,4823.0,106.4,1258.0,666.9,653.8,1213.6,615.5,660.8,-51.4,7.0,39.5,,866.1,93.6,931.3,94.7,1797.4,94.2
2000-12-31,2000,105.0,7306.0,110.0,1960.1,1029.2,1032.1,2097.7,1132.1,1065.8,102.9,33.8,44.9,,1093.2,107.5,1259.1,110.5,2352.3,109.0
2001-12-31,2001,101.9,8944.0,105.1,2419.4,1321.9,1330.2,2683.7,1594.0,1322.4,272.1,-7.8,53.8,,1416.8,107.6,1653.2,113.9,3070.0,111.0
2002-12-31,2002,107.3,10831.0,104.7,3422.3,2054.2,1687.2,3519.2,2204.7,1633.6,150.5,-53.6,61.0,103.1,1753.9,110.1,2011.5,108.6,3765.4,109.3
2003-12-31,2003,135.9,13243.0,107.3,3964.9,2358.6,1984.3,4138.7,2586.2,1930.5,227.6,-53.8,76.1,108.9,2091.7,107.7,2438.0,109.7,4529.7,108.8
2004-12-31,2004,183.2,17048.0,107.2,4669.7,2698.9,2373.0,5429.9,3428.9,2403.2,730.0,30.2,97.4,108.0,2580.3,111.4,3062.2,115.1,5642.5,113.3
2005-12-31,2005,243.8,21625.0,106.4,6820.6,3514.3,2941.2,8579.6,5127.2,2999.9,1612.9,58.7,125.4,105.1,3217.6,110.5,3823.9,115.1,7041.5,112.8
2006-12-31,2006,303.6,26904.0,107.7,8375.2,4284.8,3657.7,10625.8,6278.9,3797.3,1994.1,139.6,164.3,106.3,3947.4,111.0,4764.5,116.8,8711.9,114.1
2007-12-31,2007,354.4,33111.0,108.1,11378.6,5986.6,4790.5,13368.3,7781.1,4828.5,1794.6,38.0,223.5,106.3,4891.4,112.6,5977.6,119.1,10869.0,116.1
2008-12-31,2008,471.6,41668.0,105.6,13989.2,7566.6,6253.5,16003.4,9274.1,6199.1,1707.5,-54.4,291.9,102.1,6342.8,109.1,7571.8,117.2,13914.6,113.5
,year,CPI_ALCOHOL_rog,CPI_FOOD_rog,CPI_NONFOOD_rog,CPI_SERVICES_rog,CPI_rog,EXPORT_GOODS_bln_usd,GDP_bln_rub,GDP_yoy,GOV_EXPENSE_CONSOLIDATED_bln_rub,GOV_EXPENSE_FEDERAL_bln_rub,GOV_EXPENSE_SUBFEDERAL_bln_rub,GOV_REVENUE_CONSOLIDATED_bln_rub,GOV_REVENUE_FEDERAL_bln_rub,GOV_REVENUE_SUBFEDERAL_bln_rub,GOV_SURPLUS_FEDERAL_bln_rub,GOV_SURPLUS_SUBFEDERAL_bln_rub,IMPORT_GOODS_bln_usd,INDPRO_yoy,INVESTMENT_bln_rub,INVESTMENT_yoy,RETAIL_SALES_FOOD_bln_rub,RETAIL_SALES_FOOD_yoy,RETAIL_SALES_NONFOOD_bln_rub,RETAIL_SALES_NONFOOD_yoy,RETAIL_SALES_bln_rub,RETAIL_SALES_yoy,TRANSPORT_FREIGHT_bln_tkm,UNEMPL_pct,WAGE_NOMINAL_rub,WAGE_REAL_yoy
1999-12-31,1999,143.2,135.0,139.2,134.0,136.5,75.6,4823.0,106.4,1258.0,666.9,653.8,1213.6,615.5,660.8,-51.4,7.0,39.5,,670.4,105.3,866.1,93.6,931.3,94.7,1797.4,94.2,3372.0,13.0,1523.0,78.0
2000-12-31,2000,125.0,117.1,118.5,133.7,120.2,105.0,7306.0,110.0,1960.1,1029.2,1032.1,2097.7,1132.1,1065.8,102.9,33.8,44.9,,1165.2,117.4,1093.2,107.5,1259.1,110.5,2352.3,109.0,3542.0,10.5,2223.0,120.9
2001-12-31,2001,112.6,117.8,112.7,136.9,118.6,101.9,8944.0,105.1,2419.4,1321.9,1330.2,2683.7,1594.0,1322.4,272.1,-7.8,53.8,,1504.7,110.0,1416.8,107.6,1653.2,113.9,3070.0,111.0,3651.0,9.0,3240.0,119.9
2002-12-31,2002,108.9,111.3,110.9,136.2,115.1,107.3,10831.0,104.7,3422.3,2054.2,1687.2,3519.2,2204.7,1633.6,150.5,-53.6,61.0,103.1,1762.4,102.8,1753.9,110.1,2011.5,108.6,3765.4,109.3,3868.0,8.0,4360.0,116.2
2003-12-31,2003,109.9,110.2,109.2,122.3,112.0,135.9,13243.0,107.3,3964.9,2358.6,1984.3,4138.7,2586.2,1930.5,227.6,-53.8,76.1,108.9,2186.4,112.5,2091.7,107.7,2438.0,109.7,4529.7,108.8,4171.0,8.6,5499.0,110.9
2004-12-31,2004,108.7,113.0,107.4,117.7,111.7,183.2,17048.0,107.2,4669.7,2698.9,2373.0,5429.9,3428.9,2403.2,730.0,30.2,97.4,108.0,2865.0,113.7,2580.3,111.4,3062.2,115.1,5642.5,113.3,4441.0,8.2,6740.0,110.6
2005-12-31,2005,107.6,109.9,106.4,121.0,110.9,243.8,21625.0,106.4,6820.6,3514.3,2941.2,8579.6,5127.2,2999.9,1612.9,58.7,125.4,105.1,3611.1,110.9,3217.6,110.5,3823.9,115.1,7041.5,112.8,4550.0,7.6,8555.0,112.6
2006-12-31,2006,110.1,108.4,106.0,113.9,109.0,303.6,26904.0,107.7,8375.2,4284.8,3657.7,10625.8,6278.9,3797.3,1994.1,139.6,164.3,106.3,4730.0,116.7,3947.4,111.0,4764.5,116.8,8711.9,114.1,4675.0,7.2,10634.0,113.3
2007-12-31,2007,107.7,117.1,106.5,113.3,111.9,354.4,33111.0,108.1,11378.6,5986.6,4790.5,13368.3,7781.1,4828.5,1794.6,38.0,223.5,106.3,6716.2,122.7,4891.4,112.6,5977.6,119.1,10869.0,116.1,4788.0,6.1,13593.0,117.2
2008-12-31,2008,110.9,117.6,108.0,115.9,113.3,471.6,41668.0,105.6,13989.2,7566.6,6253.5,16003.4,9274.1,6199.1,1707.5,-54.4,291.9,102.1,8764.9,109.8,6342.8,109.1,7571.8,117.2,13914.6,113.5,4820.0,6.4,17226.0,110.3
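
As a rough sketch of how the new annual file layout can be read back: the first header cell is empty and holds the end-of-year date, `year` repeats that date, and an empty cell (such as `INDPRO_yoy` for 1999) marks a missing value. The snippet below uses only the standard library and a three-column excerpt of the rows above; the helper name `read_annual` is hypothetical and not part of the repository.

```python
import csv
import datetime
import io
from typing import Dict, Optional

# Excerpt of the new dfa.csv layout: three of the columns, two of the rows
SAMPLE = """,year,GDP_bln_rub,GDP_yoy,INDPRO_yoy
1999-12-31,1999,4823.0,106.4,
2002-12-31,2002,10831.0,104.7,103.1
"""

Row = Dict[str, Optional[float]]


def read_annual(text: str) -> Dict[datetime.date, Row]:
    """Parse annual CSV text into {date: {variable: value or None}}."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    names = header[2:]  # skip the unnamed date column and 'year'
    table: Dict[datetime.date, Row] = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        date = datetime.date.fromisoformat(row[0])
        # empty cells become None instead of raising on float('')
        table[date] = {
            name: (float(cell) if cell else None)
            for name, cell in zip(names, row[2:])
        }
    return table


annual = read_annual(SAMPLE)
```

In pandas the same file would typically be loaded with `pandas.read_csv` using the first column as a parsed date index; the plain-`csv` version is shown only to make the layout explicit.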
