Merge pull request #150 from mini-kep/dev
merge to master
epogrebnyak committed Feb 8, 2018
2 parents 4e60802 + ffc477e commit 95ca5be
Showing 107 changed files with 9,939 additions and 4,625 deletions.
7 changes: 0 additions & 7 deletions .checkignore

This file was deleted.

7 changes: 0 additions & 7 deletions .codacy.yml

This file was deleted.

12 changes: 11 additions & 1 deletion .gitignore
@@ -1,9 +1,13 @@
.idea/*
*/.idea/*
*.doc
*.rar
doc/html/
doc2/

src/.idea
src/.vscode
src/.cache

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -106,3 +110,9 @@ ENV/

# mypy
.mypy_cache/
.idea/kep.iml
.idea/misc.xml
.idea/modules.xml
.idea/workspace.xml
.idea/vcs.xml
.idea/vcs.xml
20 changes: 12 additions & 8 deletions README.md
@@ -1,16 +1,19 @@
[![Build Status](https://travis-ci.org/mini-kep/parser-rosstat-kep.svg?branch=master)](https://travis-ci.org/mini-kep/parser-rosstat-kep)
[![Coverage badge](https://codecov.io/gh/mini-kep/parser-rosstat-kep/branch/master/graphs/badge.svg)](https://codecov.io/gh/mini-kep/parser-rosstat-kep)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/8a467743314641b4a22b66b327834367)](https://www.codacy.com/app/epogrebnyak/mini-kep?utm_source=github.com&utm_medium=referral&utm_content=epogrebnyak/mini-kep&utm_campaign=Badge_Grade)


Concept
-------

The task is to extract pandas dataframes from MS Word files from [Rosstat KEP publication][Rosstat] and save them as [CSV files][backend].
Russian statistics agency Rosstat publishes macroeconomic time series as [MS Word files][Rosstat]. In this repo we extract these time series as pandas dataframes and save them as [CSV files][backend]. This is a machine-readable dataset, ready to use with python/R and econometrics tools.

This code replaces a predecessor repo, [data-rosstat-kep](https://github.com/epogrebnyak/data-rosstat-kep), which could not handle vintages of macroeconomic data.
This code replaces a predecessor, [data-rosstat-kep](https://github.com/epogrebnyak/data-rosstat-kep), which could not handle vintages of macroeconomic data.


Data source: [Short-term Economic Indicators (KEP) monthly Rosstat publication][Rosstat]

Parsing result: [three CSV files at annual, quarterly and monthly frequencies][backend]

Windows and Word are required to create table dumps from .doc files, but once done CSV files can be parsed on a linux machine.

Interface
---------
@@ -34,15 +37,16 @@ directory structure.
- **download**: download and unpack rar files from Rosstat website
- **word2csv**: convert MS Word files to single interim CSV file (Windows-only)
- **csv2df**: parse interim CSV files and save processed CSV files with annual, quarterly and monthly data
- **finaliser.py**
- **df2xl**: make Excel file with three worksheets for annual, quarterly and monthly data

*NOTE:* Windows and MS Word are required to create interim text dumps from MS Word files. Once these text files are created, they can be parsed on a linux machine.

[Processed data folder](https://github.com/mini-kep/parser-rosstat-kep/tree/master/data/processed)
has datasets by year and month (vintages).
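
As a rough illustration of the vintage layout, the sketch below builds the repo-relative path of a processed file for a given year and month. Only `data/processed/2016/07/dfa.csv` is confirmed by this commit; the helper name `processed_path` and the `dfq.csv` / `dfm.csv` file names for quarterly and monthly data are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Assumed file names per frequency; only dfa.csv appears in this commit.
FILENAMES = {"a": "dfa.csv", "q": "dfq.csv", "m": "dfm.csv"}


def processed_path(year: int, month: int, freq: str) -> PurePosixPath:
    """Return the repo-relative path of a processed CSV for one vintage."""
    if freq not in FILENAMES:
        raise ValueError(f"freq must be one of {sorted(FILENAMES)}, got {freq!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Months are zero-padded to two digits, as in data/processed/2016/07/.
    return PurePosixPath("data", "processed", str(year), f"{month:02d}", FILENAMES[freq])


print(processed_path(2016, 7, "a"))  # data/processed/2016/07/dfa.csv
```

A path built this way can be passed to any CSV reader; the actual entry point in the repository may expose a different interface.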


# Access to parsing result

[getter.py](https://github.com/mini-kep/parser-rosstat-kep/blob/master/src/getter.py)
[access.py](https://github.com/mini-kep/parser-rosstat-kep/blob/master/src/access.py)
is an entry point to get parsed data.

```python
@@ -110,6 +114,6 @@ All historic raw data available on internet?
- [ ] Yes
- [x] No (data prior to 2016-12 is in this repo only)

Is scrapper automated (can download required information from internet without manual operations)?
Is scrapper automated (can download required_labels information from internet without manual operations)?
- [x] Yes
- [ ] No
8 changes: 8 additions & 0 deletions codecov.yml
@@ -0,0 +1,8 @@
# https://docs.codecov.io/v4.3.6/docs/ignoring-paths
ignore:
- "*/test_*.py" # wildcards accepted
- "tests/*" # wildcards accepted
- */usercase/*.py
- tasks.py
- src/manage.py
- src/kep/word2csv/word.py
4,624 changes: 4,624 additions & 0 deletions data/interim/2017/11/tab.csv

Large diffs are not rendered by default.

36 changes: 18 additions & 18 deletions data/processed/2016/07/dfa.csv
@@ -1,18 +1,18 @@
,year,CPI_ALCOHOL_rog,CPI_FOOD_rog,CPI_NONFOOD_rog,CPI_SERVICES_rog,CPI_rog,EXPORT_GOODS_bln_usd,GDP_bln_rub,GDP_yoy,GOV_EXPENSE_CONSOLIDATED_bln_rub,GOV_EXPENSE_FEDERAL_bln_rub,GOV_EXPENSE_SUBFEDERAL_bln_rub,GOV_REVENUE_CONSOLIDATED_bln_rub,GOV_REVENUE_FEDERAL_bln_rub,GOV_REVENUE_SUBFEDERAL_bln_rub,GOV_SURPLUS_FEDERAL_bln_rub,GOV_SURPLUS_SUBFEDERAL_bln_rub,IMPORT_GOODS_bln_usd,INDPRO_yoy,INVESTMENT_bln_rub,INVESTMENT_yoy,RETAIL_SALES_FOOD_bln_rub,RETAIL_SALES_FOOD_yoy,RETAIL_SALES_NONFOOD_bln_rub,RETAIL_SALES_NONFOOD_yoy,RETAIL_SALES_bln_rub,RETAIL_SALES_yoy,TRANSPORT_FREIGHT_bln_tkm,UNEMPL_pct,WAGE_NOMINAL_rub,WAGE_REAL_yoy
1999-12-31,1999,143.2,135.0,139.2,134.0,136.5,75.6,4823.0,106.4,1258.0,666.9,653.8,1213.6,615.5,660.8,-51.4,7.0,39.5,,670.4,105.3,866.1,93.6,931.3,94.7,1797.4,94.2,3372.0,13.0,1523.0,78.0
2000-12-31,2000,125.0,117.1,118.5,133.7,120.2,105.0,7306.0,110.0,1960.1,1029.2,1032.1,2097.7,1132.1,1065.8,102.9,33.8,44.9,,1165.2,117.4,1093.2,107.5,1259.1,110.5,2352.3,109.0,3542.0,10.5,2223.0,120.9
2001-12-31,2001,112.6,117.8,112.7,136.9,118.6,101.9,8944.0,105.1,2419.4,1321.9,1330.2,2683.7,1594.0,1322.4,272.1,-7.8,53.8,,1504.7,111.7,1416.8,107.6,1653.2,113.9,3070.0,111.0,3651.0,9.0,3240.0,119.9
2002-12-31,2002,108.9,111.3,110.9,136.2,115.1,107.3,10831.0,104.7,3422.3,2054.2,1687.2,3519.2,2204.7,1633.6,150.5,-53.6,61.0,103.1,1762.4,102.9,1753.9,110.1,2011.5,108.6,3765.4,109.3,3868.0,8.0,4360.0,116.2
2003-12-31,2003,109.9,110.2,109.2,122.3,112.0,135.9,13208.0,107.3,3964.9,2358.6,1984.3,4138.7,2586.2,1930.5,227.6,-53.8,76.1,108.9,2186.4,112.7,2091.7,107.7,2438.0,109.7,4529.7,108.8,4171.0,8.2,5499.0,110.9
2004-12-31,2004,108.7,113.0,107.4,117.7,111.7,183.2,17027.0,107.2,4669.7,2698.9,2373.0,5429.9,3428.9,2403.2,730.0,30.2,97.4,108.0,2865.0,116.8,2580.3,111.4,3062.2,115.1,5642.5,113.3,4441.0,7.7,6740.0,110.6
2005-12-31,2005,107.6,109.9,106.4,121.0,110.9,240.0,21610.0,106.4,6820.6,3514.3,2941.2,8579.6,5127.2,2999.9,1612.9,58.7,123.8,105.1,3611.1,110.2,3217.6,110.5,3823.9,115.1,7041.5,112.8,4550.0,7.1,8555.0,112.6
2006-12-31,2006,110.1,108.4,106.0,113.9,109.0,297.5,26917.0,108.2,8375.2,4284.8,3657.7,10625.8,6278.9,3797.3,1994.1,139.6,163.2,106.3,4730.0,117.8,3947.4,111.0,4764.5,116.8,8711.9,114.1,4675.0,7.0,10634.0,113.3
2007-12-31,2007,107.7,117.1,106.5,113.3,111.9,346.5,33248.0,108.5,11378.6,5986.6,4790.5,13368.3,7781.1,4828.5,1794.6,38.0,223.1,106.8,6716.2,123.8,4891.4,112.6,5977.6,119.1,10869.0,116.1,4788.0,6.0,13593.0,117.2
2008-12-31,2008,110.9,117.6,108.0,115.9,113.3,466.3,41277.0,105.2,13991.8,7570.9,6253.1,16003.9,9275.9,6198.8,1705.1,-54.4,288.7,100.6,8781.6,109.5,6495.7,111.7,7448.5,115.3,13944.2,113.7,4820.0,6.2,17290.0,111.5
2009-12-31,2009,108.9,105.5,109.7,111.6,108.8,297.2,38807.0,92.2,16048.3,9660.1,6255.7,13599.7,7337.8,5926.6,-2322.3,-329.1,183.9,89.3,7976.0,86.5,7097.1,98.1,7502.1,91.8,14599.2,94.9,4344.0,8.2,18638.0,96.5
2010-12-31,2010,108.3,113.7,105.0,108.1,108.8,392.7,46308.0,104.5,17616.7,10117.5,6636.9,16031.9,8305.4,6537.3,-1812.0,-99.6,245.7,107.3,9152.1,106.3,8002.2,105.1,8509.8,108.0,16512.0,106.5,4645.0,7.3,20952.0,105.2
2011-12-31,2011,108.4,103.2,106.7,108.7,106.1,515.4,59698.0,104.3,19994.6,10925.6,7679.1,20855.4,11367.7,7644.2,442.0,-34.9,318.6,105.0,11035.7,110.8,9104.3,103.4,10000.0,110.8,19104.3,107.1,4799.0,6.5,23369.0,102.8
2012-12-31,2012,112.1,106.7,105.2,107.3,106.6,527.4,66927.0,103.5,23174.7,12895.0,8343.2,23435.1,12855.5,8064.5,-39.4,-278.7,335.8,103.4,12586.1,106.8,9961.4,103.6,11433.1,108.6,21394.5,106.3,4934.0,5.5,26629.0,108.4
2013-12-31,2013,114.6,106.1,104.5,108.0,106.5,521.8,71017.0,101.3,25290.9,13342.9,8806.6,24442.7,13019.9,8165.1,-323.0,-641.5,341.3,100.4,13450.3,100.8,11143.0,102.6,12542.9,104.9,23685.9,103.9,4958.0,5.5,29792.0,104.8
2014-12-31,2014,113.7,115.7,108.1,110.5,111.4,496.8,77945.0,100.7,27611.7,14831.6,9353.3,26766.1,14496.9,8905.7,-334.7,-447.6,307.9,101.7,13902.6,98.5,12380.9,100.0,13975.3,105.1,26356.2,102.7,4955.0,5.2,32495.0,101.2
2015-12-31,2015,110.7,114.5,113.7,110.2,112.9,341.5,80804.0,96.3,29741.5,15620.3,9479.8,26922.0,13659.2,9308.2,-1961.0,-171.6,193.0,96.6,14555.9,91.6,13419.3,91.0,14119.1,89.1,27538.4,90.0,4978.0,5.6,34030.0,91.0
,year,AGROPROD_yoy,CPI_ALCOHOL_rog,CPI_FOOD_rog,CPI_NONFOOD_rog,CPI_SERVICES_rog,CPI_rog,EXPORT_GOODS_bln_usd,GDP_bln_rub,GDP_yoy,GOV_EXPENSE_CONSOLIDATED_bln_rub,GOV_EXPENSE_FEDERAL_bln_rub,GOV_EXPENSE_SUBFEDERAL_bln_rub,GOV_REVENUE_CONSOLIDATED_bln_rub,GOV_REVENUE_FEDERAL_bln_rub,GOV_REVENUE_SUBFEDERAL_bln_rub,GOV_SURPLUS_FEDERAL_bln_rub,GOV_SURPLUS_SUBFEDERAL_bln_rub,IMPORT_GOODS_bln_usd,INDPRO_yoy,INVESTMENT_bln_rub,INVESTMENT_yoy,PPI_rog,RETAIL_SALES_FOOD_bln_rub,RETAIL_SALES_FOOD_yoy,RETAIL_SALES_NONFOOD_bln_rub,RETAIL_SALES_NONFOOD_yoy,RETAIL_SALES_bln_rub,RETAIL_SALES_yoy,TRANSPORT_FREIGHT_bln_tkm,UNEMPL_pct,WAGE_NOMINAL_rub,WAGE_REAL_yoy
1999-12-31,1999,103.8,143.2,135.0,139.2,134.0,136.5,75.6,4823.0,106.4,1258.0,666.9,653.8,1213.6,615.5,660.8,-51.4,7.0,39.5,,670.4,105.3,170.7,866.1,93.6,931.3,94.7,1797.4,94.2,3372.0,13.0,1523.0,78.0
2000-12-31,2000,106.2,125.0,117.1,118.5,133.7,120.2,105.0,7306.0,110.0,1960.1,1029.2,1032.1,2097.7,1132.1,1065.8,102.9,33.8,44.9,,1165.2,117.4,131.9,1093.2,107.5,1259.1,110.5,2352.3,109.0,3542.0,10.5,2223.0,120.9
2001-12-31,2001,106.9,112.6,117.8,112.7,136.9,118.6,101.9,8944.0,105.1,2419.4,1321.9,1330.2,2683.7,1594.0,1322.4,272.1,-7.8,53.8,,1504.7,111.7,108.3,1416.8,107.6,1653.2,113.9,3070.0,111.0,3651.0,9.0,3240.0,119.9
2002-12-31,2002,100.9,108.9,111.3,110.9,136.2,115.1,107.3,10831.0,104.7,3422.3,2054.2,1687.2,3519.2,2204.7,1633.6,150.5,-53.6,61.0,103.1,1762.4,102.9,117.7,1753.9,110.1,2011.5,108.6,3765.4,109.3,3868.0,8.0,4360.0,116.2
2003-12-31,2003,99.9,109.9,110.2,109.2,122.3,112.0,135.9,13208.0,107.3,3964.9,2358.6,1984.3,4138.7,2586.2,1930.5,227.6,-53.8,76.1,108.9,2186.4,112.7,112.5,2091.7,107.7,2438.0,109.7,4529.7,108.8,4171.0,8.2,5499.0,110.9
2004-12-31,2004,102.4,108.7,113.0,107.4,117.7,111.7,183.2,17027.0,107.2,4669.7,2698.9,2373.0,5429.9,3428.9,2403.2,730.0,30.2,97.4,108.0,2865.0,116.8,128.8,2580.3,111.4,3062.2,115.1,5642.5,113.3,4441.0,7.7,6740.0,110.6
2005-12-31,2005,101.6,107.6,109.9,106.4,121.0,110.9,240.0,21610.0,106.4,6820.6,3514.3,2941.2,8579.6,5127.2,2999.9,1612.9,58.7,123.8,105.1,3611.1,110.2,113.4,3217.6,110.5,3823.9,115.1,7041.5,112.8,4550.0,7.1,8555.0,112.6
2006-12-31,2006,103.0,110.1,108.4,106.0,113.9,109.0,297.5,26917.0,108.2,8375.2,4284.8,3657.7,10625.8,6278.9,3797.3,1994.1,139.6,163.2,106.3,4730.0,117.8,110.4,3947.4,111.0,4764.5,116.8,8711.9,114.1,4675.0,7.0,10634.0,113.3
2007-12-31,2007,103.3,107.7,117.1,106.5,113.3,111.9,346.5,33248.0,108.5,11378.6,5986.6,4790.5,13368.3,7781.1,4828.5,1794.6,38.0,223.1,106.8,6716.2,123.8,125.1,4891.4,112.6,5977.6,119.1,10869.0,116.1,4788.0,6.0,13593.0,117.2
2008-12-31,2008,110.8,110.9,117.6,108.0,115.9,113.3,466.3,41277.0,105.2,13991.8,7570.9,6253.1,16003.9,9275.9,6198.8,1705.1,-54.4,288.7,100.6,8781.6,109.5,93.0,6495.7,111.7,7448.5,115.3,13944.2,113.7,4820.0,6.2,17290.0,111.5
2009-12-31,2009,101.4,108.9,105.5,109.7,111.6,108.8,297.2,38807.0,92.2,16048.3,9660.1,6255.7,13599.7,7337.8,5926.6,-2322.3,-329.1,183.9,89.3,7976.0,86.5,113.9,7097.1,98.1,7502.1,91.8,14599.2,94.9,4344.0,8.2,18638.0,96.5
2010-12-31,2010,88.7,108.3,113.7,105.0,108.1,108.8,392.7,46308.0,104.5,17616.7,10117.5,6636.9,16031.9,8305.4,6537.3,-1812.0,-99.6,245.7,107.3,9152.1,106.3,116.7,8002.2,105.1,8509.8,108.0,16512.0,106.5,4645.0,7.3,20952.0,105.2
2011-12-31,2011,123.0,108.4,103.2,106.7,108.7,106.1,515.4,59698.0,104.3,19994.6,10925.6,7679.1,20855.4,11367.7,7644.2,442.0,-34.9,318.6,105.0,11035.7,110.8,112.0,9104.3,103.4,10000.0,110.8,19104.3,107.1,4799.0,6.5,23369.0,102.8
2012-12-31,2012,95.2,112.1,106.7,105.2,107.3,106.6,527.4,66927.0,103.5,23174.7,12895.0,8343.2,23435.1,12855.5,8064.5,-39.4,-278.7,335.8,103.4,12586.1,106.8,105.1,9961.4,103.6,11433.1,108.6,21394.5,106.3,4934.0,5.5,26629.0,108.4
2013-12-31,2013,105.8,114.6,106.1,104.5,108.0,106.5,521.8,71017.0,101.3,25290.9,13342.9,8806.6,24442.7,13019.9,8165.1,-323.0,-641.5,341.3,100.4,13450.3,100.8,103.7,11143.0,102.6,12542.9,104.9,23685.9,103.9,4958.0,5.5,29792.0,104.8
2014-12-31,2014,103.5,113.7,115.7,108.1,110.5,111.4,496.8,77945.0,100.7,27611.7,14831.6,9353.3,26766.1,14496.9,8905.7,-334.7,-447.6,307.9,101.7,13902.6,98.5,105.9,12380.9,100.0,13975.3,105.1,26356.2,102.7,4955.0,5.2,32495.0,101.2
2015-12-31,2015,103.0,110.7,114.5,113.7,110.2,112.9,341.5,80804.0,96.3,29741.5,15620.3,9479.8,26922.0,13659.2,9308.2,-1961.0,-171.6,193.0,96.6,14555.9,91.6,110.7,13419.3,91.0,14119.1,89.1,27538.4,90.0,4978.0,5.6,34030.0,91.0
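
The two versions of the `dfa.csv` header row differ only by the columns this commit adds. A minimal sketch of how one might check that with the standard library follows. The headers are shortened to a handful of the real columns for readability, and `parse_header` and `added_columns` are hypothetical helpers, not part of the repository.

```python
import csv
import io

# Shortened header rows: a subset of the real columns, for illustration only.
OLD_HEADER = ",year,CPI_rog,GDP_yoy,INVESTMENT_yoy,RETAIL_SALES_yoy"
NEW_HEADER = ",year,AGROPROD_yoy,CPI_rog,GDP_yoy,INVESTMENT_yoy,PPI_rog,RETAIL_SALES_yoy"


def parse_header(line):
    """Split one CSV header line into column names, skipping the unnamed index column."""
    row = next(csv.reader(io.StringIO(line)))
    return [name for name in row if name]


def added_columns(old_line, new_line):
    """Return columns present in the new header but not in the old one, in file order."""
    old = set(parse_header(old_line))
    return [name for name in parse_header(new_line) if name not in old]


print(added_columns(OLD_HEADER, NEW_HEADER))  # ['AGROPROD_yoy', 'PPI_rog']
```

Running the same comparison on the full header rows from the diff should report the same two names, `AGROPROD_yoy` and `PPI_rog`.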
