diff --git a/README.md b/README.md
index 26561cc..1181de4 100644
--- a/README.md
+++ b/README.md
@@ -1,59 +1,50 @@
 # Eurostat
 
-The program `eurostat.py` is a simple interface to parse Eurostat data.
+The package is a simple interface for parsing data from Eurostat:
 
-## Executing the modul
+* death counts
+* population sizes
 
-Parsing data from Eurostat to a file is as easy as
+To import the package, simply write
 
-```bash
-python3 eurostat.py --output data.csv --start 2019-01-01 --verbose
+```python
+import eurostat_deaths
 ```
 
-It downloads the file from Eurostat and parses it according to the input to an output format.
+The function `deaths()` fetches death counts and the function `populations()` fetches population sizes. Use them as follows.
 
-```
-sex,age,geo\time,2020W23,2020W22,2020W21, ... ,2019W03,2019W02,2019W01
-F,OTAL,AT,,,, ... ,852,877,914
-F,OTAL,AT1,,, ... ,364,361,387
-...
-```
+## Deaths
 
-All parameters of the command can be shown with
+```python
+from datetime import datetime
+import eurostat_deaths as eurostat
 
-```bash
-python3 eurostat.py --help
+data = eurostat.deaths(start = datetime(2019,1,1))
 ```
 
-```
-usage: eurostat.py [-h] [-o OUTPUT] [-n CHUNKSIZE] [-s START] [-v]
-
-optional arguments:
-  -h, --help            show this help message and exit
-  -o OUTPUT, --output OUTPUT
-                        Directs the output to a name of your choice.
-  -n CHUNKSIZE, --chunksize CHUNKSIZE
-                        Number of lines in chunk (in thousands).
-  -s START, --start START
-                        Start date.
-  -v, --verbose         Sets verbose log (logging level INFO).
-```
+The parameter `start` sets the start date of the data. The end is always `now()`.
 
-## Importing
+The result contains weekly death counts. Since the total size of the data frame is about 218 MB, the call takes more than 15 minutes and its memory usage is significant.
 
-It can be imported as well. Following code is using the inner function `read_eurostat()` to load the data. The total size of the data frame is about 218 MB, so the call takes more than 15 minutes and the usage of memory is enormous.
+In the future, the module will be reimplemented using a Big Data framework such as PySpark.
 
-The module should not be used like this. Recommended is implementation using Big Data framework, e.g. PySpark.
+The data can also be written directly to a file by passing the filename as the parameter `output`.
 
 ```python
 from datetime import datetime
-import eurostat
+import eurostat_deaths as eurostat
 
-data = eurostat.read_eurostat(output = None, start = datetime(2019,1,1))
+data = eurostat.deaths(output = "file.csv", start = datetime(2019,1,1))
 ```
 
 Parameter `output = None` causes that the output is collected into a single dataframe and returned.
 
+One additional setting is `chunksize`, the number of rows processed at a time, in thousands.
+
+## Population
+
+**TODO**
+
 ## Credits
 
 Author: [Martin Benes](https://www.github.com/martinbenes1996).
\ No newline at end of file
diff --git a/eurostat_deaths/deaths.py b/eurostat_deaths/deaths.py
index 2dff38f..59d3006 100644
--- a/eurostat_deaths/deaths.py
+++ b/eurostat_deaths/deaths.py
@@ -17,7 +17,7 @@ def tryInt(i):
     try: return int(i)
     except: return i
 
-def deaths(start = None, output = "output.csv", chunksize = 1):
+def deaths(start = None, output = None, chunksize = 1):
     """Reads data from Eurostat, filters and saves to CSV.
 
     Args:
diff --git a/requirements.txt b/requirements.txt
index e69de29..5b2f108 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,2 @@
+pandas
+requests
\ No newline at end of file
diff --git a/setup.py b/setup.py
index 5305166..2e34986 100644
--- a/setup.py
+++ b/setup.py
@@ -12,7 +12,7 @@ setuptools.setup(
   name = 'eurostat_deaths',
-  version = '0.0.1',
+  version = '0.0.2',
   author = 'Martin Beneš',
   author_email = 'martinbenes1996@gmail.com',
   description = 'Web Scraper for Eurostat data.',
@@ -21,7 +21,7 @@ setuptools.setup(
   packages=setuptools.find_packages(),
   license='MIT',
   url = 'https://github.com/martinbenes1996/eurostat_deaths',
-  download_url = 'https://github.com/martinbenes1996/eurostat_deaths/archive/0.0.1.tar.gz',
+  download_url = 'https://github.com/martinbenes1996/eurostat_deaths/archive/0.0.2.tar.gz',
   keywords = ['eurostat', 'deaths', 'web', 'html', 'webscraping'],
   install_requires = reqs,
   package_dir={'': '.'},
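The README text in this patch describes the `output` and `chunksize` options of `deaths()` only in prose. The following is a minimal sketch of that behaviour under stated assumptions. It is not the code in `eurostat_deaths/deaths.py`. The names `iter_chunks` and `process` are hypothetical. It also uses the standard-library `csv` module instead of pandas so that it runs on its own. Rows are read in chunks of `chunksize` thousand. With `output = None` they are collected and returned; otherwise each chunk is written to the named CSV file.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, Optional, TextIO

Row = List[str]


def iter_chunks(rows: Iterator[Row], chunksize: int = 1) -> Iterator[List[Row]]:
    """Yield consecutive lists of rows; `chunksize` is given in thousands of rows."""
    rows_per_chunk = chunksize * 1000
    while True:
        chunk = list(islice(rows, rows_per_chunk))
        if not chunk:  # the source is exhausted
            return
        yield chunk


def process(source: TextIO, output: Optional[str] = None, chunksize: int = 1) -> Optional[List[Row]]:
    """Hypothetical sketch of the `output`/`chunksize` behaviour described in the README.

    output=None: collect every row (header first) and return it.
    output="file.csv": write the rows to that file chunk by chunk and return None.
    """
    reader = csv.reader(source)
    header = next(reader)
    if output is None:
        collected: List[Row] = [header]
        for chunk in iter_chunks(reader, chunksize):
            collected.extend(chunk)
        return collected
    with open(output, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for chunk in iter_chunks(reader, chunksize):
            writer.writerows(chunk)  # only this chunk of parsed rows is in memory
    return None


if __name__ == "__main__":
    # Toy input: a header plus 2,500 data rows.
    text = "a,b\n" + "".join(f"{i},{2 * i}\n" for i in range(2500))
    rows = process(io.StringIO(text), output=None, chunksize=1)
    print(len(rows))  # prints 2501 (header + 2500 rows)
```

With `chunksize = 1`, the 2,500 toy rows are processed as chunks of 1000, 1000 and 500 rows. Writing to a file keeps only one chunk of parsed rows in memory at a time. Collecting into a single result, as `output = None` does, holds everything at once.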