Commit

readme rewritten
martinbenes1996 committed Jun 18, 2020
1 parent 51d0a70 commit f073d48
Showing 4 changed files with 28 additions and 35 deletions.
55 changes: 23 additions & 32 deletions README.md
@@ -1,59 +1,50 @@
# Eurostat

The program `eurostat.py` is a simple interface to parse Eurostat data.
The package is a simple interface for parsing data from Eurostat:

## Executing the module
* death counts
* population sizes

Parsing data from Eurostat to a file is as easy as
To import and fetch data, simply write

```bash
python3 eurostat.py --output data.csv --start 2019-01-01 --verbose
```python
import eurostat_deaths
```

It downloads the file from Eurostat and parses it into the output format according to the given arguments.
The function `deaths()` fetches death counts, and `populations()` fetches population sizes. Use them as follows.

```
sex,age,geo\time,2020W23,2020W22,2020W21, ... ,2019W03,2019W02,2019W01
F,OTAL,AT,,,, ... ,852,877,914
F,OTAL,AT1,,, ... ,364,361,387
...
```
## Deaths

All parameters of the command can be shown with
```python
from datetime import datetime
import eurostat

```bash
python3 eurostat.py --help
data = eurostat.deaths(start = datetime(2019,1,1))
```

```
usage: eurostat.py [-h] [-o OUTPUT] [-n CHUNKSIZE] [-s START] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Directs the output to a name of your choice.
  -n CHUNKSIZE, --chunksize CHUNKSIZE
                        Number of lines in chunk (in thousands).
  -s START, --start START
                        Start date.
  -v, --verbose         Sets verbose log (logging level INFO).
```
Parameter `start` sets the start of the data. The end is always `now()`.

## Importing
You receive per-week death counts. Since the total size of the data frame is about 218 MB, the call takes more than 15 minutes, and memory usage is significant.
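
Because the data is weekly, the column labels in the Eurostat table look like `2019W01`. The following is a minimal sketch of how such labels could be mapped to dates and filtered by a `start` date. It assumes the labels follow ISO 8601 week numbering and requires Python 3.8 or newer. The helper names `week_start` and `weeks_since` are hypothetical and are not part of the package, and this is not necessarily how `deaths()` filters internally.

```python
from datetime import datetime

def week_start(label):
    """Return the Monday of a week label such as '2019W01'.

    Assumption: the label uses ISO 8601 week numbering.
    """
    year, week = label.split("W")
    return datetime.fromisocalendar(int(year), int(week), 1)

def weeks_since(labels, start):
    """Keep only the labels whose week begins on or after `start`."""
    return [label for label in labels if week_start(label) >= start]

labels = ["2020W23", "2019W02", "2019W01"]
print(week_start("2020W23").date())
print(weeks_since(labels, datetime(2019, 1, 1)))
```

Note that ISO week 1 of 2019 begins on Monday, 2018-12-31, so under this strict comparison `2019W01` falls before a `start` of 2019-01-01 and is dropped.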

The module can be imported as well. The following code uses the inner function `read_eurostat()` to load the data. The total size of the data frame is about 218 MB, so the call takes more than 15 minutes and memory usage is enormous.
In the future, the module will be reimplemented on a big-data framework such as PySpark.

The module should not be used like this. The recommended approach is an implementation on a big-data framework, e.g. PySpark.
The data can be written directly to a file by passing a filename in the `output` parameter.

```python
from datetime import datetime
import eurostat

data = eurostat.read_eurostat(output = None, start = datetime(2019,1,1))
data = eurostat.deaths(output = "file.csv", start = datetime(2019,1,1))
```

Setting `output = None` collects the output into a single data frame and returns it.

An additional setting, `chunksize`, sets the size of the chunk processed at a time. The unit is thousands of rows.
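
To illustrate what chunked processing means here, the following is a minimal sketch, not the package's implementation, that reads rows from a CSV stream in blocks of `chunksize` thousand rows. The helper `iter_chunks` is hypothetical, and the input is synthetic data standing in for the downloaded file.

```python
import csv
import io
from itertools import islice

def iter_chunks(lines, chunksize=1):
    """Yield lists of parsed CSV rows, at most chunksize * 1000 rows each."""
    reader = csv.reader(lines)
    rows_per_chunk = chunksize * 1000
    while True:
        # islice pulls the next block of rows without loading the whole file
        chunk = list(islice(reader, rows_per_chunk))
        if not chunk:
            break
        yield chunk

# Synthetic input: 2500 rows in place of the real Eurostat file
sample = io.StringIO("".join(f"row{i},{i}\n" for i in range(2500)))
sizes = [len(chunk) for chunk in iter_chunks(sample, chunksize=1)]
print(sizes)
```

Only one chunk is held in memory at a time, which is the reason for exposing such a parameter when the full table is a few hundred megabytes.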

## Population

**TODO**

## Credits

Author: [Martin Benes](https://www.github.com/martinbenes1996).
2 changes: 1 addition & 1 deletion eurostat_deaths/deaths.py
@@ -17,7 +17,7 @@ def tryInt(i):
try: return int(i)
except: return i

def deaths(start = None, output = "output.csv", chunksize = 1):
def deaths(start = None, output = None, chunksize = 1):
"""Reads data from Eurostat, filters and saves to CSV.
Args:
2 changes: 2 additions & 0 deletions requirements.txt
@@ -0,0 +1,2 @@
pandas
requests
4 changes: 2 additions & 2 deletions setup.py
@@ -12,7 +12,7 @@

setuptools.setup(
name = 'eurostat_deaths',
version = '0.0.1',
version = '0.0.2',
author = 'Martin Beneš',
author_email = 'martinbenes1996@gmail.com',
description = 'Web Scraper for Eurostat data.',
@@ -21,7 +21,7 @@
packages=setuptools.find_packages(),
license='MIT',
url = 'https://github.com/martinbenes1996/eurostat_deaths',
download_url = 'https://github.com/martinbenes1996/eurostat_deaths/archive/0.0.1.tar.gz',
download_url = 'https://github.com/martinbenes1996/eurostat_deaths/archive/0.0.2.tar.gz',
keywords = ['eurostat', 'deaths', 'web', 'html', 'webscraping'],
install_requires = reqs,
package_dir={'': '.'},