readme rewritten

martinbenes1996 · Jun 18, 2020 · f073d48
1 parent 51d0a70
commit f073d48
Showing 4 changed files with 28 additions and 35 deletions.
diff --git a/README.md b/README.md
@@ -1,59 +1,50 @@
 # Eurostat
 
-The program `eurostat.py` is a simple interface to parse Eurostat data.
+The package is a simple interface for parsing data from Eurostat:
 
-## Executing the modul
+* death counts
+* population sizes
 
-Parsing data from Eurostat to a file is as easy as
+To import and fetch data, simply write
 
-```bash
-python3 eurostat.py --output data.csv --start 2019-01-01 --verbose
+```python
+import eurostat_deaths
 ```
 
-It downloads the file from Eurostat and parses it according to the input to an output format.
+The function `deaths()` fetches the death counts and the function `populations()` fetches the population sizes. Use them as follows.
 
-```
-sex,age,geo\time,2020W23,2020W22,2020W21, ... ,2019W03,2019W02,2019W01
-F,OTAL,AT,,,,                             ... ,852,877,914
-F,OTAL,AT1,,,                             ... ,364,361,387
-...
-```
+## Deaths
 
-All parameters of the command can be shown with
+```python
+from datetime import datetime
+import eurostat_deaths as eurostat
 
-```bash
-python3 eurostat.py --help
+data = eurostat.deaths(start = datetime(2019,1,1))
 ```
 
-```
-usage: eurostat.py [-h] [-o OUTPUT] [-n CHUNKSIZE] [-s START] [-v]
-
-optional arguments:
-  -h, --help            show this help message and exit
-  -o OUTPUT, --output OUTPUT
-                        Directs the output to a name of your choice.
-  -n CHUNKSIZE, --chunksize CHUNKSIZE
-                        Number of lines in chunk (in thousands).
-  -s START, --start START
-                        Start date.
-  -v, --verbose         Sets verbose log (logging level INFO).
-```
+The parameter `start` sets the start date of the data. The end is always `now()`.
 
-## Importing
+The function returns weekly death data. Since the total size of the data frame is about 218 MB, the call takes more than 15 minutes, and memory usage is significant.
 
-It can be imported as well. Following code is using the inner function `read_eurostat()` to load the data. The total size of the data frame is about 218 MB, so the call takes more than 15 minutes and the usage of memory is enormous.
+In the future, the module will be reimplemented to use a Big Data framework such as PySpark.
 
-The module should not be used like this. Recommended is implementation using Big Data framework, e.g. PySpark.
+The data can be written directly to a file. Pass the filename to the function through the parameter `output`.
 
 ```python
 from datetime import datetime
 import eurostat_deaths as eurostat
 
-data = eurostat.read_eurostat(output = None, start = datetime(2019,1,1))
+data = eurostat.deaths(output = "file.csv", start = datetime(2019,1,1))
 ```
 
 With `output = None`, the output is collected into a single dataframe and returned.
 
+An additional setting, `chunksize`, sets the size of the chunk that is processed at a time. The unit is thousands of rows.
+
+## Population
+
+**TODO**
+
 ## Credits
 
 Author: [Martin Benes](https://www.github.com/martinbenes1996).
diff --git a/eurostat_deaths/deaths.py b/eurostat_deaths/deaths.py
@@ -17,7 +17,7 @@ def tryInt(i):
     try: return int(i)
     except: return i
 
-def deaths(start = None, output = "output.csv", chunksize = 1):
+def deaths(start = None, output = None, chunksize = 1):
     """Reads data from Eurostat, filters and saves to CSV.
     
     Args:

diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,2 @@
+pandas
+requests
diff --git a/setup.py b/setup.py
@@ -12,7 +12,7 @@
 
 setuptools.setup(
   name = 'eurostat_deaths',
-  version = '0.0.1',
+  version = '0.0.2',
   author = 'Martin Beneš',
   author_email = 'martinbenes1996@gmail.com',
   description = 'Web Scraper for Eurostat data.',
@@ -21,7 +21,7 @@
   packages=setuptools.find_packages(),
   license='MIT',
   url = 'https://github.com/martinbenes1996/eurostat_deaths',
-  download_url = 'https://github.com/martinbenes1996/eurostat_deaths/archive/0.0.1.tar.gz',
+  download_url = 'https://github.com/martinbenes1996/eurostat_deaths/archive/0.0.2.tar.gz',
   keywords = ['eurostat', 'deaths', 'web', 'html', 'webscraping'],
   install_requires = reqs,
   package_dir={'': '.'},
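
The README in this commit describes two knobs of `deaths()`: `start`, which cuts off older weeks, and `chunksize`, which is measured in thousands of rows. The sketch below illustrates both ideas with the standard library only. It is not the package's implementation: the helper names `week_start` and `read_in_chunks` are hypothetical, the sample values are made up, and the column layout (`sex,age,geo\time` followed by week labels such as `2019W01`) is assumed from the output sample in the previous README. `date.fromisocalendar` requires Python 3.8 or newer.

```python
import csv
import io
from datetime import date, datetime
from itertools import islice


def week_start(label):
    """Return the Monday of the ISO week named by a label such as '2019W01'."""
    year, week = label.split("W")
    return date.fromisocalendar(int(year), int(week), 1)


def read_in_chunks(handle, start=None, chunksize=1):
    """Yield lists of row dicts, at most chunksize * 1000 rows per list.

    Week columns whose Monday falls before `start` are dropped from every row.
    """
    reader = csv.DictReader(handle)
    id_columns = ("sex", "age", "geo\\time")  # assumed identifier columns
    keep = [
        name for name in reader.fieldnames
        if name in id_columns
        or start is None
        or week_start(name) >= start.date()
    ]
    while True:
        # islice pulls the next block of rows without loading the whole file
        chunk = [{k: row[k] for k in keep}
                 for row in islice(reader, chunksize * 1000)]
        if not chunk:
            return
        yield chunk


# Made-up sample in the assumed layout
SAMPLE = (
    "sex,age,geo\\time,2019W03,2019W02,2019W01\n"
    "F,TOTAL,AT,30,20,10\n"
    "F,TOTAL,AT1,3,2,1\n"
)

for chunk in read_in_chunks(io.StringIO(SAMPLE), start=datetime(2019, 1, 7)):
    for row in chunk:
        print(row)
```

Processing a fixed number of rows at a time keeps memory bounded regardless of the file size, which is the usual motivation for a `chunksize` parameter on a download of this size.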