Serenata de Amor Toolbox
serenata_toolbox is compatible with Python 3.6+. Install it with pip:
$ pip install -U serenata-toolbox
If you are a regular user, you are ready to get started right after the pip install.
If you are a core developer who wants to upload datasets to the cloud, you need to set the AMAZON_ACCESS_KEY and AMAZON_SECRET_KEY environment variables before running the toolbox.
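Before uploading, a quick check that both variables are set can save you a failed run. The following is a minimal sketch: only the two variable names come from this README, and the helper missing_credentials is hypothetical, not part of the toolbox.

```python
import os

# Variable names taken from the README; the helper itself is hypothetical.
REQUIRED_VARIABLES = ("AMAZON_ACCESS_KEY", "AMAZON_SECRET_KEY")


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

Once both variables are exported in your shell, missing_credentials() returns an empty list; otherwise it lists the ones you still need to set.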
We have plenty of datasets ready for you to download from our servers, and this toolbox helps you get them. Here are some examples:
Example 1: Using the command line wrapper
# without any arguments, downloads our pre-processed datasets and stores them in the data/ folder
$ serenata-toolbox
# downloads these specific datasets and stores them in the /tmp/serenata-data folder
$ serenata-toolbox /tmp/serenata-data --module federal_senate chamber_of_deputies
# you can specify a dataset and a year
$ serenata-toolbox --module chamber_of_deputies --year 2009
# or specify all options at once
$ serenata-toolbox /tmp/serenata-data --module federal_senate --year 2017
# getting help
$ serenata-toolbox --help
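To see how the command line arguments combine (an optional destination folder, one or more modules, an optional year), here is an illustrative sketch using Python's argparse. It is an assumption-based mock-up that only mirrors the documented usage; it is not the toolbox's actual parser, and serenata-toolbox --help remains the authoritative reference.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented usage (not the real one)."""
    parser = argparse.ArgumentParser(prog="serenata-toolbox")
    # Optional destination folder; defaults to data/ as described above.
    parser.add_argument("path", nargs="?", default="data/")
    # One or more dataset modules, e.g. federal_senate chamber_of_deputies.
    parser.add_argument("--module", nargs="+")
    # A single year, e.g. 2009.
    parser.add_argument("--year", type=int)
    return parser


# Parse the "all options at once" example from above.
args = build_parser().parse_args(
    ["/tmp/serenata-data", "--module", "federal_senate", "--year", "2017"]
)
print(args.path, args.module, args.year)
```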
Example 2: How do I download the datasets?
Another option is creating your own Python script:
from serenata_toolbox.datasets import Datasets

datasets = Datasets('data/')

# now let's see which are the latest datasets available
for dataset in datasets.downloader.LATEST:
    print(dataset)  # and you'll see a long list of datasets!

# and let's download one of them
datasets.downloader.download('2018-01-05-reimbursements.xz')  # yay, you've just downloaded this dataset to data/

# you can also get the most recent version of all datasets:
latest = list(datasets.downloader.LATEST)
datasets.downloader.download(latest)
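File names such as 2018-01-05-reimbursements.xz suggest a date prefix. Assuming every file follows a YYYY-MM-DD-name pattern (inferred from that single example, not documented), a small helper like the hypothetical latest_per_dataset below can pick the newest copy of each dataset, for instance in a data/ folder that has accumulated several downloads.

```python
import re
from datetime import date

# Assumption: names look like YYYY-MM-DD-<dataset name>.<extension>.
FILENAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)$")


def latest_per_dataset(filenames):
    """Map each dataset name (with extension) to its most recent file name."""
    latest = {}
    for filename in filenames:
        match = FILENAME.match(filename)
        if not match:
            continue  # skip names that do not carry a date prefix
        year, month, day, name = match.groups()
        when = date(int(year), int(month), int(day))
        current = latest.get(name)
        if current is None or when > current[0]:
            latest[name] = (when, filename)
    return {name: filename for name, (_, filename) in latest.items()}
```

For example, latest_per_dataset(os.listdir('data/')) maps each dataset name to its most recent local file; anything without a date prefix is ignored.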
Example 3: Using shortcuts
If the last example doesn't look that simple, there are some fancy shortcuts available:
from serenata_toolbox.datasets import fetch, fetch_latest_backup

fetch('2018-01-05-reimbursements.xz', 'data/')
fetch_latest_backup('data/')  # yep, we've just done exactly the same thing
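If you run your scripts repeatedly, you may want to skip files you already have. A minimal sketch, assuming a local file with the same name means the dataset is already there: fetch_if_missing is a hypothetical helper that takes the fetch function as a parameter, so its logic can be tried without network access.

```python
from pathlib import Path


def fetch_if_missing(filename, directory, fetcher):
    """Call fetcher(filename, directory) only when the file is not on disk yet.

    Hypothetical helper: in real use `fetcher` would be the toolbox's fetch
    shortcut. Returns True when a download was triggered.
    """
    target = Path(directory) / filename
    if target.exists():
        return False
    # Create the folder in case it does not exist yet (harmless otherwise).
    Path(directory).mkdir(parents=True, exist_ok=True)
    fetcher(filename, directory)
    return True
```

In a real script you would pass the shortcut itself: fetch_if_missing('2018-01-05-reimbursements.xz', 'data/', fetch).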
Example 4: Generating datasets
If you have ever wondered how we generated these datasets, this toolbox can help you too (at least with the most used ones; the others are generated in our main repo):
from serenata_toolbox.federal_senate.dataset import Dataset as SenateDataset
from serenata_toolbox.chamber_of_deputies.reimbursements import Reimbursements as ChamberDataset

# Chamber of Deputies reimbursements for 2018, written to data/
chamber = ChamberDataset('2018', 'data/')
chamber()

# Federal Senate: fetch, translate and clean the dataset in data/
senate = SenateDataset('data/')
senate.fetch()
senate.translate()
senate.clean()
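To build the Chamber of Deputies dataset for several years, the same two calls can be repeated in a loop. A minimal sketch: generate_chamber_years is hypothetical, and it assumes ChamberDataset accepts years other than the '2018' shown above, which this README does not confirm.

```python
def generate_chamber_years(dataset_class, years, directory="data/"):
    """Build one Chamber of Deputies dataset per year, in order.

    Hypothetical helper: `dataset_class` stands for ChamberDataset, which in
    the example above takes the year as a string plus the target folder and
    is then called to do the work.
    """
    generated = []
    for year in years:
        dataset = dataset_class(str(year), directory)
        dataset()  # same call as chamber() in the example above
        generated.append(str(year))
    return generated
```

For example, generate_chamber_years(ChamberDataset, [2017, 2018]) would repeat the 2018 example for 2017 as well.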
To build the HTML documentation locally:
$ cd docs
$ make clean
$ make rst
$ rm source/modules.rst
$ make html
First, create a development environment with Python's venv module to isolate your work. Then clone the repository and install the package in development mode:
$ git clone https://github.com/okfn-brasil/serenata-toolbox.git
$ cd serenata-toolbox
$ python setup.py develop
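The venv step can also be scripted from Python with the standard library's venv.create. This is a minimal sketch; create_dev_env is a hypothetical helper, and the .venv folder name is just a common convention, not a project requirement.

```python
import venv
from pathlib import Path


def create_dev_env(env_dir=".venv", with_pip=True):
    """Create an isolated environment, like running `python -m venv .venv`.

    Returns the path of the pyvenv.cfg file that marks the new environment.
    """
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"
```

After creating it, activate the environment (source .venv/bin/activate on Linux and macOS) before running python setup.py develop.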
Always add tests to your contribution. To run the test suite locally before opening the PR:
$ pip install pytest pytest-cov
$ pytest
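As a starting point, a test can be a plain function whose name starts with test_, placed in a file named test_*.py, which pytest collects by default. The function under test, to_snake_case, is hypothetical and only illustrates the shape of a test; it is not part of the toolbox.

```python
# Hypothetical file: tests/test_example.py


def to_snake_case(name):
    """Hypothetical function under test: 'Federal Senate' -> 'federal_senate'."""
    return "_".join(name.lower().split())


def test_to_snake_case():
    # pytest runs this function and reports any failing assert
    assert to_snake_case("Federal Senate") == "federal_senate"
    assert to_snake_case("  Chamber of   Deputies ") == "chamber_of_deputies"
```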
Once the tests pass, also check the coverage of the modules you edited or added. To check it locally before opening the PR:
$ pytest
$ open htmlcov/index.html
To check code quality, run Prospector with a very high strictness level:
$ pip install prospector
$ prospector -s veryhigh serenata_toolbox
If the report includes issues related to the import section of your files, isort can help:
$ pip install isort
$ isort **/*.py --diff
When bumping the version, increment the part that matches your change:
- MICRO: the API is the same, no risk of breaking code
- MINOR: values have been added, existing values are unchanged
- MAJOR: existing values have been changed or removed
Bumping the version is really important because all new code merged to master triggers the CI, and the CI then triggers a new release to PyPI. The attempt to roll out a new version of the toolbox will fail without a version bump. So we encourage you to bump the version even if all you have changed is the README.rst; this is how the README.rst is kept up to date in PyPI.
If you are not changing the API or README.rst in any way and you really do not want a version bump, you need to add [skip ci] to your commit message.
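For the MAJOR, MINOR and MICRO rules, here is a minimal sketch of the arithmetic, assuming a plain three-part version string and the usual semantic-versioning convention that lower parts reset to zero. bump is a hypothetical helper, and this README does not say where the version number is stored.

```python
def bump(version, part):
    """Return a new MAJOR.MINOR.MICRO version string.

    `part` is "major", "minor" or "micro"; the parts to its right reset to 0.
    """
    major, minor, micro = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "micro":
        return f"{major}.{minor}.{micro + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For example, bump('12.3.4', 'minor') returns '12.4.0'.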
And finally, take The Zen of Python into account:
$ python -m this