
Jarbas — a tool for Serenata de Amor

Jarbas is part of Serenata de Amor — we fight corruption with data science.

Jarbas is in charge of making data from CEAP more accessible. In the near future Jarbas will show what Rosie thinks of each reimbursement made for our congresspeople.

Table of Contents

  1. JSON API endpoints
    1. Reimbursement
    2. Subquota
    3. Applicant
    4. Company
    5. Tapioca Jarbas
  2. Installing
    1. Settings
    2. Using Docker
    3. Local install

JSON API endpoints


Reimbursement

Each Reimbursement object is a reimbursement claimed by a congressperson and identified publicly by its document_id.

Retrieving a specific reimbursement

GET /api/chamber_of_deputies/reimbursement/<document_id>/

Details from a specific reimbursement. If receipt_url wasn't fetched yet, the server won't try to fetch it automatically.

GET /api/chamber_of_deputies/reimbursement/<document_id>/receipt/

URL of the digitized version of the receipt for this specific reimbursement.

If receipt_url wasn't fetched yet, the server will try to fetch it automatically.

If you append the parameter force (i.e. GET /api/chamber_of_deputies/reimbursement/<document_id>/receipt/?force=1) the server will re-fetch the receipt URL.

Not all receipts are available, so this URL can be null.
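
As a minimal sketch of how a client might address the receipt endpoint, the helper below only builds the URL, with or without the force parameter. The base URL is an assumption (the development address localhost:8000), and the function name is hypothetical, not part of Jarbas.

```python
from urllib.parse import urlencode

# Assumption: Jarbas running locally in development mode.
BASE_URL = "http://localhost:8000"


def receipt_endpoint(document_id, force=False, base_url=BASE_URL):
    """Build the URL of the receipt endpoint for one reimbursement."""
    url = f"{base_url}/api/chamber_of_deputies/reimbursement/{document_id}/receipt/"
    if force:
        # force=1 asks the server to re-fetch the receipt URL.
        url += "?" + urlencode({"force": 1})
    return url


if __name__ == "__main__":
    print(receipt_endpoint(123456))
    print(receipt_endpoint(123456, force=True))
```

Whatever HTTP client you use to request this URL, remember that the receipt URL in the response can be null.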

Listing reimbursements

GET /api/chamber_of_deputies/reimbursement/

Lists all reimbursements.


All these endpoints accept any combination of the following parameters:

  • applicant_id
  • cnpj_cpf
  • document_id
  • issue_date_start (inclusive)
  • issue_date_end (exclusive)
  • month
  • subquota_number
  • suspicions (boolean, 1 parses to True, 0 to False)
  • has_receipt (boolean, 1 parses to True, 0 to False)
  • year
  • state
  • order_by: issue_date (default) or probability (both descending)
  • in_latest_dataset (boolean, 1 parses to True, 0 to False)
  • search (Search the value in any of the fields below)
    • congressperson_name
    • supplier
    • cnpj_cpf
    • party
    • state
    • receipt_text
    • passenger
    • leg_of_the_trip
    • subquota_description
    • subquota_group_description
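
Because issue_date_start is inclusive and issue_date_end is exclusive, a whole calendar month runs from its first day to the first day of the following month. The sketch below illustrates that arithmetic; the function name is hypothetical and the ISO date format (YYYY-MM-DD) is an assumption about what the API accepts.

```python
from datetime import date


def month_range(year, month):
    """Return date filters covering one whole calendar month.

    The start is inclusive and the end is exclusive, so the end is the
    first day of the following month.
    """
    start = date(year, month, 1)
    if month == 12:
        end = date(year + 1, 1, 1)
    else:
        end = date(year, month + 1, 1)
    return {
        # Assumption: the API accepts ISO formatted dates.
        "issue_date_start": start.isoformat(),
        "issue_date_end": end.isoformat(),
    }


if __name__ == "__main__":
    print(month_range(2016, 12))
```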

For example:

GET /api/chamber_of_deputies/reimbursement/?year=2016&cnpj_cpf=11111111111111&subquota_number=42&order_by=probability

This request will list:

  • all 2016 reimbursements
  • made at the supplier with the CNPJ 11.111.111/1111-11
  • made according to the subquota with the ID 42
  • sorted by the highest probability

You can also pass more than one value per field (e.g. document_id=111111,222222).
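
The filtering rules above (booleans as 1 or 0, several values separated by commas) can be sketched as a small query builder. This is an illustration only: the function names are hypothetical and the base URL assumes a local development server.

```python
from urllib.parse import urlencode

# Assumption: Jarbas running locally in development mode.
BASE_URL = "http://localhost:8000"
LIST_PATH = "/api/chamber_of_deputies/reimbursement/"


def _encode(value):
    """Turn a Python value into the text form used in the query string."""
    if isinstance(value, bool):
        # Boolean filters are sent as 1 (True) or 0 (False).
        return "1" if value else "0"
    if isinstance(value, (list, tuple)):
        # Several values for one field are separated by commas.
        return ",".join(_encode(item) for item in value)
    return str(value)


def reimbursement_query(base_url=BASE_URL, **filters):
    """Build the listing URL for any combination of filters."""
    params = {
        name: _encode(value)
        for name, value in filters.items()
        if value is not None
    }
    # Keep commas unescaped so multi-value fields stay readable.
    query = urlencode(params, safe=",")
    url = base_url + LIST_PATH
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(reimbursement_query(year=2016, subquota_number=42, order_by="probability"))
    print(reimbursement_query(document_id=[111111, 222222], suspicions=True))
```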

GET /api/chamber_of_deputies/reimbursement/<document_id>/same_day/

Lists all reimbursements of expenses from the same day as document_id.


Subquota

Subquotas are categories of expenses for which congresspeople can claim reimbursement.

Listing subquotas

GET /api/chamber_of_deputies/subquota/

Lists all subquota names and IDs.


Accepts a case-insensitive LIKE filter as the q URL parameter (e.g. GET /api/chamber_of_deputies/subquota/?q=meal lists all subquotas that have meal in their names).


Applicant

An applicant is the person (a congressperson, or the leadership of a party or of the government) who claimed the reimbursement.

Listing applicants

GET /api/chamber_of_deputies/applicant/

Lists all names of applicants together with their IDs.


Accepts a case-insensitive LIKE filter as the q URL parameter (e.g. GET /api/chamber_of_deputies/applicant/?q=lideranca lists all applicants that have lideranca in their names).
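
The subquota and applicant listings share the same q filter, so one helper can build both URLs. This is a sketch under assumptions: the function name is hypothetical and the base URL is the local development address.

```python
from urllib.parse import urlencode

# Assumption: Jarbas running locally in development mode.
BASE_URL = "http://localhost:8000"


def name_filter_url(resource, q=None, base_url=BASE_URL):
    """Build the listing URL for subquotas or applicants, optionally filtered by name."""
    if resource not in ("subquota", "applicant"):
        raise ValueError("resource must be 'subquota' or 'applicant'")
    url = f"{base_url}/api/chamber_of_deputies/{resource}/"
    if q:
        # urlencode escapes spaces and non-ASCII characters in the search term.
        url += "?" + urlencode({"q": q})
    return url


if __name__ == "__main__":
    print(name_filter_url("subquota", "meal"))
    print(name_filter_url("applicant", "lideranca"))
```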


Company

A company is a Brazilian company at which congresspeople have made purchases and then claimed reimbursement for them.

Retrieving a specific company

GET /api/company/<cnpj>/

This endpoint gets the info we have for a specific company. The endpoint expects a cnpj (i.e. the CNPJ of a Company object, digits only). It returns 404 if the company is not found.
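
Since the endpoint expects digits only, a formatted CNPJ such as 11.111.111/1111-11 has to be stripped of punctuation first. A minimal sketch, with a hypothetical function name and the local development address assumed as base URL:

```python
import re

# Assumption: Jarbas running locally in development mode.
BASE_URL = "http://localhost:8000"


def company_url(cnpj, base_url=BASE_URL):
    """Build the company endpoint URL from a formatted or unformatted CNPJ."""
    # Drop dots, slashes, dashes and anything else that is not a digit.
    digits = re.sub(r"\D", "", cnpj)
    if len(digits) != 14:
        # A CNPJ always has 14 digits.
        raise ValueError(f"expected 14 digits, got {len(digits)}")
    return f"{base_url}/api/company/{digits}/"


if __name__ == "__main__":
    print(company_url("11.111.111/1111-11"))
```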

Tapioca Jarbas

There is also a tapioca-wrapper for the API. tapioca-jarbas can be installed with pip install tapioca-jarbas and used to access the API from any Python script.



Installing

Settings

Copy contrib/.env.sample as .env in the project's root folder and adjust your settings. These are the main variables:

Django settings
NewRelic settings
Message Broker
  • CELERY_BROKER_URL (string) Celery compatible message broker URL (e.g. amqp://guest:guest@localhost//)
Google settings
  • GOOGLE_ANALYTICS (str) Google Analytics tracking code (e.g. UA-123456-7)
  • GOOGLE_STREET_VIEW_API_KEY (str) Google Street View Image API key
Twitter settings
  • TWITTER_CONSUMER_KEY (str) Twitter API key
  • TWITTER_CONSUMER_SECRET (str) Twitter API secret
  • TWITTER_ACCESS_TOKEN (str) Twitter access token
  • TWITTER_ACCESS_SECRET (str) Twitter access token secret

To get these credentials, follow the python-twitter instructions.
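
Before starting Jarbas it can help to check that the Twitter variables listed above are actually filled in. The sketch below assumes the .env file uses plain KEY=value lines with # comments; the function names are hypothetical and not part of Jarbas.

```python
TWITTER_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_settings(text, required=TWITTER_KEYS):
    """Return the required variables that are absent or empty."""
    settings = parse_env(text)
    return [key for key in required if not settings.get(key)]


if __name__ == "__main__":
    # Assumption: the script runs from the project's root folder.
    with open(".env") as handle:
        print(missing_settings(handle.read()))
```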

For the production environment
  • VIRTUAL_HOST_WEB (str) host used for the HTTPS certificate (for testing production settings locally you might need to add this host name to your /etc/hosts)
  • LETSENCRYPT_EMAIL (str) Email used to create the HTTPS certificate at Let's Encrypt
  • HTTPS_METHOD (str) if set to noredirect does not redirect from HTTP to HTTPS (default: redirect)

Using Docker

You must first install Docker and Docker Compose.

Build and start services

$ docker-compose up -d

Create and seed the database with sample data

Creating the database and applying migrations:

$ docker-compose run --rm django python manage.py migrate

Seeding it with sample data:

$ docker-compose run --rm django python manage.py reimbursements /mnt/data/reimbursements_sample.csv
$ docker-compose run --rm django python manage.py companies /mnt/data/companies_sample.xz
$ docker-compose run --rm django python manage.py suspicions /mnt/data/suspicions_sample.xz
$ docker-compose run --rm django python manage.py tweets

If you're interested in having a database full of data, you can get the datasets by running Rosie. To add a fresh reimbursements.xz or suspicions.xz brewed by Rosie, or a companies.xz you've got from the toolbox, you just need to copy these files to contrib/data and refer to them inside the container from the path /mnt/data/.

Creating search vector

For text search in the dashboard:

$ docker-compose run --rm django python manage.py searchvector

Accessing Jarbas

You can access it at localhost:8000 in development mode or localhost in production mode.

To change any of the default environment variables defined in docker-compose.yml, export a variable with the same name in your local environment; Jarbas will pick it up the next time you run it.

Docker Ready?

Not sure? Test it!

$ docker-compose run --rm django python manage.py check
$ docker-compose run --rm django python manage.py test
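
The Docker steps above (migrate, seed, build the search vector) can be scripted in order. The following is only a sketch: it assumes the management commands are invoked through Django's manage.py inside the django service, and the function name is hypothetical.

```python
import subprocess

# Assumption: management commands run through manage.py in the django service.
COMPOSE_RUN = ["docker-compose", "run", "--rm", "django", "python", "manage.py"]

# Same order as the steps above: migrate, seed sample data, search vector.
STEPS = [
    ["migrate"],
    ["reimbursements", "/mnt/data/reimbursements_sample.csv"],
    ["companies", "/mnt/data/companies_sample.xz"],
    ["suspicions", "/mnt/data/suspicions_sample.xz"],
    ["tweets"],
    ["searchvector"],
]


def docker_commands():
    """Return the full docker-compose command line for each step."""
    return [COMPOSE_RUN + step for step in STEPS]


if __name__ == "__main__":
    for command in docker_commands():
        print(" ".join(command))
        # check=True stops at the first failing step.
        subprocess.run(command, check=True)
```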

Local install


Jarbas requires Python 3.5, Node.js 8, RabbitMQ 3.6, and PostgreSQL 9.6. Once you have pip and npm available, install the dependencies:

$ npm install
$ ./node_modules/.bin/elm-package install --yes  # this might not be necessary
$ python -m pip install -r requirements-dev.txt

Python's lzma module

In some Linux distros lzma is not installed by default. You can check whether you have it with $ python -m lzma. On Debian-based systems you can fix that with $ apt-get install liblzma-dev, and on macOS with $ brew install xz, but you might have to recompile your Python afterwards.
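
If you prefer to check from Python itself, a minimal sketch is to try the import and do a small round trip. The function name is hypothetical.

```python
def has_lzma():
    """Return True if Python's lzma module is importable and working."""
    try:
        import lzma
    except ImportError:
        # Python was built without liblzma support.
        return False
    # Compress and decompress a small payload as a sanity check.
    payload = b"jarbas"
    return lzma.decompress(lzma.compress(payload)) == payload


if __name__ == "__main__":
    print("lzma available" if has_lzma() else "lzma missing")
```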

Setup your environment variables

This means copying contrib/.env.sample as .env in the project's root folder; the Settings section above covers the variables in detail.


Once you're done with requirements, dependencies and settings, create the basic database structure:

$ python manage.py migrate

Load data

To load data you need RabbitMQ running and a Celery worker:

$ celery worker --app jarbas

Now you can load the data from our datasets and get some other data as static files:

$ python manage.py reimbursements <path to reimbursements.xz>
$ python manage.py suspicions <path to suspicions.xz>
$ python manage.py companies <path to companies.xz>
$ python manage.py tweets

There are sample files to seed your database inside contrib/data/. You can get full datasets by running Rosie or directly with the toolbox.

Creating search vector

For text search in the dashboard:

$ python manage.py searchvector

Generate static files

We generate assets through Node.js, so build them before Django collects the static files:

$ npm run assets
$ python manage.py collectstatic


Not sure? Test it!

$ python manage.py check
$ python manage.py test


Run the server with $ python manage.py runserver and load localhost:8000 in your favorite browser.

Using Django Admin

If you would like to access the Django Admin for an alternative view of the reimbursements, you can reach it at localhost:8000/admin/ after creating a user with:

$ python manage.py createsuperuser