# Extract layer

This layer consists of three functions: two of them prepare the parameters we need to send to the API, and the third one actually collects the data.

## The API's endpoint
The [REST API](http://www.transparencia.gov.br/swagger-ui.html) endpoint used here accepts two parameters (plus `pagina`, the page of results, which this project always sets to `1`), as in the following example:
```
http://www.transparencia.gov.br/api-de-dados/bolsa-familia-por-municipio/?mesAno={yyyymm}&codigoIbge={xxyyyyy}&pagina=1
```
Where:
+ `mesAno` indicates the reference month, in _YYYYMM_ format, in which the Bolsa Família benefit was paid by the government;
+ `codigoIbge` is the 7-digit code of the benefited municipality, formed by the [IBGE](https://www.ibge.gov.br/)'s 2-digit _state_code_ followed by its 5-digit _city_code_ (e.g. `53` + `00108` = `5300108` for Brasília).

---
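As a minimal sketch, the request URL can be assembled with the standard library's `urllib.parse.urlencode`, which also takes care of the `&` separators between parameters. The helper `build_url` is hypothetical and not part of the project.

```python
from urllib.parse import urlencode

# Base URL of the endpoint described above
BASE_URL = 'http://www.transparencia.gov.br/api-de-dados/bolsa-familia-por-municipio/'


def build_url(mes_ano: str, codigo_ibge: str, pagina: int = 1) -> str:
    '''Return the request URL for one month and one IBGE code (hypothetical helper).'''
    # urlencode keeps the dict's insertion order and joins pairs with '&'
    query = urlencode({'mesAno': mes_ano, 'codigoIbge': codigo_ibge, 'pagina': pagina})
    return '{}?{}'.format(BASE_URL, query)


print(build_url('201901', '5300108'))
```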
For example, a request with `mesAno=201901` and `codigoIbge=5300108` returns the following response:
```json
[
    {
        "id": 78300058,
        "dataReferencia": "01/01/2019",
        "municipio": {
            "codigoIBGE": "5300108",
            "nomeIBGE": "BRASÍLIA",
            "nomeIBGEsemAcento": "BRASILIA",
            "pais": "BRASIL",
            "uf": {
                "sigla": "DF",
                "nome": "DISTRITO FEDERAL"
            }
        },
        "tipo": {
            "id": 1,
            "descricao": "Bolsa Família",
            "descricaoDetalhada": "Bolsa Família"
        },
        "valor": 12013474.00,
        "quantidadeBeneficiados": 66650
    }
]
```
> _The displayed information refers to Brasília, in Brazil's Federal District, for the reference month of January 2019._
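
Each record nests the city and state under `municipio`. A minimal sketch of flattening one record into the fields most useful for analysis, using an abridged copy of the response above; the helper `summarize` and the chosen output keys are hypothetical.

```python
import json

# Abridged copy of the response shown above
raw = '''[{"id": 78300058, "dataReferencia": "01/01/2019",
  "municipio": {"codigoIBGE": "5300108", "nomeIBGE": "BRASÍLIA",
                "uf": {"sigla": "DF", "nome": "DISTRITO FEDERAL"}},
  "valor": 12013474.00, "quantidadeBeneficiados": 66650}]'''


def summarize(record: dict) -> dict:
    '''Flatten the nested fields of one record (hypothetical helper).'''
    return {
        'codigoIBGE': record['municipio']['codigoIBGE'],
        'municipio': record['municipio']['nomeIBGE'],
        'uf': record['municipio']['uf']['sigla'],
        'valor': record['valor'],
        'beneficiados': record['quantidadeBeneficiados'],
    }


record = json.loads(raw)[0]  # the endpoint returns a list of records
summary = summarize(record)
# Average amount per beneficiary in the reference month
print(summary, round(summary['valor'] / summary['beneficiados'], 2))
```

The last value printed is simply `valor` divided by `quantidadeBeneficiados`, i.e. the average amount per beneficiary in that month.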

## Imports
First, let's define our _imports_. This layer needs the `csv` and `json` modules from Python's standard library, plus the third-party `requests` module.

If `requests` is missing, install it with the `pip` package installer:
```
$ pip install requests
```

or with any other tool you prefer.

```python
import csv
import json
import requests as r
```

## The _mesAno_ parameter
To build the range of dates in which the benefit was paid, the following function was implemented. It requires two arguments:
+ `year_range`: a tuple with the first and last _years_ (both inclusive).
+ `month_range`: a tuple with the first and last _months_ (both inclusive), applied to every year.

It returns a `list` of strings in `YYYYMM` format, one for each year and month combination.

```python
def set_date_range(year_range: tuple, month_range: tuple) -> list:
    '''
    Returns a list containing a year and month set in YYYYMM format.
    '''

    dates = []
    for year in range(year_range[0], year_range[1] + 1):
        for month in range(month_range[0], month_range[1] + 1):
            # Zero-pad single-digit months, e.g. 2019 and 1 -> '201901'
            dates.append(f'{year}{month:02d}')

    return dates
```
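
Since every year is combined with the same month range, the nested loops amount to a Cartesian product. An equivalent, more compact sketch using `itertools.product`; the name `set_date_range_compact` is hypothetical.

```python
from itertools import product


def set_date_range_compact(year_range: tuple, month_range: tuple) -> list:
    '''Same output as set_date_range, built with itertools.product.'''
    years = range(year_range[0], year_range[1] + 1)
    months = range(month_range[0], month_range[1] + 1)
    # product() iterates years in the outer loop and months in the inner loop
    return [f'{year}{month:02d}' for year, month in product(years, months)]


print(set_date_range_compact((2019, 2020), (1, 3)))
# ['201901', '201902', '201903', '202001', '202002', '202003']
```

Because `month_range` is applied to every year, an interval crossing a year boundary, such as November 2018 to February 2019, cannot be expressed: `(11, 2)` yields an empty month range and the function returns `[]`.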

## The _codigoIbge_ parameter
As mentioned before, the second parameter required by the API's endpoint is the concatenation of _state_code_ and _city_code_. The project provides this information in a .csv file at _CSV/municipios_IBGE.csv_, which we'll use to build the set of `codigoIbge` values.

To do so, the function below was implemented. It requires one argument:
+ `reading_file`: the path of the .csv file to be read.

It also accepts an optional argument:
+ `column_value`: a value used to filter the rows; only rows in which some column is exactly equal to it are kept. When empty (the default), every row is used.

Finally, it returns a `list` with the IBGE codes formatted as `XXYYYYY`.
> _X = state_code digits;\
Y = city_code digits._

```python
def set_ibge_codes(reading_file: str, column_value: str = '') -> list:
    '''
    Reads a .csv file, optionally filters its rows by a column value,
    and returns a list with the IBGE codes in the API's endpoint format.

    Optional keyword argument:
    column_value -- keep only rows in which some column equals this value.
    '''

    ibge_codes = []
    with open(reading_file, newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        # Keep every row, or only the rows containing column_value as a field
        rows = [row for row in reader if column_value in row] if column_value else list(reader)

        for row in rows:
            # Columns 1 and 2 hold the state_code and city_code, respectively
            ibge_codes.append('{}{}'.format(row[1], row[2]))

    return ibge_codes
```
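
To illustrate how the filter behaves, the sketch below runs the same logic on an in-memory CSV. The column layout (UF abbreviation, state code, city code, city name) is an assumption, since the code only tells us that columns 1 and 2 hold the codes, and `codes_from_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical layout: UF abbreviation, state_code, city_code, city name
sample = io.StringIO(
    'DF,53,00108,BRASÍLIA\n'
    'SP,35,50308,SÃO PAULO\n'
)


def codes_from_rows(csvfile, column_value: str = '') -> list:
    '''Mirror of set_ibge_codes that reads from any file-like object.'''
    rows = csv.reader(csvfile, delimiter=',')
    # `column_value in row` tests equality against each field, not substrings
    filtered = [row for row in rows if column_value in row] if column_value else list(rows)
    return [row[1] + row[2] for row in filtered]


print(codes_from_rows(sample, 'DF'))  # ['5300108']
```

Note that `column_value in row` compares `column_value` with each whole field, so a partial value such as `'BRAS'` matches nothing.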

## Data gathering
Once the date range and the IBGE codes of the benefited cities are set, we can finally collect the data by calling the API endpoint.

The following function is responsible for this work. It requires 3 arguments:
+ `dates`: a list containing the date range;
+ `ibge_codes`: a list containing the IBGE codes, as described before;
+ `save_path`: the path of the _.json_ file in which the collected data will be written.
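
Before the implementation, the overall flow can be sketched without network access: one call per (IBGE code, date) pair, keeping the first record of each non-empty answer, then writing everything to a JSON file. In this sketch `gather_with` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for the HTTP request.

```python
import json
import os
import tempfile
from itertools import product


def gather_with(fetch, dates: list, ibge_codes: list, save_path: str) -> None:
    '''
    Sketch of the gather-and-save flow. `fetch` is a hypothetical callable
    (date, code) -> list of records, standing in for the HTTP request.
    '''
    collected = []
    # One request per (IBGE code, date) pair, codes in the outer loop
    for code, date in product(ibge_codes, dates):
        records = fetch(date, code)
        if records:
            collected.append(records[0])
    with open(save_path, 'w', encoding='utf-8') as jsonfile:
        # ensure_ascii=False keeps accented names such as BRASÍLIA readable
        json.dump(collected, jsonfile, indent=4, ensure_ascii=False)


def fake_fetch(date, code):
    # Pretend only Brasília has data; every other code returns an empty list
    if code == '5300108':
        return [{'mesAno': date, 'codigoIbge': code, 'nomeIBGE': 'BRASÍLIA'}]
    return []


path = os.path.join(tempfile.mkdtemp(), 'bolsa_familia.json')
gather_with(fake_fetch, ['201901', '201902'], ['5300108', '3550308'], path)
```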

```python
def gather_data(dates: list, ibge_codes: list, save_path: str) -> None:
    '''
    Collects data from the API, by passing the formatted dates and
    IBGE codes, and saves a JSON file in the specified path.
    '''

    base_url = 'http://www.transparencia.gov.br/api-de-dados/bolsa-familia-por-municipio/'
    max_attempts = 3  # tries per request before giving up on network errors
    returned_json = []

    for code in ibge_codes:
        for date in dates:
            params = {'mesAno': date, 'codigoIbge': code, 'pagina': 1}
            for attempt in range(1, max_attempts + 1):
                try:
                    # The timeout prevents a stalled connection from hanging forever
                    response = r.get(base_url, params=params, timeout=30)
                    data = response.json() if response.status_code == 200 else []
                    if data:
                        # The endpoint returns a list; keep its first record
                        returned_json.append(data[0])
                    break
                except ValueError:
                    # The response body was not valid JSON: skip this request
                    print('ValueError exception for: {}'.format(params))
                    break
                except r.exceptions.RequestException as e:
                    # Network error (e.g. connection refused): try again
                    print('Attempt {} failed for {}: {!r}'.format(attempt, params, e))

    with open(save_path, 'w', encoding='utf-8') as jsonfile:
        json.dump(returned_json, jsonfile, indent=4, ensure_ascii=False)
```
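
When a request fails for a transient reason, retrying immediately may fail again. A common alternative is exponential backoff, waiting progressively longer between attempts. The sketch below is a generic helper, assuming failures are transient; `with_retries` is hypothetical, and in practice you would catch `requests.exceptions.RequestException` rather than every `Exception`.

```python
import time


def with_retries(func, attempts: int = 3, base_delay: float = 1.0):
    '''
    Call func() until it succeeds, waiting base_delay * 2**n seconds
    after the n-th failure; re-raise the last error after `attempts` tries.
    (Hypothetical helper, not part of the project.)
    '''
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * 2 ** attempt)


calls = {'n': 0}


def flaky():
    # Fails twice, then succeeds, like a temporarily unavailable endpoint
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('temporarily unavailable')
    return 'ok'


print(with_retries(flaky, attempts=3, base_delay=0))  # ok
```

For some request URL `uri`, it could wrap a single call, e.g. `with_retries(lambda: r.get(uri, timeout=30))`.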