# Collecting Bundestag data from Abgeordnetenwatch
> [Abgeordnetenwatch](https://www.abgeordnetenwatch.de) offers an [open API](https://www.abgeordnetenwatch.de/api) with information on, among other things, politicians, their votes, and the polls held in parliament, including metadata.

This notebook collects the following information and parses it into `pandas.DataFrame` objects:
* polls for the 2017-2021 period of the Bundestag
* votes of members of the Bundestag 2017-2021
* info on members of the Bundestag 2017-2021

TODOs:
- Identify why some `mandate_id` values (politicians / mandates) appear multiple times in the vote JSON files, not always with the same vote result. This affects `compile_votes_data`; currently the issue is ignored and the first of the duplicates is used.
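
A minimal sketch of how the duplicated `mandate_id` values from the TODO above could be inspected with pandas. The toy frame and its column names (`mandate_id`, `vote`) are assumptions for illustration, not necessarily the schema produced by `transform_aw`.

```python
import pandas as pd

# Hypothetical votes for a single poll: mandate 2 appears twice with
# conflicting results, mandate 3 appears twice with the same result.
votes = pd.DataFrame(
    {
        "mandate_id": [1, 2, 2, 3, 3],
        "vote": ["yes", "yes", "no", "no", "no"],
    }
)

# All rows whose mandate_id occurs more than once.
duplicated = votes[votes.duplicated(subset="mandate_id", keep=False)]

# Mandates whose duplicate rows disagree on the vote result.
n_results = duplicated.groupby("mandate_id")["vote"].nunique()
conflicting = n_results[n_results > 1].index.tolist()

# The workaround mentioned in the TODO: keep the first of each duplicate.
deduplicated = votes.drop_duplicates(subset="mandate_id", keep="first")
```

Listing `conflicting` separately helps distinguish harmless repeats from mandates where keeping the first row actually discards a different vote.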

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import bundestag.data.download.abgeordnetenwatch as download_aw
import bundestag.data.transform.abgeordnetenwatch as transform_aw
import bundestag.logging as logging
import bundestag.paths as paths

logger = logging.logger
logger.setLevel("DEBUG")

_paths = paths.get_paths("../data")
_paths

## Polls 2017-2021

Polls are the items voted on in the Bundestag by the parliamentarians.

In [None]:
dry = True  # set `True` for testing, `False` otherwise

In [None]:
json_path = _paths.raw_abgeordnetenwatch
json_path

### Collecting

In [None]:
legislature_id = 111
data = download_aw.request_poll_data(legislature_id, dry=dry, num_polls=999)

In [None]:
download_aw.store_polls_json(data, legislature_id, dry=dry, path=json_path)

### Parsing

In [None]:
legislature_id = 111
df = transform_aw.get_polls_data(legislature_id, path=json_path)
df.head()

In [None]:
preprocessed_path = _paths.preprocessed_abgeordnetenwatch
preprocessed_path

In [None]:
file = preprocessed_path / "df_polls.parquet"
logger.debug(f"Writing to {file}")

In [None]:
if not dry:
    df.to_parquet(path=file)

## Info on politicians

### Collecting

In [None]:
legislature_id = 111
data = download_aw.request_mandates_data(
    legislature_id, dry=dry, num_mandates=999
)

In [None]:
download_aw.store_mandates_json(data, legislature_id, dry=dry, path=json_path)

### Parsing

In [None]:
legislature_id = 111
df = transform_aw.get_mandates_data(legislature_id, path=json_path)
df.head()

In [None]:
df["party"] = df.apply(transform_aw.get_parties_from_col, axis=1)
df.head().T

In [None]:
file = _paths.preprocessed_abgeordnetenwatch / "df_mandates.parquet"
logger.debug(f"Writing to {file}")

In [None]:
if not dry:
    df.to_parquet(path=file)

## Votes for one specific poll

### Collecting

In [None]:
poll_id = 4217
legislature_id = 111

In [None]:
data = download_aw.request_vote_data(poll_id, dry=dry)

In [None]:
download_aw.store_vote_json(data, poll_id, dry=dry, path=json_path)

### Parsing

In [None]:
legislature_id, poll_id = 111, 4217
df = transform_aw.get_votes_df(legislature_id, poll_id, path=json_path)
df.head()

## All votes for all remaining polls of a specific legislative period

Above, vote information was collected for only one specific poll. Here we collect the votes for all polls that are still missing.
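
The idea of collecting only what is missing can be sketched as a set difference between the known poll ids and the ids for which a vote file already exists on disk. The file name pattern `poll_<id>_votes.json` and the helper `find_missing_poll_ids` are hypothetical and chosen for illustration; this is not necessarily how `download_aw.get_all_remaining_vote_data` works internally.

```python
import re
import tempfile
from pathlib import Path


def find_missing_poll_ids(known_poll_ids, votes_dir):
    """Return the sorted poll ids that have no vote file in `votes_dir` yet."""
    pattern = re.compile(r"poll_(\d+)_votes\.json")  # hypothetical naming scheme
    present = set()
    for file in Path(votes_dir).glob("poll_*_votes.json"):
        match = pattern.fullmatch(file.name)
        if match:
            present.add(int(match.group(1)))
    return sorted(set(known_poll_ids) - present)


# Usage with a temporary directory holding one already downloaded poll.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "poll_4217_votes.json").write_text("{}")
    missing = find_missing_poll_ids([4217, 4218, 4219], tmp)
```

Deriving the missing ids from the files on disk keeps the download step idempotent: rerunning it after an interruption only requests polls that were not stored yet.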

### Collecting

In [None]:
legislature_id = 111
download_aw.get_all_remaining_vote_data(
    legislature_id, dry=dry, path=json_path
)

### Parsing

In [None]:
legislature_id = 111
df_all_votes = transform_aw.compile_votes_data(legislature_id, path=json_path)

display(df_all_votes.head(), df_all_votes.tail())

In [None]:
df_all_votes["vote"].unique()

Write the compiled votes to disk as CSV.

In [None]:
all_votes_path = (
    _paths.preprocessed_abgeordnetenwatch
    / f"compiled_votes_legislature_{legislature_id}.csv"
)
logger.debug(f"Writing to {all_votes_path}")

In [None]:
if not dry:
    df_all_votes.to_csv(all_votes_path, index=False)

In [None]:
df_all_votes = df_all_votes.assign(
    **{"politician name": transform_aw.get_politician_names}
)

In [None]:
file = _paths.preprocessed_abgeordnetenwatch / "df_all_votes.parquet"
logger.debug(f"Writing to {file}")

In [None]:
if not dry:
    df_all_votes.to_parquet(path=file)