# Collecting Bundestag data from Abgeordnetenwatch
> [Abgeordnetenwatch](https://www.abgeordnetenwatch.de) offers an [open API](https://www.abgeordnetenwatch.de/api) with information on, among other things, politicians, their votes, and the polls held in parliament, including metadata.

This notebook collects the following information and parses it into `pandas.DataFrame` objects:
* polls for the 2017-2021 period of the Bundestag
* votes of members of the Bundestag 2017-2021
* info on members of the Bundestag 2017-2021

TODOs:
- identify why some `mandate_id` values (politicians / mandates) appear multiple times in the vote JSON files, not always with the same vote result. This affects `compile_votes_data`; for now the issue is ignored and the first of the duplicates is used.
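
As a starting point for that TODO, the following is a minimal sketch of how duplicated `mandate_id` values with conflicting results could be located in a votes table. The toy data and the column name `vote` are assumptions for illustration only; the real frames produced by `aw.get_votes_df` may be laid out differently.

```python
import pandas as pd

# Toy stand-in for the parsed votes of a single poll.
# The column names ("mandate_id", "vote") are assumptions for illustration.
votes = pd.DataFrame(
    {
        "mandate_id": [1, 2, 2, 3, 3],
        "vote": ["yes", "no", "yes", "abstain", "abstain"],
    }
)

# All rows whose mandate_id occurs more than once
duplicated = votes[votes.duplicated(subset="mandate_id", keep=False)]

# Number of distinct vote values per duplicated mandate_id;
# a count above 1 means the duplicates disagree with each other
n_distinct = duplicated.groupby("mandate_id")["vote"].nunique()
conflicting_ids = n_distinct[n_distinct > 1].index.tolist()

# Workaround described in the TODO: keep only the first occurrence
deduplicated = votes.drop_duplicates(subset="mandate_id", keep="first")
```

In this toy data only `mandate_id` 2 carries conflicting results, while `mandate_id` 3 is duplicated with identical results. Separating the two cases may help narrow down where the duplicates originate.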

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import bundestag.logging as logging
import bundestag.paths as paths
from bundestag import abgeordnetenwatch as aw

logger = logging.logger
logger.setLevel("DEBUG")

_paths = paths.get_paths("../data")
_paths

In [None]:
aw.ABGEORDNETENWATCH_PATH.parent

In [None]:
aw.ABGEORDNETENWATCH_PATH.mkdir(exist_ok=True)

## Polls 2017-2021

Polls are the items voted on in the Bundestag by the parliamentarians.

In [None]:
dry = False  # set `True` for testing, `False` otherwise

### Collecting

In [None]:
json_path = _paths.raw_abgeordnetenwatch
json_path

In [None]:
legislature_id = 111
info = aw.get_poll_info(legislature_id, dry=dry)
aw.store_polls_json(info, legislature_id, dry=dry, path=json_path)

### Parsing

In [None]:
legislature_id = 111
df = aw.get_polls_df(legislature_id, path=json_path)
aw.test_poll_data(df)
df.head()

In [None]:
preprocessed_path = _paths.preprocessed_abgeordnetenwatch
preprocessed_path

In [None]:
file = preprocessed_path / "df_polls.parquet"
logger.debug(f"Writing to {file}")

In [None]:
df.to_parquet(path=file)

## Info on politicians

### Collecting

In [None]:
legislature_id = 111
info = aw.get_mandates_info(legislature_id, dry=dry)
aw.store_mandates_info(info, legislature_id, dry=dry, path=json_path)

### Parsing

In [None]:
legislature_id = 111
df = aw.get_mandates_df(legislature_id, path=json_path)
aw.test_mandate_data(df)
df["party"] = df.apply(aw.get_party_from_fraction_string, axis=1)
df.head().T

In [None]:
file = _paths.preprocessed_abgeordnetenwatch / "df_mandates.parquet"
logger.debug(f"Writing to {file}")

In [None]:
offenders = []
for c in df.columns:
    # Write each column on its own so that a failure can be attributed to it
    try:
        df[[c]].to_parquet(file)
    except TypeError:
        offenders.append(c)

In [None]:
offenders

In [None]:
df.columns

In [None]:
df.to_parquet(path=file)

## Votes for one specific poll

### Collecting

In [None]:
poll_id = 4217
info = aw.get_vote_info(poll_id, dry=dry)
aw.store_vote_info(info, poll_id, dry=dry, path=json_path)

### Parsing

In [None]:
aw.test_stored_vote_ids_check(path=json_path)

In [None]:
legislature_id, poll_id = 111, 4217
df = aw.get_votes_df(legislature_id, poll_id, path=json_path)
aw.test_vote_data(df)
df.head()

## All votes for all remaining polls of a specific legislative period

So far, vote information was collected for only one specific poll. Here we collect the votes for all polls that are still missing.
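
The idea of "whatever polls are missing" can be sketched as a set difference between the known poll ids and the poll ids that already have a stored vote file. This is only an illustration under stated assumptions, not necessarily how `aw.get_all_remaining_vote_info` works: the helper `missing_poll_ids`, the file naming scheme `poll_<id>_votes.json`, and the poll ids 4218 and 4219 are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_poll_ids(known_poll_ids, votes_dir):
    """Return the poll ids without a stored vote file, sorted ascending."""
    # Hypothetical naming scheme: one file per poll, "poll_<id>_votes.json"
    stored = {
        int(f.stem.split("_")[1])
        for f in Path(votes_dir).glob("poll_*_votes.json")
    }
    return sorted(set(known_poll_ids) - stored)


with tempfile.TemporaryDirectory() as tmp:
    # Pretend the votes for poll 4217 were already collected
    (Path(tmp) / "poll_4217_votes.json").write_text("{}")
    # 4218 and 4219 are made-up ids standing in for polls not yet collected
    remaining = missing_poll_ids([4217, 4218, 4219], tmp)
```

Deriving the remaining ids from the files on disk makes the collection step safe to re-run: an interrupted download can resume without requesting polls that are already stored.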

### Collecting

In [None]:
legislature_id = 111
aw.get_all_remaining_vote_info(legislature_id, dry=dry, path=json_path)

### Parsing

In [None]:
legislature_id = 111
df_all_votes = aw.compile_votes_data(legislature_id, path=json_path)

display(df_all_votes.head(), df_all_votes.tail())

Write the compiled votes to disk as CSV:

In [None]:
all_votes_path = (
    _paths.preprocessed_abgeordnetenwatch
    / f"compiled_votes_legislature_{legislature_id}.csv"
)
logger.debug(f"Writing to {all_votes_path}")

In [None]:
df_all_votes.to_csv(all_votes_path, index=False)

In [None]:
df_all_votes = df_all_votes.assign(
    **{"politician name": aw.get_politician_names}
)

In [None]:
file = _paths.preprocessed_abgeordnetenwatch / "df_all_votes.parquet"
logger.debug(f"Writing to {file}")

In [None]:
df_all_votes.to_parquet(path=file)

In [None]:
!head $all_votes_path