# Collecting Bundestag data from Abgeordnetenwatch
> [Abgeordnetenwatch](https://www.abgeordnetenwatch.de) offers an [open API](https://www.abgeordnetenwatch.de/api) that provides data on, among other things, politicians, their votes, and the polls held in parliament, including metadata.

## How to use

Run the notebook from top to bottom. The parameters set in the cells below control the download and transformation steps.

### CLI equivalent

Download

    uv run bundestag download abgeordnetenwatch 111

Transformation

    uv run bundestag transform abgeordnetenwatch 111

### Skip processing

Running this notebook re-does the data processing. To skip that step, download the data from Hugging Face instead:

    uv run bundestag download huggingface

TODOs:
- Identify why some `mandate_id` values (politicians / mandates) appear multiple times in the vote JSON files, not always with the same vote result. This affects `compile_votes_data`; the issue is currently ignored and the first of the duplicates is used.
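The duplicate issue in the TODO can be investigated with plain pandas. The following is a minimal sketch on toy data, not the logic of `compile_votes_data` itself: the column names `mandate_id`, `poll_id` and `vote` are assumptions for illustration and may differ from the real tables. It lists the duplicated rows, singles out the pairs with conflicting results, and then applies the "keep the first duplicate" workaround.

```python
import pandas as pd

# Toy stand-in for a compiled votes table. The column names
# (mandate_id, poll_id, vote) are assumptions for illustration only.
votes = pd.DataFrame(
    {
        "mandate_id": [1, 1, 2, 3, 3],
        "poll_id": [10, 10, 10, 10, 11],
        "vote": ["yes", "no", "yes", "no", "no"],
    }
)

# A mandate should appear at most once per poll.
key = ["mandate_id", "poll_id"]

# All rows whose (mandate_id, poll_id) pair occurs more than once.
duplicates = votes[votes.duplicated(subset=key, keep=False)]

# Of those, the pairs that carry more than one distinct vote result.
n_results = duplicates.groupby(key)["vote"].nunique()
conflicting = n_results[n_results > 1]

# The workaround: keep only the first occurrence of each pair.
deduplicated = votes.drop_duplicates(subset=key, keep="first")

print(duplicates)
print(conflicting)
print(deduplicated)
```

Inspecting `conflicting` separately from `duplicates` distinguishes harmless repeated records from cases where keeping the first row actually discards a different vote result.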

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from bundestag.data.transform.abgeordnetenwatch.transform import (
    run as transform_abgeordnetenwatch,
)
from bundestag.data.download.abgeordnetenwatch.download import (
    run as download_abgeordnetenwatch,
)

from bundestag.fine_logging import setup_logging
from bundestag.paths import get_paths
import logging
import pandas as pd

## Setup

In [None]:
logger = logging.getLogger(__name__)
setup_logging(logging.INFO)

In [None]:
paths = get_paths("../data")
paths

In [None]:
dry = False  # set `True` for testing, `False` otherwise

## Collecting polls for 2017-2021

In [None]:
legislature_id = 111
max_mandates = 999
max_polls = 999
assume_yes = True
raw_path = paths.raw_abgeordnetenwatch

In [None]:
download_abgeordnetenwatch(
    legislature_id,
    dry=dry,
    raw_path=raw_path,
    max_mandates=max_mandates,
    max_polls=max_polls,
    assume_yes=assume_yes,
)

## Transforming abgeordnetenwatch data

In [None]:
preprocessed_path = paths.preprocessed_abgeordnetenwatch

In [None]:
transform_abgeordnetenwatch(
    legislature_id,
    raw_path=raw_path,
    preprocessed_path=preprocessed_path,
    dry=dry,
    assume_yes=assume_yes,
)

In [None]:
votes = pd.read_parquet(preprocessed_path / f"votes_{legislature_id}.parquet")
votes.head()

In [None]:
mandates = pd.read_parquet(preprocessed_path / f"mandates_{legislature_id}.parquet")
mandates.head()

In [None]:
polls = pd.read_parquet(preprocessed_path / f"polls_{legislature_id}.parquet")
polls.head()
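After loading the three tables, it can be useful to check that they reference each other consistently. The sketch below uses toy data and a hypothetical helper `missing_keys`; the column names `poll_id` and `mandate_id` are assumptions and should be replaced with whatever the parquet files actually contain.

```python
import pandas as pd


def missing_keys(
    child: pd.DataFrame, parent: pd.DataFrame, child_col: str, parent_col: str
) -> set:
    """Return the values of child[child_col] with no match in parent[parent_col]."""
    child_values = set(child[child_col].dropna().unique())
    parent_values = set(parent[parent_col].dropna().unique())
    return child_values - parent_values


# Toy stand-ins for the loaded tables; column names are assumed.
toy_votes = pd.DataFrame({"poll_id": [10, 10, 11], "mandate_id": [1, 2, 3]})
toy_polls = pd.DataFrame({"poll_id": [10]})
toy_mandates = pd.DataFrame({"mandate_id": [1, 2, 3]})

# Polls referenced by votes but absent from the polls table.
polls_missing = missing_keys(toy_votes, toy_polls, "poll_id", "poll_id")

# Mandates referenced by votes but absent from the mandates table.
mandates_missing = missing_keys(toy_votes, toy_mandates, "mandate_id", "mandate_id")

print(polls_missing)
print(mandates_missing)
```

An empty set means every vote row points at a known poll or mandate; a non-empty set hints at an incomplete download, for example when `max_polls` or `max_mandates` capped the number of fetched items.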