# "Namentliche Abstimmungen" in the Bundestag

> Parse and inspect "Namentliche Abstimmungen" (roll call votes) in the Bundestag (the federal German parliament)

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/eschmidt42/bundestag/binder0?urlpath=%2Fvoila%2Frender%2Fnbs%2F04_gui_clean.ipynb)

The German Parliament is kind enough to publish all votes of all members as readable XLSX / XLS files (and PDFs ¯\\\_(ツ)\_/¯ ). Those files can be found here: https://www.bundestag.de/parlament/plenum/abstimmung/liste.

The purpose of this repo is to help collect those roll call votes and analyze them. This may be particularly interesting for the upcoming election in 2021: if you want to see what your local member of parliament has been up to in terms of public roll call votes, relative to other members or to their party, this dataset may be for you. At this point I'd also like to point out the excellent resource [abgeordnetenwatch](https://www.abgeordnetenwatch.de/).

Since the files on the Bundestag website are stored in a way that makes them tricky to crawl automatically, a bit of manual work is required to generate the dataset. But don't fret! Quite a few recent roll call votes (as of the publication of this repo) are already prepared for you. If older or more recent roll call votes are missing, the convenience tools demonstrated below reduce your manual effort.

An example analysis for inspiration can be found behind the binder link 😁.

In [None]:
#hide
%load_ext autoreload
%autoreload 2

In [None]:
#hide
try:
    from bundestag import parsing, similarity, gui
except ImportError:
    import sys
    sys.path.append('..')
    from bundestag import parsing, similarity, gui

In [None]:
#hide
from pathlib import Path
import pandas as pd

## How to use

First, let's look at what the processed data looks like, and then at how to parse it from the XLS / XLSX files.

### Inspecting the prepared data

If you have cloned the repo, you should already have a `votes.parquet` file in its root directory. If not, feel free to download the `votes.parquet` file directly.

In [None]:
fname = Path('../votes.parquet')

In [None]:
df = pd.read_parquet(fname)
df.head()
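
To see how a particular member voted, you can filter `df` by name and cross-tabulate the votes by party. The sketch below runs on a small hand-made frame; the column names `name`, `party`, `issue` and `vote` are assumptions for illustration only and will likely differ from the columns in `votes.parquet`, so check `df.columns` first.

```python
import pandas as pd

# Toy stand-in for `df`; the column names are hypothetical (check df.columns)
toy = pd.DataFrame({
    "name":  ["A", "A", "B", "B", "C", "C"],
    "party": ["X", "X", "X", "X", "Y", "Y"],
    "issue": ["i1", "i2", "i1", "i2", "i1", "i2"],
    "vote":  ["yes", "no", "yes", "yes", "no", "no"],
})

# Every vote cast by a single member
votes_a = toy.loc[toy["name"] == "A", ["issue", "vote"]]
print(votes_a)

# Count of each vote option per party and issue
party_counts = pd.crosstab([toy["party"], toy["issue"]], toy["vote"])
print(party_counts)
```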

### Visualizing the roll call votes

Before we can compute the similarities / agreements between the MdBs (Mitglieder des Bundestages, i.e. members of parliament), let's reshape `df`:

In [None]:
df_squished = similarity.get_squished_dataframe(df)
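
We don't show here what `similarity.get_squished_dataframe` does internally. As an assumption, a typical preparation step for pairwise comparisons is to pivot the long table (one row per MdB and vote) into a wide one with one row per MdB and one column per issue. The column names below are hypothetical.

```python
import pandas as pd

# Long format: one row per (MdB, issue); column names are hypothetical
long_votes = pd.DataFrame({
    "name":  ["A", "A", "B", "C"],
    "issue": ["i1", "i2", "i1", "i2"],
    "vote":  ["yes", "no", "yes", "no"],
})

# Wide format: one row per MdB, one column per issue; NaN where an MdB did not vote
wide = long_votes.pivot(index="name", columns="issue", values="vote")
print(wide)
```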

And now for the agreements between the MdBs:

In [None]:
agreements = similarity.scan_all_agreements(df_squished)

As the agreement between two MdBs we use 1 - [Jaccard distance](https://en.wikipedia.org/wiki/Jaccard_index), i.e. the Jaccard index, times 100. This is the number of issues on which both MdBs voted the same way (the intersection of their votes) divided by the size of the union of their votes. So if two MdBs have voted on exactly the same issues and always in the same way, their agreement is 100%.
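
A minimal sketch of this agreement measure in plain Python, assuming each MdB's record is a mapping from issue to vote that gets converted into a set of (issue, vote) pairs. The names `agreement` and `all_agreements` are hypothetical and not part of the `bundestag` package; the repo's own `similarity.scan_all_agreements` may differ in details such as how missing votes are handled.

```python
from __future__ import annotations

from itertools import combinations


def agreement(votes_a: dict[str, str], votes_b: dict[str, str]) -> float:
    """Jaccard index (= 1 - Jaccard distance) of two voting records, times 100.

    Each record maps issue -> vote and is turned into a set of (issue, vote) pairs.
    """
    a = set(votes_a.items())
    b = set(votes_b.items())
    union = a | b
    if not union:
        return 0.0  # neither MdB voted at all; define agreement as 0
    return 100 * len(a & b) / len(union)


def all_agreements(records: dict[str, dict[str, str]]) -> dict[tuple[str, str], float]:
    """Agreement for every unordered pair of MdBs (names sorted alphabetically)."""
    return {
        (m0, m1): agreement(records[m0], records[m1])
        for m0, m1 in combinations(sorted(records), 2)
    }


records = {
    "A": {"i1": "yes", "i2": "no"},
    "B": {"i1": "yes", "i2": "no"},
    "C": {"i1": "yes", "i2": "yes"},
}
result = all_agreements(records)
print(result)  # ('A', 'B') -> 100.0, ('A', 'C') and ('B', 'C') -> 33.33...
```

With this set representation, an issue on which two MdBs voted differently adds two elements to the union (one per MdB), so it lowers the score more than an issue that only one of them voted on.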

### Running the GUI

Using the `df` and `agreements` we just computed:

In [None]:
_gui = gui.GUI(df, agreements)

Alternatively, using the pre-computed `gui.df` and `gui.agreements`:

In [None]:
_gui = gui.GUI(gui.df, gui.agreements)

Starting the GUI:

In [None]:
_gui.run()

### Downloading & parsing the data into a useful format

To collect the data and produce a dataframe like the one stored in `votes.parquet`, we need to open https://www.bundestag.de/parlament/plenum/abstimmung/liste and **manually download all the pages of interest into one location**. Then we can automatically search the HTML documents for links to the XLS / XLSX files, and download and clean those files with the following steps.
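
For illustration, here is a sketch of the link-extraction step using only the Python standard library. It is not the implementation behind `parsing.get_multiple_sheets`; it assumes the pages were saved as `.htm` / `.html` files and that relative links resolve against `https://www.bundestag.de`. `SheetLinkParser` and `find_sheet_links` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class SheetLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points to an .xls / .xlsx file."""

    def __init__(self, base_url: str = "") -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension
        if href and href.split("?")[0].lower().endswith((".xls", ".xlsx")):
            # Turn relative links into absolute URLs
            self.links.append(urljoin(self.base_url, href))


def find_sheet_links(html_dir: Path, base_url: str = "https://www.bundestag.de") -> list[str]:
    """Scan every .htm / .html file in html_dir and return the unique sheet URLs."""
    found: list[str] = []
    pages = sorted(p for p in html_dir.iterdir() if p.suffix.lower() in {".htm", ".html"})
    for path in pages:
        parser = SheetLinkParser(base_url)
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        found.extend(parser.links)
    return list(dict.fromkeys(found))  # de-duplicate while keeping order
```

Usage would then be along the lines of `find_sheet_links(html_path)`.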

Let's first define the source directory containing the HTML files and the target directory for the downloaded XLSX / XLS files:

In [None]:
html_path = Path('../raw_data')   # location where the html files were >manually< downloaded to
sheet_path = Path('../xlsx_data') # location to automatically download the xlsx and xls files

Now download all sheets whose URIs are found in the HTML files in `html_path` to `sheet_path`, and parse them into a dataframe:

In [None]:
df = parsing.get_multiple_sheets(html_path, sheet_path, nmax=3)
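
The download part can be sketched in a similar spirit. `download_sheets` is a hypothetical helper, not part of the `bundestag` package; its `limit` parameter only mimics the idea of capping the number of files and is not necessarily equivalent to `nmax`. The `fetch` argument defaults to `urllib.request.urlretrieve` and can be swapped out, e.g. for testing.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_sheets(
    urls: Iterable[str],
    target_dir: Path,
    limit: Optional[int] = None,
    fetch: Callable[[str, str], object] = urlretrieve,
) -> List[Path]:
    """Download each URL into target_dir and return the local paths.

    Files that already exist are not fetched again; `limit` caps how many URLs are handled.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for i, url in enumerate(urls):
        if limit is not None and i >= limit:
            break
        # Name the local file after the last segment of the URL path
        dest = target_dir / Path(urlparse(url).path).name
        if not dest.exists():
            fetch(url, str(dest))
        saved.append(dest)
    return saved
```

Because files are named after the last URL path segment, two URLs ending in the same file name would collide: the second one is skipped since the file already exists.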

In [None]:
# df.to_parquet("../new_votes.parquet")