Collect all members of parliament elected from 2015 onwards so that their Facebook pages can be looked up and the page IDs retrieved. The IDs will be used to query the Meta Ads Library. I would rather have avoided this step, but it is necessary because unconstrained queries of the Ads Library proved computationally infeasible, or at best extremely time-consuming.

Lists of names are scraped from Wikipedia. Supplementary vote data may be collected from KMDvalg later.

In [25]:
import requests
import pandas as pd
from tqdm.notebook import tqdm
from bs4 import BeautifulSoup

In [4]:
election_years = ['2015', '2019']

In [57]:
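from io import StringIO

def get_elected_politicians(election_years):
    """Scrape the Danish Wikipedia list of elected MPs for each election year."""

    base_url = 'https://da.wikipedia.org/wiki/Folketingsmedlemmer_valgt_i_'
    wiki_tables = []

    for year in tqdm(election_years, desc = 'Election years collected'):

        url = base_url + year
        response = requests.get(url, timeout = 30)
        response.raise_for_status()  # stop early if the page could not be fetched

        soup = BeautifulSoup(response.content, 'html.parser')
        tables = soup.find_all('table', class_ = "wikitable sortable")

        # Recent pandas versions expect a file-like object rather than a raw HTML string.
        # Index 1 selects the second matching table on the page.
        df = pd.read_html(StringIO(str(tables)))[1]
        df['election_data_year'] = year
        wiki_tables.append(df)

    return wiki_tables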
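
The function above picks the member table by position (`[1]`), which silently returns the wrong table if the Wikipedia page layout changes. A minimal sketch of a more defensive alternative is to select the table by the column names it must contain. The helper `pick_table_with_columns` is hypothetical and not part of the original notebook, and the toy frames below only stand in for the parsed Wikipedia tables.

```python
import pandas as pd

def pick_table_with_columns(frames, required_columns):
    """Return the first DataFrame whose columns include all required names."""
    required = set(required_columns)
    for frame in frames:
        if required.issubset(frame.columns):
            return frame
    raise ValueError(f'No table contains the columns {sorted(required)}')

# Toy stand-ins for the list of tables returned by pd.read_html (assumed data)
frames = [
    pd.DataFrame({'Parti': ['A'], 'Mandater': [1]}),
    pd.DataFrame({'Navn': ['X'], 'Parti': ['A'], 'Storkreds': ['Fyn']}),
]

members = pick_table_with_columns(frames, ['Navn', 'Parti'])
```

Inside `get_elected_politicians`, this could replace the `[1]` lookup, assuming the member table is the only one with both a `Navn` and a `Parti` column.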

In [58]:
tables = get_elected_politicians(election_years)

In [75]:
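df_complete = (
    pd.concat(tables)
        # Tables are concatenated in chronological order, so keep = 'last'
        # retains each member's row from their most recent election
        .drop_duplicates(subset = 'Navn', keep = 'last')
        .sort_values(by = 'Navn')
        .reset_index(drop = True)
        # Translate the Danish column headers to English
        .rename(columns = {
            'Navn': 'name',
            'Fødselsår': 'birth_year',
            'Parti': 'group_name',
            'Storkreds': 'electoral_region',
            'Uddannelse': 'education',
            'Personlige stemmer': 'personal_votes'
        })
)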
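
To illustrate what the deduplication step does, here is a small sketch on made-up data. The names, parties and frames (`toy_2015`, `toy_2019`) are invented for the example and do not come from the scraped tables. A member who appears in both elections ends up with a single row, taken from the later election.

```python
import pandas as pd

# Invented example data: 'Person A' is elected in both years, for different parties
toy_2015 = pd.DataFrame({
    'Navn': ['Person A', 'Person B'],
    'Parti': ['X', 'Y'],
    'election_data_year': '2015',
})
toy_2019 = pd.DataFrame({
    'Navn': ['Person A', 'Person C'],
    'Parti': ['Z', 'Y'],
    'election_data_year': '2019',
})

toy_complete = (
    pd.concat([toy_2015, toy_2019])
        .drop_duplicates(subset = 'Navn', keep = 'last')
        .sort_values(by = 'Navn')
        .reset_index(drop = True)
)
```

Note that matching on the name alone also merges two different people who happen to share a name, so the result may be worth a manual check.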

In [77]:
df_complete.to_excel('data/raw/parliament/MP_names_15_19.xlsx', index = False)