# Final Assignment Applied Economic Analysis

## Introduction


It won't be long until the political billboards go up again in the Netherlands: new elections are close. Almost a year ago, Dutch citizens voted for the new government and members of Parliament. It will soon be time to vote again, this time for the local councils. Although many institutions put effort into predicting the outcome of elections, the result is never certain. What is striking is that certain parties perform very well in some areas while they barely get votes in others. The Reformed Political Party (SGP), for example, is very popular in the Bible Belt but doesn't get many votes in the major cities along the Dutch west coast. The liberals (VVD), in turn, are more popular in wealthy areas but do not get many votes in neighbourhoods where many people perform manual labour. Local statistics like these can be linked to the performance of certain parties, and with that it should be possible to predict the election result to a certain extent. Besides giving a better idea of what to expect from the coming elections, such a prediction might help strategic voters maximise their influence on the outcome. Parties could also use a prediction tool to decide where to focus their marketing and where not to. If the Socialists (SP), for instance, correlate very weakly with agricultural municipalities, they can optimise their campaign by focusing on the areas that were historically most likely to vote for them.

Because of all these benefits, the purpose of this research is to construct a tool that helps predict election results in municipalities. With this tool, democracy could be optimised, as strategic voters can exert more influence. First, I will obtain, clean and structure the statistical and geographical data from the CBS (Statistics Netherlands) and the election results from the 'Kiesraad' (Electoral Council). After that, I will develop a tool that iterates over columns to find the correlation between the votes for a certain party and a specific statistic. When those results are in, I will construct a map that shows past election results together with a prediction of the coming municipal elections.
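
As a rough illustration of the correlation step described above, the sketch below iterates over a few statistic columns and computes the Pearson correlation with a party's vote share. It is only a sketch under stated assumptions: the data frame `toy`, its column names (`share_party_x`, `avg_income`, `pct_agriculture`, `population`), the helper `correlate_party` and all values are hypothetical and invented for the example; they are not taken from the CBS or Kiesraad data.

```python
import pandas as pd

# Hypothetical toy data: one row per municipality, all values invented
toy = pd.DataFrame({
    "municode": [1, 2, 3, 4, 5],
    "share_party_x": [0.10, 0.20, 0.30, 0.40, 0.50],  # vote share of a made-up party
    "avg_income": [20, 25, 30, 35, 40],               # rises with the vote share
    "pct_agriculture": [50, 40, 30, 20, 10],          # falls with the vote share
    "population": [5, 1, 4, 2, 3],                    # no clear pattern
})


def correlate_party(df, party_col, stat_cols):
    """Return the Pearson correlation between party_col and each statistic column."""
    results = {col: df[party_col].corr(df[col]) for col in stat_cols}
    # Sort so the most positively correlated statistic comes first
    return pd.Series(results).sort_values(ascending=False)


correlations = correlate_party(
    toy, "share_party_x", ["avg_income", "pct_agriculture", "population"]
)
print(correlations)
```

Sorting the result makes it easy to read off which statistics move with a party's vote share and which move against it. A correlation on its own says nothing about causation, so such a ranking is best read as a starting point for the prediction.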

## Collecting, cleaning and structuring

Because of the large quantity of data used in this project, I chose to work with a database. The parent table contains all municipality codes that existed in the Netherlands between 2000 and 2022. Such a table is necessary because many municipalities were reclassified over the past years. The table is obtained from the [CBS](https://opendata.cbs.nl/portal.html?_la=nl&_catalog=CBS&tableId=70739ned&_theme=234) and was created as follows:

In [None]:
import cbsodata as cbs 
import pandas as pd
import sqlite3

# Use CBS api to obtain the table (link in text above)
muni_all = pd.DataFrame(cbs.get_data('70739ned'))

# Rename municipality code column 
muni_all = muni_all.rename(columns = {'GebiedsOfGemeentecode_3':'municode'})

# Define function to remove higher-level regions such as provinces, the municipality prefix 'GM' and spaces.
def clean_municode(table):
    # Labels that mark non-municipality rows in RegioS
    removables = ['(PV)', '(CR)', '(LD)', 'Nederland']
    # Drop those rows; regex=False so the parentheses are matched literally
    for item in removables:
        table = table[~table.RegioS.str.contains(item, regex=False)]
    # Drop rows without a municipality code (reassign instead of inplace to avoid modifying a slice)
    table = table.dropna(subset=['municode']).copy()
    # Strip the 'GM' prefix and spaces, then convert the code to numeric
    table['municode'] = pd.to_numeric(
        table['municode'].str.replace('GM', '', regex=False).str.replace(' ', '', regex=False))
    # NOTE: assumes the table has an 'enddate' column; only 'municode' is renamed above
    table['enddate'] = pd.to_numeric(
        table['enddate'].str.replace(' ', '', regex=False))
    # Keep municipalities that still exist or that ended on or after 1 January 2000
    table = table[(table.enddate >= 20000101) | (table.enddate.isnull())]
    return table

main_data = clean_municode(muni_all).drop_duplicates(subset = ['municode'])

# Establish database connection
con = sqlite3.connect("D:/data/PolProj.db")

# Write dataframe to database; municode becomes the primary key
main_data.to_sql('general_data', con, 
                if_exists = 'replace', index = False, 
                dtype={'municode': 'INTEGER PRIMARY KEY AUTOINCREMENT'})
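
The effect of declaring `municode` as the primary key can be illustrated with the standard `sqlite3` module alone. The sketch below is a minimal example under stated assumptions: it uses an in-memory database instead of `PolProj.db`, and the `name` column and the inserted rows are hypothetical. It shows that a second row with the same `municode` is rejected, which is why duplicates are dropped before the table is written.

```python
import sqlite3

# In-memory stand-in for the project database (assumption: not the real PolProj.db)
con = sqlite3.connect(":memory:")

# Same key declaration as in the to_sql call above; the 'name' column is hypothetical
con.execute(
    "CREATE TABLE general_data ("
    "municode INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)

# Two made-up municipalities with distinct codes
con.executemany(
    "INSERT INTO general_data (municode, name) VALUES (?, ?)",
    [(3, "Municipality A"), (10, "Municipality B")],
)

# A repeated municode violates the primary key and raises IntegrityError
try:
    con.execute(
        "INSERT INTO general_data (municode, name) VALUES (?, ?)",
        (3, "Duplicate of A"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column
pk_columns = [
    row[1] for row in con.execute("PRAGMA table_info(general_data)") if row[5] > 0
]
row_count = con.execute("SELECT COUNT(*) FROM general_data").fetchone()[0]

print(duplicate_rejected, pk_columns, row_count)
```

Because the key is enforced by the database itself, later tables that refer to `municode` can rely on each code appearing only once in the parent table.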

