In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import re
%matplotlib inline
#np.set_printoptions(threshold=np.nan)

# Milestone 1

## Title
Improving electoral fairness in Switzerland

## Abstract
The cantonal borders in Switzerland have mostly stayed fixed since the 19th century and, in some cases, since the Middle Ages. They therefore do not always 
reflect the changes in Swiss society that occurred over the last 150 years. Furthermore, since the number of seats in the parliament is 
limited, some parties will never get enough votes to be represented, so part of the Swiss population is de facto excluded from parliamentary debates.
In this study, we intend to explore the electoral and economic data available on the Swiss open data portal. We would then like to cluster the municipalities ('Communes') into 
new cantons that give the best political representation of Switzerland. We would also like to study the influence of the political 
partitioning on the parliament's composition by varying the parameters of our model and observing their effect. Finally, we can assess 
the quality of the current partition.

## Research questions
- Can we construct an optimal partition for the Council of States?
- Can we construct an optimal partition for the National Council?
- Is it possible to optimize both at the same time?
- Can geography be taken into account in order to keep coherent borders?
- Can social factors be taken into account?
- Can the current representation be rated as fair?
- Can we do the opposite and find the least fair partitioning?

## Dataset
Swiss open dataset (opendata.swiss):
- National Council elections (party votes and party strengths since 1975: districts and municipalities)
- Popular votes (results at the municipality level since 1981)
- Permanent and non-permanent resident population by institutional geographic level, sex, nationality (category), place of birth, and age class
- swissBoundaries 3D

## A list of internal milestones up until project milestone 2
- Download datasets
- Clean (if needed)
- Map data to Swiss map
- Build model for both elections
- Find an optimization algorithm
- Sweep hyperparameters
	
## Questions for TAs
- What kind of problems can we expect?
- Do you happen to know any algorithms for geographical clustering?

# Milestone 2

## Description of the data used
<!--- - intimately acquaint yourself with the data,
- That you can handle the data in its size.
- That you understand what's into the data (formats, distributions).
- That you considered ways to enrich, filter, transform the data according to your needs.
-->
For the first dataset mentioned above, the opendata.swiss website redirected us to this [link](https://www.pxweb.bfs.admin.ch/pxweb/fr/px-x-1702020000_105/-/px-x-1702020000_105.px) to download the data. Since the tool provided limits the number of cells per download, the dataset was downloaded as two Excel documents.
These Excel files were loaded and merged into a single dataframe. We recovered the results of the 2015 Swiss National Council election (Election au Conseil National) for each municipality (commune):
* the percentage of votes (number of votes registered in the municipality for the party / total number of votes registered in the municipality) <!--- to be checked, #votes or #voters???--->
* the corresponding party
* the corresponding municipality
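
The merge step described above can be sketched as follows. This is only an illustration under stated assumptions: the column names (`municipality`, `party`, `share`), the placeholder labels and the numbers are all made up, and the two small dataframes stand in for what `pd.read_excel` would return for each downloaded file.

```python
import pandas as pd

def merge_exports(parts):
    """Stack partial exports that share the same columns into one dataframe."""
    merged = pd.concat(parts, ignore_index=True)
    # Guard against rows exported twice in case the two downloads overlap.
    return merged.drop_duplicates(ignore_index=True)

# Made-up stand-ins for the two Excel downloads (hypothetical columns and values).
part_a = pd.DataFrame({
    "municipality": ["Commune A", "Commune A"],
    "party": ["P1", "P2"],
    "share": [20.5, 15.0],
})
part_b = pd.DataFrame({
    "municipality": ["Commune B", "Commune B"],
    "party": ["P1", "P2"],
    "share": [18.0, 30.0],
})

results = merge_exports([part_a, part_b])

# Reshape to one row per municipality and one column per party.
wide = results.pivot(index="municipality", columns="party", values="share")
print(wide)
```

The wide layout (one row per municipality) is convenient later on, since each row can be read as that municipality's political profile.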

<!--- code snippet printing the parties, perhaps an index with the name and corresponding acronym of each party -->
Since milestone 1, we have added a dataset named "Elections au Conseil national (électeurs inscrits, électeurs et types de bulletins de vote depuis 1975)" [[link](https://www.pxweb.bfs.admin.ch/pxweb/fr/px-x-1702020000_101/-/px-x-1702020000_101.px)], which provides complementary data for the first dataset, such as the number of registered <!--- true?---> voters per municipality.<!--- why do we need this additional data, what is missing from the first dataset?--->

<!-- topojson to display: where we found it, second topojson used to recover the adjacent communes -->
In order to display the first dataset and our future results, we downloaded the borders of the Swiss municipalities as a TopoJSON file from this [link](https://github.com/selinerdominik/swiss_zoom). Combined with the election data, this lets us display the election results for each party on a choropleth map (see map below).
<!--- add map--->

We also scraped another [TopoJSON file](http://bl.ocks.org/herrstucki/raw/4327678/aa6f466b7600651bd57838ca70b72ce07e79165d/ch.json) of the municipality borders from this [link](http://bl.ocks.org/herrstucki/9204795) using Postman. We then recovered the adjacent municipalities using the function *neighbors* from the JavaScript module *topojson* [[documentation](https://github.com/topojson/topojson)] and saved the names of the municipalities and their neighbors in a CSV file named *graph_communes.csv*. This dataframe of adjacent municipalities will be used later in the project to build cantons out of connected municipalities. 
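
As a rough sketch of how such a neighbor list could be used on the Python side, the snippet below builds an adjacency dictionary and checks whether a candidate canton is a single connected block. The column names `commune` and `neighbor`, the one-row-per-pair layout and the toy labels are assumptions made for illustration; the actual layout of *graph_communes.csv* may differ.

```python
import io
from collections import defaultdict

import pandas as pd

# Hypothetical layout: one row per (commune, neighbor) pair.
csv_text = """commune,neighbor
A,B
B,A
B,C
C,B
"""
edges = pd.read_csv(io.StringIO(csv_text))

def build_adjacency(edges):
    """Return a dict mapping each commune to the set of its neighbors."""
    adjacency = defaultdict(set)
    for commune, neighbor in zip(edges["commune"], edges["neighbor"]):
        adjacency[commune].add(neighbor)
        adjacency[neighbor].add(commune)  # keep the graph symmetric
    return dict(adjacency)

def is_connected(adjacency, members):
    """True if `members` form one connected block in the adjacency graph."""
    members = set(members)
    if not members:
        return False
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt in members and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == members

adjacency = build_adjacency(edges)
print(is_connected(adjacency, ["A", "B", "C"]))
print(is_connected(adjacency, ["A", "C"]))
```

A connectivity check of this kind is one simple way to reject candidate cantons whose municipalities do not touch each other.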


## Cleaning process, preprocess
<!--- - preprocess it 
- That you understand what's into the data ( missing values, correlations, etc.).
- That you considered ways to enrich, filter, transform the data according to your needs.
- autonomist socialist party that only exists in the Bernese Jura
- topojson of communes from 2013, splits and mergers of 150 municipalities. --->

In the 2015 Swiss National Council election results (2015SNCER) dataframe, an unavailable party percentage is marked as '...'. These missing values were replaced by 0%. Missing values are marked the same way in the 2015 Swiss National Council election voters (2015SNCEV) dataframe, where a missing number of voters was replaced by 0. <!--- justification?--->
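
A minimal sketch of this replacement with pandas, on made-up data: the column names and values are hypothetical, and only the '...' placeholder comes from the dataset description. The sketch also keeps a mask of which cells were originally missing, which is an addition for illustration and not something the cleaning step above is known to do.

```python
import pandas as pd

# Made-up excerpt: percentages arrive as text, with '...' for unavailable values.
raw = pd.DataFrame({
    "municipality": ["A", "B"],
    "P1": ["12.5", "..."],
    "P2": ["...", "40.0"],
})
party_cols = ["P1", "P2"]

# Remember which cells were missing before they are overwritten.
was_missing = raw[party_cols].eq("...")

clean = raw.copy()
# Replace the placeholder by "0", then convert the text columns to floats.
clean[party_cols] = clean[party_cols].replace("...", "0").astype(float)
print(clean)
```

Keeping the mask makes it possible to distinguish a true 0% from a value that was simply unavailable if that distinction matters later.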

<!--- not understood in the code: #remove data from comming from correspondancies --->
<!--- The story of the shared ballot boxes --->
Municipalities that shared a ballot box were assigned the same results.

The set of parties also differs between municipalities and cantons; for example, the socialist party is split in two in the canton of Jura. Such parties, referred to as "others" in the dataset, were not taken into account <!--- true? do we plan to visualize them or not?-->. 

We realized that the municipalities listed in the adjacent-municipalities dataframe, recovered from this [link](http://bl.ocks.org/herrstucki/9204795), which redirects to this [link](https://github.com/interactivethings/swiss-maps), dated back to 2013. Between 2013 and 2015, many municipalities were merged, so the dataframe of adjacent municipalities was updated accordingly.
This is why we later used another TopoJSON file to visualize the most recent municipalities. Another reason is that the first TopoJSON file did not use absolute coordinates for Switzerland, so it was not possible to use *folium* to draw a choropleth map of the municipalities.
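
One possible way to carry out such an update is sketched below: old municipality names are relabeled to the name of the merged municipality, and neighbor sets are combined. The function name, the dictionary-based representation and the toy labels are all hypothetical, and this is not necessarily how the update was actually done.

```python
def apply_mergers(adjacency, merged_into):
    """Relabel the nodes of an adjacency dict after municipal mergers.

    adjacency:   dict mapping commune name -> set of neighbor names
    merged_into: dict mapping old commune name -> name of the merged commune
    """
    updated = {}
    for commune, neighbors in adjacency.items():
        new_name = merged_into.get(commune, commune)
        new_neighbors = {merged_into.get(n, n) for n in neighbors}
        updated.setdefault(new_name, set()).update(new_neighbors)
    # A merged commune must not list itself as a neighbor.
    for name, neighbors in updated.items():
        neighbors.discard(name)
    return updated

# Toy chain A - B - C - D in which A and B merge into "AB".
old = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
new = apply_mergers(old, {"A": "AB", "B": "AB"})
print(new)
```

Municipalities that were split rather than merged would need separate handling, since a single old name then maps to several new ones.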

## Clustering
<!--- - complete all the necessary descriptive statistics tasks
- That you have updated your plan in a reasonable way, reflecting your improved knowledge after data acquaintance. In particular, discuss how your data suits your project needs and discuss the methods you're going to use, giving their essential mathematical details in the notebook.--->

Another idea that has not been tried yet would be to use spectral clustering for feature extraction. Essentially, by mapping the municipalities to a similarity graph using a distance measure, we could derive a set of features from the eigenvectors of the Laplacian of that graph. This would let us reduce the dimensions of the Swiss political spectrum from 24 parties to a few abstract features. Combined with features derived from other factors such as spoken language and geography, these could be fed to traditional machine learning techniques to cluster our municipalities into new cantons. 
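
Since this idea has not been implemented, the following is only a sketch of what the feature extraction could look like with NumPy. The Gaussian similarity, the unnormalized Laplacian $L = D - W$, the parameter `sigma` and the toy vote shares are all assumptions chosen for illustration.

```python
import numpy as np

def spectral_features(shares, n_features=2, sigma=1.0):
    """Embed the rows of `shares` using eigenvectors of the graph Laplacian."""
    # Pairwise squared Euclidean distances between vote-share vectors.
    sq_norms = (shares ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * shares @ shares.T
    sq_dist = np.clip(sq_dist, 0.0, None)  # remove tiny negative round-off
    # Gaussian similarity; zero the diagonal so there are no self-loops.
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized Laplacian L = D - W, with D the diagonal degree matrix.
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(L)
    # Skip the first (constant) eigenvector and keep the next n_features.
    return eigvals, eigvecs[:, 1:1 + n_features]

# Toy data: four municipalities, two parties, two clearly distinct profiles.
toy = np.array([[0.9, 0.1],
                [0.8, 0.2],
                [0.1, 0.9],
                [0.2, 0.8]])
eigvals, features = spectral_features(toy, n_features=1, sigma=0.3)
print(features.ravel())
```

On this toy input the first retained eigenvector takes opposite signs on the two groups of municipalities, which is the kind of low-dimensional feature the paragraph above has in mind. A normalized Laplacian or a graph restricted to adjacent municipalities would be equally valid design choices.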

## Pipeline
<!--- - have a pipeline in place, fully documented in a notebook, and 
- show us that you’ve advanced with your understanding of the project goals by updating its README description.
- That your plan for analysis and communication is now reasonable and sound, potentially discussing alternatives to your choices that you considered but dropped.
When describing the data, in particular, you should show (non-exhaustive list):--->