This is the main notebook: it creates the maps and all the data we need for the website.

This notebook does not explain the results obtained or why we want them; those are discussed in the data story.

In [7]:
import pandas as pd
import numpy as np
import json
import folium

# These two algorithms are used to cluster the data
from sklearn.cluster import KMeans, DBSCAN
# PCA reduces the dimension of a vector while retaining as much of its variance as possible
from sklearn.decomposition import PCA

import matplotlib.pyplot as plt

In [45]:
# Load data.pkl, which contains all the data about the votations (in particular, for us, the approval rate)
# for every commune (with its district and canton) and every votation of the last 30 years
data = pd.read_pickle("data/data.pkl")

The following cell loads all the data we need to draw the maps.

In [46]:
# coordinates of the centre of Switzerland, used to centre the folium maps
switzerland_coord = [46.765213, 8.252444]

# path to a GeoJSON file containing the borders of all the municipalities (communes) and the Swiss border
town_geo_path = r'data/switzerland_borders/municipalities_no_urnes.geojson'
# content of the GeoJSON file (the with block closes the file after reading)
with open(town_geo_path, encoding="utf8") as f:
    geo_json_data = json.load(f)
# names of all the communes present in the GeoJSON
commune = [x['name'] for x in geo_json_data['features']]

This cell creates a matrix that represents the dataframe data, with one row per commune and one column per votation.
It will be used when we cluster our data.
We can go from a commune to its row in the matrix and back using commune_dict (a dictionary mapping each commune to its index) and commune_list, respectively. The same holds for the votations with votation_dict and votation_list.

If a value is missing from the matrix, we set the result of that commune for that vote to 50%. The clustering cannot handle empty cells, and this way the commune counts as having no clear opinion on the vote. (This happens for 1306 of the 696696 cells, so it should not skew the results much.)

In [47]:
# sorted() makes the row/column order reproducible between runs
commune_list = sorted(set(data['Commune'].values))
commune_dict = {val: idx for idx, val in enumerate(commune_list)}

votation_list = sorted(set(data['Votation'].values))
votation_dict = {val: idx for idx, val in enumerate(votation_list)}

# Create an array of the right size, filled with 50 so that (commune, votation) pairs absent
# from data also count as "no opinion", then write each result at its position,
# using the dictionaries to find the row and column indexes.
X = np.full((len(commune_list), len(votation_list)), 50.0)
for commune_name, votation, yes_pct in data[['Commune', 'Votation', 'Oui en %']].fillna(50).values:
    X[commune_dict[commune_name], votation_dict[votation]] = yes_pct
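The loop above can also be written as a pandas group-and-reshape. What follows is a minimal sketch, not the notebook's method. It assumes a long table with the columns Commune, Votation and Oui en % (as in data), and build_vote_matrix is a hypothetical helper name. Unlike the loop, where the last row written wins, it averages duplicate (commune, votation) rows. Like the loop, it sets empty cells to 50.

```python
import numpy as np
import pandas as pd


def build_vote_matrix(df, fill_value=50.0):
    """Hypothetical helper: one row per commune, one column per votation.

    Duplicate (commune, votation) rows are averaged; cells with no result
    are set to fill_value (50 = "no clear opinion").
    """
    table = (df.groupby(['Commune', 'Votation'])['Oui en %']
               .mean()
               .unstack('Votation')
               .fillna(fill_value))
    return table.to_numpy(dtype=float), list(table.index), list(table.columns)


# Toy data: commune B has a NaN result for v1 and no row at all for v2
toy = pd.DataFrame({
    'Commune': ['A', 'A', 'B'],
    'Votation': ['v1', 'v2', 'v1'],
    'Oui en %': [60.0, 40.0, np.nan],
})
X_toy, communes_toy, votations_toy = build_vote_matrix(toy)
print(communes_toy, votations_toy)
print(X_toy)
```

With this approach, commune_dict and votation_dict would have to be rebuilt from the returned lists so that the row and column indexes stay consistent.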

This list of colours is used to colour the different groups on the maps where we only need to distinguish a few discrete values (for example, cluster labels).

In [11]:
color_list = ['#ff0000' , '#00ff00' , '#0000ff' , '#ffff00' , '#ff00ff' , '#00ffff' , '#000000' ]

## Map of languages

This part creates a map showing which language is spoken in each commune and how dominant it is. It also lists the propositions with the highest and lowest approval for each language.

This cell loads the language spoken in each commune. For every commune, it gives the main language (French, German, Italian or Romansh) and how strongly it dominates (strong or medium). If no language has a clear majority, it says so.

In [12]:
languages = pd.read_excel('data/languages_2000.xlsx', skiprows=1, skipfooter=11)
languages.drop(['Regions-ID'], axis=1, inplace=True)
languages.columns = ['Commune' , 'Language']
languages.head()

|   | Commune | Language |
|---|---|---|
| 0 | Aeugst am Albis | Allemand: forte |
| 1 | Affoltern am Albis | Allemand: moyenne |
| 2 | Bonstetten | Allemand: forte |
| 3 | Hausen am Albis | Allemand: forte |
| 4 | Hedingen | Allemand: forte |


In [13]:
# attach to each row of data the language spoken in its commune
data_lang = data.merge ( languages , on = 'Commune')
data_lang.head()

|   | Commune | Votation | Electeurs inscrits | Bulletins rentrés | Participation en % | Bulletins valables | Oui | Non | Oui en % | District | Canton | Language |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aeugst am Albis | 29.11.1998 Initiative Droleg | 1070.0 | 487.0 | 45.5 | 478.0 | 167.0 | 311.0 | 34.9 | Affoltern | Zürich | Allemand: forte |
| 1 | Aeugst am Albis | 14.06.2015 Initiative sur les bourses d'études | 1380.0 | 706.0 | 51.2 | 695.0 | 186.0 | 509.0 | 26.8 | Affoltern | Zürich | Allemand: forte |
| 2 | Aeugst am Albis | 25.09.2016 Loi fédérale sur le renseignement | 1400.0 | 670.0 | 47.9 | 659.0 | 417.0 | 242.0 | 63.3 | Affoltern | Zürich | Allemand: forte |
| 3 | Aeugst am Albis | 03.03.1991 Encouragement des transports publics | 835.0 | 321.0 | 38.4 | 312.0 | 128.0 | 184.0 | 41.0 | Affoltern | Zürich | Allemand: forte |
| 4 | Aeugst am Albis | 12.02.2017 Réforme de l'imposition des entrepr... | 1395.0 | 759.0 | 54.4 | 750.0 | 318.0 | 432.0 | 42.4 | Affoltern | Zürich | Allemand: forte |


In [43]:
# dictionary that associates each language (and its dominance) with a colour on the map
color_language = {
    'Allemand: forte': 'red',
    'Allemand: moyenne': 'lightcoral',
    'Français: forte': 'blue',
    'Français: moyenne': 'lightskyblue',
    'Italien: forte': 'limegreen',
    'Italien: moyenne': 'darkseagreen',
    'Romanche: forte': 'yellow',
    'Romanche: moyenne': 'khaki',
    'Pas de dominance nette': 'grey'
}

In [44]:
# this cell draws the map of the languages spoken in Switzerland

languages_series = languages.set_index('Commune')['Language']

# style function used by folium.GeoJson to colour each commune as described by color_language
def style_function_language(feature):
    language = languages_series.get(feature['name'], None)
    if language is None:
        # commune missing from the language table: report it and leave it white
        print(feature['name'])
        fill_color = 'white'
    else:
        fill_color = color_language[language]
    return {
        'fillOpacity': 1,
        'weight': 0,
        'fillColor': fill_color
    }

m = folium.Map(
    location=switzerland_coord,
    tiles=None,
    zoom_start=8
)

folium.GeoJson(
    geo_json_data,
    style_function=style_function_language
).add_to(m)

# draw only the commune borders on top, without fill
# (Map.choropleth is deprecated in favour of folium.Choropleth)
folium.Choropleth(geo_data=geo_json_data,
                  fill_opacity=0,
                  line_opacity=1).add_to(m)

m.save('data/map_language.html')

For each language and level of dominance, we list the 5 votations with the highest and the 5 with the lowest average approval rate.

In [None]:
for current_language, group in data_lang.groupby('Language'):
    # average approval rate of each votation over the communes of this language group
    databl_mean = group.groupby('Votation', as_index=False)['Oui en %'].mean()
    databl_votation = databl_mean.sort_values(by='Oui en %', ascending=False)
    print(current_language + ' max:')
    print(databl_votation.head(5))

    print(current_language + ' min:')
    print(databl_votation.tail(5))
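The same selection can also be done for all languages at once, without an explicit loop. This is a minimal sketch under stated assumptions. extreme_votations is a hypothetical helper, the column names follow data_lang, and the toy values are invented. It sorts the per-(language, votation) means once and then takes the first and last n rows of each language group.

```python
import pandas as pd


def extreme_votations(df, group_col='Language', value_col='Oui en %', n=5):
    """Hypothetical helper: for each group, the n votations with the highest
    and the n with the lowest mean value_col (column names as in data_lang)."""
    means = (df.groupby([group_col, 'Votation'], as_index=False)[value_col]
               .mean()
               .sort_values([group_col, value_col], ascending=[True, False]))
    top = means.groupby(group_col).head(n)
    bottom = means.groupby(group_col).tail(n)
    return top, bottom


# Toy data with two language groups (values invented for illustration)
toy = pd.DataFrame({
    'Language': ['fr', 'fr', 'fr', 'fr', 'de', 'de'],
    'Votation': ['v1', 'v1', 'v2', 'v3', 'v1', 'v2'],
    'Oui en %': [70.0, 50.0, 20.0, 40.0, 30.0, 80.0],
})
top, bottom = extreme_votations(toy, n=1)
print(top)
print(bottom)
```

groupby(...).head(n) and tail(n) keep the sorted order within each group. The result therefore matches head(5)/tail(5) in the loop above, with all languages in a single table.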

### Map by theme

This part creates one map per theme (thématique) showing the average percentage of yes votes in each commune.

We merge the two dataframes so that, for each votation and each commune, we have the theme and the percentage of yes votes. We only keep 'Thématique', 'Commune' and 'Oui en %', because they are the only information needed later (the votation is no longer needed once the merge is done).

We also make sure that the dataframe only contains communes present in the GeoJSON, so that the folium functions do not crash.

In [21]:
thematique = pd.read_pickle("data/Thématique.pkl")
data_theme = data.merge(thematique , on = 'Votation')

data_t = data_theme[['Thématique','Commune','Oui en %']]
data_t = data_t[data_t['Commune'].isin(commune)]
data_t.head()

|   | Thématique | Commune | Oui en % |
|---|---|---|---|
| 0 | Enseignement, culture et médias | Aeugst am Albis | 26.8 |
| 1 | Enseignement, culture et médias | Affoltern am Albis | 26.6 |
| 2 | Enseignement, culture et médias | Bonstetten | 23.6 |
| 3 | Enseignement, culture et médias | Hausen am Albis | 26.1 |
| 4 | Enseignement, culture et médias | Hedingen | 25.8 |


We group the data by theme and, for each theme, create a map showing the average percentage of yes votes in each commune. Each map is then saved as an HTML file.

In [None]:

# the loop variable is named group so that it does not overwrite data_theme, which is reused below
for theme, group in data_t.groupby('Thématique'):
    theme_mean = group.groupby('Commune', as_index=False)['Oui en %'].mean()
    map1 = folium.Map(location=switzerland_coord, zoom_start=8)
    folium.Choropleth(geo_data=geo_json_data,
                      data=theme_mean,
                      columns=['Commune', 'Oui en %'],
                      key_on='feature.name',
                      fill_color='RdYlGn',
                      fill_opacity=0.7,
                      line_opacity=0.2,
                      legend_name='yes in % given to the theme ' + theme).add_to(map1)

    map1.save('data/map_theme/map_' + theme + '.html')
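The per-theme averages can also be computed in a single groupby, which gives one row per commune and one column per theme. This is a sketch and assumes the columns of data_t. theme_means is a hypothetical helper, and the toy values are invented. One column, renamed and reset, gives the two-column frame passed to folium.Choropleth through columns=['Commune', 'Oui en %'].

```python
import pandas as pd


def theme_means(df):
    """Hypothetical helper: mean 'Oui en %' per commune and theme,
    as a table with one row per commune and one column per theme."""
    return (df.groupby(['Thématique', 'Commune'])['Oui en %']
              .mean()
              .unstack('Thématique'))


toy = pd.DataFrame({
    'Thématique': ['T1', 'T1', 'T2'],
    'Commune': ['A', 'A', 'A'],
    'Oui en %': [20.0, 40.0, 70.0],
})
wide = theme_means(toy)
# two-column frame (Commune, Oui en %) for a single theme
frame_t1 = wide['T1'].rename('Oui en %').reset_index()
print(frame_t1)
```

If a commune has no votation in a theme, it appears as NaN in that theme's column. Dropping those rows with dropna() before mapping is one option.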

### Map by recommendation

This part uses the voting recommendation of each political party to visualize how closely each party is followed and which regions vote most in line with each party.

Prepare the map:
- get the GeoJSON to draw the borders;
- get all the commune names;
- only keep the values that are in the GeoJSON.

In [22]:
recommend = pd.read_pickle("data/data_Recommandation.pkl")

parties = list(recommend.columns.drop_duplicates())
parties.remove('Date')
parties.remove('Votation')



data_recommend = data.merge(recommend.loc[:, ['Votation'] + parties], on='Votation')
data_recommend.head()

|   | Commune | Votation | Electeurs inscrits | Bulletins rentrés | Participation en % | Bulletins valables | Oui | Non | Oui en % | District | ... | PLS | POCH | PRD | PS | PSL | PST | PVL | Rep. | UDC | UDF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aeugst am Albis | 14.06.2015 Initiative sur les bourses d'études | 1380.0 | 706.0 | 51.2 | 695.0 | 186.0 | 509.0 | 26.8 | Affoltern | ... | 0 | 0 | 0 | 1 | 0 | 1 | -1 | 0 | -1 | -1 |
| 1 | Affoltern am Albis | 14.06.2015 Initiative sur les bourses d'études | 7026.0 | 2915.0 | 41.5 | 2851.0 | 759.0 | 2092.0 | 26.6 | Affoltern | ... | 0 | 0 | 0 | 1 | 0 | 1 | -1 | 0 | -1 | -1 |
| 2 | Bonstetten | 14.06.2015 Initiative sur les bourses d'études | 3529.0 | 1740.0 | 49.3 | 1705.0 | 402.0 | 1303.0 | 23.6 | Affoltern | ... | 0 | 0 | 0 | 1 | 0 | 1 | -1 | 0 | -1 | -1 |
| 3 | Hausen am Albis | 14.06.2015 Initiative sur les bourses d'études | 2395.0 | 1143.0 | 47.7 | 1120.0 | 292.0 | 828.0 | 26.1 | Affoltern | ... | 0 | 0 | 0 | 1 | 0 | 1 | -1 | 0 | -1 | -1 |
| 4 | Hedingen | 14.06.2015 Initiative sur les bourses d'études | 2476.0 | 1252.0 | 50.6 | 1224.0 | 316.0 | 908.0 | 25.8 | Affoltern | ... | 0 | 0 | 0 | 1 | 0 | 1 | -1 | 0 | -1 | -1 |


For each party, we create a map of the percentage of voters who agreed with the party's recommendation.

A voter is considered to agree with a party when they voted as the party recommended, i.e. when the party recommended yes (1) or no (-1). We ignore other recommendations (such as abstention) and votations for which we have no information about the party's recommendation. These cases appear as 0 in the recommendation columns and are filtered out.
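The agreement defined above can be computed in a vectorized way, without a row-by-row apply. This is a minimal sketch assuming the 1 / -1 encoding shown in the table above, and agreement is a hypothetical helper name. The second half checks one property that matters for the clustering by recommendation later on. Replacing a column x by 100 - x only mirrors that column, so the Euclidean distances between communes stay the same, and therefore so do the clusters.

```python
import numpy as np


def agreement(yes_percent, recommendation):
    """Share (in %) of voters who followed a party's recommendation.

    recommendation: 1 = the party recommended yes, -1 = it recommended no.
    Rows with 0 are assumed to have been filtered out beforehand.
    """
    yes_percent = np.asarray(yes_percent, dtype=float)
    recommendation = np.asarray(recommendation)
    return np.where(recommendation == 1, yes_percent, 100.0 - yes_percent)


# Aeugst am Albis, 14.06.2015 student grants initiative: 26.8 % yes,
# PS recommended yes (1), UDC recommended no (-1)
print(agreement([26.8, 26.8], [1, -1]))

# Mirroring a column (x -> 100 - x) leaves the distance between two communes unchanged
votes = np.array([[20.0, 70.0], [60.0, 10.0]])
mirrored = votes.copy()
mirrored[:, 1] = 100.0 - mirrored[:, 1]
print(np.linalg.norm(votes[0] - votes[1]), np.linalg.norm(mirrored[0] - mirrored[1]))
```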

In [None]:
for parti in parties:
    current_to_map = data_recommend.loc[:, ['Commune', 'Oui en %', parti]]
    # keep only the votations where the party recommended yes (1) or no (-1)
    current_to_map = current_to_map[current_to_map[parti] != 0].copy()
    # agreement = share of yes if the party recommended yes, share of no otherwise
    current_to_map['Agreement'] = np.where(current_to_map[parti] == 1,
                                           current_to_map['Oui en %'],
                                           100 - current_to_map['Oui en %'])

    current_to_map = current_to_map.groupby('Commune', as_index=False)['Agreement'].mean()

    map1 = folium.Map(location=switzerland_coord, zoom_start=8)
    folium.Choropleth(geo_data=geo_json_data,
                      data=current_to_map,
                      columns=['Commune', 'Agreement'],
                      key_on='feature.name',
                      fill_color='RdYlGn',
                      fill_opacity=0.7,
                      line_opacity=0.2,
                      legend_name='Agreement in % with ' + parti).add_to(map1)

    map1.save('data/maps_partis/map_' + parti + '.html')
    

## Clustering


### kmeans
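draw_map_kmeans below fits K-means on the commune × votation matrix and colours each commune by its cluster. It also projects the matrix onto two principal components, but only to draw a scatter plot. The clustering itself uses all the votations. The toy sketch below shows those two scikit-learn steps on synthetic data: 20 invented "communes" in two well-separated groups over 6 invented "votations". It is an illustration, not the notebook's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: two groups of 10 "communes" voting around 30 % and 70 % yes
group_a = rng.normal(loc=30.0, scale=2.0, size=(10, 6))
group_b = rng.normal(loc=70.0, scale=2.0, size=(10, 6))
X_toy = np.vstack([group_a, group_b])

# clustering on the full 6-dimensional vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_toy).labels_
# 2D projection, only for plotting
X_2d = PCA(n_components=2).fit_transform(X_toy)
print(labels)
```

n_init=10 is set explicitly because newer scikit-learn versions change its default.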

In [26]:
def draw_map_kmeans(n_clusters, X, file_PCA=None):
    # cluster the communes (rows of X) with K-means;
    # n_init=10 keeps the former scikit-learn default explicitly
    kmeans_res = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    groups = kmeans_res.labels_
    commune_to_group = pd.Series(groups, index=commune_list)

    plt.figure(100 + n_clusters)

    # project X on its first two principal components, only to draw a 2D scatter plot
    X_PCA = PCA(n_components=2).fit_transform(X)

    for current_group in range(n_clusters):
        group_points = X_PCA[groups == current_group]
        plt.scatter(group_points[:, 0], group_points[:, 1], c=color_list[current_group])

    if file_PCA is not None:
        plt.savefig(file_PCA + 'PCAA_kmeans' + str(n_clusters) + '.png')
    else:
        plt.show()
    plt.gcf().clear()

    def style_function_kmeans(feature):
        group = commune_to_group.get(feature['name'], None)
        if group is None:
            # commune of the GeoJSON absent from the data: report it and leave it white
            print(feature['name'])
            return {'fillOpacity': 1, 'weight': 0, 'fillColor': 'white'}
        return {
            'fillOpacity': 1,
            'weight': 0,
            'fillColor': color_list[group]
        }

    m = folium.Map(location=switzerland_coord, zoom_start=8)

    folium.GeoJson(
        geo_json_data,
        style_function=style_function_kmeans
    ).add_to(m)

    # draw the commune borders on top, without fill
    folium.Choropleth(geo_data=geo_json_data,
                      fill_opacity=0,
                      line_opacity=1).add_to(m)

    return m

In [14]:
for i in range (2,6) :  
    draw_map_kmeans(i,X, file_PCA='data/map_ml/').save('data/map_ml/kmeans'+str(i)+'.html')


### DBSCAN

In [27]:
from sklearn.neighbors import NearestNeighbors

def draw_map_DBSCAN(X, file_PCA=None):
    min_samples = 20

    # eps heuristic: for each commune, average its distances to the 2*min_samples - 1
    # nearest other communes, then average over all communes.
    # kneighbors(X) returns each point itself (distance 0) first, so that column is dropped.
    distances, _ = NearestNeighbors(n_neighbors=2 * min_samples).fit(X).kneighbors(X)
    eps = distances[:, 1:].mean()
    groups = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_

    # project X on its first two principal components, only to draw a 2D scatter plot
    X_PCA = PCA(n_components=2).fit_transform(X)

    # label -1 is DBSCAN noise; color_list[-1] draws it in black
    n_clusters = max(groups) + 1
    for current_group in range(-1, n_clusters):
        group_points = X_PCA[groups == current_group]
        plt.scatter(group_points[:, 0], group_points[:, 1], c=color_list[current_group])

    if file_PCA is not None:
        plt.savefig(file_PCA + 'PCAA_DBSCAN.png')
    else:
        plt.show()
    plt.gcf().clear()

    commune_to_group = pd.Series(groups, index=commune_list)

    def style_function_DBSCAN(feature):
        group = commune_to_group.get(feature['name'], None)
        if group is None:
            # commune of the GeoJSON absent from the data: report it and leave it white
            print(feature['name'])
            return {'fillOpacity': 1, 'weight': 0, 'fillColor': 'white'}
        return {
            'fillOpacity': 1,
            'weight': 0,
            'fillColor': color_list[group]
        }

    m = folium.Map(location=switzerland_coord, zoom_start=8)

    folium.GeoJson(
        geo_json_data,
        style_function=style_function_DBSCAN
    ).add_to(m)

    # draw the commune borders on top, without fill
    folium.Choropleth(geo_data=geo_json_data,
                      fill_opacity=0,
                      line_opacity=1).add_to(m)

    return m

In [16]:
draw_map_DBSCAN (X,'data/map_ml/').save('data/map_ml/DBSCAN.html')


### Clustering by theme

In [18]:
# cluster the communes separately for each theme, using only the votations of that theme
# (data_bg = data by group)
for theme, data_bg in data_theme[['Thématique', 'Votation', 'Commune', 'Oui en %']].groupby('Thématique'):
    votation_list_t = sorted(set(data_bg['Votation'].values))
    votation_dict_t = {val: idx for idx, val in enumerate(votation_list_t)}

    # missing (commune, votation) pairs are set to 50, as for the full matrix
    Xt = np.full((len(commune_list), len(votation_list_t)), 50.0)

    for commune_name, votation, yes_pct in data_bg[['Commune', 'Votation', 'Oui en %']].fillna(50).values:
        Xt[commune_dict[commune_name], votation_dict_t[votation]] = yes_pct
    draw_map_kmeans(2, Xt, 'data/maps_theme_ml/' + theme).save('data/maps_theme_ml/kmeans_' + theme + '.html')
    draw_map_DBSCAN(Xt, 'data/maps_theme_ml/' + theme).save('data/maps_theme_ml/DBSCAN_' + theme + '.html')


### Cluster by recommendation

In [25]:
for parti in parties:
    curr_recommend = data_recommend.loc[:, ['Commune', 'Votation', 'Oui en %', parti]]
    # keep only the votations where the party recommended yes (1) or no (-1)
    curr_recommend = curr_recommend[curr_recommend[parti] != 0].copy()
    curr_recommend['Agreement'] = np.where(curr_recommend[parti] == 1,
                                           curr_recommend['Oui en %'],
                                           100 - curr_recommend['Oui en %'])

    votation_list_t = sorted(set(curr_recommend['Votation'].values))
    votation_dict_t = {val: idx for idx, val in enumerate(votation_list_t)}

    Xt = np.full((len(commune_list), len(votation_list_t)), 50.0)

    # cluster on the agreement; replacing x by 100 - x only mirrors a column,
    # so the Euclidean distances (and thus the clusters) are the same as with 'Oui en %'
    for commune_name, votation, agree in curr_recommend[['Commune', 'Votation', 'Agreement']].fillna(50).values:
        Xt[commune_dict[commune_name], votation_dict_t[votation]] = agree
    draw_map_kmeans(2, Xt, 'data/map_recommendation_cluster/' + parti).save('data/map_recommendation_cluster/kmeans_' + parti + '.html')
    draw_map_DBSCAN(Xt, 'data/map_recommendation_cluster/' + parti).save('data/map_recommendation_cluster/DBSCAN_' + parti + '.html')
    
    


# Current day analysis

In [28]:
data = pd.read_pickle("data/data_young.pkl")
data_recommend = data.merge(recommend.loc[:, ['Votation'] + parties], on='Votation')
data_theme = data.merge(thematique , on = 'Votation')

In [29]:
votation_list = sorted(set(data['Votation'].values))
votation_dict = {val: idx for idx, val in enumerate(votation_list)}

# missing (commune, votation) pairs are set to 50, as for the first matrix
X = np.full((len(commune_list), len(votation_list)), 50.0)

for commune_name, votation, yes_pct in data[['Commune', 'Votation', 'Oui en %']].fillna(50).values:
    X[commune_dict[commune_name], votation_dict[votation]] = yes_pct

In [30]:
for i in range (2,6) :  
    draw_map_kmeans(i,X, file_PCA='data/young/map_ml/').save('data/young/map_ml/kmeans'+str(i)+'.html')


In [31]:
draw_map_DBSCAN (X,'data/young/map_ml/').save('data/young/map_ml/DBSCAN.html')


In [32]:
# cluster the communes separately for each theme (data_bg = data by group)
for theme, data_bg in data_theme[['Thématique', 'Votation', 'Commune', 'Oui en %']].groupby('Thématique'):
    votation_list_t = sorted(set(data_bg['Votation'].values))
    votation_dict_t = {val: idx for idx, val in enumerate(votation_list_t)}

    Xt = np.full((len(commune_list), len(votation_list_t)), 50.0)

    for commune_name, votation, yes_pct in data_bg[['Commune', 'Votation', 'Oui en %']].fillna(50).values:
        Xt[commune_dict[commune_name], votation_dict_t[votation]] = yes_pct

    # PCA with 2 components needs at least 2 votations
    if Xt.shape[1] > 1:
        draw_map_kmeans(2, Xt, 'data/young/maps_theme_ml/' + theme).save('data/young/maps_theme_ml/kmeans_' + theme + '.html')
        draw_map_DBSCAN(Xt, 'data/young/maps_theme_ml/' + theme).save('data/young/maps_theme_ml/DBSCAN_' + theme + '.html')


In [52]:
for parti in parties:
    curr_recommend = data_recommend.loc[:, ['Commune', 'Votation', 'Oui en %', parti]]
    curr_recommend = curr_recommend[curr_recommend[parti] != 0].copy()
    if len(curr_recommend) > 0:
        curr_recommend['Agreement'] = np.where(curr_recommend[parti] == 1,
                                               curr_recommend['Oui en %'],
                                               100 - curr_recommend['Oui en %'])

        votation_list_t = sorted(set(curr_recommend['Votation'].values))
        votation_dict_t = {val: idx for idx, val in enumerate(votation_list_t)}

        Xt = np.full((len(commune_list), len(votation_list_t)), 50.0)

        for commune_name, votation, agree in curr_recommend[['Commune', 'Votation', 'Agreement']].fillna(50).values:
            Xt[commune_dict[commune_name], votation_dict_t[votation]] = agree

        # PCA with 2 components needs at least 2 votations
        if Xt.shape[1] > 1:
            draw_map_kmeans(2, Xt, 'data/young/map_recommendation_cluster/' + parti).save('data/young/map_recommendation_cluster/kmeans_' + parti + '.html')
            draw_map_DBSCAN(Xt, 'data/young/map_recommendation_cluster/' + parti).save('data/young/map_recommendation_cluster/DBSCAN_' + parti + '.html')
    
