The work up to this second milestone mainly consisted of collecting and cleaning our data. Since we do not use a ready-made dataset, collecting and cleaning the data takes a tremendous amount of time.

In [1]:
import pandas as pd
import numpy as np
from difflib import SequenceMatcher
import json
import folium

### Dominating language in towns

To show that a Röstigraben exists, we need, together with the votation results, the language spoken in each town. We found such data [here](https://www.atlas.bfs.admin.ch/maps/13/fr/12401_229_228_227/20443.html). The raw values downloaded from there can be found in **data/3007\_Langues\_nationales\_dominantes\_dans\_les\_communes\_en\_2000\_(fr)**. It gives the dominant language of each town in 2000. The value for each town is either [Language]: [(medium)/(strong)] or "No domination" (note that these values are in French, and the language is one of the four national languages, i.e. 'French', 'German', 'Italian', or 'Romansh').

We had many mismatches between the town names in the votations dataframe and the file above. The reason is that mergers between towns have been encouraged since the beginning of this century. While the votation data have been adapted accordingly by the BFS, this is not the case for the last language survey, performed in 2000. We manually added the mergers to the file, resulting in **data/languages_2000.xlsx**.

In [20]:
# skipfooter replaces the older skip_footer keyword in recent pandas versions
languages = pd.read_excel('data/languages_2000.xlsx', skiprows=1, skipfooter=11)
languages.drop(['Regions-ID'], axis=1, inplace=True)
languages.head()

Unnamed: 0,Regionsname,Langue nationale dominante
0,Aeugst am Albis,Allemand: forte
1,Affoltern am Albis,Allemand: moyenne
2,Bonstetten,Allemand: forte
3,Hausen am Albis,Allemand: forte
4,Hedingen,Allemand: forte
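
The "[Language]: [strength]" format described above can be split into two separate columns, which makes later grouping by language easier. The following is a minimal sketch on toy rows, not the code used in this project: the towns 'Town X' and 'Town Y', the spelling 'Français: forte' and the label 'Pas de domination' are assumptions made for illustration, and `split_dominance` is a hypothetical helper.

```python
import pandas as pd

# Toy rows in the same format as the 'Langue nationale dominante' column.
# 'Town X', 'Town Y' and the last two labels are illustrative assumptions.
sample = pd.DataFrame({
    'Regionsname': ['Aeugst am Albis', 'Affoltern am Albis', 'Town X', 'Town Y'],
    'Langue nationale dominante': ['Allemand: forte', 'Allemand: moyenne',
                                   'Français: forte', 'Pas de domination'],
})

def split_dominance(value):
    """Split 'Language: strength'; a value without a colon means no dominant language."""
    language, sep, strength = value.partition(':')
    if not sep:
        return pd.Series({'language': None, 'strength': None})
    return pd.Series({'language': language.strip(), 'strength': strength.strip()})

# One row per town, with the two new columns appended
parsed = sample['Langue nationale dominante'].apply(split_dominance)
sample = pd.concat([sample, parsed], axis=1)
print(sample[['Regionsname', 'language', 'strength']])
```

Treating any value without a colon as "no dominant language" avoids hard-coding the exact French label, which is not shown in the preview above.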


In [22]:
# load the votation data and extract the set of towns
df_votation = pd.read_pickle('data/data.pkl')
towns_votations = set(df_votation['Commune'])

In [23]:
# all unique towns in the language dataset
towns_languages = set(languages['Regionsname'])

Now let's look at the difference between the two sets:

In [25]:
diff1 = towns_votations - towns_languages
len(diff1)
diff1

{'Büren an der Aare-Meienried',
 'Höchstetten-Hellsau',
 'Kirchdorf (BE)-Jaberg',
 'Mötschwil-Rüti bei Lyssach',
 'Münchenwiler-Clavaleyres',
 'Wald (BE)-Niedermuhlern',
 'Wiggiswil-Deisswil bei Münchenbuchsee'}

#### --> Near-complete match between towns in votations and towns in languages after a tough job: only the seven combined entries above remain unmatched!

We tried to draw a map of the languages with folium, but without success. The code we tried to run can be seen in the notebook 'parse_languages', which also contains some helpers for the manual matching of the town names in the language file.
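
The helpers from 'parse_languages' are not reproduced here. As a rough illustration of how `SequenceMatcher` (imported at the top) can support manual matching, the sketch below ranks candidate names by similarity to an unmatched name. `best_matches` is a hypothetical helper, and the candidate list is a toy example rather than the real content of the language file.

```python
from difflib import SequenceMatcher

def best_matches(name, candidates, n=3):
    """Return the n candidates most similar to name, as (ratio, candidate) pairs."""
    scored = [(SequenceMatcher(None, name, c).ratio(), c) for c in candidates]
    return sorted(scored, reverse=True)[:n]

# Toy candidates, standing in for names from the language file (illustrative only)
towns_languages_demo = ['Büren an der Aare', 'Meienried', 'Kirchdorf (BE)', 'Jaberg']

# Suggest the closest candidates for a combined name from the votation data
suggestions = best_matches('Büren an der Aare-Meienried', towns_languages_demo, n=2)
for ratio, candidate in suggestions:
    print(round(ratio, 2), candidate)
```

The suggestions only narrow down the search; the final decision on which towns were merged still has to be made by hand.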

### Map by theme

This part creates one map per theme (thématique), showing the percentage of yes votes for that theme.

Load the percentages by commune and the theme of each votation.

In [27]:
data = pd.read_pickle("data/data.pkl")
thematique = pd.read_pickle("data/Thématique.pkl")

Load the JSON used to draw the communes on the map, and create the list of the communes it contains.

In [28]:
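switzerland_coord = [46.765213, 8.252444]
town_geo_path = r'data/switzerland_borders/municipalities_no_urnes.geojson'
# Use a context manager so the file is closed after loading
with open(town_geo_path, encoding="utf8") as f:
    geo_json_data = json.load(f)
# Names of the communes present in the GeoJSON
commune = [x['name'] for x in geo_json_data['features']]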

We merge the two dataframes so that, for each votation and each commune, we have the theme and the percentage of yes. We keep only 'Thématique', 'Commune' and 'Oui en %' because they are the only useful columns later on ('Votation' is no longer useful once the merge is done).

We also make sure that the dataframe contains only communes that are in the JSON.

In [29]:
data_t = data.merge(thematique , on = 'Votation')[['Thématique','Commune','Oui en %']]

data_t = data_t[data_t['Commune'].isin(commune)]
data_t.head()

Unnamed: 0,Thématique,Commune,Oui en %
0,Politique sociale,Aeugst am Albis,34.9
1,Politique sociale,Affoltern am Albis,30.3
2,Politique sociale,Bonstetten,34.3
3,Politique sociale,Hausen am Albis,35.5
4,Politique sociale,Hedingen,31.1
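
Before filtering with `isin`, it can be useful to see which names would be lost in each direction. The sketch below does this on toy stand-ins: `votes_demo`, `commune_demo` and 'Hypothetical-Town' are invented for the example and do not come from the project data.

```python
import pandas as pd

# Toy stand-ins for the merged votes and for the GeoJSON name list
votes_demo = pd.DataFrame({
    'Commune': ['Aeugst am Albis', 'Bonstetten', 'Hypothetical-Town'],
    'Oui en %': [34.9, 34.3, 50.0],
})
commune_demo = ['Aeugst am Albis', 'Bonstetten', 'Hedingen']

in_json = votes_demo['Commune'].isin(commune_demo)

# Communes with votes that the filter would drop because they are not on the map
dropped = sorted(set(votes_demo.loc[~in_json, 'Commune']))
# Communes on the map that would stay empty because they have no votes
unmapped = sorted(set(commune_demo) - set(votes_demo['Commune']))

print('dropped by the filter:', dropped)
print('on the map without data:', unmapped)
```

Non-empty lists here would point to remaining name mismatches rather than to genuinely missing communes.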


We group the data by theme and, for each theme, create a map showing the average percentage of yes votes per commune. Each map is then saved as an HTML file.

In [31]:

for theme, data_theme in data_t.groupby('Thématique'):
    # Average the yes percentage per commune over all votations of this theme
    data_theme = data_theme.groupby('Commune', as_index=False)['Oui en %'].mean()
    map1 = folium.Map(location=switzerland_coord, zoom_start=8)
    # folium.Choropleth replaces the Map.choropleth method of older folium releases
    folium.Choropleth(geo_data=geo_json_data,
                      data=data_theme,
                      columns=['Commune', 'Oui en %'],
                      key_on='feature.name',
                      fill_color='RdYlGn',
                      fill_opacity=0.7,
                      line_opacity=0.2,
                      legend_name='yes in % given to the theme ' + theme).add_to(map1)

    map1.save('data/map_theme/map_' + theme + '.html')

We now pick 3 themes at random for analysis (we took only 3, but the same analysis can be done on the other maps).

So we will look at the ['Economic'](data/map_theme/map_Economie.html) theme, the ['Politique de securite'](data/map_theme/map_Politique de sécurité.html) theme and the ['Régime politique'](data/map_theme/map_Régime politique.html) theme:    
In each of them we can clearly see a large strip beginning in the Valais and ending at Vaduz. We can see the Röstigraben in this strip, but at the same time the northern and eastern parts of German-speaking Switzerland do not stand out as much as the strip does. So even if we can put forward the split between the French-speaking and German-speaking parts, it is possible that the Röstigraben is not the only explanation for those differences.

### Map by recommendation

This part uses the voting recommendations of each political party to create a visual representation of how closely each party is followed, and to see which regions vote more in line with each party.

Prepare the map:    
get the JSON to draw the borders,    
get all the commune names,    
keep only the values that are in the JSON.

In [32]:
df = pd.read_pickle("data/data.pkl")
recommend = pd.read_pickle("data/data_Recommandation.pkl")

parties = list(recommend.columns.drop_duplicates())
parties.remove('Date')
parties.remove('Votation')

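switzerland_coord = [46.765213, 8.252444]
town_geo_path = r'data/switzerland_borders/municipalities_no_urnes.geojson'
# Use a context manager so the file is closed after loading
with open(town_geo_path, encoding="utf8") as f:
    geo_json_data = json.load(f)
# Names of the communes present in the GeoJSON
commune = [x['name'] for x in geo_json_data['features']]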



to_map = df.merge(recommend.loc[:, ['Votation'] + parties], on='Votation')
to_map.head()

Unnamed: 0,Commune,Votation,Electeurs inscrits,Bulletins rentrés,Participation en %,Bulletins valables,Oui,Non,Oui en %,District,...,PLS,POCH,PRD,PS,PSL,PST,PVL,Rep.,UDC,UDF
0,Aeugst am Albis,29.11.1998 Initiative Droleg,1070.0,487.0,45.5,478.0,167.0,311.0,34.9,Affoltern,...,-1,0,-1,1,-1,1,0,0,-1,-1
1,Affoltern am Albis,29.11.1998 Initiative Droleg,5729.0,2286.0,39.9,2236.0,678.0,1558.0,30.3,Affoltern,...,-1,0,-1,1,-1,1,0,0,-1,-1
2,Bonstetten,29.11.1998 Initiative Droleg,2596.0,1063.0,40.9,1045.0,358.0,687.0,34.3,Affoltern,...,-1,0,-1,1,-1,1,0,0,-1,-1
3,Hausen am Albis,29.11.1998 Initiative Droleg,2081.0,807.0,38.8,792.0,281.0,511.0,35.5,Affoltern,...,-1,0,-1,1,-1,1,0,0,-1,-1
4,Hedingen,29.11.1998 Initiative Droleg,1858.0,810.0,43.6,791.0,246.0,545.0,31.1,Affoltern,...,-1,0,-1,1,-1,1,0,0,-1,-1


Create, for each party, a map of the percentage of people agreeing with that party.

People are considered to agree with a party when they vote the same way as the party's recommendation, for recommendations of yes or no. We do not take into account other recommendations of the party (such as abstention), nor votations for which we have no information about the party's recommendation.
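
To make this definition concrete, here is a small worked example on invented numbers. It assumes, as the recommendation table and the cell below suggest, that 1 encodes a yes recommendation, -1 a no recommendation and 0 anything else; 'PartyX' and the communes 'A' and 'B' are hypothetical.

```python
import numpy as np
import pandas as pd

# Invented votes: yes percentage per commune and votation, with PartyX's recommendation
demo = pd.DataFrame({
    'Commune': ['A', 'A', 'B', 'B'],
    'Oui en %': [30.0, 60.0, 40.0, 80.0],
    'PartyX': [-1, 1, -1, 0],   # assumed encoding: 1 = yes, -1 = no, 0 = other/unknown
})

# Keep only clear yes/no recommendations
kept = demo[demo['PartyX'].isin([1, -1])].copy()

# Agreement is the yes share when the party said yes, the no share when it said no
kept['Agreement'] = np.where(kept['PartyX'] == 1,
                             kept['Oui en %'],
                             100 - kept['Oui en %'])

# Average agreement per commune over the retained votations
agreement = kept.groupby('Commune')['Agreement'].mean()
print(agreement)
```

Commune A agrees at 70% (a 30% yes against a no recommendation) and at 60% (a 60% yes with a yes recommendation), so its average is 65%. Commune B keeps a single votation, with 60% agreement.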

In [33]:
for parti in parties:
    # Keep only votations where the party recommended yes (1) or no (-1)
    keep = to_map[parti].isin([1, -1])
    current_to_map = to_map.loc[keep, ['Commune', 'Oui en %', parti]].copy()
    # Agreement = share of yes if the party said yes, share of no otherwise
    current_to_map['Agreement'] = np.where(current_to_map[parti] == 1,
                                           current_to_map['Oui en %'],
                                           100 - current_to_map['Oui en %'])
    # Average agreement per commune over all retained votations
    current_to_map = current_to_map.groupby('Commune', as_index=False)['Agreement'].mean()

    map1 = folium.Map(location=switzerland_coord, zoom_start=8)
    # folium.Choropleth replaces the Map.choropleth method of older folium releases
    folium.Choropleth(geo_data=geo_json_data,
                      data=current_to_map,
                      columns=['Commune', 'Agreement'],
                      key_on='feature.name',
                      fill_color='RdYlGn',
                      fill_opacity=0.7,
                      line_opacity=0.2,
                      legend_name='Agreement in % with ' + parti).add_to(map1)

    map1.save('data/maps_partis/map_' + parti + '.html')


We now take 4 parties for analysis: the 2 biggest (UDC, PS), one of medium importance (PBD) and a party with very few seats in parliament (PST). Once again, we took only 4, but the same analysis can be done on the other maps.

So we will look at the ['UDC'](data//maps_partis/map_UDC.html), ['PS'](data/maps_partis/map_PS.html), ['PBD'](data/maps_partis/map_PBD.html) and ['PST'](data/maps_partis/map_PST.html) maps:    
This time the maps show a lot of differences. The PS and PST maps show a very clear distinction between the French-speaking and German-speaking parts, fitting the Röstigraben well. In the UDC and PBD maps, on the other hand, the differences are smaller: in the UDC map the overall difference across most of the country is not that big, and in the PBD map the Röstigraben is nearly impossible to see.

# Conclusion

In the end we have seen obvious differences between the French-speaking and German-speaking parts, but those differences vary, do not apply everywhere, and are far from uniform even where they appear.     
As things stand, it is difficult to confirm or refute the existence of the Röstigraben.
