# Interactive Viz

Build a Choropleth map which shows intuitively (i.e., use colors wisely) how much grant money goes to each Swiss canton. To do so, you will need to use the provided TopoJSON file, combined with the Choropleth map example you can find in the Folium README file.  

HINT: the P3 database is formed by entries which assign a grant (and its approved amount) to a University name.  

Therefore, you will need a smart strategy to go from university name to canton name. The GeoNames Full Text Search API (JSON) can help you with this: try to use it as much as possible to build the canton mappings you need. For universities for which you cannot find a mapping via the API, you are allowed to build it manually; feel free to stop once you have mapped the top 95% of the universities. I also recommend adding an intermediate viz step for debugging purposes, showing all the universities as markers on your map (e.g., if you don't select the right results from the GeoNames API, some of your markers might end up in neighboring countries...).

BONUS: using the map you have just built, and the geographical information contained in it, could you give a rough estimate of the difference in research funding between the areas divided by the Röstigraben?   

HINT: for those cantons cut through by the Röstigraben, this viz can be helpful!


In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import scipy.stats as stats
import folium
import os
import json
import geopy
import geocoder
import time
import csv
from IPython.display import IFrame
from geopy.geocoders import Nominatim
from branca.colormap import *
#pip install python-google-places
from googleplaces import GooglePlaces, types, lang

# Reading data

We read the data from the P3 grant export CSV file.

In [7]:
grantExport = pd.read_csv("data/P3_GrantExport.csv", sep=';')
grantExport = grantExport.fillna("")
grantExport.head(5)

| | Project Number | Project Title | Project Title English | Responsible Applicant | Funding Instrument | Funding Instrument Hierarchy | Institution | University | Discipline Number | Discipline Name | Discipline Name Hierarchy | Start Date | End Date | Approved Amount | Keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Schlussband (Bd. VI) der Jacob Burckhardt-Biog... | | Kaegi Werner | Project funding (Div. I-III) | Project funding | | Nicht zuteilbar - NA | 10302 | Swiss history | Human and Social Sciences;Theology & religious... | 01.10.1975 | 30.09.1976 | 11619.0 | |
| 1 | 4 | Batterie de tests à l'usage des enseignants po... | | Massarenti Léonard | Project funding (Div. I-III) | Project funding | Faculté de Psychologie et des Sciences de l'Ed... | Université de Genève - GE | 10104 | Educational science and Pedagogy | Human and Social Sciences;Psychology, educatio... | 01.10.1975 | 30.09.1976 | 41022.0 | |
| 2 | 5 | Kritische Erstausgabe der "Evidentiae contra D... | | Kommission für das Corpus philosophorum medii ... | Project funding (Div. I-III) | Project funding | Kommission für das Corpus philosophorum medii ... | NPO (Biblioth., Museen, Verwalt.) - NPO | 10101 | Philosophy | Human and Social Sciences;Linguistics and lite... | 01.03.1976 | 28.02.1985 | 79732.0 | |
| 3 | 6 | Katalog der datierten Handschriften in der Sch... | | Burckhardt Max | Project funding (Div. I-III) | Project funding | Abt. Handschriften und Alte Drucke Bibliothek ... | Universität Basel - BS | 10302 | Swiss history | Human and Social Sciences;Theology & religious... | 01.10.1975 | 30.09.1976 | 52627.0 | |
| 4 | 7 | Wissenschaftliche Mitarbeit am Thesaurus Lingu... | | Schweiz. Thesauruskommission | Project funding (Div. I-III) | Project funding | Schweiz. Thesauruskommission | NPO (Biblioth., Museen, Verwalt.) - NPO | 10303 | Ancient history and Classical studies | Human and Social Sciences;Theology & religious... | 01.01.1976 | 30.04.1978 | 120042.0 | |


## Focus on useful data

To solve this exercise and compare the grants awarded to the Swiss cantons, we first need to identify the useful data.

Obviously, the "Approved Amount" column is essential.

However, we noticed that **the "University" column is empty** in some rows and marked as not assignable in others ("Nicht zuteilbar - NA" in German). This could be a problem, since we would then need a way other than the university to assign those grants to a canton. This is why we first check whether the grants without a university account for a significant share of the money.

We start by computing the total granted amount.

In [8]:
grantExport['Approved Amount'] = grantExport['Approved Amount'].replace('data not included in P3', 0)
total_amount = sum(grantExport['Approved Amount'].astype(float))
total_amount

13025330134.940002

We then compute the total amount of the grants with an empty university field, and their share of the overall amount.

In [9]:
grantExport['University'] = grantExport['University'].replace('Nicht zuteilbar - NA', '')
grantExportEmptyUni = grantExport[grantExport['University'] == '']
grantExportEmptyUniCleanAmount = grantExportEmptyUni['Approved Amount'].copy()
grantExportEmptyUniCleanAmount[grantExportEmptyUniCleanAmount == 'data not included in P3'] = 0
empty_uni_total_amount = sum(grantExportEmptyUniCleanAmount.astype(float))
empty_uni_total_amount

189886657.92000002

In [10]:
print("Rate of grant amount non assigned to a University:", empty_uni_total_amount/total_amount*100, "%")

Rate of grant amount non assigned to a University: 1.45782606623 %


Fortunately, the grants with an empty or non-assignable university represent less than 1.46% of the total amount, so we simply drop these rows from our data.

In [11]:
grantExport = grantExport[grantExport['University'] != '']
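The share computation above can be mirrored in plain Python on a few records modeled on the rows shown earlier (the tuples below are illustrative inputs, not a P3 extract, and `to_amount` is a hypothetical helper). As in the notebook, the "data not included in P3" placeholder counts as 0 CHF, and both an empty university and "Nicht zuteilbar - NA" count as unassigned.

```python
# Illustrative records: (university, approved amount as read from the CSV)
records = [
    ("Université de Genève - GE", "41022.0"),
    ("Nicht zuteilbar - NA", "11619.0"),
    ("", "data not included in P3"),
    ("Universität Basel - BS", "52627.0"),
]

# Labels that cannot be linked to a university
UNASSIGNED = {"", "Nicht zuteilbar - NA"}


def to_amount(raw):
    """Treat the P3 placeholder text as 0 CHF, otherwise parse the number."""
    return 0.0 if raw == "data not included in P3" else float(raw)


total = sum(to_amount(amount) for _, amount in records)
unassigned = sum(to_amount(amount) for uni, amount in records if uni in UNASSIGNED)
share_percent = unassigned / total * 100
print(f"{share_percent:.2f} % of the amount has no university")
```

If this share is small, as it is on the real data, dropping the unassigned rows barely changes the totals.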

**In conclusion, we can link each grant amount to a university, and from there to a canton, using only the "University" and "Approved Amount" columns.**

## Linking Universities to Cantons

We first extract the unique university names from the data.

In [12]:
universities = grantExport['University'].unique()
len(universities)

76

To find the universities' addresses, we combined two APIs.
* **Google Places API**
This API provides a **text_search** method that performs very well at finding the addresses of our universities and institutes. Unfortunately, it does not directly give the canton of a location, only its latitude/longitude, which is why we also use a second API.
* **GeoNames API**
Through geopy's `GeoNames` geocoder, this API provides a **reverse** method that easily converts a point (latitude/longitude) into a `Location` with city/state/country information.

In [76]:
#Google Place API, use your own key
GOOGLE_API_KEY = 'NO_API_SPECIFIED'

google_places = GooglePlaces(GOOGLE_API_KEY)

#GeoName API, feel free to use our account
GEO_NAMES_ACCOUNT = "blip2"
geolocator = geopy.geocoders.GeoNames(username=GEO_NAMES_ACCOUNT)
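The assignment suggests the GeoNames Full Text Search API, whereas the cell above only uses GeoNames (through geopy) for reverse lookups. Below is a minimal sketch of querying the full text search directly, assuming the `searchJSON` endpoint with its `q`, `country`, `maxRows` and `username` parameters, and a JSON body holding a `geonames` list whose entries carry an `adminCode1` field (for Swiss places this appears to be the canton code, as in the "Préjeux, VS, CH" style addresses geopy returns). The helper names are hypothetical, "demo" is a placeholder username, and the sample response is made up; no network call is made.

```python
import json
from urllib.parse import urlencode

GEONAMES_SEARCH_URL = "http://api.geonames.org/searchJSON"


def build_search_url(query, username, country="CH", max_rows=1):
    """Build a GeoNames full text search URL restricted to one country."""
    params = {"q": query, "country": country, "maxRows": max_rows, "username": username}
    return GEONAMES_SEARCH_URL + "?" + urlencode(params)


def canton_from_response(response_text):
    """Return the adminCode1 field of the first result (assumed canton code), or None."""
    results = json.loads(response_text).get("geonames", [])
    return results[0].get("adminCode1") if results else None


url = build_search_url("Université de Genève", "demo")
print(url)

# Made-up response body, trimmed to the fields used here
sample = '{"totalResultsCount": 1, "geonames": [{"name": "Genève", "adminCode1": "GE", "countryCode": "CH"}]}'
print(canton_from_response(sample))
```

To query the live service, the body could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` and passed to `canton_from_response`.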


We initialize a dictionary mapping every university in our list to a "None" address. The next step is to fill this dictionary, translating university names into locations.

In [57]:
uni_adresses_dict = {}
for university in universities:
    uni_adresses_dict[university] = geopy.location.Location(address="None")

## Finding addresses of universities using the APIs

We use the two APIs described above to translate university names into locations, and save them in our dictionary.

In [58]:
if GOOGLE_API_KEY != 'NO_API_SPECIFIED':
    # Iterate through all universities
    for university in universities:
        # Try each part of the university name (split on " - ") until one gives a result
        for keywords in university.split(" - "):
            if str(uni_adresses_dict[university].address) == "None":
                try:
                    # Remove abbreviations (words ending with a dot), which Google Places handles poorly
                    keywords = " ".join(word for word in str(keywords).split() if not word.endswith('.'))
                    # Get the Google place associated with the university name
                    query_result = google_places.text_search(keywords, location="Switzerland")
                    # If there is a match, reverse-geocode its latitude/longitude with GeoNames
                    if len(query_result.places) > 0:
                        location = query_result.places[0].geo_location
                        location = geopy.point.Point(location['lat'], location['lng'])
                        address = geolocator.reverse(location)[0]
                        # Save the address in the dictionary
                        uni_adresses_dict[university] = address
                except Exception as error:
                    print("Google Exception:", error)
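The keyword cleaning in the loop above can be isolated into a pure function to see which search terms each university name produces. This is a minimal sketch; `query_candidates` is a hypothetical name, and it reproduces the loop's two rules: split on " - ", then drop words ending with a dot.

```python
def query_candidates(university):
    """Split a P3 university label on " - " and drop abbreviated words (ending with ".")."""
    candidates = []
    for part in university.split(" - "):
        words = [word for word in part.split() if not word.endswith(".")]
        candidates.append(" ".join(words))
    return candidates


print(query_candidates("Interkant. Hochschule für Heilpädagogik ZH - HfH"))
print(query_candidates("Weitere Spitäler - ASPIT"))
```

The second example shows how a bare acronym such as "ASPIT" ends up being sent to the search on its own, which is how unrelated places can be matched.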

### Saving/Restoring results

Since the Google API limits the number of requests, we save the resulting dictionary as a CSV file and read it back whenever we need to restore it.

In [59]:
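def saveAddressDictToCSV(path, address_dict):
    # newline='' is what the csv module expects (avoids blank lines on Windows)
    with open(path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        for key, value in address_dict.items():
            writer.writerow([key, value.address, value.latitude, value.longitude])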

In [3]:
def loadAddressDictFromCSV(path):
    with open(path, 'r', newline='') as csv_file:
        reader = csv.reader(csv_file)
        # Each row is: university, address, latitude, longitude
        return {row[0]: geopy.location.Location(address=row[1], point=geopy.point.Point(row[2], row[3]))
                for row in reader}
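The save/load pair above can be sanity-checked without geopy or network access. This minimal sketch replaces the geopy `Location` objects with hypothetical `(address, latitude, longitude)` tuples and writes to an in-memory buffer; it shows that names containing commas are quoted by the csv module, and that coordinates come back as strings that must be converted (which `geopy.point.Point` does in the loader).

```python
import csv
import io

# Hypothetical stand-in for the geopy Location objects: university -> (address, latitude, longitude)
addresses = {
    "Universität Basel - BS": ("Basel, BS, CH", 47.56, 7.58),
    "Hypothetical Institute, Lausanne": ("Lausanne, VD, CH", 46.52, 6.63),
}

# Write one row per university, as saveAddressDictToCSV does
buffer = io.StringIO()
writer = csv.writer(buffer)
for university, (address, latitude, longitude) in addresses.items():
    writer.writerow([university, address, latitude, longitude])

# Read the rows back, as loadAddressDictFromCSV does; coordinates come back as strings
buffer.seek(0)
restored = {row[0]: (row[1], float(row[2]), float(row[3])) for row in csv.reader(buffer)}
print(buffer.getvalue())
```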

In [75]:
#Uncomment if you want to overwrite the CSV file
#saveAddressDictToCSV("data/universities_addresses_dict.csv", uni_adresses_dict)

In [63]:
uni_adresses_dict = loadAddressDictFromCSV("data/universities_addresses_dict.csv")

### Addresses check and cleaning

We now need to check that every university has an address, and that the address is correct.

For this purpose, we assume that all universities matched to an address in Switzerland are correct. These addresses end with **", CH"**.

To get an idea, we start by counting how many addresses are in Switzerland.

In [64]:
found = sum(1 for location in uni_adresses_dict.values() if location.address.endswith(", CH"))
overall = len(uni_adresses_dict)
print(found, "addresses found out of", overall)

70 addresses found out of 76


Some addresses are still wrong, so we print them to get an idea of the possible problems.

In [65]:
for university, location in uni_adresses_dict.items():
    if not location.address.endswith(", CH"):
        print("problem with", university, "located in:", location)

problem with Forschungskommission SAGW located in: None
problem with Weitere Spitäler - ASPIT located in: Høje Tåstrup, 17, DK
problem with Forschungsinstitut für Opthalmologie - IRO located in: Tübingen, 01, DE
problem with Weitere Institute - FINST located in: None
problem with Istituto Svizzero di Roma - ISR located in: Colonna, 07, IT
problem with Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS located in: Croatan Shores, NC, US


We observe the two entries **Weitere Institute** and **Weitere Spitäler**, which mean "other institutes" and "other hospitals" in German. These catch-all categories cannot be tied to a single location, so we also drop them, for the same reason we dropped the non-assignable entries.

*Note that Weitere Spitäler - ASPIT was wrongly assigned to an address in Denmark: our loop queries each part of the university name separately, and ASPIT is a Danish company.*

In [66]:
del uni_adresses_dict["Weitere Institute - FINST"]
del uni_adresses_dict["Weitere Spitäler - ASPIT"]

Concerning the **Istituto Svizzero di Roma - ISR**, located in Italy: after checking online, we conclude that this institute is indeed located in Italy.

Since the goal is to map the grants awarded to universities and institutes in Switzerland, we drop this entry.

In [67]:
del uni_adresses_dict["Istituto Svizzero di Roma - ISR"]

For the **Forschungsinstitut für Opthalmologie - IRO**, we found online that it is the German translation of the French name "Institut de Recherche en Ophtalmologie", an institute located in Bramois, VS. We therefore assign it the location of this institute.

In [68]:
if GOOGLE_API_KEY != 'NO_API_SPECIFIED':
    query_result = google_places.text_search("Institut de recherche en Ophtalmologie", location="Switzerland")
    location = query_result.places[0].geo_location
    location = geopy.point.Point(location['lat'], location['lng'])
    location = geolocator.reverse(location)[0]
    print(location)
    uni_adresses_dict["Forschungsinstitut für Opthalmologie - IRO"] = location

Préjeux, VS, CH


For the remaining two entries:
* Forschungskommission SAGW
* Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS 

These unknown or wrong addresses come from our loop not splitting their names into useful search terms. We therefore look up each of them separately, using the name we believe fits best.

In [69]:
if GOOGLE_API_KEY != 'NO_API_SPECIFIED':
    #Forschungskommission SAGW
    query_result = google_places.text_search("SAGW", location="Switzerland")
    location = query_result.places[0].geo_location
    location = geopy.point.Point(location['lat'], location['lng'])
    location = geolocator.reverse(location)[0]
    print(location)
    uni_adresses_dict["Forschungskommission SAGW"] = location

    #Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS 
    query_result = google_places.text_search("Fernfachhochschule Schweiz", location="Switzerland")
    location = query_result.places[0].geo_location
    location = geopy.point.Point(location['lat'], location['lng'])
    location = geolocator.reverse(location)[0]
    print(location)
    uni_adresses_dict["Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS"] = location

Bern / Marzili, BE, CH
Brig, VS, CH
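The manual lookups above repeat the same steps: search a name, take the first hit, reverse-geocode its coordinates. A minimal sketch of factoring them into one helper, assuming the search and reverse steps are passed in as plain functions; `locate`, `MANUAL_QUERIES`, `fake_search` and `fake_reverse` are hypothetical names, and the fakes return made-up coordinates so the sketch runs offline. In the notebook, `search` would wrap `google_places.text_search(...).places` and `reverse` would wrap `geolocator.reverse(...)`.

```python
def locate(query, search, reverse):
    """Return the reverse-geocoded address of the first search hit, or None if nothing is found."""
    hits = search(query)  # list of (latitude, longitude) pairs
    if not hits:
        return None
    latitude, longitude = hits[0]
    return reverse(latitude, longitude)


# Hand-picked search terms for the entries the automatic loop got wrong (from the cells above)
MANUAL_QUERIES = {
    "Forschungskommission SAGW": "SAGW",
    "Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS": "Fernfachhochschule Schweiz",
}


# Offline stand-ins for the Google Places search and GeoNames reverse calls (made-up data)
def fake_search(query):
    return {"SAGW": [(46.94, 7.44)], "Fernfachhochschule Schweiz": [(46.32, 7.99)]}.get(query, [])


def fake_reverse(latitude, longitude):
    return "Bern / Marzili, BE, CH" if latitude > 46.5 else "Brig, VS, CH"


fixed = {name: locate(query, fake_search, fake_reverse) for name, query in MANUAL_QUERIES.items()}
print(fixed)
```

Keeping the overrides in one dictionary makes it easy to see which entries were mapped by hand rather than by the automatic loop.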


We can now check that all entries have an address located in Switzerland.

In [70]:
found = sum(1 for location in uni_adresses_dict.values() if location.address.endswith(", CH"))
overall = len(uni_adresses_dict)
print(found, "addresses found out of", overall)

73 addresses found out of 73


## University location visualization

We can now visualize the university locations on a map to check that everything is fine. For this, we load the TopoJSON file containing the canton borders and add a marker for each location we found.
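Before looking at the map, a cheap numeric check can flag markers that landed far away (the Denmark, Germany, Italy and US matches listed above). This is a minimal sketch; the bounding box is an approximation rounded outward, not an official boundary, and `inside_swiss_bbox` is a hypothetical helper. Since a box also covers border areas of neighboring countries, it only catches gross errors; the map remains the real check.

```python
# Approximate bounding box of Switzerland, rounded outward (an assumption, not an official boundary)
CH_LATITUDE = (45.8, 47.9)
CH_LONGITUDE = (5.9, 10.5)


def inside_swiss_bbox(latitude, longitude):
    """Cheap sanity check: True if the point lies in Switzerland's rough bounding box."""
    return (CH_LATITUDE[0] <= latitude <= CH_LATITUDE[1]
            and CH_LONGITUDE[0] <= longitude <= CH_LONGITUDE[1])


# Approximate coordinates: Bern (CH), Tübingen (DE), a point near Rome (IT)
for place, lat, lon in [("Bern", 46.95, 7.45), ("Tübingen", 48.52, 9.06), ("near Rome", 41.9, 12.5)]:
    print(place, inside_swiss_bbox(lat, lon))
```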

In [19]:
ch_cantons = os.path.join('data', 'ch-cantons.topojson.json')

topo_json_data = json.load(open(ch_cantons))

Swiss_map = folium.Map([46.75, 8.25], zoom_start=8)

folium.TopoJson(open(ch_cantons),
                'objects.cantons',
                style_function=lambda feature: {
        'fillColor': '#ffff00',
        'color': 'black',
        'weight': 2,
        'dashArray': '5, 5'
    }).add_to(Swiss_map)

for university in uni_adresses_dict.keys():
    folium.Marker([uni_adresses_dict[university].latitude, uni_adresses_dict[university].longitude], popup=str(university)).add_to(Swiss_map)

Swiss_map.save('uni_location_map.html')    
    
Swiss_map

In [72]:
# Optional: display the saved HTML map if the folium map does not render inline
IFrame('uni_location_map.html', width=1200, height=600)

# [Here is a link to the html file of the map if you can't see it!](http://adaepfl.azurewebsites.net/uni_location_map.html)  
This looks quite good :)

### Save/load the clean entries

Now that our university-location dictionary is clean, we save it so that we can reload it when needed without making new requests.

In [73]:
#Uncomment if you want to overwrite the CSV file
#saveAddressDictToCSV("data/universities_addresses_dict_clean.csv", uni_adresses_dict)

In [4]:
uni_adresses_dict = loadAddressDictFromCSV("data/universities_addresses_dict_clean.csv")

# Visualization

## Computing total grant for each Canton

The main goal of the viz is to display the grant difference between all Cantons. For this, we need to compute the total amount granted to each Canton.    

Our first step is to convert the addresses found in the previous section into cantons, using the official abbreviations (ZH for Zürich, VS for Valais, etc.).
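The addresses returned by the GeoNames reverse lookup have the form "<place>, <canton>, CH", so the canton code can be read off the string. A minimal sketch with a hypothetical `canton_code` helper, which takes the second-to-last field so that the place name itself can never shift the result, and rejects non-Swiss addresses:

```python
def canton_code(address):
    """Return the canton code from a "<place>, <canton>, CH" address, or None if it is not Swiss."""
    parts = address.split(", ")
    if len(parts) < 3 or parts[-1] != "CH":
        return None
    # The canton is the second-to-last field
    return parts[-2]


for address in ["Bern / Marzili, BE, CH", "Préjeux, VS, CH", "Tübingen, 01, DE", "None"]:
    print(address, "->", canton_code(address))
```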

In [5]:
# Build a University -> Canton table; addresses look like "Bern / Marzili, BE, CH",
# so the canton code is the second-to-last comma-separated field
uni_adresses_dict_dataframe = pd.DataFrame({
    'University': list(uni_adresses_dict.keys()),
    'Canton': [location.address.split(', ')[-2] for location in uni_adresses_dict.values()],
})
uni_adresses_dict_dataframe.head()

| | University | Canton |
|---|---|---|
| 0 | Interkant. Hochschule für Heilpädagogik ZH - HfH | ZH |
| 1 | Schweizer Paraplegiker Forschung - SPF | LU |
| 2 | AO Research Institute - AORI | GR |
| 3 | Fernfachhochschule Schweiz (Mitglied SUPSI) - ... | VS |
| 4 | Schweiz. Institut für Kunstwissenschaft - SIK-... | ZH |


We then join the university/canton table with the grants and sum the grants per university, to get a DataFrame with one row per university containing its canton and its total granted amount.

The inner join drops the universities that are not in our dictionary (those outside Switzerland or without a defined location).
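The inner join followed by the group-by sums (per university here, and per canton further down) can be sketched without pandas to make the semantics explicit. The mapping and grant amounts below are hypothetical; the point is that a grant whose university is missing from the mapping is dropped, like a row without a match in an inner join.

```python
from collections import defaultdict

# Hypothetical inputs: university -> canton, and individual grants (university, amount in CHF)
uni_to_canton = {"Universität Zürich - ZH": "ZH", "ETH Zürich - ETHZ": "ZH", "Universität Bern - BE": "BE"}
grants = [
    ("Universität Zürich - ZH", 100.0),
    ("ETH Zürich - ETHZ", 50.0),
    ("Universität Zürich - ZH", 25.0),
    ("Istituto Svizzero di Roma - ISR", 10.0),  # not in the mapping: dropped, as by the inner join
    ("Universität Bern - BE", 40.0),
]

# Inner join + group by university
per_university = defaultdict(float)
for university, amount in grants:
    if university in uni_to_canton:
        per_university[university] += amount

# Group the university totals by canton
per_canton = defaultdict(float)
for university, total in per_university.items():
    per_canton[uni_to_canton[university]] += total

print(dict(per_university))
print(dict(per_canton))
```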

In [13]:
uni_canton_amount = pd.merge(uni_adresses_dict_dataframe, grantExport[['University', 'Approved Amount']], how='inner', on='University')
uni_canton_amount['Approved Amount'] = uni_canton_amount['Approved Amount'].replace("data not included in P3", "0")
uni_canton_amount['Approved Amount'] = uni_canton_amount['Approved Amount'].astype(float)
uni_canton_amount = uni_canton_amount.groupby(['University', 'Canton']).sum().reset_index()
uni_canton_amount.head()

| | University | Canton | Approved Amount |
|---|---|---|---|
| 0 | AO Research Institute - AORI | GR | 3435621.0 |
| 1 | Allergie- und Asthmaforschung - SIAF | GR | 19169965.0 |
| 2 | Berner Fachhochschule - BFH | BE | 31028695.0 |
| 3 | Biotechnologie Institut Thurgau - BITG | TG | 2492535.0 |
| 4 | Centre de rech. sur l'environnement alpin - CR... | VS | 1567678.0 |


We describe the total approved amount per university (mean, min, max, etc.).

In [14]:
uni_canton_amount.describe()

| | Approved Amount |
|---|---|
| count | 73.0 |
| mean | 175552000.0 |
| std | 450202200.0 |
| min | 8000.0 |
| 25% | 1430686.0 |
| 50% | 5067172.0 |
| 75% | 42771910.0 |
| max | 1838237000.0 |


Sorting the universities by total grant shows that the University of Geneva received the largest total.

In [15]:
uni_canton_amount.sort_values('Approved Amount', ascending = False).head()

| | University | Canton | Approved Amount |
|---|---|---|---|
| 69 | Université de Genève - GE | GE | 1838237000.0 |
| 67 | Universität Zürich - ZH | ZH | 1826843000.0 |
| 6 | ETH Zürich - ETHZ | ZH | 1635597000.0 |
| 64 | Universität Bern - BE | BE | 1519373000.0 |
| 63 | Universität Basel - BS | BS | 1352251000.0 |


Finally, we group the universities by canton to get the total grant for each canton.

After sorting, we see that the universities and institutes in the canton of Zürich received the largest total amount.

In [16]:
cantons_amount = uni_canton_amount.groupby('Canton')['Approved Amount'].sum().reset_index()
cantons_amount.sort_values('Approved Amount', ascending = False).head()

| | Canton | Approved Amount |
|---|---|---|
| 20 | ZH | 3610851000.0 |
| 17 | VD | 2366920000.0 |
| 4 | GE | 1877102000.0 |
| 1 | BE | 1555148000.0 |
| 2 | BS | 1392498000.0 |


## Choropleth

We now have all the information needed to display a choropleth showing how grants differ across Swiss cantons.

We start by creating a dictionary-like Series giving the total approved amount for each university. This lets us put a marker for each university on the map, so that the user can click on a specific university to see the total amount it was granted.

In [17]:
uni_grant_dict = uni_canton_amount.set_index('University')['Approved Amount']

To display the choropleth, we need data for every canton on the map, but a few cantons do not appear in our data because they have no university or institute.

To solve this, we get the list of cantons from the TopoJSON file, join it with our data, and set the amount to 0 for the cantons without a university.
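The completion step can be sketched on plain dictionaries. The TopoJSON dict below is a hypothetical minimal stand-in containing only the keys the notebook reads (`objects` → `cantons` → `geometries` → `id`), and the amounts are made up. One difference worth noting: this sketch keeps only the canton ids present in the TopoJSON, whereas the notebook's outer merge would also keep a canton code that exists only in the grant data.

```python
# Hypothetical minimal stand-in for the TopoJSON structure
topo = {"objects": {"cantons": {"geometries": [{"id": "ZH"}, {"id": "UR"}, {"id": "BE"}]}}}
amounts = {"ZH": 175.0, "BE": 40.0}  # made-up totals; UR has no university

canton_ids = [geometry["id"] for geometry in topo["objects"]["cantons"]["geometries"]]
# Every canton on the map gets a value; missing ones default to 0
complete = {canton: amounts.get(canton, 0.0) for canton in canton_ids}
print(complete)
```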

In [20]:
ch_cantons = os.path.join('data', 'ch-cantons.topojson.json')

with open(ch_cantons) as topo_file:
    topo_json_data = json.load(topo_file)

# Every canton id present in the TopoJSON file
cantons = pd.DataFrame([geo['id'] for geo in topo_json_data['objects']['cantons']['geometries']],
                       columns=['Canton'])

# Outer join keeps all cantons; those without any university get 0
cantons_amount = pd.merge(cantons, cantons_amount[['Canton', 'Approved Amount']], how='outer', on='Canton')
cantons_amount = cantons_amount.fillna(0)

To make the numbers easier to read, we divide the amounts by 1,000,000, i.e., we express them in millions of CHF.

In [21]:
cantons_amount_mchf = cantons_amount.copy()
# Dividing by 1,000,000 expresses the amounts in millions of CHF (MCHF)
cantons_amount_mchf['Approved Amount'] = (cantons_amount_mchf['Approved Amount'] / 1000000).astype(int)
cantons_amount_mchf.columns = ['Canton', 'Approved Amount [in MCHF]']

### Linear Choropleth

We can finally display the map with the choropleth.

This separates the cantons into groups linearly by amount. Since the amounts differ hugely between cantons, it does not discriminate well among the cantons with small amounts.

The amounts are in millions of CHF.

In [22]:
Swiss_map = folium.Map([46.75, 8.25], zoom_start=8)

Swiss_map.choropleth(geo_path=ch_cantons, data=cantons_amount_mchf,
               columns=['Canton', 'Approved Amount [in MCHF]'],
               key_on='feature.id',
               fill_color='BuPu', fill_opacity=0.8, line_opacity=0.3, 
               legend_name='Amount Granted For each Canton (Millions of CHF)',
               topojson='objects.cantons')

for university in uni_adresses_dict.keys():
    uni_amount_granted = int(uni_grant_dict[university])
    folium.Marker([uni_adresses_dict[university].latitude, uni_adresses_dict[university].longitude], popup= (str(university + " Total granted amount in CHF : " + str(uni_amount_granted)))).add_to(Swiss_map)

Swiss_map.save('linear_canton_grant.html')
Swiss_map



In [24]:
# Optional: display the saved HTML map if the folium map does not render inline
IFrame('linear_canton_grant.html', width=1200, height=600)

# [Here is a link to the html file of the map if you can't see it!](http://adaepfl.azurewebsites.net/linear_canton_grant.html)  

### Logarithmic choropleth

To better discriminate between the cantons with small amounts, we make another map with logarithmic thresholds. We use log base 10, so that the values are easy to interpret (a value of 6 means 1,000,000 CHF).

We first apply the log10 function to all amounts. Since some amounts are 0, amounts of 10 CHF or less are set to 0 instead of taking their log.

In [25]:
dataLog = cantons_amount.copy()
# log10 of each amount; amounts of 10 CHF or less (including 0) are mapped to 0
dataLog['Approved Amount'] = dataLog['Approved Amount'].map(lambda x: np.log10(x) if x > 10 else 0)

We can then display the map using the **base-10 logarithm** of the amount granted (in CHF).

In [26]:
Swiss_map = folium.Map([46.75, 8.25], zoom_start=8)

Swiss_map.choropleth(geo_path=ch_cantons, data=dataLog,
               columns=['Canton', 'Approved Amount'],
               threshold_scale=tuple(np.linspace(0,10,6)),
               key_on='feature.id',
               fill_color='YlOrRd', fill_opacity=0.8, line_opacity=0.3, 
               legend_name='log10 of Amount Granted For each Canton (CHF)',
               topojson='objects.cantons')

for university in uni_adresses_dict.keys():
    uni_amount_granted = int(uni_grant_dict[university])
    folium.Marker([uni_adresses_dict[university].latitude, uni_adresses_dict[university].longitude], popup= (str(university + " Total granted amount in CHF : " + str(uni_amount_granted)))).add_to(Swiss_map)

Swiss_map.save('log_canton_grant.html')
Swiss_map

In [92]:
# Optional: display the saved HTML map if the folium map does not render inline
IFrame('log_canton_grant.html', width=1200, height=600)

# [Here is a link to the html file of the map if you can't see it!](http://adaepfl.azurewebsites.net/log_canton_grant.html)  

Thanks to the log scale, the cantons with smaller amounts are now well differentiated, which gives a better understanding of which cantons receive large, medium or small grants.

Here is how to read the displayed values:
* Cantons with values 0-2 received 0 CHF in grant
* Cantons with values 2-4 received between 100 and 10,000 CHF
* Cantons with values 4-6 received between 10,000 and 1,000,000 CHF
* Cantons with values 6-8 received between 1,000,000 and 100,000,000 CHF
* Cantons with values 8-10 received between 100,000,000 and 10,000,000,000 CHF
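The reading rules above can be checked with a small function mapping a CHF amount to its legend bin. This is a minimal sketch with hypothetical helper names; it reuses the notebook's rule (amounts of 10 CHF or less map to 0) and the thresholds from `np.linspace(0, 10, 6)`, and it assumes half-open bins `[low, high)`, so folium's exact handling of values sitting on a boundary may differ. Two of the example amounts come from the tables above (a single institution in VS and the ZH canton total).

```python
import math

THRESHOLDS = [0, 2, 4, 6, 8, 10]  # same values as tuple(np.linspace(0, 10, 6))


def log_value(amount_chf):
    """log10 of the amount, with amounts of 10 CHF or less mapped to 0 (as in the notebook)."""
    return math.log10(amount_chf) if amount_chf > 10 else 0


def bin_label(amount_chf):
    """Legend bin of an amount, assuming half-open bins [low, high)."""
    value = log_value(amount_chf)
    for low, high in zip(THRESHOLDS, THRESHOLDS[1:]):
        if value < high:
            return f"{low}-{high}"
    # 10 billion CHF and above is clamped into the last bin
    return f"{THRESHOLDS[-2]}-{THRESHOLDS[-1]}"


for amount in [0, 5000, 1567678, 3610851000]:
    print(amount, "->", bin_label(amount))
```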



# Bonus

We need to estimate the difference between the areas divided by the Röstigraben. The three areas are "Suisse Romande", "Schwyzerdütsch" and "Svizzera Italiana", where people speak French, (Swiss) German and Italian, respectively.

For this, we first need to separate our cantons by language. Since some cantons (Valais, Fribourg, Grisons and Bern) lie on both sides of the Röstigraben, we need to assign each university, rather than each canton, to an area.

We therefore create a dataframe linking each university to its area. All universities in Geneva, Vaud, Jura and Neuchâtel are considered part of the "Suisse Romande" area.

We checked on our map and observed that, of the four cantons split between two areas, only Valais has universities on both sides (see the maps above).

We therefore manually assigned the two German-speaking universities of Valais to "Schwyzerdütsch", and the rest of Valais to "Suisse Romande". We assigned all universities of Fribourg to "Suisse Romande", and all those of Bern and Grisons to "Schwyzerdütsch".

We also assign the universities of Ticino to "Svizzera Italiana", and all the rest to "Schwyzerdütsch".

In [93]:
# Cantons whose universities are all assigned to the French-speaking area
ROMANDE_CANTONS = ['GE', 'VD', 'FR', 'NE', 'JU']
# German-speaking institutions located in the bilingual canton of Valais
GERMAN_SPEAKING_VS = ['Pädagogische Hochschule Wallis - PHVS',
                      'Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS']

uni_zone_dict = {}
for university in uni_adresses_dict_dataframe['University']:
    canton = uni_canton_amount.loc[uni_canton_amount['University'] == university, 'Canton'].item()
    if canton in ROMANDE_CANTONS:
        uni_zone_dict[university] = 'Suisse Romande'
    elif canton == 'VS':
        # Valais is split by the Röstigraben: only the institutions above are on the German side
        uni_zone_dict[university] = 'Schwyzerdütsch' if university in GERMAN_SPEAKING_VS else 'Suisse Romande'
    elif canton == 'TI':
        uni_zone_dict[university] = 'Svizzera Italiana'
    else:
        uni_zone_dict[university] = 'Schwyzerdütsch'
uni_zone = pd.DataFrame(list(uni_zone_dict.items()), columns=['University', 'Region'])
uni_zone.head()

| | University | Region |
|---|---|---|
| 0 | HES de Suisse occidentale - HES-SO | Suisse Romande |
| 1 | Pädagogische Hochschule Zürich - PHZFH | Schwyzerdütsch |
| 2 | Università della Svizzera italiana - USI | Svizzera Italiana |
| 3 | Eidg. Forschungsanstalt für Wald,Schnee,Land -... | Schwyzerdütsch |
| 4 | Pädagogische Hochschule Graubünden - PHGR | Schwyzerdütsch |


We now join each university's area with its canton and amount.

In [94]:
uni_canton_amount_region = pd.merge(uni_zone, uni_canton_amount, how='outer', on='University')
uni_canton_amount_region.head()

| | University | Region | Canton | Approved Amount |
|---|---|---|---|---|
| 0 | HES de Suisse occidentale - HES-SO | Suisse Romande | JU | 34162965.46 |
| 1 | Pädagogische Hochschule Zürich - PHZFH | Schwyzerdütsch | ZH | 3298346.0 |
| 2 | Università della Svizzera italiana - USI | Svizzera Italiana | TI | 84970554.75 |
| 3 | Eidg. Forschungsanstalt für Wald,Schnee,Land -... | Schwyzerdütsch | NW | 48360389.63 |
| 4 | Pädagogische Hochschule Graubünden - PHGR | Schwyzerdütsch | GR | 614613.0 |


Finally, we display the total amount for each area.

In [95]:
region_amount = uni_canton_amount_region[['Region','Approved Amount']]
region_amount_groupby = region_amount.groupby('Region').sum()
region_amount_groupby = region_amount_groupby.reset_index()
region_amount_groupby

| | Region | Approved Amount |
|---|---|---|
| 0 | Schwyzerdütsch | 7531044000.0 |
| 1 | Suisse Romande | 5168990000.0 |
| 2 | Svizzera Italiana | 115262300.0 |


According to French Wikipedia (https://fr.wikipedia.org/wiki/Langues_en_Suisse#Usage_des_langues_nationales_par_la_population_immigrante), the Schwyzerdütsch, Suisse Romande and Svizzera Italiana areas represent 72.5%, 21.0% and 4.3% of the local population, respectively.

We can therefore compute a (very) rough estimate of the grant per inhabitant in each area, shown below. (We use 8 million for Switzerland's population.)

In [96]:
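# Population share per language area (Wikipedia figures above), keyed by region so the division does not depend on row order
POPULATION_SHARE = {'Schwyzerdütsch': 0.725, 'Suisse Romande': 0.21, 'Svizzera Italiana': 0.043}
relative_region_amount_groupby = region_amount_groupby.copy()
relative_region_amount_groupby['Approved Amount'] = relative_region_amount_groupby['Approved Amount'] / (
    8000000 * relative_region_amount_groupby['Region'].map(POPULATION_SHARE))
relative_region_amount_groupby.columns = ['Region', 'Approved Amount in CHF per inhabitant']
relative_region_amount_groupby

| | Region | Approved Amount in CHF per inhabitant |
|---|---|---|
| 0 | Schwyzerdütsch | 1298.455816 |
| 1 | Suisse Romande | 3076.77969 |
| 2 | Svizzera Italiana | 335.064883 |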


We see a clear disparity between the regions: relative to its population, Suisse Romande receives about 9 times more grant money than Svizzera Italiana, and more than twice as much as Schwyzerdütsch.

**However, many factors can explain this difference; this is by no means a significant statistic.**
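The ratios quoted above can be recomputed by hand from the displayed region totals, the population shares and the 8 million population estimate. This worked arithmetic uses only figures from this section; the constant names are illustrative. Note that the three shares add up to 97.8%, so the remaining population is simply left out of this rough estimate.

```python
SWISS_POPULATION = 8_000_000
POPULATION_SHARE = {"Schwyzerdütsch": 0.725, "Suisse Romande": 0.21, "Svizzera Italiana": 0.043}
# Region totals as displayed in the table above (CHF)
REGION_AMOUNT_CHF = {"Schwyzerdütsch": 7531044000.0, "Suisse Romande": 5168990000.0,
                     "Svizzera Italiana": 115262300.0}

# Grant per inhabitant = region total / (population * share of the population in that region)
per_inhabitant = {region: REGION_AMOUNT_CHF[region] / (SWISS_POPULATION * share)
                  for region, share in POPULATION_SHARE.items()}
romande_vs_italiana = per_inhabitant["Suisse Romande"] / per_inhabitant["Svizzera Italiana"]
romande_vs_german = per_inhabitant["Suisse Romande"] / per_inhabitant["Schwyzerdütsch"]
print(f"Romande / Italiana: {romande_vs_italiana:.1f}x, Romande / German: {romande_vs_german:.1f}x")
```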
