# Grant Money Interactive Visualization in Switzerland

## Assignment
Build a Choropleth map which shows intuitively (i.e., use colors wisely) how much grant money goes to each Swiss canton. To do so, you will need to use the provided TopoJSON file, combined with the Choropleth map example you can find in the Folium README file. Click [here](https://github.com/ADAEPFL/Homework/tree/master/03%20-%20Interactive%20Viz) for more details.

## Data Processing
First we will need to work on the provided data from [P3](http://p3.snf.ch/Pages/DataAndDocumentation.aspx). 

The goal is to extract the amount of money granted to each Swiss canton. We will need to map each project to a canton in order to provide well-formatted data to [Folium](https://github.com/python-visualization/folium), the tool we will use to create the final visualization.

We will also use Folium during the data processing pipeline in order to check the validity of the information extracted, essentially to check if the extracted location is actually in Switzerland. 

In [10]:
# Import some useful modules
import pandas as pd
import folium as fl
import requests as rq
import numpy as np
import json

Let's see what the data looks like:

In [99]:
# Import the data as a pandas DataFrame
projects = pd.read_csv('./P3_GrantExport.csv', sep=';')
projects.head(3)

| | Project Number | Project Title | Project Title English | Responsible Applicant | Funding Instrument | Funding Instrument Hierarchy | Institution | University | Discipline Number | Discipline Name | Discipline Name Hierarchy | Start Date | End Date | Approved Amount | Keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Schlussband (Bd. VI) der Jacob Burckhardt-Biog... | | Kaegi Werner | Project funding (Div. I-III) | Project funding | | Nicht zuteilbar - NA | 10302 | Swiss history | Human and Social Sciences;Theology & religious... | 01.10.1975 | 30.09.1976 | 11619.0 | |
| 1 | 4 | Batterie de tests à l'usage des enseignants po... | | Massarenti Léonard | Project funding (Div. I-III) | Project funding | Faculté de Psychologie et des Sciences de l'Ed... | Université de Genève - GE | 10104 | Educational science and Pedagogy | Human and Social Sciences;Psychology, educatio... | 01.10.1975 | 30.09.1976 | 41022.0 | |
| 2 | 5 | Kritische Erstausgabe der "Evidentiae contra D... | | Kommission für das Corpus philosophorum medii ... | Project funding (Div. I-III) | Project funding | Kommission für das Corpus philosophorum medii ... | NPO (Biblioth., Museen, Verwalt.) - NPO | 10101 | Philosophy | Human and Social Sciences;Linguistics and lite... | 01.03.1976 | 28.02.1985 | 79732.0 | |
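If the CSV export begins with a UTF-8 byte-order mark (BOM), pandas can keep it attached to the first column name, which then no longer matches `'Project Number'`. A minimal sketch, assuming the file is UTF-8 encoded: passing `encoding='utf-8-sig'` drops the BOM while decoding. The two-line CSV below is a made-up stand-in for `P3_GrantExport.csv`, not real data.

```python
import io

import pandas as pd

# Hypothetical two-line excerpt shaped like P3_GrantExport.csv, starting with a UTF-8 BOM
raw = b'\xef\xbb\xbf"Project Number";Approved Amount\n1;11619.0\n'

# 'utf-8-sig' decodes UTF-8 and strips a leading BOM if one is present
df = pd.read_csv(io.BytesIO(raw), sep=';', encoding='utf-8-sig')
print(list(df.columns))
```

With the BOM gone, the quoted header is parsed normally and the column can be selected by its plain name.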


We can drop all the columns we don't need for this visualization:

In [100]:
# Drop the columns we don't need
projects = projects[['University','Approved Amount']]

# Is the University column always non-null? What about the amount?
print('Is the University column non null ?', projects['University'].notnull().all())
print('Is the Amount column non null ?', projects['Approved Amount'].notnull().all())

# The amount is never null, so we only drop the rows with a null University
# (.copy() avoids a SettingWithCopyWarning when the column is overwritten later)
projects = projects[projects['University'].notnull()].copy()
print('Is the University column non null ?', projects['University'].notnull().all())

projects.head(3)

Is the University column non null ? False
Is the Amount column non null ? True
Is the University column non null ? True


| | University | Approved Amount |
|---|---|---|
| 0 | Nicht zuteilbar - NA | 11619.0 |
| 1 | Université de Genève - GE | 41022.0 |
| 2 | NPO (Biblioth., Museen, Verwalt.) - NPO | 79732.0 |


Now it looks better.

The next step is to map each project to a Swiss canton. We will rely as much as possible on the [GeoNames web services](http://www.geonames.org/export/web-services.html) to infer the location from the plain-text university name.

In [101]:
# Count the projects per university (the index lists the unique values)
projects['University'].value_counts()

Universität Zürich - ZH                               6774
Université de Genève - GE                             6394
ETH Zürich - ETHZ                                     6153
Universität Bern - BE                                 5473
Universität Basel - BS                                4746
EPF Lausanne - EPFL                                   4428
Université de Lausanne - LA                           4092
Nicht zuteilbar - NA                                  2595
Université de Fribourg - FR                           2079
Université de Neuchâtel - NE                          1596
NPO (Biblioth., Museen, Verwalt.) - NPO               1473
Paul Scherrer Institut - PSI                           538
Firmen/Privatwirtschaft - FP                           492
Universität St. Gallen - SG                            426
Università della Svizzera italiana - USI               346
Eidg. Anstalt für Wasserversorgung - EAWAG             333
HES de Suisse occidentale - HES-SO                     2

<b> From University to Canton </b>

Now we need to go from the university name to the canton. For this purpose, we use the <i>GeoNames Full Text Search API</i>. The arguments provided in the request are the following:
- q = university name (q searches over all the attributes of a place)
- country = CH, i.e. Switzerland (the map we wish to create should represent Switzerland only, but some institutions listed in the data are located outside of the country)
- username = a GeoNames account name, which the web services require
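The request and the parsing of its answer can be kept apart, so the parsing is testable without network access. A minimal sketch: `search_params` and `canton_from_response` are hypothetical helpers, and the response shape `{'geonames': [{'adminCode1': ...}, ...]}` is assumed from the code in the next cell. A response without a `geonames` list (for instance an error message) is mapped to `None` instead of crashing.

```python
def search_params(uni, username):
    """Query parameters for the GeoNames full text search, restricted to Switzerland."""
    return {'q': uni, 'country': 'CH', 'username': username}


def canton_from_response(result):
    """Return the adminCode1 (canton code) of the first hit, or None if there is no hit.

    Assumes the JSON layout {'geonames': [{'adminCode1': ...}, ...]}.
    """
    hits = result.get('geonames') or []
    if not hits:
        return None
    return hits[0].get('adminCode1')


# Usage with requests imported as rq (network call, not executed here):
# result = rq.get('http://api.geonames.org/searchJSON',
#                 params=search_params(uni, 'your_username')).json()
# canton = canton_from_response(result)
```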

In [64]:
geonames_url = 'http://api.geonames.org/searchJSON'

# Every single university
unis = projects['University'].unique()

# Map a university to a canton
uni_to_canton = {}

for uni in unis:
    
    # Perform the request
    geo_param = {'q': uni, 'username': 'pnicolet', 'country': 'CH'}
    r = rq.get(geonames_url, params=geo_param)
    result = r.json()

    # Take the canton code (adminCode1) of the first match; a response without
    # a 'geonames' list (e.g. an error message) or with no match gives None
    geonames = result.get('geonames', [])

    if geonames:
        uni_to_canton[uni] = geonames[0].get('adminCode1')
    else:
        uni_to_canton[uni] = None
    
uni_to_canton

{'AO Research Institute - AORI': None,
 'Allergie- und Asthmaforschung - SIAF': None,
 'Berner Fachhochschule - BFH': None,
 'Biotechnologie Institut Thurgau - BITG': None,
 "Centre de rech. sur l'environnement alpin - CREALP": None,
 'EPF Lausanne - EPFL': None,
 'ETH Zürich - ETHZ': None,
 'Eidg. Anstalt für Wasserversorgung - EAWAG': None,
 'Eidg. Forschungsanstalt für Wald,Schnee,Land - WSL': None,
 'Eidg. Hochschulinstitut für Berufsbildung - EHB': None,
 'Eidg. Material und Prüfungsanstalt - EMPA': None,
 'Ente Ospedaliero Cantonale - EOC': None,
 'Fachhochschule Kalaidos - FHKD': None,
 'Fachhochschule Nordwestschweiz (ohne PH) - FHNW': None,
 'Fachhochschule Ostschweiz - FHO': None,
 'Facoltà di Teologia di Lugano - FTL': None,
 'Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS': None,
 'Firmen/Privatwirtschaft - FP': None,
 'Forschungsanstalten Agroscope - AGS': None,
 'Forschungsinstitut für Opthalmologie - IRO': None,
 'Forschungsinstitut für biologischen Landbau - FIBL': 

We skip the intermediate visualization check mentioned earlier: with the `country` parameter set to `CH`, GeoNames only returns locations inside Switzerland.
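Many institutions come back as `None`, including EPF Lausanne and ETH Zürich, two of the institutions with the most projects. One possible remedy, not used in the run above, is to query GeoNames without the trailing ` - ACRONYM` suffix and to fill the remaining gaps by hand. In the sketch below, `strip_acronym`, `fill_missing` and `MANUAL_CANTONS` are hypothetical names; the four table entries are simply the cantons where these institutions are located. Entries such as `Firmen/Privatwirtschaft - FP` or `Nicht zuteilbar - NA` have no canton and should stay `None`.

```python
# Hypothetical manual fallback for a few institutions GeoNames did not resolve
MANUAL_CANTONS = {
    'EPF Lausanne - EPFL': 'VD',
    'ETH Zürich - ETHZ': 'ZH',
    'Université de Genève - GE': 'GE',
    'Université de Lausanne - LA': 'VD',
}


def strip_acronym(uni):
    """Drop the trailing ' - ACRONYM' suffix of a P3 'University' value."""
    return uni.rsplit(' - ', 1)[0]


def fill_missing(uni_to_canton, overrides):
    """Copy of uni_to_canton where None values are replaced by an override, if one exists."""
    return {uni: canton if canton is not None else overrides.get(uni)
            for uni, canton in uni_to_canton.items()}


# strip_acronym(uni) would be sent as 'q' instead of the raw name;
# fill_missing is then applied to the resulting dictionary
resolved = fill_missing({'EPF Lausanne - EPFL': None,
                         'Universität Bern - BE': 'BE',
                         'Firmen/Privatwirtschaft - FP': None}, MANUAL_CANTONS)
```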

Replace each university by its canton in the DataFrame:

In [102]:
projects['University'] = projects['University'].map(uni_to_canton)
projects.rename(columns={'University' : 'Canton'}, inplace=True)
projects.head(10)

| | Canton | Approved Amount |
|---|---|---|
| 0 | | 11619.0 |
| 1 | | 41022.0 |
| 2 | | 79732.0 |
| 3 | BS | 52627.0 |
| 4 | | 120042.0 |
| 5 | FR | 53009.0 |
| 6 | FR | 25403.0 |
| 7 | ZH | 47100.0 |
| 8 | | 25814.0 |
| 9 | | 360000.0 |


Now compute the total approved amount per canton:

In [103]:
# Coerce the amounts to numbers; values that cannot be parsed become 0
projects['Approved Amount'] = pd.to_numeric(projects['Approved Amount'], errors='coerce').fillna(0)

# Sum per canton; groupby leaves out the rows whose canton is missing
projects = projects.groupby('Canton').sum()
projects

| Canton | Approved Amount |
|---|---|
| BE | 1519373000.0 |
| BS | 1352251000.0 |
| FR | 457526200.0 |
| NE | 383204600.0 |
| ZH | 1826843000.0 |
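Every project whose university could not be mapped disappears silently at the `groupby` step, so it is worth measuring how much money is left out. A small sketch on rows 0 to 7 of the mapped DataFrame shown earlier (blank canton means no GeoNames match); `unassigned` is an illustrative name.

```python
import pandas as pd

# Rows 0-7 of the mapped DataFrame shown earlier (None = no canton found)
df = pd.DataFrame({
    'Canton': [None, None, None, 'BS', None, 'FR', 'FR', 'ZH'],
    'Approved Amount': [11619.0, 41022.0, 79732.0, 52627.0,
                        120042.0, 53009.0, 25403.0, 47100.0],
})

# groupby skips rows whose key is missing (dropna=True by default)
totals = df.groupby('Canton').sum()

# Amount attached to projects without a canton, which the map will not show
unassigned = df.loc[df['Canton'].isna(), 'Approved Amount'].sum()
print(totals)
print(unassigned)
```

On these eight rows more money is unassigned than assigned, which suggests checking the same ratio on the full dataset before reading the map.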


The DataFrame above only lists the cantons that received money according to our mapping; the other cantons have no entry at all. We therefore add every canton listed in the TopoJSON file with an amount of 0.

In [75]:
# Read the topojson file
with open('./ch-cantons.topojson.json') as f:
    topo = json.load(f)

# Extract all the cantons IDs
cantons_ids = []
for geo in topo['objects']['cantons']['geometries']:
    cantons_ids.append(geo['id'])
    
cantons_ids

['ZH',
 'BE',
 'LU',
 'UR',
 'SZ',
 'OW',
 'NW',
 'GL',
 'ZG',
 'FR',
 'SO',
 'BS',
 'BL',
 'SH',
 'AR',
 'AI',
 'SG',
 'GR',
 'AG',
 'TG',
 'TI',
 'VD',
 'VS',
 'NE',
 'GE',
 'JU']
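The choropleth later matches each TopoJSON `feature.id` against the `Canton` column, so these IDs must be the same two-letter codes that GeoNames returned. The fragment below is a heavily trimmed, hypothetical stand-in for `ch-cantons.topojson.json` (real geometries also carry arcs); it only assumes the `objects.cantons.geometries[*].id` layout that the loop above reads, and repeats that extraction as a list comprehension.

```python
import json

# Trimmed, hypothetical fragment with the same layout as ch-cantons.topojson.json
topo = json.loads('''
{"type": "Topology",
 "objects": {"cantons": {"type": "GeometryCollection",
                         "geometries": [{"type": "Polygon", "id": "ZH"},
                                        {"type": "Polygon", "id": "BE"}]}},
 "arcs": []}
''')

# Same extraction as the loop above, as a list comprehension
cantons_ids = [geo['id'] for geo in topo['objects']['cantons']['geometries']]
```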

In [114]:
nul_df = pd.DataFrame(cantons_ids, columns=['Canton']).set_index('Canton')
nul_df['Approved Amount'] = 0
amounts = pd.concat([projects, nul_df]).reset_index().groupby('Canton').sum()
amounts

| Canton | Approved Amount |
|---|---|
| AG | 0.0 |
| AI | 0.0 |
| AR | 0.0 |
| BE | 1519373000.0 |
| BL | 0.0 |
| BS | 1352251000.0 |
| FR | 457526200.0 |
| GE | 0.0 |
| GL | 0.0 |
| GR | 0.0 |
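An alternative to the `concat` plus `groupby` trick is `DataFrame.reindex` with `fill_value=0`: existing rows are kept, missing cantons are added with 0, and any code that is not a TopoJSON id is dropped. A sketch on a shortened canton list (a hypothetical stand-in for `cantons_ids`); `rename_axis` and `reset_index` then turn the canton codes into a regular `Canton` column.

```python
import pandas as pd

# Per-canton totals as produced by groupby (canton codes in the index)
totals = pd.DataFrame({'Approved Amount': [1519373000.0, 1352251000.0]},
                      index=pd.Index(['BE', 'BS'], name='Canton'))

# Hypothetical short list standing in for cantons_ids
cantons_ids = ['ZH', 'BE', 'LU', 'BS']

# Keep known totals, add missing cantons with 0, follow the TopoJSON order
amounts = (totals.reindex(cantons_ids, fill_value=0)
                 .rename_axis('Canton')
                 .reset_index())
print(amounts)
```

Unlike the `groupby` version, the result keeps the TopoJSON order rather than sorting alphabetically.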


## Data Visualization

In [118]:
cantons_geo = r'./ch-cantons.topojson.json'

# Let Folium determine the color scale
swiss_map = fl.Map(location=[46.8182, 8.2275], zoom_start=8)

# amounts is indexed by canton: reset_index() turns 'Canton' back into a column,
# so that the columns argument can refer to it
swiss_map.choropleth(geo_path=cantons_geo,
                     data=amounts.reset_index(),
                     columns=['Canton', 'Approved Amount'],
                     key_on='feature.id',
                     topojson='objects.cantons',
                     fill_color='YlGn',
                     legend_name='Approved amount (CHF)'
                    )
swiss_map.save('canton_budget.html')
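Folium's `columns` argument names two columns of `data`, but after `groupby('Canton')` the canton codes live in the index, so looking up a `'Canton'` column raises `KeyError`. A pandas-only sketch of the difference, with toy values:

```python
import pandas as pd

# Toy data, not taken from the P3 export
df = pd.DataFrame({'Canton': ['BE', 'BE', 'ZH'], 'Approved Amount': [1.0, 2.0, 4.0]})
grouped = df.groupby('Canton').sum()

# After groupby, 'Canton' is the index name, not a column
print('Canton' in grouped.columns, grouped.index.name)

# reset_index() moves the index back into a regular column
flat = grouped.reset_index()
print(flat.columns.tolist())
```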