# Grant Money Interactive Visualization in Switzerland

## Assignment
Build a Choropleth map which shows intuitively (i.e., use colors wisely) how much grant money goes to each Swiss canton. To do so, you will need to use the provided TopoJSON file, combined with the Choropleth map example you can find in the Folium README file. Click [here](https://github.com/ADAEPFL/Homework/tree/master/03%20-%20Interactive%20Viz) for more details.

## Data Processing
First we will need to work on the provided data from [P3](http://p3.snf.ch/Pages/DataAndDocumentation.aspx). 

The goal is to extract the amount of money granted to each Swiss canton. To do so, we need to map each project to a canton, so that we can provide well-formatted data to [Folium](https://github.com/python-visualization/folium), the tool we will use to create the final visualization.

We will also use Folium during the data processing pipeline in order to check the validity of the information extracted, essentially to check if the extracted location is actually in Switzerland. 

In [158]:
# Import some useful modules
import pandas as pd
import folium as fl
import requests as rq
import numpy as np
import json

Let's see what the data looks like:

In [159]:
# Import the data as a pandas DataFrame
projects = pd.read_csv('./P3_GrantExport.csv', sep=';')
projects.head(3)

Unnamed: 0,Project Number,Project Title,Project Title English,Responsible Applicant,Funding Instrument,Funding Instrument Hierarchy,Institution,University,Discipline Number,Discipline Name,Discipline Name Hierarchy,Start Date,End Date,Approved Amount,Keywords
0,1,Schlussband (Bd. VI) der Jacob Burckhardt-Biog...,,Kaegi Werner,Project funding (Div. I-III),Project funding,,Nicht zuteilbar - NA,10302,Swiss history,Human and Social Sciences;Theology & religious...,01.10.1975,30.09.1976,11619.0,
1,4,Batterie de tests à l'usage des enseignants po...,,Massarenti Léonard,Project funding (Div. I-III),Project funding,Faculté de Psychologie et des Sciences de l'Ed...,Université de Genève - GE,10104,Educational science and Pedagogy,"Human and Social Sciences;Psychology, educatio...",01.10.1975,30.09.1976,41022.0,
2,5,"Kritische Erstausgabe der ""Evidentiae contra D...",,Kommission für das Corpus philosophorum medii ...,Project funding (Div. I-III),Project funding,Kommission für das Corpus philosophorum medii ...,"NPO (Biblioth., Museen, Verwalt.) - NPO",10101,Philosophy,Human and Social Sciences;Linguistics and lite...,01.03.1976,28.02.1985,79732.0,


We can drop all the columns we don't need for the visualization:

In [160]:
# Drop the columns we don't need
projects = projects.drop(['Project Title', 'Project Title English', 'Responsible Applicant', 
                         'Funding Instrument', 'Funding Instrument Hierarchy', 'Institution', 
                         'Discipline Number', 'Discipline Name', 'Discipline Name Hierarchy', 
                         'Start Date', 'End Date', 'Keywords'], axis=1)

# Is the university always non null, what about the amount ?
print('Is the University column non null ?', projects['University'].notnull().all())
print('Is the Amount column non null ?', projects['Approved Amount'].notnull().all())

# The amount is never null, so we drop the rows with a null University
projects = projects[projects['University'].notnull()]
print('Is the University column non null ?', projects['University'].notnull().all())

projects.head(3)

Is the University column non null ? False
Is the Amount column non null ? True
Is the University column non null ? True


Unnamed: 0,Project Number,University,Approved Amount
0,1,Nicht zuteilbar - NA,11619.0
1,4,Université de Genève - GE,41022.0
2,5,"NPO (Biblioth., Museen, Verwalt.) - NPO",79732.0


Now it looks better.

The next step is to map each project to a Swiss canton. Wherever possible, we will rely on the [geonames web services](http://www.geonames.org/export/web-services.html) to extract the location from the plain-text university name.

In [161]:
# Extract the unique values for University
projects['University'].value_counts()

Universität Zürich - ZH                               6774
Université de Genève - GE                             6394
ETH Zürich - ETHZ                                     6153
Universität Bern - BE                                 5473
Universität Basel - BS                                4746
EPF Lausanne - EPFL                                   4428
Université de Lausanne - LA                           4092
Nicht zuteilbar - NA                                  2595
Université de Fribourg - FR                           2079
Université de Neuchâtel - NE                          1596
NPO (Biblioth., Museen, Verwalt.) - NPO               1473
Paul Scherrer Institut - PSI                           538
Firmen/Privatwirtschaft - FP                           492
Universität St. Gallen - SG                            426
Università della Svizzera italiana - USI               346
Eidg. Anstalt für Wasserversorgung - EAWAG             333
HES de Suisse occidentale - HES-SO                     2
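
The labels above combine a full name with an abbreviation, separated by ` - `. Some abbreviations happen to be canton codes (`ZH`, `GE`, `BE`), while others are not (`ETHZ`, `LA`, `NPO`). As a minimal sketch, the two parts can be separated before any lookup. The helper name `split_university_label` is hypothetical, and whether the bare name gives better search results than the full label is an assumption that has not been checked here.

```python
def split_university_label(label):
    """Split a label of the form 'Full name - ABBR' into (name, abbreviation).

    The split happens on the LAST ' - ' so that hyphens inside the
    abbreviation itself (e.g. 'HES-SO') are left untouched.
    """
    name, separator, abbreviation = label.rpartition(' - ')
    if not separator:
        # No ' - ' found: return the label unchanged, without abbreviation
        return label, None
    return name.strip(), abbreviation.strip()


labels = [
    'Université de Genève - GE',
    'ETH Zürich - ETHZ',
    'HES de Suisse occidentale - HES-SO',
]
pairs = [split_university_label(label) for label in labels]
for name, abbreviation in pairs:
    print(name, '|', abbreviation)
```
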

<b> From University to Canton </b>

Now we need to go from the University to the Canton name. For this purpose, we use the <i>Geonames Full Text Search API</i>. The arguments provided in the request are the following:
- q = University name (q searches over all the attributes of a place)
- country = Switzerland (the map we wish to create should represent Switzerland only, but some institutions listed in the data are located outside of the country)
- username = Geonames account name (required by the web service)
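
To make the three parameters concrete, here is a small sketch of how they end up in the request URL. It only builds the URL and does not contact the service. The helper `build_search_url` and the account name `demo_user` are placeholders introduced for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

GEONAMES_URL = 'http://api.geonames.org/searchJSON'


def build_search_url(query, username, country='CH'):
    """Return the full-text search URL for one query (no request is sent)."""
    params = {'q': query, 'country': country, 'username': username}
    # urlencode percent-encodes non-ASCII characters such as 'ä' and 'ü'
    return GEONAMES_URL + '?' + urlencode(params)


url = build_search_url('Universität Zürich', 'demo_user')  # placeholder username
print(url)

# Decode the query string again to check that nothing was lost
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```
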

In [162]:
geonames_url = 'http://api.geonames.org/searchJSON'

# Every single university
universities = projects['University'].unique()

# Mapping from university to canton
university_to_canton = {}

for university in universities:

    # Perform the request, restricted to places in Switzerland
    geo_param = {'q': university, 'username': 'pnicolet', 'country': 'CH'}
    r = rq.get(geonames_url, params=geo_param)
    result = r.json()

    # Extract the canton code of the first match (None if nothing was found)
    geonames = result.get('geonames') or []

    canton = None
    if len(geonames) > 0:
        canton = geonames[0].get('adminCode1')

    university_to_canton[university] = canton

We choose to skip the intermediate visualization step: since the 'country' parameter restricts the search to Switzerland, no returned location can lie outside the country.
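
The country filter guarantees that every match is Swiss, but not that every label finds a match: entries for which the search returns nothing are stored as `None`. A minimal sketch of how such entries could be listed is shown below. The helper `unresolved` is hypothetical, and the mapping is illustrative toy data, not actual API results.

```python
def unresolved(mapping):
    """Return, in sorted order, the labels whose canton is None."""
    return sorted(name for name, canton in mapping.items() if canton is None)


# Toy mapping for illustration only (NOT real Geonames output)
example_mapping = {
    'Université de Genève - GE': 'GE',
    'Nicht zuteilbar - NA': None,
    'Universität Zürich - ZH': 'ZH',
    'Firmen/Privatwirtschaft - FP': None,
}

missing = unresolved(example_mapping)
print(len(missing), 'labels without a canton:', missing)
```

Projects attached to such labels would not contribute to any canton total, so it is worth knowing how many there are before aggregating.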

In [163]:
# This will be useful to do a dictionary [canton ID => amount of money] at the end
# Read the topoJson file
with open('./ch-cantons.topojson.json') as f:
    topo = json.load(f)

# Extract all the cantons IDs
cantons_ids = []
for geo in topo['objects']['cantons']['geometries']:
    cantons_ids.append(geo['id'])
    
cantons_ids

['ZH',
 'BE',
 'LU',
 'UR',
 'SZ',
 'OW',
 'NW',
 'GL',
 'ZG',
 'FR',
 'SO',
 'BS',
 'BL',
 'SH',
 'AR',
 'AI',
 'SG',
 'GR',
 'AG',
 'TG',
 'TI',
 'VD',
 'VS',
 'NE',
 'GE',
 'JU']
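
The canton IDs can then serve as the keys of the [canton ID => amount of money] dictionary mentioned in the comment. The sketch below assumes per-canton totals are already available as a plain dictionary. The ID list is shortened and the amounts are made-up values for illustration.

```python
# Shortened list of canton IDs, in the same form as the TopoJSON IDs
canton_ids = ['ZH', 'BE', 'GE', 'JU']

# Hypothetical totals: only cantons with at least one mapped project appear
totals = {'ZH': 1500.0, 'GE': 900.0}

# Give every canton an explicit entry, defaulting to 0 when nothing was granted
amount_by_canton = {canton_id: totals.get(canton_id, 0.0) for canton_id in canton_ids}
print(amount_by_canton)
```

Giving every canton an explicit value keeps cantons without grants in the data instead of leaving them out silently.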

We now get the total approved amount per canton.

In [191]:
# Map each university to its canton (missing when Geonames found no match)
projects['Canton'] = projects['University'].map(university_to_canton)
canton_appr_amt = projects[['Canton', 'Approved Amount']].copy()
# The amounts are stored as text (e.g. '11619.00'), which int() cannot parse directly.
# Convert them to numbers first; entries that are not numeric become NaN and are ignored by sum()
canton_appr_amt['Approved Amount'] = pd.to_numeric(canton_appr_amt['Approved Amount'], errors='coerce')
canton_appr_amt = canton_appr_amt.groupby('Canton').sum()
canton_appr_amt = canton_appr_amt.reset_index()
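
An amount stored as text such as `'11619.00'` cannot be passed to `int()` directly, because `int()` only accepts strings that look like integers. As a small sketch on toy data, `pd.to_numeric` with `errors='coerce'` converts what it can and marks the rest as missing. The `'not available'` entry is a made-up placeholder, not a value taken from the dataset.

```python
import pandas as pd

# Toy amounts stored as text; the last entry is a hypothetical non-numeric value
amounts = pd.Series(['11619.00', '41022.00', 'not available'])

# Unparseable entries become NaN instead of raising an error
numeric = pd.to_numeric(amounts, errors='coerce')

# sum() skips NaN by default
total = numeric.sum()
print(numeric)
print(total)
```
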

## Data Visualization

In [181]:
canton_geo = r'./ch-cantons.topojson.json'

# Let Folium determine the color scale
# ('Approved Amount' must hold numbers, not text, for the scale to be computed)
# Note: recent Folium releases expose this as folium.Choropleth(geo_data=...).add_to(...) instead
swiss_map = fl.Map(location=[46.8182, 8.2275], zoom_start=8)
swiss_map.choropleth(geo_path=canton_geo, data=canton_appr_amt,
                     columns=['Canton', 'Approved Amount'],
                     key_on='feature.id',
                     topojson='objects.cantons',  # the file is TopoJSON, not GeoJSON
                     fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
                     legend_name='Approved Amount per Canton')
swiss_map.save('canton_budget.html')