# Building a choropleth map

In [49]:
%matplotlib inline
import pandas as pd
import numpy as np
import time
import glob
import matplotlib.pyplot as plt
import folium
import geocoder

%load_ext autoreload
%autoreload 2

## Loading & wrangling the data 

In [32]:
data_path = 'data/P3_GrantExport.xlsx'
grant_data = pd.read_excel(data_path)

We start by selecting the features we are interested in. First, we need the amount granted to each project. We also keep the University, from which we will retrieve the canton, and the Institution, because some projects lack one of the two and we are fairly confident that the canton can be retrieved from either. Finally, we keep the funding instrument (the reason a project was funded) for now, although we are not yet sure it will be useful.

In [33]:
grant_data = grant_data[["Approved Amount", "University", "Institution","Funding Instrument Hierarchy"]]
grant_data.count()

Approved Amount                 63967
University                      50988
Institution                     58831
Funding Instrument Hierarchy    62915
dtype: int64

So far we have not removed any entries, but a closer look at the data confirmed that we should. We start by removing entries whose Approved Amount is not a number, since we cannot use them in this study.

In [34]:
# Keep rows whose amount, written as a string, is all digits (drops text placeholders and NaN)
grant_data = grant_data[grant_data['Approved Amount'].apply(lambda x: str(x).isdigit())]
grant_data.count()

Approved Amount                 52663
University                      50487
Institution                     48205
Funding Instrument Hierarchy    51620
dtype: int64
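The `isdigit` test keeps only values whose string form consists solely of digits, so a float such as `1500.0` would be discarded along with genuine text placeholders. A numeric alternative, sketched below under the assumption that the column mixes integers with text, is `pd.to_numeric(..., errors="coerce")`, which turns anything unparseable into NaN so it can then be dropped. The toy values are invented for illustration and do not come from the P3 export.

```python
import pandas as pd

# Toy column mixing integers, a float, a text placeholder and a missing value
# (invented values, not taken from the P3 export).
amounts = pd.Series([120000, "not available", 1500.0, None, 98000], dtype=object)

# String-based filter, as in the cell above: "1500.0" and "None" fail isdigit().
digit_mask = amounts.apply(lambda x: str(x).isdigit())

# Numeric filter: unparseable entries (and None) become NaN, then are masked out.
numeric = pd.to_numeric(amounts, errors="coerce")
numeric_mask = numeric.notna()

print(digit_mask.tolist())    # [True, False, False, False, True]
print(numeric_mask.tolist())  # [True, False, True, False, True]
```

On this data set both filters may well give the same result; the numeric version simply makes the intent ("is this a number?") explicit.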

We have already removed about 11K entries (63,967 → 52,663), but we can do better. We will use Google's geocoding API to link a university or institution name to a canton, and we are confident the canton can be found from either of the two. This means, however, that we cannot use entries missing both values.

In [44]:
grant_data = grant_data.dropna(subset=["Institution", "University"], how="all")
grant_data.count()

Approved Amount                 51843
University                      50487
Institution                     48205
Funding Instrument Hierarchy    50800
dtype: int64
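`how="all"` drops a row only when every column listed in `subset` is missing. The default, `how="any"`, would also discard rows lacking just one of the two names, which are exactly the rows we hope to recover through the other column. A toy check with hypothetical names:

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical names): both names, only the Institution, neither.
df = pd.DataFrame({
    "University": ["Uni A", np.nan, np.nan],
    "Institution": ["Inst A", "Inst B", np.nan],
})

kept_all = df.dropna(subset=["Institution", "University"], how="all")
kept_any = df.dropna(subset=["Institution", "University"], how="any")

print(len(kept_all))  # 2: only the row missing both names is removed
print(len(kept_any))  # 1: the row missing only University is removed too
```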

We removed another 820 entries (52,663 → 51,843), and should now be ready to add the canton feature to the dataframe.
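Since each row needs only one name to geocode, one way to prepare this step is to take the University when present, fall back to the Institution otherwise, and look up each distinct name only once (many grants share the same university). In the sketch below, `lookup_canton` and its dictionary are hypothetical stand-ins for the geocoding call, so that the example runs offline; the rows are toy data.

```python
import numpy as np
import pandas as pd

# Toy grants (hypothetical names); every row has at least one name.
df = pd.DataFrame({
    "Approved Amount": [100, 200, 300],
    "University": ["Uni A", np.nan, "Uni A"],
    "Institution": ["Inst X", "Inst Y", np.nan],
})

# Prefer the University name; fill the gaps with the Institution name
# (fillna with a Series aligns the two columns on the row index).
df["lookup_name"] = df["University"].fillna(df["Institution"])

# Hypothetical stand-in for the geocoder: name -> canton abbreviation.
FAKE_CANTONS = {"Uni A": "VD", "Inst Y": "ZH"}

def lookup_canton(name):
    """Hypothetical lookup; returns None when the name is unknown."""
    return FAKE_CANTONS.get(name)

# Query each distinct name once, then map the results back onto the rows.
unique_names = df["lookup_name"].dropna().unique()
canton_of = {name: lookup_canton(name) for name in unique_names}
df["Canton"] = df["lookup_name"].map(canton_of)

print(df[["lookup_name", "Canton"]])
```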

## Linking the University name to a Swiss canton

In [45]:
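# Path to the TopoJSON file with the Swiss canton boundaries
geo_str = 'ch-cantons.topojson.json'

# Base map centred on Switzerland
ch_map = folium.Map(location=[46.6430788,8.018626], tiles='Mapbox Bright', zoom_start=7)
# Draw the canton boundaries. Map.choropleth(geo_path=...) and the 'Mapbox Bright'
# tiles belong to older folium releases; recent ones use folium.Choropleth instead.
ch_map.choropleth(geo_path=geo_str)
ch_map.save('ch_map.html')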

In [46]:
%%HTML
<iframe width='100%' height="350" src="ch_map.html"></iframe>
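Once every grant carries a canton code, the map needs one value per canton. A minimal sketch, assuming a `Canton` column holding abbreviations such as `GE` (the format the geocoder returns in the next cell), sums the approved amounts per canton. These codes would still have to be matched against whatever identifier the TopoJSON file uses for each canton, which is an assumption to verify against the file. The amounts are invented.

```python
import pandas as pd

grants = pd.DataFrame({
    "Canton": ["GE", "ZH", "GE", "VD", "ZH"],
    # Stored as objects, as the filtered Approved Amount column likely still is.
    "Approved Amount": pd.Series([100000, 250000, 50000, 80000, 20000], dtype=object),
})

# Convert to integers before summing, so the aggregation is numeric.
grants["Approved Amount"] = grants["Approved Amount"].astype(int)

# Total approved amount per canton, as a two-column frame for the map.
per_canton = (
    grants.groupby("Canton", as_index=False)["Approved Amount"]
    .sum()
    .sort_values("Approved Amount", ascending=False)
)
print(per_canton)
```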

In [50]:
# Geocode a place name; for Swiss results, `state` holds the canton abbreviation
g = geocoder.google('Geneve')
print(g.state)
print(g.country_long=='Switzerland')

GE
True
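Each geocoding request is a network call (and Google's geocoding service now requires an API key), so it is worth wrapping the lookup so that each name is queried at most once and non-Swiss matches are discarded. The sketch below relies only on the `state` and `country_long` attributes used in the cell above; `make_canton_lookup`, its `delay` parameter and the fake geocoder are hypothetical helpers, the latter existing only so the example runs offline. In the notebook, `geocoder.google` could be passed as `geocode`.

```python
import time
from functools import lru_cache


def make_canton_lookup(geocode, delay=0.0):
    """Return a cached name -> canton function built on a geocoding callable.

    `geocode` must return an object with `state` and `country_long`
    attributes, like the geocoder.google result in the cell above.
    """
    @lru_cache(maxsize=None)
    def canton_of(name):
        if delay:
            time.sleep(delay)  # optional pause between live requests
        result = geocode(name)
        if result.country_long == "Switzerland":
            return result.state
        return None  # outside Switzerland, or not found
    return canton_of


# Offline stand-in for geocoder.google that records every call it receives.
class FakeResult:
    def __init__(self, state, country_long):
        self.state = state
        self.country_long = country_long


calls = []

def fake_geocode(name):
    calls.append(name)
    table = {"Geneve": FakeResult("GE", "Switzerland"),
             "Lyon": FakeResult("XX", "France")}
    return table.get(name, FakeResult(None, None))


canton_of = make_canton_lookup(fake_geocode)
print(canton_of("Geneve"), canton_of("Geneve"), canton_of("Lyon"))  # GE GE None
print(len(calls))  # 2: the second "Geneve" came from the cache
```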
