




# Interactive Viz

Build a Choropleth map which shows intuitively (i.e., use colors wisely) how much grant money goes to each Swiss canton. To do so, you will need to use the provided TopoJSON file, combined with the Choropleth map example you can find in the Folium README file.  

HINT: the P3 database is formed by entries which assign a grant (and its approved amount) to a University name.  

Therefore you will need a smart strategy to go from University to Canton name. The Geonames Full Text Search API in JSON can help you with this -- try to use it as much as possible to build the canton mappings that you need. For those universities for which you cannot find a mapping via the API, you are then allowed to build it manually -- feel free to stop once you have mapped the top 95% of the universities. I also recommend using an intermediate viz step for debugging purposes, showing all the universities as markers on your map (e.g., if you don't select the right results from the Geonames API, some of your markers might be placed in nearby countries...)

BONUS: using the map you have just built, and the geographical information contained in it, could you give a rough estimate of the difference in research funding between the areas divided by the Röstigraben?   

HINT: for those cantons cut through by the Röstigraben, this viz can be helpful!


In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import scipy.stats as stats
import folium
import os
import json
import geopy
import geocoder
import time
import csv
from geopy.geocoders import Nominatim

# Reading data

We read the data from the P3 grant CSV file.

In [2]:
grantExport = pd.read_csv("data/P3_GrantExport.csv", sep=';')
grantExport = grantExport.fillna("")
grantExport.head(5)

| | "Project Number" | Project Title | Project Title English | Responsible Applicant | Funding Instrument | Funding Instrument Hierarchy | Institution | University | Discipline Number | Discipline Name | Discipline Name Hierarchy | Start Date | End Date | Approved Amount | Keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Schlussband (Bd. VI) der Jacob Burckhardt-Biog... | | Kaegi Werner | Project funding (Div. I-III) | Project funding | | Nicht zuteilbar - NA | 10302 | Swiss history | Human and Social Sciences;Theology & religious... | 01.10.1975 | 30.09.1976 | 11619.0 | |
| 1 | 4 | Batterie de tests à l'usage des enseignants po... | | Massarenti Léonard | Project funding (Div. I-III) | Project funding | Faculté de Psychologie et des Sciences de l'Ed... | Université de Genève - GE | 10104 | Educational science and Pedagogy | Human and Social Sciences;Psychology, educatio... | 01.10.1975 | 30.09.1976 | 41022.0 | |
| 2 | 5 | Kritische Erstausgabe der "Evidentiae contra D... | | Kommission für das Corpus philosophorum medii ... | Project funding (Div. I-III) | Project funding | Kommission für das Corpus philosophorum medii ... | NPO (Biblioth., Museen, Verwalt.) - NPO | 10101 | Philosophy | Human and Social Sciences;Linguistics and lite... | 01.03.1976 | 28.02.1985 | 79732.0 | |
| 3 | 6 | Katalog der datierten Handschriften in der Sch... | | Burckhardt Max | Project funding (Div. I-III) | Project funding | Abt. Handschriften und Alte Drucke Bibliothek ... | Universität Basel - BS | 10302 | Swiss history | Human and Social Sciences;Theology & religious... | 01.10.1975 | 30.09.1976 | 52627.0 | |
| 4 | 7 | Wissenschaftliche Mitarbeit am Thesaurus Lingu... | | Schweiz. Thesauruskommission | Project funding (Div. I-III) | Project funding | Schweiz. Thesauruskommission | NPO (Biblioth., Museen, Verwalt.) - NPO | 10303 | Ancient history and Classical studies | Human and Social Sciences;Theology & religious... | 01.01.1976 | 30.04.1978 | 120042.0 | |


## Linking Universities to Cantons

We first extract the universities from the data and keep only the unique values.

In [4]:
universities = grantExport['University'].unique()
universities

array(['Nicht zuteilbar - NA', 'Université de Genève - GE',
       'NPO (Biblioth., Museen, Verwalt.) - NPO', 'Universität Basel - BS',
       'Université de Fribourg - FR', 'Universität Zürich - ZH',
       'Université de Lausanne - LA', 'Universität Bern - BE',
       'Eidg. Forschungsanstalt für Wald,Schnee,Land - WSL',
       'Université de Neuchâtel - NE', 'ETH Zürich - ETHZ',
       'Inst. de Hautes Etudes Internat. et du Dév - IHEID',
       'Universität St. Gallen - SG', 'Weitere Institute - FINST',
       'Firmen/Privatwirtschaft - FP',
       'Pädagogische Hochschule Graubünden - PHGR', 'EPF Lausanne - EPFL',
       'Pädagogische Hochschule Zürich - PHZFH', 'Universität Luzern - LU',
       'Schweiz. Institut für Kunstwissenschaft - SIK-ISEA',
       'SUP della Svizzera italiana - SUPSI',
       'HES de Suisse occidentale - HES-SO',
       'Robert Walser-Stiftung Bern - RWS', 'Paul Scherrer Institut - PSI',
       'Pädagogische Hochschule St. Gallen - PHSG',
       'Eidg. Ans

In [190]:
#pip install python-google-places

from googleplaces import GooglePlaces, types, lang
GOOGLE_API_KEY = 'MY_GOOGLE_API_KEY__PLEASE_USE_YOURS'
google_places = GooglePlaces(GOOGLE_API_KEY)

GEO_NAMES_ACCOUNT = "blip2"
geolocator = geopy.geocoders.GeoNames(username=GEO_NAMES_ACCOUNT)


We initialize a dictionary that maps every university in our list to a placeholder "None" address.

In [172]:
uni_adresses_dict = {}
for university in universities:
    uni_adresses_dict[university] = geopy.location.Location(address="None")

### Finding addresses of Universities using APIs

To find the universities' addresses, we combine two APIs.
* **Google Places API**
This API provides a **text_search** method that performs very well at finding the addresses of our universities and institutes. Unfortunately, this method does not directly give us the canton associated with a location, only its latitude/longitude. This is why we also use the second API.
* **GeoNames API**
This API provides the **reverse** method, which converts a location (latitude/longitude) into a Location with city/state/country information.
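As a minimal sketch of the last step, assuming the GeoNames addresses keep the "place, canton code, country code" shape seen in this notebook's output (e.g. `Bramois, VS, CH`), a hypothetical helper `canton_from_address` (not part of the original notebook) could pull the canton abbreviation out of the address string:

```python
def canton_from_address(address):
    """Return the canton code of an address such as 'Bramois, VS, CH'.

    Hypothetical helper: it assumes the address ends with
    '<canton code>, CH' and returns None for anything else.
    """
    parts = [part.strip() for part in address.split(",")]
    if len(parts) >= 2 and parts[-1] == "CH":
        return parts[-2]
    return None


# Addresses taken from the outputs shown in this notebook
print(canton_from_address("Bramois, VS, CH"))
print(canton_from_address("Colonna, 07, IT"))
```

Returning `None` for non-Swiss or missing addresses keeps the later filtering step explicit instead of silently assigning a wrong canton.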

In [173]:
# Iterate through all universities
for university in universities:
    # Try each part of the name ("full name - acronym") until one yields an address
    for name_part in university.split(" - "):
        if str(uni_adresses_dict[university].address) == "None":
            try:
                # Get the Google place associated with this part of the name
                query_result = google_places.text_search(str(name_part), location="Switzerland")
                # If there is a Google place, reverse-geocode its latitude/longitude with GeoNames
                if len(query_result.places) > 0:
                    location = query_result.places[0].geo_location
                    location = geopy.point.Point(location['lat'], location['lng'])
                    address = geolocator.reverse(location)[0]
                    # Save the address in the dictionary
                    uni_adresses_dict[university] = address
            except Exception:
                # Any failure of either API call is only reported, and the loop continues
                print("Google Exception")

Google Exception


### Saving/Restoring results

Since the Google API limits the number of requests we can make, we save the resulting dictionary as a CSV file and read it back whenever we need to restore it.

In [174]:
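def saveAddressDictToCSV(path, address_dict):
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        # One row per university: name, address, latitude, longitude
        for key, value in address_dict.items():
            writer.writerow([key, value.address, value.latitude, value.longitude])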

In [175]:
def loadAddressDictFromCSV(path):
    with open(path, 'r') as csv_file:
        reader = csv.reader(csv_file)
        return dict([rows[0],geopy.location.Location(address=rows[1],point=geopy.point.Point(rows[2], rows[3]))] for rows in reader)

In [176]:
saveAddressDictToCSV("data/universities_addresses_dict.csv", uni_adresses_dict)

In [177]:
uni_adresses_dict = loadAddressDictFromCSV("data/universities_addresses_dict.csv")

### Addresses check and cleaning

We now need to check that every university has an address and that the addresses are plausible.

For this purpose, we assume that every university matched to an address in Switzerland is correct. These addresses end with **", CH"**.

To get an idea of the scale of the problem, we start by counting how many addresses pass this test.

In [178]:
found = sum([1 for x in uni_adresses_dict.keys() if uni_adresses_dict[x].address.endswith(", CH")])
overall = sum([1 for x in uni_adresses_dict.keys()])
print(found, "address found over", overall)

69 address found over 78


Since some addresses are still wrong, we print them to get an idea of the possible problems.

In [179]:
for university, address in uni_adresses_dict.items():
    if not address.address.endswith(", CH"):
        print("problem with", university, "located in:", address)

problem with  located in: None
problem with Forschungskommission SAGW located in: None
problem with Weitere Spitäler - ASPIT located in: Høje Tåstrup, 17, DK
problem with Forschungsinstitut für Opthalmologie - IRO located in: None
problem with Istituto Svizzero di Roma - ISR located in: Colonna, 07, IT
problem with Weitere Institute - FINST located in: None
problem with Staatsunabh. Theologische Hochschule Basel - STHB located in: Merdekalio, 30, ID
problem with Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS located in: Croatan Shores, NC, US
problem with Physikal.-Meteorolog. Observatorium Davos - PMOD located in: None


We first observe that there is an empty university name. This is due to a missing University value in the data. We delete this entry since it cannot be associated with an address.

In [180]:
del uni_adresses_dict[""]

Second, we observe the two entries **Weitere Institute** and **Weitere Spitäler**, which mean "other institutes" and "other hospitals" in German. We drop these entries as well, since they cannot be associated with a single address.

*Note that Weitere Spitäler - ASPIT was wrongly assigned to an address in Denmark because our loop tries every part of the university name, and ASPIT is a Danish company.*

In [181]:
del uni_adresses_dict["Weitere Institute - FINST"]
del uni_adresses_dict["Weitere Spitäler - ASPIT"]

Concerning the **Istituto Svizzero di Roma - ISR**, located in Italy: after checking on the internet, we conclude that this institute is in fact correctly located in Italy.

Since the goal is to build a map of the grants awarded to universities and institutes in Switzerland, we decide to drop this entry.

In [182]:
del uni_adresses_dict["Istituto Svizzero di Roma - ISR"]

For the **Forschungsinstitut für Opthalmologie - IRO**, after looking on the internet, we found that it is the German translation of the French name "Institut de Recherche en Ophtalmologie", an institute located in Bramois, VS. We therefore assign it the location of this institute.

In [183]:
query_result = google_places.text_search("Institut de recherche en Ophtalmologie", location="Switzerland")
location = query_result.places[0].geo_location
location = geopy.point.Point(location['lat'], location['lng'])
location = geolocator.reverse(location)[0]
print(location)
uni_adresses_dict["Forschungsinstitut für Opthalmologie - IRO"] = location

Bramois, VS, CH


For the remaining four:
* Forschungskommission SAGW
* Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS
* Staatsunabh. Theologische Hochschule Basel - STHB
* Physikal.-Meteorolog. Observatorium Davos - PMOD

The missing or wrong addresses occur because our loop does not split the names into suitable search terms. We therefore look up each of them separately, using the part of the name we believe matches best.
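One way to reduce this manual work is to generate the search terms more carefully before calling the API. The sketch below is only an illustration, not what the loop in this notebook does: `candidate_queries` is a hypothetical helper that splits on the last `" - "` only, drops parenthetical remarks, and tries the acronym last, since bare acronyms such as ASPIT can match unrelated places.

```python
import re


def candidate_queries(label):
    """Return search strings for a P3 'University' label, most specific first.

    Hypothetical helper: assumes labels look like 'Full name - ACRONYM',
    optionally with a parenthetical remark inside the full name.
    """
    full, sep, acronym = label.rpartition(" - ")
    if not sep:
        # No ' - ' separator: the whole label is the name, there is no acronym
        full, acronym = label, ""
    # Remove remarks such as '(Mitglied SUPSI)' from the full name
    without_parens = re.sub(r"\s*\([^)]*\)", "", full).strip()
    candidates = [full, without_parens, acronym]
    # Drop empty strings and duplicates while keeping the order
    return list(dict.fromkeys(c for c in candidates if c))


print(candidate_queries("Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS"))
print(candidate_queries("Forschungskommission SAGW"))
```

Each candidate would still have to be checked against the ", CH" test, because a plausible-looking search string can resolve to a place abroad.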

In [184]:
def lookup_address(query):
    """Return the GeoNames address of the first Google Places hit for `query`."""
    query_result = google_places.text_search(query, location="Switzerland")
    location = query_result.places[0].geo_location
    point = geopy.point.Point(location['lat'], location['lng'])
    return geolocator.reverse(point)[0]

# (university name, search string we believe matches it best)
manual_queries = [
    ("Forschungskommission SAGW", "SAGW"),
    ("Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS", "Fernfachhochschule Schweiz"),
    ("Staatsunabh. Theologische Hochschule Basel - STHB", "Theologische Hochschule Basel"),
    ("Physikal.-Meteorolog. Observatorium Davos - PMOD", "Observatorium Davos - PMOD"),
]

for university, query in manual_queries:
    address = lookup_address(query)
    print(address)
    uni_adresses_dict[university] = address



Bern / Marzili, BE, CH
Brig, VS, CH
Grossbasel, BS, CH
Bünda, GR, CH


We can now check that every remaining entry has an address located in Switzerland.

In [185]:
found = sum([1 for x in uni_adresses_dict.keys() if uni_adresses_dict[x].address.endswith(", CH")])
overall = sum([1 for x in uni_adresses_dict.keys()])
print(found, "address found over", overall)

74 address found over 74


### Universities location visualization

We can now visualize the university locations on a map to check that everything is fine.

In [186]:
# Path to the TopoJSON file (also defined in a later cell, but already needed here)
ch_cantons = os.path.join('data', 'ch-cantons.topojson.json')

Swiss_map = folium.Map([46.75, 8.25], zoom_start=8)

# Draw the canton borders
folium.TopoJson(open(ch_cantons),
                'objects.cantons',
                style_function=lambda feature: {
        'fillColor': '#ffff00',
        'color': 'black',
        'weight': 2,
        'dashArray': '5, 5'
    }).add_to(Swiss_map)

# Add one marker per university, labelled with its name
for university, address in uni_adresses_dict.items():
    folium.Marker([address.latitude, address.longitude], popup=str(university)).add_to(Swiss_map)
Swiss_map

This looks quite good :)

### Save the clean entries

Now that our university-location dictionary is clean, we can save it so that we can reload it when needed without making further requests.

In [187]:
saveAddressDictToCSV("data/universities_addresses_dict_clean.csv", uni_adresses_dict)

In [188]:
uni_adresses_dict = loadAddressDictFromCSV("data/universities_addresses_dict_clean.csv")

### Computing total grant for each Canton
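A possible way to compute these totals is sketched below, under several assumptions: `grants_per_canton` and `uni_to_canton` are hypothetical names, the university-to-canton mapping is taken as already built, the toy rows only reuse a few values from the table at the top of the notebook, and the "n/a" amount is invented to show how non-numeric values would be handled.

```python
import pandas as pd


def grants_per_canton(grants, uni_to_canton):
    """Sum 'Approved Amount' per canton (hypothetical helper).

    grants: DataFrame with 'University' and 'Approved Amount' columns.
    uni_to_canton: dict mapping a university label to a canton code.
    """
    df = grants.copy()
    # Non-numeric amounts become NaN so that they can be dropped below
    df["Approved Amount"] = pd.to_numeric(df["Approved Amount"], errors="coerce")
    # Universities without a known canton get a missing value
    df["Canton"] = df["University"].map(uni_to_canton)
    df = df.dropna(subset=["Canton", "Approved Amount"])
    return df.groupby("Canton")["Approved Amount"].sum()


# Toy data: the last Basel amount is an invented non-numeric value
toy_grants = pd.DataFrame({
    "University": ["Université de Genève - GE", "Universität Basel - BS",
                   "Universität Basel - BS", "Nicht zuteilbar - NA"],
    "Approved Amount": [41022.0, 52627.0, "n/a", 11619.0],
})
toy_mapping = {"Université de Genève - GE": "GE", "Universität Basel - BS": "BS"}

totals = grants_per_canton(toy_grants, toy_mapping)
print(totals)
```

Rows whose university has no canton, such as "Nicht zuteilbar - NA", are left out of the totals rather than assigned to an arbitrary canton.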



In [52]:
ch_cantons = os.path.join('data', 'ch-cantons.topojson.json')

topo_json_data = json.load(open(ch_cantons))

In [53]:
m = folium.Map([46.75, 8.25], zoom_start=8)


folium.TopoJson(open(ch_cantons),
                'objects.cantons',
                style_function=lambda feature: {
        'fillColor': '#ffff00',
        'color': 'black',
        'weight': 2,
        'dashArray': '5, 5'
    }).add_to(m)

m

AttributeError: 'Location' object has no attribute 'location'

In [59]:
json_data=open("data/ch-cantons.topojson.json").read()

data = json.loads(json_data)
data["objects"]["cantons"]["geometries"]["properties"]

TypeError: list indices must be integers or slices, not str
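The `TypeError` arises because `geometries` is a list with one entry per canton, so it has to be iterated over (or indexed with an integer) before `properties` can be read. The sketch below uses a toy structure in place of the real file. The `id` and `properties.name` fields are an assumption about the file's layout and should be checked against the actual data.

```python
# Toy stand-in for the loaded TopoJSON; the field names are assumptions
toy_topo = {
    "objects": {
        "cantons": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "id": "ZH", "properties": {"name": "Zürich"}},
                {"type": "Polygon", "id": "GE", "properties": {"name": "Genève"}},
            ],
        }
    }
}

geometries = toy_topo["objects"]["cantons"]["geometries"]

# Iterate over the list instead of indexing it with a string
canton_names = {
    geometry.get("id"): geometry.get("properties", {}).get("name")
    for geometry in geometries
}
print(canton_names)
```

A mapping like this gives the canton identifiers that the per-canton totals need to be keyed on when drawing the choropleth.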