# ADA / Applied Data Analysis
<h2 style="color:#a8a8a8">Homework 3 - Visualization<br>
Aimée Montero, Alfonso Peterssen, Cyriaque Brousse</h2>

## Part 1 - Data import

Let's import the required libraries:

In [56]:
import folium
import pandas as pd
import requests
import json

Read the data from the provided CSV file:

In [2]:
data = pd.read_csv('data/grantexport.csv', sep=';')

In [3]:
cols = {'Project Number' : 'pnr',
       'Project Title' : 'title',
       'University' : 'univ',
       'Approved Amount' : 'amount'}
data = data.rename(columns=cols)
data = data[list(cols.values())]

According to the documentation, the values in the field `pnr` are unique. We check this and use it as an index for our data frame.

In [4]:
data.pnr.is_unique

True

In [None]:
data = data.set_index('pnr')
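As a minimal sketch of what this uniqueness check protects against, the toy frame below (hypothetical project numbers, not taken from the real export) contains a duplicated `pnr`. Passing `verify_integrity=True` to `set_index` makes pandas raise an error instead of silently creating a non-unique index.

```python
import pandas as pd

# Toy frame with hypothetical project numbers (not from the real export)
toy = pd.DataFrame({'pnr': [1, 2, 2], 'title': ['a', 'b', 'c']})

# Rows whose pnr occurs more than once
duplicates = toy[toy.pnr.duplicated(keep=False)]

# set_index refuses to build a non-unique index when asked to verify it
try:
    toy.set_index('pnr', verify_integrity=True)
    raised = False
except ValueError:
    raised = True
```

On the real data the check returned `True`, so the plain `set_index('pnr')` call is safe.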

We convert the `amount` field to numbers; values that are not numeric become `NaN`. The documentation states that *"This amount is not indicated in the case of mobility fellowships since it depends on administrative factors, typically the destination, cost of living, family allowances (if applicable) and exchange rate differences."*

In [8]:
# Non-numeric entries become NaN
data.amount = pd.to_numeric(data.amount, errors='coerce')
data.sample(5)

| pnr | title | univ | amount |
|---|---|---|---|
| 122847 | The 3rd international workshop on approaches t... | ETH Zürich - ETHZ | 5000.0 |
| 455 | Musikinstrumente der Schweiz | NPO (Biblioth., Museen, Verwalt.) - NPO | 74298.0 |
| 28071 | Untersuchung der Rolle der Lipide und Lipoprot... | NaN | NaN |
| 28475 | Uebergangsmetallcluster als homogene Katalysat... | Université de Neuchâtel - NE | 197707.0 |
| 46761 | Grenzlagen der Laurophyllierung in der Schweiz | ETH Zürich - ETHZ | 82407.0 |


The number of rows for which the `amount` field is not a number is:

In [9]:
data.amount.isnull().sum()

10910
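To illustrate how the coercion behaves, here is a small sketch on made-up values. The placeholder string `'not indicated'` is an assumption for illustration and is not necessarily what the export contains.

```python
import pandas as pd

# Hypothetical raw values; 'not indicated' is an assumed placeholder
raw = pd.Series(['5000.00', '74298.00', 'not indicated', '197707.00'])

# Entries that cannot be parsed as numbers are replaced by NaN
amount = pd.to_numeric(raw, errors='coerce')

n_missing = int(amount.isnull().sum())
total = amount.sum()  # NaN values are skipped by default
print(n_missing, total)
```

Because `sum` skips `NaN` by default, the coerced rows do not need to be dropped before aggregating amounts.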

## Part 2 - Mapping universities to cantons

<p>We want to display the amount of money granted to each canton.<br>
To achieve that, the first step is to map each <code>univ</code> value to a new <code>location</code> field.</p>

<p>To do that, we will use the Google Places API. To avoid leaking the API key, we store it in a separate file that is ignored by Git.<br>
The API call will return a location for each university. We then pass this location to the Reverse Geocoding API, which will return a canton.</p>

First, we need to import the Google API key from the key file:

In [29]:
api_key = !head api_key
api_key = api_key[0]
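The `!head` syntax only works inside IPython. As a sketch of a plain Python alternative, the hypothetical helper `read_api_key` below reads the first line of the key file. The demonstration uses a throwaway file and a fake key, so nothing secret is involved.

```python
from pathlib import Path
import tempfile

def read_api_key(path):
    """Return the first line of the key file, stripped of whitespace."""
    with open(path, encoding='utf-8') as f:
        return f.readline().strip()

# Demonstration with a temporary file holding a fake key
with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / 'api_key'
    key_file.write_text('FAKE-KEY-123\n', encoding='utf-8')
    demo_key = read_api_key(key_file)
```

In the notebook, the equivalent call would be `api_key = read_api_key('api_key')`.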

We define the following helper function for fetching JSON from Google APIs:

In [110]:
def fetch_json(url, params):
    ''' Queries the url with the given params and parses the response
        as JSON.
        Returns the first result if the status is OK and at least one
        result exists, None otherwise.
    '''
    response = requests.get(url, params=params)
    obj = response.json()

    # Google APIs report errors in the 'status' field of the body
    if obj['status'] != 'OK':
        print('[E] status was', obj['status'])
        return None

    if len(obj['results']) < 1:
        return None

    return obj['results'][0]
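The status handling can be exercised without any network access. The sketch below isolates it in a hypothetical function `first_result` and feeds it hand-written dictionaries that mimic the `status` and `results` fields of a response; the payloads are illustrative, not real API output.

```python
def first_result(obj):
    """Return the first entry of obj['results'] if the status is 'OK', else None."""
    if obj.get('status') != 'OK':
        return None
    results = obj.get('results', [])
    return results[0] if results else None

# Hand-written payloads that mimic the shape of a response
ok = {'status': 'OK', 'results': [{'name': 'first'}, {'name': 'second'}]}
empty = {'status': 'OK', 'results': []}
failed = {'status': 'ZERO_RESULTS', 'results': []}

print(first_result(ok), first_result(empty), first_result(failed))
```

Separating the parsing from the HTTP call this way is a design choice, not something the helper above requires.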

We get the location of a university, using EPFL as an example:

In [111]:
url = 'https://maps.googleapis.com/maps/api/place/textsearch/json'
params = { 'key' : api_key, 'query' : 'epfl'}

latlng = fetch_json(url, params)['geometry']['location']
latlng

{'lat': 46.5190557, 'lng': 6.566757600000001}

We can input this result into the Reverse Geocoding API:

In [112]:
url = 'https://maps.googleapis.com/maps/api/geocode/json'
params = {'latlng' : str(latlng['lat']) + ','+ str(latlng['lng']),
          'sensor' : 'false'}

for r in fetch_json(url, params)['address_components']:
    if 'administrative_area_level_1' in r['types']:
        print(r['long_name'])

Vaud
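To reuse this lookup for every university, the loop can be wrapped in a small function. The sketch below uses a hypothetical helper `extract_canton` and a hand-written, abbreviated result that only mimics the `address_components` structure used above.

```python
def extract_canton(result):
    """Return the long_name of the first administrative_area_level_1
    component, or None if the result is missing or has no such component."""
    if result is None:
        return None
    for component in result.get('address_components', []):
        if 'administrative_area_level_1' in component.get('types', []):
            return component['long_name']
    return None

# Abbreviated, hand-written result (not real API output)
sample = {
    'address_components': [
        {'long_name': 'Lausanne', 'types': ['locality', 'political']},
        {'long_name': 'Vaud', 'types': ['administrative_area_level_1', 'political']},
        {'long_name': 'Switzerland', 'types': ['country', 'political']},
    ]
}

print(extract_canton(sample))
```

Returning `None` for a missing result lets the function be chained directly after `fetch_json`, which itself returns `None` on failure.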
