## Data fields & description:
- Project Number: the unique project id
- Project Title: the project name (Mandatory)
- Project Title English: the project name in English (Optional)
- Responsible Applicant: the person who submitted the research proposal to the SNSF (Mandatory)
- Funding Instrument: funding schemes for research and scientific communication
- Funding Instrument Hierarchy: the top level of the funding instrument hierarchy
- Institution: the research institution where the project will largely be carried out
- University: the University where the project will largely be carried out. **This field is only filled if the research is carried out at a Swiss institution, otherwise the field remains blank. In the case of mobility fellowships, it is generally left empty.**
- Discipline Number: number of the discipline
- Discipline Name: name of the discipline
- Discipline Name Hierarchy: the top level of the discipline hierarchy
- Start Date: the starting date of the project
- End Date: the actual end date of the project
- Approved Amount: the total amount approved for the project. **This amount is not indicated in the case of mobility fellowships.**
- Keywords: unstructured field.

In [1]:
# import required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML
import folium
import requests
%matplotlib inline

In [2]:
# Define constants
GEONAMES_SEARCH_URL = 'http://api.geonames.org/searchJSON'
GOOGLE_SEARCH_URL = 'https://maps.googleapis.com/maps/api/place/textsearch/json'
GOOGLE_GEOCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json'

In [3]:
# Load the P3 grant export (semicolon-separated CSV)
df = pd.read_csv('./data/P3_GrantExport.csv', sep=";")

The field documentation says that University is only filled when the research is carried out at a Swiss institution. It is therefore safe to drop every row whose University column is 'Nicht zuteilbar - NA' ("not assignable") or empty.

In [4]:
df_Nicht = df[df['University'] == 'Nicht zuteilbar - NA']
df = df[df['University'] != 'Nicht zuteilbar - NA']

In [8]:
df_null = df[df['University'].isnull()]
df = df[df['University'].notnull()]
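
The two filtering cells above can also be written as a single boolean mask. The sketch below runs on a small made-up frame (the rows are illustrative, not taken from the P3 export) and assumes the placeholder string is spelled exactly as in the cell above.

```python
import pandas as pd

PLACEHOLDER = 'Nicht zuteilbar - NA'  # placeholder value used in the filtering cell above

# Illustrative rows only; not real P3 data
toy = pd.DataFrame({
    'Project Number': [1, 2, 3, 4],
    'University': ['EPF Lausanne - EPFL', PLACEHOLDER, None, 'Universität Bern - BE'],
})

# Keep rows that have a university and whose university is not the placeholder
has_university = toy['University'].notnull() & (toy['University'] != PLACEHOLDER)
toy_clean = toy[has_university]
toy_dropped = toy[~has_university]

print(len(toy_clean), len(toy_dropped))
```

Keeping the dropped rows in their own frame, as the notebook does with `df_Nicht` and `df_null`, makes it easy to check later how much funding was excluded.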

In [11]:
list_of_universities = set(df['University'].unique())

## GeoNames Search Webservice:
The GeoNames API offers its services in XML and JSON. We use the JSON format because it is easier to extract data from.

In [12]:
def search(url, params):
    """Query a GeoNames or Google Maps web service and return the parsed JSON response."""
    try:
        response = requests.get(url, params=params)
    except requests.RequestException as err:
        # ServiceException was never defined in this notebook, so raise a built-in error
        raise RuntimeError('Request to {} failed'.format(url)) from err
    return response.json()

In [14]:
# GEONAMES_SEARCH_URL is already defined with the other constants above
lst = []
for university in list_of_universities:
    response = search(GEONAMES_SEARCH_URL, {'q': university, 'maxRows': 1, 'country': 'CH', 'username': 'sorooshafiee'})
    lst.append(response['totalResultsCount'])
print('Number of cantons recovered from the university name: {}/{}'.format(sum(x != 0 for x in lst), len(lst)))

Number of cantons recovered from the university name: 5/76
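
When a query does match, the canton can be read from the first entry of the response. The sketch below uses a hand-written payload shaped like a GeoNames `searchJSON` reply. The helper name `canton_from_geonames` and the sample values are assumptions for illustration, and the `adminCode1` / `adminName1` fields should be checked against a live response before relying on them.

```python
def canton_from_geonames(payload):
    """Return (code, name) of the first hit's first-level admin division, or None."""
    if payload.get('totalResultsCount', 0) == 0:
        return None
    hits = payload.get('geonames', [])
    if not hits:
        return None
    first = hits[0]
    return first.get('adminCode1'), first.get('adminName1')


# Hand-written sample, trimmed to the fields used here (not a recorded API reply)
sample = {
    'totalResultsCount': 1,
    'geonames': [
        {'name': 'Lausanne', 'countryCode': 'CH',
         'adminCode1': 'VD', 'adminName1': 'Vaud'},
    ],
}

print(canton_from_geonames(sample))
```

Returning `None` for an empty result keeps the 71 unmatched universities easy to count and to pass on to a second lookup service.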


## Google API


In [81]:
def get_geonames(url, params, types):
    """Return the address components of the first result whose types overlap `types`."""
    try:
        response = requests.get(url, params=params)
    except requests.RequestException as err:
        raise RuntimeError('Request to {} failed'.format(url)) from err
    address_comps = response.json()['results'][0]['address_components']
    wanted = set(types)
    # Return a list, not a lazy filter object, so the result can be inspected and reused
    return [comp for comp in address_comps if wanted.intersection(comp['types'])]

In [18]:
with open('data/api_key.txt', 'r') as in_file:
    api_key = in_file.read().strip()  # drop the trailing newline so the key is sent cleanly

In [23]:
lst = []
for university in list_of_universities:
    response = search(GOOGLE_SEARCH_URL, {'key' : api_key, 'query' : university})
    # Unlike GeoNames, the Places Text Search response has no 'totalResultsCount' key;
    # the matches are listed under 'results'
    lst.append(len(response.get('results', [])))
print('Number of universities with a Google match: {}/{}'.format(sum(x != 0 for x in lst), len(lst)))

In [84]:
types = ['administrative_area_level_1']
asd = get_geonames(GOOGLE_GEOCODE_URL,
                   {'key' : api_key, 'place_id' : response['results'][0]['place_id']},
                   types)

In [87]:
asd = list(types)
asd

['administrative_area_level_1']

In [67]:
list([filter(lambda x: tp in x.get('types'), asd) for tp in types][0])

[{'long_name': 'Vaud',
  'short_name': 'VD',
  'types': ['administrative_area_level_1', 'political']}]
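
The component shown above is the one that identifies the canton. Below is a minimal sketch of that extraction on a hand-written payload shaped like a Geocoding API reply. `canton_from_geocode` is a hypothetical helper, not part of the notebook, and the sample payload is trimmed to the fields used here.

```python
def canton_from_geocode(payload, level='administrative_area_level_1'):
    """Return the short name of the first component tagged with `level`, or None."""
    results = payload.get('results', [])
    if not results:
        return None
    for comp in results[0].get('address_components', []):
        if level in comp.get('types', []):
            return comp['short_name']
    return None


# Hand-written sample (not a recorded API reply)
sample = {
    'status': 'OK',
    'results': [{
        'address_components': [
            {'long_name': 'Lausanne', 'short_name': 'Lausanne',
             'types': ['locality', 'political']},
            {'long_name': 'Vaud', 'short_name': 'VD',
             'types': ['administrative_area_level_1', 'political']},
            {'long_name': 'Switzerland', 'short_name': 'CH',
             'types': ['country', 'political']},
        ],
    }],
}

print(canton_from_geocode(sample))
```

Looking the component up by its type avoids depending on a fixed position in `address_components`, which can differ from one place to another.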

In [None]:
url = 'http://maps.googleapis.com/maps/api/geocode/json' + \
            '?place_id={}&sensor=false'.format(response['results'][0]['place_id'])

In [None]:
data = requests.get('https://maps.googleapis.com/maps/api/geocode/json',
                    {'key' : api_key, 'place_id' : response['results'][0]['place_id']})

In [None]:
b['long_name']

In [None]:
data.json()['results'][0]['address_components'][4]

In [29]:
squares = map(lambda x: x**2, range(10))
special_squares = filter(lambda x: x > 5 and x < 50, squares)

In [33]:
special_squares

<filter at 0x1836466fb00>
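
The `<filter at ...>` output reflects that in Python 3 `map` and `filter` return lazy iterators rather than lists. Wrapping the result in `list` materialises the values, as this self-contained version of the cell shows.

```python
squares = map(lambda x: x**2, range(10))

# list() consumes the lazy filter object and stores its values
special_squares = list(filter(lambda x: 5 < x < 50, squares))

print(special_squares)  # [9, 16, 25, 36, 49]
```

A lazy iterator can only be consumed once, so any result that will be inspected in several cells is safer stored as a list.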