-------------------------------------------------
-------------------------------------------------

# Coursera Capstone Project Notebook (Part 3)

[Link to Notebook (Part 1) of the project:](https://nbviewer.jupyter.org/gist/fy5std/1abce225f491d9471b80eca9edd8ae7c)

------------------------------------------------------

###     3. New Project - Where Do We Meet? WDWM

### Real World Data

##### Import libraries and gather location data 

In [1]:
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes
from bs4 import BeautifulSoup
import csv
import requests

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

print('Basic Database, WebScrape, JSON Libraries imported.')

Basic Database, WebScrape, JSON Libraries imported.


In [2]:
source = requests.get('https://www.coa.nl/en/search-location').text
soup = BeautifulSoup(source, 'html5lib')
print(soup.prettify()[1:1000])

!DOCTYPE html>
<html dir="ltr" lang="en">
 <head>
  <link href="http://www.w3.org/1999/xhtml/vocab" rel="profile"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="https://www.coa.nl/sites/www.coa.nl/themes/coa_bs/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
  <meta content="Approximately one-sixth of the Dutch municipalities now have a COA asylum centre. In some municipalities there are several reception locations, for example an azc and a process reception centre. Most reception locations are regular asylum seekers' centres." name="description"/>
  <meta content="Drupal 7 (https://www.drupal.org)" name="generator"/>
  <link href="https://www.coa.nl/en/search-location" rel="canonical"/>
  <link href="https://www.coa.nl/en/node/278" rel="shortlink"/>
  <title>
   Search location | www.coa.nl
  </title>
  <link href="https://www.coa.nl/sites/www.coa.nl/fi


##### Scrape the data

In [3]:
f = csv.writer(open("COA_Web.csv", "w"))
f.writerow([soup])

59853

##### Convert the data to dataframe

In [4]:
df_coa = pd.read_csv('coa_site_scrapped2.csv')
print(df_coa)

    Latitude  Longitude                                   Name
0    51.4948    3.59212                             Middelburg
1    51.4966    3.87917                                   Goes
2    52.0337    4.32979                               Rijswijk
3    52.1460    4.38730                              Wassenaar
4    52.1774    4.41329                                Katwijk
5    51.8850    4.56808                              Rotterdam
6    51.7612    4.62215                           s-Gravendeel
7    52.9321    4.75435  Den Helder Burgemeester Ritmeesterweg
8    52.3716    4.80231                Amsterdam - Willinklaan
9    52.6775    4.84204                          Heerhugowaard
10   52.3935    4.86152           Amsterdam - Transformatorweg
11   51.5346    4.90282                         Gilze en Rijen
12   51.5594    5.08258               Tilburg - Stationsstraat
13   52.0830    5.08572             Utrecht - Joseph Haydnlaan
14   51.5790    5.22727                             Ois

##### Plot the location information on a map

In [5]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from geopy.distance import geodesic

import folium # map rendering library
from folium import plugins
from folium.plugins import MarkerCluster
from folium.plugins import FastMarkerCluster

print('Geolocation, Plotting and Map Libraries imported.')

Geolocation, Plotting and Map Libraries imported.


In [6]:
#Let's try the geolocator:
address = 'Emmen'

geolocator = Nominatim(user_agent="AZC explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(location)
print('The geograpical coordinate of ',address,' are {}, {}.'.format(latitude, longitude))


Emmen, Drenthe, Nederland
The geograpical coordinate of  Emmen  are 52.788937, 6.8939001.


In [7]:
df_coa_map = folium.Map(location=[df_coa["Latitude"].mean(), df_coa["Longitude"].mean()], zoom_start=7, tiles='cartodbpositron')
mc = MarkerCluster()

for each in range(len(df_coa)):
    popup_info = folium.Popup(df_coa.Name[each], parse_html=True)
    mc.add_child(folium.Marker(location=[df_coa.Latitude[each], df_coa.Longitude[each]], popup=popup_info))

#print (df_coa["Latitude"].mean(), df_coa["Longitude"].mean())    
df_coa_map.add_child(mc)
df_coa_map

#### Current Location and Meeting Place

Now let's set up three groups in different locations. We specified their locations in a seperate csv. This data can be drawn randomly or manipulated later.

In [8]:
df_cur_loc = pd.read_csv('azc_current_location.csv')
print(df_cur_loc)

   Latitude  Longitude                Name
0   51.2908    5.62967   Budel-Cranendonck
1   52.1460    4.38730  Wassenaar-Duinrell


Now it's time to find a middle point distance-wise.

In [9]:
midpoint_lat=np.mean([df_cur_loc.iloc[0].Latitude,df_cur_loc.iloc[1].Latitude])
midpoint_long=np.mean([df_cur_loc.iloc[0].Longitude,df_cur_loc.iloc[1].Longitude])

print (midpoint_lat,midpoint_long)

51.7184 5.008485


Let's see the current points and the midpoint on the map:

In [10]:
cur_map = folium.Map(location=[midpoint_lat, midpoint_long],\
                     tiles='OpenStreetMap', zoom_start=8)
mc = MarkerCluster()

# current locations
for each in range(len(df_cur_loc)):
    popup_info = folium.Popup(df_cur_loc.Name[each], parse_html=True)
    mc.add_child(folium.Marker(location=[df_cur_loc.Latitude[each], df_cur_loc.Longitude[each]], popup=popup_info))

# add midpoint
popup_info = folium.Popup('Midpoint', parse_html=True)
mc.add_child(folium.Marker(location=[midpoint_lat, midpoint_long], popup=popup_info))
    
cur_map.add_child(mc)
cur_map

#### Foursquare Data to Overcome Transportation 

'Midpoint' is the point that we will scan around for a public transportation stop. Then after finding the the public transportation point, we can scan for an appropriate meeting point. We can specify the range up to 2 kms. 

So below we are looking for a *'bus'* or *'train'* category in foursquare near the midpoint.

Each function may be used multiple times so we define them seperately in this cell.

In [11]:
## Set up foursquare credientials

# foursquare credentials
CLIENT_ID = 'M2SDB00WE3N3ZGZSK2SW40QZGQ2BZ1BE1XV10S3NVWYMQLWJ' # Foursquare ID
CLIENT_SECRET = 'SO23NMFS4VAFE05F0K5PHTLBNC3CN5BEHPKCOBTE3KLUTKLU' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

# function to search for a station (public transformation) in vicinity of middle point
def search_station(lat = midpoint_lat, long = midpoint_long, radius = 10000, searchfor = 'station', LIMIT = 5):
    # define limit of number of venues returned by Foursquare API, max 50
    # define radius in m.    
    global CLIENT_ID,CLIENT_SECRET,VERSION

    Travel_Category='4d4b7105d754a06379d81259' # main category title in foursquare hierarchy
    train_station='4bf58dd8d48988d129951735' 
    tram_station='52f2ab2ebcbc57f1066b8b51'
    metro_station='4bf58dd8d48988d1fd931735'
    light_metro_station='4bf58dd8d48988d1fc931735'
    bus_station='52f2ab2ebcbc57f1066b8b4f'
    bus_terminal='4bf58dd8d48988d1fe931735'
    transportation_service='54541b70498ea6ccd0204bff'
    Category_id=Travel_Category
    url2 = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{} \
        &query={}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        long,
        searchfor, # we are looking for a station actually
        Category_id,
        radius, 
        LIMIT)
    results2 = requests.get(url2).json() 
    return results2

# function temp for error handling - prints the output of station list in a formatted way
def print_station_temp(output):
    LIMIT = 5
    jsum2 = output['response']['venues']
    for i in range(LIMIT):
        print(i,' : ',jsum2[i]['categories'][0]['name'],' - ',jsum2[0]['name'])
# function call for error handling
def print_station(output):
    try:
        print_station_temp(output)
    except:
        pass

# function to get selected station from output of station list    
def station_coordinates(output, number = 0):
    jsum2 = output['response']['venues']
    stat_lat = jsum2[number]['location']['lat']
    stat_long = jsum2[number]['location']['lng']
    print(jsum2[0]['categories'][0]['name'],'- lat:',stat_lat,', long:',stat_long)
    return [stat_lat, stat_long]

# function to search for a places (appropriate meeting points) in vicinity of selected station
def search_places(lat, long, radius = 1000, LIMIT = 50):
    # limit set to max=50 by default.
    # define in m.
    global CLIENT_ID,CLIENT_SECRET,VERSION

    url3 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{} \
    &radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    lat, 
    long,
    radius,
    LIMIT)
    results3 = requests.get(url3).json()
    return results3    

# function temp for error handling - prints the output of station list in a formatted way
def print_places_temp(output):
    jsum3 = output['response']['groups'][0]['items']
    for i in range(output['response']['totalResults']):
        print(i,' : ',jsum3[i]['venue']['categories'][0]['name'],' - ', jsum3[i]['venue']['name'])
# function call for error handling
def print_places(output):
    try:
        print_places_temp(output)
    except:
        pass

# function to estimate approximate distances between selected appropriate places and 
# print the route with the distances in a formatted way    
def find_route(output,routelist):
    routelist_cat=[]
    routelist_name=[]
    routelist_lat=[]
    routelist_long=[]
    routelist_dist=[]
    jsum3 = output['response']['groups'][0]['items']
    for each in routelist:
        curr_cat=jsum3[each]['venue']['categories'][0]['name']
        routelist_cat.append (curr_cat)
        curr_name=jsum3[each]['venue']['name']
        routelist_name.append (curr_name)
        curr_lat=jsum3[each]['venue']['location']['lat']
        routelist_lat.append (curr_lat)
        curr_long=jsum3[each]['venue']['location']['lng']
        routelist_long.append (curr_lat)
    
        if each > 1:
            routelist_dist.append(int(geodesic((curr_lat,\
                                            curr_long),\
                                           ((jsum3[each-1]['venue']['location']['lat']),\
                                            (jsum3[each-1]['venue']['location']['lng']))).meters))

    df_route=pd.DataFrame([routelist_cat, routelist_name, routelist_lat, routelist_long,routelist_dist])
    return df_route

# function to list names and parameters of above functions
def list_functions():
    print(' search_station(lat = midpoint_lat, long = midpoint_long, radius = 10000, searchfor = "station", LIMIT = 5)','\n',
           'print_station(output)','\n',
           'station_coordinates(output, number = 0)','\n',
           'search_places(lat, long, radius = 1000, LIMIT = 50)','\n',
           'print_places(output)','\n',
           'find_route(Route_list)')

In [12]:
out_stat=search_station()
out_stat

{'meta': {'code': 200, 'requestId': '5c5f6b2b4c1f6764d0e85038'},
 'response': {'venues': [{'id': '4bb705bf2f70c9b6a83b8630',
    'name': 'Station De Oost',
    'location': {'address': 'Ruigrijk',
     'crossStreet': 'Efteling',
     'lat': 51.6479959849122,
     'lng': 5.053964781164498,
     'labeledLatLngs': [{'label': 'display',
       'lat': 51.6479959849122,
       'lng': 5.053964781164498}],
     'distance': 8442,
     'postalCode': '5171 KW',
     'cc': 'NL',
     'city': 'Kaatsheuvel',
     'state': 'Noord-Brabant',
     'country': 'Nederland',
     'formattedAddress': ['Ruigrijk (Efteling)',
      '5171 KW Kaatsheuvel',
      'Nederland']},
    'categories': [{'id': '4bf58dd8d48988d129951735',
      'name': 'Train Station',
      'pluralName': 'Train Stations',
      'shortName': 'Train Station',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/trainstation_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1549757227',
    'hasPe

In [13]:
print_station(out_stat)

0  :  Train Station  -  Station De Oost
1  :  Taxi  -  Station De Oost
2  :  Bus Line  -  Station De Oost
3  :  Bus Station  -  Station De Oost
4  :  Bus Stop  -  Station De Oost


In [14]:
stat_coord = station_coordinates(out_stat, 0)

Train Station - lat: 51.6479959849122 , long: 5.053964781164498


#### Update Middle Point (With Nearest Station)

In [15]:
cur_map = folium.Map(location=[midpoint_lat, midpoint_long],\
                     tiles='OpenStreetMap', zoom_start=8)
mc = MarkerCluster()

# current locations
for each in range(len(df_cur_loc)):
    popup_info = folium.Popup(df_cur_loc.Name[each], parse_html=True)
    mc.add_child(folium.Marker(location=[df_cur_loc.Latitude[each], df_cur_loc.Longitude[each]], popup=popup_info))

# add midpoint
popup_info = folium.Popup('Midpoint', parse_html=True)
mc.add_child(folium.Marker(location=[midpoint_lat, midpoint_long], popup=popup_info))


# add the station close to midpoint
popup_info = folium.Popup('Station', parse_html=True)
mc.add_child(folium.Marker(location=[stat_coord[0], stat_coord[1]], popup=popup_info))
        
cur_map.add_child(mc)
cur_map


We defined the middle point with a transportation asset (train). Our new midpoint is the 'station' and we will explore nearby area within walking range (1 km) for appropriate venues. 

In [16]:
out_place = search_places(stat_coord[0],stat_coord[1])
out_place

{'meta': {'code': 200, 'requestId': '5c5f6b2b6a60712d32b256e5'},
 'response': {'headerLocation': 'Sprang-Capelle',
  'headerFullLocation': 'Sprang-Capelle',
  'headerLocationGranularity': 'city',
  'totalResults': 118,
  'suggestedBounds': {'ne': {'lat': 51.65699599391221,
    'lng': 5.068442354430305},
   'sw': {'lat': 51.63899597591219, 'lng': 5.039487207898691}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d31889798336dcb752219f0',
       'name': 'Joris en de Draak',
       'location': {'address': 'Ruigrijk',
        'crossStreet': 'Efteling',
        'lat': 51.64689183165041,
        'lng': 5.052646100521088,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.64689183165041,
          'lng': 5.052646100521088}],
        'distance': 152,
        

In [17]:
print_places(out_place)

0  :  Theme Park Ride / Attraction  -  Joris en de Draak
1  :  Theme Park Ride / Attraction  -  De Vliegende Hollander
2  :  Theme Park Ride / Attraction  -  Baron 1898
3  :  Theme Park Ride / Attraction  -  Python
4  :  Theme Park Ride / Attraction  -  Ruigrijk
5  :  Theme Park Ride / Attraction  -  D'Oude Tuffer
6  :  Theme Park Ride / Attraction  -  Halve Maen
7  :  Theme Park Ride / Attraction  -  Piraña
8  :  Theme Park  -  Efteling
9  :  Theme Park Ride / Attraction  -  Symbolica: Paleis der Fantasie
10  :  Theme Park  -  De Blauwe Reiger
11  :  Theme Park Ride / Attraction  -  Pagode
12  :  Train Station  -  Station De Oost
13  :  Theme Park Ride / Attraction  -  Vogel Rok
14  :  Creperie  -  Polles Keuken
15  :  Theme Park Ride / Attraction  -  Bob
16  :  Theme Park Ride / Attraction  -  Gondoletta
17  :  Theme Park Ride / Attraction  -  Sprookjesbos
18  :  Theme Park Ride / Attraction  -  Diorama
19  :  Theme Park Ride / Attraction  -  Aquanura
20  :  Theme Park Ride / Attract

Voila! looks like there are appropriate places for two families with children.

Lastly, let's find the approximate walking distance. 

No.12, No.30, No.43, No.44, No.25, No.35, No.27, No.12 - Looks like a good plan to me.

In [18]:
routelist=[12,30,43,44,25,27,12]
df_Route=find_route(out_place,routelist)
df_Route

Unnamed: 0,0,1,2,3,4,5,6
0,Train Station,Café,Playground,Pizza Place,History Museum,Food Truck,Train Station
1,Station De Oost,Wachtruimte 1e Klas,IJspaleis,'t Melkhuysje,Efteling Museum,De Eigenheymer,Station De Oost
2,51.648,51.648,51.6514,51.6481,51.6521,51.648,51.648
3,51.648,51.648,51.6514,51.6481,51.6521,51.648,51.648
4,365,518,239,368,169,485,365


In [19]:
df_Route = df_Route.T
df_Route.rename(columns={0:'Category',1:'Name',2:'Latitude',3:'Longitude',4:'Distance'}, inplace=True)

In [20]:
df_Route

Unnamed: 0,Category,Name,Latitude,Longitude,Distance
0,Train Station,Station De Oost,51.648,51.648,365
1,Café,Wachtruimte 1e Klas,51.648,51.648,518
2,Playground,IJspaleis,51.6514,51.6514,239
3,Pizza Place,'t Melkhuysje,51.6481,51.6481,368
4,History Museum,Efteling Museum,51.6521,51.6521,169
5,Food Truck,De Eigenheymer,51.648,51.648,485
6,Train Station,Station De Oost,51.648,51.648,365


Although the lat/long information and walking distances may differ slightly in this scale, approximate distances (index 4, in meter, distance to next stop) looks ok. 

The next weekend or so, a friend will probably use this information (and this route :)