-------------------------------------------------
-------------------------------------------------

# Coursera Capstone Project Notebook (Part 3)

[Link to Notebook (Part 1) of the project:](https://nbviewer.jupyter.org/gist/fy5std/1abce225f491d9471b80eca9edd8ae7c)

------------------------------------------------------

###     3. New Project - Where Do We Meet? WDWM

#### From Assignment

This capstone project will be graded by your peers. This capstone project is worth *70%* of your total grade. The project will be completed over the course of 2 weeks. Week 1 submissions will be worth *30%* whereas week 2 submissions will be worth 40% of your total grade.

##### Week4
For this week, you will required to submit the following:

1.   A description of the problem and a discussion of the background. **(15 marks)**

2.   A description of the data and how it will be used to solve the problem. **(15 marks)**

##### Week5
This week, you will continue working on your capstone project. Please remember by the end of this week, you will need to submit the following:

1.  A full report consisting of all of the following components. **(15 marks)**:
    * Introduction where you discuss the business problem and who would be interested in this project.
    * Data where you describe the data that will be used to solve the problem and the source of the data.
    * Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and what machine learnings were used and why.
    * Results section where you discuss the results.
    * Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
    * Conclusion section where you conclude the report.

2. A link to your Notebook on your Github repository pushed showing your code. **(15 marks)**

3. Your choice of a presentation or blogpost. **(10 marks)**

### Week 4

### Background

We all have families, friends and business contacts around the country. Last decade we started to meet online, but meeting in a common place is always preferred, and sometimes is essential. In that case, the procedure depends on a few things; 
   * How far are the groups (3 km, 15 miles etc.),
   * How is the transportation (by car, by public transportation etc.),
   * Do groups have special needs/preferences in a meeting place (children, restaurant, disabled people etc.),
   * Is there an appropriate place in between, within the transportation?

Generally, after a quick assessment, the meeting place is chosen between known places. If the middle point is an unknown place or if a change on the meeting place is intended, then some search on the internet will help to determine the place

In this frame, social interaction would be the main need to use this tool. Also, it would be used with business purposes such as finding a warehouse, store or maintenance place for stores in various places. So any individual user may be in a situation to use this tool. Specifically, we intend to specify the problem to a refugee (asylum seeker) group in Netherlands.

### Problem Description 

*The problem* we focus is about shortening the internet search time to filter appropriate places to meet.

To be more precise, if two or more groups around the country intends to find an appropriate meeting point (children playground, hotel, restaurant, sports bar etc.) in a city in-between current locations, a tool which takes the current locations and gives the suggestions would help a lot.

*The audience* in this project is AZC guests (refugees and asylum seekers). Since these people are new in the country, with no prior experience and lack of language, they generally struggle to find an appropriate place. Most of the guests don't even know their locations in the country (just know the name of AZC) and can not estimate a middle point to meet.

### Data

In the described frame, I would like to find a solution to the probelem described above. So in Netherlands , there are community centers for refugees, called AZC. These centers are located all over Netherlands and people may move from one to another. When one wants to meet with a friend from another AZC, since the refugees would not have a car, she generally tries to find an in-between city, an appropriate place to meet and travel there by public transportation.

So in this project, I will use the data from COA website (official AZC institution) to gather AZC locations. I will use foursquare data to find an appropriate place around intended meeting point.

### Methodology (Projection)

First, I will download the [webpage](https://www.coa.nl/en/search-location), and then scrape it to reach the exact latitude and longitudes of AZC's (reception centers). Then I will use some temporary information to locate the current locations of the groups. Then, with a few equations, I intend to find a middle point with a radius large enough to a public transportation. Then lastly, after finding the approximate meeting point, I would use the foursquare data to list the categories and alternative places to meet.

So let's move on.

### Real World Data

##### Import libraries and gather location data 

In [1]:
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes
from bs4 import BeautifulSoup
import csv
import requests

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

print('Basic Database, WebScrape, JSON Libraries imported.')

Basic Database, WebScrape, JSON Libraries imported.


In [2]:
source = requests.get('https://www.coa.nl/en/search-location').text
soup = BeautifulSoup(source, 'html5lib')
print(soup.prettify())

<!DOCTYPE html>
<html dir="ltr" lang="en">
 <head>
  <link href="http://www.w3.org/1999/xhtml/vocab" rel="profile"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="https://www.coa.nl/sites/www.coa.nl/themes/coa_bs/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
  <meta content="Approximately one-sixth of the Dutch municipalities now have a COA asylum centre. In some municipalities there are several reception locations, for example an azc and a process reception centre. Most reception locations are regular asylum seekers' centres." name="description"/>
  <meta content="Drupal 7 (https://www.drupal.org)" name="generator"/>
  <link href="https://www.coa.nl/en/search-location" rel="canonical"/>
  <link href="https://www.coa.nl/en/node/278" rel="shortlink"/>
  <title>
   Search location | www.coa.nl
  </title>
  <link href="https://www.coa.nl/sites/www.coa.nl/fi

This is the full wabpage, and there are location information inside, but since it was written in Java it's complicated to reach the location info with an automated process. I think it would be easier to scrape the information with a text editor using regex.

In [3]:
f = csv.writer(open("COA_Web.csv", "w"))
f.writerow([soup])

59853

I did some regex edits with Sublime Text on the csv file for faster process. Now the file contains lat-long of the AZC (Name).

##### Convert the data to dataframe

In [4]:
df_coa = pd.read_csv('coa_site_scrapped2.csv')
print(df_coa)

    Latitude  Longitude                                   Name
0    51.4948    3.59212                             Middelburg
1    51.4966    3.87917                                   Goes
2    52.0337    4.32979                               Rijswijk
3    52.1460    4.38730                              Wassenaar
4    52.1774    4.41329                                Katwijk
5    51.8850    4.56808                              Rotterdam
6    51.7612    4.62215                           s-Gravendeel
7    52.9321    4.75435  Den Helder Burgemeester Ritmeesterweg
8    52.3716    4.80231                Amsterdam - Willinklaan
9    52.6775    4.84204                          Heerhugowaard
10   52.3935    4.86152           Amsterdam - Transformatorweg
11   51.5346    4.90282                         Gilze en Rijen
12   51.5594    5.08258               Tilburg - Stationsstraat
13   52.0830    5.08572             Utrecht - Joseph Haydnlaan
14   51.5790    5.22727                             Ois

Our database for cuurent locations is ready to use. Now we can move on to plotting these locations.

##### Plot the location information on a map

In [5]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

from folium import plugins
from folium.plugins import MarkerCluster
from folium.plugins import FastMarkerCluster

print('Geolocation, Clustering, Plotting and Map Libraries imported.')

Geolocation, Clustering, Plotting and Map Libraries imported.


Let's try the geolocator:

In [6]:
address = 'Emmen'

geolocator = Nominatim(user_agent="AZC explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(location)
print('The geograpical coordinate of ',address,' are {}, {}.'.format(latitude, longitude))


Emmen, Drenthe, Nederland
The geograpical coordinate of  Emmen  are 52.788937, 6.8939001.


In [7]:
df_coa_map = folium.Map(location=[df_coa["Latitude"].mean(), df_coa["Longitude"].mean()], zoom_start=7)
mc = MarkerCluster()

for each in range(len(df_coa)):
    popup_info = folium.Popup(df_coa.Name[each], parse_html=True)
    mc.add_child(folium.Marker(location=[df_coa.Latitude[each], df_coa.Longitude[each]], popup=popup_info))

#print (df_coa["Latitude"].mean(), df_coa["Longitude"].mean())    
df_coa_map.add_child(mc)
df_coa_map

#### Current Location and Meeting Place

Now let's set up three groups in different locations. We specified their locations in a seperate csv. This data can be drawn randomly or manipulated later.

In [8]:
df_cur_loc = pd.read_csv('azc_current_location.csv')
print(df_cur_loc)

   Latitude  Longitude                Name
0   51.2908    5.62967   Budel-Cranendonck
1   52.1460    4.38730  Wassenaar-Duinrell


Now it's time to find a middle point distance-wise.

In [9]:
midpoint_lat=np.mean([df_cur_loc.iloc[0].Latitude,df_cur_loc.iloc[1].Latitude])
midpoint_long=np.mean([df_cur_loc.iloc[0].Longitude,df_cur_loc.iloc[1].Longitude])

print (midpoint_lat,midpoint_long)

51.7184 5.008485


Let's see the current points and the midpoint on the map:

In [10]:
cur_map = folium.Map(location=[midpoint_lat, midpoint_long],\
                     tiles='OpenStreetMap', zoom_start=8)
mc = MarkerCluster()

# current locations
for each in range(len(df_cur_loc)):
    popup_info = folium.Popup(df_cur_loc.Name[each], parse_html=True)
    mc.add_child(folium.Marker(location=[df_cur_loc.Latitude[each], df_cur_loc.Longitude[each]], popup=popup_info))

# add midpoint
popup_info = folium.Popup('Midpoint', parse_html=True)
mc.add_child(folium.Marker(location=[midpoint_lat, midpoint_long], popup=popup_info))
    
cur_map.add_child(mc)
cur_map

#### Foursquare Data to Overcome Transportation 

'Midpoint' is the point that we will scan around for a public transportation stop. Then after finding the the public transportation point, we can scan for an appropriate meeting point. We can specify the range up to 2 kms. 

So below we are looking for a *'bus'* or *'train'* category in foursquare near the midpoint.

In [18]:
## Set up foursquare credientials
CLIENT_ID = 'M2SDB00WE3N3ZGZSK2SW40QZGQ2BZ1BE1XV10S3NVWYMQLWJ' # Foursquare ID
CLIENT_SECRET = 'SO23NMFS4VAFE05F0K5PHTLBNC3CN5BEHPKCOBTE3KLUTKLU' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 5# limit of number of venues returned by Foursquare API, max 10
radius = 10000 # define in m.

neighborhood_name = 'midpoint_center'

Travel_Category='4d4b7105d754a06379d81259' # main category title in foursquare hierarchy
train_station='4bf58dd8d48988d129951735' 
tram_station='52f2ab2ebcbc57f1066b8b51'
metro_station='4bf58dd8d48988d1fd931735'
light_metro_station='4bf58dd8d48988d1fc931735'
bus_station='52f2ab2ebcbc57f1066b8b4f'
bus_terminal='4bf58dd8d48988d1fe931735'
transportation_service='54541b70498ea6ccd0204bff'

Category_id=Travel_Category
url2 = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{} \
    &query={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    midpoint_lat, 
    midpoint_long,
    'station', # we are looking for a station actually
    Category_id,
    radius, 
    LIMIT)
results2 = requests.get(url2).json()
results2

{'meta': {'code': 200, 'requestId': '5c5b73206a60717af388fc82'},
 'response': {'venues': [{'id': '4bb705bf2f70c9b6a83b8630',
    'name': 'Station De Oost',
    'location': {'address': 'Ruigrijk',
     'crossStreet': 'Efteling',
     'lat': 51.6479959849122,
     'lng': 5.053964781164498,
     'labeledLatLngs': [{'label': 'display',
       'lat': 51.6479959849122,
       'lng': 5.053964781164498}],
     'distance': 8442,
     'postalCode': '5171 KW',
     'cc': 'NL',
     'city': 'Kaatsheuvel',
     'state': 'Noord-Brabant',
     'country': 'Nederland',
     'formattedAddress': ['Ruigrijk (Efteling)',
      '5171 KW Kaatsheuvel',
      'Nederland']},
    'categories': [{'id': '4bf58dd8d48988d129951735',
      'name': 'Train Station',
      'pluralName': 'Train Stations',
      'shortName': 'Train Station',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/trainstation_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1549497120',
    'hasPe

In [19]:
jsum2 = results2['response']['venues']
for i in range(LIMIT):
    print(i,' : ',jsum2[i]['categories'][0]['name'],' - ',jsum2[0]['name'])

0  :  Train Station  -  Station De Oost
1  :  Taxi  -  Station De Oost
2  :  Bus Line  -  Station De Oost
3  :  Bus Stop  -  Station De Oost
4  :  Bus Station  -  Station De Oost


In [20]:
## There is a train station nearby. We can extract the coordinates.
stat_lat = jsum2[0]['location']['lat']
stat_long = jsum2[0]['location']['lng']
print(jsum2[0]['categories'][0]['name'],'- lat:',stat_lat,', long:',stat_long)

Train Station - lat: 51.6479959849122 , long: 5.053964781164498


#### Update Middle Point (With Nearest Station)

In [22]:
cur_map = folium.Map(location=[midpoint_lat, midpoint_long],\
                     tiles='OpenStreetMap', zoom_start=8)
mc = MarkerCluster()

# current locations
for each in range(len(df_cur_loc)):
    popup_info = folium.Popup(df_cur_loc.Name[each], parse_html=True)
    mc.add_child(folium.Marker(location=[df_cur_loc.Latitude[each], df_cur_loc.Longitude[each]], popup=popup_info))

# add midpoint
popup_info = folium.Popup('Midpoint', parse_html=True)
mc.add_child(folium.Marker(location=[midpoint_lat, midpoint_long], popup=popup_info))


# add the station close to midpoint
popup_info = folium.Popup('Station', parse_html=True)
mc.add_child(folium.Marker(location=[stat_lat, stat_long], popup=popup_info))
        
cur_map.add_child(mc)
cur_map


We defined the middle point with a transportation asset (train). Our new midpoint is the 'station' and we will explore nearby area within walking range (1 km) for appropriate venues. 

#### Explore Around the Midpoint for Appropriate Venues

In [25]:
## Set up foursquare credientials
CLIENT_ID = 'M2SDB00WE3N3ZGZSK2SW40QZGQ2BZ1BE1XV10S3NVWYMQLWJ' # Foursquare ID
CLIENT_SECRET = 'SO23NMFS4VAFE05F0K5PHTLBNC3CN5BEHPKCOBTE3KLUTKLU' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 50# limit of number of venues returned by Foursquare API, max 10
radius = 1000 # define in m.

url3 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    stat_lat, 
    stat_long,
    radius,
    LIMIT)
print(url3) 
results3 = requests.get(url3).json()
results3

https://api.foursquare.com/v2/venues/explore?&client_id=M2SDB00WE3N3ZGZSK2SW40QZGQ2BZ1BE1XV10S3NVWYMQLWJ&client_secret=SO23NMFS4VAFE05F0K5PHTLBNC3CN5BEHPKCOBTE3KLUTKLU&v=20180605&ll=51.6479959849122,5.053964781164498&radius=1000&limit=50


{'meta': {'code': 200, 'requestId': '5c5b7361dd5797192ba74af1'},
 'response': {'headerLocation': 'Sprang-Capelle',
  'headerFullLocation': 'Sprang-Capelle',
  'headerLocationGranularity': 'city',
  'totalResults': 117,
  'suggestedBounds': {'ne': {'lat': 51.65699599391221,
    'lng': 5.068442354430305},
   'sw': {'lat': 51.63899597591219, 'lng': 5.039487207898691}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d31889798336dcb752219f0',
       'name': 'Joris en de Draak',
       'location': {'address': 'Ruigrijk',
        'crossStreet': 'Efteling',
        'lat': 51.64689183165041,
        'lng': 5.052646100521088,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.64689183165041,
          'lng': 5.052646100521088}],
        'distance': 152,
        

In [26]:
jsum3 = results3['response']['groups'][0]['items']
for i in range(LIMIT):
    print(i,' : ',jsum3[i]['venue']['categories'][0]['name'],' - ', jsum3[i]['venue']['name'])

0  :  Theme Park Ride / Attraction  -  Joris en de Draak
1  :  Theme Park Ride / Attraction  -  De Vliegende Hollander
2  :  Theme Park Ride / Attraction  -  Baron 1898
3  :  Theme Park Ride / Attraction  -  Python
4  :  Theme Park Ride / Attraction  -  Ruigrijk
5  :  Theme Park Ride / Attraction  -  D'Oude Tuffer
6  :  Theme Park Ride / Attraction  -  Halve Maen
7  :  Theme Park Ride / Attraction  -  Piraña
8  :  Theme Park  -  Efteling
9  :  Theme Park Ride / Attraction  -  Symbolica: Paleis der Fantasie
10  :  Theme Park  -  De Blauwe Reiger
11  :  Theme Park Ride / Attraction  -  Pagode
12  :  Train Station  -  Station De Oost
13  :  Theme Park Ride / Attraction  -  Vogel Rok
14  :  Creperie  -  Polles Keuken
15  :  Theme Park Ride / Attraction  -  Bob
16  :  Theme Park Ride / Attraction  -  Gondoletta
17  :  Theme Park Ride / Attraction  -  Sprookjesbos
18  :  Theme Park Ride / Attraction  -  Diorama
19  :  Theme Park Ride / Attraction  -  Aquanura
20  :  Theme Park Ride / Attract

Voila! looks like there are appropriate places for two families with children.

Lastly, let's find the approximate walking distance. 

No.12, No.30, No.43, No.44, No.25, No.35, No.27, No.12 - Looks like a good plan to me.

In [27]:
from geopy.distance import geodesic

Route = [12,30,43,44,25,35,27,12]

routelist_cat=[]
routelist_name=[]
routelist_lat=[]
routelist_long=[]
routelist_dist=[]
for each in Route:
    curr_cat=jsum3[each]['venue']['categories'][0]['name']
    routelist_cat.append (curr_cat)
    curr_name=jsum3[each]['venue']['name']
    routelist_name.append (curr_name)
    curr_lat=jsum3[each]['venue']['location']['lat']
    routelist_lat.append (curr_lat)
    curr_long=jsum3[each]['venue']['location']['lng']
    routelist_long.append (curr_lat)
    
    if each > 1:
        routelist_dist.append(geodesic((curr_lat,\
                                        curr_long),\
                                       ((jsum3[each-1]['venue']['location']['lat']),\
                                        (jsum3[each-1]['venue']['location']['lng']))).meters)

df_route=pd.DataFrame([routelist_cat, routelist_name, routelist_lat, routelist_long,routelist_dist])

In [28]:
df_route

Unnamed: 0,0,1,2,3,4,5,6,7
0,Train Station,Café,Playground,Pizza Place,History Museum,Candy Store,Food Truck,Train Station
1,Station De Oost,Wachtruimte 1e Klas,IJspaleis,'t Melkhuysje,Efteling Museum,De Verleiding,De Eigenheymer,Station De Oost
2,51.648,51.648,51.6514,51.6481,51.6521,51.6483,51.648,51.648
3,51.648,51.648,51.6514,51.6481,51.6521,51.6483,51.648,51.648
4,365.624,518.681,239.93,368.58,169.976,633.999,485.871,365.624


Although the lat/long information and walking distances may differ slightly in this scale, approximate distances (index 4, in meter, distance to next stop) looks ok. 

The next weekend or so, a friend will probably use this information (and this route :)