# Capstone Project - Day Tour Recommendation Service (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction-problem)
  * [Problem Statement](#problem)
* [Data Analysis](#data-analysis)
  * [Data Source and Considerations](#source-considerations)
  * [Feature Selection and Data Structure](#feature-datastructure)
* [Methodology](#methodology-details)
  * [Exploratory Analysis](#exploratory-analysis)
  * [K-Means Clustering](#k-means)
* [Results and Discussion](#results-discussion)
* [Conclusion](#conclusion-future)
  * [Future Possibilities](#future-possibilities)

## Introduction: Business Problem <a name="introduction-problem"></a>

We all love our weekends and holidays, and many of us like to make the most of them by exploring new places in a city based on our own interests. Wouldn’t it be nice if we could quickly get recommendations on things to do in a city based on our interests and preferences? The Day Tour Recommendation Service aims to recommend places you can visit in a day, near a given starting location and within a desired radius. The service takes your interests into account (food, sightseeing, outdoor activities, entertainment etc.) and recommends places you can visit on a given day.
The service then groups the places of interest and plots each group on a map for easier visualization and better-informed decisions.

**This service targets day trippers who want to explore a city during the daytime and be back by the evening.**


### Problem Statement <a name="problem"></a>

I am on an official trip to San Francisco to attend a conference, and I am staying at the JW Marriott. I have a day off and want to go on a day tour to explore SF. Being a foodie and a nature lover, I want to visit nature attractions and have some good food along the way. **Recommend places for me to visit in and around SF within a radius of 50 km of where I am staying.**

My Interests and Preferences:
* I am interested in visiting one of the following nature attractions:
  * Scenic lookout
  * Waterfall
  * Lake
  * Beach
* I am interested in trying one of these cuisines:
  * Indian
  * Italian
  * Mexican
  * Thai
  * Ethiopian
* I am staying at the JW Marriott; recommend places within 50 km of my hotel


In [None]:
import warnings
warnings.filterwarnings('ignore')

# input to the service
ll = '37.7883, -122.4105'
places_of_interest = ['Scenic lookout', 'Waterfall', 'Lake', 'Beach']
cuisines_of_interest = ['Indian restaurant', 'Italian restaurant', 'Mexican restaurant', 'Ethiopian restaurant']
radius = 50000 # 50 km, expressed in meters

## Data Analysis <a name="data-analysis"></a>

### Data Source and Considerations <a name="source-considerations"></a>

We will use the Foursquare API as our primary data source, relying on its "Get Venue Recommendations" endpoint (`/v2/venues/explore`) to fetch relevant data. The most important parameters we pass to the API are:
* ll --> the coordinates of the starting location (here 37.7883, -122.4105 for the JW Marriott)
* radius --> the maximum distance from the starting location, in meters (here 50,000 m, i.e. 50 km)
* categoryId or query --> a comma-separated list of category IDs of interest (food, outdoors etc.) or free text to search for among venues (Indian restaurant etc.)
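To make the parameter list concrete, here is a minimal sketch of how these parameters might be encoded into a request URL for the explore endpoint used in this notebook. `build_explore_url` is a hypothetical helper and the credentials are placeholders; the notebook itself assembles the URL with `str.format`.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# endpoint used by the notebook's getVenuesAsDataframe function
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, ll, radius, query, limit=50, offset=0):
    """Encode the explore parameters into a URL (hypothetical helper; credentials are placeholders)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': ll,            # "lat, lng" of the starting location
        'radius': radius,    # meters
        'query': query,      # free-text search, e.g. a cuisine
        'limit': limit,      # venues per page
        'offset': offset,    # index of the first venue on this page
    }
    return EXPLORE_URL + '?' + urlencode(params)

url = build_explore_url('MY_ID', 'MY_SECRET', '20180604', '37.7883, -122.4105', 50000, 'Indian restaurant')
print(url)

# decoding the query string gives back the original parameter values
print(parse_qs(urlparse(url).query)['query'])
```

`urlencode` takes care of escaping the space and comma in `ll` and the space in the query text.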

The following considerations apply when working with the Foursquare API:
* **Separate API calls for each cuisine of interest.** We make a separate call for each cuisine, because a combined request mixes the cuisines together and drastically reduces the number of venues returned for each one. The aim is to get as much relevant data as possible. For example, to fetch Indian restaurants near the JW Marriott we use the query parameter ('Indian restaurant') instead of the categoryId for food ('4d4b7105d754a06374d81259').
* **Separate API calls for each nature attraction of interest.** As with cuisines, we make a separate call for each nature attraction. For example, to fetch waterfalls near the JW Marriott we use the query parameter ('Waterfall') instead of the categoryId for outdoors ('4d4b7105d754a06377d81259').
* **Distance from the starting location and pagination.** Every call passes the 'radius' parameter so that only venues within the specified distance are returned. Because each response contains at most LIMIT venues, the code also pages through the results with the 'offset' parameter until all results have been fetched.
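The pagination step reduces to computing the list of `offset` values to request. Here is a minimal sketch. Like the notebook's loop, it assumes the first page (offset 0) is always requested, and that the `totalResults` value from that first response gives the total number of matches. `page_offsets` is a hypothetical helper name.

```python
def page_offsets(total_results, limit):
    """Offsets to request so that all `total_results` venues are fetched, `limit` per page.

    The first page (offset 0) is always requested, because its response is what
    reports totalResults in the first place.
    """
    return list(range(0, max(total_results, 1), limit))

# e.g. a query reporting 245 total results, fetched 50 at a time
print(page_offsets(245, 50))  # -> [0, 50, 100, 150, 200]
```

This matches the notebook's loop: one request per offset, with each request returning at most `limit` venues.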

### Feature Selection and Data Structure <a name="feature-datastructure"></a>

The table below lists the venue attributes we extract from each item in the response:

| Feature  | Path                                          | Explanation                                                             |
| -------- | --------------------------------------------- | ----------------------------------------------------------------------- |
| name     | response.groups.items.venue.name              | Name of the venue                                                       |
| lat      | response.groups.items.venue.location.lat      | Latitude of the venue; required for plotting venue on the map           |
| lng      | response.groups.items.venue.location.lng      | Longitude of the venue; required for plotting venue on the map          |
| distance | response.groups.items.venue.location.distance | Distance of the venue from starting point; required for grouping venues |
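The `distance` attribute is reported in meters from the `ll` starting point. As a sanity check, and assuming it is a straight-line (great-circle) distance, it can be approximated with the haversine formula. `haversine_m` and the 6,371 km mean Earth radius are our own choices, not part of the Foursquare response.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (an approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# JW Marriott (starting point) to "View of Alcatraz", which Foursquare reports as 2527 m away
d = haversine_m(37.7883, -122.4105, 37.811002, -122.410751)
print(round(d))
```

The result lands within a few meters of the reported value. The small gap is plausibly due to the starting coordinates being rounded to four decimals.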

We will store these attributes in a pandas DataFrame with the columns below. Note that we use one indicator (0/1) variable per venue type (scenic_lookout, waterfall etc.) instead of a single categorical variable (venue_type).

In [2]:
import pandas as pd
column_names = ["venue_name", "venue_lat", "venue_lng", "distance", "scenic_lookout", "waterfall", "lake", "beach", "indian_restaurant", "italian_restaurant", "mexican_restaurant", "ethiopian_restaurant"]

df_sample = pd.DataFrame(columns = column_names)
df_sample

|     | venue_name | venue_lat | venue_lng | distance | scenic_lookout | waterfall | lake | beach | indian_restaurant | italian_restaurant | mexican_restaurant | ethiopian_restaurant |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
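Indicator variables suit k-means, which measures Euclidean distances between numeric feature vectors. Encoding the venue type as a single integer (0, 1, 2, ...) would impose an arbitrary ordering on the types. One 0/1 column per type instead keeps every pair of distinct types equally far apart. Here is a small sketch with hypothetical rows, using pandas' `get_dummies` (the same function the notebook applies later):

```python
import pandas as pd

# a toy categorical column (hypothetical rows, for illustration only)
toy = pd.DataFrame({'venue_type': ['Beach', 'Lake', 'Beach', 'Indian restaurant']})

# one 0/1 column per category; dtype=int keeps 0/1 rather than True/False in newer pandas
indicators = pd.get_dummies(toy['venue_type'], dtype=int)
print(indicators)
```

Each row has exactly one 1, so any two venues of different types differ in exactly two columns.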


## Methodology <a name="methodology-details"></a>

### Exploratory Analysis <a name="exploratory-analysis"></a>

In [3]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
from pandas import json_normalize # transforms JSON into a pandas DataFrame (the old pandas.io.json import path is deprecated)
import folium # plotting library
import matplotlib.cm as cm # colormaps for marker colors
import matplotlib.colors as colors # converts RGBA colors to hex strings

In [4]:
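import os

# Foursquare credentials, read from environment variables so they are not published with the notebook
# (the variable names are illustrative; any mechanism that keeps secrets out of the code will do)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret') # your Foursquare Secret
VERSION = '20180604' # API version date, sent as the v parameter
LIMIT = 50 # maximum number of venues requested per page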

In [5]:
def getVenuesAsDataframe(query):
    """Return every venue matching `query` within `radius` meters of `ll`, following pagination."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': ll,
        'radius': radius,
        'query': query,
        'limit': LIMIT,
        'offset': 0,
    }
    pages = []
    total_results = None  # unknown until the first response arrives

    while total_results is None or params['offset'] < total_results:
        response = requests.get(url, params=params)  # requests URL-encodes the parameters
        response.raise_for_status()  # stop on authentication or quota errors
        body = response.json()['response']
        total_results = body['totalResults']
        pages.append(json_normalize(body['groups'][0]['items']))
        params['offset'] += LIMIT  # move on to the next page

    # DataFrame.append was removed in pandas 2.0; concatenate the pages instead
    return pd.concat(pages, ignore_index=True)

In [6]:
columns = ['venue.name', 'venue.location.lat', 'venue.location.lng', 'venue.location.distance']
frames = []

# fetch places of interest first, then cuisines; tag each row with the query that produced it
for venue_type in places_of_interest + cuisines_of_interest:
    df_type = getVenuesAsDataframe(venue_type)[columns].copy()  # .copy() avoids SettingWithCopyWarning
    df_type['venue_type'] = venue_type
    frames.append(df_type)

df = pd.concat(frames)  # each query keeps its own 0..n index, as in the output below
df

|     | venue.name | venue.location.lat | venue.location.lng | venue.location.distance | venue_type |
| --- | --- | --- | --- | --- | --- |
| 0 | View of Alcatraz | 37.811002 | -122.410751 | 2527 | Scenic lookout |
| 1 | Embarcadero Public Promenade | 37.796622 | -122.395442 | 1616 | Scenic lookout |
| 2 | Lyon Street Steps | 37.793544 | -122.446559 | 3225 | Scenic lookout |
| 3 | The Bay Lights | 37.790707 | -122.386166 | 2157 | Scenic lookout |
| 4 | Montgomery & Green | 37.800176 | -122.404282 | 1430 | Scenic lookout |
| ... | ... | ... | ... | ... | ... |
| 37 | Dareye Hide a Way Ethiopian Restaurant | 37.850813 | -122.260247 | 14933 | Ethiopian restaurant |
| 38 | Taste of Ethiopia | 37.927085 | -122.319651 | 17390 | Ethiopian restaurant |
| 39 | Dallaq Market & Cafe | 37.801771 | -122.275480 | 11971 | Ethiopian restaurant |
| 40 | Enssaro Ethiopian Restaurant | 37.864991 | -122.121678 | 26791 | Ethiopian restaurant |


In [7]:
df_dummies = pd.get_dummies(df['venue_type'], dtype=int) # one 0/1 column per venue type (int rather than bool in newer pandas)
df_final = pd.concat([df, df_dummies], axis=1)
df_final.drop(['venue_type'], axis = 1, inplace=True)
df_final.rename(columns = {'venue.name':'Venue_name', 'venue.location.lat':'Venue_lat', 'venue.location.lng':'Venue_lng', 'venue.location.distance': 'Distance', 'Scenic lookout':'Scenic_lookout', 'Indian restaurant':'Indian_restaurant', 'Italian restaurant':'Italian_restaurant', 'Mexican restaurant':'Mexican_restaurant', 'Ethiopian restaurant':'Ethiopian_restaurant'}, inplace = True)
df_final

|     | Venue_name | Venue_lat | Venue_lng | Distance | Beach | Ethiopian_restaurant | Indian_restaurant | Italian_restaurant | Lake | Mexican_restaurant | Scenic_lookout | Waterfall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | View of Alcatraz | 37.811002 | -122.410751 | 2527 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | Embarcadero Public Promenade | 37.796622 | -122.395442 | 1616 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | Lyon Street Steps | 37.793544 | -122.446559 | 3225 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | The Bay Lights | 37.790707 | -122.386166 | 2157 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | Montgomery & Green | 37.800176 | -122.404282 | 1430 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 37 | Dareye Hide a Way Ethiopian Restaurant | 37.850813 | -122.260247 | 14933 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 38 | Taste of Ethiopia | 37.927085 | -122.319651 | 17390 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 39 | Dallaq Market & Cafe | 37.801771 | -122.275480 | 11971 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 40 | Enssaro Ethiopian Restaurant | 37.864991 | -122.121678 | 26791 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |


Now let's analyze where these points are located and which points of interest are nearest to and farthest from the JW Marriott.

In [8]:
# .sum() counts the 1s in each indicator column (and gives 0 rather than a KeyError when a type is absent)
print("\n--- Places of interest in numbers ---")
print(f"\tBeaches: {df_final['Beach'].sum()} \tLakes: {df_final['Lake'].sum()}")
print(f"\tScenic Lookouts: {df_final['Scenic_lookout'].sum()} \tWaterfalls: {df_final['Waterfall'].sum()}")

print("\n--- Cuisines of interest in numbers ---")
print(f"\tEthiopian Restaurants: {df_final['Ethiopian_restaurant'].sum()} \tIndian Restaurants: {df_final['Indian_restaurant'].sum()}")
print(f"\tItalian Restaurants: {df_final['Italian_restaurant'].sum()} \tMexican Restaurants: {df_final['Mexican_restaurant'].sum()}")


--- Places of interest in numbers ---
	Beaches: 179 	Lakes: 115
	Scenic Lookouts: 212 	Waterfalls: 18

--- Cuisines of interest in numbers ---
	Ethiopian Restaurants: 42 	Indian Restaurants: 245
	Italian Restaurants: 276 	Mexican Restaurants: 248


In [9]:
# indicator column -> label used in the printout
venue_labels = {
    'Beach': 'Beaches', 'Lake': 'Lakes', 'Scenic_lookout': 'Scenic lookouts', 'Waterfall': 'Waterfalls',
    'Ethiopian_restaurant': 'Ethiopian Restaurants', 'Indian_restaurant': 'Indian Restaurants',
    'Italian_restaurant': 'Italian Restaurants', 'Mexican_restaurant': 'Mexican Restaurants',
}

for column, label in venue_labels.items():
    distances = df_final.loc[df_final[column] == 1, 'Distance']  # meters from the starting location
    print(f"--- {label} in numbers ---")
    print(f"\tTotal: {distances.count()} \t Min_Dist_Kms: {distances.min()/1000} \t Max_Dist_Kms: {distances.max()/1000} \t 50%_Kms: {distances.quantile(.5)/1000}")


--- Beaches in numbers ---
	Total: 179 	 Min_Dist_Kms: 1.055 	 Max_Dist_Kms: 48.516 	 50%_Kms: 13.667
--- Lakes in numbers ---
	Total: 115 	 Min_Dist_Kms: 3.596 	 Max_Dist_Kms: 49.675 	 50%_Kms: 15.969
--- Scenic lookouts in numbers ---
	Total: 212 	 Min_Dist_Kms: 0.586 	 Max_Dist_Kms: 49.386 	 50%_Kms: 8.0525
--- Waterfalls in numbers ---
	Total: 18 	 Min_Dist_Kms: 0.8 	 Max_Dist_Kms: 42.405 	 50%_Kms: 17.1065
--- Ethiopian Restaurants in numbers ---
	Total: 42 	 Min_Dist_Kms: 0.511 	 Max_Dist_Kms: 46.434 	 50%_Kms: 13.374
--- Indian Restaurants in numbers ---
	Total: 245 	 Min_Dist_Kms: 0.283 	 Max_Dist_Kms: 49.72 	 50%_Kms: 17.905
--- Italian Restaurants in numbers ---
	Total: 276 	 Min_Dist_Kms: 0.131 	 Max_Dist_Kms: 49.183 	 50%_Kms: 13.8165
--- Mexican Restaurants in numbers ---
	Total: 248 	 Min_Dist_Kms: 0.115 	 Max_Dist_Kms: 49.126 	 50%_Kms: 9.644


As the data above shows, the JW Marriott has plenty of attractions around it: **at least one venue of every type of interest lies within 3.6 km** (the nearest lake, at 3.596 km, is the farthest of the per-type minimum distances). There are also plenty of options for each cuisine of interest, with Italian the most widely available (276 venues) and Ethiopian the least (42).
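The 3.6 km figure comes from two steps. First, take each venue type's distance to its nearest venue (the Min_Dist_Kms values above). Then take the largest of those minima. Here is a minimal sketch of that arithmetic, with the minima copied from the printout above:

```python
# nearest-venue distance (km) per venue type, copied from the printout above
nearest_km = {
    'Beach': 1.055, 'Lake': 3.596, 'Scenic lookout': 0.586, 'Waterfall': 0.8,
    'Ethiopian restaurant': 0.511, 'Indian restaurant': 0.283,
    'Italian restaurant': 0.131, 'Mexican restaurant': 0.115,
}

# radius needed to reach at least one venue of every type = the largest per-type minimum
farthest_type = max(nearest_km, key=nearest_km.get)
radius_km = nearest_km[farthest_type]
print(farthest_type, radius_km)  # -> Lake 3.596
```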

In [10]:
# map each venue type to a color index
venue_types = places_of_interest + cuisines_of_interest
venue_dict = {venue_type: i for i, venue_type in enumerate(venue_types)}

# SF
latitude = 37.773972
longitude = -122.431297
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one evenly spaced rainbow color per venue type
colors_array = cm.rainbow(np.linspace(0, 1, len(venue_types)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per venue, colored by venue type
for lat, lon, poi, venue_type in zip(df['venue.location.lat'], df['venue.location.lng'], df['venue.name'], df['venue_type']):
    label = folium.Popup(str(poi) + ' (' + venue_type + ')', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[venue_dict[venue_type]],
        fill=True,
        fill_color=rainbow[venue_dict[venue_type]],
        fill_opacity=0.7
    ).add_to(map_clusters)
    
# add marker for the starting location (JW Marriott)
folium.CircleMarker(
        [37.7883, -122.4105],
        radius=5,
        popup='JW Marriott, SF',
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7).add_to(map_clusters)

# legend: one colored dot per venue type, plus the starting location in black
legend_start_html = '<br>&nbsp; <i class="fa fa-circle" style="color:black"></i> &nbsp; JW Marriott<br>'
legend_venue_html = ''
for venue in venue_dict:
    legend_venue_html = legend_venue_html + '&nbsp; <i class="fa fa-circle" style="color:' +rainbow[venue_dict[venue]]+ '"></i> &nbsp; ' +venue+ '<br>'

legend_html = '<div style="position: fixed; bottom: 50px; left: 50px; width: 150px; border:2px solid grey; z-index:9999; font-size:11px;">' +legend_venue_html+ legend_start_html+ '</div>'
map_clusters.get_root().html.add_child(folium.Element(legend_html))

map_clusters

As we can see, the map confirms the numbers discussed above. **The JW Marriott is immediately surrounded by restaurants of many cuisines**, and farther out there are plenty of natural attractions such as beaches, lakes and scenic lookouts. By comparison, **there are few waterfalls nearby, and one needs to travel to neighboring cities to see most of them** (the median waterfall is about 17 km away). The immediate question is whether we can group these points by their distance from the starting location, and whether each group offers the user a good mix of points of interest to choose from. Let's look at clustering.

### K-Means Clustering <a name="k-means"></a>

In [11]:
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [12]:
kclusters = 5

df_final_clustering = df_final.drop('Venue_name', axis=1)

# run k-means clustering
kmeans = KMeans(init="k-means++", n_clusters=kclusters, n_init=12, random_state=1).fit(df_final_clustering)
cluster_labels = kmeans.labels_

# add clustering labels
df_final.insert(0, 'Cluster_label', cluster_labels)
df_final

|     | Cluster_label | Venue_name | Venue_lat | Venue_lng | Distance | Beach | Ethiopian_restaurant | Indian_restaurant | Italian_restaurant | Lake | Mexican_restaurant | Scenic_lookout | Waterfall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | View of Alcatraz | 37.811002 | -122.410751 | 2527 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 1 | Embarcadero Public Promenade | 37.796622 | -122.395442 | 1616 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 1 | Lyon Street Steps | 37.793544 | -122.446559 | 3225 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 1 | The Bay Lights | 37.790707 | -122.386166 | 2157 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 1 | Montgomery & Green | 37.800176 | -122.404282 | 1430 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 37 | 2 | Dareye Hide a Way Ethiopian Restaurant | 37.850813 | -122.260247 | 14933 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 38 | 2 | Taste of Ethiopia | 37.927085 | -122.319651 | 17390 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 39 | 2 | Dallaq Market & Cafe | 37.801771 | -122.275480 | 11971 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 40 | 3 | Enssaro Ethiopian Restaurant | 37.864991 | -122.121678 | 26791 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |


In [13]:
# SF
latitude = 37.773972
longitude = -122.431297
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=8)

# set color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(df_final['Venue_lat'], df_final['Venue_lng'], df_final['Venue_name'], df_final['Cluster_label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

# add marker for the starting location (JW Marriott)
folium.CircleMarker(
        [37.7883, -122.4105],
        radius=5,
        popup='JW Marriott, SF',
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7).add_to(map_clusters)

# legend: one colored dot per cluster, plus the starting location in black
legend_start_html = '<br>&nbsp; <i class="fa fa-circle" style="color:black"></i> &nbsp; JW Marriott<br>'
legend_venue_html = ''
for i in range(kclusters):
    legend_venue_html = legend_venue_html + '&nbsp; <i class="fa fa-circle" style="color:' +rainbow[i]+ '"></i> &nbsp; Cluster ' +str(i)+ '<br>'

legend_html = '<div style="position: fixed; bottom: 50px; left: 50px; width: 150px; border:2px solid grey; z-index:9999; font-size:11px;">' +legend_venue_html+ legend_start_html+ '</div>'
map_clusters.get_root().html.add_child(folium.Element(legend_html))

map_clusters

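As we can see, the map shows a clear **pattern of concentric rings centered on the starting location (black marker)**. This is expected. The features passed to k-means are unscaled, and the Distance column (in meters, up to about 50,000) is orders of magnitude larger than the latitude/longitude differences and the 0/1 indicator columns. Distance therefore dominates the Euclidean distance, and the five clusters are effectively distance bands. Now let's see whether each of these clusters contains a good mix of points of interest.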
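To see why the clusters come out as distance bands, compare two rows of `df_final` as k-means sees them: with `Venue_name` dropped and before `Cluster_label` is inserted. The sketch below uses values copied from the table above and measures how much of their squared Euclidean distance comes from the `Distance` column alone:

```python
# Two rows copied from the df_final output above, in the column order passed to KMeans:
# Venue_lat, Venue_lng, Distance, Beach, Ethiopian_restaurant, Indian_restaurant,
# Italian_restaurant, Lake, Mexican_restaurant, Scenic_lookout, Waterfall
view_of_alcatraz = [37.811002, -122.410751, 2527, 0, 0, 0, 0, 0, 0, 1, 0]
dallaq_market = [37.801771, -122.275480, 11971, 0, 1, 0, 0, 0, 0, 0, 0]

# per-feature contribution to the squared Euclidean distance that k-means works with
contributions = [(a - b) ** 2 for a, b in zip(view_of_alcatraz, dallaq_market)]
distance_share = contributions[2] / sum(contributions)
print(f"share of squared distance from the Distance column: {distance_share:.6f}")
```

Even though these two venues differ in type (a scenic lookout versus an Ethiopian restaurant), the indicator columns contribute only 2 to a total of roughly 89 million. Scaling the features before clustering (for example with scikit-learn's `StandardScaler`) would change this balance, but that would be a different model from the one used here.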

## Results and Discussion <a name="results-discussion"></a>

#### Cluster 0 Analysis

In [19]:
df_final.sort_values("Distance", axis = 0, ascending = True, inplace = True) 
cluster = df_final[df_final['Cluster_label'] == 0]

print("\n--- Cluster 0: PLACES OF INTEREST IN NUMBERS ---")
print(f"\tDistance range: {cluster['Distance'].min()/1000}Kms - {cluster['Distance'].max()/1000}Kms")
# .sum() counts the 1s and gives 0 when a type is absent (value_counts()[1] would raise a KeyError)
print(f"\tBeaches = {cluster['Beach'].sum()}, Lakes = {cluster['Lake'].sum()}, Scenic Lookouts = {cluster['Scenic_lookout'].sum()}, Waterfalls = {cluster['Waterfall'].sum()}")
print(f"\tEthiopian_restaurants = {cluster['Ethiopian_restaurant'].sum()}, Indian_restaurants = {cluster['Indian_restaurant'].sum()}, Italian_restaurants = {cluster['Italian_restaurant'].sum()}, Mexican_restaurants = {cluster['Mexican_restaurant'].sum()}")


--- Cluster 0: PLACES OF INTEREST IN NUMBERS ---
	Distance range: 30.083Kms - 39.998Kms
	Beaches = 27, Lakes = 14, Scenic Lookouts = 14, Waterfalls = 4
	Ethiopian_restaurants = 0, Indian_restaurants = 29, Italian_restaurants = 45, Mexican_restaurants = 23


#### Cluster 1 Analysis

In [20]:
cluster = df_final[df_final['Cluster_label'] == 1]

print("\n--- Cluster 1: PLACES OF INTEREST IN NUMBERS ---")
print(f"\tDistance range: {cluster['Distance'].min()/1000}Kms - {cluster['Distance'].max()/1000}Kms")
print(f"\tBeaches = {cluster['Beach'].sum()}, Lakes = {cluster['Lake'].sum()}, Scenic Lookouts = {cluster['Scenic_lookout'].sum()}, Waterfalls = {cluster['Waterfall'].sum()}")
print(f"\tEthiopian_restaurants = {cluster['Ethiopian_restaurant'].sum()}, Indian_restaurants = {cluster['Indian_restaurant'].sum()}, Italian_restaurants = {cluster['Italian_restaurant'].sum()}, Mexican_restaurants = {cluster['Mexican_restaurant'].sum()}")


--- Cluster 1: PLACES OF INTEREST IN NUMBERS ---
	Distance range: 0.115Kms - 8.556Kms
	Beaches = 47, Lakes = 30, Scenic Lookouts = 108, Waterfalls = 7
	Ethiopian_restaurants = 13, Indian_restaurants = 78, Italian_restaurants = 123, Mexican_restaurants = 122


#### Cluster 2 Analysis

In [21]:
cluster = df_final[df_final['Cluster_label'] == 2]

print("\n--- Cluster 2: PLACES OF INTEREST IN NUMBERS ---")
print(f"\tDistance range: {cluster['Distance'].min()/1000}Kms - {cluster['Distance'].max()/1000}Kms")
print(f"\tBeaches = {cluster['Beach'].sum()}, Lakes = {cluster['Lake'].sum()}, Scenic Lookouts = {cluster['Scenic_lookout'].sum()}, Waterfalls = {cluster['Waterfall'].sum()}")
print(f"\tEthiopian_restaurants = {cluster['Ethiopian_restaurant'].sum()}, Indian_restaurants = {cluster['Indian_restaurant'].sum()}, Italian_restaurants = {cluster['Italian_restaurant'].sum()}, Mexican_restaurants = {cluster['Mexican_restaurant'].sum()}")


--- Cluster 2: PLACES OF INTEREST IN NUMBERS ---
	Distance range: 8.575Kms - 18.984Kms
	Beaches = 60, Lakes = 33, Scenic Lookouts = 46, Waterfalls = 3
	Ethiopian_restaurants = 27, Indian_restaurants = 47, Italian_restaurants = 52, Mexican_restaurants = 61


#### Cluster 3 Analysis

In [22]:
cluster = df_final[df_final['Cluster_label'] == 3]

print("\n--- Cluster 3: PLACES OF INTEREST IN NUMBERS ---")
print(f"\tDistance range: {cluster['Distance'].min()/1000}Kms - {cluster['Distance'].max()/1000}Kms")
print(f"\tBeaches = {cluster['Beach'].sum()}, Lakes = {cluster['Lake'].sum()}, Scenic Lookouts = {cluster['Scenic_lookout'].sum()}, Waterfalls = {cluster['Waterfall'].sum()}")
print(f"\tEthiopian_restaurants = {cluster['Ethiopian_restaurant'].sum()}, Indian_restaurants = {cluster['Indian_restaurant'].sum()}, Italian_restaurants = {cluster['Italian_restaurant'].sum()}, Mexican_restaurants = {cluster['Mexican_restaurant'].sum()}")


--- Cluster 3: PLACES OF INTEREST IN NUMBERS ---
	Distance range: 19.091Kms - 29.417Kms
	Beaches = 37, Lakes = 18, Scenic Lookouts = 34, Waterfalls = 2
	Ethiopian_restaurants = 1, Indian_restaurants = 27, Italian_restaurants = 26, Mexican_restaurants = 18


#### Cluster 4 Analysis

In [23]:
cluster = df_final[df_final['Cluster_label'] == 4]

print("\n--- Cluster 4: PLACES OF INTEREST IN NUMBERS ---")
print(f"\tDistance range: {cluster['Distance'].min()/1000}Kms - {cluster['Distance'].max()/1000}Kms")
print(f"\tBeaches = {cluster['Beach'].sum()}, Lakes = {cluster['Lake'].sum()}, Scenic Lookouts = {cluster['Scenic_lookout'].sum()}, Waterfalls = {cluster['Waterfall'].sum()}")
print(f"\tEthiopian_restaurants = {cluster['Ethiopian_restaurant'].sum()}, Indian_restaurants = {cluster['Indian_restaurant'].sum()}, Italian_restaurants = {cluster['Italian_restaurant'].sum()}, Mexican_restaurants = {cluster['Mexican_restaurant'].sum()}")


--- Cluster 4: PLACES OF INTEREST IN NUMBERS ---
	Distance range: 40.456Kms - 49.72Kms
	Beaches = 8, Lakes = 20, Scenic Lookouts = 10, Waterfalls = 2
	Ethiopian_restaurants = 1, Indian_restaurants = 64, Italian_restaurants = 30, Mexican_restaurants = 24
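The five per-cluster cells above can also be condensed into a single `groupby`. Here is a minimal sketch, assuming `df_final` has the `Cluster_label`, `Distance` and indicator columns shown earlier. `cluster_summary` is a hypothetical helper, and the toy rows exist only to show the output shape:

```python
import pandas as pd

INDICATOR_COLUMNS = ['Scenic_lookout', 'Waterfall', 'Lake', 'Beach',
                     'Indian_restaurant', 'Italian_restaurant', 'Mexican_restaurant', 'Ethiopian_restaurant']

def cluster_summary(df):
    """One row per cluster: distance range in km plus the number of venues of each type."""
    grouped = df.groupby('Cluster_label')
    summary = grouped[INDICATOR_COLUMNS].sum()  # 0/1 indicators, so each sum is a count (0 if absent)
    summary.insert(0, 'Max_km', grouped['Distance'].max() / 1000)
    summary.insert(0, 'Min_km', grouped['Distance'].min() / 1000)
    return summary.sort_values('Min_km')  # nearest cluster first

# toy rows for illustration only (not the real data)
toy = pd.DataFrame({
    'Cluster_label': [1, 1, 0],
    'Distance': [115, 8556, 30083],
    'Scenic_lookout': [1, 0, 0], 'Waterfall': [0, 0, 0], 'Lake': [0, 0, 1], 'Beach': [0, 0, 0],
    'Indian_restaurant': [0, 1, 0], 'Italian_restaurant': [0, 0, 0],
    'Mexican_restaurant': [0, 0, 0], 'Ethiopian_restaurant': [0, 0, 0],
})
print(cluster_summary(toy))
```

On the real data, `cluster_summary(df_final)` should give the same counts as the printouts above, in a form that maps directly onto a summary table.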


#### Summary

| Cluster           | Distance (km) | Scenic Lookouts | Waterfalls | Lakes | Beaches | Indian Food | Italian Food | Mexican Food | Ethiopian Food |
| :---------------: |:------------: | :-----:| :---------:| :---: | :-----: | :---------: | :----------: | :----------: | :------------: |
| Cluster 1         | 0.1 - 8.6     |**108** |**7**       | 30    | 47      |**78**       |**123**       |**122**       | 13             |
| Cluster 2         | 8.6 - 19.0    | 46     | 3          |**33** |**60**   | 47          | 52           | 61           |**27**          |
| Cluster 3         | 19.1 - 29.4   | 34     | 2          | 18    | 37      | 27          | 26           | 18           | 1              |
| Cluster 0         | 30.1 - 40.0   | 14     | 4          | 14    | 27      | 29          | 45           | 23           | 0              |
| Cluster 4         | 40.5 - 49.7   | 10     | 2          | 20    | 8       | 64          | 30           | 24           | 1              |

As we can see, each group contains a reasonable **mix of natural attractions and restaurants of interest**. The venues are not spread evenly, though. Because the clusters are essentially distance bands, the nearest band (Cluster 1) holds by far the most venues. The main exception to the mix is Ethiopian restaurants: 40 of the 42 lie in Clusters 1 and 2, i.e. within about 19 km of the hotel. The summary above gives a clear picture of where these places and restaurants of interest are located, making it easy for the user to pick a cluster and go on a day tour. **Cluster 1 is clearly my choice for a day trip! What's yours?**

## Conclusion <a name="conclusion-future"></a>

In this exercise we set out to recommend places of interest for a day tour based on a set of preferences. Data science is about telling a compelling story with data using various tools (visualization, machine learning etc.) so that stakeholders can make a decision. **The effectiveness of a data science methodology can be judged by the clarity of the story it tells and the ease with which a decision can be made. Here, using exploratory analysis and clustering, we are able to present the data clearly and leave the user with an easy decision. As the summary table shows, Cluster 1 is the most straightforward choice from the JW Marriott for the given preferences**, for the following reasons:
1. It has an excellent mix of places and restaurants of interest
2. It is the nearest cluster to the JW Marriott; every venue in it lies within about 8.6 km of the hotel

Cluster 2 comes next. Users who prefer beaches may favor it (60 beaches versus 47 in Cluster 1), and it also has the most Ethiopian restaurants (27). Clusters 1 and 2 are the clear top choices.

#### Future Possibilities <a name="future-possibilities"></a>

The Day Tour Recommendation Service can easily be extended to **cover more places of interest**, and for restaurants, **price-based filtering** becomes a possibility too. It can be **made generic to support any city in the world** and has the potential to become a well-rounded service. **Integrating other location-based APIs** could also be explored to offer a wider array of preferences.

When it comes to usability, the service can be **integrated with a maps platform** (Google Maps etc.) to help users navigate to each attraction inside their chosen cluster. The service can be further expanded with **social collaboration features** for increased user engagement. **There is an opportunity to offer users a truly end-to-end product that they will love to use and recommend.**