# Description of the problem and a discussion of the background.

In this project, we imagine we are traveling to NYC for a week and want to do a vegetarian food tour around the city. Starting from the address of our hotel, we will find all the vegetarian restaurants available within a certain radius.
The goal is to develop an algorithm that calculates the distances between the restaurants found and then looks for the shortest route to visit them all by minimizing the traveling distance.

The geolocation and restaurant information will be obtained through the Foursquare API.

## Description of the data and how it will be used to solve the problem.

The first step will be to identify all the restaurants within a 10 km radius. Then, we will create a dataframe with the following fields: id, name, category, latitude, longitude, and distance to the hotel address. Once we have created that dataframe, we will filter the restaurants to keep only those categorized as Vegetarian / Vegan Restaurant and rated above 7.0. Then, we will sort the dataframe by distance in ascending order to find the restaurant closest to our initial location.

In the second step, we will create a function that implements the [Haversine formula](https://en.wikipedia.org/wiki/Haversine_formula) to calculate the distance between two coordinates. With this function, we will build a NumPy array (matrix) with the distances between every pair of restaurants. With this matrix, we will use a self-developed greedy heuristic (always moving to the nearest unvisited restaurant) for the [Traveling salesman problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem) applied to this project.

As a result, we will obtain a short route to enjoy the best-rated vegetarian / vegan restaurants around our initial location in NYC.

# Solution

## Import all libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


## Access geolocation data using Foursquare

### Input Foursquare credentials to access geolocation data

In [35]:
CLIENT_ID = 'DARJEFWAEDNBQS4DEPNE4ZX1H4TPYAQXGK4QLZFEBF5RWLIS' # your Foursquare ID
CLIENT_SECRET = 'WTAS5R0ZLENDIG4A25B2TE5ZQSCSB31SDCTX4YX0COFRDXRO' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DARJEFWAEDNBQS4DEPNE4ZX1H4TPYAQXGK4QLZFEBF5RWLIS
CLIENT_SECRET:WTAS5R0ZLENDIG4A25B2TE5ZQSCSB31SDCTX4YX0COFRDXRO


### Set an initial position by entering our hotel address

In [36]:
address = '102 North End Ave, New York, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.7151482 -74.0156573


### Search for Vegetarian / Vegan restaurants within a 10 km radius of the hotel using the Foursquare API

In [37]:
search_query = 'Vegetarian / Vegan Restaurant'
radius = 10000

In [38]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=DARJEFWAEDNBQS4DEPNE4ZX1H4TPYAQXGK4QLZFEBF5RWLIS&client_secret=WTAS5R0ZLENDIG4A25B2TE5ZQSCSB31SDCTX4YX0COFRDXRO&ll=40.7151482,-74.0156573&v=20180604&query=Vegetarian / Vegan Restaurant&radius=10000&limit=30'
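
The URL above is assembled with plain string formatting, so the spaces and the slash in the query are not percent-encoded and the credentials appear in the printed output. A minimal sketch of an alternative is shown below. It assumes the credentials are stored in environment variables called `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and it uses a helper named `build_search_url`; these names are hypothetical and are not part of the original notebook.

```python
import os
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(lat, lng, query, radius, limit, version='20180604'):
    """Return the venue search URL with every parameter percent-encoded."""
    params = {
        # hypothetical environment variable names; empty string if unset
        'client_id': os.environ.get('FOURSQUARE_CLIENT_ID', ''),
        'client_secret': os.environ.get('FOURSQUARE_CLIENT_SECRET', ''),
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)

url = build_search_url(40.7151482, -74.0156573,
                       'Vegetarian / Vegan Restaurant', 10000, 30)
```

The resulting string can be passed to `requests.get(url)` in the same way as the hand-built URL.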

#### We request the results and load them into a dataframe

In [39]:
results = requests.get(url).json()

In [40]:
venues = results['response']['venues']

dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,venuePage.id
0,4df3ef8722718759f821ba1a,Wildflower Vegan Pop-up Restaurant,"[{'id': '4bf58dd8d48988d1d3941735', 'name': 'V...",v-1591353577,False,171 E Broadway,at Rutgers St.,40.714021,-73.98983,"[{'label': 'display', 'lat': 40.71402088381144...",2182,10002,US,New York,NY,United States,"[171 E Broadway (at Rutgers St.), New York, NY...",,,,,,,
1,4d600366b6b9a1cd41227651,the vegetarian restaurant,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",v-1591353577,False,2101 Church Ave,Bedford Ave,40.648551,-73.975728,"[{'label': 'display', 'lat': 40.64855095475247...",8143,11226,US,Brooklyn,NY,United States,"[2101 Church Ave (Bedford Ave), Brooklyn, NY 1...",,,,,,,
2,459b830af964a5208b401fe3,Buddha Bodai 佛菩提素菜,"[{'id': '4bf58dd8d48988d1d3941735', 'name': 'V...",v-1591353577,False,5 Mott St,btwn Chatham Sq & Mosco St,40.713798,-73.998677,"[{'label': 'display', 'lat': 40.71379837914967...",1440,10013,US,New York,NY,United States,"[5 Mott St (btwn Chatham Sq & Mosco St), New Y...",64628.0,https://www.seamless.com/menu/the-original-bud...,seamless,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_seamless_20180129.png,151136124.0
3,5d8500e8903bdb000734baf4,Jisu Vegetarian,"[{'id': '4bf58dd8d48988d1d3941735', 'name': 'V...",v-1591353577,False,125 Canal St,Bowery,40.71605,-73.995348,"[{'label': 'display', 'lat': 40.71605, 'lng': ...",1716,10002,US,New York,NY,United States,"[125 Canal St (Bowery), New York, NY 10002, Un...",,,,,,,
4,4b2fadfdf964a520e6ed24e3,B & H Dairy,"[{'id': '4bf58dd8d48988d1d3941735', 'name': 'V...",v-1591353577,False,127 2nd Ave,btwn E 7th St & St. Marks Pl,40.728358,-73.987905,"[{'label': 'display', 'lat': 40.72835753887985...",2764,10003,US,New York,NY,United States,"[127 2nd Ave (btwn E 7th St & St. Marks Pl), N...",,,,,,,182795226.0


#### We filter the results and create a new dataframe containing the following information: id, name, category, latitude, longitude, distance to the hotel address

In [41]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['id','name', 'categories', 'location.lat','location.lng','location.distance']
dataframe_f = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_f['categories'] = dataframe_f.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_f.columns = [column.split('.')[-1] for column in dataframe_f.columns]

dataframe_f

Unnamed: 0,id,name,categories,lat,lng,distance
0,4df3ef8722718759f821ba1a,Wildflower Vegan Pop-up Restaurant,Vegetarian / Vegan Restaurant,40.714021,-73.98983,2182
1,4d600366b6b9a1cd41227651,the vegetarian restaurant,American Restaurant,40.648551,-73.975728,8143
2,459b830af964a5208b401fe3,Buddha Bodai 佛菩提素菜,Vegetarian / Vegan Restaurant,40.713798,-73.998677,1440
3,5d8500e8903bdb000734baf4,Jisu Vegetarian,Vegetarian / Vegan Restaurant,40.71605,-73.995348,1716
4,4b2fadfdf964a520e6ed24e3,B & H Dairy,Vegetarian / Vegan Restaurant,40.728358,-73.987905,2764
5,5e71cb23abf70b00087b749e,Vatan Indian Restaurant & Caterers in Jersey C...,Restaurant,40.736072,-74.067818,4978
6,586779ec07ac0762ed926424,Ginger Root Vegan Restaurant,Vegetarian / Vegan Restaurant,40.762475,-73.959583,7079
7,4a75edcff964a520b3e11fe3,Wild Ginger Vegetarian Kitchen,Asian Restaurant,40.720431,-73.99664,1708
8,4ba154a4f964a5204cad37e3,Ital Shak Vegetarian Restaurant,Vegetarian / Vegan Restaurant,40.672036,-73.950582,7293
9,51d1f204498ea8d2289826e5,Pongal Kosher South Indian Vegetarian Restaurant,Indian Restaurant,40.742376,-73.982739,4110


## Create a function to get the rating of restaurants and build the working dataframe

In [42]:
def get_rating(a):
    # look up the venue id in row a of dataframe_f and request its details
    venue_id = dataframe_f.iloc[a,0] 
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        value = result['response']['venue']['rating']
        print(value)
    except KeyError:
        # venues without a rating are marked with 0 and filtered out later
        print('This venue has not been rated yet.')
        value = 0
    return value
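
The lookup above mixes the HTTP request with the parsing of the response. As a hedged sketch, the parsing step could be isolated in a small helper so that it can be checked without calling the API. The name `extract_rating` and the sample payloads are hypothetical; the payload shape is assumed from the keys the notebook reads (`response`, `venue`, `rating`).

```python
def extract_rating(payload, default=0):
    """Return the rating stored in a venue-details response, or `default`."""
    venue = payload.get('response', {}).get('venue', {})
    return venue.get('rating', default)

# hand-written payloads that mimic the two cases seen in the notebook output
rated = {'response': {'venue': {'id': 'abc', 'rating': 8.6}}}
unrated = {'response': {'venue': {'id': 'xyz'}}}

print(extract_rating(rated))    # 8.6
print(extract_rating(unrated))  # 0
```

Keeping `0` as the default matches the sentinel value that the notebook later uses to exclude unrated restaurants.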

### We use the function to obtain the ratings of each restaurant from the selected dataframe

In [43]:
# request the rating of every restaurant in dataframe_f
lst = [get_rating(i) for i in range(len(dataframe_f))]
aux = pd.DataFrame(lst, columns = ['Rating']) 


7.3
This venue has not been rated yet.
7.7
This venue has not been rated yet.
8.6
This venue has not been rated yet.
7.9
8.4
7.3
5.6
5.8
6.9
8.5
8.2
8.4


#### We include the restaurant rating into the dataframe containing the information of the restaurants

In [44]:
result = pd.concat([dataframe_f.reset_index(drop=True),aux.reset_index(drop=True)], axis=1)

#### We exclude restaurants without ratings

In [45]:
result = result[(result.Rating!=0)]
result

Unnamed: 0,id,name,categories,lat,lng,distance,Rating
0,4df3ef8722718759f821ba1a,Wildflower Vegan Pop-up Restaurant,Vegetarian / Vegan Restaurant,40.714021,-73.98983,2182,7.3
2,459b830af964a5208b401fe3,Buddha Bodai 佛菩提素菜,Vegetarian / Vegan Restaurant,40.713798,-73.998677,1440,7.7
4,4b2fadfdf964a520e6ed24e3,B & H Dairy,Vegetarian / Vegan Restaurant,40.728358,-73.987905,2764,8.6
6,586779ec07ac0762ed926424,Ginger Root Vegan Restaurant,Vegetarian / Vegan Restaurant,40.762475,-73.959583,7079,7.9
7,4a75edcff964a520b3e11fe3,Wild Ginger Vegetarian Kitchen,Asian Restaurant,40.720431,-73.99664,1708,8.4
8,4ba154a4f964a5204cad37e3,Ital Shak Vegetarian Restaurant,Vegetarian / Vegan Restaurant,40.672036,-73.950582,7293,7.3
9,51d1f204498ea8d2289826e5,Pongal Kosher South Indian Vegetarian Restaurant,Indian Restaurant,40.742376,-73.982739,4110,5.6
10,5234b63a498e0ab4ca8a2422,Moly's Asian Restaurant,Asian Restaurant,40.693543,-73.929584,7651,5.8
11,4d6821f458ee8eec4ac46f27,Strictly Vegetarian,Vegetarian / Vegan Restaurant,40.650373,-73.956496,8771,6.9
12,4b4f6600f964a520cf0427e3,Red Bamboo,Vegetarian / Vegan Restaurant,40.731319,-74.000359,2214,8.5


#### We filter the dataframe to keep only restaurants categorized as Vegetarian / Vegan Restaurant

In [46]:
result2 = result[(result.categories=="Vegetarian / Vegan Restaurant")]
result2

Unnamed: 0,id,name,categories,lat,lng,distance,Rating
0,4df3ef8722718759f821ba1a,Wildflower Vegan Pop-up Restaurant,Vegetarian / Vegan Restaurant,40.714021,-73.98983,2182,7.3
2,459b830af964a5208b401fe3,Buddha Bodai 佛菩提素菜,Vegetarian / Vegan Restaurant,40.713798,-73.998677,1440,7.7
4,4b2fadfdf964a520e6ed24e3,B & H Dairy,Vegetarian / Vegan Restaurant,40.728358,-73.987905,2764,8.6
6,586779ec07ac0762ed926424,Ginger Root Vegan Restaurant,Vegetarian / Vegan Restaurant,40.762475,-73.959583,7079,7.9
8,4ba154a4f964a5204cad37e3,Ital Shak Vegetarian Restaurant,Vegetarian / Vegan Restaurant,40.672036,-73.950582,7293,7.3
11,4d6821f458ee8eec4ac46f27,Strictly Vegetarian,Vegetarian / Vegan Restaurant,40.650373,-73.956496,8771,6.9
12,4b4f6600f964a520cf0427e3,Red Bamboo,Vegetarian / Vegan Restaurant,40.731319,-74.000359,2214,8.5
13,4a272aeef964a52063911fe3,Blossom Restaurant,Vegetarian / Vegan Restaurant,40.745394,-74.002106,3555,8.2
14,4b02f734f964a520814b22e3,Franchia,Vegetarian / Vegan Restaurant,40.747549,-73.981124,4636,8.4


### We filter the dataframe to obtain only restaurants with ratings better than 7.0 points

In [47]:
df=result2[result2['Rating'] > 7.0]
df = df.reset_index(drop=True)
df

Unnamed: 0,id,name,categories,lat,lng,distance,Rating
0,4df3ef8722718759f821ba1a,Wildflower Vegan Pop-up Restaurant,Vegetarian / Vegan Restaurant,40.714021,-73.98983,2182,7.3
1,459b830af964a5208b401fe3,Buddha Bodai 佛菩提素菜,Vegetarian / Vegan Restaurant,40.713798,-73.998677,1440,7.7
2,4b2fadfdf964a520e6ed24e3,B & H Dairy,Vegetarian / Vegan Restaurant,40.728358,-73.987905,2764,8.6
3,586779ec07ac0762ed926424,Ginger Root Vegan Restaurant,Vegetarian / Vegan Restaurant,40.762475,-73.959583,7079,7.9
4,4ba154a4f964a5204cad37e3,Ital Shak Vegetarian Restaurant,Vegetarian / Vegan Restaurant,40.672036,-73.950582,7293,7.3
5,4b4f6600f964a520cf0427e3,Red Bamboo,Vegetarian / Vegan Restaurant,40.731319,-74.000359,2214,8.5
6,4a272aeef964a52063911fe3,Blossom Restaurant,Vegetarian / Vegan Restaurant,40.745394,-74.002106,3555,8.2
7,4b02f734f964a520814b22e3,Franchia,Vegetarian / Vegan Restaurant,40.747549,-73.981124,4636,8.4


# Use the obtained data to create the fastest route to visit all restaurants

### We prepare the dataframe by sorting it from the closest to the farthest restaurant from the initial point

To do this, we use the column distance, which gives the distance of each restaurant from our starting point (hotel address).

In [48]:
df2=df.sort_values(by=['distance'], ascending=True)
df2 = df2.reset_index(drop=True)
df2

Unnamed: 0,id,name,categories,lat,lng,distance,Rating
0,459b830af964a5208b401fe3,Buddha Bodai 佛菩提素菜,Vegetarian / Vegan Restaurant,40.713798,-73.998677,1440,7.7
1,4df3ef8722718759f821ba1a,Wildflower Vegan Pop-up Restaurant,Vegetarian / Vegan Restaurant,40.714021,-73.98983,2182,7.3
2,4b4f6600f964a520cf0427e3,Red Bamboo,Vegetarian / Vegan Restaurant,40.731319,-74.000359,2214,8.5
3,4b2fadfdf964a520e6ed24e3,B & H Dairy,Vegetarian / Vegan Restaurant,40.728358,-73.987905,2764,8.6
4,4a272aeef964a52063911fe3,Blossom Restaurant,Vegetarian / Vegan Restaurant,40.745394,-74.002106,3555,8.2
5,4b02f734f964a520814b22e3,Franchia,Vegetarian / Vegan Restaurant,40.747549,-73.981124,4636,8.4
6,586779ec07ac0762ed926424,Ginger Root Vegan Restaurant,Vegetarian / Vegan Restaurant,40.762475,-73.959583,7079,7.9
7,4ba154a4f964a5204cad37e3,Ital Shak Vegetarian Restaurant,Vegetarian / Vegan Restaurant,40.672036,-73.950582,7293,7.3


### We define a function to calculate the distance between two georeferenced points using the Haversine formula

In [50]:
def calculate_distance(a,b): # Haversine formula; a and b are row positions in df
    R = 6371e3 # mean Earth radius in meters
    lat_a= df.iloc[a,3]
    lat_b= df.iloc[b,3] 
    lon_a= df.iloc[a,4]
    lon_b= df.iloc[b,4]
    c1 = lat_a*np.pi/180
    c2 = lat_b*np.pi/180
    K1 = (lat_b-lat_a)*np.pi/180
    K2 = (lon_b-lon_a)*np.pi/180;

    d=2*R*np.arcsin(np.sqrt(np.sin(K1/2)*np.sin(K1/2)+np.cos(c1)*np.cos(c2)*np.sin(K2/2)*np.sin(K2/2)))
    return d
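
For reference, the expression evaluated in `calculate_distance` corresponds to the haversine formula. It is written here with $\varphi$ for latitude and $\lambda$ for longitude, both in radians, and $R$ for the Earth radius; these symbols are chosen for this note and do not appear in the code.

$$
d = 2R \arcsin\sqrt{\sin^2\!\left(\frac{\varphi_b-\varphi_a}{2}\right) + \cos\varphi_a\,\cos\varphi_b\,\sin^2\!\left(\frac{\lambda_b-\lambda_a}{2}\right)}
$$

With $R = 6371 \times 10^{3}$ m, the distance $d$ is obtained in meters.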



### We create a matrix using numpy with the distances of all possible connections between restaurants

In [51]:
matrix=np.zeros([len(df), len(df)], dtype=int)
for i in range(0, len(df)):
    for j in range(0,len(df)):
        matrix[i,j]=calculate_distance(i,j)
matrix

array([[    0,   746,  1602,  5960,  5722,  2118,  3638,  3799],
       [  746,     0,  1856,  6336,  6164,  1953,  3525,  4033],
       [ 1602,  1856,     0,  4481,  7008,  1099,  2240,  2209],
       [ 5960,  6336,  4481,     0, 10084,  4878,  4054,  2459],
       [ 5722,  6164,  7008, 10084,     0,  7814,  9241,  8782],
       [ 2118,  1953,  1099,  4878,  7814,     0,  1572,  2425],
       [ 3638,  3525,  2240,  4054,  9241,  1572,     0,  1783],
       [ 3799,  4033,  2209,  2459,  8782,  2425,  1783,     0]])
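
The nested loop above calls the distance function once per pair of restaurants. As a sketch under the same formula and Earth radius, the whole matrix could also be computed in one step with NumPy broadcasting. The function name `haversine_matrix` is hypothetical, and the coordinates in the example are those of the first two rows of `df`.

```python
import numpy as np

def haversine_matrix(lat_deg, lon_deg, radius_m=6371e3):
    """Pairwise great-circle distances in meters between all given points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # broadcasting a column against a row gives every pairwise difference
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(h))

# Wildflower Vegan Pop-up Restaurant and Buddha Bodai
D = haversine_matrix([40.714021, 40.713798], [-73.98983, -73.998677])
```

Calling `haversine_matrix(df['lat'], df['lng']).astype(int)` should give the same kind of integer matrix as the loop, since both truncate the distances in meters.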

## We create a function to approximately solve the Traveling Salesman Problem with a nearest-neighbor heuristic

In [60]:
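def get_route(matrix):
    # greedy nearest-neighbor heuristic: from the current restaurant,
    # always move to the closest restaurant that has not been visited yet
    matrix = matrix.copy() # work on a copy so the caller's matrix is not modified
    index=0
    I=[0]
    for i in range(0, len(matrix)-1):
        minimum=min([x for x in matrix[index,:] if x !=0])
        index = int(np.where(matrix[index,:]==minimum)[0][0])
        print(index)
        # zero out the distances from the new restaurant to the visited ones
        for j in range(0, len(I)):
            matrix[index,I[j]]=0
        I.append(index)
    return I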

### We use the previous function to find a short route to visit all restaurants

In [61]:
I = get_route(matrix) # keep the route, it is needed to build the final dataframe
I

1
2
5
6
7
3
4


[0, 1, 2, 5, 6, 7, 3, 4]
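
`get_route` always moves to the closest restaurant not yet visited, which is a greedy strategy and is not guaranteed to give the shortest possible path. With only eight restaurants, the result can be compared against an exhaustive search. The sketch below assumes an open path that starts at index 0 and does not return to it; the helper names `route_length` and `best_route_brute_force` are hypothetical, and the matrix is copied from the output above.

```python
from itertools import permutations

# distance matrix copied from the notebook output (meters)
M = [
    [0, 746, 1602, 5960, 5722, 2118, 3638, 3799],
    [746, 0, 1856, 6336, 6164, 1953, 3525, 4033],
    [1602, 1856, 0, 4481, 7008, 1099, 2240, 2209],
    [5960, 6336, 4481, 0, 10084, 4878, 4054, 2459],
    [5722, 6164, 7008, 10084, 0, 7814, 9241, 8782],
    [2118, 1953, 1099, 4878, 7814, 0, 1572, 2425],
    [3638, 3525, 2240, 4054, 9241, 1572, 0, 1783],
    [3799, 4033, 2209, 2459, 8782, 2425, 1783, 0],
]

def route_length(matrix, route):
    """Total length of an open path that visits the stops in the given order."""
    return sum(matrix[a][b] for a, b in zip(route, route[1:]))

def best_route_brute_force(matrix, start=0):
    """Try every ordering of the remaining stops and keep the shortest path."""
    others = [i for i in range(len(matrix)) if i != start]
    candidates = ([start] + list(p) for p in permutations(others))
    return min(candidates, key=lambda r: route_length(matrix, r))

greedy = [0, 1, 2, 5, 6, 7, 3, 4]
best = best_route_brute_force(M)
print(route_length(M, greedy), route_length(M, best), best)
```

The exhaustive search checks 7! = 5040 orderings, which is instant here but grows factorially, so it is only practical as a sanity check for a handful of stops.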

# Result

As a result, we obtained a list with the order in which to visit the restaurants so that the traveled distance is kept short. To finish, we create a dataframe with the restaurants sorted in visiting order, using the result of the routing algorithm.

In [74]:
# the route indices refer to rows of df, the dataframe used to build the distance matrix
List = [df.iloc[i, 1] for i in I]     # restaurant names in visiting order
Ratings = [df.iloc[i, 6] for i in I]  # matching ratings


In [75]:
Final = {'Restaurant': List,
        'Rating': Ratings
        }

df_Final = pd.DataFrame(Final, columns = ['Restaurant', 'Rating'])

The final list of restaurants in visiting order is: 

In [76]:
df_Final

Unnamed: 0,Restaurant,Rating
0,Wildflower Vegan Pop-up Restaurant,7.3
1,Buddha Bodai 佛菩提素菜,7.7
2,B & H Dairy,8.6
3,Red Bamboo,8.5
4,Blossom Restaurant,8.2
5,Franchia,8.4
6,Ginger Root Vegan Restaurant,7.9
7,Ital Shak Vegetarian Restaurant,7.3
