## BUSINESS PROBLEM

### A tourist from India visits New York city and want to stay in a hotel which is close to Indian restaurants. He wants to make sure that he does not spend too much time in finding food he loves and spend more time in visiting other places in the city. The aim of this project is to find good Indian restaurants and hotel to make a beautiful stay-in experience for tourists.

## DATA DESCRIPTION

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0           conda-forge
    geopy:          

In [3]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


### Exploring the data

In [4]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [5]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

### Finding the neighborhoods and Borough of the City

In [6]:
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [8]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [9]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [10]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [12]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 40.7127281, -74.0060152.


In [13]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### Focussing on Centre of New York City which is Manhattan

In [14]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [15]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Manhattan are 40.7896239, -73.9598939.


In [16]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

### Using Foursquare API to find number of Hotels

In [17]:

CLIENT_ID = 'BYBNSPMWFLC1PJCFT5RGDXE45DBDIX3CRZPTH3CNTJEUDC05' # your Foursquare ID
CLIENT_SECRET = 'RXHCVJNVYBEA4OSMYIWMEONCRQXK1NLGADAI0PNUTYXTCP5Z' # your Foursquare Secret
VERSION = '20180604'
search_query = "Hotel"
radius = 10000
LIMIT = 50

# city details
address = 'Manhattan'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("Manhattan:\t",latitude, longitude)


Manhattan:	 40.7896239 -73.9598939


In [18]:
url= 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
venues = results['response']['venues']
# making a dataframe
dfv = json_normalize(venues)
dfv

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,4ad78cbff964a520140c21e3,1295 Madison Ave,US,New York,United States,92nd St,648,"[1295 Madison Ave (92nd St), New York, NY 1012...","[{'label': 'display', 'lat': 40.7847375, 'lng'...",40.784737,-73.955713,,10128.0,NY,Hotel Wales,v-1589859954,
1,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,4b9c6ac8f964a520276736e3,215 West 94th Street,US,New York,United States,at Broadway,1106,"[215 West 94th Street (at Broadway), New York,...","[{'label': 'display', 'lat': 40.7932977, 'lng'...",40.793298,-73.972092,,10025.0,NY,Days Inn Hotel New York City-Broadway,v-1589859954,
2,"[{'id': '4bf58dd8d48988d1ee931735', 'name': 'H...",,,,,,,False,4bf2fc262d629521cbe55f58,230 W 101st St,US,New York,United States,at Broadway,1247,"[230 W 101st St (at Broadway), New York, NY 10...","[{'label': 'display', 'lat': 40.79793166906806...",40.797932,-73.969834,,10025.0,NY,Broadway Hotel,v-1589859954,
3,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,4b1c3322f964a520210424e3,209 W 87th St,US,New York,United States,Broadway,1280,"[209 W 87th St (Broadway), New York, NY 10024,...","[{'label': 'display', 'lat': 40.7889054, 'lng'...",40.788905,-73.975054,,10024.0,NY,Belnord Hotel,v-1589859954,
4,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,4bc3a05adce4eee125af719d,244 W 99th St,US,New York,United States,,1194,"[244 W 99th St, New York, NY 10025, United Sta...","[{'label': 'display', 'lat': 40.79669018312864...",40.79669,-73.970555,,10025.0,NY,Hotel 99 Llc,v-1589859954,
5,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,4ad2b3d4f964a52083e220e3,45 W 81st St,US,New York,United States,at Columbus Ave,1399,"[45 W 81st St (at Columbus Ave), New York, NY ...","[{'label': 'display', 'lat': 40.78294941406415...",40.782949,-73.973964,,10024.0,NY,Excelsior Hotel NYC,v-1589859954,90484586.0
6,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,4b9eda5af964a520530637e3,306 W 94th St,US,New York,United States,,1360,"[306 W 94th St, New York, NY 10025, United Sta...","[{'label': 'display', 'lat': 40.79403677901636...",40.794037,-73.974951,,10025.0,NY,Hotel Alexander New York,v-1589859954,
7,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,4ac87dacf964a52038bc20e3,2130 Broadway,US,New York,United States,at 75th St.,2049,"[2130 Broadway (at 75th St.), New York, NY 100...","[{'label': 'display', 'lat': 40.78077753013436...",40.780778,-73.981218,,10023.0,NY,Hotel Beacon NYC,v-1589859954,32743091.0
8,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,4aa70ec8f964a520e44b20e3,2688 Broadway,US,New York,United States,at W 103rd St,1250,"[2688 Broadway (at W 103rd St), New York, NY 1...","[{'label': 'display', 'lat': 40.79887278717793...",40.798873,-73.968323,,10025.0,NY,Marrakech Hotel,v-1589859954,
9,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,,,,,False,3fd66200f964a520bee71ee3,358 W 58th St,US,New York,United States,at 9th Ave,3173,"[358 W 58th St (at 9th Ave), New York, NY 1001...","[{'label': 'display', 'lat': 40.76829069866834...",40.768291,-73.984868,,10019.0,NY,Hudson Hotel,v-1589859954,


### Data Wrangling to filter the data

In [19]:

# Now for some data-wrangling. Drop the columns I don't need. 
filtered_columns = ['name', 'categories'] + [col for col in dfv.columns if col.startswith('location.')] +  ['id']
dfv_filtered = dfv.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dfv_filtered['categories'] = dfv_filtered.apply(get_category_type, axis=1)
# clean column names by keeping only last term
dfv_filtered.columns = [column.split('.')[-1] for column in dfv_filtered.columns]

dfv_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Hotel Wales,Hotel,1295 Madison Ave,US,New York,United States,92nd St,648,"[1295 Madison Ave (92nd St), New York, NY 1012...","[{'label': 'display', 'lat': 40.7847375, 'lng'...",40.784737,-73.955713,,10128.0,NY,4ad78cbff964a520140c21e3
1,Days Inn Hotel New York City-Broadway,Hotel,215 West 94th Street,US,New York,United States,at Broadway,1106,"[215 West 94th Street (at Broadway), New York,...","[{'label': 'display', 'lat': 40.7932977, 'lng'...",40.793298,-73.972092,,10025.0,NY,4b9c6ac8f964a520276736e3
2,Broadway Hotel,Hostel,230 W 101st St,US,New York,United States,at Broadway,1247,"[230 W 101st St (at Broadway), New York, NY 10...","[{'label': 'display', 'lat': 40.79793166906806...",40.797932,-73.969834,,10025.0,NY,4bf2fc262d629521cbe55f58
3,Belnord Hotel,Hotel,209 W 87th St,US,New York,United States,Broadway,1280,"[209 W 87th St (Broadway), New York, NY 10024,...","[{'label': 'display', 'lat': 40.7889054, 'lng'...",40.788905,-73.975054,,10024.0,NY,4b1c3322f964a520210424e3
4,Hotel 99 Llc,Hotel,244 W 99th St,US,New York,United States,,1194,"[244 W 99th St, New York, NY 10025, United Sta...","[{'label': 'display', 'lat': 40.79669018312864...",40.79669,-73.970555,,10025.0,NY,4bc3a05adce4eee125af719d
5,Excelsior Hotel NYC,Hotel,45 W 81st St,US,New York,United States,at Columbus Ave,1399,"[45 W 81st St (at Columbus Ave), New York, NY ...","[{'label': 'display', 'lat': 40.78294941406415...",40.782949,-73.973964,,10024.0,NY,4ad2b3d4f964a52083e220e3
6,Hotel Alexander New York,Hotel,306 W 94th St,US,New York,United States,,1360,"[306 W 94th St, New York, NY 10025, United Sta...","[{'label': 'display', 'lat': 40.79403677901636...",40.794037,-73.974951,,10025.0,NY,4b9eda5af964a520530637e3
7,Hotel Beacon NYC,Hotel,2130 Broadway,US,New York,United States,at 75th St.,2049,"[2130 Broadway (at 75th St.), New York, NY 100...","[{'label': 'display', 'lat': 40.78077753013436...",40.780778,-73.981218,,10023.0,NY,4ac87dacf964a52038bc20e3
8,Marrakech Hotel,Hotel,2688 Broadway,US,New York,United States,at W 103rd St,1250,"[2688 Broadway (at W 103rd St), New York, NY 1...","[{'label': 'display', 'lat': 40.79887278717793...",40.798873,-73.968323,,10025.0,NY,4aa70ec8f964a520e44b20e3
9,Hudson Hotel,Hotel,358 W 58th St,US,New York,United States,at 9th Ave,3173,"[358 W 58th St (at 9th Ave), New York, NY 1001...","[{'label': 'display', 'lat': 40.76829069866834...",40.768291,-73.984868,,10019.0,NY,3fd66200f964a520bee71ee3


### Finding Indian Restaurants close to these hotels

In [22]:
# search details
radius = 10
search_query = "Indian Restaurant"

In [24]:
restaurants = []
for index, row in dfv_filtered.iterrows():
    url= 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}'.format(CLIENT_ID, CLIENT_SECRET, row['lat'], row['lng'], VERSION, search_query, radius)
    results = requests.get(url).json()
    venues = results['response']['venues']
    restaurants.append((row['name'],len(venues)))
restaurants.sort(key=lambda x:x[1], reverse=True)
print(restaurants)

[('Hotel Wales', 5), ('Renaissance New York Hotel 57', 3), ('The Algonquin Hotel, Autograph Collection', 3), ('Four Seasons Hotel', 3), ('The Park Lane Hotel', 2), ('Renaissance New York Times Square Hotel', 2), ('Broadway Hotel', 1), ('Belnord Hotel', 1), ('Hotel 99 Llc', 1), ('Marrakech Hotel', 1), ('The Plaza Hotel', 1), ('The Lucerne Hotel', 1), ('1 Hotel Central Park', 1), ('6 Columbus, a SIXTY Hotel', 1), ('The Premier Hotel New York', 1), ('Sanctuary Hotel New York', 1), ('Bentley Hotel', 1), ('The Roosevelt Hotel', 1), ('The Empire Hotel', 1), ('Baccarat Hotel', 1), ('The Empire Hotel Lobby Bar', 1), ('The Empire Hotel Rooftop', 1), ('Hotel Delmonico', 1), ('The Lexington Hotel, Autograph Collection', 1), ('Fifty Hotel & Suites by Affinia', 1), ('Days Inn Hotel New York City-Broadway', 0), ('Excelsior Hotel NYC', 0), ('Hotel Alexander New York', 0), ('Hotel Beacon NYC', 0), ('Hudson Hotel', 0), ('Swimming Pool @ ONE UN Plaza Hotel', 0), ('Trump International Hotel & Tower® New 

### Result

### As we can see that 'Hotel wales' has maximum number of Indian Restaurants within 10 mile radius. Thus Tourist can easily find out which hotel is best suited for their needs.