### Business Problem:

I used to travel to many places as a solo traveller. It has always been a cumbersome task to choose a hotel in an optimal
location, one that is close to all or most of the tourist places. An app for this would make it easier for any traveller to
book a hotel and cover the maximum number of tourist spots.

### Solution:

I am planning to build an app which, based on the traveller's location or the place they plan to stay, suggests the tourist
places, hotels, restaurants etc. available around the town. This is done using the Foursquare API. The app then picks an
optimal area and lists the hotels there, with their ratings and reviews, for the user to choose from.

### Data:

Foursquare API data will be used to extract venues, places, ratings etc., and the geopy geocoder to obtain location data.
Clustering based on different areas will be used to obtain local and global optima. The Folium library is used to visualize
the areas on the map. For this project, data for the city of Bangalore, India will be used.


In [3]:
import pandas as pd
import numpy as np

In [4]:
data = pd.read_csv("C:\\Users\\IBM_ADMIN\\Desktop\\Bangalore.csv")
data.head()

Unnamed: 0,key,area_name,state,latitude,longitude,accuracy
0,IN/560001,Rajbhavan,Karnataka,12.2667,76.6833,
1,IN/560002,Chamarajendrapeta,Karnataka,12.2667,76.6833,
2,IN/560003,Extension,Karnataka,12.2667,76.6833,4.0
3,IN/560004,Lalbagh West,Karnataka,12.2667,76.6833,
4,IN/560005,Fraser Town,Karnataka,12.2667,76.6833,


In [5]:
data.shape

(61, 6)

In [6]:
data.columns

Index(['key', 'area_name', 'state', 'latitude', 'longitude', 'accuracy'], dtype='object')

In [7]:
print(data['state'].value_counts())
print(data['accuracy'].value_counts())


Karnataka    61
Name: state, dtype: int64
4.0    4
Name: accuracy, dtype: int64


In [8]:
data.dtypes

key           object
area_name     object
state         object
latitude     float64
longitude    float64
accuracy     float64
dtype: object

In [9]:
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# libraries for displaying images
from IPython.display import Image, HTML

# transforming a JSON file into a pandas dataframe
# (top-level import needs pandas >= 1.0; on older versions use `from pandas.io.json import json_normalize`)
from pandas import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


### Using Foursquare API to obtain Bangalore location details

In [10]:
CLIENT_ID = 'VDY3CAVDXPAPUIDZ4YJMCKTQ23FGFEWM05V22YIFVWOFVQF3' # your Foursquare ID
CLIENT_SECRET = 'QZG53ZE55TESWDBPUJEACGTZQCWBD1QLAXLKF5A2X5JYYMXC' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

In [11]:
#Obtain Latitude and longitude values of Central Bangalore
address = 'Bangalore, Karnataka, India'

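geolocator = Nominatim(user_agent="bangalore_explorer") # geopy >= 2.0 requires a user_agent; this name is an arbitrary placeholder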
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

12.9791198 77.5912997


### Obtaining 'Tourist places' within a 5000 metre radius of Bangalore

In [12]:
search_query = 'tourist'
radius = 5000 # in metres
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
#url
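
The request URL above is assembled with `str.format`, which leaves special characters in the search term unescaped. As a minimal sketch of an alternative that uses only the standard library: `build_explore_url` is a hypothetical helper (it is not part of this notebook or of any Foursquare client), the credentials are placeholders, and the search term `tourist places` is only there to show the escaping.

```python
from urllib.parse import urlencode

# Same endpoint as the cell above
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Hypothetical helper: build the explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,  # metres
        "limit": limit,
    }
    # safe="," keeps the comma inside the ll parameter unescaped
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")

# Placeholder credentials; the coordinates are the ones geocoded for Bangalore above
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                        12.9791198, 77.5912997, "20180604", "tourist places", 5000, 30)
print(url)
```

Passing the same dictionary as `params=` to `requests.get` would have the same effect without building the string by hand.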

In [13]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c99da11351e3d4c798de065'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bangalore',
  'headerFullLocation': 'Bangalore',
  'headerLocationGranularity': 'city',
  'query': 'tourist',
  'totalResults': 19,
  'suggestedBounds': {'ne': {'lat': 13.024119845000044,
    'lng': 77.63739332570279},
   'sw': {'lat': 12.934119754999955, 'lng': 77.5452060742972}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b9f5637f964a5203f1d37e3',
       'name': 'M. Chinnaswamy Stadium',
       'location': {'address': 'Queens Rd',
        'lat': 12.978144481391702,
        'lng': 77.599222834094,
        'labeledLatLngs': [{'label': 'display',
          'lat': 12

In [14]:
# assign relevant part of JSON to venues
venues = results['response']['groups'][0]['items']
len(venues)

19

### Transforming the data obtained from Foursquare API into a pandas dataframe

In [15]:
# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,reasons.count,reasons.items,referralId,venue.categories,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,venue.location.crossStreet,venue.location.distance,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4b9f5637f964a5203f1d37e3-0,"[{'id': '4bf58dd8d48988d18a941735', 'name': 'C...",4b9f5637f964a5203f1d37e3,Queens Rd,IN,Bangalore,India,,866,"[Queens Rd, Bangalore 560001, Karnātaka, India]","[{'label': 'display', 'lat': 12.97814448139170...",12.978144,77.599223,560001.0,Karnātaka,M. Chinnaswamy Stadium,0,[]
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4c04b63439d476b09ddb31a7-1,"[{'id': '4bf58dd8d48988d12a941735', 'name': 'C...",4c04b63439d476b09ddb31a7,Vidhan Veedi,IN,Bangalore,India,Nr Cubbon Park,63,"[Vidhan Veedi (Nr Cubbon Park), Bangalore 5600...","[{'label': 'display', 'lat': 12.97902708519156...",12.979027,77.591881,560001.0,Karnātaka,Vidhana Soudha,0,[]
2,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4b5e8e7ef964a520719229e3-2,"[{'id': '4bf58dd8d48988d1e6941735', 'name': 'G...",4b5e8e7ef964a520719229e3,High Grounds,IN,Bangalore,India,Sankey Road,1311,"[High Grounds (Sankey Road), Bangalore 560001,...","[{'label': 'display', 'lat': 12.98968085798048...",12.989681,77.585933,560001.0,Karnātaka,Bangalore Golf Club,0,[]
3,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4be9627e9a54a59370fc0a11-3,"[{'id': '52e81612bcbc57f1066b7a35', 'name': 'C...",4be9627e9a54a59370fc0a11,19 St. Mark's Rd.,IN,Bangalore,India,,1169,"[19 St. Mark's Rd., Bangalore, Karnātaka, India]","[{'label': 'display', 'lat': 12.97431139275754...",12.974311,77.600885,,Karnātaka,Bowring Institute,0,[]
4,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4b87be17f964a520bcc931e3-4,"[{'id': '4bf58dd8d48988d1e2931735', 'name': 'A...",4b87be17f964a520bcc931e3,"Seshadripuram, Next to Hotel Grand Ashok",IN,Bangalore,India,"Kumarakrupa Rd,",1582,"[Seshadripuram, Next to Hotel Grand Ashok (Kum...","[{'label': 'display', 'lat': 12.98929492562096...",12.989295,77.581115,560001.0,Karnātaka,Chitra Kala Parishad,0,[]


### Data formatting

In [16]:
dataframe = dataframe.iloc[:,3:-2]

In [17]:
dataframe['venue.categories'] = dataframe['venue.categories'].apply(lambda x: x[0]['name'])

In [18]:
columns = [ col for col in dataframe.columns if col.startswith('venue.')]
dataframe = dataframe.loc[:,columns]

In [19]:
columns_renamed = [ col.replace("venue.", '').replace("location.", '').replace("formatted", '')\
                   .replace("lat", 'latitude').replace("lng", 'longitude')\
                   .replace("postalCode", 'pincode') for col in columns]
dataframe.rename(columns = dict(zip(columns,columns_renamed)), inplace = True)

In [20]:
dataframe = dataframe[['name','categories', 'id', 'Address','latitude', 'longitude']]

In [21]:
dataframe['Address'] = dataframe['Address'].apply(lambda x : ",".join(x))

In [22]:
dataframe.head()

Unnamed: 0,name,categories,id,Address,latitude,longitude
0,M. Chinnaswamy Stadium,Cricket Ground,4b9f5637f964a5203f1d37e3,"Queens Rd,Bangalore 560001,Karnātaka,India",12.978144,77.599223
1,Vidhana Soudha,Capitol Building,4c04b63439d476b09ddb31a7,"Vidhan Veedi (Nr Cubbon Park),Bangalore 560001...",12.979027,77.591881
2,Bangalore Golf Club,Golf Course,4b5e8e7ef964a520719229e3,"High Grounds (Sankey Road),Bangalore 560001,Ka...",12.989681,77.585933
3,Bowring Institute,Club House,4be9627e9a54a59370fc0a11,"19 St. Mark's Rd.,Bangalore,Karnātaka,India",12.974311,77.600885
4,Chitra Kala Parishad,Art Gallery,4b87be17f964a520bcc931e3,"Seshadripuram, Next to Hotel Grand Ashok (Kuma...",12.989295,77.581115
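
The formatting cells above reduce each venue to six fields. To make that mapping explicit without the column renaming, here is a rough sketch in plain Python. `flatten_venue` is a hypothetical helper, and the sample item is abridged from the API response shown earlier (only the fields used here are kept).

```python
def flatten_venue(item):
    """Hypothetical helper: map one 'explore' item to the six columns kept above."""
    venue = item["venue"]
    location = venue.get("location", {})
    categories = venue.get("categories", [])
    return {
        "name": venue.get("name"),
        # first category name, as in the lambda applied to venue.categories
        "categories": categories[0]["name"] if categories else None,
        "id": venue.get("id"),
        # formattedAddress is a list of address parts; join it as the notebook does
        "Address": ",".join(location.get("formattedAddress", [])),
        "latitude": location.get("lat"),
        "longitude": location.get("lng"),
    }

# Abridged sample item taken from the response above
sample_item = {
    "venue": {
        "id": "4b9f5637f964a5203f1d37e3",
        "name": "M. Chinnaswamy Stadium",
        "location": {
            "lat": 12.978144481391702,
            "lng": 77.599222834094,
            "formattedAddress": ["Queens Rd", "Bangalore 560001", "Karnātaka", "India"],
        },
        "categories": [{"id": "4bf58dd8d48988d18a941735", "name": "Cricket Ground"}],
    }
}

record = flatten_venue(sample_item)
print(record)
```

Using `.get` with defaults means a venue without an address or category yields an empty string or `None` instead of raising.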


### Obtaining the venues in Bangalore map

In [23]:
bgl_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around Bangalore

# add a red circle marker to represent the centre of Bangalore
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Bangalore',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(bgl_map)

# add the tourist spots as blue circle markers
for lat, lng, label in zip(dataframe.latitude, dataframe.longitude, dataframe.name):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(bgl_map)

# display map

bgl_map

### Methodology
Next we find the centroid of all the tourist places. Any clustering model can be used for this; in this case it is K-means clustering. The centroid of a cluster is the mean of the coordinates of its places, i.e. the point that minimises the sum of squared distances to them. In this project I have taken Lalbagh West as the centroid area, so we will search for hotels in this area using the Foursquare API. We will then find ratings, reviews and pricing, and get a list of the top 5 hotels to choose from in Lalbagh West.
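
The centroid step is not shown in the notebook, so the following is only a sketch of how it could look, written in plain Python rather than with a clustering library. It assumes that averaging latitude and longitude is an acceptable approximation at city scale, uses only the five venues visible in the table above, and starts K-means from the first `k` points; `haversine_km`, `mean_point` and `kmeans` are hypothetical names.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def mean_point(points):
    """Arithmetic mean of (lat, lng) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans(points, k, iterations=20):
    """Plain K-means (Lloyd's algorithm), initialised with the first k points."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: haversine_km(*p, *centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        updated = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, clusters

# (latitude, longitude) of the five venues listed in the dataframe above
venues = [
    (12.978144, 77.599223),  # M. Chinnaswamy Stadium
    (12.979027, 77.591881),  # Vidhana Soudha
    (12.989681, 77.585933),  # Bangalore Golf Club
    (12.974311, 77.600885),  # Bowring Institute
    (12.989295, 77.581115),  # Chitra Kala Parishad
]

centroids, clusters = kmeans(venues, k=1)
centre = centroids[0]
print(centre)
for lat, lng in venues:
    print(round(haversine_km(lat, lng, *centre), 2), "km from the centroid")
```

With `k=1` the result is simply the mean of all points; a larger `k` would split the venues into several candidate areas, each with its own centroid.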

In [25]:
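#Obtain latitude and longitude values of Lalbagh West in Bangalore
address = 'Lalbagh West,Bangalore, Karnataka, India'

geolocator = Nominatim(user_agent="bangalore_explorer") # geopy >= 2.0 requires a user_agent; this name is an arbitrary placeholder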
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

12.94646125 77.5800458707851


### Obtaining the list of Hotels within a 1000 metre radius of Lalbagh West

In [26]:
search_query = 'hotel'
radius = 1000 # in metres
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=VDY3CAVDXPAPUIDZ4YJMCKTQ23FGFEWM05V22YIFVWOFVQF3&client_secret=QZG53ZE55TESWDBPUJEACGTZQCWBD1QLAXLKF5A2X5JYYMXC&ll=12.94646125,77.5800458707851&v=20180604&query=hotel&radius=1000&limit=30'

In [29]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c99dad0db04f53b14e89245'},
 'response': {'venues': [{'id': '4d29ccc66e27a14317362824',
    'name': 'Hotel Nandhini',
    'location': {'address': 'No. 114/2, Near Minerva Circle',
     'crossStreet': 'L.F. Road',
     'lat': 12.955307313850327,
     'lng': 77.5794984145012,
     'labeledLatLngs': [{'label': 'display',
       'lat': 12.955307313850327,
       'lng': 77.5794984145012}],
     'distance': 986,
     'cc': 'IN',
     'city': 'Bangalore',
     'state': 'Karnātaka',
     'country': 'India',
     'formattedAddress': ['No. 114/2, Near Minerva Circle (L.F. Road)',
      'Bangalore',
      'Karnātaka',
      'India']},
    'categories': [{'id': '4bf58dd8d48988d10f941735',
      'name': 'Indian Restaurant',
      'pluralName': 'Indian Restaurants',
      'shortName': 'Indian',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/indian_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1553586896',
    'h

### Loading the Hotels found within a 1000 metre radius of Lalbagh West into a dataframe

In [28]:
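# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
hotels = json_normalize(venues)
# keep only venues with at least one category; copy so later column assignments act on an independent frame
hotels = hotels[hotels['categories'].str.len() > 0].copy()
hotels.head()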

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId
0,"[{'id': '4bf58dd8d48988d10f941735', 'name': 'I...",False,4d29ccc66e27a14317362824,"No. 114/2, Near Minerva Circle",IN,Bangalore,India,L.F. Road,986,"[No. 114/2, Near Minerva Circle (L.F. Road), B...","[{'label': 'display', 'lat': 12.95530731385032...",12.955307,77.579498,,Karnātaka,Hotel Nandhini,v-1553586842
1,"[{'id': '4bf58dd8d48988d10f941735', 'name': 'I...",False,4c24d166f7ced13aab6b236d,,IN,,India,,291,[India],"[{'label': 'display', 'lat': 12.94907856896084...",12.949079,77.579868,,,Kamat Hotel,v-1553586842
2,"[{'id': '4f4530a74b9074f6e4fb0100', 'name': 'B...",False,519ae73c498e1f7c8bf4f483,,IN,,India,,691,[India],"[{'label': 'display', 'lat': 12.94048987881796...",12.94049,77.581808,,,Hotel Sanman,v-1553586842
3,"[{'id': '4bf58dd8d48988d1fb931735', 'name': 'M...",False,4eff5a949a520dfd65e8e62d,,IN,,India,,865,[India],"[{'label': 'display', 'lat': 12.94053579721792...",12.940536,77.585205,,,Hotel Fiesta Grand,v-1553586842
4,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",False,4dfac6ad18a8fc7fb43dfec4,,IN,Bangalore,India,,1000,"[Bangalore, Karnātaka, India]","[{'label': 'display', 'lat': 12.93747045945780...",12.93747,77.580145,,Karnātaka,Sanman Hotel,v-1553586842


### Data formatting

In [30]:
hotels['categories'] = hotels['categories'].apply(lambda x: x[0]['name'])

In [31]:
columns_renamed = [ col.replace("location.", '').replace("formatted", '') for col in hotels.columns]
hotels = hotels.rename(columns = dict(zip(hotels.columns,columns_renamed)))
hotels.columns

Index(['categories', 'hasPerk', 'id', 'address', 'cc', 'city', 'country',
       'crossStreet', 'distance', 'Address', 'labeledLatLngs', 'lat', 'lng',
       'postalCode', 'state', 'name', 'referralId'],
      dtype='object')

In [32]:
hotels = hotels[['name','categories','id','Address','lat', 'lng']]

In [33]:
hotels['ratings'] = 0.0

In [34]:
venue_id = '4d29ccc66e27a14317362824' # ID of Hotel Nandhini
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
url

'https://api.foursquare.com/v2/venues/4d29ccc66e27a14317362824?client_id=VDY3CAVDXPAPUIDZ4YJMCKTQ23FGFEWM05V22YIFVWOFVQF3&client_secret=QZG53ZE55TESWDBPUJEACGTZQCWBD1QLAXLKF5A2X5JYYMXC&v=20180604'

In [35]:
result = requests.get(url).json()
try:
    print(result['response']['venue']['rating'])
except KeyError: # the response has no 'rating' field
    print('This venue has not been rated yet.')

5.9


In [36]:
hotels.head()

Unnamed: 0,name,categories,id,Address,lat,lng,ratings
0,Hotel Nandhini,Indian Restaurant,4d29ccc66e27a14317362824,"[No. 114/2, Near Minerva Circle (L.F. Road), B...",12.955307,77.579498,0.0
1,Kamat Hotel,Indian Restaurant,4c24d166f7ced13aab6b236d,[India],12.949079,77.579868,0.0
2,Hotel Sanman,Boarding House,519ae73c498e1f7c8bf4f483,[India],12.94049,77.581808,0.0
3,Hotel Fiesta Grand,Motel,4eff5a949a520dfd65e8e62d,[India],12.940536,77.585205,0.0
4,Sanman Hotel,Breakfast Spot,4dfac6ad18a8fc7fb43dfec4,"[Bangalore, Karnātaka, India]",12.93747,77.580145,0.0


In [37]:
hotels.set_index('id', inplace= True)

In [38]:
for venue_id in hotels.index: # avoid shadowing the built-in id()
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        hotels.loc[venue_id,'ratings'] = result['response']['venue']['rating']
    except KeyError: # venue has no rating yet
        hotels.loc[venue_id,'ratings'] = np.nan
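
The loop above relies on an exception to detect unrated venues. A small sketch of the same lookup without exceptions follows; `extract_rating` is a hypothetical helper, the first sample mirrors the `response → venue → rating` path used above, and the second sample (with a made-up id) stands for a venue that has no rating.

```python
def extract_rating(result):
    """Hypothetical helper: return the venue rating as a float, or None if it is missing."""
    venue = result.get("response", {}).get("venue", {})
    rating = venue.get("rating")
    return float(rating) if rating is not None else None

# Abridged responses: only the fields read by the helper are kept
rated = {"meta": {"code": 200},
         "response": {"venue": {"id": "4d29ccc66e27a14317362824", "rating": 5.9}}}
unrated = {"meta": {"code": 200},
           "response": {"venue": {"id": "hypothetical-id"}}}  # made-up id, no rating

print(extract_rating(rated))
print(extract_rating(unrated))
```

Returning `None` keeps the decision about how to store a missing rating (for example as `np.nan`) with the caller.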

### Finding Top 5 Hotels based on the ratings

In [39]:
top5_hotels = hotels.sort_values(by = 'ratings', ascending = False).head(5).copy() # unrated (NaN) hotels sort last

In [40]:
top5_hotels

Unnamed: 0_level_0,name,categories,Address,lat,lng,ratings
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
4dfac6ad18a8fc7fb43dfec4,Sanman Hotel,Breakfast Spot,"[Bangalore, Karnātaka, India]",12.93747,77.580145,7.3
4d29ccc66e27a14317362824,Hotel Nandhini,Indian Restaurant,"[No. 114/2, Near Minerva Circle (L.F. Road), B...",12.955307,77.579498,5.9
514870c9e4b082ab1c8e53e8,Mysore Mylari Hotel,Indian Restaurant,"[Off Dvg Road Basvangudi, 56004, India]",12.943659,77.571412,5.9
4c24d166f7ced13aab6b236d,Kamat Hotel,Indian Restaurant,[India],12.949079,77.579868,5.7
4c7a45ba278eb71353375880,Hotel T.A.P Gold Crest,Hotel,"[No. 37, M.T.B Road, Near Minerva Circle, Lalb...",12.956152,77.57982,4.6
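
For readers less familiar with pandas, the ranking step can be sketched in plain Python. This is an illustration only: `top_rated` is a hypothetical helper, the ratings are the ones shown in the table above, and the last entry is an invented unrated venue. Unlike `sort_values`, which keeps unrated rows at the end, this sketch drops them.

```python
def top_rated(hotels, n=5):
    """Hypothetical helper: the n best-rated hotels, ignoring unrated ones."""
    rated = [h for h in hotels if h.get("rating") is not None]
    # sorted() is stable, so hotels with equal ratings keep their input order
    return sorted(rated, key=lambda h: h["rating"], reverse=True)[:n]

hotels_sample = [
    {"name": "Hotel Nandhini", "rating": 5.9},
    {"name": "Kamat Hotel", "rating": 5.7},
    {"name": "Sanman Hotel", "rating": 7.3},
    {"name": "Mysore Mylari Hotel", "rating": 5.9},
    {"name": "Hotel T.A.P Gold Crest", "rating": 4.6},
    {"name": "Unrated Example", "rating": None},  # invented entry, not from the data
]

best = top_rated(hotels_sample)
for hotel in best:
    print(hotel["name"], hotel["rating"])
```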


### Adding a review for each of the Top 5 hotels

In [42]:
top5_hotels['Reviews'] = ""

In [43]:
## Fetch one tip (review) per hotel
for venue_id in top5_hotels.index:
    limit = 1 # only the first tip is used as the review
    url = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'\
    .format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, limit)
    
    try:
        results = requests.get(url).json()
        tip = results['response']['tips']['items'][0]
        top5_hotels.loc[venue_id, 'Reviews'] = tip['user']['firstName'] + " " + \
        tip['user']['lastName'] + " - " + tip['text']
        
    except (KeyError, IndexError):
        pass # no tip, or the user has no last name: leave the review empty

In [44]:
top5_hotels

Unnamed: 0_level_0,name,categories,Address,lat,lng,ratings,Reviews
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
4dfac6ad18a8fc7fb43dfec4,Sanman Hotel,Breakfast Spot,"[Bangalore, Karnātaka, India]",12.93747,77.580145,7.3,Bharath Singiri - Khali dosa is just awesome! ...
4d29ccc66e27a14317362824,Hotel Nandhini,Indian Restaurant,"[No. 114/2, Near Minerva Circle (L.F. Road), B...",12.955307,77.579498,5.9,Abhi B - Descent Non Vegetarian food!
514870c9e4b082ab1c8e53e8,Mysore Mylari Hotel,Indian Restaurant,"[Off Dvg Road Basvangudi, 56004, India]",12.943659,77.571412,5.9,Uday UDi - Saagu masala is one of the great ta...
4c24d166f7ced13aab6b236d,Kamat Hotel,Indian Restaurant,[India],12.949079,77.579868,5.7,sarma vns - Best Veg food. But limited meals. ...
4c7a45ba278eb71353375880,Hotel T.A.P Gold Crest,Hotel,"[No. 37, M.T.B Road, Near Minerva Circle, Lalb...",12.956152,77.57982,4.6,Pramathesh Saha - Check the water they provide...
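
The review string above is built as "first name, last name, tip text", and a tip whose user has no last name is skipped. One possible way to keep such tips is sketched below; `format_tip` is a hypothetical helper, the first sample reproduces a review from the table above, and the second sample is invented to show the missing-name case.

```python
def format_tip(tip):
    """Hypothetical helper: 'First Last - text', tolerating a missing first or last name."""
    user = tip.get("user", {})
    name = " ".join(part for part in (user.get("firstName"), user.get("lastName")) if part)
    return f"{name} - {tip['text']}" if name else tip["text"]

# Tip reconstructed from the review shown for Hotel Nandhini
full_tip = {"user": {"firstName": "Abhi", "lastName": "B"},
            "text": "Descent Non Vegetarian food!"}
# Invented tip without a last name
partial_tip = {"user": {"firstName": "Abhi"}, "text": "Good"}

print(format_tip(full_tip))
print(format_tip(partial_tip))
```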



### Results

This application can now suggest, to a traveller visiting Bangalore, the best area to stay in and the best hotel to book.

The final output displays the top 5 hotels of the most suitable area, with ratings and reviews.




### Discussion:

After the data analysis I found that there are many venues with no detailed address mentioned. I have extracted the address using the location data available from the geopy package.

### Conclusion:

So any user of this application will now be able to:

1. Enter any desired location they are planning to visit
2. Get the list of tourist places in that location
3. Have the model find an optimal area which is near many tourist places
4. Get a list of the top 5 hotels in that optimal area, with ratings and reviews