# Capstone Project - Looking for a Hotel in Cleveland

## Applied Data Science Capstone by IBM/Coursera

## Table of Contents

+ Introduction. Business Problem
+ Data
+ Methodology
+ Analysis
+ Results and discussion
+ Conclusions

## Introduction. Business Problem

Cleveland is a cosmopolitan city that offers residents and tourists many artistic and cultural institutions, including an extensive public library system and venues such as Progressive Field, the Rock and Roll Hall of Fame, and the Playhouse Square Center. A varied hotel offering is therefore important, so that both national and international tourists can enjoy the full cultural range the city provides.

According to Wikipedia (https://es.wikipedia.org/wiki/Cleveland), Cleveland is the corporate headquarters of many large companies, such as National City Corporation, Eaton Corporation, Forest City Enterprises, Sherwin-Williams Company, and KeyCorp. NASA also maintains a facility in Cleveland, the Glenn Research Center.

The goal of this analysis is therefore to find an ideal location for a hotel, considering ease of travel and proximity to the places of interest the city offers. The analysis should mainly interest city officials who want to attract investment, and event organizers who need to bring in people from other locations.

## Data

To understand how the city is distributed in terms of restaurants, services, parks, cultural centers, and so on, the Cleveland Neighborhoods dataset from Kaggle is used. This dataset contains only the name of each neighborhood and its location (latitude and longitude). The Foursquare API is then used to retrieve the venues near each neighborhood and their characteristics.

### First, let's import all the dependencies that we will need.

In [42]:
import pandas as pd
import numpy as np

import altair as alt 

from bs4 import BeautifulSoup
import requests

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
from sklearn.cluster import AgglomerativeClustering

import folium # map rendering library

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

#import geocoder # import geocoder
#from vincenty import vincenty_inverse # Library for calculate distance into two geo points

### Next, let's load the data, latitude and longitude.

In [183]:
import json
import pandas as pd

bd = pd.read_csv('C:/LDAB/projects/Coursera_Capstone/Cleveland Neighborhoods.csv')
bd.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13
0,"Asiatown, Cleveland",41.508833,-81.680417,,,,,,,,,,,
1,"Bellaire-Puritas, Cleveland",41.433682,-81.80014,,,,,,,,,,,
2,Broadway-Slavic Village,41.458056,-81.644722,,,,,,,,,,,
3,Brooklyn Centre,41.453446,-81.699402,,,,,,,,,,,
4,Buckeye-Shaker,41.483889,-81.590556,,,,,,,,,,,


In [187]:
bd = bd.drop(['Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Unnamed: 9', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13'], axis=1)
print(bd.sort_values(by='Neighborhood', ascending=False))

                            Neighborhood   Latitude  Longitude
32             Woodland Hills, Cleveland  41.481389 -81.611389
31         Warehouse District, Cleveland  41.497500 -81.701667
30                     University Circle  41.508611 -81.605278
29                      Union-Mills Park  41.454889 -81.614389
28                    Tremont, Cleveland  41.473611 -81.688611
14                             The Flats  41.492000 -81.696000
27                 Stockyards, Cleveland  41.483889 -81.590556
26                   St. Clair-Suprerior        NaN        NaN
25                          Old Brooklyn  41.431559 -81.702332
24                  Ohio City, Cleveland  41.483611 -81.710278
23                      Nottingham, Ohio        NaN        NaN
22                  Nine-Twelve District  41.499444 -81.685833
21                             Lee-Miles  41.440140 -81.564786
20                    Kinsman, Cleveland  41.558000 -81.569000
19                        Kamm's Corners  41.444286 -81

In [188]:
bd = bd.dropna()
bd.shape

(31, 3)

### Use geopy library to get the latitude and longitude values of Cleveland.

In [189]:
address = 'Cleveland, OH'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Cleveland are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Cleveland are 41.5051613, -81.6934446.


### Create a map of Cleveland with neighborhoods superimposed on top.

In [190]:
# create map of Cleveland using latitude and longitude values
map_cleveland = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(bd['Latitude'], bd['Longitude'], bd['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cleveland)  
    
map_cleveland

In [191]:
neighborhood_latitude = bd.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = bd.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = bd.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))


Latitude and longitude values of Asiatown, Cleveland are 41.508833, -81.680417.
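
The request in the next cell uses `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION`, none of which are defined in the cells shown here. A minimal sketch of one way to define them, assuming the credentials are kept in environment variables (the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical); the version date is the one that appears in the request URL:

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

# API version date, matching the value in the request URL of this notebook.
VERSION = "20180605"

# Warn early instead of failing later inside the request.
if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; API requests will fail.")
```

Reading the credentials from the environment keeps them out of the notebook and out of any displayed URL.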


In [192]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=41.508833,-81.680417&radius=500&limit=100'

### Send the GET request and examine the results

In [193]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d8b82abdb1d81002ca4798d'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Downtown Cleveland',
  'headerFullLocation': 'Downtown Cleveland, Cleveland',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 16,
  'suggestedBounds': {'ne': {'lat': 41.513333004500005,
    'lng': -81.67441902827106},
   'sw': {'lat': 41.5043329955, 'lng': -81.68641497172895}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b631b30f964a5206e632ae3',
       'name': "Danny's Deli",
       'location': {'address': '1658 Saint Clair Ave NE',
        'crossStreet': 'St. Claire',
        'lat': 41.50719909122553,
        'lng': -81.68302650718543,
        'la

### Let's borrow the **get_category_type** function from the Foursquare lab.

In [194]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # the flattened dataframe uses the 'venue.' prefix
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Now let's clean the JSON and structure it into a *pandas* dataframe.

In [195]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Danny's Deli,Deli / Bodega,41.507199,-81.683027
1,Artefino Art Gallery and Cafe,Coffee Shop,41.506913,-81.679266
2,Firefighters Community Credit Union | FFCCU,Credit Union,41.509644,-81.677282
3,Emperor's Palace,Chinese Restaurant,41.508503,-81.678519
4,Leather Stallion Saloon,Gay Bar,41.509774,-81.678759


In [196]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

16 venues were returned by Foursquare.


### Let's create a function to repeat the same process for all the neighborhoods in Cleveland

In [197]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['vecis', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
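
The list comprehension above assumes that every response contains a `groups` entry and that every venue has at least one category; if either assumption fails, the loop stops with an exception. A minimal sketch of a more defensive extraction step, using a hypothetical helper `extract_venue_rows` and a hand-made payload that mimics the response structure shown earlier (it is not real API output):

```python
def extract_venue_rows(name, lat, lng, payload):
    """Return one tuple per venue from a parsed explore response (hypothetical helper)."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # fall back to None when a venue has no category, instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# hand-made payload for illustration only
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 41.5, "lng": -81.68},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 41.51, "lng": -81.67},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Toy Neighborhood", 41.5, -81.68, sample)
# a response without 'groups' (for example an error payload) yields no rows
empty = extract_venue_rows("Toy Neighborhood", 41.5, -81.68, {"meta": {"code": 400}})
```

Inside `getNearbyVenues`, the result of `requests.get(url).json()` could be passed to such a helper in place of the inline list comprehension.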

### Now write the code to run the above function on each neighborhood and create a new dataframe called *cleveland_venues*.

In [198]:
cleveland_venues = getNearbyVenues(names = bd['Neighborhood'],
                                   latitudes = bd['Latitude'],
                                   longitudes = bd['Longitude']
                                  )


Asiatown, Cleveland
Bellaire-Puritas, Cleveland
Broadway-Slavic Village
Brooklyn Centre
Buckeye-Shaker
Campus Disctict
Central, Cleveland
Clark-Fulton
Collinwood
Detroit-Shoreway
Downtown Cleveland
East 4th Street District (Cleveland)
Edgewater, Cleveland
Fairfax, Cleveland
The Flats
Glenville, Cleveland
Goodrich-Kirtland Park
Hough, Cleveland
Industrial Valley
Kamm's Corners
Kinsman, Cleveland
Lee-Miles
Nine-Twelve District
Ohio City, Cleveland
Old Brooklyn
Stockyards, Cleveland
Tremont, Cleveland
Union-Mills Park
University Circle
Warehouse District, Cleveland
Woodland Hills, Cleveland


In [199]:
print(cleveland_venues.shape)
cleveland_venues.tail()

(676, 7)


Unnamed: 0,vecis,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
671,"Woodland Hills, Cleveland",41.481389,-81.611389,Mr Hero,41.481874,-81.606177,Sandwich Place
672,"Woodland Hills, Cleveland",41.481389,-81.611389,Rainbow Shops,41.482331,-81.606159,Women's Store
673,"Woodland Hills, Cleveland",41.481389,-81.611389,Little Caesers - Buckeye Plaza,41.481844,-81.605934,Pizza Place
674,"Woodland Hills, Cleveland",41.481389,-81.611389,A Taste Of Soul,41.482464,-81.616901,Restaurant
675,"Woodland Hills, Cleveland",41.481389,-81.611389,VILLA,41.482597,-81.605726,Clothing Store


### Let's check how many venues were returned for each neighborhood

In [200]:
cleveland_venues.groupby('vecis').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
vecis,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Asiatown, Cleveland",16,16,16,16,16,16
"Bellaire-Puritas, Cleveland",14,14,14,14,14,14
Broadway-Slavic Village,13,13,13,13,13,13
Brooklyn Centre,11,11,11,11,11,11
Buckeye-Shaker,32,32,32,32,32,32
Campus Disctict,6,6,6,6,6,6
"Central, Cleveland",7,7,7,7,7,7
Collinwood,5,5,5,5,5,5
Detroit-Shoreway,7,7,7,7,7,7
Downtown Cleveland,100,100,100,100,100,100


In [201]:
print('There are {} unique categories.'.format(len(cleveland_venues['Venue Category'].unique())))

There are 150 unique categories.


### Next, let's one-hot encode the venue categories so that rows can be grouped by neighborhood, taking the mean of the frequency of occurrence of each category

In [202]:
# one hot encoding
cleveland_onehot = pd.get_dummies(cleveland_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
cleveland_onehot['vecis'] = cleveland_venues['vecis'] 

# move neighborhood column to the first column
fixed_columns = [cleveland_onehot.columns[-1]] + list(cleveland_onehot.columns[:-1])
cleveland_onehot = cleveland_onehot[fixed_columns]

cleveland_onehot

Unnamed: 0,vecis,ATM,American Restaurant,Aquarium,Arepa Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Asiatown, Cleveland",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [203]:
cleveland_onehot.shape

(676, 151)

In [204]:
cleveland_grouped = cleveland_onehot.groupby('vecis').mean().reset_index()
cleveland_grouped.shape

(29, 151)
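
To make the grouping step concrete: each row of the one-hot table is a single venue, so the mean of a category column within a neighborhood is the share of that neighborhood's venues that belong to the category. A small sketch on made-up data (the toy neighborhoods `A` and `B` and their venues are hypothetical):

```python
import pandas as pd

# made-up venues: neighborhood A has two hotels and one bar, B has one bar
toy = pd.DataFrame({
    "vecis": ["A", "A", "A", "B"],
    "Venue Category": ["Hotel", "Hotel", "Bar", "Bar"],
})

# one column per category, one row per venue
onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "vecis", toy["vecis"])

# the mean of each one-hot column per neighborhood is the category frequency
grouped = onehot.groupby("vecis").mean().reset_index()
print(grouped)
```

Neighborhood `A` has two hotels among its three venues, so its `Hotel` frequency is 2/3, while `B` has a `Bar` frequency of 1.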

### Let's print each neighborhood along with the top 5 most common venues

In [205]:
num_top_venues = 5

for hood in cleveland_grouped['vecis']:
    print("----"+hood+"----")
    temp = cleveland_grouped[cleveland_grouped['vecis'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Asiatown, Cleveland----
                  venue  freq
0   Rental Car Location  0.19
1  Gym / Fitness Center  0.06
2          Night Market  0.06
3        Sandwich Place  0.06
4    Chinese Restaurant  0.06


----Bellaire-Puritas, Cleveland----
                 venue  freq
0                Hotel  0.14
1          Gas Station  0.07
2                  Bar  0.07
3          Pizza Place  0.07
4  Rental Car Location  0.07


----Broadway-Slavic Village----
               venue  freq
0        Pizza Place  0.15
1     Ice Cream Shop  0.08
2     Baseball Field  0.08
3      Grocery Store  0.08
4  Food & Drink Shop  0.08


----Brooklyn Centre----
                  venue  freq
0                 Diner  0.09
1        Rental Service  0.09
2           Gas Station  0.09
3     Mobile Phone Shop  0.09
4  Fast Food Restaurant  0.09


----Buckeye-Shaker----
                 venue  freq
0  American Restaurant  0.09
1   Light Rail Station  0.06
2       Farmers Market  0.03
3    Mobile Phone Shop  0.03
4    Fre

### Let's write a function to sort the venues in descending order.

In [206]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
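
As a quick check of what this function returns, here is a sketch that applies it to a single made-up row (the neighborhood name and frequencies are hypothetical). The first entry of the row is the neighborhood name, which `row.iloc[1:]` skips:

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # same logic as the function above, repeated so the example is self-contained
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# made-up frequencies for one neighborhood
toy_row = pd.Series({"vecis": "Toy Neighborhood", "Bar": 0.1, "Hotel": 0.6, "Pizza Place": 0.3})

top2 = return_most_common_venues(toy_row, 2)
print(list(top2))
```

When several categories share the same frequency, their relative order after sorting is not guaranteed, which may be why the top-5 printout and the top-10 table in this notebook list tied categories in different orders.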

### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [207]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['vecis']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 'st', 'nd', 'rd' take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhood_venues_sorted = pd.DataFrame(columns=columns)
neighborhood_venues_sorted['vecis'] = cleveland_grouped['vecis']

for ind in np.arange(cleveland_grouped.shape[0]):
    neighborhood_venues_sorted.iloc[ind, 1:] = return_most_common_venues(cleveland_grouped.iloc[ind, :], num_top_venues)

neighborhood_venues_sorted.head()

Unnamed: 0,vecis,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Asiatown, Cleveland",Rental Car Location,Credit Union,Gay Bar,Sandwich Place,Night Market,Nightclub,Chinese Restaurant,Recording Studio,Coffee Shop,Print Shop
1,"Bellaire-Puritas, Cleveland",Hotel,Pizza Place,Diner,New American Restaurant,Chinese Restaurant,Rental Car Location,Gas Station,Bar,Bank,Grocery Store
2,Broadway-Slavic Village,Pizza Place,Restaurant,Pharmacy,Fast Food Restaurant,Ice Cream Shop,Polish Restaurant,Grocery Store,Eastern European Restaurant,Sandwich Place,Food & Drink Shop
3,Brooklyn Centre,Pizza Place,Diner,Chinese Restaurant,Rental Service,Gas Station,Bar,Fast Food Restaurant,Mobile Phone Shop,Intersection,Art Gallery
4,Buckeye-Shaker,American Restaurant,Light Rail Station,Hungarian Restaurant,Diner,Sandwich Place,Breakfast Spot,Burger Joint,Clothing Store,Plaza,Coffee Shop


## Methodology

To understand how the different categories of venues the city offers are distributed across its neighborhoods, the Foursquare data gathered above is used to identify the most common categories in each neighborhood. The neighborhoods are then clustered so that areas with similar category profiles fall into the same group.
The neighborhoods are grouped into 5 clusters. Note that, although *k*-means is imported and the fitted model is stored in a variable named `kmeans`, the cell below actually fits scikit-learn's `AgglomerativeClustering`, a hierarchical clustering method.

In [209]:
# set number of clusters
kclusters = 5

cleveland_grouped_clustering = cleveland_grouped.drop('vecis', axis=1)

# run agglomerative (hierarchical) clustering; the variable keeps the name kmeans used by later cells
kmeans = AgglomerativeClustering(n_clusters=kclusters).fit(cleveland_grouped_clustering)
            
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 4, 1, 1, 1, 1, 2, 4, 1], dtype=int64)
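
Because the text refers to *k*-means while the cell fits `AgglomerativeClustering`, it can help to see both estimators side by side. A minimal sketch on a made-up frequency matrix (the values are hypothetical and not taken from the Cleveland data), assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# toy frequency matrix: 4 neighborhoods x 2 categories (made-up values)
toy = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0],
                [0.1, 0.9]])

# hierarchical clustering, as used in the cell above
agg_labels = AgglomerativeClustering(n_clusters=2).fit(toy).labels_

# k-means, as named in the text; random_state fixes the initialization
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy).labels_

print(agg_labels, km_labels)
```

On data this cleanly separated both methods group the first two rows together and the last two together, although the label numbers themselves are arbitrary. On the real frequency table the two methods may well disagree, so the cluster descriptions in this notebook apply to the agglomerative result only.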

### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [222]:
# add clustering labels
neighborhood_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

cleveland_merged = bd

# merge the sorted venues with the neighborhood data to add latitude/longitude for each neighborhood
cleveland_merged = cleveland_merged.join(neighborhood_venues_sorted.set_index('vecis'), on='Neighborhood')

# neighborhoods for which no venues were returned have no label; assign them to cluster 2
cleveland_merged['Cluster Labels'] = cleveland_merged['Cluster Labels'].fillna(2).astype(int)

cleveland_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Asiatown, Cleveland",41.508833,-81.680417,1,Rental Car Location,Credit Union,Gay Bar,Sandwich Place,Night Market,Nightclub,Chinese Restaurant,Recording Studio,Coffee Shop,Print Shop
1,"Bellaire-Puritas, Cleveland",41.433682,-81.80014,1,Hotel,Pizza Place,Diner,New American Restaurant,Chinese Restaurant,Rental Car Location,Gas Station,Bar,Bank,Grocery Store
2,Broadway-Slavic Village,41.458056,-81.644722,4,Pizza Place,Restaurant,Pharmacy,Fast Food Restaurant,Ice Cream Shop,Polish Restaurant,Grocery Store,Eastern European Restaurant,Sandwich Place,Food & Drink Shop
3,Brooklyn Centre,41.453446,-81.699402,1,Pizza Place,Diner,Chinese Restaurant,Rental Service,Gas Station,Bar,Fast Food Restaurant,Mobile Phone Shop,Intersection,Art Gallery
4,Buckeye-Shaker,41.483889,-81.590556,1,American Restaurant,Light Rail Station,Hungarian Restaurant,Diner,Sandwich Place,Breakfast Spot,Burger Joint,Clothing Store,Plaza,Coffee Shop



### Finally, let's visualize the resulting clusters

In [224]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cleveland_merged['Latitude'], cleveland_merged['Longitude'], cleveland_merged['Neighborhood'], cleveland_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters



# Analysis

## Cluster 0

Cluster 0, shown in red on the map, consists of neighborhoods located mainly inland, away from the lakeshore, and generally in residential areas with a few small businesses such as drugstores, pizzerias, and barbershops, as is typical of residential places.

In [231]:
cleveland_merged.loc[cleveland_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Lee-Miles,41.44014,-81.564786,0,Mobile Phone Shop,Food,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store,Dive Bar,Food & Drink Shop
29,Union-Mills Park,41.454889,-81.614389,0,Food,Salon / Barbershop,Donut Shop,Fast Food Restaurant,Farmers Market,Event Service,Electronics Store,Eastern European Restaurant,Dry Cleaner,Dive Bar


## Cluster 1

Cluster 1 contains the largest number of neighborhoods and therefore also offers the widest range of categories, such as restaurants, shopping centers, galleries, and bars, among others.

In [232]:
cleveland_merged.loc[cleveland_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Asiatown, Cleveland",41.508833,-81.680417,1,Rental Car Location,Credit Union,Gay Bar,Sandwich Place,Night Market,Nightclub,Chinese Restaurant,Recording Studio,Coffee Shop,Print Shop
1,"Bellaire-Puritas, Cleveland",41.433682,-81.80014,1,Hotel,Pizza Place,Diner,New American Restaurant,Chinese Restaurant,Rental Car Location,Gas Station,Bar,Bank,Grocery Store
3,Brooklyn Centre,41.453446,-81.699402,1,Pizza Place,Diner,Chinese Restaurant,Rental Service,Gas Station,Bar,Fast Food Restaurant,Mobile Phone Shop,Intersection,Art Gallery
4,Buckeye-Shaker,41.483889,-81.590556,1,American Restaurant,Light Rail Station,Hungarian Restaurant,Diner,Sandwich Place,Breakfast Spot,Burger Joint,Clothing Store,Plaza,Coffee Shop
5,Campus Disctict,41.497778,-81.67,1,Optical Shop,American Restaurant,Fast Food Restaurant,Intersection,Camera Store,Gas Station,Yoga Studio,Farmers Market,Event Service,Electronics Store
6,"Central, Cleveland",41.5,-81.666667,1,Dance Studio,Fast Food Restaurant,Intersection,Mediterranean Restaurant,Athletics & Sports,Camera Store,Gas Station,Yoga Studio,Eastern European Restaurant,Farmers Market
10,Downtown Cleveland,41.498889,-81.689722,1,Hotel,New American Restaurant,Coffee Shop,Steakhouse,Sports Bar,Bar,Lounge,Pizza Place,Burger Joint,Café
11,East 4th Street District (Cleveland),41.498889,-81.69,1,Hotel,Steakhouse,Coffee Shop,New American Restaurant,Bar,Lounge,Burger Joint,Sports Bar,Pizza Place,Pedestrian Plaza
13,"Fairfax, Cleveland",41.483889,-81.590556,1,American Restaurant,Light Rail Station,Hungarian Restaurant,Diner,Sandwich Place,Breakfast Spot,Burger Joint,Clothing Store,Plaza,Coffee Shop
14,The Flats,41.492,-81.696,1,Bike Shop,Beer Bar,Gym,Athletics & Sports,Yoga Studio,Eastern European Restaurant,Flea Market,Fast Food Restaurant,Farmers Market,Event Service


## Cluster 2

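Cluster 2 is the most dispersed cluster within the city, with some commercial points, some fast food restaurants, and a few other venues. It also includes the two neighborhoods for which no venues were returned (Clark-Fulton and Hough), because their missing labels were filled with 2.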

In [233]:
cleveland_merged.loc[cleveland_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Clark-Fulton,41.66667,-81.716667,2,,,,,,,,,,
8,Collinwood,41.558,-81.569,2,Pizza Place,Convenience Store,Discount Store,Food,Bar,Dry Cleaner,Fast Food Restaurant,Farmers Market,Event Service,Electronics Store
12,"Edgewater, Cleveland",41.431559,-81.702332,2,Food,Event Service,Convenience Store,Bus Stop,Electronics Store,Pizza Place,Gym,Grocery Store,Hot Dog Joint,Cupcake Shop
17,"Hough, Cleveland",41.512334,-81.635213,2,,,,,,,,,,
19,Kamm's Corners,41.444286,-81.818492,2,Convenience Store,Pizza Place,Ice Cream Shop,Pet Store,Café,Donut Shop,Farmers Market,Event Service,Electronics Store,Eastern European Restaurant
20,"Kinsman, Cleveland",41.558,-81.569,2,Pizza Place,Convenience Store,Discount Store,Food,Bar,Dry Cleaner,Fast Food Restaurant,Farmers Market,Event Service,Electronics Store
25,Old Brooklyn,41.431559,-81.702332,2,Food,Event Service,Convenience Store,Bus Stop,Electronics Store,Pizza Place,Gym,Grocery Store,Hot Dog Joint,Cupcake Shop


## Cluster 3

Cluster 3 has Light Rail Station as one of its main categories, plus a few small businesses such as a jewelry store and a donut shop.

In [234]:
cleveland_merged.loc[cleveland_merged['Cluster Labels'] == 3]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Industrial Valley,41.483333,-81.666667,3,Pier,Light Rail Station,Jewelry Store,Yoga Studio,Donut Shop,Fast Food Restaurant,Farmers Market,Event Service,Electronics Store,Eastern European Restaurant


## Cluster 4

Cluster 4 is mainly composed of restaurants and fast food venues such as pizzerias and sandwich shops.

In [235]:
cleveland_merged.loc[cleveland_merged['Cluster Labels'] == 4]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Broadway-Slavic Village,41.458056,-81.644722,4,Pizza Place,Restaurant,Pharmacy,Fast Food Restaurant,Ice Cream Shop,Polish Restaurant,Grocery Store,Eastern European Restaurant,Sandwich Place,Food & Drink Shop
9,Detroit-Shoreway,41.479062,-81.737795,4,Pizza Place,Food & Drink Shop,Deli / Bodega,Bus Stop,Sandwich Place,Greek Restaurant,Discount Store,Dry Cleaner,Event Service,Electronics Store
15,"Glenville, Cleveland",41.533347,-81.616588,4,Pizza Place,Historic Site,Sandwich Place,BBQ Joint,Dive Bar,Farmers Market,Event Service,Electronics Store,Eastern European Restaurant,Dry Cleaner
32,"Woodland Hills, Cleveland",41.481389,-81.611389,4,Sandwich Place,Pizza Place,Restaurant,Women's Store,Clothing Store,Farmers Market,Event Service,Electronics Store,Eastern European Restaurant,Dry Cleaner


# Conclusions and discussion

Once the different clusters have been analyzed, it is clear that some places do not offer comfort or proximity to most of the sites of interest for visitors seeking lodging. This is the case for cluster 3, which only has the train station nearby. Clusters 0 and 4 appear to consist of residential areas with a few commercial businesses and some places to eat. Cluster 2 shows similar behavior, in addition to being scattered throughout the city. Cluster 1, by contrast, is concentrated in popular places with a high density of shops and restaurants.