#### Suggestions for TOP venues in Lisbon!
#### Code for Final Report

## Introduction 

Hello everyone!

I am Solange, a citizen from Lisbon, in Portugal.

As you may know, **Lisbon** is blooming with tourists.<br>
This creates a major opportunity for many business owners and entrepreneurs, mostly in the **food & leisure industry**. Nonetheless, without proper planning and market insights, some of them do not succeed, close their businesses, or even go bankrupt.<br>
This situation raises a serious economic and social concern. Luckily, there are great examples that we can learn from, such as New York City and Toronto. Having been among the top destinations for tourists worldwide for years, they still provide diversity and quality for all tastes.<br>
Therefore, this project aims to provide insights from these two cities that can help **business owners and all the stakeholders, including local authorities and the citizens**, to succeed and, hence, make **Lisbon a great destination** for all.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Data</a>
    
2. <a href="#item2">Methodology</a>
    
3. <a href="#item3">Results</a>
          
4. <a href="#item4">Discussion</a>

5. <a href="#item5">Conclusion</a>
</font>
</div>

<a id='item1'></a>

## 1. Data

#### Sources and format
To explore the two cities, we start by creating two data frames that include at least the borough and neighborhood information.
New York (NY) data is available as a JSON file at https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json.<br>
It includes features such as Borough, Neighborhood, Latitude and Longitude.

For Toronto, the data is in a table on the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.<br>
BeautifulSoup was selected among the website-scraping libraries available in Python to transform the table on the Wikipedia page into a pandas dataframe. The table includes the PostalCode, the Borough and the Neighborhood. The latitude and longitude of each postal code will be added from the CSV file at http://cocl.us/Geospatial_data.

#### Dependencies
Several libraries are required: pandas, numpy and json for data handling and analysis; requests and BeautifulSoup for retrieving the data; geopy for geocoding; KMeans from sklearn for cluster analysis; and matplotlib and folium to map the results.

## 2. Methodology

In this project we first obtain the latitude and longitude of each NY and Toronto neighborhood. We then use the Foursquare API to explore the neighborhoods in both cities and get the most common venue categories in each neighborhood, and use this feature to group the neighborhoods into clusters. The k-means clustering algorithm is especially useful to quickly segment the neighborhoods based on their top venue categories. In parallel, the Folium library is used to visualize the neighborhoods in both cities and their emerging clusters.
Therefore, a significant number of dependencies will be used.

#### Import dependencies

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
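try:
    from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)
except ImportError:
    from pandas.io.json import json_normalize # location of the function in older pandas versions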

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

#embed images
from IPython.display import Image

print('Libraries imported.')

#### Download and create New York city dataset

In order to explore the top venues of New York City, we first need a dataset that contains the neighborhoods and their coordinates. The `wget` command downloads the data, and a loop then extracts the borough, neighborhood, latitude and longitude from the `features` key of the JSON file. Finally, a `neighborhoods` pandas data frame is created.

In [20]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


In [21]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [114]:
neighborhoods_data = newyork_data['features']
#neighborhoods_data

In [23]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [24]:
# collect one record per neighborhood, then build the dataframe in a single step
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [25]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
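
The extraction loop above relies on the file storing each coordinate pair as [longitude, latitude], which is why index 1 is read as the latitude. A minimal sketch on a hand-made sample makes the ordering explicit; the `sample` data and the `feature_to_row` helper are illustrative assumptions, not part of the downloaded file:

```python
import json

# Hypothetical two-feature sample that mimics the layout of newyork_data.json.
sample = json.loads("""
{"features": [
  {"properties": {"borough": "Bronx", "name": "Wakefield"},
   "geometry": {"coordinates": [-73.847201, 40.894705]}},
  {"properties": {"borough": "Bronx", "name": "Co-op City"},
   "geometry": {"coordinates": [-73.829939, 40.874294]}}
]}
""")


def feature_to_row(feature):
    """Flatten one feature into a Borough/Neighborhood/Latitude/Longitude dict."""
    lon, lat = feature["geometry"]["coordinates"]  # longitude comes first
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    }


rows = [feature_to_row(f) for f in sample["features"]]
print(rows[0])
```

Unpacking the pair into named variables avoids swapping the two values by accident when the rows are later passed to a mapping library.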


#### Download and create Toronto dataset

Toronto data is on a Wikipedia page, so we use the BeautifulSoup package to transform the table on that page into a pandas dataframe.

In [26]:
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [27]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')

In [28]:
wiki_data = soup.find('table',{'class':'wikitable sortable'})

In [29]:
column_names = ['PostalCode', 'Borough', 'Neighborhood'] 
df1 = pd.DataFrame(columns=column_names)
print("shape",df1.shape)
df1

shape (0, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood


Read the table from the page, assuming it is the first table with the class `wikitable sortable`.

In [30]:
wiki_data = soup.find('table',{'class':'wikitable sortable'})
table = wiki_data
rows = table.find_all("tr")
records = []
for row in rows:
    columns = row.find_all("td")
    if len(columns) == 0:  # header row: it contains <th> cells only
        continue
    records.append({'PostalCode': columns[0].text,
                    'Borough': columns[1].text,
                    'Neighborhood': columns[2].text})
df1 = pd.DataFrame(records, columns=column_names)
# clean table: strip newline characters and drop the first data row
df1 = df1.replace(r'\n','', regex=True)
df1.drop(df1.index[0],inplace=True)

print("shape",df1.shape,"type",type(df1))
df1.tail()

shape (288, 3) type <class 'pandas.core.frame.DataFrame'>


Unnamed: 0,PostalCode,Borough,Neighborhood
284,M8Z,Etobicoke,Mimico NW
285,M8Z,Etobicoke,The Queensway West
286,M8Z,Etobicoke,Royal York South West
287,M8Z,Etobicoke,South of Bloor
288,M9Z,Not assigned,Not assigned


Ignore rows whose borough is `Not assigned`.

In [31]:
df1=df1[df1.Borough != 'Not assigned']
print("shape",df1.shape)
df1.tail()

shape (212, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
283,M8Z,Etobicoke,Kingsway Park South West
284,M8Z,Etobicoke,Mimico NW
285,M8Z,Etobicoke,The Queensway West
286,M8Z,Etobicoke,Royal York South West
287,M8Z,Etobicoke,South of Bloor


Group rows with duplicate PostalCode and separate the Neighborhoods with a comma.

In [32]:
df1 = df1.groupby(['PostalCode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
print('shape',df1.shape)

shape (103, 3)
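
The grouping step above collapses several rows that share a postal code into one row. A small sketch on toy rows (the values are illustrative, chosen to resemble the scraped table) shows what `groupby` followed by `', '.join` produces:

```python
import pandas as pd

# Toy rows in the same shape as the scraped table (values are illustrative).
toy = pd.DataFrame({
    "PostalCode": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Rouge", "Malvern", "Woburn"],
})

# One row per (PostalCode, Borough); neighborhood names are joined with ", ".
grouped = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .apply(", ".join)
    .reset_index()
)
print(grouped)
```

The three input rows become two output rows, which mirrors the drop from 212 to 103 rows seen above.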


In [33]:
df1['Neighborhood'] = df1.apply(
    lambda row: row['Borough'] if (row['Neighborhood']== 'Not assigned') else row['Neighborhood'],
    axis=1
)

Get the latitude and longitude of each postal code by merging the previous dataframe with the geospatial data.

In [34]:
import io
url="http://cocl.us/Geospatial_data"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
c.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
c.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [116]:
df_m=pd.merge(df1, c, on='PostalCode', how='inner')
print('shape',df_m.shape)
df_m.head()

shape (103, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
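
An inner merge silently drops postal codes that have no coordinates. Here the shape stays at 103 rows, so nothing was lost, but a quick check can make that explicit. The sketch below uses toy frames (illustrative values, with one code deliberately missing) and the `indicator` option of `merge`:

```python
import pandas as pd

# Toy frames; M9Z is deliberately missing from the coordinates table.
areas = pd.DataFrame({"PostalCode": ["M1B", "M1C", "M9Z"],
                      "Borough": ["Scarborough", "Scarborough", "Etobicoke"]})
coords = pd.DataFrame({"PostalCode": ["M1B", "M1C"],
                       "Latitude": [43.806686, 43.784535],
                       "Longitude": [-79.194353, -79.160497]})

# A left merge with indicator=True keeps every area and flags rows without coordinates.
checked = areas.merge(coords, on="PostalCode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every postal code found its coordinates.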


In order to search for the top venues in Foursquare, each neighborhood in New York City and Toronto needs latitude and longitude values. Both data frames now contain these coordinates: the NY values come from the JSON file and the Toronto values from the geospatial CSV file.

#### NY neighborhoods

The neighborhoods of New York and Toronto will later be superimposed on a map of each city, using the geopy library to locate the city and Folium, a visualization library, to draw the map.

First, a quick look at the NY data frame:

In [36]:
print('The NY dataframe has {} boroughs and {} neighborhoods between Latitudes {} - {} and Longitudes {} - {} .'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0],
    min(neighborhoods['Latitude']),
    max(neighborhoods['Latitude']),
    min(neighborhoods['Longitude']),
    max(neighborhoods['Longitude'])
    )
)
df_ny=neighborhoods

The NY dataframe has 5 boroughs and 306 neighborhoods between Latitudes 40.50533376115642 - 40.90854282950666 and Longitudes -74.24656934235283 - -73.70884705889246 .


Given the size of New York City, we will explore the top venues only in the Manhattan neighborhoods. Given its multicultural population, this borough should cover enough diversity to give a snapshot of the different types of venues and hence interesting results.
So first we slice the original data frame and create a new data frame that focuses on the neighborhoods in Manhattan.

In [37]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

In [38]:
print('The NY dataframe has {} boroughs and {} neighborhoods between Latitudes {} - {} and Longitudes {} - {} .'.format(
        len(manhattan_data['Borough'].unique()),
        manhattan_data.shape[0],
    min(manhattan_data['Latitude']),
    max(manhattan_data['Latitude']),
    min(manhattan_data['Longitude']),
    max(manhattan_data['Longitude'])
    )
)
manhattan_data.head()

The NY dataframe has 1 boroughs and 40 neighborhoods between Latitudes 40.70710710727048 - 40.87655077879964 and Longitudes -74.01686930508617 - -73.91065965862981 .


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


#### Toronto neighborhoods

After a quick look at the Toronto data frame, we learn that there are 11 boroughs and 103 postal-code areas (a row may group several neighborhoods), so let's focus only on the boroughs that contain "Toronto" in their names and create a new data frame.

In [39]:
print('The Toronto dataframe has {} boroughs and {} neighborhoods between Latitudes {} - {} and Longitudes {} - {} .'.format(
        len(df_m['Borough'].unique()),
        df_m.shape[0],
    min(df_m['Latitude']),
    max(df_m['Latitude']),
    min(df_m['Longitude']),
    max(df_m['Longitude'])
    )
)
df_m.head()

The Toronto dataframe has 11 boroughs and 103 neighborhoods between Latitudes 43.60241370000001 - 43.836124700000006 and Longitudes -79.61581899999999 - -79.16049709999999 .


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [40]:
toronto_data = df_m[df_m['Borough'].str.contains('Toronto')].reset_index(drop=True)
print('The new Toronto dataframe has {} boroughs and {} neighborhoods between Latitudes {} - {} and Longitudes {} - {} .'.format(
        len(toronto_data['Borough'].unique()),
        toronto_data.shape[0],
    min(toronto_data['Latitude']),
    max(toronto_data['Latitude']),
    min(toronto_data['Longitude']),
    max(toronto_data['Longitude'])
    )
)
toronto_data.head()

The new Toronto dataframe has 4 boroughs and 38 neighborhoods between Latitudes 43.6289467 - 43.7280205 and Longitudes -79.4844499 - -79.2930312 .


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


With this filter we simplify the search while keeping it representative of the diversity of both cities; approximately 40 neighborhoods per city is a reasonable number to explore.
So now we will use the Foursquare API to explore the neighborhoods and segment them based on the top venues.<br>
Define the Foursquare credentials and version:

In [41]:
CLIENT_ID = 'F4GTTDZY2OHQZQPZLKI1NNPFS51PW4E3LINWQVHPLRSW2ASC' # your Foursquare ID
CLIENT_SECRET = 'XPSG5NRBYUFCGYS55BBP1Y5KEUSW15BB5UXSWULW0IJ0GNPN' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: F4GTTDZY2OHQZQPZLKI1NNPFS51PW4E3LINWQVHPLRSW2ASC
CLIENT_SECRET:XPSG5NRBYUFCGYS55BBP1Y5KEUSW15BB5UXSWULW0IJ0GNPN
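
Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. A common alternative is to read them from environment variables. The sketch below is one possible way to do that; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for this example:

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names used here are an assumption of this sketch,
    not a Foursquare convention.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Example with an explicit mapping instead of the real environment.
client_id, client_secret = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
)
print("CLIENT_ID loaded:", bool(client_id))
```

Called without arguments, the helper reads the real environment, so the secret never has to appear in a cell or in its output.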


<a id='item2'></a>

#### Manhattan top venues per neighborhood

Let's start by sending a GET request to preview the neighborhood at index 10 of the Manhattan data frame.

In [42]:
neighborhood_latitude = manhattan_data.loc[10, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[10, 'Longitude'] # neighborhood longitude value

neighborhood_name = manhattan_data.loc[10, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Lenox Hill are 40.76811265828733, -73.9588596881376.


In [43]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=F4GTTDZY2OHQZQPZLKI1NNPFS51PW4E3LINWQVHPLRSW2ASC&client_secret=XPSG5NRBYUFCGYS55BBP1Y5KEUSW15BB5UXSWULW0IJ0GNPN&v=20180605&ll=40.76811265828733,-73.9588596881376&radius=500&limit=100'
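
The URL above is assembled with a long `format` string. As a hedged alternative, the same query string can be built from a dictionary with the standard library, which keeps each parameter next to its name. The helper `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used above:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL from a parameter dictionary."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable.
    return f"{EXPLORE_ENDPOINT}?{urlencode(params, safe=',')}"


example_url = build_explore_url("ID", "SECRET", "20180605", 40.768, -73.959)
print(example_url)
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids building the string by hand.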

In [44]:
results = requests.get(url).json()
#results

We then extract the category of each of the 100 venues returned for Lenox Hill from the *items* key of the JSON response and create a pandas data frame with that information.

In [45]:
def get_category_type(row):
    # the category list is stored under 'categories' or, after flattening, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [46]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
nearby_venues.head()

100 venues were returned by Foursquare.


Unnamed: 0,name,categories,lat,lng
0,Whiskey & Wine Off 69,Liquor Store,40.767272,-73.959544
1,Up Thai,Thai Restaurant,40.769898,-73.957598
2,sweetgreen,Salad Place,40.767128,-73.956846
3,Cigar Inn,Smoke Shop,40.768776,-73.956222
4,Anthropologie,Women's Store,40.769296,-73.961085


Now that we have confirmed that we get the category of up to 100 venues within a radius of 500 m of the Lenox Hill neighborhood, we will extend the same procedure to all Manhattan neighborhoods and to the Toronto boroughs. The function below creates the GET request for each neighborhood and extracts only the relevant data for each venue (name, latitude, longitude and category) within a 500 m radius.

In [47]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
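
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue returned without a category. A defensive variant could look like the sketch below; the helper name `venue_to_row` and the sample items are illustrative assumptions, shaped like the `items` list used above:

```python
def venue_to_row(name, lat, lng, item):
    """Flatten one item of an explore response into a tuple.

    The category is None when the venue has no categories listed.
    """
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (name, lat, lng, venue["name"],
            venue["location"]["lat"], venue["location"]["lng"], category)


# Hand-made items (values are illustrative); the second one has no category.
items = [
    {"venue": {"name": "Up Thai", "location": {"lat": 40.7699, "lng": -73.9576},
               "categories": [{"name": "Thai Restaurant"}]}},
    {"venue": {"name": "Unnamed Spot", "location": {"lat": 40.7680, "lng": -73.9590},
               "categories": []}},
]
rows = [venue_to_row("Lenox Hill", 40.7681, -73.9589, item) for item in items]
print(rows)
```

Rows with a `None` category can then be dropped or kept before the one-hot encoding step, depending on the analysis.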

Create new data frame with the neighborhoods data (name, lat and long) together with the explore data (venue name, lat, long and category).

In [48]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [49]:
print("Manhattan venues",len(manhattan_venues))
manhattan_venues.head()

Manhattan venues 3313


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
4,Marble Hill,40.876551,-73.91066,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant


In [50]:
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))
#manhattan_venues.groupby('Neighborhood').count()

There are 332 unique categories.


<a id='item3'></a>

Now we need to create a data frame with the frequency of occurrence of each category grouped by neighborhood.

In [51]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Business Service,Butcher,Cafeteria,Café,Cambodian Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Christmas Market,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Bookstore,College Cafeteria,College Theater,Comedy Club,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Laundry Service,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soba Restaurant,Social Club,Soup Place,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Train,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Watch Shop,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [52]:
# group rows by neighborhood, taking the mean of the frequency of occurrence of each category
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped.shape
#manhattan_grouped
print('Manhattan venue categories grouped by the {} neighborhoods.'.format(len(manhattan_grouped)))

Manhattan venue categories grouped by the 40 neighborhoods.
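
Taking the mean of one-hot columns per neighborhood yields the share of each category among that neighborhood's venues. A toy example (illustrative values, only two categories) shows the arithmetic behind the two cells above:

```python
import pandas as pd

# Toy venue list: neighborhood A has 2 coffee shops and 1 park, B has 1 coffee shop.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Coffee Shop"],
})

# One indicator column per category, stored as floats so the mean is a frequency.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the 0/1 indicators is the share of each category per neighborhood.
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq.round(2))
```

Because each row of `freq` sums to 1, neighborhoods with very different venue counts become comparable before clustering.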


Finally, with a function that sorts the venue categories in descending order of frequency, we can create a data frame with the top 3 venue categories of each Manhattan neighborhood.

In [53]:
num_top_venues = 3

for hood in manhattan_grouped['Neighborhood']:
    #print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    #print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    #print('\n')

In [55]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [56]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Battery Park City,Coffee Shop,Park,Hotel
1,Carnegie Hill,Pizza Place,Cosmetics Shop,Coffee Shop
2,Central Harlem,African Restaurant,French Restaurant,Pizza Place
3,Chelsea,Coffee Shop,Italian Restaurant,Ice Cream Shop
4,Chinatown,Chinese Restaurant,Bubble Tea Shop,American Restaurant
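
To make the ranking step concrete, here is a minimal sketch on a toy frequency table. The neighborhood names and frequencies are made up for illustration, and `top_venues` is a hypothetical helper that does the same job as `return_most_common_venues`, applied to every row at once.

```python
import pandas as pd

# Toy frequency table (hypothetical values), shaped like manhattan_grouped
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bar': [0.10, 0.50],
    'Café': [0.60, 0.20],
    'Park': [0.30, 0.30],
})

def top_venues(row, n):
    """Return the n category names with the highest frequency in one row."""
    return row.sort_values(ascending=False).index[:n].tolist()

# Use the neighborhood as index so that only frequency columns are ranked
freqs = grouped.set_index('Neighborhood').astype(float)
top = freqs.apply(lambda row: top_venues(row, 2), axis=1)
print(top['A'])  # ['Café', 'Park']
print(top['B'])  # ['Bar', 'Park']
```

Moving the neighborhood name into the index removes the need to skip the first column by position, as `row.iloc[1:]` does above.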


<a id='item4'></a>

#### Manhattan - Cluster Neighborhoods

Finally, we can cluster the neighborhoods to find the top venues in each cluster and compare them with Toronto.
The neighborhoods are grouped with the k-means clustering algorithm according to their venue category frequencies, and the cluster labels are combined with the top 3 venues per neighborhood in a final data frame named `manhattan_merged`.<br>

Run *k*-means to cluster the neighborhoods into 3 clusters.

In [57]:
# set number of clusters
kclusters = 3

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 0, 0, 0, 0, 0, 1, 1, 0], dtype=int32)

In [58]:
manhattan_merged = manhattan_data

# add clustering labels
manhattan_merged['Cluster Labels'] = kmeans.labels_

# join the top venues of each neighborhood onto manhattan_data, which holds latitude/longitude
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,0,Coffee Shop,Discount Store,Yoga Studio
1,Manhattan,Chinatown,40.715618,-73.994279,1,Chinese Restaurant,Bubble Tea Shop,American Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,0,Café,Bakery,Mobile Phone Shop
3,Manhattan,Inwood,40.867684,-73.92121,0,Café,Mexican Restaurant,Pizza Place
4,Manhattan,Hamilton Heights,40.823604,-73.949688,0,Mexican Restaurant,Deli / Bodega,Café
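
Note that `kmeans.labels_` follows the row order of `manhattan_grouped`, which is sorted by neighborhood name, whereas `manhattan_data` keeps its own order, so assigning the array directly attaches labels by position. A minimal sketch, on hypothetical toy data, of pairing the labels with the neighborhood names first and then merging by name:

```python
import pandas as pd

# Hypothetical stand-ins: 'data' keeps the original order,
# 'grouped' is sorted by name, as a groupby result would be.
data = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chinatown', 'Inwood']})
grouped = pd.DataFrame({'Neighborhood': ['Chinatown', 'Inwood', 'Marble Hill']})
labels = [1, 0, 2]  # one label per row of 'grouped' (made-up values)

# Pair each label with the neighborhood it was computed for
labelled = grouped[['Neighborhood']].assign(**{'Cluster Labels': labels})

# Merge on the name so the row order of 'data' no longer matters
merged = data.merge(labelled, on='Neighborhood', how='left')
print(merged)
```

With this approach each neighborhood keeps the label computed for it, regardless of how the two frames are ordered.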


<a id='item5'></a>

Now that we have the data frame with the top venues in Manhattan, we repeat the same process for Toronto.

#### Toronto top venues per neighborhood

In [59]:
address = 'Toronto'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [60]:
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


In [61]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
#send the get request
results = requests.get(url).json()
#results
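
The request URL above is assembled with `str.format`. As an alternative sketch, the query string can be built with the standard library's `urlencode`, which also escapes special characters. `build_explore_url` is a hypothetical helper, and the credentials and version string below are placeholders, not real values.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a Foursquare 'venues/explore' URL from the same parameters used above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials and version, for illustration only
url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.676357, -79.293031)
print(url)
```

Keeping the parameters in a dictionary makes it easier to see which values are sent, and to change the radius or limit for a single call.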

In [62]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Grover Pub and Grub,Pub,43.679181,-79.297215
1,Starbucks,Coffee Shop,43.678798,-79.298045
2,Upper Beaches,Neighborhood,43.680563,-79.292869
3,Beaches Fitness,Gym / Fitness Center,43.680319,-79.290991
4,Dip 'n Sip,Coffee Shop,43.678897,-79.297745


In [63]:
# collect the venues around every Toronto neighborhood
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

In [64]:
print("Toronto venues",len(toronto_venues))

Toronto venues 1707


In [65]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,54,54,54,54,54,54
"Brockton, Exhibition Place, Parkdale Village",19,19,19,19,19,19
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
"Cabbagetown, St. James Town",47,47,47,47,47,47
Central Bay Street,82,82,82,82,82,82
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,15,15,15,15,15,15
Church and Wellesley,87,87,87,87,87,87


In [66]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 238 unique categories.


In [70]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
#toronto_grouped
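
To illustrate what the encoding and averaging produce, here is a small sketch on made-up venues. After one-hot encoding, the mean of each column within a neighborhood is the share of that neighborhood's venues that fall in that category.

```python
import pandas as pd

# Hypothetical venues: three in neighborhood 'A', one in 'B'
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bar'],
})

# One indicator column per category, cast to float so the mean is a share
onehot = pd.get_dummies(venues['Venue Category']).astype(float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Average the indicators per neighborhood
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

In this toy case neighborhood A ends up with two thirds Café and one third Park, while B is entirely Bar.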

In [71]:
num_top_venues = 3

for hood in toronto_grouped['Neighborhood']:
    #print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
   # print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    #print('\n')

In [72]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Thai Restaurant
1,Berczy Park,Coffee Shop,Restaurant,Cocktail Bar
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Café,Breakfast Spot
3,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Auto Workshop,Park
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Terminal,Airport Service
5,"Cabbagetown, St. James Town",Coffee Shop,Restaurant,Italian Restaurant
6,Central Bay Street,Coffee Shop,Café,Italian Restaurant
7,"Chinatown, Grange Park, Kensington Market",Bar,Café,Vegetarian / Vegan Restaurant
8,Christie,Café,Grocery Store,Park
9,Church and Wellesley,Japanese Restaurant,Sushi Restaurant,Coffee Shop


#### Toronto - Cluster Neighborhoods

In [73]:
# set number of clusters
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [74]:
toronto_merged = toronto_data

# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# join the top venues of each neighborhood onto toronto_data, which holds latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Coffee Shop,Gym / Fitness Center,Pub
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,1,Greek Restaurant,Coffee Shop,Ice Cream Shop
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,1,Park,Sushi Restaurant,Board Shop
3,M4M,East Toronto,Studio District,43.659526,-79.340923,1,Café,Coffee Shop,Bakery
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,1,Bus Line,Park,Swim School


## Results - Examine Clusters

New York City is spread across 5 boroughs and 306 neighborhoods. Within the borough of Manhattan, which has 40 neighborhoods, we found 3313 venues belonging to 332 categories within a 500 m radius of each neighborhood.
Toronto has 11 boroughs and 103 neighborhoods. Four of those 11 boroughs have "Toronto" in their name and together comprise 38 neighborhoods. With the same 500 m search radius we found 1707 venues spread across 238 unique categories.<br>
After filtering, a comparable number of neighborhoods remained in both cities, approximately 40.
In Manhattan there are 24 neighborhoods in the 1st cluster, 15 in the 2nd and 1 in the 3rd.
In Toronto there are 2 neighborhoods in the 1st cluster, 35 in the 2nd and 1 in the 3rd.

In [75]:
print('In Manhattan there are {} neighborhoods in the 1st cluster, {} in the 2nd and {} in the 3rd.'.format(len(manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0]), len(manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1]), len(manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2])))

In Manhattan there are 24 neighborhoods in the 1st cluster, 15 in the 2nd and 1 in the 3rd.


In [76]:
print('In Toronto there are {} neighborhoods in the 1st cluster, {} in the 2nd and {} in the 3rd.'.format(len(toronto_merged.loc[toronto_merged['Cluster Labels'] == 0]), len(toronto_merged.loc[toronto_merged['Cluster Labels'] == 1]), len(toronto_merged.loc[toronto_merged['Cluster Labels'] == 2])))

In Toronto there are 2 neighborhoods in the 1st cluster, 35 in the 2nd and 1 in the 3rd.
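
The cluster sizes can also be read off in a single call. A minimal sketch with `value_counts`, using a made-up label column in place of the `Cluster Labels` column of the merged data frames:

```python
import pandas as pd

# Hypothetical labels standing in for the 'Cluster Labels' column
labels = pd.Series([0, 1, 0, 0, 2, 1], name='Cluster Labels')

# Count the rows per label and order the result by label
sizes = labels.value_counts().sort_index()
print(sizes.to_dict())  # {0: 3, 1: 2, 2: 1}
```

This scales to any number of clusters without writing one filter per label.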


##### Manhattan Cluster Map

In [77]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


##### Toronto Cluster Map

In [78]:
address = 'Toronto'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The geographical coordinates of Toronto are 43.653963, -79.387207.


After examining each neighborhood, we determine the common and the discriminating venue categories of each cluster.
To make the interpretation easier, a data frame with the most frequent venue categories was created for each cluster, and these were combined at the end to obtain the TOP 3:

##### Manhattan Cluster 1

In [79]:
NY_cluster1_= manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
NY_cluster1_

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Marble Hill,Coffee Shop,Discount Store,Yoga Studio
2,Washington Heights,Café,Bakery,Mobile Phone Shop
3,Inwood,Café,Mexican Restaurant,Pizza Place
4,Hamilton Heights,Mexican Restaurant,Deli / Bodega,Café
5,Manhattanville,Deli / Bodega,Sushi Restaurant,Italian Restaurant
6,Central Harlem,African Restaurant,French Restaurant,Pizza Place
9,Yorkville,Italian Restaurant,Gym,Bar
10,Lenox Hill,Italian Restaurant,Coffee Shop,Sushi Restaurant
12,Upper West Side,Italian Restaurant,Bar,Coffee Shop
14,Clinton,Theater,Coffee Shop,Gym / Fitness Center


In [80]:
NY_cluster1 = pd.DataFrame({'NY_1_1': NY_cluster1_.groupby('1st Most Common Venue')['Neighborhood'].nunique(),
                      'NY_1_2': NY_cluster1_.groupby('2nd Most Common Venue')['Neighborhood'].nunique(),
                      'NY_1_3': NY_cluster1_.groupby('3rd Most Common Venue')['Neighborhood'].nunique()})
NY_cluster1=NY_cluster1.fillna(0)                      
NY_cluster1['Total1'] = NY_cluster1['NY_1_1'] + NY_cluster1['NY_1_2'] +  NY_cluster1['NY_1_3']
NY_cluster1.sort_values(by='Total1', ascending=False,inplace=True)
NY_cluster1.head(3)

Unnamed: 0,NY_1_1,NY_1_2,NY_1_3,Total1
Italian Restaurant,9.0,2.0,2.0,13.0
Coffee Shop,5.0,2.0,2.0,9.0
Sushi Restaurant,0.0,3.0,1.0,4.0
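
The counting step is repeated for every cluster with only the names changed. As a sketch, it could be wrapped in a function. `count_categories` is a hypothetical helper and the toy frame is made up. The helper counts rows, which matches the per-neighborhood counts above only when every row is a distinct neighborhood.

```python
import pandas as pd

VENUE_COLS = ['1st Most Common Venue', '2nd Most Common Venue', '3rd Most Common Venue']

def count_categories(cluster_df, venue_cols=VENUE_COLS):
    """Count how often each category appears in each rank column, plus a total."""
    counts = pd.DataFrame({col: cluster_df[col].value_counts() for col in venue_cols})
    counts = counts.fillna(0)  # categories absent from a rank column count as 0
    counts['Total'] = counts.sum(axis=1)
    return counts.sort_values(by='Total', ascending=False)

# Made-up cluster with two neighborhoods
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    '1st Most Common Venue': ['Café', 'Café'],
    '2nd Most Common Venue': ['Park', 'Bar'],
    '3rd Most Common Venue': ['Bar', 'Gym'],
})
print(count_categories(toy))
```

Calling the same function on each cluster subset would replace the three near-identical cells per city.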


##### Manhattan Cluster 2

In [88]:
NY_cluster2_=manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
NY_cluster2_

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
1,Chinatown,Chinese Restaurant,Bubble Tea Shop,American Restaurant
7,East Harlem,Mexican Restaurant,Bakery,Latin American Restaurant
8,Upper East Side,Italian Restaurant,Exhibit,Art Gallery
11,Roosevelt Island,Park,Sandwich Place,Deli / Bodega
13,Lincoln Square,Gym / Fitness Center,Theater,Concert Hall
15,Midtown,Hotel,Theater,Steakhouse
19,East Village,Bar,Ice Cream Shop,Wine Bar
20,Lower East Side,Coffee Shop,Café,Latin American Restaurant
21,Tribeca,Italian Restaurant,American Restaurant,Spa
22,Little Italy,Bakery,Café,Yoga Studio


In [89]:
NY_cluster2 = pd.DataFrame({'NY_2_1': NY_cluster2_.groupby('1st Most Common Venue')['Neighborhood'].nunique(),
                      'NY_2_2': NY_cluster2_.groupby('2nd Most Common Venue')['Neighborhood'].nunique(),
                      'NY_2_3': NY_cluster2_.groupby('3rd Most Common Venue')['Neighborhood'].nunique()})
                       
NY_cluster2=NY_cluster2.fillna(0) 
NY_cluster2['Total2'] = NY_cluster2['NY_2_1'] + NY_cluster2['NY_2_2'] +  NY_cluster2['NY_2_3']
NY_cluster2.sort_values(by='Total2', ascending=False,inplace=True)
NY_cluster2.head(3)

Unnamed: 0,NY_2_1,NY_2_2,NY_2_3,Total2
Coffee Shop,4.0,1.0,0.0,5.0
Park,1.0,1.0,1.0,3.0
Café,0.0,2.0,1.0,3.0


##### Manhattan Cluster 3

In [90]:
NY_cluster3_=manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
NY_cluster3_

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
30,Carnegie Hill,Pizza Place,Cosmetics Shop,Coffee Shop


In [91]:
NY_cluster3 = pd.DataFrame({'NY_3_1': NY_cluster3_.groupby('1st Most Common Venue')['Neighborhood'].nunique(),
                      'NY_3_2': NY_cluster3_.groupby('2nd Most Common Venue')['Neighborhood'].nunique(),
                      'NY_3_3': NY_cluster3_.groupby('3rd Most Common Venue')['Neighborhood'].nunique()})
                       
NY_cluster3=NY_cluster3.fillna(0)
NY_cluster3['Total3'] = NY_cluster3['NY_3_1'] + NY_cluster3['NY_3_2'] +  NY_cluster3['NY_3_3']
NY_cluster3.sort_values(by='Total3', ascending=False,inplace=True)
NY_cluster3

Unnamed: 0,NY_3_1,NY_3_2,NY_3_3,Total3
Coffee Shop,0.0,0.0,1.0,1.0
Cosmetics Shop,0.0,1.0,0.0,1.0
Pizza Place,1.0,0.0,0.0,1.0


Now we can find the most frequent categories across all the clusters and finally get the top 3, shown in the final `Manhattan_top3` data frame.

In [92]:
frames = [NY_cluster1, NY_cluster2, NY_cluster3]
result = pd.concat(frames)
result=result.fillna(0)
result.shape

(65, 12)

In [93]:
#Sum the frequency of the categories in the 3 clusters 
result['category'] = result.index
result1=result.groupby('category').sum()
result1.shape

(43, 12)

In [94]:
# Total each category across the three clusters and sort in descending order to get the top 3
Manhattan_top3 = result1[['Total1', 'Total2', 'Total3']].copy()  # copy so the new column is not set on a slice
Manhattan_top3['Total'] = Manhattan_top3['Total1'] + Manhattan_top3['Total2'] + Manhattan_top3['Total3']
Manhattan_top3 = Manhattan_top3.sort_values(by='Total', ascending=False)
print('Manhattan TOP 3:')
Manhattan_top3.head(3)

Manhattan TOP 3:


category,Total1,Total2,Total3,Total
Italian Restaurant,13.0,3.0,0.0,16.0
Coffee Shop,9.0,5.0,1.0,15.0
Café,3.0,3.0,0.0,6.0


Now we can compare with Toronto's top categories.

##### Toronto Cluster 1

In [95]:
TO_cluster1_=toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
TO_cluster1_

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
24,Central Toronto,0,Coffee Shop,Café,Sandwich Place
27,Downtown Toronto,0,Airport Lounge,Airport Terminal,Airport Service
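
Because `toronto_merged` has an extra `PostalCode` column, position 1 is `Borough` rather than `Neighborhood`, so the positional selection above returns boroughs and the cluster label. Selecting by column name avoids depending on the column order. A minimal sketch on a one-row frame with the same column layout:

```python
import pandas as pd

# Small frame with the same column layout as toronto_merged
merged = pd.DataFrame({
    'PostalCode': ['M4E'],
    'Borough': ['East Toronto'],
    'Neighborhood': ['The Beaches'],
    'Latitude': [43.676357],
    'Longitude': [-79.293031],
    'Cluster Labels': [1],
    '1st Most Common Venue': ['Coffee Shop'],
    '2nd Most Common Venue': ['Gym / Fitness Center'],
    '3rd Most Common Venue': ['Pub'],
})

# Pick the columns by name instead of by position
venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')]
cluster = merged.loc[merged['Cluster Labels'] == 1, ['Neighborhood'] + venue_cols]
print(cluster.columns.tolist())
```

The same selection then works unchanged for both cities, whether or not a postal code column is present.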


In [96]:
TO_cluster1 = pd.DataFrame({'TO_1_1': TO_cluster1_.groupby('1st Most Common Venue')['Borough'].nunique(),
                      'TO_1_2': TO_cluster1_.groupby('2nd Most Common Venue')['Borough'].nunique(),
                      'TO_1_3': TO_cluster1_.groupby('3rd Most Common Venue')['Borough'].nunique()})
TO_cluster1=TO_cluster1.fillna(0)                      
TO_cluster1['Total1'] = TO_cluster1['TO_1_1'] + TO_cluster1['TO_1_2'] +  TO_cluster1['TO_1_3']
TO_cluster1.sort_values(by='Total1', ascending=False,inplace=True)
TO_cluster1.head(3)

Unnamed: 0,TO_1_1,TO_1_2,TO_1_3,Total1
Airport Lounge,1.0,0.0,0.0,1.0
Airport Service,0.0,0.0,1.0,1.0
Airport Terminal,0.0,1.0,0.0,1.0


##### Toronto Cluster 2

In [97]:
TO_cluster2_=toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
TO_cluster2_

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,East Toronto,1,Coffee Shop,Gym / Fitness Center,Pub
1,East Toronto,1,Greek Restaurant,Coffee Shop,Ice Cream Shop
2,East Toronto,1,Park,Sushi Restaurant,Board Shop
3,East Toronto,1,Café,Coffee Shop,Bakery
4,Central Toronto,1,Bus Line,Park,Swim School
5,Central Toronto,1,Park,Hotel,Burger Joint
6,Central Toronto,1,Sporting Goods Shop,Coffee Shop,Clothing Store
7,Central Toronto,1,Pizza Place,Dessert Shop,Sandwich Place
8,Central Toronto,1,Playground,Gym,Tennis Court
9,Central Toronto,1,Pub,Coffee Shop,Light Rail Station


In [98]:
TO_cluster2 = pd.DataFrame({'TO_2_1': TO_cluster2_.groupby('1st Most Common Venue')['Borough'].nunique(),
                      'TO_2_2': TO_cluster2_.groupby('2nd Most Common Venue')['Borough'].nunique(),
                      'TO_2_3': TO_cluster2_.groupby('3rd Most Common Venue')['Borough'].nunique()})
TO_cluster2=TO_cluster2.fillna(0)                      
TO_cluster2['Total2'] = TO_cluster2['TO_2_1'] + TO_cluster2['TO_2_2'] +  TO_cluster2['TO_2_3']
TO_cluster2.sort_values(by='Total2', ascending=False,inplace=True)
TO_cluster2.head(3)

Unnamed: 0,TO_2_1,TO_2_2,TO_2_3,Total2
Park,3.0,2.0,2.0,7.0
Coffee Shop,3.0,3.0,1.0,7.0
Café,2.0,2.0,2.0,6.0


##### Toronto Cluster 3

In [99]:
TO_cluster3_=toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
TO_cluster3_

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
28,Downtown Toronto,2,Coffee Shop,Restaurant,Café


In [100]:
TO_cluster3 = pd.DataFrame({'TO_3_1': TO_cluster3_.groupby('1st Most Common Venue')['Borough'].nunique(),
                      'TO_3_2': TO_cluster3_.groupby('2nd Most Common Venue')['Borough'].nunique(),
                      'TO_3_3': TO_cluster3_.groupby('3rd Most Common Venue')['Borough'].nunique()})
TO_cluster3=TO_cluster3.fillna(0)                      
TO_cluster3['Total3'] = TO_cluster3['TO_3_1'] + TO_cluster3['TO_3_2'] +  TO_cluster3['TO_3_3']
TO_cluster3.sort_values(by='Total3', ascending=False,inplace=True)
TO_cluster3.head(3)

Unnamed: 0,TO_3_1,TO_3_2,TO_3_3,Total3
Café,0.0,0.0,1.0,1.0
Coffee Shop,1.0,0.0,0.0,1.0
Restaurant,0.0,1.0,0.0,1.0


In [101]:
frames_to = [TO_cluster1, TO_cluster2, TO_cluster3]
result_to = pd.concat(frames_to)
result_to=result_to.fillna(0)
result_to.shape

(55, 12)

In [102]:
#Sum the frequency of the categories in the 3 clusters 
result_to['category'] = result_to.index
result_to=result_to.groupby('category').sum()
result_to.shape

(49, 12)

In [103]:
# Total each category across the three clusters and sort in descending order to get the top 3
Toronto_top3 = result_to[['Total1', 'Total2', 'Total3']].copy()  # copy so the new column is not set on a slice
Toronto_top3['Total'] = result_to['Total1'] + result_to['Total2'] + result_to['Total3']
Toronto_top3 = Toronto_top3.sort_values(by='Total', ascending=False)
print('Toronto TOP 3:')
Toronto_top3.head(3)

Toronto TOP 3:


category,Total1,Total2,Total3,Total
Coffee Shop,1.0,7.0,1.0,9.0
Café,1.0,6.0,1.0,8.0
Park,0.0,7.0,0.0,7.0


Finally, we can combine the information from both cities to get the __ultimate top 3 categories__.

In [108]:
frames_final = [Manhattan_top3, Toronto_top3]
df3 = pd.concat(frames_final).fillna(0)
# Sum the frequency of each category across the two cities (the index holds the category name)
df3 = df3.groupby(level=0).sum()

Final_top = df3[['Total1', 'Total2', 'Total3']].copy()  # copy so the new column is not set on a slice
Final_top['Total'] = df3['Total1'] + df3['Total2'] + df3['Total3']
Final_top = Final_top.sort_values(by='Total', ascending=False)
print('TOP 3:')

Final_top3 = Final_top.iloc[0:3, 3]
Final_top3

TOP 3:


category
Coffee Shop           24.0
Italian Restaurant    17.0
Café                  14.0
Name: Total, dtype: float64

In [109]:
Final_top.shape

(72, 4)

In [110]:
Final_top

category,Total1,Total2,Total3,Total
Coffee Shop,10.0,12.0,2.0,24.0
Italian Restaurant,13.0,4.0,0.0,17.0
Café,4.0,9.0,1.0,14.0
Park,1.0,10.0,0.0,11.0
Pizza Place,3.0,3.0,1.0,7.0
Hotel,2.0,5.0,0.0,7.0
Sushi Restaurant,4.0,3.0,0.0,7.0
Bakery,2.0,4.0,0.0,6.0
Bar,2.0,4.0,0.0,6.0
Yoga Studio,2.0,3.0,0.0,5.0


In [111]:
print('BOTTOM 3 in the list:', Final_top['Total'].tail(3))

BOTTOM 3 in the list: category
New American Restaurant    1.0
Burger Joint               1.0
African Restaurant         1.0
Name: Total, dtype: float64


## Discussion

More than 5000 venues within a 500 m radius of the neighborhoods of Manhattan and Toronto were explored to determine the most frequent categories, and hence those, among more than 300 categories, with the highest probability of succeeding in Lisbon.<br>
The request was limited to 100 venues per neighborhood, and the categories included pharmacies, coffee shops, restaurants, bars, parks and stores.<br>
The most frequent venue categories were selected for each of the nearly 40 neighborhoods in each city dataset.<br>
<br>
The cluster analysis allowed us to group neighborhoods based on their venues. This way we were able to understand which venues were common within and between clusters.
Cluster number 3 had a single neighborhood in both cities. We kept those clusters nonetheless.<br>
In Manhattan there were 24 neighborhoods in the 1st cluster, 15 in the 2nd and 1 in the 3rd. The top categories across those clusters were __Italian Restaurant__, followed by __Coffee Shop__ and __Café__.<br>
In Toronto there were 2 neighborhoods in the 1st cluster, 35 in the 2nd and 1 in the 3rd, and the top categories were again __Coffee Shop__ and __Café__, plus a new one, __Park__.<br>
<br>
Therefore, the TOP venues are mostly related to food and drinks.
<br>
After combining the information from both cities, the categories at the bottom of the list were mostly services such as stores, markets, pools and pharmacies. There were also venues that are less frequent given their specificity, such as airport lounges and bus lines.
<br> 
Interestingly, the food industry also appeared at the bottom of the list, with specific cuisines such as African, Vegan, New American and Thai food. It may therefore be better to understand the local market and its interests in depth before diving into one of these businesses.

## Conclusion

The __aim__ of this project was to provide a __list with the top venues__ in two of the top destinations for travelers around the world, New York and Toronto.
With this list we hope to provide some guidance to entrepreneurs, business owners, the government and all the stakeholders, so that they can make informed decisions that ultimately lead them to success, improving the tourist experience with top offers and also benefiting Lisbon's citizens.<br>
<br>
The top venue categories were __Coffee Shop, Italian Restaurant and Café__.<br>
No wonder there are so many sayings about coffee and food.

I leave you with one of my favorites, even though this is the end!

In [112]:
Image(url='https://i.pinimg.com/474x/d6/21/38/d62138eb36b119f3f2973d65d0862717--coffee-humor-coffee-quotes.jpg')

### Thank you for reading my final assignment!
This notebook was created by [Solange Pachco]. 
This notebook is part of a course on **Coursera** called *Applied Data Science Capstone*. 