# Description of the Problem and Discussion of the Background

A well-known Indian restaurant in New York is planning to open a branch in Toronto. The owners approached us to find the best location in Toronto for the new branch. Because Toronto already has many Indian restaurants, it is important to find a spot that is:

* Similar to the current location in New York
* Not already crowded with Indian restaurants

# Description of the Data and How It Will Be Used to Solve the Problem

The New York data will be downloaded from the following site and cleaned up for this project:

https://cocl.us/new_york_dataset

For Toronto, the data will be extracted by web scraping from the following page:

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Once the data is available, the following approach will be used to solve the problem:

* The New York data will be used first to assess the current restaurant location and the amenities available within 500 meters, which sets the baseline for the future location in Toronto.
* With the help of the Toronto data, we will identify neighbourhoods that are similar to the current New York neighbourhood but do not already have many Indian restaurants.
* Foursquare data will be used for segmentation, and k-means clustering will be used to group neighbourhoods that show similar venue profiles.
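
The clustering step in the last bullet can be sketched in a few lines. The version below is a minimal pure-Python k-means, written only to illustrate the idea; the neighbourhood names and venue counts are made-up placeholders, and a real analysis would more likely rely on a library implementation.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: returns a cluster label for each point.

    Centroids start at the first k points, so the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical venue counts per neighbourhood: (restaurants, bars, parks)
profiles = {
    'Neighbourhood A': (30, 12, 1),
    'Neighbourhood B': (2, 0, 6),
    'Neighbourhood C': (28, 15, 0),
    'Neighbourhood D': (1, 1, 8),
}
labels = kmeans(list(profiles.values()), k=2)
clusters = dict(zip(profiles, labels))
print(clusters)
```

Neighbourhoods with the same label have similar venue mixes; the number of clusters `k` is a choice that would need to be tuned on the real data.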

Once the analysis is complete, a report will be provided to the client with the following information:

The top 3 locations in Toronto that most closely resemble the current restaurant location in New York but do not already have many Indian restaurants, which is a mandatory criterion from the client for selecting the new location.
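
One way to turn the two selection criteria into a ranking is sketched below: score each candidate neighbourhood by the cosine similarity of its venue profile to the reference location, after discarding candidates that exceed a cap on Indian restaurants. The profiles, the cap, and the function names are illustrative assumptions, not values from this analysis.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_candidates(reference, candidates, max_indian=1, top_n=3):
    """Rank candidates by similarity to the reference profile.

    Each candidate is (name, indian_restaurant_count, profile_vector).
    """
    eligible = [c for c in candidates if c[1] <= max_indian]
    scored = [(cosine_similarity(reference, vec), name) for name, _, vec in eligible]
    return [name for score, name in sorted(scored, reverse=True)[:top_n]]

# Hypothetical profiles: (Bar, FastFood, Restaurant, Shops)
reference = (10, 20, 35, 8)
candidates = [
    ('Candidate A', 0, (9, 18, 30, 7)),   # similar, no Indian restaurants
    ('Candidate B', 5, (10, 20, 35, 8)),  # identical mix, but already saturated
    ('Candidate C', 1, (0, 2, 1, 12)),    # few Indian restaurants, different mix
]
print(rank_candidates(reference, candidates))
```

Here Candidate B is excluded by the cap even though its profile matches exactly, which mirrors the client's requirement.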

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

# Loading New York Data

In [8]:
# Reading the json as a dict
import json

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    


In [9]:
columns = ['Borough','Neighborhood','Lat','Lon']
nyc_df = pd.DataFrame(columns=columns)


In [10]:
# Collect one row per neighbourhood, then build the frame once;
# DataFrame.append was removed in pandas 2.0.
rows = []
for data in newyork_data['features']:
    props = data['properties']
    rows.append({
        'Borough': props['borough'],
        'Neighborhood': props['name'],
        'Lat': props['bbox'][1],
        'Lon': props['bbox'][0],
    })
nyc_df = pd.DataFrame(rows, columns=columns)
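
For readers without the downloaded file at hand, the sketch below runs the same extraction on a tiny inline sample. The sample mimics only the fields the loop reads (`borough`, `name`, `bbox`); its single feature and the helper name `features_to_rows` are assumptions made for illustration.

```python
import json

# Made-up sample with the same shape the loop expects; not the real file
sample = json.loads('''
{"features": [
  {"properties": {"borough": "Bronx", "name": "Wakefield",
                  "bbox": [-73.847201, 40.894705, -73.847201, 40.894705]}}
]}
''')

def features_to_rows(geojson):
    """Flatten GeoJSON-style features into one dict per neighbourhood."""
    rows = []
    for feature in geojson['features']:
        props = feature['properties']
        rows.append({
            'Borough': props['borough'],
            'Neighborhood': props['name'],
            'Lat': props['bbox'][1],   # bbox is ordered lon, lat, lon, lat
            'Lon': props['bbox'][0],
        })
    return rows

rows = features_to_rows(sample)
print(rows)
```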


In [93]:
nyc_df.shape

(306, 4)

In [13]:
!conda install -c conda-forge folium=0.5.0 --yes 


In [14]:
import folium

In [15]:
latitude = 40.730610
longitude = -73.935242
map_nyc = folium.Map(location=[latitude, longitude], zoom_start=12)

# Add one marker per neighbourhood, labelled with its coordinates and borough
for lat, lon, borough in zip(nyc_df.Lat, nyc_df.Lon, nyc_df.Borough):
    label = '{}, {}, {}'.format(lat, lon, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_nyc)
map_nyc

In [16]:
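# Coordinates used as the current restaurant location in New York.
# Note: rest_lan holds the latitude; the name is reused in later cells.
rest_lan = 40.7826825671257
rest_lon = -73.95325646837112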

In [17]:
CLIENT_ID = 'IYUYZGQ1MKKUXRYJVYPIDLZ5OHJ0FZH0EW43ZDS554AJCIUB' # your Foursquare ID
CLIENT_SECRET = '3X3UA3W0VYCYVLM1LD2KP03E2J1RI4YL3BVGD0SYEWVTWFOM' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: IYUYZGQ1MKKUXRYJVYPIDLZ5OHJ0FZH0EW43ZDS554AJCIUB
CLIENT_SECRET:3X3UA3W0VYCYVLM1LD2KP03E2J1RI4YL3BVGD0SYEWVTWFOM


In [18]:
def get_100_venues(borough_latitude, borough_longitude):
    """Build the Foursquare explore URL for up to 100 venues within 500 m of a point."""
    LIMIT = 100  # maximum number of venues returned by the Foursquare API
    radius = 500  # search radius in meters
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        borough_latitude,
        borough_longitude,
        radius,
        LIMIT)
    return url
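
As a variation, the query string can be assembled with `urllib.parse.urlencode`, and the credentials read from environment variables rather than written into the notebook. This is a sketch, not the code used for the results here: the function name `build_explore_url` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lon, radius=500, limit=100, version='20180604'):
    """Return an explore URL for venues within `radius` meters of (lat, lon)."""
    params = {
        # Hypothetical environment variable names; set them before running
        'client_id': os.environ.get('FOURSQUARE_CLIENT_ID', ''),
        'client_secret': os.environ.get('FOURSQUARE_CLIENT_SECRET', ''),
        'v': version,
        'll': '{},{}'.format(lat, lon),  # urlencode percent-encodes the comma
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

example_url = build_explore_url(40.7826825671257, -73.95325646837112)
```

Keeping the secret out of the notebook also keeps it out of any printed URL or saved output.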

In [19]:
url =get_100_venues(rest_lan,rest_lon)
url


'https://api.foursquare.com/v2/venues/explore?&client_id=IYUYZGQ1MKKUXRYJVYPIDLZ5OHJ0FZH0EW43ZDS554AJCIUB&client_secret=3X3UA3W0VYCYVLM1LD2KP03E2J1RI4YL3BVGD0SYEWVTWFOM&v=20180604&ll=40.7826825671257,-73.95325646837112&radius=500&limit=100'

In [20]:
import requests
results = requests.get(url).json()


In [94]:
cons_category = {
    
    'Bar':  ['Beer Bar',
 'Bar',
 'Sports Bar',
 'Brewery',
 'Distillery',
 'Hookah Bar',
 'Piano Bar',
 'Dive Bar',
 'Salon / Barbershop',
 'Gastropub',
 'Cocktail Bar',
 'Beer Store',
 'Wine Shop',
 'Whisky Bar',
 'Fruit & Vegetable Store',
 'Wine Bar',
 'Karaoke Bar',
 'Jazz Club',
 'Beer Garden',
 'Tiki Bar',
 'Gay Bar',
 'Sake Bar',
 'Hotel Bar',
 'Irish Pub',
 'Beach Bar'],
    
    'Business' :[
        
         'Office',
 'Coworking Space'
 
        
    ],
    
    'Entertainment' :[
        
         'Bowling Alley',
 'Pub',
 'Nightclub',
 'Arcade',
 'Performing Arts Venue',
 'History Museum',
 'Music Venue',
 'Harbor / Marina',
 'Boat or Ferry',
 'Lounge',
 'Art Gallery',
 'Indie Theater',
 'Antique Shop',
 'Social Club',
 'Dance Studio',
 'Other Nightlife',
 'Event Space',
 'Indie Movie Theater',
 'Rock Club',
 'General Entertainment',
 'Nightlife Spot',
 'Opera House',
 'Theater',
 'Baseball Stadium',
 'Movie Theater',
 'Roof Deck',
 'Concert Hall',
 'Music Store',
 'Piercing Parlor',
 'Exhibit',
 'Club House',
 'Street Art',
 'Cultural Center',
 'College Theater',
 'Multiplex',
 'Strip Club'
        
    ],
    
    'FastFood':[
        
         'Food Truck',
'Dessert Shop',
 'Donut Shop',
 'Sandwich Place',
 'Pizza Place',
 'Fried Chicken Joint',
 'Fast Food Restaurant',
 'Bakery',
 'Gourmet Shop',
 'Burger Joint',
 'Wings Joint',
 'Breakfast Spot',
 'Café',
 'Soup Place',
 'BBQ Joint',
 'Frozen Yogurt Shop',
 'Juice Bar',
 'Fish & Chips Shop',
 'Cupcake Shop',
 'Food & Drink Shop',
 'Cheese Shop',
 'Bagel Shop',
 'Taco Place',
 'Tea Room',
 'Snack Place',
 'Butcher',
 'Noodle House',
 'Creperie',
 'Salad Place',
 'Food Stand',
 'Bistro',
 'Burrito Place',
 'Food Court',
 'Hot Dog Joint',
 'Poke Place',
 'Cafeteria',
 'College Cafeteria',
 'Smoothie Shop',
     
        
        
    ],
    
    'Kids':[
        
         'Candy Store',
 'Pet Store',
 'Video Game Store',
 'Electronics Store',
 'Video Store',
 'Pool',
 'Beach',
 'Toy / Game Store',
 'Museum',
 'Outdoors & Recreation',
 'Racetrack',
 'Used Bookstore',
 'Gaming Cafe',
 'Library',
 'School',
 'Climbing Gym',
 'Music School',
 'Public Art',
 'Daycare',
 'College Academic Building',
 'High School',
 'Circus',
 'Recreation Center',
 'Comedy Club',
 'Rock Climbing Spot',
 'General College & University',
 'Pet Café',
 'Theme Park',
 'Baby Store',
 'Laser Tag',
 'Science Museum',
       
        
    ],
    
'Parks' :[
    
     'Construction & Landscaping',
 'Art Museum',
 'Garden Center',
 'Garden',
 'Waterfront',
 'Farm',
 'Dog Run',
 'Skating Rink',
 'Sculpture Garden',
 'Fountain',
 'Community Center',
 'Gym Pool',
 'Memorial Site',
 'Auditorium',
 'Tree',
 'Botanical Garden',
 'Pier',
 'Field',
 'State / Provincial Park',
 'Campground',
 'Church',

    
    
],
    
'Residential':[
    
 'Baseball Field',
 'Basketball Court',
 'Park',
 'Convenience Store',
 'Cosmetics Shop',
 'Plaza',
 'River',
 'Playground',
 'Bank',
 'Home Service',
 'Coffee Shop',
 'Warehouse Store',
 'Trail',
 'Rental Car Location',
 'Supplement Shop',
 'Outdoor Sculpture',
 'Yoga Studio',
 'Gym',
 'Tennis Stadium',
 'Moving Target',
 'Gym / Fitness Center',
 'Track',
 'Intersection',
 'Martial Arts Dojo',
 'Pool Hall',
 'Gymnastics Gym',
 'Neighborhood',
 'Residential Building (Apartment / Condo)',
 'Building',
 'Animal Shelter',

    
],
    
    'Restaurant':[
        
         'Ice Cream Shop',
 'Caribbean Restaurant',
 'Restaurant',
 'Diner',
 'Seafood Restaurant',
 'Deli / Bodega',
 'Chinese Restaurant',
 'Latin American Restaurant',
 'Mexican Restaurant',
 'Spanish Restaurant',
 'Steakhouse',
 'American Restaurant',
 'Italian Restaurant',
 'Sushi Restaurant',
 'French Restaurant',
 'Tapas Restaurant',
 'African Restaurant',
 'Greek Restaurant',
 'Paella Restaurant',
 'Asian Restaurant',
 'Peruvian Restaurant',
 'South American Restaurant',
 'Arepa Restaurant',
 'Buffet',
 'Mediterranean Restaurant',
 'Japanese Restaurant',
 'Southern / Soul Food Restaurant',
 'Thai Restaurant',
 'Food',
 'Comfort Food Restaurant',
 'Middle Eastern Restaurant',
 'Caucasian Restaurant',
 'New American Restaurant',
 'Vietnamese Restaurant',
 'Dim Sum Restaurant',
 'Shabu-Shabu Restaurant',
 'Hotpot Restaurant',
 'Dumpling Restaurant',
 'Polish Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Falafel Restaurant',
 'Ramen Restaurant',
 'Korean Restaurant',
 'Eastern European Restaurant',
 'Russian Restaurant',
 'Varenyky restaurant',
 'Turkish Restaurant',
 'Cajun / Creole Restaurant',
 'North Indian Restaurant',
 'Cuban Restaurant',
 'Pakistani Restaurant',
 'Ethiopian Restaurant',
 'Argentinian Restaurant',
 'Filipino Restaurant',
 'Israeli Restaurant',
 'German Restaurant',
 'Cantonese Restaurant',
 'Halal Restaurant',
 'Shanghai Restaurant',
 'Kebab Restaurant',
 'Hawaiian Restaurant',
 'Taiwanese Restaurant',
 'Lebanese Restaurant',
 'Jewish Restaurant',
 'English Restaurant',
 'Malay Restaurant',
 'Austrian Restaurant',
 'Japanese Curry Restaurant',
 'Czech Restaurant',
 'Afghan Restaurant',
 'Australian Restaurant',
 'South Indian Restaurant',
 'Szechuan Restaurant',
 'Brazilian Restaurant',
 'Scandinavian Restaurant',
 'Udon Restaurant',
 'Gluten-free Restaurant',
 'Moroccan Restaurant',
 'Swiss Restaurant',
 'Modern European Restaurant',
 'Belgian Restaurant',
 'Tibetan Restaurant',
 'Himalayan Restaurant',
 'Empanada Restaurant',
 'Colombian Restaurant',
 'Indonesian Restaurant',
 'Romanian Restaurant',
 'Egyptian Restaurant',
 'Kosher Restaurant',
 'Sri Lankan Restaurant',
 'Tex-Mex Restaurant',
 'Venezuelan Restaurant',
 'Molecular Gastronomy Restaurant',
 'Cambodian Restaurant',
 'Persian Restaurant',
 'Soba Restaurant',
 'Portuguese Restaurant',

        
        
    ],
    
    'Indian Restaurant':[
        
         'Indian Restaurant',
       
        
    ],
    
    'Services':[
        
         'Pharmacy',
 'Laundromat',
 'Platform',
 'Metro Station',
 'Thrift / Vintage Store',
 'Shoe Store',
 'Shipping Store',
 'Spa',
 'Mobile Phone Shop',
 'Eye Doctor',
 'Dry Cleaner',
 'Check Cashing Service',
 'Health & Beauty Service',
 'Waste Facility',
 'Business Service',
 'Tattoo Parlor',
 'Lawyer',
 'Laundry Service',
 'Speakeasy',
 'Gas Station',
 'Massage Studio',
 'Veterinarian',
 'Shoe Repair',
 'Pet Service',
 'Post Office',
 'Locksmith',
 'Storage Facility',
 'Pedestrian Plaza',
 'Weight Loss Center',
 'IT Services',
 'Medical Center',
 'Spiritual Center',
 'Tech Startup',
 'Rental Service',

        
        
    ],
    
    'Shops':[
        
         'Discount Store',
 'Mattress Store',
 'Grocery Store',
 'Liquor Store',
 'Supermarket',
 'Department Store',
 'Big Box Store',
 'Clothing Store',
 "Men's Store",
 'Smoke Shop',
 'Jewelry Store',
 'Optical Shop',
 'Miscellaneous Shop',
 'Sporting Goods Shop',
 'Kids Store',
 'Accessories Store',
 'Outlet Store',
 'Flea Market',
 'Vape Store',
 'Market',
 'Paper / Office Supplies Store',
 'Bookstore',
 'Flower Shop',
 'Furniture / Home Store',
 'Farmers Market',
 'Fish Market',
 "Women's Store",
 'Lingerie Store',
 'Bridal Shop',
 'Other Repair Shop',
 'Record Shop',
 'Arts & Crafts Store',
 'Boutique',
 'Nail Salon',
 'Organic Grocery',
 'Non-Profit',
 'Print Shop',
 'Pie Shop',
 'Chocolate Shop',
 'Gift Shop',
 'Comic Shop',
 'Herbs & Spices Store',
 'Hardware Store',
 'Shopping Mall',
 'Bubble Tea Shop',
 'Motorcycle Shop',
 'Health Food Store',
 'Adult Boutique',
 'Board Shop',
 'Bike Shop',
 'Hobby Shop',
 'Drugstore',
 'College Bookstore',
 'Tailor Shop',
 'Watch Shop',
 'Auto Workshop',
 'Newsstand',
 'Design Studio',
 'Souvlaki Shop',
 'Shop & Service',
 'Automotive Shop',
 'Recording Studio',
 'Tanning Salon',
 'Insurance Office',
 'Camera Store',
 'Leather Goods Store',
 'Duty-free Shop',
 'Factory',

        
    ],
    
    'SightSeeing': [
         'Other Great Outdoors',
 'Lake',
 'Monument / Landmark',
 'Theme Park Ride / Attraction',
 'Surf Spot',
 'Historic Site',
 'Scenic Lookout',
 'Tourist Information Center',
    
        
        
    ],
    
    'Sports':[
         'Cycle Studio',
 'Pilates Studio',
 'Athletics & Sports',
 'Golf Course',
 'Soccer Field',
 'Boxing Gym',
 'Tennis Court',
 'Bike Rental / Bike Share',
 'Hockey Field',
 'Arts & Entertainment',
 'Bike Trail',
 'Mini Golf',
 'Volleyball Court',
 'Skate Park',
 'College Basketball Court',
 'Sports Club',

        
    ],
    
    'Travel':[
         'Bus Station',
 'Bus Stop',
 'Bus Line',
 'Train Station',
 'Airport Tram',
 'Train',
 'Toll Plaza',
 'Heliport',
 'Bridge',
 'Road',
 'Hotel',
 'Hostel',
 'Rest Area',
 'Bed & Breakfast',
 'Resort',
 'Motel',
 'Bath House',
 'Taxi Stand',
 'Hotel Pool'

        
    ]

     
    
    
}
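
A mapping of this size is easy to get subtly wrong, for example by listing one venue category under two groups. The sketch below shows one way to invert such a mapping into a category-to-group lookup while collecting duplicates. It uses a small made-up mapping so that it runs on its own; `invert_categories` is a hypothetical helper, and the repeated 'Pub' entry is added deliberately to show the check firing.

```python
def invert_categories(groups):
    """Return (lookup, duplicates) for a {group: [categories]} mapping.

    lookup maps each venue category to the first group that lists it;
    duplicates records categories that appear in more than one group.
    """
    lookup = {}
    duplicates = {}
    for group, categories in groups.items():
        for category in categories:
            if category in lookup and lookup[category] != group:
                duplicates.setdefault(category, [lookup[category]]).append(group)
            else:
                lookup[category] = group
    return lookup, duplicates

# Small illustrative mapping; 'Pub' is repeated on purpose
example_groups = {
    'Bar': ['Beer Bar', 'Wine Bar', 'Pub'],
    'Entertainment': ['Bowling Alley', 'Pub'],
    'Indian Restaurant': ['Indian Restaurant'],
}
lookup, duplicates = invert_categories(example_groups)
print(lookup['Wine Bar'])
print(duplicates)
```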

# Getting all Venues

In [22]:
venues_list = []
for neighbour,lat,lon in zip(nyc_df.Neighborhood,nyc_df.Lat,nyc_df.Lon):
    url =get_100_venues(lat,lon)
    results = requests.get(url).json()["response"]['groups'][0]['items']
    venues_list.append([(
        neighbour,
        lat,
        lon,
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']) for v in results])
    print ("Doing for {}".format (neighbour))
venue_df = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
venue_df.columns = ['Neighbourhood Name','Neighbourhood Lat','Neighbourhood Lon','Venue Name','Venue Lat','Venue Lon','Venue Cat']
venue_df.info()

Doing for Wakefield
Doing for Co-op City
Doing for Eastchester
Doing for Fieldston
Doing for Riverdale
Doing for Kingsbridge
Doing for Marble Hill
Doing for Woodlawn
Doing for Norwood
Doing for Williamsbridge
Doing for Baychester
Doing for Pelham Parkway
Doing for City Island
Doing for Bedford Park
Doing for University Heights
Doing for Morris Heights
Doing for Fordham
Doing for East Tremont
Doing for West Farms
Doing for High  Bridge
Doing for Melrose
Doing for Mott Haven
Doing for Port Morris
Doing for Longwood
Doing for Hunts Point
Doing for Morrisania
Doing for Soundview
Doing for Clason Point
Doing for Throgs Neck
Doing for Country Club
Doing for Parkchester
Doing for Westchester Square
Doing for Van Nest
Doing for Morris Park
Doing for Belmont
Doing for Spuyten Duyvil
Doing for North Riverdale
Doing for Pelham Bay
Doing for Schuylerville
Doing for Edgewater Park
Doing for Castle Hill
Doing for Olinville
Doing for Pelham Gardens
Doing for Concourse
Doing for Unionport
Doing for Ed
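
The loop above assumes every response contains at least one group and every venue at least one category. A more defensive extraction might look like the sketch below. The helper name `extract_venues` and the sample payload are assumptions; the payload only imitates the fields the loop reads.

```python
def extract_venues(payload, neighbourhood, lat, lon):
    """Return venue tuples from an explore-style payload, skipping incomplete items."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories', [])
        location = venue.get('location', {})
        if not categories or 'lat' not in location or 'lng' not in location:
            continue  # skip venues without a category or coordinates
        rows.append((neighbourhood, lat, lon, venue.get('name'),
                     location['lat'], location['lng'], categories[0]['name']))
    return rows

# Made-up payload: one complete venue and one without categories
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue One', 'location': {'lat': 40.89, 'lng': -73.84},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Venue Two', 'location': {'lat': 40.88, 'lng': -73.85},
               'categories': []}},
]}]}}

rows = extract_venues(sample_payload, 'Wakefield', 40.894705, -73.847201)
empty = extract_venues({}, 'Wakefield', 40.894705, -73.847201)
```

With this shape, a neighbourhood whose request returns no venues contributes an empty list instead of raising an exception halfway through the loop.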

In [58]:
allvenues_onehot = pd.get_dummies(venue_df, columns = ['Venue Cat'], prefix="", prefix_sep="")
allvenues_onehot = allvenues_onehot.drop('Venue Name',axis = 1)
allvenues_onehot.head()

|  | Neighbourhood Name | Neighbourhood Lat | Neighbourhood Lon | Venue Lat | Venue Lon | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Tram | ... | Waste Facility | Watch Shop | Waterfront | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | 40.894123 | -73.845892 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Wakefield | 40.894705 | -73.847201 | 40.896649 | -73.844846 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Wakefield | 40.894705 | -73.847201 | 40.890487 | -73.848568 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Wakefield | 40.894705 | -73.847201 | 40.890631 | -73.849027 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Wakefield | 40.894705 | -73.847201 | 40.890656 | -73.849192 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [59]:
def combine_columns(columns,newcolumnname,dataframe):
    value = 0
    for column in columns:
        print ("Doing for {}".format(column))
        value = dataframe[column] + value
    
    dataframe = dataframe.drop(columns,axis = 1)
    dataframe[newcolumnname] = value
    
    return dataframe

In [60]:
# Merge the one-hot columns of each group in a fixed order;
# 'Indian Restaurant' is left as its own column.
group_names = ['Bar', 'Business', 'Entertainment', 'FastFood', 'Kids', 'Parks',
               'Residential', 'Restaurant', 'Services', 'Shops', 'SightSeeing',
               'Sports', 'Travel']
for group_name in group_names:
    allvenues_onehot = combine_columns(cons_category[group_name], group_name, allvenues_onehot)
allvenues_onehot.head()


Doing for Beer Bar
Doing for Bar
Doing for Sports Bar
Doing for Brewery
Doing for Distillery
Doing for Hookah Bar
Doing for Piano Bar
Doing for Dive Bar
Doing for Salon / Barbershop
Doing for Gastropub
Doing for Cocktail Bar
Doing for Beer Store
Doing for Wine Shop
Doing for Whisky Bar
Doing for Fruit & Vegetable Store
Doing for Wine Bar
Doing for Karaoke Bar
Doing for Jazz Club
Doing for Beer Garden
Doing for Tiki Bar
Doing for Gay Bar
Doing for Sake Bar
Doing for Hotel Bar
Doing for Irish Pub
Doing for Beach Bar
Doing for Office
Doing for Coworking Space
Doing for Bowling Alley
Doing for Pub
Doing for Nightclub
Doing for Arcade
Doing for Performing Arts Venue
Doing for History Museum
Doing for Music Venue
Doing for Harbor / Marina
Doing for Boat or Ferry
Doing for Lounge
Doing for Art Gallery
Doing for Indie Theater
Doing for Antique Shop
Doing for Social Club
Doing for Dance Studio
Doing for Other Nightlife
Doing for Event Space
Doing for Indie Movie Theater
Doing for Rock Club
Doin

Doing for Cycle Studio
Doing for Pilates Studio
Doing for Athletics & Sports
Doing for Golf Course
Doing for Soccer Field
Doing for Boxing Gym
Doing for Tennis Court
Doing for Bike Rental / Bike Share
Doing for Hockey Field
Doing for Arts & Entertainment
Doing for Bike Trail
Doing for Mini Golf
Doing for Volleyball Court
Doing for Skate Park
Doing for College Basketball Court
Doing for Sports Club
Doing for Bus Station
Doing for Bus Stop
Doing for Bus Line
Doing for Train Station
Doing for Airport Tram
Doing for Train
Doing for Toll Plaza
Doing for Heliport
Doing for Bridge
Doing for Road
Doing for Hotel
Doing for Hostel
Doing for Rest Area
Doing for Bed & Breakfast
Doing for Resort
Doing for Motel
Doing for Bath House
Doing for Taxi Stand
Doing for Hotel Pool


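|  | Neighbourhood Name | Neighbourhood Lat | Neighbourhood Lon | Venue Lat | Venue Lon | Indian Restaurant | Bar | Business | Entertainment | FastFood | Kids | Parks | Residential | Restaurant | Services | Shops | SightSeeing | Sports | Travel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | 40.894123 | -73.845892 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Wakefield | 40.894705 | -73.847201 | 40.896649 | -73.844846 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | Wakefield | 40.894705 | -73.847201 | 40.890487 | -73.848568 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | Wakefield | 40.894705 | -73.847201 | 40.890631 | -73.849027 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Wakefield | 40.894705 | -73.847201 | 40.890656 | -73.849192 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |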


In [89]:
# Drop the coordinate columns first so that only the venue counts are summed
drop_columns = ['Neighbourhood Lat','Neighbourhood Lon','Venue Lat','Venue Lon']
allvenues_grouped = allvenues_onehot.drop(drop_columns, axis=1).groupby('Neighbourhood Name').sum().reset_index()


In [90]:
allvenue_withborough = nyc_df.join(allvenues_grouped.set_index('Neighbourhood Name'), on='Neighborhood')

In [103]:
allvenue_withborough.head()

|  | Borough | Neighborhood | Lat | Lon | Indian Restaurant | Bar | Business | Entertainment | FastFood | Kids | Parks | Residential | Restaurant | Services | Shops | SightSeeing | Sports | Travel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 4.0 | 2.0 | 1.0 | 4.0 | 0.0 | 0.0 | 1.0 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 | 0.0 | 0.0 | 0.0 | 1.0 | 4.0 | 0.0 | 0.0 | 2.0 | 9.0 | 3.0 | 0.0 | 0.0 | 0.0 | 3.0 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


# Top 10 items in every neighbourhood

In [105]:
allvenues_grouped.head()

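|  | Neighbourhood Name | Indian Restaurant | Bar | Business | Entertainment | FastFood | Kids | Parks | Residential | Restaurant | Services | Shops | SightSeeing | Sports | Travel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allerton | 0 | 0 | 0 | 0 | 9 | 0 | 1 | 2 | 6 | 3 | 6 | 0 | 0 | 0 |
| 1 | Annadale | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 1 |
| 2 | Arden Heights | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | Arlington | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 4 | Arrochar | 0 | 0 | 0 | 0 | 5 | 1 | 0 | 1 | 7 | 0 | 2 | 0 | 0 | 4 |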


In [108]:
neighbourlist = allvenues_grouped['Neighbourhood Name']
top_ten_list = []
for neighbour in neighbourlist:
    # Transpose the neighbourhood's row so that each category becomes a record
    temp = allvenues_grouped[allvenues_grouped['Neighbourhood Name'] == neighbour].T.reset_index()
    temp.columns = ['Category', 'Frequency']
    temp = temp.iloc[1:]  # drop the 'Neighbourhood Name' record
    temp.sort_values('Frequency', ascending=False, inplace=True)
    top_ten_list.append(["{} ({})".format(t[0], t[1]) for t in temp[:10].values])
df_top_ten = pd.DataFrame(top_ten_list)
df_top_ten.columns = ['#{} Favourite'.format(rank) for rank in range(1, 11)]
df_top_ten.set_index(neighbourlist, inplace=True)
df_top_ten.reset_index()

|  | Neighbourhood Name | #1 Favourite | #2 Favourite | #3 Favourite | #4 Favourite | #5 Favourite | #6 Favourite | #7 Favourite | #8 Favourite | #9 Favourite | #10 Favourite |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allerton | FastFood (9) | Restaurant (6) | Shops (6) | Services (3) | Residential (2) | Parks (1) | Indian Restaurant (0) | Bar (0) | Business (0) | Entertainment (0) |
| 1 | Annadale | Restaurant (3) | FastFood (2) | Bar (1) | Residential (1) | Travel (1) | Indian Restaurant (0) | Business (0) | Entertainment (0) | Kids (0) | Parks (0) |
| 2 | Arden Heights | FastFood (1) | Kids (1) | Residential (1) | Services (1) | Indian Restaurant (0) | Bar (0) | Business (0) | Entertainment (0) | Parks (0) | Restaurant (0) |
| 3 | Arlington | Entertainment (1) | Residential (1) | Restaurant (1) | Shops (1) | Travel (1) | Indian Restaurant (0) | Bar (0) | Business (0) | FastFood (0) | Kids (0) |
| 4 | Arrochar | Restaurant (7) | FastFood (5) | Travel (4) | Shops (2) | Kids (1) | Residential (1) | Indian Restaurant (0) | Bar (0) | Business (0) | Entertainment (0) |
| 5 | Arverne | SightSeeing (4) | FastFood (3) | Services (2) | Travel (2) | Bar (1) | Kids (1) | Residential (1) | Restaurant (1) | Shops (1) | Indian Restaurant (0) |
| 6 | Astoria | Restaurant (38) | FastFood (23) | Bar (17) | Residential (8) | Shops (8) | Indian Restaurant (2) | Entertainment (2) | Services (2) | Business (0) | Kids (0) |
| 7 | Astoria Heights | Restaurant (4) | FastFood (3) | Residential (2) | Travel (2) | Bar (1) | Entertainment (1) | Services (1) | Shops (1) | Indian Restaurant (0) | Business (0) |
| 8 | Auburndale | Restaurant (6) | Shops (4) | FastFood (3) | Bar (2) | Kids (2) | Services (2) | Residential (1) | Sports (1) | Indian Restaurant (0) | Business (0) |
| 9 | Bath Beach | Restaurant (19) | FastFood (11) | Shops (7) | Residential (4) | Services (4) | Bar (2) | Kids (2) | SightSeeing (1) | Indian Restaurant (0) | Business (0) |
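
The ranking logic is easier to follow on a single row. The sketch below reproduces the start of the Allerton row using plain Python, with the counts copied from the grouped table above; `top_categories` is a hypothetical helper. Python's sort is stable, so tied counts keep their original column order.

```python
def top_categories(counts, n=10):
    """Return the n most frequent groups as 'Name (count)' strings."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ['{} ({})'.format(name, count) for name, count in ranked[:n]]

# Counts for Allerton, copied from the grouped table
allerton = {
    'Indian Restaurant': 0, 'Bar': 0, 'Business': 0, 'Entertainment': 0,
    'FastFood': 9, 'Kids': 0, 'Parks': 1, 'Residential': 2, 'Restaurant': 6,
    'Services': 3, 'Shops': 6, 'SightSeeing': 0, 'Sports': 0, 'Travel': 0,
}
print(top_categories(allerton, n=3))
# prints ['FastFood (9)', 'Restaurant (6)', 'Shops (6)']
```

Because most neighbourhoods have fewer than ten non-zero groups, the lower ranks are often filled with zero-count entries; filtering those out before ranking may give a more readable table.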


# Let's do the same analysis for Toronto

In [109]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [134]:
page = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(page.text, 'html.parser')

In [135]:
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))

In [136]:
new_df=df[0]


In [155]:
new_df.columns = new_df.iloc[0]
df = new_df.drop(new_df.index[0])
df.head()

|  | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 1 | M1A | Not assigned | Not assigned |
| 2 | M2A | Not assigned | Not assigned |
| 3 | M3A | North York | Parkwoods |
| 4 | M4A | North York | Victoria Village |
| 5 | M5A | Downtown Toronto | Harbourfront |


In [156]:
df = df[df.Borough != 'Not assigned']
# unique is a method; without the parentheses the cell only echoes the bound method
df['Borough'].unique()

In [157]:
df=df.groupby([df.Postcode,df.Borough],as_index=False).agg(', '.join)
df = df.rename(columns={'Postcode': 'Postal Code'})
df.head()

|  | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


In [158]:
df['Neighbourhood'] = np.where(df['Neighbourhood']=='Not assigned', df['Borough'], df['Neighbourhood'])


In [159]:
df.head()

|  | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
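
The cleaning steps above (dropping unassigned boroughs, joining neighbourhoods that share a postal code, and falling back to the borough name) can also be expressed without pandas, as in the sketch below. It applies the fallback before grouping rather than after. The input rows are a small illustrative sample in the shape of the scraped table, and the last row is invented to exercise the fallback; `clean_postcodes` is a hypothetical helper.

```python
def clean_postcodes(rows):
    """Group (postcode, borough, neighbourhood) rows by postcode and borough."""
    grouped = {}
    for postcode, borough, neighbourhood in rows:
        if borough == 'Not assigned':
            continue  # drop postal codes without a borough
        if neighbourhood == 'Not assigned':
            neighbourhood = borough  # fall back to the borough name
        grouped.setdefault((postcode, borough), []).append(neighbourhood)
    return [(pc, b, ', '.join(names)) for (pc, b), names in sorted(grouped.items())]

sample_rows = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M3A', 'North York', 'Parkwoods'),
    ('M1B', 'Scarborough', 'Rouge'),
    ('M1B', 'Scarborough', 'Malvern'),
    ('M9Z', 'Example Borough', 'Not assigned'),  # invented row for the fallback
]
cleaned = clean_postcodes(sample_rows)
for row in cleaned:
    print(row)
```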


In [160]:
gs_df = pd.read_csv('https://raw.githubusercontent.com/ksnblr/Coursera_Capstone/master/GS.csv')

In [162]:
gs_df.head()

|  | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [163]:
tor_df = gs_df.join(df.set_index('Postal Code'), on='Postal Code')

In [164]:
tor_df.head()

|  | Postal Code | Latitude | Longitude | Borough | Neighbourhood |
|---|---|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 | Scarborough | Rouge, Malvern |
| 1 | M1C | 43.784535 | -79.160497 | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | 43.763573 | -79.188711 | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | 43.770992 | -79.216917 | Scarborough | Woburn |
| 4 | M1H | 43.773136 | -79.239476 | Scarborough | Cedarbrae |


# Getting All Venues in Toronto

In [167]:
tor_venues_list = []
for neighbour,lat,lon in zip(tor_df.Neighbourhood,tor_df.Latitude,tor_df.Longitude):
    url =get_100_venues(lat,lon)
    results = requests.get(url).json()["response"]['groups'][0]['items']
    tor_venues_list.append([(
        neighbour,
        lat,
        lon,
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']) for v in results])
    print ("Doing for {}".format (neighbour))
tor_venue_df = pd.DataFrame([item for venue_list in tor_venues_list for item in venue_list])
tor_venue_df.columns = ['Neighbourhood Name','Neighbourhood Lat','Neighbourhood Lon','Venue Name','Venue Lat','Venue Lon','Venue Cat']
tor_venue_df.head()

Doing for Rouge, Malvern
Doing for Highland Creek, Rouge Hill, Port Union
Doing for Guildwood, Morningside, West Hill
Doing for Woburn
Doing for Cedarbrae
Doing for Scarborough Village
Doing for East Birchmount Park, Ionview, Kennedy Park
Doing for Clairlea, Golden Mile, Oakridge
Doing for Cliffcrest, Cliffside, Scarborough Village West
Doing for Birch Cliff, Cliffside West
Doing for Dorset Park, Scarborough Town Centre, Wexford Heights
Doing for Maryvale, Wexford
Doing for Agincourt
Doing for Clarks Corners, Sullivan, Tam O'Shanter
Doing for Agincourt North, L'Amoreaux East, Milliken, Steeles East
Doing for L'Amoreaux West, Steeles West
Doing for Upper Rouge
Doing for Hillcrest Village
Doing for Fairview, Henry Farm, Oriole
Doing for Bayview Village
Doing for Silver Hills, York Mills
Doing for Newtonbrook, Willowdale
Doing for Willowdale South
Doing for York Mills West
Doing for Willowdale West
Doing for Parkwoods
Doing for Don Mills North
Doing for Flemingdon Park, Don Mills South
Do

|  | Neighbourhood Name | Neighbourhood Lat | Neighbourhood Lon | Venue Name | Venue Lat | Venue Lon | Venue Cat |
|---|---|---|---|---|---|---|---|
| 0 | Rouge, Malvern | 43.806686 | -79.194353 | Wendy's | 43.807448 | -79.199056 | Fast Food Restaurant |
| 1 | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 | Royal Canadian Legion | 43.782533 | -79.163085 | Bar |
| 2 | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 | Scarborough Historical Society | 43.788755 | -79.162438 | History Museum |
| 3 | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 | Swiss Chalet Rotisserie & Grill | 43.767697 | -79.189914 | Pizza Place |
| 4 | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 | G & G Electronics | 43.765309 | -79.191537 | Electronics Store |


['Fast Food Restaurant',
 'Bar',
 'History Museum',
 'Pizza Place',
 'Electronics Store',
 'Mexican Restaurant',
 'Rental Car Location',
 'Medical Center',
 'Breakfast Spot',
 'Coffee Shop',
 'Korean Restaurant',
 'Hakka Restaurant',
 'Caribbean Restaurant',
 'Athletics & Sports',
 'Thai Restaurant',
 'Bank',
 'Bakery',
 'Fried Chicken Joint',
 'Lounge',
 'Playground',
 'Department Store',
 'Discount Store',
 'Intersection',
 'Bus Line',
 'Metro Station',
 'Bus Station',
 'Park',
 'Soccer Field',
 'Motel',
 'Movie Theater',
 'American Restaurant',
 'Café',
 'General Entertainment',
 'Skating Rink',
 'College Stadium',
 'Chinese Restaurant',
 'Indian Restaurant',
 'Latin American Restaurant',
 'Pet Store',
 'Vietnamese Restaurant',
 'Furniture / Home Store',
 'Sandwich Place',
 'Smoke Shop',
 'Auto Garage',
 'Clothing Store',
 'Italian Restaurant',
 'Noodle House',
 'Pharmacy',
 'Grocery Store',
 'Nail Salon',
 'Golf Course',
 'Pool',
 'Mediterranean Restaurant',
 'Dog Run',
 'Shopping 

In [172]:
tor_ven_category = {
    
    'Bar':  [ 'Bar',
 'Beer Store',
 'Gastropub',
 'Pub',
 'Sports Bar',
 'Brewery',
 'Wine Shop',
 'Sake Bar',
 'Nightclub',
 'Strip Club',
 'Beer Bar',
 'Wine Bar',
 'Hookah Bar',
 'Cocktail Bar',
 'Bistro',
 'Irish Pub',
 'Hotel Bar',
],
    
    'Business' :[
        
          'Cafeteria',
 'Plaza',
 'Business Service',
 'Coworking Space',
 'Office',
 'Building',
 'Construction & Landscaping',

 
        
    ],
    
    'Entertainment' :[
        
          'Movie Theater',
 'General Entertainment',
 'Theater',
 'Video Game Store',
 'Video Store',
 'Dance Studio',
 'Gay Bar',
 'Event Space',
 'Art Gallery',
 'Music Venue',
 'Concert Hall',
 'College Rec Center',
 'Jazz Club',
 'Record Shop',
 'Indie Movie Theater',
 'Recording Studio',

        
    ],
    
    'FastFood':[
        
         'Fast Food Restaurant',
 'Pizza Place',
 'Mexican Restaurant',
 'Breakfast Spot',
 'Coffee Shop',
 'Bakery',
 'Fried Chicken Joint',
 'Café',
 'Chinese Restaurant',
 'Sandwich Place',
 'Burger Joint',
 'Smoothie Shop',
 'Juice Bar',
 'Food Court',
 'Wings Joint',
 'Burrito Place',
 'Deli / Bodega',
 'Bubble Tea Shop',
 'Food & Drink Shop',
 'Fish & Chips Shop',
 'Bagel Shop',
 'Snack Place',
 'Creperie',
 'Taco Place',
 'Donut Shop',
 'Salad Place',
 'Cupcake Shop',
 'Soup Place',
 'Gaming Cafe',
 'Airport Food Court',
 'Mac & Cheese Joint',
 'Food',

        
    ],
    
    'Kids':[
        
          'Playground',
 'Skating Rink',
 'College Stadium',
 'Toy / Game Store',
 'Candy Store',
 'Dessert Shop',
 'Ice Cream Shop',
 'Rock Climbing Spot',
 'Curling Ice',
 'Swim School',
 'Indoor Play Area',
 'Summer Camp',
 'Chocolate Shop',
 'Performing Arts Venue',
 'Comic Shop',
 'Speakeasy',
 'Martial Arts Dojo',
 'Baby Store',
 'Thrift / Vintage Store',

    ],
    
'Parks' :[
    
    'Park',
 'Golf Course',
 'Pool',
 'Dog Run',
 'Trail',
 'Outdoor Sculpture',
 'Sculpture Garden',
 'Fountain',
 'Garden',
 'College Arts Building',
 'Skate Park',
 'Garden Center',

],
    
'Residential':[
    
  'Medical Center',
 'Pharmacy',
 'Grocery Store',
 'Nail Salon',
 'Tea Room',
 'Salon / Barbershop',
 'Cosmetics Shop',
 'Frozen Yogurt Shop',
 'Boutique',
 'Tailor Shop',
 'Butcher',
 'Gym / Fitness Center',
 'Bridal Shop',
 'Neighborhood',
 'Housing Development',
 'Convenience Store',
 'Stationery Store',
 'Music Store',
 'Jewelry Store',
 'Drugstore'

],
    
    'Restaurant':[
        
          'Korean Restaurant',
 'Hakka Restaurant',
 'Caribbean Restaurant',
 'Thai Restaurant',
 'American Restaurant',
 'Latin American Restaurant',
 'Vietnamese Restaurant',
 'Italian Restaurant',
 'Noodle House',
 'Mediterranean Restaurant',
 'Japanese Restaurant',
 'Restaurant',
 'Asian Restaurant',
 'Greek Restaurant',
 'Steakhouse',
 'Ramen Restaurant',
 'Indonesian Restaurant',
 'Sushi Restaurant',
 'Middle Eastern Restaurant',
 'Dim Sum Restaurant',
 'Diner',
 'Falafel Restaurant',
 'Portuguese Restaurant',
 'Comfort Food Restaurant',
 'Seafood Restaurant',
 'New American Restaurant',
 'Taiwanese Restaurant',
 'Theme Restaurant',
 'Ethiopian Restaurant',
 'Afghan Restaurant',
 'French Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Modern European Restaurant',
 'BBQ Joint',
 'Poutine Place',
 'Poke Place',
 'German Restaurant',
 'Belgian Restaurant',
 'Eastern European Restaurant',
 'Brazilian Restaurant',
 'Colombian Restaurant',
 'Gluten-free Restaurant',
 'Jewish Restaurant',
 'Dumpling Restaurant',
 'Doner Restaurant',
 'Hotpot Restaurant',
 'Filipino Restaurant',
 'Cuban Restaurant',
 'Malay Restaurant',
 'Southern / Soul Food Restaurant',
 'Tapas Restaurant',
 'Cajun / Creole Restaurant',
 'Empanada Restaurant',

    ],
    
    'Indian Restaurant':[
        
         'Indian Restaurant',
       
        
    ],
    
    'Services':[
        
         'Bank',
 'Smoke Shop',
 'Auto Garage',
 'Massage Studio',
 'Yoga Studio',
 'Spa',
 'Health & Beauty Service',
 'Tanning Salon',
 'Home Service',
 'Hospital',
 'Check Cashing Service',
 'Auto Workshop',

    ],
    
    'Shops':[
        
         'Electronics Store',
 'Department Store',
 'Discount Store',
 'Pet Store',
 'Furniture / Home Store',
 'Clothing Store',
 'Shopping Mall',
 'Liquor Store',
 'Sporting Goods Shop',
 'Shoe Store',
 "Women's Store",
 'Luggage Store',
 "Men's Store",
 'Arts & Crafts Store',
 'Supermarket',
 'Health Food Store',
 'Warehouse Store',
 'Fruit & Vegetable Store',
 'Bookstore',
 'Fish Market',
 'Cheese Shop',
 'Gourmet Shop',
 'Farmers Market',
 'Flower Shop',
 'Gift Shop',
 'Market',
 'Hobby Shop',
 'Adult Boutique',
 'Antique Shop',
 'Miscellaneous Shop',
 'Lingerie Store',
 'Camera Store',
 'Hardware Store',
 'Organic Grocery',
 'Optical Shop',
 'Accessories Store',
 'Flea Market',
 'Supplement Shop',
 'Shopping Plaza',
 'Mobile Phone Shop',

    ],
    
    'SightSeeing': [
          'Church',
 'Museum',
 'Beach',
 'Art Museum',
 'Monument / Landmark',
 'Opera House',
 'Aquarium',
 'Scenic Lookout',
 'Harbor / Marina',
 'River',

    ],
    
    'Sports':[
          'Athletics & Sports',
 'Soccer Field',
 'Baseball Field',
 'Basketball Court',
 'Gym',
 'Bike Shop',
 'Hockey Arena',
 'Basketball Stadium',
 'Baseball Stadium',
 'College Gym',
 'Field',
 'Climbing Gym',
 'Stadium',

    ],
    
    'Travel':[
          'History Museum',
 'Rental Car Location',
 'Lounge',
 'Intersection',
 'Bus Line',
 'Metro Station',
 'Bus Station',
 'Motel',
 'Hotel',
 'Airport',
 'Bus Stop',
 'Food Truck',
 'Light Rail Station',
 'Historic Site',
 'Other Great Outdoors',
 'Lake',
 'Hostel',
 'General Travel',
 'Train Station',
 'Airport Lounge',
 'Airport Terminal',
 'Airport Gate',
 'Plane',
 'Airport Service',
 'Boat or Ferry',
 'College Auditorium',
 'College Cafeteria',

    ]

       
    
}
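Each venue type is meant to feed exactly one bucket, so it can help to check the mapping for duplicates before any columns are combined. The sketch below is not part of the original notebook: `sample_category` is a shortened, hypothetical stand-in for `tor_ven_category`, and `find_duplicates` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical, shortened stand-in for tor_ven_category (illustration only).
sample_category = {
    'Bar': ['Bar', 'Pub', 'Wine Bar'],
    'FastFood': ['Pizza Place', 'Coffee Shop'],
    'Travel': ['Hotel', 'Bus Station', 'Pub'],  # 'Pub' is repeated on purpose
}

def find_duplicates(mapping):
    """Return {venue_type: [buckets]} for venue types listed in more than one bucket."""
    seen = defaultdict(list)
    for bucket, venue_types in mapping.items():
        for venue_type in venue_types:
            seen[venue_type].append(bucket)
    return {venue: buckets for venue, buckets in seen.items() if len(buckets) > 1}

duplicates = find_duplicates(sample_category)
print(duplicates)  # {'Pub': ['Bar', 'Travel']}
```

Calling `find_duplicates(tor_ven_category)` on the full dictionary would list any venue type that has been assigned to more than one bucket.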

In [173]:
tor_allvenues_onehot = pd.get_dummies(tor_venue_df, columns = ['Venue Cat'], prefix="", prefix_sep="")
tor_allvenues_onehot = tor_allvenues_onehot.drop('Venue Name',axis = 1)
tor_allvenues_onehot.head()

Unnamed: 0,Neighbourhood Name,Neighbourhood Lat,Neighbourhood Lon,Venue Lat,Venue Lon,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Rouge, Malvern",43.806686,-79.194353,43.807448,-79.199056,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,43.782533,-79.163085,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,43.788755,-79.162438,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,43.767697,-79.189914,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,43.765309,-79.191537,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
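Passing `prefix=""` and `prefix_sep=""` to `pd.get_dummies` makes each new column carry the bare category name instead of something like `Venue Cat_Pub`. A minimal sketch on a made-up three-row frame (the venue names here are invented for illustration):

```python
import pandas as pd

# Made-up venues; only the column names mirror the notebook.
venues = pd.DataFrame({
    'Neighbourhood Name': ['A', 'A', 'B'],
    'Venue Name': ['Joe Pizza', 'The Local', 'Bean There'],
    'Venue Cat': ['Pizza Place', 'Pub', 'Coffee Shop'],
})

# One indicator column per category, named after the category itself.
onehot = pd.get_dummies(venues, columns=['Venue Cat'], prefix="", prefix_sep="")
onehot = onehot.drop('Venue Name', axis=1)
print(list(onehot.columns))
```

The untouched columns come first, followed by one indicator column per category.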


In [174]:
# Fold the detailed venue columns into their buckets, one bucket at a time.
# 'Indian Restaurant' is skipped because it is already a single column.
for category, venue_types in tor_ven_category.items():
    if category == 'Indian Restaurant':
        continue
    tor_allvenues_onehot = combine_columns(venue_types, category, tor_allvenues_onehot)
tor_allvenues_onehot.head()

Doing for Bar
Doing for Beer Store
Doing for Gastropub
Doing for Pub
Doing for Sports Bar
Doing for Brewery
Doing for Wine Shop
Doing for Sake Bar
Doing for Nightclub
Doing for Strip Club
Doing for Beer Bar
Doing for Wine Bar
Doing for Hookah Bar
Doing for Cocktail Bar
Doing for Bistro
Doing for Irish Pub
Doing for Hotel Bar
Doing for Cafeteria
Doing for Plaza
Doing for Business Service
Doing for Coworking Space
Doing for Office
Doing for Building
Doing for Construction & Landscaping
Doing for Movie Theater
Doing for General Entertainment
Doing for Theater
Doing for Video Game Store
Doing for Video Store
Doing for Dance Studio
Doing for Gay Bar
Doing for Event Space
Doing for Art Gallery
Doing for Music Venue
Doing for Concert Hall
Doing for College Rec Center
Doing for Jazz Club
Doing for Record Shop
Doing for Indie Movie Theater
Doing for Recording Studio
Doing for Fast Food Restaurant
Doing for Pizza Place
Doing for Mexican Restaurant
Doing for Breakfast Spot
Doing for Coffee Shop
D

Unnamed: 0,Neighbourhood Name,Neighbourhood Lat,Neighbourhood Lon,Venue Lat,Venue Lon,Indian Restaurant,Bar,Business,Entertainment,FastFood,Kids,Parks,Residential,Restaurant,Services,Shops,SightSeeing,Sports,Travel
0,"Rouge, Malvern",43.806686,-79.194353,43.807448,-79.199056,0,0,0,0,1,0,0,0,0,0,0,0,0,0
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,43.782533,-79.163085,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,43.788755,-79.162438,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,43.767697,-79.189914,0,0,0,0,1,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,43.765309,-79.191537,0,0,0,0,0,0,0,0,0,0,1,0,0,0
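`combine_columns` is defined elsewhere in the notebook. Judging only by the output above, it appears to sum the listed one-hot columns into a single bucket column and drop the originals. The function below, `combine_columns_sketch`, is a hypothetical re-implementation under that assumption, not the notebook's own code; skipping venue types that are absent from the frame is also an assumption.

```python
import pandas as pd

def combine_columns_sketch(members, bucket, df):
    """Sum the one-hot columns named in `members` into one `bucket` column."""
    # Assumption: venue types missing from this frame are simply ignored.
    present = [col for col in members if col in df.columns]
    total = df[present].sum(axis=1).astype(int)
    out = df.drop(columns=present)
    out[bucket] = total
    return out

toy = pd.DataFrame({
    'Neighbourhood Name': ['A', 'A', 'B'],
    'Bar': [1, 0, 0],
    'Pub': [0, 1, 0],
    'Hotel': [0, 0, 1],
})
toy = combine_columns_sketch(['Bar', 'Pub', 'Sake Bar'], 'Bar', toy)
print(toy)
```

Computing the total before dropping the member columns matters when a bucket shares its name with one of its members, as `'Bar'` does here.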


In [177]:
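# Coordinates are dropped first because summing them per neighbourhood is meaningless;
# the remaining sum counts the venues of each bucket in every neighbourhood.
drop_columns = ['Neighbourhood Lat','Neighbourhood Lon','Venue Lat','Venue Lon']
tor_allvenues_grouped = tor_allvenues_onehot.drop(columns=drop_columns)
tor_allvenues_grouped = tor_allvenues_grouped.groupby('Neighbourhood Name').sum().reset_index()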


In [179]:
tor_allvenue_withborough = tor_df.join(tor_allvenues_grouped.set_index('Neighbourhood Name'), on='Neighbourhood')

In [181]:
tor_allvenue_withborough.tail()

Unnamed: 0,Postal Code,Latitude,Longitude,Borough,Neighbourhood,Indian Restaurant,Bar,Business,Entertainment,FastFood,Kids,Parks,Residential,Restaurant,Services,Shops,SightSeeing,Sports,Travel
98,M9N,43.706876,-79.518188,York,Weston,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
99,M9P,43.696319,-79.532242,Etobicoke,Westmount,0.0,0.0,0.0,0.0,5.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
100,M9R,43.688905,-79.554724,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
101,M9V,43.739416,-79.588437,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,1.0,0.0,0.0,5.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
102,M9W,43.706748,-79.594054,Etobicoke,Northwest,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0
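`DataFrame.join` performs a left join by default, so any row of the left frame without a match in the grouped venue table comes back as `NaN`, which also turns integer counts into floats. A minimal sketch with made-up frames (`areas` and `counts` are hypothetical) showing one way to restore integer counts:

```python
import pandas as pd

areas = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C'],
    'Borough': ['X', 'X', 'Y'],
})
counts = pd.DataFrame({
    'Neighbourhood Name': ['A', 'B'],  # 'C' has no venues at all
    'FastFood': [2, 0],
    'Bar': [1, 3],
})

joined = areas.join(counts.set_index('Neighbourhood Name'), on='Neighbourhood')

# Treat "no venues found" as zero and restore integer counts.
count_cols = ['FastFood', 'Bar']
joined[count_cols] = joined[count_cols].fillna(0).astype(int)
print(joined)
```

Whether a missing neighbourhood should count as zero or be excluded from the comparison is a modelling choice; the sketch only shows the zero-fill option.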


In [182]:
neighbourlist = tor_allvenues_grouped['Neighbourhood Name']
top_ten_list = []
for neighbour in neighbourlist:
    # Transpose the neighbourhood's row so that each category becomes a row
    temp = tor_allvenues_grouped[tor_allvenues_grouped['Neighbourhood Name'] == neighbour].T.reset_index()
    temp.columns = ['Category', 'Frequency']
    # Skip the first row (the neighbourhood name), then rank categories by venue count
    temp = temp.iloc[1:].sort_values('Frequency', ascending=False)
    top_ten_list.append(["{} ({})".format(t[0], t[1]) for t in temp[:10].values])

tor_df_top_ten = pd.DataFrame(top_ten_list)
tor_df_top_ten.columns = ['#{} Favourite'.format(rank) for rank in range(1, 11)]
tor_df_top_ten.set_index(neighbourlist, inplace=True)
tor_df_top_ten.reset_index()

Unnamed: 0,Neighbourhood Name,#1 Favourite,#2 Favourite,#3 Favourite,#4 Favourite,#5 Favourite,#6 Favourite,#7 Favourite,#8 Favourite,#9 Favourite,#10 Favourite
0,"Adelaide, King, Richmond",Restaurant (31),FastFood (27),Bar (8),Entertainment (6),Residential (5),Shops (5),Travel (5),Business (3),SightSeeing (3),Sports (3)
1,Agincourt,FastFood (1),Kids (1),Shops (1),Travel (1),Indian Restaurant (0),Bar (0),Business (0),Entertainment (0),Parks (0),Residential (0)
2,"Agincourt North, L'Amoreaux East, Milliken, St...",FastFood (1),Kids (1),Parks (1),Indian Restaurant (0),Bar (0),Business (0),Entertainment (0),Residential (0),Restaurant (0),Services (0)
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",FastFood (5),Residential (3),Bar (1),Indian Restaurant (0),Business (0),Entertainment (0),Kids (0),Parks (0),Restaurant (0),Services (0)
4,"Alderwood, Long Branch",FastFood (4),Bar (1),Kids (1),Residential (1),Sports (1),Indian Restaurant (0),Business (0),Entertainment (0),Parks (0),Restaurant (0)
5,"Bathurst Manor, Downsview North, Wilson Heights",FastFood (8),Residential (3),Restaurant (3),Shops (2),Entertainment (1),Parks (1),Services (1),Indian Restaurant (0),Bar (0),Business (0)
6,Bayview Village,FastFood (2),Restaurant (1),Services (1),Indian Restaurant (0),Bar (0),Business (0),Entertainment (0),Kids (0),Parks (0),Residential (0)
7,"Bedford Park, Lawrence Manor East",FastFood (8),Restaurant (8),Residential (3),Shops (2),Indian Restaurant (1),Bar (1),Business (0),Entertainment (0),Kids (0),Parks (0)
8,Berczy Park,Restaurant (16),FastFood (10),Shops (10),Bar (8),Entertainment (3),Residential (3),Parks (2),SightSeeing (2),Sports (1),Travel (1)
9,"Birch Cliff, Cliffside West",Kids (2),Entertainment (1),FastFood (1),Indian Restaurant (0),Bar (0),Business (0),Parks (0),Residential (0),Restaurant (0),Services (0)
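The same ranking can be written as a small helper applied row by row. This is a sketch on a made-up two-neighbourhood frame, with a hypothetical helper `top_categories`; it uses a stable sort so that tied categories keep their original column order.

```python
import pandas as pd

grouped = pd.DataFrame({
    'Neighbourhood Name': ['A', 'B'],
    'Bar': [1, 0],
    'FastFood': [5, 2],
    'Shops': [3, 4],
}).set_index('Neighbourhood Name')

def top_categories(row, n=2):
    """Return the n most frequent categories of one row as 'Name (count)' strings."""
    ranked = row.sort_values(ascending=False, kind='stable').head(n)
    return ['{} ({})'.format(name, count) for name, count in ranked.items()]

top_n = pd.DataFrame(
    [top_categories(row) for _, row in grouped.iterrows()],
    index=grouped.index,
    columns=['#1 Favourite', '#2 Favourite'],
)
print(top_n)
```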


In [212]:
nyc_consolidated_df = allvenue_withborough

In [200]:
tor_consolidated_df=tor_allvenue_withborough.drop('Postal Code',axis=1)

In [213]:
nyc_consolidated_df.head()

Unnamed: 0,Borough,Neighborhood,Lat,Lon,Indian Restaurant,Bar,Business,Entertainment,FastFood,Kids,Parks,Residential,Restaurant,Services,Shops,SightSeeing,Sports,Travel
0,Bronx,Wakefield,40.894705,-73.847201,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,2.0,2.0,0.0,0.0,0.0,0.0
1,Bronx,Co-op City,40.874294,-73.829939,0.0,0.0,0.0,0.0,3.0,0.0,0.0,4.0,2.0,1.0,4.0,0.0,0.0,1.0
2,Bronx,Eastchester,40.887556,-73.827806,0.0,0.0,0.0,1.0,4.0,0.0,0.0,2.0,9.0,3.0,0.0,0.0,0.0,3.0
3,Bronx,Fieldston,40.895437,-73.905643,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bronx,Riverdale,40.890834,-73.912585,0.0,0.0,0.0,0.0,1.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0


In [202]:
tor_consolidated_df = tor_consolidated_df.rename(columns={'Latitude':'Lat','Longitude':'Lon','Neighbourhood':'Neighborhood'})

In [204]:
tor_consolidated_df = tor_consolidated_df[nyc_consolidated_df.columns]

In [205]:
tor_consolidated_df.head()

Unnamed: 0,Borough,Neighborhood,Lat,Lon,Indian Restaurant,Bar,Business,Entertainment,FastFood,Kids,Parks,Residential,Restaurant,Services,Shops,SightSeeing,Sports,Travel
0,Scarborough,"Rouge, Malvern",43.806686,-79.194353,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,0.0,0.0,0.0,3.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
3,Scarborough,Woburn,43.770992,-79.216917,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,0.0,0.0,1.0,1.0


In [214]:
nyc_consolidated_df = nyc_consolidated_df.set_index('Borough')

In [207]:
tor_consolidated_df = tor_consolidated_df.set_index('Borough')

In [218]:
tor_consolidated_df.head()

Borough,Neighborhood,Lat,Lon,Indian Restaurant,Bar,Business,Entertainment,FastFood,Kids,Parks,Residential,Restaurant,Services,Shops,SightSeeing,Sports,Travel
Scarborough,"Rouge, Malvern",43.806686,-79.194353,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,0.0,0.0,0.0,3.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
Scarborough,Woburn,43.770992,-79.216917,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
Scarborough,Cedarbrae,43.773136,-79.239476,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,0.0,0.0,1.0,1.0


In [219]:
nyc_consolidated_df['City'] = "NYC"

In [220]:
tor_consolidated_df['City'] = "TORONTO"

In [221]:
nyc_consolidated_df.head()

Borough,Neighborhood,Lat,Lon,Indian Restaurant,Bar,Business,Entertainment,FastFood,Kids,Parks,Residential,Restaurant,Services,Shops,SightSeeing,Sports,Travel,City
Bronx,Wakefield,40.894705,-73.847201,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,2.0,2.0,0.0,0.0,0.0,0.0,NYC
Bronx,Co-op City,40.874294,-73.829939,0.0,0.0,0.0,0.0,3.0,0.0,0.0,4.0,2.0,1.0,4.0,0.0,0.0,1.0,NYC
Bronx,Eastchester,40.887556,-73.827806,0.0,0.0,0.0,1.0,4.0,0.0,0.0,2.0,9.0,3.0,0.0,0.0,0.0,3.0,NYC
Bronx,Fieldston,40.895437,-73.905643,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,NYC
Bronx,Riverdale,40.890834,-73.912585,0.0,0.0,0.0,0.0,1.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,NYC


In [234]:
combined_cities_df=pd.concat([nyc_consolidated_df,tor_consolidated_df])

In [235]:
combined_cities_df=combined_cities_df.reset_index()
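Before stacking the two cities it is worth confirming that both frames expose the same columns in the same order, since `pd.concat` would otherwise fill mismatched columns with `NaN`. A minimal sketch with one row per city (the frames `nyc` and `tor` are cut-down, hypothetical stand-ins for the consolidated frames):

```python
import pandas as pd

# One-row stand-ins using values shown in the tables above.
nyc = pd.DataFrame({'Borough': ['Bronx'], 'Neighborhood': ['Wakefield'], 'FastFood': [5.0]})
tor = pd.DataFrame({'Borough': ['Scarborough'], 'Neighborhood': ['Woburn'], 'FastFood': [2.0]})
nyc['City'] = 'NYC'
tor['City'] = 'TORONTO'

# Guard against silently misaligned columns.
assert list(nyc.columns) == list(tor.columns)

# ignore_index=True renumbers the rows 0..n-1 in the stacked frame.
combined = pd.concat([nyc, tor], ignore_index=True)
print(combined['City'].value_counts().to_dict())
```

With `ignore_index=True` the stacked frame gets a fresh 0..n-1 index in one step, which may be a simpler alternative to setting `Borough` as the index and resetting it afterwards.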

In [236]:
import folium  # harmless if already imported earlier in the notebook

# Centre the map on New York City
latitude = 40.730610
longitude = -73.935242
map_nyc = folium.Map(location=[latitude, longitude], zoom_start=12)

# Add one marker per neighbourhood in the combined (NYC + Toronto) frame
for lat, lon, borough in zip(combined_cities_df.Lat, combined_cities_df.Lon, combined_cities_df.Borough):
    label = '{}, {}, {}'.format(lat, lon, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_nyc)
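The map above is centred on New York, while the combined frame also holds Toronto points that fall outside that view. One possible way to pick a view per city, or one that covers both, is to derive centres and a bounding box from the data itself. The sketch uses only pandas and four coordinates taken from the tables above; `points`, `centres`, and `bounds` are hypothetical names.

```python
import pandas as pd

points = pd.DataFrame({
    'City': ['NYC', 'NYC', 'TORONTO', 'TORONTO'],
    'Lat': [40.894705, 40.874294, 43.806686, 43.784535],
    'Lon': [-73.847201, -73.829939, -79.194353, -79.160497],
})

# Mean coordinate per city: a reasonable map centre for each one.
centres = points.groupby('City')[['Lat', 'Lon']].mean()

# Bounding box as [[south, west], [north, east]] covering every point.
bounds = [
    [points['Lat'].min(), points['Lon'].min()],
    [points['Lat'].max(), points['Lon'].max()],
]
print(centres)
print(bounds)
```

A bounding box in this form is what folium's `Map.fit_bounds` accepts, so `map_nyc.fit_bounds(bounds)` would be one way to zoom out to both cities.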


# Let's do the cluster analysis