# Groceries Warehouse Recommender System

<h2 style='color:#34495e;'>Problem Description</h2>

<p>This project is for a groceries contractor who wants to build a warehouse.<br>
The choice of warehouse location matters. For example, placing the warehouse near a popular restaurant reduces transportation costs and also improves the quality of service.</p>

<h2 style='color:#34495e;'>Data Description</h2>
<p>The project needs geographical data for the city of Toronto. The first step is to find the neighborhoods inside a specific borough by their corresponding postal codes.</p>

<p>We use the Foursquare API to obtain information about the venues in each neighborhood. We will access the latitude and longitude of each venue and its popularity within a specific category. All of this information is available through the standard Foursquare API. Each venue is stored as one row with the following columns:</p>

[Postal Code] [Neighborhood(s)] [Neighborhood Latitude] [Neighborhood Longitude] [Venue] [Venue Summary] [Venue Category] [Distance (meter)]

In [6]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
import json
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import geopy.geocoders
import geocoder

import folium

Libraries are imported.


In [9]:
# Loading the dataset which is about postal codes in Toronto
# This dataset was created in week 3. 
df_toronto = pd.read_csv('toronto_base.csv')
df_toronto.head()

Unnamed: 0.1,Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,0,M8W,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484
1,1,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
2,2,M6H,West Toronto,Dovercourt Village,43.669005,-79.442259
3,3,M2L,North York,York Mills,43.75749,-79.374714
4,4,M2H,North York,Hillcrest Village,43.803762,-79.363452


In [10]:
# Latitude and longitude of Toronto, looked up manually via a Google search
toronto_latitude = 43.6932; toronto_longitude = -79.3832
map_toronto = folium.Map(location = [toronto_latitude, toronto_longitude], zoom_start = 10.7)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    

map_toronto

In [11]:
# Select only the neighborhoods that belong to the "Scarborough" borough.
scarborough_data = df_toronto[df_toronto['Borough'] == 'Scarborough']
scarborough_data = scarborough_data.reset_index(drop=True).drop(columns = 'Unnamed: 0')
scarborough_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1V,Scarborough,"Agincourt North, Milliken",43.815252,-79.284577
1,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
2,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
3,M1N,Scarborough,Birch Cliff,43.692657,-79.264848
4,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497


In [12]:
address_scar = 'Scarborough, Toronto'
latitude_scar = 43.773077
longitude_scar = -79.257774
print('The geographical coordinates of "Scarborough" are: {}, {}.'.format(latitude_scar, longitude_scar))

map_Scarborough = folium.Map(location=[latitude_scar, longitude_scar], zoom_start=11.5)

# add markers to map
for lat, lng, label in zip(scarborough_data['Latitude'], scarborough_data['Longitude'], scarborough_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 10,
        popup = label,
        color ='blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7).add_to(map_Scarborough)

map_Scarborough

The geographical coordinates of "Scarborough" are: 43.773077, -79.257774.


In [13]:
def foursquare_crawler(postal_code_list, neighborhood_list, lat_list, lng_list, LIMIT = 500, radius = 1000):
    """Query the Foursquare explore endpoint once per neighborhood and collect the raw results."""
    result_ds = []
    for counter, (postal_code, neighborhood, lat, lng) in enumerate(
            zip(postal_code_list, neighborhood_list, lat_list, lng_list), start = 1):

        # create the API request URL (CLIENT_ID, CLIENT_SECRET and VERSION are notebook-level globals)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION,
            lat, lng, radius, LIMIT)

        # make the GET request and keep only the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']
        result_ds.append({
            'Postal Code': postal_code,
            'Neighborhood(s)': neighborhood,
            'Latitude': lat,
            'Longitude': lng,
            'Crawling_result': results,
        })
        print('{}.'.format(counter))
        print('Data is Obtained, for the Postal Code {} (and Neighborhoods {}) SUCCESSFULLY.'.format(postal_code, neighborhood))
    return result_ds

In [14]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180323' # Foursquare API version
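
The request URL in `foursquare_crawler` is assembled by string formatting. As a minimal sketch, and assuming nothing beyond the query parameters already used there (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`), the same URL can be built with the standard library so that every value is escaped consistently. The helper name `build_explore_url` and the placeholder credentials are hypothetical and not part of the original notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=1000, limit=500):
    """Build the explore URL for one neighborhood centre (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the search centre
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues requested
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; coordinates of postal code M1V from the table above.
url = build_explore_url("ID", "SECRET", "20180323", 43.815252, -79.284577)

# Decode the query string again to check what the server would receive.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

Passing the resulting string to `requests.get` would work the same way as the formatted URL in the crawler.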

In [15]:
print('Crawling different neighborhoods inside "Scarborough"')
Scarborough_foursquare_dataset = foursquare_crawler(list(scarborough_data['Postcode']),
                                                   list(scarborough_data['Neighbourhood']),
                                                   list(scarborough_data['Latitude']),
                                                   list(scarborough_data['Longitude']),)

Crawling different neighborhoods inside "Scarborough"
1.
Data is Obtained, for the Postal Code M1V (and Neighborhoods Agincourt North, Milliken) SUCCESSFULLY.
2.
Data is Obtained, for the Postal Code M1J (and Neighborhoods Scarborough Village) SUCCESSFULLY.
3.
Data is Obtained, for the Postal Code M1B (and Neighborhoods Rouge, Malvern) SUCCESSFULLY.
4.
Data is Obtained, for the Postal Code M1N (and Neighborhoods Birch Cliff) SUCCESSFULLY.
5.
Data is Obtained, for the Postal Code M1C (and Neighborhoods Highland Creek, Rouge Hill, Port Union) SUCCESSFULLY.
6.
Data is Obtained, for the Postal Code M1R (and Neighborhoods Maryvale, Wexford) SUCCESSFULLY.
7.
Data is Obtained, for the Postal Code M1T (and Neighborhoods Tam O'Shanter) SUCCESSFULLY.
8.
Data is Obtained, for the Postal Code M1E (and Neighborhoods Morningside, West Hill) SUCCESSFULLY.
9.
Data is Obtained, for the Postal Code M1L (and Neighborhoods Clairlea, Golden Mile, Oakridge) SUCCESSFULLY.
10.
Data is Obtained, for the Postal

In [17]:
import pickle
with open("Scarborough_foursquare_dataset.txt", "wb") as fp:   #Pickling
    pickle.dump(Scarborough_foursquare_dataset, fp)
print('Received Data from Internet is Saved to Computer.')

Received Data from Internet is Saved to Computer.


In [18]:
with open("Scarborough_foursquare_dataset.txt", "rb") as fp:   # Unpickling
    Scarborough_foursquare_dataset = pickle.load(fp)

In [19]:
# This function reads the saved crawl results (a list with one dictionary per neighborhood)
# and extracts one row per venue for every neighborhood.

def get_venue_dataset(foursquare_dataset):
    columns = ['Postal Code', 'Neighborhood',
               'Neighborhood Latitude', 'Neighborhood Longitude',
               'Venue', 'Venue Summary', 'Venue Category', 'Distance']
    rows = []

    for neigh_dict in foursquare_dataset:
        postal_code = neigh_dict['Postal Code']
        neigh = neigh_dict['Neighborhood(s)']
        lat = neigh_dict['Latitude']
        lng = neigh_dict['Longitude']
        print('Number of venues for postal code "{}" and neighborhood(s) "{}":'.format(postal_code, neigh))
        print(len(neigh_dict['Crawling_result']))

        for venue_dict in neigh_dict['Crawling_result']:
            summary = venue_dict['reasons']['items'][0]['summary']
            name = venue_dict['venue']['name']
            dist = venue_dict['venue']['location']['distance']
            cat = venue_dict['venue']['categories'][0]['name']

            # Collect rows in a list; DataFrame.append was removed in pandas 2.0
            rows.append({'Postal Code': postal_code, 'Neighborhood': neigh,
                         'Neighborhood Latitude': lat, 'Neighborhood Longitude': lng,
                         'Venue': name, 'Venue Summary': summary,
                         'Venue Category': cat, 'Distance': dist})

    return pd.DataFrame(rows, columns = columns)

In [20]:
scarborough_venues = get_venue_dataset(Scarborough_foursquare_dataset)

Number of venues for postal code "M1V" and neighborhood(s) "Agincourt North, Milliken":
25
Number of venues for postal code "M1J" and neighborhood(s) "Scarborough Village":
11
Number of venues for postal code "M1B" and neighborhood(s) "Rouge, Malvern":
13
Number of venues for postal code "M1N" and neighborhood(s) "Birch Cliff":
15
Number of venues for postal code "M1C" and neighborhood(s) "Highland Creek, Rouge Hill, Port Union":
5
Number of venues for postal code "M1R" and neighborhood(s) "Maryvale, Wexford":
29
Number of venues for postal code "M1T" and neighborhood(s) "Tam O'Shanter":
34
Number of venues for postal code "M1E" and neighborhood(s) "Morningside, West Hill":
24
Number of venues for postal code "M1L" and neighborhood(s) "Clairlea, Golden Mile, Oakridge":
29
Number of venues for postal code "M1S" and neighborhood(s) "Agincourt":
50
Nu

In [21]:
scarborough_venues.head()

Unnamed: 0,Postal Code,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Venue Category,Distance
0,M1V,"Agincourt North, Milliken",43.815252,-79.284577,Fahmee Bakery & Jamaican Foods,This spot is popular,Caribbean Restaurant,669
1,M1V,"Agincourt North, Milliken",43.815252,-79.284577,Jim Chai Kee Wonton Noodle 沾仔記,This spot is popular,Noodle House,689
2,M1V,"Agincourt North, Milliken",43.815252,-79.284577,Lotus Pond Vegetarian Restaurant 蓮花素食,This spot is popular,Vegetarian / Vegan Restaurant,934
3,M1V,"Agincourt North, Milliken",43.815252,-79.284577,DaanGo Cake Lab,This spot is popular,Bakery,809
4,M1V,"Agincourt North, Milliken",43.815252,-79.284577,The Brighton Convention & Event Centre,This spot is popular,Event Space,890
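
Each row in the table above comes from one item in the Foursquare response. The sketch below shows that flattening step on a single hand-written item. The nested layout of `sample_item` is an assumption inferred from the keys that `get_venue_dataset` reads, trimmed to just those keys (a real response carries more fields); its values are copied from row 3 of the table. The helper `flatten_item` and its guard for a venue without categories are illustrative additions, not part of the original notebook.

```python
def flatten_item(item):
    """Turn one raw explore item into the venue columns used in the table."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    return {
        "Venue": venue["name"],
        "Venue Summary": item["reasons"]["items"][0]["summary"],
        # Assumption: fall back to None when a venue has no category listed.
        "Venue Category": categories[0]["name"] if categories else None,
        "Distance": venue["location"]["distance"],  # meters from the query point
    }


# Trimmed, hand-written item; values taken from row 3 of the table above.
sample_item = {
    "reasons": {"items": [{"summary": "This spot is popular"}]},
    "venue": {
        "name": "DaanGo Cake Lab",
        "location": {"distance": 809},
        "categories": [{"name": "Bakery"}],
    },
}

row = flatten_item(sample_item)
print(row)
```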


In [22]:
scarborough_venues.to_csv('scarborough_venues.csv')

In [23]:
scarborough_venues = pd.read_csv('scarborough_venues.csv')
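
The `Unnamed: 0` columns that show up in the tables of this notebook are a side effect of saving a DataFrame together with its index and reading it back. The following sketch reproduces the effect on a one-row toy frame, using in-memory buffers instead of files. It is only an illustration: the notebook itself keeps the default behaviour and drops the extra column later, so switching to `index=False` there would also require removing those `drop` calls.

```python
import io

import pandas as pd

toy = pd.DataFrame({"Venue": ["DaanGo Cake Lab"], "Distance": [809]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))     # ['Unnamed: 0', 'Venue', 'Distance']
print(list(reloaded_without_index.columns))  # ['Venue', 'Distance']
```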

In [24]:
neigh_list = list(scarborough_venues['Neighborhood'].unique())
print('Number of Neighborhoods inside Scarborough:')
print(len(neigh_list))
print('List of Neighborhoods inside Scarborough:')
neigh_list

Number of Neighborhoods inside Scarborough:
14
List of Neighborhoods inside Scarborough:


['Agincourt North, Milliken',
 'Scarborough Village',
 'Rouge, Malvern',
 'Birch Cliff',
 'Highland Creek, Rouge Hill, Port Union',
 'Maryvale, Wexford',
 "Tam O'Shanter",
 'Morningside, West Hill',
 'Clairlea, Golden Mile, Oakridge',
 'Agincourt',
 'Woburn',
 'Ionview, Kennedy Park',
 'Dorset Park, Scarborough Town Centre, Wexford Heights',
 'Cliffcrest, Cliffside']

In [25]:
neigh_venue_summary = scarborough_venues.groupby('Neighborhood').count()
neigh_venue_summary.drop(columns = ['Unnamed: 0']).head()

Unnamed: 0_level_0,Postal Code,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Venue Category,Distance
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Agincourt,50,50,50,50,50,50,50
"Agincourt North, Milliken",25,25,25,25,25,25,25
Birch Cliff,15,15,15,15,15,15,15
"Clairlea, Golden Mile, Oakridge",29,29,29,29,29,29,29
"Cliffcrest, Cliffside",13,13,13,13,13,13,13


In [26]:
print('There are {} unique categories.'.format(len(scarborough_venues['Venue Category'].unique())))

print('Here is the list of different categories:')
list(scarborough_venues['Venue Category'].unique())

There are 104 unique categories.
Here is the list of different categories:


['Caribbean Restaurant',
 'Noodle House',
 'Vegetarian / Vegan Restaurant',
 'Bakery',
 'Event Space',
 'Chinese Restaurant',
 'Korean Restaurant',
 'Coffee Shop',
 'Dessert Shop',
 'Malay Restaurant',
 'Pizza Place',
 'Hobby Shop',
 'Park',
 'Fast Food Restaurant',
 'Gym',
 'Pharmacy',
 'Shop & Service',
 'Bubble Tea Shop',
 'Shopping Mall',
 'Sandwich Place',
 'Restaurant',
 'Convenience Store',
 'Train Station',
 'Japanese Restaurant',
 'Bowling Alley',
 'Spa',
 'Paper / Office Supplies Store',
 'Greek Restaurant',
 'Fruit & Vegetable Store',
 'Café',
 'Thai Restaurant',
 'General Entertainment',
 'Asian Restaurant',
 'Diner',
 'Bank',
 'Skating Rink',
 'College Stadium',
 'Discount Store',
 'Gym Pool',
 'Burger Joint',
 'Italian Restaurant',
 'Breakfast Spot',
 'Playground',
 'Vietnamese Restaurant',
 'Fish Market',
 'Middle Eastern Restaurant',
 'Seafood Restaurant',
 'Supermarket',
 'Grocery Store',
 'Indian Restaurant',
 'Badminton Court',
 'Smoke Shop',
 'Bar',
 'Intersection',

In [27]:
# one hot encoding
scarborough_onehot = pd.get_dummies(data = scarborough_venues, drop_first  = False, 
                              prefix = "", prefix_sep = "", columns = ['Venue Category'])
scarborough_onehot.head()

Unnamed: 0.1,Unnamed: 0,Postal Code,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Distance,American Restaurant,Asian Restaurant,Automotive Shop,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Beach,Beer Store,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,College Stadium,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Food & Drink Shop,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,General Entertainment,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hobby Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Motorcycle Shop,Noodle House,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pool Hall,Print Shop,Pub,Rental Car Location,Rental Service,Restaurant,Sandwich Place,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Field,Spa,Sports Bar,Sri Lankan Restaurant,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Thrift / Vintage Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wings Joint
0,0,M1V,"Agincourt North, Milliken",43.815252,-79.284577,Fahmee Bakery & Jamaican Foods,This spot is popular,669,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,M1V,"Agincourt North, Milliken",43.815252,-79.284577,Jim Chai Kee Wonton Noodle 沾仔記,This spot is popular,689,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,2,M1V,"Agincourt North, Milliken",43.815252,-79.284577,Lotus Pond Vegetarian Restaurant 蓮花素食,This spot is popular,934,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,3,M1V,"Agincourt North, Milliken",43.815252,-79.284577,DaanGo Cake Lab,This spot is popular,809,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,4,M1V,"Agincourt North, Milliken",43.815252,-79.284577,The Brighton Convention & Event Centre,This spot is popular,890,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
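
One-hot encoding gives every venue a row of indicator columns, one per category. Summing those rows per neighborhood, as the notebook does in cell `In [32]`, turns the indicators into venue counts per category. The sketch below runs both steps on a made-up five-venue frame (the neighborhoods "A" and "B" are placeholders, not real data) and checks the result against `pd.crosstab`, which computes the same counts directly.

```python
import pandas as pd

# Toy data: five venues in two placeholder neighborhoods.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "B"],
    "Venue Category": ["Bakery", "Pizza Place", "Bakery", "Bakery", "Diner"],
})

# Step 1: one indicator column per category, named after the category itself.
onehot = pd.get_dummies(toy, columns=["Venue Category"], prefix="", prefix_sep="")

# Step 2: sum the indicators per neighborhood to get counts per category.
counts = onehot.groupby("Neighborhood").sum()

# Cross-check: crosstab counts category occurrences per neighborhood directly.
crosstab = pd.crosstab(toy["Neighborhood"], toy["Venue Category"])

print(counts)
```

Here neighborhood "B" ends up with two bakeries and one diner, and "A" with one bakery and one pizza place.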


In [31]:
# This list is created manually: it keeps the neighborhood columns and the food-related categories
important_list_of_features = [
 'Neighborhood',
 'Neighborhood Latitude',
 'Neighborhood Longitude',
 'American Restaurant',
 'Asian Restaurant',
 'BBQ Joint',
 'Bakery',
 'Breakfast Spot',
 'Burger Joint',
 'Cajun / Creole Restaurant',
 'Cantonese Restaurant',
 'Caribbean Restaurant',
 'Chinese Restaurant',
 'Diner',
 'Fast Food Restaurant',
 'Filipino Restaurant',
 'Fish Market',
 'Food & Drink Shop',
 'Fried Chicken Joint',
 'Fruit & Vegetable Store',
 'Greek Restaurant',
 'Grocery Store',
 'Indian Restaurant',
 'Italian Restaurant',
 'Japanese Restaurant',
 'Korean Restaurant',
 'Latin American Restaurant',
 'Malay Restaurant',
 'Mediterranean Restaurant',
 'Mexican Restaurant',
 'Middle Eastern Restaurant',
 'Noodle House',
 'Pizza Place',
 'Restaurant',
 'Sandwich Place',
 'Seafood Restaurant',
 'Shanghai Restaurant',
 'Sushi Restaurant',
 'Taiwanese Restaurant',
 'Thai Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Vietnamese Restaurant',
 'Wings Joint']

In [32]:
scarborough_onehot = scarborough_onehot[important_list_of_features].drop(
    columns = ['Neighborhood Latitude', 'Neighborhood Longitude']).groupby(
    'Neighborhood').sum()


scarborough_onehot.head()

Unnamed: 0_level_0,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Breakfast Spot,Burger Joint,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Diner,Fast Food Restaurant,Filipino Restaurant,Fish Market,Food & Drink Shop,Fried Chicken Joint,Fruit & Vegetable Store,Greek Restaurant,Grocery Store,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Noodle House,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Shanghai Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1
Agincourt,0,1,1,2,1,0,0,1,2,8,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,1,0,0,1,2,1,2,1,1,1,0,0,0,1,0
"Agincourt North, Milliken",0,0,0,2,0,0,0,0,1,4,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,2,0,0,0,0,0,0,0,1,0,0
Birch Cliff,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
"Clairlea, Golden Mile, Oakridge",0,0,0,2,0,0,0,0,0,0,1,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,1,1,0,0,0,0,0,0,0,0
"Cliffcrest, Cliffside",0,0,0,0,0,1,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,1


In [33]:
# Collapse every column whose name contains "Restaurant" into a single "Total Restaurants" column
restaurant_list = [col for col in scarborough_onehot.columns if 'Restaurant' in col]
scarborough_onehot['Total Restaurants'] = scarborough_onehot[restaurant_list].sum(axis = 1)
scarborough_onehot = scarborough_onehot.drop(columns = restaurant_list)

# Collapse every column whose name contains "Joint" into a single "Total Joints" column
joint_list = [col for col in scarborough_onehot.columns if 'Joint' in col]
scarborough_onehot['Total Joints'] = scarborough_onehot[joint_list].sum(axis = 1)
scarborough_onehot = scarborough_onehot.drop(columns = joint_list)

In [34]:
scarborough_onehot


Unnamed: 0_level_0,Bakery,Breakfast Spot,Diner,Fish Market,Food & Drink Shop,Fruit & Vegetable Store,Grocery Store,Noodle House,Pizza Place,Sandwich Place,Total Restaurants,Total Joints
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Agincourt,2,1,0,0,0,0,1,1,2,2,21,1
"Agincourt North, Milliken",2,0,0,0,0,0,0,1,2,0,9,0
Birch Cliff,0,0,1,0,0,0,0,0,0,0,4,0
"Clairlea, Golden Mile, Oakridge",2,0,1,0,0,0,1,0,1,1,4,0
"Cliffcrest, Cliffside",0,0,0,0,0,0,0,0,3,0,3,2
"Dorset Park, Scarborough Town Centre, Wexford Heights",2,0,0,0,0,0,1,0,1,1,13,3
"Highland Creek, Rouge Hill, Port Union",0,1,0,0,0,0,0,0,0,0,1,1
"Ionview, Kennedy Park",0,0,0,0,0,0,2,0,2,1,5,1
"Maryvale, Wexford",1,1,0,1,0,0,3,0,3,0,8,2
"Morningside, West Hill",0,0,0,0,1,0,1,0,4,1,3,2


In [35]:
# import k-means for the clustering stage
from sklearn.cluster import KMeans

# run k-means clustering
kmeans = KMeans(n_clusters = 5, random_state = 0).fit(scarborough_onehot)

In [36]:
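# One row per cluster center; G1..G5 correspond to k-means cluster indices 0..4
means_df = pd.DataFrame(kmeans.cluster_centers_)
means_df.columns = scarborough_onehot.columns
means_df.index = ['G1','G2','G3','G4','G5']
# Rank the groups by the total number of food-related venues in their cluster center
means_df['Total Sum'] = means_df.sum(axis = 1)
means_df.sort_values(axis = 0, by = ['Total Sum'], ascending=False)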

Unnamed: 0,Bakery,Breakfast Spot,Diner,Fish Market,Food & Drink Shop,Fruit & Vegetable Store,Grocery Store,Noodle House,Pizza Place,Sandwich Place,Total Restaurants,Total Joints,Total Sum
G3,2.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,2.0,2.0,21.0,1.0,31.0
G4,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.5,1.5,1.5,12.5,2.0,20.0
G1,1.5,0.5,0.0,0.5,0.0,0.0,1.5,0.5,2.5,0.0,8.5,1.0,16.5
G5,0.0,0.0,0.0,0.0,0.3333333,0.0,1.0,0.0,3.0,0.666667,3.666667,1.666667,10.333333
G2,0.333333,0.166667,0.333333,1.387779e-17,1.387779e-17,0.166667,0.166667,0.0,0.333333,0.5,3.833333,0.166667,6.0
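
The table ranks cluster centers, not neighborhoods. To turn a group into candidate locations, the cluster label of each neighborhood has to be mapped back to its group name. The sketch below does this with the standard library, assuming the naming used above where group `G1` is cluster index 0, `G2` is index 1, and so on. The function name `neighborhoods_by_group` is hypothetical, and the names and labels in the example are made up for illustration; they are not results from this notebook.

```python
from collections import defaultdict


def neighborhoods_by_group(neighborhoods, labels):
    """Group neighborhood names by cluster, naming cluster index i as 'G<i+1>'."""
    groups = defaultdict(list)
    for name, label in zip(neighborhoods, labels):
        groups["G{}".format(label + 1)].append(name)
    return dict(groups)


# Made-up example: four placeholder neighborhoods and their cluster indices.
example_names = ["N1", "N2", "N3", "N4"]
example_labels = [2, 0, 2, 1]

members = neighborhoods_by_group(example_names, example_labels)
print(members["G3"])  # ['N1', 'N3']
```

In the notebook, the same call would take `scarborough_onehot.index` and `kmeans.labels_` as its two arguments.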


<h2 style='color:#34495e'>Result</h2>
<hr>
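<p>The groups are ranked by "Total Sum", the total number of food-related venues in each cluster center.</p>
<p>The best group is G3 (total 31.0).</p>
<p>The second-best group is G4 (total 20.0).</p>
<p>The third-best group is G1 (total 16.5).</p>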