# Food recipes distribution system for venues

### Introduction


In this distribution system, the distributor wants to build a warehouse as close as possible to all venues, so that it provides good quality of service at minimum transportation cost. Let's start building such a system.

#### Installing and importing the necessary dependencies

In [1]:
# importing libraries
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
from bs4 import BeautifulSoup # library to parse HTML pages
import requests # library to handle requests
import json # library to handle JSON files
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0)

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes # skip this line if geopy is already installed
import geopy.geocoders # convert an address into latitude and longitude values

!conda install -c conda-forge folium=0.5.0 --yes # skip this line if folium is already installed
import folium # map rendering library

print('Libraries are downloaded and imported.')


### Preprocessing required data

In [3]:
# download the Wikipedia page and preprocess it into a pandas DataFrame

wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' # address of page
raw_wikipedia_page = requests.get(wikipedia_link).text

# using Beautiful Soup to parse the HTML
soup = BeautifulSoup(raw_wikipedia_page, 'html.parser')

# the postal codes are in the first table of the page
table = soup.find('table')

# initializing lists to store the raw data
Postcode      = []
Borough       = []
Neighbourhood = []

# extracting a clean form of the table
for tr_cell in table.find_all('tr'):
    Postcode_var      = -1
    Borough_var       = -1
    Neighbourhood_var = -1
    # reset per row so that links found in a previous row are never reused
    tag_a_Borough       = None
    tag_a_Neighbourhood = None

    for counter, td_cell in enumerate(tr_cell.find_all('td'), start=1):
        if counter == 1:
            Postcode_var = td_cell.text.strip()
        if counter == 2:
            Borough_var = td_cell.text.strip()
            tag_a_Borough = td_cell.find('a')
        if counter == 3:
            Neighbourhood_var = td_cell.text.strip()
            tag_a_Neighbourhood = td_cell.find('a')

    # skip the header row (it has no <td> cells) and incomplete rows
    if -1 in (Postcode_var, Borough_var, Neighbourhood_var):
        continue
    # skip postal codes, boroughs or neighbourhoods that are not assigned
    if 'Not assigned' in (Postcode_var, Borough_var, Neighbourhood_var):
        continue
    # keep only rows whose borough and neighbourhood link to an article
    if tag_a_Borough is None or tag_a_Neighbourhood is None:
        continue

    Postcode.append(Postcode_var)
    Borough.append(Borough_var)
    Neighbourhood.append(Neighbourhood_var)

# merging postal codes that have more than one neighbourhood into a single row
unique_p = set(Postcode)
print('num of unique Postal codes:', len(unique_p))
Postcode_u      = []
Borough_u       = []
Neighbourhood_u = []

for postcode_unique_element in unique_p:
    p_var, b_var, n_var = '', '', ''
    for postcode_idx, postcode_element in enumerate(Postcode):
        if postcode_unique_element == postcode_element:
            p_var = postcode_element
            b_var = Borough[postcode_idx]
            if n_var == '':
                n_var = Neighbourhood[postcode_idx]
            else:
                n_var = n_var + ', ' + Neighbourhood[postcode_idx]
    Postcode_u.append(p_var)
    Borough_u.append(b_var)
    Neighbourhood_u.append(n_var)

# converting the lists to a pandas DataFrame
toronto_dict = {'Postcode':Postcode_u, 'Borough':Borough_u, 'Neighbourhood':Neighbourhood_u}
df_toronto = pd.DataFrame.from_dict(toronto_dict)
df_toronto.to_csv('toronto_post_neighbourhood.csv')
df_toronto.head()

num of unique Postal codes: 84


| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M9A | Etobicoke | Islington Avenue |
| 1 | M4H | East York | Thorncliffe Park |
| 2 | M1B | Scarborough | Rouge, Malvern |
| 3 | M9W | Etobicoke | Northwest |
| 4 | M9L | North York | Humber Summit |
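The nested loop in the cell above merges all neighbourhoods that share a postal code into one comma-separated string. The same aggregation can be sketched with a pandas `groupby`, assuming the parsed rows are available as the three parallel lists `Postcode`, `Borough` and `Neighbourhood` (the sample values below are illustrative, not the full scraped data):

```python
import pandas as pd

# Parallel lists as produced by the parsing loop (illustrative sample rows)
Postcode = ['M1B', 'M1B', 'M9A', 'M4H']
Borough = ['Scarborough', 'Scarborough', 'Etobicoke', 'East York']
Neighbourhood = ['Rouge', 'Malvern', 'Islington Avenue', 'Thorncliffe Park']

rows = pd.DataFrame({'Postcode': Postcode,
                     'Borough': Borough,
                     'Neighbourhood': Neighbourhood})

# One row per postal code: keep the (shared) borough, join the neighbourhoods
df_grouped = (rows.groupby('Postcode', sort=False)
                  .agg({'Borough': 'first', 'Neighbourhood': ', '.join})
                  .reset_index())
print(df_grouped)
```

Unlike iterating over a `set`, `sort=False` keeps the postal codes in the order they first appear, so the row order is reproducible between runs.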


### Preprocessing coordinate data and combining it with postal codes

In [5]:
# looking up the latitude and longitude of each postal code in Toronto
df_geo = pd.read_csv('http://cocl.us/Geospatial_data')
df_geo.columns = ['Postcode', 'Latitude', 'Longitude']
df_geo.set_index("Postcode", inplace=True)

latitude = []
longitude = []
for elem in Postcode_u:
    if elem in df_geo.index:
        latitude.append(df_geo.loc[elem, 'Latitude'])
        longitude.append(df_geo.loc[elem, 'Longitude'])
    else:
        # keep the lists aligned with Postcode_u when a code has no coordinates
        latitude.append(np.nan)
        longitude.append(np.nan)
print('Location latitude and longitude extracted.')

# creating final processed data
toronto_dict_ = {'Postcode':Postcode_u, 'Borough':Borough_u, 'Neighbourhood':Neighbourhood_u,
              'Latitude': latitude, 'Longitude':longitude}
df_toronto_ = pd.DataFrame.from_dict(toronto_dict_)
df_toronto_.to_csv('toronto_location.csv')
df_toronto_.head()


Location latitude and longitude extracted.


| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 1 | M4H | East York | Thorncliffe Park | 43.705369 | -79.349372 |
| 2 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 3 | M9W | Etobicoke | Northwest | 43.706748 | -79.594054 |
| 4 | M9L | North York | Humber Summit | 43.756303 | -79.565963 |
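A `merge` is a more direct way to attach coordinates to postal codes than looking each code up in a loop: with `how='left'`, every postal code keeps its row and codes missing from the coordinate table get NaN coordinates. A sketch with illustrative values (column names follow the notebook; `ZZZ` is a hypothetical code with no coordinates):

```python
import pandas as pd

# Postal-code table (illustrative subset of df_toronto)
df_codes = pd.DataFrame({'Postcode': ['M9A', 'M4H', 'ZZZ'],
                         'Borough': ['Etobicoke', 'East York', 'Unknown'],
                         'Neighbourhood': ['Islington Avenue', 'Thorncliffe Park', 'Unknown']})

# Coordinate table in the same column layout as df_geo (Postcode kept as a column)
df_geo = pd.DataFrame({'Postcode': ['M9A', 'M4H'],
                       'Latitude': [43.667856, 43.705369],
                       'Longitude': [-79.532242, -79.349372]})

# Left join: row order of df_codes is preserved, unknown codes get NaN
df_located = df_codes.merge(df_geo, on='Postcode', how='left')
print(df_located)
```

If `df_geo` is indexed by `Postcode`, as in the cell above, the equivalent call is `df_codes.merge(df_geo, left_on='Postcode', right_index=True, how='left')`.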


In [13]:
df_toronto = pd.read_csv('toronto_location.csv')
df_toronto.head()

| | Unnamed: 0 | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 0 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 1 | 1 | M4H | East York | Thorncliffe Park | 43.705369 | -79.349372 |
| 2 | 2 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 3 | 3 | M9W | Etobicoke | Northwest | 43.706748 | -79.594054 |
| 4 | 4 | M9L | North York | Humber Summit | 43.756303 | -79.565963 |
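The extra `Unnamed: 0` column appears because `to_csv` writes the DataFrame index by default and `read_csv` then reads it back as an ordinary column. A minimal round-trip sketch (an in-memory buffer stands in for the CSV file) showing two ways to avoid it:

```python
import io
import pandas as pd

df = pd.DataFrame({'Postcode': ['M9A', 'M4H'],
                   'Latitude': [43.667856, 43.705369]})

# Option 1: do not write the index at all
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
no_index = pd.read_csv(buf)

# Option 2: write the index, but read the first column back as the index
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf, index_col=0)

print(list(no_index.columns))    # ['Postcode', 'Latitude']
print(list(with_index.columns))  # ['Postcode', 'Latitude']
```

If either option is adopted in this notebook, the later `drop(columns = 'Unnamed: 0')` calls are no longer needed and would raise a `KeyError`.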


### Visualizing the map of Toronto

In [15]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude
# getting the location of Toronto
address = 'Toronto, Canada'
geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
toronto_latitude = location.latitude
toronto_longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(toronto_latitude, toronto_longitude))


The geographical coordinates of Toronto City are 43.653963, -79.387207.


In [16]:

map_toronto = folium.Map(location = [toronto_latitude, toronto_longitude], zoom_start = 10.7)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    

map_toronto

#### Selecting only the "Scarborough" borough of Toronto

In [17]:
# select only the neighborhoods in the "Scarborough" borough
scarborough_data = df_toronto[df_toronto['Borough'] == 'Scarborough']
scarborough_data = scarborough_data.reset_index(drop=True).drop(columns = 'Unnamed: 0')
scarborough_data.head()

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1T | Scarborough | Tam O'Shanter | 43.781638 | -79.304302 |
| 2 | M1W | Scarborough | Steeles West | 43.799525 | -79.318389 |
| 3 | M1X | Scarborough | Upper Rouge | 43.836125 | -79.205636 |
| 4 | M1M | Scarborough | Cliffcrest, Cliffside | 43.716316 | -79.239476 |


#### Map of Scarborough

In [19]:
address_scarborough = 'Scarborough, Toronto'
location = geolocator.geocode(address_scarborough)
scarborough_latitude = location.latitude
scarborough_longitude = location.longitude
print('The geographical coordinates of Scarborough are {}, {}.'.format(scarborough_latitude, scarborough_longitude))

map_scarborough = folium.Map(location = [scarborough_latitude, scarborough_longitude], zoom_start = 12)

# add markers to map
for lat, lng, borough, neighborhood in zip(scarborough_data['Latitude'], scarborough_data['Longitude'], scarborough_data['Borough'], scarborough_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_scarborough)


map_scarborough

The geographical coordinates of Scarborough are 43.773077, -79.257774.


#### Getting the list of venues in Scarborough

In [28]:
# requires CLIENT_ID, CLIENT_SECRET and VERSION from the Foursquare credentials cell, which must be run first
def foursquare_data(postal_code_list, neighborhood_list, lat_list, lng_list, LIMIT = 500, radius = 1000):
    result_ds = []
    counter = 0
    for postal_code, neighborhood, lat, lng in zip(postal_code_list, neighborhood_list, lat_list, lng_list):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION,
            lat, lng, radius, LIMIT)

        # make the GET request; raise on HTTP errors instead of failing later with a KeyError
        response = requests.get(url)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # store the raw list of recommended venues for this postal code
        result_ds.append({'Postal Code': postal_code, 'Neighborhood(s)': neighborhood,
                          'Latitude': lat, 'Longitude': lng,
                          'Crawling_result': results})
        counter += 1
        print('{}.'.format(counter))
        print('Data is Obtained, for the Postal Code {} (and Neighborhoods {}) SUCCESSFULLY.'.format(postal_code, neighborhood))
    return result_ds
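The request URL above is assembled with `str.format`. A sketch that builds the same `venues/explore` query with the standard library's `urllib.parse.urlencode`, which also percent-encodes the values; `build_explore_url` is a hypothetical helper, the credential strings are placeholders, and no request is sent:

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=1000, limit=500):
    """Return the explore URL with the same parameters as foursquare_data."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.806686, -79.194353)

# Decode the query string again to check what would be sent
query = parse_qs(urlparse(url).query)
print(query['ll'])  # ['43.806686,-79.194353']
```

With `requests`, the same `params` dict can instead be passed directly as `requests.get(EXPLORE_URL, params=params)`, which performs the encoding itself.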

In [29]:

print('getting different neighborhoods inside "Scarborough"')
Scarborough_foursquare_dataset = foursquare_data(list(scarborough_data['Postcode']),
                                                   list(scarborough_data['Neighbourhood']),
                                                   list(scarborough_data['Latitude']),
                                                   list(scarborough_data['Longitude']),)

getting different neighborhoods inside "Scarborough"
1.
Data is Obtained, for the Postal Code M1B (and Neighborhoods Rouge, Malvern) SUCCESSFULLY.
2.
Data is Obtained, for the Postal Code M1T (and Neighborhoods Tam O'Shanter) SUCCESSFULLY.
3.
Data is Obtained, for the Postal Code M1W (and Neighborhoods Steeles West) SUCCESSFULLY.
4.
Data is Obtained, for the Postal Code M1X (and Neighborhoods Upper Rouge) SUCCESSFULLY.
5.
Data is Obtained, for the Postal Code M1M (and Neighborhoods Cliffcrest, Cliffside) SUCCESSFULLY.
6.
Data is Obtained, for the Postal Code M1C (and Neighborhoods Highland Creek, Rouge Hill, Port Union) SUCCESSFULLY.
7.
Data is Obtained, for the Postal Code M1R (and Neighborhoods Maryvale, Wexford) SUCCESSFULLY.
8.
Data is Obtained, for the Postal Code M1J (and Neighborhoods Scarborough Village) SUCCESSFULLY.
9.
Data is Obtained, for the Postal Code M1V (and Neighborhoods Agincourt North, Milliken) SUCCESSFULLY.
10.
Data is Obtained, for the Postal Code M1L (and Neighb

In [30]:
import pickle
with open("Scarborough_foursquare_dataset.txt", "wb") as fp:   #Pickling
    pickle.dump(Scarborough_foursquare_dataset, fp)
print('Received Data from Internet is Saved to Computer.')

Received Data from Internet is Saved to Computer.


In [31]:
with open("Scarborough_foursquare_dataset.txt", "rb") as fp:   # Unpickling
    Scarborough_foursquare_dataset = pickle.load(fp)

In [32]:
def get_venue_dataset(foursquare_dataset):
    columns = ['Postal Code', 'Neighborhood',
               'Neighborhood Latitude', 'Neighborhood Longitude',
               'Venue', 'Venue Summary', 'Venue Category', 'Distance']
    rows = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0

    for neigh_dict in foursquare_dataset:
        postal_code = neigh_dict['Postal Code']
        neigh = neigh_dict['Neighborhood(s)']
        lat = neigh_dict['Latitude']
        lng = neigh_dict['Longitude']
        print('Number of venues for postal code "{}" and neighborhood(s) "{}":'.format(postal_code, neigh))
        print(len(neigh_dict['Crawling_result']))

        for venue_dict in neigh_dict['Crawling_result']:
            summary = venue_dict['reasons']['items'][0]['summary']
            name = venue_dict['venue']['name']
            dist = venue_dict['venue']['location']['distance']
            cat = venue_dict['venue']['categories'][0]['name']
            rows.append({'Postal Code': postal_code, 'Neighborhood': neigh,
                         'Neighborhood Latitude': lat, 'Neighborhood Longitude': lng,
                         'Venue': name, 'Venue Summary': summary,
                         'Venue Category': cat, 'Distance': dist})

    return pd.DataFrame(rows, columns = columns)

In [33]:
scarborough_venues = get_venue_dataset(Scarborough_foursquare_dataset)

Number of venues for postal code "M1B" and neighborhood(s) "Rouge, Malvern":
16
Number of venues for postal code "M1T" and neighborhood(s) "Tam O'Shanter":
30
Number of venues for postal code "M1W" and neighborhood(s) "Steeles West":
24
Number of venues for postal code "M1X" and neighborhood(s) "Upper Rouge":
0
Number of venues for postal code "M1M" and neighborhood(s) "Cliffcrest, Cliffside":
13
Number of venues for postal code "M1C" and neighborhood(s) "Highland Creek, Rouge Hill, Port Union":
4
Number of venues for postal code "M1R" and neighborhood(s) "Maryvale, Wexford":
25
Number of venues for postal code "M1J" and neighborhood(s) "Scarborough Village":
11
Number of venues for postal code "M1V" and neighborhood(s) "Agincourt North, Milliken":
30
Number of venues for postal code "M1L" and neighborhood(s) "Clairlea, Golden Mile, Oakridge":
29
N
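`get_venue_dataset` assumes every explore item has at least one `reasons` entry and one category; an item without them raises an `IndexError`. A sketch of a more defensive row builder, `venue_row` (a hypothetical helper), run on a hypothetical, trimmed stand-in for one `Crawling_result` list that contains only the fields read above:

```python
import pandas as pd

def venue_row(postal_code, neigh, lat, lng, item):
    """Flatten one explore item into a row dict.
    Missing reasons/categories become None instead of raising IndexError."""
    venue = item['venue']
    reasons = item.get('reasons', {}).get('items', [])
    categories = venue.get('categories', [])
    return {'Postal Code': postal_code, 'Neighborhood': neigh,
            'Neighborhood Latitude': lat, 'Neighborhood Longitude': lng,
            'Venue': venue['name'],
            'Venue Summary': reasons[0]['summary'] if reasons else None,
            'Venue Category': categories[0]['name'] if categories else None,
            'Distance': venue['location'].get('distance')}

# Hypothetical, trimmed explore items; the second has no reasons or categories
items = [
    {'reasons': {'items': [{'summary': 'This spot is popular'}]},
     'venue': {'name': 'Caribbean Wave', 'location': {'distance': 912},
               'categories': [{'name': 'Caribbean Restaurant'}]}},
    {'reasons': {'items': []},
     'venue': {'name': 'Example Venue', 'location': {'distance': 100},
               'categories': []}},
]

rows = [venue_row('M1B', 'Rouge, Malvern', 43.806686, -79.194353, it) for it in items]
venues = pd.DataFrame(rows)
print(venues[['Venue', 'Venue Category', 'Distance']])
```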

In [20]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # rows flattened by json_normalize use 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
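`get_category_type` accepts either a raw venue dict (key `categories`) or a row produced by `json_normalize`, where nested keys become dotted column names such as `venue.categories`. A sketch of that flattening step with `pd.json_normalize` (the top-level function that replaced the deprecated `pandas.io.json.json_normalize` import), applied to a hypothetical, trimmed list of explore items with illustrative coordinates:

```python
import pandas as pd

def get_category_type(row):
    # Same logic as the cell above: raw venues use 'categories',
    # json_normalize-flattened rows use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

items = [
    {'venue': {'name': "Wendy's", 'categories': [{'name': 'Fast Food Restaurant'}],
               'location': {'lat': 43.80, 'lng': -79.19}}},
    {'venue': {'name': 'Images Salon & Spa', 'categories': [{'name': 'Spa'}],
               'location': {'lat': 43.81, 'lng': -79.20}}},
]

# Nested dicts become dotted columns; the categories list stays a list in one cell
flat = pd.json_normalize(items)
flat['category'] = flat.apply(get_category_type, axis=1)
print(flat[['venue.name', 'category']])
```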

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [23]:
# defining Foursquare credentials
# (read from environment variables so the secret is not stored in the notebook;
#  the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own choice)
import os
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605'
LIMIT = 500
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)


In [26]:
scarborough_venues = getNearbyVenues(names=scarborough_data['Neighbourhood'],
                                   latitudes=scarborough_data['Latitude'],
                                   longitudes=scarborough_data['Longitude']
                                  )

Rouge, Malvern
Tam O'Shanter
Steeles West
Upper Rouge
Cliffcrest, Cliffside
Highland Creek, Rouge Hill, Port Union
Maryvale, Wexford
Scarborough Village
Agincourt North, Milliken
Clairlea, Golden Mile, Oakridge
Woburn
Ionview, Kennedy Park
Birch Cliff
Morningside, West Hill
Cedarbrae
Agincourt
Dorset Park, Scarborough Town Centre, Wexford Heights


In [34]:
scarborough_venues.head()

| | Postal Code | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Summary | Venue Category | Distance |
|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Images Salon & Spa | This spot is popular | Spa | 595 |
| 1 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Caribbean Wave | This spot is popular | Caribbean Restaurant | 912 |
| 2 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Wendy's | This spot is popular | Fast Food Restaurant | 600 |
| 3 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Harvey's | This spot is popular | Fast Food Restaurant | 796 |
| 4 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Wendy's | This spot is popular | Fast Food Restaurant | 387 |


In [35]:
# saving clean data
scarborough_venues.to_csv('scarborough_venues.csv')

In [4]:
# loading clean data
scarborough_venues = pd.read_csv('scarborough_venues.csv')
scarborough_venues.drop('Unnamed: 0', axis=1, inplace=True)
scarborough_venues.head()

| | Postal Code | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Summary | Venue Category | Distance |
|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Images Salon & Spa | This spot is popular | Spa | 595 |
| 1 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Caribbean Wave | This spot is popular | Caribbean Restaurant | 912 |
| 2 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Wendy's | This spot is popular | Fast Food Restaurant | 600 |
| 3 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Harvey's | This spot is popular | Fast Food Restaurant | 796 |
| 4 | M1B | Rouge, Malvern | 43.806686 | -79.194353 | Wendy's | This spot is popular | Fast Food Restaurant | 387 |


In [5]:
# Some Summary Information about Neighborhoods inside "Scarborough"
neigh_list = list(scarborough_venues['Neighborhood'].unique())
print('Number of Neighborhoods inside Scarborough: {}'.format(len(neigh_list)))
print('List of Neighborhoods inside Scarborough: {}'.format(neigh_list))


Number of Neighborhoods inside Scarborough: 16
List of Neighborhoods inside Scarborough: ['Rouge, Malvern', "Tam O'Shanter", 'Steeles West', 'Cliffcrest, Cliffside', 'Highland Creek, Rouge Hill, Port Union', 'Maryvale, Wexford', 'Scarborough Village', 'Agincourt North, Milliken', 'Clairlea, Golden Mile, Oakridge', 'Woburn', 'Ionview, Kennedy Park', 'Birch Cliff', 'Morningside, West Hill', 'Cedarbrae', 'Agincourt', 'Dorset Park, Scarborough Town Centre, Wexford Heights']


In [6]:
neigh_venue_summary = scarborough_venues.groupby('Neighborhood').count()
neigh_venue_summary.head()

| Neighborhood | Postal Code | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Summary | Venue Category | Distance |
|---|---|---|---|---|---|---|---|
| Agincourt | 47 | 47 | 47 | 47 | 47 | 47 | 47 |
| Agincourt North, Milliken | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| Birch Cliff | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| Cedarbrae | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| Clairlea, Golden Mile, Oakridge | 29 | 29 | 29 | 29 | 29 | 29 | 29 |


In [7]:
print('There are {} unique categories.'.format(len(scarborough_venues['Venue Category'].unique())))

print('Here is the list of different categories:')
list(scarborough_venues['Venue Category'].unique())

There are 115 unique categories.
Here is the list of different categories:


['Spa',
 'Caribbean Restaurant',
 'Fast Food Restaurant',
 'Coffee Shop',
 'Paper / Office Supplies Store',
 'Hobby Shop',
 'Martial Arts Dojo',
 'African Restaurant',
 'Chinese Restaurant',
 'Greek Restaurant',
 'Fruit & Vegetable Store',
 'Gym',
 'Sandwich Place',
 'Italian Restaurant',
 'Noodle House',
 'Pharmacy',
 'Seafood Restaurant',
 'Cantonese Restaurant',
 'Mexican Restaurant',
 'Thai Restaurant',
 'Vietnamese Restaurant',
 'Fried Chicken Joint',
 'Pizza Place',
 'Intersection',
 'Park',
 'Shopping Mall',
 'Golf Course',
 'Taiwanese Restaurant',
 'Video Game Store',
 'Market',
 'Grocery Store',
 'Bakery',
 'Hotpot Restaurant',
 'Japanese Restaurant',
 'Breakfast Spot',
 'Cosmetics Shop',
 'Thrift / Vintage Store',
 'Bank',
 'Other Great Outdoors',
 'Tennis Court',
 'Gym Pool',
 'Beach',
 'Furniture / Home Store',
 'Cajun / Creole Restaurant',
 'Sports Bar',
 'Wings Joint',
 'Burger Joint',
 'Playground',
 'Korean Restaurant',
 'Fish Market',
 'Middle Eastern Restaurant',
 'Su

In [8]:
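# one-hot encoding: one indicator column per venue category, named after the category
# (dtype = int keeps 0/1 values; recent pandas versions return booleans by default)
scarborough_onehot = pd.get_dummies(data = scarborough_venues, drop_first = False, dtype = int,
                              prefix = "", prefix_sep = "", columns = ['Venue Category'])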
scarborough_onehot.head()

Unnamed: 0,Postal Code,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Distance,African Restaurant,American Restaurant,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Beach,Beer Store,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Diner,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Food & Drink Shop,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Hobby Shop,Home Service,Hong Kong Restaurant,Hookah Bar,Hotpot Restaurant,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Motorcycle Shop,Music Store,Noodle House,Optical Shop,Other Great Outdoors,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pool Hall,Print Shop,Pub,Rental Car Location,Rental Service,Restaurant,Sandwich Place,Seafood Restaurant,Shanghai Restaurant,Shop & Service,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wings Joint,Yoga Studio
0,M1B,"Rouge, Malvern",43.806686,-79.194353,Images Salon & Spa,This spot is popular,595,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M1B,"Rouge, Malvern",43.806686,-79.194353,Caribbean Wave,This spot is popular,912,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M1B,"Rouge, Malvern",43.806686,-79.194353,Wendy's,This spot is popular,600,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,M1B,"Rouge, Malvern",43.806686,-79.194353,Harvey's,This spot is popular,796,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M1B,"Rouge, Malvern",43.806686,-79.194353,Wendy's,This spot is popular,387,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
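With `prefix=''` and `prefix_sep=''`, `get_dummies` names each indicator column after the category itself. A toy sketch of the encode-then-aggregate pattern used in this and the following cells (the rows are illustrative; `Example Bakery` is a hypothetical venue), with `dtype=int` so the indicators are 0/1 integers:

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Rouge, Malvern', 'Rouge, Malvern', 'Agincourt'],
    'Venue': ["Wendy's", 'Caribbean Wave', 'Example Bakery'],
    'Venue Category': ['Fast Food Restaurant', 'Caribbean Restaurant', 'Bakery'],
})

# One indicator column per category, named exactly like the category
onehot = pd.get_dummies(venues, columns=['Venue Category'],
                        prefix='', prefix_sep='', dtype=int)

# Count the venues of each category per neighborhood (text columns dropped first)
counts = onehot.drop(columns=['Venue']).groupby('Neighborhood').sum()
print(counts)
```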


In [9]:
# manually selected features: the neighborhood columns plus the venue categories that can use food recipes
important_list_of_features = [
 
 'Neighborhood',
 'Neighborhood Latitude',
 'Neighborhood Longitude',
 'African Restaurant',
 'American Restaurant',
 'Asian Restaurant',
 'BBQ Joint',
 'Bakery',
 'Breakfast Spot',
 'Burger Joint',
 'Cajun / Creole Restaurant',
 'Cantonese Restaurant',
 'Caribbean Restaurant',
 'Chinese Restaurant',
 'Diner',
 'Fast Food Restaurant',
 'Filipino Restaurant',
 'Fish Market',
 'Food & Drink Shop',
 'Fried Chicken Joint',
 'Fruit & Vegetable Store',
 'Greek Restaurant',
 'Grocery Store',
 'Hakka Restaurant',
 'Hong Kong Restaurant',
 'Hotpot Restaurant',
 'Indian Restaurant',
 'Italian Restaurant',
 'Japanese Restaurant',
 'Korean Restaurant',
 'Latin American Restaurant',
 'Malay Restaurant',
 'Mediterranean Restaurant',
 'Mexican Restaurant',
 'Middle Eastern Restaurant',
 'Noodle House',
 'Pizza Place',
 'Restaurant',
 'Sandwich Place',
 'Seafood Restaurant',
 'Shanghai Restaurant',
 'Sushi Restaurant',
 'Taiwanese Restaurant',
 'Thai Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Vietnamese Restaurant',
 'Wings Joint']

In [10]:
# Updating the One-hot Encoded DataFrame and Grouping the Data by Neighborhoods
scarborough_onehot = scarborough_onehot[important_list_of_features].drop(
    columns = ['Neighborhood Latitude', 'Neighborhood Longitude']).groupby(
    'Neighborhood').sum()
scarborough_onehot.head()

Neighborhood,African Restaurant,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Breakfast Spot,Burger Joint,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Diner,Fast Food Restaurant,Filipino Restaurant,Fish Market,Food & Drink Shop,Fried Chicken Joint,Fruit & Vegetable Store,Greek Restaurant,Grocery Store,Hakka Restaurant,Hong Kong Restaurant,Hotpot Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Noodle House,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Shanghai Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
Agincourt,0,1,1,0,2,1,0,0,1,2,7,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,1,1,2,1,1,1,0,0,0,1,0
"Agincourt North, Milliken",0,0,0,2,2,0,0,0,0,1,6,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,0,1,0,0,0,2,2,0,0,0,0,0,0,0,1,0,0
Birch Cliff,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
Cedarbrae,0,0,0,0,4,0,1,0,0,1,1,0,1,0,0,0,1,0,0,1,1,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1
"Clairlea, Golden Mile, Oakridge",0,0,0,0,2,0,0,0,0,0,0,1,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0


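#### Integrating different restaurants and different joints

All restaurant categories are merged into a single "Total Restaurants" feature and all joint categories into a single "Total Joints" feature, assuming that different restaurants use the same raw groceries. This simplification is made because the neighborhood dataset is small.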

In [11]:
# sum all "... Restaurant" columns into one feature and drop the originals
restaurant_list = [col for col in scarborough_onehot.columns if 'Restaurant' in col]
scarborough_onehot['Total Restaurants'] = scarborough_onehot[restaurant_list].sum(axis = 1)
scarborough_onehot = scarborough_onehot.drop(columns = restaurant_list)

# do the same for all "... Joint" columns
joint_list = [col for col in scarborough_onehot.columns if 'Joint' in col]
scarborough_onehot['Total Joints'] = scarborough_onehot[joint_list].sum(axis = 1)
scarborough_onehot = scarborough_onehot.drop(columns = joint_list)
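This step selects columns whose names contain a substring, sums them row-wise and drops the originals. A compact sketch of the same operation with `DataFrame.filter(like=...)`, wrapped in a hypothetical helper `collapse_columns` and run on illustrative counts:

```python
import pandas as pd

def collapse_columns(df, keyword, new_name):
    """Sum every column whose name contains `keyword` into `new_name`
    and drop the original columns (hypothetical helper)."""
    matched = df.filter(like=keyword)
    out = df.drop(columns=matched.columns)
    out[new_name] = matched.sum(axis=1)
    return out

counts = pd.DataFrame(
    {'Bakery': [2, 0], 'Chinese Restaurant': [7, 0], 'Thai Restaurant': [0, 1],
     'BBQ Joint': [0, 0], 'Wings Joint': [0, 0]},
    index=pd.Index(['Agincourt', 'Birch Cliff'], name='Neighborhood'))

counts = collapse_columns(counts, 'Restaurant', 'Total Restaurants')
counts = collapse_columns(counts, 'Joint', 'Total Joints')
print(counts)
```

The order of the two calls matters only if the new column name itself contains the next keyword; here "Total Restaurants" does not contain "Joint".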

Showing the fully processed DataFrame for the neighborhoods inside Scarborough.
This dataset is now ready for the clustering step.

In [12]:
scarborough_onehot

| Neighborhood | Bakery | Breakfast Spot | Diner | Fish Market | Food & Drink Shop | Fruit & Vegetable Store | Grocery Store | Noodle House | Pizza Place | Sandwich Place | Total Restaurants | Total Joints |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agincourt | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 22 | 0 |
| Agincourt North, Milliken | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 12 | 2 |
| Birch Cliff | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 |
| Cedarbrae | 4 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 7 | 3 |
| Clairlea, Golden Mile, Oakridge | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 3 | 0 |
| Cliffcrest, Cliffside | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 3 | 2 |
| Dorset Park, Scarborough Town Centre, Wexford Heights | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 13 | 4 |
| Highland Creek, Rouge Hill, Port Union | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| Ionview, Kennedy Park | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 6 | 1 |
| Maryvale, Wexford | 1 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 2 | 0 | 8 | 1 |


#### Run k-means to Cluster Neighborhoods into 5 Clusters

In [13]:
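# import k-means from scikit-learn's clustering module
from sklearn.cluster import KMeans

# run k-means clustering; n_init = 10 pins the number of restarts that older
# scikit-learn versions used by default (newer versions default to n_init='auto')
kmeans = KMeans(n_clusters = 5, n_init = 10, random_state = 0).fit(scarborough_onehot)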
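`KMeans` alternates between assigning each neighborhood to its nearest center and moving each center to the mean of its assigned neighborhoods (Lloyd's algorithm). A bare NumPy sketch of that loop, for illustration only: `lloyd_kmeans` is a hypothetical helper, and scikit-learn additionally uses k-means++ seeding and several restarts (`n_init`), which this sketch omits. As in the notebook, the features are raw counts and are not scaled.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means with random initial centers drawn from X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every row
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its rows (keep it if empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy feature rows: [Total Restaurants, Total Joints]
X = np.array([[22, 0], [21, 1], [3, 0], [2, 1], [3, 2]])
labels, centers = lloyd_kmeans(X, k=2)
print(labels)
print(centers)
```

The returned `labels` and `centers` play the same roles as `kmeans.labels_` and `kmeans.cluster_centers_` in the cells below.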

#### Showing Centers of Each Cluster

In [14]:
means_df = pd.DataFrame(kmeans.cluster_centers_)
means_df.columns = scarborough_onehot.columns
means_df.index = ['G1','G2','G3','G4','G5']
means_df['Total Sum'] = means_df.sum(axis = 1)
means_df.sort_values(axis = 0, by = ['Total Sum'], ascending=False)

| | Bakery | Breakfast Spot | Diner | Fish Market | Food & Drink Shop | Fruit & Vegetable Store | Grocery Store | Noodle House | Pizza Place | Sandwich Place | Total Restaurants | Total Joints | Total Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G1 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 | 22.0 | 0.0 | 29.0 |
| G3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.666667 | 1.0 | 1.333333 | 1.0 | 12.666667 | 2.333333 | 20.0 |
| G5 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 7.0 | 3.0 | 15.0 |
| G2 | 0.5 | 0.333333 | 0.0 | 0.166667 | 0.166667 | 0.166667 | 1.0 | 0.0 | 1.333333 | 0.833333 | 6.333333 | 0.666667 | 11.5 |
| G4 | 0.4 | 0.2 | 0.4 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.8 | 0.2 | 2.6 | 0.6 | 5.4 |


### Result
Ranking the clusters by "Total Sum" (the sum of the average venue counts in each cluster center):
> - Best group: G1
> - Second-best group: G3
> - Third-best group: G5
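The clustering ranks neighborhoods by food-venue density but does not itself pick a warehouse site, which the introduction frames as the location closest to all venues. One way to approximate that, sketched here as an assumption rather than part of the original analysis, is a discrete 1-median: score each candidate neighborhood by the weighted sum of great-circle (haversine) distances to all neighborhoods and pick the minimum. The coordinates are taken from the tables above; the weights (food-venue counts) and the helpers `haversine_km` and `weighted_cost` are illustrative:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# (latitude, longitude, weight); weights are illustrative venue counts
sites = {
    'Rouge, Malvern': (43.806686, -79.194353, 5),
    "Tam O'Shanter": (43.781638, -79.304302, 12),
    'Steeles West': (43.799525, -79.318389, 8),
    'Cliffcrest, Cliffside': (43.716316, -79.239476, 3),
}

def weighted_cost(candidate):
    """Sum of weight * distance from the candidate to every neighborhood."""
    lat0, lon0, _ = sites[candidate]
    return sum(w * haversine_km(lat0, lon0, lat, lon) for lat, lon, w in sites.values())

costs = {name: weighted_cost(name) for name in sites}
best_site = min(costs, key=costs.get)
print(best_site, round(costs[best_site], 1))
```

Restricting candidates to neighborhoods keeps the search trivial; a continuous optimum (geometric median) could lie between neighborhoods.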

Combining `kmeans.labels_` with the neighborhood index to find the group of each neighborhood (labels are shifted by 1 so that they match the names G1 to G5).

In [16]:
neigh_summary = pd.DataFrame({'Neighborhood': list(scarborough_onehot.index),
                              'Group': 1 + kmeans.labels_})
neigh_summary

| | Neighborhood | Group |
|---|---|---|
| 0 | Agincourt | 1 |
| 1 | Agincourt North, Milliken | 3 |
| 2 | Birch Cliff | 4 |
| 3 | Cedarbrae | 5 |
| 4 | Clairlea, Golden Mile, Oakridge | 4 |
| 5 | Cliffcrest, Cliffside | 4 |
| 6 | Dorset Park, Scarborough Town Centre, Wexford ... | 3 |
| 7 | Highland Creek, Rouge Hill, Port Union | 4 |
| 8 | Ionview, Kennedy Park | 2 |
| 9 | Maryvale, Wexford | 2 |


## Best group

In [17]:
neigh_summary[neigh_summary['Group'] == 1]

| | Neighborhood | Group |
|---|---|---|
| 0 | Agincourt | 1 |


## Second best group

In [18]:
neigh_summary[neigh_summary['Group'] == 3]

| | Neighborhood | Group |
|---|---|---|
| 1 | Agincourt North, Milliken | 3 |
| 6 | Dorset Park, Scarborough Town Centre, Wexford ... | 3 |
| 14 | Tam O'Shanter | 3 |


## Third best group

In [19]:
neigh_summary[neigh_summary['Group'] == 5]

| | Neighborhood | Group |
|---|---|---|
| 3 | Cedarbrae | 5 |


# Thank you