
## Problem Description

**There is a groceries contractor in one of the boroughs of Toronto (Scarborough). This contractor supplies businesses such as restaurants of different types, bakeries, breakfast spots, breweries and cafés with fresh, high-quality groceries. The contractor wants to build a warehouse inside the borough for the groceries it buys from villagers and farmers, so that it can serve more customers and also offer a better "Quality of Service" to its existing customers.
For example, if the warehouse is close to the established, well-known restaurants, vegetables and other groceries can be delivered on time without delay, the cooks can start work in the morning, the Quality of Service stays high, and the contractor gains reputation and income.**

## DATA REQUIRED
| Postal Code | Neighborhood(s) | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Summary | Venue Category | Distance (m) |
|---|---|---|---|---|---|---|---|
| M1L | Clairlea, Golden Mile, Oakridge | 43.711112 | -79.284577 | Tim Hortons | This spot is popular | Coffee Shop | 592 |
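The Distance (m) column can be approximated as the great-circle distance between a neighborhood centre and a venue. A minimal sketch using the haversine formula, assuming a spherical Earth with a mean radius of 6,371 km; the function name `haversine_m` is our own, and the notebook does not show how its distance values were obtained:

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Neighborhood centre of M1L from the example row; the venue point is hypothetical.
print(round(haversine_m(43.711112, -79.284577, 43.716, -79.284577)))
```

At city scale the spherical approximation is usually accurate to well under one percent, which is enough for ranking candidate warehouse sites by proximity.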

#### SCRAPING THE DATA TABLE FROM THE WEB PAGE


In [11]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests

In [12]:
#Retrieve data from Wikipedia

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wiki_html = requests.get(url).text
soup = BeautifulSoup(wiki_html, 'html.parser')

data = []
for tr in soup.tbody.find_all('tr'):
    data.append([ td.get_text().strip() for td in tr.find_all('td')])
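The `dropna` step later is needed because the header row of the Wikipedia table holds `<th>` cells, not `<td>` cells, so the loop above yields an empty list for it. A sketch of the same row extraction using only the standard library's `html.parser`, run on a small inline sample table rather than the live page (whose markup may differ); `TableRowParser` is our own name:

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the stripped text of every <td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # only keep text that sits inside a <td>
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Illustrative sample in the layout of the postal-code table.
sample = """<table><tbody>
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</tbody></table>"""

parser = TableRowParser()
parser.feed(sample)
print(parser.rows)
# [[], ['M1A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods']]
```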

In [13]:
#2
#2.1 - Turn data into Pandas dataframe
df=pd.DataFrame(data,columns=['PostalCode','Borough','Neighborhood2'])

In [14]:
#2.2 Cleanup Borough column
# Find indexes of rows that have "Not assigned" in Borough column
indexNames = df[(df['Borough'] == "Not assigned")].index

# Drop rows that have "Not assigned" in Borough column
df.drop(indexNames,inplace=True)

# Drop the header row: it has no <td> cells, so all of its values are missing
df.dropna(inplace=True)

In [15]:
#2.3 Collapse data
# Combine multiple rows into one row based on PostalCode and Borough
df=df.groupby(['PostalCode','Borough'])['Neighborhood2'].apply(', '.join).reset_index()
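What the `groupby(...).apply(', '.join)` step does, shown on toy rows with one neighborhood per row (values illustrative): rows that share a postal code and borough are merged and their neighborhood names are joined into one string.

```python
import pandas as pd

# Toy rows in the shape the scraper produces: one neighborhood per row.
raw = pd.DataFrame({
    "PostalCode": ["M1L", "M1L", "M1L", "M1B"],
    "Borough": ["Scarborough"] * 4,
    "Neighborhood2": ["Clairlea", "Golden Mile", "Oakridge", "Malvern"],
})

collapsed = (
    raw.groupby(["PostalCode", "Borough"])["Neighborhood2"]
    .apply(", ".join)  # concatenate the names within each (PostalCode, Borough) group
    .reset_index()     # turn the group keys back into ordinary columns
)
print(collapsed)
```

`groupby` sorts the keys by default, so `M1B` comes before `M1L` in the result.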

In [61]:
# Replace "Not assigned" in Neighborhood column with the value in Borough column
def custom_fx(data):
    if data['Neighborhood2']=='Not assigned':
        var=data['Borough']
    else:
        var=data['Neighborhood2']
    return var

# Apply the function
df['Neighborhood']=df.apply(custom_fx,axis='columns')

# Check that there is no more "Not assigned" in Neighborhood column
n_not_assigned = len(df[df['Neighborhood'] == 'Not assigned'])
print("There are {} rows that have 'Not assigned' in Neighborhood column in the dataframe".format(n_not_assigned))

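# Show the dataframe without the Neighborhood2 column.
# drop() returns a new dataframe, so df itself keeps Neighborhood2, which the later cells use.
df.drop(columns='Neighborhood2')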

There are 0 rows that have 'Not assigned' in Neighborhood column in the dataframe


|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern / Rouge |
| 1 | M1C | Scarborough | Rouge Hill / Port Union / Highland Creek |
| 2 | M1E | Scarborough | Guildwood / Morningside / West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | Kennedy Park / Ionview / East Birchmount Park |
| 7 | M1L | Scarborough | Golden Mile / Clairlea / Oakridge |
| 8 | M1M | Scarborough | Cliffside / Cliffcrest / Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff / Cliffside West |
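The row-wise `custom_fx` can also be written as one vectorized step with `Series.mask`, and reassigning the result of `drop()` actually removes the helper column. A sketch on a two-row toy frame (values illustrative). In this notebook the later cells still read `Neighborhood2`, so they would have to switch to `Neighborhood` if the column were really dropped.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "PostalCode": ["M7A", "M1B"],
    "Borough": ["Queen's Park", "Scarborough"],
    "Neighborhood2": ["Not assigned", "Malvern, Rouge"],
})

# Where Neighborhood2 is "Not assigned", take the Borough value from the same row instead.
df_demo["Neighborhood"] = df_demo["Neighborhood2"].mask(
    df_demo["Neighborhood2"] == "Not assigned", df_demo["Borough"]
)

# drop() returns a new frame; reassign it to really remove the column.
df_demo = df_demo.drop(columns="Neighborhood2")
print(df_demo)
```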


In [16]:
#Explore dataset
print("The shape of the dataframe is {}. The dataset has {} rows.".format(df.shape, df.shape[0]))

The shape of the dataframe is (103, 3). The dataset has 103 rows.


In [17]:
# Read csv file
lonlat = pd.read_csv('http://cocl.us/Geospatial_data')
lonlat.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [18]:
#3 Merge the two tables

# Print the column names of both dataframes
print("Column names of lonlat dataframe are: {}, {}, and {}.".format(lonlat.columns[0],lonlat.columns[1],lonlat.columns[2]))
print("Column names of df dataframe are: {}, {}, and {}.".format(df.columns[0],df.columns[1],df.columns[2]))

# Change the name "Postal Code" in lonlat to "PostalCode"
lonlat.rename(columns={'Postal Code':'PostalCode'},inplace=True)

# Left join
trt_geo=pd.merge(df,lonlat,how='left',on='PostalCode')

Column names of lonlat dataframe are: Postal Code, Latitude, and Longitude.
Column names of df dataframe are: PostalCode, Borough, and Neighborhood2.


In [19]:
trt_geo.head()



|   | PostalCode | Borough | Neighborhood2 | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern / Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill / Port Union / Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
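A left join silently leaves `NaN` coordinates for any postal code that is missing from the CSV. `pd.merge` can check this: `validate="one_to_one"` raises if a key is duplicated on either side, and `indicator=True` adds a `_merge` column. A sketch on toy frames; the postal code `M9Z` is made up to produce an unmatched row:

```python
import pandas as pd

left = pd.DataFrame({"PostalCode": ["M1B", "M1C", "M9Z"], "Borough": ["Scarborough"] * 3})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

merged = pd.merge(
    left,
    coords.rename(columns={"Postal Code": "PostalCode"}),
    how="left",
    on="PostalCode",
    validate="one_to_one",  # raise MergeError if a postal code appears twice on either side
    indicator=True,         # adds _merge: "both" or "left_only" for a left join
)

# Postal codes that got no coordinates from the CSV.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)  # ['M9Z']
```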


### Mapping the neighborhoods using their latitude and longitude values

In [20]:
!conda install -c conda-forge folium=0.5.0 --yes # install folium if it is not available yet (e.g. if you haven't completed the Foursquare API lab)
import folium # map rendering library



Solving environment: done

# All requested packages already installed.



In [24]:
# Latitude and longitude of Toronto, looked up manually with a Google search
toronto_latitude = 43.6932; toronto_longitude = -79.3832
map_toronto = folium.Map(location = [toronto_latitude, toronto_longitude], zoom_start = 10.7)

# add markers to map
for lat, lng, borough, neighborhood2 in zip(trt_geo['Latitude'], trt_geo['Longitude'], trt_geo['Borough'], trt_geo['Neighborhood2']):
    label = '{}, {}'.format(neighborhood2, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    

map_toronto

In [25]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [26]:
scarborough_data = trt_geo[trt_geo['Borough'] == 'Scarborough'].reset_index(drop=True)
scarborough_data.head(7)

|   | PostalCode | Borough | Neighborhood2 | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern / Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill / Port Union / Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | Kennedy Park / Ionview / East Birchmount Park | 43.727929 | -79.262029 |


In [27]:
address_scar = 'Scarborough,Toronto'
latitude_scar = 43.773077
longitude_scar = -79.257774
print('The geographical coordinates of Scarborough are {}, {}.'.format(latitude_scar, longitude_scar))

The geographical coordinates of Scarborough are 43.773077, -79.257774.


In [28]:
map_scarb = folium.Map(location=[latitude_scar, longitude_scar], zoom_start=12)

# add markers to map
for lat, lng, label in zip(scarborough_data['Latitude'], scarborough_data['Longitude'], scarborough_data['Neighborhood2']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_scarb)  
    
map_scarb

In [29]:
neighborhood_latitude = scarborough_data.loc[0, 'Latitude'] # neighbourhood latitude value
neighborhood_longitude = scarborough_data.loc[0, 'Longitude'] # neighbourhood longitude value

neighborhood_name = scarborough_data.loc[0, 'Neighborhood2'] # neighbourhood name

print('Latitude and longitude values of "{}" are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of "Malvern / Rouge" are 43.806686299999996, -79.19435340000001.


In [30]:
# Explore up to 100 venues within 500 m of the Scarborough centre (latitude_scar, longitude_scar)
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude_scar, longitude_scar, VERSION, radius, LIMIT)
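Building the query string with `urllib.parse.urlencode` avoids hand-formatting and escapes each value. A sketch that only constructs the `explore` URL used above and does not contact Foursquare; `build_explore_url` is a hypothetical helper and the credentials are placeholders. Passing a dict as `requests.get(base_url, params=...)` performs the same encoding.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL with every query parameter URL-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # the comma is encoded as %2C
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


url_demo = build_explore_url("MY_ID", "MY_SECRET", "20180605", 43.773077, -79.257774)
print(url_demo)
```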

In [31]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e8ec113d03993001bf7f9b6'},
 'response': {'headerLocation': 'Scarborough City Centre',
  'headerFullLocation': 'Scarborough City Centre, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 44,
  'suggestedBounds': {'ne': {'lat': 43.7775770045, 'lng': -79.25155367954714},
   'sw': {'lat': 43.7685769955, 'lng': -79.26399432045285}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c059bcd7083952134097bce',
       'name': 'SEPHORA',
       'location': {'address': '300 Borough Drive',
        'crossStreet': 'at Scarborough Town Centre',
        'lat': 43.77501688366838,
        'lng': -79.25810909472256,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.77501688366838,
          'lng': -79.25810909

In [32]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [33]:
import json # library to handle JSON files
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

venues = results['response']['groups'][0]['items']  
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | SEPHORA | Cosmetics Shop | 43.775017 | -79.258109 |
| 1 | Disney Store | Toy / Game Store | 43.775537 | -79.256833 |
| 2 | American Eagle Outfitters | Clothing Store | 43.776012 | -79.258334 |
| 3 | St. Andrews Fish & Chips | Fish & Chips Shop | 43.771865 | -79.252645 |
| 4 | Tommy Hilfiger | Clothing Store | 43.776015 | -79.257369 |
| 5 | DAVIDsTEA | Tea Room | 43.77632 | -79.258688 |
| 6 | Chipotle Mexican Grill | Mexican Restaurant | 43.77641 | -79.258069 |
| 7 | Hot Topic | Clothing Store | 43.77545 | -79.257929 |
| 8 | Shoppers Drug Mart | Pharmacy | 43.773305 | -79.251662 |
| 9 | Coliseum Scarborough Cinemas | Movie Theater | 43.775995 | -79.255649 |
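`pandas.io.json.json_normalize` has been deprecated since pandas 1.0 in favour of the top-level `pd.json_normalize`. A sketch of the flattening and category extraction on one item shaped like the response above (values copied from the SEPHORA entry; the real items carry more fields):

```python
import pandas as pd

# One item shaped like the 'items' entries in the response (fields trimmed).
items = [{
    "venue": {
        "name": "SEPHORA",
        "location": {"lat": 43.77501688366838, "lng": -79.25810909472256},
        "categories": [{"name": "Cosmetics Shop"}],
    }
}]

# Nested dict keys become dotted column names, e.g. 'venue.location.lat'; lists stay as values.
flat = pd.json_normalize(items)

# Keep the first category name, or None when the list is empty.
flat["venue.categories"] = flat["venue.categories"].map(lambda cats: cats[0]["name"] if cats else None)

# Strip the prefixes: 'venue.location.lat' -> 'lat'
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```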


In [35]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

44 venues were returned by Foursquare.


In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
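`getNearbyVenues` reads `v['venue']['categories'][0]['name']`, which raises an `IndexError` if a venue comes back with an empty `categories` list. A sketch of the parsing step as a pure function that records `None` for such venues instead; `venue_rows` and the mock items are hypothetical, shaped after the fields the function already reads, and no API call is made:

```python
def venue_rows(neighborhood, lat, lng, items):
    """Flatten explore 'items' into tuples; the category is None if none is listed."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Mock items: the first mirrors a row shown below, the second is hypothetical.
mock_items = [
    {"venue": {"name": "Wendy’s", "location": {"lat": 43.807448, "lng": -79.199056},
               "categories": [{"name": "Fast Food Restaurant"}]}},
    {"venue": {"name": "Uncategorized Place", "location": {"lat": 43.8, "lng": -79.2},
               "categories": []}},
]
rows = venue_rows("Malvern / Rouge", 43.806686, -79.194353, mock_items)
print(rows)
```

Inside `getNearbyVenues`, `venues_list.append(venue_rows(name, lat, lng, results))` would replace the list comprehension.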

In [37]:
scarborough_venues = getNearbyVenues(names=scarborough_data['Neighborhood2'],
                                   latitudes=scarborough_data['Latitude'],
                                   longitudes=scarborough_data['Longitude']
                                  )

Malvern / Rouge
Rouge Hill / Port Union / Highland Creek
Guildwood / Morningside / West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park / Ionview / East Birchmount Park
Golden Mile / Clairlea / Oakridge
Cliffside / Cliffcrest / Scarborough Village West
Birch Cliff / Cliffside West
Dorset Park / Wexford Heights / Scarborough Town Centre
Wexford / Maryvale
Agincourt
Clarks Corners / Tam O'Shanter / Sullivan
Milliken / Agincourt North / Steeles East / L'Amoreaux East
Steeles West / L'Amoreaux West
Upper Rouge


In [38]:
scarborough_venues.head(10)

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Malvern / Rouge | 43.806686 | -79.194353 | Wendy’s | 43.807448 | -79.199056 | Fast Food Restaurant |
| 1 | Malvern / Rouge | 43.806686 | -79.194353 | T Hamilton & Son Roofing Inc | 43.807985 | -79.198194 | Construction & Landscaping |
| 2 | Rouge Hill / Port Union / Highland Creek | 43.784535 | -79.160497 | Royal Canadian Legion | 43.782533 | -79.163085 | Bar |
| 3 | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 | G & G Electronics | 43.765309 | -79.191537 | Electronics Store |
| 4 | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 | Big Bite Burrito | 43.766299 | -79.19072 | Mexican Restaurant |
| 5 | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 | Enterprise Rent-A-Car | 43.764076 | -79.193406 | Rental Car Location |
| 6 | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 | RBC Royal Bank | 43.76679 | -79.191151 | Bank |
| 7 | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 | Woburn Medical Centre | 43.766631 | -79.192286 | Medical Center |
| 8 | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 | Lawrence Ave E & Kingston Rd | 43.767704 | -79.18949 | Intersection |
| 9 | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 | Eggsmart | 43.7678 | -79.190466 | Breakfast Spot |


In [39]:
scarborough_venues.tail(10)

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 83 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | KFC | 43.798938 | -79.318854 | Fast Food Restaurant |
| 84 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | Tim Hortons | 43.799102 | -79.318715 | Coffee Shop |
| 85 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | Pizza Pizza | 43.797909 | -79.318113 | Pizza Place |
| 86 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | Eggsmart | 43.796375 | -79.318681 | Breakfast Spot |
| 87 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | McDonald's | 43.798249 | -79.318167 | Fast Food Restaurant |
| 88 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | Metro | 43.79783 | -79.318492 | Supermarket |
| 89 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | Super Taste Noodle House | 43.798217 | -79.318513 | Noodle House |
| 90 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | RBC Royal Bank | 43.798236 | -79.317952 | Bank |
| 91 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | Taco Bell | 43.798898 | -79.318701 | Fast Food Restaurant |
| 92 | Steeles West / L'Amoreaux West | 43.799525 | -79.318389 | Ocean Nails Spa | 43.79529 | -79.320101 | Nail Salon |


In [40]:
scarborough_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Agincourt | 4 | 4 | 4 | 4 | 4 | 4 |
| Birch Cliff / Cliffside West | 4 | 4 | 4 | 4 | 4 | 4 |
| Cedarbrae | 9 | 9 | 9 | 9 | 9 | 9 |
| Clarks Corners / Tam O'Shanter / Sullivan | 13 | 13 | 13 | 13 | 13 | 13 |
| Cliffside / Cliffcrest / Scarborough Village West | 2 | 2 | 2 | 2 | 2 | 2 |
| Dorset Park / Wexford Heights / Scarborough Town Centre | 6 | 6 | 6 | 6 | 6 | 6 |
| Golden Mile / Clairlea / Oakridge | 8 | 8 | 8 | 8 | 8 | 8 |
| Guildwood / Morningside / West Hill | 7 | 7 | 7 | 7 | 7 | 7 |
| Kennedy Park / Ionview / East Birchmount Park | 4 | 4 | 4 | 4 | 4 | 4 |
| Malvern / Rouge | 2 | 2 | 2 | 2 | 2 | 2 |


In [41]:
print('There are {} unique categories.'.format(len(scarborough_venues['Venue Category'].unique())))

There are 57 unique categories.


In [42]:
# one hot encoding
scarb_onehot = pd.get_dummies(scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scarb_onehot['Neighborhood'] = scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scarb_onehot.columns[-1]] + list(scarb_onehot.columns[:-1])
scarb_onehot = scarb_onehot[fixed_columns]

scarb_onehot.head()

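|   | Neighborhood | Accessories Store | American Restaurant | Athletics & Sports | Auto Garage | Bakery | Bank | Bar | Breakfast Spot | Bus Line | ... | Sandwich Place | Shopping Mall | Skating Rink | Smoke Shop | Soccer Field | Supermarket | Thai Restaurant | Thrift / Vintage Store | Vietnamese Restaurant | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Malvern / Rouge | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Malvern / Rouge | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Rouge Hill / Port Union / Highland Creek | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Guildwood / Morningside / West Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Guildwood / Morningside / West Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |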


In [43]:
scarb_grouped = scarb_onehot.groupby('Neighborhood').mean().reset_index()
scarb_grouped.head(7)

|   | Neighborhood | Accessories Store | American Restaurant | Athletics & Sports | Auto Garage | Bakery | Bank | Bar | Breakfast Spot | Bus Line | ... | Sandwich Place | Shopping Mall | Skating Rink | Smoke Shop | Soccer Field | Supermarket | Thai Restaurant | Thrift / Vintage Store | Vietnamese Restaurant | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Birch Cliff / Cliffside West | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Cedarbrae | 0.0 | 0.0 | 0.111111 | 0.0 | 0.111111 | 0.111111 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.111111 | 0.0 | 0.0 | 0.0 |
| 3 | Clarks Corners / Tam O'Shanter / Sullivan | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.076923 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.076923 | 0.0 | 0.0 | 0.0 |
| 4 | Cliffside / Cliffcrest / Scarborough Village West | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Dorset Park / Wexford Heights / Scarborough To... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.166667 | 0.166667 | 0.0 |
| 6 | Golden Mile / Clairlea / Oakridge | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.125 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
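The one-hot table followed by `groupby(...).mean()` turns venue counts into per-neighborhood category frequencies, so each row sums to 1. A sketch on toy data (`Area A` and `Area B` are made-up neighborhoods); `dtype=int` keeps 0/1 integers on pandas versions where `get_dummies` would otherwise return booleans:

```python
import pandas as pd

# Toy venue list: two made-up neighborhoods and their venue categories.
venues = pd.DataFrame({
    "Neighborhood": ["Area A", "Area A", "Area A", "Area A", "Area B"],
    "Venue Category": ["Bakery", "Café", "Café", "Bank", "Bakery"],
})

# One column per category, 1 where the venue has that category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 columns = share of each category among a neighborhood's venues.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Area A has four venues, two of them cafés, so its Café frequency is 0.5.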


In [44]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
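Neighborhoods with fewer than `num_top_venues` venues get padded with zero-frequency categories: Malvern / Rouge has only 2 venues, yet its row in the most-common-venues table lists 10 categories, and the zero-valued ones come out in an arbitrary order. A sketch of a variant that keeps only categories with a non-zero frequency and breaks ties alphabetically; `top_venues` and the toy row are hypothetical:

```python
import pandas as pd


def top_venues(row: pd.Series, n: int) -> list:
    """Return up to n category names with the highest non-zero frequency.

    Ties are broken alphabetically, so the result does not depend on sort stability.
    """
    freqs = row.drop(labels="Neighborhood").astype(float)
    nonzero = freqs[freqs > 0]
    ranked = sorted(nonzero.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Toy row shaped like a row of scarb_grouped (values illustrative).
row = pd.Series({"Neighborhood": "Area A", "Bakery": 0.25, "Bank": 0.25, "Café": 0.5, "Gym": 0.0})
print(top_venues(row, 10))  # ['Café', 'Bakery', 'Bank']
```

Because the returned list can be shorter than `n`, it would need padding (for example with empty strings) before being written into the fixed-width `neighborhoods_venues_sorted` table.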

In [45]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scarb_grouped['Neighborhood']

for ind in np.arange(scarb_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarb_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Latin American Restaurant | Lounge | Breakfast Spot | Chinese Restaurant | Women's Store | Construction & Landscaping | Hakka Restaurant | Grocery Store | General Entertainment | Gas Station |
| 1 | Birch Cliff / Cliffside West | College Stadium | General Entertainment | Skating Rink | Café | Women's Store | Ice Cream Shop | Hakka Restaurant | Grocery Store | Gas Station | Fried Chicken Joint |
| 2 | Cedarbrae | Caribbean Restaurant | Hakka Restaurant | Thai Restaurant | Athletics & Sports | Bakery | Bank | Gas Station | Lounge | Fried Chicken Joint | Women's Store |
| 3 | Clarks Corners / Tam O'Shanter / Sullivan | Pharmacy | Pizza Place | Intersection | Bank | Noodle House | Fried Chicken Joint | Fast Food Restaurant | Gas Station | Italian Restaurant | Thai Restaurant |
| 4 | Cliffside / Cliffcrest / Scarborough Village West | American Restaurant | Motel | Women's Store | College Stadium | Ice Cream Shop | Hakka Restaurant | Grocery Store | General Entertainment | Gas Station | Fried Chicken Joint |
| 5 | Dorset Park / Wexford Heights / Scarborough To... | Indian Restaurant | Chinese Restaurant | Thrift / Vintage Store | Vietnamese Restaurant | Pet Store | College Stadium | Hakka Restaurant | Grocery Store | General Entertainment | Gas Station |
| 6 | Golden Mile / Clairlea / Oakridge | Bakery | Intersection | Soccer Field | Metro Station | Bus Line | Park | Ice Cream Shop | Bank | Hakka Restaurant | Grocery Store |
| 7 | Guildwood / Morningside / West Hill | Intersection | Bank | Breakfast Spot | Rental Car Location | Medical Center | Electronics Store | Mexican Restaurant | Construction & Landscaping | Hakka Restaurant | Grocery Store |
| 8 | Kennedy Park / Ionview / East Birchmount Park | Coffee Shop | Bus Station | Discount Store | Department Store | College Stadium | Ice Cream Shop | Hakka Restaurant | Grocery Store | General Entertainment | Gas Station |
| 9 | Malvern / Rouge | Fast Food Restaurant | Construction & Landscaping | Women's Store | Insurance Office | Ice Cream Shop | Hakka Restaurant | Grocery Store | General Entertainment | Gas Station | Fried Chicken Joint |
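The `indicators` list only covers 1 to 3, so the header loop above is correct for `num_top_venues = 10` but would produce "21th" for 21. A small general ordinal helper (the name `ordinal` is our own) that handles the irregular 11th to 13th as well:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns[1], columns[-1])  # 1st Most Common Venue 10th Most Common Venue
```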


In [46]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# Drop "Upper Rouge" (row 16): Foursquare returned no venues for it, so it is not in scarb_grouped
scarb_data = scarborough_data.drop(16)
# set number of clusters
kclusters = 5

scarb_grouped_clustering = scarb_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(scarb_grouped_clustering)

# check cluster labels generated for each row in the dataframe (16 labels in total, one per neighborhood with venues)
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 3, 4], dtype=int32)
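`kmeans.labels_` follows the row order of `scarb_grouped`, which is alphabetical by neighborhood, whereas `scarborough_data` is ordered by postal code. Joining the labels on the neighborhood name, rather than assigning them by position, keeps each label with the right neighborhood and also drops neighborhoods without venues (such as Upper Rouge) without hard-coding `drop(16)`. A minimal sketch with hypothetical labels and toy rows:

```python
import pandas as pd

# Neighborhood names in the alphabetical order that groupby produces (toy subset).
grouped_names = ["Agincourt", "Cedarbrae", "Woburn"]
labels = [0, 3, 0]  # hypothetical k-means labels, one per entry of grouped_names

# Toy rows in postal-code order, like scarborough_data; Upper Rouge has no venues, so no label.
geo = pd.DataFrame({
    "PostalCode": ["M1G", "M1H", "M1S", "M1X"],
    "Neighborhood2": ["Woburn", "Cedarbrae", "Agincourt", "Upper Rouge"],
})

cluster_map = pd.DataFrame({"Neighborhood2": grouped_names, "Cluster Labels": labels})

# Inner join on the name: each label lands on its own neighborhood and unlabelled rows drop out.
labelled = geo.merge(cluster_map, on="Neighborhood2", how="inner")
print(labelled)
```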

## RESULTS

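**The analysis above scraped the Toronto postal codes from Wikipedia, added their latitude and longitude, and used the Foursquare API to explore the venues around each of the 17 Scarborough neighborhoods (93 venues in 57 categories). The 10 most common venue categories were listed for each neighborhood, and the 16 neighborhoods that have venues were grouped into 5 clusters with k-means. Together, these data sources give the contractor a basis for deciding where to place the warehouse.**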