# <center>  Final Project - The Battle of the Neighborhoods </center>
## <center>   Applied Data Science Capstone </center>
## <title> <center>  Business Site Location Analysis by Clustering and Segmenting Neighborhoods in Ottawa, ON, CANADA </center> </title>
### <center> Sher Baz Khan </center> 
### <center> March 10, 2019 </center>

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Importance of Location in Business Venturing](#location)
* [Study Area](#study_area)
* [Data Dependencies](#data)
* [Web Scraping](#web_scraping)
* [Methodology](#methodology)
* [Analysis of Neighborhoods](#analysis)
* [Cluster Neighborhoods](#cluster)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)
* [Future work](#future)
* [References](#references)

## Introduction: Business Problem <a name="introduction"></a>

In this project, We analyze the location data by segmenting and clustering neighborhoods in **Ottawa**, ON - the capital city of CANADA. The objective of this analysis is to venture out feasibility of establishing a small business enterprise in a particular neighborhood in the city of Ottawa. For this purpose, we are focusing on a particular neighborhood called **Orleans-Ottawa** which is one of the fast growing neighborhoods in the capital city. We are interested in setting up a small business enterprise, say, a **Bookstore** in **Orleans-Ottawa** neighborhood.  Therefore our objective dig out whether it is feasible to set up a **Bookstore** in the **Orleans** neighborhood or not. Our target is to check whether there is such an enterprise already existing in that neighborhood apart from other basic businesses. If there is no such business established yet, then we will of course go for it.

In this project we try to find an optimal location for a **Bookstore**. This report will be targeted to those stakeholders interested in opening a **Bookstore** in **Orleans-Ottawa**, ON. Since there are so many bookstores already established in various neighborhoods in Ottawa we will try to detect those neighborhoods where there are no such bookstores established. 

By using methods of data science we will generate some potential neighborhoods that would be based on this single criterion. We will show different commonality conditions of various neighborhoods which will provide the most optimum location to choose from by interested stakeholders. 

## Importance of Location in a Business Venturing <a name="location"></a>

When it comes to a site selection for a future business start up, location plays fundamentally vital role as it provides the direct link between demand and supply in a given marketplace - demand by customers for services or goods and supply of essential resources. The importance of location varies from one sort of business to another. In the case of a retail outlet, location must depend on the trafic flow, population density, interests of customers and many other factors. 

While establishing a new outlet in a certain location it must be reviewed and take into account several other locations. It should not be a once-only decision. Business location in necessary for successful operations and growth. The suitability of a business location and the customers potential in terms of their purchase power may lead a business to succeed or fail.

## Study Area <a name="study_area"></a>

**Ottawa** [1] is the national **capital city of Canada**. It is situated in the east of southern Ontario, near the city of Montréal. The city of Ottawa borders Gatineau of the French speaking Quebec province. Ottawa has a city population of 964,743 and a metropolitan population of 1,323,783 making it the fourth-largest city in Canada. Ottawa has been focus of attractions for traders from Europe and America in the history. It was shaped by construction of the **Rideau Canal, a UNESCO World Heritage Site** and the lumber industry. Ottawa is known for being welcoming, inclusiveness and diversity of people of different backgrounds.

Ottawa's real development started in the second half of the 20th century. In 1960s, the Greber Plan transformed the capital's appearance and removed much of the old industrial infrastructure. Ottawa became known as Silicon Valley North in 1980s when large high tech companies were established. It brought economic growth and as a consequence caused a rapid increase in population. The city amalgamated all neighboring areas in 2001. Ottawa is continuously progressing in areas such as population, transportation and economic growth [https://en.wikipedia.org/wiki/History_of_Ottawa] [2]

## Methodology <a name="methodology"></a>

We focus our attention on those areas where there is no **Bookstore** established in the neighborhoods of our interest at the time of our investigation. This will be the only criterion to quest for a prospective future business venture. Next, we cluster few neighborhoods including the neighborhood **Orleans-Ottawa** so that we would be able to analyze neighboring locations keeping the above mentioned criterion in view. We create a data frame using the **Pandas** library in **Python**. We import **KMeans** for clustering the neighborhoods from **sklearn.cluster** library. Furthermore, from **geopy.geocoders** we use **Nominatim** library in order to convert addresses into *latitude* and *longitude* values.

We use the **Foursquare API** [4] to explore various neighborhoods in Ottawa ON. We use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. We use the ***k*-means** clustering algorithm to complete this task. Finally, we use the **Folium** library to visualize all neighborhoods in Ottawa and their emerging clusters. We in particular focus on the the neighborhood **Orleans-Ottawa** to see the possibility of establishing a **Bookstore** in that neighborhood. Next we show maps of all such locations by creating ***k*-means clustering** method in order to specify promising areas so that it could present an optimal venue location to be considered by stakeholders.

## Data Dependencies <a name="data"></a>

In order to achieve our objective we will fetch data using the Google's search engine from the *wikipedia's* webpage *https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_K* which contains all the neighborhoods in the **City of Ottawa-ON**. This data is in raw form. We observed that the data does not include GPS coordinates corresponding to the postal codes of the neighborhoods. To overcome this issue we extract the data for respective coordinates from another web source, *https://www.gps-coordinates.net/* [4]. 

First and foremost, we will get rid of that part of data which is not related to our investigation like removing those neighborhoods of which postal codes have *not yet been assigned* or *not in use*. Before manipulating the data it is necessary to cleanse it. We then incorporate the *Latitudes* and *Longitudes* corresponding to the postal codes of all neighborhoods. 

First and foremost, lets import required libraries and download pakages we need.

In [1]:
import numpy as np 

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    certifi-2019.3.9           |           py36_0         149 KB  conda-forge
    conda-4.6.8                |           py36_0         876 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    geopy-1.18.1               |             py_0          51 KB  conda-forge
    openssl-1.1.1b             |       h14c3975_1         4.0 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         5.2 MB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::g

## Web Scraping <a name="web_scraping"></a>

First of all we require a dataset in order to segement the neighborhoods which contains the location data like, postal codes of respective neighborhoods as well as * Latitudes* and *Logitudes* of corresponding neighborhoods. For this, we download the dataset from the web source https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_K and explore the data.

In [2]:
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_K')

df = pd.DataFrame(data[0])
df.head()

Unnamed: 0,0,1,2,3,4,5,6
0,K1AGovernment of CanadaOttawa and Gatineau off...,K2AOttawa(Highland Park / McKellar Park /Westb...,K4AOttawa(Fallingbrook),K6AHawkesbury,K7ASmiths Falls,K8APembrokeCentral and northern subdivisions,K9ACobourg
1,K1BOttawa(Blackburn Hamlet / Pine View / Sheff...,K2BOttawa(Britannia /Whitehaven / Bayshore / P...,K4BOttawa(Navan),K6BNot assigned,K7BNot assigned,K8BPembroke(Pleasant View / Fairview),K9BNot assigned
2,K1COttawa(Orleans),K2COttawa(Queensway / Copeland Park / Central ...,K4COttawa(Cumberland),K6CNot assigned,K7CCarleton Place,K8CNot assigned,K9CNot assigned
3,K1EOttawa(Queenswood),K2EOttawa(Eastern Nepean: Fisher Heights/ Park...,K4ENot assigned,K6ENot assigned,K7ENot assigned,K8ENot assigned,K9ENot assigned
4,K1GOttawa(Riverview / Hawthorne / Canterbury /...,"K2GOttawa(Centrepointe, Meadowlands, City View...",K4GNot assigned,K6GNot assigned,K7GGananoque,K8GNot assigned,K9GNot assigned


As we see above that this dataset is in the raw form. It does not contain the *Latitude* and *Longitude* values. This needs to be refined and be cleansed. In order to accomplish this we save this data into a csv file and incorporate the *Latitude* and *Longitude* values corresponding to the postal codes. We use another web source https://www.gps-coordinates.net to get these values. 

We then upload the updated csv file named *final project.csv*. Then we transform this data into a *pandas* dataframe. 

In [3]:
data = pd.read_csv("final_project.csv") 
df = pd.DataFrame(data)
df.head()

Unnamed: 0,PostalCode,Neighborhoods,Latitude,Longitude
0,K1B,Blackburn Hamlet / Pine View / Sheffield Glen,45.42042,-75.59603
1,K1C,Orleans,45.46253,-75.53009
2,K1E,Queenswood,45.47389,-75.5054
3,K1G,Riverview / Hawthorne / Canterbury / Hunt Club...,45.38954,-75.62517
4,K1H,Alta Vista / Billings Bridge,45.38777,-75.65906


And see how many rows does this data frame have.

In [4]:
df.shape

(40, 4)

#### Let's create a map of Ottawa-ON with neighborhoods superimposed on top.

In [5]:
address = 'Ottawa, ON'

geolocator = Nominatim(user_agent="ot_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Ottawa, ON are: ({}, {}).'.format(latitude, longitude))

The geograpical coordinate of Ottawa, ON are: (45.421106, -75.690308).


In [6]:
# create map of Ottawa, ON using latitude and longitude values
map_ottawa = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, code, neighborhood in zip(df['Latitude'], df['Longitude'], df['PostalCode'], df['Neighborhoods']):
    label = '{}, {}'.format(code, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_ottawa)  
    
map_ottawa

We used the **Folium** library for visualiztion purpose. One can just click on any red circle mark to see the name of the neighborhood corresponding to its postal code. One can also zoom in/out the above map.

#### Clustering neighborhoods including **Orleans-Ottawa**.

We focus our attention on only one neighborhood to segment and cluster and that is *Oleans-Ottawa* neighborhood. 
Let's create an another data frame out of the origninal one.

In [7]:
df_data = df[(df['PostalCode'] == 'K1C') | (df['PostalCode'] == 'K1B') | (df['PostalCode'] == 'K1E')].reset_index(drop=True)
df_data.head()

Unnamed: 0,PostalCode,Neighborhoods,Latitude,Longitude
0,K1B,Blackburn Hamlet / Pine View / Sheffield Glen,45.42042,-75.59603
1,K1C,Orleans,45.46253,-75.53009
2,K1E,Queenswood,45.47389,-75.5054


We now start segmenting the neighborhoods and explore using the **Foursquare API**.

#### Let's define Foursquare Credentials and Version.

In [8]:
CLIENT_ID = 'ACP2CI0OP4HWKIAAMMLRATKK2WE1GUO24BOY3HTPTTGGZLBI' # your Foursquare ID
CLIENT_SECRET = 'VFF1A3QNFR5XZWBFJRWSRIDDWXWTLI0UO1FRHIARV0VDBBFM' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: ACP2CI0OP4HWKIAAMMLRATKK2WE1GUO24BOY3HTPTTGGZLBI
CLIENT_SECRET:VFF1A3QNFR5XZWBFJRWSRIDDWXWTLI0UO1FRHIARV0VDBBFM


Getting the neighborhood of our interest by cearting a subdata frame.

In [9]:
df.loc[1, 'Neighborhoods']

'Orleans'

In [10]:
neighborhood_latitude = df.loc[1, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[1, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[1, 'Neighborhoods'] # neighborhood name

print('Latitude and longitude values of Neighborhood: {} are: ({}, {}).'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Neighborhood: Orleans are: (45.46253, -75.53009).


#### We now obtain the top 100 venues that are in **Orleans-Ottawa** neighborhood within a radius of 500 meters.

We first create the GET request URL.

In [11]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=ACP2CI0OP4HWKIAAMMLRATKK2WE1GUO24BOY3HTPTTGGZLBI&client_secret=VFF1A3QNFR5XZWBFJRWSRIDDWXWTLI0UO1FRHIARV0VDBBFM&v=20180604&ll=45.46253,-75.53009&radius=500&limit=100'

Let's examine the results out of the GET request.

In [12]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c8925d4db04f5650474a251'},
 'response': {'headerLocation': 'Chapel Hill',
  'headerFullLocation': 'Chapel Hill, Ottawa',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 45.4670300045, 'lng': -75.52368600944428},
   'sw': {'lat': 45.4580299955, 'lng': -75.53649399055573}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ba6418cf964a520a93f39e3',
       'name': 'Rexall Drugstore',
       'location': {'address': '1615 Orleans Blvd',
        'lat': 45.46141611779227,
        'lng': -75.5250399098691,
        'labeledLatLngs': [{'label': 'display',
          'lat': 45.46141611779227,
          'lng': -75.5250399098691}],
        'distance': 413,
        'postalCode': 'K1C 7E2',
    

Let's get the request **get_category_type**  to get the information in the **items** key from the **Foursquare API**.


In [13]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

We now clean the json and structure it into a **pandas** dataframe.

In [14]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Rexall Drugstore,Pharmacy,45.461416,-75.52504
1,Boomerang Kids,Thrift / Vintage Store,45.461037,-75.524897
2,Louis Perrault Park,Playground,45.458599,-75.530604
3,Cedar Valley Lebanese Food,Restaurant,45.461094,-75.524313


Let's see the number of venues were returned by Foursquare.

In [17]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


## Explore all Neighborhoods including Orleans-Ottawa

Let's create a function to repeat the same process to all the neighborhoods in Ottawa.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhoods', 
                  'Neighborhoods Latitude', 
                  'Neighborhoods Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each neighborhood and create a new dataframe called *df_venues*.

In [19]:
df_venues = getNearbyVenues(names=df['Neighborhoods'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Blackburn Hamlet / Pine View / Sheffield Glen
Orleans
Queenswood
Riverview / Hawthorne / Canterbury / Hunt Club Park
Alta Vista / Billings Bridge
Beacon Hill / Cyrville / Carson Grove
Overbrook / Forbes/ Manor Park / Viscount Alexander Park / Finter Quarries
Vanier/ McKay Lake area
Rockcliffe Park / New Edinburgh
Lower Town / Byward Market / Sandy Hill / University of Ottawa
Downtown
Dalhousie Ward
The Glebe / Old Ottawa South / Old Ottawa East / Carleton University / Dow's Lake area
Blossom Park / Greenboro / Leitrim) / Findlay Creek
Heron Gate / Heron Park / Riverside Park / Hunt Club / Riverside South / YOW
Chapel Hill South / Blackburn
South Gloucester
Civic Hospital / Island Park / Hintonburg / Mechanicsville / Champlain Park
Westboro / Carlington
Highland Park / McKellar Park /Westboro /Glabar Park /Carlingwood
Britannia /Whitehaven / Bayshore / Pinecrest
Queensway / Copeland Park / Central Park / Bel Air /Carleton Heights
Eastern Nepean: Fisher Heights/ Parkwood Hills / Borden F

Let's see the size of the resulting dataframe.

In [20]:
print(df_venues.shape)
df_venues.head()

(527, 7)


Unnamed: 0,Neighborhoods,Neighborhoods Latitude,Neighborhoods Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Blackburn Hamlet / Pine View / Sheffield Glen,45.42042,-75.59603,Apple Saddlery,45.419272,-75.59959,Shoe Store
1,Blackburn Hamlet / Pine View / Sheffield Glen,45.42042,-75.59603,Eclipse Asian Cuisine,45.418327,-75.596766,Asian Restaurant
2,Blackburn Hamlet / Pine View / Sheffield Glen,45.42042,-75.59603,Sushi Kan,45.418031,-75.597849,Sushi Restaurant
3,Blackburn Hamlet / Pine View / Sheffield Glen,45.42042,-75.59603,Big Al's Aquarium Services,45.418138,-75.59599,Pet Store
4,Blackburn Hamlet / Pine View / Sheffield Glen,45.42042,-75.59603,Marks,45.418654,-75.595985,Clothing Store


Let's find out how many venues were returned for each neighborhood.

In [21]:
df_venues.groupby('Neighborhoods').count()

Unnamed: 0_level_0,Neighborhoods Latitude,Neighborhoods Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhoods,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alta Vista / Billings Bridge,3,3,3,3,3,3
Barrhaven,38,38,38,38,38,38
Beacon Hill / Cyrville / Carson Grove,5,5,5,5,5,5
Beaverbrook / South March,3,3,3,3,3,3
Bells Corners / Arlington Woods/Redwood / Qualicum / Crystal Beach,4,4,4,4,4,4
Blackburn Hamlet / Pine View / Sheffield Glen,11,11,11,11,11,11
Blossom Park / Greenboro / Leitrim) / Findlay Creek,11,11,11,11,11,11
Bridlewood,4,4,4,4,4,4
Britannia /Whitehaven / Bayshore / Pinecrest,15,15,15,15,15,15
Centrepointe / Meadowlands / City View / Craig Henry / Tangelwood / Grenfell Glen / Davidson Heights,4,4,4,4,4,4


Now we find out unique categories that have been created from all the returned venues.

In [22]:
print('There are {} uniques categories.'.format(len(df_venues['Venue Category'].unique())))

There are 160 uniques categories.


## Analysis of Neighborhoods <a name="analysis"></a>

We now start analyzing neighborhoods.

In [23]:

df_onehot = pd.get_dummies(df_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
df_onehot['Neighborhoods'] = df_venues['Neighborhoods'] 

# move neighborhood column to the first column
fixed_columns = [df_onehot.columns[-1]] + list(df_onehot.columns[:-1])
df_onehot = df_onehot[fixed_columns]

df_onehot.head()

Unnamed: 0,Neighborhoods,Adult Boutique,Airport Terminal,American Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Bar,Beer Store,Belgian Restaurant,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Café,Camera Store,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Curling Ice,Deli / Bodega,Department Store,Diner,Discount Store,Dive Bar,Dog Run,Electronics Store,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Flea Market,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Health Food Store,Historic Site,Hobby Shop,Hockey Arena,Home Service,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Library,Liquor Store,Lounge,Malay Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Moving Target,Museum,National Park,New American Restaurant,Newsstand,Noodle House,Office,Other Nightlife,Other Repair Shop,Outdoor Supply Store,Paper / Office Supplies Store,Park,Persian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Record Shop,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoothie Shop,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Street Art,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Blackburn Hamlet / Pine View / Sheffield Glen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Blackburn Hamlet / Pine View / Sheffield Glen,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Blackburn Hamlet / Pine View / Sheffield Glen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Blackburn Hamlet / Pine View / Sheffield Glen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Blackburn Hamlet / Pine View / Sheffield Glen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And see the size of the new dataframe.

In [24]:
df_onehot.shape

(527, 161)

Let's take the mean of the frequency of occurrence of each category by grouping neighborhoods.

In [25]:
df_grouped = df_onehot.groupby('Neighborhoods').mean().reset_index()
df_grouped

Unnamed: 0,Neighborhoods,Adult Boutique,Airport Terminal,American Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Bar,Beer Store,Belgian Restaurant,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Café,Camera Store,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Curling Ice,Deli / Bodega,Department Store,Diner,Discount Store,Dive Bar,Dog Run,Electronics Store,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Flea Market,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Health Food Store,Historic Site,Hobby Shop,Hockey Arena,Home Service,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Library,Liquor Store,Lounge,Malay Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Moving Target,Museum,National Park,New American Restaurant,Newsstand,Noodle House,Office,Other Nightlife,Other Repair Shop,Outdoor Supply Store,Paper / Office Supplies Store,Park,Persian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Record Shop,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoothie Shop,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Street Art,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Alta Vista / Billings Bridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barrhaven,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.078947,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.078947,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.078947,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0
2,Beacon Hill / Cyrville / Carson Grove,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Beaverbrook / South March,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bells Corners / Arlington Woods/Redwood / Qual...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Blackburn Hamlet / Pine View / Sheffield Glen,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Blossom Park / Greenboro / Leitrim) / Findlay ...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.181818,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bridlewood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Britannia /Whitehaven / Bayshore / Pinecrest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Centrepointe / Meadowlands / City View / Craig...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's see the new size of the dataframe in order to confirm the grouped size.

In [26]:
df_grouped.shape

(37, 161)

Let's look at top 5 most common venues of each neighborhood.

In [27]:
num_top_venues = 5

for hood in df_grouped['Neighborhoods']:
    print("----"+hood+"----")
    temp = df_grouped[df_grouped['Neighborhoods'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alta Vista / Billings Bridge----
                  venue  freq
0         Moving Target  0.33
1            Playground  0.33
2  Fast Food Restaurant  0.33
3     Other Repair Shop  0.00
4         National Park  0.00


----Barrhaven----
                  venue  freq
0  Fast Food Restaurant  0.11
1            Restaurant  0.08
2         Grocery Store  0.08
3           Coffee Shop  0.08
4   American Restaurant  0.05


----Beacon Hill / Cyrville / Carson Grove----
                   venue  freq
0  Entertainment Service   0.2
1                   Café   0.2
2         Sandwich Place   0.2
3     Mexican Restaurant   0.2
4       Malay Restaurant   0.2


----Beaverbrook / South March----
                  venue  freq
0  Fast Food Restaurant  0.33
1                  Café  0.33
2           Coffee Shop  0.33
3        Adult Boutique  0.00
4  Outdoor Supply Store  0.00


----Bells Corners / Arlington Woods/Redwood / Qualicum / Crystal Beach----
                  venue  freq
0          Liquor Store  0

Let's write a function to sort the venues in descending order by putting it into a new *pandas* dataframe.

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Let's display the top 10 venues for each neighborhood.

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhoods'] = df_grouped['Neighborhoods']

for ind in np.arange(df_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alta Vista / Billings Bridge,Playground,Fast Food Restaurant,Moving Target,Convenience Store,Farmers Market,Event Space,Ethiopian Restaurant,Entertainment Service,English Restaurant,Electronics Store
1,Barrhaven,Fast Food Restaurant,Coffee Shop,Restaurant,Grocery Store,Pizza Place,American Restaurant,Pet Store,Department Store,Hardware Store,Movie Theater
2,Beacon Hill / Cyrville / Carson Grove,Mexican Restaurant,Malay Restaurant,Café,Sandwich Place,Entertainment Service,Yoga Studio,Electronics Store,Farmers Market,Event Space,Ethiopian Restaurant
3,Beaverbrook / South March,Fast Food Restaurant,Coffee Shop,Café,Yoga Studio,Electronics Store,Farmers Market,Event Space,Ethiopian Restaurant,Entertainment Service,English Restaurant
4,Bells Corners / Arlington Woods/Redwood / Qual...,Grocery Store,Pharmacy,Clothing Store,Liquor Store,Dog Run,Farmers Market,Event Space,Ethiopian Restaurant,Entertainment Service,English Restaurant


## Cluster Neighborhoods  <a name="cluster"></a>

Here we now run *k*-means to cluster the neighborhoods into 5 clusters.

In [35]:
# set number of clusters
kclusters = 5

df_grouped_clustering = df_grouped.drop('Neighborhoods', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 3, 1, 1], dtype=int32)

We now create a new dataframe showing top 10 venues for the **Orleans-Ottawa** neighborhood that includes the cluster.

In [37]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_merged = df_data

# merge ottawa_grouped with ottawa_data to add latitude/longitude for all neighborhoods
df_merged = df_merged.join(neighborhoods_venues_sorted.set_index('Neighborhoods'), on='Neighborhoods')

df_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Neighborhoods,Latitude,Longitude,Cluster Labels,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,K1B,Blackburn Hamlet / Pine View / Sheffield Glen,45.42042,-75.59603,1,1,Food Truck,Clothing Store,Furniture / Home Store,Sushi Restaurant,Shoe Store,Golf Course,Fast Food Restaurant,Park,Hardware Store,Asian Restaurant
1,K1C,Orleans,45.46253,-75.53009,1,1,Playground,Pharmacy,Restaurant,Thrift / Vintage Store,Dive Bar,Farmers Market,Event Space,Ethiopian Restaurant,Entertainment Service,English Restaurant
2,K1E,Queenswood,45.47389,-75.5054,3,3,Park,Home Service,Dog Run,Fast Food Restaurant,Farmers Market,Event Space,Ethiopian Restaurant,Entertainment Service,English Restaurant,Electronics Store


In [39]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Neighborhoods'], df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [40]:
Orleans_ottawa = df_merged.loc[df_merged['Cluster Labels'] == 1, df_merged.columns[[1] + list(range(5, df_merged.shape[1]))]]
Orleans_ottawa.head()

Unnamed: 0,Neighborhoods,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Blackburn Hamlet / Pine View / Sheffield Glen,1,Food Truck,Clothing Store,Furniture / Home Store,Sushi Restaurant,Shoe Store,Golf Course,Fast Food Restaurant,Park,Hardware Store,Asian Restaurant
1,Orleans,1,Playground,Pharmacy,Restaurant,Thrift / Vintage Store,Dive Bar,Farmers Market,Event Space,Ethiopian Restaurant,Entertainment Service,English Restaurant


## Results and Discussion <a name="results"></a>

We have found that many **Bookstores** are already set up in various neighborhoods in the city of **Ottawa, ON**. However, there are locations in certain neighborhoods where there is a shortage of such business ventures. There seem to be higher chances to setup a new business like a **Bookstore** when there is acute need of one, keeping some other vital factors in mind as mentioned above. It is clearly seen from the above analysis that in the neighborhood of our interest i.e., **Orleans-Ottawa** where there is no such business (**Bookstore**) established at the time of our investigation. 

It shows that the  neighborhood of our interest provides an optimum location for stakeholders in their quest for investment. It is worth mentioning here that though there are other vital factors where a prospective investor should look into apart from the scarcity of similar businesses in a particular location or neighborhood. For a prospective business owner there must be more than one locations available to choose the best one. It is therefore, we have mentioned in our analysis more than one neighborhood apart from **Orleans-Ottawa** so that a stakeholder should have more than one options available to choose the best location.

It was our interest in this project that we could be able to identify that some locations exit in certain neighborhoods where one can investigate in search for a future business venture. But this does not mean that those are the only optimal locations for setting up a new **Bookstore**! The objective of our analysis was to provide a first hand information on exploring areas in the **city of Ottawa** with either existing or non-existing **Bookstores** in various neighborhoods. It is to be noted that the locations identified in our analysis should be considered as a starting point only. Therefore, it is recommended that factors like customers' participation in the prospective business, accessibility, growth, and many other relevant factors and conditions must also be taken into account.

## Conclusion <a name="conclusion"></a>

Our objective in this project was to explore various areas of Ottawa-Ontario in order to facilitate stakeholders somehow in searching out options for optimal locations for establishing a new bookstore. Using the Foursquare data analysis we identified neighborhoods to generate a collection of locations which satisfy our basic and sole requirement regarding the existence of bookstores in various neighborhoods of the city. Potential locations were clustered in order to create neighborhoods of interest so that exploration may be carried out by stakeholders. 

Options in this direction were made available to select from so that interested stakeholders may find it easy to choose in view of other relevant conditions. It will provide a ready-made choices for interested investors to have an estimated decision based on other vital characteristics of locations in certain neighborhoods.

## Future work <a name="future"></a>

We know that the City of Ottawa is the corporate entity of municipal government in Ottawa, Ontario, Canada. This corporation is responsible for providing services to the public. Administratively, Ottawa is composed of 23 wards. Each ward is represented by city councilor and a mayor. 

In order to extend the analysis presented in this project one can obtain a new dataset containing all the neighborhoods in conjunction with all the 23 wards in the city of Ottawa. By doing this it can open further avenues of in-depth analysis so that stakeholders can diversify their search in the quest for more suitable locations by providing more available options. 

## References <a name="reference"></a>

1. Postal_Codes_of_Ottawa_K — Wikipedia
2. Ottawa — Wikipedia
3. GPS-Coordinates.NET
4. Forsquare API