<h1 style="color:red;"> Capstone Project: Battle of Neighborhoods</h1>

### Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li><a href="#1.-Introduction---Business-Problem">Introduction - Business Problem</a>        
    <li><a href="#2.Data">Data</a>        
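    <li><a href="#3.-Methodology">Methodology</a>
    <li><a href="#4.-Results">Results</a>
    <li><a href="#5.-Discussion">Discussion</a>
    <li><a href="#6.-Conclusion">Conclusion</a>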
</ol>
</div>

<h2 style="color:blue;">1. <u>Introduction - Business Problem</u> </h2>
<h3>Opening a Yoga Studio</h3>

<i>I want to open a new yoga studio in Toronto. Where should I open it?</i>
This is one of the questions an entrepreneur needs to answer before starting a business. Location is key to a new business's success, and choosing one requires careful research. In this capstone project, I will attempt to answer this question through data analysis.

<h2 style="color:blue;">2.<u>Data</u></h2>

To solve the problem, we will need data from various sources: 
<ul>
    <li><b>Foursquare</b>: we will use the Foursquare venue data for each neighborhood to count how many yoga studios it already has. This count will be used to assess the competition in each neighborhood.</li>
    <li><b>Demographics</b>: age and sex. Since yoga studios tend to be popular among women, we will take this into account by looking at the female population of each neighborhood. This data will be retrieved from the Open Data section of the City of Toronto website.</li>
    <li><b>Average income per neighborhood</b>: it's important to know the purchasing power of each neighborhood's residents before establishing a business. This data will also be retrieved from the City of Toronto website.</li>
    
</ul>
Contains information licensed under the Open Government Licence – Toronto. Link: https://open.toronto.ca/open-data-license/
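As a rough sketch of the competition count described in the Foursquare item above, the snippet below assumes the venues have already been collected into a pandas DataFrame with one row per venue. The column names `Neighborhood` and `Venue Category`, the helper `count_category`, and the sample rows are all hypothetical and are not taken from the project. No Foursquare API calls are shown.

```python
import pandas as pd


def count_category(venues: pd.DataFrame, category: str) -> pd.Series:
    """Count venues of one category in each neighborhood.

    Hypothetical input layout: one row per venue, with 'Neighborhood'
    and 'Venue Category' columns.
    """
    all_hoods = venues["Neighborhood"].unique()
    matches = venues[venues["Venue Category"] == category]
    counts = matches.groupby("Neighborhood").size()
    # Neighborhoods with no matching venue get an explicit 0 instead of disappearing
    return counts.reindex(all_hoods, fill_value=0)


# Made-up sample rows, for illustration only
venues = pd.DataFrame({
    "Neighborhood": ["Annex", "Annex", "Alderwood", "Woburn"],
    "Venue Category": ["Yoga Studio", "Coffee Shop", "Yoga Studio", "Park"],
})
yoga_counts = count_category(venues, "Yoga Studio")
print(yoga_counts)
```

Reindexing against every neighborhood matters for this use case. A neighborhood with zero yoga studios is the most interesting case for a new studio, and a plain `groupby` would silently leave it out.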

<h2 style="color:blue;">3. <u>Methodology</u></h2>

### Neighborhood Data Profile
In this section, we build the neighborhood data profile. It is composed of demographics (age and sex), the population of each neighborhood, and the average income.

#### Data Cleaning and Wrangling

```python
# Data handling
import pandas as pd
import numpy as np
import csv
import json
# Downloading, HTTP requests and geocoding
import wget
import requests
import geocoder
import geopy
# Plotting, clustering and mapping
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
print("Libraries imported")
```

```
Libraries imported
```


Let's load the source file, the 2016 Neighbourhood Profiles CSV from the City of Toronto's Open Data portal:

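```python
# Path to the author's local copy of the 2016 Neighbourhood Profiles CSV
csvfile = "C:/Users/nirin/OneDrive/Documents/Capstone Data/neighbourhood-profiles-2016.csv"
dfnp = pd.read_csv(csvfile,
                   header=0,
                   delimiter=',',
                   quotechar='"',
                   on_bad_lines='skip',  # pandas >= 1.3; replaces the removed error_bad_lines=False
                   engine='python')
dfnp.shape
```

```
(2383, 146)
```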
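In pandas 1.3 and later, you ask `read_csv` to skip malformed lines with `on_bad_lines='skip'`. The older `error_bad_lines=False` flag was deprecated in 1.3 and removed in pandas 2.0. The toy example below uses a made-up two-column CSV that contains one line with an extra field. It shows what "skip" does.

```python
import io

import pandas as pd

# Toy CSV with two columns; the line "3,4,5" has one field too many
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines="skip" (pandas >= 1.3) drops malformed lines instead of raising an error
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip", engine="python")
print(df)
```

Skipping is convenient, but it is silent. Comparing `dfnp.shape` against the row count you expect is a cheap way to notice if many lines were discarded.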

```python
dfnp.tail()
```

| | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2378 | 2379 | Mobility | Mobility status - Place of residence 5 years ago | Census Profile 98-316-X2016001 | Migrants | 400950 | 3170 | 3145 | 925 | 6390 | ... | 3765 | 2270 | 7260 | 985 | 620 | 1350 | 2425 | 2310 | 4965 | 1345 |
| 2379 | 2380 | Mobility | Mobility status - Place of residence 5 years ago | Census Profile 98-316-X2016001 | Internal migrants | 184120 | 880 | 980 | 680 | 3930 | ... | 1545 | 1110 | 1720 | 610 | 395 | 780 | 1260 | 1355 | 1700 | 580 |
| 2380 | 2381 | Mobility | Mobility status - Place of residence 5 years ago | Census Profile 98-316-X2016001 | Intraprovincial migrants | 141135 | 735 | 760 | 615 | 2630 | ... | 1070 | 960 | 1400 | 350 | 320 | 570 | 970 | 1025 | 1490 | 445 |
| 2381 | 2382 | Mobility | Mobility status - Place of residence 5 years ago | Census Profile 98-316-X2016001 | Interprovincial migrants | 42985 | 135 | 220 | 70 | 1310 | ... | 475 | 150 | 335 | 250 | 85 | 210 | 290 | 325 | 195 | 135 |
| 2382 | 2383 | Mobility | Mobility status - Place of residence 5 years ago | Census Profile 98-316-X2016001 | External migrants | 216835 | 2280 | 2170 | 245 | 2460 | ... | 2220 | 1175 | 5540 | 395 | 220 | 575 | 1160 | 955 | 3285 | 775 |


Not all of this information is useful for our analysis. Let's drop rows based on the values of the Category, Topic, and Characteristic columns.

```python
# Drop whole categories that are not needed for this analysis
dfnp = dfnp.drop(dfnp[dfnp["Category"].isin([
    'Aboriginal peoples',
    'Ethnic origin',
    'Housing',
    'Immigration and citizenship',
    'Journey to work',
    'Language',
    'Mobility',
    'Neighbourhood Information',
    'Visible Minority',
    'Language of work',
])].index)
dfnp.shape
```

```
(709, 146)
```

```python
# Drop topics that are not needed within the remaining categories
dfnp = dfnp.drop(dfnp[dfnp["Topic"].isin([
    'Income sources',
    'Household and dwelling characteristics',
    'Family characteristics',
    'Household type',
    'Family characteristics of adults',
    'Income of households in 2015',
    'Income of economic families in 2015',
    'Low income in 2015',
    'Major field of study - Classification of Instructional Programs (CIP) 2016',
    'Location of study compared with province or territory of residence with countries outside Canada',
    'Work activity during the reference year',
    'Class of worker',
    'Occupation - National Occupational Classification (NOC) 2016',
    'Industry - North American Industry Classification System (NAICS) 2012',
    'Place of work status',
    'Income taxes',
    'Highest certificate, diploma or degree',
    'Population and dwellings',
    'Visible minority population',
])].index)
dfnp.shape
```

```
(161, 146)
```

```python
# Drop individual characteristics (rows) that are not needed
dfnp = dfnp.drop(dfnp[dfnp["Characteristic"].isin([
    'Children (0-14 years)',
    'Youth (15-24 years)',
    'Working Age (25-54 years)',
    'Pre-retirement (55-64 years)',
    'Seniors (65+ years)',
    'Older Seniors (85+ years)',
    'Marital status for the population aged 15 years and over',
    'Total - Income statistics in 2015 for the population aged 15 years and over in private households',
    'Number of total income recipients aged 15 years and over in private households',
    'Median total income in 2015 among recipients ($)',
    'Number of after-tax income recipients aged 15 years and over in private households - 100% data',
    'Median after-tax income in 2015 among recipients ($)',
    'Number of market income recipients aged 15 years and over in private households - 100% data',
    'Median market income in 2015 among recipients ($)',
    'Number of government transfers recipients aged 15 years and over in private households - 100% data',
    'Median government transfers in 2015 among recipients ($)',
    'Number of employment income recipients aged 15 years and over in private households - 100% data',
    'Median employment income in 2015 among recipients ($)',
    'Total - Income statistics in 2015 for the population aged 15 years and over in private households - 25% sample data',
    'Number of total income recipients aged 15 years and over in private households - 25% sample data',
    'Average total income in 2015 among recipients ($)',
    'Number of after-tax income recipients aged 15 years and over in private households - 25% sample data',
    'Average after-tax income in 2015 among recipients ($)',
    'Number of market income recipients aged 15 years and over in private households - 25% sample data',
    'Average market income in 2015 among recipients ($)',
    'Number of government transfers recipients aged 15 years and over in private households - 25% sample data',
    'Average government transfers in 2015 among recipients ($)',
    'Number of employment income recipients aged 15 years and over in private households - 25% sample data',
    'Average employment income in 2015 among recipients ($)',
    'Total - Employment income statistics for the population aged 15 years and over in private households - 25% sample data',
    'Number of employment income recipients aged 15 years and over in private households who worked full year full time in 2015 - 25% sample data',
    'Median employment income in 2015 for full-year full-time workers ($)',
    'Average employment income in 2015 for full-year full-time workers ($)',
    'Composition of total income in 2015 of the population aged 15 years and over in private households (%) - 100% data',
    'Market income (%)',
    'Employment income (%)',
    'Government transfers (%)',
    'Total - Population aged 15 years and over by Labour force status - 25% sample data',
])].index)
dfnp.shape
```

```
(146, 146)
```
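The three filtering cells above all follow the same pattern: look up the index labels of the matching rows, then `drop` them. The toy frame below (made-up values; `to_drop` is an illustrative name) compares that pattern with a negated boolean mask. The mask gives the same result and is often considered more direct.

```python
import pandas as pd

# Toy stand-in for the census profile (made-up values)
df = pd.DataFrame({
    "Category": ["Population", "Language", "Income", "Mobility"],
    "value": [10, 20, 30, 40],
})
to_drop = ["Language", "Mobility"]

# Pattern used in the cells above: find the index labels of matching rows, then drop them
via_drop = df.drop(df[df["Category"].isin(to_drop)].index)

# Equivalent boolean-mask form: keep only the rows that do NOT match
via_mask = df[~df["Category"].isin(to_drop)]

print(via_mask)
```

The two forms agree here because the default `RangeIndex` has unique labels. If the index contained duplicate labels, `drop` would also remove non-matching rows that share a label with a matching row. The mask works row by row and avoids that problem.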

```python
# Drop rows that have no value in the "Annex" neighbourhood column
dfnp.dropna(subset=["Annex"], inplace=True)
# Drop further rows by _id; casting _id to str lets it match these string ids even if pandas parsed the column as integers
ids_to_drop = ['946','947','948','949','950','952','953','954','959','960','961','962','963','964','966','967','968','969','975','976','977','978','979','980','981','982','983','984','985','986','987','988','989','990','991','992','993','994','995','996','997','998','999','1000','1001','1002','1003','1004','1005','1006','1007','1008','1009','1010','1011','1012','1013','1014','1016','1017','1892','1893','1894','1895','1896','1897','1899','1900','1901','1902','1903','1904','1905','1907']
dfnp = dfnp.drop(dfnp[dfnp["_id"].astype(str).isin(ids_to_drop)].index)
# Drop metadata columns that are no longer needed
dfnp = dfnp.drop(labels=['Data Source','Category','City of Toronto','_id'], axis=1)
```
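The `_id` filter is worth checking. `pd.read_csv` normally parses a column of whole numbers such as `_id` as integers, and `Series.isin` does not treat the integer `946` and the string `'946'` as equal. The toy series below uses made-up ids to show the difference. It also shows why casting with `astype(str)` before `isin` is a safe way to match a list of string ids.

```python
import pandas as pd

# Made-up ids, stored as integers the way read_csv usually parses a numeric column
ids = pd.Series([946, 947, 951])
wanted = ["946", "947"]

# Integers never compare equal to strings, so nothing matches
plain = ids.isin(wanted)

# Casting to str first lets the string ids match as intended
as_text = ids.astype(str).isin(wanted)

print(plain.tolist())    # [False, False, False]
print(as_text.tolist())  # [True, True, False]
```

An alternative is to write the id list as integers. Casting the column to `str` has one advantage: it works whichever way pandas happened to parse `_id`.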


Now that our data is clean, let's create a data frame for each Topic and visualize it.

### Age - Sex

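```python
# Keep only the age-by-sex rows
df_age = dfnp[(dfnp["Topic"]=="Age characteristics")]
df_age = df_age.drop(labels=['Topic'], axis=1)
# Use each characteristic (e.g. "Male: 0 to 04 years") as a row label, then transpose
# so that neighborhoods become rows and age/sex groups become columns
df_age.set_index('Characteristic', inplace=True)
df_age = df_age.transpose()
df_age.head()
```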
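To make the reshaping step concrete, here is the same `set_index` plus `transpose` sequence on a two-row toy frame with made-up counts. Characteristics move from rows to columns and neighborhoods move from columns to rows. The old index name `Characteristic` becomes the name of the column axis, which is why it appears in the top-left corner of the age table.

```python
import pandas as pd

# Toy long-format slice (made-up counts): one row per characteristic,
# one column per neighborhood, like the filtered census profile
long_df = pd.DataFrame({
    "Characteristic": ["Male: 0 to 04 years", "Female: 0 to 04 years"],
    "Annex": [1, 2],
    "Woburn": [3, 4],
})

# Characteristics become the row labels, then transpose swaps rows and columns
wide_df = long_df.set_index("Characteristic").transpose()
print(wide_df)
```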

Let's look at the age and sex breakdown for the first ten neighborhoods:

```python
df_age.head(10)
```

| Characteristic | Male: 0 to 04 years | Male: 05 to 09 years | Male: 10 to 14 years | Male: 15 to 19 years | Male: 20 to 24 years | Male: 25 to 29 years | Male: 30 to 34 years | Male: 35 to 39 years | Male: 40 to 44 years | Male: 45 to 49 years | ... | Female: 55 to 59 years | Female: 60 to 64 years | Female: 65 to 69 years | Female: 70 to 74 years | Female: 75 to 79 years | Female: 80 to 84 years | Female: 85 to 89 years | Female: 90 to 94 years | Female: 95 to 99 years | Female: 100 years and over |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agincourt North | 660 | 695 | 660 | 840 | 1015 | 1015 | 835 | 680 | 760 | 890 | ... | 1165 | 1070 | 985 | 690 | 575 | 485 | 350 | 160 | 60 | 10 |
| Agincourt South-Malvern West | 575 | 540 | 460 | 780 | 1000 | 1045 | 820 | 625 | 610 | 760 | ... | 915 | 795 | 690 | 450 | 405 | 350 | 205 | 100 | 20 | 0 |
| Alderwood | 360 | 270 | 225 | 285 | 355 | 355 | 410 | 455 | 420 | 440 | ... | 485 | 400 | 325 | 210 | 180 | 210 | 130 | 70 | 5 | 5 |
| Annex | 445 | 365 | 325 | 465 | 1215 | 2080 | 1610 | 1055 | 835 | 850 | ... | 915 | 940 | 950 | 700 | 565 | 425 | 345 | 260 | 90 | 25 |
| Banbury-Don Mills | 570 | 660 | 675 | 715 | 700 | 645 | 735 | 735 | 815 | 1010 | ... | 1005 | 895 | 955 | 790 | 730 | 650 | 615 | 360 | 105 | 20 |
| Bathurst Manor | 435 | 355 | 415 | 490 | 530 | 465 | 485 | 580 | 435 | 535 | ... | 605 | 470 | 415 | 275 | 280 | 285 | 265 | 165 | 45 | 10 |
| Bay Street Corridor | 470 | 230 | 130 | 585 | 2485 | 2115 | 1695 | 1010 | 560 | 500 | ... | 500 | 425 | 445 | 320 | 250 | 170 | 135 | 50 | 5 | 0 |
| Bayview Village | 455 | 395 | 410 | 520 | 735 | 1075 | 1040 | 805 | 685 | 605 | ... | 775 | 660 | 585 | 440 | 355 | 315 | 230 | 115 | 25 | 0 |
| Bayview Woods-Steeles | 205 | 260 | 320 | 385 | 445 | 405 | 285 | 230 | 310 | 390 | ... | 525 | 475 | 520 | 400 | 400 | 330 | 265 | 180 | 55 | 10 |
| Bedford Park-Nortown | 675 | 795 | 880 | 880 | 765 | 405 | 450 | 590 | 665 | 760 | ... | 885 | 730 | 660 | 505 | 390 | 305 | 260 | 140 | 35 | 10 |
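The Data section says we will look at the female population of each neighborhood. One way to get it from a frame laid out like `df_age` is to sum every column whose label starts with `Female:`. The helper `female_population` and the two-row sample below are hypothetical, with made-up counts and placeholder neighborhood names. This is a sketch of one possible approach, not the project's actual computation.

```python
import pandas as pd


def female_population(df_age: pd.DataFrame) -> pd.Series:
    """Sum all 'Female: ...' age-group columns for each neighborhood (row)."""
    female_cols = [c for c in df_age.columns if str(c).startswith("Female:")]
    # Coerce to numbers; unparseable cells become NaN, which sum() skips
    counts = df_age[female_cols].apply(pd.to_numeric, errors="coerce")
    return counts.sum(axis=1)


# Made-up two-neighborhood sample in the same wide layout as df_age
sample = pd.DataFrame(
    {
        "Male: 0 to 04 years": [5, 7],
        "Female: 0 to 04 years": [10, 20],
        "Female: 05 to 09 years": [1, 2],
    },
    index=["Neighborhood A", "Neighborhood B"],
)
result = female_population(sample)
print(result)
```

Selecting columns by their `Female:` prefix, rather than by position, keeps the sum correct even if the age groups are reordered or some are dropped during cleaning.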


<h2 style="color:blue;">4. <u>Results</u></h2>
    

<h2 style="color:blue;">5. <u>Discussion</u></h2>

<h2 style="color:blue;">6. <u>Conclusion</u></h2>