# Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods

## Introduction

### A description of the problem and a discussion of the background.

*Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.*

The problem or idea to be explored is:  

***If someone is looking to open a restaurant in Toronto, what neighborhoods would be best for them to open it?***

For potential restaurant owners, the success of the venture will depend on the population and income of the target demographic in each neighbourhood, as well as on the existing competitors the business may face. A restaurant opened in a low-population area may never become widely recognized, attended by locals, or able to attract people from other areas, which limits revenue. A restaurant opened in a low-income area is unlikely to draw many customers or frequent repeat visits even if the population is there, because the customer base may not be able to afford dining out. Low income combined with low population is an aggravated scenario that threatens the restaurant's ability to keep operating. Moreover, even in well-populated, higher-income neighbourhoods, a large number of well-known restaurants may mean strong customer loyalty, so a new opening could be ignored.

All of these factors directly affect profitability and the likely longevity of the restaurant. For potential restaurant owners, location is therefore worth analysing before the search for a specific venue even begins.
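The population and income screening described above can be sketched in pandas. This is a minimal illustration, assuming one row per neighbourhood with numeric `Population_2016` and `Income_2016` columns. The neighbourhood names and figures are hypothetical placeholders rather than census values, `screen_neighbourhoods` is a hypothetical helper, and using the median as the cut-off is an assumed threshold chosen only for illustration.

```python
import pandas as pd

# Hypothetical figures for illustration only (not census data)
neighbourhoods = pd.DataFrame(
    {
        "Population_2016": [30000, 8000, 25000, 6000],
        "Income_2016": [55000, 60000, 28000, 25000],
    },
    index=["Neighbourhood A", "Neighbourhood B", "Neighbourhood C", "Neighbourhood D"],
)


def screen_neighbourhoods(frame: pd.DataFrame) -> pd.DataFrame:
    """Flag neighbourhoods below the median population and/or median income.

    The median cut-off is an assumed threshold, not a recommendation.
    """
    pop_cut = frame["Population_2016"].median()
    income_cut = frame["Income_2016"].median()
    result = frame.copy()
    result["Low_population"] = result["Population_2016"] < pop_cut
    result["Low_income"] = result["Income_2016"] < income_cut
    # Both flags together mark the "aggravated" low-income, low-population case
    result["Both_low"] = result["Low_population"] & result["Low_income"]
    return result


screened = screen_neighbourhoods(neighbourhoods)
print(screened)
```

Relative cut-offs such as the median keep the rule independent of absolute income levels; a real analysis might instead use fixed thresholds or a weighted score.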

### A description of the data and how it will be used to solve the problem.

*Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.*

The demographic information needed, including neighbourhood population and income, will be sourced primarily from the City of Toronto's open data catalogue, which publishes the 2016 census Neighbourhood Profiles (https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#8c732154-5012-9afe-d0cd-ba3ffc813d5a). This information is publicly accessible.

In the second phase, once the neighbourhoods have been screened to identify the most attractive ones, the competitors that already exist in these target neighbourhoods will be determined. Competition can be measured in several ways, including the number of restaurants per neighbourhood, the number of residents per restaurant, or the rating scores of the restaurants in the neighbourhood. This competitor data will be collected via the Foursquare API.
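The three competition measures named above (restaurant count, residents per restaurant, average rating) can be computed once venue data is in a table. The sketch below assumes the Foursquare results have already been flattened into one row per restaurant with hypothetical `Neighbourhood` and `Rating` columns; the venue rows and populations are made up, `competition_metrics` is a hypothetical helper, and the Foursquare request itself is not shown.

```python
import pandas as pd

# Hypothetical venue list: one row per restaurant returned by a location search
venues = pd.DataFrame(
    {
        "Neighbourhood": ["A", "A", "A", "B", "C", "C"],
        "Rating": [8.0, 7.0, 9.0, 6.5, 7.5, 8.5],
    }
)

# Hypothetical populations; neighbourhood D has no restaurants in the venue list
population = pd.Series({"A": 30000, "B": 8000, "C": 25000, "D": 6000}, name="Population_2016")


def competition_metrics(venues: pd.DataFrame, population: pd.Series) -> pd.DataFrame:
    """Summarise competition per neighbourhood."""
    grouped = venues.groupby("Neighbourhood")["Rating"]
    metrics = pd.DataFrame(
        {
            "Restaurant_count": grouped.size(),
            "Mean_rating": grouped.mean(),
        }
    )
    # Keep every neighbourhood that has a population figure, even with no venues
    metrics = metrics.join(population, how="right")
    metrics["Restaurant_count"] = metrics["Restaurant_count"].fillna(0).astype(int)
    # Residents per restaurant; left as NaN where there are no restaurants
    metrics["People_per_restaurant"] = (
        metrics["Population_2016"] / metrics["Restaurant_count"].replace(0, float("nan"))
    )
    return metrics


m = competition_metrics(venues, population)
print(m)
```

A neighbourhood with no restaurants gets `NaN` residents per restaurant rather than a division-by-zero result, so it can be handled explicitly when ranking.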

## Pre-analysis download and set-up

In [None]:
# library to handle data in a vectorized manner
import numpy as np 

# library for data analysis
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import matplotlib.cm as cm
import matplotlib.colors as colors

# convert an address into latitude and longitude values
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

# map rendering library
!conda install -c conda-forge folium=0.5.0 --yes 
import folium 

print('Libraries imported.')


In [None]:
# Toronto Census Data - Neighbourhood Profiles 2016 (CSV)
# https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#8c732154-5012-9afe-d0cd-ba3ffc813d5a

csv_path='https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv'
df = pd.read_csv(csv_path,encoding='latin1')
print('Data loaded')

## Methodology

Methodology section, which represents the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed (if any), and which machine learning methods were used and why.

## Results

Results section where you discuss the results.

## Discussion

Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.

## Conclusion

Conclusion section where you conclude the report.

## Data Analysis Work

In [None]:
df.head()

In [None]:
# Collecting neighbourhood names (the first five columns are not neighbourhoods)
Neighbourhoods = list(df.columns.values)[5:]
print(Neighbourhoods)

# Building a new data frame with the neighbourhoods, population, and income
# Row 2    = "Population, 2016"                  -> Population_2016
# Row 2264 = "Total income: Average amount ($)"  -> Income_2016
dfToronto = pd.DataFrame({
    'Population_2016': df.loc[2, Neighbourhoods],
    'Income_2016': df.loc[2264, Neighbourhoods],
})

# Values may be read as text (e.g. "1,234"), which would make sorting alphabetical,
# so strip thousands separators and convert to numbers (unparseable cells become NaN)
dfToronto = dfToronto.apply(
    lambda col: pd.to_numeric(col.astype(str).str.replace(',', '', regex=False), errors='coerce'))

dfToronto.sort_values('Income_2016')
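The cell above depends on fixed row positions (2 and 2264), which would silently select the wrong rows if the file's layout changed. A more defensive sketch looks rows up by their text label and strips thousands separators before converting to numbers. It assumes the profile CSV has a `Characteristic` column holding exactly these labels, which is worth confirming with `df.head()`. If a label occurs more than once, the `occurrence` argument must be set so that it picks the intended row. The helpers `find_row`, `to_number`, and `build_profile` are hypothetical, and the small `demo` frame only mimics the layout with made-up values.

```python
import pandas as pd


def find_row(frame: pd.DataFrame, label: str, occurrence: int = 0):
    """Return the index of the `occurrence`-th row whose Characteristic equals `label`."""
    # Leading/trailing whitespace in the labels is assumed to be insignificant
    mask = (frame["Characteristic"].str.strip() == label).to_numpy()
    matches = frame.index[mask]
    if len(matches) <= occurrence:
        raise KeyError(f"{label!r} (occurrence {occurrence}) not found")
    return matches[occurrence]


def to_number(values: pd.Series) -> pd.Series:
    """Strip thousands separators and convert to numbers (unparseable cells -> NaN)."""
    return pd.to_numeric(values.astype(str).str.replace(",", "", regex=False), errors="coerce")


def build_profile(frame: pd.DataFrame, neighbourhood_cols) -> pd.DataFrame:
    """One row per neighbourhood with numeric population and average income."""
    pop_row = find_row(frame, "Population, 2016")
    income_row = find_row(frame, "Total income: Average amount ($)")
    return pd.DataFrame(
        {
            "Population_2016": to_number(frame.loc[pop_row, neighbourhood_cols]),
            "Income_2016": to_number(frame.loc[income_row, neighbourhood_cols]),
        }
    )


# Made-up frame mimicking the profile layout: one row per characteristic
demo = pd.DataFrame(
    {
        "Characteristic": ["Neighbourhood Number", "Population, 2016", "Total income: Average amount ($)"],
        "Neighbourhood A": ["1", "20,000", "45,000"],
        "Neighbourhood B": ["2", "15,000", "38,000"],
    }
)
profile = build_profile(demo, ["Neighbourhood A", "Neighbourhood B"])
print(profile.sort_values("Income_2016"))
```

Against the real data this would be called as `build_profile(df, Neighbourhoods)`, provided the label assumptions hold.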