# Capstone Project - (Week 1)
### Applied Data Science Capstone by IBM

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, our main focus is to find a suitable and **popular business** that is likely to succeed in the **city of Toronto.**

This project will be particularly **beneficial for entrepreneurs** who want to start a business but don't have a specific idea in mind.
I plan to find popular businesses in **multiple neighborhoods** of Toronto and see how many of them there are in each area. We'll also do a **frequency count of popular venues** in Toronto (to see what people choose).

After that, we'll identify neighborhoods in Toronto where those popular businesses are either absent or rare. We'll then choose a neighborhood that has no such venue and is farthest from the existing ones, so that a business set up there has a good chance of success.
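
The "farthest from the popular venues" idea can be sketched as a nearest-venue distance per neighborhood. This is only a minimal illustration, assuming great-circle (haversine) distance is an acceptable measure; the helper name `haversine_km`, the sample coordinates, and the single sample venue are hypothetical stand-ins, not the project's final method.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Hypothetical sample: neighborhood centres (lat, lon).
neighborhoods = {
    'Lawrence Park': (43.72802, -79.38879),
    'Roselawn': (43.711695, -79.416936),
    'Davisville North': (43.712751, -79.390197),
}

# Hypothetical sample: locations of venues in one popular category.
popular_venues = [(43.712641, -79.391557)]

# Distance from each neighborhood to its nearest popular venue.
nearest_km = {
    name: min(haversine_km(lat, lon, v_lat, v_lon) for v_lat, v_lon in popular_venues)
    for name, (lat, lon) in neighborhoods.items()
}

# The candidate location is the neighborhood whose nearest venue is farthest away.
farthest = max(nearest_km, key=nearest_km.get)
print(farthest, round(nearest_km[farthest], 2))
```

With real data, `popular_venues` would hold every venue of the chosen category, and the ranking would be repeated per category.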

This will also be **beneficial for the community** of that neighborhood, since residents will no longer have to travel far for something that is now available in their own neighborhood.

## Data <a name="data"></a>

According to the problem statement defined above, we'll get our data in the following steps:
1. Get the list of boroughs and neighborhoods from the **internet**
2. Get the coordinates of those neighborhoods from **geocode** or another source
3. Get the list of venues (max. 100 per neighborhood) within a 500 m radius of each neighborhood using **Foursquare**

Execution strategy using the data described above:
Once we have the venue categories within a 500 m radius of every neighborhood in Toronto, we will:
* Group the data by venue category (to count the number of venues in each category)
* Analyze popular venues in the city using a bar graph (showing each category and its venue count)
* Choose the top 3 feasible businesses (after sorting the categories by their frequency in the city)
* Mark those businesses on the city map using folium to see how the venues are spread across the city and to find spots that are far from them
* Identify potential neighborhoods where the business can be started; at that point the model is ready to suggest start-up ideas to entrepreneurs
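
The grouping and top-3 steps above could look roughly like the following. It is a minimal sketch on a small hypothetical table (`sample_venues`) that only mimics the `Neighborhood` and `Venue Category` columns; the real analysis would run on the full venue table.

```python
import pandas as pd

# Hypothetical sample with the same column names as the venue table.
sample_venues = pd.DataFrame({
    'Neighborhood': ['Lawrence Park', 'Lawrence Park', 'Roselawn',
                     'Davisville North', 'Davisville North'],
    'Venue Category': ['Park', 'Swim School', 'Garden', 'Breakfast Spot', 'Park'],
})

# Count venues per category across the whole city, most frequent first.
category_counts = (
    sample_venues.groupby('Venue Category')
    .size()
    .sort_values(ascending=False)
)

# Keep the three most frequent categories as candidate business ideas.
top_categories = category_counts.head(3)
print(top_categories)
```

The same `category_counts` series can be passed to `category_counts.plot(kind='bar')` for the bar graph, assuming matplotlib is available.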

### Getting the list of neighborhoods

In [3]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

In [4]:
# The code was removed by Watson Studio for sharing.

In [5]:
df_data_0.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M4N,Central Toronto,Lawrence Park
1,M5N,Central Toronto,Roselawn
2,M4P,Central Toronto,Davisville North
3,M5P,Central Toronto,Forest Hill North & West
4,M4R,Central Toronto,North Toronto West


### Getting the coordinates of neighborhoods

Importing essential libraries

In [6]:
import numpy as np 

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

from geopy.geocoders import Nominatim 

import requests 
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes 
import folium 

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    ------------------------------------------------------------
                       

In [7]:
# The code was removed by Watson Studio for sharing.

In [8]:
df_data.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Let's merge the dataframes to get a single table

In [9]:
df_merge = pd.merge(left=df_data_0, right=df_data, left_on='PostalCode', right_on='PostalCode')
df_merge.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M5N,Central Toronto,Roselawn,43.711695,-79.416936
2,M4P,Central Toronto,Davisville North,43.712751,-79.390197
3,M5P,Central Toronto,Forest Hill North & West,43.696948,-79.411307
4,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
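
`pd.merge` performs an inner join by default, so any postal code missing from the coordinates table is silently dropped. A hedged sketch of how one might check for that, using small hypothetical frames (`neighborhoods`, `coordinates`) and a made-up postal code `M9Z`:

```python
import pandas as pd

# Hypothetical sample tables; 'M9Z' is a made-up code with no coordinates.
neighborhoods = pd.DataFrame({
    'PostalCode': ['M4N', 'M5N', 'M9Z'],
    'Neighborhood': ['Lawrence Park', 'Roselawn', 'Hypothetical'],
})
coordinates = pd.DataFrame({
    'PostalCode': ['M4N', 'M5N'],
    'Latitude': [43.72802, 43.711695],
    'Longitude': [-79.38879, -79.416936],
})

# A left join with indicator=True keeps every neighborhood and records
# in the '_merge' column whether a matching coordinate row was found.
checked = pd.merge(neighborhoods, coordinates, on='PostalCode',
                   how='left', indicator=True)

# Postal codes that would have been dropped by the default inner join.
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```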


Let's make a table for the location of Boroughs (we'll use it to visualize Borough locations later)

In [60]:
df = pd.DataFrame(df_merge['Borough'].unique())
df = df.rename(columns={0:'Borough'})
df

Unnamed: 0,Borough
0,Central Toronto
1,Downtown Toronto
2,East Toronto
3,East York
4,Etobicoke
5,Mississauga
6,North York
7,Scarborough
8,West Toronto
9,York


Let's add the coordinates of Borough in the table made above

In [76]:
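# Create the geocoder once and reuse it for every borough.
geolocator = Nominatim(user_agent="ny_explorer")
la = []
lo = []
for borough in df['Borough']:
    # Note: the query only says 'CA', so an ambiguous name can resolve
    # outside the Toronto area (see 'York' in the output below).
    address = '{}, CA'.format(borough)
    location = geolocator.geocode(address)
    la.append(location.latitude)
    lo.append(location.longitude)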

In [77]:
latit = pd.DataFrame(la)
latit = latit.rename(columns={0:'Latitude'})
longit = pd.DataFrame(lo)
longit = longit.rename(columns={0:'Longitude'})

In [78]:
df_borough = pd.merge(left=df, right=latit, left_index=True, right_index=True)

In [79]:
df_borough = pd.merge(left=df_borough, right=longit, left_index=True, right_index=True)

In [99]:
df_borough

Unnamed: 0,Borough,Latitude,Longitude
0,Central Toronto,43.72802,-79.38879
1,Downtown Toronto,43.656322,-79.380916
2,East Toronto,43.62479,-79.393492
3,East York,43.699971,-79.33252
4,Etobicoke,43.671459,-79.552492
5,Mississauga,43.590338,-79.645729
6,North York,43.754326,-79.449117
7,Scarborough,43.773077,-79.257774
8,West Toronto,43.651571,-79.48445
9,York,46.088526,-66.930803
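
The coordinates returned for York (46.088526, -66.930803) lie far outside the Toronto area, which suggests the geocoder matched a different place with the same name. One way to catch this kind of result is a simple bounding-box check. The sketch below uses a few rows copied from the table above; the box limits are rough illustrative assumptions, not official city boundaries.

```python
import pandas as pd

# A few rows copied from the borough table above.
borough_coords = pd.DataFrame({
    'Borough': ['Downtown Toronto', 'Etobicoke', 'Mississauga', 'York'],
    'Latitude': [43.656322, 43.671459, 43.590338, 46.088526],
    'Longitude': [-79.380916, -79.552492, -79.645729, -66.930803],
})

# Assumed rough bounding box around the Toronto area (illustrative values).
LAT_MIN, LAT_MAX = 43.4, 44.0
LON_MIN, LON_MAX = -80.0, -79.0

# True for rows whose coordinates fall inside the box.
inside = (
    borough_coords['Latitude'].between(LAT_MIN, LAT_MAX)
    & borough_coords['Longitude'].between(LON_MIN, LON_MAX)
)

# Boroughs whose geocoded location should be reviewed by hand.
suspect = borough_coords.loc[~inside, 'Borough'].tolist()
print(suspect)
```

Rows flagged this way could then be geocoded again with a more specific query string, for example one that names the city and province.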


#### Let's visualize the Boroughs on the map with the neighborhoods

In [73]:
address = 'Central Toronto, Toronto, CA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [103]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
    
for lat, lng, borough in zip(df_borough['Latitude'], df_borough['Longitude'], df_borough['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=15,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

for lat, lng, borough, neighborhood in zip(df_merge['Latitude'], df_merge['Longitude'], df_merge['Borough'], df_merge['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    

map_toronto

In [115]:
from folium import plugins

# start again with a clean copy of the map of Toronto
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# instantiate a marker cluster object for the neighborhoods in the dataframe
marker_cluster = plugins.MarkerCluster().add_to(map_toronto)

# loop through the dataframe and add each neighborhood to the marker cluster
for lat, lng, label in zip(df_merge['Latitude'], df_merge['Longitude'], df_merge['Borough']):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(marker_cluster)

# display map
map_toronto

### Now let's get the list of venues for every neighborhood from Foursquare

Storing the Foursquare credentials

In [116]:
# The code was removed by Watson Studio for sharing.

Function for getting nearby venues for all neighborhoods

In [117]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
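
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for any venue returned without a category. Whether the API ever returns such venues is an assumption here. A minimal sketch of a more defensive parsing step, with a hypothetical helper `extract_venue_rows` and hand-written sample items shaped like the `items` list used above:

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn response items into row tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of failing.
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            category,
        ))
    return rows

# Hand-written sample items; the second one has no category on purpose.
sample_items = [
    {'venue': {'name': 'Lawrence Park Ravine',
               'location': {'lat': 43.726963, 'lng': -79.394382},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.7, 'lng': -79.4},
               'categories': []}},
]

rows = extract_venue_rows('Lawrence Park', 43.72802, -79.38879, sample_items)
print(rows)
```

Keeping the parsing in a separate function also makes it possible to test it without calling the API.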

In [119]:
toronto_venues = getNearbyVenues(names=df_merge['Neighborhood'],
                                   latitudes=df_merge['Latitude'],
                                   longitudes=df_merge['Longitude']
                                  )

Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
North Toronto West
The Annex, North Midtown, Yorkville
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley
The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Business reply mail Processing Centre
Parkview Hill, Woodbine Gardens
Woodbine Heights
Leaside
Tho

In [120]:
print(toronto_venues.shape)
toronto_venues.head()

(2133, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,43.728532,-79.38286,Swim School
2,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
3,Roselawn,43.711695,-79.416936,Rosalind's Garden Oasis,43.712189,-79.411978,Garden
4,Davisville North,43.712751,-79.390197,Homeway Restaurant & Brunch,43.712641,-79.391557,Breakfast Spot


From the above operation we can see that we have **2133 venues in total**.

Our data is now ready for the planned analysis: grouping the venues by category and then focusing on how often the top venue categories occur, both across the city and within individual neighborhoods.
We'll do that in the **analysis section** after laying out our approach in the **methodology section.**

## Methodology <a name="methodology"></a>

## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>

In [None]:
# The code was removed by Watson Studio for sharing.