# Capstone Notebook

This notebook will contain all work for the Applied Data Science Capstone course.

In [8]:
import pandas as pd
import numpy as np
import requests
import random

!pip install geopy
from geopy.geocoders import Nominatim

from pandas import json_normalize

!pip install folium
import folium



In [None]:
print("Hello Capstone Project Course!")

Hello Capstone Project Course!


## Introduction/Business Problem
If you are an entrepreneur looking to open a Cafe in Toronto, Canada and sell coffee, you will face tough competition in a potentially saturated market. Therefore, a key question is:

**Where is the best place to open up a Cafe?**

Answering this question tells us where competition is weakest: opening a Cafe in a non-saturated area may give a new business the best opportunity to thrive without heavy competition. The question is difficult to answer directly, which raises a second one:

**How do we measure/determine which is the best place to start a Cafe?**

Data science and exploration of location data will be key to finding the solution.

## Data
To solve the problem, we will need the following data:


*   List of neighbourhoods in Toronto, Canada. This limits the scope of the project to the city of Toronto. This data can be sourced by web scraping Wikipedia pages and/or from a location data service.
*   Latitude and longitude coordinates of these neighbourhoods for visualisation, clustering and other purposes. This data can be sourced from location data services.
*   Venue data for identifying neighbourhoods that are saturated with Cafes. This data can be sourced from Foursquare's API.

Combined, this data will show how saturated specific neighbourhoods are with venues and where potential vacancies lie.
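
One way to make the saturation question measurable is to count, for each neighbourhood, how many of the returned venues are cafes and what share of all venues they represent. The sketch below assumes a venues table with `Neighborhood`, `Venue` and `Venue Category` columns (the layout `getNearbyVenues` builds in the Methodology section) and assumes that Foursquare labels cafes as `'Café'` or `'Coffee Shop'`; `cafe_saturation` is a hypothetical helper, not part of any library.

```python
import pandas as pd


def cafe_saturation(venues, cafe_categories=('Café', 'Coffee Shop')):
    """Summarise, per neighbourhood, how many venues were found, how many of
    them are cafes, and what share of all venues the cafes make up.

    `venues` is assumed to have 'Neighborhood', 'Venue' and 'Venue Category'
    columns; `cafe_categories` is an assumed list of category labels.
    """
    # flag each venue whose category is one of the assumed cafe labels
    is_cafe = venues['Venue Category'].isin(cafe_categories)
    summary = (venues.assign(is_cafe=is_cafe)
                     .groupby('Neighborhood')
                     .agg(total_venues=('Venue', 'count'),
                          cafes=('is_cafe', 'sum')))
    summary['cafe_share'] = summary['cafes'] / summary['total_venues']
    # least saturated neighbourhoods first
    return summary.sort_values('cafe_share')
```

Neighbourhoods with plenty of venues but a low `cafe_share` would be natural first candidates to investigate; the category list is an assumption worth checking against the categories the API actually returns.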



## Methodology
### List of neighbourhoods in Toronto


In [9]:
# Scrape the table of postal codes starting with M (Toronto) from a fixed Wikipedia revision
df = pd.read_html("https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=945633050")[0]

In [10]:
df.head()

|   | Postcode | Borough | Neighbourhood |
|---|----------|---------|---------------|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


In [11]:
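df = df[df.Borough != 'Not assigned'].copy()  # drop rows with no borough
# A neighbourhood that is 'Not assigned' takes the name of its borough
df['Neighbourhood'] = df['Neighbourhood'].mask(df['Neighbourhood'] == 'Not assigned', df['Borough'])
df.head()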

|   | Postcode | Borough | Neighbourhood |
|---|----------|---------|---------------|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
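
The output above still lists M6A twice, once per neighbourhood, so before attaching coordinates it helps to collapse rows that share a postcode into one. Below is a minimal sketch of the whole cleaning step, run on the rows shown in the `df.head()` outputs above; `clean_postcodes` is a hypothetical helper name, and joining names with `', '` is an assumed convention.

```python
import pandas as pd


def clean_postcodes(raw):
    """Drop unassigned boroughs, fill unassigned neighbourhoods with the
    borough name, and join neighbourhoods sharing a postcode into one row."""
    df = raw[raw['Borough'] != 'Not assigned'].copy()
    unassigned = df['Neighbourhood'] == 'Not assigned'
    df.loc[unassigned, 'Neighbourhood'] = df.loc[unassigned, 'Borough']
    # one row per (Postcode, Borough), neighbourhood names comma-separated
    return (df.groupby(['Postcode', 'Borough'])['Neighbourhood']
              .agg(', '.join)
              .reset_index())


# The first rows shown in the df.head() outputs above
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M2A', 'M3A', 'M4A', 'M5A', 'M6A', 'M6A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York',
                'Downtown Toronto', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods',
                      'Victoria Village', 'Harbourfront',
                      'Lawrence Heights', 'Lawrence Manor'],
})
print(clean_postcodes(raw))
```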


### Geospatial Data

In [None]:
!wget -q -O 'geospatial_data.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

In [None]:
df_geo = pd.read_csv('geospatial_data.csv')
df_geo.columns

In [None]:
df_geo = df_geo.rename({'Postal Code':'Postcode'}, axis=1)

In [None]:
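df_grouped = df.groupby(['Postcode', 'Borough'])['Neighbourhood'].agg(', '.join).reset_index()  # one row per postcode
df2 = pd.merge(df_grouped, df_geo, on='Postcode')  # attach the coordinates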

In [None]:
df2.head()
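
Because `pd.merge` defaults to an inner join, postcodes missing from the geospatial file would silently disappear from the result. Here is a sketch of a stricter merge, assuming both tables share a `Postcode` column after the rename above; `merge_with_coordinates` is a hypothetical name. It relies on pandas' `validate` and `indicator` options to fail on duplicate or unmatched postcodes.

```python
import pandas as pd


def merge_with_coordinates(neighbourhoods, geo):
    """Attach the coordinate columns of `geo` to each postcode and raise if
    any postcode is duplicated or has no match in the geospatial table."""
    merged = pd.merge(neighbourhoods, geo, on='Postcode', how='left',
                      validate='one_to_one',  # raises MergeError on duplicates
                      indicator=True)         # adds a '_merge' status column
    missing = merged.loc[merged['_merge'] == 'left_only', 'Postcode']
    if not missing.empty:
        raise ValueError('No coordinates for postcodes: ' + ', '.join(missing))
    return merged.drop(columns='_merge')
```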

### Venue Data


In [None]:
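import os

# Foursquare credentials: read them from environment variables rather than
# hard-coding secrets in the notebook (the variable names are placeholders)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')  # your Foursquare Secret
VERSION = '20180604'  # API version date
LIMIT = 30  # maximum number of venues returned per request
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: (hidden)')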

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=LIMIT):
    """Query the Foursquare 'explore' endpoint around each neighbourhood
    centre and return one row per venue found."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': limit,
        }

        # make the GET request and extract the recommended venues
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            # some venues have no category; avoid an IndexError
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

    columns = ['Neighborhood',
               'Neighborhood Latitude',
               'Neighborhood Longitude',
               'Venue',
               'Venue Latitude',
               'Venue Longitude',
               'Venue Category']
    # passing the columns up front also works when no venues were found
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=columns)

    print('Found {} venues in {} neighborhoods.'.format(nearby_venues.shape[0], len(venues_list)))

    return nearby_venues
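
To check the JSON parsing without spending API calls, the per-request extraction can be factored into a pure function and fed a hand-written payload. This is a sketch: the payload layout is the one `getNearbyVenues` indexes into (`response` → `groups[0]` → `items` → `venue`), and `parse_explore_items` and `VENUE_COLUMNS` are hypothetical names.

```python
import pandas as pd

VENUE_COLUMNS = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
                 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']


def parse_explore_items(name, lat, lng, payload):
    """Flatten one 'explore' JSON payload into venue rows for a single
    neighbourhood, using the same columns as getNearbyVenues."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     # venues without a category get None instead of raising
                     categories[0]['name'] if categories else None))
    return pd.DataFrame(rows, columns=VENUE_COLUMNS)
```

Keeping the network call and the parsing separate means a malformed or empty response can be reproduced offline with a small dictionary.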