# This notebook will be mainly used for the capstone project

In [2]:
import numpy as np
import pandas as pd


from shapely.geometry import Point, Polygon
import geopandas as gpd #library to work with GeoJSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # transforming a JSON file into a pandas DataFrame

import matplotlib.pyplot as plt #library for plotting
!conda install -c conda-forge folium=0.5.0 --yes
import folium # library to create interactive maps

import googlemaps #library for Google Maps Service's API

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [3]:
pd.set_option('display.max_columns', None)

In [4]:
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


<h1> Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods (Week 1) </h1>

# Introduction/Business Problem

I’d like to move to another neighbourhood in Vancouver, BC. One important decision-making criterion is the walking distance to the places that I frequent most. However, I don't have an overview of which areas meet this need better than others.
  
The distances between the block that I live on and the following locations are relevant to me:

- beach
- school
- coffee shop
- restaurant
- park

# Data  
The following is an explanation and discussion of the data that will be used (including examples).

### Geolocation of all of Vancouver, BC's blocks  
The City of Vancouver publishes all its block numbers and the corresponding location data on its Open Data Portal: <a href='https://opendata.vancouver.ca/explore/dataset/block-numbers/information/?location=14,49.2706,-123.13172'>link</a>.

**Loading the data & checking the quality:**

In [5]:
bnr = gpd.read_file('data/block-numbers.geojson')

DriverError: data/block-numbers.geojson: No such file or directory

In [None]:
bnr.head(3)

In [None]:
bnr.isnull()['geometry'].value_counts()

**Converting the unit of the Coordinate Reference System (CRS) to metres so that we can calculate distances:**

In [None]:
#Inspecting the current CRS
bnr.crs

In [None]:
#Transforming the CRS to a system that uses metres as its unit of measurement
bnr_metres = bnr.to_crs(epsg=3153)
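
To see why the reprojection matters, here is a rough sketch of how many metres one degree covers at Vancouver's latitude. It assumes a spherical Earth with a mean radius of 6,371 km, and the helper name `metres_per_degree` is hypothetical. One degree of longitude spans noticeably fewer metres than one degree of latitude, so distances computed directly in degrees would be distorted:

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: mean Earth radius, spherical model


def metres_per_degree(latitude_deg):
    """Return (metres per degree of latitude, metres per degree of longitude)."""
    per_degree_lat = math.pi / 180 * EARTH_RADIUS_M
    # Meridians converge towards the poles, so longitude degrees shrink with cos(latitude)
    per_degree_lng = per_degree_lat * math.cos(math.radians(latitude_deg))
    return per_degree_lat, per_degree_lng


# Latitude taken from the Open Data Portal link above
lat_m, lng_m = metres_per_degree(49.2706)
print(f"1 degree of latitude  is about {lat_m / 1000:.1f} km")
print(f"1 degree of longitude is about {lng_m / 1000:.1f} km")
```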

In [None]:
#Visualizing the Points
bnr_metres.plot(figsize=(10, 10))

**Foursquare**  
The Foursquare API provides information on locations of interest (beaches, schools, coffee shops, restaurants, parks).

In [None]:
CLIENT_ID = 'OJUT1X3N551XHV3QJEO2DHG4D4HJPGH1OXE32VZFJQCNNSDY' # my Foursquare ID
CLIENT_SECRET = '3LTK240VBKNGYARSYYFM3ZVTHRXSTF0344QJDABG2MKWNTP5' # my Foursquare Secret
VERSION = '20191112'
LIMIT = 1000
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)
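
Hard-coding and printing API credentials makes them easy to leak when the notebook is shared. One possible alternative is to read them at runtime. This sketch assumes the keys are stored in environment variables; the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical:

```python
import os


def load_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names are an assumption for this sketch, not a Foursquare convention.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters so the full secret is never printed."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)
```

With this in place, the cell could call `CLIENT_ID, CLIENT_SECRET = load_credentials()` and print `mask(CLIENT_SECRET)` instead of the full value.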

In [None]:
#Location data of a block & search parameters
latitude = pd.Series(bnr['geometry'])[0].y
longitude = pd.Series(bnr['geometry'])[0].x

search_query = 'Coffee Shop'
radius = 15000

In [None]:
# create URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    latitude, 
    longitude, 
    VERSION, 
    search_query, 
    radius, 
    LIMIT
)

url # display URL
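
Because the query contains a space and the `ll` value contains a comma, the URL can also be assembled with the standard library so that every parameter is percent-encoded. This is a minimal sketch that assumes the same endpoint and parameter names as the cell above; the helper `build_search_url` and the placeholder credentials are hypothetical:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build the venue search URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return f'{SEARCH_ENDPOINT}?{urlencode(params)}'


# Placeholder credentials and coordinates, for illustration only
example_url = build_search_url('MY_ID', 'MY_SECRET', 49.2706, -123.13172,
                               '20191112', 'Coffee Shop', 15000, 1000)
print(example_url)
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids building the string by hand.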

In [None]:
#Sending the GET request and saving the results
results = requests.get(url).json()
results

In [None]:
# Assign relevant part of JSON to venues
venues = results['response']['venues']
# Transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.shape

In [None]:
# Keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so that later column assignments do not trigger SettingWithCopyWarning

# Clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

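# Function that extracts the category of the venue
def get_category_type(row):
    # Search results expose 'categories'; fall back to 'venue.categories' otherwise
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Venues without any category get None
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']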

# Filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
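
To illustrate what the category extraction does, here is a standalone sketch on two made-up venue records that mimic the shape of the `categories` field (a list of dictionaries with a `name` key). The records and the helper name `first_category_name` are hypothetical and are not real API output:

```python
def first_category_name(venue):
    """Return the name of the first category of a venue, or None if it has none."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else None


# Made-up records for illustration; real responses carry many more fields
sample_venues = [
    {'name': 'Example Cafe', 'categories': [{'name': 'Coffee Shop'}]},
    {'name': 'Uncategorised Place', 'categories': []},
]

for venue in sample_venues:
    print(venue['name'], '->', first_category_name(venue))
```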

In [None]:
dataframe_filtered[dataframe_filtered['categories']==search_query].head(3)

In [None]:
#Convert lat & lng into a Series with Point elements
point_ser = pd.Series([Point(x) for x in zip(dataframe_filtered['lng'], dataframe_filtered['lat'])])

In [None]:
#Add Point column
dataframe_filtered['geometry'] = point_ser
dataframe_filtered.head(3)

In [None]:
#Convert to GeoDataFrame
geo_df = gpd.GeoDataFrame(dataframe_filtered)
#Initialize the Coordinate Reference System
geo_df.crs = {'init' :'epsg:4326'}
geo_df.head(3)

**Converting the unit of the Coordinate Reference System (CRS) from degrees to metres so that we can interpret distances (<a href='https://epsg.io/3153'>link</a>):**

In [None]:
geo_df.to_crs(epsg=3153, inplace=True)

**Calculating the distance between a venue of interest and a specific block:**

In [None]:
#Example: Find the distance between the first block and the first result of the search query
distance = bnr_metres['geometry'][0].distance(geo_df['geometry'][0])
print(f"The {str.lower(search_query)} is {int(distance)} metres away from the block.")
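
As a rough cross-check on the projected distance, the great-circle (haversine) distance can be computed directly from latitude and longitude. This sketch assumes a spherical Earth with a mean radius of 6,371 km; the function name `haversine_m` and the sample coordinates are illustrative, and the result can differ slightly from the value obtained in EPSG:3153:

```python
import math


def haversine_m(lat1, lng1, lat2, lng2, radius_m=6_371_000):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# Two illustrative points 0.01 degrees of latitude apart (roughly 1.1 km)
check = haversine_m(49.27, -123.13, 49.28, -123.13)
print(f"Great-circle distance: {int(check)} metres")
```

If the two methods disagree by a large margin, the usual culprit is swapped latitude and longitude or a missing CRS on one of the GeoDataFrames.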

<h1> Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods (Week 2) </h1>

**Dataframe used for segmentation:**  
Blocks, distance to closest Beach, distance to closest School, ... 

**Approach for gathering all data required:**

1) Identify the location of all venues of interest  
1.1) Identify one location point per neighborhood  
1.2) Request all "Coffee Shop", "Restaurant", "School", "Beach", "Park" around that location (radius=5 km, limit=max)  
1.3) Merge with results of other neighborhoods into one DataFrame and remove duplicates  
  
2) For each block in Vancouver, identify the nearest venue per category  
2.1) For each block in Vancouver, calculate the distance to venue type 1, 2, 3...  
2.2) Identify the closest venue per venue type  
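
Steps 1.3 and 2 can be sketched without any geospatial library, assuming the coordinates are already projected to metres so that plain Euclidean distance applies. The venue records, their field names (`id`, `category`, `xy`) and the helper names are made up for illustration:

```python
import math


def deduplicate(venues):
    """Keep the first occurrence of each venue id (step 1.3)."""
    seen = {}
    for venue in venues:
        seen.setdefault(venue['id'], venue)
    return list(seen.values())


def nearest_per_category(block_xy, venues):
    """Return {category: (distance in metres, venue name)} for one block (step 2)."""
    nearest = {}
    for venue in venues:
        distance = math.hypot(block_xy[0] - venue['xy'][0], block_xy[1] - venue['xy'][1])
        best = nearest.get(venue['category'])
        if best is None or distance < best[0]:
            nearest[venue['category']] = (distance, venue['name'])
    return nearest


# Made-up projected coordinates in metres, for illustration only
venues = [
    {'id': 'a', 'name': 'Cafe A', 'category': 'Coffee Shop', 'xy': (300.0, 400.0)},
    {'id': 'b', 'name': 'Cafe B', 'category': 'Coffee Shop', 'xy': (30.0, 40.0)},
    {'id': 'b', 'name': 'Cafe B', 'category': 'Coffee Shop', 'xy': (30.0, 40.0)},
    {'id': 'c', 'name': 'Park C', 'category': 'Park', 'xy': (0.0, 100.0)},
]
unique_venues = deduplicate(venues)
result = nearest_per_category((0.0, 0.0), unique_venues)
print(result)
```

Repeating `nearest_per_category` for every block yields one row per block with one distance column per venue type, which is the shape of the segmentation dataframe described above.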

**1.1) Identifying one location point (centroid) per neighborhood**

In [None]:
area_names = bnr_metres['geo_local_area'].unique()
area_names

In [None]:
#Return the rows (blocks) of bnr_metres that belong to the i-th local area
def block_ser(i):
    return bnr_metres[bnr_metres['geo_local_area'] == area_names[i]]

In [None]:
def calc_centroid(i):    
    #Bounds =  minx, miny, maxx, maxy values
    bounds = block_ser(i)['geometry'].total_bounds
    #Midpoint of the bounding box, used as an approximation of the centroid
    return Point((bounds[2]+bounds[0])/2, (bounds[3]+bounds[1])/2)

In [None]:
calc_centroid(0)
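
Note that the midpoint of the bounding box is not necessarily the same as the mean of the block points: a few outlying blocks stretch the box but barely move the mean. A small sketch with made-up coordinates shows the difference (the helper names `bbox_centre` and `mean_point` are hypothetical):

```python
def bbox_centre(points):
    """Midpoint of the axis-aligned bounding box of a list of (x, y) points."""
    xs, ys = zip(*points)
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def mean_point(points):
    """Arithmetic mean of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Four clustered points plus one outlier, for illustration only
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (9.0, 9.0)]
print('bounding-box centre:', bbox_centre(points))
print('mean of points:     ', mean_point(points))
```

For the purpose of choosing one query location per neighbourhood, either point is likely adequate as long as the search radius covers the whole area.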

**1.2) Requesting all "Coffee Shop", "Restaurant", "School", "Beach", "Park" around that location (radius=5 km, limit=max)**