# The Battle of Neighborhoods - Week 1

## Introduction: Business Problem

### Problem Background and Description

According to the 2016 census by Statistics Canada, Vancouver is one of the most ethnically and linguistically diverse cities in Canada. With immigration continuing, the demand for diverse cuisines is on the rise. This report explores which neighborhoods in the city of Vancouver have the most and the best Italian restaurants. Based on the analysis, we can answer two questions: "Where should I open or invest in an Italian restaurant to succeed?" and "Where should I go for great Italian food?"

### Target Audience

Stakeholders who would like to start or expand an Italian restaurant will be interested in this report. The analysis recommends the best neighborhood in Vancouver in which to open a restaurant in order to maximize profit. The report also helps diners find great Italian food by locating the neighborhoods with highly rated Italian cuisine.

## Data Description

We will use the following datasets to analyze the city of Vancouver:

* To get the Vancouver neighborhood data, we first scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and wrangle the data. After obtaining a table of postal codes with their corresponding neighborhoods and boroughs, we transform it into a dataframe and process it.

* To use the Foursquare location data, we need the latitude and longitude coordinates of each neighborhood. A CSV file containing the geographical coordinates of each postal code is available at http://cocl.us/Geospatial_data. This completes our dataframe.

* Once we have the latitude and longitude data, we can use the Foursquare API to explore venue information for each neighborhood in the city of Vancouver. We are particularly interested in venues in the Italian restaurant category and their associated ratings, likes and tips.

# The Battle of Neighborhoods - Week 2

## Methodology

### Neighborhood candidates
Here we define the neighborhoods as a grid of cells covering most of the city of Vancouver: an area of approx. 10 x 10 kilometers centred on Queen Elizabeth Park, which we take as the center of the city. We first need the latitude and longitude coordinates of Queen Elizabeth Park, which we obtain with the **geopy** library in Python.

In [None]:
!conda install -c conda-forge geopy --yes

In [1]:
!pip install geopy



In [2]:
from geopy.geocoders import Nominatim

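def get_coordinate(address):
    # Geocode an address with Nominatim; return [None, None] if the lookup fails
    try:
        geolocator = Nominatim(user_agent="van_explorer")
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        return [latitude, longitude]
    except Exception:
        # Covers network errors and a None result (no match for the address)
        return [None, None]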

address = 'Queen Elizabeth Park, Vancouver, British Columbia'
center_van = get_coordinate(address)
latitude = center_van[0]
longitude = center_van[1]
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Queen Elizabeth Park, Vancouver, British Columbia are 49.24103355, -123.111959297168.


Now we need to create our neighborhood candidates, which are circular areas with a radius of approx. 250 meters, located within 5 kilometers of Queen Elizabeth Park. We lay out the neighborhood centers (approx. 500 meters apart) on a Cartesian 2D plane and then project their coordinates back onto the globe as latitude/longitude. Therefore, we need to define functions that transform coordinates in meters to latitude/longitude in degrees and the reverse. Here we use the **pyproj** library in Python.

In [3]:
!pip install pyproj



In [4]:
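from pyproj import Transformer
import math

# pyproj.transform is deprecated; Transformer objects are the replacement.
# EPSG:4326 is WGS84 latitude/longitude, EPSG:32610 is WGS84 / UTM zone 10N.
# always_xy=True keeps the (lon, lat) and (x, y) argument order used below.
utm_to_latlon = Transformer.from_crs("EPSG:32610", "EPSG:4326", always_xy=True)
latlon_to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32610", always_xy=True)

def xy_to_latlon(x, y):
    # UTM coordinates in meters -> [latitude, longitude] in degrees
    lon, lat = utm_to_latlon.transform(x, y)
    return [lat, lon]

def latlon_to_xy(lat, lon):
    # Latitude/longitude in degrees -> UTM [x, y] in meters
    x, y = latlon_to_utm.transform(lon, lat)
    return [x, y]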

def cal_distance(x1,y1,x2,y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx * dx + dy * dy)

We offset every other row so that every neighborhood center is equally distant from its neighboring centers. This leads to a **hexagonal grid of cells**.

In [5]:
van_center_x, van_center_y = latlon_to_xy(center_van[0], center_van[1]) 

x_min = van_center_x - 5000
x_num = 21
side = 500*math.sqrt(3)/2
y_num = 2*(int(5000/side)+1)-1
y_min = van_center_y - side * int(5000/side)

latitudes = []
longitudes = []
xs = []
ys = []
for i in range(0, y_num):
    y = y_min + i * side
    offset = 250 if i % 2 != 0 else 0
    for j in range(0, x_num):
        x = x_min + j * 500 + offset
        if (cal_distance(x, y, van_center_x, van_center_y) <= 5000):
            xs.append(x)
            ys.append(y)
            lat, lon = xy_to_latlon(x, y)
            latitudes.append(lat)
            longitudes.append(lon)

print(len(latitudes), 'candidate neighborhoods are generated.')        


364 candidate neighborhoods are generated.
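The claim that offsetting every other row makes all neighboring centers equidistant can be checked on a toy grid. The following is a minimal sketch, not part of the original notebook: the helper names `hex_grid` and `neighbors_at_spacing` are hypothetical, and the grid lives in a local metric frame rather than in UTM coordinates. It uses the same 500 m spacing, 250 m row offset and `500*sqrt(3)/2` row height as the cell above.

```python
import math

SPACING = 500.0  # center-to-center distance in meters, as in the grid above


def hex_grid(n_rows, n_cols, spacing=SPACING):
    """Return (x, y) centers of a small hexagonal grid in a local metric frame."""
    row_height = spacing * math.sqrt(3) / 2  # vertical distance between rows
    centers = []
    for i in range(n_rows):
        offset = spacing / 2 if i % 2 else 0.0  # shift every other row by half a cell
        for j in range(n_cols):
            centers.append((j * spacing + offset, i * row_height))
    return centers


def neighbors_at_spacing(centers, index, spacing=SPACING, tol=1e-6):
    """Count centers whose distance to centers[index] equals the spacing."""
    x0, y0 = centers[index]
    return sum(
        1
        for k, (x, y) in enumerate(centers)
        if k != index and abs(math.hypot(x - x0, y - y0) - spacing) < tol
    )


centers = hex_grid(3, 3)
middle = 4  # row 1, column 1 of the 3 x 3 toy grid
print(neighbors_at_spacing(centers, middle))  # prints 6
```

An interior center has six neighbors at exactly 500 m, which is the defining property of a hexagonal layout; with a plain square grid only four neighbors would sit at that distance.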


Now let's visualize these neighborhoods using the **folium** library in Python.

In [6]:
!pip install folium



In [7]:
import folium
map_van = folium.Map(location=center_van, zoom_start=13)
folium.Marker(center_van, popup = 'Queen Elizabeth Park').add_to(map_van)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=250, color='blue').add_to(map_van)
map_van

Once we have the latitude and longitude coordinates of our neighborhood candidates, we again use the geopy library to get their addresses.

In [8]:
def get_address(latitude, longitude):
    # Reverse-geocode a coordinate pair; return None if the lookup fails
    try:
        geolocator = Nominatim(user_agent="van_explorer")
        location = geolocator.reverse('{}, {}'.format(latitude, longitude))
        return location
    except Exception:
        return None

location = get_address(center_van[0], center_van[1])
print('Address of [{}, {}] is: {} '.format(latitude, longitude, location))

Address of [49.24103355, -123.111959297168] is: Cambie Village, Riley Park, Vancouver, Metro Vancouver Regional District, British Columbia, Canada 


In [9]:
addresses = []
for lat, lon in zip(latitudes, longitudes):
    location = get_address(lat, lon)
    if location is None:
        # Keep a placeholder so that addresses stays aligned with latitudes/longitudes
        address = 'No Address'
    else:
        address = location.address.replace(', British Columbia, Canada', '')
    addresses.append(address)
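The loop above sends one reverse-geocoding request per candidate, 364 in total. The public Nominatim service asks clients to stay at roughly one request per second, so spacing the calls out makes failed lookups less likely. The following is a minimal sketch of a throttling wrapper using only the standard library; `throttled` and `fake_reverse` are hypothetical names introduced for illustration, and the short delay in the demo is only there to keep the example fast. geopy also provides a `RateLimiter` helper in `geopy.extra.rate_limiter` that serves the same purpose.

```python
import time


def throttled(func, min_delay_seconds=1.0):
    """Wrap func so that consecutive calls start at least min_delay_seconds apart."""
    last_call = [None]  # start time of the previous call, None before the first call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


def fake_reverse(lat, lon):
    """Stand-in for a real reverse-geocoding call (hypothetical, no network access)."""
    return 'address near {:.3f}, {:.3f}'.format(lat, lon)


slow_reverse = throttled(fake_reverse, min_delay_seconds=0.01)
demo = [slow_reverse(49.241, -123.112), slow_reverse(49.198, -123.132)]
print(demo)
```

In the notebook, the same wrapper could be applied to `get_address` with `min_delay_seconds=1.0` before running the loop.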

Now we can put the addresses and their corresponding latitudes and longitudes into a dataframe.

In [13]:
import pandas as pd
df_locations = pd.DataFrame({'Address': addresses,
                                'Latitude': latitudes,
                                'Longitude': longitudes,
                                'X': xs,
                                'Y': ys})
df_locations.head()

| | Address | Latitude | Longitude | X | Y |
|---|---|---|---|---|---|
| 0 | Middle Arm Bridge, Airport Road, Burkeville, R... | 49.198166 | -123.132452 | 490350.603254 | 5449494.0 |
| 1 | River Road, Bridgeport, Golden Village, Richmo... | 49.198174 | -123.125589 | 490850.603254 | 5449494.0 |
| 2 | Univar Canada Ltd, 9800, Van Horne Way, Bridge... | 49.198181 | -123.118726 | 491350.603254 | 5449494.0 |
| 3 | Gilmore Court, Bridgeport, East Cambie, Richmo... | 49.198188 | -123.111863 | 491850.603254 | 5449494.0 |
| 4 | River Drive, Bridgeport, East Cambie, Richmond... | 49.198194 | -123.104999 | 492350.603254 | 5449494.0 |
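As a sanity check on the projection, adjacent centers in the first row of the table should lie roughly 500 meters apart on the ground. The sketch below estimates that distance from the latitude/longitude values of rows 0 and 1 using the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km (an assumption not made in the notebook, which works on the WGS84 ellipsoid), so the result is only expected to match the 500 m grid spacing to within a few meters.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Rows 0 and 1 of the dataframe shown above
d = haversine_m(49.198166, -123.132452, 49.198174, -123.125589)
print('Distance between the first two centers: {:.1f} m'.format(d))
```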


### Foursquare API

Now we can use the **Foursquare API** to explore neighborhoods in the city of Vancouver.

In [14]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
import requests
def getNearbyVenues(names, latitudes, longitudes, radius=250):
    venues_list = []
    LIMIT = 100
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        
        # Make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # Return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                                'Neighborhood Latitude',
                                'Neighborhood Longitude',
                                'Venue',
                                'Venue Latitude',
                                'Venue Longitude',
                                'Venue Category']
    return (nearby_venues)

In [15]:
vancouver_venues = getNearbyVenues(names=df_locations['Address'],
                                latitudes=df_locations['Latitude'],
                                longitudes=df_locations['Longitude'])


Middle Arm Bridge, Airport Road, Burkeville, Richmond, Metro Vancouver Regional District


KeyError: 'groups'
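The `KeyError: 'groups'` above means that the JSON returned for the first location had no `groups` entry under `response`, which is what happens when the API answers with an error payload (for example for invalid credentials or an exhausted quota) instead of venue results. A minimal sketch of a more defensive parser follows. The helper name `extract_items` is hypothetical, and the shape of the error payload used in the demo is an assumption made for illustration only.

```python
def extract_items(payload):
    """Return the venue items of an explore-style payload, or [] if they are missing."""
    response = payload.get('response', {}) if isinstance(payload, dict) else {}
    groups = response.get('groups') or []
    if not groups:
        return []
    return groups[0].get('items', [])


# Successful payload with one venue (structure as accessed in getNearbyVenues)
ok_payload = {
    'response': {
        'groups': [
            {'items': [{'venue': {'name': 'Example Venue'}}]}
        ]
    }
}

# Assumed shape of an error payload: no 'groups' key under 'response'
error_payload = {'meta': {'code': 400}, 'response': {}}

print(len(extract_items(ok_payload)))
print(len(extract_items(error_payload)))
```

With such a helper, `getNearbyVenues` could skip locations that return no items instead of aborting the whole loop, and the `meta` part of the payload could be printed to find out why the request failed.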

In [16]:
#print(vancouver_venues.shape)
vancouver_venues.head()

NameError: name 'vancouver_venues' is not defined

In [14]:
vancouver_res=vancouver_venues[vancouver_venues['Venue Category'].str.contains("Restaurant")]
vancouver_jp_res=vancouver_venues[(vancouver_venues['Venue Category']=='Japanese Restaurant') 
                                 | (vancouver_venues['Venue Category']=='Sushi Restaurant')
                                 |(vancouver_venues['Venue Category']=='Ramen Restaurant')]   
print('Total number of restaurants in Vancouver:',len(vancouver_res))
print('Total number of Japanese restaurants in Vancouver:',len(vancouver_jp_res))
print('Percentage of Japanese restaurants in Vancouver: {:.2f}%'.format(len(vancouver_jp_res)/len(vancouver_res)*100))
#vancouver_res.groupby(['Venue Category']).size()

NameError: name 'vancouver_venues' is not defined

In [49]:
import folium
map_van_jp_res = folium.Map(location=center_van, zoom_start=13)
folium.Marker(center_van, popup = 'Queen Elizabeth Park').add_to(map_van_jp_res)
for lat, lon in zip(vancouver_res['Venue Latitude'], vancouver_res['Venue Longitude']):
    folium.Circle([lat, lon], radius=100, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_van_jp_res)
for lat, lon in zip(vancouver_jp_res['Venue Latitude'], vancouver_jp_res['Venue Longitude']):
    folium.Circle([lat, lon], radius=100, color='red', fill=True, fill_color='red').add_to(map_van_jp_res)

map_van_jp_res

### Analysis

In [90]:
from folium import plugins
from folium.plugins import HeatMap

japanese_latlons = [[lat, lon] for lat, lon in zip(vancouver_jp_res['Venue Latitude'], vancouver_jp_res['Venue Longitude'])]

map_van_jp_res = folium.Map(location=center_van, zoom_start=13)
HeatMap(japanese_latlons).add_to(map_van_jp_res)

folium.Circle(center_van, radius=1000, fill=False, color='white').add_to(map_van_jp_res)
folium.Circle(center_van, radius=2000, fill=False, color='white').add_to(map_van_jp_res)
folium.Circle(center_van, radius=3000, fill=False, color='white').add_to(map_van_jp_res)

map_van_jp_res

| Neighborhood | Number of Japanese restaurants in area |
|---|---|
| 1018, West 50th Avenue, Oakridge, Vancouver, Metro Vancouver Regional District | 1 |
| 1020, Mainland Street, Davie Village, Yaletown, Vancouver, Metro Vancouver Regional District | 2 |
| 1101, West 8th Avenue, South Granville, Fairview, Vancouver, Metro Vancouver Regional District | 1 |
| 1292, Venables Street, Grandview-Woodland, Vancouver, Metro Vancouver Regional District | 1 |
| 1462, Burrard Street, Davie Village, West End, Vancouver, Metro Vancouver Regional District | 4 |
| 1649, East 11th Avenue, Grandview-Woodland, Vancouver, Metro Vancouver Regional District | 3 |
| 1680, West 8th Avenue, South Granville, Kitsilano, Vancouver, Metro Vancouver Regional District | 4 |
| 1833, West 4th Avenue, South Granville, Kitsilano, Vancouver, Metro Vancouver Regional District | 2 |
| 1887, Crowe Street, The Village Shops, Yaletown, Vancouver, Metro Vancouver Regional District | 1 |
| 196, East King Edward Avenue, Cambie Village, Riley Park, Vancouver, Metro Vancouver Regional District | 3 |


In [None]:
# Count Japanese restaurants per candidate neighborhood
jp_res_counts = (vancouver_jp_res.groupby('Neighborhood').size()
                 .rename('Number of Japanese restaurants in area')
                 .reset_index())
df_locations = df_locations.rename(columns={'Address': 'Neighborhood'})
# Left merge keeps every candidate; neighborhoods without a match get a count of 0
df_locations = df_locations.merge(jp_res_counts, on='Neighborhood', how='left').fillna(0)
df_locations['Number of Japanese restaurants in area'] = df_locations['Number of Japanese restaurants in area'].astype(int)