# The Battle of Neighborhoods - Week 1

## Introduction: Business Problem

### Problem Background and Description

According to the 2016 census by Statistics Canada, Vancouver is one of the most ethnically and linguistically diverse cities in Canada. With immigration continuing, demand for diverse cuisines is on the rise. This report explores the optimal location in the city of Vancouver for opening or investing in a Japanese restaurant. We define the specific conditions for an optimal location in the Data Description section below.

### Target Audience

Stakeholders who would like to start or expand a Japanese restaurant will be interested in this report. The analysis recommends optimal neighborhoods in Vancouver for opening a restaurant, with the aim of improving business profit. The report can also help people find neighborhoods with a high density of Japanese restaurants.

## Data Description

We will use the following datasets to analyze the city of Vancouver:

* **Neighborhood candidates** 

We first use the coordinates of the city center to generate a group of cells covering most of the city of Vancouver (a circular area with a radius of approx. 5 kilometers centred on the city center). These cells are circles with a radius of 250 meters, and we call them neighborhood candidates.

* **Foursquare API**

We then use the Foursquare API to explore neighborhood data. To use the Foursquare location data, we need the latitude and longitude coordinates of each neighborhood. Once we have them, we can use the Foursquare API to retrieve venue information for each neighborhood in the city of Vancouver. The extracted features include 'Venue', 'Venue Category', 'Venue Latitude', 'Venue Longitude', etc. We are particularly interested in venues in the Japanese restaurant category.

* **Analysis**

We explore the venues near each neighborhood center and focus on Japanese restaurants. We care about two properties of each neighborhood: the number of Japanese restaurants in the area and the distance from the neighborhood center to the closest Japanese restaurant. Finally, we will recommend locations with no more than one Japanese restaurant in the neighborhood area (approx. 250 meters) and no Japanese restaurant within 400 meters of the neighborhood center. In addition, these locations must be within 3 kilometers of the city center.

# The Battle of Neighborhoods - Week 2

## Methodology

### Neighborhood candidates
Here we define the neighborhoods in Vancouver as a grid of cells covering most of the city, namely a circular area with a radius of approx. 5 kilometers centred on the point we take as the city center: **Queen Elizabeth Park**. We first need the latitude and longitude coordinates of Queen Elizabeth Park, which we obtain with the **geopy** library in Python.

In [1]:
!pip install geopy

DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support


In [28]:
from geopy.geocoders import Nominatim

def get_coordinate(address):
    try:
        geolocator = Nominatim(user_agent="van_explorer")
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        return [latitude, longitude]
    except Exception:  # geocoder errors, or no match (location is None)
        return [None, None]

address = 'Queen Elizabeth Park, Vancouver, British Columbia'
center_van = get_coordinate(address)
latitude = center_van[0]
longitude = center_van[1]
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Queen Elizabeth Park, Vancouver, British Columbia are 49.24103355, -123.111959297.


Now we need to create our neighborhood candidates: circular areas with a radius of approx. 250 meters, laid out around Queen Elizabeth Park. We place the neighborhood centers approx. 500 meters apart on a Cartesian 2D plane and then project them back to latitude/longitude on the globe. Therefore, we need functions that transform coordinates in meters (UTM zone 10) to latitude/longitude in degrees and the reverse. Here we use the **pyproj** library in Python.
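As a rough cross-check on the meters-to-degrees conversion, the sketch below uses a local equirectangular approximation instead of the UTM projection. The function name `offset_latlon` and the spherical Earth radius are assumptions made for illustration. This is not the method used in this notebook, but over a few kilometers it should agree closely with it.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; assumes a spherical model

def offset_latlon(lat, lon, dx_m, dy_m):
    """Shift (lat, lon) by dx_m meters east and dy_m meters north."""
    dlat = math.degrees(dy_m / EARTH_RADIUS_M)
    # A degree of longitude shrinks with the cosine of the latitude
    dlon = math.degrees(dx_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return [lat + dlat, lon + dlon]

# Moving 500 m east of Queen Elizabeth Park changes only the longitude
lat0, lon0 = 49.24103355, -123.111959297
lat1, lon1 = offset_latlon(lat0, lon0, 500, 0)
print(lat1, lon1)
```

At this latitude a 500-meter step east corresponds to a little under 0.007 degrees of longitude, which is a handy sanity check for the grid coordinates generated later.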

In [3]:
!pip install pyproj



In [29]:
from pyproj import Proj, transform
import math

def xy_to_latlon(x, y):
    inProj = Proj(proj="utm", zone=10, datum='WGS84')
    outProj = Proj(proj="latlong", datum='WGS84')
    lon, lat = transform(inProj,outProj,x,y)
    return [lat, lon]
    
def latlon_to_xy(lat, lon):
    inProj = Proj(proj="latlong", datum='WGS84')
    outProj = Proj(proj="utm", zone=10, datum='WGS84')
    x, y = transform(inProj,outProj,lon, lat)
    return [x, y]

def cal_distance(x1,y1,x2,y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx * dx + dy * dy)

We offset every other row so that every neighborhood center is equally distant from its neighboring centers. This leads to a **hexagonal grid of cells**.
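To see why the row offset produces equal spacing, here is a minimal sketch on a tiny 3 x 3 grid. The helper name `hex_grid` is hypothetical; the spacing of 500 and the row height of 500·√3/2 match the values used in the notebook code.

```python
import math

def hex_grid(n_rows, n_cols, spacing=500.0):
    """Return (x, y) centers of a hexagonal grid with the given spacing."""
    row_height = spacing * math.sqrt(3) / 2  # vertical distance between rows
    points = []
    for i in range(n_rows):
        offset = spacing / 2 if i % 2 else 0.0  # shift every other row
        for j in range(n_cols):
            points.append((j * spacing + offset, i * row_height))
    return points

grid = hex_grid(3, 3)
center = grid[4]  # middle cell of the middle (offset) row
neighbor_dists = sorted(
    math.hypot(center[0] - q[0], center[1] - q[1]) for q in grid if q != center
)
print(neighbor_dists[:6])
```

The six closest cells all sit at the same distance as the horizontal spacing, which is the defining property of a hexagonal layout. Without the offset, the diagonal neighbors would be farther away than the horizontal ones.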

In [30]:
van_center_x, van_center_y = latlon_to_xy(center_van[0], center_van[1]) 

x_min = van_center_x - 5000
x_num = 21
side = 500*math.sqrt(3)/2
y_num = 2*(int(5000/side)+1)-1
y_min = van_center_y - side * int(5000/side)

latitudes = []
longitudes = []
xs = []
ys = []
for i in range(0, y_num):
    y = y_min + i * side
    offset = 250 if i % 2 != 0 else 0
    for j in range(0, x_num):
        x = x_min + j * 500 + offset
        if (cal_distance(x, y, van_center_x, van_center_y) <= 5000):
            xs.append(x)
            ys.append(y)
            lat, lon = xy_to_latlon(x, y)
            latitudes.append(lat)
            longitudes.append(lon)

print(len(latitudes), 'candidate neighborhoods are generated.')        


(364, 'candidate neighborhoods are generated.')


Now let's visualize these neighborhoods using the **folium** library in Python.

In [6]:
!pip install folium



In [7]:
import folium
map_van = folium.Map(location=center_van, zoom_start=13)
folium.Marker(center_van, popup = 'Queen Elizabeth Park').add_to(map_van)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=250, color='blue').add_to(map_van)
map_van



Once we have the latitude and longitude coordinates of our neighborhood candidates, we again use the geopy library to get their addresses.

In [31]:
def get_address(latitude, longitude):
    try:
        geolocator = Nominatim(user_agent="van_explorer")
        location = geolocator.reverse('{}, {}'.format(latitude, longitude))
        return location
    except Exception:  # network or geocoder errors
        return None

location = get_address(center_van[0], center_van[1])
print('Address of [{}, {}] is: {} '.format(latitude, longitude, location))

Address of [49.24103355, -123.111959297] is: Cambie Village, Riley Park, Vancouver, Metro Vancouver Regional District, British Columbia, Canada 


In [32]:
addresses = []
for lat, lon in zip(latitudes, longitudes):
    location = get_address(lat, lon)
    if location is None:
        # Keep a placeholder so addresses stays aligned with the coordinates
        address = 'No Address'
    else:
        address = location.address.replace(', British Columbia, Canada', '')
    addresses.append(address)
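The loop above sends one reverse-geocoding request per candidate, and the public Nominatim service asks clients to stay at or below one request per second. A minimal throttling sketch follows; the wrapper name `throttled` is hypothetical and not part of geopy.

```python
import time

def throttled(func, min_interval=1.0):
    """Wrap func so consecutive calls are at least min_interval seconds apart."""
    last_call = [None]  # a list so the closure can update it (works in Python 2 too)

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval - (time.time() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.time()
        return func(*args, **kwargs)

    return wrapper

# Demonstration with a stand-in function and a short interval
double_slowly = throttled(lambda value: value * 2, min_interval=0.05)
print(double_slowly(1), double_slowly(2))
```

With the notebook's helper, this would be used as `get_address_slow = throttled(get_address, min_interval=1.0)` and then called in place of `get_address` inside the loop.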

Now we can put the addresses and their corresponding latitudes and longitudes into a dataframe.

In [33]:
import pandas as pd
df_locations = pd.DataFrame({'Address': addresses,
                                'Latitude': latitudes,
                                'Longitude': longitudes,
                                'X': xs,
                                'Y': ys})
df_locations.head()

| | Address | Latitude | Longitude | X | Y |
|---|---|---|---|---|---|
| 0 | Middle Arm Bridge, Airport Road, Burkeville, R... | 49.198166 | -123.132452 | 490350.603254 | 5449494.0 |
| 1 | River Road, Bridgeport, Golden Village, Richmo... | 49.198174 | -123.125589 | 490850.603254 | 5449494.0 |
| 2 | Univar Canada Ltd, 9800, Van Horne Way, Bridge... | 49.198181 | -123.118726 | 491350.603254 | 5449494.0 |
| 3 | Gilmore Court, Bridgeport, East Cambie, Richmo... | 49.198188 | -123.111863 | 491850.603254 | 5449494.0 |
| 4 | River Drive, Bridgeport, East Cambie, Richmond... | 49.198194 | -123.104999 | 492350.603254 | 5449494.0 |


### Foursquare API

Now we can use **Foursquare API** to explore neighborhoods in the city of Vancouver.

In [34]:
import requests

# Credentials are placeholders; never publish real keys in a notebook
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

def getNearbyVenues(names, latitudes, longitudes, radius=250):
    venues_list = []
    LIMIT = 100
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        
        # Make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # Return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                                'Neighborhood Latitude',
                                'Neighborhood Longitude',
                                'Venue',
                                'Venue Latitude',
                                'Venue Longitude',
                                'Venue Category']
    return (nearby_venues)
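The function above indexes straight into the JSON response, so a single venue without a category or a response without groups would raise an error. The sketch below shows a more defensive extraction on a hand-made payload that mirrors the `response` → `groups` → `items` → `venue` nesting used above. The helper name `extract_venues` and the sample data are hypothetical, and whether such incomplete venues actually occur is an assumption.

```python
def extract_venues(response_json):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = response_json.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue carries no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-made payload for illustration only; not real Foursquare data
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Sushi',
               'location': {'lat': 49.25, 'lng': -123.11},
               'categories': [{'name': 'Sushi Restaurant'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 49.26, 'lng': -123.12},
               'categories': []}},
]}]}}

rows = extract_venues(sample)
print(rows)
```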

Apply the above function to get the venue information for each candidate neighborhood.

In [None]:
vancouver_venues = getNearbyVenues(names=df_locations['Address'],
                                latitudes=df_locations['Latitude'],
                                longitudes=df_locations['Longitude'])


Next we filter the venues down to the restaurant categories, and in particular to **Japanese Restaurant**. According to the Foursquare category list at https://developer.foursquare.com/docs/resources/categories, venues in the **Sushi Restaurant** and **Ramen Restaurant** categories are also categorized as Japanese restaurants.

In [36]:
vancouver_res = vancouver_venues[vancouver_venues['Venue Category'].str.contains("Restaurant")]
# Sushi and ramen restaurants count as Japanese restaurants
jp_categories = ['Japanese Restaurant', 'Sushi Restaurant', 'Ramen Restaurant']
vancouver_jp_res = vancouver_venues[vancouver_venues['Venue Category'].isin(jp_categories)]
print('Total number of restaurants in Vancouver:',len(vancouver_res))
print('Total number of Japanese restaurants in Vancouver:',len(vancouver_jp_res))
# Multiply by 100.0 first so Python 2 does not floor the integer division
print('Percentage of Japanese restaurants in Vancouver: {:.2f}%'.format(100.0*len(vancouver_jp_res)/len(vancouver_res)))
#vancouver_res.groupby(['Venue Category']).size()

('Total number of restaurants in Vancouver:', 617)
('Total number of Japanese restaurants in Vancouver:', 119)
Percentage of Japanese restaurants in Vancouver: 19.29%


The above counts imply that about 19.3% (119 of 617) of the restaurants found in Vancouver are Japanese restaurants. Now let's use the **folium** library to visualize the distribution of all restaurants and of Japanese restaurants in Vancouver.
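The share depends on true division: under Python 2, dividing two integers with `/` floors the result, which would print 0.00% for these counts. The short sketch below reproduces both behaviors in any Python version by using `//` for the floored case. The variable names are illustrative.

```python
n_restaurants = 617  # restaurants found above
n_japanese = 119     # Japanese, sushi and ramen restaurants found above

# Floor division mimics what Python 2 does for int / int
floored = n_japanese // n_restaurants * 100

# Multiplying by a float first forces true division in Python 2 and 3 alike
share = 100.0 * n_japanese / n_restaurants

print('Floored: {:.2f}%'.format(floored))
print('True share: {:.2f}%'.format(share))
```

An alternative under Python 2 is to put `from __future__ import division` at the top of the notebook, which makes `/` behave as it does in Python 3.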

In [15]:
import folium
map_van_jp_res = folium.Map(location=center_van, zoom_start=13)
folium.Marker(center_van, popup = 'Queen Elizabeth Park').add_to(map_van_jp_res)
for lat, lon in zip(vancouver_res['Venue Latitude'], vancouver_res['Venue Longitude']):
    folium.Circle([lat, lon], radius=100, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_van_jp_res)
for lat, lon in zip(vancouver_jp_res['Venue Latitude'], vancouver_jp_res['Venue Longitude']):
    folium.Circle([lat, lon], radius=100, color='red', fill=True, fill_color='red').add_to(map_van_jp_res)

map_van_jp_res

As shown above, many restaurants cluster in the **Downtown** and **Kitsilano** neighborhoods. Popular routes include **West Broadway**, **Granville Street**, **Kingsway**, **Main Street**, **Cambie Street** and **Commercial Drive**. Another interesting fact is that many restaurants cluster at street intersections. A typical example is the intersection in the **Kerrisdale** neighborhood.

### Analysis

We further use a heatmap to visualize the density of restaurants in Vancouver. Since we are interested in locations near the city center, we also show the 1, 2 and 3 kilometer boundaries in this heatmap.

In [17]:
from folium import plugins
from folium.plugins import HeatMap

res_latlons = [[lat, lon] for lat, lon in zip(vancouver_res['Venue Latitude'], vancouver_res['Venue Longitude'])]

map_van_res = folium.Map(location=center_van, zoom_start=13)
HeatMap(res_latlons).add_to(map_van_res)

folium.Circle(center_van, radius=1000, fill=False, color='white').add_to(map_van_res)
folium.Circle(center_van, radius=2000, fill=False, color='white').add_to(map_van_res)
folium.Circle(center_van, radius=3000, fill=False, color='white').add_to(map_van_res)

map_van_res

As we can see, within 3 kilometers of Queen Elizabeth Park, the majority of restaurants cluster on the north side of the city center, while the **north-west** and **south-west** areas have very low restaurant density. Now let's look more specifically at the density of Japanese restaurants in Vancouver.

In [19]:
japanese_latlons = [[lat, lon] for lat, lon in zip(vancouver_jp_res['Venue Latitude'], vancouver_jp_res['Venue Longitude'])]

map_van_jp_res = folium.Map(location=center_van, zoom_start=13)
HeatMap(japanese_latlons).add_to(map_van_jp_res)

folium.Circle(center_van, radius=1000, fill=False, color='white').add_to(map_van_jp_res)
folium.Circle(center_van, radius=2000, fill=False, color='white').add_to(map_van_jp_res)
folium.Circle(center_van, radius=3000, fill=False, color='white').add_to(map_van_jp_res)

map_van_jp_res

Similarly, the north side has a very high density of Japanese restaurants, while the **north-west**, **north-east** and **south** sides have a relatively low density. To gain more insight into the neighborhood data, we then calculate the **number of restaurants** in each neighborhood and the **closest distance** from the neighborhood center to any Japanese restaurant.
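The closest-distance calculation boils down to a minimum over Euclidean distances in projected (meter) coordinates. A minimal sketch, assuming the points are already projected; the helper name `nearest_distance` and the coordinates are made up for illustration.

```python
import math

def nearest_distance(point, others):
    """Smallest Euclidean distance from point to any entry in others.

    Returns None when others is empty rather than a sentinel value.
    """
    if not others:
        return None
    px, py = point
    return min(math.hypot(px - ox, py - oy) for ox, oy in others)

# Made-up projected coordinates in meters, for illustration only
cell_center = (0.0, 0.0)
restaurants_xy = [(300.0, 400.0), (-60.0, 80.0), (1000.0, 0.0)]
closest = nearest_distance(cell_center, restaurants_xy)
print(closest)
```

Projecting the restaurant coordinates once, before looping over the cells, avoids repeating the same projection for every candidate.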

In [37]:
vancouver_res_new = vancouver_res.groupby(['Neighborhood'])['Neighborhood'].count().to_frame()
vancouver_res_new = vancouver_res_new.rename(columns={'Neighborhood': 'Number of restaurants in area'})
df_locations = df_locations.rename(columns={'Address': 'Neighborhood'})
df_locations = df_locations.merge(vancouver_res_new, on='Neighborhood', how='left').fillna(int(0))
df_locations['Number of restaurants in area'] = df_locations['Number of restaurants in area'].astype(int)

In [38]:
def get_nearest_res(x, y):
    d_min = 10000
    for lat, lon in zip(vancouver_jp_res['Venue Latitude'], vancouver_jp_res['Venue Longitude']):
        x1, y1 = latlon_to_xy(lat, lon)
        d = cal_distance(x, y, x1, y1)
        if d < d_min:
            d_min = d
    return d_min

center_distance = []
for x, y in zip(df_locations['X'], df_locations['Y']):
    center_distance.append(cal_distance(x, y, van_center_x, van_center_y))

df_locations['Distance to city center'] = center_distance
# Take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
df_locations_3km = df_locations[df_locations['Distance to city center']<=3000].copy()

res_distance = []
for x, y in zip(df_locations_3km['X'], df_locations_3km['Y']):
    res_distance.append(get_nearest_res(x, y))
df_locations_3km['Distance to closest Japanese restaurant'] = res_distance


Now let's gather all the information about the neighborhoods within 3 kilometers of the city center in a dataframe.

In [39]:
df_locations_3km 

| | Neighborhood | Latitude | Longitude | X | Y | Number of restaurants in area | Distance to city center | Distance to closest Japanese restaurant |
|---|---|---|---|---|---|---|---|---|
| 65 | Oak Street, Marpole, Vancouver, Metro Vancouve... | 49.217645 | -123.129071 | 490600.603254 | 5.451659e+06 | 0 | 2883.140649 | 436.472607 |
| 66 | West 59th Avenue, Marpole, Vancouver, Metro Va... | 49.217653 | -123.122205 | 491100.603254 | 5.451659e+06 | 0 | 2704.163457 | 446.399993 |
| 67 | West 58th Avenue, Oakridge, Vancouver, Metro V... | 49.217660 | -123.115339 | 491600.603254 | 5.451659e+06 | 0 | 2610.076627 | 841.449684 |
| 68 | J.W. Sexsmith Elementary School, 7410, Columbi... | 49.217667 | -123.108474 | 492100.603254 | 5.451659e+06 | 0 | 2610.076627 | 820.601047 |
| 69 | East 58th Avenue, Sunset, Vancouver, Metro Van... | 49.217673 | -123.101608 | 492600.603254 | 5.451659e+06 | 2 | 2704.163457 | 697.946445 |
| 70 | East 58th Avenue, Sunset, Vancouver, Metro Van... | 49.217679 | -123.094742 | 493100.603254 | 5.451659e+06 | 2 | 2883.140649 | 894.914745 |
| 82 | 1498, West 54th Avenue, Oakridge, Vancouver, M... | 49.221528 | -123.139381 | 489850.603254 | 5.452092e+06 | 0 | 2947.456531 | 972.086457 |
| 83 | West 54th Avenue, Oakridge, Vancouver, Metro V... | 49.221536 | -123.132515 | 490350.603254 | 5.452092e+06 | 1 | 2633.913438 | 496.094314 |
| 84 | West 54th Avenue, Oakridge, Vancouver, Metro V... | 49.221544 | -123.125648 | 490850.603254 | 5.452092e+06 | 1 | 2384.848004 | 69.815365 |
| 85 | Ash Crescent, Oakridge, Vancouver, Metro Vanco... | 49.221551 | -123.118782 | 491350.603254 | 5.452092e+06 | 0 | 2222.048604 | 513.457693 |


Our two criteria for optimal locations are:
* No more than two restaurants within a 250-meter radius;
* No Japanese restaurants within a 400-meter radius.
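The two criteria can be written as a single predicate. The sketch below applies it to three rows taken from the table above (rows 65, 84 and 69); the function and constant names are hypothetical.

```python
# Thresholds taken from the two criteria above
MAX_RESTAURANTS = 2          # restaurants allowed within the 250 m cell
MIN_JAPANESE_DISTANCE = 400  # meters to the closest Japanese restaurant

def is_good_location(n_restaurants, dist_to_japanese):
    """True when a candidate satisfies both criteria."""
    return (n_restaurants <= MAX_RESTAURANTS
            and dist_to_japanese >= MIN_JAPANESE_DISTANCE)

# (restaurants in area, distance to closest Japanese restaurant)
candidates = [(0, 436.472607), (1, 69.815365), (2, 697.946445)]
flags = [is_good_location(n, d) for n, d in candidates]
print(flags)
```

Row 84 fails because a Japanese restaurant sits about 70 meters from the cell center, even though the cell itself contains only one restaurant.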

In [42]:
import numpy as np

good_location_count = np.array(df_locations_3km['Number of restaurants in area']<=2)
print('Number of locations with no more than two restaurants within 250m:', good_location_count.sum())

good_location_dis = np.array(df_locations_3km['Distance to closest Japanese restaurant']>=400)
print('Number of locations with no Japanese restaurants within 400m:', good_location_dis.sum())

good_location_both = np.logical_and(good_location_count, good_location_dis)
print('Number of locations satisfying the above two conditions:', good_location_both.sum())


('Number of locations with no more than two restaurants within 250m:', 104)
('Number of locations with no Japanese restaurants within 400m:', 81)
('Number of locations satisfying the above two conditions:', 78)


Now let's visualize the locations satisfying our criteria.

In [43]:
df_good_location = df_locations_3km[good_location_both]
map_van_filter = folium.Map(location=center_van, zoom_start=13)
folium.Marker(center_van, popup = 'Queen Elizabeth Park').add_to(map_van_filter)
for lat, lon in zip(df_good_location['Latitude'], df_good_location['Longitude']):
    folium.Circle([lat, lon], radius=50, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_van_filter)
map_van_filter

There are 78 locations of interest. Next, let's use the k-means algorithm to cluster these locations and obtain our final suggestions.
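For intuition about what the clustering step does, here is a minimal pure-Python sketch of Lloyd's algorithm, the basic k-means iteration. It is not scikit-learn's implementation: it seeds the centroids with the first `k` points instead of using k-means++ initialization, and the function name `kmeans_sketch` and the toy points are assumptions for illustration.

```python
import math

def kmeans_sketch(points, k, n_iter=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.hypot(p[0] - centroids[c][0],
                                                   p[1] - centroids[c][1]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                n = float(len(members))
                new_centroids.append((sum(m[0] for m in members) / n,
                                      sum(m[1] for m in members) / n))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids

# Two well-separated toy groups
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

Each resulting centroid is the mean position of its members, which is why the cluster centers used below need not coincide with any single candidate cell.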

In [44]:
from sklearn.cluster import KMeans
kclusters = 10
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_good_location[['X','Y']].values)
# assign() returns a new dataframe, which avoids SettingWithCopyWarning
df_good_location = df_good_location.assign(cluster=kmeans.labels_)


In [45]:
cluster_centers = [xy_to_latlon(val[0], val[1]) for val in kmeans.cluster_centers_]

for lat, lon in cluster_centers:
    # Keep the 1 km zones translucent so the candidate markers stay visible
    folium.Circle([lat, lon], radius=1000, color='blue', fill=True, fill_color='blue', fill_opacity=0.2).add_to(map_van_filter)

map_van_filter

We end up with 10 cluster centers, which are our final suggestions for optimal locations for opening or investing in a Japanese restaurant. Now let's obtain the addresses of these cluster centers from their latitude and longitude coordinates.

In [46]:
for lat, lon in cluster_centers:
    print(get_address(lat, lon))

West 26th Avenue, Cambie Village, Riley Park, Vancouver, Metro Vancouver Regional District, British Columbia, Canada
East 40th Avenue, South Hill, Vancouver, Metro Vancouver Regional District, British Columbia, Canada
West 52nd Avenue, Oakridge, Vancouver, Metro Vancouver Regional District, British Columbia, Canada
1498, Granville Street, South Granville, Shaughnessy, Vancouver, Metro Vancouver Regional District, British Columbia, Canada
Langara Golf Course, Langara Golf Course Access, Oakridge, Vancouver, Metro Vancouver Regional District, British Columbia, Canada
East 23rd Avenue, Kensington-Cedar Cottage, Vancouver, Metro Vancouver Regional District, British Columbia, Canada
Willow Street, Cambie Village, South Cambie, Vancouver, Metro Vancouver Regional District, British Columbia, Canada
East 53rd Avenue, Sunset, Vancouver, Metro Vancouver Regional District, British Columbia, Canada
Sherbrooke Street, South Hill, Vancouver, Metro Vancouver Regional District, British Columbia, Canad

## Results and Conclusion

In summary, about 19.3% of the restaurants found in the city of Vancouver are Japanese restaurants, including sushi and ramen restaurants. The majority of them cluster to the north of the city center, Queen Elizabeth Park. Downtown and Kitsilano are two popular neighborhoods, and popular routes include West Broadway, Granville Street, Kingsway, Main Street, Cambie Street and Commercial Drive.

We also provide ten suggested locations for stakeholders who are interested in opening or investing in a Japanese restaurant. These locations satisfy two conditions: there are no more than two restaurants within a radius of 250 meters, and there are no Japanese restaurants within a radius of 400 meters. In addition, they are all within 3 kilometers of the city center.

## References

The idea of choosing neighborhood candidates and the criteria for choosing optimal locations come from
https://cocl.us/coursera_capstone_notebook.