# Capstone Project - The Battle of Neighborhoods

## Introduction/Business Problem

#### "Would you recommend a location in Hong Kong to open a new cinema?"  
My boss, the stakeholder, wants to **open a new cinema as the company's new line of business** and has asked me this question.

He explains that, from the customer's point of view, watching a movie is one part of a whole afternoon or evening of activities, so a cinema should have **many restaurants and shopping places nearby**. Transportation is also an important factor: customers should be able to walk to the cinema within **5 minutes** from **public transport facilities** such as bus stops and metro stations.  

He wants me to concentrate on selecting the cinema location according to its surrounding environment; cinema facilities and rental prices are not my concern. He has listed his **top 10 favorite cinemas** in Hong Kong, each with a rating.  

My teammates and I have selected **5 possible locations** for the new cinema. Which location should be suggested to the stakeholder?

## Data

To answer the question, I need the following data.

#### 1. Geographic coordinates of Hong Kong cinemas

I need to **compare the 5 possible locations with the existing cinemas** in Hong Kong, so I need a list of Hong Kong cinemas and their geographic coordinates. Fortunately, both are available from the website https://hkmovie6.com/cinema .

In [1]:
# Import the necessary libraries
import json
import pandas as pd

In [2]:
# Download the cinema list
!wget -O hk_cinema_list.json https://hkmovie6.com/api/cinemas/lists

--2018-09-27 01:29:16--  https://hkmovie6.com/api/cinemas/lists
Resolving hkmovie6.com (hkmovie6.com)... 104.27.132.135, 104.27.133.135, 2606:4700:30::681b:8487, ...
Connecting to hkmovie6.com (hkmovie6.com)|104.27.132.135|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘hk_cinema_list.json’

hk_cinema_list.json     [ <=>                  ]  51.74K   291KB/s   in 0.2s   

2018-09-27 01:29:17 (291 KB/s) - ‘hk_cinema_list.json’ saved [52984]



In [3]:
# Convert the JSON data into a DataFrame
cinemas_json = None
with open('hk_cinema_list.json', 'r') as f:
    cinemas_json = json.load(f)
    
cinemas = []
for data in cinemas_json['data']:
    cinemas.append({
        'Name': data['name'],
        'Address': data['address'],
        'Latitude': data['lat'],
        'Longitude': data['lon']
    })
df_cinemas = pd.DataFrame(cinemas, columns=['Name','Address','Latitude','Longitude'])

In [4]:
print('There are {} cinemas in Hong Kong'.format(len(df_cinemas)))

There are 68 cinemas in Hong Kong


In [5]:
df_cinemas.head()

| | Name | Address | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Emperor Cinemas - Entertainment Building | 3/F, Emperor Cinemas Entertainment Building, 3... | 22.281453 | 114.15423 |
| 1 | The Coronet @ Emperor Cinemas - Entertainment ... | 3/F, Emperor Cinemas Entertainment Building, 3... | 22.281453 | 114.15423 |
| 2 | Emperor Cinemas - Tuen Mun | 3/F, New Town Commercial Arcade, 2 Tuen Lee St... | 22.390776 | 113.975983 |
| 3 | Broadway Circuit - CYBERPORT | Shop L1 - 3, Level 1, The Arcade, 100 Cyberpor... | 22.261067 | 114.129825 |
| 4 | Broadway Circuit - PALACE IFC | Podium L1, IFC Mall, 8 Finance Street, Central | 22.285545 | 114.157979 |


#### 2. Geographic coordinates of 5 possible cinema addresses
I also need the geographic coordinates of the 5 possible cinema locations. I can use the Google Maps geocoding API to find this information.

In [14]:
possible_locations = [
    { 'Location': 'L1', 'Address': 'Sau Mau Ping Shopping Centre, Sau Mau Ping'},
    { 'Location': 'L2', 'Address': 'Tuen Mun Ferry, Tuen Mun'},
    { 'Location': 'L3', 'Address': 'Un Chau Shopping Centre, Cheung Sha Wan'},
    { 'Location': 'L4', 'Address': 'Prosperity Millennia Plaza, North Point'},
    { 'Location': 'L5', 'Address': 'Tsuen Fung Centre Shopping Arcade, Tsuen Wan'},
]

In [7]:
# Install the Google Maps API client library
!pip install -U googlemaps

Collecting googlemaps
  Downloading https://files.pythonhosted.org/packages/5a/3d/13b4230f3c1b8a586cdc8d8179f3c6af771c11247f8de9c166d1ab37f51d/googlemaps-3.0.2.tar.gz
Requirement not upgraded as not directly required: requests<3.0,>=2.11.1 in /home/jupyterlab/conda/lib/python3.6/site-packages (from googlemaps) (2.18.4)
Requirement not upgraded as not directly required: chardet<3.1.0,>=3.0.2 in /home/jupyterlab/conda/lib/python3.6/site-packages (from requests<3.0,>=2.11.1->googlemaps) (3.0.4)
Requirement not upgraded as not directly required: idna<2.7,>=2.5 in /home/jupyterlab/conda/lib/python3.6/site-packages (from requests<3.0,>=2.11.1->googlemaps) (2.6)
Requirement not upgraded as not directly required: urllib3<1.23,>=1.21.1 in /home/jupyterlab/conda/lib/python3.6/site-packages (from requests<3.0,>=2.11.1->googlemaps) (1.22)
Requirement not upgraded as not directly required: certifi>=2017.4.17 in /home/jupyterlab/conda/lib/python3.6/site-packages (from requests<3.0,>=2.11.1->googlema

In [9]:
google_act = None
with open('google_map_act.json', 'r') as f:
    google_act = json.load(f)
    
GOOGLE_MAP_API_KEY = google_act['api_key']    

import googlemaps
gmaps = googlemaps.Client(key=GOOGLE_MAP_API_KEY)

In [10]:
# Geocode an address in Hong Kong and return its (latitude, longitude)
def getLatLng(address):
    results = gmaps.geocode('{}, Hong Kong'.format(address))
    # Use the first (best) match returned by the geocoder
    location = results[0]['geometry']['location']
    return (location['lat'], location['lng'])
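
The geocoder returns a list of matches, and indexing the first element fails if that list is empty. A minimal sketch of a more defensive extraction step is shown below; `extract_lat_lng` is a hypothetical helper name, and `sample_response` is a hand-written stand-in that only mimics the response shape used above, not real API output.

```python
def extract_lat_lng(geocode_results):
    """Return (lat, lng) from a geocoding response list, or None if it is empty."""
    if not geocode_results:
        return None
    location = geocode_results[0]['geometry']['location']
    return (location['lat'], location['lng'])

# Hand-written stand-in for a geocoding response (illustrative values only)
sample_response = [{'geometry': {'location': {'lat': 22.3, 'lng': 114.2}}}]

print(extract_lat_lng(sample_response))  # (22.3, 114.2)
print(extract_lat_lng([]))               # None
```

Returning `None` lets the caller decide whether to skip the address or stop, instead of failing with an `IndexError` halfway through the list of locations.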

Build the DataFrame of the 5 possible locations with their geographic coordinates

In [15]:
for loc in possible_locations:        
    (lat, lng) = getLatLng(loc['Address'])
    loc['Latitude'] = lat
    loc['Longitude'] = lng
    
df_possible_locations = pd.DataFrame(possible_locations, columns=['Location', 'Address', 'Latitude', 'Longitude'])
df_possible_locations

| | Location | Address | Latitude | Longitude |
|---|---|---|---|---|
| 0 | L1 | Sau Mau Ping Shopping Centre, Sau Mau Ping | 22.319503 | 114.232187 |
| 1 | L2 | Tuen Mun Ferry, Tuen Mun | 22.37178 | 113.966039 |
| 2 | L3 | Un Chau Shopping Centre, Cheung Sha Wan | 22.33728 | 114.156457 |
| 3 | L4 | Prosperity Millennia Plaza, North Point | 22.291698 | 114.208168 |
| 4 | L5 | Tsuen Fung Centre Shopping Arcade, Tsuen Wan | 22.372112 | 114.119317 |


#### 3. Favorite cinema list of stakeholder

The stakeholder's favorite cinema list is important information: I can **use it as a profile to select the best location**.  
The stakeholder further explains that ratings range from 1.0 (worst) to 5.0 (best).

In [16]:
# Names must match the 'Name' column of df_cinemas so the two can be joined later
boss_favorite = [
    {'Name': 'Broadway Circuit - MONGKOK', 'Rating': 4.5},
    {'Name': 'Broadway Circuit - The ONE', 'Rating': 4.5},
    {'Name': 'Grand Ocean', 'Rating': 4.3},
    {'Name': 'The Grand Cinema', 'Rating': 3.4},
    {'Name': 'AMC Pacific Place', 'Rating': 2.3},
    {'Name': 'UA IMAX @ Airport', 'Rating': 1.5},
]

df_boss_favorite = pd.DataFrame(boss_favorite, columns=['Name','Rating'])
df_boss_favorite

| | Name | Rating |
|---|---|---|
| 0 | Broadway Circuit - MONGKOK | 4.5 |
| 1 | Broadway Circuit - The ONE | 4.5 |
| 2 | Grand Ocean | 4.3 |
| 3 | The Grand Cinema | 3.4 |
| 4 | AMC Pacific Place | 2.3 |
| 5 | UA IMAX @ Airport | 1.5 |


#### 4. Eating, shopping and public transport facilities around the cinema
The recommended cinema location needs to have many eating and shopping venues nearby. Convenient public transport is also required.  
I can use the Foursquare API to find these venues around each location. 

A 5-minute walk is roughly 500 m, which I think is a suitable radius for searching nearby venues.
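
As a rough check on the 500 m radius, the sketch below converts a distance into walking time and computes the great-circle (haversine) distance between two coordinates. The 5 km/h walking speed is an assumption, not a figure from the stakeholder, and `haversine_m` and `walking_minutes` are hypothetical helper names. The example coordinates are the first and fifth cinemas in the cinema list above.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def walking_minutes(distance_m, speed_kmh=5.0):
    """Walking time in minutes, assuming a constant speed (5 km/h by default)."""
    metres_per_minute = speed_kmh * 1000 / 60
    return distance_m / metres_per_minute

# 500 m at the assumed 5 km/h takes about 6 minutes
print(round(walking_minutes(500), 1))

# Straight-line distance between two cinemas from the cinema list
distance = haversine_m(22.281453, 114.15423, 22.285545, 114.157979)
print('{:.0f} m, about {:.1f} minutes on foot'.format(distance, walking_minutes(distance)))
```

The haversine result is a straight-line distance, so the real walking route will usually be somewhat longer.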

The Foursquare API returns at most 50 results per request, so it is better to search for venues by category. The following categories will be used to find the target venues. Full list of categories: https://developer.foursquare.com/docs/resources/categories

In [18]:
cinema = df_cinemas.loc[0]

In [20]:
print('Use the first cinema "{}" in the list as an example to explore nearby venues'.format(cinema['Name']))

Use the first cinema "Emperor Cinemas - Entertainment Building" in the list as an example to explore nearby venues


In [30]:
fs_categories = {
    'Food': '4d4b7105d754a06374d81259',
    'Shop & Service': '4d4b7105d754a06378d81259',
    'Bus Stop': '52f2ab2ebcbc57f1066b8b4f',
    'Metro Station': '4bf58dd8d48988d1fd931735',
    'Nightlife Spot': '4d4b7105d754a06376d81259',
    'Arts & Entertainment': '4d4b7104d754a06370d81259'
}

In [21]:
# Install FourSquare client library
!pip install foursquare

Collecting foursquare
  Downloading https://files.pythonhosted.org/packages/7e/9f/21ef283c50eb576eaebb0525d8a988baffe4d59ac2bbb1f9d84434bdf616/foursquare-1%212016.9.12.tar.gz
Building wheels for collected packages: foursquare
  Running setup.py bdist_wheel for foursquare ... done
  Stored in directory: /home/jupyterlab/.cache/pip/wheels/c1/a4/ff/e07a4f4f02ef7189c5b1e0738a09131f6c5f2de811ce3a39a0
Successfully built foursquare
distributed 1.21.8 requires msgpack, which is not installed.
Installing collected packages: foursquare
Successfully installed foursquare-1!2016.9.12


In [25]:
fs_act = None
with open('fs_act.json') as json_data:
    fs_act = json.load(json_data)

In [26]:
import foursquare
from pandas.io.json import json_normalize # transform JSON data into a pandas DataFrame
fs = foursquare.Foursquare(client_id=fs_act['client_id'], client_secret=fs_act['client_secret'])

In [27]:
RADIUS = 500 # 500m, around 5 minutes walking time

In [31]:
# Search Foursquare for venues of one category near a location and return the result as a DataFrame
def venues_nearby(latitude, longitude, category):
    results = fs.venues.search(
        params={
            # Note: 'query' is matched against venue names, so results are
            # limited to venues whose name matches the category label
            'query': category,
            'll': '{},{}'.format(latitude, longitude),
            'radius': RADIUS,
            'categoryId': fs_categories[category]
        }
    )
    df = json_normalize(results['venues'])
    cols = ['Name', 'Latitude', 'Longitude', 'Tips', 'Users', 'Visits']
    if df.empty:
        # No venue found: return an empty DataFrame with the expected columns
        df = pd.DataFrame(columns=cols)
    else:
        df = df[['name', 'location.lat', 'location.lng', 'stats.tipCount', 'stats.usersCount', 'stats.visitsCount']]
        df.columns = cols
    print('{} "{}" venues are found within {}m of location'.format(len(df), category, RADIUS))
    return df
    

Find the MTR stations around the cinema

In [32]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Metro Station').head()

2 "Metro Station" venues are found within 500m of location


| | Name | Latitude | Longitude | Tips | Users | Visits |
|---|---|---|---|---|---|---|
| 0 | MTR Central Station (港鐵中環站) | 22.281911 | 114.158406 | 0 | 0 | 0 |
| 1 | MTR Hong Kong Station (港鐵香港站) | 22.284926 | 114.158314 | 0 | 0 | 0 |


Find the bus stops around the cinema

In [33]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Bus Stop').head()

30 "Bus Stop" venues are found within 500m of location


| | Name | Latitude | Longitude | Tips | Users | Visits |
|---|---|---|---|---|---|---|
| 0 | Seymour Road / Robinson Road Bus Stop 西摩道／羅便臣道巴士站 | 22.280465 | 114.150347 | 0 | 0 | 0 |
| 1 | Douglas Street Bus Stop 德忌利士街巴士站 | 22.283273 | 114.15691 | 0 | 0 | 0 |
| 2 | HSBC Headquarters Bus Stop 匯豐總行巴士站 | 22.280577 | 114.159446 | 0 | 0 | 0 |
| 3 | Dr. Sun Yat-Sen Museum Bus Stop 孫中山紀念館巴士站 | 22.279132 | 114.152743 | 0 | 0 | 0 |
| 4 | Hang Seng Bank Head Office Bus Stop 恒生銀行總行巴士站 | 22.283998 | 114.156038 | 0 | 0 | 0 |


Find eating places around the cinema

In [34]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Food').head()

25 "Food" venues are found within 500m of location


| | Name | Latitude | Longitude | Tips | Users | Visits |
|---|---|---|---|---|---|---|
| 0 | Mana! Fast Slow Food | 22.282921 | 114.154651 | 0 | 0 | 0 |
| 1 | Good Luck Thai Food (鴻運泰國美食) | 22.281165 | 114.155296 | 0 | 0 | 0 |
| 2 | Chiu Lung Fast Food (昭隆美食) | 22.282659 | 114.156753 | 0 | 0 | 0 |
| 3 | Soul Food | 22.281668 | 114.152495 | 0 | 0 | 0 |
| 4 | Sun Hing Fast Food (新興美食) | 22.282521 | 114.156717 | 0 | 0 | 0 |


In [35]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Arts & Entertainment').head()

12 "Arts & Entertainment" venues are found within 500m of location


| | Name | Latitude | Longitude | Tips | Users | Visits |
|---|---|---|---|---|---|---|
| 0 | Tai Kwun Centre for Heritage and Arts (大館古蹟及藝術館) | 22.281668 | 114.154216 | 0 | 0 | 0 |
| 1 | Wah Tung China Arts Limited (華通陶瓷藝術有限公司) | 22.283046 | 114.152723 | 0 | 0 | 0 |
| 2 | Ravenel Fine Arts Limited 睿芙奧 | 22.281819 | 114.156906 | 0 | 0 | 0 |
| 3 | KONG Arts Space | 22.281751 | 114.1533 | 0 | 0 | 0 |
| 4 | State Of The Arts | 22.282225 | 114.155006 | 0 | 0 | 0 |
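
The per-category searches above can be folded into one row of venue counts per location. The sketch below shows one possible way to do that; `count_venues` is a hypothetical helper, and `fake_search` is a stand-in with made-up results so the example runs without API credentials. In the notebook, a function such as `venues_nearby` could be passed as `search` instead, since `len()` also works on the DataFrame it returns.

```python
def count_venues(latitude, longitude, categories, search):
    """Return {category: number of venues} for one location.

    `search(latitude, longitude, category)` must return a sized collection of venues.
    """
    return {category: len(search(latitude, longitude, category))
            for category in categories}

def fake_search(latitude, longitude, category):
    # Stand-in for the real API call: made-up venues, for illustration only
    sample = {'Food': ['a', 'b', 'c'], 'Bus Stop': ['x']}
    return sample.get(category, [])

counts = count_venues(22.281453, 114.15423,
                      ['Food', 'Bus Stop', 'Metro Station'], fake_search)
print(counts)  # {'Food': 3, 'Bus Stop': 1, 'Metro Station': 0}
```

Passing the search function as an argument keeps the counting logic testable without network access.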


With the above data, I can build a **content-based recommender system** to solve the problem.  

By using the Foursquare API to count the venues of each type (food, transport, nightlife and so on) around every cinema in the Hong Kong cinema list, I can build a **cinema nearby-venues matrix**. The stakeholder's favorite list serves as the **profile**: combining its ratings with the nearby-venues matrix gives a **weighted matrix of favorite cinemas**.

The weighted matrix can then be applied to the **5 possible locations and their venue information** to generate a ranking. **The top location** in the ranking can be recommended to the stakeholder.
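
One way the ranking step could look is sketched below. It is not the final methodology: all venue counts and ratings in it are made-up illustrative numbers, `rank_locations` is a hypothetical function name, and scaling each category by its maximum count is an assumed normalization choice.

```python
import pandas as pd

# Made-up venue counts around rated cinemas (illustration only, not Foursquare data)
cinema_venues = pd.DataFrame(
    {'Food': [30, 10, 5], 'Shop & Service': [20, 15, 2], 'Bus Stop': [10, 5, 1]},
    index=['Cinema A', 'Cinema B', 'Cinema C'])

# Made-up stakeholder ratings for the same cinemas (1.0 worst, 5.0 best)
ratings = pd.Series([4.5, 3.0, 1.5], index=cinema_venues.index)

# Made-up venue counts around two candidate locations
candidate_venues = pd.DataFrame(
    {'Food': [25, 5], 'Shop & Service': [10, 3], 'Bus Stop': [8, 2]},
    index=['L1', 'L2'])

def rank_locations(cinema_venues, ratings, candidate_venues):
    """Score candidate locations against a rating-weighted venue profile."""
    # Scale each category by its largest count so no single category dominates
    scale = cinema_venues.max().replace(0, 1)
    features = cinema_venues / scale
    # Profile: rating-weighted sum of cinema features, normalized to sum to 1
    profile = features.mul(ratings, axis=0).sum()
    profile = profile / profile.sum()
    # Score: weighted sum of each candidate's scaled venue counts
    scores = (candidate_venues / scale).mul(profile, axis=1).sum(axis=1)
    return scores.sort_values(ascending=False)

ranking = rank_locations(cinema_venues, ratings, candidate_venues)
print(ranking)
```

With only positive ratings as weights, a location with more venues in every category always scores higher, so the profile mainly decides how much each category matters relative to the others.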


## Methodology 

Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and what machine learnings were used and why.

TBD

## Results 

Results section where you discuss the results.

TBD

## Discussion 

Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.

TBD

## Conclusion 

Conclusion section where you conclude the report.

TBD