#### 1. Geographic coordinate of Hong Kong cinemas

I need to **compare 5 possible locations with current cinemas** in Hong Kong. Therefore, I need to find a list of Hong Kong cinema and cinemas' geographic coordinates. Luckily, I can find the list and coordinates from the website https://hkmovie6.com/cinema .

In [1]:
import json
import pandas as pd

In [2]:
!wget -O hk_cinema_list.json https://hkmovie6.com/api/cinemas/lists

--2019-08-01 15:20:37--  https://hkmovie6.com/api/cinemas/lists
Resolving hkmovie6.com (hkmovie6.com)... 104.31.67.1, 104.31.66.1, 2606:4700:30::681f:4301, ...
Connecting to hkmovie6.com (hkmovie6.com)|104.31.67.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘hk_cinema_list.json’

    [ <=>                                   ] 56,711       360KB/s   in 0.2s   

2019-08-01 15:20:38 (360 KB/s) - ‘hk_cinema_list.json’ saved [56711]



In [3]:
cinemas_json = None
with open('hk_cinema_list.json', 'r', encoding='utf-8') as f:
    cinemas_json = json.load(f)
    
cinemas = []
for data in cinemas_json['data']:    
    cinemas.append({
        'Name': data['name'],
        'ChiName': data['chiName'],
        'Address': data['address'],
        'Latitude': data['lat'],
        'Longitude': data['lon']
    })
df_cinemas = pd.DataFrame(cinemas, columns=['Name','ChiName','Address','Latitude','Longitude'])

In [4]:
print('There are {} cinemas in Hong Kong'.format(len(df_cinemas)))

There are 71 cinemas in Hong Kong


In [5]:
df_cinemas.head()

Unnamed: 0,Name,ChiName,Address,Latitude,Longitude
0,Emperor Cinemas - Entertainment Building,英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.15423
1,Emperor Cinemas - Ma On Shan,英皇戲院 - 馬鞍山新港城中心,"L2, MOSTown, Sai Sha Road, Ma On Shan, N.T.",22.42412,114.230957
2,Emperor Cinemas - Tuen Mun,英皇戲院 - 屯門新都商場,"3/F, New Town Commercial Arcade, 2 Tuen Lee St...",22.390776,113.975983
3,The Coronet @ Emperor Cinemas - Entertainment ...,The Coronet @ 英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.15423
4,Festival Grand Cinema,Festival Grand Cinema,"Level UG, Festival Walk, 80 Tat Chee Avenue, K...",22.337882,114.174325


In [6]:
possible_locations = [
    { 'Location': 'L1', 'Address': 'Sau Mau Ping Shopping Centre, Sau Mau Ping'},
    { 'Location': 'L2', 'Address': 'Tuen Mun Ferry, Tuen Mun'},
    { 'Location': 'L3', 'Address': 'Un Chau Shopping Centre, Cheung Sha Wan'},
    { 'Location': 'L4', 'Address': 'Prosperity Millennia Plaza, North Point'},
    { 'Location': 'L5', 'Address': 'Tsuen Fung Centre Shopping Arcade, Tsuen Wan'},
]

In [7]:
!pip install -U googlemaps

Requirement already up-to-date: googlemaps in /opt/conda/envs/Python36/lib/python3.6/site-packages (3.0.2)


In [8]:
google_act = None
with open('google_map_act.json', 'r') as f:
    google_act = json.load(f)
    
GOOGLE_MAP_API_KEY = google_act['api_key']    

import googlemaps
gmaps = googlemaps.Client(key=GOOGLE_MAP_API_KEY)

FileNotFoundError: [Errno 2] No such file or directory: 'google_map_act.json'

In [9]:
# Retrieve geolocation and create the dataframe of pending cinema addresses
def getLatLng(address):
    latlnt = gmaps.geocode('{}, Hong Kong'.format(address))
    return (latlnt[0]['geometry']['location']['lat'], latlnt[0]['geometry']['location']['lng'])

In [10]:
for loc in possible_locations:        
    (lat, lng) = getLatLng(loc['Address'])
    loc['Latitude'] = lat
    loc['Longitude'] = lng
    
df_possible_locations = pd.DataFrame(possible_locations, columns=['Location', 'Address', 'Latitude', 'Longitude'])
df_possible_locations

NameError: name 'gmaps' is not defined

#### 3. Favorite cinema

In [11]:
boss_favorite = [
    {'Name': 'Broadway Circuit - MONGKOK', 'Rating': 4.5},
    {'Name': 'Broadway Circuit - The ONE', 'Rating': 4.5},
    {'Name': 'Grand Ocean', 'Rating': 4.3},
    {'Name': 'The Grand Cinema', 'Rating': 3.4},
    {'Name': 'AMC Pacific Place', 'Rating': 2.3},
    {'Name': 'UA IMAX @ Airport', 'Rating': 1.5},
]

df_boss_favorite = pd.DataFrame(boss_favorite, columns=['Name','Rating'])
df_boss_favorite

Unnamed: 0,Name,Rating
0,Broadway Circuit - MONGKOK,4.5
1,Broadway Circuit - The ONE,4.5
2,Grand Ocean,4.3
3,The Grand Cinema,3.4
4,AMC Pacific Place,2.3
5,UA IMAX @ Airport,1.5


#### 4. Eating, Shopping and Public transportation facility around cinema
The recommended cinema location needs to have many eating and shopping venues nearby. Convenient public transport is also required.  
These data can be found by using FourSquare API to find these venues around the location. The radius of exploration distance is set to 500 meters, which is about 5 minutes walking distance.

Following type of venue category will be used to search

In [12]:
fs_categories = {
    'Food': '4d4b7105d754a06374d81259',
    'Shop & Service': '4d4b7105d754a06378d81259',
    'Bus Stop': '52f2ab2ebcbc57f1066b8b4f',
    'Metro Station': '4bf58dd8d48988d1fd931735',
    'Nightlife Spot': '4d4b7105d754a06376d81259',
    'Arts & Entertainment': '4d4b7104d754a06370d81259'
}

In [13]:
', '.join([ cat for cat in fs_categories])

'Food, Shop & Service, Bus Stop, Metro Station, Nightlife Spot, Arts & Entertainment'

In [14]:
cinema = df_cinemas.loc[0]

In [15]:
print('Use the first cinema "{}" in the list as example to explore venues nearyby'.format(cinema['Name']))

Use the first cinema "Emperor Cinemas - Entertainment Building" in the list as example to explore venues nearyby


In [16]:
!pip install foursquare



In [17]:
fs_act = None
with open('fs_act.json') as json_data:
    fs_act = json.load(json_data)

FileNotFoundError: [Errno 2] No such file or directory: 'fs_act.json'

In [18]:
import foursquare
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
fs = foursquare.Foursquare(client_id=fs_act['client_id'], client_secret=fs_act['client_secret'])

TypeError: 'NoneType' object is not subscriptable

In [19]:
RADIUS = 500 

In [20]:
def venues_nearby(latitude, longitude, category, verbose=True):    
    results = fs.venues.search(
        params = {
            'query': category, 
            'll': '{},{}'.format(latitude, longitude),
            'radius': RADIUS,
            'categoryId': fs_categories[category]
        }
    )    
    df = json_normalize(results['venues'])
    cols = ['Name','Latitude','Longitude','Tips','Users','Visits']    
    if( len(df) == 0 ):        
        df = pd.DataFrame(columns=cols)
    else:        
        df = df[['name','location.lat','location.lng','stats.tipCount','stats.usersCount','stats.visitsCount']]
        df.columns = cols
    if( verbose ):
        print('{} "{}" venues are found within {}m of location'.format(len(df), category, RADIUS))
    return df
    

Find Metro Station around the cinema

In [21]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Metro Station').head()

NameError: name 'fs' is not defined

Find Bus Stop around the cinema

In [22]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Bus Stop').head()

NameError: name 'fs' is not defined

Find eating places around the cinema

In [23]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Food').head()

NameError: name 'fs' is not defined

In [24]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Arts & Entertainment').head()

NameError: name 'fs' is not defined

## Methodology 

Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and what machine learnings were used and why.

With above data, I can use content-based recommendation technique to resolve the problem.

Combine with FourSquare API which provides how many venues in different category of Hong Kong cinemas, a matrix which captured characteristic of venues nearby cinema are built. Stakeholder's favorite list is the profile to combine with the matrix to become a weighted matrix of favorite cinema.

The weighted matrix can be applied on 5 target locations with venues information to generate a ranking result. The the top one on the ranking list can be recommended to the stakeholder.

Before building the matrix, I have to prepare the required data and apply some data analysis.

#### Data Cleansing and Preparation

Check the cinemas dataset contains any duplicated address

In [27]:
duplicated = df_cinemas.duplicated('Address', keep=False)
df_cinemas[duplicated].sort_values('Address')

Unnamed: 0,Name,ChiName,Address,Latitude,Longitude
15,Cinema City VICTORIA (Causeway Bay),Cinema City VICTORIA (銅鑼灣),"2-8 Sugar Street, Causeway Bay, Hong Kong",22.279805,114.187126
16,Diamond Suite VIP House @ Cinema City VICTORIA...,Diamond Suite VIP House @ Cinema City VICTORIA...,"2-8 Sugar Street, Causeway Bay, Hong Kong",22.279805,114.187126
0,Emperor Cinemas - Entertainment Building,英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.15423
3,The Coronet @ Emperor Cinemas - Entertainment ...,The Coronet @ 英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.15423
46,IMAX @ UA iSQUARE,IMAX @ UA iSQUARE,"7/F, iSQUARE, 63 Nathan Road, Tsimshatsui",22.296648,114.171974
49,Phoenix Club @ UA iSQUARE,鳯凰影院 @ UA iSQUARE,"7/F, iSQUARE, 63 Nathan Road, Tsimshatsui",22.296648,114.171974
52,UA iSQUARE,UA iSQUARE,"7/F, iSQUARE, 63 Nathan Road, Tsimshatsui",22.296648,114.171974
45,IMAX @ UA Cine Moko,IMAX @ UA Cine Moko,"L4, MOKO, 193 Prince Edward Road West, Mongkok...",22.3238,114.172
51,UA Cine Moko,UA Cine Moko,"L4, MOKO, 193 Prince Edward Road West, Mongkok...",22.3238,114.172
47,IMAX @ UA MegaBox,IMAX @ UA MegaBox,"Level 11, MegaBox, Enterprise Square 5, 38 Wan...",22.319533,114.208555


Some "special house" in cinema are separated as a new cinema in www.hkmovie6.com  
These records are duplicated in my case and should be corrected.

In [28]:
# The Grand SC Starsuite -> The Grand Cinema
df_cinemas.loc[29, 'Name'] = 'The Grand Cinema'

# XXX @ UA MegaBox -> UA MegaBox
df_cinemas.loc[44, 'Name'] = 'UA MegaBox'
df_cinemas.loc[45, 'Name'] = 'UA MegaBox'

# BEA IMAX @ UA Cine Moko -> UA Cine Moko
df_cinemas.loc[42, 'Name'] = 'UA Cine Moko'

# XXX @ UA iSQUARE -> iSQUARE
df_cinemas.loc[43, 'Name'] = 'UA iSQUARE'
df_cinemas.loc[46, 'Name'] = 'UA iSQUARE'

# Emperor Cinemas - Entertainment Building
df_cinemas.loc[1, 'Name'] = 'Emperor Cinemas - Entertainment Building'

# Cinema City VICTORIA (Causeway Bay)
df_cinemas.loc[6, 'Name'] = 'Cinema City VICTORIA (Causeway Bay)'

In [29]:
df_cinemas[duplicated]

Unnamed: 0,Name,ChiName,Address,Latitude,Longitude
0,Emperor Cinemas - Entertainment Building,英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.15423
3,The Coronet @ Emperor Cinemas - Entertainment ...,The Coronet @ 英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.15423
15,Cinema City VICTORIA (Causeway Bay),Cinema City VICTORIA (銅鑼灣),"2-8 Sugar Street, Causeway Bay, Hong Kong",22.279805,114.187126
16,Diamond Suite VIP House @ Cinema City VICTORIA...,Diamond Suite VIP House @ Cinema City VICTORIA...,"2-8 Sugar Street, Causeway Bay, Hong Kong",22.279805,114.187126
45,UA MegaBox,IMAX @ UA Cine Moko,"L4, MOKO, 193 Prince Edward Road West, Mongkok...",22.3238,114.172
46,UA iSQUARE,IMAX @ UA iSQUARE,"7/F, iSQUARE, 63 Nathan Road, Tsimshatsui",22.296648,114.171974
47,IMAX @ UA MegaBox,IMAX @ UA MegaBox,"Level 11, MegaBox, Enterprise Square 5, 38 Wan...",22.319533,114.208555
48,Oscars Club @ UA MegaBox,Oscars Club @ UA MegaBox,"Level 11, MegaBox, Enterprise Square 5, 38 Wan...",22.319533,114.208555
49,Phoenix Club @ UA iSQUARE,鳯凰影院 @ UA iSQUARE,"7/F, iSQUARE, 63 Nathan Road, Tsimshatsui",22.296648,114.171974
51,UA Cine Moko,UA Cine Moko,"L4, MOKO, 193 Prince Edward Road West, Mongkok...",22.3238,114.172


In [30]:
df_cinemas.drop_duplicates('Address', inplace=True, keep='first')

Drop the duplicated cinema records

In [31]:
df_cinemas[df_cinemas.duplicated('Name')]

Unnamed: 0,Name,ChiName,Address,Latitude,Longitude
1,Emperor Cinemas - Entertainment Building,英皇戲院 - 馬鞍山新港城中心,"L2, MOSTown, Sai Sha Road, Ma On Shan, N.T.",22.42412,114.230957
15,Cinema City VICTORIA (Causeway Bay),Cinema City VICTORIA (銅鑼灣),"2-8 Sugar Street, Causeway Bay, Hong Kong",22.279805,114.187126
45,UA MegaBox,IMAX @ UA Cine Moko,"L4, MOKO, 193 Prince Edward Road West, Mongkok...",22.3238,114.172
46,UA iSQUARE,IMAX @ UA iSQUARE,"7/F, iSQUARE, 63 Nathan Road, Tsimshatsui",22.296648,114.171974


In [32]:
df_cinemas.head()

Unnamed: 0,Name,ChiName,Address,Latitude,Longitude
0,Emperor Cinemas - Entertainment Building,英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.15423
1,Emperor Cinemas - Entertainment Building,英皇戲院 - 馬鞍山新港城中心,"L2, MOSTown, Sai Sha Road, Ma On Shan, N.T.",22.42412,114.230957
2,Emperor Cinemas - Tuen Mun,英皇戲院 - 屯門新都商場,"3/F, New Town Commercial Arcade, 2 Tuen Lee St...",22.390776,113.975983
4,Festival Grand Cinema,Festival Grand Cinema,"Level UG, Festival Walk, 80 Tat Chee Avenue, K...",22.337882,114.174325
5,Grand Kornhill Cinema,康怡戲院,"4/F, Kornhill Plaza South, 2 Kornhill Road, Qu...",22.284218,114.216428


In [33]:
df_cinemas['ChiName'].to_frame()

Unnamed: 0,ChiName
0,英皇戲院 - 娛樂行
1,英皇戲院 - 馬鞍山新港城中心
2,英皇戲院 - 屯門新都商場
4,Festival Grand Cinema
5,康怡戲院
6,皇室戲院
7,MCL 長沙灣戲院
8,MCL 粉嶺戲院
9,MCL 新都城戲院
10,MCL 海怡戲院


Cinema '新光戲院大劇場' and '大館' should be considered as cinema in Hong Kong. These records must be rmeoved

In [34]:
df_cinemas.drop(index=[65,67], inplace=True)
df_cinemas.drop(axis=1, columns=['ChiName'], inplace=True)
df_cinemas.head()

Unnamed: 0,Name,Address,Latitude,Longitude
0,Emperor Cinemas - Entertainment Building,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.15423
1,Emperor Cinemas - Entertainment Building,"L2, MOSTown, Sai Sha Road, Ma On Shan, N.T.",22.42412,114.230957
2,Emperor Cinemas - Tuen Mun,"3/F, New Town Commercial Arcade, 2 Tuen Lee St...",22.390776,113.975983
4,Festival Grand Cinema,"Level UG, Festival Walk, 80 Tat Chee Avenue, K...",22.337882,114.174325
5,Grand Kornhill Cinema,"4/F, Kornhill Plaza South, 2 Kornhill Road, Qu...",22.284218,114.216428


In [35]:
df_cinemas.shape

(62, 4)

In [36]:
from pathlib import Path

venues_csv = Path('./cinemas_venues.csv')
df_venues = None

# check the venues data is explored and downloaded 
if( venues_csv.exists() ):
    df_venues = pd.read_csv('./cinemas_venues.csv')
else:    
    # construct a dataframe to store data
    df_venues = pd.DataFrame(columns=['Cinema Name', 'Category', 'Name', 'Latitude', 'Longitude', 'Tips', 'Users', 'Visits'])
    for (name, address, latitude, longitude) in df_cinemas.itertuples(index=False):
        for cat, cat_id in fs_categories.items():
            df = venues_nearby(latitude, longitude, cat, verbose=False)
            df['Cinema Name'] = name
            df['Category'] = cat
            df_venues = df_venues.append(df, sort=True)
    df_venues.to_csv('cinemas_venues.csv', index=False)

NameError: name 'fs' is not defined

In [37]:
print('Total {} of venues are found'.format(len(df_venues)))

Total 0 of venues are found
