# Capstone Project - The Battle of the Neighborhoods (Final project notebook)

# New Cineplex in London

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The objective of this project is to analyse and select the best location in the city of London, United Kingdom, to open a new Cineplex. Specifically, this report is targeted at property developers or stakeholders interested in opening a **Cineplex** in **London**, United Kingdom.

Since London already has many Cineplexes, we will try to detect **locations that are not crowded with Cineplexes**. We will focus primarily on busy residential areas that have no **Cineplex**.

Using data science methodology and machine learning techniques such as clustering, this project aims to answer the business question: in a multicultural city like London, if a property developer or stakeholder is looking to open a new Cineplex, where should they open it?

We will use data science techniques to shortlist the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly stated so that property developers can choose the best location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* Number of existing Cineplex in the neighborhood
* Most crowded areas in the neighborhood
* Distance of the neighborhood from the city center
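
The third factor, distance from the city center, can be estimated with the haversine formula. The sketch below is illustrative only: the helper name `haversine_km` is an assumption, and the two coordinate pairs are the values that appear in the geocoding and Nominatim outputs of this notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# City-center coordinates as reported by Nominatim in this notebook
CENTER = (51.5073219, -0.1276474)
# Abbey Wood coordinates from the geocoding step
abbey_wood = (51.49245, 0.12127)

distance_km = haversine_km(*CENTER, *abbey_wood)
print("Abbey Wood is about {:.1f} km from the city center".format(distance_km))
```

Applied to every neighborhood, the same function would give a distance column that could be used alongside the venue data.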

The following data sources will be needed to extract or generate the required information:
* List of neighborhoods in London. This defines the scope of this project which is confined to the city of London.
* Latitude and Longitude coordinates of those neighbourhoods. This is required to plot the map and get the venue data.
* Venue data, particularly data related to Cineplex. We will use this data to perform clustering on neighbourhoods.

## 1. Import libraries

In [4]:
!pip install folium
!pip install geocoder

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [5]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


## 2. Scrape data from a Wikipedia page into a DataFrame

In [13]:
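# download the Wikipedia page listing the areas of London
data = requests.get("https://en.wikipedia.org/wiki/List_of_areas_of_London")
soup = BeautifulSoup(data.content, 'html.parser')
# take the first table with class "wikitable" and collect the title of every link in it
tb = soup.find('table', class_='wikitable')
links = tb.find_all('a')
areas = []
for link in links:
    areas.append(link.get('title'))  # None when a link has no title attribute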

In [14]:
df = pd.DataFrame()
df['Areas'] = areas

In [15]:
df = df.drop_duplicates(keep="first")
df = df.drop(df.index[1])
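
Because `link.get('title')` returns `None` for links without a title, the scraped list can contain missing values as well as duplicates. The following is a minimal sketch of an explicit clean-up in plain Python. The function name `clean_area_titles` and the sample list are hypothetical, and this is an alternative illustration rather than the exact steps used in this notebook.

```python
def clean_area_titles(titles):
    """Drop missing titles and duplicates while keeping first-seen order."""
    cleaned = []
    seen = set()
    for title in titles:
        if title is None:  # anchors without a title attribute yield None
            continue
        title = title.strip()
        if title and title not in seen:
            seen.add(title)
            cleaned.append(title)
    return cleaned

# Hypothetical sample resembling the scraped link titles
sample = ["Abbey Wood", None, "Acton, London", "Abbey Wood", " Aldgate "]
areas_clean = clean_area_titles(sample)
print(areas_clean)
```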

In [16]:
df.reset_index(inplace = True)
df

Unnamed: 0,index,Areas
0,0,Abbey Wood
1,3,"Acton, London"
2,6,"Addington, London"
3,9,Addiscombe
4,12,"Albany Park, Bexley"
5,14,Aldborough Hatch
6,17,Aldgate
7,20,Aldwych
8,23,Alperton
9,26,Anerley


In [18]:
df.drop(['index'], axis=1,inplace=True)
df

Unnamed: 0,Areas
0,Abbey Wood
1,"Acton, London"
2,"Addington, London"
3,Addiscombe
4,"Albany Park, Bexley"
5,Aldborough Hatch
6,Aldgate
7,Aldwych
8,Alperton
9,Anerley


## 3. Get the geographical coordinates

In [19]:
def get_latlng(Areas):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(Areas))
        lat_lng_coords = g.latlng
    return lat_lng_coords
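
The loop above retries indefinitely, so a lookup that never succeeds would hang the notebook. A bounded variant might look like the sketch below. The names `get_latlng_with_retries`, `geocode`, `max_attempts` and `pause_seconds` are assumptions introduced for illustration, and the geocoder is passed in as a callable so the example runs without network access.

```python
import time

def get_latlng_with_retries(area, geocode, max_attempts=3, pause_seconds=0.0):
    """Return [lat, lng] for an area, or None if every attempt fails.

    `geocode` is any callable taking a query string and returning
    [lat, lng] or None (a hypothetical hook so the lookup can be mocked).
    """
    query = '{}, London, United Kingdom'.format(area)
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(pause_seconds)  # brief pause before the next attempt
    return None

# Stand-in geocoder for illustration: fails once, then succeeds
calls = {"count": 0}

def fake_geocode(query):
    calls["count"] += 1
    return None if calls["count"] < 2 else [51.49245, 0.12127]

result = get_latlng_with_retries("Abbey Wood", fake_geocode)
never = get_latlng_with_retries("Nowhere", lambda query: None)
print(result, never)
```

With the library used in this notebook, the callable could be `lambda q: geocoder.arcgis(q).latlng`.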

In [20]:
coords = [ get_latlng(neighborhood) for neighborhood in df["Areas"].tolist() ]

In [21]:
coords

[[51.492450000000076, 0.12127000000003818],
 [51.51324000000005, -0.2674599999999714],
 [51.42812400000001, -0.044685000000009634],
 [51.472745062125455, -0.20332414815600952],
 [51.435700000000054, 0.12588000000005195],
 [54.09199000000007, -1.381659999999954],
 [51.513308435920024, -0.077762090020195],
 [51.513306704512985, -0.11709219462088734],
 [51.52687087712042, -0.2064400519240089],
 [51.412330000000054, -0.06538999999997941],
 [51.500500000000045, -0.0605099999999652],
 [51.441797702204326, -0.1669723574931327],
 [51.565745595259266, -0.13491693967198248],
 [51.54475711282045, -0.08312244678432754],
 [51.57976511268053, -0.030328559133434508],
 [51.62243647082235, -0.1272340503978188],
 [51.44822000000005, -0.1483899999999494],
 [51.50760000000008, -0.09324999999995498],
 [51.520050000000026, -0.09246999999993477],
 [51.518137500000016, 0.013389749999993115],
 [51.58511000000004, 0.07841000000001941],
 [51.44787082152143, -0.03199033202216419],
 [51.47457000000003, -0.24211999

In [22]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [23]:
# merge the coordinates into the original dataframe
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

In [24]:
# check the neighborhoods and the coordinates
print(df.shape)
df

(531, 3)


Unnamed: 0,Areas,Latitude,Longitude
0,Abbey Wood,51.49245,0.12127
1,"Acton, London",51.51324,-0.26746
2,"Addington, London",51.428124,-0.044685
3,Addiscombe,51.472745,-0.203324
4,"Albany Park, Bexley",51.4357,0.12588
5,Aldborough Hatch,54.09199,-1.38166
6,Aldgate,51.513308,-0.077762
7,Aldwych,51.513307,-0.117092
8,Alperton,51.526871,-0.20644
9,Anerley,51.41233,-0.06539
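
The row for Aldborough Hatch (latitude 54.09) lies far outside London, which suggests the geocoder matched a different place. A quick sanity check against a bounding box can flag such rows. This is a sketch only: the box limits are rough assumed values for Greater London, and `outside_london` is a hypothetical helper.

```python
# Approximate bounding box for Greater London (assumed values, adjust as needed)
LAT_MIN, LAT_MAX = 51.28, 51.70
LON_MIN, LON_MAX = -0.52, 0.34

def outside_london(rows):
    """Return the names of rows whose coordinates fall outside the bounding box."""
    flagged = []
    for name, lat, lon in rows:
        inside = LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
        if not inside:
            flagged.append(name)
    return flagged

# Rows copied from the table above
sample_rows = [
    ("Abbey Wood", 51.49245, 0.12127),
    ("Aldborough Hatch", 54.09199, -1.38166),
    ("Aldgate", 51.513308, -0.077762),
]
suspect = outside_london(sample_rows)
print(suspect)
```

Flagged rows could then be geocoded again with a more specific query or dropped before the venue search.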


In [25]:
# save the DataFrame as CSV file
df.to_csv("London_df.csv", index=False)

## 4. Create a map of London with neighborhoods superimposed on top

In [26]:
# get the coordinates of London
address = 'London, United Kingdom'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London, United Kingdom are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London, United Kingdom are 51.5073219, -0.1276474.


In [27]:
# create map of London using latitude and longitude values
map_London = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Areas']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_London)  
    
map_London

In [28]:
# save the map as HTML file
map_London.save('map_London.html')

## 5. Use the Foursquare API to explore the neighborhoods

In [29]:
# define Foursquare credentials and version (placeholders; supply your own keys)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
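
API keys are safer when read from the environment than when written into a notebook cell. The sketch below shows one way to do that. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, and the demo uses a stand-in dictionary instead of the real environment.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables instead of the notebook source.

    The variable names used here are hypothetical; use whatever names
    your own setup defines.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters so printed output never holds the full secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Illustration with a stand-in mapping rather than the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "abcd1234", "FOURSQUARE_CLIENT_SECRET": "wxyz5678"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```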


### Now, let's get the top 100 venues that are within a radius of 2000 meters.

In [30]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Areas']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
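
The loop above assumes every response contains at least one group and every venue has at least one category. A more defensive version of the URL building and parsing steps could look like this sketch. The helper names `build_explore_url` and `extract_venues` are hypothetical, and the sample payload is hand-written to mimic only the fields the loop indexes into.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the same explore URL as above, letting urlencode handle escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

def extract_venues(payload, neighborhood, lat, lng):
    """Turn one JSON response into rows, skipping venues that lack a category."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        if not categories:  # avoids an IndexError on categories[0]
            continue
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows

# Hand-written payload mimicking the structure used in the loop above
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Lidl", "location": {"lat": 51.496152, "lng": 0.118417},
               "categories": [{"name": "Supermarket"}]}},
    {"venue": {"name": "Unlabelled", "location": {"lat": 51.49, "lng": 0.12},
               "categories": []}},
]}]}}
url = build_explore_url("ID", "SECRET", "20180605", 51.49245, 0.12127, 2000, 100)
rows = extract_venues(sample_payload, "Abbey Wood", 51.49245, 0.12127)
print(len(rows), rows[0][-1])
```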

In [31]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(46387, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abbey Wood,51.49245,0.12127,Sainsbury's,51.492824,0.120724,Supermarket
1,Abbey Wood,51.49245,0.12127,Lesnes Abbey,51.489526,0.125839,Historic Site
2,Abbey Wood,51.49245,0.12127,Lidl,51.496152,0.118417,Supermarket
3,Abbey Wood,51.49245,0.12127,Morrisons Thamesmead,51.507276,0.105392,Supermarket
4,Abbey Wood,51.49245,0.12127,Wilko,51.505596,0.103845,Furniture / Home Store


### Let's check how many venues were returned for each neighborhood

In [32]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Abbey Wood,18,18,18,18,18,18
"Acton, London",100,100,100,100,100,100
"Addington, London",92,92,92,92,92,92
Addiscombe,100,100,100,100,100,100
"Albany Park, Bexley",40,40,40,40,40,40
Aldborough Hatch,4,4,4,4,4,4
Aldgate,100,100,100,100,100,100
Aldwych,100,100,100,100,100,100
Alperton,100,100,100,100,100,100
Anerley,100,100,100,100,100,100


### Let's find out how many unique categories can be curated from all the returned venues

In [33]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 403 unique categories.


In [34]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Supermarket', 'Historic Site', 'Furniture / Home Store',
       'Fast Food Restaurant', 'Train Station', 'Grocery Store',
       'Clothing Store', 'Campground', 'Gym / Fitness Center',
       'Warehouse Store', 'Trail', 'Café', 'Eastern European Restaurant',
       'Pub', 'Creperie', 'Hotel', 'Park', 'Brewery', 'Bakery',
       'Falafel Restaurant', 'Go Kart Track', 'Coffee Shop',
       'Middle Eastern Restaurant', 'Recreation Center',
       'Sushi Restaurant', 'Breakfast Spot', 'Gastropub', 'Wine Shop',
       'Gym', 'Japanese Restaurant', 'Hookah Bar', 'Mini Golf',
       'Portuguese Restaurant', 'Bowling Alley', 'Film Studio',
       'Fish & Chips Shop', 'Museum', 'Lebanese Restaurant', 'Office',
       'Italian Restaurant', 'Moroccan Restaurant', 'French Restaurant',
       'Convenience Store', 'Thai Restaurant', 'Indian Restaurant',
       'Pastry Shop', 'Music Venue', 'Mediterranean Restaurant',
       'Halal Restaurant', 'Chinese Restaurant'], dtype=object)

In [36]:
# check if the results contain "Movie Theater", the Foursquare category used here for a Cineplex
"Movie Theater" in venues_df['VenueCategory'].unique()

True
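
The venue data contains more than one cinema-related category: "Movie Theater", "Indie Movie Theater" and "Multiplex" all appear among the one-hot columns in this notebook. A small keyword filter can list them so that none is overlooked. This is an illustrative sketch; `cinema_categories` and its keyword list are assumptions, and the analysis in this notebook itself uses only the "Movie Theater" column.

```python
def cinema_categories(categories, keywords=("movie theater", "multiplex")):
    """Return the category names that contain any of the given keywords."""
    matches = []
    for name in categories:
        lowered = name.lower()
        if any(keyword in lowered for keyword in keywords):
            matches.append(name)
    return sorted(matches)

# A few category names taken from the venue data in this notebook
sample_categories = ["Supermarket", "Movie Theater", "Pub", "Indie Movie Theater",
                     "Multiplex", "Theater", "Film Studio"]
cinema_like = cinema_categories(sample_categories)
print(cinema_like)
```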

## 6. Analyze Each Neighborhood

In [37]:
# one hot encoding
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# (named 'Neighborhoods' because 'Neighborhood' is already one of the venue categories)
kl_onehot['Neighborhoods'] = venues_df['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot.head()

(46387, 404)


Unnamed: 0,Neighborhoods,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Betting Shop,Bike Shop,Bistro,Boarding House,Boat or Ferry,Bookstore,Border Crossing,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Bulgarian Restaurant,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Café,Cajun / Creole Restaurant,Camera Store,Campground,Canal,Canal Lock,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Caucasian Restaurant,Cave,Cemetery,Chaat Place,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Cigkofte Place,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Football Field,College Quad,College Soccer Field,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Doner Restaurant,Donut Shop,Dosa Place,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Film Studio,Financial or Legal Service,Fish & Chips Shop,Fish Market,Fishing Store,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Field,Hockey Rink,Home Service,Hookah Bar,Hostel,Hotel,Hotel Bar,Hunan Restaurant,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Lake,Laser Tag,Latin American Restaurant,Laundromat,Lebanese Restaurant,Library,Light Rail Station,Lighthouse,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Military Base,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,Neighborhood,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Observatory,Office,Okonomiyaki Restaurant,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoor Gym,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Outlet Mall,Outlet Store,Paintball Field,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Pilates Studio,Pizza Place,Planetarium,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Print Shop,Pub,Public Art,RV Park,Racecourse,Racetrack,Rafting,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Reservoir,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Rugby Pitch,Rugby Stadium,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Scottish Restaurant,Sculpture Garden,Seafood Restaurant,Shaanxi Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stadium,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Student Center,Supermarket,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Tram Station,Tunnel,Turkish Restaurant,Udon Restaurant,University,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yakitori Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

kl_grouped = kl_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped
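
To make the grouped values concrete: after one-hot encoding, the mean of a category column within a neighborhood is the share of that neighborhood's venues in that category. The toy example below uses made-up neighborhoods "A" and "B" (not data from this notebook) to show the calculation.

```python
import pandas as pd

# Toy venue list (made-up data for illustration only)
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Pub", "Movie Theater", "Pub", "Café", "Pub", "Park"],
})

# one-hot encode the categories, then put the neighborhood name back in front
onehot = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhoods", toy["Neighborhood"])

# mean of each 0/1 column = fraction of the neighborhood's venues in that category
grouped = onehot.groupby("Neighborhoods").mean().reset_index()
print(grouped)
```

Neighborhood "A" has one movie theater among four venues, so its "Movie Theater" value is 0.25, while "B" has none and gets 0.0.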

In [44]:
len(kl_grouped[kl_grouped["Movie Theater"] > 0])

200

### Create a new DataFrame for Cineplex (Movie Theater) data only

In [45]:
London_Cineplex = kl_grouped[["Neighborhoods","Movie Theater"]]

In [46]:
London_Cineplex.head()

Unnamed: 0,Neighborhoods,Movie Theater
0,Abbey Wood,0.0
1,"Acton, London",0.01
2,"Addington, London",0.0
3,Addiscombe,0.0
4,"Albany Park, Bexley",0.0


## 7. Cluster Neighborhoods

Run k-means to cluster the neighborhoods in London into 4 clusters.

In [59]:
# set number of clusters
kclusters = 4

London_clustering = London_Cineplex.drop(["Neighborhoods"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(London_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 0, 0, 0, 0, 0, 1, 3, 2], dtype=int32)
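
The number of clusters is a choice, and one common way to examine it is to compare the within-cluster sum of squares (inertia) across several values of k. The sketch below does this on synthetic frequencies. The data are illustrative values, not the notebook's results, and this check is an addition rather than part of the original analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic movie-theater frequencies (illustrative values only)
frequencies = np.array(
    [0.0] * 20 + [0.01] * 10 + [0.02] * 8 + [0.03] * 5 + [0.045] * 2
).reshape(-1, 1)

inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(frequencies)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, inertia in inertias.items():
    print(k, round(inertia, 6))
```

A value of k beyond which the inertia stops dropping sharply is usually a reasonable candidate.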

In [60]:
# create a new dataframe that holds the movie theater frequency and the cluster label for each neighborhood
London_merged = London_Cineplex.copy()

# add clustering labels
London_merged["Cluster Labels"] = kmeans.labels_

In [61]:
London_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
London_merged.head()

Unnamed: 0,Neighborhood,Movie Theater,Cluster Labels
0,Abbey Wood,0.0,0
1,"Acton, London",0.01,2
2,"Addington, London",0.0,0
3,Addiscombe,0.0,0
4,"Albany Park, Bexley",0.0,0


In [62]:
# merge df with London_merged to add latitude/longitude for each neighborhood
London_merged = London_merged.join(df.set_index("Areas"), on="Neighborhood")

print(London_merged.shape)
London_merged.head() # check the last columns!

(531, 5)


Unnamed: 0,Neighborhood,Movie Theater,Cluster Labels,Latitude,Longitude
0,Abbey Wood,0.0,0,51.49245,0.12127
1,"Acton, London",0.01,2,51.51324,-0.26746
2,"Addington, London",0.0,0,51.428124,-0.044685
3,Addiscombe,0.0,0,51.472745,-0.203324
4,"Albany Park, Bexley",0.0,0,51.4357,0.12588


In [63]:
# sort the results by Cluster Labels
print(London_merged.shape)
London_merged.sort_values(["Cluster Labels"], inplace=True)
London_merged

(531, 5)


Unnamed: 0,Neighborhood,Movie Theater,Cluster Labels,Latitude,Longitude
0,Abbey Wood,0.0,0,51.49245,0.12127
326,Noak Hill,0.0,0,51.62217,0.2258
324,"Newington, London",0.0,0,51.550949,-0.085175
323,"Newbury Park, London",0.0,0,51.519344,-0.098296
322,New Southgate,0.0,0,51.61438,-0.1427
321,New Malden,0.0,0,51.400904,-0.244971
320,New Eltham,0.0,0,51.43353,0.06378
319,New Cross,0.0,0,51.47489,-0.04038
318,New Barnet,0.0,0,51.627294,-0.253759
317,New Addington,0.0,0,51.428124,-0.044685


In [64]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(London_merged['Latitude'], London_merged['Longitude'], London_merged['Neighborhood'], London_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [65]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

## 8. Examine Clusters

### Cluster 0

In [66]:
London_merged.loc[London_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Movie Theater,Cluster Labels,Latitude,Longitude
0,Abbey Wood,0.0,0,51.49245,0.12127
326,Noak Hill,0.0,0,51.62217,0.2258
324,"Newington, London",0.0,0,51.550949,-0.085175
323,"Newbury Park, London",0.0,0,51.519344,-0.098296
322,New Southgate,0.0,0,51.61438,-0.1427
321,New Malden,0.0,0,51.400904,-0.244971
320,New Eltham,0.0,0,51.43353,0.06378
319,New Cross,0.0,0,51.47489,-0.04038
318,New Barnet,0.0,0,51.627294,-0.253759
317,New Addington,0.0,0,51.428124,-0.044685


### Cluster 1

In [67]:
London_merged.loc[London_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Movie Theater,Cluster Labels,Latitude,Longitude
523,"Woodlands, London",0.02,1,51.55875,-0.17837
449,Swiss Cottage,0.02,1,51.54367,-0.17265
511,Westminster,0.02,1,51.50008,-0.12802
514,Whitechapel,0.02,1,51.51917,-0.05966
50,"Blackwall, London",0.02,1,51.51077,-0.00419
47,Blackfen,0.02,1,51.50642,-0.12721
460,Tokyngton,0.02,1,51.50642,-0.12721
53,"Botany Bay, London",0.02,1,51.50642,-0.12721
7,Aldwych,0.02,1,51.513307,-0.117092
367,Ponders End,0.02,1,51.645487,-0.046534


### Cluster 2

In [68]:
London_merged.loc[London_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Movie Theater,Cluster Labels,Latitude,Longitude
172,Fitzrovia,0.01,2,51.51873,-0.13737
280,Leyton,0.01,2,51.55885,-0.00733
271,Lambeth,0.01,2,51.49084,-0.11108
242,Homerton,0.01,2,51.5469,-0.04234
482,Vauxhall,0.01,2,51.47623,-0.13056
42,Bethnal Green,0.01,2,51.52669,-0.06257
258,Kensington,0.01,2,51.49906,-0.19874
247,Hornsey,0.01,2,51.5817,-0.12093
167,"Farringdon, London",0.01,2,51.52015,-0.10451
33,"Bedford Park, London",0.01,2,51.51906,-0.12895


### Cluster 3

In [69]:
London_merged.loc[London_merged['Cluster Labels'] == 3]

Unnamed: 0,Neighborhood,Movie Theater,Cluster Labels,Latitude,Longitude
354,Park Royal,0.033333,3,51.5277,-0.26747
283,Lisson Grove,0.03,3,51.52263,-0.165442
8,Alperton,0.03,3,51.526871,-0.20644
102,Cockfosters,0.045455,3,51.634541,-0.205739
419,"Southborough, Bromley",0.028571,3,51.395991,0.045796
178,Freezywater,0.03,3,51.52146,-0.16749
305,Millbank,0.03,3,51.495629,-0.125308
152,Edgware,0.03,3,51.522743,-0.174008
213,Hanworth,0.032258,3,51.43054,-0.39023
287,"Longford, London",0.03,3,51.525659,-0.142517


## Observations:

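Each cluster groups neighborhoods by the share of their returned venues that are movie theaters. In the rows shown above, Cluster 3 has the highest share (roughly 3% to 4.5%), followed by Cluster 1 (2%) and Cluster 2 (1%), while Cluster 0 contains the neighborhoods where no movie theaters were returned. Neighborhoods in Cluster 0 are therefore the most promising candidates for a new Cineplex, since the Foursquare results show no existing movie theater within the 2000-meter search radius.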
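
K-means label numbers are arbitrary, so it helps to summarize each cluster numerically before interpreting it. The sketch below does this for a handful of rows copied from the cluster tables above; the variable names `sample` and `summary` are illustrative.

```python
import pandas as pd

# A few rows copied from the cluster tables above
sample = pd.DataFrame({
    "Neighborhood": ["Abbey Wood", "Noak Hill", "Westminster", "Aldwych",
                     "Fitzrovia", "Leyton", "Park Royal", "Cockfosters"],
    "Movie Theater": [0.0, 0.0, 0.02, 0.02, 0.01, 0.01, 0.033333, 0.045455],
    "Cluster Labels": [0, 0, 1, 1, 2, 2, 3, 3],
})

# size and movie-theater share per cluster, ordered from lowest to highest mean
summary = (sample.groupby("Cluster Labels")["Movie Theater"]
                 .agg(["count", "min", "mean", "max"])
                 .sort_values("mean"))
print(summary)
```

Running the same aggregation on the full `London_merged` DataFrame would show how many neighborhoods fall into each cluster and how the clusters rank by movie theater share.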