# Capstone Project - The Battle of Neighborhoods (Week 1)

### Table of Contents
* Introduction: Business Problem
* Data
* Methodology
* Analysis
* Results and Discussion
* Conclusion

## A. Introduction/Business Problem

California is a state in the Western United States. It is the most populous state in the country and the third largest by area. Its best-known county is Los Angeles, which is also the most populous county in the United States with around ten million residents, about four million of whom live in the city of Los Angeles itself.

Los Angeles is a good place to set up a business, as it has the third-largest metropolitan economy in the world. One business that is currently booming is the gym or sports centre. Moreover, Los Angeles is famous for attracting people who pursue careers in the entertainment industry, which makes it important for them to take care of their body, beauty, and health. As a result, the location of a gym is an important factor when people consider where to live. Opening a new gym also means considering the conditions of its location, such as neighborhood characteristics, population, and nearby competitors. This raises the question we want to answer: which neighborhood in Los Angeles is the most recommended place to open a gym?

## B. Data

To solve this problem, we need data that lets us analyze the characteristics of neighborhoods in Los Angeles. The data needed are:
* Names of regions and neighborhoods
    - Source: http://maps.latimes.com/neighborhoods/neighborhood/list/
* Location of neighborhoods (latitude and longitude)
    - To draw the map and get common venues around neighborhoods
    - Source: geocoders and "http://s3-us-west-2.amazonaws.com/boundaries.latimes.com/archive/1.0/boundary-set/la-county-neighborhoods-v5.geojson"
* Number of residents
    - To view and compare the population of every neighborhood
    - Source: http://maps.latimes.com/neighborhoods/population/total/neighborhood/list/
* Common venues around every neighborhood
    - To view and compare the characteristics of every neighborhood based on its venues
    - Source: Foursquare


## C. Methodology

The method is divided into three parts. First, we import all the data needed and merge it into a main dataframe. Second, we rank the common venues around every neighborhood to analyze neighborhood characteristics. Third, we check potential neighborhoods by their population and nearby competitors.

In [1]:
import numpy as np
import pandas as pd 
import requests
import json


!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# library for transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

! pip install folium==0.5.0
import folium # plotting library

!pip install bs4
from bs4 import BeautifulSoup

print('Folium installed')
print('Libraries imported.')



First, we import the neighborhood list from the LA Times website and turn it into a pandas dataframe.

In [2]:
# import neighborhood data from source URL
url = "http://maps.latimes.com/neighborhoods/neighborhood/list/"
data = requests.get(url).text
table = BeautifulSoup(data, 'html5lib')
tables = table.find_all('table')

# convert the HTML rows into records, trimming the whitespace
# and newlines that surround the text of each cell
rows = []
for row in tables[0].tbody.find_all("tr"):
    col = row.find_all("td")
    if col:
        rows.append({"Neighborhood": col[0].text.strip(),
                     "Region": col[1].text.strip()})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(rows, columns=["Neighborhood", "Region"])
df

| | Neighborhood | Region |
|---|---|---|
| 0 | Acton | Antelope Valley |
| 1 | Adams-Normandie | South L.A. |
| 2 | Agoura Hills | Santa Monica Mountains |
| 3 | Agua Dulce | Northwest County |
| 4 | Alhambra | San Gabriel Valley |
| ... | ... | ... |
| 267 | Willowbrook | South L.A. |
| 268 | Wilmington | Harbor |
| 269 | Windsor Square | Central L.A. |
| 270 | Winnetka | San Fernando Valley |


After getting the names of all neighborhoods and the regions they belong to, we import population data from the LA Times and turn it into another pandas dataframe.

In [3]:
# import population data from source URL
url2 = "http://maps.latimes.com/neighborhoods/population/total/neighborhood/list/"
data2 = requests.get(url2).text
table2 = BeautifulSoup(data2, 'html5lib')
tables2 = table2.find_all('table')

# convert the HTML rows into records; the name is in column 1 and the
# population in column 2, with thousands separators removed
rows = []
for row in tables2[1].tbody.find_all("tr"):
    col = row.find_all("td")
    if col:
        rows.append({"Neighborhood": col[1].text.strip(),
                     "Population": col[2].text.strip().replace(',', '')})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
dfpop = pd.DataFrame(rows, columns=["Neighborhood", "Population"])

# population arrives as text; values that cannot be parsed become NaN
dfpop['Population'] = pd.to_numeric(dfpop['Population'], errors='coerce')

dfpop

| | Neighborhood | Population |
|---|---|---|
| 0 | Long Beach | 462731 |
| 1 | Glendale | 195047 |
| 2 | Santa Clarita | 157447 |
| 3 | Pomona | 150269 |
| 4 | Torrance | 137933 |
| ... | ... | ... |
| 260 | Lake Hughes | 571 |
| 261 | Hasley Canyon | 423 |
| 262 | West San Dimas | 309 |
| 263 | Unincorporated Catalina Island | 139 |


We then merge both dataframes, sort the result by population in descending order, and drop rows with missing data.

In [4]:
# Merge both dataframes on the neighborhood name
dfnew = df.merge(dfpop, on='Neighborhood', how='left')

# Sort by highest population and drop rows without population data
dfnew.sort_values(by=['Population'], ascending=False).dropna().reset_index(drop=True)

| | Neighborhood | Region | Population |
|---|---|---|---|
| 0 | Long Beach | Harbor | 462731.0 |
| 1 | Glendale | Verdugos | 195047.0 |
| 2 | Santa Clarita | Northwest County | 157447.0 |
| 3 | Pomona | Pomona Valley | 150269.0 |
| 4 | Torrance | South Bay | 137933.0 |
| ... | ... | ... | ... |
| 260 | Lake Hughes | Northwest County | 571.0 |
| 261 | Hasley Canyon | Northwest County | 423.0 |
| 262 | West San Dimas | San Gabriel Valley | 309.0 |
| 263 | Unincorporated Catalina Island | Harbor | 139.0 |
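
The left merge keeps every neighborhood from the first table, so rows without a population figure only disappear at the `dropna()` step. The following is a minimal sketch, using toy rows rather than the scraped data, of how `indicator=True` could list those unmatched names before they are dropped. The neighborhood `Example Heights` and its region are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the two scraped tables; "Example Heights" is a made-up name
names = pd.DataFrame({
    "Neighborhood": ["Long Beach", "Glendale", "Example Heights"],
    "Region": ["Harbor", "Verdugos", "Nowhere"],
})
population = pd.DataFrame({
    "Neighborhood": ["Long Beach", "Glendale"],
    "Population": [462731, 195047],
})

# indicator=True adds a "_merge" column that records where each row came from
merged = names.merge(population, on="Neighborhood", how="left", indicator=True)

# Rows marked "left_only" found no population match and would be lost by dropna()
unmatched = merged.loc[merged["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)
```

Inspecting this list first makes it easier to tell a genuinely missing figure from a spelling difference between the two source pages.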


Next, we look up the latitude and longitude of every neighborhood with the Nominatim geocoder from geopy, build a new dataframe from the result, and merge it into the main dataframe.

In [5]:
location = [x for x in df['Neighborhood'].unique().tolist()
            if isinstance(x, str)]
latitude = []
longitude = []

# create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent="ny_explorer")

for name in location:
    try:
        loc = geolocator.geocode(name + ', CA')
    except Exception:
        # network or service error: treat the address as not found
        loc = None

    if loc is None:
        # geocoding failed, so add NaN to keep the lists aligned with `location`
        latitude.append(np.nan)
        longitude.append(np.nan)
    else:
        latitude.append(loc.latitude)
        longitude.append(loc.longitude)

In [6]:
# create a dataframe with the location, latitude and longitude
df_ = pd.DataFrame({'Neighborhood':location, 
                    'Neigh_latitude': latitude,
                    'Neigh_longitude':longitude})

# merge on main dataframe with location dataframe 
new_df = dfnew.merge(df_, on='Neighborhood', how='left').dropna().sort_values(by=['Population'], ascending=False).reset_index(drop=True)

new_df

| | Neighborhood | Region | Population | Neigh_latitude | Neigh_longitude |
|---|---|---|---|---|---|
| 0 | Long Beach | Harbor | 462731.0 | 33.769016 | -118.191604 |
| 1 | Glendale | Verdugos | 195047.0 | 34.146942 | -118.247847 |
| 2 | Santa Clarita | Northwest County | 157447.0 | 34.391664 | -118.542586 |
| 3 | Pomona | Pomona Valley | 150269.0 | 34.055381 | -117.751750 |
| 4 | Torrance | South Bay | 137933.0 | 33.835849 | -118.340629 |
| ... | ... | ... | ... | ... | ... |
| 255 | Lake Hughes | Northwest County | 571.0 | 34.677458 | -118.448711 |
| 256 | Hasley Canyon | Northwest County | 423.0 | 34.478487 | -118.661582 |
| 257 | West San Dimas | San Gabriel Valley | 309.0 | 34.102245 | -117.819512 |
| 258 | Unincorporated Catalina Island | Harbor | 139.0 | 49.152548 | -55.340537 |
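
The last row of this table is geocoded to 49.152548, -55.340537, which lies far from Southern California, so the geocoder matched the wrong place for that name. The sketch below shows one way such rows could be flagged with a bounding box. The box limits are rough values assumed here for illustration, not official county boundaries, and the three rows are copied from the table above.

```python
import pandas as pd

# Rough bounding box around Los Angeles County, islands included (assumed values)
LAT_MIN, LAT_MAX = 32.7, 34.9
LON_MIN, LON_MAX = -119.0, -117.6

# Three rows copied from the geocoding result above
coords = pd.DataFrame({
    "Neighborhood": ["Long Beach", "West San Dimas", "Unincorporated Catalina Island"],
    "Neigh_latitude": [33.769016, 34.102245, 49.152548],
    "Neigh_longitude": [-118.191604, -117.819512, -55.340537],
})

# True for rows whose point falls inside the box on both axes
inside = (
    coords["Neigh_latitude"].between(LAT_MIN, LAT_MAX)
    & coords["Neigh_longitude"].between(LON_MIN, LON_MAX)
)

# Names whose geocoded point falls outside the box and needs a manual check
suspicious = coords.loc[~inside, "Neighborhood"].tolist()
print(suspicious)
```

Rows flagged this way could be corrected by hand or dropped before venues are requested for them.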


Now, we store the Foursquare credentials and look up the coordinates of Los Angeles, which we will use to draw the map of its neighborhoods.

In [89]:
# Storing Foursquare credentials
# (read from environment variables so that secrets are not saved in the notebook;
# the variable names below are placeholders)
import os
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
ACCESS_TOKEN = os.environ.get('FOURSQUARE_ACCESS_TOKEN', '') # your Foursquare Access Token



In [8]:
address = 'Los Angeles, CA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of LA are {}, {}.'.format(latitude, longitude))

The geographical coordinates of LA are 34.0536909, -118.242766.


Next, we check the population of every neighborhood on a map.

In [87]:
m = folium.Map(location=[34.0536909, -118.242766], zoom_start=10)
la_geo = "http://s3-us-west-2.amazonaws.com/boundaries.latimes.com/archive/1.0/boundary-set/la-county-neighborhoods-v5.geojson"

m.choropleth(geo_data=la_geo,
                    data = new_df,
                    columns=['Neighborhood','Population'],
                    key_on='feature.properties.name',
                    fill_color='YlOrRd',
                    fill_opacity=0.7, 
                    line_opacity=0.2,
                    legend_name='Population LA')
m
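
The choropleth joins the GeoJSON shapes to the dataframe by name (`feature.properties.name` against the `Neighborhood` column), so a name that appears on only one side cannot be joined. Below is a small sketch of how the two sets of names could be compared before drawing. The GeoJSON here is a hand-written toy with the same nesting as a FeatureCollection, and `Example Heights` is a made-up name.

```python
# Toy GeoJSON with the same nesting as a FeatureCollection (illustrative names)
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Long Beach"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Glendale"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Example Heights"}, "geometry": None},
    ],
}

# Names as they would appear in the dataframe column used for the join
table_names = ["Long Beach", "Glendale", "Pomona"]

geo_names = {feature["properties"]["name"] for feature in geojson["features"]}

# Shapes that have no data row, and data rows that have no shape
shapes_without_data = sorted(geo_names - set(table_names))
data_without_shape = sorted(set(table_names) - geo_names)
print(shapes_without_data, data_without_shape)
```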

In this part, we get the venues near each neighborhood. We then one-hot encode the venue categories and take the mean per neighborhood, which gives the frequency of each category, and rank the categories by that frequency.

In [None]:
# Function for getting nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the API request; requests encodes the query string for us
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']

        # keep only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
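
The function above indexes straight into the response, so a reply without `groups` or a venue without categories would raise an error partway through the loop. The following is a sketch of a more defensive extraction step, run on a hand-written sample that mirrors only the fields the function reads. The helper name `extract_venues` and the second sample venue are made up for illustration.

```python
def extract_venues(response_json):
    """Return (name, lat, lng, category) tuples from an explore-style response."""
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category))
    return rows


# Hand-written sample; the first venue is taken from the output shown later,
# the second one is invented to exercise the missing-category branch
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "L'Opera Ristorante",
                           "location": {"lat": 33.768296, "lng": -118.192375},
                           "categories": [{"name": "Italian Restaurant"}]}},
                {"venue": {"name": "Uncategorized Place",
                           "location": {"lat": 33.77, "lng": -118.19},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample)
empty = extract_venues({"response": {}})
print(rows, empty)
```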

In [11]:
la_venues = getNearbyVenues(names=new_df['Neighborhood'], latitudes=new_df['Neigh_latitude'], longitudes=new_df['Neigh_longitude'])


Long Beach
Glendale
Santa Clarita
Pomona
Torrance
Pasadena
East Los Angeles
Lancaster
Palmdale
El Monte
Koreatown
Inglewood
Downey
West Covina
Westlake
Van Nuys
Norwalk
Burbank
South Gate
Compton
Boyle Heights
Carson
Hawthorne
Alhambra
Santa Monica
Whittier
San Pedro
Lakewood
North Hollywood
Hollywood
Sun Valley
Baldwin Park
Pacoima
East Hollywood
Bellflower
Lynwood
Sylmar
Panorama City
Redondo Beach
Pico Rivera
Reseda
Montebello
Huntington Park
Sherman Oaks
Florence-Firestone
Monterey Park
Woodland Hills
Valley Glen
Gardena
Highland Park
Northridge
Diamond Bar
Paramount
South Whittier
Hacienda Heights
Rosemead
Canoga Park
North Hills
Arcadia
Mid-City
Cerritos
Wilmington
Covina
Granada Hills
Glendora
Historic South-Central
Westwood
Rowland Heights
Winnetka
La Mirada
Vermont Square
Azusa
Bell Gardens
Florence
Altadena
Palms
Pico-Union
Encino
Mid-Wilshire
Westchester
Rancho Palos Verdes
La Puente
El Sereno
Central-Alameda
Echo Park
Harbor Gateway
San Gabriel
Culver City
West Hills
Venice

In [12]:
print(la_venues.shape)
la_venues.head()

(4593, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Long Beach | 33.769016 | -118.191604 | L'Opera Ristorante | 33.768296 | -118.192375 | Italian Restaurant |
| 1 | Long Beach | 33.769016 | -118.191604 | George's Greek Cafe | 33.768746 | -118.192493 | Greek Restaurant |
| 2 | Long Beach | 33.769016 | -118.191604 | Beachwood BBQ & Brewing | 33.77007 | -118.1911 | Brewery |
| 3 | Long Beach | 33.769016 | -118.191604 | Gaucho Grill | 33.769591 | -118.192321 | Argentinian Restaurant |
| 4 | Long Beach | 33.769016 | -118.191604 | The Promenade | 33.76775 | -118.190775 | Plaza |


In [13]:
la_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Adams-Normandie | 11 | 11 | 11 | 11 | 11 | 11 |
| Agoura Hills | 8 | 8 | 8 | 8 | 8 | 8 |
| Agua Dulce | 8 | 8 | 8 | 8 | 8 | 8 |
| Alhambra | 44 | 44 | 44 | 44 | 44 | 44 |
| Alondra Park | 1 | 1 | 1 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| Willowbrook | 4 | 4 | 4 | 4 | 4 | 4 |
| Wilmington | 13 | 13 | 13 | 13 | 13 | 13 |
| Windsor Square | 39 | 39 | 39 | 39 | 39 | 39 |
| Winnetka | 12 | 12 | 12 | 12 | 12 | 12 |


In [14]:
print('There are {} unique categories.'.format(len(la_venues['Venue Category'].unique())))

There are 351 unique categories.


In [15]:
# one hot encoding
la_onehot = pd.get_dummies(la_venues[['Venue Category']], prefix="", prefix_sep="")


# add neighborhood column back to dataframe
la_onehot['NeighborhoodName'] = la_venues['Neighborhood'] 

# move neighborhood column to the first column
# alternative: la_onehot.insert(0, 'NeighborhoodName', la_onehot.pop('NeighborhoodName'))
fixed_columns = [la_onehot.columns[-1]] + list(la_onehot.columns[:-1])
la_onehot = la_onehot[fixed_columns]

la_onehot.head()


Unnamed: 0,NeighborhoodName,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,American Restaurant,Amphitheater,Andhra Restaurant,Antique Shop,...,Watch Shop,Water Park,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Long Beach,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Long Beach,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Long Beach,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Long Beach,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Long Beach,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
la_grouped = la_onehot.groupby('NeighborhoodName').mean().reset_index()
la_grouped

Unnamed: 0,NeighborhoodName,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,American Restaurant,Amphitheater,Andhra Restaurant,Antique Shop,...,Watch Shop,Water Park,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Adams-Normandie,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000
1,Agoura Hills,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000
2,Agua Dulce,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000
3,Alhambra,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.000000
4,Alondra Park,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
236,Willowbrook,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000
237,Wilmington,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000
238,Windsor Square,0.025641,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000
239,Winnetka,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000
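
To make the meaning of these numbers concrete, here is a toy sketch of the same two steps (one-hot encoding, then a mean per neighborhood). The neighborhoods `A` and `B` and their venues are invented. The `pd.crosstab` call is shown only as an alternative that yields the same frequencies and is not what the notebook uses.

```python
import pandas as pd

# Invented venues: neighborhood A has four venues, B has two
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Gym", "Park", "Park", "Bakery", "Gym", "Gym"],
})

# One column per category, with 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby("Neighborhood").mean()

# Alternative route to the same table: counts normalised per row
alt = pd.crosstab(venues["Neighborhood"], venues["Venue Category"], normalize="index")

print(freq)
```

In this toy data half of the venues in `A` are parks, so its `Park` value is 0.5, in the same way that 0.022727 for Alhambra above corresponds to 1 venue out of 44.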


In [17]:
# Function to sort mean values to rank nearby venue categories
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['NeighborhoodName']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['NeighborhoodName'] = la_grouped['NeighborhoodName']

for ind in np.arange(la_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(la_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


Unnamed: 0,NeighborhoodName,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adams-Normandie,Sushi Restaurant,Convenience Store,Playground,Taco Place,Gas Station,Park,Latin American Restaurant,Grocery Store,Food,Dessert Shop
1,Agoura Hills,Indian Restaurant,Pizza Place,Breakfast Spot,Gym,Deli / Bodega,Pharmacy,Chinese Restaurant,Gas Station,English Restaurant,Escape Room
2,Agua Dulce,Convenience Store,Restaurant,Grocery Store,Gift Shop,Café,Pizza Place,Mexican Restaurant,Bakery,Financial or Legal Service,Event Space
3,Alhambra,Bakery,Burger Joint,Chinese Restaurant,Café,Sushi Restaurant,Ice Cream Shop,Seafood Restaurant,Stationery Store,Thai Restaurant,Tailor Shop
4,Alondra Park,Park,Yoga Studio,Indie Movie Theater,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
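
Alondra Park has only one venue in the counts above, yet ten "most common" categories are listed for it, so most of its entries are categories with a frequency of zero that were picked up only to fill the row. The sketch below shows one possible variant of the ranking that keeps non-zero categories only. The function name `top_categories` and the frequencies are made up for illustration.

```python
import pandas as pd


def top_categories(freq_row, n):
    """Return up to n category names with the highest non-zero frequency."""
    nonzero = freq_row[freq_row > 0]
    return nonzero.sort_values(ascending=False).index[:n].tolist()


# Invented frequencies for one neighborhood; two categories never occur
row = pd.Series({
    "Bakery": 0.0,
    "Gym": 0.3,
    "Park": 0.5,
    "Track": 0.2,
    "Yoga Studio": 0.0,
})

ranked = top_categories(row, 10)
print(ranked)
```

With this variant a row can hold fewer than ten categories, which makes it visible when a neighborhood has too few venues to characterize.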


Cluster the neighborhoods by their most common venues

In [18]:
# set number of clusters
kclusters = 7

la_grouped_clustering = la_grouped.drop(columns='NeighborhoodName')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(la_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 0, 1, 0, 0, 1,
       1, 1, 2, 0, 0, 0, 0, 0, 6, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1,
       1, 0, 0, 0, 0, 1, 6, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1,
       0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 1, 0, 1, 0, 5, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 2, 1, 0,
       1, 0, 1, 1, 1, 1, 0, 4, 1, 1, 1, 1, 1, 0, 3, 2, 0, 2, 1, 0, 0, 1,
       1, 0, 1, 6, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 4, 0, 1,
       0, 1, 4, 1, 0, 1, 1, 5, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 1, 1, 1, 2, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 2, 0, 1, 0, 1],
      dtype=int32)
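
The number of clusters is fixed at `kclusters = 7` without a stated reason. One common way to compare candidate values is the silhouette score; the sketch below illustrates it on synthetic, well-separated points, not on the venue frequencies. The data, the candidate range, and the idea of using this score here are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups of 30 points each (not the notebook's data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score separates the points most cleanly
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On the real `la_grouped_clustering` matrix the scores may be much closer together, so this is a guide for choosing `kclusters` rather than a rule.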

In [19]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'ClusterLabels', kmeans.labels_)

la_merged = new_df

# merge the sorted venues with the main dataframe to add latitude/longitude for each neighborhood
la_merged = la_merged.join(neighborhoods_venues_sorted.set_index('NeighborhoodName'), on='Neighborhood')

la_merged.head() # check the last columns

Unnamed: 0,Neighborhood,Region,Population,Neigh_latitude,Neigh_longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Long Beach,Harbor,462731.0,33.769016,-118.191604,1.0,Hotel,Bar,American Restaurant,Coffee Shop,Seafood Restaurant,Chinese Restaurant,Mexican Restaurant,Brewery,Ice Cream Shop,Plaza
1,Glendale,Verdugos,195047.0,34.146942,-118.247847,1.0,Coffee Shop,Grocery Store,Seafood Restaurant,Discount Store,Clothing Store,Shopping Plaza,Automotive Shop,Thai Restaurant,Paper / Office Supplies Store,Gym / Fitness Center
2,Santa Clarita,Northwest County,157447.0,34.391664,-118.542586,1.0,Gymnastics Gym,Intersection,Park,Pool,Carpet Store,Yoga Studio,Falafel Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant
3,Pomona,Pomona Valley,150269.0,34.055381,-117.75175,1.0,Bar,Mexican Restaurant,Taco Place,Coffee Shop,Nightclub,Sandwich Place,Music Venue,Southern / Soul Food Restaurant,Gastropub,Sushi Restaurant
4,Torrance,South Bay,137933.0,33.835849,-118.340629,1.0,Performing Arts Venue,Pool,Event Space,Theater,Recreation Center,Fondue Restaurant,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store


In [20]:
# drop NaN data
la_merged.dropna(inplace=True)
print(la_merged.shape)

# check cluster labels generated for each row in the dataframe
la_merged['ClusterLabels'].value_counts()

(241, 16)


1.0    130
0.0     92
2.0      7
4.0      4
5.0      4
6.0      3
3.0      1
Name: ClusterLabels, dtype: int64

In [21]:
# create map for neighborhood clustering
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters (one color per cluster label)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(la_merged['Neigh_latitude'], la_merged['Neigh_longitude'], la_merged['Neighborhood'], la_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### EXAMINE NEIGHBORHOOD

### CLUSTER 1

In [22]:
la_merged.loc[la_merged['ClusterLabels'] == 0, la_merged.columns[[0] + [2] + list(range(5, la_merged.shape[1]))]].head(20)

Unnamed: 0,Neighborhood,Population,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,East Los Angeles,125897.0,0.0,Bakery,Sporting Goods Shop,Track,Chinese Restaurant,Mexican Restaurant,Furniture / Home Store,Sandwich Place,Yoga Studio,Falafel Restaurant,Electronics Store
8,Palmdale,117432.0,0.0,Fast Food Restaurant,Pizza Place,Supermarket,Burger Joint,Cajun / Creole Restaurant,Sandwich Place,Sushi Restaurant,Japanese Restaurant,Thai Restaurant,Dance Studio
9,El Monte,116249.0,0.0,Mexican Restaurant,Mobile Phone Shop,Fast Food Restaurant,Clothing Store,Coffee Shop,Shoe Store,Thai Restaurant,Frozen Yogurt Shop,Sandwich Place,Grocery Store
14,Westlake,103839.0,0.0,Fast Food Restaurant,Clothing Store,Donut Shop,Music Venue,Supermarket,Coffee Shop,Thai Restaurant,Bed & Breakfast,Sandwich Place,Farmers Market
15,Van Nuys,103770.0,0.0,Convenience Store,Chinese Restaurant,Sandwich Place,Mexican Restaurant,ATM,Restaurant,Bank,Latin American Restaurant,Burger Joint,Thrift / Vintage Store
17,Burbank,100316.0,0.0,Mexican Restaurant,Deli / Bodega,Warehouse Store,Sandwich Place,Asian Restaurant,Optical Shop,Tanning Salon,Fast Food Restaurant,Rental Service,Grocery Store
18,South Gate,96418.0,0.0,Pharmacy,Fast Food Restaurant,Italian Restaurant,Supplement Shop,Arts & Entertainment,Optical Shop,Video Game Store,Discount Store,Mexican Restaurant,Beer Garden
19,Compton,94384.0,0.0,Fast Food Restaurant,Donut Shop,Pizza Place,Video Game Store,Fried Chicken Joint,Sandwich Place,Discount Store,Bank,Dessert Shop,Grocery Store
20,Boyle Heights,92756.0,0.0,Fast Food Restaurant,Mexican Restaurant,Bakery,Grocery Store,ATM,Dive Bar,Fried Chicken Joint,Café,Sushi Restaurant,Bank
22,Hawthorne,86265.0,0.0,Fast Food Restaurant,Mobile Phone Shop,Warehouse Store,Chinese Restaurant,Snack Place,Mexican Restaurant,Fried Chicken Joint,Market,Supplement Shop,Asian Restaurant


### CLUSTER 2

In [23]:
la_merged.loc[la_merged['ClusterLabels'] == 1, la_merged.columns[[0] + [2] + list(range(5, la_merged.shape[1]))]].head(10)

Unnamed: 0,Neighborhood,Population,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Long Beach,462731.0,1.0,Hotel,Bar,American Restaurant,Coffee Shop,Seafood Restaurant,Chinese Restaurant,Mexican Restaurant,Brewery,Ice Cream Shop,Plaza
1,Glendale,195047.0,1.0,Coffee Shop,Grocery Store,Seafood Restaurant,Discount Store,Clothing Store,Shopping Plaza,Automotive Shop,Thai Restaurant,Paper / Office Supplies Store,Gym / Fitness Center
2,Santa Clarita,157447.0,1.0,Gymnastics Gym,Intersection,Park,Pool,Carpet Store,Yoga Studio,Falafel Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant
3,Pomona,150269.0,1.0,Bar,Mexican Restaurant,Taco Place,Coffee Shop,Nightclub,Sandwich Place,Music Venue,Southern / Soul Food Restaurant,Gastropub,Sushi Restaurant
4,Torrance,137933.0,1.0,Performing Arts Venue,Pool,Event Space,Theater,Recreation Center,Fondue Restaurant,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
5,Pasadena,134941.0,1.0,American Restaurant,Sushi Restaurant,Steakhouse,Coffee Shop,Japanese Restaurant,Clothing Store,Hotel,Burger Joint,Cosmetics Shop,Furniture / Home Store
7,Lancaster,120783.0,1.0,Liquor Store,Pet Store,Bar,Scenic Lookout,Yoga Studio,Farm,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
10,Koreatown,115070.0,1.0,Korean Restaurant,Café,Restaurant,Japanese Restaurant,Coffee Shop,Ice Cream Shop,Brazilian Restaurant,Bubble Tea Shop,BBQ Joint,Bakery
11,Inglewood,112482.0,1.0,Cupcake Shop,Ice Cream Shop,Bus Station,Caribbean Restaurant,BBQ Joint,Café,Bank,Pizza Place,Mexican Restaurant,Cosmetics Shop
12,Downey,107323.0,1.0,Football Stadium,American Restaurant,Fried Chicken Joint,Track,High School,Shoe Store,Cycle Studio,Dance Studio,Electronics Store,English Restaurant


### CLUSTER 3

In [24]:
la_merged.loc[la_merged['ClusterLabels'] == 2, la_merged.columns[[0] + [2] + list(range(5, la_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Population,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,Paramount,55319.0,2.0,Park,Business Service,Liquor Store,Burger Joint,Mexican Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
67,Rowland Heights,47524.0,2.0,Park,Convenience Store,Video Store,Chinese Restaurant,Soup Place,Taco Place,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room
88,West Hills,38814.0,2.0,Park,Business Service,Yoga Studio,Costume Shop,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
102,Willowbrook,34336.0,2.0,Convenience Store,Liquor Store,Park,Burger Joint,Yoga Studio,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
116,Baldwin Hills/Crenshaw,30123.0,2.0,Music Venue,Park,Yoga Studio,Farmers Market,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant
209,Rolling Hills Estates,8334.0,2.0,Farm,Park,Business Service,Yoga Studio,Costume Shop,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant
220,Beverlywood,6080.0,2.0,Park,Business Service,Yoga Studio,Costume Shop,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm


From the clustering above, we can see that there are three big clusters (Clusters 1, 2, and 3, which correspond to labels 0, 1, and 2) and four smaller clusters (Clusters 4, 5, 6, and 7). We will take a closer look at the three biggest clusters.

Los Angeles is a big county and its venues are fairly evenly distributed, which makes some clusters look similar. Despite the similarities, we can still identify some differences. Here is a description of each cluster:

Cluster 1: Restaurants are the most common venues. A good place to visit during the day, and a place to shop for specific hobbies or interests such as automotive, mobile phones, video games, or vintage goods.

Cluster 2: A place for various activities with a wide choice of restaurants. It has many places for fun, such as arts venues, music venues, pools, event spaces, stadiums, and recreation centers, and many places to enjoy food, drinks, and snacks, such as restaurants, cafes, bakeries, dessert shops, coffee/tea shops, and bars. It has the largest population of all clusters.

Cluster 3: Has a lot of convenience stores and parks. Venues for food and activities are well distributed, such as restaurants, fields/tracks/gyms/playgrounds, grocery stores/supermarkets, furniture/hardware stores, and event spaces. Overall, it is easy to fulfill daily needs here, which makes it a good place to live.

While it is important to open a gym that is close to as many neighborhoods as possible, as in Cluster 1, we also need to find the neighborhood with the best characteristics. Looking at the analysis above, a gym is best located in a neighborhood with good characteristics for living, as described for Cluster 3, because gym members are likely to belong to a facility near where they live. We therefore choose East Los Angeles as the most recommended place to open a gym. It has the largest population in its cluster (it is listed first in the Cluster 1 table above). Its ten most common venue categories also include several that are in line with a healthy lifestyle, such as Bakery, Sporting Goods Shop, Track, and Sandwich Place, and none of the less healthy ones such as Bar, Liquor Store, or Fast Food Restaurant. Finally, no Gym category appears among its ten most common venues, so the neighborhood is likely to have few competitors nearby.
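
The selection rule described above (largest population within a cluster, no gym among the most common venues) can be written down as a short filter. This is only a sketch: the rows are copied from the cluster tables with the venue lists shortened to three entries, and the function names `has_gym` and `recommend` as well as the choice of label 0 are assumptions made for illustration.

```python
import pandas as pd

# Rows copied from the cluster tables above, venue lists shortened to three entries
candidates = pd.DataFrame({
    "Neighborhood": ["East Los Angeles", "Palmdale", "El Monte", "Glendale"],
    "Population": [125897, 117432, 116249, 195047],
    "ClusterLabels": [0, 0, 0, 1],
    "TopVenues": [
        ["Bakery", "Sporting Goods Shop", "Track"],
        ["Fast Food Restaurant", "Pizza Place", "Supermarket"],
        ["Mexican Restaurant", "Mobile Phone Shop", "Fast Food Restaurant"],
        ["Coffee Shop", "Grocery Store", "Gym / Fitness Center"],
    ],
})


def has_gym(venues):
    """True when any listed category mentions a gym (simple substring match)."""
    return any("gym" in v.lower() for v in venues)


def recommend(df, cluster_label):
    """Most populous neighborhood in a cluster with no gym among its top venues."""
    in_cluster = df[df["ClusterLabels"] == cluster_label]
    no_gym = in_cluster[~in_cluster["TopVenues"].apply(has_gym)]
    if no_gym.empty:
        return None
    return no_gym.sort_values("Population", ascending=False).iloc[0]["Neighborhood"]


print(recommend(candidates, 0))
```

Note that the substring match also catches categories such as `Gymnastics Gym`, which may or may not count as a competitor.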

#### CHECKING POTENTIAL NEIGHBORHOOD (EAST LOS ANGELES)

In [104]:
neighborhood_latitude = la_merged.loc[6, 'Neigh_latitude'] # neighborhood latitude value
neighborhood_longitude = la_merged.loc[6, 'Neigh_longitude'] # neighborhood longitude value

neighborhood_name = la_merged.loc[6, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

search_query = 'Gym'
radius = 3500

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()

# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so the column assignment below does not touch a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered


Latitude and longitude values of East Los Angeles are 34.0239015, -118.1720157.




| | name | categories | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | address | crossStreet | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | hilda solis workout room/gym | Gym | 34.039097 | -118.172539 | [{'label': 'display', 'lat': 34.03909683227539... | 1692 | 90022.0 | US | Los Angeles | CA | United States | [Los Angeles, CA 90022, United States] | | | 50f9f875e4b0ef4b8eb5701d |
| 1 | Gym at Noho14 | | 34.025475 | -118.205715 | [{'label': 'display', 'lat': 34.02547538333334... | 3114 | 90023.0 | US | North Hollywood | CA | United States | [Lankershim, North Hollywood, CA 90023, United... | Lankershim | | 4e3c2b6f1838cb1b82de53a8 |
| 2 | House Gym | | 34.00518 | -118.160471 | [{'label': 'display', 'lat': 34.00518, 'lng': ... | 2340 | 90040.0 | US | Commerce | CA | United States | [Fitzgerald Ave (Harbor Ave), Commerce, CA 900... | Fitzgerald Ave | Harbor Ave | 4d3848239516236a58b59451 |
| 3 | Chunky Club Gym | Gym / Fitness Center | 34.029977 | -118.141677 | [{'label': 'display', 'lat': 34.02997679611101... | 2879 | 90640.0 | US | Montebello | CA | United States | [765 Simmons (Dewar), Montebello, CA 90640, Un... | 765 Simmons | Dewar | 4d5f6d95ef378cfaf9d06fa6 |
| 4 | 3RD Street Gym | Gym | 34.000955 | -118.149472 | [{'label': 'display', 'lat': 34.000955, 'lng':... | 3294 | | US | | California | United States | [California, United States] | | | 4e854dd9f790ec9ce4abd485 |
| 5 | Jin's Gym | Athletics & Sports | 34.039886 | -118.138338 | [{'label': 'display', 'lat': 34.039886, 'lng':... | 3580 | | US | Commerce | CA | United States | [Commerce, CA, United States] | | | 5168cd04e4b0fc9f2ba3672e |
| 6 | Fortrees Gym | Gym / Fitness Center | 34.01533 | -118.13417 | [{'label': 'display', 'lat': 34.01533, 'lng': ... | 3619 | | US | Ciudad Del Este | CA | United States | [Km 6 1/2, Ciudad Del Este, CA, United States] | Km 6 1/2 | | 5805488338fa60355cfc1c93 |
| 7 | East Los Angeles College - Women's Gym | Gym | 34.042021 | -118.148844 | [{'label': 'display', 'lat': 34.04202130419063... | 2939 | 91754.0 | US | Monterey Park | CA | United States | [1301 Avenida Cesar Chavez, Monterey Park, CA ... | 1301 Avenida Cesar Chavez | | 4d065307c2e53704201dc367 |
| 8 | Everlast Gym | Gym | 33.993889 | -118.190075 | [{'label': 'display', 'lat': 33.993889, 'lng':... | 3733 | | US | Maywood | CA | United States | [Hidden, Maywood, CA, United States] | Hidden | | 4e109f5ed22d759bae5afa98 |
| 9 | Maywood Boxing Gym | | 33.98956 | -118.17886 | [{'label': 'display', 'lat': 33.98956, 'lng': ... | 3874 | 90270.0 | US | Maywood | CA | United States | [4747 E 56th St, Maywood, CA 90270, United Sta... | 4747 E 56th St | | 4d56f229143ca09337f0c1fc |
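
Several rows in the `distance` column above exceed the 3500 m `radius` passed in the request, so it may be worth filtering the results before counting nearby competitors. The following is a minimal sketch using the names and distances copied from the table; the `within_radius` helper and the `RADIUS_M` constant are hypothetical names introduced here for illustration.

```python
# (name, distance in meters) pairs copied from the search results above
gyms = [
    ("hilda solis workout room/gym", 1692),
    ("Gym at Noho14", 3114),
    ("House Gym", 2340),
    ("Chunky Club Gym", 2879),
    ("3RD Street Gym", 3294),
    ("Jin's Gym", 3580),
    ("Fortrees Gym", 3619),
    ("East Los Angeles College - Women's Gym", 2939),
    ("Everlast Gym", 3733),
    ("Maywood Boxing Gym", 3874),
]

RADIUS_M = 3500  # same value as `radius` in the request


def within_radius(venues, radius_m):
    """Keep only the venues whose reported distance is at most radius_m."""
    return [(name, dist) for name, dist in venues if dist <= radius_m]


nearby = within_radius(gyms, RADIUS_M)
nearest = min(gyms, key=lambda venue: venue[1])

print(len(nearby), "of", len(gyms), "gyms lie within", RADIUS_M, "m")
print("Nearest gym:", nearest[0], "at", nearest[1], "m")
```

With the ten rows above, this keeps six gyms, and the nearest one is about 1.7 km from the neighborhood center.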


In [117]:

# create the map centered on East Los Angeles; the markers below are added to it
venues_map = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=14)

# add a red circle marker to represent East Los Angeles
folium.CircleMarker(
    [neighborhood_latitude, neighborhood_longitude],
    radius=10,
    color='red',
    popup='East Los Angeles',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the gyms as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

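In [97]:

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# The 'groups' key is only present in responses from the explore endpoint, while
# `results` above holds a search response, so the explore endpoint is requested here.
# Assumption: the same credentials, radius and LIMIT as above are reused.
explore_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, LIMIT)
explore_results = requests.get(explore_url).json()

venues = explore_results['response']['groups'][0]['items']
    
nearby_venues_ela = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_ela = nearby_venues_ela.loc[:, filtered_columns].copy()

# filter the category for each row
nearby_venues_ela['venue.categories'] = nearby_venues_ela.apply(get_category_type, axis=1)

# clean columns
nearby_venues_ela.columns = [col.split(".")[-1] for col in nearby_venues_ela.columns]

nearby_venues_ela.head()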

In [27]:
# create map of E.L.A. using latitude and longitude values
map_ela = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=14)

# add markers to map
for lat, lng, label in zip(nearby_venues_ela['lat'], nearby_venues_ela['lng'], nearby_venues_ela['categories']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ela)
    
map_ela