# The Battle of the Neighborhoods
## - Choosing a neighborhood for opening a restaurant in Los Angeles

## Introduction

Los Angeles is a very diverse city and one of the financial centers of the USA, which makes it an attractive place to open a restaurant.

One of the most important decisions when opening a restaurant is choosing the right location. The choice depends on your target market, and it also matters how many competing restaurants already operate in the area, so serious research on the target location pays off. In this project I use the Foursquare API to explore the neighborhoods of Los Angeles, then segment and cluster them to identify suitable neighborhoods.

Based on this analysis, readers can judge which locations in the Los Angeles region are better suited for opening a restaurant.

## Data

For this project, I used the Wikipedia page (https://en.wikipedia.org/wiki/List_of_districts_and_neighborhoods_of_Los_Angeles) to get the list of Los Angeles neighborhoods.

Because the data comes from a web page, I used the BeautifulSoup library to parse the HTML. I extracted the neighborhood names from the page and created a pandas dataframe with a single Neighborhood column for further analysis.

Then I used Nominatim (from the geopy library) to look up the latitude and longitude of each neighborhood and stored them in the dataframe as Latitude and Longitude columns.

In [1]:
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd
import numpy as np
import re

# get the table's original data
source = requests.get('https://en.wikipedia.org/wiki/List_of_districts_and_neighborhoods_of_Los_Angeles').text
soup = BeautifulSoup(source,'lxml')

# collect the neighborhood names, skipping the ones whose latitude and longitude
# could not be found and the ones that are far from central Los Angeles
data = []

table = soup.find_all('div', class_='div-col')

for raw_data  in  table[0].find_all('li'):
    all_data=raw_data.text
    all_data=all_data.rstrip()
    if re.search("/", all_data) or re.search(",", all_data) or re.search("Beachwood Canyon", all_data) or re.search("Holmby Hills", all_data) or re.search("NoHo Arts District", all_data) or re.search("Picfair Village", all_data) or re.search("Yucca Corridor", all_data):
        continue
    data.append(all_data)

for raw_data  in  table[1].find_all('li'):
    all_data=raw_data.text
    all_data=all_data.rstrip()
    if re.search("/", all_data) or re.search(",", all_data) or re.search("Beachwood Canyon", all_data) or re.search("Holmby Hills", all_data) or re.search("NoHo Arts District", all_data) or re.search("Picfair Village", all_data) or re.search("Yucca Corridor", all_data):
        continue
    data.append(all_data)
# create the dataframe
data_array0=np.array(data)
data_array1 = [x.split("[")[0] for x in data_array0]

not_wanted = ["Arlington Heights","Canterbury Knolls","Del Rey", "Edendale", "Harvard Park", "Ladera", "Nichols Canyon", "Rancho Park", "Sunland","Valley Glen"]
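# data_array1 already has the footnote markers stripped, so only filter here
data_array = [x for x in data_array1 if x not in not_wanted]

# pass the column names as a list (a set is unordered and rejected by newer pandas)
df = pd.DataFrame(data_array, columns=['Neighborhood'])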
df['Latitude']=''
df['Longitude']=''

df.shape

(176, 3)
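
The two loops in the cell above apply the same filter to two lists, so the rules can be gathered into one function. The following is only a minimal sketch: the helper name `clean_neighborhood_names` and the sample strings are hypothetical and do not appear in the original notebook, while the exclusion rules are copied from the cell above.

```python
# Hypothetical helper (not part of the original notebook): applies the same
# rules as the two loops above to any iterable of raw list-item strings.
EXCLUDED_SUBSTRINGS = ("/", ",", "Beachwood Canyon", "Holmby Hills",
                       "NoHo Arts District", "Picfair Village", "Yucca Corridor")


def clean_neighborhood_names(raw_names, not_wanted=()):
    cleaned = []
    for raw in raw_names:
        name = raw.rstrip()
        # skip combined entries ("/" or ",") and the explicitly excluded names
        if any(s in name for s in EXCLUDED_SUBSTRINGS):
            continue
        # drop a trailing footnote marker such as "[1]"
        name = name.split("[")[0]
        if name in not_wanted:
            continue
        cleaned.append(name)
    return cleaned


# made-up sample input, for illustration only
sample = ["Arleta\n", "Bel Air[1]", "Mid-City, Los Angeles", "Holmby Hills", "Del Rey"]
names = clean_neighborhood_names(sample, not_wanted=["Del Rey"])
print(names)
```

With the scraped page, the function would be called once per `div-col` block and the results concatenated.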

In [2]:
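# use geopy to get the latitude and longitude of neighborhoods of Los Angeles

from geopy.geocoders import Nominatim

# create the geocoder once instead of once per neighborhood
geolocator = Nominatim(user_agent="ny_explorer")

for x in range(len(data_array)):

    neighborhood_name = df['Neighborhood'][x]
    # NOTE: the query only says "<name>, CA", so ambiguous names can resolve to
    # places outside Los Angeles (see some of the coordinates in the output below)
    address = neighborhood_name +', CA'
    location = geolocator.geocode(address)

    # use .loc to avoid pandas chained-assignment problems
    df.loc[x, 'Latitude'] = location.latitude
    df.loc[x, 'Longitude'] = location.longitude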

df

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Angelino Heights | 34.0703 | -118.255 |
| 1 | Arleta | 34.2413 | -118.432 |
| 2 | Arts District | 42.3178 | -83.0415 |
| 3 | Atwater Village | 34.1164 | -118.256 |
| 4 | Baldwin Hills | 34.0076 | -118.351 |
| 5 | Baldwin Village | 47.5956 | -57.6407 |
| 6 | Baldwin Vista | 34.4302 | -119.732 |
| 7 | Benedict Canyon | 34.0494 | -118.4 |
| 8 | Beverly Crest | 35.4645 | -80.826 |
| 9 | Beverly Glen | 43.0554 | -82.1759 |
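
Several coordinates in this output (for example Arts District and Baldwin Village) are clearly not in the Los Angeles area, so a distance check is one way to flag doubtful geocoding results. This is a minimal sketch, not part of the original analysis: `LA_CENTER` is an approximate assumed coordinate for central Los Angeles, and the 100 km threshold and the function names are hypothetical choices.

```python
from math import asin, cos, radians, sin, sqrt

# Assumption: approximate coordinates of central Los Angeles.
LA_CENTER = (34.05, -118.24)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius


def is_near_la(lat, lon, max_km=100.0):
    # max_km is an arbitrary illustrative threshold
    return haversine_km(lat, lon, *LA_CENTER) <= max_km


# three rows copied from the output above
rows = [("Angelino Heights", 34.0703, -118.255),
        ("Arts District", 42.3178, -83.0415),
        ("Baldwin Village", 47.5956, -57.6407)]
suspicious = [name for name, lat, lon in rows if not is_near_la(lat, lon)]
print(suspicious)
```

Rows flagged this way could be geocoded again with a more specific query or dropped before the Foursquare requests.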


Next I used the geopy and folium libraries to draw a map. The dataframe contains 176 neighborhoods in total, and each one is plotted as a marker on a map of Los Angeles.

In [3]:
import folium

address = 'Los Angeles, CA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_losangeles = folium.Map(location=[latitude, longitude], zoom_start=9)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_losangeles)  
    
map_losangeles

### Explore Neighborhoods with Foursquare

After cleaning, selecting and visualizing the data set, I used the Foursquare API to explore the neighborhoods of Los Angeles. Foursquare maintains a large, crowd-sourced data set of location data, which lets us build our own data set of venues for the region we are interested in. In this project, I requested up to 100 venues within 500 meters of each neighborhood's coordinates, keeping the venue name, category, latitude and longitude.

In [4]:
# Define Foursquare Credentials and Version
# (placeholders: substitute your own credentials and never publish real keys)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
VERSION = '20180605'

In [5]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
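
The request URL in the function above is assembled with string formatting. As an alternative sketch, the query string can be built from a dictionary with the standard library, which takes care of escaping. The helper name `build_explore_url` and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret, version, radius=500, limit=100):
    """Hypothetical helper: build the explore URL from explicit arguments."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# placeholder credentials, for illustration only
url = build_explore_url(34.070289, -118.254796, 'MY_ID', 'MY_SECRET', '20180605')
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'], query['limit'])
```

Passing `radius` and `limit` as arguments also removes the dependence on the global `LIMIT`, which in the notebook is only defined in a later cell.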

In [6]:
# let's get up to 100 venues for each neighborhood within a radius of 500 meters.
radius=500
LIMIT=100

losangeles_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   radius=radius
                                  )


Angelino Heights
Arleta
Arts District
Atwater Village
Baldwin Hills
Baldwin Village
Baldwin Vista
Benedict Canyon
Beverly Crest
Beverly Glen
Beverly Grove
Beverly Hills Post Office
Beverly Park
Beverlywood
Boyle Heights
Brentwood
Brentwood Circle
Brentwood Glen
Broadway-Manchester
Brookside
Bunker Hill
Cahuenga Pass
Canoga Park
Carthay
Castle Heights
Central-Alameda
Central City
Century City
Chatsworth
Chesterfield Square
Cheviot Hills
Chinatown
Civic Center
Crenshaw
Crestwood Hills
Cypress Park
Downtown
Eagle Rock
East Gate Bel Air
East Hollywood
Echo Park
El Sereno
Elysian Heights
Elysian Park
Elysian Valley
Encino
Exposition Park
Faircrest Heights 
Fairfax
Fashion District
Financial District
Florence
Flower District
Franklin Hills
Gallery Row
Garvanza
Glassell Park
Gramercy Park
Granada Hills
Green Meadows
Griffith Park
Hancock Park
Harbor City
Harbor Gateway
Harvard Heights
Hermon
Highland Park
Historic Core
Hollywood
Hollywood Dell
Hollywood Hills
Hollywood Hills West
Hyde Park
Je

In [7]:
print(losangeles_venues.shape)
losangeles_venues.head()

(3141, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Angelino Heights | 34.070289 | -118.254796 | Halliwell Manor | 34.069329 | -118.254165 | Performing Arts Venue |
| 1 | Angelino Heights | 34.070289 | -118.254796 | Guisados | 34.070262 | -118.250437 | Taco Place |
| 2 | Angelino Heights | 34.070289 | -118.254796 | Eightfold Coffee | 34.071245 | -118.250698 | Coffee Shop |
| 3 | Angelino Heights | 34.070289 | -118.254796 | The Park's Finest BBQ | 34.066519 | -118.254291 | BBQ Joint |
| 4 | Angelino Heights | 34.070289 | -118.254796 | Subliminal Projects | 34.07229 | -118.250737 | Art Gallery |


In [8]:
# Let's check how many venues were returned for each neighborhood
losangeles_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Angelino Heights | 22 | 22 | 22 | 22 | 22 | 22 |
| Arleta | 4 | 4 | 4 | 4 | 4 | 4 |
| Arts District | 44 | 44 | 44 | 44 | 44 | 44 |
| Atwater Village | 41 | 41 | 41 | 41 | 41 | 41 |
| Baldwin Hills | 2 | 2 | 2 | 2 | 2 | 2 |
| Baldwin Vista | 5 | 5 | 5 | 5 | 5 | 5 |
| Benedict Canyon | 3 | 3 | 3 | 3 | 3 | 3 |
| Beverly Crest | 1 | 1 | 1 | 1 | 1 | 1 |
| Beverly Grove | 11 | 11 | 11 | 11 | 11 | 11 |
| Beverly Park | 22 | 22 | 22 | 22 | 22 | 22 |


In [9]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(losangeles_venues['Venue Category'].unique())))

There are 323 unique categories.


## Analyze Each Neighborhood

Before analyzing the data set, I one-hot encoded the venue categories: each venue becomes a row with a 1 in the column of its category and 0 elsewhere, alongside its neighborhood name. Grouping the rows by neighborhood and taking the mean of each column then gives the frequency of occurrence of each venue category in that neighborhood. From these frequencies I built a data frame of the ten most common venue categories per neighborhood.

In [10]:
# one hot encoding
losangeles_onehot = pd.get_dummies(losangeles_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
losangeles_onehot['Neighborhood'] = losangeles_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [losangeles_onehot.columns[-1]] + list(losangeles_onehot.columns[:-1])
losangeles_onehot = losangeles_onehot[fixed_columns]

losangeles_onehot.head()

Unnamed: 0,Yoga Studio,ATM,Accessories Store,Adult Boutique,Airport Lounge,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [11]:
# And let's examine the new dataframe size.
losangeles_onehot.shape

(3141, 323)

In [12]:
# let's group rows by neighborhood and take the mean of the frequency of occurrence of each category
losangeles_grouped = losangeles_onehot.groupby('Neighborhood').mean().reset_index()
losangeles_grouped

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,Adult Boutique,Airport Lounge,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Angelino Heights,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,Arleta,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,Arts District,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.022727,0.000000,0.022727,0.000000,0.000000,0.000000,0.000000
3,Atwater Village,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.024390,0.000000,0.000000,0.000000
4,Baldwin Hills,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,Baldwin Vista,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
6,Benedict Canyon,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
7,Beverly Crest,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,Beverly Grove,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
9,Beverly Park,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.045455,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [13]:
losangeles_grouped.shape

(139, 323)
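
Taking the mean of one-hot columns per neighborhood is the same as dividing each category's count by the number of venues in that neighborhood. The sketch below shows this with plain Python instead of pandas. The function name `category_frequencies` is hypothetical, and the sample rows are illustrative, chosen to be consistent with the venue counts and frequencies this notebook reports for Arleta and Baldwin Hills.

```python
from collections import Counter, defaultdict

# Illustrative (neighborhood, category) rows; not the actual Foursquare response.
venue_rows = [
    ("Arleta", "Movie Theater"),
    ("Arleta", "Movie Theater"),
    ("Arleta", "Historic Site"),
    ("Arleta", "Seafood Restaurant"),
    ("Baldwin Hills", "Park"),
    ("Baldwin Hills", "Trail"),
]


def category_frequencies(rows):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }


freqs = category_frequencies(venue_rows)
print(freqs["Arleta"])
```

Categories that never occur in a neighborhood are simply absent here, whereas the one-hot dataframe stores them as explicit zeros.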

In [14]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in losangeles_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = losangeles_grouped[losangeles_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Angelino Heights----
            venue  freq
0      Taco Place  0.09
1  Breakfast Spot  0.05
2     Pizza Place  0.05
3     Record Shop  0.05
4      Boxing Gym  0.05


----Arleta----
                venue  freq
0       Movie Theater  0.50
1       Historic Site  0.25
2  Seafood Restaurant  0.25
3         Yoga Studio  0.00
4     Nature Preserve  0.00


----Arts District----
                       venue  freq
0                        Pub  0.07
1                       Café  0.07
2                Coffee Shop  0.07
3  Middle Eastern Restaurant  0.05
4               Burger Joint  0.05


----Atwater Village----
                    venue  freq
0             Coffee Shop  0.07
1              Restaurant  0.05
2                 Theater  0.05
3               Pet Store  0.05
4  Thrift / Vintage Store  0.05


----Baldwin Hills----
         venue  freq
0         Park   0.5
1        Trail   0.5
2  Yoga Studio   0.0
3   Nail Salon   0.0
4       Office   0.0


----Baldwin Vista----
                venu

         venue  freq
0  Pizza Place  0.50
1       Market  0.25
2   Food Truck  0.25
3  Music Venue  0.00
4       Office  0.00


----Griffith Park----
             venue  freq
0   Scenic Lookout  0.33
1         Tea Room  0.17
2             Park  0.17
3            Trail  0.17
4  Nature Preserve  0.17


----Hancock Park----
         venue  freq
0   Art Museum  0.19
1   Food Truck  0.11
2       Museum  0.07
3  Art Gallery  0.07
4  Coffee Shop  0.06


----Harbor City----
             venue  freq
0      Wings Joint  0.25
1      Gas Station  0.25
2       Taco Place  0.25
3  Thai Restaurant  0.25
4      Yoga Studio  0.00


----Harbor Gateway----
                    venue  freq
0  Furniture / Home Store  0.12
1           Deli / Bodega  0.05
2              Steakhouse  0.05
3     American Restaurant  0.05
4        Tapas Restaurant  0.05


----Harvard Heights----
                       venue  freq
0  Middle Eastern Restaurant   0.5
1            Thai Restaurant   0.5
2                 Nail Salon   

                  venue  freq
0            Shoe Store  0.11
1     Mobile Phone Shop  0.11
2   Filipino Restaurant  0.11
3  Fast Food Restaurant  0.07
4              Pharmacy  0.07


----Park La Brea----
                                      venue  freq
0                                Art Museum  0.13
1  Residential Building (Apartment / Condo)  0.13
2                    Furniture / Home Store  0.07
3                                      Park  0.07
4                      Other Great Outdoors  0.07


----Pico Robertson----
                    venue  freq
0                   Field  0.33
1         Warehouse Store  0.33
2  Furniture / Home Store  0.33
3             Yoga Studio  0.00
4             Music Venue  0.00


----Playa Vista----
        venue  freq
0  Food Truck  0.22
1        Park  0.14
2        Café  0.11
3         Gym  0.08
4        Pool  0.06


----Playa del Rey----
                  venue  freq
0  Gym / Fitness Center  0.18
1                  Park  0.09
2          Liquor Store 

                     venue  freq
0              Pizza Place   1.0
1              Yoga Studio   0.0
2              Music Venue   0.0
3                   Office   0.0
4  North Indian Restaurant   0.0


----Wholesale District----
                  venue  freq
0  Fast Food Restaurant  0.14
1           Coffee Shop  0.14
2        Breakfast Spot  0.10
3           Pizza Place  0.10
4            Food Court  0.05


----Wilmington----
                venue  freq
0        Liquor Store  0.07
1          Restaurant  0.07
2  Seafood Restaurant  0.07
3      Discount Store  0.07
4      Sandwich Place  0.07


----Wilshire Center----
                 venue  freq
0                Hotel   0.2
1  Rental Car Location   0.1
2               Bistro   0.1
3             Bus Line   0.1
4          Coffee Shop   0.1


----Wilshire Park----
                 venue  freq
0  American Restaurant   0.5
1            Pet Store   0.5
2          Yoga Studio   0.0
3           Nail Salon   0.0
4         Optical Shop   0.0


----

In [15]:
# let's write a function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [16]:
# let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every rank uses the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = losangeles_grouped['Neighborhood']

for ind in np.arange(losangeles_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(losangeles_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Angelino Heights,Taco Place,Boxing Gym,Trail,Breakfast Spot,Market,Boutique,Motel,Jewelry Store,Bakery,BBQ Joint
1,Arleta,Movie Theater,Historic Site,Seafood Restaurant,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School
2,Arts District,Café,Pub,Coffee Shop,Bar,Nightclub,Burger Joint,Middle Eastern Restaurant,Restaurant,Sushi Restaurant,Bank
3,Atwater Village,Coffee Shop,Pet Store,Gym,Sporting Goods Shop,Theater,Restaurant,Thrift / Vintage Store,Boutique,Mediterranean Restaurant,Italian Restaurant
4,Baldwin Hills,Trail,Park,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
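
For neighborhoods with only a few venues, the top-10 table is padded with categories whose frequency is 0. Arleta, for example, has three non-zero categories, so entries such as Women's Store are filler. A minimal sketch of a variant that keeps only categories that actually occur follows; the function name `top_nonzero_categories` is hypothetical and the input dictionary is a shortened illustration of the Arleta row.

```python
def top_nonzero_categories(freqs, num_top_venues=10):
    """Return up to num_top_venues category names with frequency > 0,
    most frequent first (ties broken alphabetically)."""
    nonzero = [(cat, f) for cat, f in freqs.items() if f > 0]
    nonzero.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in nonzero[:num_top_venues]]


# shortened illustration of the Arleta frequencies
arleta = {"Movie Theater": 0.50, "Historic Site": 0.25, "Seafood Restaurant": 0.25,
          "Yoga Studio": 0.0, "Women's Store": 0.0}
top = top_nonzero_categories(arleta)
print(top)
```

The result can be shorter than `num_top_venues`, so a table built from it would need empty cells for the missing ranks.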


## Cluster Neighborhoods

After preparing the data set, I segmented the neighborhoods with K-means clustering, using 5 clusters. I imported KMeans from scikit-learn to cluster the data set and used folium to visualize the five clusters.

### Methodology

For this project, I chose K-means clustering as the model to analyze the data obtained from the Foursquare API. K-means clustering is a type of unsupervised learning, used when you have unlabeled data (i.e., data without defined categories or groups). The goal of the algorithm is to find groups in the data, with the number of groups given by the variable K. The algorithm works iteratively, assigning each data point to one of K groups based on the features provided, so data points are clustered by feature similarity. The results of the algorithm are the centroids of the K clusters and a cluster label for each data point. Rather than defining groups before looking at the data, clustering lets you find and analyze the groups that form organically.

In this project, K-means is used to find groups that are not explicitly labeled in the data. The resulting groups can support the decision of where to open a restaurant: they suggest which kinds of neighborhoods exist and what kind of restaurant might suit each. Once the algorithm has run and the groups are defined, new data can be assigned to the group with the nearest centroid.
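
To make the iteration concrete, here is a minimal pure-Python sketch of the assign-and-update loop described above. It is not the scikit-learn implementation used in this notebook: it starts from the first K points, whereas scikit-learn's `KMeans` uses a more careful initialization by default. The toy points and the function names are illustrative.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def simple_kmeans(points, k, max_iterations=100):
    # Simplification: use the first k points as the initial centroids.
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iterations):
        # assignment step: each point joins its nearest centroid
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments no longer change, so the loop has converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# two well-separated toy groups
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = simple_kmeans(points, k=2)
print(labels)

# a new point is assigned to the group with the nearest centroid
new_label = nearest_centroid((4.8, 5.0), centroids)
print(new_label)
```

In the notebook, each "point" is a neighborhood's vector of category frequencies rather than a 2-D coordinate, but the two steps are the same.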

In [17]:
from sklearn.cluster import KMeans
# Run k-means to cluster the neighborhood into 5 clusters.
kclusters = 5
# drop by keyword: the positional axis argument was removed in pandas 2.0
losangeles_grouped_clustering = losangeles_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(losangeles_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)
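
A quick sanity check after fitting is to count how many neighborhoods fall into each cluster, since very uneven sizes can suggest trying a different K. The sketch below uses only the ten labels printed above; with the fitted model one would pass `kmeans.labels_` instead.

```python
from collections import Counter

# First ten labels, copied from the output above.
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print("cluster {}: {} neighborhoods".format(cluster, size))
```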

In [18]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

losangeles_merged = df

# merge the sorted venues with the neighborhood dataframe to add latitude/longitude for each neighborhood
losangeles_merged = losangeles_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

losangeles_merged # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Angelino Heights,34.0703,-118.255,0.0,Taco Place,Boxing Gym,Trail,Breakfast Spot,Market,Boutique,Motel,Jewelry Store,Bakery,BBQ Joint
1,Arleta,34.2413,-118.432,0.0,Movie Theater,Historic Site,Seafood Restaurant,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School
2,Arts District,42.3178,-83.0415,0.0,Café,Pub,Coffee Shop,Bar,Nightclub,Burger Joint,Middle Eastern Restaurant,Restaurant,Sushi Restaurant,Bank
3,Atwater Village,34.1164,-118.256,0.0,Coffee Shop,Pet Store,Gym,Sporting Goods Shop,Theater,Restaurant,Thrift / Vintage Store,Boutique,Mediterranean Restaurant,Italian Restaurant
4,Baldwin Hills,34.0076,-118.351,1.0,Trail,Park,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
5,Baldwin Village,47.5956,-57.6407,,,,,,,,,,,
6,Baldwin Vista,34.4302,-119.732,0.0,Liquor Store,Insurance Office,Tanning Salon,Chinese Restaurant,BBQ Joint,Farmers Market,Falafel Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant
7,Benedict Canyon,34.0494,-118.4,0.0,Gym Pool,Other Repair Shop,Food Truck,Women's Store,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
8,Beverly Crest,35.4645,-80.826,0.0,Stadium,Women's Store,Ethiopian Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service
9,Beverly Glen,43.0554,-82.1759,,,,,,,,,,,


In [19]:
losangeles_merged.dropna(inplace=True)
losangeles_merged = losangeles_merged.astype({"Cluster Labels":int})
losangeles_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Angelino Heights,34.0703,-118.255,0,Taco Place,Boxing Gym,Trail,Breakfast Spot,Market,Boutique,Motel,Jewelry Store,Bakery,BBQ Joint
1,Arleta,34.2413,-118.432,0,Movie Theater,Historic Site,Seafood Restaurant,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School
2,Arts District,42.3178,-83.0415,0,Café,Pub,Coffee Shop,Bar,Nightclub,Burger Joint,Middle Eastern Restaurant,Restaurant,Sushi Restaurant,Bank
3,Atwater Village,34.1164,-118.256,0,Coffee Shop,Pet Store,Gym,Sporting Goods Shop,Theater,Restaurant,Thrift / Vintage Store,Boutique,Mediterranean Restaurant,Italian Restaurant
4,Baldwin Hills,34.0076,-118.351,1,Trail,Park,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
6,Baldwin Vista,34.4302,-119.732,0,Liquor Store,Insurance Office,Tanning Salon,Chinese Restaurant,BBQ Joint,Farmers Market,Falafel Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant
7,Benedict Canyon,34.0494,-118.4,0,Gym Pool,Other Repair Shop,Food Truck,Women's Store,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
8,Beverly Crest,35.4645,-80.826,0,Stadium,Women's Store,Ethiopian Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service
10,Beverly Grove,53.5697,-113.402,0,Fast Food Restaurant,Bakery,Buffet,Pharmacy,Hotel,Coffee Shop,Thrift / Vintage Store,Diner,Grocery Store,Liquor Store
12,Beverly Park,34.0638,-118.265,0,Art Gallery,Thai Restaurant,Park,Filipino Restaurant,Latin American Restaurant,Bubble Tea Shop,Supermarket,Café,Liquor Store,Asian Restaurant


In [20]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# let's visualize the resulting clusters
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=9)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(losangeles_merged['Latitude'], losangeles_merged['Longitude'], losangeles_merged['Neighborhood'], losangeles_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

In [21]:
# Cluster 1
losangeles_merged.loc[losangeles_merged['Cluster Labels'] == 0, losangeles_merged.columns[[0] + list(range(4, losangeles_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Angelino Heights,Taco Place,Boxing Gym,Trail,Breakfast Spot,Market,Boutique,Motel,Jewelry Store,Bakery,BBQ Joint
1,Arleta,Movie Theater,Historic Site,Seafood Restaurant,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School
2,Arts District,Café,Pub,Coffee Shop,Bar,Nightclub,Burger Joint,Middle Eastern Restaurant,Restaurant,Sushi Restaurant,Bank
3,Atwater Village,Coffee Shop,Pet Store,Gym,Sporting Goods Shop,Theater,Restaurant,Thrift / Vintage Store,Boutique,Mediterranean Restaurant,Italian Restaurant
6,Baldwin Vista,Liquor Store,Insurance Office,Tanning Salon,Chinese Restaurant,BBQ Joint,Farmers Market,Falafel Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant
7,Benedict Canyon,Gym Pool,Other Repair Shop,Food Truck,Women's Store,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
8,Beverly Crest,Stadium,Women's Store,Ethiopian Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service
10,Beverly Grove,Fast Food Restaurant,Bakery,Buffet,Pharmacy,Hotel,Coffee Shop,Thrift / Vintage Store,Diner,Grocery Store,Liquor Store
12,Beverly Park,Art Gallery,Thai Restaurant,Park,Filipino Restaurant,Latin American Restaurant,Bubble Tea Shop,Supermarket,Café,Liquor Store,Asian Restaurant
13,Beverlywood,Paper / Office Supplies Store,Women's Store,Donut Shop,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service


In [22]:
# Cluster 2
losangeles_merged.loc[losangeles_merged['Cluster Labels'] == 1, losangeles_merged.columns[[0] + list(range(4, losangeles_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Baldwin Hills,Trail,Park,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
70,Hollywood Hills,Trail,Women's Store,Entertainment Service,Drugstore,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
71,Hollywood Hills West,Trail,Women's Store,Entertainment Service,Drugstore,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
131,South Robertson,Park,Women's Store,Donut Shop,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service
151,Vermont Vista,Park,Trail,Women's Store,Entertainment Service,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant


In [23]:
# Cluster 3
losangeles_merged.loc[losangeles_merged['Cluster Labels'] == 2, losangeles_merged.columns[[0] + list(range(4, losangeles_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
51,Florence,Pizza Place,Women's Store,Ethiopian Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service
57,Gramercy Park,Pizza Place,Market,Food Truck,Women's Store,Ethiopian Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School
136,Sylmar,Pizza Place,Mexican Restaurant,Food,Food Truck,Event Space,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
168,Whitley Heights,Pizza Place,Women's Store,Ethiopian Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service


In [24]:
# Cluster 4
losangeles_merged.loc[losangeles_merged['Cluster Labels'] == 3, losangeles_merged.columns[[0] + list(range(4, losangeles_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
132,Spaulding Square,Music Venue,Golf Course,Donut Shop,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service
140,Toluca Lake,Golf Course,Women's Store,Donut Shop,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service


In [25]:
# Cluster 5
losangeles_merged.loc[losangeles_merged['Cluster Labels'] == 4, losangeles_merged.columns[[0] + list(range(4, losangeles_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
130,South Park,Hockey Arena,Concert Hall,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Entertainment Service,Ethiopian Restaurant


## Results

I used the Foursquare API to explore the venues in each neighborhood. Foursquare is a technology company that has built a massive location data set. I defined a Python function that takes a neighborhood's latitude and longitude, retrieves the nearby venues, and puts them in a data frame. For each neighborhood I requested up to 100 venues within a radius of 500 meters. The results show 3141 venues in total across 139 neighborhoods, covering 323 unique venue categories.
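The venue-retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual function: it assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual response layout, and the helper names (`explore_params`, `venues_to_rows`), the credentials, and the sample payload are all hypothetical.

```python
import pandas as pd

# Assumed endpoint: the legacy Foursquare v2 "explore" route.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_params(lat, lng, client_id, client_secret,
                   version="20180605", radius=500, limit=100):
    """Build query parameters for one neighborhood (credentials are placeholders)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,   # meters around the neighborhood coordinates
        "limit": limit,     # at most this many venues per neighborhood
    }


def venues_to_rows(neighborhood, payload):
    """Flatten one response into rows; the JSON layout here is an assumption."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append({
            "Neighborhood": neighborhood,
            "Venue": venue.get("name"),
            "Venue Latitude": venue.get("location", {}).get("lat"),
            "Venue Longitude": venue.get("location", {}).get("lng"),
            "Venue Category": categories[0].get("name"),
        })
    return rows


# Made-up payload in the assumed layout. A live call would look like
# requests.get(EXPLORE_URL, params=explore_params(...)).json()
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Pizza",
               "location": {"lat": 34.0, "lng": -118.0},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Example Truck",
               "location": {"lat": 34.001, "lng": -118.001},
               "categories": [{"name": "Food Truck"}]}},
]}]}}

venues_df = pd.DataFrame(venues_to_rows("Sylmar", sample_payload))
```

Keeping the parsing separate from the HTTP request makes it possible to check the flattening logic on a small payload before spending API calls on all 139 neighborhoods.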

Then I used the K-means clustering algorithm to group the neighborhoods based on the venue data. The first step is to load the venues obtained from Foursquare into a data frame. The second step is to compute, for each neighborhood, the mean frequency of occurrence of each venue category. The third step is to take the ten most common venue categories in each neighborhood and put them into a data frame. The last step is to import KMeans from sklearn.cluster and run the clustering with k=5. I then labeled each neighborhood with its cluster number.
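The four steps above can be illustrated on a toy data set. This is a sketch under stated assumptions: the four neighborhoods "A" to "D" and their venues are invented, the helper `top_categories` is hypothetical, and k is set to 2 only because the toy data has four rows (the analysis itself uses k=5).

```python
import pandas as pd
from sklearn.cluster import KMeans

# Step 1: venues in a data frame (invented toy data, one row per venue).
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Park",
                       "Pizza Place", "Pizza Place", "Bakery",
                       "Trail", "Trail", "Park",
                       "Trail", "Park", "Trail"],
})

# Step 2: one-hot encode the categories, then average per neighborhood.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
grouped = onehot.groupby("Neighborhood").mean()


# Step 3: rank the categories in each neighborhood by mean frequency.
def top_categories(freq_row, n):
    return freq_row.sort_values(ascending=False).index[:n].tolist()


top = {name: top_categories(row, 2) for name, row in grouped.iterrows()}

# Step 4: cluster the frequency vectors (k=2 for the toy data).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(grouped)
labels = pd.Series(kmeans.labels_, index=grouped.index, name="Cluster Labels")
```

In this toy example the two pizza-heavy neighborhoods end up in one cluster and the two trail-heavy ones in the other, which mirrors how the real clusters group neighborhoods with similar venue profiles.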

Next, I visualized the clusters on a map. After importing matplotlib.cm and matplotlib.colors, I created a map of the Los Angeles neighborhoods that shows the five clusters in different colors.
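One way the two matplotlib modules can produce a distinct color per cluster is sketched below. The choice of the `rainbow` colormap and the variable names are assumptions for illustration; the map code itself is not shown in this section.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5  # number of clusters used in the analysis

# Sample the colormap at evenly spaced points, one per cluster.
rgba_values = cm.rainbow(np.linspace(0, 1, kclusters))

# Convert each RGBA tuple to a hex string that map markers can use.
cluster_colors = [colors.rgb2hex(rgba) for rgba in rgba_values]
```

Each neighborhood marker can then be drawn with `cluster_colors[label]`, where `label` is the neighborhood's value in the `Cluster Labels` column.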

## Discussion

Let's look at the first cluster. There are 127 neighborhoods in this cluster, and the diversity of restaurant types is significant. It ranges from fast food such as food trucks and taco places to Chinese, Japanese, seafood, and other restaurants. I think these neighborhoods are a very good place to open a restaurant, because fast food places and restaurants are among the most common venues, which suggests that people there eat out frequently. All types of restaurant can also be found, which means you have a wide choice of what type of restaurant to open.

There are five neighborhoods in cluster 2. In this cluster, there are almost no restaurants among the top four most common venues; the most common venues are public places such as trails and parks. Within the top ten, however, Dumpling Restaurant and English Restaurant appear in every neighborhood. So my suggestion for opening a restaurant in these five neighborhoods would be one of these two types.
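One check that may be worth running before acting on the lower ranks: several of the tables above share the same alphabetical run of categories (Dumpling Restaurant, Duty-free Shop, Eastern European Restaurant, and so on). That pattern could come from ties at zero frequency being filled in by the sort, rather than from venues that actually exist in the neighborhood. This is an assumption about how the ranking was computed, and the helper below (`top_nonzero_categories`, a hypothetical name) is only a sketch of how such ties could be excluded.

```python
import pandas as pd


def top_nonzero_categories(freq_row, n=10):
    """Rank categories by frequency, dropping those that never occur."""
    present = freq_row[freq_row > 0]
    return present.sort_values(ascending=False).index[:n].tolist()


# Invented frequency row: only one category is actually present.
freq = pd.Series({
    "Trail": 1.0,
    "Dumpling Restaurant": 0.0,
    "Duty-free Shop": 0.0,
    "English Restaurant": 0.0,
})

ranked = top_nonzero_categories(freq)
```

If the ranked list for a neighborhood comes back much shorter than ten, the remaining table slots carry no information about local demand.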

In cluster 3, four neighborhoods are included. In all four, the most common venue is a pizza place, which indicates that pizza is popular in these four neighborhoods. Opening a pizza place is therefore a reasonable suggestion. On the other hand, the competition among pizza places should be considered too. Since other types of restaurant also appear in these neighborhoods, another kind of restaurant could be a good choice as well.

In cluster 4, there are two neighborhoods. The most common venues are a music venue and a golf course, and a donut shop is the third most common venue in both. So my suggestion for these neighborhoods is to open a donut shop. Dumpling Restaurant, Eastern European Restaurant, and English Restaurant also appear in the top ten.

In cluster 5, there is only one neighborhood. Different types of restaurant appear in this neighborhood, so opening a type of restaurant that already exists there is a good idea. Given the diversity of restaurants in this neighborhood, opening a new type of restaurant may be a good choice too.
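The cluster sizes quoted in this discussion can be read off the `Cluster Labels` column with a single `value_counts` call. The sketch below rebuilds a stand-in label series from the sizes stated above, since the real `losangeles_merged` frame is not reproduced here; the variable names are hypothetical.

```python
import pandas as pd

# Cluster sizes as stated in the discussion (labels 0..4 are clusters 1..5).
stated_sizes = {0: 127, 1: 5, 2: 4, 3: 2, 4: 1}

# Hypothetical stand-in for losangeles_merged['Cluster Labels'].
labels = pd.Series(
    [label for label, size in stated_sizes.items() for _ in range(size)],
    name="Cluster Labels",
)

# Count neighborhoods per cluster, ordered by cluster label.
cluster_sizes = labels.value_counts().sort_index()
total_neighborhoods = int(cluster_sizes.sum())
```

The five sizes add up to 139, which matches the number of neighborhoods reported in the Results section. It also shows how unbalanced the clustering is, with one cluster holding most of the neighborhoods.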

## Conclusions

In this project, I analyzed the neighborhoods of Los Angeles with a data set obtained from the Foursquare API. I chose the k-means clustering algorithm to build the model, and I visualized the results by creating a map. The model can help people choose which type of restaurant to open and in which neighborhood. For example, I created five clusters covering the neighborhoods in the Los Angeles region. For each cluster, the tables show the most common venue categories in each neighborhood, so we can decide what kind of restaurant to open in a given place.