<a href="https://colab.research.google.com/github/heyl-steve/Coursera_Capstone/blob/main/Heyl_Capstone_Tattoo_Shop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Select Location for a Tattoo Shop in Los Angeles</h1>
<h2>Heyl - Capstone Project</h2>

---
<h3>Description of the problem:</h3> Where to locate a tattoo business in Los Angeles

**Scenario**: The owner of a (hypothetical) tattoo shop in San Diego wants to expand into Los Angeles.

**Background**: The tattoo artist started her business some years ago in downtown San Diego. After a few years in that location, she moved to a building next to an LGBTQ+ bar and the business grew rapidly. As the shop's reputation grew among her LGBTQ+ clients, she found that a share of them were traveling from the Los Angeles area, so she decided to open a second shop there.
She is looking for a location in Los Angeles that is similar to the neighborhood where her San Diego shop is located.

<h3>Data Needed:</h3> Current location in San Diego, locations of LGBTQ+ bars in Los Angeles, and FourSquare data about the businesses near each of those locations.

**Approach to solving the problem:** I'm looking for locations in Los Angeles that are "similar" to the current location in San Diego. Since a major component of the location in San Diego is proximity to an LGBTQ+ bar, I will assign "neighborhoods" in Los Angeles centered on LGBTQ+ bars and cluster them based on the other venues in each "neighborhood". 
To determine which of these are "similar" to the current location, I will add the current location to the dataframe when creating the clusters. Thus the potential location(s) for the new shop in Los Angeles will be those that cluster with the location of the current shop in San Diego.
Finally, I will map the resulting location(s) in Los Angeles.

Note: Since some LGBTQ+ bars are located near each other, some "nearby venues" may be associated with more than one bar.
Note: This is a hypothetical situation, but the LGBTQ+ bars are real.

**Acquiring the data (San Diego):** I will choose an address next to an existing gay bar in San Diego as the location of the tattoo shop.

**Acquiring the data (Los Angeles):** Although FourSquare has a venue category for "gay bars", many of the venues returned by a FourSquare API call for that category were neither "gay" nor "bars"; the results included sushi restaurants and nail salons.
Instead, I used a Google search for "top LGBTQ bars in Los Angeles" and manually created a CSV file of the names, addresses, and latitude/longitude values (see below).
Note: FourSquare does not have a venue category for the more inclusive term "LGBTQ bars".

CSV File
<pre>
id	name	location.lat	location.lng	location.formattedAddress
1	Abbey	34.083711	-118.38534	692 N Robertson Blvd, West Hollywood, CA 90069
2	Hi Tops	34.084729	-118.3852	8933 Santa Monica Blvd, West Hollywood, CA 90069
3	Eagle LA	34.091388	-118.28393	4219 Santa Monica Blvd, Los Angeles, CA 90029
4	Revolver	34.085786	-118.38356	8851 Santa Monica Blvd, West Hollywood, CA 90069
5	Precinct	34.0498726	-118.24934	357 S Broadway, Los Angeles, CA 90013
6	Micky’s	34.085568	-118.383847	8857 Santa Monica Blvd, West Hollywood, CA 90069
7	Fubar	34.090818	-118.364593	7994 Santa Monica Blvd, West Hollywood, CA 90046
8	Flaming Saddles	34.915184	-118.147613	8811 Santa Monica Blvd, West Hollywood, CA 90069
9	Faultline	34.083667	-118.292434	4216 Melrose Ave, Los Angeles, CA 90029
10	Akbar	34.095938	-118.28421	4356 Sunset Blvd, Los Angeles, CA 90029
11	Redline	34.04503	-118.248992	131 E 6th St, Los Angeles, CA 90014
13	Bullet Bar	34.17226	-118.360121	10522 Burbank Blvd, North Hollywood, CA 91601
14	C-frenz	34.198546	-118.535844	7026 Reseda Blvd, Reseda, CA 91335
15	TigerHeat	34.102916	-118.327129	1735 Vine St, Los Angeles, CA 90028
16	Club Tempo	34.090512	-118.310203	5520 Santa Monica Blvd, Los Angeles, CA 90038
17	New Jalisco Bar	34.050304	-118.245458	245 S Main St, Los Angeles, CA 90012
18	Silver Platter	34.0594434	-118.283553	2700 W 7th St, Los Angeles, CA 90057
19	The Gay Bar	34.187419	-118.448479	Sherman Oaks, CA 91423, United States
20	Gayle’s Bar	33.855141	-118.385626	Redondo Beach, CA 90277, United States
21	The Trashy Lad	34.221655	-118.46694	Los Angeles, CA, United States
</pre>
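
Because the coordinates were entered by hand, a quick sanity check can catch typos before any API calls are made. The sketch below is not part of the original analysis: it computes the great-circle (haversine) distance from each bar to the Los Angeles reference point and lists rows beyond a threshold. The function names and the 50 km cutoff are assumptions chosen for illustration.

```python
import math

LA_CENTER = (34.0522, -118.2437)  # same reference point the notebook uses for Los Angeles

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def far_from_center(rows, center=LA_CENTER, max_km=50.0):
    """Return (name, distance_km) for rows farther than max_km from center."""
    flagged = []
    for name, lat, lng in rows:
        d = haversine_km(center[0], center[1], lat, lng)
        if d > max_km:
            flagged.append((name, round(d, 1)))
    return flagged

# a few rows copied from the table above
sample_rows = [
    ("Abbey", 34.083711, -118.38534),
    ("Flaming Saddles", 34.915184, -118.147613),
    ("C-frenz", 34.198546, -118.535844),
    ("Gayle’s Bar", 33.855141, -118.385626),
]
print(far_from_center(sample_rows))
```

With these sample rows, only Flaming Saddles is flagged: its listed latitude places it roughly 96 km from the reference point, which is worth rechecking against its West Hollywood street address.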
---


In [None]:
# import libraries we need to get started

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!pip install geopy # geocoding library (converts addresses to latitude/longitude)

!pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [None]:
# Lat & Long of Los Angeles, CA = (34.0522, -118.2437)
la_latitude = 34.0522
la_longitude = -118.2437

<h2>Step 1 - Get list of potential locations in Los Angeles and current location in San Diego</h2>


In [None]:
# Lat & Long of current location in San Diego (North Park)
sd_latitude = 32.74818
sd_longitude = -117.12864

In [None]:
# as noted in the Scenario, we are interested in locations near LGBTQ+ bars
# read list of selected LGBTQ+ bars from a CSV file on my Google Drive
url = 'https://drive.google.com/file/d/1HU4ErMJt4yJgjoWgCpwLYdbyJNIWplj7/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]

df_bars = pd.read_csv(path)
df_bars.head()

Unnamed: 0,id,name,location.lat,location.lng,location.formattedAddress
0,1,Abbey,34.083711,-118.38534,"692 N Robertson Blvd, West Hollywood, CA 90069"
1,2,Hi Tops,34.084729,-118.3852,"8933 Santa Monica Blvd, West Hollywood, CA 90069"
2,3,Eagle LA,34.091388,-118.28393,"4219 Santa Monica Blvd, Los Angeles, CA 90029"
3,4,Revolver,34.085786,-118.38356,"8851 Santa Monica Blvd, West Hollywood, CA 90069"
4,5,Precinct,34.049873,-118.24934,"357 S Broadway, Los Angeles, CA 90013"
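
The cell above turns a Google Drive share link into a direct-download link by taking the second-to-last path segment as the file ID. As a sketch of the same idea, the helper below (a hypothetical name, not from the original notebook) looks for the segment that follows `d`, so it does not depend on what comes after the ID.

```python
def drive_download_url(share_url):
    """Turn a Google Drive '.../file/d/<id>/...' link into a direct-download link."""
    parts = share_url.split('/')
    file_id = parts[parts.index('d') + 1]  # the ID is the path segment after 'd'
    return 'https://drive.google.com/uc?export=download&id=' + file_id

share_url = 'https://drive.google.com/file/d/1HU4ErMJt4yJgjoWgCpwLYdbyJNIWplj7/view?usp=sharing'
print(drive_download_url(share_url))
```

The result can be passed to `pd.read_csv` exactly as `path` is above.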


In [None]:
# create map of LA LGBTQ+ bars using latitude and longitude values
map_la_lgbtq = folium.Map(location=[la_latitude, la_longitude], zoom_start=11)

# add markers to map
for lat, lng, bar_name, bar_addr in zip(df_bars['location.lat'], df_bars['location.lng'], df_bars['name'], df_bars['location.formattedAddress']):
    label = '{}, {}'.format(bar_name, bar_addr)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_la_lgbtq)  

    
map_la_lgbtq

In [None]:
# add the current location of the business (defined above) to the dataframe in row 0
# note: we will exclude row 0 when we map the clusters
df_bars.loc[-1] = [0, 'Current_Location_SD', sd_latitude, sd_longitude, 'Hypothetical Tattoos - Address in San Diego']
df_bars.index = df_bars.index + 1  # shifting index
df_bars.sort_index(inplace=True)

df_bars.head()

Unnamed: 0,id,name,location.lat,location.lng,location.formattedAddress
0,0,Current_Location_SD,32.74818,-117.12864,Hypothetical Tattoos - Address in San Diego
1,1,Abbey,34.083711,-118.38534,"692 N Robertson Blvd, West Hollywood, CA 90069"
2,2,Hi Tops,34.084729,-118.3852,"8933 Santa Monica Blvd, West Hollywood, CA 90069"
3,3,Eagle LA,34.091388,-118.28393,"4219 Santa Monica Blvd, Los Angeles, CA 90029"
4,4,Revolver,34.085786,-118.38356,"8851 Santa Monica Blvd, West Hollywood, CA 90069"
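
The `loc[-1]` / shift-index / sort idiom above works, but the same result can be reached with `pd.concat`, which some readers may find easier to follow. The following is a minimal sketch on a two-row stand-in for `df_bars`; the variable names `bars`, `current`, and `bars_with_sd` are assumptions for illustration.

```python
import pandas as pd

# toy stand-in for df_bars with the same columns
bars = pd.DataFrame({
    'id': [1, 2],
    'name': ['Abbey', 'Hi Tops'],
    'location.lat': [34.083711, 34.084729],
    'location.lng': [-118.38534, -118.3852],
    'location.formattedAddress': ['692 N Robertson Blvd, West Hollywood, CA 90069',
                                  '8933 Santa Monica Blvd, West Hollywood, CA 90069'],
})

current = pd.DataFrame([{
    'id': 0, 'name': 'Current_Location_SD', 'location.lat': 32.74818, 'location.lng': -117.12864,
    'location.formattedAddress': 'Hypothetical Tattoos - Address in San Diego',
}])

# put the San Diego row first and renumber the index from 0
bars_with_sd = pd.concat([current, bars], ignore_index=True)
print(bars_with_sd[['id', 'name']])
```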


<h3>Step 1 Complete</h3>

<h2>Step 2 - Get list of nearby venues for each location in the dataframe</h2>

In [None]:
# import libraries we need for exploring the locations

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe


In [None]:
# set FourSquare defaults
CLIENT_ID = 'OYH3I15E0DNFHKA003BYXMSTBHDAEVAK0N14YMI4AIHPC55K' # Foursquare ID
CLIENT_SECRET = 'QGVECOVC25B5TBBJW0RCKOUZUMNXHNOMVEHGMPYKM1YGGJUZ' # Foursquare Secret
ACCESS_TOKEN = '' # must be defined, but we use the ID and secret
VERSION = '20180605' # Foursquare API version
LIMIT = 350 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: OYH3I15E0DNFHKA003BYXMSTBHDAEVAK0N14YMI4AIHPC55K
CLIENT_SECRET:QGVECOVC25B5TBBJW0RCKOUZUMNXHNOMVEHGMPYKM1YGGJUZ
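
Hard-coding and printing API credentials makes them visible to anyone who can read the notebook. One common alternative, sketched here under assumptions, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, chosen only for this example.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from environment variables.

    The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are
    assumptions for this sketch, not names defined by Foursquare.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# demo with placeholder values; in practice these come from the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLE_ID_123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLE_SECRET_456'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
```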


<h4>2.1 - Define some functions we will need</h4>

In [None]:
# we're going to need a function that extracts the category of a venue from the JSON that FourSquare gives us
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [None]:
# we're going to need a function that gives us the venues near each location of interest
# assume radius of 500 meters from each location we're using
# there will be some overlap since some of the bars are within 500 meters of each other
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Bar_Location', 
                  'Location Latitude', 
                  'Location Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
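
`getNearbyVenues` indexes `categories[0]` directly, so a venue that comes back with an empty category list would raise an `IndexError`. The sketch below shows one way to flatten the `items` list more defensively. The function name `extract_venue_rows` is hypothetical, and the sample items are hand-written in the shape the function above reads, not real API output.

```python
def extract_venue_rows(bar_name, bar_lat, bar_lng, items):
    """Flatten the 'items' list of one explore response into plain tuples.

    Venues with an empty 'categories' list get None as their category instead
    of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        rows.append((bar_name, bar_lat, bar_lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# hand-written items in the same shape the function above reads; not real API output
fake_items = [
    {'venue': {'name': 'Example Cafe', 'location': {'lat': 34.0, 'lng': -118.0},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place', 'location': {'lat': 34.1, 'lng': -118.1},
               'categories': []}},
]
for row in extract_venue_rows('Example Bar', 34.05, -118.25, fake_items):
    print(row)
```

Keeping the parsing separate from the HTTP request also makes it possible to test this step without network access.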

<h4>2.2 - Explore the locations</h4>

In [None]:
# build a dataframe of all the venues that are near each location
# remember that one location is in San Diego
df_venues_near_bar = getNearbyVenues(names=df_bars['name'], latitudes=df_bars['location.lat'], longitudes=df_bars['location.lng'])
df_venues_near_bar.groupby('Bar_Location').count()

Bar_Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Abbey,97,97,97,97,97,97
Akbar,22,22,22,22,22,22
Bullet Bar,25,25,25,25,25,25
C-frenz,30,30,30,30,30,30
Club Tempo,35,35,35,35,35,35
Current_Location_SD,61,61,61,61,61,61
Eagle LA,34,34,34,34,34,34
Faultline,28,28,28,28,28,28
Fubar,29,29,29,29,29,29
Gayle’s Bar,12,12,12,12,12,12


In [None]:
print("There are {} venues near the locations indicated".format(len(df_venues_near_bar)))
print('There are {} unique categories.'.format(len(df_venues_near_bar['Venue Category'].unique())))

There are 966 venues near the locations indicated
There are 184 unique categories.
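
The total above counts (location, venue) pairs, so a venue within 500 meters of two bars is counted twice. To see how much overlap there is, venues can be grouped by name and coordinates and the number of distinct bars counted for each. This is a sketch on toy data that reuses the column names of `df_venues_near_bar`; the rows themselves are made up.

```python
import pandas as pd

# toy stand-in for df_venues_near_bar, using the same column names
venues = pd.DataFrame({
    'Bar_Location': ['Bar A', 'Bar A', 'Bar B', 'Bar B', 'Bar C'],
    'Venue': ['Cafe X', 'Pizza Y', 'Cafe X', 'Gym Z', 'Taco W'],
    'Venue Latitude': [34.01, 34.02, 34.01, 34.03, 34.20],
    'Venue Longitude': [-118.01, -118.02, -118.01, -118.03, -118.40],
})

# a venue is identified by its name plus coordinates
venue_key = ['Venue', 'Venue Latitude', 'Venue Longitude']
bars_per_venue = venues.groupby(venue_key)['Bar_Location'].nunique()

shared = bars_per_venue[bars_per_venue > 1]
print('{} distinct venues, {} shared by more than one bar'.format(len(bars_per_venue), len(shared)))
```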


In [None]:
df_venues_near_bar.head()


Unnamed: 0,Bar_Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Current_Location_SD,32.74818,-117.12864,Pigment,32.74753,-117.129932,Accessories Store
1,Current_Location_SD,32.74818,-117.12864,Tribute Pizza,32.74749,-117.128068,Pizza Place
2,Current_Location_SD,32.74818,-117.12864,Dark Horse Coffee Roasters,32.747342,-117.130323,Coffee Shop
3,Current_Location_SD,32.74818,-117.12864,FatBoy's Cornerstore & Deli,32.748582,-117.128778,Liquor Store
4,Current_Location_SD,32.74818,-117.12864,URBN Coal Fired Pizza,32.748396,-117.127313,Pizza Place


<h3>Step 2 Complete</h3>

<h2>Step 3 - Find most common venue categories near each location</h2>

<h4>3.1 - Define some functions we'll need</h4>

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

<h4>3.2 - Get venue categories for each location</h4>

In [None]:
# convert 'venue category' column to multiple one hot encoding columns
locations_onehot = pd.get_dummies(df_venues_near_bar[['Venue Category']], prefix="", prefix_sep="")

# add the Bar_Location column back to the dataframe
locations_onehot['Bar_Location'] = df_venues_near_bar['Bar_Location']

# move the Bar_Location column to the first position
fixed_columns = [locations_onehot.columns[-1]] + list(locations_onehot.columns[:-1])
locations_onehot = locations_onehot[fixed_columns]

print(locations_onehot.shape)
locations_onehot.head()

(966, 185)


Unnamed: 0,Bar_Location,ATM,Accessories Store,Adult Boutique,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beer Bar,Beer Store,Big Box Store,Board Shop,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Cheese Shop,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Design Studio,Dessert Shop,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Donut Shop,Electronics Store,Escape Room,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Travel,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Health & Beauty Service,Historic Site,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Medical School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Music School,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Poke Place,Pool,Pool Hall,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,School,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South American Restaurant,Spa,Spanish Restaurant,Speakeasy,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Trade School,Train Station,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,Current_Location_SD,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Current_Location_SD,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Current_Location_SD,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Current_Location_SD,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Current_Location_SD,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


<h4>3.3 - Group venue categories for each location</h4>

In [None]:
# take the mean of the frequency of occurrence of each category for each location of interest
# and group the rows
locations_grouped = locations_onehot.groupby('Bar_Location').mean().reset_index()
locations_grouped.head()

Unnamed: 0,Bar_Location,ATM,Accessories Store,Adult Boutique,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beer Bar,Beer Store,Big Box Store,Board Shop,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Cheese Shop,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Design Studio,Dessert Shop,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Donut Shop,Electronics Store,Escape Room,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Travel,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Health & Beauty Service,Historic Site,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Medical School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Music School,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Poke Place,Pool,Pool Hall,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,School,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South American Restaurant,Spa,Spanish Restaurant,Speakeasy,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Trade School,Train Station,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,Abbey,0.010309,0.0,0.0,0.020619,0.0,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.020619,0.0,0.0,0.0,0.0,0.010309,0.041237,0.010309,0.0,0.0,0.0,0.0,0.020619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030928,0.030928,0.051546,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.010309,0.020619,0.0,0.0,0.0,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020619,0.0,0.010309,0.0,0.0,0.010309,0.0,0.010309,0.020619,0.0,0.0,0.0,0.0,0.123711,0.0,0.0,0.010309,0.010309,0.020619,0.010309,0.0,0.0,0.010309,0.0,0.0,0.030928,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.020619,0.010309,0.0,0.0,0.010309,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.030928,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.010309,0.020619,0.0,0.010309,0.0,0.020619,0.0,0.0,0.0,0.030928,0.0,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.010309,0.0,0.010309,0.010309,0.0,0.0,0.010309,0.0,0.0,0.010309,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.020619,0.010309,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Akbar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455
2,Bullet Bar,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0
3,C-frenz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0
4,Club Tempo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.057143,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.028571,0.0,0.0,0.0,0.0,0.057143,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.028571,0.057143,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.057143,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0
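
Each value in the grouped table is the share of a location's nearby venues that fall into a category, so every row sums to 1. The toy example below (made-up rows, same column names) repeats the one-hot-then-mean recipe and compares it with `pd.crosstab(..., normalize='index')`, which is offered here only as an equivalent shortcut, not as the method used in this notebook.

```python
import pandas as pd

# toy stand-in for df_venues_near_bar
venues = pd.DataFrame({
    'Bar_Location': ['Bar A', 'Bar A', 'Bar A', 'Bar A', 'Bar B', 'Bar B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Pizza Place', 'Gym', 'Gym', 'Bar'],
})

# same recipe as the notebook: one-hot encode, then average within each location
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Bar_Location', venues['Bar_Location'])
grouped = onehot.groupby('Bar_Location').mean()

# one-call alternative: a cross-tabulation normalized so each row sums to 1
freq = pd.crosstab(venues['Bar_Location'], venues['Venue Category'], normalize='index')

print(grouped)
```

In this toy data, two of Bar A's four venues are coffee shops, so its `Coffee Shop` frequency is 0.5.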


<h4>3.4 - Create a dataframe with each location and the most common venue categories nearby</h4>

In [None]:
num_top_venues = 10

ordinal_suffixes = ['st', 'nd', 'rd']

# create columns for the data frame
# NOTE - this only works for num_top_venues < 20
columns = ['Bar_Location']
# create column names 1st, 2nd, etc.
for indx in np.arange(num_top_venues):
    try:
        # first three will succeed
        columns.append('{}{} Most Common Venue'.format(indx+1, ordinal_suffixes[indx]))
    except IndexError:
        # the rest will fall here
        columns.append('{}th Most Common Venue'.format(indx+1))

# create a new dataframe
locations_venue_types_sorted = pd.DataFrame(columns=columns)
locations_venue_types_sorted['Bar_Location'] = locations_grouped['Bar_Location']

for indx in np.arange(locations_grouped.shape[0]):
    locations_venue_types_sorted.iloc[indx, 1:] = return_most_common_venues(locations_grouped.iloc[indx, :], num_top_venues)

locations_venue_types_sorted.head(6)

Unnamed: 0,Bar_Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey,Gay Bar,Coffee Shop,Boutique,Pizza Place,Hotel,Cocktail Bar,Clothing Store,Mexican Restaurant,Gym,Burger Joint
1,Akbar,Bar,Coffee Shop,Thai Restaurant,Yoga Studio,Gymnastics Gym,Grocery Store,Intersection,Italian Restaurant,Garden Center,Men's Store
2,Bullet Bar,Music Venue,Martial Arts School,Convenience Store,Furniture / Home Store,Pharmacy,Music School,Dive Bar,Coffee Shop,Church,Mexican Restaurant
3,C-frenz,Vietnamese Restaurant,Gym,Fast Food Restaurant,Chinese Restaurant,Pharmacy,Massage Studio,Fried Chicken Joint,Latin American Restaurant,Locksmith,Sandwich Place
4,Club Tempo,Fast Food Restaurant,Grocery Store,Bakery,Latin American Restaurant,Mexican Restaurant,Food Truck,Coffee Shop,Market,Flea Market,Café
5,Current_Location_SD,Coffee Shop,Brewery,Pizza Place,Breakfast Spot,Mexican Restaurant,Pharmacy,Music Venue,Café,Yoga Studio,Beer Bar
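
The note in the cell above limits the column-naming loop to fewer than 20 top venues, because a fixed suffix list cannot produce names such as "21st". If more columns were ever needed, a small helper could compute the suffix for any positive integer. The `ordinal` function below is a hypothetical addition sketched for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Bar_Location'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns[:4])
```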


<h3>Step 3 Complete</h3>

<h2>Step 4 - Cluster each location by most common venues</h2>

<h4>4.1 - Create Clusters</h4>

In [None]:
# import libraries we need for clustering

# import k-means from scikit-learn
from sklearn.cluster import KMeans

In [None]:
# set number of clusters
# note: I ran this for values of kclusters between 3 and 15, settled on 8
# when kclusters = 3, 14 addresses shared the same cluster as the San Diego location
# when kclusters = 4, 13 addresses shared the same cluster as the San Diego location
# when kclusters = 5, 11 addresses shared the same cluster as the San Diego location
# when kclusters = 6, 8 addresses shared the same cluster as the San Diego location
# when kclusters = 7, 9 addresses shared the same cluster as the San Diego location
# when kclusters = 8, 2 addresses shared the same cluster as the San Diego location
# when kclusters = 9, 7 addresses shared the same cluster as the San Diego location
# when kclusters = 10, 6 addresses shared the same cluster as the San Diego location
# when kclusters = 11, 5 addresses shared the same cluster as the San Diego location
# when kclusters = 12, 4 addresses shared the same cluster as the San Diego location
# when kclusters = 13, 4 addresses shared the same cluster as the San Diego location
# when kclusters = 14, 4 addresses shared the same cluster as the San Diego location
# when kclusters = 15, 0 addresses shared the same cluster as the San Diego location

kclusters = 8

location_grouped_clustering = locations_grouped.drop(columns='Bar_Location')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(location_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 7, 7, 2, 6, 1, 7, 1, 7, 3], dtype=int32)
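
The comments in the cell above record the result of re-running k-means by hand for each value of `kclusters`. That sweep can be automated. The sketch below is an assumption-laden illustration, not the code used to produce those numbers: the function name `cluster_mates_by_k` is hypothetical, the data are synthetic, and `n_init=10` is set explicitly because the default has changed between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_mates_by_k(features, names, target, k_values, random_state=0):
    """For each k, list the rows that share a k-means cluster with `target`."""
    names = list(names)
    target_pos = names.index(target)
    mates = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(features)
        mates[k] = [n for n, lab in zip(names, labels)
                    if lab == labels[target_pos] and n != target]
    return mates

# synthetic, well-separated data standing in for the venue-frequency table
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.01, size=(4, 3))
group_b = rng.normal(loc=1.0, scale=0.01, size=(4, 3))
features = np.vstack([group_a, group_b])
names = ['SD', 'a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'b4']

for k, same_cluster in cluster_mates_by_k(features, names, 'SD', [2, 3, 4]).items():
    print('k = {}: {} locations cluster with SD'.format(k, len(same_cluster)))
```

In the recorded results, the count does not fall steadily as `kclusters` grows (8 at k = 6, 9 at k = 7, 2 at k = 8, 7 at k = 9), which suggests the shortlist is sensitive to the choice of k and is worth treating as a starting point for further comparison.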

In [None]:
# add clustering labels
locations_venue_types_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_locations_clustered = df_bars

# merge the grouped & clustered locations with the sorted venue list
df_locations_clustered = df_locations_clustered.join(locations_venue_types_sorted.set_index('Bar_Location'), on='name')

df_locations_clustered.head()

Unnamed: 0,id,name,location.lat,location.lng,location.formattedAddress,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Current_Location_SD,32.74818,-117.12864,Hypothetical Tattoos - Address in San Diego,1.0,Coffee Shop,Brewery,Pizza Place,Breakfast Spot,Mexican Restaurant,Pharmacy,Music Venue,Café,Yoga Studio,Beer Bar
1,1,Abbey,34.083711,-118.38534,"692 N Robertson Blvd, West Hollywood, CA 90069",0.0,Gay Bar,Coffee Shop,Boutique,Pizza Place,Hotel,Cocktail Bar,Clothing Store,Mexican Restaurant,Gym,Burger Joint
2,2,Hi Tops,34.084729,-118.3852,"8933 Santa Monica Blvd, West Hollywood, CA 90069",0.0,Gay Bar,Coffee Shop,Boutique,Mexican Restaurant,Burger Joint,Hotel,Gym,Bar,Sushi Restaurant,Park
3,3,Eagle LA,34.091388,-118.28393,"4219 Santa Monica Blvd, Los Angeles, CA 90029",7.0,Coffee Shop,Bar,Vegetarian / Vegan Restaurant,Cocktail Bar,Thai Restaurant,Yoga Studio,French Restaurant,Pet Store,New American Restaurant,Neighborhood
4,4,Revolver,34.085786,-118.38356,"8851 Santa Monica Blvd, West Hollywood, CA 90069",0.0,Gay Bar,Coffee Shop,Burger Joint,Gym,Hotel,Sushi Restaurant,Boutique,Park,New American Restaurant,Bar


<h4>4.2 - Find locations with same cluster as Current San Diego Location</h4>

In [None]:
# find the row containing Current Location SD
df_SD_Cluster_row = df_locations_clustered[df_locations_clustered['name']=='Current_Location_SD']
# what is the cluster of the Current Location SD?
n_SD_Cluster = float(df_SD_Cluster_row['Cluster Labels'].iloc[0])
print("Current Location San Diego cluster = {}".format(n_SD_Cluster))

Current Location San Diego cluster = 1.0


In [None]:
# find all the rows in this cluster
# i.e. these are the locations like the Current Location SD (the San Diego row itself is included)
df_locations_like_SD = df_locations_clustered[df_locations_clustered['Cluster Labels']==n_SD_Cluster]
print("There are {} locations in the San Diego cluster (including the San Diego shop itself)".format(df_locations_like_SD['name'].count()))
df_locations_like_SD[['name', 'location.formattedAddress']]

There are 3 locations in the San Diego cluster (including the San Diego shop itself)


Unnamed: 0,name,location.formattedAddress
0,Current_Location_SD,Hypothetical Tattoos - Address in San Diego
9,Faultline,"4216 Melrose Ave, Los Angeles, CA 90029"
16,New Jalisco Bar,"245 S Main St, Los Angeles, CA 90012"


In [None]:
# just curious - what are the other locations
# i.e. these are the locations NOT like the Current Location SD
df_locations_not_like_SD = df_locations_clustered[df_locations_clustered['Cluster Labels']!=n_SD_Cluster]
df_locations_not_like_SD[['name', 'location.formattedAddress']]

Unnamed: 0,name,location.formattedAddress
1,Abbey,"692 N Robertson Blvd, West Hollywood, CA 90069"
2,Hi Tops,"8933 Santa Monica Blvd, West Hollywood, CA 90069"
3,Eagle LA,"4219 Santa Monica Blvd, Los Angeles, CA 90029"
4,Revolver,"8851 Santa Monica Blvd, West Hollywood, CA 90069"
5,Precinct,"357 S Broadway, Los Angeles, CA 90013"
6,Micky’s,"8857 Santa Monica Blvd, West Hollywood, CA 90069"
7,Fubar,"7994 Santa Monica Blvd, West Hollywood, CA 90046"
8,Flaming Saddles,"8811 Santa Monica Blvd, West Hollywood, CA 90069"
10,Akbar,"4356 Sunset Blvd, Los Angeles, CA 90029"
11,Redline,"131 E 6th St, Los Angeles, CA 90014"


<h3>Step 4 Complete</h3>

<h2>Step 5 - Map the clusters in Los Angeles</h2>

In [None]:
# import libraries we need for plotting

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


<h4>5.1 - Remove the Current San Diego Location</h4>
We only want to map the clusters in Los Angeles.

In [None]:
df_locations_like_SD.shape

(3, 16)

In [None]:
df_locations_like_SD = df_locations_like_SD[df_locations_like_SD.name!='Current_Location_SD']

In [None]:
print(df_locations_like_SD.shape)
df_locations_like_SD.head()

(2, 16)


Unnamed: 0,id,name,location.lat,location.lng,location.formattedAddress,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,9,Faultline,34.083667,-118.292434,"4216 Melrose Ave, Los Angeles, CA 90029",1.0,Pizza Place,Coffee Shop,Mexican Restaurant,Theater,Pharmacy,Intersection,Music Venue,Pool Hall,Record Shop,Fast Food Restaurant
16,17,New Jalisco Bar,34.050304,-118.245458,"245 S Main St, Los Angeles, CA 90012",1.0,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Bar,Clothing Store,Breakfast Spot,Shopping Mall,Mexican Restaurant,Boutique,Candy Store


<h4>5.2 - Map the potential locations in Los Angeles</h4>

In [None]:
# create map of potential locations in Los Angeles for the new shop
map_potential_la_locations = folium.Map(location=[la_latitude, la_longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(df_locations_like_SD['location.lat'], df_locations_like_SD['location.lng'], df_locations_like_SD['name'] + " - " + df_locations_like_SD['location.formattedAddress']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_potential_la_locations)  
    
map_potential_la_locations
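
The map above draws only the locations that share a cluster with the San Diego shop, all in one color. If every cluster were mapped instead, each would need its own color. The sketch below builds one hex color per cluster label from a Matplotlib colormap; the helper name `cluster_colors` is an assumption for illustration.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_colors(n_clusters):
    """Return one hex color string per cluster label, spread across the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.rgb2hex(c) for c in rgba]

palette = cluster_colors(8)  # 8 matches kclusters in Step 4
print(palette)
```

Inside a marker loop, the color for a row could then be looked up as `palette[int(label)]`, after skipping any rows whose cluster label is missing.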

<h3>Step 5 Complete</h3>

<h2>Conclusion</h2>

The new location should be either **downtown near the New Jalisco Bar** or in **East Hollywood/Silver Lake near the Faultline bar**. The final decision may come down to how rents, parking, and transit access compare at the two locations, factors that are best evaluated with on-site visits.