# Battle of the Neighborhoods - New Dispensary

# Introduction

With the legal landscape changing rapidly, more jurisdictions are regulating and decriminalizing marijuana.  The urban legends and propaganda that once painted the substance as dangerous have been debunked, and it is now touted for its medicinal benefits, its safety relative to alternative substances, its plant by-products and more.  In this changing environment, it is the perfect time to invest in this growing $16 billion industry.  
  
The goal of this project is to determine the optimal location for opening a new dispensary in cities where recreational marijuana is legal.  The chosen location and underlying data may then be used to find a storefront, establish a supply chain, build a marketing campaign and develop a financial business plan to attract investors.

# Data 

## To determine the best location for opening a dispensary, the following data sources will be used:

#### 2019 State Laws
###### https://disa.com/map-of-marijuana-legality-by-state

#### 2019 Local Venue Data
###### https://foursquare.com

#### 2017-2018 Population and Crime Statistics  
###### https://ucr.fbi.gov/crime-in-the-u.s/2018/preliminary-report/tables/table-4/table-4.xls/view  

#### 2019 State Tax Rate
###### https://taxfoundation.org/sales-tax-rates-2019/ 

#### List of US States
###### https://simple.wikipedia.org/wiki/List_of_U.S._states

- First, the legality will be assessed for each state and city.  
- The legal states will be cross-referenced against a US population dataset to find cities with a population greater than 100,000.
- Each location will be checked for existing dispensaries within 1 mile (1.609 km).  
 - If a dispensary exists, the location will be ruled out.  
- Another factor that will be assessed is the demographic of the area.  
 - Cities or neighborhoods with many children and families will be excluded in order to keep them family-friendly.  
- Since the goal of this project is to open a recreational dispensary: 
 - I will be looking in areas with active nightlife and adult activities. 
 - I will be looking for places with nearby fast-food restaurants, pizza restaurants and convenience stores, since these establishments are often frequented by the target clientele.
- I will be checking crime rates in each city and aiming for locations with lower property crime rates.
 - Many dispensaries have been robbed since opening due to the nature of the product, so the aim is to minimize this risk as much as possible.
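
The 1-mile exclusion check in the list above can be illustrated with a great-circle distance. This is only a sketch: the notebook itself relies on the Foursquare `radius` parameter rather than computing distances, and the names `haversine_km` and `has_nearby_dispensary` are hypothetical helpers introduced for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MILE_KM = 1.609           # 1 mile, as used in the criteria above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def has_nearby_dispensary(site, dispensaries, max_km=MILE_KM):
    """True if any dispensary (lat, lon) lies within max_km of the candidate site."""
    return any(haversine_km(site[0], site[1], d[0], d[1]) <= max_km for d in dispensaries)
```

A candidate site would be ruled out whenever `has_nearby_dispensary(site, known_dispensaries)` returns `True`, where `known_dispensaries` is whatever list of coordinates is available.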

  
#### Stipulations and Assumptions

1. The following states did not have crime data available and were not included in this assessment.  This was not a major problem since, among these states, only Vermont (marked *) had legalized recreational marijuana.   
    - Delaware
    - Maine
    - Mississippi
    - Vermont*  
    - West Virginia 
    - Wyoming 
 
1. Although marijuana is legal there, the District of Columbia was not considered in this report since it is not a state and therefore did not have state data available. 
1. Only property crime was considered.  Violent crimes, although serious, were not taken into account since the risks posed in this endeavor are robbery, burglary and larceny.   
  
1. Since this project is intended to attract investors and is not tied to a specific region, any location in the United States was considered viable, but only cities with 100,000 residents or more were considered.
 
 


## Methodology 

In order to gather the appropriate data, significant data cleansing was performed to get the information into a usable format, remove bad data, exclude cities that are out of scope and generate a streamlined dataset.

I began by consolidating all data by state and city.  To do this, I took the tax information, which had abbreviated state names, and combined it with a list of the 50 states from Wikipedia.  Once the taxes and state names were available in their own dataframe, I was able to combine it with the crime statistics dataframe.  I also got geospatial data from GeoPy, using the city and state names, to add to the city database.  Finally, I calculated a crime index, the ratio of property crime to population for each city, to complete the database.  
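
As a small illustration of the consolidation step described above, the sketch below merges a tax table into a crime table and computes the crime index. The two rows are taken from the tables printed later in this notebook; the variable names `crime_demo`, `taxes_demo` and `merged` are made up for this example.

```python
import pandas as pd

# Two sample rows, with numbers stored as strings containing thousands separators
crime_demo = pd.DataFrame({
    "State": ["ALABAMA", "ALASKA"],
    "City": ["BIRMINGHAM", "ANCHORAGE"],
    "Population": ["212,178", "296,188"],
    "Property Crime": ["6,472", "7,708"],
})
taxes_demo = pd.DataFrame({"State": ["ALABAMA", "ALASKA"], "Tax": [0.04, 0.0]})

# Strip the separators and convert to numbers
for col in ["Population", "Property Crime"]:
    crime_demo[col] = crime_demo[col].str.replace(",", "", regex=False).astype(float)

# Attach the state sales tax, then compute property crimes per resident
merged = crime_demo.merge(taxes_demo, on="State", how="left")
merged["Crime Index"] = merged["Property Crime"] / merged["Population"]
```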

Once the database was established, I used the geographic coordinates to query the Foursquare API and gather local venue data for each city.  The venue data was then transformed with one-hot encoding and grouped by city, taking the mean of each venue category to get its frequency.  Cities that already had dispensaries were dropped from the dataset.  This data was then used to group the cities into 5 clusters based on the types of venue and the overall culture present in the area.  
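
The one-hot encoding and frequency step described above can be sketched on a toy table. The venue rows below are illustrative only, and `venues_demo`, `onehot` and `freq` are names chosen for this example.

```python
import pandas as pd

# Toy venue list: one row per venue returned for a city
venues_demo = pd.DataFrame({
    "City": ["ANCHORAGE", "ANCHORAGE", "ANCHORAGE", "ANCHORAGE", "ANAHEIM", "ANAHEIM"],
    "Venue Category": ["Brewery", "Bar", "Seafood Restaurant", "Brewery",
                       "Coffee Shop", "Taco Place"],
})

# One column per category, 1.0 where the venue belongs to that category
onehot = pd.get_dummies(venues_demo["Venue Category"], dtype=float)
onehot.insert(0, "City", venues_demo["City"])

# The mean of each indicator column per city is that category's share of the city's venues
freq = onehot.groupby("City").mean()
```

Each row of `freq` sums to 1, so cities with different numbers of returned venues are still comparable.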

I started the exploratory analysis by creating a map of all of the cities in my dataset.  After clustering was performed, a new map was generated with the cities color-coded by cluster.  Finally, once the target cluster was identified, the crime rate and sales tax were cross-referenced to determine the top choice of city in which to open the dispensary.
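
One possible way to do the final cross-referencing is to filter to the target cluster and sort by crime index, then by sales tax. The sketch below uses five rows from the tables shown later in this notebook; the choice of `target_cluster = 1` is purely hypothetical, and the resulting order applies only to these sample rows.

```python
import pandas as pd

ranked_demo = pd.DataFrame({
    "City": ["ANCHORAGE", "ANAHEIM", "ANTIOCH", "BAKERSFIELD", "BERKELEY"],
    "Tax": [0.0, 0.0725, 0.0725, 0.0725, 0.0725],
    "Crime Index": [0.026024, 0.013098, 0.016686, 0.019861, 0.023042],
    "Cluster": [1, 1, 1, 1, 4],
})

target_cluster = 1  # hypothetical choice, for illustration only

# Lowest crime index first; sales tax breaks ties
shortlist = (
    ranked_demo[ranked_demo["Cluster"] == target_cluster]
    .sort_values(["Crime Index", "Tax"])
    .reset_index(drop=True)
)
```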

In [1]:
import numpy as np # library to handle data in a vectorized manner
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.colors as colors
import pandas as pd # library for data analysis
import json # library to handle JSON files
import requests # library to handle requests
import folium # map rendering library
from sklearn.cluster import KMeans # import k-means from clustering stage
from pandas import json_normalize # transform JSON file into a pandas dataframe (requires pandas >= 1.0)
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from bs4 import BeautifulSoup
#!conda install beautifulsoup4 --yes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('Libraries imported.')


Libraries imported.


## Get updated legalization data

In [2]:
url = "disa.com/map-of-marijuana-legality-by-state"
r  = requests.get("http://" +url)
data = r.text
soup = BeautifulSoup(data, 'html.parser') # name the parser explicitly

In [3]:
text = []
legalized = []
i=0
for table in soup.find_all('td'):
    text.append([i, table])
    i+=1

for j in range(0,len(text)-1):
    if str(text[j+1][1]) == "<td>Fully Legal</td>":
        s = str(text[j][1])
        state = s[4:-5]
        legalized.append(state.upper())
print(legalized)

['ALASKA', 'CALIFORNIA', 'COLORADO', 'DISTRICT OF COLUMBIA', 'MAINE', 'MASSACHUSETTS', 'MICHIGAN', 'NEVADA', 'OREGON', 'VERMONT', 'WASHINGTON']


## State Sales Tax 

In [4]:
url = "taxfoundation.org/sales-tax-rates-2019/"
r  = requests.get("http://" +url)
data = r.text
soup = BeautifulSoup(data, 'html.parser') # name the parser explicitly

In [5]:
text = []

for table in soup.find_all('td'):
    text.append(table)
text.pop(0)

state = []
tax = []

for i in range(0,len(text)):
    if i % 7 == 0:
        s = str(text[i])
        t = str(text[i+1])
        state.append(s[4:-5])
        tax.append(t[4:-5])



In [6]:
for j in range(0,len(tax)):
    try :
        tax[j] = float(tax[j].strip('%'))/100
    except ValueError:
        # cells without a numeric rate are treated as 0
        tax[j] = float(0)
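
The conversion in the cell above can also be written as a small helper. This is a sketch only: `parse_rate` is a hypothetical name, and treating every non-numeric cell as a 0% rate mirrors the fallback used above rather than anything stated by the data source.

```python
def parse_rate(cell):
    """Convert a cell such as '7.25%' to 0.0725; return 0.0 when no number is present."""
    cleaned = cell.replace('\xa0', ' ').strip().rstrip('%').strip()
    try:
        return float(cleaned) / 100
    except ValueError:
        # Same fallback as the loop above: non-numeric cells count as a 0% rate
        return 0.0
```

With it, the list could be converted in one pass with `tax = [parse_rate(t) for t in tax]`.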

In [7]:
# keep only the text before the first space or non-breaking space in each state cell
for k in range(0,len(tax)):
    state[k] = state[k].split(' ', 1)[0]
    state[k] = state[k].split('\xa0', 1)[0]

## Correct State Names

In [8]:
url = "simple.wikipedia.org/wiki/List_of_U.S._states"
r  = requests.get("http://" +url)
data = r.text
soup = BeautifulSoup(data, 'html.parser') # name the parser explicitly

In [9]:
text = []

for table in soup.find_all('area'):
    text.append(str(table))
statelist =[]

for k in range(0,len(text)):
    # drop the first 11 characters (the tag opening) and keep the text up to the next quote
    text[k] = text[k][11:].split('"', 1)[0]
    text[k] = text[k].split(' CA', 1)[0]
    statelist.append(text[k].upper())

In [10]:
statelist.sort()
statelist = list(dict.fromkeys(statelist))

In [11]:
taxes = pd.DataFrame([statelist,tax[0:50]]).transpose()

In [12]:
taxes.rename(columns={0:'State',1:'Tax'},inplace=True)

## Import Crime Statistics 

In [13]:
crime = pd.read_csv(r'crime_stats.csv',skiprows=4)
crime = crime[:-9]

In [14]:
taxes.head()

Unnamed: 0,State,Tax
0,ALABAMA,0.04
1,ALASKA,0.0
2,ARIZONA,0.056
3,ARKANSAS,0.065
4,CALIFORNIA,0.0725


In [15]:
states = list(crime['State'].unique())
del states[1]

In [16]:
crime = crime.merge(taxes, on='State', how='left')

In [17]:
for i in range (1,len(crime)):
    if i%2 != 0:
        crime.loc[i, 'City'] = crime.loc[i-1, 'City']
    if pd.isnull(crime.loc[i,'State']):
        crime.loc[i, 'State'] = crime.loc[i-1, 'State']
        crime.loc[i, 'Tax'] = crime.loc[i-1, 'Tax']
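
The row-by-row fill above could also be expressed with a pandas forward fill. The sketch below uses a toy frame (`raw`) shaped like the crime table, where only the first row of each state or city carries its name. It assumes every named state row already has a tax value; if one did not, a forward fill would carry over the previous state's tax, which the loop above does not do.

```python
import numpy as np
import pandas as pd

# Toy frame: names and tax appear only on the first row of each group
raw = pd.DataFrame({
    "State": ["ALABAMA", np.nan, np.nan, np.nan, "ALASKA", np.nan],
    "City": ["BIRMINGHAM", np.nan, "MOBILE4", np.nan, "ANCHORAGE", np.nan],
    "Tax": [0.04, np.nan, np.nan, np.nan, 0.0, np.nan],
})

filled = raw.copy()
cols = ["State", "City", "Tax"]
# Copy the last seen value downwards into the empty cells
filled[cols] = filled[cols].ffill()
```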

In [18]:
crime.rename(columns={"Population1":"Population","Property \ncrime":"Property Crime", "Larceny-\ntheft":"Theft"}, inplace=True)
crime.drop(columns=["Unnamed: 2","Violent \ncrime","Murder","Rape2", "Aggravated \nassault","Motor \nvehicle \ntheft","Arson3","Unnamed: 14","Burglary","Theft","Robbery"],inplace=True)
crime = crime[:-9]

In [19]:
crime = crime.fillna(0)

In [20]:
crime["Population"] = crime["Population"].str.replace(",","").astype(float)
crime["Property Crime"] = crime["Property Crime"].str.replace(",","").astype(float)

In [21]:
crime = crime[crime['Population'] != 0]

In [22]:
crime['Crime Index'] = crime['Property Crime']/crime['Population']

In [23]:
crime.dropna(axis=0,inplace=True)

In [24]:
crime = crime.reset_index(drop=True)
crime.head()

Unnamed: 0,State,City,Population,Property Crime,Tax,Crime Index
0,ALABAMA,BIRMINGHAM,212178.0,6472.0,0.04,0.030503
1,ALABAMA,MOBILE4,248431.0,6493.0,0.04,0.026136
2,ALABAMA,MONTGOMERY,199099.0,4246.0,0.04,0.021326
3,ALABAMA,TUSCALOOSA,101124.0,1953.0,0.04,0.019313
4,ALASKA,ANCHORAGE,296188.0,7708.0,0.0,0.026024


In [25]:
# keep only cities in states where recreational marijuana is fully legal
crime = crime[crime['State'].isin(legalized)]

### Find the city with the lowest property crime rate to open a dispensary in each state

In [26]:
crime = crime[crime.City != 'RIALTO5']
crime = crime[crime.City != 'LAS VEGAS METROPOLITAN POLICE DEPARTMENT']

In [27]:
cities = pd.concat([crime['State'],crime['City'],crime['Tax'],crime['Crime Index']],axis=1)

In [28]:
cities.reset_index(inplace=True,drop=True)

In [29]:
cities.head()

Unnamed: 0,State,City,Tax,Crime Index
0,ALASKA,ANCHORAGE,0.0,0.026024
1,CALIFORNIA,ANAHEIM,0.0725,0.013098
2,CALIFORNIA,ANTIOCH,0.0725,0.016686
3,CALIFORNIA,BAKERSFIELD,0.0725,0.019861
4,CALIFORNIA,BERKELEY,0.0725,0.023042


### Get Geospatial Data using GeoPy for City, State

In [30]:
lat = []
lon = []
geolocator = Nominatim(user_agent='capstone')  # create the geocoder once, not on every iteration
for i in range(0,len(cities)):
    address = '{}, {}'.format(cities['City'][i],cities['State'][i])
    location = geolocator.geocode(address)  # None if the address cannot be resolved
    lat.append(location.latitude)
    lon.append(location.longitude)
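
A slightly more defensive version of the geocoding loop is sketched below. It takes the geocoding function as an argument, pauses between requests (the public Nominatim service asks for at most one request per second) and records `None` for addresses that cannot be resolved. The helper name `geocode_cities` and its parameters are assumptions made for this example.

```python
import time


def geocode_cities(addresses, geocode, delay_seconds=1.0):
    """Return {address: (lat, lon) or None} using the supplied geocode callable."""
    coords = {}
    for n, address in enumerate(addresses):
        if n:
            # pause between consecutive requests to respect the service's rate limit
            time.sleep(delay_seconds)
        location = geocode(address)
        if location is None:
            coords[address] = None  # address could not be resolved
        else:
            coords[address] = (location.latitude, location.longitude)
    return coords
```

In this notebook it could be called as `geocode_cities(address_list, Nominatim(user_agent='capstone').geocode)`, where `address_list` holds the `'City, State'` strings.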

In [31]:
cities = pd.concat([cities,pd.Series(lat),pd.Series(lon)],axis=1)

In [32]:
cities.rename(columns={0:'Latitude',1:'Longitude'},inplace=True)

In [33]:
cities.head()

Unnamed: 0,State,City,Tax,Crime Index,Latitude,Longitude
0,ALASKA,ANCHORAGE,0.0,0.026024,61.216313,-149.894852
1,CALIFORNIA,ANAHEIM,0.0725,0.013098,33.834752,-117.911732
2,CALIFORNIA,ANTIOCH,0.0725,0.016686,38.004921,-121.805789
3,CALIFORNIA,BAKERSFIELD,0.0725,0.019861,35.373871,-119.019464
4,CALIFORNIA,BERKELEY,0.0725,0.023042,37.870839,-122.272864


### Get local venue data for each city from Foursquare API

In [34]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20190420'
limit = 100
radius = 1609
latitude = 1
longitude = 1
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET


In [35]:
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    limit)

In [36]:
results = requests.get(url).json()

In [37]:
def getNearbyVenues(names, latitudes, longitudes, radius=1609):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
                  
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [38]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [39]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]


### Create map of all viable US cities with population over 100,000 and legalized Recreational Marijuana

In [40]:
# create map of Cities using latitude and longitude values
cities_map = folium.Map(location=[39.83, -98.58], zoom_start=4)

# add markers to map
for lat, lng, city, state in zip(cities['Latitude'], cities['Longitude'], cities['City'], cities['State']):
    label = '{}, {}'.format(state, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#228B22',
        fill_opacity=0.7,
        parse_html=False).add_to(cities_map)  
    
cities_map

In [41]:
LIMIT = 100
search_query = 'top'
radius = 1609

def getNearbyVenues(state, names, latitudes, longitudes, radius=1609):
    
    venues_list=[]
    for state, name, lat, lng in zip(state, names, latitudes, longitudes):
                  
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            state,
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['State',
                  'City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
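
The request URL used in the function above can also be assembled from a dictionary of parameters, which keeps the query string properly escaped. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper, and the endpoint and parameter names are the ones already used in this notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'  # same endpoint as above


def build_explore_url(client_id, client_secret, version, lat, lng, radius=1609, limit=100):
    """Build the explore URL for one location, escaping each query parameter."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)
```

The result can be passed to `requests.get` exactly as the formatted string is above.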

In [43]:
city_venues = getNearbyVenues(state=cities['State'],
                                   names=cities['City'],
                                   latitudes=cities['Latitude'],
                                   longitudes=cities['Longitude'],
                                   radius = 1609
                                  )


In [44]:
city_venues.head()

Unnamed: 0,State,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ALASKA,ANCHORAGE,61.216313,-149.894852,Glacier BrewHouse,61.217719,-149.896839,Brewery
1,ALASKA,ANCHORAGE,61.216313,-149.894852,Humpy's Great Alaskan Alehouse,61.216427,-149.894146,Bar
2,ALASKA,ANCHORAGE,61.216313,-149.894852,Crow's Nest,61.217838,-149.899718,Seafood Restaurant
3,ALASKA,ANCHORAGE,61.216313,-149.894852,49th State Brewing,61.219736,-149.895975,Brewery
4,ALASKA,ANCHORAGE,61.216313,-149.894852,Apple Anchorage 5th Avenue Mall,61.21714,-149.888671,Electronics Store


### Prepare the venue data to perform K-means clustering

In [45]:
# one hot encoding
city_onehot = pd.get_dummies(city_venues[['Venue Category']], prefix="", prefix_sep="")

# add state and city columns back to dataframe
city_onehot['State'] = city_venues['State'] 
city_onehot['City'] = city_venues['City'] 

# move state and city columns to the front
fixed_columns = [city_onehot.columns[-2]] + [city_onehot.columns[-1]] + list(city_onehot.columns[:-2])
city_onehot = city_onehot[fixed_columns]

city_onehot.head()

Unnamed: 0,State,City,ATM,Accessories Store,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Capitol Building,Caribbean Restaurant,Carpet Store,Casino,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Circus,City Hall,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Baseball Diamond,College Basketball Court,College Bookstore,College Gym,College Rec Center,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Disc Golf,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donburi Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Empanada Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Shop,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Herbs & Spices Store,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Inn,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Knitting Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Lawyer,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Motorsports Shop,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,National Park,Nature Preserve,Neighborhood,New American Restaurant,Nightclub,Non-Profit,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Paintball Field,Paper / Office Supplies Store,Park,Parking,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Satay Restaurant,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,State / Provincial Park,Stationery Store,Steakhouse,Storage Facility,Street Food Gathering,Strip Club,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tennis Stadium,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,ALASKA,ANCHORAGE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,ALASKA,ANCHORAGE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,ALASKA,ANCHORAGE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,ALASKA,ANCHORAGE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,ALASKA,ANCHORAGE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [46]:
city_group = city_onehot.groupby(['State','City']).mean().reset_index()

### Drop cities that already have dispensaries

In [47]:
print(city_group.shape)
city_group = city_group[city_group['Marijuana Dispensary'] == 0]
print(city_group.shape)

(105, 387)
(99, 387)


In [48]:
def return_most_common_venues(row, num_top_venues=10):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return  row_categories_sorted.index.values[0:num_top_venues]

### Find top 10 most common venue types for each city

In [49]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['State','City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['State'] = city_group['State']
neighborhoods_venues_sorted['City'] = city_group['City']

for row in np.arange(city_group.shape[0]):
    neighborhoods_venues_sorted.iloc[row, 2:] = return_most_common_venues(city_group.iloc[row,1:],10)
neighborhoods_venues_sorted.head()

Unnamed: 0,State,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALASKA,ANCHORAGE,Coffee Shop,Park,Clothing Store,Seafood Restaurant,Bar,Accessories Store,Steakhouse,Sporting Goods Shop,Cosmetics Shop,Pizza Place
1,CALIFORNIA,ANAHEIM,Mexican Restaurant,Coffee Shop,Ice Cream Shop,Indian Restaurant,Convenience Store,Taco Place,Liquor Store,Burger Joint,Brewery,Southern / Soul Food Restaurant
2,CALIFORNIA,ANTIOCH,Fast Food Restaurant,Pizza Place,Mexican Restaurant,Chinese Restaurant,Racetrack,Paintball Field,Bakery,Grocery Store,Sports Bar,Burger Joint
3,CALIFORNIA,BAKERSFIELD,Mexican Restaurant,Coffee Shop,Fast Food Restaurant,Sandwich Place,Bar,Chinese Restaurant,Italian Restaurant,Steakhouse,Breakfast Spot,General Entertainment
4,CALIFORNIA,BERKELEY,Chinese Restaurant,Japanese Restaurant,New American Restaurant,Thai Restaurant,French Restaurant,Café,Yoga Studio,Ice Cream Shop,Coffee Shop,Pizza Place
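
The ranking and column-naming logic above can be sketched more compactly with two small helpers. Both `top_categories` and `ordinal` are hypothetical names introduced here, and the sample frequencies are made up for illustration.

```python
import pandas as pd


def top_categories(freq_row, n=3):
    """Return the n category names with the highest frequency in one city's row."""
    return freq_row.sort_values(ascending=False).index[:n].tolist()


def ordinal(k):
    """Format 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th', 11 -> '11th', and so on."""
    if 10 <= k % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(k % 10, 'th')
    return '{}{}'.format(k, suffix)


# Made-up venue frequencies for a single city
sample_row = pd.Series({"Park": 0.10, "Brewery": 0.50, "Coffee Shop": 0.15, "Bar": 0.25})
column_names = ['{} Most Common Venue'.format(ordinal(k)) for k in range(1, 4)]
top_three = top_categories(sample_row, n=3)
```

The `ordinal` helper avoids relying on an exception to switch from `st`/`nd`/`rd` to `th`.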


### Performing K-Means Clustering using 5 clusters

I used scikit-learn's K-Means clustering, an unsupervised learning method, to group the cities based on their venue-category frequencies, together with the sales tax and crime index columns that remain in the feature table.  Once the clustering was performed, I reviewed the most common venues in each cluster to classify the clusters:  
  
  cluster 0: Shops
  cluster 1: 

In [50]:
k_means = KMeans(init = "k-means++", n_clusters = 5, n_init = 12)

In [51]:
city_group = city_group.merge(cities,on=['City','State'])

In [52]:
# set number of clusters
kclusters = 5
from sklearn.preprocessing import StandardScaler

city_clusters = city_group.drop(['State','City','Latitude','Longitude'], axis=1)
city_fit = StandardScaler().fit_transform(city_clusters)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(city_fit)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 4, 1, 1, 1, 1, 1], dtype=int32)
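
To make the scaling and clustering step easier to follow, here is a minimal sketch on toy data. The two features and six rows are invented for illustration and do not come from the city dataset; only the `StandardScaler` and `KMeans` calls mirror the cell above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Two well-separated toy groups described by two made-up features
features = np.array([
    [0.50, 0.02], [0.48, 0.03], [0.52, 0.02],   # group A
    [0.05, 0.30], [0.04, 0.28], [0.06, 0.31],   # group B
])

# Standardize so that each feature has zero mean and unit variance
scaled = StandardScaler().fit_transform(features)

# Fixing random_state makes the run repeatable
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
labels = model.labels_
```

The label numbers themselves are arbitrary identifiers, which is why each cluster has to be inspected afterwards to decide what kind of city it represents.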

In [53]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster', kmeans.labels_)

city_full = cities

# merge the sorted venue table into the city data so each city keeps its latitude/longitude
city_full = city_full.merge(neighborhoods_venues_sorted, on=['State','City'], how='left')

In [54]:
city_full.dropna(inplace=True, axis=0)

In [55]:
city_full['Cluster'] = city_full['Cluster'].astype(int)

In [56]:
city_full.head()

Unnamed: 0,State,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALASKA,ANCHORAGE,0.0,0.026024,61.216313,-149.894852,1,Coffee Shop,Park,Clothing Store,Seafood Restaurant,Bar,Accessories Store,Steakhouse,Sporting Goods Shop,Cosmetics Shop,Pizza Place
1,CALIFORNIA,ANAHEIM,0.0725,0.013098,33.834752,-117.911732,1,Mexican Restaurant,Coffee Shop,Ice Cream Shop,Indian Restaurant,Convenience Store,Taco Place,Liquor Store,Burger Joint,Brewery,Southern / Soul Food Restaurant
2,CALIFORNIA,ANTIOCH,0.0725,0.016686,38.004921,-121.805789,1,Fast Food Restaurant,Pizza Place,Mexican Restaurant,Chinese Restaurant,Racetrack,Paintball Field,Bakery,Grocery Store,Sports Bar,Burger Joint
3,CALIFORNIA,BAKERSFIELD,0.0725,0.019861,35.373871,-119.019464,1,Mexican Restaurant,Coffee Shop,Fast Food Restaurant,Sandwich Place,Bar,Chinese Restaurant,Italian Restaurant,Steakhouse,Breakfast Spot,General Entertainment
4,CALIFORNIA,BERKELEY,0.0725,0.023042,37.870839,-122.272864,4,Chinese Restaurant,Japanese Restaurant,New American Restaurant,Thai Restaurant,French Restaurant,Café,Yoga Studio,Ice Cream Shop,Coffee Shop,Pizza Place


In [57]:
# create map
map_clusters = folium.Map(location=[39.83, -98.58], zoom_start=4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(city_full['Latitude'], city_full['Longitude'], city_full['City'], city_full['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Review Clusters

#### Cluster 0 

In [58]:
city_full.loc[city_full['Cluster'] == 0, city_full.columns[[1] + list(range(2, city_full.shape[1]))]]

Unnamed: 0,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
51,SAN BERNARDINO,0.0725,0.018655,34.108345,-117.289765,0,Convenience Store,Clothing Store,Discount Store,Fast Food Restaurant,Grocery Store,Pizza Place,Nightclub,Shoe Store,Department Store,Mexican Restaurant


#### Cluster 1

In [59]:
city_full.loc[city_full['Cluster'] == 1, city_full.columns[[1] + list(range(2, city_full.shape[1]))]]

Unnamed: 0,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ANCHORAGE,0.0,0.026024,61.216313,-149.894852,1,Coffee Shop,Park,Clothing Store,Seafood Restaurant,Bar,Accessories Store,Steakhouse,Sporting Goods Shop,Cosmetics Shop,Pizza Place
1,ANAHEIM,0.0725,0.013098,33.834752,-117.911732,1,Mexican Restaurant,Coffee Shop,Ice Cream Shop,Indian Restaurant,Convenience Store,Taco Place,Liquor Store,Burger Joint,Brewery,Southern / Soul Food Restaurant
2,ANTIOCH,0.0725,0.016686,38.004921,-121.805789,1,Fast Food Restaurant,Pizza Place,Mexican Restaurant,Chinese Restaurant,Racetrack,Paintball Field,Bakery,Grocery Store,Sports Bar,Burger Joint
3,BAKERSFIELD,0.0725,0.019861,35.373871,-119.019464,1,Mexican Restaurant,Coffee Shop,Fast Food Restaurant,Sandwich Place,Bar,Chinese Restaurant,Italian Restaurant,Steakhouse,Breakfast Spot,General Entertainment
5,BURBANK,0.0725,0.014175,34.181648,-118.325855,1,Mexican Restaurant,Sandwich Place,American Restaurant,Pizza Place,Burger Joint,Diner,Bakery,Donut Shop,Deli / Bodega,Pet Store
6,CARLSBAD,0.0725,0.008791,33.158093,-117.350597,1,Beach,Mexican Restaurant,Café,Coffee Shop,Hotel,Breakfast Spot,Pizza Place,Italian Restaurant,American Restaurant,Bar
7,CHULA VISTA,0.0725,0.007034,32.640054,-117.084196,1,Mexican Restaurant,Grocery Store,Convenience Store,Clothing Store,Taco Place,Sandwich Place,Italian Restaurant,Seafood Restaurant,Cosmetics Shop,Coffee Shop
8,CLOVIS,0.0725,0.013983,36.825228,-119.702919,1,Mexican Restaurant,Sandwich Place,Coffee Shop,Pizza Place,Burger Joint,Hotel,Ice Cream Shop,American Restaurant,Fast Food Restaurant,Italian Restaurant
9,CONCORD,0.0725,0.016411,37.976852,-122.033562,1,Mexican Restaurant,Sandwich Place,Discount Store,Japanese Restaurant,Coffee Shop,Café,Chinese Restaurant,Pizza Place,Italian Restaurant,Thai Restaurant
10,CORONA,0.0725,0.010682,33.875295,-117.566445,1,Mexican Restaurant,Convenience Store,Rental Car Location,Fast Food Restaurant,Indian Restaurant,Sandwich Place,Diner,Discount Store,Sushi Restaurant,Furniture / Home Store


#### Cluster 2

In [60]:
city_full.loc[city_full['Cluster'] == 2, city_full.columns[[1] + list(range(2, city_full.shape[1]))]]

Unnamed: 0,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
74,CENTENNIAL,0.029,0.008221,39.568064,-104.977831,2,Fast Food Restaurant,Sandwich Place,Grocery Store,American Restaurant,Pizza Place,Mexican Restaurant,Spa,BBQ Joint,Coffee Shop,Furniture / Home Store


#### Cluster 3

This is our ideal cluster: it is characterized by active nightlife, pubs, breweries, and trendy areas and activities.  
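
One way to make the "active nightlife" judgment less subjective is to count how many of a city's top 10 venue categories are nightlife-related. The sketch below is illustrative only: the `NIGHTLIFE` category set is an assumption of mine, not something defined in this notebook, and the three venue lists are transcribed from the cluster tables (Reno and Cambridge from cluster 3, Centennial from cluster 2).

```python
import pandas as pd

# Assumed set of "nightlife" categories; adjust to match your own definition.
NIGHTLIFE = {
    "Bar", "Pub", "Brewery", "Cocktail Bar",
    "Gastropub", "Nightclub", "Lounge", "Beer Garden",
}

# Top-10 venue categories transcribed from the cluster tables in this notebook.
top_venues = {
    "RENO": [
        "Bar", "Pub", "Mexican Restaurant", "Brewery", "Coffee Shop",
        "Café", "Hotel", "Breakfast Spot", "Steakhouse", "Pizza Place",
    ],
    "CAMBRIDGE": [
        "New American Restaurant", "Coffee Shop", "Pub", "Café", "Gastropub",
        "Brewery", "Vegetarian / Vegan Restaurant", "Portuguese Restaurant",
        "Ice Cream Shop", "Tapas Restaurant",
    ],
    "CENTENNIAL": [
        "Fast Food Restaurant", "Sandwich Place", "Grocery Store",
        "American Restaurant", "Pizza Place", "Mexican Restaurant", "Spa",
        "BBQ Joint", "Coffee Shop", "Furniture / Home Store",
    ],
}

# Count nightlife categories among each city's top 10 venues.
nightlife_counts = pd.Series(
    {city: sum(v in NIGHTLIFE for v in venues) for city, venues in top_venues.items()},
    name="Nightlife venues in top 10",
).sort_values(ascending=False)
print(nightlife_counts)
```

Averaging this count per cluster would give a simple numeric basis for picking the target cluster, under whatever category set is chosen.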

In [61]:
cluster3 = city_full.loc[city_full['Cluster'] == 3, city_full.columns[[1] + list(range(2, city_full.shape[1]))]]
cluster3

Unnamed: 0,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,LOS ANGELES,0.0725,0.012505,34.053691,-118.242767,3,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Plaza,Mexican Restaurant,Ramen Restaurant,Ice Cream Shop,Bookstore,Bar,Mediterranean Restaurant
40,ORANGE,0.0725,0.009155,33.750038,-117.870493,3,Mexican Restaurant,Convenience Store,Bar,Restaurant,Fast Food Restaurant,Coffee Shop,Pizza Place,Sandwich Place,American Restaurant,Diner
43,PASADENA,0.0725,0.010903,34.147645,-118.144478,3,American Restaurant,Coffee Shop,Pizza Place,Italian Restaurant,Bar,Steakhouse,Bakery,Cosmetics Shop,Pub,Beer Garden
52,SAN DIEGO,0.0725,0.00925,32.717421,-117.162771,3,Hotel,Mexican Restaurant,Italian Restaurant,Café,Bar,American Restaurant,Coffee Shop,Seafood Restaurant,New American Restaurant,Burger Joint
54,SAN JOSE,0.0725,0.012075,37.336191,-121.890583,3,Mexican Restaurant,Coffee Shop,Cocktail Bar,Bar,Sushi Restaurant,Sandwich Place,Theater,Pizza Place,Pub,Ice Cream Shop
56,SANTA ANA,0.0725,0.011227,33.749495,-117.873221,3,Mexican Restaurant,Fast Food Restaurant,Bar,Convenience Store,Restaurant,Pharmacy,Sandwich Place,Pizza Place,American Restaurant,Coffee Shop
81,BOSTON,0.0625,0.009736,42.360253,-71.058291,3,Italian Restaurant,Seafood Restaurant,Historic Site,Park,Coffee Shop,Pizza Place,Sandwich Place,Market,Bakery,Hotel
82,CAMBRIDGE,0.0625,0.008119,42.3751,-71.105616,3,New American Restaurant,Coffee Shop,Pub,Café,Gastropub,Brewery,Vegetarian / Vegan Restaurant,Portuguese Restaurant,Ice Cream Shop,Tapas Restaurant
88,DETROIT,0.06,0.022378,42.331551,-83.04664,3,Coffee Shop,Bar,American Restaurant,Restaurant,Diner,Park,Lounge,Steakhouse,Burger Joint,Hotel
95,RENO,0.0685,0.014195,39.52927,-119.813674,3,Bar,Pub,Mexican Restaurant,Brewery,Coffee Shop,Café,Hotel,Breakfast Spot,Steakhouse,Pizza Place


#### Cluster 4

In [62]:
city_full.loc[city_full['Cluster'] == 4, city_full.columns[[1] + list(range(2, city_full.shape[1]))]]

Unnamed: 0,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,BERKELEY,0.0725,0.023042,37.870839,-122.272864,4,Chinese Restaurant,Japanese Restaurant,New American Restaurant,Thai Restaurant,French Restaurant,Café,Yoga Studio,Ice Cream Shop,Coffee Shop,Pizza Place


In [63]:
# create map
map_clusters = folium.Map(location=[39.83, -98.58], zoom_start=4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cluster3['Latitude'], cluster3['Longitude'], cluster3['City'],cluster3['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [64]:
cam = cluster3[cluster3['City'] == 'CAMBRIDGE']
cam

Unnamed: 0,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
82,CAMBRIDGE,0.0625,0.008119,42.3751,-71.105616,3,New American Restaurant,Coffee Shop,Pub,Café,Gastropub,Brewery,Vegetarian / Vegan Restaurant,Portuguese Restaurant,Ice Cream Shop,Tapas Restaurant


In [65]:
cluster3.sort_values('Tax').head(5)

Unnamed: 0,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
88,DETROIT,0.06,0.022378,42.331551,-83.04664,3,Coffee Shop,Bar,American Restaurant,Restaurant,Diner,Park,Lounge,Steakhouse,Burger Joint,Hotel
81,BOSTON,0.0625,0.009736,42.360253,-71.058291,3,Italian Restaurant,Seafood Restaurant,Historic Site,Park,Coffee Shop,Pizza Place,Sandwich Place,Market,Bakery,Hotel
82,CAMBRIDGE,0.0625,0.008119,42.3751,-71.105616,3,New American Restaurant,Coffee Shop,Pub,Café,Gastropub,Brewery,Vegetarian / Vegan Restaurant,Portuguese Restaurant,Ice Cream Shop,Tapas Restaurant
105,SEATTLE,0.065,0.025904,47.603832,-122.330062,3,Coffee Shop,Cocktail Bar,Seafood Restaurant,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Italian Restaurant,Breakfast Spot,Hotel,Art Museum
95,RENO,0.0685,0.014195,39.52927,-119.813674,3,Bar,Pub,Mexican Restaurant,Brewery,Coffee Shop,Café,Hotel,Breakfast Spot,Steakhouse,Pizza Place


In [66]:
cluster3.sort_values('Crime Index').head(5)

Unnamed: 0,City,Tax,Crime Index,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
82,CAMBRIDGE,0.0625,0.008119,42.3751,-71.105616,3,New American Restaurant,Coffee Shop,Pub,Café,Gastropub,Brewery,Vegetarian / Vegan Restaurant,Portuguese Restaurant,Ice Cream Shop,Tapas Restaurant
40,ORANGE,0.0725,0.009155,33.750038,-117.870493,3,Mexican Restaurant,Convenience Store,Bar,Restaurant,Fast Food Restaurant,Coffee Shop,Pizza Place,Sandwich Place,American Restaurant,Diner
52,SAN DIEGO,0.0725,0.00925,32.717421,-117.162771,3,Hotel,Mexican Restaurant,Italian Restaurant,Café,Bar,American Restaurant,Coffee Shop,Seafood Restaurant,New American Restaurant,Burger Joint
81,BOSTON,0.0625,0.009736,42.360253,-71.058291,3,Italian Restaurant,Seafood Restaurant,Historic Site,Park,Coffee Shop,Pizza Place,Sandwich Place,Market,Bakery,Hotel
43,PASADENA,0.0725,0.010903,34.147645,-118.144478,3,American Restaurant,Coffee Shop,Pizza Place,Italian Restaurant,Bar,Steakhouse,Bakery,Cosmetics Shop,Pub,Beer Garden


## Results

### City Selection

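Of the 11 cities in the target cluster, I cross-referenced the five safest (lowest Crime Index) with the five that have the lowest state tax. Cambridge, MA and Boston, MA are the only cities on both lists. Cambridge has the lower Crime Index, so it is the top choice for opening a dispensary, followed by Boston.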
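
The cross-reference can be reproduced in a few lines. This is a minimal sketch, assuming the `Tax` and `Crime Index` values transcribed below from the cluster 3 tables above are the complete set of 11 cities; the names `cluster3_demo` and `shortlist` are introduced here for illustration only.

```python
import pandas as pd

# Tax and Crime Index for the cluster 3 cities, copied from the tables above.
cluster3_demo = pd.DataFrame(
    [
        ("LOS ANGELES", 0.0725, 0.012505),
        ("ORANGE", 0.0725, 0.009155),
        ("PASADENA", 0.0725, 0.010903),
        ("SAN DIEGO", 0.0725, 0.009250),
        ("SAN JOSE", 0.0725, 0.012075),
        ("SANTA ANA", 0.0725, 0.011227),
        ("BOSTON", 0.0625, 0.009736),
        ("CAMBRIDGE", 0.0625, 0.008119),
        ("DETROIT", 0.0600, 0.022378),
        ("RENO", 0.0685, 0.014195),
        ("SEATTLE", 0.0650, 0.025904),
    ],
    columns=["City", "Tax", "Crime Index"],
)

# Five cities with the lowest tax and five with the lowest crime index.
lowest_tax = set(cluster3_demo.nsmallest(5, "Tax")["City"])
safest = set(cluster3_demo.nsmallest(5, "Crime Index")["City"])

# Keep only cities on both lists, ordered from safest to least safe.
shortlist = cluster3_demo[cluster3_demo["City"].isin(lowest_tax & safest)]
shortlist = shortlist.sort_values("Crime Index")
print(shortlist["City"].tolist())  # ['CAMBRIDGE', 'BOSTON']
```

Intersecting two top-5 lists treats tax and safety as equally important; a weighted score would be an alternative if one factor matters more to investors.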

In [67]:
# create map
cambridge = folium.Map(location=[42.375100,-71.105616], zoom_start=10)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cam['Latitude'], cam['Longitude'], cam['City'], cam['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color='darkgreen',
        fill=True,
        fill_color='green',
        fill_opacity=0.7).add_to(cambridge)
       
cambridge

## Discussion

After reviewing the five clusters, I identified the cluster whose venue mix best matches the target demographic and culture for a new dispensary: an active nightlife scene with bars, pubs, liquor stores, and activities, along with plenty of restaurants, fast food, and convenience stores.  The recommendation to investors is to open a dispensary in Cambridge, MA, with Boston, MA as a backup option or expansion location.  Both cities are highly populated and trendy, have an active nightlife, are relatively safe, and have a relatively low sales tax compared with the other cities assessed.  

## Conclusion

Recreational marijuana is a rapidly growing industry, estimated to surpass $16 billion in 2019.  With the influx of new users and the change in legal status, demand for recreational marijuana is high while supply remains very limited.  This is a prime opportunity to open a new legal marijuana dispensary and supply chain in the United States.  Based on the data reviewed and analyzed, Cambridge is the top choice among the cities assessed for the next marijuana dispensary, based on its population, safety, and culture.  