# Battle of the Neighborhoods: Where to open a yoga studio in Washington, DC?

### Table of contents
#### 1 Geocoding DC Neighborhoods
1.1 Import DC Neighborhood JSON File

1.2 Make Pandas data frame and import JSON data

#### 2 Explore the venues in each neighborhood
2.1 Get Coordinates of Washington, DC

2.2 Create a map of Washington, DC with neighborhoods on top

2.3 Get Foursquare credentials

2.4 Explore Venues in Washington, DC

2.5 Analyze each neighborhood

2.6 Put the top 5 venues of each neighborhood into a dataframe

#### 3 Cluster the Neighborhoods & explore clusters
3.1 Run k-means clustering model with 5 clusters

3.2 Visualize clusters on a map

3.3 Examine the top 5 venues in each cluster

3.4 Decide which cluster would be the best fit for a yoga studio

#### 4 Decide which neighborhoods would be best to open a yoga studio
4.1 Keep only cluster 1 neighborhoods

4.2 Drop any neighborhoods with gyms in the top 5 venues

4.3 Keep neighborhoods with a restaurant in the top venue slot


### First, I'm making sure all the packages are installed

In [1]:
!pip install bs4
!pip install requests
!pip install simplejson



In [2]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import simplejson as json

# 1 Geocoding DC Neighborhoods

### 1.1 Import the json file from the DC data website and examine the data

In [3]:
import os
# wget is not available on Windows and the argument was a local path, so just check that the file is in the working directory
assert os.path.isfile('Neighborhood_Labels.geojson'), 'Neighborhood_Labels.geojson not found'
print('Data found!')

Data found!


In [4]:
with open('Neighborhood_Labels.geojson') as json_data:
    dcdata=json.load(json_data)

In [5]:
dcdata

{'type': 'FeatureCollection',
 'name': 'Neighborhood_Labels',
 'crs': {'type': 'name',
  'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}},
 'features': [{'type': 'Feature',
   'properties': {'OBJECTID': 1,
    'GIS_ID': 'nhood_050',
    'NAME': 'Fort Stanton',
    'WEB_URL': 'http://NeighborhoodAction.dc.gov',
    'LABEL_NAME': 'Fort Stanton',
    'DATELASTMODIFIED': '2003-04-10T00:00:00Z'},
   'geometry': {'type': 'Point',
    'coordinates': [-76.98034770695811, 38.85565773097726]}},
  {'type': 'Feature',
   'properties': {'OBJECTID': 2,
    'GIS_ID': 'nhood_031',
    'NAME': 'Congress Heights',
    'WEB_URL': 'http://NeighborhoodAction.dc.gov',
    'LABEL_NAME': 'Congress Heights',
    'DATELASTMODIFIED': '2003-04-10T00:00:00Z'},
   'geometry': {'type': 'Point',
    'coordinates': [-76.99794992892741, 38.84107730589159]}},
  {'type': 'Feature',
   'properties': {'OBJECTID': 3,
    'GIS_ID': 'nhood_123',
    'NAME': 'Washington Highlands',
    'WEB_URL': 'http://NeighborhoodAc

In [6]:
dcdatafeatures=dcdata['features']

In [7]:
dcdatafeatures[0]

{'type': 'Feature',
 'properties': {'OBJECTID': 1,
  'GIS_ID': 'nhood_050',
  'NAME': 'Fort Stanton',
  'WEB_URL': 'http://NeighborhoodAction.dc.gov',
  'LABEL_NAME': 'Fort Stanton',
  'DATELASTMODIFIED': '2003-04-10T00:00:00Z'},
 'geometry': {'type': 'Point',
  'coordinates': [-76.98034770695811, 38.85565773097726]}}

### 1.2 Make a pandas dataframe and import the json data

In [8]:
column_names=['Neighborhood','Latitude','Longitude']
dcneighborhoods=pd.DataFrame(columns=column_names)
dcneighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude


In [9]:
# DataFrame.append was removed in pandas 2.0, so collect the rows first and build the frame once
rows=[]
for data in dcdatafeatures:
    neighborhood_name=data['properties']['NAME']
    neighborhood_lon,neighborhood_lat=data['geometry']['coordinates'] # GeoJSON order is [longitude, latitude]
    rows.append({'Neighborhood':neighborhood_name,
                 'Latitude':neighborhood_lat,
                 'Longitude':neighborhood_lon})
dcneighborhoods=pd.DataFrame(rows,columns=column_names)

In [10]:
dcneighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Fort Stanton,38.855658,-76.980348
1,Congress Heights,38.841077,-76.99795
2,Washington Highlands,38.830237,-76.995636
3,Bellevue,38.826952,-77.009271
4,Knox Hill/Buena Vista,38.853688,-76.96766


In [11]:
dcneighborhoods.shape

(132, 3)

# 2 Explore venues in each neighborhood

### Import all necessary packages

In [12]:
import numpy as np
import json

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

from pandas import json_normalize # the pandas.io.json import path is deprecated

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries Imported')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries Imported


### 2.1 Get the coordinates of Washington, DC

In [13]:
address='Washington'

geolocator=Nominatim(user_agent='washington_explorer')
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print('The geographic coordinates of Washington, DC are {}, {}'.format(latitude,longitude))

The geographic coordinates of Washington, DC are 38.8949924, -77.0365581
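
`geocode` returns `None` when Nominatim finds no match, and a bare 'Washington' could in principle resolve to somewhere other than the District. Here is a small sketch of a guard that uses a more specific query string. `geocode_or_fail` is a hypothetical wrapper, and the stand-in geocoder exists only so the snippet runs without network access:

```python
from collections import namedtuple

def geocode_or_fail(geolocator, query):
    """Hypothetical wrapper: return (latitude, longitude) or raise if nothing is found."""
    location = geolocator.geocode(query)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(query))
    return location.latitude, location.longitude

# Stand-in for Nominatim so the example runs offline; it knows one query only.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])

class FakeGeolocator:
    def geocode(self, query):
        if query == "Washington, DC":
            return FakeLocation(38.8949924, -77.0365581)  # values printed above
        return None

lat, lon = geocode_or_fail(FakeGeolocator(), "Washington, DC")
print(lat, lon)
```

With the real `Nominatim` object from the cell above, the same wrapper could be called as `geocode_or_fail(geolocator, 'Washington, DC')`.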


### 2.2 Create a map of Washington, DC with the neighborhoods on top

In [14]:
mapdc=folium.Map(location=[latitude,longitude],zoom_start=11)
mapdc

### Add the neighborhoods onto the map

In [15]:
for lat,lng,neighborhood in zip(dcneighborhoods['Latitude'],dcneighborhoods['Longitude'],dcneighborhoods['Neighborhood']):
    label='{}'.format(neighborhood)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(mapdc)    

In [16]:
mapdc

### 2.3 Get my Foursquare credentials

In [17]:
CLIENT_ID = 'W0RBWOFCXMGYQG4FOEZK4TGBAER3Z4JR34FKBUXJDL0TB1UY' # your Foursquare ID
CLIENT_SECRET = 'VP3GIBHBLOKCPEQLDMXUSWJ2BA1C4U54NJOV0E5XERNZVWVV' # your Foursquare Secret
VERSION = '20180614' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: W0RBWOFCXMGYQG4FOEZK4TGBAER3Z4JR34FKBUXJDL0TB1UY
CLIENT_SECRET:VP3GIBHBLOKCPEQLDMXUSWJ2BA1C4U54NJOV0E5XERNZVWVV
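
Hard-coding the client secret in a notebook exposes it to anyone who sees the file or its output. A minimal sketch of reading both values from environment variables instead. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helpers are assumptions for illustration, not part of the Foursquare API:

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Hypothetical helper: read the Foursquare ID and secret from the environment."""
    # The variable names below are an assumed convention, not something Foursquare requires.
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters so the secret never lands in cell output."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demonstration with a stand-in mapping rather than real credentials.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_SECRET: " + mask(demo_secret))
```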


### 2.4 Exploring venues in DC

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
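
The list comprehension in `getNearbyVenues` assumes that every item has at least one category and that the response always contains `groups`. Below is a hedged sketch of a more defensive parser. It runs on a hand-made payload that only imitates the structure the function above indexes into, and `parse_explore_items` is a hypothetical name:

```python
def parse_explore_items(name, lat, lng, payload):
    """Hypothetical helper: flatten one explore response into row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder instead of raising IndexError on an empty list.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-made payload mimicking the structure used above; not real API output.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Park", "location": {"lat": 38.0, "lng": -77.0},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Example Spot", "location": {"lat": 38.1, "lng": -77.1},
               "categories": []}},
]}]}}

rows = parse_explore_items("Example Hood", 38.05, -77.05, fake_payload)
print(rows)
```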

In [19]:
dcvenues=getNearbyVenues(names=dcneighborhoods['Neighborhood'],
                              latitudes=dcneighborhoods['Latitude'],
                              longitudes=dcneighborhoods['Longitude']                              )

Fort Stanton
Congress Heights
Washington Highlands
Bellevue
Knox Hill/Buena Vista
Shipley
Douglass
Woodland
Garfield Heights
Near Southeast
Capitol Hill
Dupont Park
Twining
Randle Highlands
Fairlawn
Penn Branch
Barry Farm
Historic Anacostia
Columbia Heights
Logan Circle/Shaw
Cardozo/Shaw
Van Ness
Forest Hills
Georgetown Reservoir
Foxhall Village
Fort Totten
Pleasant Hill
Kenilworth
Eastland Gardens
Deanwood
Fort Dupont
Greenway
Woodland-Normanstone
Mass. Ave. Heights
Naylor Gardens
Pleasant Plains
Hillsdale
Benning Ridge
Penn Quarter
Chinatown
Stronghold
South Central
Langston
Downtown East
North Portal Estates
Colonial Village
Shepherd Park
Takoma
Lamond Riggs
Petworth
Brightwood Park
Manor Park
Brightwood
Hawthorne
Barnaby Woods
Queens Chapel
Michigan Park
North Michigan Park
Woodridge
University Heights
Brookland
Edgewood
Skyland
Bloomingdale
Lincoln Park
16th Street Heights
Fort Lincoln
Gateway
Langdon
Brentwood
Eckington
Truxton Circle
Ivy City
Trinidad
Arboretum
Carver
Mount Vern

### Check the size of the dataframe and how many venues were returned for each neighborhood

In [20]:
print(dcvenues.shape)
dcvenues.head()

(2643, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Fort Stanton,38.855658,-76.980348,Anacostia Community Museum,38.856728,-76.976899,Museum
1,Fort Stanton,38.855658,-76.980348,Fort Stanton Recreation Center,38.857118,-76.978816,Recreation Center
2,Fort Stanton,38.855658,-76.980348,Fort Stanton Park,38.857541,-76.978266,Park
3,Fort Stanton,38.855658,-76.980348,Stanton Road SE & Suitland Parkway SE,38.853278,-76.983289,Intersection
4,Fort Stanton,38.855658,-76.980348,Douglass Community Recreation Center,38.852218,-76.977411,Park


In [25]:
dcgrouped2=dcvenues.groupby('Neighborhood').count()
pd.set_option("display.max_rows",None)
dcgrouped2
dcgrouped2.sort_values(by=['Venue'])

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
NE Boundary,1,1,1,1,1,1
Crestwood,1,1,1,1,1,1
Colonial Village,1,1,1,1,1,1
North Cleveland Park,1,1,1,1,1,1
Grant Park,2,2,2,2,2,2
Spring Valley,2,2,2,2,2,2
Hawthorne,2,2,2,2,2,2
Mayfair,2,2,2,2,2,2
Greenway,2,2,2,2,2,2
American University Park,2,2,2,2,2,2


### How many unique categories are in the returned venues?

In [26]:
print('There are {} unique categories.'.format(len(dcvenues['Venue Category'].unique())))

There are 313 unique categories.


### 2.5 Analyze each neighborhood

In [27]:
# one hot encoding
dc_onehot = pd.get_dummies(dcvenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dc_onehot['Neighborhood'] = dcvenues['Neighborhood'] 

# define a list of column names
cols=dc_onehot.columns.tolist()

# move the column name to the beginning
cols.insert(0, cols.pop(cols.index('Neighborhood')))

#reorder the columns
dc_onehot=dc_onehot.reindex(columns=cols)

dc_onehot.head()
dc_onehot

Unnamed: 0,Neighborhood,ATM,Afghan Restaurant,Alternative Healer,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Art Gallery,Art Museum,...,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,Fort Stanton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Fort Stanton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Fort Stanton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Fort Stanton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Fort Stanton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Congress Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Congress Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Congress Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Congress Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Congress Heights,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Group rows by neighborhood and take the mean frequency of occurrence of each category

In [28]:
dc_grouped=dc_onehot.groupby('Neighborhood').mean().reset_index()
dc_grouped

Unnamed: 0,Neighborhood,ATM,Afghan Restaurant,Alternative Healer,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Art Gallery,Art Museum,...,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,16th Street Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Adams Morgan,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,American University Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Arboretum,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Barnaby Woods,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Barry Farm,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bellevue,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Benning,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Benning Ridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bloomingdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0
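
Because each row of `dc_onehot` is one venue, the per-neighborhood mean of a category column is the share of that neighborhood's venues in that category. A toy illustration with made-up data (the neighborhoods and categories are invented for the example):

```python
import pandas as pd

# Made-up venues: neighborhood A has 4 venues, B has 2.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Cafe", "Gym", "Cafe", "Cafe"],
})

# One indicator column per category, one row per venue.
toy_onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of a 0/1 column is the fraction of venues in that category.
toy_freq = toy_onehot.groupby("Neighborhood").mean()
print(toy_freq)
```

Each row of `toy_freq` sums to 1, which is why a neighborhood with only two venues (like American University Park below) can show a category at 0.5.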


### Print each neighborhood along with the top 5 most common venue types

In [32]:
num_top_venues = 5

for hood in dc_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dc_grouped[dc_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----16th Street Heights----
            venue  freq
0           Diner  0.08
1  Cosmetics Shop  0.08
2        Bus Stop  0.08
3  Breakfast Spot  0.08
4            Park  0.08


----Adams Morgan----
                  venue  freq
0                   Bar  0.05
1        Ice Cream Shop  0.05
2  Ethiopian Restaurant  0.03
3    Italian Restaurant  0.03
4                 Diner  0.03


----American University Park----
                 venue  freq
0   Italian Restaurant   0.5
1            BBQ Joint   0.5
2            Pet Store   0.0
3             Pet Café   0.0
4  Peruvian Restaurant   0.0


----Arboretum----
                  venue  freq
0                Garden  0.15
1        Ice Cream Shop  0.08
2           Gas Station  0.08
3  Fast Food Restaurant  0.08
4      Basketball Court  0.08


----Barnaby Woods----
                  venue  freq
0                 Field  0.25
1  Gym / Fitness Center  0.25
2             BBQ Joint  0.25
3                  Park  0.25
4                   ATM  0.00


----Barry 

                venue  freq
0       Grocery Store   0.2
1                Park   0.2
2       Memorial Site   0.2
3  Miscellaneous Shop   0.2
4            Hospital   0.2


----Foxhall Crescents----
                      venue  freq
0                      Café  0.17
1  Mediterranean Restaurant  0.08
2               Coffee Shop  0.08
3                    Museum  0.08
4                 Wine Shop  0.08


----Foxhall Village----
            venue  freq
0           Trail  0.17
1  Sandwich Place  0.17
2         Dog Run  0.17
3             Spa  0.17
4          Bakery  0.17


----Friendship Heights----
                 venue  freq
0     Department Store  0.16
1       Cosmetics Shop  0.10
2  American Restaurant  0.06
3          Coffee Shop  0.06
4    Mobile Phone Shop  0.06


----Garfield Heights----
               venue  freq
0  Convenience Store  0.25
1           Gym Pool  0.25
2        Art Gallery  0.25
3               Park  0.25
4  Outdoor Sculpture  0.00


----Gateway----
                    

                      venue  freq
0                Laundromat   0.2
1         Convenience Store   0.2
2               Wings Joint   0.2
3  Bike Rental / Bike Share   0.2
4             Boat or Ferry   0.2


----Penn Quarter----
                 venue  freq
0  American Restaurant  0.06
1           Art Museum  0.04
2              Theater  0.04
3          Salad Place  0.03
4    Indian Restaurant  0.03


----Petworth----
                     venue  freq
0              Pizza Place  0.14
1              Gas Station  0.07
2                    Plaza  0.07
3  New American Restaurant  0.07
4            Grocery Store  0.07


----Pleasant Hill----
                venue  freq
0            Bus Stop   0.4
1      Sandwich Place   0.2
2        Dance Studio   0.2
3  Seafood Restaurant   0.2
4                 ATM   0.0


----Pleasant Plains----
               venue  freq
0      Deli / Bodega  0.08
1        Beer Garden  0.08
2        Coffee Shop  0.08
3        Pizza Place  0.08
4  Indian Restaurant  0.04




### 2.6 Put top venue types by neighborhood into a dataframe

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [34]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dc_grouped['Neighborhood']

for ind in np.arange(dc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dc_grouped.iloc[ind, :], num_top_venues)


neighborhoods_venues_sorted.head()
#neighborhoods_venues_sorted.shape  
#dc_grouped.shape
#dcneighborhoods.shape
#dc_grouped has 129 rows, 3 fewer than the 132 neighborhoods in dcneighborhoods
#so the join below uses how="right" to keep the cluster labels as integers

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,16th Street Heights,Diner,Cosmetics Shop,Bus Stop,Breakfast Spot,Park
1,Adams Morgan,Bar,Ice Cream Shop,Ethiopian Restaurant,Italian Restaurant,Diner
2,American University Park,Italian Restaurant,BBQ Joint,Pet Store,Pet Café,Peruvian Restaurant
3,Arboretum,Garden,Ice Cream Shop,Gas Station,Fast Food Restaurant,Basketball Court
4,Barnaby Woods,Field,Gym / Fitness Center,BBQ Joint,Park,ATM


In [35]:
#print the whole dataset 
neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,16th Street Heights,Diner,Cosmetics Shop,Bus Stop,Breakfast Spot,Park
1,Adams Morgan,Bar,Ice Cream Shop,Ethiopian Restaurant,Italian Restaurant,Diner
2,American University Park,Italian Restaurant,BBQ Joint,Pet Store,Pet Café,Peruvian Restaurant
3,Arboretum,Garden,Ice Cream Shop,Gas Station,Fast Food Restaurant,Basketball Court
4,Barnaby Woods,Field,Gym / Fitness Center,BBQ Joint,Park,ATM
5,Barry Farm,Bus Stop,Rental Car Location,Metro Station,Basketball Court,Intersection
6,Bellevue,Shoe Repair,Grocery Store,Pizza Place,Playground,ATM
7,Benning,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Seafood Restaurant,Pharmacy
8,Benning Ridge,Convenience Store,Burger Joint,Insurance Office,Outdoor Sculpture,Pet Café
9,Bloomingdale,Bus Stop,Asian Restaurant,Coffee Shop,Pizza Place,Grocery Store


# 3 Cluster the neighborhoods & explore clusters

### 3.1 Run k means to cluster the neighborhoods into 5 clusters

In [36]:
# set number of clusters
kclusters = 5

dc_grouped_clustering = dc_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 4, 1, 1, 1, 1, 1])
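
The choice of five clusters is made by hand here. One common way to sanity-check a value of k, which this notebook does not use, is to compare silhouette scores across several candidates. A sketch on synthetic data (the blobs stand in for `dc_grouped_clustering`, and the candidate range 2 to 7 is an arbitrary assumption):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data; in the notebook this would be dc_grouped_clustering.
X, _ = make_blobs(n_samples=120, centers=3, n_features=5, random_state=0)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("Highest silhouette at k =", best_k)
```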

### Create a dataframe that includes the clusters plus the top 5 venues for each neighborhood

In [37]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dc_merged3 = dcneighborhoods
dc_merged3.shape

(132, 3)

In [38]:
# add latitude/longitude for each neighborhood using a right join
dc_merged3 = dc_merged3.join(neighborhoods_venues_sorted.set_index('Neighborhood'), how="right", on='Neighborhood')

dc_merged3.head() 
dc_merged3

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
65,16th Street Heights,38.950315,-77.033559,1,Diner,Cosmetics Shop,Bus Stop,Breakfast Spot,Park
105,Adams Morgan,38.920472,-77.042391,1,Bar,Ice Cream Shop,Ethiopian Restaurant,Italian Restaurant,Diner
94,American University Park,38.947612,-77.09025,1,Italian Restaurant,BBQ Joint,Pet Store,Pet Café,Peruvian Restaurant
74,Arboretum,38.91486,-76.97249,1,Garden,Ice Cream Shop,Gas Station,Fast Food Restaurant,Basketball Court
54,Barnaby Woods,38.975433,-77.060174,4,Field,Gym / Fitness Center,BBQ Joint,Park,ATM
16,Barry Farm,38.859255,-76.997281,1,Bus Stop,Rental Car Location,Metro Station,Basketball Court,Intersection
3,Bellevue,38.826952,-77.009271,1,Shoe Repair,Grocery Store,Pizza Place,Playground,ATM
121,Benning,38.891885,-76.948884,1,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Seafood Restaurant,Pharmacy
37,Benning Ridge,38.881162,-76.938203,1,Convenience Store,Burger Joint,Insurance Office,Outdoor Sculpture,Pet Café
63,Bloomingdale,38.918226,-77.011159,1,Bus Stop,Asian Restaurant,Coffee Shop,Pizza Place,Grocery Store


### 3.2 Visualize the clusters

In [39]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dc_merged3['Latitude'], dc_merged3['Longitude'], dc_merged3['Neighborhood'], dc_merged3['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
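
The `ys` list above is only used for its length, and indexing with `cluster-1` makes label 0 wrap around to the last color. A simpler sketch builds one hex color per label and indexes it directly, assuming labels run from 0 to `kclusters - 1` as k-means produces them:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced color from the rainbow colormap per cluster label.
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Label i maps straight to rainbow[i]; no offset is needed.
for label in range(kclusters):
    print(label, rainbow[label])
```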

### 3.3 Examine the clusters

In [45]:
dc_merged3.groupby('Cluster Labels').count()

Unnamed: 0_level_0,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,6,6,6,6,6,6,6,6
1,101,101,101,101,101,101,101,101
2,1,1,1,1,1,1,1,1
3,3,3,3,3,3,3,3,3
4,19,19,19,19,19,19,19,19


### Cluster 0

In [40]:
dc_merged3.loc[dc_merged3['Cluster Labels'] == 0, dc_merged3.columns[[0] + list(range(1, dc_merged3.shape[1]))]]
#note that i had to change columns to [0] from [1] because it was not showing neighborhood

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
127,Fort Davis Park,38.868901,-76.94467,0,Liquor Store,Historic Site,Chinese Restaurant,BBQ Joint,ATM
67,Gateway,38.919757,-76.963846,0,Liquor Store,Shipping Store,Gas Station,Thrift / Vintage Store,Business Service
4,Knox Hill/Buena Vista,38.853688,-76.96766,0,Liquor Store,Convenience Store,ATM,Paper / Office Supplies Store,Pet Café
120,NE Boundary,38.895451,-76.917389,0,Liquor Store,ATM,Pet Store,Pet Café,Peruvian Restaurant
55,Queens Chapel,38.95607,-76.996591,0,Liquor Store,Residential Building (Apartment / Condo),Chinese Restaurant,Gym / Fitness Center,Convenience Store
12,Twining,38.875588,-76.960847,0,Liquor Store,Restaurant,Convenience Store,Food Truck,Bike Rental / Bike Share


Liquor Store is the top venue in every one of these neighborhoods, and the remaining slots lean toward shops and services rather than restaurants, so I don't think this cluster is our best bet.

### Cluster 1

In [41]:
dc_merged3.loc[dc_merged3['Cluster Labels'] == 1, dc_merged3.columns[[0]+ list(range(1, dc_merged3.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
65,16th Street Heights,38.950315,-77.033559,1,Diner,Cosmetics Shop,Bus Stop,Breakfast Spot,Park
105,Adams Morgan,38.920472,-77.042391,1,Bar,Ice Cream Shop,Ethiopian Restaurant,Italian Restaurant,Diner
94,American University Park,38.947612,-77.09025,1,Italian Restaurant,BBQ Joint,Pet Store,Pet Café,Peruvian Restaurant
74,Arboretum,38.91486,-76.97249,1,Garden,Ice Cream Shop,Gas Station,Fast Food Restaurant,Basketball Court
16,Barry Farm,38.859255,-76.997281,1,Bus Stop,Rental Car Location,Metro Station,Basketball Court,Intersection
3,Bellevue,38.826952,-77.009271,1,Shoe Repair,Grocery Store,Pizza Place,Playground,ATM
121,Benning,38.891885,-76.948884,1,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Seafood Restaurant,Pharmacy
37,Benning Ridge,38.881162,-76.938203,1,Convenience Store,Burger Joint,Insurance Office,Outdoor Sculpture,Pet Café
63,Bloomingdale,38.918226,-77.011159,1,Bus Stop,Asian Restaurant,Coffee Shop,Pizza Place,Grocery Store
69,Brentwood,38.918977,-76.987035,1,Sports Club,Video Store,Department Store,Plaza,Sandwich Place


Cluster 1 is looking pretty good. A few neighborhoods have gyms in their top 5 venues, but plenty of others look hip and trendy, with lots of restaurants.

### Cluster 2

In [42]:
dc_merged3.loc[dc_merged3['Cluster Labels'] == 2, dc_merged3.columns[[0]+ list(range(1, dc_merged3.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
45,Colonial Village,38.98679,-77.041094,2,Bus Station,ATM,Other Repair Shop,Peruvian Restaurant,Persian Restaurant


Cluster 2 doesn't look great either. It contains a single neighborhood with no gyms in its top 5 venues, which might be OK, but its top 3 venues are a bus station, an ATM, and a repair shop. That makes me think this cluster is small and not very trendy.

### Cluster 3

In [43]:
dc_merged3.loc[dc_merged3['Cluster Labels'] == 3, dc_merged3.columns[[0]+ list(range(1, dc_merged3.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
130,Crestwood,38.943327,-77.041097,3,Intersection,Other Repair Shop,Peruvian Restaurant,Persian Restaurant,Performing Arts Venue
129,Hillcrest,38.861794,-76.960688,3,Playground,Intersection,Gym / Fitness Center,Wings Joint,Tennis Court
13,Randle Highlands,38.869336,-76.965804,3,Intersection,Sandwich Place,Gym / Fitness Center,Seafood Restaurant,Business Service


Two of these three neighborhoods have a gym in the top 5, and the cluster seems to have more businesses than restaurants.

### Cluster 4

In [44]:
dc_merged3.loc[dc_merged3['Cluster Labels'] == 4, dc_merged3.columns[[0]+ list(range(1, dc_merged3.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
54,Barnaby Woods,38.975433,-77.060174,4,Field,Gym / Fitness Center,BBQ Joint,Park,ATM
125,Capitol View,38.889978,-76.927086,4,Building,Convenience Store,Chinese Restaurant,Café,Park
123,Central NE,38.897035,-76.942277,4,Park,Cosmetics Shop,Restaurant,Metro Station,Bus Station
70,Eckington,38.915202,-77.000425,4,Train Station,BBQ Joint,Park,Check Cashing Service,ATM
0,Fort Stanton,38.855658,-76.980348,4,Park,Intersection,Recreation Center,Museum,Other Repair Shop
25,Fort Totten,38.94943,-77.008128,4,Grocery Store,Park,Memorial Site,Miscellaneous Shop,Hospital
8,Garfield Heights,38.854085,-76.972213,4,Convenience Store,Gym Pool,Art Gallery,Park,Outdoor Sculpture
122,Grant Park,38.892707,-76.920638,4,Home Service,Park,ATM,Pet Store,Pet Café
104,Kalorama Heights,38.91566,-77.051195,4,Steakhouse,Gym / Fitness Center,Park,Scenic Lookout,Sushi Restaurant
27,Kenilworth,38.910679,-76.938586,4,Liquor Store,Chinese Restaurant,Border Crossing,Coffee Shop,Park


I see several gyms in these neighborhoods, and even a yoga studio. There are some restaurants, but many of the neighborhoods look more commercial, with businesses rather than the restaurants and cafes of a hip, trendy area.

### 3.4 Decide which cluster would be the best fit for a yoga studio

Cluster 1 looks like the best place to open a yoga studio, so next I will decide which of its neighborhoods is best.

# 4 Decide which neighborhoods would be best to open a yoga studio in Washington, DC

### 4.1 Make a new dataframe with only cluster 1 neighborhoods

In [46]:
dccluster2=dc_merged3.loc[dc_merged3['Cluster Labels'] == 1]
dccluster2

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
65,16th Street Heights,38.950315,-77.033559,1,Diner,Cosmetics Shop,Bus Stop,Breakfast Spot,Park
105,Adams Morgan,38.920472,-77.042391,1,Bar,Ice Cream Shop,Ethiopian Restaurant,Italian Restaurant,Diner
94,American University Park,38.947612,-77.09025,1,Italian Restaurant,BBQ Joint,Pet Store,Pet Café,Peruvian Restaurant
74,Arboretum,38.91486,-76.97249,1,Garden,Ice Cream Shop,Gas Station,Fast Food Restaurant,Basketball Court
16,Barry Farm,38.859255,-76.997281,1,Bus Stop,Rental Car Location,Metro Station,Basketball Court,Intersection
3,Bellevue,38.826952,-77.009271,1,Shoe Repair,Grocery Store,Pizza Place,Playground,ATM
121,Benning,38.891885,-76.948884,1,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Seafood Restaurant,Pharmacy
37,Benning Ridge,38.881162,-76.938203,1,Convenience Store,Burger Joint,Insurance Office,Outdoor Sculpture,Pet Café
63,Bloomingdale,38.918226,-77.011159,1,Bus Stop,Asian Restaurant,Coffee Shop,Pizza Place,Grocery Store
69,Brentwood,38.918977,-76.987035,1,Sports Club,Video Store,Department Store,Plaza,Sandwich Place


### 4.2 Drop neighborhoods with gym, sports club, gym / fitness center, dance studio, recreation center in top 5 venues

In [47]:
#check shape of original dataset
dccluster2.shape

(101, 9)

In [48]:
# venue categories that would compete with a yoga studio
competitors=["Gym","Gym / Fitness Center","Sports Club","Dance Studio","Recreation Center"]

# the five "Nth Most Common Venue" columns
venue_cols=[c for c in dccluster2.columns if c.endswith('Most Common Venue')]

# keep only neighborhoods with no competitor in any of the top 5 slots
d1=dccluster2[~dccluster2[venue_cols].isin(competitors).any(axis=1)]
d1.shape

(84, 9)

### Examine the resulting neighborhoods

In [49]:
d1

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
65,16th Street Heights,38.950315,-77.033559,1,Diner,Cosmetics Shop,Bus Stop,Breakfast Spot,Park
105,Adams Morgan,38.920472,-77.042391,1,Bar,Ice Cream Shop,Ethiopian Restaurant,Italian Restaurant,Diner
94,American University Park,38.947612,-77.09025,1,Italian Restaurant,BBQ Joint,Pet Store,Pet Café,Peruvian Restaurant
74,Arboretum,38.91486,-76.97249,1,Garden,Ice Cream Shop,Gas Station,Fast Food Restaurant,Basketball Court
16,Barry Farm,38.859255,-76.997281,1,Bus Stop,Rental Car Location,Metro Station,Basketball Court,Intersection
3,Bellevue,38.826952,-77.009271,1,Shoe Repair,Grocery Store,Pizza Place,Playground,ATM
121,Benning,38.891885,-76.948884,1,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Seafood Restaurant,Pharmacy
37,Benning Ridge,38.881162,-76.938203,1,Convenience Store,Burger Joint,Insurance Office,Outdoor Sculpture,Pet Café
63,Bloomingdale,38.918226,-77.011159,1,Bus Stop,Asian Restaurant,Coffee Shop,Pizza Place,Grocery Store
52,Brightwood,38.966379,-77.026874,1,Chinese Restaurant,Mexican Restaurant,Pizza Place,Southern / Soul Food Restaurant,Gas Station


It looks like most of the remaining neighborhoods have at least one restaurant in their top 5. Let's keep the ones with a restaurant as the most common venue. For simplicity, I'll count only categories with "Restaurant" in the name, not sandwich or pizza places.

### 4.3 Keep only neighborhoods with a restaurant in the top spot

In [50]:
d2=d1.loc[d1['1st Most Common Venue'].str.contains("Restaurant")]

In [51]:
d2.shape

(14, 9)
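
`str.contains("Restaurant")` is a plain substring match, so it keeps categories such as "Fast Food Restaurant" and drops "Pizza Place" or "Diner". A toy check on a few category names taken from the tables above:

```python
import pandas as pd

# Category names that appear in the tables above.
top_venues = pd.Series(["Italian Restaurant", "Pizza Place", "Fast Food Restaurant",
                        "Diner", "New American Restaurant"])

# Substring match: True wherever the word "Restaurant" appears in the name.
is_restaurant = top_venues.str.contains("Restaurant", regex=False)
print(top_venues[is_restaurant].tolist())
```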

### Examine the final list of neighborhoods and top 5 venues

In [52]:
d2

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
94,American University Park,38.947612,-77.09025,1,Italian Restaurant,BBQ Joint,Pet Store,Pet Café,Peruvian Restaurant
52,Brightwood,38.966379,-77.026874,1,Chinese Restaurant,Mexican Restaurant,Pizza Place,Southern / Soul Food Restaurant,Gas Station
119,Burrville,38.900243,-76.921802,1,American Restaurant,Seafood Restaurant,Fish & Chips Shop,Chinese Restaurant,ATM
20,Cardozo/Shaw,38.917168,-77.02755,1,New American Restaurant,Bar,Coffee Shop,American Restaurant,Pizza Place
39,Chinatown,38.899151,-77.020135,1,American Restaurant,Café,Hotel,Cocktail Bar,Italian Restaurant
98,Cleveland Park,38.936098,-77.064402,1,Mexican Restaurant,Xinjiang Restaurant,Steakhouse,Indian Restaurant,Thai Restaurant
106,Dupont Circle,38.912128,-77.040984,1,Thai Restaurant,Italian Restaurant,Spa,Pizza Place,Greek Restaurant
84,Georgetown,38.909556,-77.064796,1,American Restaurant,Coffee Shop,Pizza Place,Dessert Shop,Vietnamese Restaurant
17,Historic Anacostia,38.863186,-76.984678,1,American Restaurant,Convenience Store,Coffee Shop,Fast Food Restaurant,Comfort Food Restaurant
108,Mount Pleasant,38.931741,-77.040656,1,Latin American Restaurant,Grocery Store,Café,Thai Restaurant,Liquor Store


Based on these results, I'd say any of these neighborhoods would be a good place to open a yoga studio in Washington, DC. 