<font size='6'>Battle of the Neighborhoods</font>

<font size='6'>IBM Data Science Capstone</font>

<font size='6'>Problem</font>
    
Washington, DC is an affluent and dynamic city with a thriving restaurant scene. I would like to open a new restaurant there and will choose its location using neighborhood venue data retrieved from the Foursquare API. Some areas are overserved by restaurants and others are underserved; this analysis will determine which areas could be good sites for a new restaurant.
Using these data, I will take the following approach:
1.	List the Washington, DC neighborhoods
2.	Cluster the neighborhoods using K-means clustering based on Foursquare data
3.	Determine the number of restaurants in each cluster
4.	Contrast that with the number of users in each cluster
5.	Determine which neighborhood is underrepresented and would best support a restaurant


<font size='6'>Data Sources</font>

Using Python, the Washington, DC neighborhoods will be grouped with K-means clustering. Each neighborhood has its own set of restaurants, and I will use this dataset to determine which locations would be best for the new restaurant.

The chosen neighborhood also needs to be popular and densely populated while being underrepresented by restaurants.
The restaurant data will come from the Foursquare API.

The DC neighborhood data will come from the following sources:

•	Open Data DC https://opendata.dc.gov/datasets/ where the list of neighborhoods and their locations will be scraped.

•	DC.gov office of GIS services https://octo.dc.gov/service/dc-gis-services where additional location data will be retrieved

These sources make the data available for download and analysis. They provide the list of neighborhoods and each neighborhood's latitude and longitude. Using this data, I will cluster restaurants by neighborhood and determine which neighborhoods are underserved by restaurant type and population density. From that analysis, I will determine where to establish my new restaurant.
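The idea of an "underserved" neighborhood can be illustrated on toy data. The following is a minimal sketch under stated assumptions, not the analysis itself: the venue rows and the `population` figures are invented, and `restaurants_per_1000` is a hypothetical metric name that does not appear elsewhere in this notebook.

```python
import pandas as pd

# Toy venue table shaped like the Foursquare results (values are invented).
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "C"],
    "Venue Category": ["Thai Restaurant", "Park", "Pizza Place",
                       "Italian Restaurant", "Gym", "Museum"],
})

# Hypothetical resident counts; the notebook does not load population data.
population = pd.Series({"A": 2000, "B": 8000, "C": 5000}, name="population")

# Simplifying assumption: a category counts as a restaurant only if its name
# contains "Restaurant" (this misses categories such as "Pizza Place").
is_restaurant = venues["Venue Category"].str.contains("Restaurant")
restaurant_counts = (
    venues[is_restaurant]
    .groupby("Neighborhood")
    .size()
    .reindex(population.index, fill_value=0)
)

# Restaurants per 1,000 residents; lower values suggest an underserved area.
restaurants_per_1000 = (restaurant_counts * 1000 / population).sort_values()
print(restaurants_per_1000)
```

Sorting in ascending order puts the candidates with the fewest restaurants per resident first, which is the ranking the approach above is aiming for.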


In [2]:
!pip install beautifulsoup4
!pip install lxml
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import csv

#!conda install -c conda-forge geopy --yes 
#from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html

# transforming a JSON file into a pandas dataframe
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')

Collecting beautifulsoup4
  Downloading https://files.pythonhosted.org/packages/d1/41/e6495bd7d3781cee623ce23ea6ac73282a373088fcd0ddc809a047b18eae/beautifulsoup4-4.9.3-py3-none-any.whl (115kB)
Collecting soupsieve>1.2; python_version >= "3.0" (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/6f/8f/457f4a5390eeae1cc3aeab89deb7724c965be841ffca6cfca9197482e470/soupsieve-2.0.1-py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.9.3 soupsieve-2.0.1
Collecting lxml
  Downloading https://files.pythonhosted.org/packages/64/28/0b761b64ecbd63d272ed0e7a6ae6e4402fc37886b59181bfdf274424d693/lxml-4.6.1-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Installing collected packages: lxml
Successfully installed lxml-4.6.1
Collecting package metadata (current_repoda

In [3]:
!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-2.0.0-pyh9f0ad1d_0



Downloading and Extracting Packages
geopy-2.0.0          | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ################################

Import Neighborhood data into dataframe

In [4]:
# import neighborhood names with their latitude and longitude into a dataframe
dc_df = pd.read_csv('https://opendata.arcgis.com/datasets/c4b0cd43d50949e98e57de9f22b455fc_35.csv')
dc_df.head()

Unnamed: 0,X,Y,OBJECTID,GIS_ID,NAME,WEB_URL,LABEL_NAME,DATELASTMODIFIED
0,-76.980348,38.855658,1,nhood_050,Fort Stanton,http://NeighborhoodAction.dc.gov,Fort Stanton,2003/04/10 00:00:00+00
1,-76.99795,38.841077,2,nhood_031,Congress Heights,http://NeighborhoodAction.dc.gov,Congress Heights,2003/04/10 00:00:00+00
2,-76.995636,38.830237,3,nhood_123,Washington Highlands,http://NeighborhoodAction.dc.gov,Washington Highlands,2003/04/10 00:00:00+00
3,-77.009271,38.826952,4,nhood_008,Bellevue,http://NeighborhoodAction.dc.gov,Bellevue,2003/04/10 00:00:00+00
4,-76.96766,38.853688,5,nhood_073,Knox Hill/Buena Vista,http://NeighborhoodAction.dc.gov,Knox Hill/Buena Vista,2003/04/10 00:00:00+00


Clean data and remove unwanted columns

In [5]:
dc_df2=dc_df.drop(["OBJECTID", "GIS_ID", "WEB_URL", "LABEL_NAME","DATELASTMODIFIED"], axis=1)

In [6]:
dc_df2.rename(columns={"X":"Longitude","Y":"Latitude","NAME":"Neighborhood"},inplace=True)
dc_df2.head()

Unnamed: 0,Longitude,Latitude,Neighborhood
0,-76.980348,38.855658,Fort Stanton
1,-76.99795,38.841077,Congress Heights
2,-76.995636,38.830237,Washington Highlands
3,-77.009271,38.826952,Bellevue
4,-76.96766,38.853688,Knox Hill/Buena Vista


In [7]:
address = 'Washington, DC'

geolocator = Nominatim(user_agent="dc_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Washington, DC are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Washington, DC are 38.8949924, -77.0365581.


Map of DC

In [8]:
map_dc = folium.Map(location=[latitude, longitude],zoom_start=12)

for lat, lng,  neighborhood in zip(dc_df2['Latitude'],dc_df2['Longitude'],dc_df2['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dc) 
map_dc

Import Foursquare data

In [9]:
CLIENT_ID = 'R13WRBCQBMLZPCGOWXI22UWEMDMJFRJRGRBG5GPIMICBJLYB' # your Foursquare ID
CLIENT_SECRET = 'S3CMAEQTUTCK5WRMP2RSD2ECYYIKB21YWL5DU0021CBZXKKA' # your Foursquare Secret
radius = 100000
VERSION = '20180604'
LIMIT = 3000
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: R13WRBCQBMLZPCGOWXI22UWEMDMJFRJRGRBG5GPIMICBJLYB
CLIENT_SECRET:S3CMAEQTUTCK5WRMP2RSD2ECYYIKB21YWL5DU0021CBZXKKA


In [10]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=R13WRBCQBMLZPCGOWXI22UWEMDMJFRJRGRBG5GPIMICBJLYB&client_secret=S3CMAEQTUTCK5WRMP2RSD2ECYYIKB21YWL5DU0021CBZXKKA&ll=38.8949924,-77.0365581&v=20180604&radius=100000&limit=3000'
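As an alternative to formatting the query string by hand, the request URL can be assembled from a parameter dictionary with the credentials read from the environment rather than written into the notebook. This is a minimal sketch: the function name `build_explore_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for illustration; only the endpoint and parameter names come from the cell above.

```python
import os
from urllib.parse import urlencode

def build_explore_url(lat, lng, radius, limit, version="20180604"):
    """Build an explore URL, reading credentials from the environment."""
    params = {
        # Assumed variable names; set them in the shell before starting Jupyter.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes the values, so the comma in "ll" becomes %2C.
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

example_url = build_explore_url(38.8949924, -77.0365581, radius=500, limit=100)
print(example_url.split("?")[0])
```

Keeping the secret out of the source also keeps it out of printed output and saved notebook files.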

In [11]:
results = requests.get(url).json()

In [12]:
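# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # return None when a venue has no category
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']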
In [13]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Washington Monument,Monument / Landmark,38.889401,-77.035244
1,National Museum of African American History an...,History Museum,38.891171,-77.032818
2,World War II Memorial,Monument / Landmark,38.889377,-77.040516
3,The Hay-Adams,Hotel,38.90051,-77.036885
4,Renwick Gallery,Art Museum,38.898962,-77.039189


In [15]:
# display the full venue table built from the Foursquare results above
nearby_venues



Unnamed: 0,name,categories,lat,lng
0,Washington Monument,Monument / Landmark,38.889401,-77.035244
1,National Museum of African American History an...,History Museum,38.891171,-77.032818
2,World War II Memorial,Monument / Landmark,38.889377,-77.040516
3,The Hay-Adams,Hotel,38.900510,-77.036885
4,Renwick Gallery,Art Museum,38.898962,-77.039189
...,...,...,...,...
95,Old Town Alexandria,Neighborhood,38.805065,-77.047792
96,German Gourmet,German Restaurant,38.849107,-77.133170
97,"iLoveKickboxing - Falls Church, VA",Gym,38.869081,-77.145865
98,Port City Brewing Company,Brewery,38.807955,-77.101449


In [16]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # build the dataframe once, after all neighborhoods have been queried
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
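The function above indexes directly into the response, so a reply without a `response` key or a venue with an empty `categories` list would raise an exception partway through the loop. A more defensive parser could look like the sketch below. The helper name `extract_venue_rows` and the payload are hypothetical: the payload is hand-written to imitate the structure the code above relies on and is not real API output.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one explore-style response dict into rows, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        location = venue.get("location", {})
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Hand-written payload that imitates the response structure (not real API output).
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe X", "location": {"lat": 38.9, "lng": -77.0},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unnamed Spot", "location": {"lat": 38.8, "lng": -77.1},
               "categories": []}},
]}]}}

rows = extract_venue_rows(fake_payload, "Toy Neighborhood", 38.85, -77.05)
# A payload with no "response" key yields an empty list instead of a KeyError.
empty_rows = extract_venue_rows({}, "Toy Neighborhood", 38.85, -77.05)
print(rows)
print(empty_rows)
```

Returning an empty list for a bad reply lets the loop continue with the remaining neighborhoods; whether silently skipping is preferable to stopping is a design choice.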

In [18]:
dc_venues = getNearbyVenues(names=dc_df2['Neighborhood'],
                                   latitudes=dc_df2['Latitude'],
                                   longitudes=dc_df2['Longitude']
                                  )

Fort Stanton
Congress Heights
Washington Highlands
Bellevue
Knox Hill/Buena Vista
Shipley
Douglass
Woodland
Garfield Heights
Near Southeast
Capitol Hill
Dupont Park
Twining
Randle Highlands
Fairlawn
Penn Branch
Barry Farm
Historic Anacostia
Columbia Heights
Logan Circle/Shaw
Cardozo/Shaw
Van Ness
Forest Hills
Georgetown Reservoir
Foxhall Village
Fort Totten
Pleasant Hill
Kenilworth
Eastland Gardens
Deanwood
Fort Dupont
Greenway
Woodland-Normanstone
Mass. Ave. Heights
Naylor Gardens
Pleasant Plains
Hillsdale
Benning Ridge
Penn Quarter
Chinatown
Stronghold
South Central
Langston
Downtown East
North Portal Estates
Colonial Village
Shepherd Park
Takoma
Lamond Riggs
Petworth
Brightwood Park
Manor Park
Brightwood
Hawthorne
Barnaby Woods
Queens Chapel
Michigan Park
North Michigan Park
Woodridge
University Heights
Brookland
Edgewood
Skyland
Bloomingdale
Lincoln Park
16th Street Heights
Fort Lincoln
Gateway
Langdon
Brentwood
Eckington
Truxton Circle
Ivy City
Trinidad
Arboretum
Carver
Mount Vern

In [19]:
print(dc_venues.shape)
dc_venues.head()

(2568, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Fort Stanton,38.855658,-76.980348,Anacostia Community Museum,38.856728,-76.976899,Museum
1,Fort Stanton,38.855658,-76.980348,Puppy Playground,38.853616,-76.981894,Dog Run
2,Fort Stanton,38.855658,-76.980348,Stanton Road SE & Suitland Parkway SE,38.853278,-76.983289,Intersection
3,Fort Stanton,38.855658,-76.980348,Anacostia Art Gallery & Boutique,38.856265,-76.975281,Art Gallery
4,Fort Stanton,38.855658,-76.980348,Douglass Community Recreation Center,38.852218,-76.977411,Park


In [20]:
dc_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
16th Street Heights,16,16,16,16,16,16
Adams Morgan,61,61,61,61,61,61
American University Park,2,2,2,2,2,2
Arboretum,16,16,16,16,16,16
Barnaby Woods,4,4,4,4,4,4
...,...,...,...,...,...,...
West End,49,49,49,49,49,49
Woodland,4,4,4,4,4,4
Woodland-Normanstone,5,5,5,5,5,5
Woodley Park,23,23,23,23,23,23


In [21]:
print('There are {} unique categories.'.format(len(dc_venues['Venue Category'].unique())))

There are 304 unique categories.


In [22]:
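# one hot encoding
dc_onehot = pd.get_dummies(dc_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# note: Foursquare has a venue category that is itself named 'Neighborhood'; when it is
# present, this assignment overwrites that category column in place instead of appending
dc_onehot['Neighborhood'] = dc_venues['Neighborhood'] 

# move the last column to the front (this is 'Neighborhood' only when it was appended above)
fixed_columns = [dc_onehot.columns[-1]] + list(dc_onehot.columns[:-1])
dc_onehot = dc_onehot[fixed_columns]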

dc_onehot.head()

Unnamed: 0,Zoo Exhibit,ATM,Afghan Restaurant,Alternative Healer,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Art Gallery,Art Museum,...,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
dc_onehot.shape

(2568, 304)

In [24]:
dc_grouped = dc_onehot.groupby('Neighborhood').mean().reset_index()
dc_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,ATM,Afghan Restaurant,Alternative Healer,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Art Gallery,...,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio
0,16th Street Heights,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000
1,Adams Morgan,0.000000,0.0,0.016393,0.0,0.000000,0.0,0.0,0.0,0.00,...,0.0,0.0,0.016393,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000
2,American University Park,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000
3,Arboretum,0.000000,0.0,0.000000,0.0,0.062500,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000
4,Barnaby Woods,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124,West End,0.000000,0.0,0.000000,0.0,0.040816,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408
125,Woodland,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.25,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000
126,Woodland-Normanstone,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000
127,Woodley Park,0.130435,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000


In [25]:
dc_grouped.shape

(129, 304)
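What the one-hot encoding and the grouped mean produce is easier to see on a few rows. The sketch below uses an invented toy table; only the technique (one indicator column per category, then the mean per neighborhood) mirrors the cells above.

```python
import pandas as pd

# Invented toy data: four venues in neighborhood A, two in neighborhood B.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Cafe", "Gym", "Cafe", "Cafe"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)

# insert() raises if a column with this name already exists,
# so a name clash with a category cannot go unnoticed.
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of each indicator column is the share of venues in that category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so the values can be read as the fraction of a neighborhood's venues that fall in each category. These fractions are the features the clustering step works with.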

In [26]:
num_top_venues = 5

for hood in dc_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dc_grouped[dc_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----16th Street Heights----
                   venue  freq
0            Coffee Shop  0.06
1  Salvadoran Restaurant  0.06
2            Pizza Place  0.06
3           Soccer Field  0.06
4         Cosmetics Shop  0.06


----Adams Morgan----
                      venue  freq
0                       Spa  0.05
1            Ice Cream Shop  0.05
2  Mediterranean Restaurant  0.03
3               Coffee Shop  0.03
4      Ethiopian Restaurant  0.03


----American University Park----
                 venue  freq
0   Italian Restaurant   0.5
1            BBQ Joint   0.5
2          Zoo Exhibit   0.0
3    Other Repair Shop   0.0
4  Peruvian Restaurant   0.0


----Arboretum----
                  venue  freq
0                Garden  0.12
1  Fast Food Restaurant  0.06
2                 Hotel  0.06
3             Nightclub  0.06
4      Storage Facility  0.06


----Barnaby Woods----
                  venue  freq
0  Gym / Fitness Center  0.25
1                 Field  0.25
2             BBQ Joint  0.25
3     

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dc_grouped['Neighborhood']

for ind in np.arange(dc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,16th Street Heights,Bed & Breakfast,Greek Restaurant,Gymnastics Gym,Salvadoran Restaurant,Chinese Restaurant,Soccer Field,Coffee Shop,Pizza Place,Gym,Cosmetics Shop
1,Adams Morgan,Spa,Ice Cream Shop,Mediterranean Restaurant,Bar,Ethiopian Restaurant,Asian Restaurant,Coffee Shop,Pizza Place,Diner,Cocktail Bar
2,American University Park,BBQ Joint,Italian Restaurant,Yoga Studio,Flea Market,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market
3,Arboretum,Garden,Ice Cream Shop,Automotive Shop,Fast Food Restaurant,Chinese Restaurant,Nightclub,Gas Station,Storage Facility,Botanical Garden,Lake
4,Barnaby Woods,Gym / Fitness Center,BBQ Joint,Park,Field,Yoga Studio,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop


Cluster Neighborhoods

from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# scale the numeric venue frequencies that are clustered below;
# the 'Neighborhood' name column is text and cannot be scaled
mms = MinMaxScaler()
data_transformed = mms.fit_transform(dc_grouped.drop(columns='Neighborhood'))

# fit k-means for a range of k and record the inertia of each fit
Sum_of_squared_distances = []
K = range(1,20)
for k in K:
    km = KMeans(n_clusters=k, random_state=0)
    km = km.fit(data_transformed)
    Sum_of_squared_distances.append(km.inertia_)

plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()
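An elbow plot can be hard to read when the curve bends gradually. One complementary check, which this notebook does not use, is the silhouette score: it is computed for each candidate k and the largest value is preferred. The sketch below runs on synthetic, well-separated points rather than on the venue data, so it only illustrates the technique.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups of 30 points each (not the venue data).
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# The silhouette score needs at least two clusters, so start at k=2.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the k with the highest average silhouette.
best_k = max(scores, key=scores.get)
print(best_k)
```

On real venue frequencies the scores are usually much closer together than on this toy example, so the result is a hint to weigh alongside the elbow plot rather than a definitive answer.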

In [59]:
# set number of clusters
kclusters = 9

dc_grouped_clustering = dc_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 5, 5, 1, 1, 1, 1, 3, 1], dtype=int32)

In [73]:
# add clustering labels (skipped if the column already exists, so the cell can be re-run)
if 'Cluster_Labels9' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster_Labels9', kmeans.labels_)

# join the cluster labels and top venues onto the neighborhood coordinates
dc_merged = dc_df2
dc_merged = dc_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
dc_merged.head() 

In [74]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

dc_merged=dc_merged.dropna()

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dc_merged['Latitude'], dc_merged['Longitude'], dc_merged['Neighborhood'], dc_merged['Cluster_Labels9']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Examine Clusters

In [64]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 0, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Fort Dupont,Market,Skating Rink,Fish & Chips Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Yoga Studio


In [65]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 1, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Congress Heights,Ice Cream Shop,Deli / Bodega,Health & Beauty Service,Convenience Store,Road,Tennis Court,American Restaurant,Intersection,Fried Chicken Joint
2,Washington Highlands,Liquor Store,Asian Restaurant,Seafood Restaurant,Basketball Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market
3,Bellevue,Pizza Place,Shoe Repair,Playground,Basketball Court,Exhibit,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant
7,Woodland,Art Gallery,Park,Museum,Food Truck,Food Service,Food & Drink Shop,Food,Flower Shop,Fountain
9,Near Southeast,Coffee Shop,Gym / Fitness Center,Sushi Restaurant,Salad Place,Beer Garden,Taco Place,Bike Rental / Bike Share,Pool,Bagel Shop
14,Fairlawn,Fried Chicken Joint,Sandwich Place,Shop & Service,Deli / Bodega,Hostel,Hospital,Hotel,Exhibit,Eye Doctor
16,Barry Farm,Rental Car Location,Bus Station,Metro Station,Basketball Court,Intersection,Food Service,Food Truck,Food & Drink Shop,Food
18,Columbia Heights,Gym,Bakery,Convenience Store,Bar,Soccer Field,Shipping Store,Bus Stop,Shoe Store,Kids Store
23,Georgetown Reservoir,Deli / Bodega,Food Truck,Tennis Court,Lake,Fish & Chips Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field
24,Foxhall Village,Tennis Court,Trail,Bus Station,Lake,Sandwich Place,Field,Eye Doctor,Falafel Restaurant,Farmers Market


In [66]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 2, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
130,Crestwood,Yoga Studio,Flea Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market


In [67]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 3, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Benning Ridge,Convenience Store,Construction & Landscaping,Yoga Studio,Fish Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant
48,Lamond Riggs,Liquor Store,Baseball Field,Video Store,Bus Station,Convenience Store,Construction & Landscaping,Gas Station,State / Provincial Park,Smoke Shop
51,Manor Park,Liquor Store,Convenience Store,Park,Caribbean Restaurant,Construction & Landscaping,Baseball Field,Flower Shop,Flea Market,Food
53,Hawthorne,Business Service,Construction & Landscaping,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market


In [68]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 4, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,Greenway,Yoga Studio,Fish Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Flea Market
95,Spring Valley,Athletics & Sports,Yoga Studio,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop


In [69]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 5, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Fort Stanton,Museum,Intersection,American Restaurant,Dog Run,Art Gallery,Flower Shop,Fast Food Restaurant,Field,Filipino Restaurant
6,Douglass,Breakfast Spot,Pizza Place,Sandwich Place,Video Store,Women's Store,Food Service,Food & Drink Shop,Eye Doctor,Falafel Restaurant
10,Capitol Hill,American Restaurant,Deli / Bodega,Pizza Place,Coffee Shop,Italian Restaurant,Food Truck,Spa,Chinese Restaurant,Seafood Restaurant
13,Randle Highlands,Gym / Fitness Center,Bank,Seafood Restaurant,Sandwich Place,Field,Exhibit,Eye Doctor,Falafel Restaurant,Farmers Market
17,Historic Anacostia,Art Gallery,Comfort Food Restaurant,Grocery Store,Thrift / Vintage Store,Outdoor Sculpture,Bank,Convenience Store,History Museum,Fast Food Restaurant
19,Logan Circle/Shaw,Bar,Mexican Restaurant,Gym / Fitness Center,American Restaurant,Grocery Store,Dive Bar,Fried Chicken Joint,New American Restaurant,Cocktail Bar
20,Cardozo/Shaw,Bar,New American Restaurant,Grocery Store,Mexican Restaurant,Southern / Soul Food Restaurant,American Restaurant,Gym / Fitness Center,Gym,Thai Restaurant
21,Van Ness,Sandwich Place,Thai Restaurant,Sushi Restaurant,Market,Coffee Shop,Shipping Store,Mediterranean Restaurant,Noodle House,Performing Arts Venue
22,Forest Hills,Wine Shop,Grocery Store,Sandwich Place,Performing Arts Venue,Shipping Store,Flower Shop,Food & Drink Shop,Food Service,Event Service
26,Pleasant Hill,Dance Studio,Bus Stop,Chinese Restaurant,Liquor Store,Gym,Discount Store,Fish Market,Falafel Restaurant,Farmers Market


In [70]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 6, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Knox Hill/Buena Vista,Grocery Store,Convenience Store,Fish Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop
5,Shipley,Dance Studio,Performing Arts Venue,Liquor Store,Chinese Restaurant,Wings Joint,Food Service,Food & Drink Shop,Food,Food Truck
8,Garfield Heights,Bus Stop,Park,Wings Joint,Yoga Studio,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop
11,Dupont Park,Sandwich Place,Liquor Store,Seafood Restaurant,Intersection,Restaurant,Mobile Phone Shop,Bike Rental / Bike Share,Bank,Gym / Fitness Center
12,Twining,Bike Rental / Bike Share,Restaurant,Pharmacy,Convenience Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant
15,Penn Branch,Convenience Store,Laundromat,Boat or Ferry,Bike Rental / Bike Share,Yoga Studio,Flea Market,Farmers Market,Fast Food Restaurant,Field
34,Naylor Gardens,Liquor Store,Sandwich Place,Wings Joint,Coffee Shop,Shopping Mall,Bank,Grocery Store,Playground,Gym
36,Hillsdale,Dry Cleaner,Convenience Store,Spa,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant
41,South Central,Park,Convenience Store,Playground,Lounge,Yoga Studio,Filipino Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant
55,Queens Chapel,Convenience Store,Gym / Fitness Center,Liquor Store,Residential Building (Apartment / Condo),Food Service,Field,Exhibit,Eye Doctor,Falafel Restaurant


In [71]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 7, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
120,NE Boundary,Fish Market,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Flea Market


In [72]:
dc_merged.loc[dc_merged['Cluster_Labels9'] == 8, dc_merged.columns[[2] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
56,Michigan Park,Yoga Studio,Fish Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Flea Market
68,Langdon,Memorial Site,Dog Run,Yoga Studio,Fish Market,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop
114,Kingman Park,Taco Place,Intersection,Pool,Yoga Studio,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant
122,Grant Park,Park,Yoga Studio,Fish Market,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Flea Market
