## Exploring Neighborhoods of the Happiest and Saddest Cities in America (IBM Data Science Capstone Project)

#### By: Jessica Jacobson

### **Introduction**

**Background**

In March 2019, WalletHub conducted a study of 182 cities in the United States, ranking them according to the happiness of their residents. Factors considered in this study included depression, suicide rates, average household income, poverty rate, crime rate, and weather conditions, along with other personal, social, and financial factors. The study ranked Plano, TX as the happiest city in America and Detroit, MI as the saddest. (View study results at https://wallethub.com/edu/happiest-places-to-live/32619/.)

**Business Problem**

What do the "happiest" and "saddest" cities in America look like? Are they similar, or do they reveal striking differences? In this study, we will analyze the neighborhoods and venues that make up both Plano, TX and Detroit, MI. We will examine whether the "happiness" of these cities shows any relation to their physical makeup.

**Interested Audience**

This study provides valuable information for city planners and public servants who could make decisions to facilitate a "happier" city for their citizens. Business owners would also benefit from understanding the impact their venues can have on the morale of their neighbors. Lastly, people who are relocating can identify the neighborhood types to embrace and those to avoid.

### **Data Section**

**Description of the Data**

A list of neighborhoods in Plano, TX was found on www.neighborhoodscout.com. This list, along with latitude and longitude coordinates from Google Maps, was used to create a Plano, TX CSV file.

Likewise, a list of neighborhoods in Detroit, MI was downloaded from https://data.detroitmi.gov, and corresponding coordinates were found using Google Maps to create a Detroit, MI CSV file.

Using the Foursquare API, we downloaded a list of venues in each neighborhood for both Plano, TX and Detroit, MI.

**How the Data will be used to solve the problem**

Using the Foursquare API, we will import the top venues for each neighborhood in Plano, TX and Detroit, MI. Using the top five venues in each neighborhood, we will apply k-means clustering to find patterns in each city. Folium and Matplotlib will be used to visualize these clusters on a map of each city. We will also determine the top venue categories and how often these venue types appear in each city. Using this data, we will draw comparisons between the two cities.

### Import Necessary Libraries

In [1]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import requests # library to handle HTTP requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import json # library to handle JSON files
from pandas.io.json import json_normalize

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print("Libraries Imported Successfully")

Libraries Imported Successfully


### Define Foursquare Credentials

In [2]:
CLIENT_ID = 'BGX0JOM3OLODMOLQHQRFIOSI0PPTJJ1M0MH3WZN33ZGLYAU5' # your Foursquare ID
CLIENT_SECRET = 'TOWB1F3GBWTGVHU3ALB13TYSRQKX0OV4KPIKISBVUE5MCXZS' # your Foursquare Secret
VERSION = '20190101' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BGX0JOM3OLODMOLQHQRFIOSI0PPTJJ1M0MH3WZN33ZGLYAU5
CLIENT_SECRET:TOWB1F3GBWTGVHU3ALB13TYSRQKX0OV4KPIKISBVUE5MCXZS
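
Hard-coding API keys in a shared notebook and printing them in cell output exposes them to every reader. A minimal sketch of one alternative, assuming the keys have been exported as environment variables named `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` (these variable names and both helper functions are hypothetical, not part of the original notebook):

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare keys from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep only the first few characters so the key is not echoed in full."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

With these helpers, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the literal assignments, and `print(mask(CLIENT_SECRET))` would confirm the key loaded without revealing it.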


#### Load 56 Neighborhoods and Corresponding Coordinates for Plano, TX (Happiest City in the US)

In [3]:
Pl_geo = pd.read_csv("Plano TX Neighborhoods.csv")
print(Pl_geo.shape)
Pl_geo.head()

(56, 4)


Unnamed: 0,Neighborhood,Zip,Latitude,Longitude
0,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,33.049031,-96.744917
1,"PLANO, TX (INDEPENDENCE PKY / CROSS BEND RD)",75023,33.04923,-96.75263
2,"PLANO, TX (CROSS BEND RD / RAINIER RD)",75023,33.049634,-96.721493
3,"PLANO, TX (BRANCH HOLLOW DR / GOODWIN DR)",75023,33.054023,-96.713712
4,"PLANO, TX (ALMA DR / W SPRING CREEK PKY)",75023,33.056587,-96.710126


### Find coordinates for Plano, TX

In [4]:
address = 'Plano, TX'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Plano, TX are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Plano, TX are 33.0136764, -96.6925096.
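
geopy's `Nominatim.geocode` returns `None` when it finds no match, in which case `location.latitude` above would raise an `AttributeError`. A small sketch of a guard follows; the helper name `geocode_or_raise` is hypothetical, and it accepts any geocoding callable so it can be tried without network access:

```python
def geocode_or_raise(geocode, address):
    """Call a geocoding function and fail loudly if the address is not found.

    `geocode` is any callable returning an object with .latitude and
    .longitude attributes, or None when nothing matches.
    """
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude
```

In this notebook it could be called as `latitude, longitude = geocode_or_raise(geolocator.geocode, 'Plano, TX')`.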


### Visualize zip codes on a map of Plano, TX

In [5]:
# create map of Plano, TX using latitude and longitude values
map_Plano = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers for zip codes in Plano, TX
for lat, lng, label in zip(Pl_geo['Latitude'], Pl_geo['Longitude'], Pl_geo['Zip']):
    # the zip code is numeric, so convert it to text before building the popup
    label = folium.Popup(str(label), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Plano)

map_Plano

### Let's Explore nearby venues for each neighborhood in a new dataset

In [6]:
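def getNearbyVenues(names, zipcodes, latitudes, longitudes):
    """Query Foursquare for venues around each neighborhood and return them as one DataFrame."""
    radius = 2000  # search radius in meters
    LIMIT = 100    # requested number of venues per neighborhood
    venues_list = []
    for name, zipcode, lat, lng in zip(names, zipcodes, latitudes, longitudes):
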
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            zipcode,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Zip',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
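
The function above indexes straight into the JSON response, so a neighborhood with no results or a venue with no category would raise an exception partway through the loop. The sketch below separates parameter building from response parsing and tolerates those gaps. Both helper names are hypothetical, and the payload shape is assumed to match what `getNearbyVenues` already reads (`response` → `groups` → `items` → `venue`):

```python
def build_explore_params(client_id, client_secret, version, lat, lng,
                         radius=2000, limit=100):
    """Query parameters for the venues/explore request made above."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }


def extract_venue_rows(payload, name, zipcode):
    """Flatten one response into (neighborhood, zip, venue, lat, lng, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # a venue without categories gets None instead of raising IndexError
        categories = venue.get("categories") or [{}]
        rows.append((
            name,
            zipcode,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0].get("name"),
        ))
    return rows
```

Inside the loop, the request could then be written as `requests.get('https://api.foursquare.com/v2/venues/explore', params=build_explore_params(...), timeout=10).json()`, which also lets `requests` handle URL encoding and adds a timeout.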

In [7]:
plano_venues = getNearbyVenues(names=Pl_geo['Neighborhood'],
                               zipcodes=Pl_geo['Zip'],
                               latitudes=Pl_geo['Latitude'],
                               longitudes=Pl_geo['Longitude'])

In [8]:
print(plano_venues.shape)
plano_venues.head()

(4105, 6)


Unnamed: 0,Neighborhood,Zip,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,Mama's Pizza,33.040731,-96.735681,Pizza Place
1,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,First Watch - Plano,33.039862,-96.733903,Breakfast Spot
2,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,El Norte Grill,33.04137,-96.73621,Mexican Restaurant
3,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,The Latin Pig,33.039994,-96.733513,Cuban Restaurant
4,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,Country Burger,33.04143,-96.753607,Burger Joint


In [9]:
plano_venues.groupby('Neighborhood').count()

Neighborhood,Zip,Venue,Venue Latitude,Venue Longitude,Venue Category
"PLANO, TX (14TH ST / K AVE)",100,100,100,100,100
"PLANO, TX (ALMA DR / HEDGCOXE RD)",27,27,27,27,27
"PLANO, TX (ALMA DR / LEGACY DR)",76,76,76,76,76
"PLANO, TX (ALMA DR / W PARK BLVD)",100,100,100,100,100
"PLANO, TX (ALMA DR / W PARKER RD)",100,100,100,100,100
"PLANO, TX (ALMA DR / W SPRING CREEK PKY)",100,100,100,100,100
"PLANO, TX (BRANCH HOLLOW DR / GOODWIN DR)",100,100,100,100,100
"PLANO, TX (CITY CENTER)",100,100,100,100,100
"PLANO, TX (COIT RD / HEDGCOXE RD)",61,61,61,61,61
"PLANO, TX (COIT RD / LEGACY DR)",79,79,79,79,79


In [10]:
# one hot encoding
plano_onehot = pd.get_dummies(plano_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
plano_onehot['Neighborhood'] = plano_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [plano_onehot.columns[-1]] + list(plano_onehot.columns[:-1])
plano_onehot = plano_onehot[fixed_columns]

plano_onehot.shape

(4105, 225)

In [11]:
plano_group = plano_onehot.groupby('Neighborhood').mean().reset_index()
plano_group.head()

Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"PLANO, TX (14TH ST / K AVE)",0.0,0.0,0.09,0.0,0.0,0.01,0.01,0.0,0.0,...,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0
1,"PLANO, TX (ALMA DR / HEDGCOXE RD)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"PLANO, TX (ALMA DR / LEGACY DR)",0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,...,0.039474,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"PLANO, TX (ALMA DR / W PARK BLVD)",0.0,0.0,0.05,0.0,0.0,0.01,0.01,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
4,"PLANO, TX (ALMA DR / W PARKER RD)",0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
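
To make the one-hot encoding and `groupby(...).mean()` steps above concrete, here is a small sketch on made-up data (the neighborhoods "A" and "B" and their venues are illustrative, not taken from the Plano results). Each row of the grouped table holds the share of a neighborhood's venues that fall in each category, so every row sums to 1:

```python
import pandas as pd

# toy venue list: neighborhood A has 4 venues, neighborhood B has 2
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Pizza Place", "Pizza Place", "Bar", "Park", "Park"],
})

# one indicator column per category, one row per venue
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# averaging the indicators gives each category's share within a neighborhood
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

These per-neighborhood frequency vectors are what k-means later receives as features.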


In [12]:
num_top_venues = 5

for hood in plano_group['Neighborhood']:
    print(hood)
    temp = plano_group[plano_group['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

PLANO, TX (14TH ST / K AVE) 
                 venue  freq
0  American Restaurant  0.09
1   Mexican Restaurant  0.08
2          Coffee Shop  0.05
3                  Bar  0.04
4           Restaurant  0.03


PLANO, TX (ALMA DR / HEDGCOXE RD)
                  venue  freq
0        Baseball Field  0.11
1  Fast Food Restaurant  0.11
2                  Park  0.07
3           Pizza Place  0.07
4    Mexican Restaurant  0.07


PLANO, TX (ALMA DR / LEGACY DR)
                  venue  freq
0  Fast Food Restaurant  0.09
1    Chinese Restaurant  0.07
2   Japanese Restaurant  0.05
3         Grocery Store  0.04
4           Video Store  0.04


PLANO, TX (ALMA DR / W PARK BLVD)
                 venue  freq
0   Mexican Restaurant  0.07
1          Coffee Shop  0.06
2  American Restaurant  0.05
3     Sushi Restaurant  0.03
4       Sandwich Place  0.03


PLANO, TX (ALMA DR / W PARKER RD) 
                venue  freq
0  Mexican Restaurant  0.06
1  Chinese Restaurant  0.04
2         Coffee Shop  0.04
3      I

In [13]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
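
As an illustration of what this helper does for one row, the sketch below uses `Series.nlargest` on a made-up frequency row. The function name `top_categories` and the sample values are hypothetical; unlike the positional `iloc[1:]` above, it drops the `Neighborhood` label by name.

```python
import pandas as pd


def top_categories(row, n):
    """Return the n category names with the highest frequency in one row."""
    freqs = row.drop("Neighborhood").astype(float)
    # nlargest keeps the first occurrence when frequencies tie
    return freqs.nlargest(n).index.tolist()


row = pd.Series({"Neighborhood": "A", "Bar": 0.1, "Park": 0.3, "Pizza Place": 0.6})
print(top_categories(row, 2))
```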

In [14]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks past the third all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
pl_venues_sorted = pd.DataFrame(columns=columns)
pl_venues_sorted['Neighborhood'] = plano_group['Neighborhood']

for ind in np.arange(plano_group.shape[0]):
    pl_venues_sorted.iloc[ind, 1:] = return_most_common_venues(plano_group.iloc[ind, :], num_top_venues)

pl_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"PLANO, TX (14TH ST / K AVE)",American Restaurant,Mexican Restaurant,Coffee Shop,Bar,Restaurant,Deli / Bodega,Furniture / Home Store,Sushi Restaurant,Food Truck,Breakfast Spot
1,"PLANO, TX (ALMA DR / HEDGCOXE RD)",Baseball Field,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Park,BBQ Joint,Convenience Store,Salon / Barbershop,Grocery Store,Chinese Restaurant
2,"PLANO, TX (ALMA DR / LEGACY DR)",Fast Food Restaurant,Chinese Restaurant,Japanese Restaurant,Indian Restaurant,Grocery Store,Video Store,Convenience Store,Mexican Restaurant,Bank,Pizza Place
3,"PLANO, TX (ALMA DR / W PARK BLVD)",Mexican Restaurant,Coffee Shop,American Restaurant,Sushi Restaurant,Sandwich Place,Restaurant,Furniture / Home Store,Breakfast Spot,Fast Food Restaurant,Thai Restaurant
4,"PLANO, TX (ALMA DR / W PARKER RD)",Mexican Restaurant,Chinese Restaurant,Coffee Shop,Ice Cream Shop,Supermarket,Discount Store,Korean Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint
5,"PLANO, TX (ALMA DR / W SPRING CREEK PKY)",Chinese Restaurant,Mexican Restaurant,Indian Restaurant,Coffee Shop,Grocery Store,Fast Food Restaurant,Vietnamese Restaurant,Japanese Restaurant,Burger Joint,Pizza Place
6,"PLANO, TX (BRANCH HOLLOW DR / GOODWIN DR)",Chinese Restaurant,Mexican Restaurant,Fast Food Restaurant,Video Store,Coffee Shop,Pharmacy,Pizza Place,Indian Restaurant,Discount Store,Thai Restaurant
7,"PLANO, TX (CITY CENTER)",Mexican Restaurant,American Restaurant,Coffee Shop,Bar,Restaurant,Pizza Place,Fast Food Restaurant,Taco Place,Furniture / Home Store,Convenience Store
8,"PLANO, TX (COIT RD / HEDGCOXE RD)",Fast Food Restaurant,Pizza Place,Trail,Indian Restaurant,Video Store,Grocery Store,Bank,Bakery,Rental Car Location,Sushi Restaurant
9,"PLANO, TX (COIT RD / LEGACY DR)",Indian Restaurant,Pizza Place,Video Store,Chinese Restaurant,Bank,Sandwich Place,Mexican Restaurant,Halal Restaurant,Pet Store,South Indian Restaurant
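
The column labels above are built by falling back to "th" once the `indicators` list runs out, which is correct for the ten ranks used here but would produce "21th" if `num_top_venues` were raised past 20. A sketch of a general ordinal helper follows (the name `ordinal` is hypothetical):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [
    f"{ordinal(rank)} Most Common Venue" for rank in range(1, 11)
]
```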


In [15]:
# set number of clusters
kclusters = 5

plano_grouped_clustering = plano_group.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(plano_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
                             

array([1, 4, 3, 1, 1, 3, 3, 1, 3, 3], dtype=int32)
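
The notebook fixes `kclusters = 5` without comparing alternatives. One common way to sanity-check that choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic, well-separated points rather than the Plano data, and the helper name `silhouette_by_k` is hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# synthetic data: three tight groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Applied to this notebook, `X` would be the frequency table passed to `KMeans` above. Real venue data is unlikely to separate as cleanly, so the scores are a guide rather than a definitive answer.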

In [16]:
# add clustering labels
pl_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# start from the neighborhood coordinates table
plano_grouped_clustering = Pl_geo

# join the cluster labels and top venues onto Pl_geo so each neighborhood keeps its latitude/longitude
plano_grouped_clustering = plano_grouped_clustering.join(pl_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

plano_grouped_clustering.head()

Unnamed: 0,Neighborhood,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,33.049031,-96.744917,3,Pizza Place,Park,Pharmacy,Grocery Store,Burger Joint,Fast Food Restaurant,Indian Restaurant,Mexican Restaurant,Toy / Game Store,Thai Restaurant
1,"PLANO, TX (INDEPENDENCE PKY / CROSS BEND RD)",75023,33.04923,-96.75263,3,Pizza Place,Fast Food Restaurant,Indian Restaurant,Burger Joint,Chinese Restaurant,Coffee Shop,Grocery Store,Convenience Store,Park,Automotive Shop
2,"PLANO, TX (CROSS BEND RD / RAINIER RD)",75023,33.049634,-96.721493,3,Chinese Restaurant,Pizza Place,Coffee Shop,Fast Food Restaurant,Burger Joint,Pharmacy,Indian Restaurant,Mexican Restaurant,Clothing Store,Sandwich Place
3,"PLANO, TX (BRANCH HOLLOW DR / GOODWIN DR)",75023,33.054023,-96.713712,3,Chinese Restaurant,Mexican Restaurant,Fast Food Restaurant,Video Store,Coffee Shop,Pharmacy,Pizza Place,Indian Restaurant,Discount Store,Thai Restaurant
4,"PLANO, TX (ALMA DR / W SPRING CREEK PKY)",75023,33.056587,-96.710126,3,Chinese Restaurant,Mexican Restaurant,Indian Restaurant,Coffee Shop,Grocery Store,Fast Food Restaurant,Vietnamese Restaurant,Japanese Restaurant,Burger Joint,Pizza Place


In [17]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
# labels run from 0 to kclusters-1; index -1 wraps to the last color,
# so every cluster still receives a distinct color
for lat, lon, poi, cluster in zip(plano_grouped_clustering['Latitude'], plano_grouped_clustering['Longitude'],
                                  plano_grouped_clustering['Neighborhood'], plano_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [18]:
# Show the most common venues in Cluster #1 (cluster label 0)
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 0, 
                             plano_grouped_clustering.columns[[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,75024,Park,Restaurant,Pharmacy,Bank,Sandwich Place,Bakery,Salon / Barbershop,Convenience Store,Pizza Place,Mexican Restaurant
10,75024,Gym,Pizza Place,Sandwich Place,Gym / Fitness Center,Italian Restaurant,Mexican Restaurant,Park,Bar,Bank,Mobile Phone Shop
11,75024,Hotel,New American Restaurant,Mexican Restaurant,Pizza Place,Mediterranean Restaurant,Pub,Sushi Restaurant,Ice Cream Shop,Coffee Shop,Sandwich Place
12,75024,Hotel,Coffee Shop,Fast Food Restaurant,Park,Convenience Store,Furniture / Home Store,BBQ Joint,Beer Garden,Bar,Gym
13,75024,Convenience Store,Playground,Indian Restaurant,Sandwich Place,Other Repair Shop,Burger Joint,Café,Field,Office,Optical Shop


In [19]:
# Show the most common venues in Cluster #2 (cluster label 1)
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 1, 
                             plano_grouped_clustering.columns[[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,75074,American Restaurant,Mexican Restaurant,Coffee Shop,Bar,Restaurant,Deli / Bodega,Furniture / Home Store,Sushi Restaurant,Food Truck,Breakfast Spot
32,75074,Mexican Restaurant,American Restaurant,Coffee Shop,Bar,Restaurant,Pizza Place,Fast Food Restaurant,Taco Place,Furniture / Home Store,Convenience Store
34,75074,Mexican Restaurant,Fried Chicken Joint,Gas Station,Pizza Place,Grocery Store,Korean Restaurant,Thrift / Vintage Store,Sandwich Place,Burger Joint,Fast Food Restaurant
35,75074,Mexican Restaurant,Pharmacy,Grocery Store,Baseball Stadium,Gymnastics Gym,Chinese Restaurant,Lake,Pizza Place,Coffee Shop,Donut Shop
36,75074,Fried Chicken Joint,Mexican Restaurant,Convenience Store,Pharmacy,Bakery,Trail,Chinese Restaurant,Korean Restaurant,Park,Dessert Shop


In [20]:
# Show the most common venues in Cluster #3 (cluster label 2)
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 2, 
                             plano_grouped_clustering.columns[[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,75025,Pizza Place,Video Store,Bank,Convenience Store,Sandwich Place,Tex-Mex Restaurant,Ice Cream Shop,Gas Station,Donut Shop,Pharmacy
25,75025,Pizza Place,Video Store,Indian Restaurant,Fast Food Restaurant,Bank,Convenience Store,Gym,Pharmacy,Discount Store,Park
26,75025,Baseball Field,Trail,Indian Restaurant,Sandwich Place,Video Store,Gas Station,Park,Fast Food Restaurant,Pizza Place,Supermarket
27,75025,Video Store,Convenience Store,Italian Restaurant,Gas Station,Baseball Field,Pharmacy,Sandwich Place,Indian Restaurant,Bank,Fast Food Restaurant
28,75025,Shipping Store,Video Store,Athletics & Sports,Fast Food Restaurant,Gym,Bank,Convenience Store,Mexican Restaurant,Sandwich Place,Dessert Shop


In [21]:
# Show the most common venues in Cluster #4 (cluster label 3)
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 3, 
                             plano_grouped_clustering.columns[[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,75023,Pizza Place,Park,Pharmacy,Grocery Store,Burger Joint,Fast Food Restaurant,Indian Restaurant,Mexican Restaurant,Toy / Game Store,Thai Restaurant
1,75023,Pizza Place,Fast Food Restaurant,Indian Restaurant,Burger Joint,Chinese Restaurant,Coffee Shop,Grocery Store,Convenience Store,Park,Automotive Shop
2,75023,Chinese Restaurant,Pizza Place,Coffee Shop,Fast Food Restaurant,Burger Joint,Pharmacy,Indian Restaurant,Mexican Restaurant,Clothing Store,Sandwich Place
3,75023,Chinese Restaurant,Mexican Restaurant,Fast Food Restaurant,Video Store,Coffee Shop,Pharmacy,Pizza Place,Indian Restaurant,Discount Store,Thai Restaurant
4,75023,Chinese Restaurant,Mexican Restaurant,Indian Restaurant,Coffee Shop,Grocery Store,Fast Food Restaurant,Vietnamese Restaurant,Japanese Restaurant,Burger Joint,Pizza Place


In [22]:
# Show the most common venues in Cluster #5 (cluster label 4)
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 4, 
                             plano_grouped_clustering.columns[[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,75023,Fast Food Restaurant,Park,Pharmacy,Pizza Place,Indian Restaurant,Playground,Video Store,Thai Restaurant,Mexican Restaurant,Chinese Restaurant
7,75023,Park,Pizza Place,Fast Food Restaurant,Pharmacy,Indian Restaurant,Thai Restaurant,Convenience Store,Video Store,American Restaurant,Farmers Market
21,75025,Pizza Place,Park,Video Store,Pharmacy,Thai Restaurant,Fast Food Restaurant,Indian Restaurant,Convenience Store,Tex-Mex Restaurant,Donut Shop
22,75025,Fast Food Restaurant,Pizza Place,Video Store,Gas Station,Convenience Store,Sandwich Place,Grocery Store,Bank,Indian Restaurant,Park
23,75025,Park,Pharmacy,Pizza Place,Convenience Store,Video Store,Fast Food Restaurant,Salon / Barbershop,Italian Restaurant,Bakery,Discount Store


### Find the most common venues overall in the city of Plano, TX.

In [23]:
plano_venues_tot = getNearbyVenues(names=Pl_geo['Neighborhood'],
                                   zipcodes=Pl_geo['Zip'],
                                   latitudes=Pl_geo['Latitude'],
                                   longitudes=Pl_geo['Longitude'])

In [24]:
print(plano_venues_tot.shape)
plano_venues_tot.head()

(4105, 6)


Unnamed: 0,Neighborhood,Zip,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,Mama's Pizza,33.040731,-96.735681,Pizza Place
1,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,First Watch - Plano,33.039862,-96.733903,Breakfast Spot
2,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,El Norte Grill,33.04137,-96.73621,Mexican Restaurant
3,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,The Latin Pig,33.039994,-96.733513,Cuban Restaurant
4,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,Country Burger,33.04143,-96.753607,Burger Joint


In [25]:
plano_tx_venues = plano_venues_tot.drop(['Neighborhood', 'Zip', 'Venue Latitude', 'Venue Longitude'], axis=1)


In [26]:
plano=plano_tx_venues.groupby('Venue Category').count().sort_values(by='Venue', ascending=False)
plano.head()

Venue Category,Venue
Pizza Place,160
Fast Food Restaurant,149
Mexican Restaurant,141
Coffee Shop,117
Chinese Restaurant,116


In [27]:
# Convert each category's venue count to a percentage of all venues
plano_pct = plano / plano.sum() * 100
plano_pct.rename(columns={'Venue': '% Venues'}, inplace=True)
plano_pct.head(20)

Venue Category,% Venues
Pizza Place,3.897686
Fast Food Restaurant,3.62972
Mexican Restaurant,3.434836
Coffee Shop,2.850183
Chinese Restaurant,2.825822
Sandwich Place,2.679659
Video Store,2.362972
Pharmacy,2.241169
Indian Restaurant,2.143727
Convenience Store,2.046285
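
Because each neighborhood is queried with a 2,000 m radius, nearby neighborhoods may return the same venue, and the counts above include each of those repeats. The sketch below shows one way to count every venue once before computing category shares. The rows are made up for illustration, and treating name plus coordinates as the identity of a venue is an assumption of this sketch:

```python
import pandas as pd

# toy venue table; the first two rows are the same venue returned
# for two different neighborhoods
venues = pd.DataFrame({
    "Venue": ["Mama's Pizza", "Mama's Pizza", "City Park", "Joe's Bar", "Slice House"],
    "Venue Latitude": [33.04, 33.04, 33.05, 33.06, 33.07],
    "Venue Longitude": [-96.73, -96.73, -96.74, -96.75, -96.76],
    "Venue Category": ["Pizza Place", "Pizza Place", "Park", "Bar", "Pizza Place"],
})

# keep one row per distinct venue
unique_venues = venues.drop_duplicates(
    subset=["Venue", "Venue Latitude", "Venue Longitude"]
)

# share of each category as a percentage of distinct venues
share = unique_venues["Venue Category"].value_counts(normalize=True).mul(100)
print(share)
```

Whether to deduplicate depends on the question being asked: the repeated counts weight venues by how many neighborhoods they serve, while the deduplicated counts describe the city's venue mix.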
