## Exploring Neighborhoods of the Happiest and Saddest Cities in America (IBM Data Science Capstone Project)

#### By: Jessica Jacobson

### **Introduction**

**Background**

In March 2019, WalletHub published a study of 182 cities in the United States, ranking them according to the happiness of their residents. Factors considered in this study included depression, suicide rates, average household income, poverty rate, crime rate, and weather conditions, along with other personal, social, and financial factors. The study ranked Plano, TX as the happiest city in America and Detroit, MI as the saddest. (View the study results at https://wallethub.com/edu/happiest-places-to-live/32619/.)

**Business Problem**

What do the "happiest" and "saddest" cities in America look like? Are they similar, or do they reveal differences? In this study, we analyze the neighborhoods and venues that make up Plano, TX and Detroit, MI to see whether the contrasting "happiness" of these cities shows any relation to their physical makeup.

**Interested Audience**

This study may be useful to city planners and public servants deciding how to foster a "happier" city for their residents. Business owners may also benefit from understanding how their venues relate to the morale of their neighbors. Lastly, people who are relocating can use it to identify the neighborhood types to seek out or avoid.

### **Data Section**

**Description of the Data**

A list of neighborhoods in Plano, TX was found on www.neighborhoodscout.com. This list, along with latitude and longitude coordinates from Google Maps, was entered into a spreadsheet for Plano, TX.

Likewise, a list of neighborhoods in Detroit, MI was downloaded from https://data.detroitmi.gov, and the corresponding coordinates were found using Google Maps and entered into a spreadsheet for Detroit, MI.

By using the Foursquare API, we have downloaded a list of venues in each neighborhood for both Plano, TX and Detroit, MI. 

**How the Data will be used to solve the problem**

Using the Foursquare API, we will import the top venues for each neighborhood in Plano, TX and Detroit, MI. Using the frequency of venue categories in each neighborhood, we will apply k-means clustering to find patterns among the neighborhoods of each city. Folium and Matplotlib will be used to visualize these clusters on a map of each city. We will also determine the top venue categories present and how often these venue types appear in each city. Using this data, we will draw comparisons between the two cities.

### Import Necessary Libraries

In [1]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import json # library to handle JSON files
from pandas.io.json import json_normalize

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print ("Libraries Imported Successfully")

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries Imported Successfully


### Define Foursquare Credentials

We will import data using the Foursquare API for venues in the selected cities.

In [2]:
CLIENT_ID = 'BGX0JOM3OLODMOLQHQRFIOSI0PPTJJ1M0MH3WZN33ZGLYAU5' 
CLIENT_SECRET = 'TOWB1F3GBWTGVHU3ALB13TYSRQKX0OV4KPIKISBVUE5MCXZS' 
VERSION = '20190101' 

print('MY credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

MY credentials:
CLIENT_ID: BGX0JOM3OLODMOLQHQRFIOSI0PPTJJ1M0MH3WZN33ZGLYAU5
CLIENT_SECRET:TOWB1F3GBWTGVHU3ALB13TYSRQKX0OV4KPIKISBVUE5MCXZS
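
The cell above hardcodes and prints the API credentials. As an alternative, here is a minimal sketch of reading them from environment variables and masking them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper names are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names used here are hypothetical; substitute whatever
    names your own environment defines.
    """
    try:
        client_id = env["FOURSQUARE_CLIENT_ID"]
        client_secret = env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep only the first few characters of a secret when displaying it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

With this in place, a notebook cell could call `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_ID)` instead of the full value.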


## Plano, TX

### Data Processing

#### Load 56 Neighborhoods and Corresponding Coordinates for Plano, TX (Happiest City in US)

In [3]:
Pl_geo=pd.read_csv("Plano TX Neighborhoods.csv")
print (Pl_geo.shape)
Pl_geo.head()

(56, 4)


Unnamed: 0,Neighborhood,Zip,Latitude,Longitude
0,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,33.049031,-96.744917
1,"PLANO, TX (INDEPENDENCE PKY / CROSS BEND RD)",75023,33.04923,-96.75263
2,"PLANO, TX (CROSS BEND RD / RAINIER RD)",75023,33.049634,-96.721493
3,"PLANO, TX (BRANCH HOLLOW DR / GOODWIN DR)",75023,33.054023,-96.713712
4,"PLANO, TX (ALMA DR / W SPRING CREEK PKY)",75023,33.056587,-96.710126


### Find coordinates for Plano, TX using Geopy Nominatim.

In [4]:
address = 'Plano, TX'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Plano, TX are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Plano, TX are 33.0136764, -96.6925096.
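
geopy's `geocode` returns `None` when an address cannot be resolved, in which case `location.latitude` above would raise an `AttributeError`. The sketch below wraps the lookup with an explicit check. The helper `city_coordinates` and the stub `fake_geocode` are hypothetical names used for illustration only; the stub stands in for `geolocator.geocode` so the example runs without a network call.

```python
from types import SimpleNamespace


def city_coordinates(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing is found.

    `geocode` is any callable shaped like geopy's `geolocator.geocode`: it takes
    an address string and returns an object with `.latitude` and `.longitude`,
    or None when there is no match.
    """
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude


# Stub geocoder for illustration only; it knows a single address.
def fake_geocode(address):
    known = {"Plano, TX": SimpleNamespace(latitude=33.0136764, longitude=-96.6925096)}
    return known.get(address)


plano_lat, plano_lng = city_coordinates(fake_geocode, "Plano, TX")
```

In the notebook itself, the call would presumably be `city_coordinates(geolocator.geocode, address)`.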


### Visualize neighborhoods on map in Plano, TX

In [5]:
# create map of Plano, TX using latitude and longitude values
map_Plano = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers for neighborhoods in Plano, TX
for lat, lng, label in zip(Pl_geo['Latitude'], Pl_geo['Longitude'], Pl_geo['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Plano)  
    
map_Plano

### Let's Explore nearby venues for each neighborhood in a new dataset

In [6]:
def getNearbyVenues(names, zipcodes, latitudes, longitudes):
    radius=2000
    LIMIT=100
    venues_list=[]
    for name, zipcode, lat, lng in zip(names, zipcodes, latitudes, longitudes):
    
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            zipcode,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe with named columns
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 'Zip', 'Venue',
                 'Venue Latitude', 'Venue Longitude', 'Venue Category'])

    return nearby_venues
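
To make the row-building step in `getNearbyVenues` easier to follow, here is a small offline sketch that flattens a hand-written stand-in for the `items` list of one response. The helper name `flatten_items` and the sample venues are invented for illustration. The guard for an empty `categories` list is an added assumption about what the data might contain, not something the original function does.

```python
import pandas as pd


def flatten_items(name, zipcode, items):
    """Turn the 'items' list of one explore response into row tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue has no category rather than raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((name, zipcode, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-written stand-in for response["response"]["groups"][0]["items"]
sample_items = [
    {"venue": {"name": "Example Pizza", "location": {"lat": 33.04, "lng": -96.73},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Unlabeled Spot", "location": {"lat": 33.05, "lng": -96.74},
               "categories": []}},
]

columns = ["Neighborhood", "Zip", "Venue",
           "Venue Latitude", "Venue Longitude", "Venue Category"]
sample_df = pd.DataFrame(flatten_items("Example Neighborhood", 75023, sample_items),
                         columns=columns)
```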

In [7]:
plano_venues = getNearbyVenues(names=Pl_geo['Neighborhood'],
                               zipcodes=Pl_geo['Zip'],
                                   latitudes=Pl_geo['Latitude'],
                                   longitudes=Pl_geo['Longitude']
                                  )

In [8]:
print(plano_venues.shape)
plano_venues.head()

(4108, 6)


Unnamed: 0,Neighborhood,Zip,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,Mama's Pizza,33.040731,-96.735681,Pizza Place
1,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,First Watch - Plano,33.039862,-96.733903,Breakfast Spot
2,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,El Norte Grill,33.04137,-96.73621,Mexican Restaurant
3,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,The Latin Pig,33.039994,-96.733513,Cuban Restaurant
4,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,Country Burger,33.04143,-96.753607,Burger Joint


#### We have built a dataframe of 4,108 venues spanning 56 neighborhoods in Plano, TX. Below you will see how many venues from each neighborhood are represented.

In [9]:
plano_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Zip,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
"PLANO, TX (14TH ST / K AVE)",100,100,100,100,100
"PLANO, TX (ALMA DR / HEDGCOXE RD)",28,28,28,28,28
"PLANO, TX (ALMA DR / LEGACY DR)",76,76,76,76,76
"PLANO, TX (ALMA DR / W PARK BLVD)",100,100,100,100,100
"PLANO, TX (ALMA DR / W PARKER RD)",100,100,100,100,100
"PLANO, TX (ALMA DR / W SPRING CREEK PKY)",100,100,100,100,100
"PLANO, TX (BRANCH HOLLOW DR / GOODWIN DR)",100,100,100,100,100
"PLANO, TX (CITY CENTER)",100,100,100,100,100
"PLANO, TX (COIT RD / HEDGCOXE RD)",60,60,60,60,60
"PLANO, TX (COIT RD / LEGACY DR)",80,80,80,80,80


### Methodology

Now that we have gathered a dataframe of venues for each neighborhood in Plano, TX, we will sort the neighborhoods into similar groupings using k-means clustering, an unsupervised machine learning technique. After the clusters have been formed and visualized on a map of Plano, we will look closely at which venue categories are most prevalent in each cluster. Lastly, we will look at an overall list of the most common venue categories in the city. Because Detroit, MI is a larger city than Plano, TX and its dataframe contains more venues, we will convert the venue totals in each category to a percentage of all venues imported from Foursquare for each city.

### Analysis

Now that we have collected our data, we can analyze it to answer our business problem. Next, we reshape the data to show the venue categories present in each neighborhood and take the mean frequency of occurrence of each category.

In [10]:
# one hot encoding
plano_onehot = pd.get_dummies(plano_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
plano_onehot['Neighborhood'] = plano_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [plano_onehot.columns[-1]] + list(plano_onehot.columns[:-1])
plano_onehot = plano_onehot[fixed_columns]

plano_onehot.shape

(4108, 227)

In [11]:
plano_group = plano_onehot.groupby('Neighborhood').mean().reset_index()
plano_group.head()

Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,Airport Service,American Restaurant,Antique Shop,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"PLANO, TX (14TH ST / K AVE)",0.0,0.0,0.0,0.08,0.0,0.0,0.01,0.01,0.0,...,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0
1,"PLANO, TX (ALMA DR / HEDGCOXE RD)",0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,...,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"PLANO, TX (ALMA DR / LEGACY DR)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,...,0.039474,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0
3,"PLANO, TX (ALMA DR / W PARK BLVD)",0.0,0.0,0.0,0.05,0.0,0.0,0.01,0.01,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
4,"PLANO, TX (ALMA DR / W PARKER RD)",0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,...,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
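
The one-hot encoding followed by a grouped mean can be hard to picture on a 227-column table, so here is a toy sketch on six made-up venues. The neighborhood names `A` and `B` and the categories are invented. Each row of the result is the share of a neighborhood's venues that fall in each category, so every row sums to 1.

```python
import pandas as pd

# Made-up venues: four in neighborhood A, two in neighborhood B
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Pizza Place", "Pizza Place", "Bank", "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to that category
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# Mean of the indicator columns = relative frequency of each category
freq = onehot.groupby("Neighborhood").mean()

# The same table can be obtained with a row-normalized cross-tabulation
alt = pd.crosstab(toy["Neighborhood"], toy["Venue Category"], normalize="index")
```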


#### Let's print each neighborhood along with the top 5 most common venues.

In [12]:
num_top_venues = 5

for hood in plano_group['Neighborhood']:
    print(hood)
    temp = plano_group[plano_group['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

PLANO, TX (14TH ST / K AVE) 
                 venue  freq
0  American Restaurant  0.08
1   Mexican Restaurant  0.08
2          Coffee Shop  0.05
3                  Bar  0.04
4     Sushi Restaurant  0.03


PLANO, TX (ALMA DR / HEDGCOXE RD)
                  venue  freq
0        Baseball Field  0.11
1  Fast Food Restaurant  0.11
2                  Park  0.11
3    Salon / Barbershop  0.07
4           Pizza Place  0.07


PLANO, TX (ALMA DR / LEGACY DR)
                  venue  freq
0  Fast Food Restaurant  0.09
1    Chinese Restaurant  0.07
2     Indian Restaurant  0.05
3         Grocery Store  0.05
4   Japanese Restaurant  0.05


PLANO, TX (ALMA DR / W PARK BLVD)
                    venue  freq
0      Mexican Restaurant  0.07
1             Coffee Shop  0.06
2     American Restaurant  0.05
3  Furniture / Home Store  0.03
4              Restaurant  0.03


PLANO, TX (ALMA DR / W PARKER RD) 
                venue  freq
0  Mexican Restaurant  0.06
1         Coffee Shop  0.04
2  Chinese Restaur

#### Create a new dataframe showing the top 10 venues for each neighborhood.

In [13]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [14]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
pl_venues_sorted = pd.DataFrame(columns=columns)
pl_venues_sorted['Neighborhood'] = plano_group['Neighborhood']

for ind in np.arange(plano_group.shape[0]):
    pl_venues_sorted.iloc[ind, 1:] = return_most_common_venues(plano_group.iloc[ind, :], num_top_venues)

pl_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"PLANO, TX (14TH ST / K AVE)",Mexican Restaurant,American Restaurant,Coffee Shop,Bar,Deli / Bodega,Sushi Restaurant,Furniture / Home Store,Restaurant,Italian Restaurant,Electronics Store
1,"PLANO, TX (ALMA DR / HEDGCOXE RD)",Fast Food Restaurant,Park,Baseball Field,Salon / Barbershop,Pizza Place,BBQ Joint,Intersection,Golf Course,Grocery Store,Pharmacy
2,"PLANO, TX (ALMA DR / LEGACY DR)",Fast Food Restaurant,Chinese Restaurant,Japanese Restaurant,Grocery Store,Indian Restaurant,Video Store,Convenience Store,Food Court,BBQ Joint,Mexican Restaurant
3,"PLANO, TX (ALMA DR / W PARK BLVD)",Mexican Restaurant,Coffee Shop,American Restaurant,Restaurant,Sushi Restaurant,Furniture / Home Store,Fast Food Restaurant,Sandwich Place,German Restaurant,Deli / Bodega
4,"PLANO, TX (ALMA DR / W PARKER RD)",Mexican Restaurant,Coffee Shop,Chinese Restaurant,Ice Cream Shop,Supermarket,Indian Restaurant,Sandwich Place,Steakhouse,Discount Store,Tex-Mex Restaurant
5,"PLANO, TX (ALMA DR / W SPRING CREEK PKY)",Chinese Restaurant,Mexican Restaurant,Coffee Shop,Indian Restaurant,Grocery Store,Fast Food Restaurant,Japanese Restaurant,Burger Joint,Vietnamese Restaurant,Steakhouse
6,"PLANO, TX (BRANCH HOLLOW DR / GOODWIN DR)",Chinese Restaurant,Fast Food Restaurant,Mexican Restaurant,Pharmacy,Video Store,Pizza Place,Coffee Shop,Indian Restaurant,Burger Joint,Restaurant
7,"PLANO, TX (CITY CENTER)",Mexican Restaurant,Coffee Shop,American Restaurant,Bar,Fast Food Restaurant,Sandwich Place,Restaurant,Pizza Place,Sushi Restaurant,Pharmacy
8,"PLANO, TX (COIT RD / HEDGCOXE RD)",Fast Food Restaurant,Indian Restaurant,Pizza Place,Video Store,Trail,Bank,Donut Shop,Baseball Field,Sandwich Place,Bakery
9,"PLANO, TX (COIT RD / LEGACY DR)",Indian Restaurant,Pizza Place,Bank,Chinese Restaurant,Gym / Fitness Center,Sandwich Place,Mexican Restaurant,Video Store,Fast Food Restaurant,Coffee Shop
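
The column labels above are built by indexing into `['st', 'nd', 'rd']` and falling back to `th` on failure. That is enough for ten columns, but it would produce labels such as "21th" for longer lists. A small alternative sketch, using a hypothetical helper named `ordinal`:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
```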


#### Run k-means to cluster the neighborhoods into five clusters.

In [15]:
# set number of clusters
kclusters = 5

plano_grouped_clustering = plano_group.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(plano_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
                             

array([1, 0, 2, 1, 1, 2, 2, 1, 4, 2], dtype=int32)
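
The notebook fixes the number of clusters at five without showing how that value was chosen. One common way to compare candidate values of k is the silhouette score. The sketch below runs on synthetic data that only stands in for the neighborhood frequency matrix, so its result says nothing about the best k for Plano. It illustrates the procedure and is not the author's method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the neighborhood-by-category matrix:
# three well-separated groups of 20 points each in 4 dimensions
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [-5, 5, -5, 5]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 4)) for c in centers])

# Fit k-means for several candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette means tighter, better-separated clusters
best_k = max(scores, key=scores.get)
```

In the notebook, `X` would presumably be replaced by `plano_grouped_clustering` as it exists before the merge in the next cell.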

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [16]:
# add clustering labels
pl_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

plano_grouped_clustering = Pl_geo

# join the cluster labels and top venues onto Pl_geo so each neighborhood keeps its latitude/longitude
plano_grouped_clustering = plano_grouped_clustering.join(pl_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

plano_grouped_clustering.head()

Unnamed: 0,Neighborhood,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,33.049031,-96.744917,2,Pizza Place,Burger Joint,Grocery Store,Indian Restaurant,Pharmacy,Park,Fast Food Restaurant,Toy / Game Store,Sports Bar,Chinese Restaurant
1,"PLANO, TX (INDEPENDENCE PKY / CROSS BEND RD)",75023,33.04923,-96.75263,2,Pizza Place,Fast Food Restaurant,Indian Restaurant,Grocery Store,Chinese Restaurant,Convenience Store,Automotive Shop,Park,Burger Joint,Coffee Shop
2,"PLANO, TX (CROSS BEND RD / RAINIER RD)",75023,33.049634,-96.721493,2,Chinese Restaurant,Pizza Place,Fast Food Restaurant,Coffee Shop,Pharmacy,Burger Joint,Indian Restaurant,Mexican Restaurant,Thai Restaurant,Hobby Shop
3,"PLANO, TX (BRANCH HOLLOW DR / GOODWIN DR)",75023,33.054023,-96.713712,2,Chinese Restaurant,Fast Food Restaurant,Mexican Restaurant,Pharmacy,Video Store,Pizza Place,Coffee Shop,Indian Restaurant,Burger Joint,Restaurant
4,"PLANO, TX (ALMA DR / W SPRING CREEK PKY)",75023,33.056587,-96.710126,2,Chinese Restaurant,Mexican Restaurant,Coffee Shop,Indian Restaurant,Grocery Store,Fast Food Restaurant,Japanese Restaurant,Burger Joint,Vietnamese Restaurant,Steakhouse


### Let's visualize the resulting clusters.

In [17]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(plano_grouped_clustering['Latitude'], plano_grouped_clustering['Longitude'], 
                                  plano_grouped_clustering['Neighborhood'], plano_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [18]:
# Determine what are the most common venues in Cluster #1
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 0, 
                             plano_grouped_clustering.columns
                             [[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()



Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,75023,Fast Food Restaurant,Park,Pharmacy,Pizza Place,Indian Restaurant,Chinese Restaurant,Convenience Store,BBQ Joint,Mexican Restaurant,Video Store
23,75025,Park,Fast Food Restaurant,Pizza Place,Pharmacy,Convenience Store,Video Store,Chinese Restaurant,Soccer Field,Salon / Barbershop,Grocery Store
48,75075,Fast Food Restaurant,Park,Baseball Field,Salon / Barbershop,Pizza Place,BBQ Joint,Intersection,Golf Course,Grocery Store,Pharmacy


In [19]:
# Determine what are the most common venues in Cluster #2
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 1, 
                             plano_grouped_clustering.columns
                             [[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,75024,Hotel,New American Restaurant,Mexican Restaurant,Pub,Mediterranean Restaurant,Pizza Place,Sushi Restaurant,Gym,Italian Restaurant,Office
15,75024,Hotel,Mexican Restaurant,New American Restaurant,American Restaurant,Italian Restaurant,Steakhouse,Mediterranean Restaurant,Burger Joint,Coffee Shop,Sandwich Place
19,75024,Fast Food Restaurant,Pizza Place,Mexican Restaurant,Coffee Shop,Furniture / Home Store,Burger Joint,Sandwich Place,Salon / Barbershop,Japanese Restaurant,Thai Restaurant
31,75074,Mexican Restaurant,American Restaurant,Coffee Shop,Bar,Deli / Bodega,Sushi Restaurant,Furniture / Home Store,Restaurant,Italian Restaurant,Electronics Store
32,75074,Mexican Restaurant,Coffee Shop,American Restaurant,Bar,Fast Food Restaurant,Sandwich Place,Restaurant,Pizza Place,Sushi Restaurant,Pharmacy


In [20]:
# Determine what are the most common venues in Cluster #3
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 2, 
                             plano_grouped_clustering.columns
                             [[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,75023,Pizza Place,Burger Joint,Grocery Store,Indian Restaurant,Pharmacy,Park,Fast Food Restaurant,Toy / Game Store,Sports Bar,Chinese Restaurant
1,75023,Pizza Place,Fast Food Restaurant,Indian Restaurant,Grocery Store,Chinese Restaurant,Convenience Store,Automotive Shop,Park,Burger Joint,Coffee Shop
2,75023,Chinese Restaurant,Pizza Place,Fast Food Restaurant,Coffee Shop,Pharmacy,Burger Joint,Indian Restaurant,Mexican Restaurant,Thai Restaurant,Hobby Shop
3,75023,Chinese Restaurant,Fast Food Restaurant,Mexican Restaurant,Pharmacy,Video Store,Pizza Place,Coffee Shop,Indian Restaurant,Burger Joint,Restaurant
4,75023,Chinese Restaurant,Mexican Restaurant,Coffee Shop,Indian Restaurant,Grocery Store,Fast Food Restaurant,Japanese Restaurant,Burger Joint,Vietnamese Restaurant,Steakhouse


In [21]:
# Determine what are the most common venues in Cluster #4
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 3, 
                             plano_grouped_clustering.columns
                             [[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,75024,Bank,Park,Pharmacy,Salon / Barbershop,Sandwich Place,Restaurant,Mexican Restaurant,Convenience Store,Pizza Place,French Restaurant
10,75024,Sandwich Place,Pizza Place,Gym,Mexican Restaurant,Bar,Italian Restaurant,Coffee Shop,Gym / Fitness Center,Bank,Mobile Phone Shop
12,75024,Hotel,Coffee Shop,Furniture / Home Store,Fast Food Restaurant,Convenience Store,Gym,Office,Park,Pizza Place,Shipping Store
13,75024,Playground,Indian Restaurant,Convenience Store,Sandwich Place,Kitchen Supply Store,Bank,Café,Field,Office,General Entertainment
28,75025,Video Store,Convenience Store,Fast Food Restaurant,Mexican Restaurant,Pharmacy,Pizza Place,Sandwich Place,Shipping Store,Gym,Park


In [22]:
# Determine what are the most common venues in Cluster #5
plano_grouped_clustering.loc[plano_grouped_clustering['Cluster Labels'] == 4, 
                             plano_grouped_clustering.columns
                             [[1] + list(range(5, plano_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,75023,Pizza Place,Fast Food Restaurant,Video Store,Sandwich Place,Indian Restaurant,Bank,Convenience Store,Grocery Store,American Restaurant,Mexican Restaurant
7,75023,Pizza Place,Park,Convenience Store,Fast Food Restaurant,Indian Restaurant,Pharmacy,American Restaurant,Video Store,Ice Cream Shop,Kids Store
16,75024,Indian Restaurant,Fast Food Restaurant,Video Store,Pizza Place,Sandwich Place,Bank,Sushi Restaurant,Grocery Store,Halal Restaurant,Pet Store
18,75024,Fast Food Restaurant,Indian Restaurant,Pizza Place,Video Store,Trail,Bank,Donut Shop,Baseball Field,Sandwich Place,Bakery
21,75025,Pizza Place,Pharmacy,Convenience Store,Video Store,Tex-Mex Restaurant,Indian Restaurant,Park,Fast Food Restaurant,Kids Store,Gas Station


### Find the most common venues overall in the city of Plano, TX.

In [23]:
plano_venues_tot = getNearbyVenues(names=Pl_geo['Neighborhood'],
                               zipcodes=Pl_geo['Zip'],
                                   latitudes=Pl_geo['Latitude'],
                                   longitudes=Pl_geo['Longitude']
                                  )

In [24]:
print(plano_venues_tot.shape)
plano_venues_tot.head()

(4108, 6)


Unnamed: 0,Neighborhood,Zip,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,Mama's Pizza,33.040731,-96.735681,Pizza Place
1,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,First Watch - Plano,33.039862,-96.733903,Breakfast Spot
2,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,El Norte Grill,33.04137,-96.73621,Mexican Restaurant
3,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,The Latin Pig,33.039994,-96.733513,Cuban Restaurant
4,"PLANO, TX (ROUNDROCK TRL / CROSS BEND RD)",75023,Country Burger,33.04143,-96.753607,Burger Joint


In [25]:
plano_tx_venues = plano_venues_tot.drop(['Neighborhood', 'Zip', 'Venue Latitude', 'Venue Longitude'], axis=1)


In [26]:
plano=plano_tx_venues.groupby('Venue Category').count().sort_values(by='Venue', ascending=False)
plano.head(10)

Unnamed: 0_level_0,Venue
Venue Category,Unnamed: 1_level_1
Pizza Place,158
Fast Food Restaurant,150
Mexican Restaurant,139
Chinese Restaurant,118
Coffee Shop,116
Sandwich Place,113
Video Store,97
Indian Restaurant,96
Pharmacy,93
Convenience Store,91


In [27]:
# Convert venue category totals to a percentage of all venues
plano_pct = plano / plano.sum() * 100
plano_pct.rename(columns={'Venue': '% Venues'}, inplace=True)
plano_pct.head(30)

Unnamed: 0_level_0,% Venues
Venue Category,Unnamed: 1_level_1
Pizza Place,3.846154
Fast Food Restaurant,3.651412
Mexican Restaurant,3.383642
Chinese Restaurant,2.872444
Coffee Shop,2.823759
Sandwich Place,2.75073
Video Store,2.361246
Indian Restaurant,2.336904
Pharmacy,2.263875
Convenience Store,2.21519


To better compare with data from Detroit, MI, below are values from the above dataframe for specific venue categories that do not appear in the top 30 listed above.

In [62]:
plano_pct.loc['American Restaurant']

% Venues    1.77702
Name: American Restaurant, dtype: float64

In [64]:
plano_pct.loc['Liquor Store']

% Venues    0.413827
Name: Liquor Store, dtype: float64

In [74]:
plano_pct.loc['Bar']

% Venues    0.827653
Name: Bar, dtype: float64

In [73]:
plano_pct.loc['Fried Chicken Joint']

% Venues    0.827653
Name: Fried Chicken Joint, dtype: float64

## Detroit, MI

#### Load 205 Neighborhoods and Corresponding Coordinates for Detroit, MI (Saddest City in US)

In [28]:
De_geo=pd.read_csv("Detroit Neighborhoods.csv")
print (De_geo.shape)
De_geo.head()

(205, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Airport Sub,42.403752,-83.010351
1,Arden Park,42.389536,-83.080611
2,Aviation Sub,42.355504,-83.166319
3,Bagley,42.426683,-83.150954
4,Barton-McFarland,42.365392,-83.162433


### Find coordinates for Detroit, MI using Geopy Nominatim.

In [29]:
address = 'Detroit, MI'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Detroit, MI are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Detroit, MI are 42.3315509, -83.0466403.


### Visualize neighborhoods on map in Detroit, MI

In [30]:
# create map of Detroit, MI using latitude and longitude values
map_Detroit = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers for neighborhoods in Detroit, MI
for lat, lng, label in zip(De_geo['Latitude'], De_geo['Longitude'], De_geo['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Detroit)  
    
map_Detroit

### Let's Explore nearby venues for each neighborhood in a new dataset

In [31]:
def getNearbyVenues2(names, latitudes, longitudes):
    radius=1500
    LIMIT=100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
    
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe with named columns
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 'Venue',
                 'Venue Latitude', 'Venue Longitude', 'Venue Category'])

    return nearby_venues
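
`getNearbyVenues2` differs from the Plano version in two ways: it omits the zip code, and it searches a 1,500 m radius instead of 2,000 m. One way to keep that difference visible is to build the request URL in a single helper that takes the radius as an explicit parameter. The sketch below is a hypothetical refactor, not the notebook's code. The name `build_explore_url` is invented, and the credentials passed in are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=1500, limit=100):
    """Build the explore request URL with the same query keys the notebook uses."""
    query = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    })
    return f"{EXPLORE_ENDPOINT}?{query}"


# Placeholder credentials; coordinates are those of the first Detroit neighborhood
example_url = build_explore_url("ID", "SECRET", "20190101",
                                42.403752, -83.010351, radius=2000)

# Parse the URL back to confirm the parameters survived encoding
parsed = parse_qs(urlsplit(example_url).query)
```

Passing the same `radius` for both cities would put the venue counts per neighborhood on a more comparable footing.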

In [32]:
detroit_venues = getNearbyVenues2(names=De_geo['Neighborhood'],
                                   latitudes=De_geo['Latitude'],
                                   longitudes=De_geo['Longitude']
                                  )

In [33]:
print(detroit_venues.shape)
detroit_venues.head()

(6864, 5)


Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Airport Sub,Tim Hortons,42.40095,-83.000668,Coffee Shop
1,Airport Sub,CVS pharmacy,42.404051,-82.99748,Pharmacy
2,Airport Sub,Family Dollar,42.39687,-83.001419,Discount Store
3,Airport Sub,Long John Silver's,42.409522,-82.994029,Fish & Chips Shop
4,Airport Sub,Little Caesars Pizza,42.415616,-83.002032,Pizza Place


#### We have built a dataframe of 6,864 venues spanning 205 neighborhoods in Detroit, MI. Below you will see how many venues from each neighborhood are represented.

In [34]:
detroit_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Airport Sub,16,16,16,16
Arden Park,14,14,14,14
Aviation Sub,32,32,32,32
Bagley,46,46,46,46
Barton-McFarland,17,17,17,17
...,...,...,...,...
Westwood Park,17,17,17,17
Wildemere Park,19,19,19,19
Winship,37,37,37,37
Woodbridge,76,76,76,76


### Methodology

We analyze the data collected for Detroit, MI with the same process used for Plano, TX. We sort the neighborhoods into groups of similar neighborhoods using the machine learning technique k-means clustering. After the clusters have been formed and visualized on a map of the city of Detroit, we look closely at which venue categories are most prevalent in each cluster of neighborhoods. Lastly, we look at an overall list of the most common venues in the city of Detroit, MI. Since Detroit, MI is a larger city than Plano, TX and our dataframe contains more venues, we convert the venue total in each category to a percentage of all the venues imported from Foursquare for that city.
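
To make the clustering step concrete, here is a minimal sketch of the k-means idea (Lloyd's algorithm) written in plain NumPy. It is not scikit-learn's `KMeans`, which the notebook actually uses, and the function name `kmeans_sketch`, the fixed starting centroids, and the toy frequency values are all assumptions made for illustration.

```python
import numpy as np


def kmeans_sketch(points, init_centroids, n_iter=10):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid, then recompute means."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[k] = members.mean(axis=0)
    return labels, centroids


# Toy frequency rows; columns stand for [Bar, Park, Pizza Place] (made-up values)
freqs = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
])

# Start from rows 0 and 2 so the result is deterministic
labels, centroids = kmeans_sketch(freqs, init_centroids=freqs[[0, 2]])
print(labels)
print(centroids)
```

Neighborhoods whose venue mix is dominated by the same categories end up with the same label, which is the grouping we rely on in the analysis.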

### Analysis

Next, we manipulate the data to show the specific venue categories in each neighborhood and take the mean frequency of occurrence of each category.
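
As a small illustration of what this produces, the sketch below runs the same one-hot-then-average idea on a made-up venue list laid out like `detroit_venues`. The toy rows and the names `toy`, `onehot`, and `freq` are hypothetical.

```python
import pandas as pd

# Toy venue list (made-up rows in the same layout as detroit_venues)
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Park", "Pizza Place", "Park", "Park"],
})

# One indicator column per category, then average the indicators per neighborhood
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so a value such as 0.5 for `Bar` means half of that neighborhood's venues are bars. Using `insert` here would raise a `ValueError` if a category column were itself named `Neighborhood`, which makes a name clash visible instead of silently overwriting a column.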

In [35]:
# one hot encoding
detroit_onehot = pd.get_dummies(detroit_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
detroit_onehot['Neighborhood'] = detroit_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [detroit_onehot.columns[-1]] + list(detroit_onehot.columns[:-1])
detroit_onehot = detroit_onehot[fixed_columns]

detroit_onehot.shape

(6864, 271)

In [36]:
detroit_group = detroit_onehot.groupby('Neighborhood').mean().reset_index()
detroit_group.head()

Unnamed: 0,Neighborhood,Zoo,ATM,Accessories Store,Airport,Airport Terminal,Alternative Healer,American Restaurant,Antique Shop,Aquarium,...,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Airport Sub,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arden Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aviation Sub,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bagley,0.0,0.0,0.0,0.0,0.0,0.0,0.108696,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Barton-McFarland,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0


#### Let's print each neighborhood along with the top 5 most common venues.

In [37]:
num_top_venues = 5

for hood in detroit_group['Neighborhood']:
    print(hood)
    temp = detroit_group[detroit_group['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

Airport Sub
                  venue  freq
0  Fast Food Restaurant  0.12
1          Intersection  0.12
2           Pizza Place  0.12
3   Fried Chicken Joint  0.12
4              Pharmacy  0.06


Arden Park
                 venue  freq
0       Discount Store  0.21
1        Grocery Store  0.07
2  Fried Chicken Joint  0.07
3                 Bank  0.07
4       Cosmetics Shop  0.07


Aviation Sub
                       venue  freq
0  Middle Eastern Restaurant  0.25
1             Discount Store  0.09
2             Sandwich Place  0.06
3                   Pharmacy  0.06
4       Fast Food Restaurant  0.06


Bagley
                 venue  freq
0  American Restaurant  0.11
1          Pizza Place  0.09
2       Discount Store  0.07
3             Pharmacy  0.07
4         Intersection  0.04


Barton-McFarland
                 venue  freq
0         Liquor Store  0.12
1       Discount Store  0.12
2       Sandwich Place  0.12
3  American Restaurant  0.12
4    Convenience Store  0.12


Belle Isle
       

#### Create a new dataframe showing the top 10 venues for each neighborhood.

In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
de_venues_sorted = pd.DataFrame(columns=columns)
de_venues_sorted['Neighborhood'] = detroit_group['Neighborhood']

for ind in np.arange(detroit_group.shape[0]):
    de_venues_sorted.iloc[ind, 1:] = return_most_common_venues(detroit_group.iloc[ind, :], num_top_venues)

de_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Airport Sub,Intersection,Pizza Place,Fast Food Restaurant,Fried Chicken Joint,Music Store,Pharmacy,Grocery Store,Coffee Shop,Discount Store,Fish & Chips Shop
1,Arden Park,Discount Store,Convenience Store,Bank,Café,Grocery Store,Sandwich Place,Bath House,Garden Center,Hardware Store,Fried Chicken Joint
2,Aviation Sub,Middle Eastern Restaurant,Discount Store,Pharmacy,Fast Food Restaurant,Sandwich Place,Ice Cream Shop,Spa,Smoke Shop,Bowling Alley,Farmers Market
3,Bagley,American Restaurant,Pizza Place,Pharmacy,Discount Store,Grocery Store,Cosmetics Shop,Intersection,Seafood Restaurant,Optical Shop,Liquor Store
4,Barton-McFarland,Convenience Store,Liquor Store,Discount Store,Sandwich Place,American Restaurant,Diner,Burger Joint,Food,Market,Lounge
...,...,...,...,...,...,...,...,...,...,...,...
198,Westwood Park,Discount Store,Intersection,Construction & Landscaping,Baseball Field,Park,Chinese Restaurant,Seafood Restaurant,Grocery Store,Market,Pharmacy
199,Wildemere Park,Discount Store,Pizza Place,Grocery Store,Building,Women's Store,Baseball Field,Chinese Restaurant,American Restaurant,Stadium,Intersection
200,Winship,Pharmacy,Fast Food Restaurant,American Restaurant,Fried Chicken Joint,Southern / Soul Food Restaurant,Chinese Restaurant,Intersection,Music Venue,Salon / Barbershop,Liquor Store
201,Woodbridge,American Restaurant,Casino,Bar,Lounge,Gift Shop,Bakery,Pizza Place,Art Gallery,Coffee Shop,Brewery


#### Run k-means to cluster the neighborhoods into five clusters.

In [40]:
# set number of clusters
kclusters = 5

detroit_grouped_clustering = detroit_group.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(detroit_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
                             

array([1, 2, 4, 2, 2, 0, 2, 2, 4, 4], dtype=int32)
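
A quick way to see how neighborhoods are spread across clusters is to count the labels. The sketch below does this for the ten labels printed above only; the variable names `first_labels` and `counts` are introduced for illustration.

```python
import numpy as np

# The first ten labels printed above
first_labels = np.array([1, 2, 4, 2, 2, 0, 2, 2, 4, 4])
kclusters = 5

# counts[i] is the number of neighborhoods assigned to cluster i
counts = np.bincount(first_labels, minlength=kclusters)
print(counts)  # [1 1 5 0 3]
```

Running the same call on the full `kmeans.labels_` array would give the size of each of the five clusters.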

In [41]:
# add clustering labels
de_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

detroit_grouped_clustering = De_geo

# join the sorted venue lists onto De_geo to add latitude/longitude for each neighborhood
detroit_grouped_clustering = detroit_grouped_clustering.join(de_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

detroit_grouped_clustering.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Airport Sub,42.403752,-83.010351,1,Intersection,Pizza Place,Fast Food Restaurant,Fried Chicken Joint,Music Store,Pharmacy,Grocery Store,Coffee Shop,Discount Store,Fish & Chips Shop
1,Arden Park,42.389536,-83.080611,2,Discount Store,Convenience Store,Bank,Café,Grocery Store,Sandwich Place,Bath House,Garden Center,Hardware Store,Fried Chicken Joint
2,Aviation Sub,42.355504,-83.166319,4,Middle Eastern Restaurant,Discount Store,Pharmacy,Fast Food Restaurant,Sandwich Place,Ice Cream Shop,Spa,Smoke Shop,Bowling Alley,Farmers Market
3,Bagley,42.426683,-83.150954,2,American Restaurant,Pizza Place,Pharmacy,Discount Store,Grocery Store,Cosmetics Shop,Intersection,Seafood Restaurant,Optical Shop,Liquor Store
4,Barton-McFarland,42.365392,-83.162433,2,Convenience Store,Liquor Store,Discount Store,Sandwich Place,American Restaurant,Diner,Burger Joint,Food,Market,Lounge


### Let's visualize the resulting clusters.

In [42]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(detroit_grouped_clustering['Latitude'], detroit_grouped_clustering['Longitude'], 
                                  detroit_grouped_clustering['Neighborhood'], detroit_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### We look at the top venues for each of the 5 clusters.

In [43]:
# Determine what are the most common venues in Cluster #1
detroit_grouped_clustering.loc[detroit_grouped_clustering['Cluster Labels'] == 0, 
                             detroit_grouped_clustering.columns
                               [list(range(4, detroit_grouped_clustering.shape[1]))]].head(10)


Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Harbor / Marina,Zoo,Park,Museum,Food Truck,Beach,Garden,Tennis Court,Lighthouse,Golf Course
48,Harbor / Marina,Gym / Fitness Center,New American Restaurant,Liquor Store,Community Center,Bakery,Convenience Store,Diner,Coffee Shop,Pharmacy
67,Harbor / Marina,Liquor Store,New American Restaurant,Chinese Restaurant,Community Center,Pharmacy,Beach,Fried Chicken Joint,French Restaurant,Bakery
73,Diner,Restaurant,Speakeasy,BBQ Joint,Market,Art Gallery,Seafood Restaurant,Liquor Store,Soup Place,Financial or Legal Service
90,Chinese Restaurant,Coffee Shop,Sporting Goods Shop,French Restaurant,Gas Station,Pharmacy,Bakery,Grocery Store,Gym / Fitness Center,Art Gallery
95,Liquor Store,New American Restaurant,Harbor / Marina,Gym / Fitness Center,Coffee Shop,Sporting Goods Shop,Dance Studio,French Restaurant,Park,Chinese Restaurant
100,Harbor / Marina,Liquor Store,Gym / Fitness Center,New American Restaurant,Chinese Restaurant,Ice Cream Shop,Seafood Restaurant,Beach,Supermarket,French Restaurant
109,Harbor / Marina,Chinese Restaurant,American Restaurant,Ice Cream Shop,Park,Seafood Restaurant,Clothing Store,Pool,Diner,Pharmacy
159,Flea Market,Diner,Speakeasy,Sandwich Place,Chinese Restaurant,Market,Grocery Store,Seafood Restaurant,Art Gallery,Restaurant
191,Harbor / Marina,Chinese Restaurant,Gym / Fitness Center,Ice Cream Shop,Liquor Store,Convenience Store,Diner,Coffee Shop,Pharmacy,Paper / Office Supplies Store


In [44]:
# Determine what are the most common venues in Cluster #2
detroit_grouped_clustering.loc[detroit_grouped_clustering['Cluster Labels'] == 1, 
                             detroit_grouped_clustering.columns
                               [list(range(4, detroit_grouped_clustering.shape[1]))]].head(10)


Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Intersection,Pizza Place,Fast Food Restaurant,Fried Chicken Joint,Music Store,Pharmacy,Grocery Store,Coffee Shop,Discount Store,Fish & Chips Shop
21,Fast Food Restaurant,Ice Cream Shop,Cafeteria,Bank,Chinese Restaurant,Gas Station,BBQ Joint,Sandwich Place,Market,Pizza Place
24,Discount Store,Grocery Store,Pizza Place,Fast Food Restaurant,American Restaurant,Coffee Shop,Chinese Restaurant,Sandwich Place,Furniture / Home Store,Fried Chicken Joint
27,Grocery Store,Discount Store,Sandwich Place,Fast Food Restaurant,Ice Cream Shop,Pizza Place,Pharmacy,Burger Joint,Mexican Restaurant,Fried Chicken Joint
28,American Restaurant,Pharmacy,Sandwich Place,Fast Food Restaurant,Mobile Phone Shop,Bank,Deli / Bodega,Restaurant,Pizza Place,Scenic Lookout
29,Gay Bar,Intersection,Fast Food Restaurant,Pizza Place,Jazz Club,Hot Dog Joint,Nightclub,American Restaurant,BBQ Joint,Sandwich Place
31,Liquor Store,American Restaurant,Fast Food Restaurant,Discount Store,Ice Cream Shop,Chinese Restaurant,Home Service,Women's Store,Intersection,Gas Station
35,Fast Food Restaurant,American Restaurant,Fried Chicken Joint,Sandwich Place,Liquor Store,Grocery Store,Intersection,Asian Restaurant,Nightclub,Optical Shop
39,Fast Food Restaurant,Liquor Store,Nightclub,Park,Historic Site,Burger Joint,Restaurant,Food Truck,Mexican Restaurant,Travel & Transport
42,Fast Food Restaurant,Fried Chicken Joint,Discount Store,Sandwich Place,Pizza Place,Gym,Liquor Store,Shoe Store,BBQ Joint,Pharmacy


In [45]:
# Determine what are the most common venues in Cluster #3
detroit_grouped_clustering.loc[detroit_grouped_clustering['Cluster Labels'] == 2, 
                             detroit_grouped_clustering.columns
                               [list(range(4, detroit_grouped_clustering.shape[1]))]].head(10)


Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Discount Store,Convenience Store,Bank,Café,Grocery Store,Sandwich Place,Bath House,Garden Center,Hardware Store,Fried Chicken Joint
3,American Restaurant,Pizza Place,Pharmacy,Discount Store,Grocery Store,Cosmetics Shop,Intersection,Seafood Restaurant,Optical Shop,Liquor Store
4,Convenience Store,Liquor Store,Discount Store,Sandwich Place,American Restaurant,Diner,Burger Joint,Food,Market,Lounge
6,Fried Chicken Joint,Discount Store,Shoe Store,Grocery Store,Music Venue,Intersection,Southern / Soul Food Restaurant,Pharmacy,Clothing Store,Fast Food Restaurant
7,Intersection,Fast Food Restaurant,Fried Chicken Joint,Shipping Store,Furniture / Home Store,Golf Course,Liquor Store,Chinese Restaurant,Skating Rink,Bar
10,Grocery Store,Discount Store,Pizza Place,Liquor Store,Convenience Store,Sandwich Place,Chinese Restaurant,Park,Fast Food Restaurant,Fried Chicken Joint
11,Convenience Store,Pizza Place,Seafood Restaurant,Mobile Phone Shop,Public Art,Storage Facility,Intersection,Liquor Store,Discount Store,Yoga Studio
13,Intersection,Burger Joint,Construction & Landscaping,Soup Place,Baseball Field,Seafood Restaurant,Gas Station,Farm,Electronics Store,Event Service
16,Grocery Store,Storage Facility,Wings Joint,Home Service,Restaurant,Intersection,Discount Store,Burger Joint,Farmers Market,Electronics Store
17,Grocery Store,Shoe Store,Intersection,Hot Dog Joint,Nightclub,Boutique,Fast Food Restaurant,Skating Rink,Liquor Store,Clothing Store


In [46]:
# Determine what are the most common venues in Cluster #4
detroit_grouped_clustering.loc[detroit_grouped_clustering['Cluster Labels'] == 3, 
                             detroit_grouped_clustering.columns
                               [list(range(4, detroit_grouped_clustering.shape[1]))]].head(10)


Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Diner,Bridge,Deli / Bodega,Intersection,Italian Restaurant,Bakery,Financial or Legal Service,Event Service,Event Space,Farm
22,Mexican Restaurant,Fast Food Restaurant,Food Truck,Gas Station,Bank,Pharmacy,Discount Store,Bakery,Park,Bar
23,Mexican Restaurant,Liquor Store,Bar,Grocery Store,Sandwich Place,Convenience Store,Latin American Restaurant,Lounge,Supermarket,Fried Chicken Joint
91,Mexican Restaurant,Bakery,Food Truck,Taco Place,Pizza Place,Supermarket,Park,Bar,Nightclub,Fried Chicken Joint
92,Mexican Restaurant,Bar,Park,Coffee Shop,Taco Place,Bakery,Cocktail Bar,New American Restaurant,Music Venue,Burger Joint
117,Mexican Restaurant,Bakery,Food Truck,Bar,Taco Place,Discount Store,Bank,Fast Food Restaurant,Pizza Place,Soccer Field
118,Mexican Restaurant,Liquor Store,Sandwich Place,Food Truck,Bar,Lounge,Soccer Field,Taco Place,Supermarket,Fried Chicken Joint
180,Mexican Restaurant,Fast Food Restaurant,Pizza Place,Pharmacy,Grocery Store,Bank,Bakery,Food Truck,Taco Place,Italian Restaurant
195,Mexican Restaurant,Diner,Intersection,Park,Market,Fast Food Restaurant,Italian Restaurant,Bar,Burger Joint,Restaurant


In [47]:
# Determine what are the most common venues in Cluster #5
detroit_grouped_clustering.loc[detroit_grouped_clustering['Cluster Labels'] == 4, 
                             detroit_grouped_clustering.columns
                               [list(range(4, detroit_grouped_clustering.shape[1]))]].head(10)


Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Middle Eastern Restaurant,Discount Store,Pharmacy,Fast Food Restaurant,Sandwich Place,Ice Cream Shop,Spa,Smoke Shop,Bowling Alley,Farmers Market
8,Bar,Sandwich Place,Fast Food Restaurant,Pizza Place,Discount Store,Fried Chicken Joint,Clothing Store,Art Gallery,History Museum,Building
9,Fried Chicken Joint,Furniture / Home Store,Dance Studio,Rental Car Location,Caribbean Restaurant,Liquor Store,Gas Station,Sandwich Place,Chinese Restaurant,Seafood Restaurant
12,American Restaurant,Coffee Shop,Brewery,Farmers Market,Yoga Studio,Deli / Bodega,Concert Hall,Bar,Clothing Store,Theater
14,Coffee Shop,American Restaurant,Farmers Market,Concert Hall,Restaurant,Lounge,Deli / Bodega,Clothing Store,Pizza Place,Music Venue
15,Art Gallery,Gas Station,Discount Store,Bar,Burger Joint,Indian Restaurant,Grocery Store,Liquor Store,Butcher,Skate Park
19,Pizza Place,Bar,Bakery,Grocery Store,Gas Station,Intersection,Convenience Store,Baseball Field,American Restaurant,Discount Store
26,Golf Course,Discount Store,Seafood Restaurant,Pet Store,Fried Chicken Joint,Beach,Sandwich Place,Pizza Place,Diner,Deli / Bodega
32,Deli / Bodega,Art Gallery,Mexican Restaurant,Café,Mediterranean Restaurant,Fast Food Restaurant,Liquor Store,Sandwich Place,BBQ Joint,Grocery Store
33,Lounge,Hotel,Bar,Coffee Shop,New American Restaurant,Restaurant,Sports Bar,Park,Brewery,Diner


### Find the most common venues overall in the city of Detroit, MI.

In [48]:
detroit_venues_tot = getNearbyVenues2(names=De_geo['Neighborhood'],
                                   latitudes=De_geo['Latitude'],
                                   longitudes=De_geo['Longitude']
                                  )

In [49]:
print(detroit_venues_tot.shape)
detroit_venues_tot.head()

(6864, 5)


Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Airport Sub,Tim Hortons,42.40095,-83.000668,Coffee Shop
1,Airport Sub,CVS pharmacy,42.404051,-82.99748,Pharmacy
2,Airport Sub,Family Dollar,42.39687,-83.001419,Discount Store
3,Airport Sub,Long John Silver's,42.409522,-82.994029,Fish & Chips Shop
4,Airport Sub,Little Caesars Pizza,42.415616,-83.002032,Pizza Place


In [50]:
detroit_mi_venues = detroit_venues_tot.drop(columns=['Neighborhood', 'Venue Latitude', 'Venue Longitude'])
detroit_mi_venues.head()

Unnamed: 0,Venue,Venue Category
0,Tim Hortons,Coffee Shop
1,CVS pharmacy,Pharmacy
2,Family Dollar,Discount Store
3,Long John Silver's,Fish & Chips Shop
4,Little Caesars Pizza,Pizza Place


In [51]:
detroit=detroit_mi_venues.groupby('Venue Category').count().sort_values(by='Venue', ascending=False)
detroit.head(10)

Unnamed: 0_level_0,Venue
Venue Category,Unnamed: 1_level_1
Fast Food Restaurant,300
Discount Store,281
American Restaurant,275
Pizza Place,271
Intersection,223
Sandwich Place,209
Liquor Store,209
Bar,201
Grocery Store,199
Fried Chicken Joint,192


In [52]:
detroit_pct = detroit/detroit[detroit.columns].sum()*100
detroit_pct.rename(columns={'Venue': '% Venues'}, inplace=True)
detroit_pct.head(30)

Unnamed: 0_level_0,% Venues
Venue Category,Unnamed: 1_level_1
Fast Food Restaurant,4.370629
Discount Store,4.093823
American Restaurant,4.00641
Pizza Place,3.948135
Intersection,3.248834
Sandwich Place,3.044872
Liquor Store,3.044872
Bar,2.928322
Grocery Store,2.899184
Fried Chicken Joint,2.797203


For comparison purposes, below are values from the above dataframe for specific venue categories that do not appear in the top 30.

In [69]:
detroit_pct.loc['Video Store']

% Venues    0.014569
Name: Video Store, dtype: float64

In [70]:
detroit_pct.loc['Indian Restaurant']

% Venues    0.058275
Name: Indian Restaurant, dtype: float64

In [71]:
detroit_pct.loc['Gym']

% Venues    0.320513
Name: Gym, dtype: float64

In [72]:
detroit_pct.loc['Gym / Fitness Center']

% Venues    0.626457
Name: Gym / Fitness Center, dtype: float64

### Results and Discussion

Listed below are the most common venues found in each of the five clusters of neighborhoods formed through our machine learning technique for Plano, TX. 
1.  Fast Food Restaurants, Parks, Pizza Joints, and Pharmacies
2.  Mexican Restaurants, Coffee Shops, and American Restaurants
3.  Restaurants, primarily Chinese, Fast Food, Indian, Pizza Joints and Burger Shops
4.  Convenience Stores, Gyms/Fitness Centers, Sandwich Shops, Banks and Parks
5.  Video Stores, Convenience Stores, and Fast Food Restaurants

Likewise, below is a list of the most common venues found in each of the five clusters of neighborhoods in Detroit, MI.
1.  Harbor/Marina, Chinese Restaurants, Liquor Stores, and Gyms/Fitness Centers
2.  Fast Food Restaurants, Pizza Joints, and Sandwich Places
3.  Discount Stores, Liquor Stores, and Grocery Stores
4.  Mexican Restaurants, Bars and Bakeries
5.  Bars, Discount Stores, and Coffee Shops
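
The notebook does not show how these per-cluster summaries were tallied. One possible approach, sketched here under that caveat, is to count how often each category appears as the first most common venue within a cluster. The values below are copied from the Cluster #4 table above (`Cluster Labels == 3`); the variable names are hypothetical.

```python
import pandas as pd

# '1st Most Common Venue' for the nine neighborhoods with Cluster Labels == 3
cluster3_first = pd.Series(
    ["Diner"] + ["Mexican Restaurant"] * 8,
    name="1st Most Common Venue",
)

# Tally the categories, most frequent first
top = cluster3_first.value_counts()
print(top.head(3))
```

Applied to the full frame, `detroit_grouped_clustering.groupby('Cluster Labels')['1st Most Common Venue'].value_counts()` would produce the same tally for every cluster at once.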

In comparing the neighborhood clusters from these cities, we do not see much in common between them. Both Plano, TX (in cluster no. 3) and Detroit, MI (in cluster no. 2) have neighborhoods where restaurants outnumber all other types of venues. We also see that each city has a cluster of neighborhoods where Mexican restaurants are very prevalent. Other than these two similarities, the neighborhood makeup of Plano, TX and Detroit, MI appears very different.

Our analysis also looks at the venues of these cities as a whole. Here we find striking differences between Plano, TX and Detroit, MI. As expected, based on the differences in location and demographics, different types of restaurants are more prevalent in one city than in the other. (The values below were calculated by comparing the percentage of venues from each city that fit in the following categories.)
* Mexican Restaurants are 2.25 times as common in Plano, TX as in Detroit, MI.
* Chinese Restaurants are 1.5 times as common in Plano, TX as in Detroit, MI.
* Indian Restaurants are 40 times as common in Plano, TX as in Detroit, MI.
* American Restaurants are 2.2 times as common in Detroit, MI as in Plano, TX.
* Fried Chicken Joints are 3.38 times as common in Detroit, MI as in Plano, TX.

Also, notice that Detroit, MI has several art galleries whereas Plano, TX has none. This can be expected as Detroit, MI is a much larger city than Plano, TX. 

There is also a difference in the prevalence of grocery stores in each city. Grocery stores are 1.59 times as prevalent in Detroit, MI as in Plano, TX.

Other major differences are seen in the prevalence of recreational and retail venues in these cities:
* Video Rentals (a.k.a. Redbox) are 162 times as common in Plano, TX as in Detroit, MI.
* Parks are 1.76 times as common in Plano, TX as in Detroit, MI.
* Gyms / Fitness Centers are 2.70 times as common in Plano, TX as in Detroit, MI.
* Discount Stores are 6.23 times as common in Detroit, MI as in Plano, TX.
* Liquor Stores are 7.36 times as common in Detroit, MI as in Plano, TX.
* Bars are 3.54 times as common in Detroit, MI as in Plano, TX.
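
Each multiple above is a ratio of two percentages, one per city. The sketch below shows that arithmetic; the helper name `prevalence_ratio` is hypothetical, and the Plano value is a placeholder chosen for illustration, not a figure from this study.

```python
def prevalence_ratio(pct_a, pct_b):
    """Return how many times as common a category is in city A as in city B."""
    if pct_b == 0:
        # e.g. a category with no venues at all in city B
        raise ValueError("category is absent from city B; the ratio is undefined")
    return pct_a / pct_b


# Detroit share of Gym / Fitness Center venues, from detroit_pct above
detroit_gym_pct = 0.626457
# Plano share: hypothetical placeholder, NOT a figure from the study
plano_gym_pct_placeholder = 1.25

ratio = prevalence_ratio(plano_gym_pct_placeholder, detroit_gym_pct)
print(round(ratio, 2))
```

The zero check matters for categories such as art galleries, which appear in Detroit, MI but not in Plano, TX, so no finite multiple can be reported for them.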


### Conclusion

As seen in the study conducted by WalletHub in 2019, there are many differences between America's "happiest" and "saddest" cities. Our analysis has revealed even more disparities, namely in the prevalence of certain venue types in these cities. Detroit, MI, which was named America's "saddest" city, has a significantly higher percentage of Discount Stores, Liquor Stores and Bars than its happier counterpart. Plano, TX, America's "happiest" city, was found to have a significantly higher percentage of Parks, Gyms/Fitness Centers, and Video Rental (a.k.a. Redbox) venues than Detroit.

In a study published by WebMD, researchers confirm that "one-third of people with depression also have an alcohol problem" (https://www.webmd.com/depression/guide/alcohol-and-depresssion). In another article, researchers claim that discount stores target poverty-stricken areas and actually "contribute to declines in economic and public health" (https://progressive.org/magazine/dollar-stores-prey-on-the-poor-sainato-191001/).

It is no coincidence that the "saddest" city in America is riddled with bars, liquor stores and an abundance of discount stores. As city planners approve business licenses and new development within their neighborhoods, it is vital that they consider how each new venue can affect the mental, physical and financial health of their residents. Likewise, business owners should consider the impact their endeavors have on the lives of those around them, and not just on their income.

In comparing these cities, it is apparent which types of neighborhoods people should seek to live in. From this study, we can see that neighborhoods with parks, gyms/fitness centers, and other health-promoting recreational venues offer a "happier" atmosphere than neighborhoods full of bars and liquor stores.