# IBM Data Science Capstone Project Notebook

## Project Title: Alternative areas to Asheville, NC

### Introduction / Business Problem

Asheville, North Carolina has been a tourist destination since the middle of the 19th century, when people came to the area seeking relief from tuberculosis. At the time, the climate was thought to promote healing, and numerous boarding houses and sanitariums were established to support this industry. Asheville has seen double-digit population growth since the 1990s and is commonly featured on lists of top destinations. Because of this continued growth and demand, the cost of housing in Asheville is higher than in most other areas of North Carolina.

The objective of this project is to compare Asheville with other metropolitan areas in North Carolina and identify areas that are similar in terms of businesses and venues but have lower-cost housing. We do not consider factors unique to Asheville, such as its geography and climate, that could ultimately influence a buyer's decision.

The audience for our project is people who would like to purchase a single-family home in Asheville but either can’t afford it or choose not to pay Asheville prices. We are attempting to help them find alternative locations in North Carolina that are similar to Asheville in terms of businesses and venues but have lower-cost housing.

### Data

#### House Price Data

We will use the Federal Housing Finance Agency’s House Price Index (HPI) to determine how the price of housing compares across US Census Bureau Metropolitan Statistical Areas (MSAs). We’ll use the most recent HPI data available, which is from the first quarter of 2021. We’ll also use the “all-transactions” type, which includes both refinance mortgages and purchase data. We’re choosing this type because it appears to be the only type with recent data that includes Asheville, NC. We’ll use the non-seasonally adjusted index, since the seasonally adjusted index does not appear to be available for the “all-transactions” type.

https://www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index.aspx

NOTE: the HPI focuses on single-family house prices and primarily on mortgages that are purchased and/or securitized by Fannie Mae or Freddie Mac.

There are several MSAs listed for North Carolina, but we will be focusing our comparison on the following MSAs: Asheville, Charlotte-Concord-Gastonia, Durham-Chapel Hill, Greensboro-High Point, Hickory-Lenoir-Morganton, Raleigh-Cary, Wilmington, and Winston-Salem.

#### Location / Venue Data

We will use the Foursquare API to acquire location/venue data by zip-code for comparison.

https://developer.foursquare.com/docs/places-api/

We will use the US Department of Housing and Urban Development’s (HUD) crosswalk between MSAs and zip-codes to acquire the list of zip-codes within each MSA. NOTE: zip-codes can sometimes be located partially inside and outside of an MSA.

https://www.huduser.gov/portal/datasets/usps_crosswalk.html#data
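
Because a zip-code can straddle an MSA boundary, one possible way to assign each zip-code to a single MSA is to keep the MSA that holds the largest share of its addresses. The sketch below assumes the crosswalk provides a `TOT_RATIO` column with that share; the column name, the zip-codes, and the CBSA codes in the toy data are assumptions for illustration and are not taken from the file used in this notebook.

```python
import pandas as pd

# Toy crosswalk (hypothetical values): zip-code 11111 is split between two MSAs.
crosswalk = pd.DataFrame({
    "ZIP": [11111, 11111, 22222],
    "CBSA": [10000, 20000, 20000],
    "TOT_RATIO": [0.9, 0.1, 1.0],  # assumed column: share of the zip's addresses in the CBSA
})

# For each zip-code, keep the row with the largest address share.
idx = crosswalk.groupby("ZIP")["TOT_RATIO"].idxmax()
primary = crosswalk.loc[idx].reset_index(drop=True)
print(primary)
```

This notebook keeps every zip-code/MSA pairing instead, so a straddling zip-code contributes venues to each MSA it touches.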


### Load and Clean Data

#### Install necessary libraries and set options

In [1]:
# install necessary libraries and set options
import os, types
import pandas as pd
import numpy as np
from botocore.client import Config
import ibm_boto3
import requests
!pip install pgeocode
import pgeocode
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
print('Install complete')

Install complete


In [2]:
!pip install folium
import folium
print('Install complete')

Install complete


#### Load price index and zip-code data

In [3]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,hpi_type,hpi_flavor,frequency,level,place_name,place_id,yr,period,index_nsa
0,traditional,all-transactions,quarterly,MSA,"Asheville, NC",11700,2021,1,332.44
1,traditional,all-transactions,quarterly,MSA,"Charlotte-Concord-Gastonia, NC-SC",16740,2021,1,257.81
2,traditional,all-transactions,quarterly,MSA,"Durham-Chapel Hill, NC",20500,2021,1,246.64
3,traditional,all-transactions,quarterly,MSA,"Greensboro-High Point, NC",24660,2021,1,190.91
4,traditional,all-transactions,quarterly,MSA,"Hickory-Lenoir-Morganton, NC",25860,2021,1,217.67


In [4]:
HPI.shape

(8, 9)

In [5]:
# clean up HPI dataframe
HPInew = HPI.drop(['hpi_type','hpi_flavor','frequency','level','yr','period'], axis = 1, errors = 'ignore')
HPInew.rename(columns = {'place_name':'MSA_NAME', 'place_id':'MSA_ID', 'index_nsa':'PRICE_INDEX'}, inplace = True)
HPInew.head()

Unnamed: 0,MSA_NAME,MSA_ID,PRICE_INDEX
0,"Asheville, NC",11700,332.44
1,"Charlotte-Concord-Gastonia, NC-SC",16740,257.81
2,"Durham-Chapel Hill, NC",20500,246.64
3,"Greensboro-High Point, NC",24660,190.91
4,"Hickory-Lenoir-Morganton, NC",25860,217.67
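
To make the comparison with Asheville explicit, each index value can be expressed as a ratio of Asheville's. The sketch below uses only the five rows displayed above (the other three MSAs are not shown in the output), and `RATIO_TO_ASHEVILLE` is a hypothetical column name introduced for illustration.

```python
import pandas as pd

# The five rows displayed in the output above.
hpi = pd.DataFrame({
    "MSA_NAME": [
        "Asheville, NC",
        "Charlotte-Concord-Gastonia, NC-SC",
        "Durham-Chapel Hill, NC",
        "Greensboro-High Point, NC",
        "Hickory-Lenoir-Morganton, NC",
    ],
    "PRICE_INDEX": [332.44, 257.81, 246.64, 190.91, 217.67],
})

# Look up Asheville's index and divide every row by it.
asheville_index = hpi.loc[hpi["MSA_NAME"] == "Asheville, NC", "PRICE_INDEX"].iloc[0]
hpi["RATIO_TO_ASHEVILLE"] = (hpi["PRICE_INDEX"] / asheville_index).round(3)
print(hpi.sort_values("PRICE_INDEX"))
```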


In [6]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,ZIP,CBSA,USPS_ZIP_PREF_CITY,USPS_ZIP_PREF_STATE
0,27006,49180,ADVANCE,NC
1,27009,49180,BELEWS CREEK,NC
2,27010,49180,BETHANIA,NC
3,27011,49180,BOONVILLE,NC
4,27012,49180,CLEMMONS,NC


In [7]:
HUD.shape

(588, 4)

In [8]:
# clean up HUD dataframe
HUDnew = HUD.drop(['USPS_ZIP_PREF_STATE'], axis = 1, errors = 'ignore')
HUDnew.rename(columns = {'USPS_ZIP_PREF_CITY':'CITY'}, inplace = True)
HUDnew.head()

Unnamed: 0,ZIP,CBSA,CITY
0,27006,49180,ADVANCE
1,27009,49180,BELEWS CREEK
2,27010,49180,BETHANIA
3,27011,49180,BOONVILLE
4,27012,49180,CLEMMONS


In [9]:
# merge HPInew and HUDnew dataframes on MSA
HPIandHUD = HPInew.merge(HUDnew, how = 'left', left_on = 'MSA_ID', right_on = 'CBSA')
HPIandHUD.drop('CBSA', axis = 1, inplace = True, errors = 'ignore')
HPIandHUD = HPIandHUD.reset_index(drop=True)
HPIandHUD.head()

Unnamed: 0,MSA_NAME,MSA_ID,PRICE_INDEX,ZIP,CITY
0,"Asheville, NC",11700,332.44,28655,MORGANTON
1,"Asheville, NC",11700,332.44,28701,ALEXANDER
2,"Asheville, NC",11700,332.44,28704,ARDEN
3,"Asheville, NC",11700,332.44,28709,BARNARDSVILLE
4,"Asheville, NC",11700,332.44,28710,BAT CAVE


In [10]:
HPIandHUD.shape

(588, 5)
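
A left merge like the one above can be checked with two optional `merge` arguments: `validate` confirms that each MSA appears once on the left side, and `indicator` flags MSAs that found no zip-codes. This is a minimal sketch on toy frames with made-up IDs, not the notebook's data.

```python
import pandas as pd

# Toy stand-ins for HPInew and HUDnew (hypothetical IDs and values).
hpi = pd.DataFrame({"MSA_ID": [100, 200], "PRICE_INDEX": [300.0, 200.0]})
zips = pd.DataFrame({"ZIP": [11111, 11112, 22221], "CBSA": [100, 100, 200]})

merged = hpi.merge(
    zips,
    how="left",
    left_on="MSA_ID",
    right_on="CBSA",
    validate="one_to_many",  # raises if an MSA_ID is duplicated on the left
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Rows marked 'left_only' are MSAs with no matching zip-code.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```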

### Get latitude and longitude data for each zip code

In [11]:
g = pgeocode.Nominatim('US')
latitude=[]
longitude=[]
for code in HPIandHUD['ZIP']:
    x = g.query_postal_code('{}'.format(code))
    latitude.append(x['latitude'])
    longitude.append(x['longitude'])

# create dataframes for the latitude and longitude lists
latitude_df = pd.DataFrame(latitude, columns = ['Latitude'])
longitude_df = pd.DataFrame(longitude, columns = ['Longitude'])

# merge the latitude and longitude dataframes with the HPIandHUD dataframe
lat_long_df = latitude_df.merge(longitude_df, how = 'left', left_index = True, right_index = True)
HPIandHUD = HPIandHUD.merge(lat_long_df, how = 'left', left_index = True, right_index = True)
HPIandHUD.head()

Unnamed: 0,MSA_NAME,MSA_ID,PRICE_INDEX,ZIP,CITY,Latitude,Longitude
0,"Asheville, NC",11700,332.44,28655,MORGANTON,35.7346,-81.7042
1,"Asheville, NC",11700,332.44,28701,ALEXANDER,35.7064,-82.6311
2,"Asheville, NC",11700,332.44,28704,ARDEN,35.4637,-82.5354
3,"Asheville, NC",11700,332.44,28709,BARNARDSVILLE,35.7748,-82.4567
4,"Asheville, NC",11700,332.44,28710,BAT CAVE,35.4515,-82.2871
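
pgeocode returns NaN coordinates for postal codes it cannot resolve, so it may be worth checking for missing values before passing coordinates to the venue lookup. The sketch below shows one way to do that on a toy frame with made-up zip-codes; it is not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): the second zip-code could not be resolved.
df = pd.DataFrame({
    "ZIP": [11111, 22222, 33333],
    "Latitude": [35.1, np.nan, 35.9],
    "Longitude": [-82.5, np.nan, -79.0],
})

# Rows where either coordinate is missing.
missing = df[df[["Latitude", "Longitude"]].isna().any(axis=1)]

# Rows that are safe to pass to a lat/long based API.
usable = df.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)
print(len(missing), len(usable))
```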


In [None]:
# this was the initial way we acquired zip-code coordinates, but the ArcGIS service became unresponsive, so the pgeocode approach above was used instead
import geocoder  # provides geocoder.arcgis, used below
latitude=[]
longitude=[]
for code in HPIandHUD['ZIP']:
    g = geocoder.arcgis('{}'.format(code))
    # print(code, g.latlng)
    while (g.latlng is None):
        g = geocoder.arcgis('{}'.format(code))
        # print(code, g.latlng)
    latlng = g.latlng
    latitude.append(latlng[0])
    longitude.append(latlng[1])

# create dataframes for the latitude and longitude lists
latitude_df = pd.DataFrame(latitude, columns = ['Latitude'])
longitude_df = pd.DataFrame(longitude, columns = ['Longitude'])

# merge the latitude and longitude dataframes with the HPIandHUD dataframe
lat_long_df = latitude_df.merge(longitude_df, how = 'left', left_index = True, right_index = True)
HPIandHUD = HPIandHUD.merge(lat_long_df, how = 'left', left_index = True, right_index = True)
HPIandHUD.head()
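
The `while` loop in the cell above retries forever if the service never answers, which is consistent with the unresponsiveness described there. A bounded retry is one alternative. In the sketch below, `geocode_with_retry` and `flaky_lookup` are hypothetical names, and the lookup is a stand-in function rather than a real geocoding call.

```python
import time

def geocode_with_retry(lookup, code, max_attempts=3, delay_seconds=0.0):
    """Call lookup(code) until it returns a result or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = lookup(code)
        if result is not None:
            return result
        if attempt < max_attempts:
            # wait a little longer after each failed attempt
            time.sleep(delay_seconds * attempt)
    return None  # caller decides how to handle an unresolved code

# Stand-in lookup that fails once and then succeeds (made-up coordinates).
calls = {"count": 0}

def flaky_lookup(code):
    calls["count"] += 1
    return [35.0, -82.0] if calls["count"] >= 2 else None

print(geocode_with_retry(flaky_lookup, "00000"))
```

Returning `None` after the last attempt lets the caller record the zip-code as unresolved instead of blocking the whole loop.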

### Explore and Cluster MSAs

#### Define Foursquare credentials

In [12]:
# The code was removed by Watson Studio for sharing.

#### Create a function to look up venues from Foursquare for each lat and long pair

In [13]:
# create a function to get nearby venues for each MSA and lat/long pair
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['MSA_NAME', 
                  'MSA Latitude', 
                  'MSA Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
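
The function above indexes directly into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups or a venue without categories would raise an error. The sketch below shows one way to parse the same response shape more defensively. `extract_venue_rows` is a hypothetical helper, and the sample payload is made up to mirror the keys used above.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        rows.append((
            name,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"] if categories else None,  # None when uncategorized
        ))
    return rows

# Made-up payload with one categorized and one uncategorized venue.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Pizza",
               "location": {"lat": 35.0, "lng": -82.0},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 35.1, "lng": -82.1},
               "categories": []}},
]}]}}

rows = extract_venue_rows(sample, "Example MSA", 35.0, -82.0)
print(rows)
```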

In [14]:
# run the above function on each MSA and lat/long combination and create a new dataframe called nc_venues
nc_venues = getNearbyVenues(names=HPIandHUD['MSA_NAME'],
                                   latitudes=HPIandHUD['Latitude'],
                                   longitudes=HPIandHUD['Longitude']
                                  )

In [15]:
nc_venues.head()

Unnamed: 0,MSA_NAME,MSA Latitude,MSA Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Asheville, NC",35.7346,-81.7042,Moondog Pizza,35.734903,-81.708535,Pizza Place
1,"Asheville, NC",35.7346,-81.7042,fred's Super Dollar,35.735161,-81.709251,Discount Store
2,"Asheville, NC",35.7346,-81.7042,New China,35.735888,-81.708612,Chinese Restaurant
3,"Asheville, NC",35.7064,-82.6311,Pro-Landscape & Service,35.705105,-82.629976,Business Service
4,"Asheville, NC",35.7064,-82.6311,Inspired Ts Co,35.707388,-82.630797,Cosmetics Shop


In [16]:
nc_venues.shape

(4633, 7)

In [18]:
# analyze each MSA
# one hot encoding
nc_onehot = pd.get_dummies(nc_venues[['Venue Category']], prefix="", prefix_sep="")

# add MSA column back to dataframe
nc_onehot.insert(0, 'MSA_NAME', nc_venues['MSA_NAME'])

nc_onehot.head()

Unnamed: 0,MSA_NAME,ATM,Accessories Store,African Restaurant,American Restaurant,Amphitheater,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Trail,Bistro,Boat Rental,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Car Wash,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,City,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Cafeteria,College Communications Building,College Gym,College Library,College Theater,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Credit Union,Creperie,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Electronics Store,Empanada Restaurant,Entertainment Service,Event Service,Event Space,Eye Doctor,Farm,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gun Shop,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hunting Supply,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Lawyer,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Night Market,Nightclub,Nightlife Spot,Noodle House,Office,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pilates Studio,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Post Office,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,River,Rock Club,Roller Rink,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Train Station,Tram Station,Travel & Transport,Tree,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Asheville, NC",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Asheville, NC",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Asheville, NC",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Asheville, NC",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Asheville, NC",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
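
The wide table above can be hard to read, so here is the same one-hot step on a toy frame: each venue row gets an indicator in the column for its category and nothing in every other column. The MSA labels and categories below are made up for illustration.

```python
import pandas as pd

# Toy venue list (hypothetical rows).
venues = pd.DataFrame({
    "MSA_NAME": ["A", "A", "B"],
    "Venue Category": ["Pizza Place", "Hotel", "Pizza Place"],
})

# One column per category; empty prefix keeps the bare category name as the header.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")

# Put the MSA label back as the first column.
onehot.insert(0, "MSA_NAME", venues["MSA_NAME"])
print(onehot)
```

Depending on the pandas version, the indicator columns may print as 0/1 or as False/True; both behave the same way when averaged.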


#### Group rows by MSA, taking the mean of the frequency of occurrence of each category

In [19]:
# group rows by MSA, taking the mean of the frequency of occurrence of each category
MSA_grouped = nc_onehot.groupby(['MSA_NAME']).mean().reset_index()
MSA_grouped.head()

Unnamed: 0,MSA_NAME,ATM,Accessories Store,African Restaurant,American Restaurant,Amphitheater,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Trail,Bistro,Boat Rental,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Car Wash,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,City,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Cafeteria,College Communications Building,College Gym,College Library,College Theater,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Credit Union,Creperie,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Electronics Store,Empanada Restaurant,Entertainment Service,Event Service,Event Space,Eye Doctor,Farm,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gun Shop,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hunting Supply,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Lawyer,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Night Market,Nightclub,Nightlife Spot,Noodle House,Office,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pilates Studio,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Post Office,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,River,Rock Club,Roller Rink,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Train Station,Tram Station,Travel & Transport,Tree,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Asheville, NC",0.0,0.004566,0.0,0.018265,0.0,0.004566,0.004566,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004566,0.0,0.004566,0.004566,0.018265,0.004566,0.0,0.0,0.0,0.0,0.009132,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.004566,0.013699,0.009132,0.0,0.0,0.009132,0.004566,0.0,0.004566,0.0,0.009132,0.0,0.0,0.009132,0.004566,0.0,0.0,0.0,0.0,0.009132,0.004566,0.004566,0.0,0.0,0.009132,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031963,0.004566,0.0,0.009132,0.004566,0.0,0.0,0.0,0.004566,0.0,0.004566,0.009132,0.027397,0.004566,0.0,0.0,0.0,0.0,0.013699,0.004566,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045662,0.0,0.0,0.0,0.0,0.004566,0.004566,0.0,0.0,0.0,0.004566,0.004566,0.009132,0.013699,0.0,0.004566,0.0,0.0,0.022831,0.004566,0.0,0.004566,0.009132,0.004566,0.0,0.0,0.009132,0.004566,0.0,0.013699,0.0,0.0,0.0,0.0,0.004566,0.0,0.0,0.0,0.0,0.03653,0.0,0.004566,0.050228,0.0,0.0,0.004566,0.018265,0.009132,0.0,0.0,0.004566,0.004566,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.009132,0.004566,0.004566,0.004566,0.0,0.0,0.0,0.018265,0.0,0.004566,0.0,0.0,0.0,0.0,0.0,0.004566,0.018265,0.0,0.0,0.004566,0.0,0.004566,0.0,0.013699,0.0,0.004566,0.004566,0.004566,0.018265,0.0,0.0,0.004566,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004566,0.0,0.0,0.0,0.009132,0.0,0.0,0.0,0.0,0.009132,0.018265,0.0,0.0,0.027397,0.0,0.0,0.0,0.0,0.0,0.018265,0.0,0.004566,0.0,0.0,0.0,0.0,0.004566,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.013699,0.013699,0.0,0.0,0.0,0.004566,0.0,0.004566,0.0,0.0,0.0,0.0,0.0,0.004566,0.0,0.004566,0.0,0.0,0.0,0.004566,0.0,0.004566,0.0,0.0,0.0,0.0,0.009132,0.004566,0.0,0.004566,0.004566,0.004566,0.0,0.0,0.0,0.0,0.009132,0.0,0.0,0.0,0.009132,0.0,0.0,0.0,0.004566,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009132,0.0,0.0,0.0,0.0
1,"Charlotte-Concord-Gastonia, NC-SC",0.005889,0.0,0.0,0.044433,0.0,0.000535,0.001606,0.000535,0.000535,0.002141,0.0,0.0,0.0,0.0,0.001071,0.006959,0.001071,0.002677,0.009636,0.009636,0.001606,0.0,0.001071,0.006424,0.0,0.0,0.0,0.000535,0.0,0.001071,0.0,0.0,0.0,0.0,0.000535,0.000535,0.000535,0.001071,0.003747,0.002141,0.0,0.0,0.009101,0.006424,0.0,0.000535,0.0,0.01606,0.0,0.0,0.01606,0.000535,0.000535,0.0,0.000535,0.006424,0.002141,0.006959,0.0,0.0,0.0,0.006959,0.025161,0.000535,0.0,0.0,0.000535,0.0,0.0,0.000535,0.000535,0.000535,0.007495,0.018201,0.006959,0.0,0.001606,0.0,0.013383,0.0,0.000535,0.014454,0.001071,0.000535,0.001071,0.003212,0.000535,0.000535,0.000535,0.002141,0.0,0.0,0.0,0.001606,0.0,0.000535,0.000535,0.0,0.0,0.000535,0.009636,0.009101,0.001071,0.000535,0.000535,0.0,0.002141,0.000535,0.0,0.000535,0.000535,0.000535,0.0,0.013383,0.002677,0.0,0.005353,0.0,0.002141,0.005353,0.000535,0.0,0.0,0.001071,0.001606,0.000535,0.001071,0.003747,0.000535,0.0,0.011242,0.011242,0.0,0.000535,0.001071,0.001606,0.000535,0.000535,0.012313,0.001071,0.019807,0.000535,0.006959,0.030514,0.013919,0.0,0.010707,0.003212,0.007495,0.000535,0.001071,0.002141,0.013383,0.032655,0.007495,0.001606,0.0,0.000535,0.0,0.0,0.000535,0.000535,0.0,0.000535,0.001071,0.000535,0.0,0.001606,0.001071,0.007495,0.0,0.0,0.001071,0.001071,0.001606,0.0,0.023019,0.0,0.000535,0.001071,0.0,0.0,0.0,0.0,0.001071,0.001071,0.001071,0.0,0.002141,0.000535,0.000535,0.009101,0.0,0.001071,0.000535,0.0,0.0,0.001071,0.0,0.000535,0.0,0.0,0.0,0.025161,0.0,0.0,0.00803,0.000535,0.001606,0.012313,0.0,0.000535,0.038009,0.0,0.002141,0.0,0.005353,0.000535,0.003212,0.0,0.021949,0.0,0.000535,0.0,0.0,0.019807,0.000535,0.0,0.028908,0.0,0.000535,0.0,0.012848,0.00803,0.03212,0.0,0.000535,0.006424,0.006424,0.013919,0.001606,0.0,0.000535,0.007495,0.0,0.000535,0.000535,0.0,0.0,0.013383,0.0,0.001606,0.000535,0.007495,0.000535,0.030514,0.000535,0.009101,0.001071,0.0,0.007495,0.000535,0.0,0.012848,0.0,0.000535,0.006959,0.020343,0.0,0.0,0.001071,0.000535,0.0,0.000535,0.002141,0.000535,0.000535,0.000535,0.000535,0.000535,0.0,0.0,0.009101,0.0,0.000535,0.000535,0.0,0.008565,0.000535,0.0,0.000535,0.001071
2,"Durham-Chapel Hill, NC",0.006565,0.0,0.0,0.026258,0.0,0.002188,0.004376,0.002188,0.004376,0.004376,0.004376,0.0,0.010941,0.002188,0.002188,0.008753,0.0,0.015317,0.006565,0.02407,0.002188,0.0,0.002188,0.0,0.0,0.0,0.008753,0.0,0.006565,0.004376,0.0,0.004376,0.0,0.0,0.002188,0.0,0.0,0.0,0.010941,0.006565,0.0,0.0,0.010941,0.002188,0.002188,0.008753,0.002188,0.002188,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015317,0.0,0.0,0.0,0.004376,0.019694,0.013129,0.0,0.002188,0.004376,0.0,0.002188,0.0,0.0,0.0,0.0,0.006565,0.013129,0.017505,0.0,0.006565,0.0,0.0,0.004376,0.004376,0.004376,0.006565,0.002188,0.004376,0.013129,0.0,0.0,0.0,0.010941,0.004376,0.006565,0.0,0.004376,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043764,0.0,0.0,0.0,0.004376,0.0,0.004376,0.0,0.0,0.002188,0.002188,0.0,0.006565,0.013129,0.004376,0.004376,0.0,0.0,0.008753,0.0,0.0,0.0,0.006565,0.002188,0.0,0.0,0.004376,0.0,0.004376,0.017505,0.002188,0.0,0.0,0.0,0.002188,0.0,0.0,0.002188,0.0,0.021882,0.0,0.008753,0.039387,0.0,0.0,0.002188,0.017505,0.006565,0.002188,0.002188,0.0,0.0,0.002188,0.002188,0.004376,0.0,0.0,0.0,0.004376,0.0,0.0,0.0,0.002188,0.0,0.0,0.0,0.002188,0.0,0.013129,0.0,0.004376,0.002188,0.0,0.002188,0.0,0.030635,0.0,0.002188,0.004376,0.002188,0.002188,0.0,0.0,0.0,0.0,0.002188,0.002188,0.004376,0.004376,0.0,0.008753,0.0,0.0,0.0,0.004376,0.0,0.002188,0.0,0.0,0.004376,0.0,0.0,0.004376,0.004376,0.0,0.002188,0.004376,0.006565,0.026258,0.002188,0.0,0.037199,0.002188,0.0,0.0,0.008753,0.002188,0.008753,0.0,0.002188,0.0,0.0,0.002188,0.002188,0.004376,0.0,0.0,0.002188,0.0,0.0,0.0,0.0,0.006565,0.026258,0.0,0.0,0.0,0.0,0.006565,0.008753,0.0,0.0,0.004376,0.002188,0.004376,0.0,0.0,0.0,0.002188,0.0,0.002188,0.004376,0.006565,0.0,0.002188,0.0,0.010941,0.004376,0.0,0.013129,0.002188,0.0,0.0,0.0,0.0,0.002188,0.0,0.0,0.0,0.002188,0.0,0.002188,0.0,0.006565,0.0,0.0,0.0,0.0,0.004376,0.002188,0.004376,0.010941,0.002188,0.0,0.0,0.0,0.006565,0.0,0.004376,0.0,0.0
3,"Greensboro-High Point, NC",0.001199,0.001199,0.001199,0.058753,0.0,0.0,0.004796,0.0,0.0,0.001199,0.0,0.001199,0.001199,0.0,0.020384,0.001199,0.0,0.021583,0.017986,0.094724,0.0,0.017986,0.001199,0.0,0.001199,0.0,0.0,0.001199,0.017986,0.001199,0.0,0.0,0.0,0.0,0.020384,0.0,0.001199,0.0,0.0,0.03717,0.0,0.0,0.020384,0.0,0.0,0.0,0.0,0.003597,0.0,0.0,0.021583,0.0,0.0,0.0,0.0,0.0,0.003597,0.0,0.0,0.0,0.017986,0.0,0.026379,0.0,0.0,0.0,0.0,0.0,0.001199,0.0,0.0,0.0,0.017986,0.008393,0.008393,0.0,0.003597,0.0,0.0,0.0,0.0,0.001199,0.004796,0.0,0.020384,0.004796,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002398,0.0,0.0,0.0,0.0,0.0,0.001199,0.0,0.007194,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001199,0.0,0.0,0.0,0.0,0.001199,0.0,0.002398,0.001199,0.0,0.005995,0.0,0.017986,0.0,0.0,0.0,0.0,0.0,0.003597,0.0,0.0,0.021583,0.0,0.0,0.0,0.0,0.001199,0.0,0.017986,0.017986,0.0,0.005995,0.001199,0.0,0.0,0.017986,0.001199,0.001199,0.002398,0.0,0.0,0.0,0.0,0.0,0.002398,0.002398,0.002398,0.001199,0.001199,0.0,0.0,0.001199,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001199,0.0,0.001199,0.0,0.0,0.001199,0.0,0.009592,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002398,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035971,0.0,0.0,0.0,0.0,0.001199,0.001199,0.0,0.001199,0.0,0.019185,0.0,0.0,0.0,0.0,0.0,0.002398,0.0,0.0,0.031175,0.0,0.001199,0.035971,0.0,0.019185,0.001199,0.0,0.019185,0.0,0.0,0.0,0.0,0.001199,0.001199,0.0,0.002398,0.0,0.001199,0.001199,0.0,0.0,0.043165,0.001199,0.0,0.0,0.0,0.003597,0.001199,0.002398,0.001199,0.002398,0.0,0.002398,0.001199,0.0,0.0,0.0,0.0,0.002398,0.002398,0.001199,0.0,0.0,0.001199,0.001199,0.0,0.0,0.001199,0.017986,0.0,0.0,0.0,0.0,0.0,0.019185,0.001199,0.0,0.017986,0.0,0.0,0.001199,0.005995,0.0,0.0,0.0,0.0,0.001199,0.0,0.0,0.001199,0.001199,0.0,0.0,0.001199,0.017986,0.0,0.019185,0.0,0.0
4,"Hickory-Lenoir-Morganton, NC",0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.021978,0.0,0.0,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.054945,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.032967,0.0,0.010989,0.0,0.0,0.032967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.021978,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054945,0.0,0.0,0.010989,0.0,0.0,0.010989,0.010989,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043956,0.0,0.0,0.054945,0.0,0.0,0.0,0.0,0.0,0.076923,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054945,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0
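
Because every venue row has exactly one indicator set, the mean of a category column within an MSA equals that category's share of the MSA's venues, and each row of the grouped table sums to 1. For instance, a value such as 0.004566 is approximately 1/219, which is what a single venue out of 219 would produce. The toy one-hot frame below (made-up labels and values) shows the same arithmetic on a small scale.

```python
import pandas as pd

# Toy one-hot frame (hypothetical): MSA "A" has 4 venues, MSA "B" has 2.
onehot = pd.DataFrame({
    "MSA_NAME": ["A", "A", "A", "A", "B", "B"],
    "Hotel": [1, 0, 0, 0, 0, 1],
    "Pizza Place": [0, 1, 1, 1, 1, 0],
})

# Mean of each indicator column per MSA = share of that MSA's venues.
grouped = onehot.groupby("MSA_NAME").mean().reset_index()
print(grouped)
```

Using shares rather than raw counts keeps MSAs with many sampled venues from dominating the comparison purely through volume.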


#### Define a function to sort the venues in descending order

In [20]:
# define a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
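
A minimal sketch of how this function behaves on a single row may help. The toy row below is hypothetical (the MSA name and frequencies are made up for illustration); it only assumes that the first entry holds the MSA name and the remaining entries hold venue frequencies, as in `MSA_grouped`.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the MSA name), then rank the venue frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row: one name column followed by venue-category frequencies
toy_row = pd.Series({
    'MSA_NAME': 'Example, NC',
    'Bar': 0.10,
    'Hotel': 0.40,
    'Park': 0.05,
    'Pizza Place': 0.25,
})

top3 = list(return_most_common_venues(toy_row, 3))
print(top3)
```

The function returns category names rather than frequencies, so the result can be written straight into the "Most Common Venue" columns.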

#### Create a new dataframe and display the top 10 venues for each MSA

In [21]:
# create the new dataframe and display the top 10 venues for each MSA
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['MSA_NAME']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
MSA_venues_sorted = pd.DataFrame(columns=columns)
MSA_venues_sorted['MSA_NAME'] = MSA_grouped['MSA_NAME']

for ind in np.arange(MSA_grouped.shape[0]):
    MSA_venues_sorted.iloc[ind, 1:] = return_most_common_venues(MSA_grouped.iloc[ind, :], num_top_venues)

MSA_venues_sorted.head()

Unnamed: 0,MSA_NAME,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Asheville, NC",Hotel,Fast Food Restaurant,Home Service,Construction & Landscaping,Discount Store,Pizza Place,Gas Station,Ice Cream Shop,Liquor Store,Music Venue
1,"Charlotte-Concord-Gastonia, NC-SC",American Restaurant,Pizza Place,Italian Restaurant,Sandwich Place,Steakhouse,Hotel,Restaurant,Coffee Shop,Park,Mexican Restaurant
2,"Durham-Chapel Hill, NC",Fast Food Restaurant,Hotel,Pizza Place,Mexican Restaurant,Pharmacy,American Restaurant,Sandwich Place,Bar,Home Service,Cocktail Bar
3,"Greensboro-High Point, NC",Bar,American Restaurant,Sandwich Place,Brewery,Plaza,Nightclub,Pizza Place,Coffee Shop,Gym,Bakery
4,"Hickory-Lenoir-Morganton, NC",American Restaurant,Post Office,Pizza Place,Discount Store,Home Service,Video Store,Pharmacy,Fast Food Restaurant,Food,Diner
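
The column labels above rely on a three-item suffix list with a fallback to "th", which is fine for 10 venues but would produce labels such as "21th" if more than 20 were requested. As an alternative sketch (the `ordinal` helper is hypothetical and not part of the notebook), the suffix can be derived from the number itself:

```python
def ordinal(n):
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['MSA_NAME'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this yields the same column names as the loop in the notebook.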


#### Run k-means to group the MSAs into 7 clusters

In [67]:
# run this cell only when re-running k-means with a different number of clusters
# (it raises an error on a first run, before these objects exist)
del MSA_grouped_clustering
del HPIfinal
MSA_venues_sorted.drop('Cluster Labels', axis = 1, inplace = True)

In [68]:
# set number of clusters
kclusters = 7

MSA_grouped_clustering = MSA_grouped.drop(columns='MSA_NAME')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(MSA_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:8]

array([1, 5, 1, 2, 0, 3, 4, 6], dtype=int32)
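
With 8 MSAs and 7 clusters, only one cluster can hold more than one MSA if every cluster is non-empty. The sketch below groups the labels printed above by cluster to make that visible. The MSA names are copied from the notebook's output tables in index order; pairing them with the labels this way is an assumption that agrees with the `Cluster Labels` column shown in those tables.

```python
from collections import defaultdict

# MSA names in index order, as they appear in the notebook's output tables
msa_names = [
    'Asheville, NC',
    'Charlotte-Concord-Gastonia, NC-SC',
    'Durham-Chapel Hill, NC',
    'Greensboro-High Point, NC',
    'Hickory-Lenoir-Morganton, NC',
    'Raleigh-Cary, NC',
    'Wilmington, NC',
    'Winston-Salem, NC',
]
# cluster labels copied from the kmeans.labels_ output above
labels = [1, 5, 1, 2, 0, 3, 4, 6]

# collect the MSAs that fall into each cluster
members = defaultdict(list)
for name, label in zip(msa_names, labels):
    members[label].append(name)

for label in sorted(members):
    print(label, members[label])

# clusters that contain more than one MSA
shared = {label: names for label, names in members.items() if len(names) > 1}
print(shared)
```

Only cluster 1 has two members, Asheville and Durham-Chapel Hill; every other MSA sits in a cluster of its own.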

#### Create a new dataframe that includes the cluster as well as the top 10 venues for each MSA

In [69]:
# add clustering labels
MSA_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

HPIfinal = HPInew

# join the house price index data with the cluster labels and top venues for each MSA
HPIfinal = HPIfinal.join(MSA_venues_sorted.set_index('MSA_NAME'), on='MSA_NAME')

# drop rows that contain NaN values and convert Cluster Labels back to integers
HPIfinal.dropna(axis = 0, inplace = True)
HPIfinal = HPIfinal.astype({'Cluster Labels': 'int'})

HPIfinal.head()

Unnamed: 0,MSA_NAME,MSA_ID,PRICE_INDEX,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Asheville, NC",11700,332.44,1,Hotel,Fast Food Restaurant,Home Service,Construction & Landscaping,Discount Store,Pizza Place,Gas Station,Ice Cream Shop,Liquor Store,Music Venue
1,"Charlotte-Concord-Gastonia, NC-SC",16740,257.81,5,American Restaurant,Pizza Place,Italian Restaurant,Sandwich Place,Steakhouse,Hotel,Restaurant,Coffee Shop,Park,Mexican Restaurant
2,"Durham-Chapel Hill, NC",20500,246.64,1,Fast Food Restaurant,Hotel,Pizza Place,Mexican Restaurant,Pharmacy,American Restaurant,Sandwich Place,Bar,Home Service,Cocktail Bar
3,"Greensboro-High Point, NC",24660,190.91,2,Bar,American Restaurant,Sandwich Place,Brewery,Plaza,Nightclub,Pizza Place,Coffee Shop,Gym,Bakery
4,"Hickory-Lenoir-Morganton, NC",25860,217.67,0,American Restaurant,Post Office,Pizza Place,Discount Store,Home Service,Video Store,Pharmacy,Fast Food Restaurant,Food,Diner


#### Visualize the resulting clusters

In [70]:
# insert an approximate central latitude and longitude (as floats) for each MSA in order to map it
HPIfinal['MSA_lat'] = [35.59009, 35.22290, 35.94815, 36.00618, 35.75282, 35.81881, 34.22192, 36.11802]
HPIfinal['MSA_long'] = [-82.55824, -80.84520, -78.95553, -79.87207, -81.53405, -78.71404, -77.87040, -80.20037]
HPIfinal.head()

Unnamed: 0,MSA_NAME,MSA_ID,PRICE_INDEX,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MSA_lat,MSA_long
0,"Asheville, NC",11700,332.44,1,Hotel,Fast Food Restaurant,Home Service,Construction & Landscaping,Discount Store,Pizza Place,Gas Station,Ice Cream Shop,Liquor Store,Music Venue,35.59009,-82.55824
1,"Charlotte-Concord-Gastonia, NC-SC",16740,257.81,5,American Restaurant,Pizza Place,Italian Restaurant,Sandwich Place,Steakhouse,Hotel,Restaurant,Coffee Shop,Park,Mexican Restaurant,35.2229,-80.8452
2,"Durham-Chapel Hill, NC",20500,246.64,1,Fast Food Restaurant,Hotel,Pizza Place,Mexican Restaurant,Pharmacy,American Restaurant,Sandwich Place,Bar,Home Service,Cocktail Bar,35.94815,-78.95553
3,"Greensboro-High Point, NC",24660,190.91,2,Bar,American Restaurant,Sandwich Place,Brewery,Plaza,Nightclub,Pizza Place,Coffee Shop,Gym,Bakery,36.00618,-79.87207
4,"Hickory-Lenoir-Morganton, NC",25860,217.67,0,American Restaurant,Post Office,Pizza Place,Discount Store,Home Service,Video Store,Pharmacy,Fast Food Restaurant,Food,Diner,35.75282,-81.53405


In [71]:
HPIfinal.sort_values('PRICE_INDEX', ascending=False)

Unnamed: 0,MSA_NAME,MSA_ID,PRICE_INDEX,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MSA_lat,MSA_long
0,"Asheville, NC",11700,332.44,1,Hotel,Fast Food Restaurant,Home Service,Construction & Landscaping,Discount Store,Pizza Place,Gas Station,Ice Cream Shop,Liquor Store,Music Venue,35.59009,-82.55824
6,"Wilmington, NC",48900,266.57,4,Coffee Shop,Seafood Restaurant,Fast Food Restaurant,Café,Beach,Brewery,Park,Hotel,Bar,Southern / Soul Food Restaurant,34.22192,-77.8704
1,"Charlotte-Concord-Gastonia, NC-SC",16740,257.81,5,American Restaurant,Pizza Place,Italian Restaurant,Sandwich Place,Steakhouse,Hotel,Restaurant,Coffee Shop,Park,Mexican Restaurant,35.2229,-80.8452
2,"Durham-Chapel Hill, NC",20500,246.64,1,Fast Food Restaurant,Hotel,Pizza Place,Mexican Restaurant,Pharmacy,American Restaurant,Sandwich Place,Bar,Home Service,Cocktail Bar,35.94815,-78.95553
5,"Raleigh-Cary, NC",39580,236.3,3,Coffee Shop,American Restaurant,Bakery,Gym / Fitness Center,ATM,Gas Station,Hot Dog Joint,Fast Food Restaurant,Pizza Place,Home Service,35.81881,-78.71404
4,"Hickory-Lenoir-Morganton, NC",25860,217.67,0,American Restaurant,Post Office,Pizza Place,Discount Store,Home Service,Video Store,Pharmacy,Fast Food Restaurant,Food,Diner,35.75282,-81.53405
7,"Winston-Salem, NC",49180,196.06,6,American Restaurant,Pizza Place,Bar,Cocktail Bar,Hotel,Sandwich Place,Park,Brewery,Food Truck,Taco Place,36.11802,-80.20037
3,"Greensboro-High Point, NC",24660,190.91,2,Bar,American Restaurant,Sandwich Place,Brewery,Plaza,Nightclub,Pizza Place,Coffee Shop,Gym,Bakery,36.00618,-79.87207
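
The sorted table can be narrowed to the MSAs the project is looking for: those in the same cluster as Asheville but with a lower house price index. The following is a sketch only; it rebuilds a small frame from the values shown above rather than using `HPIfinal` directly, and the variable names `hpi` and `alternatives` are introduced here for illustration.

```python
import pandas as pd

# values copied from the sorted output above
hpi = pd.DataFrame({
    'MSA_NAME': [
        'Asheville, NC', 'Wilmington, NC', 'Charlotte-Concord-Gastonia, NC-SC',
        'Durham-Chapel Hill, NC', 'Raleigh-Cary, NC', 'Hickory-Lenoir-Morganton, NC',
        'Winston-Salem, NC', 'Greensboro-High Point, NC',
    ],
    'PRICE_INDEX': [332.44, 266.57, 257.81, 246.64, 236.30, 217.67, 196.06, 190.91],
    'Cluster Labels': [1, 4, 5, 1, 3, 0, 6, 2],
})

# reference row for Asheville
asheville = hpi.loc[hpi['MSA_NAME'] == 'Asheville, NC'].iloc[0]

# keep MSAs in Asheville's cluster whose index is strictly lower
same_cluster = hpi['Cluster Labels'] == asheville['Cluster Labels']
cheaper = hpi['PRICE_INDEX'] < asheville['PRICE_INDEX']
alternatives = hpi[same_cluster & cheaper].sort_values('PRICE_INDEX')

print(alternatives[['MSA_NAME', 'PRICE_INDEX']])
```

With the cluster assignment shown here, the only match is Durham-Chapel Hill, NC, with an index of 246.64 against Asheville's 332.44.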


In [26]:
#get lat and long for North Carolina
address = 'North Carolina, US'

geolocator = Nominatim(user_agent="nc_explorer")
location = geolocator.geocode(address)
latitude_nc = location.latitude
longitude_nc = location.longitude

In [72]:
# create map
map_clusters = folium.Map(location=[latitude_nc, longitude_nc], zoom_start=7)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(HPIfinal['MSA_lat'], HPIfinal['MSA_long'], HPIfinal['MSA_NAME'], HPIfinal['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lon)],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters