# IBM Data Science Capstone Project

## Analysis of Similarities Between Manhattan/Toronto Neighborhoods and the "Top 19 Hippest Mid-Size US Cities."
#### Justin Fitch

### With many jobs moving toward remote work during the pandemic, many people have been reconsidering life in major cities and considering relocating to a smaller, less crowded, and less expensive city.

### In previous lab exercises, we clustered neighborhoods in Manhattan and Toronto to see which neighborhoods provide a similar mix of venues/businesses. Trip.com recently published a list of the "Top 19 Hippest Mid-Size US Cities". Using FourSquare data for these 19 cities, as well as the neighborhood data already obtained for Manhattan and Toronto, I have performed a k-means cluster analysis to determine which, if any, mid-size US cities would provide a comparable mix of food/entertainment venues.

### Analyzing individual boroughs makes sense in NYC where, even with a robust public transportation system, it takes a long time to get to other parts of town. Mid-sized cities, on the other hand, typically have less traffic and are not as spread out geographically. You can often get from one side of town to the other in a half-hour or less. Additionally, the relative abundance of free parking makes driving to the venue of your choice much more practical than in most big cities.

### For this reason, I ran city-wide FourSquare venue queries on the mid-sized cities. This allowed me to identify cities that offer a mix of culture and venues similar to that of the various boroughs we have already analyzed in NYC and Toronto.

In [1]:
#pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
#pd.set_option('display.width', None)
#pd.set_option('display.max_colwidth', None)

### First, I will web-scrape a list of the "Top 19 Hippest Mid-Sized US Cities", as this seems like a good place to get some cities that might compare favorably with the diverse and exciting boroughs of NYC and Toronto ;-)

In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import numpy as np
import json

### Download the "Top 19" cities page and save the contents as html_data

In [3]:
url = 'https://www.trip.com/blog/top-hippest-mid-size-us-cities/'
res = requests.get(url)
html_data = res.content

### Create a BeautifulSoup object ("soup") and extract the city headings into a list called "titles"

In [4]:
soup = BeautifulSoup(html_data, 'html.parser')
titles = soup.find_all(class_="main-title")  # named "titles" to avoid shadowing the built-in list

At this point, the list still includes the HTML formatting. Use a slice and strip to keep only the city and state, which we can use to find geospatial coordinates.

In [5]:
cities = []

for row in titles:
    # skip empty headings; otherwise drop the three-character prefix and surrounding whitespace
    if row.text != "":
        city = row.text[3:].strip()
        cities.append(city)
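
The fixed three-character slice above depends on the exact shape of each heading. As a rough alternative, and assuming each heading looks like `<rank>. <City>, <State>` (an assumption about the page, not something verified here), a regular expression can strip a leading rank of any width. The helper name `clean_city_title` and the sample strings are hypothetical.

```python
import re

def clean_city_title(raw_title):
    """Strip a leading rank such as '7.' or '12.' plus surrounding whitespace."""
    # Assumption: headings look like '<rank>. <City>, <State>'.
    return re.sub(r"^\s*\d+\s*[.)]?\s*", "", raw_title).strip()

# Hand-written sample headings, including an empty one that should be skipped.
sample_titles = ["1. Santa Rosa, California", "12. Grand Rapids, Michigan", ""]
cleaned = [clean_city_title(t) for t in sample_titles if t.strip()]
print(cleaned)
```

Because the pattern is anchored at the start of the string, digits elsewhere in a heading are left untouched.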

In [6]:
cities

['Santa Rosa, California',
 'Rochester, New York',
 'Seattle, Washington',
 'San Antonio, Texas',
 'Huntsville, Alabama',
 'Albuquerque, New Mexico',
 'Knoxville, Tennessee',
 'Reno, Nevada',
 'Tampa, Florida',
 'Boulder, Colorado',
 'Tucson, Arizona',
 'Grand Rapids, Michigan',
 'Portland, Oregon',
 'Orlando, Florida',
 'Atlanta, Georgia',
 'Richmond, Virginia',
 'Boise, Idaho',
 'Salt Lake City, Utah',
 'Vancouver, Washington']

### Use the geopy Nominatim geocoder to obtain geographic coordinates for the 19 hippest mid-sized US cities

In [7]:
#pip install geopy #Install geopy, if necessary

In [8]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="example app")

In [9]:
lat_lng_coords = []

In [10]:
for city in cities:
    lat_long = geolocator.geocode(city).point
    entry = {'City' : city, 'Latitude' : lat_long[0], 'Longitude' : lat_long[1]}
    lat_lng_coords.append(entry)
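
The loop above assumes every lookup succeeds. geopy's `geocode` returns `None` when nothing matches, so a slightly more defensive version might collect the failures separately and pause between requests. This is only a sketch: `geocode_cities`, `pause_seconds`, and the stand-in lookup used for the demonstration are hypothetical names, and the demo deliberately avoids any network call.

```python
import time
from collections import namedtuple

def geocode_cities(city_names, geocode, pause_seconds=1.0):
    """Return (coords, missing): coordinate dicts, plus names that were not found."""
    coords, missing = [], []
    for name in city_names:
        location = geocode(name)  # a geopy geocoder returns None when nothing matches
        if location is None:
            missing.append(name)
        else:
            coords.append({'City': name,
                           'Latitude': location.latitude,
                           'Longitude': location.longitude})
        time.sleep(pause_seconds)  # be polite to a shared public geocoding service
    return coords, missing

# Offline stand-in for geolocator.geocode, so the sketch runs without network access.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
fake_lookup = {"Reno, Nevada": FakeLocation(39.5261206, -119.8126581)}
coords, missing = geocode_cities(["Reno, Nevada", "Nowhere, ZZ"], fake_lookup.get, pause_seconds=0)
print(coords, missing)
```

In the notebook itself, the call would presumably be `geocode_cities(cities, geolocator.geocode)`.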

In [11]:
lat_lng_coords

[{'City': 'Santa Rosa, California',
  'Latitude': 38.4404925,
  'Longitude': -122.7141049},
 {'City': 'Rochester, New York',
  'Latitude': 43.157285,
  'Longitude': -77.615214},
 {'City': 'Seattle, Washington',
  'Latitude': 47.6038321,
  'Longitude': -122.3300624},
 {'City': 'San Antonio, Texas',
  'Latitude': 29.4246002,
  'Longitude': -98.4951405},
 {'City': 'Huntsville, Alabama',
  'Latitude': 34.729847,
  'Longitude': -86.5859011},
 {'City': 'Albuquerque, New Mexico',
  'Latitude': 35.212870949999996,
  'Longitude': -106.71324849574629},
 {'City': 'Knoxville, Tennessee',
  'Latitude': 35.9603948,
  'Longitude': -83.9210261},
 {'City': 'Reno, Nevada', 'Latitude': 39.5261206, 'Longitude': -119.8126581},
 {'City': 'Tampa, Florida', 'Latitude': 27.9477595, 'Longitude': -82.458444},
 {'City': 'Boulder, Colorado',
  'Latitude': 40.0149856,
  'Longitude': -105.270545},
 {'City': 'Tucson, Arizona',
  'Latitude': 32.2228765,
  'Longitude': -110.9748477},
 {'City': 'Grand Rapids, Michigan',

In [12]:
hip_towns = pd.DataFrame(lat_lng_coords)
hip_towns

Unnamed: 0,City,Latitude,Longitude
0,"Santa Rosa, California",38.440492,-122.714105
1,"Rochester, New York",43.157285,-77.615214
2,"Seattle, Washington",47.603832,-122.330062
3,"San Antonio, Texas",29.4246,-98.495141
4,"Huntsville, Alabama",34.729847,-86.585901
5,"Albuquerque, New Mexico",35.212871,-106.713248
6,"Knoxville, Tennessee",35.960395,-83.921026
7,"Reno, Nevada",39.526121,-119.812658
8,"Tampa, Florida",27.94776,-82.458444
9,"Boulder, Colorado",40.014986,-105.270545


### Input FourSquare Credentials:

In [13]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 1000 # requested maximum; the results below contain 100 venues per city (1900 rows for 19 cities)

In [14]:
def getVenues(names):
    """Query the Foursquare explore endpoint once per city and return one row per venue."""
    venues_list = []
    for name in names:
        print(name)

        # build the API request; passing params lets requests URL-encode the city name
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'near': name,
            'limit': LIMIT}

        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each venue
        venues_list.append([(
            name,
            v['venue']['name'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City',
                  'Venue',
                  'Venue Category']

    return nearby_venues
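
The parsing step inside `getVenues` indexes straight into the response, so a venue with no category or an empty response would raise an error. A more tolerant version of that step might look like the sketch below. The function name `extract_venues`, the "Uncategorized" label, and the hand-written payload are all hypothetical; the payload mimics only the fields the notebook reads.

```python
def extract_venues(city, payload):
    """Flatten one explore-style response into (city, venue name, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        # Assumption: label venues without a category instead of raising IndexError.
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((city, venue.get("name"), category))
    return rows

# Hand-written sample payload (not real API output).
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Ike's Place", "categories": [{"name": "Sandwich Place"}]}},
    {"venue": {"name": "Unnamed Spot", "categories": []}},
]}]}}
rows = extract_venues("Santa Rosa, California", sample_payload)
print(rows)
```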

In [15]:
city_venues = getVenues(names=hip_towns['City'])

Santa Rosa, California
Rochester, New York
Seattle, Washington
San Antonio, Texas
Huntsville, Alabama
Albuquerque, New Mexico
Knoxville, Tennessee
Reno, Nevada
Tampa, Florida
Boulder, Colorado
Tucson, Arizona
Grand Rapids, Michigan
Portland, Oregon
Orlando, Florida
Atlanta, Georgia
Richmond, Virginia
Boise, Idaho
Salt Lake City, Utah
Vancouver, Washington


In [16]:
city_venues.head(10)

Unnamed: 0,City,Venue,Venue Category
0,"Santa Rosa, California",Ike's Place,Sandwich Place
1,"Santa Rosa, California",Oliver's Market,Supermarket
2,"Santa Rosa, California",SEA Thai Bistro,Thai Restaurant
3,"Santa Rosa, California",St. Francis Winery & Vineyards,Winery
4,"Santa Rosa, California",Superburger,Burger Joint
5,"Santa Rosa, California",Charles M. Schulz Museum & Research Center,Museum
6,"Santa Rosa, California",Jeffrey's Hillside Cafe,Café
7,"Santa Rosa, California",Benovia Winery,Winery
8,"Santa Rosa, California",Bird & The Bottle,American Restaurant
9,"Santa Rosa, California",Dierk's Parkside Café,Café


In [17]:
city_venues.shape

(1900, 3)

### Analyze each city

In [18]:
# one hot encoding
hip_cities_onehot = pd.get_dummies(city_venues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
hip_cities_onehot['City'] = city_venues['City'] 

hip_cities_onehot

Unnamed: 0,Adult Boutique,African Restaurant,American Restaurant,Amphitheater,Andhra Restaurant,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit,City
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Santa Rosa, California"
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Santa Rosa, California"
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Santa Rosa, California"
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,"Santa Rosa, California"
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Santa Rosa, California"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1895,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Vancouver, Washington"
1896,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Vancouver, Washington"
1897,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Vancouver, Washington"
1898,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Vancouver, Washington"


In [19]:
first_column = hip_cities_onehot.pop('City')
hip_cities_onehot.insert(0, 'City', first_column)

hip_cities_onehot

Unnamed: 0,City,Adult Boutique,African Restaurant,American Restaurant,Amphitheater,Andhra Restaurant,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,"Santa Rosa, California",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Santa Rosa, California",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Santa Rosa, California",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Santa Rosa, California",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,"Santa Rosa, California",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1895,"Vancouver, Washington",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1896,"Vancouver, Washington",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1897,"Vancouver, Washington",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1898,"Vancouver, Washington",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [20]:
hip_cities_grouped = hip_cities_onehot.groupby('City').mean().reset_index()
hip_cities_grouped

Unnamed: 0,City,Adult Boutique,African Restaurant,American Restaurant,Amphitheater,Andhra Restaurant,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,"Albuquerque, New Mexico",0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0
1,"Atlanta, Georgia",0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0
2,"Boise, Idaho",0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Boulder, Colorado",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
4,"Grand Rapids, Michigan",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
5,"Huntsville, Alabama",0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,...,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
6,"Knoxville, Tennessee",0.0,0.0,0.09,0.0,0.0,0.01,0.0,0.0,0.0,...,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Orlando, Florida",0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0
8,"Portland, Oregon",0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
9,"Reno, Nevada",0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0
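
To make the grouping step concrete: the mean of a 0/1 indicator column within a city is simply the fraction of that city's venues in that category. The toy data below is invented for illustration (two made-up cities, three categories) and is not taken from the FourSquare results.

```python
import pandas as pd

# Invented toy data: four venues in city "A", two in city "B".
toy_venues = pd.DataFrame({
    "City": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Winery", "Museum", "Café", "Museum"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "City", toy_venues["City"])

# Averaging the indicators per city gives each category's share of that city's venues.
profile = onehot.groupby("City").mean()
print(profile)
```

Each row of `profile` sums to 1, which is why the values in the table above are multiples of 0.01 when every city contributes 100 venues.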


In [21]:
hip_cities_grouped.to_json("hip_cities.json")

## Import grouped neighborhood data for Manhattan and Toronto Neighborhoods (exported from previous labs)

In [22]:
manhattan_grouped = pd.read_json('manhattan.json')
manhattan_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,...,Vietnamese Restaurant,Volleyball Court,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.027778,...,0.013889,0.0,0.0,0.027778,0.0,0.013889,0.055556,0.0,0.013889,0.027778
2,Central Harlem,0.0,0.0,0.0,0.076923,0.051282,0.0,0.0,0.025641,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.09,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0
4,Chinatown,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,...,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0


In [23]:
toronto_grouped = pd.read_json('toronto.json')
toronto_grouped.shape

(100, 263)

### Merge Toronto and NY Datasets into one dataframe

In [24]:
TOR_NY = pd.concat([manhattan_grouped, toronto_grouped], axis=0, ignore_index=True)
TOR_NY_data = TOR_NY.rename(columns={'Neighborhood': 'Neighborhood/City'})
TOR_NY_data

Unnamed: 0,Neighborhood/City,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,...,Shopping Plaza,Skating Rink,Smoothie Shop,Stadium,Stationery Store,Swim School,Tanning Salon,Theme Restaurant,Truck Stop,Warehouse Store
0,Battery Park City,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,...,,,,,,,,,,
1,Carnegie Hill,0.0,0.0,0.0,0.000000,0.013889,0.0,0.0,0.000000,0.027778,...,,,,,,,,,,
2,Central Harlem,0.0,0.0,0.0,0.076923,0.051282,0.0,0.0,0.025641,0.000000,...,,,,,,,,,,
3,Chelsea,0.0,0.0,0.0,0.000000,0.030000,0.0,0.0,0.090000,0.000000,...,,,,,,,,,,
4,Chinatown,0.0,0.0,0.0,0.000000,0.030000,0.0,0.0,0.010000,0.000000,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135,Willowdale West,0.0,0.0,0.0,,0.000000,0.0,,0.000000,,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
136,"Willowdale, Newtonbrook",0.0,0.0,0.0,,0.000000,0.0,,0.000000,,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
137,Woburn,0.0,0.0,0.0,,0.000000,0.0,,0.000000,,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
138,Woodbine Heights,0.0,0.0,0.0,,0.000000,0.0,,0.000000,,...,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Load the Hip Cities dataframe from the JSON file created above. Combine the Hip Cities data with the Toronto/Manhattan dataframe, and replace all NaN values with 0.0. A NaN here means the category never appeared in that dataset, so its frequency is 0.

In [25]:
hip_cities_load = pd.read_json('hip_cities.json')

hip_cities_data = hip_cities_load.rename(columns={'City': 'Neighborhood/City'})

all_data = pd.concat([TOR_NY_data, hip_cities_data], axis=0, ignore_index=True)

final_df = all_data.fillna(0)

final_df

Unnamed: 0,Neighborhood/City,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,...,Ski Area,Soccer Stadium,State / Provincial Park,Summer Camp,Tex-Mex Restaurant,Theme Park,Used Auto Dealership,Winery,Zoo,Zoo Exhibit
0,Battery Park City,0.0,0.00,0.0,0.000000,0.000000,0.0,0.00,0.000000,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
1,Carnegie Hill,0.0,0.00,0.0,0.000000,0.013889,0.0,0.00,0.000000,0.027778,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
2,Central Harlem,0.0,0.00,0.0,0.076923,0.051282,0.0,0.00,0.025641,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
3,Chelsea,0.0,0.00,0.0,0.000000,0.030000,0.0,0.00,0.090000,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
4,Chinatown,0.0,0.00,0.0,0.000000,0.030000,0.0,0.00,0.010000,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,"Santa Rosa, California",0.0,0.00,0.0,0.000000,0.030000,0.0,0.00,0.000000,0.000000,...,0.0,0.0,0.01,0.01,0.0,0.00,0.0,0.04,0.01,0.00
155,"Seattle, Washington",0.0,0.00,0.0,0.000000,0.020000,0.0,0.00,0.000000,0.010000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
156,"Tampa, Florida",0.0,0.01,0.0,0.000000,0.030000,0.0,0.01,0.000000,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.01,0.0,0.00,0.01,0.02
157,"Tucson, Arizona",0.0,0.00,0.0,0.000000,0.030000,0.0,0.00,0.000000,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
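
The effect of concatenating frames with different category columns can be seen on a two-row toy example. `pd.concat` keeps the union of the columns and fills the gaps with NaN, which `fillna(0)` then turns into a frequency of zero. The values below are toy numbers chosen for illustration.

```python
import pandas as pd

# Two toy frames that share only some category columns.
neighborhoods = pd.DataFrame({"Neighborhood/City": ["Chelsea"],
                              "Art Gallery": [0.09], "Wine Shop": [0.03]})
cities = pd.DataFrame({"Neighborhood/City": ["Santa Rosa, California"],
                       "Winery": [0.04], "Wine Shop": [0.0]})

# Row-wise concat keeps the union of columns; missing cells become NaN.
combined = pd.concat([neighborhoods, cities], axis=0, ignore_index=True)

# A category absent from a dataset has frequency zero there.
filled = combined.fillna(0)
print(filled)
```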


### Perform a k-means cluster analysis

In [26]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [27]:
# set number of clusters
kclusters = 10

clusters = final_df.drop(columns='Neighborhood/City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(clusters)

# check the cluster label generated for each row in the dataframe
kmeans.labels_

array([8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 8, 8, 8, 8, 8, 8, 8, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 3, 3, 3,
       3, 8, 8, 3, 8, 1, 8, 3, 8, 8, 8, 3, 7, 3, 3, 8, 3, 8, 8, 8, 8, 2,
       8, 2, 8, 3, 8, 8, 3, 8, 3, 2, 3, 2, 2, 8, 3, 8, 8, 0, 8, 8, 8, 3,
       8, 2, 8, 2, 3, 8, 6, 2, 8, 0, 3, 2, 3, 3, 8, 3, 3, 8, 2, 3, 3, 1,
       8, 8, 3, 8, 9, 3, 8, 3, 3, 8, 3, 3, 8, 1, 8, 2, 8, 3, 3, 3, 4, 3,
       1, 8, 3, 3, 1, 5, 8, 1, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 8, 8, 8, 3], dtype=int32)
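
The label array is heavily dominated by a single cluster, which raises the question of whether 10 is a sensible number of clusters. One common check, which is not part of the original analysis, is to compare silhouette scores across several values of k. The sketch below runs on synthetic two-dimensional blobs rather than the venue data; `silhouette_by_k` and the toy features are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores

# Synthetic data: three well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
toy_features = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = silhouette_by_k(toy_features, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Applied to the notebook, `features` would presumably be the `clusters` dataframe. A higher score suggests tighter, better-separated clusters, though it is only one heuristic among several.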

In [28]:
# add clustering labels
final_df.insert(0, 'Cluster Labels', kmeans.labels_)
# the cluster label now appears as the first column of final_df

final_df

Unnamed: 0,Cluster Labels,Neighborhood/City,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,...,Ski Area,Soccer Stadium,State / Provincial Park,Summer Camp,Tex-Mex Restaurant,Theme Park,Used Auto Dealership,Winery,Zoo,Zoo Exhibit
0,8,Battery Park City,0.0,0.00,0.0,0.000000,0.000000,0.0,0.00,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
1,8,Carnegie Hill,0.0,0.00,0.0,0.000000,0.013889,0.0,0.00,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
2,8,Central Harlem,0.0,0.00,0.0,0.076923,0.051282,0.0,0.00,0.025641,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
3,8,Chelsea,0.0,0.00,0.0,0.000000,0.030000,0.0,0.00,0.090000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
4,8,Chinatown,0.0,0.00,0.0,0.000000,0.030000,0.0,0.00,0.010000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,8,"Santa Rosa, California",0.0,0.00,0.0,0.000000,0.030000,0.0,0.00,0.000000,...,0.0,0.0,0.01,0.01,0.0,0.00,0.0,0.04,0.01,0.00
155,8,"Seattle, Washington",0.0,0.00,0.0,0.000000,0.020000,0.0,0.00,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
156,8,"Tampa, Florida",0.0,0.01,0.0,0.000000,0.030000,0.0,0.01,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.01,0.0,0.00,0.01,0.02
157,8,"Tucson, Arizona",0.0,0.00,0.0,0.000000,0.030000,0.0,0.00,0.000000,...,0.0,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,0.00
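
Reading cluster membership off the raw label array is error-prone. A cross-tabulation of cluster label against the dataset each row came from makes the comparison explicit. The sketch below uses invented labels and sources; `cluster_counts_by_source` is a hypothetical helper, not something from the labs.

```python
import pandas as pd

def cluster_counts_by_source(labels, sources):
    """Cross-tabulate cluster labels against the dataset each row came from."""
    return pd.crosstab(pd.Series(sources, name="Source"),
                       pd.Series(labels, name="Cluster"))

# Invented example: nine rows, three from each source.
toy_labels = [8, 8, 2, 3, 3, 8, 8, 8, 3]
toy_sources = ["Manhattan"] * 3 + ["Toronto"] * 3 + ["Hip city"] * 3
table = cluster_counts_by_source(toy_labels, toy_sources)
print(table)
```

In the notebook, the sources list could be built from the row counts of `manhattan_grouped`, `toronto_grouped`, and `hip_cities_data`, in that order, and paired with `kmeans.labels_`.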


### The cluster analysis ran successfully. Unfortunately, based on the FourSquare data, all but one of the "Hip Mid-Size Cities" were placed in cluster 8, along with most of the neighborhoods of Manhattan (the exception is the last row of the label array, which falls in cluster 3). The variety, at least according to this analysis, seems mostly to be in Toronto.

### This tells us that while a move from parts of Toronto to a mid-sized American city may be a bit of a culture shock, New Yorkers wishing to find a similar variety of venues in a smaller city have a lot of options.