# Coursera Capstone - How Red and Blue Cities Compare in Terms of Interest

### 1. Introduction

### 1a. Background / Problem / Idea

#### Background

Recently, there's been an incredible amount of political discourse in the U.S.  On one side are what I'll call red states and cities, or conservatives.  These red cities tend to argue for less government intervention and are rooted in Christian-oriented values.  On the other side are blue states and cities, or liberals.  These blue cities tend to argue for more social safety nets and progressive policies.  So much discord suggests a fractured country.  My hypothesis, however, is that the country is more similar than different.  I intend to leverage data science and machine learning to validate or invalidate this hypothesis.

#### Problem / Idea
Data that may contribute to how a city leans includes the types of businesses most prominent in said area.

I intend to analyze the venues that show up most often in the top liberal and conservative cities.  For this data, I'll leverage the FourSquare API.  From there, we can group together similar cities (based on frequency of venues).  This may give us insight into how political leanings may or may not intersect with the types of businesses that show up in each respective area.

### 1b. Audience

The target audience is those interested in politics, specifically those interested in learning how political leanings may influence socioeconomic trends or vice versa.

This audience will be interested given the intensity of recent political discourse and the upcoming election.  In addition, they'll be interested to learn more about how political leanings manifest socio-economically.

### 2. Data Section

#### a. Data to be Utilized

1. I'll first need information on the largest liberal and conservative cities.  For that, I'll utilize the data in this article: https://www.vox.com/xpress/2014/8/9/5983959/the-most-liberal-and-conservative-big-cities-in-america-in-one-chart.  I did find some issues that skewed my analysis (more to come on this).  Given this, I added two additional conservative cities from this list: https://cafemom.com/news/188114-20_most_conservative_states_in/136268-1_mesa_arizona.  The data sampled from here are the ten most liberal and ten most conservative cities.
2. I'll need geolocation data for the center point of each city.  I'll utilize Nominatim to fetch this information.
3. To get venue information, I plan to pull from the FourSquare API.  I'll use the geolocation information in #2 to pull the data for each city.  Specifically, I'll be utilizing the venues call in the API.  The sample data I'll get will include the category of each business (for example, yoga studios, accessory stores, airports, train stations, restaurants) as well as the frequency/count of each category.
4. Once I've organized the venue information by frequency per city, I'll use K-Means to cluster similar cities.

#### b. Data Cleaning

Fortunately, there wasn't much data cleaning that I had to do.  The analysis accounts for a fairly limited number of features (more on this later).  However, there are a few issues that I came across.

1. The main article I used to determine liberal or conservative cities was set within an image.  Because it was an image, I couldn't scrape the data.  Therefore, I started by reconstructing the table.

2. Outlier cities caused by unique properties of the city.

I tried to keep the number of clusters small given I only have 20 cities.  With that, I started with 3 clusters.  Two of the clusters only had 1 city each.  On review, those single-city clusters were:

a. Virginia Beach, VA
b. Anaheim, CA

Virginia Beach was its own cluster because it has a heavy preponderance of beaches.  Anaheim had a heavy preponderance of theme parks / attractions.  The result was two clusters with a single city each and all other cities placed into the remaining cluster.  This would heavily skew my analysis.  Therefore, I decided to replace Virginia Beach and Anaheim with Tulsa and Corpus Christi, which came from https://cafemom.com/news/188114-20_most_conservative_states_in/136268-1_mesa_arizona.
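
A quick way to spot clusters like these is to count the members of each cluster and list the cities that sit alone.  The sketch below assumes a hypothetical `labels` series mapping each city to a cluster label; the names and values are made up for illustration and are not the notebook's actual K-Means output.

```python
import pandas as pd

# Hypothetical cluster labels, for illustration only.
labels = pd.Series(
    {"City A": 0, "City B": 0, "City C": 1, "City D": 0, "City E": 2},
    name="Cluster",
)

def singleton_clusters(cluster_labels):
    """Return the cities that are the only member of their cluster."""
    # Map every city to the size of the cluster it belongs to.
    sizes = cluster_labels.map(cluster_labels.value_counts())
    return sorted(sizes[sizes == 1].index.tolist())

lonely_cities = singleton_clusters(labels)
print(lonely_cities)
```

Cities returned by a check like this are candidates for a closer look before deciding whether to keep them in the sample.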

3. Amount of data

One restriction was the amount of data I could pull per location.  Unfortunately, it seems that the FourSquare API restricts the data pull for each location to 100 records.  For larger cities like New York City, such a small data set may not be representative of the larger metro.  In addition, spreading so few records across a large number of potential categories thins the data even further.

#### c. Feature Selection

1. The two articles mentioned earlier gave me 10 liberal and 10 conservative leaning cities.  This enabled me to have a feature of Is_Liberal to track the individual cities.
2. With Nominatim, I was able to obtain the lat / long feature for each city.  This enabled me to utilize the FourSquare API.
3. The FourSquare API enabled me to extract features such as frequency of categories and therefore most common venues.

### 3. Methodology

To start, I'll need to collect the data necessary to run my analysis.  This will include:

a. Constructing my initial data set so that I can accurately utilize the FourSquare API.

b. Pulling data from the FourSquare API.

c. Completing some statistical analysis to ensure that I pulled the correct amount of data and that it's organized properly.

From there, I plan to use K-Means clustering to group together similar cities based on the categories of businesses that appear most often, and then run analysis to determine why K-Means may have clustered the cities the way it did.
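
To make the clustering step concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) written in plain NumPy on a made-up frequency table.  It is a stand-in for illustration rather than the notebook's clustering code; in practice a library implementation would normally be used.  The city names, categories, and frequencies are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical venue-category frequencies per city (each row sums to 1).
freq = pd.DataFrame(
    {
        "Coffee Shop": [0.6, 0.5, 0.1, 0.2],
        "Steakhouse": [0.1, 0.2, 0.7, 0.6],
        "Park": [0.3, 0.3, 0.2, 0.2],
    },
    index=["City A", "City B", "City C", "City D"],
)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then re-average."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels

cluster = pd.Series(kmeans(freq.to_numpy(), k=2), index=freq.index)
print(cluster)
```

On this toy table the two coffee-heavy cities end up together and the two steakhouse-heavy cities end up together, which is the kind of grouping the real analysis looks for across all venue categories.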

### 3a. Constructing my initial data set so that I can accurately utilize the FourSquare API.

#### Import Libraries that I'm going to need.

In [1]:
#Importing statistical and dataframe library

import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

print("Libraries downloaded")

Libraries downloaded


#### Building an Initial Dataframe based on Liberal & Conservative Cities

In [85]:
#laying out cities into a list
cities = ['San Francisco', 'Washington', 'Seattle', 'Oakland', 'Boston', 'Minneapolis', 'Detroit', 'New York', 'Buffalo', 'Baltimore', 
          'Mesa', 'Oklahoma City', 'Colorado Springs', 'Jacksonville', 'Arlington', 'Omaha', 'Anchorage', 'Fort Worth', 'Tulsa', 'Corpus Christi']

In [86]:
#laying out the states
states = ['California', 'DC', 'Washington', 'California', 'Massachusetts', 'Minnesota', 'Michigan', 'New York', 'New York','Maryland', 
          'Arizona', 'Oklahoma', 'Colorado', 'Florida', 'Texas', 'Nebraska', 'Alaska', 'Texas', 'Oklahoma', 'Texas']

In [87]:
#laying out whether it's a liberal city
is_liberal = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [88]:
#constructing dataframe based on cities, states, is_liberal

#first setting up the data dictionary
city_data = {'Cities': cities, 'State':states, 'Is_Liberal':is_liberal}

#setting up the dataframe
initial_dataframe = pd.DataFrame(city_data, columns = ['Cities', 'State', 'Is_Liberal'])

#verifying data frame
initial_dataframe.head(20)

Unnamed: 0,Cities,State,Is_Liberal
0,San Francisco,California,1
1,Washington,DC,1
2,Seattle,Washington,1
3,Oakland,California,1
4,Boston,Massachusetts,1
5,Minneapolis,Minnesota,1
6,Detroit,Michigan,1
7,New York,New York,1
8,Buffalo,New York,1
9,Baltimore,Maryland,1


#### Getting Lat / Long Information for each of the cities and mapping

In [6]:
#importing relevant mapping libraries

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')


In [7]:
#using the geopy library to get a general lat / long for each city

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values



In [89]:
#add new Latitude and Longitude columns to our dataframe to store future values
initial_dataframe['Latitude'] = np.nan
initial_dataframe['Longitude'] = np.nan

initial_dataframe.head()

Unnamed: 0,Cities,State,Is_Liberal,Latitude,Longitude
0,San Francisco,California,1,,
1,Washington,DC,1,,
2,Seattle,Washington,1,,
3,Oakland,California,1,,
4,Boston,Massachusetts,1,,


In [90]:
#setting up geolocator function
geolocator = Nominatim(user_agent="us_explorer")

#running a loop to populate the lat / long data for all the cities.
index = 0

while index < len(initial_dataframe):
    #look up city information
    temp_city = initial_dataframe.iloc[index]['Cities']
    temp_state = initial_dataframe.iloc[index]['State']
    #building a "City, State" query string from the city and state information
    temp_location = '{}, {}'.format(temp_city, temp_state)
    #calling the geolocator function to get lat and long
    geolocator_location = geolocator.geocode(temp_location)
    temp_latitude = geolocator_location.latitude
    temp_longitude = geolocator_location.longitude
    #assigning the lat and long values to the dataframe
    #(.at replaces DataFrame.set_value, which was removed in pandas 1.0)
    initial_dataframe.at[index, 'Latitude'] = temp_latitude
    initial_dataframe.at[index, 'Longitude'] = temp_longitude
    #increasing the index
    index = index + 1
    
#viewing dataframe with lat / long data
initial_dataframe.head(20)



Unnamed: 0,Cities,State,Is_Liberal,Latitude,Longitude
0,San Francisco,California,1,37.779026,-122.419906
1,Washington,DC,1,38.894985,-77.036571
2,Seattle,Washington,1,47.603832,-122.330062
3,Oakland,California,1,37.804456,-122.271356
4,Boston,Massachusetts,1,42.360253,-71.058291
5,Minneapolis,Minnesota,1,44.9773,-93.265469
6,Detroit,Michigan,1,42.331551,-83.04664
7,New York,New York,1,40.712728,-74.006015
8,Buffalo,New York,1,42.886717,-78.878392
9,Baltimore,Maryland,1,39.290882,-76.610759


#### Creating map with red and blue dots to signify locations

In [91]:
#creating a mapping container to start

#setting lat / long to center point in the country

central_latitude = 39.50
central_longitude = -98.35

map_us = folium.Map(location=[central_latitude, central_longitude], zoom_start=3)

In [143]:
#add color palette and legend details

color_palette = ['red', 'blue']
#legend_details = FeatureGroup(name='Layer1')

# add markers to map
for lat, lng, label, col in zip(initial_dataframe['Latitude'], initial_dataframe['Longitude'], initial_dataframe['Cities'], initial_dataframe['Is_Liberal']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color_palette[col],
        fill=True,
        fill_color=color_palette[col],
        fill_opacity=0.5,
        parse_html=False).add_to(map_us)  

map_us.caption = 'Most Conservative and Liberal Cities'
    
#displaying map
map_us

I now have my dataset and can start pulling relevant venue information.  You can see from the map above that I've categorized conservative / red cities with a red dot and liberal / blue cities with a blue dot.

Based on the map alone, it seems that another "feature" that may relate to political leaning is where in the country a city is located.  More of the conservative-leaning cities are in the south, while more of the liberal-leaning cities are in the north.
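
The north/south impression can be checked numerically by comparing latitude across the two groups.  A minimal sketch, using the notebook's column names on made-up coordinates (the values below are illustrative, not the geocoded results):

```python
import pandas as pd

# Toy data with the same columns as initial_dataframe (values are made up).
toy = pd.DataFrame({
    "Cities": ["City A", "City B", "City C", "City D"],
    "Is_Liberal": [1, 1, 0, 0],
    "Latitude": [45.0, 41.0, 33.0, 30.0],
})

# Average latitude for each group; a higher value means further north.
mean_latitude = toy.groupby("Is_Liberal")["Latitude"].mean()
gap = mean_latitude.loc[1] - mean_latitude.loc[0]
print(mean_latitude)
print(gap)
```

A positive gap would mean the liberal group sits further north on average.  With only 20 cities, a single far-north or far-south city can move the average noticeably.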

### 3b. Pulling data from the FourSquare API

In [18]:
#setting the basic URL of foursquare.

url = 'https://api.foursquare.com/v2/venues/explore'

In [19]:
#libraries to handle requests and parse json

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

In [20]:
# The code was removed by Watson Studio for sharing.

In [185]:
#Pulling data from a single location to verify we're pulling correctly
radius = 10000
LIMIT = 200

#url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, 
#CLIENT_SECRET, initial_dataframe.loc[0][3], initial_dataframe.loc[0][4], VERSION, radius, LIMIT)
#results = requests.get(url).json()
#results

#### Getting venue data for all the locations from the FourSquare API

In [93]:
#Function to get venue information near each city's center point

LIMIT = 200

def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
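
The list comprehension in `getNearbyVenues` assumes every venue has at least one category; a venue with an empty `categories` list would raise an `IndexError`.  The sketch below isolates the parsing step and shows one way to tolerate that case.  `extract_venue_rows` is a hypothetical helper, and the `items` payload is a hand-written stand-in shaped like the `response` / `groups[0]` / `items` structure the function reads, not real API output.

```python
import pandas as pd

def extract_venue_rows(city, lat, lng, items):
    """Flatten explore-style items into rows; uncategorized venues get None."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((city, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Hand-written sample payload (hypothetical venues).
items = [
    {"venue": {"name": "Venue One", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Venue Two", "location": {"lat": 1.1, "lng": 2.1},
               "categories": []}},
]

columns = ["City", "Neighborhood Latitude", "Neighborhood Longitude",
           "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"]
venues = pd.DataFrame(extract_venue_rows("Sample City", 1.0, 2.0, items),
                      columns=columns)
print(venues)
```

Rows with a missing category can then be dropped or counted explicitly before one-hot encoding.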

In [94]:
#utilizing the function above to get venue information for all the cities

all_venues = getNearbyVenues(names=initial_dataframe['Cities'],
                                   latitudes=initial_dataframe['Latitude'],
                                   longitudes=initial_dataframe['Longitude']
                                  )

San Francisco
Washington
Seattle
Oakland
Boston
Minneapolis
Detroit
New York
Buffalo
Baltimore
Mesa
Oklahoma City
Colorado Springs
Jacksonville
Arlington
Omaha
Anchorage
Fort Worth
Tulsa
Corpus Christi


### 3c. Completing some statistical analysis to ensure that I pulled the correct amount of data and that it's organized properly

The next two things I'm doing are checks that all locations have the same number of records.

For this, I look at the shape of the dataframe.  I also count the venues per city.

In [95]:
#checking the new dataframe

print(all_venues.shape)
all_venues.head()

(2000, 7)


Unnamed: 0,City,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,San Francisco,37.779026,-122.419906,Louise M. Davies Symphony Hall,37.777976,-122.420157,Concert Hall
1,San Francisco,37.779026,-122.419906,War Memorial Opera House,37.778601,-122.420816,Opera House
2,San Francisco,37.779026,-122.419906,SFJazz Center,37.77635,-122.421539,Jazz Club
3,San Francisco,37.779026,-122.419906,Asian Art Museum,37.780178,-122.416505,Art Museum
4,San Francisco,37.779026,-122.419906,Birba,37.77775,-122.424159,Wine Bar


In [96]:
#checking venues returned per city

all_venues.groupby('City').count()

City,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Anchorage,100,100,100,100,100,100
Arlington,100,100,100,100,100,100
Baltimore,100,100,100,100,100,100
Boston,100,100,100,100,100,100
Buffalo,100,100,100,100,100,100
Colorado Springs,100,100,100,100,100,100
Corpus Christi,100,100,100,100,100,100
Detroit,100,100,100,100,100,100
Fort Worth,100,100,100,100,100,100
Jacksonville,100,100,100,100,100,100


I then check how many categories of businesses there are.  There's a fair number of categories, which makes me a bit concerned given I only have 100 venues per city.

In [97]:
# Checking unique categories for the returned venues

print('There are {} uniques categories.'.format(len(all_venues['Venue Category'].unique())))

There are 274 uniques categories.
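
To put the concern in numbers: with 100 venues per city and 274 categories, a city can have a nonzero count in at most 100 categories, so at least 174 of its 274 category columns must be zero.  The short calculation below works that out using only the two figures reported above.

```python
venues_per_city = 100      # records returned per city
unique_categories = 274    # unique categories across all cities

# Each venue fills at most one category column for its city.
max_filled_share = venues_per_city / unique_categories
min_empty_share = 1 - max_filled_share

print(round(max_filled_share, 3))
print(round(min_empty_share, 3))
```

In practice the empty share is higher than this lower bound, because popular categories such as coffee shops take up several of a city's 100 venues.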


#### Analyzing each city for the categories of venues

I then want to calculate how many categories of businesses are present in each city and the frequency of those businesses.  This will give me initial insight into differences at a city level.

In [150]:
#Setting up a dataframe to categorize each location provided by the FourSquare API.

# one hot encoding
us_onehot = pd.get_dummies(all_venues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
us_onehot['City'] = all_venues['City']

# move city column to the first column
fixed_columns = [us_onehot.columns[-1]] + list(us_onehot.columns[:-1])
us_onehot = us_onehot[fixed_columns]

us_onehot.head()

Unnamed: 0,City,Accessories Store,Adult Boutique,American Restaurant,Aquarium,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bath House,Beach,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Bike Rental / Bike Share,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Car Wash,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Baseball Diamond,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Rink,Hookah Bar,Hospital,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mongolian Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pilates Studio,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,River,Rock Club,Romanian Restaurant,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Soccer Stadium,Social Club,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,State / Provincial Park,Stationery Store,Steakhouse,Street Food Gathering,Strip Club,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track Stadium,Trail,Train Station,Trattoria/Osteria,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,San Francisco,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
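
Because the real one-hot table is too wide to read comfortably, here is the same encoding step on a three-row toy table.  The venue rows are made up; `dtype=int` is passed so the output is 0/1 as in the table above, since newer pandas versions return True/False by default.

```python
import pandas as pd

# Toy venue table with the notebook's column names (values are made up).
toy_venues = pd.DataFrame({
    "City": ["City A", "City A", "City B"],
    "Venue Category": ["Jazz Club", "Wine Bar", "Wine Bar"],
})

# One column per category, with a 1 marking the category of each venue row.
onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                        prefix="", prefix_sep="", dtype=int)

# Put the city name back as the first column.
onehot.insert(0, "City", toy_venues["City"])
print(onehot)
```

Each row still represents one venue, so every row contains exactly one 1 across the category columns.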


#### Next, let's group rows by city, taking the sum and mean of the frequency of occurrence of each category

In [151]:
#taking sum of categories that show up
cities_grouped_by_total = us_onehot.groupby('City').sum().reset_index()
cities_grouped_by_total.head(20)

Unnamed: 0,City,Accessories Store,Adult Boutique,American Restaurant,Aquarium,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bath House,Beach,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Bike Rental / Bike Share,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Car Wash,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Baseball Diamond,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Rink,Hookah Bar,Hospital,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mongolian Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pilates Studio,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,River,Rock Club,Romanian Restaurant,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Soccer Stadium,Social Club,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,State / Provincial Park,Stationery Store,Steakhouse,Street Food Gathering,Strip Club,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track Stadium,Trail,Train Station,Trattoria/Osteria,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,Anchorage,1,0,2,0,0,0,0,0,0,2,0,1,3,0,3,0,0,0,0,0,0,0,0,1,0,2,0,0,0,0,2,2,0,0,0,0,0,0,0,2,1,0,0,0,0,1,1,0,0,0,0,0,6,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,3,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,2,0,0,1,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,0,1,0,1,0,0,0,4,0,1,0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,6,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,3,0,0,0,0,2,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,2,0,1,0,0,0,1,0,0,0,0,0,0
1,Arlington,0,0,4,0,1,0,0,0,0,3,2,0,2,0,1,0,0,0,0,1,0,0,0,0,0,2,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,3,0,1,0,0,0,1,0,3,0,0,0,0,1,0,0,1,0,0,0,2,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,1,3,1,2,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,2,0,0,0,0,1,0,0,4,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,2,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,1,0,1,5,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,4,0,1,0,0,1,0,1,0,0,0
2,Baltimore,0,0,4,3,0,0,0,2,0,1,0,0,2,0,3,1,0,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,2,4,0,0,0,0,1,0,0,0,0,0,0,0,3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,4,0,1,0,0,0,0,5,2,0,0,0,0,0,0,0,0,0,0,2,0,0,1,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,3,0,0,1,0,0,0,0,2,0,0,0,0,0,2,0,0,1,1,0,0,0,0,0,0,0,0,1,1,0,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,2,0,0,1,3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,2,1,0,0,0,0,0
3,Boston,0,0,0,2,0,0,0,0,0,1,0,0,8,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,5,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,2,0,3,2,0,0,0,0,0,0,0,2,0,0,1,0,0,0,0,0,5,0,2,0,0,0,0,5,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,2,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,8,0,0,0,1,0,0,0,4,0,0,1,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,3,0,3,0,0,1,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0
4,Buffalo,0,0,3,0,0,0,2,0,0,0,1,0,1,0,7,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,8,0,0,0,1,0,0,0,2,0,0,1,0,0,1,0,0,0,0,0,3,3,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,2,0,0,1,0,0,0,0,0,0,0,1,0,2,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,0,2,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,5,0,0,0,0,0,0,3,1,0,0,1,0,0,0,1,0,0,0,0,0,2,1,0,0,1,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,4,0,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,0,3,0,0,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0
5,Colorado Springs,0,0,3,0,0,0,0,1,0,0,1,1,1,1,5,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,5,0,0,0,3,0,0,0,5,1,1,0,0,0,0,0,0,0,0,0,0,9,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,0,3,0,0,2,0,0,0,0,0,0,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,1,2,0,0,0,1,0,0,0,0,0,0,2,0,0,0,1,0,0,3,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,1,0,0,0,6,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,3,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
6,Corpus Christi,0,0,2,0,0,0,0,0,0,1,1,0,1,0,3,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,7,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,2,0,1,1,0,0,0,2,1,0,3,3,1,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,2,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,2,1,0,2,0,0,0,0,2,0,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,1,2,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,2,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,2,0,0,0
7,Detroit,0,0,3,0,0,0,2,0,0,0,1,1,0,0,3,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,5,0,0,1,1,0,1,0,3,0,0,0,0,1,0,0,0,0,0,0,3,6,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,3,0,1,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,5,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,0,0,1,0,0,0,2,0,5,0,0,0,0,0,0,0,1,0,0,4,0,0,0,0,0,0,1,1,0,3,0,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,2,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0
8,Fort Worth,0,0,6,0,0,0,0,2,1,0,2,0,0,0,3,0,0,0,0,4,0,0,0,0,0,0,1,0,0,1,1,6,0,0,0,2,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,2,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,2,0,0,1,0,0,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,3,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,3,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,3,0,0,0,0,0,0,0,0,0,2,0,1,0,1,0,0,0,2,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,3,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,2,2,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0
9,Jacksonville,0,0,5,0,0,0,0,1,1,1,0,1,1,0,5,1,0,0,0,0,2,1,0,0,0,0,0,0,0,0,1,5,0,0,0,1,2,0,0,3,0,0,0,1,0,0,0,0,0,0,0,2,6,0,0,1,0,3,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,2,2,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,2,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,4,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,2,0,0,0,3,0,0,0,0,3,0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [141]:
#taking mean of the categories that show up
cities_grouped = us_onehot.groupby('City').mean().reset_index()
cities_grouped.head()

Unnamed: 0,City,Accessories Store,Adult Boutique,American Restaurant,Aquarium,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bath House,Beach,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Bike Rental / Bike Share,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Car Wash,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Baseball Diamond,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Rink,Hookah Bar,Hospital,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz 
Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mongolian Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pilates Studio,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,River,Rock Club,Romanian Restaurant,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Soccer Stadium,Social Club,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,State / Provincial Park,Stationery Store,Steakhouse,Street Food Gathering,Strip Club,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track Stadium,Trail,Train Station,Trattoria/Osteria,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,Anchorage,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
1,Arlington,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0
2,Baltimore,0.0,0.0,0.04,0.03,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0
3,Boston,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.02,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.03,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0
4,Buffalo,0.0,0.0,0.03,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.07,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0


#### Printing each city along with its top 5 most common venues

In [101]:
num_top_venues = 5

for city in cities_grouped['City']:
    print("----"+city+"----")
    temp = cities_grouped[cities_grouped['City'] == city].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Anchorage----
                venue  freq
0  Seafood Restaurant  0.06
1         Coffee Shop  0.06
2                Park  0.05
3         Pizza Place  0.04
4          Steakhouse  0.03


----Arlington----
                   venue  freq
0         Sandwich Place  0.05
1    American Restaurant  0.04
2     Mexican Restaurant  0.04
3  Vietnamese Restaurant  0.04
4         Cosmetics Shop  0.03


----Baltimore----
                 venue  freq
0   Italian Restaurant  0.05
1                Hotel  0.04
2  American Restaurant  0.04
3          Coffee Shop  0.04
4             Aquarium  0.03


----Boston----
                venue  freq
0              Bakery  0.08
1                Park  0.08
2  Italian Restaurant  0.05
3               Hotel  0.05
4         Coffee Shop  0.05


----Buffalo----
                     venue  freq
0                  Brewery  0.08
1                      Bar  0.07
2                    Hotel  0.05
3  New American Restaurant  0.04
4       Italian Restaurant  0.03


----Colorad

Based on this frequency sorting, certain venue types stand out.  For example, San Francisco and Colorado Springs are both high on coffee shops despite being liberal and conservative, respectively.  Corpus Christi and Mesa are both high on Mexican restaurants.  Lastly, Minneapolis and Oklahoma City are both high on breweries despite being on opposite ends of the political spectrum.

#### Sorting venues by descending order and placing into a dataframe.

In [173]:
#creating a function to place into dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [174]:
#Creating a dataframe with top 10 venues per city

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
city_venues_sorted = pd.DataFrame(columns=columns)
city_venues_sorted['City'] = cities_grouped['City']

for ind in np.arange(cities_grouped.shape[0]):
    city_venues_sorted.iloc[ind, 1:] = return_most_common_venues(cities_grouped.iloc[ind, :], num_top_venues)

city_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Anchorage,Coffee Shop,Seafood Restaurant,Park,Pizza Place,Hotel,Bakery,Bar,Cosmetics Shop,Steakhouse,Sporting Goods Shop
1,Arlington,Sandwich Place,Mexican Restaurant,American Restaurant,Vietnamese Restaurant,Chinese Restaurant,Fried Chicken Joint,Coffee Shop,Cosmetics Shop,Asian Restaurant,Furniture / Home Store
2,Baltimore,Italian Restaurant,American Restaurant,Coffee Shop,Hotel,Aquarium,Theater,Deli / Bodega,Bar,Park,Burger Joint
3,Boston,Park,Bakery,Coffee Shop,Hotel,Italian Restaurant,Pizza Place,Seafood Restaurant,Salad Place,Sandwich Place,Gym
4,Buffalo,Brewery,Bar,Hotel,New American Restaurant,Coffee Shop,American Restaurant,Pizza Place,Cocktail Bar,Italian Restaurant,Market


#### Clustering Cities Based on K-Means

I next used the K-Means machine learning algorithm to group cities based on how frequently each venue category appeared.  I preferred an unsupervised learning approach because my hypothesis wasn't built around predicting a predefined outcome.

I settled on three clusters, as this seemed to create some sensible buckets.  I also have limited data (20 cities), so I didn't want to segment too much.
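
One way to sanity-check the choice of three clusters is to compare silhouette scores across several values of k. The sketch below is illustrative only: `demo_frequencies` is a randomly generated stand-in for the city-by-venue frequency matrix, not the notebook's data, so its scores say nothing about the real cities.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for the per-city venue-frequency matrix
# (20 cities x 30 venue categories, each row summing to 1).
rng = np.random.default_rng(0)
demo_frequencies = rng.dirichlet(np.ones(30), size=20)

# Fit K-Means for a few candidate cluster counts and record the silhouette score.
silhouette_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(demo_frequencies)
    silhouette_by_k[k] = silhouette_score(demo_frequencies, labels)

for k, score in silhouette_by_k.items():
    print(f"k={k}: silhouette={score:.3f}")
```

A higher silhouette score suggests tighter, better-separated clusters, though with only 20 cities the differences between values of k may be small.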

In [176]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

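# drop the non-numeric City column (keyword form; the positional axis argument was removed in pandas 2.0)
cities_grouped_clustering = cities_grouped.drop(columns='City')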

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cities_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 1, 1, 2, 2, 0, 2, 2, 2], dtype=int32)

In [177]:
#dataframe with clusters and top venues for each city merged together

# add clustering labels
city_venues_sorted.insert(0, 'Cluster Labels', (kmeans.labels_))

In [178]:
#showing dataframe with cluster values
city_venues_sorted.head()

Unnamed: 0,Cluster Labels,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,Anchorage,Coffee Shop,Seafood Restaurant,Park,Pizza Place,Hotel,Bakery,Bar,Cosmetics Shop,Steakhouse,Sporting Goods Shop
1,0,Arlington,Sandwich Place,Mexican Restaurant,American Restaurant,Vietnamese Restaurant,Chinese Restaurant,Fried Chicken Joint,Coffee Shop,Cosmetics Shop,Asian Restaurant,Furniture / Home Store
2,1,Baltimore,Italian Restaurant,American Restaurant,Coffee Shop,Hotel,Aquarium,Theater,Deli / Bodega,Bar,Park,Burger Joint
3,1,Boston,Park,Bakery,Coffee Shop,Hotel,Italian Restaurant,Pizza Place,Seafood Restaurant,Salad Place,Sandwich Place,Gym
4,2,Buffalo,Brewery,Bar,Hotel,New American Restaurant,Coffee Shop,American Restaurant,Pizza Place,Cocktail Bar,Italian Restaurant,Market


In [179]:
#add state, political leaning, latitude and longitude by merging with the initial city dataframe
merged_dataframe = city_venues_sorted.merge(initial_dataframe, left_on='City', right_on='Cities')
merged_dataframe

Unnamed: 0,Cluster Labels,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cities,State,Is_Liberal,Latitude,Longitude
0,2,Anchorage,Coffee Shop,Seafood Restaurant,Park,Pizza Place,Hotel,Bakery,Bar,Cosmetics Shop,Steakhouse,Sporting Goods Shop,Anchorage,Alaska,0,61.216313,-149.894852
1,0,Arlington,Sandwich Place,Mexican Restaurant,American Restaurant,Vietnamese Restaurant,Chinese Restaurant,Fried Chicken Joint,Coffee Shop,Cosmetics Shop,Asian Restaurant,Furniture / Home Store,Arlington,Texas,0,32.701939,-97.105624
2,1,Baltimore,Italian Restaurant,American Restaurant,Coffee Shop,Hotel,Aquarium,Theater,Deli / Bodega,Bar,Park,Burger Joint,Baltimore,Maryland,1,39.290882,-76.610759
3,1,Boston,Park,Bakery,Coffee Shop,Hotel,Italian Restaurant,Pizza Place,Seafood Restaurant,Salad Place,Sandwich Place,Gym,Boston,Massachusetts,1,42.360253,-71.058291
4,2,Buffalo,Brewery,Bar,Hotel,New American Restaurant,Coffee Shop,American Restaurant,Pizza Place,Cocktail Bar,Italian Restaurant,Market,Buffalo,New York,1,42.886717,-78.878392
5,2,Colorado Springs,Coffee Shop,Park,Café,Bar,Brewery,New American Restaurant,Sandwich Place,Steakhouse,Mexican Restaurant,Gastropub,Colorado Springs,Colorado,0,38.833958,-104.825349
6,0,Corpus Christi,Mexican Restaurant,Burger Joint,Bar,Restaurant,Fast Food Restaurant,Diner,Discount Store,Seafood Restaurant,Grocery Store,Italian Restaurant,Corpus Christi,Texas,0,27.747725,-97.401413
7,2,Detroit,Coffee Shop,Park,Brewery,Hotel,Plaza,Farmers Market,American Restaurant,New American Restaurant,Restaurant,Café,Detroit,Michigan,1,42.331551,-83.04664
8,2,Fort Worth,Brewery,Hotel,American Restaurant,Beer Bar,Coffee Shop,Seafood Restaurant,Bar,Ice Cream Shop,New American Restaurant,Mexican Restaurant,Fort Worth,Texas,0,32.753177,-97.332746
9,2,Jacksonville,Park,Coffee Shop,Bar,American Restaurant,Brewery,Sandwich Place,Steakhouse,Concert Hall,Café,Sushi Restaurant,Jacksonville,Florida,0,30.332184,-81.655651


### 4. Results

In [184]:
#visualizing the cluster

#setting lat / long to center point in the country

central_latitude = 39.50
central_longitude = -98.35

map_us_clusters = folium.Map(location=[central_latitude, central_longitude], zoom_start=4)

#add color palette

color_palette = ['green', 'yellow', 'orange']

# add markers to the map
for lat, lon, poi, cluster in zip(merged_dataframe['Latitude'], merged_dataframe['Longitude'], merged_dataframe['City'], merged_dataframe['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=color_palette[cluster],
        fill=True,
        fill_color=color_palette[cluster],
        fill_opacity=0.7).add_to(map_us_clusters)
       
map_us_clusters

From the map above, you can see the distribution of the clusters.

The results are listed below (clusters 1, 2 and 3 correspond to K-Means labels 0, 1 and 2):

Cluster 1:

1. Arlington, TX - conservative
2. Corpus Christi, TX - conservative
3. Mesa, AZ - conservative

Cluster 2:

1. Baltimore, MD - liberal
2. Boston, MA - liberal
3. New York, NY - liberal
4. Seattle, WA - liberal
5. Washington, DC - liberal

Cluster 3:

1. Tulsa, OK - conservative
2. Oklahoma City, OK - conservative
3. Colorado Springs, CO - conservative
4. Jacksonville, FL - conservative
5. Omaha, NE - conservative
6. Fort Worth, TX - conservative
7. Anchorage, AK - conservative
8. Minneapolis, MN - liberal
9. Detroit, MI - liberal
10. Buffalo, NY - liberal
11. Oakland, CA - liberal
12. San Francisco, CA - liberal
    
While clusters 1 and 2 were politically homogeneous, cluster 3 was not: it contains seven conservative and five liberal cities.
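
The political makeup of each cluster can also be tallied in code rather than by hand. The sketch below assumes a dataframe shaped like `merged_dataframe`; `demo` is a hypothetical miniature built from the first five rows of the merged output shown earlier, so the counts cover only those five cities.

```python
import pandas as pd

# Hypothetical miniature of merged_dataframe (first five rows of the merged output).
demo = pd.DataFrame({
    "City": ["Anchorage", "Arlington", "Baltimore", "Boston", "Buffalo"],
    "Cluster Labels": [2, 0, 1, 1, 2],
    "Is_Liberal": [0, 0, 1, 1, 1],
})

# Count conservative (0) and liberal (1) cities within each cluster label.
composition = pd.crosstab(demo["Cluster Labels"], demo["Is_Liberal"])
composition = composition.rename(columns={0: "conservative", 1: "liberal"})
print(composition)
```

Running the same `pd.crosstab` call on the full `merged_dataframe` would give the split for all 20 cities.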

I found the clustering interesting, so I did some further analysis to build intuition for why these clusters formed.

I took the top three venues for each city to build a list of categories.

From there, I compared the total venue counts for those categories, averaged per cluster.

In [181]:
#tracking list of most common categories
list_of_most_common = np.unique(merged_dataframe[['1st Most Common Venue', '2nd Most Common Venue', '3rd Most Common Venue']].values)
list_of_common = list_of_most_common.tolist()
list_of_common.insert(0,'City')
list_of_common

['City',
 'American Restaurant',
 'Art Museum',
 'Bakery',
 'Bar',
 'Beer Garden',
 'Bookstore',
 'Brewery',
 'Burger Joint',
 'Café',
 'Coffee Shop',
 'Convenience Store',
 'Hotel',
 'Italian Restaurant',
 'Mexican Restaurant',
 'Monument / Landmark',
 'New American Restaurant',
 'Park',
 'Pizza Place',
 'Sandwich Place',
 'Seafood Restaurant',
 'Vietnamese Restaurant',
 'Yoga Studio',
 'Zoo Exhibit']

In [117]:
#viewing of top 3 venues per city

top_3_venues_per_city = cities_grouped_by_total[list_of_common].copy()

# add clustering labels to the top_3
top_3_venues_per_city.insert(0, 'Cluster Labels', (kmeans.labels_))

In [118]:
#view of top_3_venues_with_clusters
top_3_venues_per_city

Unnamed: 0,Cluster Labels,City,American Restaurant,Art Museum,Bakery,Bar,Beer Garden,Bookstore,Brewery,Burger Joint,...,Mexican Restaurant,Monument / Landmark,New American Restaurant,Park,Pizza Place,Sandwich Place,Seafood Restaurant,Vietnamese Restaurant,Yoga Studio,Zoo Exhibit
0,2,Anchorage,2,0,3,3,0,2,2,0,...,2,0,0,5,4,0,6,1,0,0
1,0,Arlington,4,0,2,1,0,2,1,1,...,4,0,1,2,2,5,1,4,0,0
2,1,Baltimore,4,2,2,3,0,1,0,2,...,1,0,1,3,2,0,2,0,0,0
3,1,Boston,0,0,8,0,1,1,1,0,...,2,1,1,8,4,3,4,0,0,0
4,2,Buffalo,3,0,1,7,0,0,8,1,...,1,0,4,1,3,0,1,0,0,0
5,2,Colorado Springs,3,1,1,5,0,1,5,3,...,3,0,3,6,2,3,1,1,0,0
6,0,Corpus Christi,2,0,1,3,0,1,1,7,...,11,0,0,1,1,2,2,2,0,0
7,2,Detroit,3,0,0,3,0,0,5,1,...,0,0,3,5,1,1,1,0,0,0
8,2,Fort Worth,6,2,0,3,0,0,6,2,...,3,0,3,2,2,2,3,1,0,0
9,2,Jacksonville,5,1,1,5,2,0,5,1,...,2,0,1,6,1,4,1,0,0,0


In [136]:
#viewing top 3 venues per cluster
pd.set_option('display.max_columns', None)
grouped_by_cluster_top_3 = top_3_venues_per_city.groupby('Cluster Labels').mean(numeric_only=True).reset_index()
grouped_by_cluster_top_3

Unnamed: 0,Cluster Labels,American Restaurant,Art Museum,Bakery,Bar,Beer Garden,Bookstore,Brewery,Burger Joint,Café,Coffee Shop,Convenience Store,Hotel,Italian Restaurant,Mexican Restaurant,Monument / Landmark,New American Restaurant,Park,Pizza Place,Sandwich Place,Seafood Restaurant,Vietnamese Restaurant,Yoga Studio,Zoo Exhibit
0,0,2.666667,0.333333,1.0,1.666667,0.0,1.333333,2.0,3.333333,1.0,2.333333,2.666667,1.0,1.333333,9.0,0.0,0.333333,1.333333,1.666667,3.333333,1.0,2.333333,0.333333,0.0
1,1,2.4,2.0,2.8,0.8,0.2,1.8,0.6,0.8,1.2,3.8,0.4,5.4,4.0,1.0,1.8,1.0,4.8,1.8,2.0,2.0,1.2,0.6,0.0
2,2,3.25,1.0,1.916667,3.333333,0.666667,0.5,4.75,1.416667,2.416667,5.833333,0.0,2.916667,1.25,1.916667,0.0,2.0,3.916667,3.0,1.583333,1.5,0.583333,0.333333,0.333333


In [139]:
for cluster in grouped_by_cluster_top_3['Cluster Labels']:
    print("----"+str(cluster)+"----")
    temp = grouped_by_cluster_top_3[grouped_by_cluster_top_3['Cluster Labels'] == cluster].T.reset_index()
    temp.columns = ['category','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    # show the top 5 categories per cluster, matching the output below
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(5))
    print('\n')

----0----
              category  freq
0   Mexican Restaurant  9.00
1         Burger Joint  3.33
2       Sandwich Place  3.33
3  American Restaurant  2.67
4    Convenience Store  2.67


----1----
             category  freq
0               Hotel   5.4
1                Park   4.8
2  Italian Restaurant   4.0
3         Coffee Shop   3.8
4              Bakery   2.8


----2----
              category  freq
0          Coffee Shop  5.83
1              Brewery  4.75
2                 Park  3.92
3                  Bar  3.33
4  American Restaurant  3.25




We can see that cluster 1 (label 0) was dominated by Mexican restaurants, averaging 9.0 per city.  Cluster 2 (label 1) was heavy on hotels (5.4) and parks (4.8).  Cluster 3 (label 2) was heavy on coffee shops (5.83) and breweries (4.75).

### 5. Discussion

#### 5a. Observations

For clusters 1 and 2, it does feel like the location of the cities may be driving the clustering.

For example, cluster 1 sits squarely in the Southwest, a region with a large Hispanic population, which may explain the high number of Mexican restaurants.

Cluster 2 consists of mostly coastal cities that see a decent amount of tourism (New York, Washington, D.C.).

I found cluster 3 to be the most interesting, as it was a mix of liberal and conservative cities.  It seems that regardless of political leaning, coffee shops, breweries and parks are going to be of interest.  It was also nice to see the spread across the country, from the West Coast (San Francisco / Oakland) to the Southeast (Jacksonville).

#### 5b. Recommendations on results

Ideally, I would pull much more venue data so that I had a larger sample of venues per location.  One idea would be to identify, say, four separate points in a city, pull the results for each, and then de-duplicate.  That probably would have given me a more representative sample.  In addition, there are probably opportunities to group categories.  For example, breweries and bars could be grouped.  This could help to simplify the data.
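
Both recommendations could be sketched roughly as follows. Everything here is hypothetical: the `venues` rows, the `Venue ID` and `Venue Category` column names, and the `category_groups` mapping are invented for illustration and would need to be replaced with real query results and a hand-chosen grouping.

```python
import pandas as pd

# Hypothetical venue results pulled from two query points in the same city.
# Venue "b2" was returned by both query points, so it appears twice.
venues = pd.DataFrame({
    "City": ["Buffalo"] * 5,
    "Venue ID": ["a1", "b2", "c3", "b2", "d4"],
    "Venue Category": ["Brewery", "Bar", "Coffee Shop", "Bar", "Beer Bar"],
})

# Illustrative grouping only; unlisted categories keep their original name.
category_groups = {
    "Brewery": "Bar / Brewery",
    "Bar": "Bar / Brewery",
    "Beer Bar": "Bar / Brewery",
}

# Step 1: de-duplicate venues that were returned by more than one query point.
deduped = venues.drop_duplicates(subset=["City", "Venue ID"]).copy()

# Step 2: collapse related categories into a broader group, then count per city.
deduped["Category Group"] = deduped["Venue Category"].replace(category_groups)
group_counts = deduped.groupby(["City", "Category Group"]).size()
print(group_counts)
```

The grouped counts could then feed the same one-hot and K-Means steps used earlier, with fewer and denser columns.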

### 6. Conclusion

Referring back to the original hypothesis, the results of cluster 3 seem to indicate that regardless of political leaning, Americans do share certain interests (breweries, parks, coffee shops).  It's encouraging to see these similarities when comparing highly liberal cities (San Francisco) with highly conservative cities (Tulsa, OK).  I do think it's important that, regardless of political leanings, we remember these similarities.

While I do think the data set could have been more robust and there are a host of other factors that tie certain cities together (regional cuisine, location), I'm hopeful that more data would still point to this larger theme of shared interests.