# Chicago Community Areas Segmented by Income and Population Size: Selecting an Optimal Location for Pop Up Food Pantries (Final Capstone Project Jan 2021)


### Problem and Background

Chicago is the third-largest city in the United States, with a population of 2,693,976.[1] For statistical and planning purposes, Chicago is divided into 77 distinct community areas.[2]  These areas were first defined by researchers at the University of Chicago in the 1920s and are still used for urban planning initiatives and to compare equity of access to government and public services, in addition to informing commercial decisions (e.g., where to build new housing developments, entertainment, etc.) and other public and private sector analyses.  

During the COVID-19 pandemic, many businesses in Chicago were forced to close by gubernatorial order.  Unemployment in the state increased sharply, leaving many people without stable income.  Additionally, riots over police brutality erupted in the summer of 2020.  These riots resulted in extensive damage to small businesses throughout the city.  Among the businesses permanently closed were grocery stores and affordable restaurants.

Many parts of Chicago are considered food deserts because they lack affordable, healthy grocery shopping options.  Food deserts were already a problem before the riots and pandemic lockdowns and have only gotten worse.[3,3b]  Currently, the demand for food banks or food pantries is higher than ever, as many people cannot afford healthy food and rely on volunteer organizations for assistance.[4]  The increased demand for food bank services has also coincided with a decrease in food bank resources, as fewer people have been able to donate money and food. 

Because of this increased strain on food banks, it is more important than ever to make wise use of existing resources.  Stakeholders such as traveling food banks, philanthropic organizations, and government agencies that aim to reach the maximum number of needy citizens need to go directly to the areas that need help most.  The purpose of this project is to identify the lowest-income and most populous communities within the city, where temporary food banks can be established to reach the greatest number of the neediest citizens.  

This analysis will address two primary research questions:
1.  How are the 77 community areas of Chicago similar with respect to income, population, and most popular venues (specifically grocery stores and restaurants)?
2.  Where should food non-profit organizations establish temporary food banks to serve the neediest people in the city?


### Data Description

To answer the two research questions above, I will use two primary data sources to cluster and segment Chicago communities.  The first is community data released in 2020 by the Chicago Metropolitan Agency for Planning (CMAP).[5]  This data set contains many variables related to community population, average income, access to transportation, and more.  The second is the Foursquare API, which is used to explore how these communities differ in access to grocers and restaurants.  The data sets will be merged for clustering and segmentation in order to recommend the best places for temporary, or new, food banks.     

Geocoordinates representing the center of each of the 77 Chicago community areas were obtained via the Google Maps Platform Geocoding API.  Area names were manually entered into the geocoder at https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder to visually confirm each area's location within Chicago, as some areas have generic names that are also found in other cities, towns, and even suburban Chicago areas.  For more information on the tool, see the overview at https://developers.google.com/maps/documentation/geocoding/overview.  The resulting latitude and longitude coordinates were saved into a CSV file along with the area name.  

Chicago Community Area (CCA) data was downloaded from the CMAP Data Hub at https://datahub.cmap.illinois.gov/dataset/community-data-snapshots-raw-data.  The CMAP Data Hub is operated by the Chicago Metropolitan Agency for Planning and, according to its about statement, is “the source for data and information relevant to comprehensive planning in the seven county metropolitan Chicago region.  As the official regional planning organization for northeastern Illinois, CMAP prepares data, analyses, and evaluations on land use, transportation and environmental topics. These activities are critical to providing objective assessments of current and future regional conditions.”  

The CCA Profile CSV file contained 77 records (one for each Chicago community area) and 221 columns of demographic data compiled mostly from the 2010 US Census and July 2020 American Community Data Snapshots.  These fields include median income, median education, age, race and ethnicity, access to transportation, employment and more.  For the purposes of this analysis, the three-column Chicago area geocoordinate file was merged with selected columns from the CCA dataset.  The final data set contained: area name, latitude, longitude, total population, and median income.
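The merge described above can be sketched with pandas as follows. This is a minimal illustration and not necessarily the code that produced the final file: the two small frames stand in for the geocoordinate CSV and the CMAP profile CSV, and the `MED_AGE` column with its zero values is a hypothetical placeholder for the many unused CMAP fields. The `GEOG`, `LAT`, `LON`, `TOT_POP`, and `MEDINC` names follow the final data set loaded later in the notebook.

```python
import pandas as pd

# Stand-in for the three-column geocoordinate file (area name, latitude, longitude).
coords = pd.DataFrame({
    "GEOG": ["Albany Park", "Archer Heights", "Armour Square"],
    "LAT": [41.968327, 41.8079, 41.840755],
    "LON": [-87.728028, -87.723585, -87.634019],
})

# Stand-in for the wide CMAP profile table; MED_AGE is a made-up placeholder column.
profile = pd.DataFrame({
    "GEOG": ["Albany Park", "Archer Heights", "Armour Square"],
    "TOT_POP": [50343, 13055, 13779],
    "MEDINC": [59883, 44109, 27464],
    "MED_AGE": [0.0, 0.0, 0.0],
})

# Keep only the columns needed for clustering, then join on the area name.
selected = profile[["GEOG", "TOT_POP", "MEDINC"]]
merged = coords.merge(selected, on="GEOG", how="inner", validate="one_to_one")

print(merged)
```

Passing `validate="one_to_one"` makes pandas raise an error if an area name appears more than once in either table, which is a cheap guard when names were typed by hand.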


### Footnotes

1.  US Census quick facts page: https://www.census.gov/quickfacts/chicagocityillinois
2.  https://en.wikipedia.org/wiki/Community_areas_in_Chicago
3.  https://www.chicagoreporter.com/food-deserts-persist-in-chicago-despite-more-supermarkets/
3b. https://www.chicagotribune.com/coronavirus/ct-life-coronavirus-food-insecurity-tt-20200403-20200403-ytanbm6j75e2fhjitctqhgljay-story.html
4. https://chicago.eater.com/2021/1/20/22231602/chicago-food-deserts-fresh-food-healthy-hood-we-go-us
5. https://datahub.cmap.illinois.gov/dataset/community-data-snapshots-raw-data

#### For a full report including results and discussion, please see the PowerPoint presentation in my GitHub directory.

## Procedures

First, we download and install all of the dependencies that we will need.

In [21]:
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

# import k-means from clustering stage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs # sklearn.datasets.samples_generator was removed in scikit-learn 0.24

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library


print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


<a id='item1'></a>


### Load and Explore Dataset


There are 77 community areas in Chicago.  In order to segment the areas and explore them, a dataset containing the areas and their latitude and longitude coordinates has been assembled as described in the Data Description. Let's load the data and prepare the data set for clustering. 


In [96]:
# Read the CSV into a pandas dataframe.

CCAP_df = pd.read_csv('FinalCCAP_rev.csv')
CCAP_df.head()

Unnamed: 0,AREANO,GEOG,LAT,LON,TOT_POP,MEDINC
0,1,Albany Park,41.968327,-87.728028,50343,59883
1,2,Archer Heights,41.8079,-87.723585,13055,44109
2,3,Armour Square,41.840755,-87.634019,13779,27464
3,4,Ashburn,41.74969,-87.712007,43986,68464
4,5,Auburn Gresham,41.743377,-87.656199,45271,34661


We do not want to use categorical values for clustering. Essentially, all we want to use is population and income. Extraneous columns will be dropped. 


In [97]:
df = CCAP_df.drop(['AREANO','GEOG','LAT', 'LON' ], axis=1)
df.head()


Unnamed: 0,TOT_POP,MEDINC
0,50343,59883
1,13055,44109
2,13779,27464
3,43986,68464
4,45271,34661


### Clustering Dataset

Now we will normalize the dataset. Standardization rescales each feature to zero mean and unit variance so that distance-based algorithms such as k-means weigh features with different magnitudes and distributions equally. We use **StandardScaler()** to normalize our dataset.
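As a rough illustration of what standardization does, the sketch below computes z-scores by hand with NumPy for the first five community areas listed above, using both population and income. It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses; the variable names are illustrative only.

```python
import numpy as np

# TOT_POP and MEDINC for the first five community areas shown earlier.
X = np.array([
    [50343, 59883],
    [13055, 44109],
    [13779, 27464],
    [43986, 68464],
    [45271, 34661],
], dtype=float)

# z = (x - column mean) / column standard deviation (population std, ddof=0).
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)
Z = (X - mean) / std

print(Z.round(3))
print("column means:", Z.mean(axis=0).round(6))
print("column stds: ", Z.std(axis=0).round(6))
```

Because the mean and standard deviation here come from only five rows, these z-scores will not match values computed over all 77 areas; the point is only that each column ends up centered at zero with unit spread.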

In [98]:
from sklearn.preprocessing import StandardScaler

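# NOTE: column 0 of df is TOT_POP, so [:, 1:] keeps only MEDINC; the single-column
# output of this cell confirms that only income is scaled and clustered here.
# To include population as well, use X = df[['TOT_POP', 'MEDINC']].values
X = df.values[:,1:]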
X = np.nan_to_num(X)
cluster_dataset = StandardScaler().fit_transform(X)
cluster_dataset



array([[ 0.26012795],
       [-0.36647279],
       [-1.02767283],
       [ 0.60099653],
       [-0.74178177],
       [-0.79107881],
       [-0.39570939],
       [ 0.26477562],
       [-0.10199277],
       [ 1.79079909],
       [-0.1094211 ],
       [-0.48298223],
       [-1.10723928],
       [-0.17349533],
       [-0.85431885],
       [-0.78845705],
       [ 0.32535418],
       [-0.86699069],
       [ 0.67166489],
       [-1.21222887],
       [-0.30323275],
       [-0.00602045],
       [ 1.90190612],
       [-1.27352245],
       [ 2.32130834],
       [-1.20817706],
       [-0.48274389],
       [ 0.7175457 ],
       [-0.83366255],
       [-1.04785244],
       [-0.07025358],
       [-0.4101688 ],
       [-0.5855884 ],
       [ 0.03199507],
       [ 0.44742493],
       [ 0.90249898],
       [-0.16578895],
       [ 1.42772499],
       [ 2.20626867],
       [ 0.80585136],
       [ 0.87385824],
       [-0.3229754 ],
       [-0.28690633],
       [-0.31848663],
       [ 0.30577042],
       [ 1

Now we will run our model and group into 9 clusters. The choice of nine is driven by the assumption that high, middle and low income designations will be viewed across highly populated, medium populated and low populated areas. This would result in a 3 x 3 tiered stratification across population and income.  
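To make the 3 x 3 idea concrete, the sketch below assigns population and income tiers directly with terciles via `pd.qcut`. This is a different technique from k-means and is shown only to illustrate the intended stratification. It uses the first ten rows of the data set; the area names for rows 5 to 9 are taken from the area list in the mapping cell and are assumed to be in the same order, and the tier column names are hypothetical.

```python
import pandas as pd

# First ten community areas from the notebook's data set.
areas = pd.DataFrame({
    "GEOG": ["Albany Park", "Archer Heights", "Armour Square", "Ashburn",
             "Auburn Gresham", "Austin", "Avalon Park", "Avondale",
             "Belmont Cragin", "Beverly"],
    "TOT_POP": [50343, 13055, 13779, 43986, 45271, 94762, 9738, 37909, 80648, 20437],
    "MEDINC": [59883, 44109, 27464, 68464, 34661, 33420, 43373, 60000, 50767, 98416],
})

tiers = ["low", "mid", "high"]

# Split each variable into three roughly equal-sized groups (terciles).
areas["POP_TIER"] = pd.qcut(areas["TOT_POP"], q=3, labels=tiers)
areas["INC_TIER"] = pd.qcut(areas["MEDINC"], q=3, labels=tiers)

# Cross-tabulate the two tiers to see how areas fall into the grid.
grid = pd.crosstab(areas["POP_TIER"], areas["INC_TIER"])
print(grid)
print(areas[["GEOG", "POP_TIER", "INC_TIER"]])
```

Areas that land in the high-population, low-income cell are the ones this project is most interested in; k-means is not guaranteed to reproduce this grid, since it groups by distance rather than by fixed quantiles.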

In [99]:
num_clusters = 9

k_means = KMeans(init="k-means++", n_clusters=num_clusters, n_init=12)
k_means.fit(cluster_dataset)
labels = k_means.labels_

print(labels)

[6 7 5 1 0 0 7 6 3 4 3 7 5 3 0 0 6 0 1 5 7 3 4 5 2 5 7 1 0 5 3 7 7 3 6 1 3
 8 2 1 1 7 7 7 6 4 4 4 8 0 2 5 3 8 5 3 6 7 5 7 7 0 0 0 0 2 3 3 5 3 5 5 3 7
 3 4 5]


Note that each row in our dataset represents a community area, and therefore, each row is assigned a label.

In [102]:
df["Labels"] = labels
df

Unnamed: 0,TOT_POP,MEDINC,Labels
0,50343,59883,6
1,13055,44109,7
2,13779,27464,5
3,43986,68464,1
4,45271,34661,0
5,94762,33420,0
6,9738,43373,7
7,37909,60000,6
8,80648,50767,3
9,20437,98416,4


To check the centroid values, we will average the features in each cluster.

In [101]:
df.groupby('Labels').mean()

Labels,TOT_POP,MEDINC
0,43314.090909,32933.0
1,44473.833333,72518.833333
2,40290.5,110912.0
3,34600.285714,51758.571429
4,41656.5,99167.0
5,16944.230769,24771.384615
6,42383.333333,62224.5
7,28844.142857,42660.0
8,66864.0,84500.0
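Because the goal is to find low-income, populous areas, the cluster averages can be used to shortlist candidates. The sketch below ranks clusters by mean income and then lists the areas in the lowest-income clusters by population. It uses only the ten labeled rows shown earlier; the area names for rows 5 to 9 are taken from the area list in the mapping cell and are assumed to be in the same order, and the cutoff of two clusters is an arbitrary choice for illustration.

```python
import pandas as pd

# The ten labeled rows shown earlier, with area names attached.
sample = pd.DataFrame({
    "GEOG": ["Albany Park", "Archer Heights", "Armour Square", "Ashburn",
             "Auburn Gresham", "Austin", "Avalon Park", "Avondale",
             "Belmont Cragin", "Beverly"],
    "TOT_POP": [50343, 13055, 13779, 43986, 45271, 94762, 9738, 37909, 80648, 20437],
    "MEDINC": [59883, 44109, 27464, 68464, 34661, 33420, 43373, 60000, 50767, 98416],
    "Labels": [6, 7, 5, 1, 0, 0, 7, 6, 3, 4],
})

# Average income per cluster, lowest first.
cluster_income = sample.groupby("Labels")["MEDINC"].mean().sort_values()

# Take the two lowest-income clusters (arbitrary cutoff for this sketch).
target_clusters = cluster_income.index[:2].tolist()

# Within those clusters, list the most populous areas first.
candidates = (
    sample[sample["Labels"].isin(target_clusters)]
    .sort_values("TOT_POP", ascending=False)
    .reset_index(drop=True)
)

print(target_clusters)
print(candidates[["GEOG", "TOT_POP", "MEDINC", "Labels"]])
```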


To begin mapping, let's get the location of Chicago. 

In [103]:
address = 'Chicago, IL'

geolocator = Nominatim(user_agent="chicago_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chicago are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chicago are 41.8755616, -87.6244212.


### Note: The two maps below may not render correctly on GitHub. Please open the notebook and mark it as trusted to view them.

##### Map 1: All 77 Community Areas in Chicago

In [122]:

# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[41.8755616, -87.6244212], zoom_start=10)

mks = pd.DataFrame({
'lat':[41.968327, 41.8079, 41.840755, 41.74969, 41.743377, 41.894871, 41.744202, 41.941501, 41.931294, 41.71712, 41.836416, 41.817084, 41.72813, 41.729831, 41.74012, 41.771839, 41.777964, 41.859375, 41.94526, 41.881035, 41.707994, 41.983685, 42.003519, 41.775305, 41.996168, 41.809092, 41.79352, 41.79272, 41.813079, 41.765703, 41.655496, 41.920733, 41.899075, 41.794767, 41.953822, 41.982504, 41.809477, 41.939781, 41.92547, 41.968682, 41.92306, 41.852333, 41.831618,
41.929382, 41.68753, 41.69159, 41.90391, 41.8608, 41.866846, 41.80871, 41.946745, 41.858978, 41.982848, 41.985809, 41.8227, 41.977298, 41.953217, 41.707617, 41.66109, 42.012629, 41.710834, 41.739686, 41.673658, 41.844524,
41.758993, 41.878635, 41.96654, 41.717649, 41.794474, 41.793219, 41.779516, 41.877935, 41.778663, 41.677851, 42.00058, 41.893595, 41.78058],
'lon':[-87.728028, -87.723585, -87.634019, -87.712007, -87.656199, -87.765401, -87.585576, -87.702502, -87.766255, -87.67618, -87.648661, -87.699427, -87.596418, -87.570476, -87.614636, -87.693178, -87.769056, -87.69509, -87.807, -87.701185,
-87.535173, -87.660114, -87.817149, -87.641642, -87.764221, -87.633376, -87.703578, -87.76231, -87.617809, -87.615251, -87.545862, -87.733999, -87.721293, -87.591675, -87.719287, -87.7704, -87.593266, -87.658927, -87.648778, -87.688965,
-87.709291, -87.666016, -87.672907, -87.79816, -87.669165, -87.700805, -87.631463, -87.62572, -87.666409, -87.659941, -87.688257, -87.715202, -87.728443, -87.806912, -87.601357, -87.836891, -87.764558, -87.594188, -87.603805, 
-87.674588, -87.623583, -87.554418, -87.575337, -87.70502, -87.570026, -87.625055, -87.65334, -87.64313, -87.615963, -87.723146, -87.664291, -87.730489, -87.722794, -87.641952, -87.692577, -87.672167, -87.591535],
'name':['Albany Park', 'Archer Heights', 'Armour Square', 'Ashburn', 'Auburn Gresham', 'Austin', 'Avalon Park', 'Avondale', 'Belmont Cragin', 'Beverly', 'Bridgeport', 'Brighton Park', 'Burnside', 'Calumet Heights', 'Chatham', 'Chicago Lawn', 'Clearing', 'Douglas', 'Dunning', 'East Garfield Park', 'East Side', 'Edgewater', 'Edison Park', 'Englewood', 'Forest Glen', 'Fuller Park', 'Gage Park', 'Garfield Ridge', 'Grand Boulevard', 'Greater Grand Crossing', 'Hegewisch', 'Hermosa', 'Humboldt Park', 'Hyde Park', 'Irving Park', 'Jefferson Park', 'Kenwood', 'Lake View', 'Lincoln Park', 'Lincoln Square', 'Logan Square', 'Lower West Side', 'McKinley Park', 'Montclare', 'Morgan Park',
 'Mount Greenwood', 'Near North Side', 'Near South Side', 'Near West Side', 'New City', 'North Center', 'North Lawndale', 'North Park', 'Norwood Park', 'Oakland', 'OHare', 'Portage Park', 'Pullman', 'Riverdale', 'Rogers Park', 'Roseland', 'South Chicago', 'South Deering', 'South Lawndale', 'South Shore', 'The Loop', 'Uptown', 'Washington Heights', 'Washington Park', 'West Elsdon', 'West Englewood', 'West Garfield Park', 'West Lawn', 'West Pullman', 
'West Ridge', 'West Town', 'Woodlawn'],
'ClusterL':[6, 7, 5, 1, 0, 0, 7, 6, 3, 4, 3, 7, 5, 3, 0, 0, 6, 0, 1, 5, 7, 3, 4, 5, 2, 5, 7, 1, 0, 5, 3, 7, 7, 3, 6, 1, 3, 8, 
2, 1, 1, 7, 7, 7, 6, 4, 4, 4, 8, 0, 2, 5, 3, 8, 5, 3, 6, 7, 5, 7, 7, 0, 0, 0, 0, 2, 3, 3, 5, 3, 5, 5, 3, 7, 3, 4, 5,]
})
mks


# add markers to map
for lat, lon, label in zip(mks['lat'], mks['lon'], mks['name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago
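The marker table above is typed in by hand, including the cluster labels. Since `KMeans` is run without a fixed `random_state`, label numbers can change between runs, so hard-coded labels may go stale. A minimal sketch of building the same table from the loaded data instead is shown below; the small `CCAP_df` and `labels` here are stand-ins for the objects created in the earlier cells.

```python
import pandas as pd

# Stand-ins for CCAP_df and the k-means labels from the earlier cells.
CCAP_df = pd.DataFrame({
    "AREANO": [1, 2, 3, 4, 5],
    "GEOG": ["Albany Park", "Archer Heights", "Armour Square", "Ashburn", "Auburn Gresham"],
    "LAT": [41.968327, 41.8079, 41.840755, 41.74969, 41.743377],
    "LON": [-87.728028, -87.723585, -87.634019, -87.712007, -87.656199],
    "TOT_POP": [50343, 13055, 13779, 43986, 45271],
    "MEDINC": [59883, 44109, 27464, 68464, 34661],
})
labels = [6, 7, 5, 1, 0]

# Select and rename the columns used for mapping, then attach the labels.
mks = (
    CCAP_df[["LAT", "LON", "GEOG"]]
    .rename(columns={"LAT": "lat", "LON": "lon", "GEOG": "name"})
    .assign(ClusterL=labels)
)

print(mks)
```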

##### Map 2: Clustered Community Areas

Now we will map the community areas by cluster group. 

In [126]:
# create map
map_chicago = folium.Map(location=[41.8755616, -87.6244212], zoom_start=10)

# set color scheme for the clusters
x = np.arange(9)
ys = [i + x + (i*x)**2 for i in range(9)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, neighborhood, cluster in zip(mks['lat'], mks['lon'], mks['name'], mks['ClusterL']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_chicago)
       
map_chicago

#### Exploring via Foursquare

Next, we are going to start utilizing the Foursquare API to explore each community area.

In [127]:
# Define Foursquare credentials and version

CLIENT_ID = '5XS1RPSUY5Y5WHJS0KJ0DNIG1PVTYBPYELFDE2GMUMJDOWK2' # your Foursquare ID
CLIENT_SECRET = 'DWS4CUCGTKQEIAAYQ5W4VJ44NN4ORDQPIKVTHQIRKVX1AJNZ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: 5XS1RPSUY5Y5WHJS0KJ0DNIG1PVTYBPYELFDE2GMUMJDOWK2
CLIENT_SECRET:DWS4CUCGTKQEIAAYQ5W4VJ44NN4ORDQPIKVTHQIRKVX1AJNZ
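Hard-coding and printing API keys in a shared notebook exposes them to anyone who can read the file. One common alternative, sketched here under assumed names, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, and the example values are fake.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    # FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are hypothetical variable names.
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Fake mapping so the sketch runs without real credentials.
fake_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(fake_env)

print("CLIENT_ID:", mask(CLIENT_ID))
print("CLIENT_SECRET:", mask(CLIENT_SECRET))
```

In a real session, calling `load_foursquare_credentials()` with no argument would read from the process environment.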


In [129]:
#get the neighborhood's name

mks.loc[0, 'name']

'Albany Park'

In [130]:
# get the neighborhood's geocoordinates

neighborhood_latitude = mks.loc[0, 'lat'] # neighborhood latitude value
neighborhood_longitude = mks.loc[0, 'lon'] # neighborhood longitude value

neighborhood_name = mks.loc[0, 'name'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Albany Park are 41.968327, -87.728028.


Now let's get the top 50 venues that are in Albany Park within a radius of 500 meters. First, let's create the GET request URL and name it. 

In [131]:
LIMIT = 50 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=5XS1RPSUY5Y5WHJS0KJ0DNIG1PVTYBPYELFDE2GMUMJDOWK2&client_secret=DWS4CUCGTKQEIAAYQ5W4VJ44NN4ORDQPIKVTHQIRKVX1AJNZ&v=20180605&ll=41.968327,-87.728028&radius=500&limit=50'
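The request URL above is assembled with string formatting. As a sketch of an alternative, the query string can be built from a dictionary with the standard library's `urlencode`, which handles escaping and keeps each parameter labeled. The function name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in the notebook.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; coordinates are those of Albany Park from the notebook.
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                        41.968327, -87.728028, 500, 50)

# Parse the URL back to confirm the parameters survived encoding.
query = parse_qs(urlparse(url).query)
print(query["ll"], query["radius"], query["limit"])
```

With `requests`, the same dictionary can instead be passed as `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.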

In [132]:
#Sending the GET request and examining results.

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60173c6f99d3e66bf4afe7c2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Albany Park',
  'headerFullLocation': 'Albany Park, Chicago',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 30,
  'suggestedBounds': {'ne': {'lat': 41.972827004500004,
    'lng': -87.72198695865384},
   'sw': {'lat': 41.9638269955, 'lng': -87.73406904134615}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b5afeaaf964a520e7dd28e3',
       'name': 'Starbucks',
       'location': {'address': '4830 N Pulaski Rd',
        'lat': 41.96891084604478,
        'lng': -87.72881662266137,
        'labeledLatLngs': [{'labe

In [134]:
# Borrowed helper that extracts the category name of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
print('done.')

done.


In [135]:
#Now we clean the json and structure it into a pandas dataframe.

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Starbucks,Coffee Shop,41.968911,-87.728817
1,Lawrence Fish Market,Seafood Restaurant,41.96828,-87.72625
2,Ssyal Korean Restaurant and Ginseng House,Korean Restaurant,41.968172,-87.733207
3,Dunkin',Donut Shop,41.967933,-87.729699
4,Gamblers,Dive Bar,41.970398,-87.728289
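To show how the nested response turns into the flat table above, here is a small sketch using `pd.json_normalize` on a single trimmed-down item. Only fields visible in the notebook output are filled in, and the category entry is reduced to its name, so the structure is an approximation of the real response rather than a full copy.

```python
import pandas as pd

# One item in the shape of the Foursquare response, trimmed to the fields used here.
items = [{
    "venue": {
        "name": "Starbucks",
        "location": {"lat": 41.96891084604478, "lng": -87.72881662266137},
        "categories": [{"name": "Coffee Shop"}],
    }
}]

# Nested dicts become dotted column names; lists are left as-is.
flat = pd.json_normalize(items)
print(sorted(flat.columns))

# Reduce the list of category dicts to the first category name (None if empty).
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Keep only the last part of each dotted column name.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```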


In [136]:
#To determine number of venues returned by Foursquare.

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

30 venues were returned by Foursquare.


Now I will create a function to repeat the process for all neighborhoods in Chicago.  

In [137]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
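The function above indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for any venue returned without a category. A sketch of a more tolerant extraction step is shown below; `extract_venue_rows` is a hypothetical helper, and the second sample venue is invented purely to exercise the fallback.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category listed.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


sample_items = [
    {"venue": {"name": "Starbucks",
               "location": {"lat": 41.968911, "lng": -87.728817},
               "categories": [{"name": "Coffee Shop"}]}},
    # Hypothetical venue with no category, to exercise the fallback.
    {"venue": {"name": "Unlabeled Venue",
               "location": {"lat": 41.9680, "lng": -87.7280},
               "categories": []}},
]

rows = extract_venue_rows("Albany Park", 41.968327, -87.728028, sample_items)
for row in rows:
    print(row)
```

Keeping the parsing separate from the HTTP call also makes it possible to check the logic without contacting the API.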

In [139]:
# Run this function on every community area and collect the results in a new dataframe.

chicago_venues = getNearbyVenues(names=mks['name'],
                                   latitudes=mks['lat'],
                                   longitudes=mks['lon']
                                  )

Albany Park
Archer Heights
Armour Square
Ashburn
Auburn Gresham
Austin
Avalon Park
Avondale
Belmont Cragin
Beverly
Bridgeport
Brighton Park
Burnside
Calumet Heights
Chatham
Chicago Lawn
Clearing
Douglas
Dunning
East Garfield Park
East Side
Edgewater
Edison Park
Englewood
Forest Glen
Fuller Park
Gage Park
Garfield Ridge
Grand Boulevard
Greater Grand Crossing
Hegewisch
Hermosa
Humboldt Park
Hyde Park
Irving Park
Jefferson Park
Kenwood
Lake View
Lincoln Park
Lincoln Square
Logan Square
Lower West Side
McKinley Park
Montclare
Morgan Park
Mount Greenwood
Near North Side
Near South Side
Near West Side
New City
North Center
North Lawndale
North Park
Norwood Park
Oakland
OHare
Portage Park
Pullman
Riverdale
Rogers Park
Roseland
South Chicago
South Deering
South Lawndale
South Shore
The Loop
Uptown
Washington Heights
Washington Park
West Elsdon
West Englewood
West Garfield Park
West Lawn
West Pullman
West Ridge
West Town
Woodlawn


In [165]:
print(chicago_venues.shape)
chicago_venues.head()

(1400, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Albany Park,41.968327,-87.728028,Starbucks,41.968911,-87.728817,Coffee Shop
1,Albany Park,41.968327,-87.728028,Lawrence Fish Market,41.96828,-87.72625,Seafood Restaurant
2,Albany Park,41.968327,-87.728028,Ssyal Korean Restaurant and Ginseng House,41.968172,-87.733207,Korean Restaurant
3,Albany Park,41.968327,-87.728028,Dunkin',41.967933,-87.729699,Donut Shop
4,Albany Park,41.968327,-87.728028,Gamblers,41.970398,-87.728289,Dive Bar


In [166]:
# Checking to see how many venues were returned for each neighborhood.

chicago_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Albany Park,30,30,30,30,30,30
Archer Heights,22,22,22,22,22,22
Armour Square,11,11,11,11,11,11
Ashburn,4,4,4,4,4,4
Auburn Gresham,4,4,4,4,4,4
Austin,9,9,9,9,9,9
Avalon Park,12,12,12,12,12,12
Avondale,36,36,36,36,36,36
Belmont Cragin,14,14,14,14,14,14
Beverly,16,16,16,16,16,16


In [167]:
# Finding out how many unique categories can be curated from all returned venues.

print('There are {} unique categories.'.format(len(chicago_venues['Venue Category'].unique())))

There are 226 unique categories.


### Analyze the neighborhood venues

In [168]:
# one hot encoding
chicago_onehot = pd.get_dummies(chicago_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chicago_onehot['Neighborhood'] = chicago_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [chicago_onehot.columns[-1]] + list(chicago_onehot.columns[:-1])
chicago_onehot = chicago_onehot[fixed_columns]

#Newchicago_onehot.head()

In [169]:
# New dataframe size

chicago_onehot.shape


(1400, 227)
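The one-hot encoding and per-neighborhood averaging used in this section can be sketched on a tiny frame. The five Albany Park categories are the ones shown in the venue table earlier; "Toy Area" and its two venues are made up so that the averaging has a second group to work on.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["Albany Park"] * 5 + ["Toy Area"] * 2,  # "Toy Area" is hypothetical
    "Venue Category": ["Coffee Shop", "Seafood Restaurant", "Korean Restaurant",
                       "Donut Shop", "Dive Bar", "Coffee Shop", "Coffee Shop"],
})

# One column per category, 1.0 where the venue belongs to that category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators is the share of each category within a neighborhood.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of the result sums to 1, so a value such as 0.2 means that one in five venues returned for that neighborhood falls in that category.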

In [170]:
# Next, group rows by neighborhood and take the mean frequency of occurrence of each category

chicago_grouped = chicago_onehot.groupby('Neighborhood').mean().reset_index()
chicago_grouped



Unnamed: 0,Neighborhood,ATM,African Restaurant,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beach,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Chinese Restaurant,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cuban Restaurant,Currency Exchange,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Film Studio,Fish & Chips Shop,Flower Shop,Food,Food & Drink Shop,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Hardware Store,Health & Beauty Service,Heliport,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Laundromat,Lawyer,Library,Light Rail Station,Liquor Store,Locksmith,Lounge,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Museum,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Night Market,Nightclub,Noodle House,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Pet Service,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Portuguese Restaurant,Post Office,Pub,Record Shop,Recreation Center,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Albany Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Archer Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Armour Square,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Ashburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Auburn Gresham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Austin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Avalon Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Avondale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.027778,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.027778,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Belmont Cragin,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Beverly,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0


In [171]:
# checking size

chicago_grouped.shape

(74, 227)
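
The shape shows 74 rows, while Chicago has 77 community areas, so three areas have no row in `chicago_grouped`. One plausible reason (an assumption, not something this notebook confirms) is that the venue query returned no results for them. A minimal sketch for listing which names are absent follows; `missing_neighborhoods`, `all_areas`, and `toy_grouped` are hypothetical names and toy data used only for illustration.

```python
import pandas as pd

def missing_neighborhoods(all_names, grouped, column="Neighborhood"):
    """Return the names in all_names that have no row in grouped, sorted."""
    present = set(grouped[column])
    return sorted(set(all_names) - present)

# Toy stand-ins (hypothetical data, not the notebook's real frames)
all_areas = ["Area A", "Area B", "Area C", "Area D"]
toy_grouped = pd.DataFrame({
    "Neighborhood": ["Area A", "Area B"],
    "Park": [0.0, 0.5],
})

missing = missing_neighborhoods(all_areas, toy_grouped)
print(missing)
```

With the real data, the first argument would be the full list of community area names and the second would be `chicago_grouped`. Any areas reported this way have no venue profile and cannot be placed by the venue-based comparison.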

In [172]:
# printing each neighborhood with the top 5 common venues

num_top_venues = 5

for hood in chicago_grouped['Neighborhood']:
    print("----" + hood + "----")
    # transpose the neighborhood's single row into (venue, freq) pairs
    temp = chicago_grouped[chicago_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    # drop the first row, which holds the neighborhood name; copy to avoid SettingWithCopyWarning
    temp = temp.iloc[1:].copy()
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albany Park----
                venue  freq
0  Mexican Restaurant  0.10
1         Pizza Place  0.10
2   Korean Restaurant  0.10
3          Hookah Bar  0.07
4          Donut Shop  0.03


----Archer Heights----
                venue  freq
0  Mexican Restaurant  0.18
1                Food  0.09
2              Bakery  0.09
3                Bank  0.05
4         Pizza Place  0.05


----Armour Square----
                venue  freq
0  Chinese Restaurant  0.27
1    Asian Restaurant  0.18
2       Grocery Store  0.09
3       Hot Dog Joint  0.09
4  Italian Restaurant  0.09


----Ashburn----
                 venue  freq
0   Mexican Restaurant  0.25
1  Fried Chicken Joint  0.25
2       Cosmetics Shop  0.25
3   Light Rail Station  0.25
4                  ATM  0.00


----Auburn Gresham----
                  venue  freq
0                  Park  0.50
1        Discount Store  0.25
2      Basketball Court  0.25
3                   ATM  0.00
4  Other Great Outdoors  0.00


----Austin----
[output truncated]

In [163]:
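# Helper that returns the names of the most frequent venue categories for one neighborhood row.

def return_most_common_venues(row, num_top_venues):
    # skip the first entry, which is the neighborhood name
    row_categories = row.iloc[1:]
    # sort frequencies in descending order; tied categories are not ordered meaningfully
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]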
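
To make the helper's behavior concrete, here is a small self-contained sketch that applies the same function to a toy row. The row contents (`Toy Area` and its three categories) are made up for illustration and are not taken from the Chicago data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # same logic as the notebook's helper: drop the name, sort descending, keep the top labels
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row shaped like one row of chicago_grouped:
# first entry is the neighborhood name, the rest are category frequencies.
toy_row = pd.Series({
    "Neighborhood": "Toy Area",
    "Bakery": 0.1,
    "Park": 0.6,
    "Pizza Place": 0.3,
})

top2 = return_most_common_venues(toy_row, 2)
print(list(top2))  # ['Park', 'Pizza Place']
```

The function returns category names only, not their frequencies, so the resulting table cannot show how close two ranked categories are.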

In [173]:
# New dataframe to display top 10 venues in each neighborhood.

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
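for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd use the suffixes listed in indicators
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # every later rank falls back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))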

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = chicago_grouped['Neighborhood']

for ind in np.arange(chicago_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chicago_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,Pizza Place,Mexican Restaurant,Korean Restaurant,Hookah Bar,Bank,Bus Station,Fried Chicken Joint,Seafood Restaurant,Mobile Phone Shop,Sandwich Place
1,Archer Heights,Mexican Restaurant,Bakery,Food,Hot Dog Joint,Pizza Place,Nightclub,Rental Car Location,Restaurant,Seafood Restaurant,Mobile Phone Shop
2,Armour Square,Chinese Restaurant,Asian Restaurant,Hot Dog Joint,Gas Station,Grocery Store,Italian Restaurant,Sandwich Place,Cosmetics Shop,Dog Run,Fast Food Restaurant
3,Ashburn,Mexican Restaurant,Fried Chicken Joint,Cosmetics Shop,Light Rail Station,Fast Food Restaurant,French Restaurant,Fountain,Football Stadium,Food Truck,Food & Drink Shop
4,Auburn Gresham,Park,Discount Store,Basketball Court,Yoga Studio,Fast Food Restaurant,French Restaurant,Fountain,Football Stadium,Food Truck,Food & Drink Shop
5,Austin,Breakfast Spot,Discount Store,Cosmetics Shop,Food,Café,BBQ Joint,Pizza Place,Dog Run,Donut Shop,French Restaurant
6,Avalon Park,Pizza Place,Burger Joint,Boutique,BBQ Joint,Food,Cajun / Creole Restaurant,Fast Food Restaurant,Sandwich Place,Grocery Store,Diner
7,Avondale,Coffee Shop,Food Truck,Mexican Restaurant,Fast Food Restaurant,Supplement Shop,Big Box Store,Pub,Korean Restaurant,Gaming Cafe,Record Shop
8,Belmont Cragin,Mexican Restaurant,Grocery Store,Department Store,Mobile Phone Shop,Laundromat,Thrift / Vintage Store,Gas Station,BBQ Joint,Nightclub,Discount Store
9,Beverly,Pizza Place,Breakfast Spot,Boutique,Burger Joint,Shopping Mall,Juice Bar,Grocery Store,Convenience Store,Cosmetics Shop,Italian Restaurant
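
One caveat when reading this table: some neighborhoods have fewer than ten venue categories with a nonzero frequency. In the top-5 printout, Ashburn's fifth entry (ATM) has a frequency of 0.00, so its later ranks in the table above are zero-frequency categories whose order reflects tie-breaking rather than actual prevalence. The sketch below shows one possible variant that keeps only nonzero categories and pads the rest; `top_venues_nonzero`, the `placeholder` value, and the toy row are hypothetical and not part of the original notebook.

```python
import pandas as pd

def top_venues_nonzero(row, num_top_venues, placeholder="(none)"):
    """Rank only categories with nonzero frequency; pad missing ranks."""
    # skip the neighborhood name and make sure the values are numeric
    freqs = row.iloc[1:].astype(float)
    # a stable sort is requested so the order of tied categories is reproducible
    ranked = freqs[freqs > 0].sort_values(ascending=False, kind="stable")
    names = list(ranked.index[:num_top_venues])
    return names + [placeholder] * (num_top_venues - len(names))

# Hypothetical row with only three nonzero categories
toy_row = pd.Series({
    "Neighborhood": "Toy Area",
    "ATM": 0.0,
    "Park": 0.5,
    "Discount Store": 0.25,
    "Basketball Court": 0.25,
    "Zoo": 0.0,
})

result = top_venues_nonzero(toy_row, 5)
print(result)
```

Padding with a placeholder keeps every row the same length, so the result can still be assigned into a fixed set of "Nth Most Common Venue" columns.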


## This marks the end of the notebook.