### Lowering the crime rate in St. Louis Missouri with Machine Learning

#### Business Problem

In 2018, the crime rate in St. Louis, Missouri was approximately 3 times higher than the U.S. Average, with 187 homocides in 2018 and 205 homocides in 2017. As a result the St. Louis city council and police department would be very interested in trying to find new ways to lower the city's crime rate and make their city a safer place to live. 

One suggestion is to regulate the types of businesses that populate each neighborhood if a relationship between business type and crime could be found. The core concept rooted in this idea is that the types of venues in a neighborhood can influence local crime. The goal for this project will be to determine if St Louis neighborhoods have a relationship between crime type and business type so that the city council and police department can take appropriate action with regulations and make the city safer. 

In order to accomplish this, the neighborhoods in St. Louis will be clustered using machine learning methods from data on crime and business type. The clusters will then be displayed on a map and analyzed to determine the key factors associated in neighborhoods with higher versus lower crime rates.


#### Data

The data that will be used to execute this project is publicly available crime data from St. Louis Missouri, along with data via Foursquare API on the venues in the neighborhoods in Missouri. The crime data will contain all recorded crimes over the past year from November 2018 to November 2019. It is important to get a sample of a full year of data to account for outliers produced in seasonal crime differences based on a variety of factors such as weather and holidays. The datasets will be combined to show the different business types in the neighborhoods and the different crimes per neighborhood. Once the dataset is prepared, machine learning clustering will be used to cluster the neighborhoods and further analysis will be applied to the clusters to determine the outcomes.

#### Code

The first step is to import the data and input it into dataframes. The Foursquare data is from their API. The St. Louis crime data is from their city website. I downloaded the crime data for every month November 2018 - November 2019 and then combined all the datasets in excel into one large dataset for the year. The dataset did not contain the neighborhood names, only the neighborhood codes. I needed to place the neighborhood names and codes into a separate dataset in excel. I got those names by code in a pdf file on the St. Louis website. 

First I will import the necessary packages for this project.

In [5]:
import requests
#I need to install lxml
!conda install -c anaconda lxml --yes
import lxml.html as lh
import pandas as pd
import numpy as np

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - lxml


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |           py36_0         156 KB  anaconda

The following packages will be UPDATED:

    certifi: 2019.11.28-py36_0 --> 2019.11.28-py36_0 anaconda
    openssl: 1.1.1d-h7b6447c_3 --> 1.1.1-h7b6447c_0  anaconda


Downloading and Extracting Packages
certifi-2019.11.28   | 156 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


In [6]:
crimes = pd.read_csv('St.Louis_Crime_Data.CSV')
crimes.head()

Unnamed: 0,Complaint,CodedMonth,DateOccur,FlagCrime,FlagUnfounded,FlagAdministrative,Count,FlagCleanup,Crime,District,Description,ILEADSAddress,ILEADSStreet,Neighborhood,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord
0,19-056796,2019-11,1/1/1987 0:01,Y,,,1,,21000,4,RAPE -- FORCIBLE,900,N JEFFERSON AVE,59,,@CENTRAL PATROL DIVISION,900,JEFFERSON,0.0,0.0
1,19-056649,2019-11,1/1/1990 0:01,Y,,,1,,21000,1,RAPE -- FORCIBLE,4700,S BROADWAY,17,,,1900,OLIVE,0.0,0.0
2,19-059156,2019-11,1/1/2017 12:00,Y,,,1,,115400,6,STLG BY DECEIT/IDENTITY THEFT REPORT,4441,ELMBANK AVE,56,,,4441,ELMBANK,894487.9,1032053.0
3,19-057856,2019-11,1/1/2018 12:00,Y,,,1,,121000,6,EMBEZZLEMENT-VALUE OVER $150,4101,GERALDINE AVE,70,,,4101,GERALDINE,890309.1,1037199.0
4,19-057961,2019-11,1/20/2019 8:30,Y,,,1,,65701,1,LARCENY-MTR VEH PARTS UNDER $500,7807,S BROADWAY,2,,,7807,BROADWAY,888858.2,987721.4


In [7]:
neighborhoods = pd.read_csv('St.Louis_Neighborhoods.csv')
neighborhoods.head()

Unnamed: 0,Neighborhood Number,Neighborhood Name
0,1,Cardonlet
1,2,Patch
2,3,Holly Hills
3,4,Boulevard Heights
4,5,Bevo Mill


In [8]:
#rename the primary key neighborhood number to neighborhood in preparation for merge
neighborhoods = neighborhoods.rename(columns={'Neighborhood Number ': 'Neighborhood'})
neighborhoods.head()

Unnamed: 0,Neighborhood,Neighborhood Name
0,1,Cardonlet
1,2,Patch
2,3,Holly Hills
3,4,Boulevard Heights
4,5,Bevo Mill


In [9]:

df_louis = pd.merge(crimes,
                 neighborhoods[[ 'Neighborhood','Neighborhood Name']],
                 on='Neighborhood')
df_louis.head()

Unnamed: 0,Complaint,CodedMonth,DateOccur,FlagCrime,FlagUnfounded,FlagAdministrative,Count,FlagCleanup,Crime,District,...,ILEADSAddress,ILEADSStreet,Neighborhood,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name
0,19-056796,2019-11,1/1/1987 0:01,Y,,,1,,21000,4,...,900,N JEFFERSON AVE,59,,@CENTRAL PATROL DIVISION,900,JEFFERSON,0.0,0.0,Jeff Vanderlou
1,19-049995,2019-11,10/5/2019 19:25,Y,,,1,,95100,4,...,2944,SHERIDAN AVE,59,,CITY STREET,2944,SHERIDAN,900398.8,1023611.0,Jeff Vanderlou
2,19-054335,2019-11,10/28/2019 6:00,Y,,,1,,51312,4,...,2914,THOMAS ST,59,,,2914,THOMAS,900662.7,1023294.0,Jeff Vanderlou
3,19-055103,2019-11,10/28/2019 23:30,Y,,,1,,67701,4,...,2917,JAMES COOL PAPA BELL AVE,59,,,2917,JAMES COOL PAPA BELL,900589.0,1023158.0,Jeff Vanderlou
4,19-055311,2019-11,10/31/2019 3:00,Y,,,1,,21000,4,...,2500,N GRAND BLVD,59,,@HOTEL- GRAND MOTEL,2500,GRAND,0.0,0.0,Jeff Vanderlou


Next I will add the number of crimes per neighborhood to the dataset

In [10]:
#get number of crimes
crime_number= df_louis.groupby('Neighborhood').count()
crime_number.head()

Unnamed: 0_level_0,Complaint,CodedMonth,DateOccur,FlagCrime,FlagUnfounded,FlagAdministrative,Count,FlagCleanup,Crime,District,Description,ILEADSAddress,ILEADSStreet,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
1,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869
2,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621
3,305,305,305,305,305,305,305,305,305,305,305,305,305,305,305,305,305,305,305,305
4,508,508,508,508,508,508,508,508,508,508,508,508,508,508,508,508,508,508,508,508
5,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276


In [11]:
#need to reset the index in preparation for a merge
crime2= crime_number.reset_index() 
crime2.head()

Unnamed: 0,Neighborhood,Complaint,CodedMonth,DateOccur,FlagCrime,FlagUnfounded,FlagAdministrative,Count,FlagCleanup,Crime,...,Description,ILEADSAddress,ILEADSStreet,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name
0,1,1869,1869,1869,1869,1869,1869,1869,1869,1869,...,1869,1869,1869,1869,1869,1869,1869,1869,1869,1869
1,2,621,621,621,621,621,621,621,621,621,...,621,621,621,621,621,621,621,621,621,621
2,3,305,305,305,305,305,305,305,305,305,...,305,305,305,305,305,305,305,305,305,305
3,4,508,508,508,508,508,508,508,508,508,...,508,508,508,508,508,508,508,508,508,508
4,5,1276,1276,1276,1276,1276,1276,1276,1276,1276,...,1276,1276,1276,1276,1276,1276,1276,1276,1276,1276


In [12]:
df_louis2 = pd.merge(df_louis,
                 crime2[[ 'Neighborhood','Crime']],
                 on='Neighborhood')
df_louis2.head()

Unnamed: 0,Complaint,CodedMonth,DateOccur,FlagCrime,FlagUnfounded,FlagAdministrative,Count,FlagCleanup,Crime_x,District,...,ILEADSStreet,Neighborhood,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name,Crime_y
0,19-056796,2019-11,1/1/1987 0:01,Y,,,1,,21000,4,...,N JEFFERSON AVE,59,,@CENTRAL PATROL DIVISION,900,JEFFERSON,0.0,0.0,Jeff Vanderlou,1230
1,19-049995,2019-11,10/5/2019 19:25,Y,,,1,,95100,4,...,SHERIDAN AVE,59,,CITY STREET,2944,SHERIDAN,900398.8,1023611.0,Jeff Vanderlou,1230
2,19-054335,2019-11,10/28/2019 6:00,Y,,,1,,51312,4,...,THOMAS ST,59,,,2914,THOMAS,900662.7,1023294.0,Jeff Vanderlou,1230
3,19-055103,2019-11,10/28/2019 23:30,Y,,,1,,67701,4,...,JAMES COOL PAPA BELL AVE,59,,,2917,JAMES COOL PAPA BELL,900589.0,1023158.0,Jeff Vanderlou,1230
4,19-055311,2019-11,10/31/2019 3:00,Y,,,1,,21000,4,...,N GRAND BLVD,59,,@HOTEL- GRAND MOTEL,2500,GRAND,0.0,0.0,Jeff Vanderlou,1230


Now that we have the number of crimes per neighborhood we can just get the list of neighborhoods without listing every crime. I will not dive into crime type for this analysis, though that is a good step for a later analysis.

In [13]:
#get shape for comparison in next code bucket
df_louis2.shape

(52164, 22)

In [14]:
#remove duplicates
df_louis3= df_louis2.drop_duplicates('Neighborhood')
df_louis3.head()

Unnamed: 0,Complaint,CodedMonth,DateOccur,FlagCrime,FlagUnfounded,FlagAdministrative,Count,FlagCleanup,Crime_x,District,...,ILEADSStreet,Neighborhood,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name,Crime_y
0,19-056796,2019-11,1/1/1987 0:01,Y,,,1,,21000,4,...,N JEFFERSON AVE,59,,@CENTRAL PATROL DIVISION,900,JEFFERSON,0.0,0.0,Jeff Vanderlou,1230
1230,19-056649,2019-11,1/1/1990 0:01,Y,,,1,,21000,1,...,S BROADWAY,17,,,1900,OLIVE,0.0,0.0,Mount Pleasant,824
2054,19-059156,2019-11,1/1/2017 12:00,Y,,,1,,115400,6,...,ELMBANK AVE,56,,,4441,ELMBANK,894487.9,1032053.0,The Great Ville,1017
3071,19-057856,2019-11,1/1/2018 12:00,Y,,,1,,121000,6,...,GERALDINE AVE,70,,,4101,GERALDINE,890309.1,1037199.0,Mark Twain I-70 Ind.,499
3570,19-057961,2019-11,1/20/2019 8:30,Y,,,1,,65701,1,...,S BROADWAY,2,,,7807,BROADWAY,888858.2,987721.4,Patch,621


In [15]:
#get shape for comparison in to original dataset
df_louis3.shape

(86, 22)

There are 86 neighborhoods so that means we now how a dataset with neighborhood, name, coordinates, and number of crimes. Next I want to get a dataset with just the crime description and the neighborhood fo clustering later on.

In [16]:

df_st_louis=df_louis.drop([ 'Complaint', 'CodedMonth','DateOccur','FlagCrime', 'FlagUnfounded', 'FlagAdministrative', 'Count'],axis=1)
df_st_louis.head()

Unnamed: 0,FlagCleanup,Crime,District,Description,ILEADSAddress,ILEADSStreet,Neighborhood,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name
0,,21000,4,RAPE -- FORCIBLE,900,N JEFFERSON AVE,59,,@CENTRAL PATROL DIVISION,900,JEFFERSON,0.0,0.0,Jeff Vanderlou
1,,95100,4,"STALKING (HARASSMENT ONLY, NO THREAT)",2944,SHERIDAN AVE,59,,CITY STREET,2944,SHERIDAN,900398.8,1023611.0,Jeff Vanderlou
2,,51312,4,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,2914,THOMAS ST,59,,,2914,THOMAS,900662.7,1023294.0,Jeff Vanderlou
3,,67701,4,LARCENY-FROM BUILDING UNDER $500,2917,JAMES COOL PAPA BELL AVE,59,,,2917,JAMES COOL PAPA BELL,900589.0,1023158.0,Jeff Vanderlou
4,,21000,4,RAPE -- FORCIBLE,2500,N GRAND BLVD,59,,@HOTEL- GRAND MOTEL,2500,GRAND,0.0,0.0,Jeff Vanderlou


In [17]:
df_st_louis2=df_st_louis.drop([ 'FlagCleanup','Crime', 'District', 'ILEADSAddress','ILEADSStreet', 'LocationName', 'LocationComment', 'CADAddress','CADStreet','XCoord','YCoord'],axis=1)
df_st_louis2.head()

Unnamed: 0,Description,Neighborhood,Neighborhood Name
0,RAPE -- FORCIBLE,59,Jeff Vanderlou
1,"STALKING (HARASSMENT ONLY, NO THREAT)",59,Jeff Vanderlou
2,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,59,Jeff Vanderlou
3,LARCENY-FROM BUILDING UNDER $500,59,Jeff Vanderlou
4,RAPE -- FORCIBLE,59,Jeff Vanderlou


In [18]:
df_st_louis3= df_st_louis2.drop(['Neighborhood'],axis=1)
df_st_louis3.head(1)

Unnamed: 0,Description,Neighborhood Name
0,RAPE -- FORCIBLE,Jeff Vanderlou


Next I will visualize the neighborhoods and import the Foursquare API data and then move into clustering. I will need to import a few more packages for that.

In [19]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge

The following packages will be UPDATED:

    certifi: 2019.11.28-py36_0 anaconda --> 2019.11.28-py36_0 conda-forge

The following packages will be DOWNGRADED:

    openssl: 1.1.1-h7b6447c_0  anaconda --> 1.1.1d-h516909a_0 conda-forge


Downloading and Extracting Packages
certifi-2019.11.28   | 149 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Solving environment: done


  current version: 4.5.11
  lates

In [20]:
address = 'St. Louis, Missouri'

geolocator = Nominatim(user_agent="st_louis_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of St. Louis are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of St. Louis are 38.6268039, -90.1994097.


To visualize the neighborhoods we'll need to add in their location data. This data will be imported in a separate dataset which I collected via google.

In [21]:
louis_location = pd.read_csv('louis_location.csv')
l_location= louis_location.rename(columns={'Neighborhood Number ': 'Neighborhood'})
l_location.head()

Unnamed: 0,Neighborhood,Neighborhood Name,Latitude,Longitude
0,1,Cardonlet,38.560977,-90.251337
1,2,Patch,38.547032,-90.26136
2,3,Holly Hills,38.569257,-90.261762
3,4,Boulevard Heights,38.56226,-90.280278
4,5,Bevo Mill,38.585982,-90.267534


Now I'll merge the dasets.

In [22]:
#need to reset the index in preparation for a merge
df_louis_merge= df_louis3.reset_index()
#df_louis_merge = df_louis3.drop(['level_0', 'index'], axis=1)
df_louis_merge.head()

Unnamed: 0,index,Complaint,CodedMonth,DateOccur,FlagCrime,FlagUnfounded,FlagAdministrative,Count,FlagCleanup,Crime_x,...,ILEADSStreet,Neighborhood,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name,Crime_y
0,0,19-056796,2019-11,1/1/1987 0:01,Y,,,1,,21000,...,N JEFFERSON AVE,59,,@CENTRAL PATROL DIVISION,900,JEFFERSON,0.0,0.0,Jeff Vanderlou,1230
1,1230,19-056649,2019-11,1/1/1990 0:01,Y,,,1,,21000,...,S BROADWAY,17,,,1900,OLIVE,0.0,0.0,Mount Pleasant,824
2,2054,19-059156,2019-11,1/1/2017 12:00,Y,,,1,,115400,...,ELMBANK AVE,56,,,4441,ELMBANK,894487.9,1032053.0,The Great Ville,1017
3,3071,19-057856,2019-11,1/1/2018 12:00,Y,,,1,,121000,...,GERALDINE AVE,70,,,4101,GERALDINE,890309.1,1037199.0,Mark Twain I-70 Ind.,499
4,3570,19-057961,2019-11,1/20/2019 8:30,Y,,,1,,65701,...,S BROADWAY,2,,,7807,BROADWAY,888858.2,987721.4,Patch,621


In [23]:
df_stlouis = pd.merge(df_louis_merge,
                 l_location[[ 'Neighborhood','Latitude', 'Longitude']],
                 on='Neighborhood')
df_stlouis.head()

Unnamed: 0,index,Complaint,CodedMonth,DateOccur,FlagCrime,FlagUnfounded,FlagAdministrative,Count,FlagCleanup,Crime_x,...,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name,Crime_y,Latitude,Longitude
0,0,19-056796,2019-11,1/1/1987 0:01,Y,,,1,,21000,...,,@CENTRAL PATROL DIVISION,900,JEFFERSON,0.0,0.0,Jeff Vanderlou,1230,38.652831,-90.221036
1,1230,19-056649,2019-11,1/1/1990 0:01,Y,,,1,,21000,...,,,1900,OLIVE,0.0,0.0,Mount Pleasant,824,38.575178,-90.235219
2,2054,19-059156,2019-11,1/1/2017 12:00,Y,,,1,,115400,...,,,4441,ELMBANK,894487.9,1032053.0,The Great Ville,1017,38.666156,-90.235429
3,3071,19-057856,2019-11,1/1/2018 12:00,Y,,,1,,121000,...,,,4101,GERALDINE,890309.1,1037199.0,Mark Twain I-70 Ind.,499,38.686092,-90.261059
4,3570,19-057961,2019-11,1/20/2019 8:30,Y,,,1,,65701,...,,,7807,BROADWAY,888858.2,987721.4,Patch,621,38.547032,-90.26136


Before Importing Foursquare data I will first visualize the neighborhoods.

In [24]:
print('The geograpical coordinates of St. Louis are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of St. Louis are 38.6268039, -90.1994097.


In [25]:
# create map of St. Louis using latitude and longitude values
map_st_louis = folium.Map(location=[latitude, longitude], zoom_start=11)

#add markers to map
for lat, lng, label in zip(df_stlouis['Latitude'], df_stlouis['Longitude'], df_stlouis['Neighborhood Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_st_louis)  
    
map_st_louis



Now that the neighborhoods have been successfully visualized, let's input the foursquare venue data.

In [26]:
CLIENT_ID = 'ZTHXBVFLF30SOA4CR21SYOCX50C4O452POKT5Y1PDXL1FHSQ' # your Foursquare ID
CLIENT_SECRET = 'J01LSJTD30T4UMNX24024NG5EWMU3K3RCM1BTDUPHFPAWIKY' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: ZTHXBVFLF30SOA4CR21SYOCX50C4O452POKT5Y1PDXL1FHSQ
CLIENT_SECRET:J01LSJTD30T4UMNX24024NG5EWMU3K3RCM1BTDUPHFPAWIKY


In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        LIMIT = 100 # limit of number of venues returned by Foursquare API

        radius = 500 # define radius
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [30]:
stlouis_venues = getNearbyVenues(names=df_stlouis['Neighborhood Name'],
                                   latitudes=df_stlouis['Latitude'],
                                   longitudes=df_stlouis['Longitude'])

Jeff Vanderlou
Mount Pleasant
The Great Ville
Mark Twain I-70 Ind.
Patch
Downtown
Central West End
Academy
Dutchtown
Bevo Mill
Fairgrounds Park
Midtown
Mark Twain
Tower Grove South
Peabody-Darst-Webbe
St. Louis Hills
Carr Square
The Hill
Vandeventer
Baden
Cardonlet
Columbus Square
Old North St. Louis
Tiffany
South Hampton
Tower Grove East
Walnut Park West
Forest Park
Walnut Park East
Wells-Goodfellow
Penrose
Kingsway West
College Hill
Lewis Place
Lasalle
Kings Oak
Forest Park SE
Marine Villa
Benton Park West
Holly Hills
DeBaliviere Place
Cheltenham
Clifton Heights
North Riverfront
Gravois Park
Kingsway East
North Point
North Hampton
O'Fallon
Downtown West
Near N.  Riverfront
Kosciusko
McKinley Heights
The Gate District
Shaw
O'Fallon Park
Hamilton Heights
Lindenwood Park
Princeton Heights
Ellendale
Southwest Garden
Riverview
The Ville
Fountain Park
Soulard
St. Louis Place
Hyde Park
Clayton-Tamm
Visitation Park
Boulevard Heights
Hi-Point
West End
Fairground Neighborhood
Benton Park
Compt

Next let's check the dimensions of the data and a little of its content

In [31]:
print(stlouis_venues.shape)
stlouis_venues.head()

(869, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Jeff Vanderlou,38.652831,-90.221036,I.P.,38.654319,-90.221679,Nightlife Spot
1,Jeff Vanderlou,38.652831,-90.221036,Imperial Palace,38.653498,-90.222586,Nightlife Spot
2,Jeff Vanderlou,38.652831,-90.221036,J & B Package Licquor,38.652741,-90.222605,Liquor Store
3,Jeff Vanderlou,38.652831,-90.221036,Mother's Fish,38.654816,-90.22107,Fish & Chips Shop
4,Jeff Vanderlou,38.652831,-90.221036,JJS WAREHOUSE,38.649858,-90.219435,Furniture / Home Store


Now let's find venue number per neighborhood

In [32]:
stlouis_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Academy,4,4,4,4,4,4
Baden,4,4,4,4,4,4
Benton Park,26,26,26,26,26,26
Benton Park West,18,18,18,18,18,18
Bevo Mill,9,9,9,9,9,9
...,...,...,...,...,...,...
Walnut Park East,6,6,6,6,6,6
Walnut Park West,1,1,1,1,1,1
Wells-Goodfellow,4,4,4,4,4,4
West End,5,5,5,5,5,5


Next will be neighborhood analysis with one hot encoding

In [33]:
# one hot encoding
stlouis_onehot = pd.get_dummies(stlouis_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
stlouis_onehot['Neighborhood'] = stlouis_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [stlouis_onehot.columns[-1]] + list(stlouis_onehot.columns[:-1])
stlouis_onehot = stlouis_onehot[fixed_columns]

stlouis_onehot.head()

Unnamed: 0,Neighborhood,ATM,Advertising Agency,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Video Game Store,Video Store,Vietnamese Restaurant,Waste Facility,Water Park,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
stlouis_onehot.shape

(869, 180)

In [44]:
#find frequency of venue appearance
stlouis_grouped = stlouis_onehot.groupby('Neighborhood').mean().reset_index()
stlouis_grouped

Unnamed: 0,Neighborhood,ATM,Advertising Agency,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Video Game Store,Video Store,Vietnamese Restaurant,Waste Facility,Water Park,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Academy,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000
1,Baden,0.0,0.0,0.250000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000
2,Benton Park,0.0,0.0,0.038462,0.0,0.000000,0.038462,0.0,0.0,0.0,...,0.0,0.00,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.000000
3,Benton Park West,0.0,0.0,0.000000,0.0,0.000000,0.055556,0.0,0.0,0.0,...,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.055556
4,Bevo Mill,0.0,0.0,0.000000,0.0,0.111111,0.000000,0.0,0.0,0.0,...,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77,Walnut Park East,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000
78,Walnut Park West,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000
79,Wells-Goodfellow,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000
80,West End,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000


Now we can examine the top 5 venues by frequency.

In [36]:
num_top_venues = 5

for hood in stlouis_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = stlouis_grouped[stlouis_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Academy----
                   venue  freq
0     Chinese Restaurant  0.25
1      Convenience Store  0.25
2  Outdoors & Recreation  0.25
3            Video Store  0.25
4              Pet Store  0.00


----Baden----
                 venue  freq
0  American Restaurant  0.25
1       Discount Store  0.25
2          Pizza Place  0.25
3        Grocery Store  0.25
4                  ATM  0.00


----Benton Park----
                  venue  freq
0                   Bar  0.08
1  Fast Food Restaurant  0.08
2        Breakfast Spot  0.08
3              Dive Bar  0.08
4        Sandwich Place  0.08


----Benton Park West----
                venue  freq
0  Mexican Restaurant  0.33
1         Pizza Place  0.11
2             Gay Bar  0.06
3                 Bar  0.06
4              Bakery  0.06


----Bevo Mill----
                venue  freq
0          Restaurant  0.22
1         Rugby Pitch  0.11
2              Arcade  0.11
3  Italian Restaurant  0.11
4      Discount Store  0.11


----Boulevard Heights

Clearly, the neighborhoods differ significantly based on the types of venues that are in them. This lends support to the analytical puropose of this project as we continue.

In [37]:
#placing data in dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [38]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = stlouis_grouped['Neighborhood']

for ind in np.arange(stlouis_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(stlouis_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Academy,Convenience Store,Video Store,Chinese Restaurant,Outdoors & Recreation,Yoga Studio,Event Space,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival
1,Baden,American Restaurant,Discount Store,Pizza Place,Grocery Store,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant
2,Benton Park,Fast Food Restaurant,Breakfast Spot,Sandwich Place,Dive Bar,Bar,Brazilian Restaurant,BBQ Joint,Performing Arts Venue,Sports Bar,Seafood Restaurant
3,Benton Park West,Mexican Restaurant,Pizza Place,Yoga Studio,Intersection,Taco Place,Locksmith,Bar,Gay Bar,Bakery,Convenience Store
4,Bevo Mill,Restaurant,Mexican Restaurant,Arcade,Rugby Pitch,Discount Store,Italian Restaurant,Lounge,German Restaurant,Falafel Restaurant,Flower Shop


I will examine the data to make sure data is as we would expect.

In [39]:
neighborhoods_venues_sorted.shape

(82, 11)

It looks like a few neighborhoods didn't have venues, an that is okay. The neighborhoods that do have venues will be the ones used in this analysis. Some of the venues that were originally included were parks and cemeteries, so this is expected.

Now, before I cluster the neighborhoods, I need to do the same steps above to the crime dataset to get frequency of crimes per neighborhood.


In [46]:
st_louis_crimes=df_st_louis3.rename(columns={'Description':'Crime','Neighborhood Name':'Neighborhood'})
st_louis_crimes.head(1)

Unnamed: 0,Crime,Neighborhood
0,RAPE -- FORCIBLE,Jeff Vanderlou


In [47]:
# one hot encoding
stlouis_onehot2 = pd.get_dummies(st_louis_crimes[['Crime']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
stlouis_onehot2['Neighborhood'] = st_louis_crimes['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [stlouis_onehot2.columns[-1]] + list(stlouis_onehot2.columns[:-1])
stlouis_onehot2 = stlouis_onehot2[fixed_columns]

stlouis_onehot2.head()

Unnamed: 0,Neighborhood,ABANDONMENT OF A CORPSE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 2ND DEGREE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 3RD DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 1ST DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 2ND DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 3RD DEGREE,AGG.ASSAULT-FIREARM/POLC.OFFICER 1ST DEGREE,AGG.ASSAULT-FIREARM/POLC.OFFICER 3RD DEGREE,...,"STALKING (HARASSMENT ONLY, NO THREAT)",STLG BY DECEIT/IDENTITY THEFT REPORT,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",WEAPONS-CITY VIOL/DISCHRGING IN CITY,WEAPONS-OTHR UNSPEC WEAPONS VIOLATION,WEAPONS-STATE VIOL/DEFACED FIREARM,WEAPONS-STATE VIOL/UNLWFL TRANSFER,WEAPONS-STATE VIOL/UNLWFL USE/CARRYING,WEAPONS-STATE VIOL/UNLWFL USE/DISCHARGING,WEAPONS-STATE VIOL/UNLWFL USE/POSSESSION
0,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Jeff Vanderlou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [48]:
stlouis_onehot2.shape

(52164, 299)

In [51]:
#find frequency of crime appearance
stlouis_crime_grouped = stlouis_onehot2.groupby('Neighborhood').mean().reset_index()
stlouis_crime_grouped.head(2)

Unnamed: 0,Neighborhood,ABANDONMENT OF A CORPSE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 2ND DEGREE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 3RD DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 1ST DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 2ND DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 3RD DEGREE,AGG.ASSAULT-FIREARM/POLC.OFFICER 1ST DEGREE,AGG.ASSAULT-FIREARM/POLC.OFFICER 3RD DEGREE,...,"STALKING (HARASSMENT ONLY, NO THREAT)",STLG BY DECEIT/IDENTITY THEFT REPORT,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",WEAPONS-CITY VIOL/DISCHRGING IN CITY,WEAPONS-OTHR UNSPEC WEAPONS VIOLATION,WEAPONS-STATE VIOL/DEFACED FIREARM,WEAPONS-STATE VIOL/UNLWFL TRANSFER,WEAPONS-STATE VIOL/UNLWFL USE/CARRYING,WEAPONS-STATE VIOL/UNLWFL USE/DISCHARGING,WEAPONS-STATE VIOL/UNLWFL USE/POSSESSION
0,Academy,0.0,0.032727,0.001818,0.016364,0.001818,0.0,0.003636,0.0,0.0,...,0.0,0.0,0.043636,0.005455,0.0,0.001818,0.0,0.010909,0.0,0.018182
1,Baden,0.0,0.056469,0.0,0.006433,0.009292,0.0,0.0,0.00143,0.000715,...,0.000715,0.005718,0.013581,0.00143,0.0,0.0,0.0,0.003574,0.00143,0.010722


Now we examine top 9 crimes by frequency.

In [50]:
num_top_crimes = 5

for hood in stlouis_crime_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = stlouis_crime_grouped[stlouis_crime_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['crime','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Academy----
                                            crime  freq
0                       LEAVING SCENE OF ACCIDENT  0.10
1     DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP  0.09
2                LARCENY-MTR VEH PARTS UNDER $500  0.05
3  STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET  0.04
4    AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE  0.03
5                 LARCENY-FROM MTR VEH UNDER $500  0.03
6          ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC  0.03
7         AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR  0.03
8                        DRUGS-POSSESSION/COCAINE  0.02
9                     LARCENY-SHOPLIFT UNDER $500  0.02


----Baden----
                                          crime  freq
0                     LEAVING SCENE OF ACCIDENT  0.12
1   DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP  0.08
2  AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE  0.06
3       AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR  0.05
4              LARCENY-MTR VEH PARTS UNDER $500  0.05
5        ASSAULT, ADULT, AGE

In [52]:
#placing data in dataframe
def return_most_common_crimes(row, num_top_crimes):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_crimes]

In [53]:
num_top_crimes = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Crime'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Crime'.format(ind+1))

# create a new dataframe
neighborhoods_crimes_sorted = pd.DataFrame(columns=columns)
neighborhoods_crimes_sorted['Neighborhood'] = stlouis_crime_grouped['Neighborhood']

for ind in np.arange(stlouis_grouped.shape[0]):
    neighborhoods_crimes_sorted.iloc[ind, 1:] = return_most_common_crimes(stlouis_crime_grouped.iloc[ind, :], num_top_crimes)

neighborhoods_crimes_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime,6th Most Common Crime,7th Most Common Crime,8th Most Common Crime,9th Most Common Crime,10th Most Common Crime
0,Academy,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-FROM MTR VEH UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,LARCENY-SHOPLIFT UNDER $500
1,Baden,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-MTR VEH PARTS UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-ALL OTHER UNDER $500,HEALTH-SANITATION VIOL,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",LARCENY-FROM MTR VEH UNDER $500,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION
2,Benton Park,LEAVING SCENE OF ACCIDENT,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-ALL OTHER UNDER $500,LARCENY-FROM MTR VEH UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,"LARCENY-FROM MTR VEH $500 - $24,999",SIMPLE ASSAULT-ADULT/NO INJURY,"LARCENY-ALL OTHER $500 - $24,999",AUTO THEFT-PERM RETNT/JOY RIDE
3,Benton Park West,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-FROM MTR VEH UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,LARCENY-ALL OTHER UNDER $500,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",AUTO THEFT-PERM RETNT/JOY RIDE,LARCENY-FROM BUILDING UNDER $500
4,Bevo Mill,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-ALL OTHER UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,LARCENY-FROM MTR VEH UNDER $500,LARCENY-FROM BUILDING UNDER $500,SIMPLE ASSAULT-ADULT/NO INJURY


Let's examine this new dataset.

In [54]:
neighborhoods_crimes_sorted.shape

(86, 11)

We now have investigated the qualitative results for the top venues and crimes in each neighborhood. This is not specifically needed for clustering, but it will be helpful when the analysis section comes and I am analyzing my clusters.

Now, I'll merge crime frequency by neighborhood with the St. Louis frequency of venues dataset for clustering.

In [55]:
stlouis_crime_grouped.head(1)

Unnamed: 0,Neighborhood,ABANDONMENT OF A CORPSE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 2ND DEGREE,AGG.ASSAULT-FIREARM/CITIZEN ADULT 3RD DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 1ST DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 2ND DEGREE,AGG.ASSAULT-FIREARM/CITIZEN CHILD 3RD DEGREE,AGG.ASSAULT-FIREARM/POLC.OFFICER 1ST DEGREE,AGG.ASSAULT-FIREARM/POLC.OFFICER 3RD DEGREE,...,"STALKING (HARASSMENT ONLY, NO THREAT)",STLG BY DECEIT/IDENTITY THEFT REPORT,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",WEAPONS-CITY VIOL/DISCHRGING IN CITY,WEAPONS-OTHR UNSPEC WEAPONS VIOLATION,WEAPONS-STATE VIOL/DEFACED FIREARM,WEAPONS-STATE VIOL/UNLWFL TRANSFER,WEAPONS-STATE VIOL/UNLWFL USE/CARRYING,WEAPONS-STATE VIOL/UNLWFL USE/DISCHARGING,WEAPONS-STATE VIOL/UNLWFL USE/POSSESSION
0,Academy,0.0,0.032727,0.001818,0.016364,0.001818,0.0,0.003636,0.0,0.0,...,0.0,0.0,0.043636,0.005455,0.0,0.001818,0.0,0.010909,0.0,0.018182


In [56]:
stlouis_grouped.head(1)


Unnamed: 0,Neighborhood,ATM,Advertising Agency,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Video Game Store,Video Store,Vietnamese Restaurant,Waste Facility,Water Park,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Academy,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [65]:
neighborhoods_clusters = pd.merge(stlouis_grouped, stlouis_crime_grouped, on = ['Neighborhood'], how = 'outer')


neighborhoods_clusters

Unnamed: 0,Neighborhood,ATM,Advertising Agency,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,"STALKING (HARASSMENT ONLY, NO THREAT)",STLG BY DECEIT/IDENTITY THEFT REPORT,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",WEAPONS-CITY VIOL/DISCHRGING IN CITY,WEAPONS-OTHR UNSPEC WEAPONS VIOLATION,WEAPONS-STATE VIOL/DEFACED FIREARM,WEAPONS-STATE VIOL/UNLWFL TRANSFER,WEAPONS-STATE VIOL/UNLWFL USE/CARRYING,WEAPONS-STATE VIOL/UNLWFL USE/DISCHARGING,WEAPONS-STATE VIOL/UNLWFL USE/POSSESSION
0,Academy,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.000000,0.000000,0.043636,0.005455,0.0,0.001818,0.0,0.010909,0.000000,0.018182
1,Baden,0.0,0.0,0.250000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.000715,0.005718,0.013581,0.001430,0.0,0.000000,0.0,0.003574,0.001430,0.010722
2,Benton Park,0.0,0.0,0.038462,0.0,0.000000,0.038462,0.0,0.0,0.0,...,0.000000,0.010152,0.012690,0.007614,0.0,0.000000,0.0,0.002538,0.000000,0.010152
3,Benton Park West,0.0,0.0,0.000000,0.0,0.000000,0.055556,0.0,0.0,0.0,...,0.000000,0.006873,0.005155,0.012027,0.0,0.000000,0.0,0.003436,0.000000,0.003436
4,Bevo Mill,0.0,0.0,0.000000,0.0,0.111111,0.000000,0.0,0.0,0.0,...,0.000000,0.010972,0.006270,0.003135,0.0,0.000784,0.0,0.000784,0.000784,0.001567
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,Wydown-Skinker,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000
82,Calvary-Bellefontaine Cemetaries,,,,,,,,,,...,0.000000,0.000000,0.019608,0.000000,0.0,0.000000,0.0,0.000000,0.019608,0.039216
83,Mark Twain I-70 Ind.,,,,,,,,,,...,0.000000,0.000000,0.026052,0.000000,0.0,0.000000,0.0,0.010020,0.002004,0.006012
84,Riverview,,,,,,,,,,...,0.000000,0.000000,0.011236,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000


I need to remove the neighborhoods that were not in both the venue and crime datasets.

In [68]:
neighborhoods_clusters.shape

(86, 478)

I'll want the shape to be 82, 478.

In [69]:
neighborhoods_clusters=neighborhoods_clusters.drop([82,83,84,85])
neighborhoods_clusters.shape

(82, 478)

Now that crime data has been added, and there are no NaN values, the neighborhoods will be placed into different clusters. To determine how many clusters, I'll use the kmeans elbow method.

In [70]:
# set number of clusters
kclusters = 5
neighborhoods_clusters = neighborhoods_clusters.drop('Neighborhood', 1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighborhoods_clusters)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 3, 3, 3, 3, 0, 3, 1, 3], dtype=int32)

Now that we have the clusters we can place them alongside the top 10 venues set.

In [74]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3,Academy,Convenience Store,Video Store,Chinese Restaurant,Outdoors & Recreation,Yoga Studio,Event Space,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival
1,0,Baden,American Restaurant,Discount Store,Pizza Place,Grocery Store,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant
2,3,Benton Park,Fast Food Restaurant,Breakfast Spot,Sandwich Place,Dive Bar,Bar,Brazilian Restaurant,BBQ Joint,Performing Arts Venue,Sports Bar,Seafood Restaurant
3,3,Benton Park West,Mexican Restaurant,Pizza Place,Yoga Studio,Intersection,Taco Place,Locksmith,Bar,Gay Bar,Bakery,Convenience Store
4,3,Bevo Mill,Restaurant,Mexican Restaurant,Arcade,Rugby Pitch,Discount Store,Italian Restaurant,Lounge,German Restaurant,Falafel Restaurant,Flower Shop


In [79]:
# add crimes
crimes_clusters = pd.merge(neighborhoods_crimes_sorted,
                 neighborhoods_venues_sorted[[ 'Neighborhood','Cluster Labels']],
                 on='Neighborhood')
crimes_clusters.head()


Unnamed: 0,Neighborhood,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime,6th Most Common Crime,7th Most Common Crime,8th Most Common Crime,9th Most Common Crime,10th Most Common Crime,Cluster Labels
0,Academy,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-FROM MTR VEH UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,LARCENY-SHOPLIFT UNDER $500,3
1,Baden,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-MTR VEH PARTS UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-ALL OTHER UNDER $500,HEALTH-SANITATION VIOL,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",LARCENY-FROM MTR VEH UNDER $500,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,0
2,Benton Park,LEAVING SCENE OF ACCIDENT,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-ALL OTHER UNDER $500,LARCENY-FROM MTR VEH UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,"LARCENY-FROM MTR VEH $500 - $24,999",SIMPLE ASSAULT-ADULT/NO INJURY,"LARCENY-ALL OTHER $500 - $24,999",AUTO THEFT-PERM RETNT/JOY RIDE,3
3,Benton Park West,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-FROM MTR VEH UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,LARCENY-ALL OTHER UNDER $500,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",AUTO THEFT-PERM RETNT/JOY RIDE,LARCENY-FROM BUILDING UNDER $500,3
4,Bevo Mill,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-ALL OTHER UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,LARCENY-FROM MTR VEH UNDER $500,LARCENY-FROM BUILDING UNDER $500,SIMPLE ASSAULT-ADULT/NO INJURY,3


Now I have a data set with clustered neighborhoods along with the associated top venues. I want to add in crime data for reference as well as latitude and longitude for mapping.

In [80]:

df_st_louis=df_stlouis.drop(['index', 'Complaint', 'CodedMonth','DateOccur', 'FlagCrime', 'FlagUnfounded','FlagAdministrative', 'Count', 'FlagCleanup','Crime_x'],axis=1)
df_st_louis.head()

Unnamed: 0,District,Description,ILEADSAddress,ILEADSStreet,Neighborhood,LocationName,LocationComment,CADAddress,CADStreet,XCoord,YCoord,Neighborhood Name,Crime_y,Latitude,Longitude
0,4,RAPE -- FORCIBLE,900,N JEFFERSON AVE,59,,@CENTRAL PATROL DIVISION,900,JEFFERSON,0.0,0.0,Jeff Vanderlou,1230,38.652831,-90.221036
1,1,RAPE -- FORCIBLE,4700,S BROADWAY,17,,,1900,OLIVE,0.0,0.0,Mount Pleasant,824,38.575178,-90.235219
2,6,STLG BY DECEIT/IDENTITY THEFT REPORT,4441,ELMBANK AVE,56,,,4441,ELMBANK,894487.9,1032053.0,The Great Ville,1017,38.666156,-90.235429
3,6,EMBEZZLEMENT-VALUE OVER $150,4101,GERALDINE AVE,70,,,4101,GERALDINE,890309.1,1037199.0,Mark Twain I-70 Ind.,499,38.686092,-90.261059
4,1,LARCENY-MTR VEH PARTS UNDER $500,7807,S BROADWAY,2,,,7807,BROADWAY,888858.2,987721.4,Patch,621,38.547032,-90.26136


In [81]:
df_st_louis=df_st_louis.drop(['District', 'Description', 'ILEADSAddress','ILEADSStreet', 'Neighborhood', 'LocationName','LocationComment', 'CADAddress', 'CADStreet','XCoord', 'YCoord'],axis=1)
df_st_louis.head()

Unnamed: 0,Neighborhood Name,Crime_y,Latitude,Longitude
0,Jeff Vanderlou,1230,38.652831,-90.221036
1,Mount Pleasant,824,38.575178,-90.235219
2,The Great Ville,1017,38.666156,-90.235429
3,Mark Twain I-70 Ind.,499,38.686092,-90.261059
4,Patch,621,38.547032,-90.26136


In [82]:
df_st_louis=df_st_louis.rename(columns={'Neighborhood Name': 'Neighborhood'})
df_st_louis.head()

Unnamed: 0,Neighborhood,Crime_y,Latitude,Longitude
0,Jeff Vanderlou,1230,38.652831,-90.221036
1,Mount Pleasant,824,38.575178,-90.235219
2,The Great Ville,1017,38.666156,-90.235429
3,Mark Twain I-70 Ind.,499,38.686092,-90.261059
4,Patch,621,38.547032,-90.26136


The above steps were to clean up the number of crimes dataset to make it more manageable.

In [83]:
st_louis_mapping = pd.merge(neighborhoods_venues_sorted,
                  df_st_louis[[ 'Neighborhood','Latitude', 'Longitude','Crime_y']],
                 on='Neighborhood')
st_louis_mapping.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude,Crime_y
0,3,Academy,Convenience Store,Video Store,Chinese Restaurant,Outdoors & Recreation,Yoga Studio,Event Space,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.658148,-90.267234,550
1,0,Baden,American Restaurant,Discount Store,Pizza Place,Grocery Store,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,38.719842,-90.235129,1399
2,3,Benton Park,Fast Food Restaurant,Breakfast Spot,Sandwich Place,Dive Bar,Bar,Brazilian Restaurant,BBQ Joint,Performing Arts Venue,Sports Bar,Seafood Restaurant,38.601895,-90.219104,394
3,3,Benton Park West,Mexican Restaurant,Pizza Place,Yoga Studio,Intersection,Taco Place,Locksmith,Bar,Gay Bar,Bakery,Convenience Store,38.598109,-90.22949,582
4,3,Bevo Mill,Restaurant,Mexican Restaurant,Arcade,Rugby Pitch,Discount Store,Italian Restaurant,Lounge,German Restaurant,Falafel Restaurant,Flower Shop,38.585982,-90.267534,1276


Okay Now we will visualize the clusters on a map.

In [84]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(st_louis_mapping['Latitude'], st_louis_mapping['Longitude'], st_louis_mapping['Neighborhood'], st_louis_mapping['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In order to see how the crime data played into the clustering, the first step we'll take is to overlap shading onto the map based on crime values.

First we'll need a couple more packages:

In [85]:
from folium import plugins
import matplotlib.pyplot as plt
import seaborn as sns
from folium.plugins import HeatMap

%matplotlib inline

In [86]:
max_crime = float(st_louis_mapping['Crime_y'].max())

heat_map = HeatMap( list(zip(st_louis_mapping.Latitude.values, st_louis_mapping.Longitude.values, st_louis_mapping.Crime_y.values)),
                   min_opacity=0.2,
                   max_val=max_crime,
                   radius=17, blur=15, 
                   max_zoom=1, 
                 )

#folium.GeoJson(district23).add_to(hmap)
map_clusters.add_child(heat_map)

We can see above that the clusters definitely took crime into effect, as we see downtown St. Louis is its own cluster and has significantly more crimes occur there than anywhere else We can also see some of the other clusters have a similar color level on the heat map. Now that we have that laid out, we can dive into that statistics on the individual clusters for some further analysis.

Cluster 0

In [87]:
cluster_0= st_louis_mapping.loc[st_louis_mapping['Cluster Labels'] == 0, st_louis_mapping.columns[[1] + list(range(2, st_louis_mapping.shape[1]))]]
cluster_0

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude,Crime_y
1,Baden,American Restaurant,Discount Store,Pizza Place,Grocery Store,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,38.719842,-90.235129,1399
6,Cardonlet,Museum,Dive Bar,Scenic Lookout,Bar,Falafel Restaurant,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.560977,-90.251337,1869
13,College Hill,Smoke Shop,Bar,Grocery Store,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.674335,-90.2095,407
14,Columbus Square,Pizza Place,Hardware Store,Dive Bar,Bar,Café,Yoga Studio,Farmers Market,Food,Flower Shop,Fish & Chips Shop,38.637204,-90.190333,549
22,Fairground Neighborhood,Convenience Store,Discount Store,Chinese Restaurant,Lounge,Bar,Yoga Studio,Fast Food Restaurant,Food & Drink Shop,Food,Flower Shop,38.667502,-90.21572,467
30,Hamilton Heights,American Restaurant,Food,Gas Station,Yoga Studio,Event Space,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,38.669223,-90.278341,718
32,Holly Hills,Food,American Restaurant,Playground,Garden,Bar,Yoga Studio,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant,38.569257,-90.261762,305
41,Lewis Place,Cosmetics Shop,American Restaurant,Event Space,Food & Drink Shop,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,38.655252,-90.252142,282
42,Lindenwood Park,Bar,Yoga Studio,Italian Restaurant,Pizza Place,Dive Bar,Pet Store,Flower Shop,Furniture / Home Store,Mexican Restaurant,Dessert Shop,38.600936,-90.303774,549
48,Mount Pleasant,Pub,American Restaurant,Discount Store,Grocery Store,Event Space,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.575178,-90.235219,824


In [89]:
cluster_0_crimes= crimes_clusters.loc[crimes_clusters['Cluster Labels'] == 0, crimes_clusters.columns[[1] + list(range(2, crimes_clusters.shape[1]))]]
cluster_0_crimes

Unnamed: 0,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime,6th Most Common Crime,7th Most Common Crime,8th Most Common Crime,9th Most Common Crime,10th Most Common Crime,Cluster Labels
1,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-MTR VEH PARTS UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-ALL OTHER UNDER $500,HEALTH-SANITATION VIOL,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",LARCENY-FROM MTR VEH UNDER $500,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,0
6,LEAVING SCENE OF ACCIDENT,LOITERING-BEGGING,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-SHOPLIFT UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,LARCENY-MTR VEH PARTS UNDER $500,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",SIMPLE ASSAULT-ADULT/NO INJURY,0
13,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,DRUGS-POSSESSION/MARIJUANA,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",WEAPONS-STATE VIOL/UNLWFL USE/POSSESSION,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,ARSON-KNOWINGLY BURNING/SUCCESS,0
14,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LOITERING-BEGGING,LEAVING SCENE OF ACCIDENT,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,LARCENY-FROM MTR VEH UNDER $500,SIMPLE ASSAULT-ADULT/NO INJURY,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",LARCENY-FROM MTR VEH UNDER $500 /ATTEMPT,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,0
22,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LEAVING SCENE OF ACCIDENT,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",LARCENY-FROM MTR VEH UNDER $500,LARCENY-ALL OTHER UNDER $500,LARCENY-SHOPLIFT UNDER $500,DRUGS-POSSESSION/MARIJUANA,0
30,LEAVING SCENE OF ACCIDENT,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,OBSTRUCT GOVRN OP-JUDIC/OTHR UNSPC JUDIC VIOL,DRUGS-POSSESSION/COCAINE,DRUGS-POSSESSION/OTHR UNSPEC DRUG,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-ALL OTHER UNDER $500,0
32,LARCENY-FROM MTR VEH UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",LEAVING SCENE OF ACCIDENT,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,"LARCENY-FROM MTR VEH $500 - $24,999",BURGLARY-RESDNCE/UNK TIM/UNLW EN/UNOCCUPIED,0
41,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LEAVING SCENE OF ACCIDENT,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,LARCENY-MTR VEH PARTS UNDER $500,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-FROM MTR VEH UNDER $500,HOMICIDE,DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,0
42,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-FROM MTR VEH UNDER $500,LARCENY-ALL OTHER UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,LARCENY-FROM MTR VEH UNDER $500 /ATTEMPT,LARCENY-SHOPLIFT UNDER $500,LARCENY-FROM BUILDING UNDER $500,0
48,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-MTR VEH PARTS UNDER $500,DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,LARCENY-FROM BUILDING UNDER $500,LOITERING-BEGGING,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,0


In [88]:
cluster_0.describe()

Unnamed: 0,Latitude,Longitude,Crime_y
count,17.0,17.0,17.0
mean,38.636038,-90.234714,636.117647
std,0.051699,0.032016,409.91598
min,38.547032,-90.303774,282.0
25%,38.600936,-90.252142,444.0
50%,38.650537,-90.235219,481.0
75%,38.667502,-90.2095,646.0
max,38.719842,-90.190333,1869.0


Cluster 1

In [90]:
cluster_1= st_louis_mapping.loc[st_louis_mapping['Cluster Labels'] == 1, st_louis_mapping.columns[[1] + list(range(2, st_louis_mapping.shape[1]))]]
cluster_1

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude,Crime_y
8,Carr Square,Grocery Store,Park,Yoga Studio,Event Space,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,38.641037,-90.204004,489
23,Fairgrounds Park,Park,Electronics Store,Yoga Studio,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,Farmers Market,38.665229,-90.223556,45
24,Forest Park,Park,Theater,American Restaurant,Cricket Ground,Playground,Soccer Field,Tennis Court,Scenic Lookout,Dog Run,Trail,38.639992,-90.28308,238
33,Hyde Park,Park,Tour Provider,New American Restaurant,Deli / Bodega,Diner,Farmers Market,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Design Studio,38.664073,-90.204427,589
58,Penrose,Park,Food,Gas Station,Yoga Studio,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,Farmers Market,38.67837,-90.239771,1066
66,St. Louis Hills,Park,Coffee Shop,Grocery Store,Athletics & Sports,Yoga Studio,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.582563,-90.302508,461
68,The Gate District,Park,Southern / Soul Food Restaurant,Gym,Garden,Café,Event Space,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,38.618946,-90.228589,506
74,Tower Grove South,Food,Sports Bar,Yoga Studio,Food Truck,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,Farmers Market,38.595948,-90.256682,1494
77,Walnut Park East,Fried Chicken Joint,Park,Food,Child Care Service,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,Farmers Market,38.698118,-90.251305,751
80,West End,Fried Chicken Joint,Grocery Store,Park,Food,Home Service,Dessert Shop,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.66304,-90.285483,935


In [91]:
cluster_1_crimes= crimes_clusters.loc[crimes_clusters['Cluster Labels'] == 1, crimes_clusters.columns[[1] + list(range(2, crimes_clusters.shape[1]))]]
cluster_1_crimes

Unnamed: 0,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime,6th Most Common Crime,7th Most Common Crime,8th Most Common Crime,9th Most Common Crime,10th Most Common Crime,Cluster Labels
8,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,LEAVING SCENE OF ACCIDENT,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",SIMPLE ASSAULT-ADULT/NO INJURY,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-FROM BUILDING UNDER $500,LARCENY-FROM MTR VEH UNDER $500,1
23,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LEAVING SCENE OF ACCIDENT,"LARCENY-MTR VEH PARTS $500 - $24,999",WEAPONS-CITY VIOL/DISCHRGING IN CITY,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,AUTO THEFT-PERM RETNT/JOY RIDE,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,DRUGS-POSSESSION/OTHR UNSPEC DRUG,SIMPLE ASSAULT-CHILD/NO INJURY,HOMICIDE,1
24,LARCENY-FROM MTR VEH UNDER $500,"LARCENY-FROM MTR VEH $500 - $24,999",LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-FROM MTR VEH UNDER $500 /ATTEMPT,SIMPLE ASSAULT-ADULT/NO INJURY,LARCENY-FROM BUILDING UNDER $500,DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,1
33,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-ALL OTHER UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",LARCENY-FROM MTR VEH UNDER $500,SIMPLE ASSAULT-ADULT/NO INJURY,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,1
58,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-ALL OTHER UNDER $500,LARCENY-FROM BUILDING UNDER $500,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC","LARCENY-ALL OTHER $500 - $24,999",1
66,LARCENY-SHOPLIFT UNDER $500,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,LEAVING SCENE OF ACCIDENT,LARCENY-FROM MTR VEH UNDER $500,LARCENY-ALL OTHER UNDER $500,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/JOY RIDE,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-FROM BUILDING UNDER $500,"LARCENY-FROM MTR VEH $500 - $24,999",1
68,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-FROM BUILDING UNDER $500,LARCENY-SHOPLIFT UNDER $500,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,LARCENY-ALL OTHER UNDER $500,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-FROM MTR VEH UNDER $500,SIMPLE ASSAULT-ADULT/NO INJURY,"LARCENY-ALL OTHER $500 - $24,999",1
74,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-FROM MTR VEH UNDER $500,LARCENY-SHOPLIFT UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-ALL OTHER UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,"LARCENY-FROM MTR VEH $500 - $24,999","ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",1
77,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-ALL OTHER UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-FROM MTR VEH UNDER $500,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC","LARCENY-ALL OTHER $500 - $24,999",LARCENY-FROM BUILDING UNDER $500,1
80,,,,,,,,,,,1


In [92]:
cluster_1.describe()

Unnamed: 0,Latitude,Longitude,Crime_y
count,10.0,10.0,10.0
mean,38.644732,-90.24794,657.4
std,0.036601,0.034269,421.391663
min,38.582563,-90.302508,45.0
25%,38.624208,-90.276481,468.0
50%,38.652039,-90.245538,547.5
75%,38.66494,-90.224814,889.0
max,38.698118,-90.204004,1494.0


Cluster 2

In [93]:
cluster_2= st_louis_mapping.loc[st_louis_mapping['Cluster Labels'] == 2, st_louis_mapping.columns[[1] + list(range(2, st_louis_mapping.shape[1]))]]
cluster_2

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude,Crime_y
36,Kingsway East,Convenience Store,Cocktail Bar,Bus Stop,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.668268,-90.252855,577
53,O'Fallon,Convenience Store,Gas Station,Nightlife Spot,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.676872,-90.222295,834
78,Walnut Park West,Gas Station,Yoga Studio,Food Truck,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,Farmers Market,38.708282,-90.256935,612


In [94]:
cluster_2_crimes= crimes_clusters.loc[crimes_clusters['Cluster Labels'] == 2, crimes_clusters.columns[[1] + list(range(2, crimes_clusters.shape[1]))]]
cluster_2_crimes

Unnamed: 0,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime,6th Most Common Crime,7th Most Common Crime,8th Most Common Crime,9th Most Common Crime,10th Most Common Crime,Cluster Labels
36,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,WEAPONS-CITY VIOL/DISCHRGING IN CITY,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-FROM MTR VEH UNDER $500,LARCENY-ALL OTHER UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",WEAPONS-STATE VIOL/UNLWFL USE/POSSESSION,2
53,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-MTR VEH PARTS UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,PUBLIC ORDER-OTHR UNSPC PBLC ORDER VIOLATION,WEAPONS-CITY VIOL/DISCHRGING IN CITY,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",LARCENY-FROM MTR VEH UNDER $500,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,2
78,,,,,,,,,,,2


In [95]:
cluster_2.describe()

Unnamed: 0,Latitude,Longitude,Crime_y
count,3.0,3.0,3.0
mean,38.684474,-90.244028,674.333333
std,0.021062,0.018932,139.378382
min,38.668268,-90.256935,577.0
25%,38.67257,-90.254895,594.5
50%,38.676872,-90.252855,612.0
75%,38.692577,-90.237575,723.0
max,38.708282,-90.222295,834.0


Cluster 3

In [96]:
cluster_3= st_louis_mapping.loc[st_louis_mapping['Cluster Labels'] == 3, st_louis_mapping.columns[[1] + list(range(2, st_louis_mapping.shape[1]))]]
cluster_3

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude,Crime_y
0,Academy,Convenience Store,Video Store,Chinese Restaurant,Outdoors & Recreation,Yoga Studio,Event Space,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,38.658148,-90.267234,550
2,Benton Park,Fast Food Restaurant,Breakfast Spot,Sandwich Place,Dive Bar,Bar,Brazilian Restaurant,BBQ Joint,Performing Arts Venue,Sports Bar,Seafood Restaurant,38.601895,-90.219104,394
3,Benton Park West,Mexican Restaurant,Pizza Place,Yoga Studio,Intersection,Taco Place,Locksmith,Bar,Gay Bar,Bakery,Convenience Store,38.598109,-90.22949,582
4,Bevo Mill,Restaurant,Mexican Restaurant,Arcade,Rugby Pitch,Discount Store,Italian Restaurant,Lounge,German Restaurant,Falafel Restaurant,Flower Shop,38.585982,-90.267534,1276
5,Boulevard Heights,Gym,ATM,Dessert Shop,Pizza Place,Discount Store,Dive Bar,Smoke Shop,Grocery Store,Food Truck,Women's Store,38.56226,-90.280278,508
7,Carondelet Park,Gym,Bakery,Mexican Restaurant,Coffee Shop,Mobile Phone Shop,Fast Food Restaurant,Pizza Place,Shopping Mall,Furniture / Home Store,Chinese Restaurant,38.561273,-90.262169,64
9,Central West End,Theater,Pharmacy,ATM,Seafood Restaurant,Food,Deli / Bodega,Southern / Soul Food Restaurant,Outdoor Sculpture,Convenience Store,American Restaurant,38.64226,-90.254089,2202
10,Cheltenham,Yoga Studio,Bar,Pet Store,Coffee Shop,Sandwich Place,Brewery,Hotel,Antique Shop,American Restaurant,Art Museum,38.627467,-90.281447,198
11,Clayton-Tamm,Fast Food Restaurant,Diner,Sandwich Place,Liquor Store,Yoga Studio,Bakery,Pub,Chinese Restaurant,Nightclub,Music Store,38.627042,-90.291561,220
12,Clifton Heights,Park,Italian Restaurant,BBQ Joint,Breakfast Spot,Convenience Store,Discount Store,Farmers Market,Food & Drink Shop,Food,Flower Shop,38.61165,-90.293921,213


In [97]:
cluster_3_crimes= crimes_clusters.loc[crimes_clusters['Cluster Labels'] == 3, crimes_clusters.columns[[1] + list(range(2, crimes_clusters.shape[1]))]]
cluster_3_crimes

Unnamed: 0,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime,6th Most Common Crime,7th Most Common Crime,8th Most Common Crime,9th Most Common Crime,10th Most Common Crime,Cluster Labels
0,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,LARCENY-FROM MTR VEH UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,LARCENY-SHOPLIFT UNDER $500,3
2,LEAVING SCENE OF ACCIDENT,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-ALL OTHER UNDER $500,LARCENY-FROM MTR VEH UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,"LARCENY-FROM MTR VEH $500 - $24,999",SIMPLE ASSAULT-ADULT/NO INJURY,"LARCENY-ALL OTHER $500 - $24,999",AUTO THEFT-PERM RETNT/JOY RIDE,3
3,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-FROM MTR VEH UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,LARCENY-ALL OTHER UNDER $500,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",AUTO THEFT-PERM RETNT/JOY RIDE,LARCENY-FROM BUILDING UNDER $500,3
4,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,LARCENY-ALL OTHER UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,LARCENY-FROM MTR VEH UNDER $500,LARCENY-FROM BUILDING UNDER $500,SIMPLE ASSAULT-ADULT/NO INJURY,3
5,LEAVING SCENE OF ACCIDENT,LARCENY-FROM MTR VEH UNDER $500,LARCENY-ALL OTHER UNDER $500,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,"LARCENY-FROM MTR VEH $500 - $24,999",BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,AUTO THEFT-PERM RETNT/JOY RIDE,3
7,LEAVING SCENE OF ACCIDENT,LARCENY-FROM BUILDING UNDER $500,"LARCENY-FROM BUILDING $500 - $24,999",AUTO THEFT-PERM RETNT/JOY RIDE,LARCENY-FROM MTR VEH UNDER $500,"LARCENY-FROM MTR VEH $500 - $24,999",SIMPLE ASSAULT-ADULT/NO INJURY,OBSTRUCT GOVRN OP-PARK ORD VIOL,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,3
9,LEAVING SCENE OF ACCIDENT,LARCENY-FROM MTR VEH UNDER $500,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,"LARCENY-FROM MTR VEH $500 - $24,999",LARCENY-FROM BUILDING UNDER $500,LARCENY-MTR VEH PARTS UNDER $500,SIMPLE ASSAULT-ADULT/NO INJURY,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-SHOPLIFT UNDER $500,LARCENY-FROM MTR VEH UNDER $500 /ATTEMPT,3
10,LEAVING SCENE OF ACCIDENT,"LARCENY-FROM MTR VEH $500 - $24,999",LARCENY-FROM MTR VEH UNDER $500,SIMPLE ASSAULT-ADULT/NO INJURY,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,"LARCENY-FROM BUILDING $500 - $24,999",LARCENY-FROM BUILDING UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,3
11,LEAVING SCENE OF ACCIDENT,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-FROM MTR VEH UNDER $500,LARCENY-ALL OTHER UNDER $500,"LARCENY-FROM MTR VEH $500 - $24,999",DISORDERLY CONDUCT-PEACE DSTRB/INDIVIDUAL,LARCENY-FROM MTR VEH UNDER $500 /ATTEMPT,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-FROM BUILDING UNDER $500,AGG.ASSAULT-FIREARM/CITIZEN ADULT 3RD DEGREE,3
12,LEAVING SCENE OF ACCIDENT,LARCENY-ALL OTHER UNDER $500,LARCENY-FROM MTR VEH UNDER $500,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,"LARCENY-FROM MTR VEH $500 - $24,999",LARCENY-MTR VEH PARTS UNDER $500,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,LARCENY-FROM MTR VEH UNDER $500 /ATTEMPT,"ASSAULT, ADULT, AGE 17 AND UP-DOMESTIC",BURGLARY-RESDNCE/UNK TIM/FORC ENT/UNOCCUPIED,3


In [98]:
cluster_3.describe()

Unnamed: 0,Latitude,Longitude,Crime_y
count,51.0,51.0,51.0
mean,38.623935,-90.252067,617.235294
std,0.030099,0.03176,735.305939
min,38.561273,-90.309512,27.0
25%,38.60581,-90.278385,198.0
50%,38.621634,-90.254089,400.0
75%,38.642671,-90.227812,649.0
max,38.68975,-90.190283,3735.0


Cluster 4

In [101]:
cluster_4= st_louis_mapping.loc[st_louis_mapping['Cluster Labels'] == 4, st_louis_mapping.columns[[1] + list(range(2, st_louis_mapping.shape[1]))]]
cluster_4

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude,Crime_y
52,North Riverfront,Waste Facility,Yoga Studio,Event Space,Food & Drink Shop,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Festival,Fast Food Restaurant,38.705874,-90.220762,336


In [100]:
cluster_4_crimes= crimes_clusters.loc[crimes_clusters['Cluster Labels'] == 4, crimes_clusters.columns[[1] + list(range(2, crimes_clusters.shape[1]))]]
cluster_4_crimes

Unnamed: 0,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime,6th Most Common Crime,7th Most Common Crime,8th Most Common Crime,9th Most Common Crime,10th Most Common Crime,Cluster Labels
52,LEAVING SCENE OF ACCIDENT,SIMPLE ASSAULT-ADULT/NO INJURY,LARCENY-FROM MTR VEH UNDER $500,DESTRUCTION OF PROPERTY-MALICIOUS/PRIV PROP,LARCENY-MTR VEH PARTS UNDER $500,"LARCENY-FROM MTR VEH $500 - $24,999",PUBLIC ORDER-TRESPASSING,AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR,AGG.ASSAULT-FIREARM/CITIZEN ADULT 1ST DEGREE,"STOLEN PROPERTY-BUYING,RECEIVING,POSSESSING,ET",4


In [102]:
cluster_4.describe()

Unnamed: 0,Latitude,Longitude,Crime_y
count,1.0,1.0,1.0
mean,38.705874,-90.220762,336.0
std,,,
min,38.705874,-90.220762,336.0
25%,38.705874,-90.220762,336.0
50%,38.705874,-90.220762,336.0
75%,38.705874,-90.220762,336.0
max,38.705874,-90.220762,336.0


Here are the means for crimes committed over the past year from November 2018- November 2019 for each cluster. <br>
Cluster 0 mean crime: 636.1 <br>
Cluster 1 mean crime: 657.4<br>
Cluster 2 mean crime: 674.3<br>
Cluster 3 mean crime: 617.2<br>
Cluster 4 mean crime: 336.0<br>

All the clusters have a similar amount of average crimes committed over a year span aside from cluster 4, which reports about half as much as the rest. This is a good sign as far as diving into the differences goes. We know that what separates the clusters is the venues and crime types, rather than the number of crimes committed. <br>
<br>
The cluster with the lowest crime, cluster 4, has only one neighborhood and looks like it was in its own category because of the difference in venue make-up from the rest, with the highest reported venue being a waste management facility rather than a restaurant or similar venue. The crime is lower there, which could indicate that more industrial areas with perhaps less foot traffic is what is driving that number lower. 
<br>
Let's dive further into each cluster:
<br>
For cluster 0, bars were the most common venue type. The most frequent crimes committed were aggravated assault, destruction of property, and leaving the scene of a crime.
<br>
For cluster 1, parks were the most common venue type. The most frequent crimes committed were destruction of property, and leaving the scene of a crime.
<br>
For cluster 2, gas stations and convince stores were the most common venue types. The most frequent crimes committed were aggravated assault and weapons violations.
<br>
For cluster 3, restaurants were the most common venue type. The most frequent crime committed was larceny.
<br>
For cluster 4, a waste facility was the most common venue type. There was also an event space listed as a common venue type, painting the picture that the neighborhood is a less populated industrial area.  The most frequent crime committed was larceny.
<br>

Based on those results, it looks like we can actually predict the types of crimes that ware most likely to be committed by neighborhood with this machine learning clustering. From here, the city of St. Louis can take action steps to lower the crime rate with this knowledge. 

The legislative body can impose regulations on the amount of bars in a concentrated area to lower the rate of destruction of property and aggravated assault. They could provide infrastructure to assist in preventing aggravated assault and weapons violations at gas stations and convenience stores in ways such as bullet proof glass or automated pump use. They could make anti-theft notices in neighborhoods with many restaurants and in industrial areas and event spaces.

The police department can use this knowledge to devote different resources to different clusters of neighborhoods. In the cluster with more weapons related crimes they can send more units to patrol that area, specifically around the related venues. In the cluster with more larceny around restaurants, they can place additional cameras and possibly employ recognition software to spot crimes in real time. As a whole, knowing what types of crimes will happen can be a huge resource for the St. Louis police department.
