# Categorizing Chicago Communities using Crime Data and Foursquare

## Table of contents
* [Introduction / Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction / Business Problem <a name="introduction"></a>

The objective of this project is to identify the safest communities in Chicago, taking the crime rate into account. The results are aimed at people interested in living in the city of Chicago.

Chicago is one of the largest cities in the United States, with a population of over 2.5 million people. It has 77 communities grouped into 9 districts. The city has reported more than 7 million crimes of every category since 2001, including 259 thousand in 2019 alone.

**Business Problem**

We address the question "Which community had the fewest crimes in 2019?" in order to find the right community to move into, or to start a business in, based on the security and the venue (residence) density of each community.
 

## Data <a name="data"></a>

Our study is based on data extracted from:
 * Crime data (using the datasets provided by <a target='new' href='https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g'>the City of Chicago</a>)
 * Community information (scraped from <a target='new' href='https://en.wikipedia.org/wiki/Community_areas_in_Chicago'>this Wikipedia page</a>)
    * Using geopy to get the coordinates of each community
 * The Foursquare API, to explore residences near our communities
 
**The study uses 2019 data.**
 

In [3]:
# install the extra packages first, then import everything
!conda install -c conda-forge geopy --yes
!pip install lxml  # HTML parser used by pandas.read_html

import pandas as pd
import numpy as np
import requests
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library


**Since we downloaded the 2019 data into a CSV file named ChicagoCrimes2019.csv, we now load it.**

In [4]:
df_crimes = pd.read_csv('ChicagoCrimes2019.csv')
df_crimes.shape

(259004, 22)

In [5]:
df_crimes.head()

|   | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11965029 | JD132142 | 01/01/2019 12:00:00 AM | 028XX W SHAKESPEARE AVE | 1752 | OFFENSE INVOLVING CHILDREN | AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER | APARTMENT | False | True | ... | 1.0 | 22.0 | 17 | 1156820.0 | 1914304.0 | 2019 | 04/20/2020 03:47:53 PM | 41.920626 | -87.699237 | (41.920626036, -87.699236545) |
| 1 | 11752916 | JC342540 | 01/01/2019 12:00:00 AM | 036XX W THOMAS ST | 1752 | OFFENSE INVOLVING CHILDREN | AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER | APARTMENT | False | False | ... | 27.0 | 23.0 | 17 | 1151715.0 | 1907084.0 | 2019 | 04/19/2020 03:44:21 PM | 41.900916 | -87.718184 | (41.900915738, -87.718183718) |
| 2 | 11739188 | JC326320 | 01/01/2019 12:00:00 AM | 010XX N RIDGEWAY AVE | 1752 | OFFENSE INVOLVING CHILDREN | AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER | APARTMENT | False | True | ... | 27.0 | 23.0 | 17 | 1151192.0 | 1906670.0 | 2019 | 04/19/2020 03:44:21 PM | 41.89979 | -87.720116 | (41.899789956, -87.720115618) |
| 3 | 11737723 | JC323962 | 01/01/2019 12:00:00 AM | 028XX S ST LOUIS AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | APARTMENT | False | True | ... | 22.0 | 30.0 | 17 | 1153496.0 | 1884904.0 | 2019 | 04/19/2020 03:44:21 PM | 41.840016 | -87.712231 | (41.840016176, -87.712231415) |
| 4 | 11992909 | JD166116 | 01/01/2019 12:00:00 AM | 014XX N KEELER AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | RESIDENCE | False | False | ... | 26.0 | 23.0 | 17 | 1148120.0 | 1909302.0 | 2019 | 03/23/2020 03:47:32 PM | 41.907072 | -87.731331 | (41.907072136, -87.731331357) |


**Preliminary cleaning of the crime data**

In [6]:
# keep only the rows that have location data
df_crimes = df_crimes.dropna(subset=['Latitude', 'Longitude'])
# save a copy of the dataframe so we can later use it on a map
df_crimes_cpy = df_crimes.copy()
# keep only the columns we need (passing the axis positionally to drop() fails on recent pandas)
df_crimes = df_crimes[['Community Area', 'Latitude', 'Longitude']]
# rename for ease of access
df_crimes = df_crimes.rename(columns={"Community Area": "CommunityArea"})
# delete rows that don't have a community (coded as 0)
df_crimes = df_crimes[df_crimes.CommunityArea != 0]
# modify the dataframe to contain each community and its total number of crimes
df_crimes = df_crimes.groupby(["CommunityArea"])["CommunityArea"].count().reset_index(name="CrimeTotal")
df_crimes.shape

(257842, 3)
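To make the cleaning steps easier to follow, here is a minimal sketch that applies the same filters to a small, made-up DataFrame. The sample values and the helper name `count_crimes_per_community` are assumptions for illustration only. Unlike the cell above, the sketch also drops rows with a missing community explicitly and stores the community number as an integer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the values are made up for illustration only.
sample = pd.DataFrame({
    'Community Area': [22.0, 23.0, 23.0, 0.0, np.nan, 30.0],
    'Latitude': [41.92, 41.90, 41.89, 41.85, 41.80, np.nan],
    'Longitude': [-87.69, -87.71, -87.72, -87.70, -87.60, -87.71],
})

def count_crimes_per_community(df):
    """Return one row per community area with its number of crime records."""
    # keep only rows that have coordinates and a community value
    df = df.dropna(subset=['Latitude', 'Longitude', 'Community Area'])
    # community 0 is treated as "no community", as in the notebook
    df = df[df['Community Area'] != 0]
    counts = (df.groupby('Community Area')
                .size()
                .reset_index(name='CrimeTotal')
                .rename(columns={'Community Area': 'CommunityArea'}))
    # community numbers are whole numbers, so store them as integers
    counts['CommunityArea'] = counts['CommunityArea'].astype(int)
    return counts

crime_totals = count_crimes_per_community(sample)
print(crime_totals)
```

Only the rows for communities 22 and 23 survive the filters here, because the other rows have a zero or missing community, or no latitude.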

Now we need a list of all communities. We use pandas to scrape the data from Wikipedia.

In [7]:
communities_raw_data = pd.read_html('https://en.wikipedia.org/wiki/Community_areas_in_Chicago')
# The pandas read_html function reads all tables from the scraped webpage. Since Chicago has 77 communities
# divided into 9 districts, we take the first 9 tables (indexes 0 to 8).
print('the object communities_raw_data is of type {}, has {} items of type {}'.format(type(communities_raw_data),
                                                                                      len(communities_raw_data),
                                                                                      type(communities_raw_data[0])))

the object communities_raw_data is of type <class 'list'>, has 12 items of type <class 'pandas.core.frame.DataFrame'>
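The page returns 12 tables, of which only some describe communities. As an alternative to relying on the position of the tables, the sketch below keeps only the tables that contain the expected columns. The helper `select_community_tables` is hypothetical, and the small DataFrames stand in for the list returned by `pd.read_html`; the column names `Number` and `Community area` are the ones used later in this notebook.

```python
import pandas as pd

def select_community_tables(tables, required=('Number', 'Community area')):
    """Keep only the tables that contain every required column."""
    return [t for t in tables if set(required).issubset(t.columns)]

# Hypothetical stand-ins for the list returned by pd.read_html
tables = [
    pd.DataFrame({'Number': [8, 32], 'Community area': ['Near North Side', 'Loop']}),
    pd.DataFrame({'Note': ['not a community table']}),
    pd.DataFrame({'Number': [5], 'Community area': ['North Center']}),
]

community_tables = select_community_tables(tables)
combined = pd.concat(community_tables, ignore_index=True)
print(len(community_tables), combined.shape)
```

Filtering by column names is less sensitive to tables being added or reordered on the page, although it still assumes the column headers stay the same.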


**Now that we have a list of DataFrames containing all the community data, we merge the items into a single DataFrame.**

In [8]:
# define the dataframe columns
column_names = ['CommunityNumber', 'CommunityAreaName']

# collect one row per community from the first 9 tables
rows = []
for index in range(9):
    for r, data in communities_raw_data[index].iterrows():
        rows.append({'CommunityNumber': data['Number'],
                     'CommunityAreaName': data['Community area']})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
community_data = pd.DataFrame(rows, columns=column_names)

community_data.head()

|   | CommunityNumber | CommunityAreaName |
|---|---|---|
| 0 | 8 | Near North Side |
| 1 | 32 | Loop |
| 2 | 33 | Near South Side |
| 3 | 5 | North Center |
| 4 | 6 | Lake View |


### We need a function to obtain the location of each community 

In [9]:
# returns the latitude and longitude of a given community
def get_community_location(community):
    address = community + ', Chicago, IL'
    geolocator = Nominatim(user_agent="chicago_explorer")
    location = geolocator.geocode(address)
    # geocode returns None when the address cannot be resolved
    if location is None:
        raise ValueError('Could not geocode: {}'.format(address))
    return location.latitude, location.longitude
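Since the geocoder is called once per community, it can help to cache results and to space out the requests, as public geocoding services usually limit the request rate. The following is a minimal sketch of that idea using only the standard library. `make_cached_lookup` and `fake_geocode` are hypothetical names; the fake function stands in for `geolocator.geocode` so the example runs without network access.

```python
import time
from collections import namedtuple

def make_cached_lookup(geocode, min_delay_seconds=1.0):
    """Wrap a geocode callable: cache results and space out service calls."""
    cache = {}
    last_call = [None]  # time of the previous service call, if any

    def lookup(address):
        if address in cache:
            return cache[address]
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        location = geocode(address)
        last_call[0] = time.monotonic()
        # None marks an address that could not be resolved
        cache[address] = None if location is None else (location.latitude, location.longitude)
        return cache[address]

    return lookup

# Stand-in for geolocator.geocode so the sketch runs without network access
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
calls = []

def fake_geocode(address):
    calls.append(address)
    if address.startswith('Loop'):
        return FakeLocation(41.875562, -87.624421)
    return None

lookup = make_cached_lookup(fake_geocode, min_delay_seconds=0.0)
loop_coords = lookup('Loop, Chicago, IL')
loop_again = lookup('Loop, Chicago, IL')      # served from the cache
unknown = lookup('Nowhere, Chicago, IL')      # unresolved address
print(loop_coords, unknown, len(calls))
```

In the notebook, the real `geolocator.geocode` would be passed in place of `fake_geocode`, with a non-zero delay.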

The next step is to loop through all communities and get the latitude and longitude of each one.


In [10]:
for i,data in community_data.iterrows():
    location= get_community_location(data.CommunityAreaName)
    community_data.at[i,'Latitude'] = location[0]
    community_data.at[i,'Longitude'] = location[1]
    
community_data.head()

|   | CommunityNumber | CommunityAreaName | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 8 | Near North Side | 41.900033 | -87.634497 |
| 1 | 32 | Loop | 41.875562 | -87.624421 |
| 2 | 33 | Near South Side | 41.8567 | -87.624774 |
| 3 | 5 | North Center | 41.956107 | -87.67916 |
| 4 | 6 | Lake View | 41.94705 | -87.655429 |


### Foursquare configuration

In [11]:
# sets Foursquare keys
CLIENT_ID = ''  # your Foursquare ID
CLIENT_SECRET = ''  # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100
CATEGORY = '4e67e38e036454776db1fb3a' # Foursquare category ID used for the residence search
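To avoid pasting the keys into the notebook, one option is to read them from environment variables. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions, not something Foursquare or this notebook defines.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare keys from a mapping of environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID', '')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET', '')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret

# Example with an explicit mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print(CLIENT_ID)
```

Calling `load_foursquare_credentials()` with no argument reads from the real environment and fails early if either key is missing.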

In [12]:
def getNearbyResidences(id,names, latitudes, longitudes):
    venues_list=[]
    for id,name, lat, lng in zip(id,names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            CATEGORY, 
            LIMIT)
        #print(url)    
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            id,
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
        'CommunityNumber',
        'CommunityName',
        'CommunityLatitude', 
        'CommunityLongitude', 
        'Venue', 
        'Venue Latitude', 
        'Venue Longitude', 
        'Venue Category']
    
    return(nearby_venues)
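The function above does two things for each community: it builds the request URL and it picks fields out of the JSON response. The sketch below separates those two steps so each can be checked without calling the API. The helper names `build_explore_url` and `extract_venues` are hypothetical, and the response shape (`response` → `groups[0]` → `items` → `venue`) is taken from the code above rather than from the Foursquare documentation. The sample payload reuses one venue from this notebook's output.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, category, limit):
    """Build the explore URL, letting urlencode escape the parameter values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': category,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing keys."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        venues.append((venue.get('name'), location.get('lat'), location.get('lng'), category))
    return venues

url = build_explore_url('demo-id', 'demo-secret', '20180605',
                        41.900033, -87.634497, '4e67e38e036454776db1fb3a', 100)

# Sample payload shaped like the response the notebook code expects
sample_payload = {'response': {'groups': [{'items': [{'venue': {
    'name': 'DeWitt Place',
    'location': {'lat': 41.899483, 'lng': -87.620066},
    'categories': [{'name': 'Residential Building (Apartment / Condo)'}],
}}]}]}}
venues = extract_venues(sample_payload)
print(url)
print(venues)
```

With this split, an empty or unexpected response yields an empty list instead of a `KeyError` or `IndexError`.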

In [13]:
chicago_residencies = getNearbyResidences(id=community_data['CommunityNumber'],names=community_data['CommunityAreaName'],
                                   latitudes=community_data['Latitude'],
                                   longitudes=community_data['Longitude']
                                  )
chicago_residencies.head()

|   | CommunityNumber | CommunityName | CommunityLatitude | CommunityLongitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Near North Side | 41.900033 | -87.634497 | DeWitt Place | 41.899483 | -87.620066 | Residential Building (Apartment / Condo) |
| 1 | 8 | Near North Side | 41.900033 | -87.634497 | Trump International Hotel & Tower Chicago (Tru... | 41.888938 | -87.626354 | Hotel |
| 2 | 8 | Near North Side | 41.900033 | -87.634497 | Maple Pointe | 41.901623 | -87.633804 | Residential Building (Apartment / Condo) |
| 3 | 8 | Near North Side | 41.900033 | -87.634497 | Niche 905 | 41.899353 | -87.636976 | Residential Building (Apartment / Condo) |
| 4 | 8 | Near North Side | 41.900033 | -87.634497 | Parc Chestnut | 41.897744 | -87.635741 | Residential Building (Apartment / Condo) |


In [14]:
chicago_location = get_community_location('')
chicago_location

(41.8755616, -87.6244212)

Let's start by showing each community on a map.

In [15]:
# Create a map of chicago
map_chicago = folium.Map(location=chicago_location, zoom_start=10)

# add markers to map
for lat, lng, description in zip(community_data['Latitude'], community_data['Longitude'], community_data['CommunityAreaName']):
    label = description
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

Now let's show each residence venue in every community.

In [16]:
# Create a map of chicago
map_chicago = folium.Map(location=chicago_location, zoom_start=10)

# add markers to map
for lat, lng, community,residency in zip(chicago_residencies['Venue Latitude'], chicago_residencies['Venue Longitude'], chicago_residencies['CommunityName'], chicago_residencies['Venue']):
    label = '{},{}'.format(community,residency)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

In [38]:
df_crimes.shape
chicago_residencies.head()

|   | CommunityNumber | CommunityName | CommunityLatitude | CommunityLongitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Near North Side | 41.900033 | -87.634497 | DeWitt Place | 41.899483 | -87.620066 | Residential Building (Apartment / Condo) |
| 1 | 8 | Near North Side | 41.900033 | -87.634497 | Trump International Hotel & Tower Chicago (Tru... | 41.888938 | -87.626354 | Hotel |
| 2 | 8 | Near North Side | 41.900033 | -87.634497 | Maple Pointe | 41.901623 | -87.633804 | Residential Building (Apartment / Condo) |
| 3 | 8 | Near North Side | 41.900033 | -87.634497 | Niche 905 | 41.899353 | -87.636976 | Residential Building (Apartment / Condo) |
| 4 | 8 | Near North Side | 41.900033 | -87.634497 | Parc Chestnut | 41.897744 | -87.635741 | Residential Building (Apartment / Condo) |


|   | CommunityArea | CrimeTotal |
|---|---|---|
| 0 | 1.0 | 3993 |
| 1 | 2.0 | 3419 |
| 2 | 3.0 | 3297 |
| 3 | 4.0 | 1768 |
| 4 | 5.0 | 1244 |


## Methodology <a name="methodology"></a>

In [103]:
# pending

(41.9000327, -87.6344975)

## Analysis <a name="analysis"></a>

First, we need the total crimes per community, and then we merge all the dataframes together.

In [48]:
community_summary =  chicago_residencies.copy()
community_summary = community_summary.groupby(["CommunityNumber","CommunityName","CommunityLatitude","CommunityLongitude"])["CommunityNumber"].count().reset_index(name="VenueTotal")

In [63]:
community_summary['CrimeTotal'] = df_crimes['CrimeTotal'].values
community_summary.head()

|   | CommunityNumber | CommunityName | CommunityLatitude | CommunityLongitude | VenueTotal | CrimeTotal |
|---|---|---|---|---|---|---|
| 0 | 1 | Rogers Park | 42.010531 | -87.670748 | 36 | 3993 |
| 1 | 2 | West Ridge | 42.003548 | -87.696243 | 52 | 3419 |
| 2 | 3 | Uptown | 41.96663 | -87.655546 | 70 | 3297 |
| 3 | 4 | Lincoln Square | 41.97599 | -87.689616 | 63 | 1768 |
| 4 | 5 | North Center | 41.956107 | -87.67916 | 47 | 1244 |
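Assigning `df_crimes['CrimeTotal'].values` by position works only while both dataframes have the same number of rows and are sorted by community number. A more defensive option is to join on the community number itself. The sketch below shows this with the first five rows from the outputs in this notebook; the crime rows are deliberately shuffled to show that the row order no longer matters. The variable names are chosen for the example.

```python
import pandas as pd

# First rows of the two tables, copied from the outputs in this notebook
venues = pd.DataFrame({
    'CommunityNumber': [1, 2, 3, 4, 5],
    'CommunityName': ['Rogers Park', 'West Ridge', 'Uptown', 'Lincoln Square', 'North Center'],
    'VenueTotal': [36, 52, 70, 63, 47],
})
crimes = pd.DataFrame({
    'CommunityArea': [5.0, 4.0, 3.0, 2.0, 1.0],  # deliberately in a different order
    'CrimeTotal': [1244, 1768, 3297, 3419, 3993],
})

# align the key types, then join on the community number
crimes = crimes.assign(CommunityNumber=crimes['CommunityArea'].astype(int))
merged = venues.merge(crimes[['CommunityNumber', 'CrimeTotal']],
                      on='CommunityNumber', how='left', validate='one_to_one')
print(merged)
```

`validate='one_to_one'` makes pandas raise an error if a community number appears more than once on either side, which would otherwise silently duplicate rows.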


In [62]:
df_crimes.head()

|   | CommunityArea | CrimeTotal |
|---|---|---|
| 0 | 1.0 | 3993 |
| 1 | 2.0 | 3419 |
| 2 | 3.0 | 3297 |
| 3 | 4.0 | 1768 |
| 4 | 5.0 | 1244 |
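With one crime total per community, the question from the introduction reduces to a sort. As a minimal sketch, the example below ranks only the five communities shown in the outputs above, so it illustrates the mechanics and is not a result of the study.

```python
import pandas as pd

# The five rows shown in the community_summary output above
summary = pd.DataFrame({
    'CommunityNumber': [1, 2, 3, 4, 5],
    'CommunityName': ['Rogers Park', 'West Ridge', 'Uptown', 'Lincoln Square', 'North Center'],
    'VenueTotal': [36, 52, 70, 63, 47],
    'CrimeTotal': [3993, 3419, 3297, 1768, 1244],
})

# the three communities with the fewest recorded crimes among these five rows
safest = summary.nsmallest(3, 'CrimeTotal').reset_index(drop=True)
print(safest[['CommunityName', 'CrimeTotal']])
```

Applied to the full `community_summary` dataframe, the same call would rank all communities.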


In [10]:

dfCrimes = pd.read_csv('ChicagoCrimes2019.csv')

In [11]:
dfCrimes.shape

(259004, 22)

In [17]:
dfCrimes['Community Area'].unique()


array([22., 23., 30., 31., 26., 18., 24., 25.,  5., 20., 14., 29., 53.,
       69., 15., 44., 75., 43., 27., 42., 54.,  4., 39., 60.,  1., 63.,
       49., 76.,  8., 73., 58., 67., 19.,  3., 46., 38.,  7., 28., 65.,
        6., 71., 66., 56., 55., 45., 68., 35., 33., 32., 48., 72., 36.,
       50., 41., 13., 64., 47., 16., 70., 21., 62., 51., 61., 77., 57.,
        2., 40., 34., 11., 52., 74., 59., 17., 10.,  9., 37., 12.,  0.,
       nan])
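The array above contains `0.` and `nan` in addition to the community numbers, which is why the cleaning step filters them out. The sketch below separates valid codes from invalid ones and reports which of the 77 community numbers are absent. The helper `split_community_codes` is hypothetical, and the input is a short hand-picked subset of the values above.

```python
import math

def split_community_codes(codes, n_communities=77):
    """Split codes into valid community numbers and invalid values."""
    valid, invalid = set(), []
    for code in codes:
        if isinstance(code, float) and math.isnan(code):
            invalid.append(code)          # missing community
        elif 1 <= code <= n_communities and float(code).is_integer():
            valid.add(int(code))
        else:
            invalid.append(code)          # e.g. the 0 placeholder
    missing = sorted(set(range(1, n_communities + 1)) - valid)
    return sorted(valid), invalid, missing

# A short subset of the values printed above
valid, invalid, missing = split_community_codes([22.0, 23.0, 0.0, float('nan'), 77.0])
print(valid, len(invalid), len(missing))
```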

In [14]:
dfCrimes.head()

|   | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11965029 | JD132142 | 01/01/2019 12:00:00 AM | 028XX W SHAKESPEARE AVE | 1752 | OFFENSE INVOLVING CHILDREN | AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER | APARTMENT | False | True | ... | 1.0 | 22.0 | 17 | 1156820.0 | 1914304.0 | 2019 | 04/20/2020 03:47:53 PM | 41.920626 | -87.699237 | (41.920626036, -87.699236545) |
| 1 | 11752916 | JC342540 | 01/01/2019 12:00:00 AM | 036XX W THOMAS ST | 1752 | OFFENSE INVOLVING CHILDREN | AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER | APARTMENT | False | False | ... | 27.0 | 23.0 | 17 | 1151715.0 | 1907084.0 | 2019 | 04/19/2020 03:44:21 PM | 41.900916 | -87.718184 | (41.900915738, -87.718183718) |
| 2 | 11739188 | JC326320 | 01/01/2019 12:00:00 AM | 010XX N RIDGEWAY AVE | 1752 | OFFENSE INVOLVING CHILDREN | AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER | APARTMENT | False | True | ... | 27.0 | 23.0 | 17 | 1151192.0 | 1906670.0 | 2019 | 04/19/2020 03:44:21 PM | 41.89979 | -87.720116 | (41.899789956, -87.720115618) |
| 3 | 11737723 | JC323962 | 01/01/2019 12:00:00 AM | 028XX S ST LOUIS AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | APARTMENT | False | True | ... | 22.0 | 30.0 | 17 | 1153496.0 | 1884904.0 | 2019 | 04/19/2020 03:44:21 PM | 41.840016 | -87.712231 | (41.840016176, -87.712231415) |
| 4 | 11992909 | JD166116 | 01/01/2019 12:00:00 AM | 014XX N KEELER AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | RESIDENCE | False | False | ... | 26.0 | 23.0 | 17 | 1148120.0 | 1909302.0 | 2019 | 03/23/2020 03:47:32 PM | 41.907072 | -87.731331 | (41.907072136, -87.731331357) |


In [20]:
communities_data = pd.read_html('https://en.wikipedia.org/wiki/Community_areas_in_Chicago')

In [23]:
len(communities_data)


12

In [32]:
communities_data[0]

|   | Number | Community area | Neighborhoods |
|---|---|---|---|
| 0 | 8 | Near North Side | Cabrini–Green The Gold Coast Goose Island Magn... |
| 1 | 32 | Loop | Loop New Eastside South Loop West Loop Gate |
| 2 | 33 | Near South Side | Dearborn Park Printer's Row South Loop Prairie... |
