# Capstone Project - The Battle of the Neighborhoods

## Table of contents
* [A. Introduction](#introduction)
* [B. Data](#data)
* [C. Methodology](#methodology)
* [D. Results](#results)
* [E. Discussion](#discussion)
* [F. Conclusion](#conclusion)

## A. Introduction  <a name="introduction"></a>

Allocating police resources is a challenging endeavor. Certain types of crime likely occur more often in certain areas and near certain types of venues. If police had a better idea of where specific crimes, and crime in general, occur, they would be able to distribute their resources (manpower, equipment, etc.) more efficiently and implement preventative measures. For example, if they noticed that a lot of theft occurs near bars, they might ask bar owners to remind customers to exercise caution, especially when intoxicated, since people under the influence may make prime targets for theft. As another example, if drug possession were found to occur in or near movie theatres, police might consider deploying drug-sniffing dogs there. After implementing these measures, they could consult the city crime statistics once more to see if their measures showed signs of success. The target audience of this project is the Chicago Police Department in Illinois.

##  B. Data <a name="data"></a>

We will use two sets of data: venue data from the Foursquare API and official crime data for the city of Chicago, IL. From the venue data, we will create a dataframe containing the category, latitude, and longitude of hundreds of venues of many types. From the crime data, we will create a dataframe containing the date, type, location description, and coordinates of hundreds of thousands of criminal incidents. From these two dataframes, we will create a new dataframe whose row labels are types of crime and whose column labels are types of venue. The cell in row i and column j holds the number of incidents of crime i that occurred near venues of type j. For example, if the cell in row ‘Battery’ and column ‘Chinese Restaurant’ is 12, then in the past year there were 12 incidents of battery within 0.1 miles of a Chinese restaurant.
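To make the layout of that final table concrete, here is a minimal sketch that uses plain Python dictionaries in place of a pandas dataframe. The labels and the count of 12 are toy values echoing the example above, not real results, and `make_count_matrix` is a hypothetical helper introduced only for illustration.

```python
def make_count_matrix(crime_types, venue_types):
    """Return a nested dict, matrix[crime][venue] -> count, initialised to zero."""
    return {crime: {venue: 0 for venue in venue_types} for crime in crime_types}


# Toy labels for illustration only (assumed, not taken from the real data)
crime_types = ['BATTERY', 'THEFT']
venue_types = ['Chinese Restaurant', 'Theater']

matrix = make_count_matrix(crime_types, venue_types)

# Record 12 battery incidents near Chinese restaurants, as in the example
matrix['BATTERY']['Chinese Restaurant'] += 12

# Read a single cell, then the total for one crime type across all venue types
print(matrix['BATTERY']['Chinese Restaurant'])
print(sum(matrix['BATTERY'].values()))
```

Each row is a separate dictionary, so updating one crime type leaves the others at zero.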

Crime data source: https://data.cityofchicago.org/Public-Safety/Crimes-Map/dfnk-7re6

In [2]:
from geopy.geocoders import Nominatim
from geopy.distance import geodesic
import pandas as pd
import requests
import json

In [3]:
address = 'Chicago, IL'
geolocator = Nominatim(user_agent="city_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Chicago, IL are 41.8755616, -87.6244212.


In [4]:
local_filepath = '..\\foursquare_credentials.txt'
# Read the Foursquare API credentials from a local JSON file;
# the context manager closes the file even if parsing fails
with open(local_filepath, "r") as f:
    credentials = json.load(f)

CLIENT_ID = credentials['CLIENT_ID']
CLIENT_SECRET = credentials['CLIENT_SECRET']
VERSION = credentials['VERSION']

In [5]:
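LIMIT = 1000  # maximum number of venues requested per call
def getNearbyVenues(latitudes, longitudes, radius=500):
    """Return a dataframe with the category, latitude, and longitude of venues within `radius` meters of each point."""
    venues_list=[]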
    for lat, lng in zip(latitudes, longitudes):
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
         
        results = requests.get(url).json()["response"]["groups"][0]["items"]
        venues_list.append([(
            v['venue']['categories'][0]['name'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng']) for v in results])

    # Flatten the per-location lists into one row per venue
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Venue Category', 'Venue Latitude', 'Venue Longitude'])
    return nearby_venues
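The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` if a venue came back with an empty category list. Whether the API ever returns such a venue is an assumption here, so the following is only a defensive sketch. `extract_venues` is a hypothetical helper, and `sample_items` is a hand-written payload that mirrors the keys used above rather than a real API response.

```python
def extract_venues(items):
    """Return (category, lat, lng) tuples, skipping incomplete venue records."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        location = venue.get('location', {})
        if not categories or 'lat' not in location or 'lng' not in location:
            continue  # incomplete record: skip it instead of raising
        rows.append((categories[0]['name'], location['lat'], location['lng']))
    return rows


# Hypothetical payload shaped like response["groups"][0]["items"]
sample_items = [
    {'venue': {'categories': [{'name': 'Theater'}],
               'location': {'lat': 41.876058, 'lng': -87.625303}}},
    {'venue': {'categories': [],  # no category: this record is skipped
               'location': {'lat': 41.875724, 'lng': -87.626386}}},
]

venues = extract_venues(sample_items)
print(venues)
```

The resulting list of tuples can be passed to `pd.DataFrame` in the same way as the flattened list in the function above.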

In [6]:
df = getNearbyVenues([latitude], [longitude])

In [7]:
df.shape

(100, 3)

In [8]:
df.head()

| | Venue Category | Venue Latitude | Venue Longitude |
|---|---|---|---|
| 0 | Theater | 41.876058 | -87.625303 |
| 1 | Cuban Restaurant | 41.875724 | -87.626386 |
| 2 | Sushi Restaurant | 41.876969 | -87.624534 |
| 3 | Hostel | 41.875757 | -87.626537 |
| 4 | Donut Shop | 41.876768 | -87.624575 |


In [9]:
# One-hot encode the venue categories: one column per category, one row per venue
onehot = pd.get_dummies(df[['Venue Category']], prefix="", prefix_sep="")

In [10]:
onehot.shape

(100, 63)

In [11]:
onehot.head(3)

Unnamed: 0,American Restaurant,Arepa Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Bakery,Bookstore,Boutique,Bubble Tea Shop,Building,...,Sandwich Place,Snack Place,Spanish Restaurant,Speakeasy,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Trail,Whisky Bar
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0


In [39]:
venue_categories = list(onehot.columns)
print(len(venue_categories))
print(venue_categories[:5])

63
['American Restaurant', 'Arepa Restaurant', 'Art Museum', 'Arts & Crafts Store', 'Asian Restaurant']


In [13]:
file_path = 'chicago_crime.csv'
crime_df = pd.read_csv(file_path)

In [14]:
# Drop incidents with missing values (for example, no coordinates) and renumber the rows
crime_df.dropna(inplace=True)
crime_df.reset_index(inplace=True, drop=True)

In [15]:
print(crime_df.shape)
crime_df.head()

(254366, 5)


| | DATE OF OCCURRENCE | PRIMARY DESCRIPTION | LOCATION DESCRIPTION | LATITUDE | LONGITUDE |
|---|---|---|---|---|---|
| 0 | 6/24/2019 18:24 | BATTERY | SIDEWALK | 41.753506 | -87.665947 |
| 1 | 12/5/2019 18:43 | NARCOTICS | SIDEWALK | 41.862559 | -87.721771 |
| 2 | 6/24/2019 11:00 | THEFT | STREET | 41.992936 | -87.700697 |
| 3 | 11/19/2019 19:20 | THEFT | CTA BUS | 41.778768 | -87.683628 |
| 4 | 11/19/2019 0:10 | BATTERY | APARTMENT | 41.883109 | -87.760218 |


In [16]:
crime_categories = list(crime_df['PRIMARY DESCRIPTION'].unique())
crime_categories

['BATTERY',
 'NARCOTICS',
 'THEFT',
 'CRIMINAL DAMAGE',
 'KIDNAPPING',
 'DECEPTIVE PRACTICE',
 'WEAPONS VIOLATION',
 'CRIMINAL TRESPASS',
 'ASSAULT',
 'OTHER OFFENSE',
 'ROBBERY',
 'MOTOR VEHICLE THEFT',
 'BURGLARY',
 'OFFENSE INVOLVING CHILDREN',
 'PUBLIC PEACE VIOLATION',
 'SEX OFFENSE',
 'CONCEALED CARRY LICENSE VIOLATION',
 'INTERFERENCE WITH PUBLIC OFFICER',
 'CRIM SEXUAL ASSAULT',
 'STALKING',
 'PROSTITUTION',
 'GAMBLING',
 'INTIMIDATION',
 'ARSON',
 'HOMICIDE',
 'LIQUOR LAW VIOLATION',
 'NON-CRIMINAL',
 'PUBLIC INDECENCY',
 'OBSCENITY',
 'HUMAN TRAFFICKING',
 'OTHER NARCOTIC VIOLATION']

## C. Methodology <a name="methodology"></a>

A natural question is what it means for a crime to have occurred near a type of venue. In this analysis, we define near to mean less than 0.1 miles (about 0.16 km).
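As a rough illustration of this threshold, the sketch below tests whether two coordinate pairs are within 0.1 miles of each other using only the standard library. It swaps in the haversine (great-circle) formula, which treats the Earth as a sphere, as an approximation of the `geodesic` distance from geopy used in this notebook. The names `haversine_miles` and `is_near` and the Earth radius constant are assumptions made for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius; a spherical approximation
NEAR_THRESHOLD_MILES = 0.1    # the cutoff chosen in the text (about 0.16 km)


def haversine_miles(point_a, point_b):
    """Great-circle distance in miles between two (latitude, longitude) pairs."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def is_near(point_a, point_b, threshold=NEAR_THRESHOLD_MILES):
    """True if the two points are closer than the threshold, in miles."""
    return haversine_miles(point_a, point_b) < threshold


# Coordinates copied from the outputs shown earlier in this notebook
chicago_center = (41.8755616, -87.6244212)
theater = (41.876058, -87.625303)
battery_incident = (41.753506, -87.665947)

print(is_near(chicago_center, theater))
print(is_near(chicago_center, battery_incident))
```

The theater lies well inside the threshold of the geocoded city center, while the battery incident is several miles away. At distances this short, the spherical approximation and a geodesic calculation should differ only slightly.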

The crime and venue dataframes are processed as follows:

1) loop through each type of crime (Battery, Theft, etc.)
2) from the crime dataframe, get all crime incidents of this type
3) loop through each of these incidents
4) loop through each type of venue (Restaurant, Movie Theatre, etc.)
5) loop through each venue of this type and compute its distance from the incident
6) if the distance is less than 0.1 miles, add 1 to the cell for this crime type and venue type
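A minimal sketch of this kind of nested loop follows. It counts an incident toward a venue type whenever the two are close. Because every incident and venue already carries its own type, the sketch iterates over incidents and venues directly, which should yield the same counts as grouping by type first. The helper names, the toy incidents, and the `within_box` predicate are all assumptions: `within_box` is a crude stand-in for a real distance test, used only to keep the block self-contained.

```python
from collections import Counter


def count_crimes_near_venues(incidents, venues, is_near):
    """Count (crime type, venue type) pairs for which is_near(...) is true."""
    counts = Counter()
    for crime_type, crime_location in incidents:
        for venue_type, venue_location in venues:
            if is_near(crime_location, venue_location):
                counts[(crime_type, venue_type)] += 1
    return counts


def within_box(a, b, half_width_deg=0.0015):
    """Stand-in proximity test: both coordinates differ by less than a small
    number of degrees (0.0015 degrees of latitude is roughly 0.1 miles)."""
    return abs(a[0] - b[0]) < half_width_deg and abs(a[1] - b[1]) < half_width_deg


# Toy incidents (made up for illustration) and two venues from the table above
incidents = [
    ('THEFT', (41.8760, -87.6250)),
    ('THEFT', (41.8750, -87.6262)),
    ('BATTERY', (41.7535, -87.6659)),
]
venues = [
    ('Theater', (41.876058, -87.625303)),
    ('Donut Shop', (41.876768, -87.624575)),
]

counts = count_crimes_near_venues(incidents, venues, within_box)
for (crime_type, venue_type), n in sorted(counts.items()):
    print(crime_type, venue_type, n)
```

Passing the proximity test in as an argument keeps the counting logic separate from the choice of distance measure, so a geodesic check could replace the box check without touching the loop.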


In [32]:
# Start with an all-zero integer table: one row per crime type, one column per venue type
crime_by_venue = pd.DataFrame(0, index=crime_categories, columns=venue_categories)

In [33]:
crime_by_venue

Unnamed: 0,American Restaurant,Arepa Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Bakery,Bookstore,Boutique,Bubble Tea Shop,Building,...,Sandwich Place,Snack Place,Spanish Restaurant,Speakeasy,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Trail,Whisky Bar
BATTERY,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
NARCOTICS,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
THEFT,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
CRIMINAL DAMAGE,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
KIDNAPPING,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
DECEPTIVE PRACTICE,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
WEAPONS VIOLATION,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
CRIMINAL TRESPASS,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
ASSAULT,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
OTHER OFFENSE,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


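In [29]:
#crime_by_venue.loc['ARSON', 'American Restaurant'] = 1

In [19]:
# Scratch test of single-cell assignment, using .loc[row, column] instead of chained indexing.
# Left commented out so that it does not add a spurious count to the table.
#crime_by_venue.loc['BATTERY', 'American Restaurant'] = 1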

In [42]:
for crime_type in crime_categories:
    print(f'Crime type: {crime_type}')
    # Only the first 100 incidents of each type are examined to keep the runtime manageable
    crimes_of_this_type = crime_df[crime_df['PRIMARY DESCRIPTION'] == crime_type][:100]
    for _, incident in crimes_of_this_type.iterrows():
        crime_location = (incident['LATITUDE'], incident['LONGITUDE'])
        for venue_type in venue_categories:
            venues_of_this_type = df[df['Venue Category'] == venue_type]
            for _, venue in venues_of_this_type.iterrows():
                venue_location = (venue['Venue Latitude'], venue['Venue Longitude'])
                distance = geodesic(crime_location, venue_location).miles
                if distance < 0.1:
                    # .loc[row, column] updates the cell in place;
                    # chained indexing may write to a temporary copy instead
                    crime_by_venue.loc[crime_type, venue_type] += 1

Crime type: BATTERY
Crime type: NARCOTICS
Crime type: THEFT
Crime type: CRIMINAL DAMAGE
Crime type: KIDNAPPING
Crime type: DECEPTIVE PRACTICE
Crime type: WEAPONS VIOLATION
Crime type: CRIMINAL TRESPASS
Crime type: ASSAULT
Crime type: OTHER OFFENSE
Crime type: ROBBERY
Crime type: MOTOR VEHICLE THEFT
Crime type: BURGLARY
Crime type: OFFENSE INVOLVING CHILDREN
Crime type: PUBLIC PEACE VIOLATION
Crime type: SEX OFFENSE
Crime type: CONCEALED CARRY LICENSE VIOLATION
Crime type: INTERFERENCE WITH PUBLIC OFFICER
Crime type: CRIM SEXUAL ASSAULT
Crime type: STALKING
Crime type: PROSTITUTION
Crime type: GAMBLING
Crime type: INTIMIDATION
Crime type: ARSON
Crime type: HOMICIDE
Crime type: LIQUOR LAW VIOLATION
Crime type: NON-CRIMINAL
Crime type: PUBLIC INDECENCY
Crime type: OBSCENITY
Crime type: HUMAN TRAFFICKING
Crime type: OTHER NARCOTIC VIOLATION


In [43]:
crime_by_venue

Unnamed: 0,American Restaurant,Arepa Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Bakery,Bookstore,Boutique,Bubble Tea Shop,Building,...,Sandwich Place,Snack Place,Spanish Restaurant,Speakeasy,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Trail,Whisky Bar
BATTERY,21,18,10,2,38,14,24,33,19,13,...,63,11,4,21,1,4,6,10,0,17
NARCOTICS,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
THEFT,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
CRIMINAL DAMAGE,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
KIDNAPPING,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
DECEPTIVE PRACTICE,0,0,0,0,2,0,1,0,1,0,...,1,1,0,0,0,0,0,0,0,1
WEAPONS VIOLATION,1,0,0,0,0,0,0,1,0,1,...,1,0,0,0,0,0,0,0,0,0
CRIMINAL TRESPASS,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
ASSAULT,0,0,1,0,0,0,0,0,0,0,...,1,0,1,0,0,1,0,1,0,0
OTHER OFFENSE,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


##  D. Results <a name="results"></a>

To be added in week 5

##  E. Discussion <a name="discussion"></a>

To be added in week 5

## F. Conclusion <a name="conclusion"></a>

To be added in week 5