<h1>Crime and Weather</h1>
<h3>Lauren Paredes</h3>



In [1]:
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt

Read the crime and weather data into DataFrames using pandas.

In [2]:
crimeDF= pd.read_csv("crime.csv", encoding='windows-1254')
weatherDF = pd.read_csv("weather_data_long.csv")


Clean up and trim down the dataframes

In [3]:
# Drop the weather columns that are not used in the analysis
weatherDF = weatherDF.drop(columns=['Time', 'WinSpeed', 'Pressure', 'Humidity', 'Wind',
                                    'DewPoint', 'WindGust', 'Precip.', 'Condition'])

In [4]:
# The occurrence date must be parsed as a datetime; keep only the calendar date
crimeDF['Date'] = pd.to_datetime(crimeDF["FIRST_OCCURRENCE_DATE"]).dt.date
# crimeDF.rename(columns={'OCCURRENCE_DATE':'Date'},inplace =True)

In [5]:
# Drop the crime columns that are not used in the analysis
crimeDF = crimeDF.drop(columns=[
    "incident_id", "offense_id", "OFFENSE_CODE", "OFFENSE_CODE_EXTENSION",
    "FIRST_OCCURRENCE_DATE", "LAST_OCCURRENCE_DATE", "REPORTED_DATE",
    "VICTIM_COUNT", "GEO_X", "GEO_Y", "GEO_LON", "GEO_LAT",
    "DISTRICT_ID", "PRECINCT_ID", "NEIGHBORHOOD_ID",
    "IS_CRIME", "IS_TRAFFIC", "INCIDENT_ADDRESS",
])

In [6]:
# Make sure both dataframes share a 'Date' column with the same datetime dtype to merge on
crimeDF['Date']= pd.to_datetime(crimeDF['Date'])
weatherDF['Date']= pd.to_datetime(weatherDF['Date'])

In [7]:
# Collapse to one row per date, keeping the maximum temperature recorded that day
weatherDF= weatherDF.groupby(['Date']).max()

<h3>Merging crime and weather on a common date</h3>

In [8]:
mergeOnDate = pd.merge(crimeDF, weatherDF, how='outer',on='Date')
mergeOnDate=mergeOnDate.dropna(subset=['OFFENSE_TYPE_ID'])
mergeOnDate=mergeOnDate.dropna(subset=['Temperature'])
mergeOnDate['Temperature'] =mergeOnDate['Temperature'].astype('int')
display(mergeOnDate)

Unnamed: 0,OFFENSE_TYPE_ID,OFFENSE_CATEGORY_ID,Date,Temperature
0,criminal-mischief-other,public-disorder,2017-06-25,73
1,criminal-mischief-other,public-disorder,2017-06-25,73
2,criminal-mischief-other,public-disorder,2017-06-25,73
3,criminal-mischief-other,public-disorder,2017-06-25,73
4,criminal-mischief-other,public-disorder,2017-06-25,73
...,...,...,...,...
379047,fraud-by-use-of-computer,white-collar-crime,2019-01-14,43
379048,fraud-by-use-of-computer,white-collar-crime,2019-01-14,43
379049,pawn-broker-viol,all-other-crimes,2019-01-14,43
379050,outside-steal-recovered-veh,all-other-crimes,2019-01-14,43
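
The outer merge followed by the two `dropna` calls keeps only rows that have both a crime record and a temperature, which is the same set of rows an inner merge returns directly. A minimal sketch on made-up toy frames (the names `crime_toy` and `weather_toy` and all values are illustrative assumptions, not data from this project):

```python
import pandas as pd

# Toy crime records: the last date has no matching weather row.
crime_toy = pd.DataFrame({
    "OFFENSE_TYPE_ID": ["theft-bicycle", "assault-simple", "burglary-residence"],
    "Date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-04"]),
})
# Toy weather records: the last date has no matching crime row.
weather_toy = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    "Temperature": [40.0, 55.0, 60.0],
})

# Approach used above: outer merge, then drop rows missing either side.
outer = pd.merge(crime_toy, weather_toy, how="outer", on="Date")
outer = outer.dropna(subset=["OFFENSE_TYPE_ID", "Temperature"])

# Equivalent row selection in one step.
inner = pd.merge(crime_toy, weather_toy, how="inner", on="Date")

print(outer["OFFENSE_TYPE_ID"].tolist())
print(inner["OFFENSE_TYPE_ID"].tolist())
```

Both approaches select the same rows here; the outer merge is only needed if the unmatched dates are to be inspected before they are dropped.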


Assign each crime category to one of three groups: society, person, or property. The categorization logic is explained in the report.

In [9]:
types ={'auto-theft': 'property',
        'robbery': 'property',
        'arson': 'property',
        'theft-from-motor-vehicle': 'property',
        'burglary': 'property',
        'larceny': 'property',
        'sexual-assault': 'person',
        'drug-alcohol':'society',
        'other-crimes-against-persons': 'person',
        'aggravated-assault': 'person',
        'murder': 'person',
        'white-collar-crime': 'society',
        'public-disorder': 'society',
        'all-other-crimes': 'society'}

In [10]:
mergeOnDate["OFFENSE_CATEGORY_ID"] = mergeOnDate["OFFENSE_CATEGORY_ID"].map(types)
display(mergeOnDate)

Unnamed: 0,OFFENSE_TYPE_ID,OFFENSE_CATEGORY_ID,Date,Temperature
0,criminal-mischief-other,society,2017-06-25,73
1,criminal-mischief-other,society,2017-06-25,73
2,criminal-mischief-other,society,2017-06-25,73
3,criminal-mischief-other,society,2017-06-25,73
4,criminal-mischief-other,society,2017-06-25,73
...,...,...,...,...
379047,fraud-by-use-of-computer,society,2019-01-14,43
379048,fraud-by-use-of-computer,society,2019-01-14,43
379049,pawn-broker-viol,society,2019-01-14,43
379050,outside-steal-recovered-veh,society,2019-01-14,43
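
`Series.map` with a dictionary returns `NaN` for any value that is not a key, so a category missing from `types` would silently drop out of the later `groupby` counts. A small sketch of how that could be checked, using a hypothetical two-entry mapping and a made-up unmapped value:

```python
import pandas as pd

# Hypothetical subset of the mapping, for illustration only.
toy_types = {"burglary": "property", "murder": "person"}

# "traffic-accident" is an assumed example of a category with no mapping.
categories = pd.Series(["burglary", "murder", "traffic-accident"])

mapped = categories.map(toy_types)

# Values that were not found in the dictionary come back as NaN.
unmapped = categories[mapped.isna()].unique()
print(list(unmapped))
```

Running the same check on the original `OFFENSE_CATEGORY_ID` column before overwriting it would show whether any category was left out of the dictionary.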


Separate into hot, mild, cold dataframes

In [11]:
hotDays = mergeOnDate[mergeOnDate['Temperature'] > 75]
mildDays = mergeOnDate[(mergeOnDate['Temperature'] >= 45) & (mergeOnDate['Temperature'] <= 75)]
coldDays = mergeOnDate[mergeOnDate['Temperature'] < 45]

numofweathercrimes = len(mergeOnDate.index)
print("Total Entry Count: ",numofweathercrimes)
print("Hot days overview")
display(hotDays)
print("Mild days overview")
display(mildDays)
print("cold days overview")
display(coldDays)

Total Entry Count:  228089
Hot days overview


Unnamed: 0,OFFENSE_TYPE_ID,OFFENSE_CATEGORY_ID,Date,Temperature
347,criminal-mischief-other,society,2017-06-27,96
348,criminal-mischief-other,society,2017-06-27,96
349,criminal-mischief-other,society,2017-06-27,96
350,criminal-mischief-other,society,2017-06-27,96
351,criminal-mischief-other,society,2017-06-27,96
...,...,...,...,...
377513,theft-unauth-use-of-ftd,society,2018-07-02,94
377514,fraud-by-use-of-computer,society,2018-07-02,94
377515,fraud-by-use-of-computer,society,2018-07-02,94
377516,fraud-by-use-of-computer,society,2018-07-02,94


Mild days overview


Unnamed: 0,OFFENSE_TYPE_ID,OFFENSE_CATEGORY_ID,Date,Temperature
0,criminal-mischief-other,society,2017-06-25,73
1,criminal-mischief-other,society,2017-06-25,73
2,criminal-mischief-other,society,2017-06-25,73
3,criminal-mischief-other,society,2017-06-25,73
4,criminal-mischief-other,society,2017-06-25,73
...,...,...,...,...
378404,outside-steal-recovered-veh,society,2018-03-27,47
378405,outside-steal-recovered-veh,society,2018-03-27,47
378406,outside-steal-recovered-veh,society,2018-03-27,47
378407,outside-steal-recovered-veh,society,2018-03-27,47


cold days overview


Unnamed: 0,OFFENSE_TYPE_ID,OFFENSE_CATEGORY_ID,Date,Temperature
2034,criminal-mischief-other,society,2017-04-01,40
2035,criminal-mischief-other,society,2017-04-01,40
2036,criminal-mischief-other,society,2017-04-01,40
2037,criminal-mischief-other,society,2017-04-01,40
2038,criminal-mischief-other,society,2017-04-01,40
...,...,...,...,...
379047,fraud-by-use-of-computer,society,2019-01-14,43
379048,fraud-by-use-of-computer,society,2019-01-14,43
379049,pawn-broker-viol,society,2019-01-14,43
379050,outside-steal-recovered-veh,society,2019-01-14,43
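
As an alternative to three boolean masks, the same bands can be attached as a single column with `pd.cut`. This is a sketch on made-up temperatures. It assumes integer temperatures, so that "below 45" is the same as "at most 44", and it reuses the thresholds from the cell above:

```python
import numpy as np
import pandas as pd

# Made-up integer temperatures covering each boundary.
temps = pd.Series([30, 44, 45, 75, 76, 96], name="Temperature")

# Intervals are right-closed by default: (-inf, 44], (44, 75], (75, inf)
bands = pd.cut(
    temps,
    bins=[-np.inf, 44, 75, np.inf],
    labels=["Cold", "Mild", "Hot"],
)
print(bands.tolist())
```

A single band column makes it possible to group by temperature band and crime category together, instead of maintaining three separate dataframes.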


<h4>General distributions of crime types on different temperature categories</h4>

In [12]:
# overall merged data counts
print("All Data grouped by categoryID counts")
crimetypes = mergeOnDate.groupby(["OFFENSE_CATEGORY_ID"]).size()
print(crimetypes.head())
print(mergeOnDate.shape)

All Data grouped by categoryID counts
OFFENSE_CATEGORY_ID
person       29619
property    105089
society      93381
dtype: int64
(228089, 4)


In [13]:
# category types on hot days from the hotDays dataframe
print("Hot Data grouped by categoryID counts")
hotTypes=hotDays.groupby(["OFFENSE_CATEGORY_ID"]).size()
print(hotTypes.head())
print(hotDays.shape)


Hot Data grouped by categoryID counts
OFFENSE_CATEGORY_ID
person      12716
property    44579
society     38674
dtype: int64
(95969, 4)


In [14]:
# category types on mild days from the mildDays dataframe
print("Mild Data grouped by categoryID counts")
mildTypes =mildDays.groupby(["OFFENSE_CATEGORY_ID"]).size()
print(mildTypes.head())
print(mildDays.shape)

Mild Data grouped by categoryID counts
OFFENSE_CATEGORY_ID
person      12765
property    45306
society     41450
dtype: int64
(99521, 4)


In [15]:
# category types on cold days from the coldDays dataframe
print("Cold Data grouped by categoryID counts")
coldTypes = coldDays.groupby(["OFFENSE_CATEGORY_ID"]).size()
print(coldTypes.head())
print(coldDays.shape)

Cold Data grouped by categoryID counts
OFFENSE_CATEGORY_ID
person       4138
property    15204
society     13257
dtype: int64
(32599, 4)


In [22]:
pdisHot=(hotTypes/numofweathercrimes)*100
pdisCold=(coldTypes/numofweathercrimes)*100
pdisMild = (mildTypes/numofweathercrimes)*100
print("General Distributions for type and temperature")
print("Hot general Distribution:")
display(pdisHot)
print("Mild general Distribution:")
display(pdisMild)
print("Cold general Distribution:")
display(pdisCold)

General Distributions for type and temperature
Hot general Distribution:


OFFENSE_CATEGORY_ID
person       5.575017
property    19.544564
society     16.955662
dtype: float64

Mild general Distribution:


OFFENSE_CATEGORY_ID
person       5.596500
property    19.863299
society     18.172731
dtype: float64

Cold general Distribution:


OFFENSE_CATEGORY_ID
person      1.814204
property    6.665819
society     5.812205
dtype: float64

<h3>Bayesian Classifications</h3>

$P(\text{Category} \mid \text{Temp}) = \frac{P(C \cap T)}{P(T)}$

P(Category) is represented by a pandas Series over the Person, Property, and Society crime categories.

P(Temperature) is represented by a pandas Series over Hot, Mild, and Cold days.
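
As a worked instance of the formula, the counts printed earlier (228089 merged records, 95969 of them on hot days, 12716 of those being person crimes) give the probability of a person crime given a hot day. The variable names are illustrative:

```python
# Counts taken from the outputs of the earlier cells.
total = 228089       # all merged crime records
hot_total = 95969    # records on hot days
hot_person = 12716   # person crimes on hot days

p_joint = hot_person / total   # P(person and Hot)
p_hot = hot_total / total      # P(Hot)

# Conditional probability: P(person | Hot) = P(person and Hot) / P(Hot)
p_person_given_hot = p_joint / p_hot
print(round(p_person_given_hot, 4))
```

The total cancels out, so the result equals `hot_person / hot_total`, about 0.1325.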

In [16]:
# def calculateClassPropbs(dataset):
#     numDataPoint = dataset.size
#     classProbs= {}
#     for dataPoint in 

In [17]:
priorCategories = (crimetypes / (numofweathercrimes))
print("Prior Probabilities of Categories")
display(priorCategories)
print("Sum of percentages of Categories: ",priorCategories.sum())



Prior Probabilities of Categories


OFFENSE_CATEGORY_ID
person      0.129857
property    0.460737
society     0.409406
dtype: float64

Sum of percentages of Categories:  1.0


In [18]:
pHot = ((hotDays.shape[0])/numofweathercrimes)
pMild= ((mildDays.shape[0])/numofweathercrimes)
pCold=((coldDays.shape[0])/numofweathercrimes)
d = {'Hot':pHot, 'Mild':pMild,'Cold':pCold}
priorTemperatures= pd.Series(data=d, index=['Hot','Mild','Cold'])
print("Probabilities of Temperatures")
display(priorTemperatures)
print("Sum of percentages of Temperatures",priorTemperatures.sum())

Probabilities of Temperatures


Hot     0.420752
Mild    0.436325
Cold    0.142922
dtype: float64

Sum of percentages of Temperatures 1.0


$P(T \mid C) = \frac{P(T \cap C)}{P(C)}$

In [19]:

pHotTypes=(hotTypes/hotDays.shape[0])*100
pColdTypes=(coldTypes/coldDays.shape[0])*100
pMildTypes = (mildTypes/mildDays.shape[0])*100
print("Given a hot day probability of a type of crime: ")
print("Total sum: ", pHotTypes.sum())
display(pHotTypes)
# Each label below matches the series that is printed under it
print("Given a cold day probability of a type of crime: ")
print("Total sum: ", pColdTypes.sum())
display(pColdTypes)
print("Given a mild day probability of a type of crime: ")
print("Total sum: ", pMildTypes.sum())
display(pMildTypes)
# display(hotTypes.sum())


Given a hot day probability of a type of crime: 
Total sum:  100.0


OFFENSE_CATEGORY_ID
person      13.250112
property    46.451458
society     40.298430
dtype: float64

Given a cold day probability of a type of crime: 
Total sum:  100.0


OFFENSE_CATEGORY_ID
person      12.693641
property    46.639467
society     40.666892
dtype: float64

Given a mild day probability of a type of crime: 
Total sum:  100.0


OFFENSE_CATEGORY_ID
person      12.826439
property    45.524060
society     41.649501
dtype: float64
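
The reverse conditional, P(Temperature | Category), follows from Bayes' theorem: P(T|C) = P(C|T) · P(T) / P(C). The sketch below rebuilds the count table from the outputs printed above and applies that formula. The variable names are illustrative, and this is one possible way to carry the calculation through, not necessarily the approach taken in the report:

```python
import pandas as pd

# Category counts per temperature band, copied from the outputs above.
counts = pd.DataFrame(
    {
        "Hot": [12716, 44579, 38674],
        "Mild": [12765, 45306, 41450],
        "Cold": [4138, 15204, 13257],
    },
    index=["person", "property", "society"],
)
total = counts.values.sum()

p_c_given_t = counts / counts.sum(axis=0)  # P(C|T): each column sums to 1
p_t = counts.sum(axis=0) / total           # P(T): prior over temperature bands
p_c = counts.sum(axis=1) / total           # P(C): prior over crime categories

# Bayes' theorem: P(T|C) = P(C|T) * P(T) / P(C); each row sums to 1
p_t_given_c = p_c_given_t.mul(p_t, axis=1).div(p_c, axis=0)
print(p_t_given_c.round(4))
```

Each row of `p_t_given_c` answers the question "given a crime of this category, how likely is it to have occurred on a hot, mild, or cold day?"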