# Crime in Silver Spring, MD and Temperature

The script below explores the relationship between the incidence of crime in Silver Spring, Maryland, from 2017 to 2019 and the temperature.
We used pandas, Matplotlib, and SciPy's `linregress` to accomplish our task.

[Logic behind our topic] We believe there may be a correlation between temperature and the incidence of crime. In winter, when spending time outside is less practical, it seems logical that fewer crimes would occur than in spring, summer, or fall.

In [None]:
# Dependencies
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
import datetime
import warnings
warnings.filterwarnings('ignore')
import scipy.stats as stats

## Data Retrieval

Crime Data: We retrieved the raw crime statistics for Maryland from dataMontgomery (https://data.montgomerycountymd.gov/Public-Safety/Crime/icn6-v9z3) in CSV format.

Climate Data: ____________________________________ [URL HERE] 

Limitation with API: _______________

After our 2nd project meeting, we ran into the problem of not being able to get the temperature for each data point, so we decided to use the average temperature of each month instead. ---------EXPLAIN THE VALIDITY OF USING CLIMATE DATA---------------

Because of this limitation, we also had to limit our analysis to one city rather than using every city available in our CSV file. We selected the city with the highest number of crimes so that we would have a good amount of data.
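
As a rough illustration of what this limitation means in practice, the sketch below joins per-month crime counts to a table holding one average temperature per calendar month. The numbers and the names `counts` and `normals` are made up for illustration and are not our data. Because the climate table has no year, the same month gets the same temperature in every year.

```python
import pandas as pd

# Hypothetical monthly crime counts (values are made up for illustration)
counts = pd.DataFrame({
    "Year": [2017, 2017, 2018, 2018],
    "Month": [1, 7, 1, 7],
    "Crime": [410, 520, 395, 540],
})

# Hypothetical climate table: one average temperature per calendar month
normals = pd.DataFrame({"Month": [1, 7], "Temp": [35.0, 79.0]})

# validate="many_to_one" raises an error if a month appears twice in the climate table
merged = counts.merge(normals, on="Month", how="left", validate="many_to_one")
print(merged)
```

Every January row carries the same `Temp` value regardless of year, which is why the analysis can only speak to typical monthly temperatures, not to the weather actually observed.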

In [None]:
# Crime Data
# Save file path to variable
crimeMD_path = "Crime_MD.csv"

# Read with Pandas
crime_df = pd.read_csv(crimeMD_path, low_memory=False)
crime_df

In [None]:
# Climate Data
# Save file path to variable
climateMD_path = "silver_spring_climate.csv"

# Read with Pandas
climate_df = pd.read_csv(climateMD_path, low_memory=False)
climate_df

In [None]:
# Formatting the month for climate data for merge later
climate_df.columns = ['MonthName','Temp']
df = pd.to_datetime(climate_df['MonthName'], format='%B').dt.month
climate_df['Month'] = df
climate_df

In [None]:
# Pulling out only the columns of interest
main_crime_df = crime_df[['Crime Name1', 'Crime Name2', 'Crime Name3', 'City', 'Start_Date_Time']]
main_crime_df

In [None]:
# Remove entries that are not considered a crime and rename the columns
main_crime_df = main_crime_df.loc[main_crime_df['Crime Name1'] != 'Not a Crime']
clean_crime_df = main_crime_df.rename(columns={'Crime Name1': 'Crime Main Category',
                                             'Crime Name2': 'Crime Sub Category',
                                             'Start_Date_Time': 'Date & Time of Crime'})

# Convert the Date & Time strings to datetime values
clean_crime_df['Date & Time of Crime'] = pd.to_datetime(clean_crime_df['Date & Time of Crime'])

In [None]:
# City Selection: find which city has the highest number of crimes - Silver Spring!
clean_crime_df['City'].value_counts()

In [None]:
# Create a crime dataframe for Silver Spring
silverspring_crime_df = clean_crime_df.loc[clean_crime_df['City'] == 'SILVER SPRING']
silverspring_crime_df

In [None]:
# Figure out the timeline of our raw data
data_first_date = silverspring_crime_df['Date & Time of Crime'].min()
data_last_date = silverspring_crime_df['Date & Time of Crime'].max()

print(data_first_date)
print(data_last_date)

# Drop 2016 and 2020 so that every remaining year is complete
# .copy() lets us add the Year/Month columns later without a SettingWithCopyWarning
silverspring_clean_df = silverspring_crime_df.loc[(silverspring_crime_df['Date & Time of Crime'] > '2016-12-31 23:59:59') & (silverspring_crime_df['Date & Time of Crime'] < '2020-01-01 00:00:00')].copy()
silverspring_clean_df

In [None]:
# Extract just the year and month from the Date & Time column
silverspring_clean_df['Year'] = pd.DatetimeIndex(silverspring_clean_df['Date & Time of Crime']).year
silverspring_clean_df['Month'] = pd.DatetimeIndex(silverspring_clean_df['Date & Time of Crime']).month

silverspring_clean_df.head(3)

In [None]:
# Explore the types of crimes
silverspring_clean_df['Crime Main Category'].value_counts()

Overall Crime in Silver Spring
-------------------------------------------

In [None]:
# Paul's code starts here 

In [None]:
# Merge the cleaned crime data with the climate data
combined_df = pd.merge(silverspring_clean_df, climate_df, on='Month')
combined_df

In [None]:
# Group the data by month
group_monthly = combined_df.groupby(['Year','Month'])
monthly = group_monthly['Crime Main Category'].count()
temp = group_monthly['Temp'].first()
monthly_temp_df = pd.DataFrame({'Crime' : monthly, 'Temperature' : temp})


In [None]:
# Create 4 temperature bins for the monthly data
bins = [40, 50, 70, 80, 90]

# Create labels for the bins
group_bins = ['Cold', 'Cool', 'Warm', 'Hot']

# Slice the data and place it into bins
bin_slice = pd.cut(monthly_temp_df['Temperature'], bins, labels=group_bins)

# Create a new column for the bins
monthly_temp_df['Temperature Category'] = bin_slice


In [None]:
monthly_temp_df.boxplot('Crime', by='Temperature Category', figsize=(20,10))

In [None]:
# Divide the month data into dataframes of their categories
cold_df = monthly_temp_df.query('Month == 1 | Month == 2 | Month == 12')
cool_df = monthly_temp_df.query('Month == 3 | Month == 4 | Month == 11')
warm_df = monthly_temp_df.query('Month == 5 | Month == 9 | Month == 10')
hottest_df = monthly_temp_df.query('Month == 6 | Month == 7 | Month == 8')

Null Hypothesis: The average monthly temperature does not affect the level of crime in the city of Silver Spring, Maryland, in any way.

In [None]:
# Perform the ANOVA
stats.f_oneway(cold_df['Crime'], cool_df['Crime'], warm_df['Crime'], hottest_df['Crime'])

We cannot reject the null hypothesis.

Therefore, based on the data we had in this overall group, we cannot say that crime was influenced by the climate in Silver Spring, Maryland.
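
For readers less familiar with the test, the sketch below shows how the F statistic reported by `stats.f_oneway` is built from between-group and within-group variation. The helper `one_way_anova_f` and the toy groups are ours for illustration only and are not our crime data. `stats.f_oneway` also returns the p-value, which this sketch does not compute.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Toy groups (made up): the third group sits well above the other two
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(toy_groups)
print(f_stat)
```

A large F means the group means differ by more than the spread inside each group would suggest. A small F, as in our overall result, gives no grounds to reject the null hypothesis.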

In [None]:
# Paul's code ends here

## Crime Against Property/Other

In [None]:
# Dan's code starts here - Property/Other

In [None]:
# Separate Crime Against Property from other crimes
sspring_property = silverspring_clean_df.loc[(silverspring_clean_df['Crime Main Category'] == 'Crime Against Property')]
sspring_property

In [None]:
# Separate Property Crimes by Year
propcrime_2017 = sspring_property.loc[(sspring_property['Year'] == 2017)]
propcrime_2018 = sspring_property.loc[(sspring_property['Year'] == 2018)]
propcrime_2019 = sspring_property.loc[(sspring_property['Year'] == 2019)]
propcrime_2017

In [None]:
pc2017_data = propcrime_2017['Month'].value_counts()
pc2017_data = pd.DataFrame(pc2017_data)
pc2017_data = pc2017_data.sort_index()

In [None]:
pc2017graph, = plt.plot(pc2017_data, color="blue", label="2017 Property Crime Data" )

In [None]:
pc2018_data = propcrime_2018['Month'].value_counts()
pc2018_data = pd.DataFrame(pc2018_data)
pc2018_data = pc2018_data.sort_index()

In [None]:
pc2018graph, = plt.plot(pc2018_data, color="blue", label="2018 Property Crime Data" )

In [None]:
pc2019_data = propcrime_2019['Month'].value_counts()
pc2019_data = pd.DataFrame(pc2019_data)
pc2019_data = pc2019_data.sort_index()

In [None]:
pc2019graph, = plt.plot(pc2019_data, color="blue", label="2019 Property Crime Data" )

In [None]:
md_temps = climate_df['Temp']
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']


In [None]:
pc2019_compiled_data = pc2019_data.rename(columns={"Month": "2019 Property Crime Count"})
pc2018_compiled_data = pc2018_data.rename(columns={"Month": "2018 Property Crime Count"})
pc2017_compiled_data = pc2017_data.rename(columns={"Month": "2017 Property Crime Count"})

pc2019_compiled_data.reset_index(drop=True, inplace=True)
pc2018_compiled_data.reset_index(drop=True, inplace=True)
pc2017_compiled_data.reset_index(drop=True, inplace=True)

pc2019_compiled_data['Temp'] = md_temps
pc2018_compiled_data['Temp'] = md_temps
pc2017_compiled_data['Temp'] = md_temps

pc2019_compiled_data['Month'] = months
pc2018_compiled_data['Month'] = months
pc2017_compiled_data['Month'] = months

pc2019_compiled_data['Year'] = 2019
pc2018_compiled_data['Year'] = 2018
pc2017_compiled_data['Year'] = 2017

pcMerged_data = pd.merge(pc2017_compiled_data, pc2018_compiled_data, on='Month', how='inner')
pcMerged_data = pd.merge(pcMerged_data, pc2019_compiled_data, on='Month', how='inner')

pcMerged_data.drop(columns = ['Temp_x', 'Temp_y'], inplace=True)
pcMerged_data


In [None]:
# Create a New Dataframe for number of crimes in each month, indexed by year & month
propMonthYeargroup = sspring_property.groupby(['Year','Month'])
prop_crime_count_mmyy = propMonthYeargroup['Month'].count()
prop_crime_count_mmyy_data = pd.DataFrame({'Number of Crime Incidence':prop_crime_count_mmyy})

# Reset the index to allow further data analysis
prop_crime_count_mmyy_data = prop_crime_count_mmyy_data.reset_index(drop=False)
prop_crime_count_mmyy_data

In [None]:
avgcrimepropgroup = prop_crime_count_mmyy_data.groupby('Month')
avgcrime_prop = avgcrimepropgroup['Number of Crime Incidence'].mean()

propcrime_avg_ct_df = pd.DataFrame({'Average Number of Crimes (Property)':avgcrime_prop})
propcrime_avg_ct_df = propcrime_avg_ct_df.reset_index(drop=False)
# Replace the numeric Month column with month names
# (the earlier drop() call discarded its result, so it had no effect)
propcrime_avg_ct_df['Month'] = months
propcrime_avg_ct_df['Temp'] = climate_df['Temp']
propcrime_avg_ct_df

In [None]:
# Perform regression to look at average of crime and temperature
x_avgprop = propcrime_avg_ct_df['Temp']
y_avgprop = propcrime_avg_ct_df['Average Number of Crimes (Property)'].astype(int)

plt.scatter(x_avgprop, y_avgprop, marker='+')

# Labels
plt.xlabel('Temperature (F)')
plt.ylabel('Average Number of Crimes')
plt.title('Average Number of Crimes Against Property vs. Temperature (F)', fontsize=12, fontweight='bold')
# plt.legend(loc="best")

# Add the linear regression equation and line to plot
(slope, intercept, rvalue, pvalue, stderr) = linregress(x_avgprop, y_avgprop)
regress_values = x_avgprop * slope + intercept
line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))
plt.plot(x_avgprop,regress_values,"r-")
plt.annotate(line_eq,(60,600),fontsize=15,color="red")
print(f"The r value is: {rvalue}")
print(f"The r-squared is: {rvalue**2}")

plt.show()

In [None]:
# Dan's code ends here

## Crime Against Person

In [None]:
# Cynthia's code starts here - Person

Sub-Hypothesis: We believe that as the temperature drops below a certain point during the winter, crime against persons is likely to decrease. People are less likely to go out when it gets too cold, so fewer crimes occur.


In [None]:
# Create Crime Against Person df
person_crime_df = silverspring_clean_df.loc[(silverspring_clean_df['Crime Main Category'] == 'Crime Against Person'), :]
person_crime_df

In [None]:
# Explore the types of crimes against person
person_crime_df['Crime Sub Category'].value_counts()

In [None]:
# Using Groupby to get the # of crime incidence for each month
personmonthgroup = person_crime_df.groupby('Month')
personcrime_count_month = personmonthgroup['Month'].count()
personcrime_count_month

# Turn count of crime into a dataframe
personcrime_df = pd.DataFrame({'Number of Crime Incidence':personcrime_count_month})

# Drop the index to get month column, add month name 
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
personcrime_index_df = personcrime_df.reset_index(drop=True)
personcrime_index_df['Month'] = months
personcrime_index_df = personcrime_index_df[['Month','Number of Crime Incidence']]
personcrime_index_df

In [None]:
# Visualize the incidence of Crime Against Person for each month from 2017-2019
x_all = personcrime_index_df['Month']
y_all = personcrime_index_df['Number of Crime Incidence']

plt.plot(x_all, y_all, marker='o')
plt.xticks(rotation=45)

plt.xlabel('Month')
plt.ylabel('Number of Crime Incidence')
plt.title('Number of Crimes Against Person for Each Month from 2017-2019')

plt.show()

In [None]:
# Create different df for different year (2017-2019)

person_crime_2017df = person_crime_df.loc[person_crime_df['Year'] == 2017]
person_crime_2018df = person_crime_df.loc[person_crime_df['Year'] == 2018]
person_crime_2019df = person_crime_df.loc[person_crime_df['Year'] == 2019]

In [None]:
# Number of crime in each month for 2017

personmonth2017 = person_crime_2017df.groupby('Month')
personcrime_ct_2017 = personmonth2017['Month'].count()

# Turn count of crime into a dataframe
personcrime_ct_2017df = pd.DataFrame({'Number of Crime Incidence':personcrime_ct_2017})

# Drop the index to get month column, add month name 
personcrime_ct_2017df = personcrime_ct_2017df.reset_index(drop=True)
personcrime_ct_2017df['Month'] = months
personcrime_ct_2017df = personcrime_ct_2017df[['Month','Number of Crime Incidence']]
personcrime_ct_2017df

In [None]:
# Visualize the incidence of Crime Against Person for each month from 2017
x_2017 = personcrime_ct_2017df['Month']
y_2017 = personcrime_ct_2017df['Number of Crime Incidence']

handle17, = plt.plot(x_2017, y_2017, marker='x')
plt.xticks(rotation=45)

plt.xlabel('Month')
plt.ylabel('Number of Crime Incidence')
plt.title('Numbers of Crimes Against Person for each Month in 2017')

plt.show()

In [None]:
# Number of crime in each month for 2018

personmonth2018 = person_crime_2018df.groupby('Month')
personcrime_ct_2018 = personmonth2018['Month'].count()

# Turn count of crime into a dataframe
personcrime_ct_2018df = pd.DataFrame({'Number of Crime Incidence':personcrime_ct_2018})

# Drop the index to get month column, add month name 
personcrime_ct_2018df = personcrime_ct_2018df.reset_index(drop=True)
personcrime_ct_2018df['Month'] = months
personcrime_ct_2018df = personcrime_ct_2018df[['Month','Number of Crime Incidence']]
personcrime_ct_2018df

In [None]:
# Visualize the incidence of Crime Against Person for each month from 2018
x_2018 = personcrime_ct_2018df['Month']
y_2018 = personcrime_ct_2018df['Number of Crime Incidence']

handle18, = plt.plot(x_2018, y_2018, marker='x')
plt.xticks(rotation=45)

plt.xlabel('Month')
plt.ylabel('Number of Crime Incidence')
plt.title('Numbers of Crimes Against Person for each Month in 2018')

plt.show()

In [None]:
# Number of crime in each month for 2019

personmonth2019 = person_crime_2019df.groupby('Month')
personcrime_ct_2019 = personmonth2019['Month'].count()

# Turn count of crime into a dataframe
personcrime_ct_2019df = pd.DataFrame({'Number of Crime Incidence':personcrime_ct_2019})

# Drop the index to get month column, add month name 
personcrime_ct_2019df = personcrime_ct_2019df.reset_index(drop=True)
personcrime_ct_2019df['Month'] = months
personcrime_ct_2019df = personcrime_ct_2019df[['Month','Number of Crime Incidence']]
personcrime_ct_2019df

In [None]:
# Visualize the incidence of Crime Against Person for each month from 2019
x_2019 = personcrime_ct_2019df['Month']
y_2019 = personcrime_ct_2019df['Number of Crime Incidence']

handle19, = plt.plot(x_2019, y_2019, marker='x')
plt.xticks(rotation=45)

plt.xlabel('Month')
plt.ylabel('Number of Crime Incidence')
plt.title('Numbers of Crimes Against Person for each Month in 2019')

plt.show()

In [None]:
# Visualize the incidence of Crime Against Person for all three years
plt.figure(figsize=(12,4))
# Graph from 2017
handle17, = plt.plot(x_2017, y_2017, marker='x', label='2017')
# Graph from 2018
handle18, = plt.plot(x_2018, y_2018, marker='x', label='2018')
# Graph from 2019
handle19, = plt.plot(x_2019, y_2019, marker='x', label='2019')


plt.xticks(rotation=45)

plt.xlabel('Month')
plt.ylabel('Number of Crime Incidence')
plt.title('Number of Crimes Against Person for Each Month during 2017-2019', fontsize=12, fontweight='bold')

plt.legend(loc="best")
plt.show()

Observation: Overall, the incidence of crime against persons seems to be lower in January-February and higher in September.

In [None]:
# Dataframe of Number of Crime incidence for all the years
personcrime_ct_split_df =  pd.DataFrame({'2017 Count':personcrime_ct_2017,
                                        '2018 Count': personcrime_ct_2018,
                                        '2019 Count': personcrime_ct_2019})
personcrime_ct_split_df

In [None]:
# Adding climate and crime analysis
personcrime_climate_df = pd.merge(personcrime_ct_split_df, climate_df, how='left', on='Month')
# Removing Month number column
del personcrime_climate_df['Month']
# Rename and re-organize the columns in the dataframe
personcrime_climate_df = personcrime_climate_df.rename(columns={
    'MonthName': 'Month', 'Temp': 'Temperature (F)'
})
personcrime_climate_df = personcrime_climate_df[['Month','2017 Count', '2018 Count', '2019 Count', 'Temperature (F)']]
personcrime_climate_df

Statistical Analysis

In [None]:
# Create a New Dataframe for number of crimes in each month, indexed by year & month
personmonthyeargroup = person_crime_df.groupby(['Year','Month'])
personcrime_count_mmyy = personmonthyeargroup['Month'].count()
personcrime_count_mmyy_df = pd.DataFrame({'Number of Crime Incidence':personcrime_count_mmyy})
personcrime_count_mmyy_df

In [None]:
# Reset the index to allow further data analysis
personcrime_count_mmyy_index_df = personcrime_count_mmyy_df.reset_index(drop=False)
personcrime_count_mmyy_index_df

In [None]:
# Taking the average of the amount of crime for each month from 2017-2019
avgcrimepersongroup = personcrime_count_mmyy_index_df.groupby('Month')
avgcrime_person = avgcrimepersongroup['Number of Crime Incidence'].mean()

personcrime_avg_ct_df = pd.DataFrame({'Average Number of Crime (Person)':avgcrime_person})
personcrime_avg_ct_df = personcrime_avg_ct_df.reset_index(drop=False)
personcrime_avg_ct_df

In [None]:
# Merge the average of crime df with climate df
avgcrime_person_temp_df = pd.merge(personcrime_avg_ct_df, climate_df, how='inner', on='Month')
avgcrime_person_temp_df

In [None]:
# Removing Month number column
del avgcrime_person_temp_df['Month']

In [None]:
# Rename and re-organize the columns in the dataframe
avgcrime_person_temp_df = avgcrime_person_temp_df.rename(columns={
    'MonthName': 'Month', 'Temp': 'Temperature (F)'
})
avgcrime_person_temp_df = avgcrime_person_temp_df[['Month','Average Number of Crime (Person)', 'Temperature (F)']]

avgcrime_person_temp_df

In [None]:
# Perform regression to look at average of crime and temperature
plt.figure(figsize=(10,4))
x_avgperson = avgcrime_person_temp_df['Temperature (F)']
y_avgperson = avgcrime_person_temp_df['Average Number of Crime (Person)'].astype(int)

plt.scatter(x_avgperson, y_avgperson, marker='+')

# Labels
plt.xlabel('Temperature (F)')
plt.ylabel('Average Number of Crime')
plt.title('Average Number of Crimes Against Person vs. Temperature (F)', fontsize=12, fontweight='bold')
# plt.legend(loc="best")

# Add the linear regression equation and line to plot
(slope, intercept, rvalue, pvalue, stderr) = linregress(x_avgperson, y_avgperson)
regress_values = x_avgperson * slope + intercept
line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))
plt.plot(x_avgperson,regress_values,"r-")
plt.annotate(line_eq,(60,120),fontsize=15,color="red")
print(f"The r value is: {rvalue}")
print(f"The r-squared is: {rvalue**2}")

plt.show()

Our r value indicates a moderate positive correlation between the temperature and the average number of crimes against persons. Our r-squared is low, so the fitted line explains only a small share of the variation in the average crime counts and is not reliable for prediction.
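
To make the relationship between the two printed numbers concrete, here is a minimal sketch of how the Pearson r is computed and why r-squared is always smaller in magnitude than r (unless r is 0 or ±1). The helper `pearson_r` and the toy values are ours for illustration and are not our temperature or crime data.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values (made up): y mostly rises with x, but not perfectly
toy_x = [1, 2, 3, 4]
toy_y = [1, 3, 2, 4]

r = pearson_r(toy_x, toy_y)
print(f"r = {r:.2f}, r-squared = {r ** 2:.2f}")
```

In this toy case a fairly strong r of about 0.8 still leaves roughly a third of the variation unexplained, which is why we report both values.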

In [None]:
## Further analysis on the temperature and the incidence of crime
# Sort the df by temperature to get some insight
avgcrime_person_temp_df.sort_values('Temperature (F)', ascending=False)

In [None]:
# Create 4 temperature bins for the average number of crimes
bins = [40, 50, 67, 80, 90]

# Create labels for the bins
group_bins = ['Cold', 'Cool', 'Warm', 'Hot']

# Slice the data and place it into bins
bin_slice = pd.cut(avgcrime_person_temp_df['Temperature (F)'], bins, labels=group_bins)

# Create a new column where the data shows the bins they belong
avgcrime_person_temp_df['Temperature Category'] = bin_slice
avgcrime_person_temp_df

In [None]:
# Group the data according to the temperature category
temp_group = avgcrime_person_temp_df.groupby('Temperature Category')

# Get the average # of crime for each temperature
tempgroup_avg_df = temp_group[['Average Number of Crime (Person)']].mean()

# Format the average number of crimes as whole numbers
tempgroup_avg_df.astype(int)

The incidence of crime against persons seems to be lowest when the temperature is cold and highest when the temperature is warm.
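
The category assignment behind this table can be sketched as follows, using the same bin edges and labels as above. The `toy_temps` values are made up for illustration. By default `pd.cut` uses right-closed intervals, so a temperature equal to an upper edge (for example 50) falls in the lower category, and a value equal to the lowest edge (40) gets no category at all.

```python
import pandas as pd

bins = [40, 50, 67, 80, 90]
labels = ["Cold", "Cool", "Warm", "Hot"]

# Made-up temperatures chosen to hit the bin edges
toy_temps = pd.Series([40, 45, 50, 68, 85])

# Intervals are (40, 50], (50, 67], (67, 80], (80, 90]
categories = pd.cut(toy_temps, bins, labels=labels)
print(pd.DataFrame({"Temp": toy_temps, "Category": categories}))
```

If a monthly average ever sat exactly on the lowest edge, passing `include_lowest=True` to `pd.cut` would keep it in the first bin.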

Next, we perform hypothesis testing to see whether the differences between these temperature categories are statistically significant...

# Hypothesis attempt #1 - NOT STATISTICALLY SIGNIFICANT

In [None]:
## Hypothesis Test
# Setting up each group using the data from 2017, 2018, 2019
cold_dec = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 12]['Number of Crime Incidence']
cold_jan = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 1]['Number of Crime Incidence']
cold_feb = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 2]['Number of Crime Incidence']

cool_nov = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 11]['Number of Crime Incidence']
cool_mar = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 3]['Number of Crime Incidence']
cool_apr = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 4]['Number of Crime Incidence']

warm_may = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 5]['Number of Crime Incidence']
warm_sep = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 9]['Number of Crime Incidence']
warm_oct = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 10]['Number of Crime Incidence']


hot_jun = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 6]['Number of Crime Incidence']
hot_jul = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 7]['Number of Crime Incidence']
hot_aug = personcrime_count_mmyy_index_df[personcrime_count_mmyy_index_df['Month'] == 8]['Number of Crime Incidence']

In [None]:
# Perform the ANOVA
stats.f_oneway(cold_dec, cold_jan, cold_feb, cool_nov, cool_mar, cool_apr, warm_may, warm_sep, warm_oct, hot_jun, hot_jul, hot_aug)

Dom's comment:
I was saying you'd have four groups (hot, cold, warm, cool) and that the list for each group would have nine values in it.
The way you did it is also fine, but it's using months instead of the hot, cold, warm, and cool categories. (Even though you're appending those names to the variables, they don't really play a role in your current analysis.)
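
The grouping Dom describes can be sketched as follows: map each month to its temperature category, then collect the nine monthly counts (three months times three years) per category. The counts in `toy_counts` are made up for illustration, and `category_of_month` is a name we introduce here. The month-to-category assignment is the one used elsewhere in this notebook.

```python
import pandas as pd

# Month number -> temperature category, as used in this notebook
category_of_month = {
    12: "Cold", 1: "Cold", 2: "Cold",
    11: "Cool", 3: "Cool", 4: "Cool",
    5: "Warm", 9: "Warm", 10: "Warm",
    6: "Hot", 7: "Hot", 8: "Hot",
}

# Made-up monthly counts for 2017-2019 (one row per year and month)
toy_counts = pd.DataFrame(
    [(year, month, 100 + month) for year in (2017, 2018, 2019) for month in range(1, 13)],
    columns=["Year", "Month", "Number of Crime Incidence"],
)

toy_counts["Temperature Category"] = toy_counts["Month"].map(category_of_month)

# One list of nine values per category, ready to pass to an ANOVA
samples = {
    name: group["Number of Crime Incidence"].tolist()
    for name, group in toy_counts.groupby("Temperature Category")
}
print({name: len(values) for name, values in samples.items()})
```

With twelve month groups, each group holds only three values. With four category groups, each holds nine, which is the comparison the hypothesis is actually about.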

Statistical Conclusion: The p-value is > 0.05, suggesting that the difference in the number of crimes against persons across the twelve month groups is not statistically significant.

# Hypothesis attempt #2 -- STATISTICALLY SIGNIFICANT

In [None]:
## Hypothesis Test
# Setting up each temperature category group using the data from 2017, 2018 & 2019
counts_df = personcrime_count_mmyy_index_df
in_study_years = counts_df['Year'].isin([2017, 2018, 2019])

# Each group holds the monthly counts for its three months across the three years
cold = counts_df[in_study_years & counts_df['Month'].isin([12, 1, 2])]['Number of Crime Incidence']
cool = counts_df[in_study_years & counts_df['Month'].isin([11, 3, 4])]['Number of Crime Incidence']
warm = counts_df[in_study_years & counts_df['Month'].isin([5, 9, 10])]['Number of Crime Incidence']
hot = counts_df[in_study_years & counts_df['Month'].isin([6, 7, 8])]['Number of Crime Incidence']

In [None]:
# Perform the ANOVA test
stats.f_oneway(cold, cool, warm, hot)

Statistical Conclusion: The p-value is < 0.05, suggesting that the difference in the number of crimes against persons across the temperature categories is statistically significant. The temperature seems to be a factor in the number of crimes against persons in Silver Spring, Maryland.

We acknowledge that we did not have the actual recorded temperatures for Silver Spring, MD, during 2017-2019, and that this limitation essentially negates the validity of our statistically significant result. However, if we were to perform the same analysis using the actual temperature data, we would expect any statistically significant result to be more valid.

In [None]:
# Cynthia's code ends here

## Crime Against Society

In [None]:
# Rose's code starts here - Society

In [None]:
#Society crimes df
scrime_df = silverspring_clean_df.loc[(silverspring_clean_df['Crime Main Category'] == 'Crime Against Society'), :]
scrime_df

In [None]:
#crime per month
society_month = scrime_df.groupby("Month")
society_month_count = society_month["Month"].count()
society_month_count

#df
society_df = pd.DataFrame({"Number of Crime Incidence":society_month_count})

months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
scrime_index_df = society_df.reset_index(drop=True)
scrime_index_df["Month"] = months
scrime_index_df = scrime_index_df[["Month","Number of Crime Incidence"]]
scrime_index_df

In [None]:
#graph
x_all = scrime_index_df["Month"]
y_all = scrime_index_df["Number of Crime Incidence"]


plt.plot(x_all, y_all, marker='o')
plt.xticks(rotation=45)

plt.xlabel("Month")
plt.ylabel("Number of Crime Incidence")
plt.title("Crimes Against Society 2017-2019")

plt.show()

In [None]:
#df for each year
scrime_2017df = scrime_df.loc[scrime_df["Year"] == 2017]
scrime_2018df = scrime_df.loc[scrime_df["Year"] == 2018]
scrime_2019df = scrime_df.loc[scrime_df["Year"] == 2019]

In [None]:
#crimes per month 2017
scrimemonth2017 = scrime_2017df.groupby("Month")
scrime2017count = scrimemonth2017["Month"].count()

#2017 df
society_2017df = pd.DataFrame({"Number of Crime Incidence":scrime2017count})

#index
society_2017df = society_2017df.reset_index(drop=True)
society_2017df["Month"] = months
society_2017df = society_2017df[["Month", "Number of Crime Incidence"]]
society_2017df

In [None]:
#graph
x_2017 = society_2017df["Month"]
y_2017 = society_2017df["Number of Crime Incidence"]

handle17, = plt.plot(x_2017, y_2017, marker="x")
plt.xticks(rotation=45)

plt.xlabel("Month")
plt.ylabel("Number of Crime Incidence")
plt.title("Crimes Against Society 2017")

plt.show()

In [None]:
#crimes per month 2018
scrimemonth2018 = scrime_2018df.groupby("Month")
scrime2018count = scrimemonth2018["Month"].count()

#2018 df
society_2018df = pd.DataFrame({"Number of Crime Incidence":scrime2018count})

#index
society_2018df = society_2018df.reset_index(drop=True)
society_2018df["Month"] = months
society_2018df = society_2018df[["Month", "Number of Crime Incidence"]]
society_2018df

In [None]:
#graph
x_2018 = society_2018df["Month"]
y_2018 = society_2018df["Number of Crime Incidence"]

handle18, = plt.plot(x_2018, y_2018, marker="x")
plt.xticks(rotation=45)

plt.xlabel("Month")
plt.ylabel("Number of Crime Incidence")
plt.title("Crimes Against Society 2018")

plt.show()

In [None]:
#crimes per month 2019
scrimemonth2019 = scrime_2019df.groupby("Month")
scrime2019count = scrimemonth2019["Month"].count()

#2019 df
society_2019df = pd.DataFrame({"Number of Crime Incidence":scrime2019count})

#index
society_2019df = society_2019df.reset_index(drop=True)
society_2019df["Month"] = months
society_2019df = society_2019df[["Month", "Number of Crime Incidence"]]
society_2019df

In [None]:
#graph
x_2019 = society_2019df["Month"]
y_2019 = society_2019df["Number of Crime Incidence"]

handle19, = plt.plot(x_2019, y_2019, marker="x")
plt.xticks(rotation=45)

plt.xlabel("Month")
plt.ylabel("Number of Crime Incidence")
plt.title("Crimes Against Society 2019")
plt.show()

In [None]:
#show all 3 figures
handle17, = plt.plot(x_2017, y_2017, label="2017")
handle18, = plt.plot(x_2018, y_2018, label="2018")
handle19, = plt.plot(x_2019, y_2019, label="2019")

plt.xticks(rotation=45)

plt.xlabel("Month")
plt.ylabel("Number of Crimes")
plt.title("Number of Crimes Against Society 2017-2019", fontsize=12, fontweight="bold")

plt.legend(loc="best")
plt.show()

In [None]:
societycrime_split_df = pd.DataFrame({"2017 Crimes": scrime2017count,
                                      "2018 Crimes": scrime2018count,
                                      "2019 Crimes": scrime2019count})
societycrime_split_df
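
The three per-year count cells above repeat the same steps. As a minimal sketch, assuming a frame with `Year` and `Month` columns like `scrime_df`, the same month-by-year table could be built in one pass with `groupby(...).size().unstack(...)`. The `toy_df` below is a hypothetical stand-in with made-up rows, not our crime data, and `fill_value=0` is our choice here (the cell above would leave a missing month as NaN).

```python
import pandas as pd

# Hypothetical stand-in for scrime_df: one row per incident (made-up rows)
toy_df = pd.DataFrame({
    "Year":  [2017, 2017, 2017, 2018, 2018, 2019],
    "Month": [1, 1, 2, 1, 2, 2],
})

# Count incidents per (Month, Year), then spread the years into columns
counts_by_year = (
    toy_df.groupby(["Month", "Year"])
          .size()
          .unstack("Year", fill_value=0)
)

# Match the "<year> Crimes" column naming used in the notebook
counts_by_year.columns = [f"{year} Crimes" for year in counts_by_year.columns]
print(counts_by_year)
```

The result keeps `Month` as the index, so it can be merged with the climate data on `Month` the same way as `societycrime_split_df`.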

In [None]:
#merge crime and climate
scrime_climate_df = pd.merge(societycrime_split_df, climate_df, how="left", on="Month")
del scrime_climate_df["Month"]
scrime_climate_df = scrime_climate_df.rename(columns={
    "MonthName": "Month", "Temp": "Temperature (F)"
})

scrime_climate_df = scrime_climate_df[["Month", "2017 Crimes", "2018 Crimes", "2019 Crimes", "Temperature (F)"]]

scrime_climate_df

In [None]:
societygroup = scrime_df.groupby(["Year","Month"])
society_month_year = societygroup["Month"].count()
s_month_year_df = pd.DataFrame({"Number of Crime Incidence":society_month_year})

s_month_year_df = s_month_year_df.reset_index(drop=False)
s_month_year_df

In [None]:
#average for each month
scrimeavg = s_month_year_df.groupby("Month")
societycrimeavg = scrimeavg["Number of Crime Incidence"].mean()

scrime_avg_df = pd.DataFrame({"Average Number of Crime (Society)": societycrimeavg})
scrime_avg_df = scrime_avg_df.reset_index(drop=False)
scrime_avg_df

In [None]:
#merge with climate
savg_crime_climate = pd.merge(scrime_avg_df, climate_df, how="inner", on="Month")
savg_crime_climate

In [None]:
del savg_crime_climate["Month"]

In [None]:
# Rename and re-organize the columns in the dataframe
savg_crime_climate = savg_crime_climate.rename(columns={
    'MonthName': 'Month', 'Temp': 'Temperature (F)'
})
savg_crime_climate = savg_crime_climate[['Month','Average Number of Crime (Society)', 'Temperature (F)']]

savg_crime_climate

In [None]:
# Regression
x_avgsociety = savg_crime_climate['Temperature (F)']
# Keep the averages as floats; casting to int would truncate them before the fit
y_avgsociety = savg_crime_climate['Average Number of Crime (Society)']

plt.scatter(x_avgsociety, y_avgsociety, marker='+')

# Labels
plt.xlabel('Temperature (F)')
plt.ylabel('Average Number of Crime')
plt.title('Average Number of Crime Against Society vs. Temperature (F)', fontsize=12, fontweight='bold')
# plt.legend(loc="best")

# Add the linear regression equation and line to plot
(slope, intercept, rvalue, pvalue, stderr) = linregress(x_avgsociety, y_avgsociety)
regress_values = x_avgsociety * slope + intercept
line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))
plt.plot(x_avgsociety,regress_values,"r-")
plt.annotate(line_eq,(60,500),fontsize=15,color="red")
print(f"The r value is: {rvalue}")
print(f"The r-squared is: {rvalue**2}")

plt.show()
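
The regression cell above prints only r and r-squared. With just twelve monthly points it may also help to look at the p-value that `linregress` returns. The sketch below shows one way to read it from the result object. The `temps` and `crimes` lists are made-up illustrative values, not our data, and the 0.05 threshold is an assumed convention.

```python
from scipy.stats import linregress

# Made-up illustrative values, NOT the notebook's data
temps = [35, 38, 46, 56, 66, 75, 79, 77, 70, 58, 48, 39]
crimes = [100, 104, 118, 130, 152, 171, 176, 170, 160, 137, 120, 106]

# linregress returns a result object with named fields
result = linregress(temps, crimes)
print(f"slope = {result.slope:.2f}")
print(f"r-squared = {result.rvalue**2:.3f}")
print(f"p-value = {result.pvalue:.4g}")

# Assumed significance level; the null hypothesis is that the slope is zero
alpha = 0.05
significant = result.pvalue < alpha
print(f"Slope differs from zero at alpha={alpha}: {significant}")
```

Reading the fields by name avoids relying on the order of the unpacked tuple.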

In [None]:
# Show the df sorted by temperature to get some insight
print(savg_crime_climate.sort_values('Temperature (F)', ascending=False))

# Create 4 temperature bins (F) for grouping the monthly averages
bins = [40, 50, 67, 80, 90]

# Create labels for the bins
group_bins = ['Cold', 'Cool', 'Warm', 'Hot']

# Slice the data and place it into bins
bin_slice = pd.cut(savg_crime_climate['Temperature (F)'], bins, labels=group_bins)

# Create a new column showing which temperature bin each month belongs to
savg_crime_climate['Temperature Category'] = bin_slice
savg_crime_climate

In [None]:
society_temp = savg_crime_climate.groupby("Temperature Category")

society_avg_temp = society_temp[["Average Number of Crime (Society)"]].mean()

society_avg_temp
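
One detail of the binning above: by default `pd.cut` uses right-closed intervals, so with the edges `[40, 50, 67, 80, 90]` a temperature of exactly 40 falls outside the first bin and gets NaN, and so does anything below 40. The sketch below checks this on a few hypothetical temperatures (not our climate data) and shows the `include_lowest=True` option, which closes the first interval on the left.

```python
import pandas as pd

bins = [40, 50, 67, 80, 90]
labels = ["Cold", "Cool", "Warm", "Hot"]

# Hypothetical temperatures chosen to sit on or outside the bin edges
sample = pd.Series([39.5, 40.0, 50.0, 50.1, 90.0])

# Default: intervals are (40, 50], (50, 67], (67, 80], (80, 90]
default_cut = pd.cut(sample, bins, labels=labels)

# include_lowest=True makes the first interval include 40 itself
inclusive_cut = pd.cut(sample, bins, labels=labels, include_lowest=True)

print(pd.DataFrame({
    "temp": sample,
    "default": default_cut,
    "include_lowest": inclusive_cut,
}))
```

If any monthly average temperature were at or below 40 F, that month would silently drop out of the grouped means, so it may be worth checking the new column for NaN.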

In [None]:
# Rose's code ends here

In [None]:
# Paul's code starts here - API!!!

In [None]:
# Paul's code ends here