# 👀 At a glance - what can we see and expect? 👀

Let's have a look at historical data to explore:

- each type of crime for all areas in a line graph: is there a **general pattern**?

- each type of crime for a specified area in the last three years in a line graph and a bar graph: can we see any closer, more **seasonal trends**?

Once we establish what we know, we can look at what to expect.

💭 Note 1: I used both a line graph and a bar graph to compare the visualisations and check my interpretation; the bar graph showed different dates on the x axis.
💭 Note 2: if I could use R, this library looks awesome for time series! https://cran.r-project.org/web/packages/TSstudio/vignettes/Plotting_Time_Series.html

In [44]:
#Import the data and dependencies
from sqlalchemy import create_engine
import psycopg2
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from pandas.plotting import autocorrelation_plot
from statsmodels.tsa.arima.model import ARIMA
import json

#Read the district table from the database used by the Flask API
engine = create_engine('postgresql://postgres:postgres@localhost:5432/district_db')

#Use a context manager so the connection is closed even if the query fails
with engine.connect() as dbConnect:
    df = pd.read_sql('select * from district', dbConnect)
jsonData = json.loads(df.to_json(orient='records'))

df.head()

| Unnamed: 0 | index | month_and_year | area | murder | assault_on_police_officer | arson | drug_offences | fraud_and_related_offences |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2007-01-01 | South West District | 0 | 0 | 7 | 92 | 28 |
| 1 | 1 | 2007-02-01 | South West District | 0 | 11 | 9 | 73 | 25 |
| 2 | 2 | 2007-03-01 | South West District | 0 | 0 | 11 | 107 | 37 |
| 3 | 3 | 2007-04-01 | South West District | 1 | 11 | 11 | 126 | 27 |
| 4 | 4 | 2007-05-01 | South West District | 0 | 3 | 4 | 111 | 6 |


# 😱 <font color=red>murder</font> 😱

### Murder in all districts from 2007 - 2020

In [45]:
px.line(df, x='month_and_year', y ='murder', color='area')

**Observation:** Low range on y axis; generally stable; 1-2 average; outlier spike.

### Murder in Mirrabooka from Jan 2018 - Dec 2020 (3 years)

In [46]:
df_murder_mirrabooka_small = df.loc[df.area=="Mirrabooka District", ["month_and_year", "murder"]].reset_index(drop=True)
df_murder_mirra =  df_murder_mirrabooka_small.tail(36)
px.line(df_murder_mirra, x='month_and_year', y ='murder')
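
The filter-then-`tail(36)` pattern only returns the latest three years if the rows are already in date order. A minimal sketch of a reusable helper that sorts by date first; `last_n_months` and the toy frame are hypothetical names and made-up values, not part of the original notebook:

```python
import pandas as pd

def last_n_months(frame, area, crime, n=36):
    """Return the most recent n monthly rows of one crime column for one area."""
    subset = frame.loc[frame["area"] == area, ["month_and_year", crime]].copy()
    # Parse and sort the dates explicitly so tail(n) really means "latest n months".
    subset["month_and_year"] = pd.to_datetime(subset["month_and_year"])
    return subset.sort_values("month_and_year").tail(n).reset_index(drop=True)

# Toy stand-in for the `district` table (values are made up for illustration).
toy = pd.DataFrame({
    "month_and_year": ["2020-03-01", "2020-01-01", "2020-02-01", "2020-01-01"],
    "area": ["Mirrabooka District"] * 3 + ["Perth District"],
    "murder": [2, 0, 1, 5],
})

recent = last_n_months(toy, "Mirrabooka District", "murder", n=2)
print(recent)
```

With the real data, the same call could feed `px.line` or `px.bar` directly, for any area and crime column.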

In [47]:
fig = px.bar(df_murder_mirra, x='month_and_year', y='murder')
fig.show()

**Observation:** as above; no seasonality.

# 👮‍ <font color=blue>assault on a police officer</font> 👮‍

### Assaults on police officers in all districts from 2007 - 2020

In [48]:
px.line(df, x='month_and_year', y ='assault_on_police_officer', color='area')

**Observation:** Midland consistently higher than the other districts over the years; north regional also consistently higher; medium range; fairly consistent trend.

### Assault on police officers in the Kimberley from Jan 2018 - Dec 2020 (3 years)

In [49]:
df_assault_kimberley_small = df.loc[df.area=="Kimberley District", ["month_and_year", "assault_on_police_officer"]].reset_index(drop=True)
df_assault_kim =  df_assault_kimberley_small.tail(36)
px.line(df_assault_kim, x='month_and_year', y ='assault_on_police_officer')

In [50]:
fig = px.bar(df_assault_kim, x='month_and_year', y='assault_on_police_officer')
fig.show()

**Observation:** Consistent peaks and troughs (but low range); no obvious seasonality.
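
One way to back up a "no obvious seasonality" reading with a number is the lag-12 autocorrelation: how strongly each month correlates with the same month a year earlier. A rough sketch on synthetic data (both series below are invented for illustration), using pandas' `Series.autocorr`:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (made-up values): one with a yearly cycle, one without.
months = pd.date_range("2018-01-01", periods=36, freq="MS")
seasonal = pd.Series(10 + 5 * np.cos(2 * np.pi * np.arange(36) / 12), index=months)
rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(10, 2, size=36), index=months)

# autocorr(lag=12) correlates each month with the same month one year earlier.
seasonal_r = seasonal.autocorr(lag=12)
noise_r = noise.autocorr(lag=12)
print(f"seasonal lag-12 autocorrelation: {seasonal_r:.2f}")
print(f"noise lag-12 autocorrelation:    {noise_r:.2f}")
```

A value near 1 suggests a repeating yearly pattern; a value near 0 suggests none. With only 36 points the estimate is noisy, so treat it as a sanity check on the graphs rather than a test.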

# 🔥 <font color=orange>arson</font> 🔥

### Arson in all districts from 2007 - 2020

In [51]:
px.line(df, x='month_and_year', y ='arson', color='area')

**Observation:** Midland, Mandurah, Joondalup a little higher; spikes at consistent times across all areas.

### Arson in Midland from Jan 2018 - Dec 2020 (3 years)

In [52]:
df_arson_midland_small = df.loc[df.area=="Midland District", ["month_and_year", "arson"]].reset_index(drop=True)
df_arson_mid =  df_arson_midland_small.tail(36)
px.line(df_arson_mid, x='month_and_year', y ='arson')

In [53]:
fig = px.bar(df_arson_mid, x='month_and_year', y='arson')
fig.show()

**Observation:** Seasonal: drops in the cooler months (June, July, September); peaks in summer (December, February).
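
To put numbers on a seasonal reading like this, months can be grouped into seasons and averaged. A minimal sketch, assuming Southern Hemisphere meteorological seasons (consistent with treating December and February as summer); `SEASONS`, `mean_by_season` and the toy counts are illustrative, not from the notebook:

```python
import pandas as pd

# Southern Hemisphere meteorological seasons keyed by calendar month number.
SEASONS = {12: "summer", 1: "summer", 2: "summer",
           3: "autumn", 4: "autumn", 5: "autumn",
           6: "winter", 7: "winter", 8: "winter",
           9: "spring", 10: "spring", 11: "spring"}

def mean_by_season(frame, crime):
    """Average monthly count of `crime` per season."""
    dates = pd.to_datetime(frame["month_and_year"])
    season = dates.dt.month.map(SEASONS)
    return frame[crime].groupby(season).mean()

# Made-up counts for one year, only to show the shape of the result.
toy = pd.DataFrame({
    "month_and_year": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "arson": [30, 28, 20, 18, 15, 10, 9, 11, 14, 19, 22, 32],
})
seasonal_means = mean_by_season(toy, "arson")
print(seasonal_means)
```

Note that under this mapping September counts as spring, not winter.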

# 💉 <font color=grey>drug offences</font> 💉

### Drug offences in all districts from 2007 - 2020

In [54]:
px.line(df, x='month_and_year', y ='drug_offences', color='area')

**Observation:** Slight upward trend generally; larger differentiation between areas than other crimes.
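
A "slight upward trend" can be checked by fitting a straight line to the monthly counts and reading off the slope. A rough sketch with `numpy.polyfit` on invented data; `monthly_trend` is a hypothetical helper, and a linear fit is only one simple way to summarise a trend:

```python
import numpy as np
import pandas as pd

def monthly_trend(series):
    """Least-squares slope of a monthly series, in offences per month."""
    x = np.arange(len(series))
    slope, _intercept = np.polyfit(x, series.to_numpy(dtype=float), deg=1)
    return slope

# Made-up series: a steady rise of 0.5 offences per month plus a fixed wobble.
toy = pd.Series(100 + 0.5 * np.arange(24) + np.tile([3, -3], 12))
slope = monthly_trend(toy)
print(f"estimated slope: {slope:.2f} offences per month")
```

Applied per area, a positive slope would support the upward-trend reading and the spread of slopes would show how much the areas differ.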

### Drug offences in Joondalup from Jan 2018 - Dec 2020 (3 years)

In [55]:
df_drugs_joondalup_small = df.loc[df.area=="Joondalup District", ["month_and_year", "drug_offences"]].reset_index(drop=True)
df_drugs_joond = df_drugs_joondalup_small.tail(36)
px.line(df_drugs_joond, x='month_and_year', y ='drug_offences')

In [56]:
fig = px.bar(df_drugs_joond, x='month_and_year', y='drug_offences')
fig.show()

**Observation:** Drops in January, higher in May. Curious.

# 💳 <font color=green>fraud and related offences</font> 💳

### Fraud and related offences in all districts from 2007 - 2020

In [57]:
px.line(df, x='month_and_year', y ='fraud_and_related_offences', color='area')

**Observation:** Highest spike amongst all crimes (Jan 2014 in Perth is a huge outlier); Perth, Cannington and Mirrabooka generally higher than others; random yearly spikes over time (e.g. '09, '14, '19); recent increase generally in about 50% of areas.
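
Spikes like these can also be flagged programmatically instead of by eye. A minimal sketch using Tukey's rule (values above Q3 + 1.5 × IQR); the helper name and the toy counts are made up for illustration, and the 1.5 multiplier is a convention, not something taken from the notebook:

```python
import pandas as pd

def flag_outliers(series, k=1.5):
    """Boolean mask of values above Q3 + k * IQR (Tukey's rule, upper side only)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    return series > q3 + k * (q3 - q1)

# Made-up monthly counts with one obvious spike.
toy = pd.Series([40, 42, 38, 45, 41, 300, 39, 44],
                index=pd.date_range("2000-01-01", periods=8, freq="MS"))
spikes = toy[flag_outliers(toy)]
print(spikes)
```

Run per area on `fraud_and_related_offences`, this would list the months that stand out from that area's usual range.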

### Fraud in Perth from Jan 2018 - Dec 2020 (3 years)

In [58]:
df_fraud_perth_small = df.loc[df.area=="Perth District", ["month_and_year", "fraud_and_related_offences"]].reset_index(drop=True)
df_fraud_per =  df_fraud_perth_small.tail(36)
px.line(df_fraud_per, x='month_and_year', y ='fraud_and_related_offences')

In [59]:
fig = px.bar(df_fraud_per, x='month_and_year', y='fraud_and_related_offences')
fig.show()

**Observation:** Not seasonal, very unstable, large range.

# Conclusion 

 1. murder: the most consistent overall; oscillates between 1 and 2 except in outlier years; lowest range (thank goodness); not particularly seasonal in Mirrabooka (makes sense!)
 2. assault on police: Midland consistently higher than the other districts over the years; north regional also consistently higher.
 3. arson: stable pattern of spikes and dips over the years generally; dips in winter in Midland.
 4. drugs: slow upward trend generally in all areas; recent dips in January in Joondalup.
 5. fraud: not a whole heap of pattern generally; larger difference between areas; can spike randomly (not obviously dependent on a season).
 

# 👎 Limitations 👎

    

1. I could have created seasons in the data clean up phase to illustrate this better and avoid eye-balling.
2. *fpp2* in R would have been pretty and has seasonal packages.
3. No consideration of *external factors*.
4. No consideration of the *ratio* of offences to population size.
5. The bar graph and line graph, although useful to confirm observations, basically did the same job and did not provide a *different insight*.
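
On the population point, offence counts can be converted to a rate per 100,000 residents so that large and small districts become comparable. A sketch only: the district names and population figures below are invented placeholders, and real figures would need to come from an official source:

```python
import pandas as pd

# Hypothetical population figures, invented purely for illustration.
population = pd.Series({"District A": 200_000, "District B": 50_000}, name="population")

# Made-up monthly offence counts for the same two hypothetical districts.
counts = pd.DataFrame({
    "area": ["District A", "District B"],
    "drug_offences": [100, 50],
})

# Offences per 100,000 residents: count / population * 100,000.
counts["rate_per_100k"] = counts["drug_offences"] / counts["area"].map(population) * 100_000
print(counts)
```

In this toy case the district with fewer offences has the higher rate, which is exactly the kind of reversal raw counts can hide.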

# What does the ML prediction model say? Let's see!

Now that we've looked at what we have, let's go to the ML part to predict!

I'm going to break the process down for Arson.


spoiler... 🍐 / 🎢