# Welcome!
This notebook details my analysis of motor vehicle theft data from New Zealand, covering a period of six months.
Each record describes a stolen vehicle: its unique ID, type, make, model year, and color, along with the location of the incident and the day it was added to the database.

I figured it was an appropriate phenomenon to study, considering the imminent announcement of GTA VI.

The dataset is kindly provided by the wonderful people at [Maven Analytics](mavenanalytics.io), where it can be downloaded for free.

## Goals:
### Which makes were stolen the most? 
- I expect standard cars to be the biggest victims: luxury cars tend to have more security features, and I don't think a common outlaw would want to accidentally steal a mob boss's car. Trailers and such are also prime candidates because you can just hitch them to a tractor of any kind and run off with them, with the owner inside if you're feeling adventurous.
### What day of the week are vehicles most often and least often stolen?
- I'm quite curious about this one! If you stole a car, what day would you choose to do it?
### Is the model year a factor? Does it affect a vehicle differently based on its type?
- I expect prized cars from the 80s and earlier to be prime targets: they're highly valuable (particularly Japanese cars) and don't have as much security as, say, a 2023 Mercedes.
### Which regions had the most carjackings?
- I have no idea what the New Zealand regions are, but a little filled-map action will sort us out.
### Have carjackings increased or decreased in these six months?
- Analyzing the changes and trends of the phenomenon.




## Let's start by importing all the libraries we'll need:


In [1]:
# pandas
import pandas as pd

# numpy
import numpy as np

# matplotlib
import matplotlib.pyplot as plt

# seaborn, might need it
import seaborn as sns

# set the plot style; I chose this one because its beige background is easier on the eyes, and it's viewable by colorblind people too
plt.style.use("Solarize_Light2")

# magic command that renders all the plots inline, inside the notebook
%matplotlib inline

## Data Import and Preprocessing

### Next, we'll import our CSV files and perform some pre-processing at import time to save memory and time:
- First we import the carjackings dataframe itself, let pandas parse the dates during the import (`parse_dates=True`) so the index is a proper datetime from the start, and use smaller integer data types where possible, for big memory savings.

In [5]:
# column list kept in a variable to make the read_csv call more readable
veh_cols=["vehicle_id","vehicle_type","make_id","model_year","color","date_stolen","location_id"]

pd.read_csv("E:/projects/nz-motor-theft-py-analysis/stolen_vehicles.csv",
            usecols=veh_cols, # load only the specified columns
            index_col="date_stolen", # use the date column as the index
            header=0, # the first row holds the column names
            parse_dates=True, # parse the index (date_stolen) as a datetime64 type
            dtype={"vehicle_id": "Int16","make_id":"Int16","model_year":"Int16","location_id":"Int16"}, # notice the capital letter in the "Int"s, more on that later
            keep_default_na=True # the default: standard NA markers (such as empty fields) are read as missing, so we can drop them later
            )


There are a lot of missing values here, and unfortunately we can't really do anything to fill them in, so we'll have to drop them, but we'll save that for the cleaning stage.
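As a rough illustration of why the missing values and the capital-I `"Int16"` go together, here is a minimal sketch. It uses a made-up three-row sample (not the real `stolen_vehicles.csv`), and the `load` helper and `SAMPLE` string are hypothetical names introduced only for this example. It compares the nullable `"Int16"` dtype with pandas' default inference and with the plain lowercase `"int16"`.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for stolen_vehicles.csv;
# the values are made up, and one model_year is deliberately missing.
SAMPLE = (
    "vehicle_id,model_year,date_stolen\n"
    "1,2021,2021-11-05\n"
    "2,,2021-12-13\n"
    "3,2005,2022-02-13\n"
)

def load(dtype=None):
    """Read the sample with the same index/date options as the notebook."""
    return pd.read_csv(io.StringIO(SAMPLE), index_col="date_stolen",
                       parse_dates=True, dtype=dtype)

# Nullable integers (capital "I") can hold missing values as <NA>.
nullable_df = load({"vehicle_id": "Int16", "model_year": "Int16"})

# With no dtype given, a column with a gap falls back to float64.
default_df = load()

# Plain NumPy int16 (lowercase) cannot represent a missing value.
try:
    load({"model_year": "int16"})
    lowercase_failed = False
except ValueError:
    lowercase_failed = True

print(nullable_df.dtypes)
print(default_df.dtypes)
print("missing model years:", nullable_df["model_year"].isna().sum())
print("lowercase int16 rejected the gap:", lowercase_failed)
print("bytes, Int16 vs default:",
      nullable_df["model_year"].memory_usage(index=False),
      default_df["model_year"].memory_usage(index=False))
```

On a sample this small the byte counts are tiny, but the ratio between the two dtypes is what carries over to a larger table.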

- For now, we'll import the second CSV file, the make details table.

In [None]:
pd.read_csv('E:/projects/nz-motor-theft-py-analysis/make_details.csv',
            index_col = "make_id",
            header=0, 
            dtype={"make_id":"Int16"},
            keep_default_na=True
  )

What a pretty dataframe. We'll join it to the first table soon using **merge()**, pandas' version of the SQL join. I love SQL, so I feel right at home.
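For anyone coming from SQL, here is a small sketch of how `merge()` maps onto a `LEFT JOIN`. The two frames are made-up stand-ins, and the `make_name` column and its values are assumptions for illustration only, not taken from the real files. Because the make table above is indexed by `make_id`, the sketch joins a column on the left to the index on the right.

```python
import pandas as pd

# Hypothetical stand-in for the stolen vehicles table.
vehicles = pd.DataFrame({
    "vehicle_id": [1, 2, 3],
    "make_id": [10, 20, 99],   # 99 has no match in the makes table
})

# Hypothetical stand-in for the make details table, indexed by make_id.
makes = pd.DataFrame(
    {"make_name": ["Toyota", "Trailer"]},
    index=pd.Index([10, 20], name="make_id"),
)

# Roughly: SELECT * FROM vehicles v LEFT JOIN makes m ON v.make_id = m.make_id
merged = vehicles.merge(makes, how="left", left_on="make_id", right_index=True)

print(merged)
```

A left join keeps every vehicle row, so a make ID with no match in the make table shows up with a missing `make_name` instead of silently disappearing.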


- Now to import the location data:

In [14]:
location_cols=["location_id","region","population"] # columns to import (there's only one country, after all)
pd.read_csv('E:/projects/nz-motor-theft-py-analysis/locations.csv',
            index_col = "location_id",
            usecols=location_cols,
            header=0, 
            dtype={"location_id":"int8"},
            keep_default_na=True
  )

location_id,region,population
101,Northland,201500
102,Auckland,1695200
103,Waikato,513800
104,Bay of Plenty,347700
105,Gisborne,52100
106,Hawke's Bay,182700
107,Taranaki,127300
108,Manawatū-Whanganui,258200
109,Wellington,543500
110,Tasman,58700
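To sanity-check the `int8` choice for `location_id`, here is a small sketch using only the ten rows shown above. It asks pandas for the smallest signed integer type that fits each column via `pd.to_numeric(..., downcast="integer")`. This is a quick check, not necessarily how the dtypes in this notebook were picked, and rows not shown here could of course change the answer.

```python
import numpy as np
import pandas as pd

# Values copied from the ten rows displayed above.
location_ids = pd.Series([101, 102, 103, 104, 105, 106, 107, 108, 109, 110])
populations = pd.Series([201500, 1695200, 513800, 347700, 52100,
                         182700, 127300, 258200, 543500, 58700])

# Largest value a signed 8-bit integer can hold.
int8_max = np.iinfo(np.int8).max

# Ask pandas for the smallest signed integer dtype that fits each column.
id_dtype = pd.to_numeric(location_ids, downcast="integer").dtype
pop_dtype = pd.to_numeric(populations, downcast="integer").dtype

print("int8 max:", int8_max)
print("location_id fits in:", id_dtype)
print("population fits in:", pop_dtype)
```

The IDs sit just under the `int8` ceiling, so the dtype would need to grow if location IDs above 127 were ever added.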
