

# **02: Modeling**
---
In this notebook, we'll train a Long Short-Term Memory (LSTM) recurrent neural network, following an approach inspired by [this TensorFlow tutorial](https://www.tensorflow.org/tutorials/structured_data/time_series#setup).

Start by importing a few important libraries:

In [1]:
import os
import datetime

import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf

from sklearn.preprocessing import StandardScaler

Import the data:

In [2]:
df = pd.read_csv('../data/model_inputs/predictors_and_targets.csv')

In [5]:
# Convert the integer year to a datetime; approach inspired by user Zero: https://stackoverflow.com/a/46658244
df['year'] = pd.to_datetime(df['year'], format='%Y')

In [7]:
df['year']

0      1979-01-01
1      1980-01-01
2      1981-01-01
3      1982-01-01
4      1983-01-01
          ...    
2137   2016-01-01
2138   2017-01-01
2139   2018-01-01
2140   2019-01-01
2141   2020-01-01
Name: year, Length: 2142, dtype: datetime64[ns]

## Modeling Goals
* Build a model that predicts a specified crime rate for each state. Do this for two crime categories:
    * `violent_crime_1000`, the number of violent crimes committed per thousand population in a state, and
    * `property_crime_1000`, the number of property crimes committed per thousand population in a state.
* Compare the model's results to a baseline for evaluation.
* Append the predictions to the crime rates table.

---

### Modeling steps that must happen for each state:
1. Extract the state dataframe
1. Define the features (X) and the target (y)
1. Scale the data
1. Make predictions using the baseline model
1. Make predictions using the LSTM model
1. Append the predictions to the final dataframe
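
The steps above can be sketched for a single state as follows. This is a minimal illustration, not the notebook's actual implementation: the helper names `make_windows` and `prototype_state`, the window length, and the test-set size are all hypothetical, and the baseline is assumed to be a persistence forecast (next year equals the last observed year), similar in spirit to the baseline in the TensorFlow tutorial. The LSTM step is omitted; its predictions would be scored the same way as the baseline's.

```python
import numpy as np
import pandas as pd


def make_windows(values, window):
    """Build (X, y): each row of X holds `window` past values, y holds the next value."""
    values = np.asarray(values, dtype=float)
    X = np.stack([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X, y


def prototype_state(df, state, target="violent_crime_1000", window=3, n_test=5):
    # Step 1: extract the state dataframe, ordered by year.
    state_df = df[df["state_abbr"] == state].sort_values("year")
    series = state_df[target].to_numpy(dtype=float)

    # Step 3 (done first to avoid leakage): scale with training-year statistics only.
    train_part = series[:-n_test]
    mean, std = train_part.mean(), train_part.std()
    scaled = (series - mean) / std

    # Step 2: define X (past `window` years) and y (the following year).
    X, y = make_windows(scaled, window)
    X_test, y_test = X[-n_test:], y[-n_test:]

    # Step 4: assumed persistence baseline, i.e. repeat the last observed year.
    baseline_pred = X_test[:, -1]

    # Undo the scaling so the error is in crimes per thousand population.
    y_true = y_test * std + mean
    y_base = baseline_pred * std + mean
    baseline_mae = float(np.mean(np.abs(y_true - y_base)))

    # Step 6: return a record that can be appended to a results table.
    return {"state": state, "baseline_mae": baseline_mae}
```

Calling `prototype_state(df, 'NM')` on the dataframe loaded earlier would return one record for New Mexico; looping over `df['state_abbr'].unique()` would cover every state plus D.C.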

**We will run the steps above for each state plus D.C. We'll evaluate the LSTM by comparing its mean absolute error (MAE), averaged across all states, against the same average for the baseline model.**
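
As a rough sketch of that comparison, the snippet below averages per-state MAE for two sets of predictions. The function names and the dictionary layout are assumptions made for illustration, and the numbers are hypothetical, not results from this notebook.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """MAE for one state's predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mean_mae_across_states(actuals, predictions):
    """Average the per-state MAE over every state present in `actuals`."""
    per_state = [mean_absolute_error(actuals[s], predictions[s]) for s in actuals]
    return float(np.mean(per_state))


# Hypothetical values for illustration only.
actuals = {"NM": [6.0, 8.0], "TX": [4.0, 5.0]}
lstm_preds = {"NM": [6.5, 7.5], "TX": [4.0, 6.0]}
baseline_preds = {"NM": [5.0, 6.0], "TX": [3.0, 4.0]}

lstm_score = mean_mae_across_states(actuals, lstm_preds)
baseline_score = mean_mae_across_states(actuals, baseline_preds)
print(f"LSTM mean MAE: {lstm_score:.3f} | baseline mean MAE: {baseline_score:.3f}")
```

Averaging per-state MAEs weights every state equally regardless of population; the LSTM would be considered useful only if its mean MAE is lower than the baseline's.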

---

### Prototype the model on one state's dataframe:

In [9]:
df.columns

Index(['state_abbr', 'year', 'population', 'violent_crime', 'homicide', 'rape',
       'robbery', 'aggravated_assault', 'property_crime', 'burglary',
       'larceny', 'motor_vehicle_theft', 'arson', 'violent_crime_1000',
       'homicide_1000', 'rape_1000', 'robbery_1000', 'aggravated_assault_1000',
       'property_crime_1000', 'burglary_1000', 'larceny_1000',
       'motor_vehicle_theft_1000', 'arson_1000', 'avg_unemployment_rate',
       'avg_CPI', 'ag_Democrat', 'ag_Mixed', 'ag_Republican', 'ag_Unknown'],
      dtype='object')

In [13]:
# Filter to New Mexico and copy so later edits don't touch the full dataframe
nm_crime = df[df['state_abbr'] == 'NM'].copy()
nm_crime.head()

| | state_abbr | year | population | violent_crime | homicide | rape | robbery | aggravated_assault | property_crime | burglary | ... | burglary_1000 | larceny_1000 | motor_vehicle_theft_1000 | arson_1000 | avg_unemployment_rate | avg_CPI | ag_Democrat | ag_Mixed | ag_Republican | ag_Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1260 | NM | 1979-01-01 | 1241000 | 7272 | 154 | 582 | 1502 | 5034 | 64563 | 18385 | ... | 14.814666 | 33.638195 | 3.572119 | 1.313457 | 6.666667 | 72.583333 | 1 | 0 | 0 | 0 |
| 1261 | NM | 1980-01-01 | 1295474 | 7967 | 170 | 561 | 1657 | 5579 | 69490 | 19335 | ... | 14.925039 | 35.209506 | 3.506053 | 0.565044 | 7.608333 | 82.383333 | 1 | 0 | 0 | 0 |
| 1262 | NM | 1981-01-01 | 1327000 | 8913 | 151 | 628 | 1868 | 6266 | 73369 | 21405 | ... | 16.130369 | 35.608139 | 3.550867 | 0.259231 | 7.225 | 90.933333 | 1 | 0 | 0 | 0 |
| 1263 | NM | 1982-01-01 | 1359000 | 9982 | 158 | 656 | 1715 | 7453 | 79816 | 22135 | ... | 16.287712 | 39.29507 | 3.148639 | 0.677704 | 9.033333 | 96.533333 | 1 | 0 | 0 | 0 |
| 1264 | NM | 1983-01-01 | 1399000 | 9608 | 124 | 671 | 1595 | 7218 | 79175 | 21813 | ... | 15.591851 | 37.939242 | 3.062902 | 0.312366 | 9.708333 | 99.583333 | 1 | 0 | 0 | 0 |
