# Modeling and Output

This notebook goes over how to output fitted encoders, standardizers, and models.

The fitted encoder and standardizer are written to disk with [Pickle](https://docs.python.org/3/library/pickle.html), and the fitted model is written with [JobLib](https://joblib.readthedocs.io/en/latest/index.html).

**Note**: Pickle *could* be used to output fitted models. However, Pickle can output very large files for more complicated models. Therefore, while not strictly required for this project, JobLib was used to output the fitted model in the name of good practice.
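As a rough illustration of the file-size concern, the sketch below fits a small bagging classifier on synthetic data, writes it once with Pickle and once with JobLib, and prints both file sizes. The synthetic data, the file names, and the `compress=3` setting are assumptions made for this example; they are not choices made in this project.

```python
import os
import pickle
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in data (assumption; not the crime dataset).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
demo_model = BaggingClassifier(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, 'demo-model.pkl')
    joblib_path = os.path.join(tmp_dir, 'demo-model.joblib')

    # Plain pickle, written through a context manager so the file is closed.
    with open(pickle_path, 'wb') as f:
        pickle.dump(demo_model, f)

    # joblib.dump accepts an optional compression level.
    joblib.dump(demo_model, joblib_path, compress=3)

    pickle_size = os.path.getsize(pickle_path)
    joblib_size = os.path.getsize(joblib_path)
    reloaded = joblib.load(joblib_path)

print(f'pickle: {pickle_size} bytes, joblib (compress=3): {joblib_size} bytes')
```

The exact sizes depend on the model and library versions, so it is worth running a comparison like this on the real model before settling on a format.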

In [None]:
#Import and Manipulate Data
import pandas as pd
import sqlite3

#Pre-processing
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

#Modeling
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

#Export Fitted Data
import pickle
import joblib

#Turn off warnings.
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action='ignore', category=FutureWarning)

### Import

Import and drop missing values.

In [None]:
conn = sqlite3.connect('../data/crime_census_weather_tod.db')        
df = pd.read_sql_query("select * from all_crimes", conn)
conn.close()

In [None]:
df = df.dropna(how='any')
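To see what `dropna(how='any')` removes, here is a minimal sketch against an in-memory SQLite database. The table name `demo_crimes`, its columns, and its rows are hypothetical and stand in for the project database.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical table (assumption for illustration).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo_crimes (block_group TEXT, temperature REAL)')
conn.executemany(
    'INSERT INTO demo_crimes VALUES (?, ?)',
    [('A1', 71.5), ('B2', None), (None, 65.0), ('C3', 80.2)],
)
conn.commit()

demo_df = pd.read_sql_query('SELECT * FROM demo_crimes', conn)
conn.close()

# how='any' drops a row if at least one of its columns is missing.
rows_before = len(demo_df)
demo_clean = demo_df.dropna(how='any')
print(f'Dropped {rows_before - len(demo_clean)} of {rows_before} rows')
```

Checking the number of dropped rows this way is a quick guard against silently discarding a large share of the data.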

### Encode Block Group

In [None]:
le = LabelEncoder()
bg_fit = le.fit(df['BLOCK_GROUP'])

In [None]:
df['bg_cat'] = bg_fit.transform(df['BLOCK_GROUP'])
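The following sketch shows how a fitted `LabelEncoder` behaves, which matters once the encoder is reloaded elsewhere. The block group IDs are made up for illustration and are not taken from the dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical block group IDs (assumption; not taken from the dataset).
demo_groups = ['0001011', '0002021', '0001011', '0003001']

demo_le = LabelEncoder().fit(demo_groups)
codes = demo_le.transform(demo_groups)

# classes_ holds the sorted unique labels; a label's position is its code.
print(demo_le.classes_.tolist())
print(codes.tolist())

# inverse_transform maps integer codes back to the original labels.
restored = demo_le.inverse_transform(codes)

# Labels that were not seen during fit raise a ValueError at transform time.
try:
    demo_le.transform(['9999999'])
    unseen_ok = True
except ValueError:
    unseen_ok = False
```

Because unseen labels raise an error, any code that reuses the saved encoder may need to handle block groups that were absent from the training data.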

### Standardize Features

Only the features used in the final model are standardized in this notebook. For a review of feature selection and a standardization of all features, see the [Model_Feature_Standardization](https://github.com/georgetown-analytics/DC-Criminalistics/blob/master/notebooks/Model_Feature_Standardization.ipynb) and [Model_Feature_Selection_Modeling](https://github.com/georgetown-analytics/DC-Criminalistics/blob/master/notebooks/Model_Feature_Selection_Modeling.ipynb) notebooks.


**Features**: Census block group, day of the month, day of the week, time of day, temperature, and UV index.  
**Target**: Standardized and classified crime rate.

In [None]:
feature_cols = [
    'bg_cat', 'day', 'weekday', 'tod_num', 'temperature', 'uv_index',
]

In [None]:
scaler = StandardScaler()
scaler_fit = scaler.fit(df[feature_cols])

In [None]:
# Reuse df's index so the scaled features stay aligned with the target.
features = pd.DataFrame(scaler_fit.transform(df[feature_cols]),
                        columns=feature_cols, index=df.index)

In [None]:
target = df['crime_rate_cat']
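A quick sanity check on standardization is that every scaled column has a mean near 0 and a population standard deviation near 1. The sketch below runs that check on a small made-up frame whose index has gaps, similar to what `dropna` leaves behind. The column values are hypothetical, and carrying the index over is an optional precaution rather than something the scaler requires.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical values with a gapped index, as left behind by dropna.
demo_df = pd.DataFrame(
    {'temperature': [60.0, 70.0, 80.0, 90.0], 'uv_index': [2.0, 4.0, 6.0, 8.0]},
    index=[0, 2, 5, 7],
)

demo_scaler = StandardScaler().fit(demo_df)
demo_features = pd.DataFrame(
    demo_scaler.transform(demo_df),
    columns=demo_df.columns,
    index=demo_df.index,  # keep rows aligned with the source frame
)

# Each column should now have mean ~0 and population std ~1.
col_means = demo_features.mean().to_numpy()
col_stds = demo_features.std(ddof=0).to_numpy()
print(col_means, col_stds)
```

`ddof=0` is used because `StandardScaler` divides by the population standard deviation, while pandas defaults to the sample standard deviation.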

### Modeling

**Model**: Bagging Classifier with Decision Tree estimator, which is the default estimator for this classifier.

For a more detailed exposition of other models and techniques used to run and store multiple models in working memory, see [Model_Feature_Selection_Modeling](https://github.com/georgetown-analytics/DC-Criminalistics/blob/master/notebooks/Model_Feature_Selection_Modeling.ipynb).

In [None]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)

In [None]:
# Passed positionally: the keyword was renamed from base_estimator to estimator in scikit-learn 1.2.
model = BaggingClassifier(DecisionTreeClassifier())

In [None]:
bc_fit = model.fit(X_train, y_train)
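Before exporting a model it can help to confirm that it scores sensibly on the held-out split. This is a minimal sketch of that check on synthetic data. The data, the `random_state` values, and the choice of accuracy as the metric are assumptions for illustration, not results or settings from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption; not the crime dataset).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# Fixing random_state makes the split, and so the score, reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0
)

demo_bc = BaggingClassifier(n_estimators=10, random_state=0).fit(X_tr, y_tr)

# Score only on rows the model has not seen during fitting.
demo_accuracy = accuracy_score(y_te, demo_bc.predict(X_te))
print(f'Held-out accuracy: {demo_accuracy:.3f}')
```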

### Output Data

We have three fits we need to export:
1. Encoded block group
2. Standardized features
3. Bagging classification model

Block Group Encoding

In [None]:
with open('../model/block-group-encoding-fit.sav', 'wb') as f:
    pickle.dump(bg_fit, f)

Standardization

In [None]:
with open('../model/feature-standardization-fit.sav', 'wb') as f:
    pickle.dump(scaler_fit, f)

Bagging Classifier with Decision Tree Estimator

In [None]:
joblib.dump(bc_fit, '../model/bagging-classifier-fit.sav')
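The exported files are only useful if they can be read back and applied in the same order: transform new rows with the saved standardizer, then predict with the saved model. The sketch below shows that round trip on synthetic data in a temporary directory. The data, file names, and variable names are assumptions for illustration.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption; not the crime dataset).
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 6))
y = (X_raw[:, 0] > 0).astype(int)

demo_scaler = StandardScaler().fit(X_raw)
demo_model = BaggingClassifier(random_state=0).fit(
    demo_scaler.transform(X_raw), y
)

with tempfile.TemporaryDirectory() as tmp_dir:
    scaler_path = os.path.join(tmp_dir, 'demo-scaler.sav')
    model_path = os.path.join(tmp_dir, 'demo-model.sav')

    # Export: pickle for the scaler, joblib for the model.
    with open(scaler_path, 'wb') as f:
        pickle.dump(demo_scaler, f)
    joblib.dump(demo_model, model_path)

    # Import, as a later script or service might do.
    with open(scaler_path, 'rb') as f:
        loaded_scaler = pickle.load(f)
    loaded_model = joblib.load(model_path)

# New rows must be scaled with the saved scaler before prediction.
new_rows = X_raw[:5]
predictions = loaded_model.predict(loaded_scaler.transform(new_rows))
print(predictions)
```

Both formats rely on Python's pickle protocol, so files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.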