# Project Progress Report
## Andres Nigenda, Elias Serrania, Krista Chan

**1. Updates in Data**

There are no updates in the data sources we are using.

**2. Analysis you've done and results you're getting**

We have written skeleton code for our pipeline that cleans the data, splits it into training and test sets, adds key features, and loops through different classifiers.

Below, we test our pipeline using a few features and a single training and test set.

In [1]:
import read_data as rd
import train_test as tt
import feature_generation as fg
import ml_loop as ml
import pandas as pd
import numpy as np

In [2]:
officer_df = rd.create_df('officer')
allegation_df = rd.create_df('allegation')
officerallegation_df = rd.create_df('officerallegation')

In [3]:
allegation_df = allegation_df.merge(officerallegation_df, left_on='crid', right_on='allegation_id')

In [4]:
allegation_sets = tt.split_sets(allegation_df, np.timedelta64(2, 'Y'), 'incident_date', verbose=True)

Sets trained with 1 year(s) of data

Training set 0 is trained from 2010-01-01 to 2010-12-31 to predict the outcome from 2011-01-01 to 2012-12-31 and tested on outcomes in 2013-01-01 to 2014-12-31
Training set 1 is trained from 2010-01-01 to 2011-12-31 to predict the outcome from 2012-01-01 to 2013-12-31 and tested on outcomes in 2014-01-01 to 2015-12-31
Training set 2 is trained from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and tested on outcomes in 2015-01-02 to 2016-12-30
Training set 3 is trained from 2010-01-01 to 2013-12-30 to predict the outcome from 2013-12-31 to 2016-01-01 and tested on outcomes in 2016-01-02 to 2017-12-30

Sets trained with 2 year(s) of data

Training set 4 is trained from 2010-01-01 to 2011-12-31 to predict the outcome from 2012-01-01 to 2013-12-31 and tested on outcomes in 2014-01-01 to 2015-12-31
Training set 5 is trained from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and tested on ou
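
The windows printed above follow a rolling pattern: a training window that starts in 2010, then an outcome window, then a test window of the same length. The internals of `tt.split_sets` are not reproduced in this report, so the following is only a minimal sketch of how such windows could be generated. `rolling_windows` and its parameters are hypothetical names, and the sketch steps by calendar years with `pd.DateOffset` instead of `np.timedelta64(2, 'Y')`, whose average-year length appears to be why some boundaries above drift to dates such as 2012-12-30.

```python
import pandas as pd


def rolling_windows(start, outcome_years, n_sets):
    """Build train/outcome/test boundaries (hypothetical helper, not tt.split_sets)."""
    start = pd.Timestamp(start)
    one_day = pd.Timedelta(days=1)
    span = pd.DateOffset(years=outcome_years)
    windows = []
    for years_of_training in range(1, n_sets + 1):
        # Training always starts at `start` and grows by one calendar year per set
        end_train = start + pd.DateOffset(years=years_of_training) - one_day
        start_outcome = end_train + one_day
        end_outcome = start_outcome + span - one_day
        start_test = end_outcome + one_day
        end_test = start_test + span - one_day
        windows.append({
            'start_date_train': start,
            'end_date_train': end_train,
            'start_date_outcome': start_outcome,
            'end_date_outcome': end_outcome,
            'start_date_test': start_test,
            'end_date_test': end_test,
        })
    return windows


windows = rolling_windows('2010-01-01', outcome_years=2, n_sets=3)
for w in windows:
    print({name: date.date().isoformat() for name, date in w.items()})
```

With calendar-year steps, every window starts on January 1 and ends on December 31, so adjacent windows never overlap or leave gaps.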

Below, we select training set 7 to test our pipeline.

In [5]:
testing = allegation_sets[7]
testing.keys()

dict_keys(['train', 'test', 'start_date_train', 'end_date_train', 'start_date_outcome', 'end_date_outcome', 'start_date_test', 'end_date_test', 'outcome_time'])

In [6]:
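# training: officers appointed before the outcome window starts
# whose resignation date is on or after the start of training
officer_df_train = officer_df.loc[
    (officer_df.appointed_date < testing.get('start_date_outcome'))
    & (officer_df.resignation_date >= testing.get('start_date_train'))]

# testing: officers appointed before the test window starts
# whose resignation date is on or after the start of training
officer_df_test = officer_df.loc[
    (officer_df.appointed_date < testing.get('start_date_test'))
    & (officer_df.resignation_date >= testing.get('start_date_train'))]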
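
One thing worth checking in the filters above: in pandas, any comparison against a missing date (`NaT`) evaluates to `False`, so an officer with no `resignation_date` is excluded by the `>=` condition. Whether still-active officers have a missing `resignation_date` in our data is an assumption that has not been verified here. The toy frame below is made up purely to illustrate the behavior and one possible alternative filter.

```python
import pandas as pd

# Made-up officers: 1 resigned before training, 2 has no resignation date, 3 resigned later
toy = pd.DataFrame({
    'id': [1, 2, 3],
    'appointed_date': pd.to_datetime(['2005-06-01', '2008-03-15', '2011-09-01']),
    'resignation_date': pd.to_datetime(['2009-01-01', None, '2014-02-01']),
})
start_train = pd.Timestamp('2010-01-01')
start_outcome = pd.Timestamp('2012-12-31')

appointed = toy.appointed_date < start_outcome

# Same shape as the filter above: NaT >= date is False, so officer 2 is dropped
strict = toy.loc[appointed & (toy.resignation_date >= start_train)]

# Alternative: treat a missing resignation date as "has not resigned"
not_resigned = toy.resignation_date.isna() | (toy.resignation_date >= start_train)
lenient = toy.loc[appointed & not_resigned]

print(strict.id.tolist())   # [3]
print(lenient.id.tolist())  # [2, 3]
```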

In [7]:
# training
added_features = fg.gen_allegation_features(
    officer_df_train, testing.get('train'), testing.get('end_date_train'))
added_features = fg.create_sustained_outcome(
    officer_df_train, testing.get('train'), testing.get('end_date_train'))

In our training set, 2.6% of the 4,056 officers had a sustained complaint in the outcome time frame. During the training period, each officer had 0.64 complaints on average. Averaged across officers, 3.7% of an officer's complaints were sustained and 5.8% were filed by another police officer.

In [8]:
added_features.sustained_outcome.mean()

0.02638067061143984

In [9]:
added_features.number_complaints.mean()

0.6380670611439843

In [10]:
added_features.pct_sustained_complaints.mean()

0.037048790864774986

In [11]:
added_features.pct_officer_complaints.mean()

0.05799846517399461

In [12]:
added_features.id.nunique()

4056
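
The feature code lives in `fg.gen_allegation_features`, which is not reproduced in this report. As a rough illustration of the kind of aggregation behind the averages above, the sketch below computes the three features on a made-up allegation table. The column names `officer_id`, `final_finding` and `is_officer_complaint`, and the `'SU'` code for a sustained finding, are assumptions for illustration and may not match our data.

```python
import pandas as pd

# Made-up data; column names and the 'SU' code are assumptions for illustration
officers = pd.DataFrame({'id': [1, 2, 3]})
allegations = pd.DataFrame({
    'officer_id': [1, 1, 1, 2],
    'final_finding': ['SU', 'NS', 'UN', 'NS'],
    'is_officer_complaint': [False, True, False, False],
})

# Flag sustained complaints, then aggregate once per officer
allegations['sustained'] = allegations['final_finding'].eq('SU')
per_officer = (
    allegations.groupby('officer_id')
    .agg(number_complaints=('sustained', 'size'),
         pct_sustained_complaints=('sustained', 'mean'),
         pct_officer_complaints=('is_officer_complaint', 'mean'))
    .reset_index()
    .rename(columns={'officer_id': 'id'})
)

# Left-join so officers with no complaints stay in the table with a count of 0
features = officers.merge(per_officer, on='id', how='left')
features['number_complaints'] = features['number_complaints'].fillna(0).astype(int)
print(features)
```

In this sketch the two percentage columns stay missing for an officer with no complaints, since a share of zero complaints is undefined. Whether to fill those with 0 or drop the rows is a modeling choice, and it affects how many officers survive a later `dropna()`.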

In [13]:
forml = added_features.dropna()

In [14]:
# testing
added_features_test = fg.gen_allegation_features(
    officer_df_test, testing.get('test'), testing.get('end_date_outcome'))
added_features_test = fg.create_sustained_outcome(
    officer_df_test, testing.get('train'), testing.get('end_date_outcome'))

In [15]:
testl = added_features_test.dropna()

In [16]:
testl.columns

Index(['id', 'gender', 'race', 'appointed_date', 'rank', 'active',
       'birth_year', 'resignation_date', 'current_badge', 'current_salary',
       'last_unit_id', 'number_complaints', 'pct_officer_complaints',
       'pct_sustained_complaints', 'sustained_outcome'],
      dtype='object')

In [17]:
# Drop the officer attributes we are not yet using as features
drop_cols = ['gender', 'race', 'appointed_date', 'rank', 'active',
             'birth_year', 'resignation_date', 'current_badge', 'last_unit_id']
testl = testl.drop(columns=drop_cols)
forml = forml.drop(columns=drop_cols)

In [18]:
testing['train'] = forml
testing['test'] = testl

# ml_loop is passed a list of sets; the list is named test_sets so that it
# does not shadow the train_test module imported as tt
test_sets = [testing]

Below, we compute recall for a variety of classifiers using this training and test set. As expected given the few features we included, recall is 0.0 for every model shown.

In [19]:
clfrs, params = ml.set_parameters('all')
ml.ml_loop(clfrs, params, test_sets, 'sustained_outcome', 'meow', None, 'moo')

Working on the RandomForest classifier:
*** with parameters {'max_depth': 5, 'max_features': 'sqrt', 'min_samples_split': 2, 'n_estimators': 100, 'n_jobs': -1}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30


  'recall', 'true', average, warn_for)

**** Model recall is 0.0
Working on the DecisionTree classifier:
*** with parameters {'criterion': 'gini', 'max_depth': 1, 'max_features': None, 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 1, 'max_features': None, 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 1, 'max_features': None, 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 1, 'max_features': 'sqrt', 'min_samples_split': 2}


**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 5, 'max_features': 'log2', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 5, 'max_features': 'log2', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 5, 'max_features': 'log2', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-

**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 10, 'max_features': None, 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 10, 'max_features': None, 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 10, 'max_features': 'sqrt', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 10, 'max_features': 'sqrt', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-1

**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 10, 'max_features': 'sqrt', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 10, 'max_features': 'log2', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 10, 'max_features': 'log2', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 10, 'max_features': 'log2', 'min_samples_split': 10}
Training from 2010-01-01 to 2

**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 20, 'max_features': None, 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 20, 'max_features': None, 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 20, 'max_features': 'sqrt', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 20, 'max_features': 'sqrt', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-1

**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 20, 'max_features': 'log2', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 20, 'max_features': 'log2', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 20, 'max_features': 'log2', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 50, 'max_features': None, 'min_samples_split': 2}
Training from 2010-01-01 to 2012

**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 50, 'max_features': None, 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 10}
Training from 2010-01-01 to 201

**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 50, 'max_features': 'log2', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 50, 'max_features': 'log2', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 100, 'max_features': None, 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 100, 'max_features': None, 'min_samples_split': 5}
Training from 2010-01-01 to 2012

**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 100, 'max_features': 'sqrt', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 100, 'max_features': 'sqrt', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 100, 'max_features': 'sqrt', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 100, 'max_features': 'log2', 'min_samples_split': 2}
Training from 2010-01-01 t

**** Model recall is 0.0
*** with parameters {'criterion': 'gini', 'max_depth': 100, 'max_features': 'log2', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 1, 'max_features': None, 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 1, 'max_features': None, 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 1, 'max_features': None, 'min_samples_split': 10}
Training from 2010-01-01 to 

**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 5, 'max_features': 'log2', 'min_samples_split': 2}
Training from 2010-01-

**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 10, 'max_features': None, 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 10, 'max_features': None, 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 10, 'max_features': None, 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 10, 'max_features': 'sqrt', 'min_samples_split': 2}
Training from 2010-01-01

**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 10, 'max_features': 'log2', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 10, 'max_features': 'log2', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 20, 'max_features': None, 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 20, 'max_features': None, 'min_samples_split': 5}
Training from 2010-01-

**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 20, 'max_features': 'sqrt', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 20, 'max_features': 'sqrt', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 20, 'max_features': 'log2', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 20, 'max_features': 'log2', 'min_samples_split': 5}
Training from 2010

**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 50, 'max_features': 'log2', 'min_samples_split': 2}
Training from 2010

**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 100, 'max_features': 'sqrt', 'min_samples_split': 2}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 100, 'max_features': 'sqrt', 'min_samples_split': 5}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 100, 'max_features': 'sqrt', 'min_samples_split': 10}
Training from 2010-01-01 to 2012-12-30 to predict the outcome from 2012-12-31 to 2015-01-01 and testing on outcomes from  2015-01-02 to 2016-12-30
**** Model recall is 0.0
*** with parameters {'criterion': 'entropy', 'max_depth': 100, 'max_features': 'log2', 'min_samples_split': 2}
Training from 

  'recall', 'true', average, warn_for)
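
The repeated `'recall', 'true', average, warn_for)` fragments in the output look like the tail of scikit-learn's `UndefinedMetricWarning`, which is emitted when recall is computed on labels that contain no positive cases. If that reading is right, a recall of 0.0 here may reflect an empty positive class in the test labels, not only weak features, so it is worth checking the class balance before comparing models. One possible cause to rule out is that the test-set outcome in cell 14 is built from `testing.get('train')` while the test features use `testing.get('test')`. The sketch below uses made-up labels and computes recall directly with NumPy; `describe_recall` is a hypothetical helper, not part of `ml_loop`.

```python
import numpy as np


def describe_recall(y_true, y_pred):
    """Return (number of positives, recall); recall is None when undefined."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    n_pos = int(y_true.sum())
    if n_pos == 0:
        # No positive cases: recall = TP / (TP + FN) would divide by zero
        return n_pos, None
    true_pos = int((y_true & y_pred).sum())
    return n_pos, true_pos / n_pos


# Made-up labels: a test set with no positives, then one with two positives
print(describe_recall([0, 0, 0, 0], [0, 0, 0, 0]))  # (0, None)
print(describe_recall([0, 1, 1, 0], [0, 1, 0, 0]))  # (2, 0.5)
```

Reporting the positive count alongside recall makes it clear whether a 0.0 means the model missed every positive case or there were none to find.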


**3. Changes in scope**

We will create two outcome variables and test the effect of network features on both. The first is whether a police officer is involved in the use of a firearm over the next two years. The second is whether a police officer has a sustained investigation over the next two years. Our overarching policy goal is to determine whether network features influence police officers' use of unjust force, in order to inform journalistic investigation. Because we do not have a perfect proxy for that outcome, we use these two outcome variables to better inform the interpretation of our results.