### For calculating your own risk (or an example of choice):

Look for occurrences of the \<EXAMPLE>\ tag in the the code below. Change the values in the indicated array to whatever you want. Then run the cell and it will print out the classification for your example.  Values used in the rest of the training set for each feature are listed below.  If there are no possible values listed after the "#", then you can enter an integer value into the field.

* 'sex', # 'Male' or 'Female' (I know this enforces a binary but it's what the training data from the FOIA requests provided)
* 'age',
* 'race', # 'African-American', 'Caucasian', 'Asian', 'Hispanic, 'Native American', 'Other'
* 'juv_fel_count',
* 'juv_misd_count',
* 'juv_other_count',
* 'priors_count',
* 'c_charge_degree' # 'F' or 'M' (felony or misdemeanor)


In [401]:
import pandas as pd
import logging

import numpy as np
import math

from sklearn.model_selection import train_test_split
# from sklearn.datasets import make_classification
# from sklearn.datasets import make_regression
from sklearn.metrics import roc_auc_score
from sklearn import preprocessing

# from mla.datasets import *
# from mla.metrics.metrics import root_mean_squared_log_error, mean_squared_error
from mla.neuralnet import NeuralNet
from mla.neuralnet.constraints import MaxNorm, UnitNorm
from mla.neuralnet.layers import Activation, Dense, Dropout
from mla.neuralnet.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam
from mla.neuralnet.parameters import Parameters
from mla.neuralnet.regularizers import *
from mla.utils import one_hot

In [402]:
logging.basicConfig(level=logging.DEBUG)


def classification(X, y):
    
    example = X[-1]
    example = np.asarray([example, example])
    
#     print X
#     print example
    
    y = one_hot(y)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=1111)

    model = NeuralNet(
        layers=[
            Dense(512, Parameters(init='uniform', regularizers={'W': L2(0.05)})),
            Activation('relu'),
            Dropout(0.9),
            Dense(128, Parameters(init='normal', constraints={'W': MaxNorm()})),
            Activation('relu'),
            Dense(3),
            Activation('softmax'),
        ],
        loss='categorical_crossentropy',
        optimizer=Adadelta(),
        metric='accuracy',
        batch_size=256,
        max_epochs=25,

    )
#     print X_train.shape
#     print y_train.shape
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print('classification accuracy', roc_auc_score(y_test[:, 0], predictions[:, 0]))
    
    ex_predict = model.predict(example)
    scores = ["LOW", "MEDIUM", "HIGH"]
    print "Your example is predicted to have a %s risk score" % scores[np.argmax(ex_predict[0])]



In [403]:
def process_x_y(x_data, y_data, to_int):
    
    ''' convert the text columns of np.arrays of desired x_data and y_data into int / vector representation
        to_int is the indices of columns with text values (make this programmatic in the next update)'''
    
    # convert text columns to integer values
    le = preprocessing.LabelEncoder()
    for i in to_int:
        temp = x_data[:,i]
        temp_fit = le.fit(temp)
        x_data[:,i] = le.transform(temp)

    for i in range(len(x_data)):
        for j in range(len(x_data[i])):
            if np.isnan(x_data[i][j]):
                x_data[i][j] = 0

    x_data = x_data.astype(int)
    

    y_fit = le.fit(y_data)
    y_data = le.transform(y_data)
    
    return x_data, y_data

In [404]:
def run_model(keep, target, to_int):

    y = target.as_matrix()
    x = keep.as_matrix()

    x_data, y_data = process_x_y(x,y, to_int)

    x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15, random_state=1111)
    classification(x_data, y_data)

In [405]:
csv_file = 'compas-scores-two-years-violent.csv'
df = pd.read_csv(csv_file)

In [406]:
# Predicting the score assigned for risk of violent recidivism

keep = [
 'sex',
 'age',
 'race',
 'juv_fel_count',
 'juv_misd_count',
 'juv_other_count',
 'priors_count',
 'c_charge_degree']

target = ['v_score_text']
text_cols = [0,2,7]
to_keep = df[keep]
to_target = df[target]

# <EXAMPLE>: Adding in an example of choice
to_keep.loc[df.shape[0]] = ['Male',22,'Caucasian',0,0,0,0,'F'] # <--- change the values in this array
to_target.loc[df.shape[0]] = ['Low']

run_model(to_keep, to_target, text_cols)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
INFO:root:Total parameters: 70659
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 70.07it/s]
INFO:root:Epoch:0, train loss: 5.30650038236, train accuracy: 0.715277777778, elapsed: 0.263963937759 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 66.97it/s]
INFO:root:Epoch:1, train loss: 2.00994070763, train accuracy: 0.715277777778, elapsed: 0.272969961166 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 70.30it/s]
INFO:root:Epoch:2, train loss: 1.6883196485, train accuracy: 0.715277777778, elapsed: 0.261718988419 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 68.18it/s]
INFO:root:Ep

('classification accuracy', 0.85127333122429605)
Your example is predicted to have a MEDIUM risk score


In [407]:
# Predicting the score assigned for risk of violent recidivism using JUST age, sex and race

keep = [
 'sex',
 'age',
 'race']

target = ['v_score_text']
text_cols = [0,2]

to_keep = df[keep]
to_target = df[target]

# <EXAMPLE>: Adding in an example of choice
to_keep.loc[df.shape[0]] = ['Male',22,'Caucasian']
to_target.loc[df.shape[0]] = ['Low']

run_model(to_keep, to_target, text_cols)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
INFO:root:Total parameters: 68099
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 71.46it/s]
INFO:root:Epoch:0, train loss: 2.8566724982, train accuracy: 0.715277777778, elapsed: 0.260395050049 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 64.71it/s]
INFO:root:Epoch:1, train loss: 1.83904580054, train accuracy: 0.715277777778, elapsed: 0.279430866241 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 72.22it/s]
INFO:root:Epoch:2, train loss: 1.55244373549, train accuracy: 0.715277777778, elapsed: 0.251684904099 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 71.10it/s]
INFO:root:Ep

('classification accuracy', 0.78498365496151012)
Your example is predicted to have a HIGH risk score


In [408]:
# Predicting the score assigned for risk of violent recidivism using JUST race

keep = [
 'race']

target = ['v_score_text']
text_cols = [0]

to_keep = df[keep]
to_target = df[target]

# <EXAMPLE>: Adding in an example of choice
to_keep.loc[df.shape[0]] = ['Hispanic']
to_target.loc[df.shape[0]] = ['Low']

run_model(to_keep, to_target, text_cols)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
INFO:root:Total parameters: 67075
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 75.07it/s]
INFO:root:Epoch:0, train loss: 0.97967315348, train accuracy: 0.715277777778, elapsed: 0.24632191658 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 76.32it/s]
INFO:root:Epoch:1, train loss: 0.845374569387, train accuracy: 0.715277777778, elapsed: 0.243900060654 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 73.06it/s]
INFO:root:Epoch:2, train loss: 0.818622871883, train accuracy: 0.715277777778, elapsed: 0.251918077469 sec.
Epoch progress: 100%|██████████| 16/16 [00:00<00:00, 66.67it/s]
INFO:root:

('classification accuracy', 0.61923969208056517)
Your example is predicted to have a MEDIUM risk score
