# Part 3: Unbiased Evaluation using a New Test Set

In this part, we are given a new test set (`/dsa/data/all_datasets/back_order/Kaggle_Test_Dataset_v2.csv`). We can now take advantage of the entire smart sample that we created in Part I. 

* Retrain a pipeline using the optimal parameters that the pipeline learned. We don't need to repeat GridSearch here. 
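
As a minimal sketch of what "retrain without repeating GridSearch" can look like: the pipeline steps, the parameter grid, and the synthetic data below are all hypothetical stand-ins, not the objects saved in Part II. The idea is to copy the winning hyperparameters onto a fresh pipeline and fit it once.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the smart sample (hypothetical data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical pipeline; the real one comes from Part II.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The search that would have been run once, in Part II.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3).fit(X, y)

# Reuse the winning hyperparameters on a fresh, unfitted copy and
# train it on all available rows; no second search is needed.
final_pipe = clone(pipe).set_params(**search.best_params_)
final_pipe.fit(X, y)
```

If the saved object is already a pipeline configured with the best parameters, calling `.fit` on it directly has the same effect.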

In [20]:
# Libraries

%matplotlib inline
import matplotlib.pyplot as plt

import os, sys
import itertools
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.covariance import EllipticEnvelope

import joblib

## Import modules as needed

In [2]:
# In addition to the above
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix

## Load smart sample and the best pipeline from Part II

In [3]:
Sample_X, Sample_y = joblib.load('sample-data-v1.pkl')
Pipe_1 = joblib.load('model_one.pkl')
Model_1 = joblib.load('model_one_best.pkl')


##  Retrain a pipeline using the full sampled training data set

Use the full sampled training data set to train the pipeline.

In [4]:
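# Hold out 20% of the smart sample for a quick sanity check.
# Note: the pipeline below is therefore fit on the remaining 80%, not the full sample.
X_train, X_test, y_train, y_test = train_test_split(Sample_X, Sample_y, test_size=0.2)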

In [5]:
# Add code below this comment  (Question #E301)
# ----------------------------------

mod_1_run = Pipe_1.fit(X_train, y_train)



In [6]:
# testing the prediction of this model
y_pred = mod_1_run.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.81      0.56      0.66      2215
           1       0.67      0.87      0.76      2303

    accuracy                           0.72      4518
   macro avg       0.74      0.72      0.71      4518
weighted avg       0.74      0.72      0.71      4518



### Save the trained model (here with `joblib`, which pickles the object to disk).

In [7]:
# Add code below this comment  
# -----------------------------

joblib.dump(mod_1_run, 'mod_1_run.pkl')




['mod_1_run.pkl']


## Load the Testing Data and evaluate your model

* `/dsa/data/all_datasets/back_order/Kaggle_Test_Dataset_v2.csv`

* We need to preprocess this test data (following steps similar to those in Part I)
* If we fitted any normalizer/standardizer in Part II, then we have to transform this test data using that fitted normalizer/standardizer
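
A small sketch of the second point, assuming the scaler lives inside a scikit-learn `Pipeline` (the step names and the synthetic arrays are hypothetical). In that case `predict` applies the training-time scaling to new data automatically; the scaler is never refit on the test set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical training and test features on deliberately different scales.
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_train = (X_train[:, 0] > 5.0).astype(int)
X_new = rng.normal(loc=7.0, scale=2.0, size=(50, 3))

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
]).fit(X_train, y_train)

# The scaler's statistics come from the training data only.
scaler = pipe.named_steps["scale"]
train_means = scaler.mean_

# predict() sends X_new through scaler.transform() (no refit), then the classifier.
pred_via_pipe = pipe.predict(X_new)
pred_manual = pipe.named_steps["clf"].predict(scaler.transform(X_new))
```

If the scaler was fitted outside a pipeline instead, the same rule applies: call `transform` on the test data, never `fit_transform`.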

In [8]:
# Preprocess the given test set  (Question #E302)
# ----------------------------------

new_data = pd.read_csv('/dsa/data/all_datasets/back_order/Kaggle_Test_Dataset_v2.csv')

new_data.head()





| | sku | national_inv | lead_time | in_transit_qty | forecast_3_month | forecast_6_month | forecast_9_month | sales_1_month | sales_3_month | sales_6_month | ... | pieces_past_due | perf_6_month_avg | perf_12_month_avg | local_bo_qty | deck_risk | oe_constraint | ppap_risk | stop_auto_buy | rev_stop | went_on_backorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3285085 | 62.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | -99.0 | -99.0 | 0.0 | Yes | No | No | Yes | No | No |
| 1 | 3285131 | 9.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | -99.0 | -99.0 | 0.0 | No | No | Yes | No | No | No |
| 2 | 3285358 | 17.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.92 | 0.95 | 0.0 | No | No | No | Yes | No | No |
| 3 | 3285517 | 9.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.78 | 0.75 | 0.0 | No | No | Yes | Yes | No | No |
| 4 | 3285608 | 2.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.54 | 0.71 | 0.0 | No | No | No | Yes | No | No |


In [9]:
new_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 242076 entries, 0 to 242075
Data columns (total 23 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   sku                242076 non-null  object 
 1   national_inv       242075 non-null  float64
 2   lead_time          227351 non-null  float64
 3   in_transit_qty     242075 non-null  float64
 4   forecast_3_month   242075 non-null  float64
 5   forecast_6_month   242075 non-null  float64
 6   forecast_9_month   242075 non-null  float64
 7   sales_1_month      242075 non-null  float64
 8   sales_3_month      242075 non-null  float64
 9   sales_6_month      242075 non-null  float64
 10  sales_9_month      242075 non-null  float64
 11  min_bank           242075 non-null  float64
 12  potential_issue    242075 non-null  object 
 13  pieces_past_due    242075 non-null  float64
 14  perf_6_month_avg   242075 non-null  float64
 15  perf_12_month_avg  242075 non-null  float64
 16  lo

In [10]:
# Drop same columns as before

new_data = new_data.drop(['sku', 'lead_time', 'in_transit_qty','min_bank', 'local_bo_qty'], axis=1)


In [11]:
# Same way to confirm yes/no columns

yes_no_columns = list(filter(lambda i: new_data[i].dtype!=np.float64, new_data.columns))
print(yes_no_columns)

# Add code below this comment  (Question #E102)
# ----------------------------------

print('potential_issue', new_data['potential_issue'].unique())
print('deck_risk', new_data['deck_risk'].unique())
print('oe_constraint', new_data['oe_constraint'].unique())
print('ppap_risk', new_data['ppap_risk'].unique())
print('stop_auto_buy', new_data['stop_auto_buy'].unique())
print('rev_stop', new_data['rev_stop'].unique())
print('went_on_backorder', new_data['went_on_backorder'].unique())


['potential_issue', 'deck_risk', 'oe_constraint', 'ppap_risk', 'stop_auto_buy', 'rev_stop', 'went_on_backorder']
potential_issue ['No' 'Yes' nan]
deck_risk ['Yes' 'No' nan]
oe_constraint ['No' 'Yes' nan]
ppap_risk ['No' 'Yes' nan]
stop_auto_buy ['Yes' 'No' nan]
rev_stop ['No' 'Yes' nan]
went_on_backorder ['No' 'Yes' nan]


In [12]:
# Fill missing Yes/No values with the column mode, then convert to binary

for column_name in yes_no_columns:
    mode = new_data[column_name].apply(str).mode()[0]
    print('Filling missing values of {} with {}'.format(column_name, mode))
    # Assign the result back; chained fillna(inplace=True) on a column is unreliable in recent pandas
    new_data[column_name] = new_data[column_name].fillna(mode)

for column_name in yes_no_columns:
    new_data[column_name] = new_data[column_name].map({'Yes': 1, 'No': 0})

Filling missing values of potential_issue with No
Filling missing values of deck_risk with No
Filling missing values of oe_constraint with No
Filling missing values of ppap_risk with No
Filling missing values of stop_auto_buy with Yes
Filling missing values of rev_stop with No
Filling missing values of went_on_backorder with No
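
The fill-then-map step can be checked on a toy frame. This is only a sketch: the values are made up, and unlike the cell above it relies on `Series.mode()` skipping NaN by default rather than converting values to strings first.

```python
import numpy as np
import pandas as pd

# Toy frame with the same Yes/No/NaN pattern (values are made up).
df = pd.DataFrame({
    "deck_risk": ["No", "Yes", "No", np.nan],
    "stop_auto_buy": ["Yes", "Yes", np.nan, "No"],
})

for col in df.columns:
    # mode() skips NaN by default, so the most frequent real label is used.
    most_common = df[col].mode()[0]
    # Fill the gaps, then encode Yes/No as 1/0.
    df[col] = df[col].fillna(most_common).map({"Yes": 1, "No": 0})

print(df)
```

Any value other than `Yes` or `No` would become NaN after `map`, which is why the fill has to happen before the encoding.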


In [13]:
# Drop NA

new_data = new_data.dropna()

We can now predict and evaluate with the preprocessed test set. It would be interesting to compare performance with and without outlier removal on the test set. We can report the confusion matrix, precision, recall, F1-score, accuracy, and any other relevant measures.

In [14]:
# Add code below this comment  (Question #E303)
# ----------------------------------

# Load the retrained model fresh into Model_2
Model_2 = joblib.load('mod_1_run.pkl')

# No train/test split and no refit here: the saved model is applied
# to every row of the new test set (all columns except the target).
y_pred_2 = Model_2.predict(new_data.iloc[:,:-1])


In [15]:
# Classification report

#y_pred = mod_1_run.predict(X_test)

print(classification_report(new_data.iloc[:,-1:], y_pred_2))

              precision    recall  f1-score   support

           0       1.00      0.54      0.70    239387
           1       0.02      0.86      0.04      2688

    accuracy                           0.55    242075
   macro avg       0.51      0.70      0.37    242075
weighted avg       0.99      0.55      0.70    242075



In [16]:
confusion_matrix(new_data.iloc[:,-1:], y_pred_2)

array([[129847, 109540],
       [   377,   2311]])

Based on the confusion matrix and classification report above, this model is not yet usable as-is for predicting backorders. It caught 86% of the items that actually went on backorder (2,311 of 2,688), but it also flagged 109,540 items that did not, so only about 2% of its backorder predictions were correct.
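
The class-1 figures in the classification report can be re-derived by hand from the confusion matrix, which is a useful sanity check on how to read it. The sketch below only assumes scikit-learn's layout (rows are true labels, columns are predicted labels).

```python
# Confusion matrix from the full test set:
# rows are true labels (0, 1), columns are predicted labels (0, 1).
tn, fp = 129847, 109540
fn, tp = 377, 2311

precision = tp / (tp + fp)   # share of predicted backorders that were real
recall = tp / (tp + fn)      # share of real backorders that were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```

With a positive class this rare, accuracy and the weighted averages are dominated by class 0, so precision and recall on class 1 are the numbers to watch.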

In [17]:
# Check the actual class counts in the test set to compare against the predictions

new_data.went_on_backorder.value_counts()

0    239387
1      2688
Name: went_on_backorder, dtype: int64

Now we run the same evaluation after removing outliers from the test set.

In [21]:
# Outlier with the elliptic envelope function code again

def elliptic_envelope_session(X, y):
    # Fit envelope
    envelope = EllipticEnvelope(support_fraction=1, contamination=0.2).fit(X)

    # Create a boolean indexing array to pick up outliers
    outliers = envelope.predict(X)==-1

    # Re-slice X,y into a cleaned dataset with outliers excluded
    X_clean = X[~outliers]
    y_clean = y[~outliers]
    return X_clean, y_clean
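
The `contamination=0.2` setting controls how much data the envelope discards. As a rough illustration on synthetic Gaussian data (not the backorder set), the fitted envelope flags about 20% of the rows it was fitted on:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 4))  # synthetic, roughly Gaussian features

envelope = EllipticEnvelope(
    support_fraction=1.0, contamination=0.2, random_state=0
).fit(X_demo)

# predict() returns -1 for outliers and +1 for inliers.
is_outlier = envelope.predict(X_demo) == -1
flagged_fraction = is_outlier.mean()

print(f"fraction flagged as outliers: {flagged_fraction:.3f}")
```

Because the threshold is a quantile of the fitted data, roughly one row in five is dropped regardless of whether those rows are genuinely anomalous.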

In [22]:
X = new_data.iloc[:,:-1]
y = new_data.iloc[:,-1:]

X_in, y_in = elliptic_envelope_session(X, y)

y_pred_3 = Model_2.predict(X_in)


In [23]:
print(classification_report(y_in, y_pred_3))

              precision    recall  f1-score   support

           0       1.00      0.47      0.64    191492
           1       0.02      0.89      0.04      2165

    accuracy                           0.48    193657
   macro avg       0.51      0.68      0.34    193657
weighted avg       0.99      0.48      0.63    193657



In [24]:
confusion_matrix(y_in, y_pred_3)

array([[ 90140, 101352],
       [   245,   1920]])

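Overall, the model performed better on the whole test set than on the set with outliers removed: accuracy dropped from 0.55 to 0.48 after removal, while backorder recall rose only slightly (0.86 to 0.89) and precision stayed at 0.02.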
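
One way to make this comparison concrete is the false positive rate, which is not printed in the classification report. The sketch below uses only the counts from the two confusion matrices; the helper name is illustrative.

```python
def false_positive_rate(tn, fp):
    """Share of true non-backorders that were flagged as backorders."""
    return fp / (fp + tn)

# Counts taken from the two confusion matrices.
fpr_all_rows = false_positive_rate(tn=129847, fp=109540)
fpr_inliers = false_positive_rate(tn=90140, fp=101352)

# Share of test rows kept after outlier removal.
kept_fraction = 193657 / 242075

print(f"FPR, all rows:     {fpr_all_rows:.2f}")
print(f"FPR, inliers only: {fpr_inliers:.2f}")
print(f"rows kept:         {kept_fraction:.2f}")
```

The rows kept match the 80% expected from `contamination=0.2`, and the false alarm rate is higher on the retained rows than on the full set.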

## Conclusion

## Reflect

Imagine you are a data scientist who has been tasked with developing a system to save your 
company money by predicting and preventing back orders of parts in the supply chain.

Write a **brief summary** for "management" that details your findings, 
your level of certainty and trust in the models, 
and recommendations for operationalizing these models for the business.

# Save your notebook!
## Then `File > Close and Halt`