# Semiconductor Manufacturing Process Dataset

## Project Description

This project builds a logistic regression model to predict whether a manufactured part passes or fails.

Source: https://www.kaggle.com/saurabhbagchi/fmst-semiconductor-manufacturing-project

A complex modern semiconductor manufacturing process is normally under constant surveillance via the monitoring of signals/variables collected from sensors and/or process measurement points. However, not all of these signals are equally valuable in a specific monitoring system. The measured signals contain a combination of useful information, irrelevant information, and noise. Engineers typically have many more signals than are actually required. If we consider each type of signal as a feature, then feature selection may be applied to identify the most relevant signals. Process engineers may then use these signals to determine key factors contributing to yield excursions downstream in the process. This enables increased process throughput, decreased time to learning, and reduced per-unit production costs. These signals can be used as features to predict the yield type, and by trying out different combinations of features, the essential signals that affect the yield type can be identified.
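As a small illustration of feature selection, the sketch below drops signals that never change, since a constant signal cannot help separate Pass from Fail. The toy array and the choice of `VarianceThreshold` are assumptions made for illustration; the notebook itself does not apply this step.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy sensor matrix (hypothetical values): the middle column is constant
X_demo = np.array([
    [1.0, 5.0, 0.1],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.3],
])

# threshold=0.0 removes only features whose variance is exactly zero
selector = VarianceThreshold(threshold=0.0)
X_selected = selector.fit_transform(X_demo)

# Boolean mask of the columns that were kept
kept_mask = selector.get_support()
print(kept_mask)
print(X_selected.shape)
```

More selective methods (for example, ranking signals by their relationship to the label) follow the same fit/transform pattern.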

Dataset: SemiconductorManufacturingProcessDataset.csv (on Canvas)

Later, we will learn how to apply PCA (Principal Component Analysis) for feature selection; then we will apply an ANN to predict Pass/Fail. In this exercise, our objective is to repeat the same steps we followed for the Supplier Data: cleaning and scaling the data, encoding categorical data, and splitting the data into training and test sets.

## Importing the Libraries

In [30]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing the Dataset

In [31]:
dataset = pd.read_csv('SemiconductorManufacturingProcessDataset.csv')

## Showing the Dataset in a Table

In [32]:
pd.DataFrame(dataset)
# dataset

Unnamed: 0,Time,Sensor 1,Sensor 2,Sensor 3,Sensor 4,Sensor 5,Sensor 6,Sensor 7,Sensor 8,Sensor 9,...,Sensor 429,Sensor 430,Sensor 431,Sensor 432,Sensor 433,Sensor 434,Sensor 435,Sensor 436,Sensor 437,Pass/Fail
0,7/19/2008 11:55,3030.93,2564.00,2187.7333,1411.1265,1.3602,97.6133,0.1242,1.5005,0.0162,...,14.9509,0.5005,0.0118,0.0035,2.3630,,,,,Pass
1,7/19/2008 12:32,3095.78,2465.14,2230.4222,1463.6606,0.8294,102.3433,0.1247,1.4966,-0.0005,...,10.9003,0.5019,0.0223,0.0055,4.4447,0.0096,0.0201,0.0060,208.2045,Pass
2,7/19/2008 13:17,2932.61,2559.94,2186.4111,1698.0172,1.5102,95.4878,0.1241,1.4436,0.0041,...,9.2721,0.4958,0.0157,0.0039,3.1745,0.0584,0.0484,0.0148,82.8602,Fail
3,7/19/2008 14:43,2988.72,2479.90,2199.0333,909.7926,1.3204,104.2367,0.1217,1.4882,-0.0124,...,8.5831,0.4990,0.0103,0.0025,2.0544,0.0202,0.0149,0.0044,73.8432,Pass
4,7/19/2008 15:22,3032.24,2502.87,2233.3667,1326.5200,1.5334,100.3967,0.1235,1.5031,-0.0031,...,10.9698,0.4800,0.4766,0.1045,99.3032,0.0202,0.0149,0.0044,73.8432,Pass
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1562,10/16/2008 15:13,2899.41,2464.36,2179.7333,3085.3781,1.4843,82.2467,0.1248,1.3424,-0.0045,...,11.7256,0.4988,0.0143,0.0039,2.8669,0.0068,0.0138,0.0047,203.1720,Pass
1563,10/16/2008 20:49,3052.31,2522.55,2198.5667,1124.6595,0.8763,98.4689,0.1205,1.4333,-0.0061,...,17.8379,0.4975,0.0131,0.0036,2.6238,0.0068,0.0138,0.0047,203.1720,Pass
1564,10/17/2008 5:26,2978.81,2379.78,2206.3000,1110.4967,0.8236,99.4122,0.1208,,,...,17.7267,0.4987,0.0153,0.0041,3.0590,0.0197,0.0086,0.0025,43.5231,Pass
1565,10/17/2008 6:01,2894.92,2532.01,2177.0333,1183.7287,1.5726,98.7978,0.1213,1.4622,-0.0072,...,19.2104,0.5004,0.0178,0.0038,3.5662,0.0262,0.0245,0.0075,93.4941,Pass


## A Quick Review of the Data

In [33]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1567 entries, 0 to 1566
Columns: 439 entries, Time to Pass/Fail
dtypes: float64(437), object(2)
memory usage: 5.2+ MB
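Beyond `info()`, two quick checks are often useful before modeling: how many values are missing per column, and how many rows fall in each class. The sketch below uses a tiny stand-in frame (hypothetical values) so that it runs on its own; in the notebook the same calls would be made on `dataset`.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real dataset (values are illustrative only)
demo = pd.DataFrame({
    "Sensor 1": [3030.93, 3095.78, np.nan, 2988.72],
    "Sensor 2": [2564.00, np.nan, np.nan, 2479.90],
    "Pass/Fail": ["Pass", "Pass", "Fail", "Pass"],
})

# Count of missing values in each column
missing_per_column = demo.isna().sum()

# Count of rows in each class of the target column
class_counts = demo["Pass/Fail"].value_counts()

print(missing_per_column)
print(class_counts)
```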


## Separate the Input and Output
Here, we put the independent variables (the sensor columns) in X and the dependent variable (Pass/Fail) in y.

In [34]:
X = dataset.iloc[:, 1:438].values
y = dataset.iloc[:, -1].values
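The positional slice above depends on the column order. An alternative, sketched here on a small stand-in frame with hypothetical values, selects the same kind of split by column name, which may be easier to read and less sensitive to reordering.

```python
import pandas as pd

# Stand-in frame with the same column roles as the dataset
demo = pd.DataFrame({
    "Time": ["7/19/2008 11:55", "7/19/2008 12:32", "7/19/2008 13:17"],
    "Sensor 1": [3030.93, 3095.78, 2932.61],
    "Sensor 2": [2564.00, 2465.14, 2559.94],
    "Pass/Fail": ["Pass", "Pass", "Fail"],
})

# Inputs: every column except the timestamp and the label
X_demo = demo.drop(columns=["Time", "Pass/Fail"]).values

# Output: the label column
y_demo = demo["Pass/Fail"].values

print(X_demo.shape)
print(y_demo)
```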

## Showing the Input Data in a Table format

In [35]:
pd.DataFrame(X)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,427,428,429,430,431,432,433,434,435,436
0,3030.93,2564.00,2187.7333,1411.1265,1.3602,97.6133,0.1242,1.5005,0.0162,-0.0034,...,1.6765,14.9509,0.5005,0.0118,0.0035,2.3630,,,,
1,3095.78,2465.14,2230.4222,1463.6606,0.8294,102.3433,0.1247,1.4966,-0.0005,-0.0148,...,1.1065,10.9003,0.5019,0.0223,0.0055,4.4447,0.0096,0.0201,0.0060,208.2045
2,2932.61,2559.94,2186.4111,1698.0172,1.5102,95.4878,0.1241,1.4436,0.0041,0.0013,...,2.0952,9.2721,0.4958,0.0157,0.0039,3.1745,0.0584,0.0484,0.0148,82.8602
3,2988.72,2479.90,2199.0333,909.7926,1.3204,104.2367,0.1217,1.4882,-0.0124,-0.0033,...,1.7585,8.5831,0.4990,0.0103,0.0025,2.0544,0.0202,0.0149,0.0044,73.8432
4,3032.24,2502.87,2233.3667,1326.5200,1.5334,100.3967,0.1235,1.5031,-0.0031,-0.0072,...,1.6597,10.9698,0.4800,0.4766,0.1045,99.3032,0.0202,0.0149,0.0044,73.8432
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1562,2899.41,2464.36,2179.7333,3085.3781,1.4843,82.2467,0.1248,1.3424,-0.0045,-0.0057,...,1.4879,11.7256,0.4988,0.0143,0.0039,2.8669,0.0068,0.0138,0.0047,203.1720
1563,3052.31,2522.55,2198.5667,1124.6595,0.8763,98.4689,0.1205,1.4333,-0.0061,-0.0093,...,1.0187,17.8379,0.4975,0.0131,0.0036,2.6238,0.0068,0.0138,0.0047,203.1720
1564,2978.81,2379.78,2206.3000,1110.4967,0.8236,99.4122,0.1208,,,,...,1.2237,17.7267,0.4987,0.0153,0.0041,3.0590,0.0197,0.0086,0.0025,43.5231
1565,2894.92,2532.01,2177.0333,1183.7287,1.5726,98.7978,0.1213,1.4622,-0.0072,0.0032,...,1.7085,19.2104,0.5004,0.0178,0.0038,3.5662,0.0262,0.0245,0.0075,93.4941


## A Quick Check of the Output Data

In [36]:
pd.DataFrame(y)

Unnamed: 0,0
0,Pass
1,Pass
2,Fail
3,Pass
4,Pass
...,...
1562,Pass
1563,Pass
1564,Pass
1565,Pass


## Taking care of missing data

In [37]:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X)
X = imputer.transform(X)
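To make the effect of mean imputation concrete, the following sketch applies the same `SimpleImputer` settings to a three-row toy array (values chosen for illustration). Each missing entry is replaced by the mean of the observed values in its column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy matrix with one missing value in each column
X_demo = np.array([
    [1.0, np.nan],
    [3.0, 4.0],
    [np.nan, 8.0],
])

demo_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = demo_imputer.fit_transform(X_demo)

# Column means learned during fit: (1 + 3) / 2 = 2 and (4 + 8) / 2 = 6
print(demo_imputer.statistics_)
print(X_filled)
```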

In [38]:
# A quick check
print(X)

[[3.03093000e+03 2.56400000e+03 2.18773330e+03 ... 1.64749042e-02
  5.28333333e-03 9.96700663e+01]
 [3.09578000e+03 2.46514000e+03 2.23042220e+03 ... 2.01000000e-02
  6.00000000e-03 2.08204500e+02]
 [2.93261000e+03 2.55994000e+03 2.18641110e+03 ... 4.84000000e-02
  1.48000000e-02 8.28602000e+01]
 ...
 [2.97881000e+03 2.37978000e+03 2.20630000e+03 ... 8.60000000e-03
  2.50000000e-03 4.35231000e+01]
 [2.89492000e+03 2.53201000e+03 2.17703330e+03 ... 2.45000000e-02
  7.50000000e-03 9.34941000e+01]
 [2.94492000e+03 2.45076000e+03 2.19544440e+03 ... 1.62000000e-02
  4.50000000e-03 1.37784400e+02]]


## Encoding Categorical Data

### Encoding the Independent Variable

We don't have any categorical data to encode.

### Encoding the Dependent Variable

In [39]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)

In [40]:
# a quick check
print(y)

[1 1 0 ... 1 1 1]
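`LabelEncoder` assigns integers to the sorted class names, so with the labels used here Fail becomes 0 and Pass becomes 1. The short sketch below, run on a three-label toy list, shows how to confirm the mapping through `classes_` and how to map integers back to names.

```python
from sklearn.preprocessing import LabelEncoder

demo_le = LabelEncoder()
encoded = demo_le.fit_transform(["Pass", "Pass", "Fail"])

# classes_ lists the original labels in the order of their integer codes
print(demo_le.classes_)
print(encoded)

# Map integer codes back to the original label names
decoded = demo_le.inverse_transform([0, 1])
print(decoded)
```

Knowing this order matters later when reading the confusion matrix and the columns returned by `predict_proba`.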


## Feature Scaling

In [41]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)

## Splitting the Dataset into the Training set and Test set

In [42]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 34)
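The notebook scales the full matrix before splitting. A common alternative, shown here as a sketch on synthetic data rather than as the notebook's method, splits first, fits the scaler on the training rows only, and uses `stratify` so that the rare class appears in both splits in the same proportion. All names ending in `_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data: 20 rows of class 0, 180 rows of class 1
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = np.array([0] * 20 + [1] * 180)

# stratify keeps the 10% / 90% class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=34, stratify=y_demo
)

# Fit the scaler on training rows only, then reuse it for the test rows
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

print(np.bincount(y_tr), np.bincount(y_te))
```

Fitting the scaler on training rows only keeps test-set statistics out of the preprocessing step.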

In [43]:
# print(X_train)

In [44]:
# print(X_test)

In [45]:
# print(y_train)


In [46]:
# print(y_test)

## Building the Logistic Regression Model

### Training the Model

In [47]:
from sklearn.linear_model import LogisticRegression
# Initialize the logistic regressor, allowing up to 250 solver iterations
LR_model = LogisticRegression(max_iter=250)
LR_model.fit(X_train, y_train)

LogisticRegression(max_iter=250)

### Evaluate the Model

In [48]:
# produce predictions
y_pred = LR_model.predict(X_test)
y_pred

array([1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0,
       1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,

#### Model Accuracy

In [49]:
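from sklearn.model_selection import cross_val_score

# Note: cross_val_score clones LR_model and refits it on folds of the test split,
# so this is a 5-fold estimate on the 314 test rows, not a score for the model fitted above.
scores = cross_val_score(LR_model, X_test, y_test, scoring='accuracy', cv=5)
average_score = np.average(scores)

print('\nAverage model prediction accuracy: {:.2%}\n'.format(average_score))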


Average model prediction accuracy: 91.09%
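One possible way to separate model selection from final evaluation is to cross-validate on the training split and then score the fitted model once on the held-out test split. The sketch below does this on synthetic data from `make_classification`; the data, the pipeline, and the variable names are assumptions for illustration and will not reproduce the figure above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 10% class 0, 90% class 1)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, weights=[0.1, 0.9], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=34, stratify=y_demo
)

# The pipeline refits the scaler inside every cross-validation fold
demo_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=250))

# 5-fold cross-validation on the training split only
cv_scores = cross_val_score(demo_model, X_tr, y_tr, scoring="accuracy", cv=5)

# Final check: fit on all training rows, score once on the untouched test rows
demo_model.fit(X_tr, y_tr)
test_accuracy = accuracy_score(y_te, demo_model.predict(X_te))

print("CV accuracy: {:.2%}".format(cv_scores.mean()))
print("Test accuracy: {:.2%}".format(test_accuracy))
```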



#### Confusion Matrix

In [50]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix for Logistic Regressor")
print(cm)

Confusion matrix for Logistic Regressor
[[  4  18]
 [ 17 275]]
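The confusion matrix can be turned into a few summary numbers by direct arithmetic. In scikit-learn's layout, rows are true classes and columns are predicted classes, ordered 0 (Fail) then 1 (Pass). The sketch below copies the matrix printed above and derives test accuracy, recall and precision for the Fail class, and the accuracy of always predicting Pass.

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm_lr = np.array([[4, 18],
                  [17, 275]])

total = cm_lr.sum()                                  # 314 test parts
accuracy = np.trace(cm_lr) / total                   # (4 + 275) / 314
fail_recall = cm_lr[0, 0] / cm_lr[0].sum()           # 4 of 22 true failures caught
fail_precision = cm_lr[0, 0] / cm_lr[:, 0].sum()     # 4 of 21 predicted failures correct
always_pass_accuracy = cm_lr[1].sum() / total        # 292 / 314

print("Accuracy:             {:.2%}".format(accuracy))
print("Fail recall:          {:.2%}".format(fail_recall))
print("Fail precision:       {:.2%}".format(fail_precision))
print("Always-Pass baseline: {:.2%}".format(always_pass_accuracy))
```

On this test split the always-Pass baseline (about 93%) is higher than the model's accuracy (about 89%), which is why per-class figures are worth reading alongside accuracy.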


In [54]:
# produce class probabilities; X_test was already scaled before the split, so it is not scaled again
y_predicted_proba = LR_model.predict_proba(X_test)
# columns follow LR_model.classes_: column 0 is Fail (0), column 1 is Pass (1)
print(y_predicted_proba[0])
print("Chance of Failure is ", 100*y_predicted_proba[0, 0], "%")
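Two details matter when calling `predict_proba`: the input must be two-dimensional, even for a single sample, and the output columns follow the order of `classes_`. The sketch below illustrates both on synthetic data; the data and the names `demo_clf` and `p_fail` are assumptions, with 0 standing for Fail and 1 for Pass as in the encoding used earlier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: label 1 ("Pass") when the first feature is positive
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

demo_clf = LogisticRegression(max_iter=250).fit(X_demo, y_demo)

# Slicing with [:1] keeps the 2D shape (1, n_features); X_demo[0] would be 1D
single_sample = X_demo[:1]
proba = demo_clf.predict_proba(single_sample)

# Look up the column that corresponds to class 0 ("Fail")
fail_column = list(demo_clf.classes_).index(0)
p_fail = proba[0, fail_column]

print(demo_clf.classes_)
print("Chance of failure: {:.2%}".format(p_fail))
```

A one-dimensional sample can also be passed as `sample.reshape(1, -1)`.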

## Comparison to Another Method

We will compare to the Naive Bayes Classifier.

In [52]:
# compare with Naive Bayes Classifier
from sklearn.naive_bayes import GaussianNB
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)
NB_scores = cross_val_score(nb_clf, X_train, y_train, scoring='accuracy', cv=5)
NB_avg_score = np.average(NB_scores)

# make a prediction
y_test_pred_nb = nb_clf.predict(X_test)
cm = confusion_matrix(y_test, y_test_pred_nb)
print("Confusion matrix for Naive Bayes Classifier")
print(cm)
print('\nAverage model prediction accuracy using Naive Bayes Classifier: {:.2%}\n'.format(NB_avg_score))

Confusion matrix for Naive Bayes Classifier
[[ 19   3]
 [223  69]]

Average model prediction accuracy using Naive Bayes Classifier: 34.14%
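To compare the two classifiers on the same footing, the sketch below rebuilds label vectors from the two test-set confusion matrices printed in this notebook and computes plain accuracy and balanced accuracy (the mean of per-class recall) for each. The helper `labels_from_cm` is a hypothetical convenience function, not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def labels_from_cm(cm):
    """Rebuild true/predicted label arrays that reproduce a 2x2 confusion matrix."""
    y_true, y_pred = [], []
    for true_label in range(2):
        for pred_label in range(2):
            count = cm[true_label][pred_label]
            y_true += [true_label] * count
            y_pred += [pred_label] * count
    return np.array(y_true), np.array(y_pred)

# Test-set confusion matrices copied from the outputs in this notebook
matrices = {
    "logistic_regression": [[4, 18], [17, 275]],
    "naive_bayes": [[19, 3], [223, 69]],
}

results = {}
for name, cm in matrices.items():
    y_true, y_pred = labels_from_cm(cm)
    results[name] = {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
    }
    print(name, results[name])
```

Balanced accuracy weights the Fail and Pass classes equally, so it is less dominated by the much larger Pass class than plain accuracy is.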



## Conclusion

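The logistic regression model reaches an average accuracy of 91.09% (5-fold cross-validation on the test split), and its confusion matrix on the test set gives (4 + 275) / 314 ≈ 88.9% accuracy. Because 292 of the 314 test parts are Pass, accuracy alone is a weak yardstick here: the model identifies only 4 of the 22 failed parts.

We compared these results with the Naive Bayes classifier, which reaches an average accuracy of 34.14% (5-fold cross-validation on the training split). By accuracy, the logistic regressor is much more accurate in this application, although Naive Bayes flags 19 of the 22 failed parts at the cost of misclassifying 223 of the 292 passing parts.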