## Assessing Model Performance for Classification Models
### Creating a Classification Model for Computing Evaluation Metrics

In [1]:
#2 import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [5]:
#3 create headers since data doesn't have any
_headers = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'car']
df = pd.read_csv('car-data.csv', names=_headers, index_col=None)

In [6]:
df.head()

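| | buying | maint | doors | persons | lug_boot | safety | car |
|---|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low | unacc |
| 1 | vhigh | vhigh | 2 | 2 | small | med | unacc |
| 2 | vhigh | vhigh | 2 | 2 | small | high | unacc |
| 3 | vhigh | vhigh | 2 | 2 | med | low | unacc |
| 4 | vhigh | vhigh | 2 | 2 | med | med | unacc |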


In [7]:
#4 encode categorical variables
_df = pd.get_dummies(df, columns=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety'])
_df.head()

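| | car | buying_high | buying_low | buying_med | buying_vhigh | maint_high | maint_low | maint_med | maint_vhigh | doors_2 | ... | doors_5more | persons_2 | persons_4 | persons_more | lug_boot_big | lug_boot_med | lug_boot_small | safety_high | safety_low | safety_med |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | unacc | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | unacc | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | unacc | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 3 | unacc | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 4 | unacc | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |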


In this step, you convert categorical columns into numeric columns using a technique called one-hot encoding. You need to do this because the inputs to your model must be numeric. You get numeric variables from categorical variables using get_dummies from the pandas library. You provide your DataFrame as input and specify the columns to be encoded. The label column, car, is left out of the list, so it keeps its original string values.
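
To make the effect of one-hot encoding concrete, here is a minimal sketch on a hypothetical four-row DataFrame (the toy data is an assumption, not part of the car dataset). The `dtype=int` argument is added so the result shows 0/1 values; newer pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical toy data; column names mirror two columns of the car dataset.
toy = pd.DataFrame({
    "safety": ["low", "med", "high", "low"],
    "car": ["unacc", "unacc", "acc", "unacc"],
})

# Encode only 'safety'; the label column 'car' is left untouched.
# dtype=int asks for 0/1 integers instead of booleans.
encoded = pd.get_dummies(toy, columns=["safety"], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each original value becomes its own column, and every row has exactly one 1 among the new safety_* columns.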

In [8]:
#5 split data into training and evaluation datasets
features = _df.drop('car', axis=1).values
labels = _df['car'].values
X_train, X_eval, y_train, y_eval = train_test_split(features, labels, 
                                                    test_size=0.3,
                                                    random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_eval, y_eval, 
                                                test_size=0.5,
                                                random_state=0)


In this step, you begin by extracting your feature columns and your labels into two NumPy arrays called features and labels. You then place 70% of the rows into X_train and y_train, with the remaining 30% going into X_eval and y_eval. Finally, you split X_eval and y_eval into two equal parts and assign those to X_val and y_val for validation, and X_test and y_test for final testing later. The result is roughly a 70/15/15 split.
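
The two-stage split can be checked on synthetic data. The sketch below assumes a made-up array of 1,000 rows (not the car dataset) and applies the same two calls to confirm the resulting sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1,000 rows, 2 features, 4 label values.
X = np.arange(2000).reshape(1000, 2)
y = np.arange(1000) % 4

# First split: 70% training, 30% held out for evaluation.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Second split: divide the held-out 30% equally into validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_eval, y_eval, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))
```

Fixing random_state makes both splits reproducible, so the validation and test sets stay the same across runs.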

In [9]:
#6 train a Logistic Regression Model
model = LogisticRegression()
model.fit(X_train, y_train)

LogisticRegression()

In [10]:
#7 make predictions for the validation set
y_pred = model.predict(X_val)
y_pred[0:9]

array(['unacc', 'acc', 'unacc', 'unacc', 'acc', 'acc', 'vgood', 'unacc',
       'unacc'], dtype=object)

This model works because you are able to use it to make predictions. The predictions assign each car to one of the dataset's four classes, such as unacceptable (unacc), acceptable (acc), or very good (vgood), based on the features of the car. At this point, you are ready to apply various assessments to the model.

## The Confusion Matrix

### Exercise 6.06: Generating a Confusion Matrix for the Classification Model

In [11]:
#2 import confusion_matrix
from sklearn.metrics import confusion_matrix

In [12]:
#3 generate confusion_matrix
confusion_matrix(y_val, y_pred)

array([[ 41,   1,   9,   0],
       [  7,   2,   0,   1],
       [  7,   0, 178,   0],
       [  1,   0,   0,  12]])

The matrix is 4 × 4 because the data has four classes. scikit-learn sorts the labels alphabetically (acc, good, unacc, vgood) and puts the actual classes on the rows and the predicted classes on the columns. The first column therefore holds every item that was predicted to be in the first class. The first row of that column shows the number of predictions that were correctly placed in the first class. In this example, that number is 41. The second row shows the number of items that were placed in the first class but should have been in the second class. In this example, that number is 7. In the third row, you see the number of items that were predicted to be in the first class but should have been in the third class. That number is 7. Finally, in the fourth row, you see the number of items that were wrongly classified into the first class when they should have been in the fourth class. In this case, the number is 1.
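
To check which axis holds the actual labels, the following sketch runs confusion_matrix on six hypothetical labels (the toy lists are assumptions, chosen only so the counts are easy to verify by hand).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truths and predictions.
y_true = ["acc", "acc", "acc", "unacc", "unacc", "good"]
y_hat = ["acc", "acc", "unacc", "unacc", "acc", "good"]

# Passing labels explicitly fixes the row/column order.
labels = ["acc", "good", "unacc"]
cm = confusion_matrix(y_true, y_hat, labels=labels)

# cm[i, j] counts items whose actual class is labels[i]
# and whose predicted class is labels[j].
print(cm)
```

Three items are actually acc, and the first row sums to 3, which confirms that rows correspond to actual classes and columns to predicted classes.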

## Precision
The precision is the number of cases that were correctly classified as positive (called true positives and abbreviated as TP) divided by the total number of cases predicted as positive, both correctly (TP) and wrongly (false positives, FP). In the confusion matrix returned by scikit-learn, where rows are actual classes and columns are predicted classes, that total is the sum of the entries in the class's column.
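
As a worked check, per-class precision can be computed by hand from the confusion matrix printed in Exercise 6.06. This is a sketch that hard-codes that matrix and assumes scikit-learn's layout (rows are actual classes, columns are predicted classes).

```python
import numpy as np

# Confusion matrix copied from the Exercise 6.06 output.
cm = np.array([[41, 1, 9, 0],
               [7, 2, 0, 1],
               [7, 0, 178, 0],
               [1, 0, 0, 12]])

true_positives = np.diag(cm)        # correct predictions per class
predicted_totals = cm.sum(axis=0)   # column sums: TP + FP per class

per_class_precision = true_positives / predicted_totals
macro_precision = per_class_precision.mean()

print(per_class_precision)
print(macro_precision)
```

The unweighted mean of the four per-class values is the macro-averaged precision.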

### Exercise 6.07: Computing Precision for the Classification Model

In [13]:
#1 import precision_score
from sklearn.metrics import precision_score

In [14]:
#2 compute precision, macro-averaged across the classes
precision_score(y_val, y_pred, average='macro')

0.8184395261601145

## Recall

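Recall is the number of cases correctly classified as positive (TP) divided by the total number of cases that actually belong to the class, that is, those correctly identified (TP) plus those missed (false negatives, FN). With scikit-learn's confusion matrix, think of it as the true positive divided by the sum of the entries in the class's row. The equation is given as follows:

$$\text{Recall} = \frac{TP}{TP + FN}$$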
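
The same calculation can be done by hand. This sketch hard-codes the confusion matrix from Exercise 6.06 and assumes scikit-learn's layout, in which each row holds the items that actually belong to one class.

```python
import numpy as np

# Confusion matrix copied from the Exercise 6.06 output.
cm = np.array([[41, 1, 9, 0],
               [7, 2, 0, 1],
               [7, 0, 178, 0],
               [1, 0, 0, 12]])

true_positives = np.diag(cm)     # correct predictions per class
actual_totals = cm.sum(axis=1)   # row sums: TP + FN per class

per_class_recall = true_positives / actual_totals
macro_recall = per_class_recall.mean()

print(per_class_recall)
print(macro_recall)
```

The second class has a recall of only 0.2 (2 of 10 items found), which pulls the macro average down even though that class is small.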

### Exercise 6.08: Computing Recall for the Classification Model

In [15]:
#2 import recall_score
from sklearn.metrics import recall_score

In [16]:
recall_score(y_val, y_pred, average='macro')

0.7222901634666341

## F1 Score

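The F1 score is another important metric that helps us to evaluate the model performance. It is the harmonic mean of precision and recall, so it is high only when both are high. It is computed using the following equation:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$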
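
A hand calculation shows how precision and recall combine per class. The sketch below hard-codes the confusion matrix from Exercise 6.06 and assumes scikit-learn's layout (rows actual, columns predicted). Note that the macro F1 is the mean of the per-class F1 scores, not the F1 of the macro precision and macro recall.

```python
import numpy as np

# Confusion matrix copied from the Exercise 6.06 output.
cm = np.array([[41, 1, 9, 0],
               [7, 2, 0, 1],
               [7, 0, 178, 0],
               [1, 0, 0, 12]])

tp = np.diag(cm)
precision = tp / cm.sum(axis=0)  # TP / column sum
recall = tp / cm.sum(axis=1)     # TP / row sum

# Harmonic mean of precision and recall, computed per class.
per_class_f1 = 2 * precision * recall / (precision + recall)
macro_f1 = per_class_f1.mean()

print(per_class_f1)
print(macro_f1)
```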

### Exercise 6.09: Computing the F1 Score for the Classification Model

In [17]:
#2 import f1_score
from sklearn.metrics import f1_score

In [18]:
f1_score(y_val, y_pred, average='macro')

0.7385284045669938

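In this step, you compute the F1 score by passing in y_val and y_pred. You also specify average='macro' because this is not binary classification: the default, average='binary', only works with two classes, whereas 'macro' computes the score for each class and takes the unweighted mean.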
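
To see what the average argument changes, this sketch compares three settings on six hypothetical labels (the toy lists are assumptions). With average=None, f1_score returns one score per class; 'macro' is their plain mean, and 'weighted' weights each class by its number of actual items.

```python
from sklearn.metrics import f1_score

# Hypothetical ground truths and predictions.
y_true = ["acc", "acc", "unacc", "unacc", "unacc", "good"]
y_hat = ["acc", "unacc", "unacc", "unacc", "unacc", "good"]
labels = ["acc", "good", "unacc"]

per_class = f1_score(y_true, y_hat, average=None, labels=labels)
macro = f1_score(y_true, y_hat, average="macro")
weighted = f1_score(y_true, y_hat, average="weighted")

print(dict(zip(labels, per_class)))
print(macro, weighted)
```

Macro averaging treats every class equally, so a poorly predicted rare class lowers the score as much as a poorly predicted common one.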

## Accuracy

### Exercise 6.10: Computing Model Accuracy for the Classification Model

In [19]:
#2 import accuracy_score
from sklearn.metrics import accuracy_score

In [20]:
accuracy_score(y_val, y_pred)

0.8996138996138996

## Logarithmic Loss
The logarithmic loss (or log loss) is a loss function for classification models. It is also called categorical cross-entropy. It penalizes incorrect predictions, and the penalty grows as the model becomes more confident in the wrong class. The sklearn documentation defines it as "the negative log-likelihood of the true values given your model predictions."

### Exercise 6.11: Computing the Log Loss for the Classification Model

In [21]:
from sklearn.metrics import log_loss

In [22]:
_loss = log_loss(y_val, model.predict_proba(X_val))
print(_loss)

0.22578836758534435


In this step, you compute the log loss and store it in a variable called _loss. There is an important difference from the earlier metrics, which compared y_val, the ground truths, with y_pred, the predicted labels.

Here, you do not make use of the predicted labels. Instead, you make use of predicted probabilities, which you get by calling model.predict_proba() on the validation dataset. It returns one probability per class for each row.
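
The role of the probabilities becomes clearer when the loss is computed by hand. The sketch below uses four hypothetical rows of class probabilities (assumed values, not output from the model above): the log loss is the average of the negative log of the probability assigned to each row's true class.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical class order and ground truths.
labels = ["acc", "good", "unacc"]
y_true = ["acc", "unacc", "good", "unacc"]

# Hypothetical predicted probabilities; columns follow the order of `labels`.
proba = np.array([
    [0.7, 0.1, 0.2],
    [0.2, 0.1, 0.7],
    [0.3, 0.5, 0.2],
    [0.1, 0.1, 0.8],
])

# Pick out the probability assigned to the true class of each row.
true_idx = [labels.index(t) for t in y_true]
p_true = proba[np.arange(len(y_true)), true_idx]

manual_loss = -np.mean(np.log(p_true))
sklearn_loss = log_loss(y_true, proba, labels=labels)

print(manual_loss, sklearn_loss)
```

Because only the probability of the true class enters the sum, a confident wrong prediction (a true-class probability near 0) contributes a very large term.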

## Receiver Operating Characteristic Curve
The Receiver Operating Characteristic (ROC) curve is a plot that shows how the true positive and false positive rates vary for a model as the threshold is changed.
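
As a minimal sketch of how the curve's points are obtained, the example below uses four hypothetical binary labels and scores (assumed values). The car model is multiclass, so applying this to it would require a one-vs-rest treatment, which is not shown here.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical binary ground truths and predicted scores for the positive class.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Each threshold yields one (false positive rate, true positive rate) point.
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f}  TPR={t:.2f}")
print("AUC:", auc)
```

Plotting tpr against fpr gives the ROC curve, and the area under it (AUC) summarizes the curve in a single number.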