<b>CONFUSION MATRIX</b>

• The confusion matrix is one of the most widely used tools for evaluating
classification models in machine learning.
• A confusion matrix shows how a classifier has performed by setting the
examples it classified correctly against the ones it misclassified.
• In machine learning, a confusion matrix is not itself a single metric but
a table of counts from which the metrics that quantify a classifier's
performance are derived.
• The confusion matrix is used when there are two or more classes as
the output of the classifier.

<b>Actual vs predicted</b>

• A confusion matrix presents the possible outcomes of a classification
problem in a table layout, which helps visualize how the predictions
compare with the true results.
• It tabulates the predicted values of a classifier against the actual
values: each cell counts the examples of one actual class that received
one particular predicted label.
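The table described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming rows index the actual class and columns the predicted class (the orientation scikit-learn also uses); the helper `build_confusion_matrix` and the three-class toy labels are hypothetical. It also shows that the same idea works for more than two classes.

```python
from collections.abc import Hashable, Sequence
from typing import List


def build_confusion_matrix(
    actual: Sequence[Hashable],
    predicted: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> List[List[int]]:
    """Return a len(labels) x len(labels) table of counts.

    Row i counts examples whose actual class is labels[i];
    column j counts examples predicted as labels[j].
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical 3-class example: the diagonal holds the correct predictions.
actual = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]
for row in build_confusion_matrix(actual, predicted, ["cat", "dog", "bird"]):
    print(row)
```

Every off-diagonal cell is one specific kind of mistake, for example the bottom-left cell counts birds that were predicted as cats.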

[Figure: 2x2 confusion matrix]

<b>Accuracy vs confusion matrix</b>

• Confusion matrices are useful because they show the individual counts
of <b>True Positives, False Positives, True Negatives and False
Negatives.</b>
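For a binary problem the four counts can be read directly from paired label lists. A minimal sketch, assuming the positive class is encoded as 1; the function `count_outcomes` and the labels are hypothetical:

```python
from collections.abc import Sequence
from typing import Dict


def count_outcomes(
    actual: Sequence[int], predicted: Sequence[int], positive: int = 1
) -> Dict[str, int]:
    """Count TP, FP, TN and FN for a binary classification problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the example really is positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a missed positive is a false negative.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0]
print(count_outcomes(actual, predicted))
```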

• In contrast, a single summary metric such as “Accuracy” gives less
information: Accuracy is simply the <b>number of correct predictions
divided by the total number of predictions</b>, so it does not show which
kinds of errors the model makes.
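A hypothetical imbalanced test set shows what accuracy alone can hide: a model that always predicts the negative class reaches 95% accuracy yet finds none of the positives, which the TP and FN counts make obvious.

```python
# Hypothetical imbalanced test set: 95 negatives (0) and 5 positives (1).
actual = [0] * 95 + [1] * 5
# A useless "classifier" that always predicts the majority class.
predicted = [0] * 100

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Positives the model caught (TP) and positives it missed (FN).
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.95
print(f"TP = {tp}, FN = {fn}")       # TP = 0, FN = 5
```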


• Confusion matrices are used to derive important classification
metrics such as
1. Recall
2. Accuracy
3. Precision
4. F1-Score

<b>Recall</b>

• <b>Recall is the proportion of actual positive examples that a
predictive model correctly identifies as positive.</b>
• Recall is also called sensitivity, hit rate or true positive rate (TPR).

<b>Recall = TP / (TP + FN)</b>
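A direct translation of the recall formula, as a minimal sketch with illustrative counts (TP = 27, FN = 5); returning 0.0 when there are no actual positives is an assumed convention, not part of the definition.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): share of actual positives the model found."""
    if tp + fn == 0:
        return 0.0  # no actual positives; assumed convention
    return tp / (tp + fn)


print(round(recall(tp=27, fn=5), 2))  # 0.84
```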

<b>Precision</b>

• Precision is similar to recall in that it is concerned with the
model’s predictions of positive examples.
• Precision is the proportion of examples the model labeled positive
that are genuinely positive.

<b>Precision = TP / (TP + FP)</b>
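Precision differs from recall only in the denominator: it divides by all positive predictions (TP + FP) rather than by all actual positives (TP + FN). A minimal sketch with illustrative counts, using an assumed 0.0 convention when nothing was predicted positive:

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): share of positive predictions that are correct."""
    if tp + fp == 0:
        return 0.0  # nothing was predicted positive; assumed convention
    return tp / (tp + fp)


print(round(precision(tp=27, fp=4), 2))  # 0.87
```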


<b>Accuracy</b>

• Accuracy is the simplest metric: the proportion of correct predictions
over the whole dataset.
• It is the number of true positive and true negative examples divided by
the total number of true positive, false positive, true negative and
false negative examples.

<b>Accuracy = (TP + TN) / (TP + TN + FP + FN)</b>
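The accuracy formula in code, as a minimal sketch; the counts are illustrative, and raising an error for an empty dataset is a choice made here rather than part of the definition:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no examples to score")
    return (tp + tn) / total


print(round(accuracy(tp=27, tn=25, fp=4, fn=5), 2))  # 0.85
```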

<B>F1-Score</B>

• It is the harmonic mean of Recall and Precision. It is useful when you
need to take both Precision and Recall into account.

<B>F1-Score = 2 * (Precision * Recall) / (Precision + Recall)</B>
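Because F1 is a harmonic mean, it is pulled toward the smaller of its two inputs, so a model cannot score well by excelling at only one of them. A minimal sketch; the function name `f1` and the 0.0 convention when both inputs are zero are assumptions:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # both inputs are zero; assumed convention
    return 2 * precision * recall / (precision + recall)


# Illustrative precision and recall values.
p, r = 27 / 31, 27 / 32
print(round(f1(p, r), 2))  # 0.86
```

For instance, `f1(1.0, 0.0)` is 0.0, whereas the arithmetic mean of the same two values would be 0.5.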

<B>EXAMPLE</B>:
We'll build a logistic regression model using a heart attack
dataset to predict whether a patient is at risk of a heart attack.

First, we import all the libraries needed to create the model, and
then read the dataset using pandas.

In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
df = pd.read_csv("heart.csv")
df.head()

   age  sex  cp  trtbps  chol  fbs  restecg  thalachh  exng  oldpeak  slp  caa  thall  output
0   63    1   3     145   233    1        0       150     0      2.3    0    0      1       1
1   37    1   2     130   250    0        1       187     0      3.5    0    0      2       1
2   41    0   1     130   204    0        0       172     0      1.4    2    0      2       1
3   56    1   1     120   236    0        1       178     0      0.8    2    0      2       1
4   57    0   0     120   354    0        1       163     1      0.6    2    0      2       1


• Let’s split our dataset into the input features (x) and the target
output (y).

In [10]:
# The target column to predict
y = df["output"]
# All remaining columns are the input features
x = df.drop("output", axis=1)

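Next, we scale the data with StandardScaler, which rescales each feature
to zero mean and unit variance. This puts features measured in very
different units (for example age and chol) on a comparable scale, which
typically helps logistic regression train reliably.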

In [11]:
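from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Standardize every feature column and keep the original column names.
# Note: this fits on all rows, including those that later form the test set.
x = pd.DataFrame(scaler.fit_transform(x), columns=x.columns)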
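To see what StandardScaler does, the transform can be reproduced by hand on a small hypothetical array: subtract each column's mean and divide by its standard deviation. StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default for `std`, so the two results agree.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (hypothetical values): 3 rows, 2 features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

scaled = StandardScaler().fit_transform(X)

# Same transform by hand: per-column mean and population standard deviation.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

After scaling, both columns have mean 0 and standard deviation 1, even though the second column originally ranged in the hundreds.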

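• Now, let's split our dataset into two parts: one to train our model and
another to test it. To do this, we use train_test_split, imported earlier
from sklearn.model_selection. Here test_size = 0.2 holds out 20% of the
rows for testing, and random_state = 42 makes the split reproducible.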

In [13]:
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

Using a logistic regression model, we fit a classifier on the training
data and then predict the labels of the test data to check how well it
performs.

In [14]:
model = LogisticRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)
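In the steps above, the scaler is fitted on the full dataset before the train/test split, so statistics from the test rows influence how the training data is scaled. A common way to avoid this leakage is a scikit-learn pipeline, which fits the scaler on the training split only. The sketch below uses synthetic data from `make_classification` as a stand-in for heart.csv; the size (303 rows, 13 features) is an assumption chosen so that a 20% split gives 61 test rows, matching the support total in the classification report. Its predictions will differ from the tutorial's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart dataset (assumed shape: 303 x 13).
X, y = make_classification(n_samples=303, n_features=13, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on X_train only, then reuses that
# mean/std when transforming X_test at prediction time.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(len(pred))  # 61
```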

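Using the predicted values (pred) and our actual values (y_test), we can
create a confusion matrix with the confusion_matrix function. In its
output, rows correspond to actual classes and columns to predicted
classes, both in sorted label order (here 0, then 1).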

In [15]:
confusion_matrix(y_test, pred)

array([[25,  4],
       [ 5, 27]], dtype=int64)
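The array returned by confusion_matrix carries no row or column labels. One way to get a labeled view is `pandas.crosstab`, sketched here with hypothetical labels in place of y_test and pred; its counts match confusion_matrix when both classes appear in the data.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical labels standing in for y_test and pred.
y_true = pd.Series([0, 0, 0, 1, 1, 1, 1], name="Actual")
y_pred = pd.Series([0, 1, 0, 1, 1, 0, 1], name="Predicted")

# Rows are actual classes, columns are predicted classes.
table = pd.crosstab(y_true, y_pred)
print(table)

# Same counts as the unlabeled array from confusion_matrix.
print((table.to_numpy() == confusion_matrix(y_true, y_pred)).all())  # True
```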

<b>• Then, calling ravel() on the array returned by confusion_matrix
flattens it row by row, giving the True Negative, False Positive, False
Negative and True Positive counts, in that order.</b>

In [17]:
# For labels [0, 1] the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
(tn, fp, fn, tp)

(25, 4, 5, 27)
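The unpacking order is easy to get wrong, so a tiny hypothetical example helps confirm where each count lands. For binary labels 0 and 1, the matrix is [[TN, FP], [FN, TP]], and ravel() reads it row by row:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels with 1 as the positive class.
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1]

# Two correct negatives, one false alarm, no misses, two hits.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 0 2
```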

• Finally, classification_report computes precision, recall, F1-score and
support (the number of actual examples) for each class, along with the
overall accuracy and the macro and weighted averages.

15.03.2022 Shadi.Saleh 17


In [24]:
report = classification_report(y_test, pred)
print("Classification report : \n", report)

Classification report : 
               precision    recall  f1-score   support

           0       0.83      0.86      0.85        29
           1       0.87      0.84      0.86        32

    accuracy                           0.85        61
   macro avg       0.85      0.85      0.85        61
weighted avg       0.85      0.85      0.85        61
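To connect the report back to the formulas, the per-class values can be recomputed from the matrix [[25, 4], [5, 27]] obtained above. In this sketch each class is treated in turn as the positive class: precision divides the diagonal by the column totals (predicted counts) and recall divides it by the row totals (actual counts). The macro average is the unweighted mean over the two classes.

```python
import numpy as np

# Confusion matrix from the example: rows = actual, columns = predicted,
# class order [0, 1].
cm = np.array([[25, 4],
               [5, 27]])

correct = np.diag(cm)                 # correct predictions per class
precision = correct / cm.sum(axis=0)  # divide by predicted totals (columns)
recall = correct / cm.sum(axis=1)     # divide by actual totals (rows)
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)              # actual examples per class
accuracy = correct.sum() / cm.sum()

for label in (0, 1):
    print(f"{label}: precision={precision[label]:.2f} recall={recall[label]:.2f} "
          f"f1={f1[label]:.2f} support={support[label]}")
print(f"accuracy={accuracy:.2f}")
print(f"macro avg: precision={precision.mean():.2f} recall={recall.mean():.2f} "
      f"f1={f1.mean():.2f}")
```

Rounded to two decimals, these values reproduce the report: 0.83, 0.86 and 0.85 for class 0, 0.87, 0.84 and 0.86 for class 1, and 0.85 for accuracy and each macro average.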

