# Evaluating Classifier Performance

How can we evaluate how well our classifier makes predictions? The simplest way is to determine what fraction of the predictions on our test dataset are correct. Recall that we use our training dataset to fit our model, and then evaluate performance by making predictions from the features of our test data and comparing those predictions to the corresponding target values.

Let's use an example to walk through this. Imagine that our test dataset consists of the 10 observations below:

| $x_1$<br>Sepal width| $x_2$<br>Petal width | $y$<br>Iris Species |
| ----- | ------ | ---- |
| 3.4 | 0.2	| setosa     |
| 2.9 | 1.3	| versicolor |
| 3.2 | 2.3	| virginica  |
| 3   | 2.2	| virginica  |
| 3	  | 2.3	| virginica  |
| 3.8 | 2	| virginica  |
| 3   | 0.3	| setosa     |
| 3   | 1.5	| versicolor |
| 2.9 | 1.3	| versicolor |
| 3.1 | 0.2	| setosa     |

Now let's say we use the features of these data to make a *prediction* of the target class label for each of these 10 observations. In other words, we use our classification algorithm, $f$, to make a prediction of $y$, which we refer to as $\hat{y}$. In this case, $\hat{y}$ is based on our two features, sepal width ($x_1$) and petal width ($x_2$), such that $\hat{y} = f(x_1, x_2)$.

Let's add a column to this table with each of the predictions from our classification algorithm:

| $x_1$<br>Sepal width| $x_2$<br>Petal width | $y$<br>Iris Species | $\hat{y}$<br>Iris Species **prediction**|
| ----- | ------ | ---- | ---- |
| 3.4 | 0.2	| setosa     | setosa     |
| 2.9 | 1.3	| versicolor | versicolor |
| 3.2 | 2.3	| virginica  | virginica  |
| 3   | 2.2	| **virginica**  | **setosa**     |
| 3	  | 2.3	| **virginica**  | **setosa**     |
| 3.8 | 2	| virginica  | virginica  |
| 3   | 0.3	| setosa     | setosa     |
| 3   | 1.5	| versicolor | versicolor |
| 2.9 | 1.3	| **versicolor** | **virginica**  |
| 3.1 | 0.2	| setosa     | setosa     |

While our predictions ($\hat{y}$) generally match the true target values of the test data ($y$), there are three discrepancies, shown in bold, where the prediction did not match the target value. That means that 7 out of the 10 predictions were correct; in other words, the classification algorithm was 70% accurate on the test dataset. Overall accuracy is defined as the fraction of all observations that are correctly classified. We generally seek to create predictive algorithms that are as accurate as possible.
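The definition of accuracy can also be written compactly. The notation here is an assumption introduced only for this sketch: $n$ is the number of test observations, $y_i$ and $\hat{y}_i$ are the true label and the prediction for the $i$-th observation, and $\mathbb{1}(\cdot)$ is an indicator that equals 1 when its condition holds and 0 otherwise.

$$
\text{accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(\hat{y}_i = y_i)
$$

For the table above, $n = 10$ and the indicator equals 1 for 7 of the rows, which gives $7/10 = 0.7$.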

Now it's your turn: write a function that computes the accuracy of the predictions, taking the target variable values ($y$) and the predictions ($\hat{y}$) as inputs.

In [None]:
y = ["setosa", "versicolor", "virginica", "virginica", "virginica", "virginica", "setosa", "versicolor", "versicolor", "setosa"]
y_hat = ["setosa", "versicolor", "virginica", "setosa", "setosa", "virginica", "setosa", "versicolor", "virginica", "setosa"]

# Compute classification accuracy
def accuracy(y, y_hat):
    # Your code here
    pass  # placeholder so the cell runs; replace with your implementation

It's always good to test your code, so make sure that the result produced is 0.7. Try this before moving on to view the solution.

## Solution

In [3]:
import numpy as np

y = np.array(["setosa", "versicolor", "virginica", "virginica", "virginica", "virginica", "setosa", "versicolor", "versicolor", "setosa"])
y_hat = np.array(["setosa", "versicolor", "virginica", "setosa", "setosa", "virginica", "setosa", "versicolor", "virginica", "setosa"])

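# Overall classification accuracy: the fraction of predictions that match the true labels.
# y and y_hat may be lists or numpy arrays of equal length.
def accuracy(y, y_hat):
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    # The elementwise comparison gives a boolean array; its mean is the fraction of matches
    return np.mean(y == y_hat)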

accuracy(y,y_hat)

0.7

We will use this function to evaluate our classification algorithm going forward.
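A single accuracy number does not say which observations were misclassified. As a minimal sketch in plain Python (no NumPy), the helpers below recompute the accuracy directly from lists and report the rows where the prediction and the true label disagree. The names `accuracy_from_lists` and `misclassified` are hypothetical, introduced only for this illustration.

```python
def accuracy_from_lists(y, y_hat):
    """Fraction of positions where the prediction equals the true label."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    n_correct = sum(true == pred for true, pred in zip(y, y_hat))
    return n_correct / len(y)


def misclassified(y, y_hat):
    """Return (index, true label, predicted label) for every wrong prediction."""
    return [
        (i, true, pred)
        for i, (true, pred) in enumerate(zip(y, y_hat))
        if true != pred
    ]


# Same test labels and predictions as in the tables above
y = ["setosa", "versicolor", "virginica", "virginica", "virginica",
     "virginica", "setosa", "versicolor", "versicolor", "setosa"]
y_hat = ["setosa", "versicolor", "virginica", "setosa", "setosa",
         "virginica", "setosa", "versicolor", "virginica", "setosa"]

print(accuracy_from_lists(y, y_hat))  # 0.7

for i, true, pred in misclassified(y, y_hat):
    print(f"row {i}: true = {true}, predicted = {pred}")
```

The three rows reported (zero-based indices 3, 4, and 8) are the same three discrepancies shown in bold in the table.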