### Codio Activity 12.4: Accuracy, Precision, and Recall

**Expected Time: 60 Minutes**

**Total Points: 55**

This activity focuses on differentiating between three classification metrics -- accuracy, precision, and recall.  Which metric matters most depends on the situation, because different kinds of errors carry different costs.  In this assignment, you will use the scikit-learn metric functions to evaluate and compare model performance.  In the next assignment, you will use confusion matrices to build visual intuition for these ideas.
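
As a quick illustration of how the three metrics differ, here is a minimal sketch on a handful of made-up labels (the lists `y_true` and `y_pred` are hypothetical and unrelated to the dataset used in this activity).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy labels (made up for illustration): 1 is the positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Counts for these lists: TP = 2, FN = 2, FP = 1, TN = 5
acc = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 7 / 10
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 3
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 2 / 4

print(f"accuracy={acc:.2f}, precision={prec:.2f}, recall={rec:.2f}")
```

The same predictions score differently on each metric, which is why the choice of metric should follow from which error is more costly.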

#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)
- [Problem 6](#Problem-6)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, recall_score, precision_score
from sklearn.datasets import load_breast_cancer
from sklearn import set_config

set_config(display="diagram")

### The Data

Your dataset for this problem is a built-in dataset from scikit-learn containing measurements derived from images of breast cancer tumors, along with a label of malignant or benign.  There are 30 features plus the target.  The data is loaded and split below.

In the raw data, target = 0 means the tumor is malignant and target = 1 means the tumor is benign.

In [None]:
cancer = load_breast_cancer(as_frame=True)

In [None]:
df = cancer.frame

In [None]:
df.head()

In [None]:
df['target'] = np.where(df['target'] == 0, 'malignant', 'benign')

In [None]:
sns.countplot(data=df, x = 'target')
plt.title('Count of target observations');

In [None]:
X_train, X_test, y_train, y_test = train_test_split(df.drop('target', axis = 1), df.target,
                                                    random_state = 42,
                                                    stratify = df.target)

[Back to top](#-Index)

### Problem 1

#### Setting a Baseline

**5 Points**

It is always important to get in the habit of checking the baseline score for a classification model.  Here, when splitting the data the `stratify` argument was used so that both the train and test set would have a similar proportion of classes.  This can be seen below.  Using this data, what is a baseline score for the model that predicts the majority class for all data points?  Enter your answer as a string to `baseline` below.

```
a) 37% accuracy
b) 63% accuracy
c) 50% accuracy
d) 100% accuracy
```
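
The idea of a majority-class baseline can be sketched on toy data. The names `y_toy` and `X_toy` are hypothetical, and `DummyClassifier` is one possible way to build such a baseline, not necessarily the approach this activity intends.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier

# Toy labels (made up): 7 of class 'a', 3 of class 'b'.
y_toy = pd.Series(['a'] * 7 + ['b'] * 3)
X_toy = pd.DataFrame({'feature': range(10)})  # the dummy model ignores features

# The largest class proportion equals the majority-class baseline accuracy.
majority_share = y_toy.value_counts(normalize=True).max()

# strategy='most_frequent' always predicts the most common training label.
dummy = DummyClassifier(strategy='most_frequent').fit(X_toy, y_toy)
dummy_acc = dummy.score(X_toy, y_toy)

print(majority_share, dummy_acc)
```

Both numbers agree: a model that always predicts the majority class is correct exactly as often as that class appears.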

In [None]:
y_test.value_counts(normalize = True)


In [None]:
y_train.value_counts(normalize = True)

In [None]:
### GRADED

baseline = ''

# YOUR CODE HERE
raise NotImplementedError()

# Answer check
print(baseline)

[Back to top](#-Index)

### Problem 2

#### Pipeline for scaling and KNN

**10 Points**

To begin, create a pipeline `knn_pipe` with named steps `scale` and `knn` that uses the `StandardScaler` followed by the `KNeighborsClassifier` with `n_neighbors = 10`. Use the `fit` function on `knn_pipe` to train the pipeline on `X_train` and `y_train`.
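
For reference, the general shape of a named-step pipeline looks like the sketch below. It uses synthetic data and a different final estimator (`LogisticRegression`), and the step names `scaler` and `clf` are arbitrary choices for illustration, so it is not the pipeline this problem asks for.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Each step is a (name, estimator) tuple; the names here are arbitrary.
demo_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression()),
])

# fit scales the data, then trains the final estimator on the scaled values.
demo_pipe.fit(X_demo, y_demo)

# Steps can be looked up by name after fitting.
step_names = [name for name, _ in demo_pipe.steps]
demo_acc = demo_pipe.score(X_demo, y_demo)
print(step_names, demo_acc)
```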

In [None]:
### GRADED

knn_pipe = ''

# YOUR CODE HERE
raise NotImplementedError()

# Answer check
knn_pipe

[Back to top](#-Index)

### Problem 3

#### Evaluating your classifier

**10 Points**

Three scoring functions have been imported from scikit-learn; each compares predictions to actual values.  Decide which of `precision_score`, `recall_score`, and `accuracy_score` rewards fewer false positives (where a higher score means FEWER false positives).

To compute it, use the `precision_score` function with arguments `y_test` and `knn_pipe.predict(X_test)` and with `pos_label` equal to `'malignant'`. Assign your result to `min_fp`.
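
The effect of `pos_label` can be seen on a small made-up example. The label lists below are hypothetical; the point is only that the same predictions yield a different precision depending on which class is treated as positive.

```python
from sklearn.metrics import precision_score

# Toy string labels (made up for illustration).
M, B = 'malignant', 'benign'
y_true = [M, M, M, B, B, B, B, B]
y_pred = [M, M, B, M, B, B, B, B]

# 'malignant' as positive: 3 predicted malignant, 2 correct -> 2 / 3.
prec_malignant = precision_score(y_true, y_pred, pos_label='malignant')

# 'benign' as positive: 5 predicted benign, 4 correct -> 4 / 5.
prec_benign = precision_score(y_true, y_pred, pos_label='benign')

print(f"{prec_malignant:.2f} {prec_benign:.2f}")
```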


In [None]:
### GRADED

min_fp = ''

# YOUR CODE HERE
raise NotImplementedError()

# Answer check
print(min_fp)

[Back to top](#-Index)

### Problem 4

#### Right kind of mistakes

**10 Points**

In this situation, which mistake is more detrimental to the patient if we use our algorithm to classify tumors as malignant or benign?  Would you rather avoid false positives or false negatives?  Which metric does this mean we should use here?  Enter your answer as a string to `best_metric` below -- `precision`, `recall`, or `accuracy`.

In [None]:
### GRADED

best_metric = ''

# YOUR CODE HERE
raise NotImplementedError()

# Answer check
print(best_metric)

[Back to top](#-Index)

### Problem 5

#### Improving a model based on specific metric

**10 Points**

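Previously, when using `GridSearchCV`, the best model was selected using the default scoring method of the estimator.  You can change this behavior by passing an appropriate metric to the `scoring` argument.  Because the `recall` scorer treats the label `1` as the positive class by default, the string labels are first mapped to numbers using `target_map`, defined below.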

- Use the `map` function on `y_train` with argument equal to `target_map`. Assign your result to `y_train_numeric`.
- Use the `map` function on `y_test` with argument equal to `target_map`. Assign your result to `y_test_numeric`.
- Use the `GridSearchCV` function to implement a grid search on `knn_pipe` for odd numbers of neighbors from 1 to 21 where `recall` is the scoring metric used. Assign the result to `recall_grid`.
- Use the `fit` function on `recall_grid` to train your model using `X_train` and `y_train_numeric`.
- Use the `score` function on `recall_grid` to evaluate the best model on `X_test` and `y_test_numeric`. Assign your result to `best_score`.
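
The effect of the `scoring` argument can be sketched on synthetic data. The example below deliberately uses a different estimator (`DecisionTreeClassifier`), a different metric (`precision`), and hypothetical names such as `demo_grid`, so it illustrates the mechanism rather than the solution to this problem.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with string labels, used only for illustration.
X_demo, y_int = make_classification(n_samples=300, n_features=6, random_state=0)
y_str = pd.Series(np.where(y_int == 1, 'yes', 'no'))

# map with a dict converts string labels to 0/1; 1 marks the positive class.
y_num = y_str.map({'yes': 1, 'no': 0})

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_num, random_state=0, stratify=y_num)

# scoring='precision' makes the search rank candidates by precision.
demo_grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={'max_depth': [1, 2, 3]},
    scoring='precision',
)
demo_grid.fit(Xtr, ytr)

# score() on the fitted search now returns precision rather than accuracy.
grid_score = demo_grid.score(Xte, yte)
manual = precision_score(yte, demo_grid.predict(Xte))
print(grid_score, manual)
```

The two printed values match, which shows that `score` on a fitted grid search uses whatever metric was passed to `scoring`.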

In [None]:
target_map = {'malignant': 1, 'benign': 0}

In [None]:
### GRADED

y_train_numeric = ''
y_test_numeric = ''
recall_grid = ''
best_score = ''


# YOUR CODE HERE
raise NotImplementedError()

# Answer check
print(f'The best recall score is: {best_score: .2f}')

[Back to top](#-Index)

### Problem 6

#### Verifying the score

**10 Points**

Use your `recall_grid` to make predictions on the test data and assign them to `recall_preds`.  Use these predictions to count the number of false negatives and true positives, and assign these as integers to `fn` and `tp` respectively below.  The recall computed by hand from these counts should match `best_score`, confirming that the grid search scoring method was changed to recall.
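
Counting by hand can be sketched with boolean masks on made-up 0/1 arrays. The arrays and the `_demo` names below are hypothetical and only show one possible way to count.

```python
import numpy as np

# Toy 0/1 labels (made up); 1 is the positive class.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])

# True positive: actual 1 and predicted 1.
tp_demo = int(np.sum((y_true == 1) & (y_pred == 1)))
# False negative: actual 1 but predicted 0.
fn_demo = int(np.sum((y_true == 1) & (y_pred == 0)))

# Recall is the share of actual positives that were found.
recall_demo = tp_demo / (tp_demo + fn_demo)
print(tp_demo, fn_demo, recall_demo)
```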

In [None]:
### GRADED
recall_preds = ''
fn = ''
tp = ''


# YOUR CODE HERE
raise NotImplementedError()

### ANSWER CHECK
print(f'Recall by hand is: {tp/(tp + fn): .2f}')

In other situations, a different metric may make sense.  Here, a specific kind of error -- labeling a malignant tumor as benign -- is something we certainly want to avoid.  In the next activity, you will continue to consider these issues, using confusion matrices to unpack the errors and to see how changing parameters of the estimator affects them.