# Metrics

Let's experiment with classification metrics. In the following exercises, you are given confusion matrices and need to decide which model to use.

We'll start by defining a class for you to use, as it's easier to work with than a dict in this case.

In [None]:
# A small class holding the four counts of a binary confusion matrix
class Confusion_matrix():

    def __init__(self, tp, tn, fp, fn):
        self.tp = tp
        self.tn = tn
        self.fp = fp
        self.fn = fn

    def __str__(self):
        return f"Confusion matrix: \n" \
                f"TP: {self.tp}\t| FP: {self.fp} \n" \
                f"FN: {self.fn}\t| TN: {self.tn}"

cats = Confusion_matrix(107, 42, 23, 69)
print(cats)
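If you start from raw predictions rather than ready-made counts, the four cells can be tallied by comparing each true label with its prediction. A minimal sketch, assuming binary labels encoded as 1 (positive) and 0 (negative); the function name `count_confusion` and the example labels are hypothetical.

```python
def count_confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = positive and 0 = negative.

    The tuple follows the argument order of Confusion_matrix(tp, tn, fp, fn).
    Assumes every label is either 0 or 1.
    """
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # correctly flagged positive
        elif predicted == 0 and actual == 0:
            tn += 1  # correctly cleared negative
        elif predicted == 1 and actual == 0:
            fp += 1  # flagged, but actually negative
        else:
            fn += 1  # missed positive
    return tp, tn, fp, fn


# Hypothetical labels, only to illustrate the counting
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
print(count_confusion(y_true, y_pred))  # (2, 3, 1, 2)
```

Because the tuple uses the same order as the class above, it can be unpacked straight into it, e.g. `Confusion_matrix(*count_confusion(y_true, y_pred))`.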

## Exercise 1

Consider two classification models, Model A and Model B, trained to detect fraudulent transactions. You have their respective confusion matrices:

In [None]:
models = {}
models["A"] = Confusion_matrix(150, 280, 50, 20)
models["B"] = Confusion_matrix(200, 260, 40, 40)

print(f"Model A: \n{models['A']}")
print(f"\nModel B: \n{models['B']}")

1. Calculate the accuracy for both models and determine which model has a higher accuracy.
1. Calculate the precision for both models and identify which model has higher precision.
1. Calculate the sensitivity for both models and identify which model has higher sensitivity.
1. Calculate the specificity for both models and identify which model has higher specificity.
1. Calculate the F1 score for both models and determine which model has a higher F1 score.

Tip: look at [the eval-function](https://realpython.com/python-eval-function/) to keep the copy-pasting to a minimum.
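`eval` turns a string such as `"accuracy"` into the function with that name, which is what lets a single loop call every metric. Because `eval` runs whatever expression it is given, a common alternative is to keep the functions themselves in a dict. A minimal sketch of both options, using a hypothetical `Counts` stand-in for `Confusion_matrix`, a hypothetical `demo_accuracy` function and the `cats` counts from above, so it runs on its own without clashing with the names you define in the exercise:

```python
from collections import namedtuple

# Hypothetical stand-in for Confusion_matrix, with the same field order
Counts = namedtuple("Counts", ["tp", "tn", "fp", "fn"])


def demo_accuracy(cm):
    # share of all predictions that were correct
    return (cm.tp + cm.tn) / (cm.tp + cm.tn + cm.fp + cm.fn)


demo_models = {"cats": Counts(107, 42, 23, 69)}  # the example matrix from above

# Option 1 (the tip): eval looks up the function from its name as a string
for name in ["demo_accuracy"]:
    print(name, end="\t")
    for cm in demo_models.values():
        print(round(eval(name)(cm), 3), end="\t")
    print()

# Option 2: keep the functions themselves in a dict, so no string is evaluated
metric_funcs = {"demo_accuracy": demo_accuracy}
for name, func in metric_funcs.items():
    print(name, end="\t")
    for cm in demo_models.values():
        print(round(func(cm), 3), end="\t")
    print()
```

Both loops print the same row; the dict version never evaluates a string, so it cannot run unintended code.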

In [None]:
#DELETE
def accuracy(confusion_matrix):
    return (confusion_matrix.tp + confusion_matrix.tn) / (confusion_matrix.tp + confusion_matrix.tn + confusion_matrix.fp + confusion_matrix.fn)

def precision(confusion_matrix):
    return confusion_matrix.tp / (confusion_matrix.tp + confusion_matrix.fp)

def sensitivity(confusion_matrix):
    return confusion_matrix.tp / (confusion_matrix.tp + confusion_matrix.fn)

def specificity(confusion_matrix):
    return confusion_matrix.tn / (confusion_matrix.tn + confusion_matrix.fp)

def f1_score(confusion_matrix):
    return 2 * (precision(confusion_matrix) * sensitivity(confusion_matrix)) / (precision(confusion_matrix) + sensitivity(confusion_matrix))

metrics = ["accuracy","precision","sensitivity","specificity","f1_score"]

print(f"Model A: \n{models['A']}\n")
print(f"Model B: \n{models['B']}\n")

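# One row per metric; the columns follow the order of `models` (A, then B)
for metric in metrics:
    print(metric, end="\t")
    for model in models.values():
        # eval(metric) turns the name string into the function of that name
        print(round(eval(metric)(model), 3), end="\t")
    print()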


Now that you have the numbers, you can make the following decisions easily. Remember that the model detects fraudulent transactions. For each scenario, decide which metric matters most and which model you would pick.

1. You are a bank and legally need to have this model in place, but you really don't want to inconvenience your clients too much. Better to let some bad transactions through than to annoy your good customers!
1. You are the government and use this model to weed out the obviously legitimate transactions. Anything tagged as fraudulent will be investigated; if it then turns out to be fine, that is no problem, but you really don't want any tax dodgers to get through!
1. You are a student who is graded on the overall performance of this model. Find the best performance, balanced between false positives and false negatives.

In [None]:
# DELETE

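# 1. Focus on few false positives, i.e. high specificity: Model B (0.867 vs 0.848).
# 2. Focus on few false negatives, i.e. high sensitivity: Model A (0.882 vs 0.833).
# 3. Focus on overall performance, i.e. high accuracy: Model A (0.860 vs 0.852).
#    Note that the F1 score, which balances precision and sensitivity, favors Model B (0.833 vs 0.811).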
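The reasoning above boils down to one rule: pick the metric that matches the cost you care about, then take the model that scores highest on it. A minimal sketch of that rule, re-entering the Exercise 1 counts so it runs on its own; the `Counts` stand-in, the `best_model` helper and the scenario-to-metric mapping (taken from the answers above) are assumptions of this example.

```python
from collections import namedtuple

# Hypothetical stand-in for Confusion_matrix, with the same field order
Counts = namedtuple("Counts", ["tp", "tn", "fp", "fn"])

fraud_models = {"A": Counts(150, 280, 50, 20), "B": Counts(200, 260, 40, 40)}

metric_funcs = {
    "accuracy": lambda cm: (cm.tp + cm.tn) / (cm.tp + cm.tn + cm.fp + cm.fn),
    "sensitivity": lambda cm: cm.tp / (cm.tp + cm.fn),  # drops with false negatives
    "specificity": lambda cm: cm.tn / (cm.tn + cm.fp),  # drops with false positives
}

# Metric each scenario cares about, following the answers above
scenarios = {"bank": "specificity", "government": "sensitivity", "student": "accuracy"}


def best_model(models, metric):
    """Return the key of the model with the highest value for `metric`."""
    return max(models, key=lambda name: metric(models[name]))


for scenario, metric_name in scenarios.items():
    winner = best_model(fraud_models, metric_funcs[metric_name])
    print(f"{scenario}: highest {metric_name} -> Model {winner}")
```

Swapping accuracy for the F1 score in the student scenario would pick Model B instead, which is why the choice of metric matters as much as the numbers.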

## Exercise 2

Consider two classification models, Model X and Model Y, trained to diagnose a specific medical condition. You have their respective confusion matrices:

In [None]:
models = {}
models["X"] = Confusion_matrix(100, 270, 10, 20)
models["Y"] = Confusion_matrix(90, 260, 20, 10)

print(f"Model X: \n{models['X']}")
print(f"\nModel Y: \n{models['Y']}")

1. Calculate the accuracy for both models and determine which model has a higher accuracy.
1. Calculate the precision for both models and identify which model has higher precision.
1. Calculate the sensitivity for both models and identify which model has higher sensitivity.
1. Calculate the specificity for both models and identify which model has higher specificity.
1. Calculate the F1 score for both models and determine which model has a higher F1 score.

Tip: don't rewrite the functions you made for Exercise 1; reuse them.

In [None]:
print(f"Model X: \n{models['X']}\n")
print(f"Model Y: \n{models['Y']}\n")

for metric in metrics:
    print(metric, end="\t")
    for model in models.values():
        print(round(eval(metric)(model),3), end="\t")
    print()
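Because F1 is the harmonic mean of precision $P$ and sensitivity $R$ (also called recall), it can be written directly in terms of the confusion-matrix counts, which makes the Exercise 2 values easy to check by hand:

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P+R}, \qquad P = \frac{TP}{TP+FP}, \quad R = \frac{TP}{TP+FN} \\
    &= \frac{2\,TP^2 / \big((TP+FP)(TP+FN)\big)}{TP\,(2\,TP+FP+FN) / \big((TP+FP)(TP+FN)\big)}
     = \frac{2\,TP}{2\,TP + FP + FN} \\[4pt]
F_1^{(X)} &= \frac{2 \cdot 100}{2 \cdot 100 + 10 + 20} = \frac{200}{230} \approx 0.870 \\
F_1^{(Y)} &= \frac{2 \cdot 90}{2 \cdot 90 + 20 + 10} = \frac{180}{210} \approx 0.857
\end{aligned}
```

Both models make 30 mistakes in total, so the higher F1 of Model X comes entirely from its larger number of true positives.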

Remember that this model diagnoses a medical condition. For each metric, write a scenario in which you would want to maximize it.

Inspiration for scenarios:
* A disease that can only be cured when detected early
* A disease for which the cure can be dangerous
* An insurance company looking to minimize costs
* A pharmaceutical company looking to boost the numbers on an illness to secure more funding

In [None]:
# DELETE

# Accuracy is an overall metric; it does not tell us how well the model performs on each class. Both models score about the same (0.925 vs 0.921), so no scenario is required here.

# Precision: false positives are a problem, i.e. people who don't have the illness but are diagnosed with it.
# You are an insurance company looking to minimize costs. Sick people need to be treated, yes, but healthy people need to be at work paying their premiums. Some sick people may be at work too; that is OK for us.

# Sensitivity: false negatives are a problem, i.e. people who have the illness but are not diagnosed with it.
# You are a doctor looking to minimize risk. You want to treat all sick people, even if that means treating some healthy people too.

# Specificity: correctly identifying negative instances is crucial, i.e. people who don't have the illness and are not diagnosed with it.
# The difference with precision is the denominator: precision divides TP by everyone diagnosed as ill (TP + FP), while specificity divides TN by everyone who is actually healthy (TN + FP).
# Both drop when false positives rise, but specificity asks how many healthy people are correctly cleared.
# You are the same insurance company, but you got some bad reviews online. You still focus on keeping healthy people at work, but you also keep in mind that sick people shouldn't be working.

# F1 score: you want to balance precision and sensitivity.
# You are a doctor looking to minimize risk, but you also want to minimize costs. You want to treat all sick people, but also minimize the number of healthy people you treat.
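The difference between precision and specificity lies in the denominator, and it shows most clearly for a rare condition. A minimal sketch with hypothetical screening counts (not taken from the exercises): almost all healthy people are cleared, so specificity is high, yet most positive diagnoses are wrong, so precision is low. The variable names are chosen so they don't overwrite the `precision` and `specificity` functions from Exercise 1.

```python
# Hypothetical screening of 10,000 people, 10 of whom are actually ill
tp, fn = 9, 1        # 9 of the 10 ill people are diagnosed
fp, tn = 90, 9900    # 90 of the 9,990 healthy people are wrongly diagnosed

precision_value = tp / (tp + fp)      # share of diagnoses that are correct
specificity_value = tn / (tn + fp)    # share of healthy people correctly cleared

print(f"precision:   {precision_value:.3f}")    # 0.091
print(f"specificity: {specificity_value:.3f}")  # 0.991
```

Both metrics suffer from the same 90 false positives, but precision compares them with only 9 true positives, while specificity compares them with 9,900 true negatives.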