# 05-5: Evaluate Classification

# Evaluate Classification
| | |
|----------|-------------|
| Author(s)   | Renato Leite (renatoleite@), Egon Soares (egon@) |
| Reviewer(s)   | Jarek Kazmierczak (jarekk@), Rajesh Thallam (rthallam@)|
| Last updated | 09/05/2023 |

## Per Class

- Dataset used for this sample
<cite>
  <a href="https://www.aclweb.org/anthology/D18-1404">CARER: Contextualized Affect Representations for Emotion Recognition</a> by Elvis Saravia, Hsien-Chi Toby Liu, Yen-Hao Huang, Junlin Wu, and Yi-Shin Chen. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3687-3697, Brussels, Belgium, October-November 2018. Association for Computational Linguistics.
</cite>

In [None]:
# from https://github.com/dair-ai/emotion_dataset - modified to binary classification
texts = [
  'i left with my bouquet of red and yellow tulips under my arm feeling slightly more optimistic than when i arrived',
  'i explain why i clung to a relationship with a boy who was in many ways immature and uncommitted despite the excitement i should have been feeling for getting accepted into the masters program at the university of virginia',
  'i like to have the same breathless feeling as a reader eager to see what will happen next',
  'i jest i feel grumpy tired and pre menstrual which i probably am but then again its only been a week and im about as fit as a walrus on vacation for the summer',
  'i don t feel particularly agitated',
  'i feel beautifully emotional knowing that these women of whom i knew just a handful were holding me and my baba on our journey',
  'i pay attention it deepens into a feeling of being invaded and helpless',
  'i just feel extremely comfortable with the group of people that i dont even need to hide myself',
  'i find myself in the odd position of feeling supportive of',
  'i was feeling as heartbroken as im sure katniss was',
  'i feel a little mellow today',
  'i feel like my only role now would be to tear your sails with my pessimism and discontent',
  'i feel just bcoz a fight we get mad to each other n u wanna make a publicity n let the world knows about our fight',
  'i feel like reds and purples are just so rich and kind of perfect']

# Positive Sentiment = 1
# Negative Sentiment = 0
ground_truth = [ 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]

# Sample prediction
predicted = [ 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]

In [None]:
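def count_tp_fp_fn(ground_truth_list: list, predicted_list: list, positive_class) -> tuple:
    """Count true positives, false positives, and false negatives for one class.

    Both lists are assumed to have the same length and the same ordering.
    """
    true_positives = 0
    false_positives = 0
    false_negatives = 0

    for truth, prediction in zip(ground_truth_list, predicted_list):
        if truth == positive_class:
            if prediction == positive_class:
                true_positives += 1   # positive example predicted as positive
            else:
                false_negatives += 1  # positive example that was missed
        elif prediction == positive_class:
            false_positives += 1      # other class predicted as positive

    return true_positives, false_positives, false_negatives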

In [None]:
# Sample results
positive_class = 1

true_positives, false_positives, false_negatives = count_tp_fp_fn(ground_truth, predicted, positive_class)

print(f"True Positives: {true_positives}")
print(f"False Positives: {false_positives}")
print(f"False Negatives: {false_negatives}")
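
As a cross-check, the same three counts can be read off a tally of (ground truth, prediction) pairs. The following is a minimal sketch, not part of the original notebook; it repeats the sample labels so it runs on its own, and the name `pair_counts` is only illustrative.

```python
from collections import Counter

# Sample labels repeated from the cells above so this block is self-contained.
ground_truth = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]

# Tally every (ground truth, prediction) pair in one pass.
pair_counts = Counter(zip(ground_truth, predicted))

tp = pair_counts[(1, 1)]  # positive predicted as positive
fp = pair_counts[(0, 1)]  # negative predicted as positive
fn = pair_counts[(1, 0)]  # positive predicted as negative
tn = pair_counts[(0, 0)]  # negative predicted as negative (not needed for F1)

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
```

The four tallies always add up to the number of examples, which is a quick sanity check on any counting routine.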

### F1 Score

$precision = \frac{TP}{TP + FP}$

In [None]:
precision = true_positives / (true_positives + false_positives)
print(f"Precision: {precision:.3f}")

$recall = \frac{TP}{TP+FN}$

In [None]:
recall = true_positives / (true_positives + false_negatives)
print(f"Recall: {recall:.3f}")

In [None]:
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
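
The divisions above raise `ZeroDivisionError` when a denominator is zero, for example when the model never predicts the positive class (TP + FP = 0). Below is a minimal sketch of one way to guard against that, assuming a score of 0.0 is an acceptable fallback. `safe_divide` is a hypothetical helper, not part of the original notebook.

```python
def safe_divide(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Return numerator / denominator, or `default` when the denominator is zero."""
    return numerator / denominator if denominator else default


# Counts from the binary sample above: TP=5, FP=3, FN=2.
precision = safe_divide(5, 5 + 3)
recall = safe_divide(5, 5 + 2)
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")

# Degenerate case: no positive predictions at all, so TP + FP = 0.
print(f"Precision with no positive predictions: {safe_divide(0, 0):.3f}")
```

Whether 0.0 is the right fallback depends on the use case; some reports prefer to flag the class as undefined instead.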

First method: using precision and recall

$F_1 = \cfrac{2}{\cfrac{1}{precision}+\cfrac{1}{recall}}$

In [None]:
f1_score_a = 2 / ((1 / precision) + (1 / recall))
print(f"F1 Score calculated using precision and recall: {f1_score_a:.3f}")

Second method: using TP, FP, and FN

$F_1 = \cfrac{TP}{TP + \cfrac{FP+FN}{2}}$

In [None]:
f1_score_b = true_positives / (true_positives + (false_positives + false_negatives) / 2)
print(f"F1 Score calculated using TP FP and FN: {f1_score_b:.3f}")

In [None]:
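import math

# Floating-point rounding can make the two results differ in the last bits,
# so compare with an explicit tolerance instead of relying only on ==.
print(f"The two f1 scores are equal? {f1_score_a == f1_score_b}")
# rel_tol=0 makes abs_tol the only tolerance in effect (the default rel_tol is 1e-09).
print(f"The two f1 scores are close up to 15 decimal places? {math.isclose(f1_score_a, f1_score_b, rel_tol=0, abs_tol=1e-15)}")
print(f1_score_a)
print(f1_score_b)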
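
The two formulas are algebraically identical whenever $TP > 0$, so any difference between the printed values comes only from floating-point rounding. Substituting the definitions of precision and recall into the first formula, then multiplying numerator and denominator by $TP$, gives the second:

$F_1 = \cfrac{2}{\cfrac{1}{precision}+\cfrac{1}{recall}} = \cfrac{2}{\cfrac{TP+FP}{TP}+\cfrac{TP+FN}{TP}} = \cfrac{2 \, TP}{2 \, TP + FP + FN} = \cfrac{TP}{TP + \cfrac{FP+FN}{2}}$

When $TP = 0$ the first form is undefined (it divides by zero), while the second still evaluates to 0 as long as $FP + FN > 0$.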

## Multiclass

- Dataset used for this sample
<cite>
  <a href="https://www.aclweb.org/anthology/D18-1404">CARER: Contextualized Affect Representations for Emotion Recognition</a> by Elvis Saravia, Hsien-Chi Toby Liu, Yen-Hao Huang, Junlin Wu, and Yi-Shin Chen. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3687-3697, Brussels, Belgium, October-November 2018. Association for Computational Linguistics.
</cite>

In [None]:
# from https://github.com/dair-ai/emotion_dataset
multi_class_texts = ['im feeling rather rotten so im not very ambitious right now',
  'im updating my blog because i feel shitty',
  'i never make her separate from me because i don t ever want her to feel like i m ashamed with her',
  'i left with my bouquet of red and yellow tulips under my arm feeling slightly more optimistic than when i arrived',
  'i was feeling a little vain when i did this one',
  'i cant walk into a shop anywhere where i do not feel uncomfortable',
  'i felt anger when at the end of a telephone call',
  'i explain why i clung to a relationship with a boy who was in many ways immature and uncommitted despite the excitement i should have been feeling for getting accepted into the masters program at the university of virginia',
  'i like to have the same breathless feeling as a reader eager to see what will happen next',
  'i jest i feel grumpy tired and pre menstrual which i probably am but then again its only been a week and im about as fit as a walrus on vacation for the summer',
  'i don t feel particularly agitated',
  'i feel beautifully emotional knowing that these women of whom i knew just a handful were holding me and my baba on our journey',
  'i pay attention it deepens into a feeling of being invaded and helpless',
  'i just feel extremely comfortable with the group of people that i dont even need to hide myself',
  'i find myself in the odd position of feeling supportive of',
  'i was feeling as heartbroken as im sure katniss was',
  'i feel a little mellow today',
  'i feel like my only role now would be to tear your sails with my pessimism and discontent',
  'i feel just bcoz a fight we get mad to each other n u wanna make a publicity n let the world knows about our fight',
  'i feel like reds and purples are just so rich and kind of perfect']


# 0: 'sadness'
# 1: 'joy'
# 2: 'love'
# 3: 'anger'
# 4: 'fear'
# 5: 'surprise'
ground_truth_multi = [0, 0, 0, 1, 0, 4, 3, 1, 1, 3, 4, 0, 4, 1, 2, 0, 1, 0, 3, 1]
predicted_multi =    [0, 1, 2, 1, 2, 4, 3, 3, 1, 4, 4, 0, 4, 1, 2, 0, 1, 0, 3, 1]

In [None]:
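# Sample Results
# The label set has 6 emotions, but 'surprise' (5) never appears in this sample,
# so only classes 0-4 are evaluated. A class with no TP, FP, or FN would make
# the F1 formula below divide by zero.
n_class = 5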
multiclass_results_list = [count_tp_fp_fn(ground_truth_multi, predicted_multi, i) for i in range(n_class)]
true_positives_list = [class_result[0] for class_result in multiclass_results_list]
false_positives_list = [class_result[1] for class_result in multiclass_results_list]
false_negatives_list = [class_result[2] for class_result in multiclass_results_list]

In [None]:
true_positives_list

In [None]:
false_positives_list

In [None]:
false_negatives_list

### Macro F1

$\text{Macro } F_1 = \cfrac{\sum_{i=1}^{n} F_{1,i}}{n}$

where $F_{1,i}$ is the F1 score of class $i$ and $n$ is the number of classes.

Example for 2 classes

In [None]:
f1_score_0 = true_positives_list[0] / (true_positives_list[0] + (false_positives_list[0] + false_negatives_list[0]) / 2)
f1_score_1 = true_positives_list[1] / (true_positives_list[1] + (false_positives_list[1] + false_negatives_list[1]) / 2)

In [None]:
macro_f1_score = (f1_score_0 + f1_score_1) / 2

print(macro_f1_score)

Example for all classes

In [None]:
f1_scores = [true_positives_list[i] / (true_positives_list[i] + (false_positives_list[i] + false_negatives_list[i]) / 2) for i in range(n_class)]

In [None]:
print(f1_scores)

In [None]:
macro_f1_score = sum(f1_scores) / len(f1_scores)

print(macro_f1_score)

In [None]:
from statistics import mean

In [None]:
mean(f1_scores)
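
For a more readable report, the per-class scores can be keyed by emotion name. This is a minimal sketch that repeats the sample labels so it runs on its own. `f1_from_counts`, `label_names`, and `f1_by_label` are illustrative names, and the name mapping is taken from the comments in the data cell above.

```python
from statistics import mean

# Label names from the comments in the data cell; 'surprise' (5) is left out
# because it does not occur in this sample.
label_names = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear"}

ground_truth_multi = [0, 0, 0, 1, 0, 4, 3, 1, 1, 3, 4, 0, 4, 1, 2, 0, 1, 0, 3, 1]
predicted_multi = [0, 1, 2, 1, 2, 4, 3, 3, 1, 4, 4, 0, 4, 1, 2, 0, 1, 0, 3, 1]
pairs = list(zip(ground_truth_multi, predicted_multi))


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; assumes at least one of tp, fp, fn is non-zero."""
    return tp / (tp + (fp + fn) / 2)


f1_by_label = {}
for label, name in label_names.items():
    # Treat `label` as the positive class and every other label as negative.
    tp = sum(1 for truth, pred in pairs if truth == label and pred == label)
    fp = sum(1 for truth, pred in pairs if truth != label and pred == label)
    fn = sum(1 for truth, pred in pairs if truth == label and pred != label)
    f1_by_label[name] = f1_from_counts(tp, fp, fn)

for name, score in f1_by_label.items():
    print(f"{name:>8}: {score:.3f}")

# Macro F1 is the unweighted mean of the per-class scores.
macro_f1 = mean(list(f1_by_label.values()))
print(f"Macro F1: {macro_f1:.3f}")
```

Because the mean is unweighted, a rare class such as 'love' (one example in this sample) influences macro F1 as much as a frequent one.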

### Micro F1

$\text{Micro } F_1 = \cfrac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \cfrac{\sum_{i=1}^{n} FP_i + \sum_{i=1}^{n} FN_i}{2}}$

In [None]:
micro_f1_score = sum(true_positives_list) / (sum(true_positives_list) + ((sum(false_positives_list) + sum(false_negatives_list))/2))

In [None]:
print(micro_f1_score)

In [None]:
tp_sum = sum(true_positives_list)
fp_sum = sum(false_positives_list)
fn_sum = sum(false_negatives_list)

In [None]:
micro_f1_score = tp_sum / (tp_sum + (fp_sum + fn_sum) / 2)

In [None]:
print(micro_f1_score)
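
When every example has exactly one true label and one predicted label, as in this sample, each misclassification counts as one false positive (for the predicted class) and one false negative (for the true class). The summed FP and FN are therefore equal, and micro F1 reduces to plain accuracy. Here is a minimal sketch that checks this on the sample labels, repeated so the block runs on its own:

```python
ground_truth_multi = [0, 0, 0, 1, 0, 4, 3, 1, 1, 3, 4, 0, 4, 1, 2, 0, 1, 0, 3, 1]
predicted_multi = [0, 1, 2, 1, 2, 4, 3, 3, 1, 4, 4, 0, 4, 1, 2, 0, 1, 0, 3, 1]

labels = sorted(set(ground_truth_multi) | set(predicted_multi))
pairs = list(zip(ground_truth_multi, predicted_multi))

# A correct prediction is a true positive for exactly one class.
tp_sum = sum(1 for truth, pred in pairs if truth == pred)
# Each wrong prediction is a false positive for the predicted class...
fp_sum = sum(1 for label in labels for truth, pred in pairs
             if pred == label and truth != label)
# ...and a false negative for the true class.
fn_sum = sum(1 for label in labels for truth, pred in pairs
             if truth == label and pred != label)

micro_f1 = tp_sum / (tp_sum + (fp_sum + fn_sum) / 2)
accuracy = tp_sum / len(pairs)

print(f"Summed FP: {fp_sum}, summed FN: {fn_sum}")
print(f"Micro F1: {micro_f1:.3f}")
print(f"Accuracy: {accuracy:.3f}")
```

This equivalence does not carry over to multi-label problems, where an example can have several true or predicted labels.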

## scikit-learn

In [None]:
!pip install -U scikit-learn

In [None]:
from sklearn.metrics import f1_score

In [None]:
# Per class
f1_score(ground_truth_multi, predicted_multi, average=None)

In [None]:
# Macro
f1_score(ground_truth_multi, predicted_multi, average='macro')

In [None]:
# Micro
f1_score(ground_truth_multi, predicted_multi, average='micro')