# **Understand the Metric: Row-Wise Micro Averaged F1**

### Kernels 

- notebook by [@ihelon](https://www.kaggle.com/ihelon) :: [row-wise-micro-averaged-f1-score-metric](https://www.kaggle.com/ihelon/row-wise-micro-averaged-f1-score-metric)

- notebook by [@shonenkov](https://www.kaggle.com/shonenkov) :: [competition-metrics](https://www.kaggle.com/shonenkov/competition-metrics)

- https://github.com/yisaienkov/evaluations   
- https://evaluations.readthedocs.io/en/latest/

### References

- https://machinelearningmastery.com/fbeta-measure-for-machine-learning/
- https://medium.com/@douglaspsteen/beyond-the-f-1-score-a-look-at-the-f-beta-score-3743ac2ef6e3

In [None]:
# Internet ON.

!pip install -U pip
!pip install evaluations -q

# Internet OFF. You need to add evaluations dataset (see input folders).
# !pip install --no-deps '../input/evaluations/'

In [None]:
from evaluations.kaggle_2020 import row_wise_micro_averaged_f1_score
import numpy as np
import pandas as pd
import os, sys, gc
import warnings
warnings.filterwarnings(action='ignore')

# Introduction 

Since the F-score (and Fbeta) is calculated from precision and recall, let's first refresh these definitions:

- `Precision`: quantifies how many of the predicted positives are actually positive.

    It is calculated as the number of correctly predicted positive examples divided by the total number of examples that were predicted as positive: 

    `Precision = TruePositives / (TruePositives + FalsePositives)`

    The result is a value between 0.0 for no precision and 1.0 for full or perfect precision. The intuition for precision is that it ignores false negatives and is lowered only by false positives. 

- `Recall`: quantifies how many of the actual positives were found. It is calculated as the number of correctly predicted positive examples divided by the total number of positive examples that could have been predicted.

    `Recall = TruePositives / (TruePositives + FalseNegatives)`

    The result is a value between 0.0 for no recall and 1.0 for full or perfect recall. The intuition for recall is that it ignores false positives and is lowered only by false negatives.

Maximizing precision will minimize the false-positive errors, whereas maximizing recall will minimize the false-negative errors.

- `F1 Score`: calculated as the harmonic mean of precision and recall, giving each the same weight. The F1 score can be represented by the following equation:

    `F1 = 2 x Precision x Recall / (Precision + Recall)`
    
 
- The `Fbeta score` is a generalization of the F-measure that adds a configuration parameter called beta:

    `Fbeta = (1 + beta^2) x Precision x Recall / (beta^2 x Precision + Recall)`

    With beta = 1.0 this reduces to the F1 score. A smaller beta value, such as 0.5, gives more weight to precision and less to recall, whereas a larger beta value, such as 2.0, gives less weight to precision and more weight to recall in the calculation of the score.

It is a useful metric to use when both precision and recall are important but slightly more attention is needed on one or the other, such as when false negatives are more important than false positives, or the reverse.
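To make the formulas above concrete, here is a minimal sketch that computes precision, recall and Fbeta directly from raw counts. The function name `precision_recall_fbeta`, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; they do not come from the competition or from any library.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) computed from raw counts.

    Assumption: a ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical counts: 5 true positives, 2 false positives, 0 false negatives
for beta in (0.5, 1.0, 2.0):
    p, r, f = precision_recall_fbeta(tp=5, fp=2, fn=0, beta=beta)
    print(f"beta={beta}: precision={p:.3f} recall={r:.3f} Fbeta={f:.3f}")
```

With these counts precision (0.714) is lower than recall (1.000), so the score rises as beta grows: about 0.758 for beta = 0.5, 0.833 for beta = 1.0 and 0.926 for beta = 2.0.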


# Competition metric

> From description: Submissions will be evaluated based on their `row-wise micro averaged F1 score`. For each row_id/time window, you need to provide a space delimited list of the set of unique birds that made a call beginning or ending in that time window. If there are no bird calls in a time window, use the code nocall.





- `Row-wise`: TP, FN and FP are counted separately for each row, over every value (bird) in that row

- `Micro averaged`: F1 is calculated from the total TP, FN and FP counted within one row (!); the final score is then the average of the F1 values of all rows
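The difference between averaging per row and pooling the counts over the whole submission is easy to miss, so here is a small sketch that contrasts the two. The helper names (`count_row`, `f1_from_counts`), the toy rows, the use of sets (the description asks for unique birds per row) and the 0.0 score for an empty denominator are all assumptions made for this illustration.

```python
def count_row(true_row, pred_row):
    """Count (TP, FN, FP) for one row of space-delimited labels."""
    trues, preds = set(true_row.split()), set(pred_row.split())
    tp = len(trues & preds)   # labels present in both
    fn = len(trues - preds)   # true labels that were missed
    fp = len(preds - trues)   # predicted labels that are wrong
    return tp, fn, fp


def f1_from_counts(tp, fn, fp):
    denom = 2 * tp + fn + fp
    return 2 * tp / denom if denom else 0.0  # assumption: empty denominator scores 0.0


y_true = ["nocall", "ameavo amebit amecro"]
y_pred = ["amebit", "ameavo amebit amecro"]

counts = [count_row(t, p) for t, p in zip(y_true, y_pred)]

# Row-wise: one F1 per row, then the mean over rows
row_scores = [f1_from_counts(*c) for c in counts]
row_wise = sum(row_scores) / len(row_scores)

# Pooled (for contrast only): sum TP, FN, FP over all rows, then one F1
pooled = f1_from_counts(*(sum(col) for col in zip(*counts)))

print(row_scores)  # [0.0, 1.0]
print(row_wise)    # 0.5
print(pooled)      # 0.75
```

The single wrong `nocall` row costs half of the row-wise score, even though it accounts for only one of the four true labels. Every row weighs the same, regardless of how many birds it contains.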

# Implementation 

If you are interested in the implementation, unfold the cell below to see the code written by [@shonenkov](https://www.kaggle.com/shonenkov/) 

In [None]:
## see original nb: https://www.kaggle.com/shonenkov/competition-metrics

import numpy as np

def row_wise_f1_score_micro(y_true, y_pred):
    """ author @shonenkov """
    F1 = []
    for preds, trues in zip(y_pred, y_true):
        TP, FN, FP = 0, 0, 0
        preds = preds.split()
        trues = trues.split()
        for true in trues:
            if true in preds:
                TP += 1
            else:
                FN += 1
        for pred in preds:
            if pred not in trues:
                FP += 1
        F1.append(2*TP / (2*TP + FN + FP))
    return np.mean(F1)

In [None]:
train = pd.read_csv('../input/birdclef-2021/train_metadata.csv')
train_csv = pd.read_csv("../input/birdclef-2021/train_soundscape_labels.csv")
# test_csv = pd.read_csv("../input/birdclef-2021/test.csv")
# sample_sub= pd.read_csv("../input/birdclef-2021/sample_submission.csv")

In [None]:
# len(train_csv.birds.unique()), train_csv.birds.unique()

# 'nocall', 'rubwre1', 'obnthr1', 'brnjay', 'brnjay sthwoo1',
# 'rucwar', 'grekis rucwar', 'rucwar runwre1', 'rtlhum rucwar',
#  'hofwoo1', 'hofwoo1 rucwar', 'hofwoo1 rucwar runwre1', 'runwre1',
#  'grekis', 'grekis runwre1', 'clcrob rucwar', 'clcrob',
#  'runwre1 yehcar1', 'rucwar runwre1 yehcar1', 'melbla1', 'crfpar',
#  'crfpar rucwar', 'rucwar whcpar', 'whcpar', 'hofwoo1 whcpar',
#  'crfpar runwre1', 'bobfly1', 'bobfly1 rucwar', 'grhcha1',
#  'plawre1', 'bobfly1 plawre1', 'orcpar', 'bobfly1 orfpar',

### Simple examples

In [None]:
# intuition for precision
from sklearn.metrics import precision_score

# no precision
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
score = precision_score(y_true, y_pred)
print('No Precision: %.3f' % score)

# some false positives
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
score = precision_score(y_true, y_pred)
print('Some False Positives: %.3f' % score)

# some false negatives
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
score = precision_score(y_true, y_pred)
print('Some False Negatives: %.3f' % score)

# perfect precision
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
score = precision_score(y_true, y_pred)
print('Perfect Precision: %.3f' % score)
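For comparison, the same scenarios can be scored for recall. The sketch below uses plain Python so that it runs without extra packages; the helper `recall` is written for this illustration only (scikit-learn provides `recall_score` for real use), and it reports 0.0 when there are no positive examples, which is an assumption.

```python
def recall(y_true, y_pred):
    """Recall for binary labels: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0  # assumption: no positives scores 0.0


y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scenarios = {
    'No Recall': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'Some False Positives': [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    'Some False Negatives': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    'Perfect Recall': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
}

results = {name: recall(y_true, y_pred) for name, y_pred in scenarios.items()}
for name, score in results.items():
    print('%s: %.3f' % (name, score))
```

False positives leave recall at 1.000, while false negatives pull it down to 0.600. This is the mirror image of precision, which is lowered by false positives and unaffected by false negatives.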

In [None]:
## some simple tests

print('Competition metric - simple tests\n')


y_true = [
    'acafly', 
    'acowoo', 
    'aldfly',
    'nocall',
]

y_pred = [
    'acafly', 
    'acowoo', 
    'aldfly',
    'nocall',
]

y_pred1 = [
    'acafly', 
    'acowoo', 
    'aldfly',
#     'nocall',
]

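# print('single sample scores\n')
# note: y_pred1 has 3 rows while y_true has 4; how the unmatched row is handled depends on
# the implementation (a zip-based loop such as row_wise_f1_score_micro silently ignores it)
print('F1-score with all correct:', row_wise_micro_averaged_f1_score(y_true, y_pred) )
print('F1-score with 3 correct:', row_wise_micro_averaged_f1_score(y_true, y_pred1) )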

In [None]:
print('Competition metric - more tests\n')


print('[all equal]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo'], 
    y_pred=['nocall', 'ameavo'],
))

print()
print('[nothing]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo'], 
    y_pred=['amebit', 'amebit'],
))

print()
print('[1 correct]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo'], 
    y_pred=['nocall', 'amebit'],
))

print()
print('[double prediction]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo amebit'], 
    y_pred=['nocall', 'ameavo amebit'],
))

print()
print('[double prediction with permutation]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo amebit'], 
    y_pred=['nocall', 'amebit ameavo'],
))

print()
print('[semi prediction]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo amebit'], 
    y_pred=['nocall', 'ameavo'],
))

print()
print('[semi prediction with odd]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo'], 
    y_pred=['nocall', 'ameavo amebit'],
))

print()
print('[semi prediction with double odd]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo'], 
    y_pred=['nocall', 'ameavo amebit amecro'],
))

print()
print('[semi prediction of triple with odd]:', row_wise_micro_averaged_f1_score(
    y_true=['nocall', 'ameavo amecro'], 
    y_pred=['nocall', 'ameavo amebit amecro'],
))
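To check the printed values by hand, note that for one row the F1 score `2*TP / (2*TP + FN + FP)` equals `2 * |true ∩ pred| / (|true| + |pred|)` when both sides are treated as sets. The sketch below uses that form to recompute some of the cases above. The names `row_f1` and `mean_row_f1` are hypothetical, and it has not been verified here that the `evaluations` package returns exactly these numbers; it should, as long as it implements the same definition.

```python
def row_f1(true_row, pred_row):
    """F1 of one row, written as 2*|A & B| / (|A| + |B|) on label sets."""
    a, b = set(true_row.split()), set(pred_row.split())
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0  # assumption: two empty rows score 0.0


def mean_row_f1(y_true, y_pred):
    scores = [row_f1(t, p) for t, p in zip(y_true, y_pred)]
    return sum(scores) / len(scores)


cases = {
    'all equal': (['nocall', 'ameavo'], ['nocall', 'ameavo']),
    'nothing': (['nocall', 'ameavo'], ['amebit', 'amebit']),
    '1 correct': (['nocall', 'ameavo'], ['nocall', 'amebit']),
    'semi prediction': (['nocall', 'ameavo amebit'], ['nocall', 'ameavo']),
    'semi prediction with double odd': (['nocall', 'ameavo'], ['nocall', 'ameavo amebit amecro']),
    'semi prediction of triple with odd': (['nocall', 'ameavo amecro'], ['nocall', 'ameavo amebit amecro']),
}

for name, (y_true, y_pred) in cases.items():
    print(f'[{name}]: {mean_row_f1(y_true, y_pred):.4f}')
```

Under this definition the cases give 1.0000, 0.0000, 0.5000, 0.8333, 0.7500 and 0.9000 respectively. A missed bird and an extra bird are penalised symmetrically within a row.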

#### Please, if I'm missing anything or you have any feedback, leave a comment below

#### Fork it and start experimenting to get familiar with the metric before starting the heavy work :-)