# Exercise 1: Precision of a Classifier

The precision of a classifier can be calculated using the formula:

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

## Calculation
We assume the following layout for the confusion matrix: the cell at position $(i, j)$ holds the number of instances of class $i$ classified as class $j$.

### Identify the true positives and false positives.
For each class, precision is calculated as:

$$\text{Precision}_i = \frac{\text{True Positives for Class } i}{\text{True Positives for Class } i + \text{False Positives for Class } i}$$

- **True Positives (TP)**: Values along the diagonal of the matrix.
- **False Positives (FP)**: Sum of the values in the respective column, excluding the diagonal value.
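As a minimal sketch of these two definitions, the snippet below extracts TP and FP from a confusion matrix with the layout assumed above (row = true class, column = predicted class). The 3×3 matrix `cm` and the helper name `tp_fp_per_class` are hypothetical and chosen only for illustration; this is not the matrix of the exercise.

```python
def tp_fp_per_class(cm):
    """Return (tp, fp) lists for a square confusion matrix.

    cm[i][j] is the number of instances of class i classified as class j.
    """
    n = len(cm)
    # True positives sit on the diagonal.
    tp = [cm[i][i] for i in range(n)]
    # False positives for class j: the rest of column j, diagonal excluded.
    fp = [sum(cm[i][j] for i in range(n)) - cm[j][j] for j in range(n)]
    return tp, fp


# Hypothetical 3-class matrix, for illustration only.
cm = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 3, 7],
]
tp, fp = tp_fp_per_class(cm)
print(tp)  # [5, 6, 7]
print(fp)  # [2, 4, 1]
```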

### Calculate precision for each class.
Reading TP and FP off the confusion matrix for each class gives:

1. **Class 1:**
   - TP = 97
   - FP = 28 + 71 + 14 + 42 = 155
   - $\text{Precision}_1 = \frac{97}{97 + 155} \approx 0.385$

2. **Class 2:**
   - TP = 13
   - FP = 14 + 9 + 8 + 15 = 46
   - $\text{Precision}_2 = \frac{13}{13 + 46} \approx 0.220$

3. **Class 3:**
   - TP = 92
   - FP = 45 + 31 + 22 + 42 = 140
   - $\text{Precision}_3 = \frac{92}{92 + 140} \approx 0.397$

4. **Class 4:**
   - TP = 184
   - FP = 63 + 85 + 47 + 58 = 253
   - $\text{Precision}_4 = \frac{184}{184 + 253} \approx 0.421$

5. **Class 5:**
   - TP = 187
   - FP = 83 + 93 + 115 + 34 = 325
   - $\text{Precision}_5 = \frac{187}{187 + 325} \approx 0.365$
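
The arithmetic above can be checked with a few lines of Python. This is only a sketch: it reuses the TP and FP counts listed for each class and prints the precisions to four decimals, so the rounding of the third decimal is visible.

```python
# TP and FP per class, copied from the worked values above.
tp = [97, 13, 92, 184, 187]
fp = [155, 46, 140, 253, 325]

# Precision_i = TP_i / (TP_i + FP_i)
precisions = [t / (t + f) for t, f in zip(tp, fp)]
for i, p in enumerate(precisions, start=1):
    print(f"Precision_{i} = {p:.4f}")
```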

### Calculate the overall precision.
If the per-class precision values are averaged with equal weight for all classes, that is, using macro-averaging, then:

$$\text{Macro-Averaged Precision} = \frac{1}{5}\left(\frac{97}{252} + \frac{13}{59} + \frac{92}{232} + \frac{184}{437} + \frac{187}{512}\right) \approx 0.358$$
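
As a quick check, the macro average can be computed from the unrounded per-class precisions. The sketch below reuses the TP and FP counts from the previous step; averaging the unrounded values gives about 0.358, while rounding or truncating each term to three decimals first can shift the last digit.

```python
tp = [97, 13, 92, 184, 187]
fp = [155, 46, 140, 253, 325]

# Per-class precision, kept unrounded.
precisions = [t / (t + f) for t, f in zip(tp, fp)]

# Macro average: every class contributes with the same weight.
macro_precision = sum(precisions) / len(precisions)
print(f"{macro_precision:.3f}")  # 0.358
```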



# Exercise 2: BLEU score
In the following, we implement a BLEU scoring function and compare it with a library implementation.

In [None]:
import evaluate


In [18]:
def tokenizer(text: str) -> list[str]:
    """Split the text into tokens on whitespace."""
    return text.split()

def bleu_score(text: list[str], reference: list[str], n: int) -> list[float]:
    """Compute simplified n-gram precisions for n-gram sizes 1 to n.

    This implementation only checks whether each n-gram of the prediction
    occurs in the reference, not how many times it occurs there.

    Args:
        text: tokenized prediction to evaluate
        reference: tokenized reference to compare the prediction with
        n: size of the largest n-gram to use for evaluation
    Returns:
        list of n precisions, one per n-gram size, in order from 1 to n
    """
    if len(text) < n:
        raise ValueError("The text is too short to calculate the BLEU score with the given n-gram size")
    bleu_scores = []
    print("Text: ", text)
    print("Reference: ", reference)
    for i in range(1, n + 1):
        # Collect the reference n-grams of size i once, as hashable tuples
        reference_ngrams = {tuple(reference[k:k + i]) for k in range(len(reference) - i + 1)}
        # Count the prediction n-grams that occur in the reference
        matches = 0
        for j in range(len(text) - i + 1):
            if tuple(text[j:j + i]) in reference_ngrams:
                matches += 1
        # Precision: matched n-grams over all n-grams of size i in the prediction
        bleu_scores.append(matches / (len(text) - i + 1))
    return bleu_scores


# example 1 of testing the BLEU score
print("Example 1, tokenized text and reference\n")

references = "you are four minutes too slow"
predictions = "you are four minutes late"

# Calculate the BLEU score from the implementation
bleu_scores = bleu_score(tokenizer(predictions), tokenizer(references), 4)
print("Implemented bleu score: ",bleu_scores)

# Calculate the BLEU score from a library 
bleu = evaluate.load("bleu")
true_bleu = bleu.compute(predictions = [predictions], references = [references])
print("True bleu score: ",true_bleu['precisions'],"\n")


# example 2 of testing the BLEU score
print("Example 2, tokenized text and reference\n")

ref2 = "it was at least certain that Phileas Fogg had not absented himself from London for many years"
pred2 = "what was certain however was that Phileas Fogg had not left London for many years"

# Calculate the BLEU score from the implementation 
bleu_scores = bleu_score(tokenizer(pred2), tokenizer(ref2), 4)
print("Implemented bleu score: ",bleu_scores)

# Calculate the BLEU score from a library
true_bleu = bleu.compute(predictions = [pred2], references = [ref2])
print("True bleu score: ",true_bleu['precisions'])

Example 1, tokenized text and reference

Text:  ['you', 'are', 'four', 'minutes', 'late']
Reference:  ['you', 'are', 'four', 'minutes', 'too', 'slow']
Implemented bleu score:  [0.8, 0.75, 0.6666666666666666, 0.5]
True bleu score:  [0.8, 0.75, 0.6666666666666666, 0.5] 

Example 2, tokenized text and reference

Text:  ['what', 'was', 'certain', 'however', 'was', 'that', 'Phileas', 'Fogg', 'had', 'not', 'left', 'London', 'for', 'many', 'years']
Reference:  ['it', 'was', 'at', 'least', 'certain', 'that', 'Phileas', 'Fogg', 'had', 'not', 'absented', 'himself', 'from', 'London', 'for', 'many', 'years']
Implemented bleu score:  [0.8, 0.5, 0.38461538461538464, 0.25]
True bleu score:  [0.7333333333333333, 0.5, 0.38461538461538464, 0.25]


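The two implementations agree on every value except the unigram precision of Example 2 (0.8 versus 0.733). The difference comes from the simplification in our implementation: it only checks whether an n-gram of the prediction occurs in the reference, so both occurrences of "was" in the prediction count as matches (12 of 15 unigrams), whereas the library clips each n-gram count to its count in the reference and credits "was" only once (11 of 15). We are still satisfied with the results of our implementation.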
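
The difference on Example 2 can be reproduced by clipping each n-gram count in the prediction to its count in the reference. The sketch below is one possible way to do this with `collections.Counter`; the helper names `ngrams` and `clipped_precisions` are our own and are not part of the `evaluate` library. It covers only the n-gram precisions, not the brevity penalty or the geometric mean of the full BLEU score.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of `tokens` as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precisions(prediction, reference, max_n=4):
    """Clipped n-gram precisions for n = 1..max_n against a single reference."""
    scores = []
    for n in range(1, max_n + 1):
        pred_counts = Counter(ngrams(prediction, n))
        ref_counts = Counter(ngrams(reference, n))
        # Counter intersection keeps, per n-gram, the smaller of the two counts.
        matches = sum((pred_counts & ref_counts).values())
        total = sum(pred_counts.values())
        scores.append(matches / total if total else 0.0)
    return scores


ref2 = "it was at least certain that Phileas Fogg had not absented himself from London for many years".split()
pred2 = "what was certain however was that Phileas Fogg had not left London for many years".split()

scores = clipped_precisions(pred2, ref2)
print(scores)  # unigram precision is now 11/15, matching the library output
```

With clipping, the repeated "was" in the prediction is credited once, so the unigram precision drops from 12/15 to 11/15; the higher-order precisions are unchanged because no matching bigram, trigram, or 4-gram is repeated in the prediction.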