# Exercises
## Exercise 1

Using the Hugging Face pipeline, apply the Named Entity Recognition (`ner`) model *dslim/bert-base-NER* to the sentence "My name is [Your Name Here] and I live in [Your City Here]".

In [2]:
!pip install transformers tokenizers datasets
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Felipe and I live in São Paulo"

ner_results = nlp(example)
print(ner_results)



Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity': 'B-PER', 'score': 0.99655044, 'index': 4, 'word': 'Felipe', 'start': 11, 'end': 17}, {'entity': 'B-LOC', 'score': 0.9992907, 'index': 9, 'word': 'São', 'start': 32, 'end': 35}, {'entity': 'I-LOC', 'score': 0.9993781, 'index': 10, 'word': 'Paulo', 'start': 36, 'end': 41}]
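
In the output above, "São" and "Paulo" come back as two separate tokens tagged `B-LOC` and `I-LOC`. The pipeline accepts an `aggregation_strategy` argument (for example `"simple"`) that groups such tokens for you. As a rough illustration of what that grouping involves, here is a minimal hand-written sketch. The helper name `merge_entities` is hypothetical, and it assumes every result carries a `B-` or `I-` prefix, as in the output shown.

```python
def merge_entities(text, ner_results):
    """Merge consecutive B-/I- tagged tokens into whole entity spans.

    Assumes every tag has the form 'B-XXX' or 'I-XXX' (no bare 'O' tags).
    """
    merged = []
    for token in ner_results:
        prefix, label = token["entity"].split("-", 1)
        if prefix == "I" and merged and merged[-1]["label"] == label:
            # Continuation of the previous entity: extend its character span.
            merged[-1]["end"] = token["end"]
        else:
            # A 'B-' tag (or an orphan 'I-' tag) starts a new entity.
            merged.append({"label": label, "start": token["start"], "end": token["end"]})
    for span in merged:
        # Recover the surface text from the character offsets.
        span["text"] = text[span["start"]:span["end"]]
    return merged


example = "My name is Felipe and I live in São Paulo"
# Results copied from the pipeline output above.
ner_results = [
    {"entity": "B-PER", "score": 0.99655044, "index": 4, "word": "Felipe", "start": 11, "end": 17},
    {"entity": "B-LOC", "score": 0.9992907, "index": 9, "word": "São", "start": 32, "end": 35},
    {"entity": "I-LOC", "score": 0.9993781, "index": 10, "word": "Paulo", "start": 36, "end": 41},
]

for entity in merge_entities(example, ner_results):
    print(entity["label"], entity["text"])
```

Slicing the original sentence by character offsets, rather than joining the `word` fields, keeps the whitespace between "São" and "Paulo" exactly as it was written.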


## Exercise 2

In this exercise, we will briefly explore the metrics typically employed to evaluate classification models. These metrics are commonly applied to evaluate BERT when used for text classification.

Please fill in the missing content in the following functions to calculate accuracy, precision, and recall metrics. You can refer to the definitions of these metrics on Wikipedia. Below, we present a function to test your metrics and verify the success of your implementation.
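
For reference, the standard binary-classification definitions are given below, assuming label 1 is the positive class. Here $TP$, $TN$, $FP$, and $FN$ are the counts of true positives, true negatives, false positives, and false negatives obtained by comparing `y_pred` against `y_gold`.

```latex
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
```

Precision and recall are undefined when their denominators are zero, so an implementation has to pick a convention for that case (returning 0 is a common choice).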

In [10]:
from sklearn.metrics import accuracy_score as accuracy
from sklearn.metrics import precision_score as precision
from sklearn.metrics import recall_score as recall
from tqdm import tqdm
import numpy as np

def test_metric(user_metric, reference_metric, num_tests=100, samples_per_test=100, seed=12345, tolerance=1e-4):
    # Seed a local generator so the random test cases are reproducible.
    rng = np.random.default_rng(seed)
    for i in tqdm(range(num_tests)):
        y_pred = list(rng.integers(2, size=samples_per_test))
        y_gold = list(rng.integers(2, size=samples_per_test))
        user_metric_value = user_metric(y_gold, y_pred)
        reference_metric_value = reference_metric(y_gold, y_pred)
        if abs(user_metric_value - reference_metric_value) > tolerance:
            raise ValueError(f"Test Failed. user_metric returned {user_metric_value}, but {reference_metric_value} was expected.")
    print("\nThe function passed all tests")

In [13]:
def my_accuracy(y_gold, y_pred):
    if len(y_pred) != len(y_gold):
        raise ValueError("Input lists must have the same length.")

    true_positives = sum(1 for pred, gold in zip(y_pred, y_gold) if pred == 1.0 and gold == 1.0)
    true_negatives = sum(1 for pred, gold in zip(y_pred, y_gold) if pred == 0.0 and gold == 0.0)
    false_positives = sum(1 for pred, gold in zip(y_pred, y_gold) if pred == 1.0 and gold == 0.0)
    false_negatives = sum(1 for pred, gold in zip(y_pred, y_gold) if pred == 0.0 and gold == 1.0)

    accuracy = (true_positives + true_negatives) / (true_positives + true_negatives + false_positives + false_negatives)

    return accuracy

In [14]:
test_metric(my_accuracy, accuracy)

100%|██████████| 100/100 [00:00<00:00, 329.74it/s]


The function passed all tests





In [15]:
def my_precision(y_gold, y_pred):

    if len(y_pred) != len(y_gold):
        raise ValueError("Input lists must have the same length.")

    true_positives = sum(1 for pred, gold in zip(y_pred, y_gold) if pred == 1.0 and gold == 1.0)
    false_positives = sum(1 for pred, gold in zip(y_pred, y_gold) if pred == 1.0 and gold == 0.0)

    # Avoid division by zero
    if true_positives + false_positives == 0:
        return 0.0

    precision = true_positives / (true_positives + false_positives)


    return precision


In [16]:
test_metric(my_precision, precision)

100%|██████████| 100/100 [00:00<00:00, 503.21it/s]


The function passed all tests





In [19]:

def my_recall(y_gold, y_pred):
    if len(y_pred) != len(y_gold):
        raise ValueError("Input lists must have the same length.")

    true_positives = sum(1 for gold, pred in zip(y_gold, y_pred) if gold == 1 and pred == 1)
    actual_positives = sum(1 for gold in y_gold if gold == 1)

    # Avoid division by zero when there are no positive gold labels
    if actual_positives == 0:
        return 0.0

    return true_positives / actual_positives

In [20]:
test_metric(my_recall, recall)

100%|██████████| 100/100 [00:00<00:00, 252.16it/s]


The function passed all tests
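
As a sanity check alongside the randomized tests, the three metrics can also be verified on a small example worked out by hand. The sketch below is illustrative only: the helper `confusion_counts` and the sample labels are made up for this purpose, and label 1 is assumed to be the positive class.

```python
def confusion_counts(y_gold, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for gold, pred in zip(y_gold, y_pred):
        if pred == 1 and gold == 1:
            tp += 1
        elif pred == 1 and gold == 0:
            fp += 1
        elif pred == 0 and gold == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical labels chosen so that the three metrics differ.
y_gold = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_gold, y_pred)
accuracy_value = (tp + tn) / (tp + tn + fp + fn)
precision_value = tp / (tp + fp)
recall_value = tp / (tp + fn)

print(tp, fp, fn, tn)    # 2 1 2 3
print(accuracy_value)    # 0.625
print(recall_value)      # 0.5
```

Here precision is 2/3, since two of the three predicted positives are correct. Counting all four cells in one pass makes it easy to see how each metric uses a different part of the confusion matrix.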



