# Zero-shot classification with BART Large

This notebook shows the results obtained when we try to classify sentences into multiple categories with a model that has not been trained for this task.

In [None]:
!pip install transformers
!pip install datasets

In [28]:
from transformers import pipeline
from datasets import load_dataset, DatasetDict, Dataset
from sklearn.metrics import accuracy_score
import numpy as np

We load the facebook/bart-large-mnli model.

In [38]:
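classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classification(text):
    candidate_labels = ['service', 'metric', 'objective', 'remedy', 'claim', 'exception', 'definition']
    output = classifier(text, candidate_labels, multi_label=True)
    # The pipeline returns labels sorted by descending score; restore the
    # candidate_labels order so each position always refers to the same label.
    score_by_label = dict(zip(output['labels'], output['scores']))
    output['labels'] = candidate_labels
    output['scores'] = [score_by_label[label] for label in candidate_labels]
    return output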
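
As a rough illustration of what NLI-based zero-shot classification does, the input text is treated as a premise and each candidate label is turned into a hypothesis that the model scores for entailment. The sketch below only builds those pairs. It assumes the pipeline's default hypothesis template `"This example is {}."`; the helper `build_nli_pairs` and the example sentence are hypothetical and not part of transformers or of the dataset.

```python
# Hypothetical helper: shows how candidate labels become NLI hypotheses.
def build_nli_pairs(text, candidate_labels, hypothesis_template="This example is {}."):
    """Return one (premise, hypothesis) pair per candidate label."""
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]

# Made-up sentence, used only for illustration.
pairs = build_nli_pairs(
    "The provider guarantees 99.9% monthly uptime.",
    ["service", "metric", "remedy"],
)
for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

With `multi_label=True`, each pair is scored independently of the others, so the scores of the different labels do not need to sum to 1.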

In [None]:
data_files = {"validation": "validation.csv"}
dataset = load_dataset("csv", data_files=data_files)

dataset

In [62]:
sents_test = dataset['validation'][:]['text']

We apply the model to the set of sentences.

In [63]:
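dic_details = {}
for sentence in sents_test:
    c_like = classification(sentence)
    # Binarize each score with a 0.5 threshold (1 = label assigned)
    results = [1 if x > 0.5 else 0 for x in c_like['scores']]
    # Keyed by sentence text, so repeated sentences share a single entry
    dic_details[sentence] = results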
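
The thresholding step can be sketched on a made-up output. The zero-shot pipeline returns its labels sorted by descending score, so a multi-hot vector is only comparable across sentences if the scores are first mapped back to a fixed label order. The helper `to_multi_hot`, the three-label subset, and the scores below are illustrative assumptions, not results from the model.

```python
def to_multi_hot(output, label_order, threshold=0.5):
    """Turn a pipeline-style output into a 0/1 vector aligned with label_order."""
    score_by_label = dict(zip(output["labels"], output["scores"]))
    return [1 if score_by_label[label] > threshold else 0 for label in label_order]

# Made-up output in the pipeline's layout: labels sorted by descending score.
fake_output = {
    "labels": ["metric", "remedy", "service"],
    "scores": [0.91, 0.40, 0.08],
}
label_order = ["service", "metric", "remedy"]

vector = to_multi_hot(fake_output, label_order)
print(vector)  # [0, 1, 0]
```

Thresholding the scores in the order they are returned would give `[1, 0, 0]` here, which marks the wrong label.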

We get the label names from the validation set.

In [65]:
labels = [label for label in dataset['validation'].features.keys() if label not in ['text', 'obligation', 'right', 'neither']]

def preprocess_data(examples):
    """Collapse the per-label columns of a batch into one multi-hot 'labels' column."""
    text = examples["text"]
    labels_batch = {k: examples[k] for k in examples.keys() if k in labels}
    # One row per sentence, one column per label
    labels_matrix = np.zeros((len(text), len(labels)))
    for idx, label in enumerate(labels):
        labels_matrix[:, idx] = labels_batch[label]

    examples["labels"] = labels_matrix.tolist()

    return examples

In [66]:
cols_to_remove = dataset['validation'].column_names
cols_to_remove.remove('text')
real_results = dataset.map(preprocess_data, batched=True, remove_columns=cols_to_remove)
real_results

Loading cached processed dataset at C:\Users\elena\.cache\huggingface\datasets\csv\default-3efa38e8206a7f29\0.0.0\6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317\cache-536ca14b77fc3b57.arrow


DatasetDict({
    validation: Dataset({
        features: ['text', 'labels'],
        num_rows: 51
    })
})
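
To make the preprocessing step concrete, here is a minimal sketch of the same matrix construction on a made-up batch. It assumes the column-oriented layout that `dataset.map(..., batched=True)` passes to the function (a dict mapping each column name to a list of values); the column names and values are illustrative only.

```python
import numpy as np

# Made-up batch: one list per column, one entry per sentence.
batch = {
    "text": ["first sentence", "second sentence"],
    "service": [1, 0],
    "metric": [0, 1],
    "neither": [0, 0],
}
# Same filter as in the notebook: drop the text and the excluded columns.
labels = [k for k in batch if k not in ["text", "obligation", "right", "neither"]]

# One row per sentence, one column per remaining label.
labels_matrix = np.zeros((len(batch["text"]), len(labels)))
for idx, label in enumerate(labels):
    labels_matrix[:, idx] = batch[label]

print(labels)                  # ['service', 'metric']
print(labels_matrix.tolist())  # [[1.0, 0.0], [0.0, 1.0]]
```

The column order of the matrix follows the order of `labels`, so predictions must use the same label order to be comparable.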

We collect the manually annotated labels and the labels predicted by the model.

In [72]:
results = real_results['validation'][:]['labels']
predictions = list(dic_details.values())
predictions = np.array(predictions).astype(np.float32).tolist()

In [73]:

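# For multilabel input, accuracy_score is the subset (exact-match) accuracy:
# a sentence counts as correct only if all of its labels are predicted correctly.
accuracy = accuracy_score(results, predictions)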
print("Accuracy:", accuracy)

Accuracy: 0.2549019607843137
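
For multilabel inputs, `accuracy_score` reports the exact-match (subset) accuracy, which is a strict metric: one wrong label makes the whole sentence count as an error. The sketch below computes it by hand on made-up data, next to a per-label accuracy, to show how the two can differ. The helper names and the toy matrices are assumptions for illustration and are unrelated to the validation set.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_label_accuracy(y_true, y_pred):
    """Accuracy of each label column taken on its own."""
    n_labels = len(y_true[0])
    return [
        sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / len(y_true)
        for j in range(n_labels)
    ]

# Made-up annotations and predictions: 4 sentences, 3 labels.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(subset_accuracy(y_true, y_pred))     # 0.5
print(per_label_accuracy(y_true, y_pred))  # [1.0, 0.75, 0.75]
```

In this toy case every label is right at least 75% of the time, yet only half of the sentences match exactly.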


This notebook draws on some of the resources available on Hugging Face, for example:

- **BART**: https://huggingface.co/docs/transformers/model_doc/bart
- **NLI-based Zero Shot Text Classification**: https://huggingface.co/facebook/bart-large-mnli