1. Choose a task (other than “text classification”). The “Natural Language Processing”
section at the following link lists the different text-mining tasks that can be
explored:

We chose question answering and tried it with two different models.

In [26]:
from transformers import pipeline

# Load the QA pipeline with an English BERT model fine-tuned on SQuAD
qa_pipeline = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")

# Context and question
contexto = """
El Amazonas es el río más largo del mundo, ubicado en América del Sur. Su longitud supera los 7,000 kilómetros.
"""
pregunta = "¿Dónde está ubicado el Amazonas?"

# Ask the question
respuesta = qa_pipeline(question=pregunta, context=contexto)

# Show the answer
print(f"Pregunta: {pregunta}")
print(f"Respuesta: {respuesta['answer']}")


Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Pregunta: ¿Dónde está ubicado el Amazonas?
Respuesta: el río más largo del mundo, ubicado en América del Sur


In [11]:
from transformers import pipeline

# Load the pipeline with a second model, this one for Spanish
qa_pipeline = pipeline(
    "question-answering",
    model="PlanTL-GOB-ES/roberta-base-bne-sqac",
    tokenizer="PlanTL-GOB-ES/roberta-base-bne-sqac",
    framework="pt"
)

contexto = "El Amazonas es el río más largo del mundo, ubicado en América del Sur."
pregunta = "¿Dónde está ubicado el Amazonas?"

respuesta = qa_pipeline(question=pregunta, context=contexto)
print(f"Pregunta: {pregunta}")
print(f"Respuesta: {respuesta['answer']}")


Pregunta: ¿Dónde está ubicado el Amazonas?
Respuesta: América del Sur
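
To make the difference between the two answers concrete, the sketch below locates each answer string inside the context and reports its character span and whether it exactly matches the expected answer. The answer strings are copied from the outputs above (both contexts share this sentence). The `expected` value and the `locate` helper are illustrative names, not part of the notebook.

```python
# Context from the second cell and the answers printed by each model above
contexto = "El Amazonas es el río más largo del mundo, ubicado en América del Sur."
answers = {
    "bert-large-uncased-whole-word-masking-finetuned-squad":
        "el río más largo del mundo, ubicado en América del Sur",
    "PlanTL-GOB-ES/roberta-base-bne-sqac": "América del Sur",
}
expected = "América del Sur"  # assumed gold answer for this question


def locate(context, answer):
    """Return the (start, end) character span of answer in context, or None."""
    start = context.find(answer)
    if start == -1:
        return None
    return start, start + len(answer)


# Character span of each model's answer inside the context
spans = {model: locate(contexto, answer) for model, answer in answers.items()}

for model, answer in answers.items():
    exact = answer == expected
    print(f"{model}: span={spans[model]}, exact_match={exact}")
```

Both spans end at the same character. The BERT span simply starts earlier, which is why its answer is partially rather than fully correct.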


4. Evaluate on the chosen dataset and compare the models.

In [None]:
%pip install transformers datasets evaluate
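
The evaluation cell that follows hard-codes `answer_start` as 54. Once more examples are added, deriving the offset from the context is less error-prone than counting characters by hand. The helper below is a minimal sketch: `make_example` is a hypothetical name, and it assumes the answer appears verbatim in the context.

```python
def make_example(context, question, answer):
    """Build one SQuAD-style record, deriving answer_start from the context."""
    start = context.find(answer)
    if start == -1:
        raise ValueError(f"Answer {answer!r} not found in context")
    return {
        "context": context,
        "question": question,
        "answers": {"text": [answer], "answer_start": [start]},
    }


example = make_example(
    "El Amazonas es el río más largo del mundo, ubicado en América del Sur.",
    "¿Dónde está ubicado el Amazonas?",
    "América del Sur",
)
print(example["answers"])
```

For the example sentence this should give the same offset as the hard-coded value. In recent versions of `datasets`, a list of such records can be passed to `Dataset.from_list`.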

In [28]:
import pandas as pd
from evaluate import load
from datasets import Dataset
from transformers import pipeline

# Load the metrics
metric_squad = load("squad_v2")  # Use "squad" if no unanswerable questions are expected
metric_bleu = load("bleu")  # BLEU metric

# Example data
data = Dataset.from_dict({
    "context": [
        "El Amazonas es el río más largo del mundo, ubicado en América del Sur."
    ],
    "question": [
        "¿Dónde está ubicado el Amazonas?"
    ],
    "answers": [
        {"text": ["América del Sur"], "answer_start": [54]}
    ]
})

# Define the models to compare
models = {
    "PlanTL-GOB-ES/roberta-base-bne-sqac": pipeline("question-answering", model="PlanTL-GOB-ES/roberta-base-bne-sqac"),
    "bert-large-uncased-whole-word-masking-finetuned-squad": pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad"),
}

# List that collects each model's metrics
metrics_data = []

# Evaluate the models
for model_name, qa_pipeline in models.items():
    predictions = []
    references = []

    for example in data:
        # Run the prediction
        prediction = qa_pipeline(
            question=example["question"],
            context=example["context"]
        )["answer"]

        # Add predictions and references in the format the metric expects
        predictions.append({
            "id": example["question"],  # The full question text serves as the ID
            "prediction_text": prediction,
            "no_answer_probability": 0.0  # Adjust for unanswerable questions
        })
        references.append({
            "id": example["question"],
            "answers": [{"text": example["answers"]["text"][0], "answer_start": example["answers"]["answer_start"][0]}]
        })

    # Compute the SQuAD metrics (F1 score)
    squad_results = metric_squad.compute(predictions=predictions, references=references)
    f1_score = squad_results["f1"]

    # Compute the BLEU score. With the default 4-gram order, a three-token
    # reference such as "América del Sur" always scores 0.
    bleu_results = metric_bleu.compute(predictions=[pred["prediction_text"] for pred in predictions], references=[[ref["answers"][0]["text"]] for ref in references])
    bleu_score = bleu_results["bleu"]

    # Store the metrics in the list
    metrics_data.append({
        "Modelo": model_name,
        "F1 Score": f1_score,
        "BLEU Score": bleu_score
    })

# Build a pandas DataFrame to display the table
df_metrics = pd.DataFrame(metrics_data)

# Show the metrics table
print(df_metrics)



                                              Modelo    F1 Score  BLEU Score
0                PlanTL-GOB-ES/roberta-base-bne-sqac  100.000000         0.0
1  bert-large-uncased-whole-word-masking-finetune...   42.857143         0.0
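
The numbers in the table can be reproduced by hand. The sketch below is a simplified reimplementation, not the code inside `evaluate`. `token_f1` follows the SQuAD-style token-overlap F1: it lowercases and strips ASCII punctuation, but skips the English-article removal, which does not affect these Spanish strings. `ngram_count` shows why BLEU with its default maximum order of 4 cannot be positive for a three-token reference. All function names are illustrative.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return text.split()


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred, ref = normalize(prediction), normalize(reference)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def ngram_count(tokens, n):
    """Number of n-grams a token list contains."""
    return max(len(tokens) - n + 1, 0)


reference = "América del Sur"
long_answer = "el río más largo del mundo, ubicado en América del Sur"

print(round(100 * token_f1(reference, reference), 2))    # 100.0
print(round(100 * token_f1(long_answer, reference), 2))  # 42.86
print(ngram_count(normalize(reference), 4))              # 0
```

The long answer contains all three reference tokens among its eleven, giving precision 3/11 and recall 1, hence F1 = 6/14 ≈ 42.86. Because the reference has no 4-grams, the 4-gram precision is zero, and the geometric mean that BLEU takes over orders 1 to 4 collapses to 0 for both models. Passing a smaller `max_order` to the BLEU metric's `compute` call would be one way to obtain a non-trivial score for answers this short.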


In [23]:
%pip install gradio

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lightning 2.0.0 requires fastapi<0.89.0, but you have fastapi 0.115.6 which is incompatible.


In [None]:
import gradio as gr
from transformers import pipeline

# Load the QA pipeline with the chosen model
qa_pipeline = pipeline(
    "question-answering",
    model="PlanTL-GOB-ES/roberta-base-bne-sqac",
    tokenizer="PlanTL-GOB-ES/roberta-base-bne-sqac",
    framework="pt"
)

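# Prediction function
def responder_pregunta(contexto, pregunta):
    # live=True calls this on every edit; the pipeline raises on empty input, so return early
    if not (contexto and contexto.strip() and pregunta and pregunta.strip()):
        return ""
    respuesta = qa_pipeline(question=pregunta, context=contexto)
    return respuesta['answer']

# Create the Gradio interface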
interface = gr.Interface(
    fn=responder_pregunta, 
    inputs=[
        gr.Textbox(label="Contexto", placeholder="Introduce el contexto aquí...", lines=5),
        gr.Textbox(label="Pregunta", placeholder="Introduce la pregunta aquí...")
    ], 
    outputs=gr.Textbox(label="Respuesta"),
    live=True
)

# Launch the interface (share=True also creates a temporary public link)
interface.launch(share=True)