# **Question & Answer**

A question-answering (QA) Transformer model is useful across many areas of Natural Language Processing (NLP). Reasons to build one include:


1. Question-Answering Systems: QA Transformers are designed to provide accurate and contextually relevant answers to questions posed in natural language. These systems have a wide range of practical applications, including chatbots, virtual assistants, customer support, and information retrieval.
2. Information Retrieval: QA Transformers can be used to search through large corpora of text and extract precise answers to user queries. This can improve the efficiency and effectiveness of information retrieval systems.
3. Document Summarization: QA Transformers can be used to summarize long documents by answering questions about the document's content. This makes it easier for users to quickly understand the key points and relevant information in a text.

4. Education and E-Learning: QA Transformers can be integrated into educational platforms to provide instant answers and explanations to students' questions. They can also help with the automatic generation of quiz questions and answers.

5. Content Generation: QA Transformers can assist in content generation by automatically answering questions based on available knowledge. This can be useful for generating FAQs, product descriptions, and informative articles.

6. Customer Support: Many companies use QA systems to automate responses to frequently asked questions, freeing up human agents to handle more complex queries and providing customers with quick solutions.

7. Medical Diagnosis: QA Transformers can assist medical professionals by answering questions related to patient records, medical literature, and diagnostic information, potentially leading to faster and more accurate diagnoses.

8. Legal and Compliance: In the legal field, QA Transformers can be used to search and extract information from legal documents, assisting lawyers in their research and case preparation.

9. Language Translation: QA Transformers can answer questions about the meaning of words, phrases, or sentences in other languages.

10. Scientific Research: QA Transformers can support researchers by answering questions related to scientific literature, allowing them to quickly access relevant information for their studies.

11. Decision Support: QA Transformers can aid in decision-making processes by providing answers to questions related to data analysis, market research, and business intelligence.

12. Accessibility: QA Transformers can improve accessibility for individuals with disabilities by providing spoken or written answers to their questions, helping them access information more easily.

Overall, QA Transformers have the potential to enhance information retrieval, automation, and user interaction in various domains, making them a valuable tool in the development of intelligent systems and applications. The ability to provide accurate and context-aware answers to questions in natural language is a key advantage of these models.
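
To make "extracting an answer from context" concrete, here is a minimal sketch of how an extractive QA model typically chooses its answer: it scores every token as a possible start and as a possible end of the answer, then returns the highest-scoring valid span. The scores below are made-up illustrative numbers, not the output of a real model, and `best_span` is a hypothetical helper.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    best = None
    for i, start_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = start_score + end_scores[j]
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


tokens = ["Hinton", "joined", "Google", "in", "2013"]
# Made-up scores for the question "When did Hinton join Google?"
start_scores = [0.1, 0.0, 0.2, 0.3, 4.0]
end_scores = [0.0, 0.1, 0.5, 0.2, 3.5]

start, end, score = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

In a real model the two score lists come from the network's output layer, and the selected token span is mapped back to characters in the original context.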




Exercise:

Now, as a data scientist with NLP expertise, you are asked to build a model that can answer questions in Spanish. Your stakeholders will give you an article and a question, and your model should answer it.

In [1]:

!pip install requests beautifulsoup4



In [2]:
import requests
from bs4 import BeautifulSoup

# URL of the article
url = "https://time.com/collection/time100-ai/6309026/geoffrey-hinton/"

# Send an HTTP request to fetch the page content
response = requests.get(url)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the page HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")

    # Locate the article body (inspect the page HTML to find the right container)
    article_content = soup.find("div", {"class": "article-content"})
    if article_content is None:
        raise ValueError("Article container not found; check the page structure")

    # Extract the article text, one paragraph per line
    article_text = ""
    for paragraph in article_content.find_all("p"):
        article_text += paragraph.get_text() + "\n"

    # Print the article text
    print(article_text)
else:
    print("Error fetching the page:", response.status_code)

Over the course of February, Geoffrey Hinton, one of the most influential AI researchers of the past 50 years, had a “slow eureka moment.”
Hinton, 76, has spent his career trying to build AI systems that model the human brain, mostly in academia before joining Google in 2013. He had always believed that the brain was better than the machines that he and others were building, and that by making them more like the brain, they would improve. But in February, he realized “the digital intelligence we’ve got now may be better than the brain already. It’s just not scaled up quite as big.” 
Developers around the world are currently racing to build the biggest AI systems that they can. Given the current rate at which AI companies are increasing the size of models, it could be less than five years until AI systems have 100 trillion connections—roughly as many as there are between neurons in the human brain.
Alarmed, Hinton left his post as VP and engineering fellow in May and gave a flurry of in

In [3]:

question = "Who is Geoffrey Hinton?"

In [7]:

#hide_output
from transformers import pipeline

classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.





In [17]:
from transformers import pipeline, AutoTokenizer
import pandas as pd


# Tokenizer used to split the text into chunks of at most 512 tokens
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def split_text(text, max_length=512):
    words = text.split()
    chunks = []
    current_chunk = []

    for word in words:
        current_chunk.append(word)
        if len(tokenizer(' '.join(current_chunk))['input_ids']) > max_length:
            chunks.append(' '.join(current_chunk[:-1]))
            current_chunk = [word]

    if current_chunk:
        chunks.append(' '.join(current_chunk))

    return chunks

# Split the article into manageable chunks
text_chunks = split_text(article_text)

# NER - Named Entity Recognition
ner_tagger = pipeline("ner", aggregation_strategy="simple")
ner_outputs = []

for chunk in text_chunks:
    ner_outputs.extend(ner_tagger(chunk))

df_ner = pd.DataFrame(ner_outputs)
print("NER Output:")
print(df_ner)


# Summarization
summarizer = pipeline("summarization")
summary_outputs = []

for chunk in text_chunks:
    summary_result = summarizer(chunk, max_length=130, min_length=30, clean_up_tokenization_spaces=True)
    summary_outputs.append(summary_result[0]['summary_text'])

print("\nSummary Output:")
for summary in summary_outputs:
    print(summary)

# Translate the summary into Spanish
translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
translation_outputs = []

for summary in summary_outputs:
    translation_result = translator(summary, clean_up_tokenization_spaces=True)
    translation_outputs.append(translation_result[0]['translation_text'])

print("\nTranslation Output (English to Spanish):")
for translation in translation_outputs:
    print(translation)

Token indices sequence length is longer than the specified maximum sequence length for this model (513 > 512). Running this sequence through the model will result in indexing errors
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoin

NER Output:
   entity_group     score             word  start   end
0           PER  0.999532  Geoffrey Hinton     29    44
1           PER  0.998838         ” Hinton    137   145
2          MISC  0.713827               AI    188   190
3           ORG  0.998871           Google    261   267
4           ORG  0.540267               AI    717   719
..          ...       ...              ...    ...   ...
68          LOC  0.998040         Nagasaki   1153  1161
69          LOC  0.999680            China   1213  1218
70         MISC  0.928474           Maoist   1299  1305
71          PER  0.998677           Hinton   1307  1313
72          PER  0.974817    Will Henshall   1442  1455

[73 rows x 5 columns]





Summary Output:
 Geoffrey Hinton is one of the most influential AI researchers of the past 50 years. Hinton, 76, has spent his career trying to build AI systems that model the human brain. He worries about what could happen once AI systems are scaled up to the size of human brains.
 Hinton has been instrumental in the development and popularization of neural networks, the dominant AI development paradigm that has allowed huge amounts of data to be ingested and processed, leading to advances in image recognition, language understanding and self-driving cars. Hinton does not know how to prevent superhuman AI systems from taking over.
 Hinton declined Google’s offer to take such a role at the company. Instead, he has spent the past few months sounding the alarm. Hinton has spoken with policymakers, including officials in the U.K., Canada and the European Commission.





Translation Output (English to Spanish):
Geoffrey Hinton es uno de los investigadores de IA más influyentes de los últimos 50 años. Hinton, de 76 años, ha pasado su carrera tratando de construir sistemas de IA que modelen el cerebro humano. Se preocupa por lo que podría suceder una vez que los sistemas de IA se escalan hasta el tamaño de los cerebros humanos.
Hinton ha sido instrumental en el desarrollo y popularización de redes neuronales, el paradigma dominante de desarrollo de IA que ha permitido ingerir y procesar enormes cantidades de datos, lo que ha llevado a avances en el reconocimiento de imágenes, comprensión del lenguaje y autos autoconductores. Hinton no sabe cómo evitar que los sistemas de IA sobrehumanos se hagan cargo.
Hinton declinó la oferta de Google de asumir tal papel en la empresa. En cambio, ha pasado los últimos meses haciendo sonar la alarma. Hinton ha hablado con los responsables políticos, incluyendo funcionarios en el Reino Unido, Canadá y la Comisión Europe
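
The `split_text` helper above re-tokenizes the whole growing chunk for every word, and an answer that straddles a chunk boundary can be cut in two. The following is a minimal sketch of an alternative, not the notebook's method. It assumes a hypothetical `count_tokens` callable (a one-token-per-word count stands in for a real tokenizer here) and a hypothetical `overlap` parameter that repeats the last few words of each chunk at the start of the next.

```python
def split_with_overlap(text, max_tokens=512, overlap=2, count_tokens=None):
    """Split text into word chunks within a token budget, with overlapping edges."""
    if count_tokens is None:
        # Stand-in assumption: one token per whitespace-separated word
        count_tokens = lambda word: 1
    words = text.split()
    chunks, current, used = [], [], 0
    for word in words:
        cost = count_tokens(word)
        if current and used + cost > max_tokens:
            chunks.append(" ".join(current))
            # Carry the last `overlap` words into the next chunk
            current = current[-overlap:] if overlap > 0 else []
            used = sum(count_tokens(w) for w in current)
        current.append(word)
        used += cost
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = split_with_overlap("a b c d e f g", max_tokens=3, overlap=1)
print(chunks)
```

With a Hugging Face tokenizer, one option would be `count_tokens=lambda w: len(tokenizer.tokenize(w))` together with a budget slightly below the model limit, since summing per-word counts ignores special tokens and is only an approximation of the length of the joined text.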

In [19]:
from transformers import pipeline

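# Question in English
question = "Who is Geoffrey Hinton?"

# Extractive QA pipeline; no model is specified here, so the library default is used
reader = pipeline("question-answering")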
outputs = []

for chunk in text_chunks:
    result = reader(question=question, context=chunk)
    outputs.append(result)

# Take the best answer (the one with the highest score)
best_answer = max(outputs, key=lambda x: x['score'])['answer']
print(f"Best answer in English: {best_answer}")

# Translate the answer into Spanish
translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
translated_answer = translator(best_answer, clean_up_tokenization_spaces=True)

print("\nMejor Respuesta en Español:")
print(translated_answer[0]['translation_text'])


Best answer in English: one of the most influential AI researchers of the past 50 years

Mejor Respuesta en Español:
uno de los investigadores de IA más influyentes de los últimos 50 años
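
Taking the maximum score always returns some answer, even when no chunk actually contains one. The following is a minimal sketch of a guard, assuming the per-chunk results are dictionaries with `score` and `answer` keys as in the loop above. `pick_answer` and its `min_score` threshold are hypothetical, and the value 0.3 is an arbitrary placeholder that would need tuning.

```python
def pick_answer(results, min_score=0.3):
    """Return the highest-scoring answer, or None if nothing clears the threshold."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["answer"] if best["score"] >= min_score else None


# Illustrative, made-up results standing in for the per-chunk reader outputs
sample = [
    {"score": 0.62, "answer": "one of the most influential AI researchers"},
    {"score": 0.05, "answer": "Google"},
]
print(pick_answer(sample))
print(pick_answer([{"score": 0.05, "answer": "Google"}]))
```

Returning `None` lets the calling code show a "no answer found" message instead of translating a low-confidence guess.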
