## Question Answering using Transformers

### Installation

In [1]:
!pip install transformers



In [2]:
!pip install datasets



## Approach-1: Using custom context

### Import required modules

In [3]:
from transformers import pipeline
import warnings
warnings.filterwarnings("ignore")


In [4]:
# Load the Q&A pipeline
qa_pipeline = pipeline("question-answering")  # defaults to the cased DistilBERT model distilled on SQuAD

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
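The log message above recommends naming the model and revision explicitly. A minimal sketch of that, assuming the same model and revision the log reports; `load_qa_pipeline` and the two constants are hypothetical names introduced here for illustration:

```python
# Model name and revision are copied from the pipeline's log message above.
QA_MODEL = "distilbert/distilbert-base-cased-distilled-squad"
QA_REVISION = "564e9b5"


def load_qa_pipeline(model=QA_MODEL, revision=QA_REVISION):
    """Build a question-answering pipeline pinned to one model revision."""
    # Imported lazily so that defining this helper does not load the library.
    from transformers import pipeline

    return pipeline("question-answering", model=model, revision=revision)


# Usage (downloads the model on first call):
# qa_pipeline = load_qa_pipeline()
```

Pinning both values should keep results reproducible if the default model for the task changes in a later release.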


> #### BERT is covered in depth in Module 5

In [5]:
# Example context
context = """
Hypertension, also known as high blood pressure, is a common condition in which the force of the blood against the artery walls is too high. 
It is often caused by factors such as poor diet, lack of exercise, and genetic predisposition. If left untreated, 
hypertension can lead to serious health issues, including heart disease, stroke, and kidney problems. A healthy lifestyle 
that includes regular exercise, a balanced diet low in sodium, and stress management can help control blood pressure levels. 
Common medications for managing hypertension include ACE inhibitors, beta-blockers, and diuretics. 
Regular check-ups and monitoring are crucial for individuals diagnosed with this condition.
"""

In [6]:
# Questions to ask about the context
question1 = "What are the common causes of hypertension?"
question2 = "What health issues can untreated hypertension lead to?"
question3 = "What are some medications used to manage hypertension?"

In [7]:
# Return only the answer text from the pipeline's result
def answer_question(context, question):
    """Run the QA pipeline and return the extracted answer string."""
    answer = qa_pipeline(context=context, question=question)
    return answer["answer"]
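The pipeline's result is a dict that carries a confidence `score` alongside the `answer` text. As a rough sketch, a variant of the helper could keep the score and withhold low-confidence answers. The name `answer_with_score` and the `min_score` default are assumptions made for illustration, not recommended values:

```python
def answer_with_score(result, min_score=0.5):
    """Return (answer, score), or (None, score) when the score is too low.

    `result` is a dict in the question-answering pipeline's output format:
    {"score": ..., "start": ..., "end": ..., "answer": ...}.
    `min_score` is an illustrative threshold only.
    """
    score = float(result["score"])
    if score < min_score:
        return None, score
    return result["answer"], score


# Hand-written result in the pipeline's output format, for illustration only.
sample_result = {"score": 0.73, "start": 551, "end": 579,
                 "answer": "At the end of the main drive"}
print(answer_with_score(sample_result))        # score clears the default threshold
print(answer_with_score(sample_result, 0.9))   # score is below 0.9, answer withheld
```

With a loaded pipeline, the same helper could be called as `answer_with_score(qa_pipeline(context=context, question=question))`.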

In [8]:
# Get the answer
answer1 = answer_question(context, question1)
answer2 = answer_question(context, question2)
answer3 = answer_question(context, question3)

In [9]:
print(f"Question:{question1}")
print(f"Answer:{answer1}")

Question:What are the common causes of hypertension?
Answer:poor diet, lack of exercise, and genetic predisposition


In [10]:
print(f"Question:{question2}")
print(f"Answer:{answer2}")

Question:What health issues can untreated hypertension lead to?
Answer:heart disease, stroke, and kidney problems


In [11]:
print(f"Question:{question3}")
print(f"Answer:{answer3}")


Question:What are some medications used to manage hypertension?
Answer:ACE inhibitors, beta-blockers, and diuretics
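The three question/answer cells above repeat the same steps, so they could be folded into one loop. The sketch below assumes only that the QA object is callable with `question=` and `context=` and returns a dict with an `"answer"` key. `answer_all` and `fake_qa` are hypothetical names; `fake_qa` is a stand-in so the example runs without downloading a model:

```python
def answer_all(qa, context, questions):
    """Map each question to the answer text extracted from `context`.

    `qa` is any callable with the pipeline's call signature:
    qa(question=..., context=...) -> {"answer": ...}.
    """
    return {q: qa(question=q, context=context)["answer"] for q in questions}


def fake_qa(question, context):
    """Stand-in for qa_pipeline so the sketch runs without a model."""
    # Not a real model: it simply returns the first sentence of the context.
    return {"answer": context.strip().split(".")[0]}


demo_answers = answer_all(
    fake_qa,
    "Hypertension is high blood pressure. It is common.",
    ["What is hypertension?"],
)
for q, a in demo_answers.items():
    print(f"Question: {q}")
    print(f"Answer: {a}")
```

In the notebook itself, the call would presumably be `answer_all(qa_pipeline, context, [question1, question2, question3])`.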


## Approach-2: Using datasets from HuggingFace

In [12]:
import datasets

### Load the dataset using `load_dataset`

We will use the **squad** dataset, which contains context, question, and answer features.

In [13]:
dataset = datasets.load_dataset("squad")

### Visualize the dataset

It is split into two parts:
- Training dataset
- Validation dataset

In [14]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})

The dataset has 5 columns: id, title, context, question, and answers. The training split has 87,599 rows and the validation split has 10,570 rows. Each row is one question together with its context and reference answers.
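Because each row pairs one question with a context, the same context can appear in several rows. A small sketch of grouping questions by their shared context, using two abbreviated rows in the SQuAD layout; `questions_by_context` is a hypothetical helper and the context text is shortened here:

```python
from collections import defaultdict


def questions_by_context(rows):
    """Group question strings under the context they refer to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["context"]].append(row["question"])
    return dict(grouped)


# Two abbreviated rows in the SQuAD layout; the context text is shortened.
sample_rows = [
    {"context": "Architecturally, the school has a Catholic character.",
     "question": "What is in front of the Notre Dame Main Building?"},
    {"context": "Architecturally, the school has a Catholic character.",
     "question": "What is the Grotto at Notre Dame?"},
]
grouped = questions_by_context(sample_rows)
print(len(sample_rows), "rows share", len(grouped), "unique context")
```

Iterating over a loaded split yields one dict per row, so `questions_by_context(dataset["train"])` should work the same way on the real data.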

In [15]:
dataset['train']['title'][:10]

['University_of_Notre_Dame',
 'University_of_Notre_Dame',
 'University_of_Notre_Dame',
 'University_of_Notre_Dame',
 'University_of_Notre_Dame',
 'University_of_Notre_Dame',
 'University_of_Notre_Dame',
 'University_of_Notre_Dame',
 'University_of_Notre_Dame',
 'University_of_Notre_Dame']

In [16]:
dataset['train']['question'][:10]

['To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
 'What is in front of the Notre Dame Main Building?',
 'The Basilica of the Sacred heart at Notre Dame is beside to which structure?',
 'What is the Grotto at Notre Dame?',
 'What sits on top of the Main Building at Notre Dame?',
 'When did the Scholastic Magazine of Notre dame begin publishing?',
 "How often is Notre Dame's the Juggler published?",
 'What is the daily student paper at Notre Dame called?',
 'How many student news papers are found at Notre Dame?',
 'In what year did the student paper Common Sense begin publication at Notre Dame?']

In [17]:
dataset['train']['answers'][:10]

[{'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]},
 {'text': ['a copper statue of Christ'], 'answer_start': [188]},
 {'text': ['the Main Building'], 'answer_start': [279]},
 {'text': ['a Marian place of prayer and reflection'], 'answer_start': [381]},
 {'text': ['a golden statue of the Virgin Mary'], 'answer_start': [92]},
 {'text': ['September 1876'], 'answer_start': [248]},
 {'text': ['twice'], 'answer_start': [441]},
 {'text': ['The Observer'], 'answer_start': [598]},
 {'text': ['three'], 'answer_start': [126]},
 {'text': ['1987'], 'answer_start': [908]}]
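Each `answers` entry stores the answer text together with `answer_start`, the character offset of that text inside the row's context. The sketch below checks this on the opening sentences of the Notre Dame context, using two of the offsets printed above; `answer_span_matches` is a hypothetical helper name:

```python
def answer_span_matches(context, answers):
    """Check that each answer text sits at its `answer_start` character offset."""
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


# Opening sentences of the Notre Dame context, enough to cover both offsets.
context_prefix = (
    "Architecturally, the school has a Catholic character. "
    "Atop the Main Building's gold dome is a golden statue of the Virgin Mary. "
    "Immediately in front of the Main Building and facing it, "
    "is a copper statue of Christ with arms upraised"
)
golden = {"text": ["a golden statue of the Virgin Mary"], "answer_start": [92]}
copper = {"text": ["a copper statue of Christ"], "answer_start": [188]}
print(answer_span_matches(context_prefix, golden))
print(answer_span_matches(context_prefix, copper))
```

Both values are lists because a question can have more than one reference answer.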

In [18]:
dataset['train']['context'][:10]

['Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building

In [19]:
context = dataset["train"][0]["context"]
question = "Where is the modern stone statue of Mary"

In [20]:
answer = qa_pipeline(question=question, context=context)

In [21]:
print(context)

Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.


In [22]:
print(f"Question:{question}")

Question:Where is the modern stone statue of Mary


In [23]:
print(f"Answer:{answer}")

Answer:{'score': 0.731683611869812, 'start': 551, 'end': 579, 'answer': 'At the end of the main drive'}
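The printed result is the full dict, not just the answer text: `score` is the model's confidence, and `start` and `end` are character offsets into the context. A short sketch that reuses the context and the result printed above to show that slicing with those offsets recovers the answer (`context_text` and `result` are names chosen here for illustration):

```python
# The context printed above, split across lines for readability.
context_text = (
    "Architecturally, the school has a Catholic character. "
    "Atop the Main Building's gold dome is a golden statue of the Virgin Mary. "
    "Immediately in front of the Main Building and facing it, is a copper "
    'statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". '
    "Next to the Main Building is the Basilica of the Sacred Heart. "
    "Immediately behind the basilica is the Grotto, a Marian place of prayer "
    "and reflection. It is a replica of the grotto at Lourdes, France where "
    "the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. "
    "At the end of the main drive (and in a direct line that connects through "
    "3 statues and the Gold Dome), is a simple, modern stone statue of Mary."
)

# The pipeline result printed above.
result = {"score": 0.731683611869812, "start": 551, "end": 579,
          "answer": "At the end of the main drive"}

# Slicing the context with the reported offsets gives back the answer text.
span = context_text[result["start"]:result["end"]]
print(span == result["answer"])

# Print only the answer text and a rounded score instead of the raw dict.
print(f"Answer: {result['answer']} (score {result['score']:.2f})")
```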
