# Question Answering System


A question answering (QA) system is a program that takes a natural language question from a user and answers it from a knowledge base or a collection of documents. Its primary goal is to return accurate, relevant answers, much as a human expert would.


A typical QA system processes a question in the following stages:

Question Understanding: The system first analyzes the user's question. This step uses natural language processing (NLP) techniques to parse the question's structure, extract important keywords, and determine the question type (e.g., fact-based, opinion-based, or descriptive).

Information Retrieval: Once the question is understood, the system searches for relevant information in its knowledge base or a corpus of documents. This process involves indexing and retrieving documents or passages that might contain the answer.

Text Extraction: The system extracts relevant text snippets or passages from the retrieved documents. These passages are potential candidates for containing the answer.

Answer Extraction: Using NLP techniques, the system analyzes the extracted passages to identify the specific answer to the user's question. This can involve methods such as named entity recognition, matching the expected answer type, and span prediction with a reading-comprehension model.

Answer Ranking: If multiple candidate answers are found, the system ranks them by relevance and confidence score so that the most likely answer is presented first.

Answer Presentation: Finally, the system presents the answer to the user in a human-readable format, such as plain text, a structured response, or a summarized form. Additionally, it may provide context or sources to support the answer's credibility.
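
The stages above can be sketched in a few lines of plain Python. This is a toy illustration, not how a production system works: the three-document corpus, the stopword list, and the helper names (`question_type`, `split_sentences`, `rank_passages`) are all assumptions made for this example, and simple keyword overlap stands in for real retrieval and ranking models.

```python
import re

# Hypothetical mini-corpus standing in for a knowledge base.
DOCUMENTS = [
    "The Mona Lisa was painted by Leonardo da Vinci. It hangs in the Louvre Museum in Paris.",
    "The Eiffel Tower was completed in 1889. It stands at a height of 324 meters.",
    "The Amazon Rainforest is located in South America. It covers about 5.5 million square kilometers.",
]

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "is", "a", "an", "of", "in", "was", "what", "who",
             "where", "when", "it", "by", "at"}


def tokenize(text):
    """Lowercase the text and keep only content words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def question_type(question):
    """Question understanding: guess the expected answer type from the question word."""
    first = question.lower().split()[0]
    return {"who": "PERSON", "where": "LOCATION", "when": "DATE"}.get(first, "OTHER")


def split_sentences(document):
    """Text extraction: split a document into candidate passages (sentences)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def rank_passages(question, documents):
    """Retrieval and ranking: score each sentence by keyword overlap with the question."""
    q_terms = set(tokenize(question))
    scored = []
    for doc in documents:
        for sentence in split_sentences(doc):
            overlap = len(q_terms & set(tokenize(sentence)))
            if overlap:
                scored.append((overlap, sentence))
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


question = "Who painted the Mona Lisa?"
ranked = rank_passages(question, DOCUMENTS)
print(question_type(question))
print(ranked[0][1])
```

In a fuller system, the top-ranked passage would then be handed to an answer-extraction model, such as the BERT reader used later in this notebook.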

There are different types of Question Answering Systems:

Factoid QA: These systems answer factual questions with specific pieces of information. For example, "What is the capital of France?" would have a straightforward factual answer.

Non-Factoid QA: These systems handle questions whose answers are explanations, descriptions, or opinions rather than a single fact, and that often require reasoning. For example, "Why is climate change a global concern?" calls for a longer, multi-sentence response.

Document-based QA: These systems use a collection of documents as their knowledge base and extract answers from those documents.

Knowledge-based QA: In this case, the system relies on structured knowledge bases, ontologies, or databases to provide answers.

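Open-domain QA: These systems aim to answer questions on a wide range of topics. They are not restricted to a specific domain and typically draw on large, general-purpose document collections rather than a curated, domain-specific knowledge base.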
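
To contrast knowledge-based QA with document-based QA, here is a minimal sketch that answers questions from a small table of structured facts instead of free text. The facts, the question templates, and the function name `answer_from_kb` are hypothetical; real systems typically translate questions into queries against a much larger knowledge graph or database.

```python
import re

# Hypothetical knowledge base: (subject, relation) -> object.
KNOWLEDGE_BASE = {
    ("france", "capital"): "Paris",
    ("germany", "capital"): "Berlin",
    ("mona lisa", "painter"): "Leonardo da Vinci",
}

# Hypothetical templates mapping a question phrasing to a relation.
TEMPLATES = [
    (re.compile(r"what is the capital of (.+?)\?*$", re.IGNORECASE), "capital"),
    (re.compile(r"who painted (?:the )?(.+?)\?*$", re.IGNORECASE), "painter"),
]


def answer_from_kb(question):
    """Match the question against known templates and look up the fact."""
    for pattern, relation in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            subject = match.group(1).strip().lower()
            # Returns None when the subject is not in the knowledge base.
            return KNOWLEDGE_BASE.get((subject, relation))
    return None  # No template matched, so the question is out of scope.


print(answer_from_kb("What is the capital of France?"))
print(answer_from_kb("Who painted the Mona Lisa?"))
```

A lookup like this is precise for the facts it covers but cannot answer anything outside its templates, which is one reason factoid questions suit it better than non-factoid ones.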

QA systems have a wide range of applications, including customer support chatbots, virtual assistants, search engines (for featured snippets), medical diagnosis, legal research, and more. Building effective QA systems often involves advanced NLP techniques, machine learning models, and deep learning architectures such as **transformers** (for example, **BERT** and **GPT**), which can capture context and nuance in human language to provide accurate answers.

Building a QA system from scratch is complex: a full-fledged system that understands and answers questions over a knowledge base or document collection requires significant effort and resources. The simplified example below instead uses the Hugging Face **Transformers library** with a pre-trained **BERT** model to perform extractive QA.


The code below uses the pipeline function from the Transformers library to create a question-answering pipeline. We specify a pre-trained QA model, in this case "bert-large-uncased-whole-word-masking-finetuned-squad". Other QA models from the Hugging Face Hub can be substituted.

Set the context variable to text from your knowledge base or documents, and set the question variable to the question you want to ask. The model then returns the span of the context that best answers the question.

Real-world QA systems often involve more components, including document retrieval, data preprocessing, and answer ranking. Depending on your use case and requirements, you may need to build a more sophisticated QA system.
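
As one example of such preprocessing, long documents are often split into overlapping chunks, each chunk is passed to the reader, and the highest-scoring answer is kept. The sketch below assumes a reader that is called with `question` and `context` keywords and returns a dictionary with `answer` and `score`, as the pipeline in this notebook does. The names `chunk_words` and `best_answer`, the chunk sizes, and the `stand_in_reader` placeholder are assumptions for illustration; the placeholder is not a real model.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into overlapping windows of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # This window already reaches the end of the text.
    return chunks


def best_answer(qa, question, chunks):
    """Run the reader on every chunk and keep the highest-scoring result."""
    results = [qa(question=question, context=chunk) for chunk in chunks]
    return max(results, key=lambda r: r["score"]) if results else None


def stand_in_reader(question, context):
    """Placeholder reader so the sketch runs offline; it only spots one phrase."""
    found = "324 meters" in context
    return {"answer": "324 meters" if found else "", "score": 0.9 if found else 0.0}


document = (
    "The Eiffel Tower is a famous landmark in Paris, France. "
    "It was designed by Gustave Eiffel and was completed in 1889. "
    "The tower is made of iron and stands at a height of 324 meters."
)
chunks = chunk_words(document, chunk_size=12, overlap=4)
result = best_answer(stand_in_reader, "What is the height of the Eiffel Tower?", chunks)
print(result["answer"])
```

The overlap exists so that an answer falling on a chunk boundary still appears whole in at least one chunk. In the notebook, `qa_pipeline` could be passed in place of `stand_in_reader`.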

Additionally, you can **fine-tune pre-trained models on your specific dataset to improve their performance** on domain-specific questions and documents.
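
Fine-tuning an extractive QA model needs training examples that pair each question with the answer text and its character position in the context. The sketch below builds one such record, assuming a SQuAD-style layout (`context`, `question`, and `answers` with `text` and `answer_start` lists). The helper name `make_squad_example` and the example ID are hypothetical, and the training loop itself is out of scope here.

```python
def make_squad_example(example_id, context, question, answer_text):
    """Build one SQuAD-style training record for extractive QA."""
    start = context.find(answer_text)
    if start == -1:
        # Extractive models can only learn answers that appear in the context.
        raise ValueError("answer_text must appear verbatim in context")
    return {
        "id": example_id,
        "context": context,
        "question": question,
        "answers": {"text": [answer_text], "answer_start": [start]},
    }


context = "Albert Einstein was born in Ulm, Germany, in 1879."
example = make_squad_example(
    "example-0001",  # hypothetical ID
    context,
    "Where was Albert Einstein born?",
    "Ulm, Germany",
)

# Sanity check: the stored offset must point at the answer text.
start = example["answers"]["answer_start"][0]
answer = example["answers"]["text"][0]
print(context[start:start + len(answer)])
```

Checking that the offset really points at the answer catches mislabeled examples before they reach training.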


In [1]:
pip install transformers




In [2]:
from transformers import pipeline

# Load the pre-trained QA model
qa_pipeline = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")

# Sample context (knowledge base or documents)
context = """
The Mona Lisa is a famous portrait painting by Leonardo da Vinci. It is also known as La Gioconda or La Joconde. The painting depicts a woman with an enigmatic expression. The artwork is currently displayed at the Louvre Museum in Paris.
"""

# Sample question
question = "Who painted the Mona Lisa?"

# Perform question answering
answer = qa_pipeline(question=question, context=context)

# Print the answer
print(f"Question: {question}")
print(f"Answer: {answer['answer']}")


Question: Who painted the Mona Lisa?
Answer: Leonardo da Vinci
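
The cell above prints only the `answer` field. The question-answering pipeline returns a dictionary that also carries a confidence `score` and the `start` and `end` character offsets of the answer within the context. The sketch below shows one way to use those fields. To stay runnable without downloading the model, it uses a hand-built dictionary of the same shape; the score value, the `MIN_SCORE` threshold, and the helper name `describe_answer` are illustrative assumptions, not real model output.

```python
context = "The Mona Lisa is a famous portrait painting by Leonardo da Vinci."
answer_text = "Leonardo da Vinci"
start = context.find(answer_text)

# Stand-in for a pipeline result; the score is a made-up illustrative value.
result = {
    "answer": answer_text,
    "score": 0.98,
    "start": start,
    "end": start + len(answer_text),
}

MIN_SCORE = 0.5  # hypothetical confidence threshold


def describe_answer(result, context, min_score=MIN_SCORE):
    """Format an answer with its score, or decline if the score is too low."""
    if result["score"] < min_score:
        return "No confident answer found."
    # The offsets index into the context, so the span can be shown as evidence.
    span = context[result["start"]:result["end"]]
    return f"{result['answer']} (score {result['score']:.2f}, span {span!r})"


print(describe_answer(result, context))
```

Thresholding on the score is a simple guard against confidently presenting an answer when the context does not actually contain one.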


In [7]:
# Sample context (knowledge base or documents)
context = """
The Amazon Rainforest is the largest tropical rainforest in the world, located in South America. It covers an area of approximately 5.5 million square kilometers and is home to a diverse range of plant and animal species. Deforestation is a major concern for the preservation of this vital ecosystem.
"""

# Sample question
question = "Where is the Amazon Rainforest located?"

# Perform question answering
answer = qa_pipeline(question=question, context=context)

# Print the answer
print(f"Question: {question}")
print(f"Answer: {answer['answer']}")

Question: Where is the Amazon Rainforest located?
Answer: South America


In [6]:
# Sample context (knowledge base or documents)
context = """
William Shakespeare was an English playwright and poet. He is often referred to as the Bard of Avon. Shakespeare's works include famous plays like "Romeo and Juliet," "Hamlet," and "Macbeth." His writings have had a significant impact on English literature and theater.
"""

# Sample question
question = "Which famous play was written by William Shakespeare?"

# Perform question answering
answer = qa_pipeline(question=question, context=context)

# Print the answer
print(f"Question: {question}")
print(f"Answer: {answer['answer']}")

Question: Which famous play was written by William Shakespeare?
Answer: Romeo and Juliet


In [5]:
# Sample context (knowledge base or documents)
context = """
The Great Wall of China is a historic fortification that stretches across Northern China. It was built over several centuries to protect against invasions. The wall is approximately 13,171 miles long and is considered one of the most impressive architectural feats in history.
"""

# Sample question
question = "What is the purpose of the Great Wall of China?"

# Perform question answering
answer = qa_pipeline(question=question, context=context)

# Print the answer
print(f"Question: {question}")
print(f"Answer: {answer['answer']}")

Question: What is the purpose of the Great Wall of China?
Answer: to protect against invasions


In [4]:
# Sample context (knowledge base or documents)
context = """
Albert Einstein was a renowned physicist known for his theory of relativity. He was born in Ulm, Germany, in 1879. Einstein received the Nobel Prize in Physics in 1921 for his work on the photoelectric effect. His famous equation, E=mc², describes the equivalence of energy and mass.
"""

# Sample question
question = "Where was Albert Einstein born?"

# Perform question answering
answer = qa_pipeline(question=question, context=context)

# Print the answer
print(f"Question: {question}")
print(f"Answer: {answer['answer']}")

Question: Where was Albert Einstein born?
Answer: Ulm, Germany


In [3]:
# Sample context (knowledge base or documents)
context = """
The Eiffel Tower is a famous landmark in Paris, France. It was designed by Gustave Eiffel and was completed in 1889. The tower is made of iron and stands at a height of 324 meters. It is a popular tourist attraction and offers breathtaking views of the city.
"""

# Sample question
question = "What is the height of the Eiffel Tower?"

# Perform question answering
answer = qa_pipeline(question=question, context=context)

# Print the answer
print(f"Question: {question}")
print(f"Answer: {answer['answer']}")

Question: What is the height of the Eiffel Tower?
Answer: 324 meters
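
The cells above repeat the same three steps for each example. A small helper can run any number of question and context pairs in one loop. The sketch below assumes a reader called with `question` and `context` keywords, as `qa_pipeline` is in this notebook. The names `ask_all`, `EXAMPLES`, and `stand_in_reader` are hypothetical, and the stand-in simply returns the last word of the context so the loop can run offline; it is not a real model.

```python
def ask_all(qa, examples):
    """Run a QA callable over (question, context) pairs and collect the answers."""
    rows = []
    for question, context in examples:
        result = qa(question=question, context=context)
        rows.append({"question": question, "answer": result["answer"]})
    return rows


# Shortened versions of two contexts used in this notebook.
EXAMPLES = [
    ("Where was Albert Einstein born?",
     "Albert Einstein was born in Ulm, Germany, in 1879."),
    ("What is the height of the Eiffel Tower?",
     "The tower is made of iron and stands at a height of 324 meters."),
]


def stand_in_reader(question, context):
    """Placeholder with the pipeline's call signature; returns the last word."""
    return {"answer": context.split()[-1].rstrip("."), "score": 0.0}


rows = ask_all(stand_in_reader, EXAMPLES)
for row in rows:
    print(f"Question: {row['question']}")
    print(f"Answer: {row['answer']}")
```

In the notebook, calling `ask_all(qa_pipeline, EXAMPLES)` would use the BERT model instead of the placeholder.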
