# Introduction

Question answering (QA) is a branch of natural language processing (NLP) that aims to build systems that can automatically answer questions posed by humans in natural language. QA systems can vary depending on the input, output, and domain of the questions. Here are some common types of QA systems:

1. **Extractive Question Answering:** This type of QA system extracts the answer from a given context, such as a text passage, a table, or a web page. The answer is usually a span of text that directly answers the question.

    For example, given the question "What is the capital of France?" and the context "France is a country in Western Europe with a population of about 67 million people. Its capital is Paris, the most populous city in the country.", the system would extract the answer "Paris" from the context.
    
    Extractive QA is often solved with `BERT-like models` that can encode both the question and the context and output the start and end positions of the answer span.

2. **Community Question Answering:** This type of QA system leverages the knowledge and opinions of online communities, such as forums, social media, or question-answer websites, to answer questions. The system can either retrieve relevant posts or comments that answer the question, or generate a new answer by aggregating and summarizing the information from multiple sources.

    For example, given the question "How can I learn Python?" and the context of Stack Overflow, the system would either find an existing post that provides useful resources and tips for learning Python, or create a new answer by synthesizing the information from different posts.
    
    Community QA is often solved with `information retrieval`, `natural language generation`, and `sentiment analysis techniques`.

3. **Long-Form Question Answering:** This type of QA system generates a long and coherent answer to a question, usually in the form of a paragraph or an essay. The system can either use a given context or rely on external sources of knowledge to generate the answer. The answer should not only provide factual information, but also explain the reasoning and evidence behind it.

    For example, given the question "Why is the sky blue?" and no context, the system would generate an answer that describes the phenomenon of Rayleigh scattering, the wavelength of visible light, and the angle of the sun.
    
    Long-form QA is often solved with `neural sequence-to-sequence models` that can generate fluent and informative text.

4. **Generative Question Answering:** This type of QA system generates a natural language answer to a question, without relying on a given context or extracting a span of text. The system can either use open-domain knowledge or a specific domain ontology to generate the answer. The answer can be a single word, a phrase, or a sentence, depending on the question.

    For example, given the question "Who is the author of Harry Potter?" and no context, the system would generate the answer "J.K. Rowling".
    
    Generative QA is often solved with `neural language models` that can leverage `large-scale pre-training` and `fine-tuning` on QA datasets.

5. **Conversational Question Answering:** This type of QA system engages in a natural language dialogue with a user, where the user can ask multiple related questions on a topic, and the system can provide relevant and consistent answers. The system can also ask clarifying or follow-up questions to the user, if the user's question is ambiguous or incomplete. The system can either use a given context or access external sources of knowledge to answer the questions.

    For example, given the question "What is the weather like in New York?" and no context, the system would provide the current weather conditions in New York, and then ask "Do you want to know the forecast for tomorrow?" to continue the conversation.
    
    Conversational QA is often solved with `neural dialogue models` that can maintain the state and coherence of the dialogue.

    Neural dialogue models are neural networks that generate natural language responses to user questions or utterances in a conversational setting.

    Some examples of models and datasets for conversational QA (CQA) are:

    - **SDNet:** a contextualized attention-based deep neural network that fuses context into traditional machine reading comprehension (MRC) models. The model leverages both inter-attention and self-attention to comprehend conversation context and extract relevant information from the passage.

    - **CoQA:** a large-scale dataset (not a model) for building CQA systems that can answer a series of interconnected questions in a conversation. The dataset contains about 8,000 conversations over text passages from several domains, such as news articles, stories, and Wikipedia pages. Answers are free-form text, and each answer is paired with a rationale, a span of the passage that supports it.

    - **ConvBERT:** a BERT-like model that incorporates conversational data into pre-training. The model uses a novel masked language modeling objective that can predict both the masked tokens and the speaker turns in a dialogue. The model can be fine-tuned on CQA tasks, such as QuAC and CoQA.
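Returning to extractive QA (item 1 above): the description says the model outputs start and end positions of the answer span. The sketch below shows one common way to turn per-token start and end scores into a span, by picking the pair with the highest combined score. The function name `best_span`, the `max_len` limit, and the toy scores are assumptions made for illustration; they are not taken from any particular model or library.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_len: int = 15,
) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score.

    Only spans with start <= end and at most `max_len` tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # The end index may not precede the start or exceed the length limit.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy scores (hypothetical, hand-written) for part of the France example.
tokens = ["Its", "capital", "is", "Paris", ",", "the", "most", "populous", "city"]
start_scores = [0.1, 0.2, 0.1, 4.0, 0.0, 0.3, 0.1, 0.2, 0.5]
end_scores = [0.0, 0.1, 0.2, 3.5, 0.4, 0.1, 0.0, 0.3, 1.0]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

In a real system the scores would come from a fine-tuned encoder, and the token indices would be mapped back to character offsets in the original context.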

# Dataset

The **`"subjqa"`** dataset is a question answering dataset that focuses on subjective questions and answers. Subjective questions are those that `do not have a single factual answer`, but depend on the opinions, preferences, or feelings of the person who answers them.

For example, "Is this book interesting?" is a subjective question, while "Who is the author of this book?" is a factual question.

The dataset consists of roughly `10,000` questions over reviews from 6 different domains:
* books,
* movies,
* grocery,
* electronics,
* TripAdvisor (i.e. hotels), and
* restaurants.

Each question is paired with a review, and a span of the review is highlighted as the answer to the question (some questions have no answer). Moreover, both questions and answer spans are assigned a subjectivity label by annotators. For example, "How much does this product weigh?" is a factual question (i.e., low subjectivity), while "Is this easy to use?" is a subjective question (i.e., high subjectivity).
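To illustrate how a highlighted span can be stored, here is a minimal sketch. It assumes that each record holds parallel `text` and `answer_start` lists and that `answer_start` is a character offset into the review text, as in SQuAD-style datasets. The helper `extract_spans` and the toy review are hypothetical and are not part of the dataset.

```python
from typing import Dict, List, Tuple


def extract_spans(context: str, answers: Dict[str, list]) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets for every answer in a record.

    An unanswerable question has empty lists and yields an empty result.
    """
    spans = []
    for text, start in zip(answers["text"], answers["answer_start"]):
        end = start + len(text)
        # Sanity check: the stored offset must point at the stored text.
        if context[start:end] != text:
            raise ValueError(f"offset {start} does not match answer {text!r}")
        spans.append((start, end))
    return spans


# Hypothetical toy record, written by hand for this example.
context = "The bass is weak as expected, but the mids are clear."
answerable = {"text": ["bass is weak as expected"], "answer_start": [4]}
unanswerable = {"text": [], "answer_start": []}

print(extract_spans(context, answerable))    # [(4, 28)]
print(extract_spans(context, unanswerable))  # []
```

Checking the offsets against the text in this way is a cheap guard against misaligned spans before any model training.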

The dataset is constructed based on publicly available review datasets, such as the Amazon Review dataset, the TripAdvisor dataset, and the Yelp dataset.

> The interesting aspect of this dataset is that most of the questions and answers are `subjective`; that is, they depend on the personal experience of the users.

In [1]:
! pip install datasets -qq

In [4]:
from datasets import get_dataset_config_names, load_dataset

subsets = get_dataset_config_names(
    path = "subjqa"
)

print(subsets)

['books', 'electronics', 'grocery', 'movies', 'restaurants', 'tripadvisor']


Since we will build a question answering system for electronics, we use the `electronics` subset.

In [5]:
data = load_dataset(
    path = "subjqa",
    name = "electronics"
)

print(data)

DatasetDict({
    train: Dataset({
        features: ['domain', 'nn_mod', 'nn_asp', 'query_mod', 'query_asp', 'q_reviews_id', 'question_subj_level', 'ques_subj_score', 'is_ques_subjective', 'review_id', 'id', 'title', 'context', 'question', 'answers'],
        num_rows: 1295
    })
    test: Dataset({
        features: ['domain', 'nn_mod', 'nn_asp', 'query_mod', 'query_asp', 'q_reviews_id', 'question_subj_level', 'ques_subj_score', 'is_ques_subjective', 'review_id', 'id', 'title', 'context', 'question', 'answers'],
        num_rows: 358
    })
    validation: Dataset({
        features: ['domain', 'nn_mod', 'nn_asp', 'query_mod', 'query_asp', 'q_reviews_id', 'question_subj_level', 'ques_subj_score', 'is_ques_subjective', 'review_id', 'id', 'title', 'context', 'question', 'answers'],
        num_rows: 255
    })
})
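As a quick sanity check on the split sizes printed above, the following sketch computes the total number of examples and the share of each split. The row counts are copied from the `DatasetDict` output; the variable names are only illustrative.

```python
# Row counts taken from the DatasetDict output above.
split_sizes = {"train": 1295, "test": 358, "validation": 255}

total = sum(split_sizes.values())
shares = {name: n / total for name, n in split_sizes.items()}

print(f"total examples: {total}")
for name, share in shares.items():
    print(f"{name:<10} {split_sizes[name]:>5} rows  {share:.1%}")
```

With under 2,000 examples in total, the `electronics` subset is small, which is worth keeping in mind when choosing how to train and evaluate a model on it.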


In [7]:
data["train"].to_pandas().head()

| | domain | nn_mod | nn_asp | query_mod | query_asp | q_reviews_id | question_subj_level | ques_subj_score | is_ques_subjective | review_id | id | title | context | question | answers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | electronics | great | bass response | excellent | bass | 0514ee34b672623dff659334a25b599b | 5 | 0.5 | False | 882b1e2745a4779c8f17b3d4406b91c7 | 2543d296da9766d8d17d040ecc781699 | B00001P4ZH | I have had Koss headphones in the past, Pro 4A... | How is the bass? | {'text': [], 'answer_start': [], 'answer_subj_... |
| 1 | electronics | harsh | high | not strong | bass | 7c46670208f7bf5497480fbdbb44561a | 1 | 0.5 | False | ce76793f036494eabe07b33a9a67288a | d476830bf9282e2b9033e2bb44bbb995 | B00001P4ZH | To anyone who hasn't tried all the various typ... | Is this music song have a goo bass? | {'text': ['Bass is weak as expected', 'Bass is... |
| 2 | electronics | neutral | sound | present | bass | 8fbf26792c438aa83178c2d507af5d77 | 1 | 0.5 | False | d040f2713caa2aff0ce95affb40e12c2 | 455575557886d6dfeea5aa19577e5de4 | B00001P4ZH | I have had many sub-$100 headphones from $5 Pa... | How is the bass? | {'text': ['The only fault in the sound is the ... |
| 3 | electronics | muddy | bass | awesome | bass | 9876fd06ed8f075fcad70d1e30e7e8be | 1 | 0.5 | False | 043e7162df91f6ea916c790c8a6f6b22 | 6895a59b470d8feee0f39da6c53a92e5 | B00001WRSJ | My sister's Bose headphones finally died and s... | How is the audio bass? | {'text': ['the best of all of them'], 'answer_... |
| 4 | electronics | perfect | bass | incredible | sound | 16506b53e2d4c2b6a65881d9462256c2 | 1 | 0.65 | True | 29ccd7e690050e2951be49289e915382 | 7a2173c502da97c5bd5950eae7cd7430 | B00001WRSJ | Wow. Just wow. I'm a 22 yr old with a crazy ob... | Why do I have an incredible sound? | {'text': ['The sound is so crisp', 'crazy obse... |
