# Project Notebook

This notebook contains the entire pipeline of the project. We load our trained models from [Hugging Face](https://huggingface.co/nlp-group-6) to run the pipeline. If you would like to see how we trained the models, check the "notebooks" folder in the repository.

## Pipeline

Our pipeline consists of the following steps:
1. Generate the question given the context and the answer.
2. Generate the distractors given the question, the answer, and the context.

This is outlined in the image below:

![pipeline](images/pipeline.png)

In [1]:
! pip install transformers torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://download.pytorch.org/whl/cu118


First, we import the classes we need from Hugging Face's transformers library. We use `BartTokenizer` and `BartForConditionalGeneration`, since both the question generator and the distractor generator are BART models.

In [39]:
from transformers import BartTokenizer, BartForConditionalGeneration

We start with the question generator model. We can load it from the Hugging Face Hub with the `BartForConditionalGeneration.from_pretrained` method, passing the name of the model repository. We load the matching tokenizer the same way.

In [40]:
question_model = BartForConditionalGeneration.from_pretrained('nlp-group-6/sciq-question-generator')
question_tokenizer = BartTokenizer.from_pretrained('nlp-group-6/sciq-question-generator')

Finally, we load the distractor generator model. We use the same methods as before to load the model and the tokenizer.

In [41]:
options_model = BartForConditionalGeneration.from_pretrained('nlp-group-6/sciq-options-generator-generated-questions')
options_tokenizer = BartTokenizer.from_pretrained('nlp-group-6/sciq-options-generator-generated-questions')

## Example

Now we put the pipeline together and test it with a simple example. Given the support and the correct answer, we need to generate the question and the distractors. So let's define the support and the correct answer first.

In [42]:
support = "Milk has a white color."
correct_answer = "Milk is white"

To generate the question, we first tokenize the support and the correct answer together as a text pair, using the question model's tokenizer. We then pass the resulting tensors to the model's `generate` method and decode the first output sequence to get the question as text.

In [43]:
inputs = question_tokenizer(support, correct_answer, return_tensors='pt', max_length=512, truncation=True)
question = question_model.generate(**inputs)
question = question_tokenizer.decode(question[0], skip_special_tokens=True)



Now let's print the question to see what our brilliant model has come up with.

In [44]:
print(question)

What color is milk?


Perfect! We now need to plug the question, correct answer, and support into the options generator. But first we need to bundle them into a single string, separated by BART's special tokens `</s>` (end of sequence) and `<s>` (start of sequence).

In [45]:
question_answer_context = [question + "</s><s>" + correct_answer + "</s><s>" + support]
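The bundling step above can also be written as a small helper, which keeps the separator in one place. This is only a sketch: `build_options_input` and `SEP` are hypothetical names that are not part of the repository, and the field order (question, answer, support) is copied from the cell above.

```python
# Separator string used in the cell above: BART's end-of-sequence
# token followed by its start-of-sequence token.
SEP = "</s><s>"

def build_options_input(question, correct_answer, support):
    """Join the three fields in the same order as the cell above.

    Hypothetical helper for illustration, not part of the project code.
    """
    return SEP.join([question, correct_answer, support])

bundled = build_options_input(
    "What color is milk?", "Milk is white", "Milk has a white color."
)
print(bundled)
```

Wrapping the result in a list, as in `[bundled]`, gives the same value as `question_answer_context` for this example.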

In [46]:
# generate the options
inputs = options_tokenizer(question_answer_context, return_tensors='pt', max_length=512, truncation=True)
options = options_model.generate(**inputs, num_return_sequences=4, num_beams=4, max_length=50)
options = options_tokenizer.batch_decode(options, skip_special_tokens=True)

Now let's see what the model has come up with for the distractors.

In [47]:
print(options)

['Milk is red', 'Milk is orange', 'milk is red', 'Milk is black']
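The two generation steps above can be combined into a single function, which is convenient when running the pipeline on more than one example. The following is a minimal sketch under the same settings as the cells above. `generate_mcq` and `num_options` are hypothetical names introduced here, and the models and tokenizers are passed in as arguments so the function does not depend on global state.

```python
def generate_mcq(support, correct_answer,
                 question_model, question_tokenizer,
                 options_model, options_tokenizer,
                 num_options=4):
    """Return (question, distractors) for one support/answer pair.

    Hypothetical wrapper around the two steps shown in this notebook.
    """
    # Step 1: generate the question from the support and the answer.
    q_inputs = question_tokenizer(support, correct_answer, return_tensors='pt',
                                  max_length=512, truncation=True)
    q_ids = question_model.generate(**q_inputs)
    question = question_tokenizer.decode(q_ids[0], skip_special_tokens=True)

    # Step 2: bundle question, answer and support, then generate distractors.
    bundled = [question + "</s><s>" + correct_answer + "</s><s>" + support]
    o_inputs = options_tokenizer(bundled, return_tensors='pt',
                                 max_length=512, truncation=True)
    # The number of beams must be at least the number of returned sequences.
    o_ids = options_model.generate(**o_inputs,
                                   num_return_sequences=num_options,
                                   num_beams=num_options, max_length=50)
    distractors = options_tokenizer.batch_decode(o_ids, skip_special_tokens=True)
    return question, distractors
```

With the objects loaded earlier, it could be called as `generate_mcq(support, correct_answer, question_model, question_tokenizer, options_model, options_tokenizer)`.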


Now let's put everything together and see the final output.

In [49]:
print("Question : " + question)
indices = ['a', 'b', 'c', 'd', 'e']

for option in options:
    if option == correct_answer:
        continue
    print(indices.pop(0) + ") " + option)
    
print(indices.pop(0) + ") " + correct_answer)

Question : What color is milk?
a) Milk is red
b) Milk is orange
c) milk is red
d) Milk is black
e) Milk is white
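In the output above, "Milk is red" and "milk is red" differ only in capitalization, and the correct answer is always printed last. One possible refinement, sketched below, is to drop distractors that match an earlier option when compared case-insensitively, and to shuffle the choices. `format_choices` and its `seed` parameter are hypothetical names, not part of the project code.

```python
import random

def format_choices(question, correct_answer, distractors, seed=None):
    """Return (lines, index_of_correct_answer) for a multiple-choice question.

    Hypothetical helper: removes case-insensitive duplicates and shuffles.
    """
    rng = random.Random(seed)  # a fixed seed makes the order reproducible

    # Start with the correct answer marked as seen, so a distractor that
    # equals it (ignoring case and surrounding spaces) is skipped too.
    seen = {correct_answer.strip().lower()}
    unique = []
    for option in distractors:
        key = option.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(option)

    choices = unique + [correct_answer]
    rng.shuffle(choices)

    lines = ["Question : " + question]
    for label, choice in zip("abcdefghijklmnopqrstuvwxyz", choices):
        lines.append(label + ") " + choice)
    return lines, choices.index(correct_answer)

lines, correct_index = format_choices(
    "What color is milk?", "Milk is white",
    ['Milk is red', 'Milk is orange', 'milk is red', 'Milk is black'],
    seed=0,
)
print("\n".join(lines))
```

For the example above this leaves four choices instead of five, and `correct_index` records where the correct answer ended up after shuffling.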
