# Tutorial: Generating questions from documents

In this tutorial you'll learn how to generate questions from a document (or a set of documents).

Typically, you generate thousands or even millions of synthetic questions and then fine-tune a QA model on them.

## Step 0: Prepare a Colab Environment to run this tutorial on GPUs
Enable the GPU runtime by following [these instructions](https://drive.google.com/file/d/1jhE8CkieQXoW0gvz9IherTDdJY54Q4Yz/view?usp=sharing). With a GPU, the tutorial runs faster.
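
To confirm that the runtime change took effect, a quick check like the following can help. It is a minimal sketch that assumes an NVIDIA GPU and relies on the `nvidia-smi` command-line tool; the helper name `gpu_visible` is made up for this illustration and is not part of PrimeQA.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is available and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not on PATH, so no NVIDIA driver is visible here.
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU runtime enabled:", gpu_visible())
```

If this prints `False` in Colab, the runtime type is probably still set to CPU.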


## Step 1: Install PrimeQA

First, install the required package.


In [None]:
! pip install --upgrade primeqa
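
After the install finishes, it can be useful to confirm that the package is visible to the current Python environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper written for this tutorial, not a PrimeQA function.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if the install did not succeed.
print("primeqa version:", installed_version("primeqa"))
```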

## Step 2: Import and instantiate a question generation (QG) model

In [None]:
from primeqa.qg.models.qg_model import QGModel
model_name = 'PrimeQA/mt5-base-tydi-question-generator'
passage_qg_model = QGModel(model_name, modality='passage')

## Step 3: Provide a list of documents/passages

Pass the documents/passages as a `list` of `str`.
The example below uses one English and one Russian text.

In [None]:
# Adjacent string literals are concatenated; the trailing spaces keep words from running together.
text_list = [
    "Sachin Tendulkar was an Indian cricketer born in Mumbai. "
    "He scored nearly 35000 runs in his international career.",

    "Симби́рская губе́рния (с 1924 года Ульяновская губерния)\xa0— административно-территориальная "
    "единица Российской империи, Российской республики и РСФСР, существовавшая в 1796—1928 годах. "
    "Губернский город\xa0— Симбирск (с 1924 года Ульяновск)",
]

id_list = ["abcID123", "xyzID456"]
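
Because the passages and their ids must line up position by position, a small sanity check before calling the model can catch mistakes early. This is a sketch only: `validate_inputs` is a hypothetical helper, and the rules it enforces (non-empty strings, unique ids) are assumptions for illustration rather than documented PrimeQA requirements.

```python
def validate_inputs(text_list, id_list=None):
    """Raise an error if the passages or ids are malformed or misaligned."""
    if not isinstance(text_list, list) or not all(isinstance(t, str) for t in text_list):
        raise TypeError("text_list must be a list of str")
    if any(not t.strip() for t in text_list):
        raise ValueError("text_list contains an empty passage")
    if id_list:
        if len(id_list) != len(text_list):
            raise ValueError("id_list and text_list must have the same length")
        if len(set(id_list)) != len(id_list):
            raise ValueError("id_list contains duplicate ids")


# Example with two short placeholder passages and their ids.
validate_inputs(["First passage.", "Second passage."], ["abcID123", "xyzID456"])
print("inputs look consistent")
```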

## Step 4: Generate questions

In addition to the list of passages, the `generate_questions` function accepts the following optional arguments.
#### Controls:
- `num_questions_per_instance`: Number of questions to generate per passage (default=5)
- `answers_list`: Answers that the generated questions should have. It should be a list of lists,
        where each inner list corresponds to a passage in `text_list`. (default=[])
- `id_list`: Ids of the context passages, aligned with `text_list`. (default=[])

When `answers_list` is not provided, a named entity recognition method is used to sample answers.
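
When answers are supplied by hand, it is easy to misalign `answers_list` with `text_list`. The sketch below checks the alignment and flags answers that do not occur verbatim in their passage. Note the assumption: the text above does not say that answers must be substrings of the passage, so treat the substring test as a convenience check, and `check_answers` as a hypothetical helper.

```python
def check_answers(text_list, answers_list):
    """Return (passage_index, answer) pairs whose answer is absent from the passage."""
    if len(answers_list) != len(text_list):
        raise ValueError("answers_list needs one inner list per passage")
    problems = []
    for index, (text, answers) in enumerate(zip(text_list, answers_list)):
        for answer in answers:
            if answer not in text:
                problems.append((index, answer))
    return problems


passages = ["Sachin Tendulkar was an Indian cricketer born in Mumbai."]
answers = [["Mumbai", "Delhi"]]
# "Delhi" does not appear in the passage, so it is reported.
print(check_answers(passages, answers))
```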

In [None]:
passage_qg_model.generate_questions(text_list, 
                    num_questions_per_instance = 2, id_list=id_list)
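
For larger collections, it may be convenient to feed the passages to the model in chunks. The following is a sketch under stated assumptions: `generate_in_batches` is a hypothetical wrapper, `generate_fn` stands in for `passage_qg_model.generate_questions`, and the wrapper assumes the call returns a list. The demo uses a fake generator so that it runs without the model.

```python
def generate_in_batches(generate_fn, text_list, id_list, batch_size=8, **kwargs):
    """Call `generate_fn` on consecutive slices of the inputs and merge the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(text_list) != len(id_list):
        raise ValueError("text_list and id_list must have the same length")
    results = []
    for start in range(0, len(text_list), batch_size):
        batch_texts = text_list[start:start + batch_size]
        batch_ids = id_list[start:start + batch_size]
        # Assumption: generate_fn returns a list, so the batches can be concatenated.
        results.extend(generate_fn(batch_texts, id_list=batch_ids, **kwargs))
    return results


def fake_generate(texts, id_list=None, num_questions_per_instance=5):
    # Stand-in for the real model call, used only to make this demo runnable.
    return [{"id": i, "n_chars": len(t)} for t, i in zip(texts, id_list)]


results = generate_in_batches(
    fake_generate, ["a", "bb", "ccc"], ["x", "y", "z"], batch_size=2
)
print(results)
```

With the real model, the call would pass `passage_qg_model.generate_questions` as `generate_fn` along with any of the controls described above.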

## Example: providing answers explicitly

The answer sampler currently supports only Arabic, English, Finnish and Russian. For the other languages in the TyDi dataset,
provide the answers explicitly.
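
The rule above can be captured in a small lookup, which is handy when processing passages in mixed languages. This is an illustrative sketch: the helper `needs_explicit_answers` and the use of ISO 639-1 language codes are assumptions made here, not part of the PrimeQA API.

```python
# Languages covered by the answer sampler, keyed by ISO 639-1 code.
SAMPLER_LANGUAGES = {"ar": "Arabic", "en": "English", "fi": "Finnish", "ru": "Russian"}


def needs_explicit_answers(lang_code: str) -> bool:
    """Return True if answers must be supplied by hand for this language."""
    return lang_code.strip().lower() not in SAMPLER_LANGUAGES


for code in ["en", "ru", "bn"]:
    print(code, "needs explicit answers:", needs_explicit_answers(code))
```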

In [None]:
text_list = ["শচীন টেন্ডুলকারকে ক্রিকেট ইতিহাসের অন্যতম সেরা ব্যাটসম্যান হিসেবে গণ্য করা হয়।"]
answers_list = [["শচীন টেন্ডুলকার"]]
passage_qg_model.generate_questions(text_list, 
                                answers_list = answers_list)

Congratulations 🎉✨🎊🥳 !! You can now generate questions from documents.