# TyDi Question Generation: Inference example

In this notebook, we show how to use a pretrained multilingual PassageQG model to generate questions. Given a text snippet, spaCy is used to identify noun chunks (named entities), which become the answers, and an mT5 model generates a question given the answer and the text snippet.

## Dependencies

If you have not already done so, install PrimeQA with the notebooks extras before getting started.

In [2]:
from primeqa.qg.models.qg_model import QGModel
from tabulate import tabulate # only used to visualize table

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.4.0.json:   0%|   …

2022-06-30 08:43:11 INFO: Downloading default packages for language: multilingual (multilingual)...
2022-06-30 08:43:11 INFO: File exists: /u/saneem/stanza_resources/multilingual/default.zip
2022-06-30 08:43:11 INFO: Finished downloading models and saved to /u/saneem/stanza_resources.


## Loading the pretrained model from Hugging Face

This model was trained using the PrimeQA library and uploaded to the Hugging Face Hub.

In [4]:
model_name = 'ibm/mt5-base-tydi-question-generator'
table_qg_model = QGModel(model_name, modality='passage')


## Sample instance

Passages should be passed as a `list` of `str`. We take one English and one Russian text to generate questions.

In [5]:
text_list = ["Sachin tendulkar was an Indian cricketer born in Mumbai. He scored 100\
centuries in his international carrier.",
            
"Симби́рская губе́рния (с 1924 года Ульяновская губерния)\xa0— административно-территориальная\
единица Российской империи, Российской республики и РСФСР, существовавшая в 1796—1928 годах.\
Губернский город\xa0— Симбирск (с 1924 года Ульяновск)"]
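
A backslash continuation inside a string literal keeps any leading whitespace from the following line, which appears to be why the passages echoed in the output contain long runs of spaces. The sketch below shows one optional way to tidy passages before passing them to the model. The helper name `normalize_passage` and the sample strings are illustrative assumptions, not part of PrimeQA, and it is not established here that the extra whitespace affects the generated questions.

```python
from typing import List


def normalize_passage(text: str) -> str:
    """Collapse each run of whitespace into a single space.

    str.split() with no arguments splits on any Unicode whitespace,
    so non-breaking spaces and newlines are normalized as well.
    """
    return " ".join(text.split())


# Illustrative inputs: one with a run of spaces, one with a non-breaking space.
raw_passages: List[str] = [
    "He scored 100            centuries.",
    "Ульяновская\xa0губерния",
]

clean_passages = [normalize_passage(p) for p in raw_passages]
print(clean_passages)
```

The cleaned list keeps the `list` of `str` shape expected above, so it could be used in place of `text_list`.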

## Generate questions

There is one argument that controls the number of questions generated.
#### Controls:
- `num_questions_per_instance`: Number of questions to generate per passage (default=5)

In [6]:
table_qg_model.generate_questions(text_list, 
                                num_questions_per_instance = 3)

Sachin tendulkar was an Indian cricketer born in Mumbai. He scored 100            centuries in his international carrier. en
Input language en
Симби́рская губе́рния (с 1924 года Ульяновская губерния) — административно-территориальная            единица Российской империи, Российской республики и РСФСР, существовавшая в 1796—1928 годах.            Губернский город — Симбирск (с 1924 года Ульяновск) ru
Input language ru


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


[{'question': 'What country did Sachin tendulkar play?', 'answer': 'Indian'},
 {'question': 'Who won the last Indian cricket season?',
  'answer': 'Sachin tendulkar'},
 {'question': 'Where was Sachin tendulkar born?', 'answer': 'Mumbai'},
 {'question': 'Где находится Симбирская губерния?',
  'answer': 'Российской республики'},
 {'question': 'Как называется Симбирская губерния?',
  'answer': 'Ульяновская губерния'},
 {'question': 'Где находится Симбирская губерния?',
  'answer': 'Российской империи'}]
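
The result is a flat list of `question`/`answer` dicts, so it does not say which passage each pair came from. A minimal sketch for regrouping follows. It assumes that results come back in input order with exactly `num_questions_per_instance` entries per passage, which matches the output above but is not a documented guarantee. The helper `group_by_passage` and the placeholder passage labels are hypothetical.

```python
from typing import Dict, List, Tuple

QAPair = Dict[str, str]


def group_by_passage(
    results: List[QAPair], passages: List[str], per_passage: int
) -> List[Tuple[str, List[QAPair]]]:
    """Split a flat result list into consecutive chunks, one per passage.

    Assumes input order is preserved and every passage yields exactly
    `per_passage` results; raises ValueError when the counts disagree.
    """
    if len(results) != len(passages) * per_passage:
        raise ValueError("result count does not match passages * per_passage")
    return [
        (passage, results[i * per_passage:(i + 1) * per_passage])
        for i, passage in enumerate(passages)
    ]


# Results copied from the output above; passage labels are placeholders.
results = [
    {"question": "What country did Sachin tendulkar play?", "answer": "Indian"},
    {"question": "Who won the last Indian cricket season?", "answer": "Sachin tendulkar"},
    {"question": "Where was Sachin tendulkar born?", "answer": "Mumbai"},
    {"question": "Где находится Симбирская губерния?", "answer": "Российской республики"},
    {"question": "Как называется Симбирская губерния?", "answer": "Ульяновская губерния"},
    {"question": "Где находится Симбирская губерния?", "answer": "Российской империи"},
]
passages = ["<English passage>", "<Russian passage>"]

grouped = group_by_passage(results, passages, per_passage=3)
for passage, pairs in grouped:
    print(passage)
    for pair in pairs:
        print(f"  {pair['answer']}: {pair['question']}")
```

For a tabular view, the same list of dicts can be passed to `tabulate(results, headers="keys")`, which is what the `tabulate` import at the top of the notebook is intended for.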