# Question Generator example


First we need to install Hugging Face's `transformers` library.

In [None]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/b0/9e/5b80becd952d5f7250eaf8fc64b957077b12ccfe73e9c03d37146ab29712/transformers-4.6.0-py3-none-any.whl (2.3MB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/ae/04/5b870f26a858552025a62f1649c20d29d2672c02ff3c3fb4c688ca46467a/tokenizers-0.10.2-cp37-cp37m-manylinux2010_x86_64.whl (3.3MB)
Collecting huggingface-hub==0.0.8
  Downloading https://files.pythonhosted.org/packages/a1/88/7b1e45720ecf59c6c6737ff332f41c955963090a18e72acbcbeac6b25e86/huggingface_hub-0.0.8-py3-none-any.whl
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)
Installing 

In [None]:
!pip install sentencepiece

Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)

Next we have to clone the GitHub repo and import `questiongenerator`:

In [None]:
!git clone https://github.com/amontgomerie/question_generator/

Cloning into 'question_generator'...
remote: Enumerating objects: 199, done.
remote: Counting objects: 100% (87/87), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 199 (delta 45), reused 24 (delta 9), pack-reused 112
Receiving objects: 100% (199/199), 101.67 KiB | 1.30 MiB/s, done.
Resolving deltas: 100% (100/100), done.


In [None]:
%cd question_generator/
%load questiongenerator.py
from questiongenerator import QuestionGenerator
from questiongenerator import print_qa

/content/question_generator


Make sure that we're using the GPU:

In [None]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
assert device == torch.device('cuda'), "Not using CUDA. Set: Runtime > Change runtime type > Hardware Accelerator: GPU"
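
The assertion above stops the notebook when no GPU is present. If you would rather continue on the CPU with a warning, a softer check could look like the sketch below. The helper name `pick_device_name` is hypothetical and not part of the repo, and the sketch only assumes that `torch.cuda.is_available()` behaves as in the cell above.

```python
import importlib.util
import warnings


def pick_device_name():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    # Hypothetical helper: it warns instead of raising, so the notebook keeps running.
    if importlib.util.find_spec("torch") is None:
        warnings.warn("PyTorch is not installed; defaulting to 'cpu'.")
        return "cpu"

    import torch

    if torch.cuda.is_available():
        return "cuda"

    warnings.warn("CUDA is not available; generation on the CPU will be slow.")
    return "cpu"


device_name = pick_device_name()
print(device_name)
```

The returned string can be passed to `torch.device(...)` if a device object is needed.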


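Now we can create a `QuestionGenerator` and feed it some text. The next cell loads a sample article from the repo's `articles/` folder (`indian_matchmaking.txt`); a later cell overwrites `article` with a short inline passage, and the questions below are generated from that passage.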

The models should be loaded automatically when the `QuestionGenerator` class is instantiated. If you have them saved somewhere else, you can pass the path to the folder containing them as an argument, for example `QuestionGenerator(MODEL_DIR)`.

In [None]:
qg = QuestionGenerator()

with open('articles/indian_matchmaking.txt', 'r') as a:
    article = a.read()

Now we can call `QuestionGenerator`'s `generate()` method. The answer style can be any one of `['all', 'sentences', 'multiple_choice']`.

You can choose how many questions you want to generate by setting `num_questions`. Note that the quality of questions may decrease if `num_questions` is high.

If you just want to print the questions without showing the answers, you can optionally set `show_answers=False` when calling `print_qa()`.
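
The options described above can be checked before calling the model, so that a typo in `answer_style` fails fast rather than after the models have loaded. The sketch below is only an illustration: `build_generate_kwargs` is a hypothetical helper, not part of `questiongenerator`, and it only encodes the three answer styles and the `num_questions` parameter named in this notebook.

```python
# Answer styles listed in this notebook.
ANSWER_STYLES = ("all", "sentences", "multiple_choice")


def build_generate_kwargs(num_questions=5, answer_style="all"):
    """Validate the options and return them as keyword arguments for generate()."""
    # Hypothetical helper: it rejects unknown styles and non-positive counts.
    if answer_style not in ANSWER_STYLES:
        raise ValueError(
            f"answer_style must be one of {ANSWER_STYLES}, got {answer_style!r}"
        )
    if not isinstance(num_questions, int) or num_questions < 1:
        raise ValueError("num_questions must be a positive integer")
    return {"num_questions": num_questions, "answer_style": answer_style}


kwargs = build_generate_kwargs(num_questions=3, answer_style="sentences")
print(kwargs)
# Intended use in the notebook (requires the models to be loaded):
# qa_list = qg.generate(article, **kwargs)
```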

In [None]:
article = "Modern science is typically divided  into two branches: empirical and historical. The latter is usually seen as having a more direct, well understood, and generally accepted approach of understanding the phenomena of the last fifty years. It is in this context that the concept of 'historical' science is most commonly considered a term which is used to refer to the period or phases in time or of phenomena. Consequently, even though it is commonly defined as a 'historical science' then the term is most commonly used to denote in the same way science and astronomy does. In this context it is important to note that this is only an empirical definition."
article4 = "Game development is an vernacular term for the process of bringing a story or story-driven game together. A big part of that process is to find and integrate elements from various titles into the game, whether it be the storytelling, narrative, or art style. If you ever play a game like Call of Duty and want to see how it plays in a mobile game, try this at your local PC store. This game offers a very unique experience for gamers through a very unique visual aesthetic. The game is incredibly fluid and the content is beautifully detailed."
article3 = "A computer program is a  program that runs on or has been used by more than one person. It is a computer program which is used on a computer for the purpose 'to compile data on another computer, to obtain new data or to alter existing data. For example, you have a Computer running Java, which runs a program called ' Java Studio .' A Computer is a computer program which contains its own instructions, such as executing Java code, and which interprets that instruction. These instructions are made available to other programmers using that computer for copying to or from other computers, so that there are no modifications necessary. For example, two computers, one running Java and another running Java Studio, can do more than one thing, and some can do more than one thing. When you take each of the computer programs, you also define what kind of programs that Computer is supposed to run."
article2 = "English is the largest language vernacular in existence and is the official language of the Indian subcontinent. Most of these languages use a syllable prefix, sometimes with a vowel. Borrowing a non-English language can also result in a learning problem due to having a lower pronunciatory pitch during the lesson, which may prevent students from properly pronouncing the word for themselves. Students who try to learn a non-English language can lose confidence and the ability to understand the phonetic structure and pronunciation of the word, which can cause problems for the student or the students in other school-based classes. When students are placed into classes which are outside the common language, most students will not begin the discussion of their own language without some form of clarification, such as a statement from the teacher who has been able to clarify the word and provide some examples."
article1 = "Computer science is the study of algorithmic processes, vernacular verbs. From ancient Roman Rome, classical mathematics was the study of the laws of physics, applied to all fields of human psychology and human behavior, since it is the research of those sciences to study the natural world. The classical philosophers and theologians were all familiar with the development of mathematical sciences over a period of centuries, as has been mentioned by Bocci, G.G.D., S.B. and others. All these economists understood that each of their mathematical sciences had an associated goal: to develop and test ideas in the natural sciences, in nature, in philosophy and psychology, or in theory. The problem of making general progress in mathematics is then solved. According to modern mathematical theories, natural numbers and mathematical formulas are not simply words or rules but rather represent the human and machine experience."

In [None]:
qa_list = qg.generate(
    article, 
    num_questions=5, 
    answer_style='all'
)
print_qa(qa_list)

Generating questions...



  f"This sequence already has {self.eos_token}. In future versions this behavior may lead to duplicated eos tokens being added."


Evaluating QA pairs...

1) Q: What is the definition of 'historical science'?
   A: It is in this context that the concept of 'historical' science is most commonly considered a term which is used to refer to the period or phases in time or of phenomena. 

2) Q: What is the definition of 'historical science'?
   A: Modern science is typically divided into two branches: empirical and historical. 

3) Q: What is the definition of 'historical science'?
   A: Consequently, even though it is commonly defined as a 'historical science' then the term is most commonly used to denote in the same way science and astronomy does. 

4) Q: What is the definition of 'historical science'?
   A: In this context it is important to note that this is only an empirical definition. 

5) Q: What is the definition of 'historical science'?
   A: The latter is usually seen as having a more direct, well understood, and generally accepted approach of understanding the phenomena of the last fifty years. 
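
In the output above, all five pairs share the same question. One way to tidy such a result is to keep only the first pair for each distinct question. The sketch below assumes that each element of `qa_list` is a dict with `"question"` and `"answer"` keys; this is an assumption about the return value of `generate()`, so check it against the repo before relying on it. `dedupe_by_question` is a hypothetical helper, and the last entry of `sample` is made up purely for illustration.

```python
def dedupe_by_question(qa_list):
    """Keep the first QA pair for each distinct question, preserving order."""
    seen = set()
    unique = []
    for qa in qa_list:
        # Normalise case and whitespace so trivially different copies match.
        key = " ".join(qa["question"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(qa)
    return unique


# Stand-in for qa_list: the first two entries come from the output above,
# the third is a made-up pair added only to show that distinct questions survive.
sample = [
    {
        "question": "What is the definition of 'historical science'?",
        "answer": "Modern science is typically divided into two branches: empirical and historical.",
    },
    {
        "question": "What is the definition of 'historical science'?",
        "answer": "In this context it is important to note that this is only an empirical definition.",
    },
    {
        "question": "How is modern science typically divided?",
        "answer": "Modern science is typically divided into two branches: empirical and historical.",
    },
]

unique_pairs = dedupe_by_question(sample)
for qa in unique_pairs:
    print(qa["question"])
```

Dropping duplicates reduces the number of pairs returned, so you may want to request a larger `num_questions` and filter afterwards.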

