# Question Generation

In [185]:
from datasets import load_dataset
from transformers import AutoTokenizer, TFT5ForConditionalGeneration
import tensorflow as tf

squad_v2_data = load_dataset("squad_v2")
train_set = squad_v2_data["train"]
val_set = squad_v2_data["validation"]

In [141]:
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFT5ForConditionalGeneration.from_pretrained("t5-small")

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [166]:
test_context = train_set["context"][0]
test_context

'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".'

In [167]:
test_question = train_set["question"][0]
test_question

'When did Beyonce start becoming popular?'

In [168]:
inputs = tokenizer(test_context, return_tensors="tf").input_ids
labels = tokenizer(test_question, return_tensors="tf").input_ids

train_data = {}
train_data["input_ids"] = [inputs]
train_data["labels"] = [labels]
dataset = tf.data.Dataset.from_tensor_slices(train_data)

model.compile()
history = model.fit(dataset, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [169]:
history.history

{'loss': [1.5816868543624878,
  0.28255337476730347,
  0.1152406558394432,
  0.02663499489426613,
  0.026487112045288086]}
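The loss values in `history.history` are easier to read as perplexities. This assumes each reported loss is the mean per-token cross-entropy in nats, which is the usual Keras/TensorFlow convention. Under that assumption, `exp(loss)` is the per-token perplexity. A value near 1 means the model predicts the single training question almost deterministically, which matches the memorized output a few cells below. A minimal sketch using the numbers above:

```python
import math

# Loss values copied from history.history above (5 epochs on one example).
losses = [1.5816868543624878, 0.28255337476730347, 0.1152406558394432,
          0.02663499489426613, 0.026487112045288086]

# Assumption: each value is a mean per-token cross-entropy (natural log),
# so exp(loss) is the per-token perplexity.
perplexities = [math.exp(loss) for loss in losses]

for epoch, (loss, ppl) in enumerate(zip(losses, perplexities), start=1):
    print(f"epoch {epoch}: loss={loss:.4f} perplexity={ppl:.3f}")
```

The first epoch works out to a perplexity of roughly 4.86. The last works out to roughly 1.03.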

In [172]:
inputs = tokenizer(test_context, return_tensors="tf").input_ids
outputs = model.generate(inputs)



In [173]:
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

When did Beyonce start becoming popular?


## Train on more data

In [186]:
model = TFT5ForConditionalGeneration.from_pretrained("t5-small")

contexts = train_set["context"]
questions = train_set["question"]
val_contexts = val_set["context"]
val_questions = val_set["question"]

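# Note: these are whitespace-separated word counts, not token counts. T5's
# SentencePiece tokenizer yields at least as many tokens as words (plus an
# end-of-sequence token), so using them as max_length below truncates the
# longest examples.
contexts_max_len = max(len(context.split()) for context in contexts)
questions_max_len = max(len(question.split()) for question in questions)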

print(contexts_max_len, questions_max_len)

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


653 40
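Padding every context to the single longest one (653 words) means that most positions in a typical batch are padding. A common alternative is to cap `max_length` at a high percentile of the length distribution and let `truncation=True` handle the rare outliers. The helper below (`length_cap`) is hypothetical and shown on made-up lengths, not SQuAD statistics. It uses the nearest-rank percentile definition:

```python
import math
from typing import List

def length_cap(lengths: List[int], percentile: float = 99.0) -> int:
    """Return the smallest length covering `percentile` percent of examples.

    Hypothetical helper for choosing max_length; nearest-rank percentile.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths)
    # Nearest-rank method: rank = ceil(p * n / 100), 1-indexed.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]

# Toy example (made-up lengths, not SQuAD statistics).
toy_lengths = [120, 95, 140, 110, 653, 130, 100, 125, 115, 105]
print(length_cap(toy_lengths, 90))   # 140
print(length_cap(toy_lengths, 100))  # 653
```

Applied to the notebook, the input would ideally be per-example token counts rather than the word counts computed above.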


In [190]:
%%time
# Takes some time!

inputs = tokenizer(contexts, return_tensors="tf", max_length=contexts_max_len, padding="max_length", truncation=True).input_ids
labels = tokenizer(questions, return_tensors="tf", max_length=questions_max_len, padding="max_length", truncation=True).input_ids

val_inputs = tokenizer(val_contexts, return_tensors="tf", max_length=contexts_max_len, padding="max_length", truncation=True).input_ids
val_labels = tokenizer(val_questions, return_tensors="tf", max_length=questions_max_len, padding="max_length", truncation=True).input_ids

train_data = {}
train_data["input_ids"] = inputs
train_data["labels"] = labels

val_data = {}
val_data["input_ids"] = val_inputs
val_data["labels"] = val_labels



CPU times: user 1min 40s, sys: 8.17 s, total: 1min 48s
Wall time: 1min 7s
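The `labels` above are padded with T5's pad token (id 0) up to `questions_max_len`. The loss computed inside Hugging Face's TF models skips label positions equal to -100. When building decoder inputs, T5 also shifts the labels right and maps -100 back to the pad id. A common step is therefore to replace pad ids in the labels with -100. Without it, padded positions count toward the loss, and the model is also trained to emit padding.

The sketch below shows the replacement on plain Python lists, assuming pad id 0 and made-up token ids. With the TensorFlow tensors above, the equivalent should be `tf.where(labels == tokenizer.pad_token_id, -100, labels)`.

```python
from typing import List

PAD_ID = 0            # T5's pad token id (tokenizer.pad_token_id for t5-small)
IGNORE_INDEX = -100   # label value skipped by Hugging Face's seq2seq losses

def mask_padding(label_rows: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Replace pad ids in each label row with IGNORE_INDEX."""
    return [[IGNORE_INDEX if tok == pad_id else tok for tok in row] for row in label_rows]

# Made-up label ids: question tokens, end-of-sequence (1), then padding (0).
toy_labels = [[363, 410, 1, 0, 0],
              [571, 19, 8, 1, 0]]
print(mask_padding(toy_labels))
# [[363, 410, 1, -100, -100], [571, 19, 8, 1, -100]]
```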


In [192]:
dataset = tf.data.Dataset.from_tensor_slices(train_data).batch(4)
val_dataset = tf.data.Dataset.from_tensor_slices(val_data).batch(4)

model.compile()

In [None]:
history = model.fit(dataset, epochs=5, validation_data=val_dataset)

Epoch 1/5
  281/32580 [..............................] - ETA: 1:08:41 - loss: 1.0310
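The 32580 steps per epoch in the progress bar follow from the batch size. `.batch(4)` keeps the final partial batch by default, so Keras runs ceil(N / 4) steps. The ETA can also be turned into a rough estimate of the full run time. This assumes the SQuAD v2 train split has 130,319 rows, and that Keras' ETA is roughly (seconds per step) × (remaining steps).

```python
import math

# Assumption: size of the SQuAD v2 train split (train_set.num_rows).
n_train = 130_319
batch_size = 4  # matches .batch(4) above

# tf.data's batch() keeps the last partial batch unless drop_remainder=True,
# so the number of steps rounds up.
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)        # 32580
print(n_train // batch_size)  # 32579 (what drop_remainder=True would give)

# Back out seconds per step from "ETA: 1:08:41" logged at step 281.
eta_seconds = 1 * 3600 + 8 * 60 + 41
sec_per_step = eta_seconds / (steps_per_epoch - 281)
# Five epochs, ignoring the time spent on validation.
total_hours = 5 * steps_per_epoch * sec_per_step / 3600
print(f"{sec_per_step:.3f} s/step, ~{total_hours:.1f} h for 5 epochs")
```

At the logged rate, five epochs would take close to six hours before validation is even counted.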

In [154]:
test_inputs = tokenizer(test_context, return_tensors="tf").input_ids
test_outputs = model.generate(test_inputs)
print(tokenizer.decode(test_outputs[0], skip_special_tokens=True))

Who did Beyoncé perform in various competitions as a child?


In [138]:
n_sequences = 5
test_context_2 = """Johann Sebastian Bach[n 2] (31 March [O.S. 21 March] 1685 – 28 July 1750) was a German composer and musician of the late Baroque period. He is known for his orchestral music such as the Brandenburg Concertos; instrumental compositions such as the Cello Suites; keyboard works such as the Goldberg Variations and The Well-Tempered Clavier; organ works such as the Schubler Chorales and the Toccata and Fugue in D minor; and vocal music such as the St Matthew Passion and the Mass in B minor. Since the 19th-century Bach revival, he has been generally regarded as one of the greatest composers in the history of Western music."""
test_inputs_2 = tokenizer(test_context_2, return_tensors="tf").input_ids
test_outputs_2 = model.generate(test_inputs_2, num_return_sequences=n_sequences, do_sample=True)

for i in range(n_sequences):
    print(tokenizer.decode(test_outputs_2[i], skip_special_tokens=True))



What kind of composition was used?
What was the first language composer of the late Baroque period?
Which instrument composer was originally originally known as the best composer of the former Baroque period?
What did Johann Sebastian Bach be remembered for his orchestra music?
What is the name of the composer of the late Baroque period?
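With `do_sample=True`, `generate()` samples each next token instead of taking the most likely one. In recent `transformers` versions, the default generation config also restricts sampling to the `top_k=50` highest-scoring tokens. The toy sketch below shows a single top-k sampling step on a made-up score vector, using only the standard library. It illustrates the idea and is not the library's implementation.

```python
import math
import random
from typing import List

def top_k_sample(logits: List[float], k: int, rng: random.Random) -> int:
    """Sample a token index from the k highest-scoring logits (softmax over them)."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the kept logits (subtract the max for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

# Made-up scores for a 5-token vocabulary.
toy_logits = [2.0, 0.5, 1.5, -1.0, 0.1]
rng = random.Random(0)
print(top_k_sample(toy_logits, k=1, rng=rng))  # 0: k=1 reduces to greedy decoding
print([top_k_sample(toy_logits, k=2, rng=rng) for _ in range(10)])  # only 0s and 2s
```

Because each sequence is sampled independently, the five questions above differ, and some drift into ungrammatical text. Lowering `top_k`, or passing `top_p` to `generate()`, typically trades diversity for fluency.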


In [139]:
n_sequences = 5
test_context_2 = """Pokémon[a] is a Japanese media franchise consisting of video games, animated series and films, a trading card game, and other related media. The franchise takes place in a shared universe in which humans co-exist with creatures known as Pokémon, a large variety of species endowed with special powers. The franchise's target audience is children aged 5 to 12,[1] but it is known to attract people of all ages.[2][3][4][5]

The franchise originated as a pair of role-playing games developed by Game Freak, following an original concept by its founder, Satoshi Tajiri. Released on the Game Boy on February 27, 1996, the games became sleeper hits and were followed by manga series, a trading card game, and anime series and films. From 1998 to 2000, Pokémon was exported to the rest of the world, creating an unprecedented global phenomenon dubbed "Pokémania". By 2002, the craze had ended, after which Pokémon became a fixture in popular culture, with new products being released to this day. In the summer of 2016, the franchise spawned a second craze with the release of Pokémon Go, an augmented reality game developed by Niantic. Pokémon has since been estimated to be the world's highest-grossing media franchise and one of the best-selling video game franchises.

Pokémon has an uncommon ownership structure. Unlike most IPs, which are owned by one company, Pokémon is jointly owned by three: Nintendo, Game Freak, and Creatures.[6][7] Game Freak develops the core series role-playing games, which are published by Nintendo exclusively for their consoles, while Creatures manages the trading card game and related merchandise, occasionally developing spin-off titles. The three companies established The Pokémon Company (TPC) in 1998 to manage the Pokémon property within Asia. The Pokémon anime series and films are co-owned by Shogakukan. Since 2009, The Pokémon Company International (TPCi) subsidiary of TPC has managed the franchise in all regions outside of Asia."""
test_inputs_2 = tokenizer(test_context_2, return_tensors="tf").input_ids
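# With no max_length, generate() uses its default limit (20 tokens unless the
# generation config overrides it), which likely explains the question that is
# cut off mid-sentence below; pass a larger max_length for longer outputs.
test_outputs_2 = model.generate(test_inputs_2, num_return_sequences=n_sequences, do_sample=True)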

for i in range(n_sequences):
    print(tokenizer.decode(test_outputs_2[i], skip_special_tokens=True))

Which Pokémon franchise did Pokémon's owners establish which company in 2001?
When did Pokémon compete with Pokémon Go?
About which game game game franchises were developed by Game Warner?
After which sonic game franchise was initially distributed, Nintendo has been allowed to play, and
After which company founded the Pokémon Company they were co-owned by?
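The sampled outputs are uneven: one Pokémon question above is cut off before its question mark, and sampling can also repeat itself. A simple post-processing filter can keep only complete, unique questions. It is sketched below as a hypothetical helper; `clean_questions` is not part of `transformers`. The third sample is a made-up near-duplicate added for illustration. The filter does not judge grammar or factual accuracy.

```python
from typing import Iterable, List

def clean_questions(candidates: Iterable[str]) -> List[str]:
    """Keep complete questions (ending in '?'), dropping case-insensitive duplicates.

    Hypothetical post-processing helper; first-occurrence order is preserved.
    """
    seen = set()
    kept = []
    for text in candidates:
        q = " ".join(text.split())  # normalize whitespace
        if not q.endswith("?"):
            continue  # likely truncated by the generation length limit
        key = q.lower()
        if key not in seen:
            seen.add(key)
            kept.append(q)
    return kept

samples = [
    "When did Pokémon compete with Pokémon Go?",
    "After which sonic game franchise was initially distributed, Nintendo has been allowed to play, and",
    "when did Pokémon compete with  Pokémon Go?",  # made-up near-duplicate
    "After which company founded the Pokémon Company they were co-owned by?",
]
print(clean_questions(samples))
```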
