How well does GPT-2 answer questions about a document when it is only prompted, without any task-specific training?

In this notebook we'll follow the reading comprehension experiment described in the [GPT-2 paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf), which uses documents from the [Conversational Question Answering dataset (CoQA)](https://stanfordnlp.github.io/coqa/).

Besides testing reading comprehension, this also tests the ability of models to answer questions that depend on conversation history.

Following the GPT-2 paper, we'll be doing:

```
Greedy decoding from GPT-2 when conditioned on a document, the history of the associated conversation, and a final token A:
```
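
To make "greedy decoding" concrete, here is a toy sketch in plain Python. It is not gptbench code: `next_token_scores` and its tiny score table are hypothetical stand-ins for a real language model, and the only point is the selection rule, which always takes the single highest-scoring next token.

```python
# Toy illustration of greedy decoding (an assumption-laden sketch, not gptbench code).
# `next_token_scores` is a hypothetical stand-in for a language model:
# given the tokens so far, it returns a score for each candidate next token.

def next_token_scores(tokens):
    # Made-up scores keyed on the last token only, just for illustration.
    table = {
        'A:': {'She': 0.7, 'He': 0.3},
        'She': {'was': 0.6, 'lived': 0.4},
        'was': {'white': 0.8, 'orange': 0.2},
        'white': {'.': 0.9, 'and': 0.1},
    }
    return table.get(tokens[-1], {})


def greedy_decode(prompt_tokens, max_new_tokens=10, stop_token='.'):
    tokens = list(prompt_tokens)
    generated = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        if not scores:
            break  # the toy model has nothing to say after this token
        # Greedy step: pick the single most probable token (argmax), no sampling.
        best = max(scores, key=scores.get)
        tokens.append(best)
        generated.append(best)
        if best == stop_token:
            break
    return generated


print(greedy_decode(['Q:', 'What', 'color', '?', 'A:']))
# → ['She', 'was', 'white', '.']
```

Because there is no randomness in the selection step, running this twice on the same prompt gives the same output.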

In [1]:
import random
from gptbench import Sample, empty_config

In [2]:
ben = Sample(seed=0xFACEC0DE)

cfg = empty_config()

cfg.model.set(dtype='bfloat16')

# the next sample config settings are important:
# top=1 will only emit the most probable token on each step (greedy argmax) - we want accuracy, not randomness
# emit_start=False will skip emitting the initial context
cfg.sample.set(top=1, emit_start=False)

# if you get an out of memory error, try 'gpt2', the smaller model:
ben.init_pretrained('gpt2-xl', cfg)

Initializing model from gpt2-xl
Dataset: dummy 0 tokens
Dataset: loading uint16 tokens
Expanding initial dataset size of 1 (less than block_size+1) by 1025 times to size of 1025
Dataset train_path: dummy empty dataset, val_path: None, train_split: 0.9, vocab_size: 50257
Model params: 1557.61M


In [4]:
# keep only the first sentence of the generated text as the answer
# (despite its name, first_paragraph splits on '.', not on paragraph breaks)
def first_paragraph(text, count=1):
    s = text.split('.')
    return '.'.join(s[:count]) + '.'
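
A quick look at how `first_paragraph` behaves on a few made-up strings may help when reading the answers below. The function is copied from the cell above so the snippet runs on its own; the input strings are invented for illustration and are not model output.

```python
# Copied from the cell above so that this snippet is self-contained.
def first_paragraph(text, count=1):
    s = text.split('.')
    return '.'.join(s[:count]) + '.'


# Everything after the first '.' is dropped, including any follow-up text
# the model keeps generating after its answer.
print(repr(first_paragraph(' She was white.\nQ: Another question?')))
# → ' She was white.'

# count=2 keeps the first two sentences.
print(repr(first_paragraph(' Yes. She did. And more', count=2)))
# → ' Yes. She did.'

# Caveat: any '.' counts as a sentence end, so decimals get cut short.
print(repr(first_paragraph(' About 3.5 years old.')))
# → ' About 3.'

# Caveat: a '.' is always appended, even when the text had none.
print(repr(first_paragraph(' Yes')))
# → ' Yes.'
```

For the short answers in this notebook the simple split is enough, but the last two cases show where it would need refinement.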

In [5]:
# the document we'll ask questions about
doc="""Once upon a time, in a barn near a farm house, there lived a little white kitten named Cotton. Cotton lived high up in a nice warm place above the barn where all of the farmer's horses slept. But Cotton wasn't alone in her little home above the barn, oh no. She shared her hay bed with her mommy and 5 other sisters. All of her sisters were cute and fluffy, like Cotton. But she was the only white one in the bunch. The rest of her sisters were all orange with beautiful white tiger stripes like Cotton's mommy. Being different made Cotton quite sad. She often wished she looked like the rest of her family. So one day, when Cotton found a can of the old farmer's orange paint, she used it to paint herself like them. When her mommy and sisters found her they started laughing. 

"What are you doing, Cotton?!" 

"I only wanted to be more like you". 

Cotton's mommy rubbed her face on Cotton's and said "Oh Cotton, but your fur is so pretty and special, like you. We would never want you to be any other way". And with that, Cotton's mommy picked her up and dropped her into a big bucket of water. When Cotton came out she was herself again. Her sisters licked her face until Cotton's fur was all all dry. 

"Don't ever do that again, Cotton!" they all cried. "Next time you might mess up that pretty white fur of yours and we wouldn't want that!" 

Then Cotton thought, "I change my mind. I like being special"."""

start_text = doc + '\n\nQ: What color was Cotton?' + '\nA:'

print(f"Prompt is {len(ben.train_dataset.encode(start_text))} tokens (up to {ben.model.block_size})")

# we're sampling with the sample config settings defined above: top=1, emit_start=False
out=[]
ben.sample(start_text, dest=out)
answer = first_paragraph(out[0]) 
answer

Prompt is 351 tokens (up to 1024)


' She was a white kitten with orange stripes.'

In [6]:
# add the answer to the prompt
start_text += answer

# next question
start_text += '\nQ: Where did she live?' + '\nA:'

out=[]
ben.sample(start_text, dest=out)
answer = first_paragraph(out[0]) 
answer

' She lived in a barn near a farm house.'

In [7]:
# add last answer to the prompt
start_text += answer

# place next question
start_text += '\nQ: Did she live alone?' + '\nA:'

out=[]
ben.sample(start_text, dest=out)
answer = first_paragraph(out[0]) 
answer

' No, she lived with her mommy and 5 other sisters.'

In [8]:
# add last answer to the prompt
start_text += answer

# place next question
start_text += '\nQ: Who did she live with?' + '\nA:'

out=[]
ben.sample(start_text, dest=out)
answer = first_paragraph(out[0]) 
answer

' Her mommy and 5 other sisters.'

Quite good!

As noted in the GPT-2 paper:
```
While GPT-2’s performance is exciting for a system without any supervised training, some inspection of its answers and errors suggests GPT-2 often uses simple retrieval based heuristics such as answer with a name from the document in response to a who question.
```

Still, it's fascinating that a 'who' question tends to fetch names from the document. That is not real understanding, but it points in that direction.
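
The cells above repeat one pattern: append the previous answer, then the next `Q:` line and a trailing `A:`. As a minimal sketch, that pattern could be wrapped in a helper like the one below. `build_prompt`, `run_conversation` and the `generate` callable are hypothetical names introduced here, not part of gptbench; in this notebook `generate` would wrap the `ben.sample(...)` and `first_paragraph(...)` calls used above.

```python
def build_prompt(document, history, question):
    """Document, then past Q/A pairs, then the new question and a final 'A:'."""
    parts = [document, '\n']
    for past_question, past_answer in history:
        # Answers keep their leading space, as in the model output above.
        parts.append(f'\nQ: {past_question}\nA:{past_answer}')
    parts.append(f'\nQ: {question}\nA:')
    return ''.join(parts)


def run_conversation(document, questions, generate):
    """Ask each question in turn, feeding earlier answers back into the prompt.

    `generate` is a hypothetical callable: prompt string in, answer string out.
    """
    history = []
    for question in questions:
        prompt = build_prompt(document, history, question)
        answer = generate(prompt)
        history.append((question, answer))
    return history


# Stand-in generator so the sketch runs without a model: it reports how many
# questions the prompt contains. A real one would call the model instead.
def fake_generate(prompt):
    return f" answer {prompt.count('Q:')}."


print(build_prompt('Some story.', [('First?', ' Yes.')], 'Second?'))
print(run_conversation('Some story.', ['First?', 'Second?'], fake_generate))
```

Keeping the history as a list of pairs makes it easy to inspect or trim the conversation if the prompt grows close to the model's block size.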

To recap, the full prompt at this point is:

In [10]:
print(start_text + answer)

Once upon a time, in a barn near a farm house, there lived a little white kitten named Cotton. Cotton lived high up in a nice warm place above the barn where all of the farmer's horses slept. But Cotton wasn't alone in her little home above the barn, oh no. She shared her hay bed with her mommy and 5 other sisters. All of her sisters were cute and fluffy, like Cotton. But she was the only white one in the bunch. The rest of her sisters were all orange with beautiful white tiger stripes like Cotton's mommy. Being different made Cotton quite sad. She often wished she looked like the rest of her family. So one day, when Cotton found a can of the old farmer's orange paint, she used it to paint herself like them. When her mommy and sisters found her they started laughing. 

"What are you doing, Cotton?!" 

"I only wanted to be more like you". 

Cotton's mommy rubbed her face on Cotton's and said "Oh Cotton, but your fur is so pretty and special, like you. We would never want you to be any o