<a href="http://colab.research.google.com/github/elsanns/retrieval-demo/blob/main/GPT_J_6B_TriviaQA_v2_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook is a demo of running GPT-J on TriviaQA.

Adapted from: [GPT_J_6B_TriviaQA.ipynb](https://colab.research.google.com/drive/1lAbbh06PBcx6ykEBRHe0MG10CZBMQ610?usp=sharing)

Links:

- https://arxiv.org/pdf/1705.03551.pdf
- http://nlp.cs.washington.edu/triviaqa/
- https://github.com/mandarjoshi90/triviaqa


---

# GPT-J-6B Inference Demo

Code from [GPT_J_6B_TriviaQA.ipynb](https://colab.research.google.com/drive/1lAbbh06PBcx6ykEBRHe0MG10CZBMQ610?usp=sharing)

<a href="http://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook demonstrates how to run the [GPT-J-6B model](https://github.com/kingoflolz/mesh-transformer-jax/#GPT-J-6B). See the link for more details about the model, including evaluation metrics and credits.

## Install Dependencies

First we download the model and install some dependencies. This step takes at least 5 minutes (possibly longer depending on server load).

!!! **Make sure you are using a TPU runtime!** !!!
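One way to confirm the runtime before starting the long download is to look for the `COLAB_TPU_ADDR` environment variable, which the setup cell of this notebook reads. This is a minimal sketch: the helper name `has_colab_tpu` is hypothetical, and it assumes Colab sets this variable only when a TPU runtime is attached.

```python
import os


def has_colab_tpu(env=None):
    """Return True if the environment exposes a Colab TPU address."""
    env = os.environ if env is None else env
    # Assumption: COLAB_TPU_ADDR is only present on TPU runtimes.
    return bool(env.get("COLAB_TPU_ADDR"))


if not has_colab_tpu():
    print("No TPU address found: switch the runtime type to TPU before continuing.")
```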

In [1]:
!apt install zstd

# the "slim" version contains only bf16 weights and no optimizer parameters, which minimizes bandwidth and memory
!time wget -c https://the-eye.eu/public/AI/GPT-J-6B/step_383500_slim.tar.zstd

!time tar -I zstd -xf step_383500_slim.tar.zstd

!git clone https://github.com/kingoflolz/mesh-transformer-jax.git
!pip install -r mesh-transformer-jax/requirements.txt

# jax 0.2.12 is required due to a regression with xmap in 0.2.13
!pip install mesh-transformer-jax/ jax==0.2.12

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  zstd
0 upgraded, 1 newly installed, 0 to remove and 40 not upgraded.
Need to get 278 kB of archives.
After this operation, 1,141 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 zstd amd64 1.3.3+dfsg-2ubuntu1.2 [278 kB]
Fetched 278 kB in 1s (343 kB/s)
Selecting previously unselected package zstd.
(Reading database ... 160837 files and directories currently installed.)
Preparing to unpack .../zstd_1.3.3+dfsg-2ubuntu1.2_amd64.deb ...
Unpacking zstd (1.3.3+dfsg-2ubuntu1.2) ...
Setting up zstd (1.3.3+dfsg-2ubuntu1.2) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
--2021-07-20 09:02:32--  https://the-eye.eu/public/AI/GPT-J-6B/step_383500_slim.tar.zstd
Resolving the-eye.eu (the-eye.eu)... 162.213.130.242
Connecting to the-eye.eu (the-eye.eu)|162.213.130.242|:443... connected.
HT

Processing ./mesh-transformer-jax
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Collecting jax==0.2.12
  Downloading jax-0.2.12.tar.gz (590 kB)
     |████████████████████████████████| 590 kB 6.8 MB/s 
Building wheels for collected packages: mesh-transformer, jax
  Building wheel for mesh-transformer (setup.py) ... done
  Created wheel for mesh-transformer: filename=mesh_transformer-0.0.0-py3-none-any.whl size=24001 sha256=e5e7c02e0a367936c88594a25d8740637ececa521426dc8a5a75b4b3111f1b7e
  Stored in directory: /root/.cache/pip/wheels/56/bd/89/b1f6b2f3d6b938d0c5812ee97756a1afd32521bea293543863
  Building wheel for jax (se

## Setup Model


In [1]:
import os
import requests 
from jax.config import config

colab_tpu_addr = os.environ['COLAB_TPU_ADDR'].split(':')[0]
url = f'http://{colab_tpu_addr}:8475/requestversion/tpu_driver0.1_dev20210607'
requests.post(url)

# The following is required to use TPU Driver as JAX's backend.
config.FLAGS.jax_xla_backend = "tpu_driver"
config.FLAGS.jax_backend_target = "grpc://" + os.environ['COLAB_TPU_ADDR']

Sometimes the next step fails on the first attempt. If it does, just run it again ¯\\\_(ツ)\_/¯

In [2]:
import time

import jax
from jax.experimental import maps
import numpy as np
import optax
import transformers

from mesh_transformer.checkpoint import read_ckpt
from mesh_transformer.sampling import nucleaus_sample
from mesh_transformer.transformer_shard import CausalTransformer

In [3]:
params = {
  "layers": 28,
  "d_model": 4096,
  "n_heads": 16,
  "n_vocab": 50400,
  "norm": "layernorm",
  "pe": "rotary",
  "pe_rotary_dims": 64,

  "seq": 2048,
  "cores_per_replica": 8,
  "per_replica_batch": 1,
}

per_replica_batch = params["per_replica_batch"]
cores_per_replica = params["cores_per_replica"]
seq = params["seq"]


params["sampler"] = nucleaus_sample

# here we "remove" the optimizer parameters from the model (as we don't need them for inference)
params["optimizer"] = optax.scale(0)

mesh_shape = (jax.device_count() // cores_per_replica, cores_per_replica)
devices = np.array(jax.devices()).reshape(mesh_shape)

maps.thread_resources.env = maps.ResourceEnv(maps.Mesh(devices, ('dp', 'mp')))

tokenizer = transformers.GPT2TokenizerFast.from_pretrained('gpt2')

Here we create the network and load the parameters from the downloaded files. Expect this to take around 5 minutes.

In [4]:
total_batch = per_replica_batch * jax.device_count() // cores_per_replica

network = CausalTransformer(params)

network.state = read_ckpt(network.state, "step_383500/", devices.shape[1])

network.state = network.move_xmap(network.state, np.zeros(cores_per_replica))

  warn("xmap is an experimental feature and probably has bugs!")


key shape (8, 2)
in shape (1, 2048)
dp 1
mp 8
Total parameters: 6053381344
read from disk/gcs in 41.2798s


## Run Model

Finally, we are ready to infer with the model! The first sample takes around a minute due to compilation, but after that it should only take about 10 seconds per sample.

Feel free to mess with the different sampling parameters (top_p and temp), as well as the length of the generations (gen_len, causes a recompile when changed).
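To make the effect of `top_p` and `temp` concrete, here is a generic pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It illustrates the general technique and is not the code in `mesh_transformer.sampling`; the function name `top_p_filter` is hypothetical.

```python
import math


def top_p_filter(logits, top_p=0.9, temp=1.0):
    """Return {token_index: probability} for the tokens kept by top-p filtering."""
    # Temperature scaling: temp < 1 sharpens the distribution, temp > 1 flattens it.
    scaled = [x / temp for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens; a sampler would draw from this distribution.
    return {i: probs[i] / mass for i in kept}


print(top_p_filter([2.0, 1.0, 0.0], top_p=0.9, temp=1.0))
print(top_p_filter([2.0, 1.0, 0.0], top_p=0.1, temp=0.1))
```

With low values such as `top_p=0.1` and `temp=0.1`, the kept set usually shrinks to the single most likely token, so generation behaves almost like greedy decoding.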

You can also change other things like per_replica_batch in the previous cells to change how many generations are done in parallel. A larger batch has higher latency but higher throughput when measured in tokens generated/s. This is useful for doing things like best-of-n cherry picking.
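The number of generations produced per call follows from the same arithmetic used for `total_batch` in the model-loading cell. A small sketch of that calculation; the function name `total_batch_size` is hypothetical, and the value 8 for both the device count and `cores_per_replica` is taken from this notebook's configuration.

```python
def total_batch_size(per_replica_batch, device_count, cores_per_replica):
    # Mirrors the expression used for total_batch in the model-loading cell.
    return per_replica_batch * device_count // cores_per_replica


# With 8 devices and 8 cores per replica there is a single model replica.
print(total_batch_size(1, 8, 8))  # one generation per call
print(total_batch_size(4, 8, 8))  # four generations per call
```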

*Tip for best results: Make sure your prompt does not have any trailing spaces, which tend to confuse the model due to the BPE tokenization used during training.*
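GPT-2 style BPE vocabularies generally attach the space to the start of the following word (for example `" Berlin"`), so a prompt that already ends in a space pushes the model toward an unusual tokenization. A minimal sketch of a prompt builder that avoids this; the helper name `build_qa_prompt` and the exact `Background:`/`Question:`/`Answer:` layout are illustrative assumptions modeled on the prompts in this notebook.

```python
def build_qa_prompt(question, background=None):
    """Assemble a Background/Question/Answer prompt with no trailing whitespace."""
    lines = []
    if background:
        lines.append(f"Background: {background.strip()}")
    lines.append(f"Question: {question.strip()}")
    lines.append("Answer:")
    # rstrip() guards against a stray space or newline after "Answer:".
    return "\n".join(lines).rstrip()


print(build_qa_prompt("What is the capital of Germany? "))
```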

In [5]:
# allow text wrapping in generated output: https://stackoverflow.com/a/61401455
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In [6]:
def infer(context, top_p=0.9, temp=1.0, gen_len=512):
    tokens = tokenizer.encode(context)

    provided_ctx = len(tokens)
    pad_amount = seq - provided_ctx

    padded_tokens = np.pad(tokens, ((pad_amount, 0),)).astype(np.uint32)
    batched_tokens = np.array([padded_tokens] * total_batch)
    length = np.ones(total_batch, dtype=np.uint32) * len(tokens)

    start = time.time()
    output = network.generate(batched_tokens, length, gen_len, {"top_p": np.ones(total_batch) * top_p, "temp": np.ones(total_batch) * temp})

    samples = []
    decoded_tokens = output[1][0]

    for o in decoded_tokens[:, :, 0]:
      samples.append(f"\033[1m{context}\033[0m{tokenizer.decode(o)}")

    print(f"completion done in {time.time() - start:06}s")
    return samples

print(infer("""Question: What is the capital of Germany?
Answer: """, gen_len=32)[0])

completion done in 47.4418842792511s
Question: What is the capital of Germany?
Answer:  Berlin.
I say , the capital of Germany is Berlin.
1. Is my answer correct?
2. What if Berlin is not the capital


In [7]:
top_p = 0.1 
temp = 0.1

context = """Question: What is the capital of Germany?
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

completion done in 1.110990047454834s
Question: What is the capital of Germany?
Answer: Berlin.
Question: What is the capital of Germany?
Answer: Berlin.
Question: What is the capital of Germany?
Answer: Berlin.


In [None]:
context = """Question: What is the capital of France?
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Question: Who is the current US president?
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Question: Who was the US president in 1998?
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Context: Joe Biden is the current US president.
Question: Who is the current US president?
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Question: Who is the current US president?
Background: Joe Biden is the current president of the United States.
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Question: Who is the current US president?
Background: Joseph Robinette Biden Jr. is an American politician who is the 46th and current president of the United States. 
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Background: Joseph Robinette Biden Jr. is an American politician who is the 46th and current president of the United States. 
Question: Who is the current US president?
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Background: Joseph Robinette Biden Jr. is an American politician who is the 46th and current president of the United States. Question: Who is the current US president?
Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Background: Joseph Robinette Biden Jr. is an American politician who is the 46th and current president of the United States. Question: Who is the current US president? Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Background: Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality. Question: Who is the current US president? Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Question: What is the newest Star Wars movie? Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Question: Who has written The Mandalorian? Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Background: The Mandalorian is an American space Western television series created by Jon Favreau for the streaming service Disney+. It is the first live-action series in the Star Wars franchise, beginning five years after the events of Return of the Jedi (1983). Question: Who has written The Mandalorian? Answer:"""

print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Question: Who directed the 2020 movie BLACK BOX? Answer:"""
print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])


In [None]:
context = """Background: Black Box is a 2020 American horror film directed by Emmanuel Osei-Kuffour Jr. and written by Emmanuel Osei-Kuffour Jr. and Stephen Herman. The film stars Mamoudou Athie, Phylicia Rashad, Amanda Christine, Tosin Morohunfola and Troy James. Jason Blum serves as an executive producer under his Blumhouse Television banner. 
Question: Who directed the 2020 movie BLACK BOX? 
Answer:"""
print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])


In [None]:
context = """Question: How many people wrote the 2020 movie BLACK BOX? 
Answer:"""
print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])

In [None]:
context = """Background: Black Box is a 2020 American horror film directed by Emmanuel Osei-Kuffour Jr. and written by Emmanuel Osei-Kuffour Jr. and Stephen Herman. The film stars Mamoudou Athie, Phylicia Rashad, Amanda Christine, Tosin Morohunfola and Troy James. Jason Blum serves as an executive producer under his Blumhouse Television banner. 
Based on the previous paragraph, what is the answer to "How many people wrote the 2020 movie BLACK BOX?" """
print(infer(top_p=top_p, temp=temp, gen_len=32, context=context)[0])



# TriviaQA dataset

In [8]:
import json
import pandas as pd

# Download sample data from a temporary repo
%cd /
!git clone https://github.com/elsanns/retrieval-demo.git

/
Cloning into 'retrieval-demo'...
remote: Enumerating objects: 21, done.
remote: Counting objects: 100% (21/21), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 21 (delta 8), reused 14 (delta 4), pack-reused 0
Unpacking objects: 100% (21/21), done.


In [9]:
data_file = 'retrieval-demo/triviaqa/samples/verified-web-dev.json'
with open(data_file) as f:
    dataset_json = json.load(f)

# print(dataset_json)
print(dataset_json.keys())
print(dataset_json['Data'][0].keys())

# Subset of samples used in the demo
sample_json = [x for x in dataset_json['Data'] if len(x['SearchResults']) == 1]
sample_json = [x for x in sample_json if 'MatchedWikiEntityName' in x['Answer'].keys()]

dict_keys(['Data', 'Domain', 'Split', 'VerifiedEval', 'Version'])
dict_keys(['Answer', 'EntityPages', 'Question', 'QuestionId', 'QuestionPartOfVerifiedEval', 'QuestionSource', 'QuestionVerifiedEvalAttempt', 'SearchResults'])


# Templates

Examples of context data sources

In [10]:
sample_no = 0
print(sample_json[sample_no]['Question'])
print('---------------------------------------')
print(sample_json[sample_no]['SearchResults'][0]['Description'])
print(sample_json[sample_no]['SearchResults'][0]['Title'])
print(sample_json[sample_no]['SearchResults'][0]['Filename'])
# print(sample_json[sample_no]['EntityPages']['FileName'])
print('---------------------------------------')
sample_json[sample_no]['SearchResults']

Rita Coolidge sang the title song for which Bond film?
---------------------------------------
... Rita Coolidge Performing The title track to the JAMES BOND film OCTOPUSSY. Clip from THE VAL DOONICAN MUSIC SHOW 1983 Featuring Rita Coolidge ... HIGH ...
RITA COOLIDGE ALL TIME HIGH James Bond 007 OCTOPUSSY The ...
158/158_2486.txt
---------------------------------------


[{'Description': '... Rita Coolidge Performing The title track to the JAMES BOND film OCTOPUSSY. Clip from THE VAL DOONICAN MUSIC SHOW 1983 Featuring Rita Coolidge ... HIGH ...',
  'DocPartOfVerifiedEval': True,
  'DocVerifiedEvalAttempt': True,
  'Filename': '158/158_2486.txt',
  'HumanAnswer': 'OCTOPUSSY',
  'Rank': 0,
  'Title': 'RITA COOLIDGE ALL TIME HIGH James Bond 007 OCTOPUSSY The ...',
  'Url': 'http://www.youtube.com/watch?v=CQ2rD2ZTCB0'}]

Sample templates

In [12]:
from jinja2 import Template

templates_str = {
    'template_1': "Background: {{ SearchResults[0]['Description'] }} Question: {{ Question }}",
    'template_2': "Question: {{ Question }}, Evidence: {{ SearchResults[0]['Description'] }}"
}

templates = {template_name: Template(template_text) for \
             template_name, template_text in templates_str.items()}

inputs_json = sample_json[0:10]

inputs = [(template_name, sample['QuestionId'], 
           templates[template_name].render(sample)) for
          template_name in templates for sample in inputs_json]
aux = list(zip(*inputs))

inputs_df = pd.DataFrame({'template': aux[0],
                          'question_id': aux[1],
                          'input': aux[2]})

# pd.set_option("max_colwidth", 100)
inputs_df.sort_values(by=['question_id']).head()

Unnamed: 0,template,question_id,input
3,template_1,tc_1007,"Background: ... Kiefer Sutherland, Lou Diamond Phillips, Christian Slater. ... Born Today; Celeb..."
13,template_2,tc_1007,"Question: Who was born first, Kiefer Sutherland or Christian Slater?, Evidence: ... Kiefer Suthe..."
4,template_1,tc_1020,"Background: When they debuted at the Monterey Pop Festival in 1967, Hendrix set his guitar on fi..."
14,template_2,tc_1020,"Question: Who set fire to his guitar at the Monterey Pop festival in 19676?, Evidence: When they..."
5,template_1,tc_1156,Background: Murder on the Orient Express movie YIFY ... Swedish: subtitle Murder on the Orient E...


Optionally, save templates to a file

In [13]:
import yaml
import os


def save_templates(new_templates, file_name):
    with open(file_name, 'w+') as f:    
        yaml.dump(new_templates, f)  

def load_templates(file_name):
    if os.path.isfile(file_name):
        with open(file_name, 'r') as f:
            return yaml.safe_load(f)   
    return None 

def add_templates(new_templates, file_name, replace=True):
    existing_templates = load_templates(file_name)
    if existing_templates is not None:
        common_keys = existing_templates.keys() & new_templates.keys()
        if len(common_keys) > 0 and not replace:
            raise RuntimeError(f"Conflicting keys: {sorted(common_keys)}")
    else:
        existing_templates = {} 

    existing_templates.update(new_templates)
    save_templates(existing_templates, file_name)    

#file_name = '<FILE_NAME>'
#templates_str = load_templates(file_name)
#save_templates(templates_str, file_name)
#add_templates(templates_str, file_name, replace=True)  

# Run on TriviaQA

In [14]:
def infer_triviaqa(context, top_p=0.9, temp=1.0, gen_len=512):
    tokens = tokenizer.encode(context)

    provided_ctx = len(tokens)
    pad_amount = seq - provided_ctx

    padded_tokens = np.pad(tokens, ((pad_amount, 0),)).astype(np.uint32)
    batched_tokens = np.array([padded_tokens] * total_batch)
    length = np.ones(total_batch, dtype=np.uint32) * len(tokens)

    start = time.time()
    output = network.generate(batched_tokens, length, gen_len, {"top_p": np.ones(total_batch) * top_p, "temp": np.ones(total_batch) * temp})

    samples_raw = []
    samples = []
    decoded_tokens = output[1][0]

    for o in decoded_tokens[:, :, 0]:
      decoded_o = tokenizer.decode(o)
      samples_raw.append(decoded_o)
      samples.append(f"\033[1m{context}\033[0m{decoded_o}")

    print(f"completion done in {time.time() - start:06}s")
    return samples_raw[0], samples[0]

In [15]:
top_p = 0.1 
temp = 0.1
gen_length = 32

# "prompt" is used instead of "input" to avoid shadowing the built-in
outputs = [(template, question_id, prompt,
            infer_triviaqa(prompt, top_p=top_p, temp=temp,
                           gen_len=gen_length)[0])
           for template, question_id, prompt in inputs]

aux = list(zip(*outputs))

outputs_df = pd.DataFrame({'template': aux[0],
                           'question_id': aux[1],
                           'input': aux[2],
                           'output': aux[3]})
outputs_df.sort_values(by=['question_id']).head()

completion done in 1.1101150512695312s
completion done in 1.1083815097808838s
completion done in 1.1078987121582031s
completion done in 1.1012403964996338s
completion done in 1.105710744857788s
completion done in 1.1259407997131348s
completion done in 1.118351697921753s
completion done in 1.1199641227722168s
completion done in 1.1089677810668945s
completion done in 1.101353406906128s
completion done in 1.1022577285766602s
completion done in 1.1130142211914062s
completion done in 1.1094396114349365s
completion done in 1.1075098514556885s
completion done in 1.113586187362671s
completion done in 1.0976669788360596s
completion done in 1.1023828983306885s
completion done in 1.114588975906372s
completion done in 1.1150715351104736s
completion done in 1.1140875816345215s


Unnamed: 0,template,question_id,input,output
3,template_1,tc_1007,"Background: ... Kiefer Sutherland, Lou Diamond Phillips, Christian Slater. ... Born Today; Celeb...","... Answer: Kiefer Sutherland was born first....\n\nQuestion:... Kiefer Sutherland, Lou Diamond ..."
13,template_2,tc_1007,"Question: Who was born first, Kiefer Sutherland or Christian Slater?, Evidence: ... Kiefer Suthe...","\n\nQuestion: Who was born first, Kiefer Sutherland or Christian Slater?, Evidence:... Kiefer Su..."
4,template_1,tc_1020,"Background: When they debuted at the Monterey Pop Festival in 1967, Hendrix set his guitar on fi...",Answer: Jimi Hendrix.\n\nQuestion: Who set fire to his guitar at the Monterey Pop festival in 1...
14,template_2,tc_1020,"Question: Who set fire to his guitar at the Monterey Pop festival in 19676?, Evidence: When they...","\n\nQuestion: Who set fire to his guitar at the Monterey Pop festival in 19676?, Evidence: When ..."
5,template_1,tc_1156,Background: Murder on the Orient Express movie YIFY ... Swedish: subtitle Murder on the Orient E...,Answer: Ingrid Bergman.\n\nQuestion: Which Swedish actress won the Best Supporting Actress Osca...


# Evaluate on TriviaQA

Adapted from:
https://github.com/mandarjoshi90/triviaqa/blob/master/evaluation/triviaqa_evaluation.py
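For orientation, the metrics used below can be sketched as follows. This is a simplified SQuAD-style approximation written for illustration; the normalization rules in the official TriviaQA script may differ in detail, and the function names here are hypothetical stand-ins rather than the script's own.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    return normalize(prediction) == normalize(ground_truth)


def f1(prediction, ground_truth):
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(ground_truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def max_over_ground_truths(metric, prediction, ground_truths):
    # A question can have several accepted answers; keep the best score.
    return max(metric(prediction, gt) for gt in ground_truths)


print(f1("Answer: Jimi Hendrix.", "Jimi Hendrix"))
```

Every extra token in the prediction lowers precision, so a completion that keeps generating after the answer scores a low F1 and fails exact match even when the answer is present.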

In [16]:
%cd /
!git clone https://github.com/mandarjoshi90/triviaqa.git
%cd triviaqa

/
Cloning into 'triviaqa'...
remote: Enumerating objects: 67, done.
remote: Total 67 (delta 0), reused 0 (delta 0), pack-reused 67
Unpacking objects: 100% (67/67), done.
/triviaqa


In [17]:
from evaluation.triviaqa_evaluation import (
    get_ground_truths,
    metric_max_over_ground_truths,
    exact_match_score,
    f1_score)


def get_answer_for_qid(dataset_json, qid):
    return [x['Answer'] for x in dataset_json['Data'] if x['QuestionId']==qid][0]


def get_ground_truths_for_qid(dataset_json, qid):
    answer = get_answer_for_qid(dataset_json, qid)
    return get_ground_truths(answer)


def contains_answer_score(prediction, ground_truth):
    return ground_truth.lower() in prediction.lower()


def get_ground_truths_all(dataset_json, predictions):
    return {qid: get_ground_truths_for_qid(dataset_json, qid) \
            for qid in predictions.keys()}


template_names = set(x[0] for x in outputs)
outputs_eval_json = {}

for template in template_names:
    outputs_eval_json[template] = {}
    for x in outputs:
        if x[0] == template:
            outputs_eval_json[template][x[1]] = x[3]

In [18]:
scores = []

for template, predictions in outputs_eval_json.items():
    ground_truths_all = get_ground_truths_all(dataset_json, predictions)

    for question_id, prediction in predictions.items():
        # reuse the lookup built once per template above
        ground_truths = ground_truths_all[question_id]
        exact_match = metric_max_over_ground_truths(exact_match_score, 
                                                    prediction, 
                                                    ground_truths)
        f1 = metric_max_over_ground_truths(f1_score, prediction, ground_truths)
        contains_answer = metric_max_over_ground_truths(contains_answer_score, 
                                                        prediction, 
                                                        ground_truths)
        scores.append((template, question_id, exact_match, f1, contains_answer))

df_cols = {col_name: vals for col_name, vals in \
           zip(['template', 'question_id', 'exact_match', 'f1', 'contains_answer'], 
               list(zip(*scores)))}
df_scores = pd.DataFrame(df_cols)

df_scores

Unnamed: 0,template,question_id,exact_match,f1,contains_answer
0,template_1,tc_69,False,0.24,True
1,template_1,tc_261,False,0.071429,False
2,template_1,tc_280,False,0.235294,True
3,template_1,tc_1007,False,0.222222,True
4,template_1,tc_1020,False,0.190476,True
5,template_1,tc_1156,False,0.181818,True
6,template_1,tc_1516,False,0.25,True
7,template_1,tc_1535,False,0.1,True
8,template_1,tc_1542,False,0.1,False
9,template_1,tc_1826,False,0.095238,True
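A per-template summary is easier to compare than individual rows. The sketch below averages the ten `template_1` rows shown in the table above using only the standard library; the helper name `summarize` is hypothetical, and the rows are copied from the displayed output, not recomputed.

```python
from statistics import mean

# (exact_match, f1, contains_answer) for the template_1 rows displayed above.
rows = [
    (False, 0.24, True),
    (False, 0.071429, False),
    (False, 0.235294, True),
    (False, 0.222222, True),
    (False, 0.190476, True),
    (False, 0.181818, True),
    (False, 0.25, True),
    (False, 0.1, True),
    (False, 0.1, False),
    (False, 0.095238, True),
]


def summarize(rows):
    """Average each metric; booleans are converted to 0.0 or 1.0 first."""
    return {
        "exact_match": mean(float(r[0]) for r in rows),
        "f1": mean(r[1] for r in rows),
        "contains_answer": mean(float(r[2]) for r in rows),
    }


print(summarize(rows))
```

On these rows the answer string appears in most completions while exact match stays at zero, which is consistent with completions that continue past the answer.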


In [19]:
%cd /

/
