
<a href="https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2020 The T5 Authors

Licensed under the Apache License, Version 2.0 (the "License");

In [None]:
# Copyright 2019 The T5 Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Fine-Tuning the Text-To-Text Transfer Transformer (T5) for Conversational Query Rewriting using CANARD

*The following tutorial guides you through the process of fine-tuning a pre-trained T5 model, evaluating its accuracy, and using it for prediction,
all on a free Google Cloud TPU <a href="https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>.*

This is a modified version of the original notebook from the T5 authors available [here](https://github.com/google-research/text-to-text-transfer-transformer).
In this version of the notebook, we train T5 on the conversational query rewriting task.

You can also download the **trained model** from [here](https://drive.google.com/file/d/1TBWNWHSxFYzDIZ8wVbFXKMQRWSrAfUq0/view?usp=sharing).

For more information about our work, please read the original paper
[_Open-Domain Conversational Search Assistant with Transformers_](https://arxiv.org/pdf/2101.08197.pdf).
If you find anything useful, please cite our work.

### Background

T5 was introduced in the paper [_Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer_](https://arxiv.org/abs/1910.10683). In that paper, the authors provided a comprehensive picture of how to pre-train a standard text-to-text Transformer model on a large text corpus, achieving state-of-the-art results on many NLP tasks after fine-tuning.

The T5 model is pre-trained on a mixture of supervised and unsupervised tasks, with the majority of data coming from an unlabeled dataset called [C4](https://www.tensorflow.org/datasets/catalog/c4). C4 is based on a massive scrape of the web produced by [Common Crawl](https://commoncrawl.org). Loosely speaking, pre-training on C4 ideally gives T5 an understanding of natural language in addition to general world knowledge.

### Testing T5?

As the name implies, T5 is a text-to-text model, which enables us to train it on arbitrary tasks involving a textual input and output. As demonstrated in the original paper, a huge variety of NLP tasks can be cast in this format, including translation, summarization, and even classification and regression tasks.

One way to use this text-to-text framework is on reading comprehension problems, where the model is fed some context along with a question and is trained to predict the question's answer. For example, we might feed the model the text from the Wikipedia article about [Hurricane Connie](https://en.wikipedia.org/wiki/Hurricane_Connie) along with the question "On what date did Hurricane Connie occur?" and train the model to predict the answer "August 3rd, 1955".
A related task is open-domain question answering (QA) where the model is not provided with this oracle context. Typically, open-domain QA systems include a mechanism to look up information in an external knowledge source. This setting is similar to an "open-book" exam.

In this notebook, we'll train T5 on a variant of this task called **conversational query rewriting**. We feed the model a conversational question together with the conversation history and train it to rewrite the question as a context-independent query that can be understood without the history. We use the CANARD dataset and a single task.

### Caveats

* While we provide instructions for running on a [Cloud TPU](https://cloud.google.com/tpu/) via Colab for free, a [Google Cloud Storage (GCS)](http://console.cloud.google.com/storage) bucket is required for storing model parameters and data. The [GCS free tier](https://cloud.google.com/free/) provides 5 GB of storage, which should be enough to train the `large` and smaller models, but not the `3B` or `11B` parameter models. You can use part of your initial $300 credit to get more space.
* The Cloud TPU provided by Colab (a `v2-8`) does not have enough memory to fine-tune the `11B` parameter model. For this model, you will need to fine-tune inside of a GCP instance (see [README](https://github.com/google-research/text-to-text-transfer-transformer/)).


# Set Up

<h3><a href="https://cloud.google.com/tpu/"><img valign="middle" src="https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png" width="50"></a>  &nbsp;&nbsp;Train on TPU</h3>




   1. Create a Cloud Storage bucket for your data and model checkpoints at http://console.cloud.google.com/storage, and fill in the `BASE_DIR` parameter in the following form. There is a [free tier](https://cloud.google.com/free/) if you do not yet have an account.
   1. On the main menu, click Runtime and select **Change runtime type**. Set "TPU" as the hardware accelerator.
   1. Run the following cell and follow the instructions to:
      * Set up a Colab TPU running environment
      * Verify that you are connected to a TPU device
      * Upload your credentials to the TPU so it can access your GCS bucket


In [None]:
print("Installing dependencies...")
%tensorflow_version 2.x
!pip install -q t5

import functools
import os
import time
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import tensorflow.compat.v1 as tf
import tensorflow_datasets as tfds

import t5

BASE_DIR = "gs://<name of bucket>" #@param { type: "string" }
if not BASE_DIR or BASE_DIR == "gs://":
  raise ValueError("You must enter a BASE_DIR.")
DATA_DIR = os.path.join(BASE_DIR, "data")
MODELS_DIR = os.path.join(BASE_DIR, "models")
ON_CLOUD = True


if ON_CLOUD:
  print("Setting up GCS access...")
  import tensorflow_gcs_config
  from google.colab import auth
  # Set credentials for GCS reading/writing from Colab and TPU.
  TPU_TOPOLOGY = "v2-8"
  try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    TPU_ADDRESS = tpu.get_master()
    print('Running on TPU:', TPU_ADDRESS)
  except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
  auth.authenticate_user()
  tf.enable_eager_execution()
  tf.config.experimental_connect_to_host(TPU_ADDRESS)
  tensorflow_gcs_config.configure_gcs_from_colab_auth()

tf.disable_v2_behavior()

# Improve logging.
from contextlib import contextmanager
import logging as py_logging

if ON_CLOUD:
  tf.get_logger().propagate = False
  py_logging.root.setLevel('INFO')

@contextmanager
def tf_verbosity_level(level):
  og_level = tf.logging.get_verbosity()
  tf.logging.set_verbosity(level)
  yield
  tf.logging.set_verbosity(og_level)

Installing dependencies...
Setting up GCS access...
Running on TPU: grpc://10.53.192.178:8470
Instructions for updating:
non-resource variables are not supported in the long term




# Creating new Tasks and Mixture

Two core components of the T5 library are `Task` and `Mixture` objects.

A `Task` is a dataset along with preprocessing functions and evaluation metrics. A `Mixture` is a collection of `Task` objects along with a mixing rate or a function defining how to compute a mixing rate based on the properties of the constituent `Tasks`.
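The notebook only needs a single Task, so no Mixture is registered below. Still, to illustrate what "a function defining how to compute a mixing rate" can look like, here is a plain-Python sketch of examples-proportional mixing with an optional cap and temperature. It resembles the idea behind T5's `rate_num_examples` helper but is a simplified re-implementation, not library code, and the task names and sizes are hypothetical.

```py
from typing import Dict, Optional


def mixing_proportions(num_examples: Dict[str, int],
                       maximum: Optional[int] = None,
                       temperature: float = 1.0) -> Dict[str, float]:
  """Turn per-task example counts into normalized sampling proportions.

  Each count is optionally capped at `maximum`, raised to 1/temperature,
  and the resulting rates are normalized to sum to 1.
  """
  rates = {}
  for name, count in num_examples.items():
    if maximum is not None:
      count = min(count, maximum)
    rates[name] = float(count) ** (1.0 / temperature)
  total = sum(rates.values())
  return {name: rate / total for name, rate in rates.items()}


# Hypothetical task sizes, for illustration only.
sizes = {"canard_task": 30000, "other_task": 10000}
print(mixing_proportions(sizes))                   # proportional to size
print(mixing_proportions(sizes, maximum=15000))    # large task capped
print(mixing_proportions(sizes, temperature=2.0))  # flatter distribution
```

Capping and a temperature above 1 both shift probability mass toward smaller tasks, which keeps a large task from dominating a mixture.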

For this example, we will fine-tune the model to do conversational query rewriting.

### CANARD Dataset

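CANARD was created from the QuAC conversational question answering dataset: each question from a QuAC dialogue is paired with a human-written rewrite that can be understood without the preceding conversation. The TSV files below hold one `<context>\t<rewrite>` pair per line; `cast_test` points to an evaluation file for TREC CAsT.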

In [None]:
import gzip
import json

canard_tsv_path = {
    "train": os.path.join(DATA_DIR, "training_t5_canard_data.tsv"),
    "validation": os.path.join(DATA_DIR, "validation_t5_canard_data.tsv"),
    "test": os.path.join(DATA_DIR, "test_t5_canard_data.tsv"),
    "cast_test": os.path.join(DATA_DIR, "trec_cast_evaluation.tsv")
}
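The cell above assumes the TSV files already exist in `DATA_DIR`. Judging from the raw examples printed further down, each context string is the current question, then `[CTX]`, then the conversation history (topic title, section title, and previous question/answer turns) joined by `[TURN]`. The sketch below converts CANARD-style JSON records into that layout. The field names `Question`, `History`, and `Rewrite` are assumptions about the CANARD release, and `clean`, `to_tsv_line`, and `convert` are hypothetical helpers, not the authors' preprocessing script.

```py
import json
from typing import Dict, List


def clean(text: str) -> str:
  """Collapse tabs, newlines, and repeated spaces so a field fits on one TSV line."""
  return " ".join(text.split())


def to_tsv_line(record: Dict) -> str:
  """Build '<question> [CTX] <h1> [TURN] <h2> ...\\t<rewrite>' from one record."""
  history: List[str] = [clean(h) for h in record["History"]]
  context = clean(record["Question"]) + " [CTX] " + " [TURN] ".join(history)
  return context + "\t" + clean(record["Rewrite"])


def convert(json_path: str, tsv_path: str) -> int:
  """Convert a CANARD-style JSON list into a TSV file; return the row count."""
  with open(json_path, encoding="utf-8") as f:
    records = json.load(f)
  with open(tsv_path, "w", encoding="utf-8") as f:
    for record in records:
      f.write(to_tsv_line(record) + "\n")
  return len(records)
```

The resulting files would then need to be copied into `DATA_DIR` (for example with `gsutil cp`) under the names listed in `canard_tsv_path`.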

Next, we define a function to load the TSV data as a `tf.data.Dataset` in TensorFlow.

In [None]:
def canard_dataset_fn(split, shuffle_files=False):
  # We only have one file for each split.
  del shuffle_files

  # Load lines from the text file as examples.
  ds = tf.data.TextLineDataset(canard_tsv_path[split])
  # Split each "<context>\t<rewrite>" example into (context, rewrite) tuple.
  ds = ds.map(
      functools.partial(tf.io.decode_csv, record_defaults=["", ""],
                        field_delim="\t", use_quote_delim=False),
      num_parallel_calls=tf.data.experimental.AUTOTUNE)
  # Map each tuple to a {"context": ... "rewrite": ...} dict.
  ds = ds.map(lambda *ex: dict(zip(["context", "rewrite"], ex)))
  return ds

print("A few raw validation examples...")
for ex in tfds.as_numpy(canard_dataset_fn("validation").take(5)):
  print(ex)

A few raw validation examples...
{'context': b'What group disbanded? [CTX] Frank Zappa [TURN] Disbandment', 'rewrite': b'What group disbanded?'}
{'context': b'When did they disband? [CTX] Frank Zappa [TURN] Disbandment [TURN] What group disbanded? [TURN] Zappa and the Mothers of Invention', 'rewrite': b'When did Zappa and the Mothers of Invention disband?'}
{'context': b'What kind of music did they play? [CTX] Frank Zappa [TURN] Disbandment [TURN] What group disbanded? [TURN] Zappa and the Mothers of Invention [TURN] When did they disband? [TURN] In late 1969, Zappa broke up the band.', 'rewrite': b'What kind of music did Zappa and the Mothers of Invention play?'}
{'context': b'Why did they break up? [CTX] Frank Zappa [TURN] Disbandment [TURN] What group disbanded? [TURN] Zappa and the Mothers of Invention [TURN] When did they disband? [TURN] In late 1969, Zappa broke up the band. [TURN] What kind of music did they play? [TURN] major influence on the development of the jazz-rock fusion

Now, we write a preprocessing function that converts the examples into a text-to-text format with both `inputs` and `targets` fields. The preprocessor also normalizes the text with a regular expression that removes the first and last single quote in each string, since the text is sometimes formatted in odd ways. Lowercasing is disabled in this version (the `tf.strings.lower` call is commented out), so the original casing is preserved. Finally, we prepend `canard context:` to the inputs so that the model knows which task it is solving.

In [None]:
def canard_preprocessor(ds):
  def normalize_text(text):
    """Strip the first and last single quote from a TensorFlow string."""
    # Lowercasing is disabled in this version of the notebook.
    # text = tf.strings.lower(text)
    text = tf.strings.regex_replace(text, "'(.*)'", r"\1")
    return text

  def to_inputs_and_targets(ex):
    """Map {"context": ..., "rewrite": ...}->{"inputs": ..., "targets": ...}."""
    return {
        "inputs":
             tf.strings.join(
                 ["canard context: ", normalize_text(ex["context"])]),
        "targets": normalize_text(ex["rewrite"])
    }
  return ds.map(to_inputs_and_targets, 
                num_parallel_calls=tf.data.experimental.AUTOTUNE)
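To see what the preprocessor does to a single example without building a TensorFlow pipeline, here is a plain-Python approximation. It uses Python's `re` in place of `tf.strings.regex_replace` (which uses RE2 syntax); for this simple greedy pattern the two should behave the same, but treat that as an assumption. Note that the pattern matches from the *first* to the *last* single quote in the string, so ordinary apostrophes can be removed too.

```py
import re
from typing import Dict

# Same pattern as canard_preprocessor: greedy match from first to last quote.
_QUOTE_RE = re.compile(r"'(.*)'")


def normalize_text(text: str) -> str:
  """Drop the first and last single quote (no lowercasing, as in the cell above)."""
  return _QUOTE_RE.sub(r"\1", text)


def to_inputs_and_targets(ex: Dict[str, str]) -> Dict[str, str]:
  """Plain-Python mirror of the TF preprocessor for one example."""
  return {
      "inputs": "canard context: " + normalize_text(ex["context"]),
      "targets": normalize_text(ex["rewrite"]),
  }


example = {
    "context": "What group disbanded? [CTX] Frank Zappa [TURN] Disbandment",
    "rewrite": "What group disbanded?",
}
print(to_inputs_and_targets(example))
print(normalize_text("I don't know, it's unclear"))  # both apostrophes dropped
```

A string with a single apostrophe, such as `don't`, is left unchanged because the pattern needs two quotes to match.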

Finally, we put everything together to create a `Task`.

In [None]:
t5.data.TaskRegistry.add(
    "canard_task",
    # Supply a function which returns a tf.data.Dataset.
    dataset_fn=canard_dataset_fn,
    splits=["train", "validation", "test", "cast_test"],
    # Supply a function which preprocesses text from the tf.data.Dataset.
    text_preprocessor=[canard_preprocessor],
    # The default T5 vocabulary (the one used for pre-training) is used here.
    # sentencepiece_model_path=t5.data.DEFAULT_SPM_PATH, # only needed by older T5 versions
    # Lowercase targets before computing metrics.
    postprocess_fn=t5.data.postprocessors.lower_text,
    # Report exact-match accuracy, BLEU, and ROUGE.
    metric_fns=[t5.evaluation.metrics.accuracy, t5.evaluation.metrics.bleu, t5.evaluation.metrics.rouge],
    # Optional: per-split example counts help with mixing and auto-caching.
    # num_input_examples={"train": ..., "validation": ...},
)

**Note**: Instead of defining `canard_dataset_fn` above, we also could have used the `TextLineTask` class with the `parse_tsv` preprocessor for equivalent results, as follows (registered under a different name so that it does not clash with `canard_task`):

```py
t5.data.TaskRegistry.add(
    "canard_task_textline",
    t5.data.TextLineTask,
    split_to_filepattern=canard_tsv_path,
    text_preprocessor=[
      functools.partial(
          t5.data.preprocessors.parse_tsv,
          field_names=["context", "rewrite"]),
      canard_preprocessor
    ],
    postprocess_fn=t5.data.postprocessors.lower_text,
    metric_fns=[t5.evaluation.metrics.accuracy, t5.evaluation.metrics.bleu, t5.evaluation.metrics.rouge],
)
```


# Transferring to new Tasks

We are now ready to fine-tune one of the pre-trained T5 models.

First, we'll instantiate a `Model` object using the model size of your choice. Note that larger models are slower to train and use but will likely achieve higher accuracy. You also may be able to increase accuracy by training longer with more `FINETUNE_STEPS` below.


## Caveats

* Due to its memory requirements, you will not be able to train the `11B` parameter model on the TPU provided by Colab. Instead, you will need to fine-tune inside of a GCP instance (see [README](https://github.com/google-research/text-to-text-transfer-transformer/)).
* Due to the checkpoint size, you will not be able to use the 5GB GCS free tier for the `3B` parameter model. You will need at least 25GB of space, which you can purchase with your $300 of initial credit on GCP.
* While `large` can achieve decent results, it is recommended that you fine-tune at least the `3B` parameter model.


## Define Model

In [None]:
MODEL_SIZE = "base" #@param["small", "base", "large", "3B", "11B"]
FOLDER_TO_STORE = "_temperature_0" #@param { type: "string" }

# Public GCS path for T5 pre-trained model checkpoints
BASE_PRETRAINED_DIR = "gs://t5-data/pretrained_models"
PRETRAINED_DIR = os.path.join(BASE_PRETRAINED_DIR, MODEL_SIZE)
MODEL_DIR = os.path.join(MODELS_DIR, MODEL_SIZE + FOLDER_TO_STORE)

if ON_CLOUD and MODEL_SIZE == "3B":
  tf.logging.warning(
      "The `3B` model is too large to use with the 5GB GCS free tier. "
      "Make sure you have at least 25GB on GCS before continuing."
  )
elif ON_CLOUD and MODEL_SIZE == "11B":
  raise ValueError(
      "The `11B` parameter model is too large to fine-tune on the `v2-8` TPU "
      "provided by Colab. Please comment out this Error if you're running "
      "on a larger TPU."
  )

# Set parallelism and batch size to fit on v2-8 TPU (if possible).
# Limit number of checkpoints to fit within 5GB (if possible).
model_parallelism, train_batch_size, keep_checkpoint_max = {
    "small": (1, 256, 16),
    "base": (2, 128, 8),
    "large": (8, 64, 4),
    "3B": (8, 16, 1),
    "11B": (8, 16, 1)}[MODEL_SIZE]

tf.io.gfile.makedirs(MODEL_DIR)
# The models from our paper are based on the Mesh Tensorflow Transformer.
model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=TPU_ADDRESS,
    tpu_topology=TPU_TOPOLOGY,
    model_parallelism=model_parallelism,
    batch_size=train_batch_size,
    sequence_length={"inputs": 256, "targets": 64},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    keep_checkpoint_max=keep_checkpoint_max if ON_CLOUD else None,
    iterations_per_loop=100,
)
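The `model_parallelism` value controls how the 8 cores of the Colab `v2-8` TPU are split between data and model parallelism. Assuming the cores form a `batch x model` mesh with `model_parallelism` cores along the model dimension (consistent with the `Shape[batch=4, model=2]` line logged when fine-tuning the `base` model), the split for each size can be sketched as follows; the per-shard figure further assumes the global batch is divided evenly across the batch dimension.

```py
from typing import Tuple

NUM_CORES = 8  # a Colab v2-8 TPU exposes 8 cores

# (model_parallelism, train_batch_size) per model size, copied from the cell above.
SETTINGS = {
    "small": (1, 256),
    "base": (2, 128),
    "large": (8, 64),
    "3B": (8, 16),
}


def mesh_shape(num_cores: int, model_parallelism: int) -> Tuple[int, int]:
  """Return (batch, model) mesh dimensions for a given model parallelism."""
  if num_cores % model_parallelism:
    raise ValueError("model_parallelism must divide the number of cores")
  return num_cores // model_parallelism, model_parallelism


for size, (parallelism, batch_size) in SETTINGS.items():
  data_dim, model_dim = mesh_shape(NUM_CORES, parallelism)
  # Assumes the global batch is split evenly along the batch dimension.
  print(f"{size}: batch={data_dim}, model={model_dim}, "
        f"sequences per batch shard={batch_size // data_dim}")
```

Larger models move more cores onto the model dimension so their weights fit in memory, which is why their batch sizes shrink as well.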

Before we continue, let's load a [TensorBoard](https://www.tensorflow.org/tensorboard) visualizer so that we can monitor our progress. The page should update automatically as fine-tuning and evaluation proceed.

In [None]:
if ON_CLOUD:
  %reload_ext tensorboard
# Import outside the `if` so `tb` is always defined before it is used.
import tensorboard as tb
tb.notebook.start("--logdir " + MODELS_DIR)

## Fine-tune

We are now ready to fine-tune our model. This will take a while (~25 minutes with the default settings), so please be patient! The larger the model and the more `FINETUNE_STEPS` you use, the longer it will take.

Don't worry, you can always come back later and increase the number of steps, and it will automatically pick up where you left off.
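To get a feel for how much data `FINETUNE_STEPS` covers, multiply it by the training batch size. This is only a rough estimate: if sequence packing is enabled, several short examples can share one sequence, so the number of *examples* seen can be larger than the number of sequences. The dataset size below is a hypothetical placeholder; substitute the line count of your own training TSV.

```py
def sequences_seen(steps: int, batch_size: int) -> int:
  """Total number of training sequences processed after `steps` updates."""
  return steps * batch_size


def min_epochs(steps: int, batch_size: int, num_examples: int) -> float:
  """Lower-bound estimate of passes over the training set.

  Packing can place several examples in one sequence, so the true number of
  epochs can only be equal to or higher than this estimate.
  """
  return sequences_seen(steps, batch_size) / num_examples


HYPOTHETICAL_NUM_TRAIN_EXAMPLES = 32000  # placeholder, not the real CANARD count
print(sequences_seen(5000, 128))  # FINETUNE_STEPS x base train_batch_size
print(min_epochs(5000, 128, HYPOTHETICAL_NUM_TRAIN_EXAMPLES))
```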

In [None]:
FINETUNE_STEPS = 5000 #@param {type: "integer"}

model.finetune(
    mixture_or_task_name="canard_task",
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=FINETUNE_STEPS
)

INFO:tensorflow:Using config: {'_model_dir': 'gs://t5testbucket/models/base_temperature_0', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.53.192.178:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({'worker': ['10.53.192.178:8470']}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'grpc://10.53.192.178:8470', '_evaluation_master': 'grpc://10.53.192.178:8470', '_is_chief': True, '_num_ps_replicas': 0, '_num_wo

  return dataset.map(my_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)


INFO:tensorflow:num_cores_per_replica: 1
INFO:tensorflow:computation_shape: [1, 1, 1, 1]
INFO:tensorflow:num_replicas: 8
INFO:tensorflow:device_assignment.topology.device_coordinates: [[[0 0 0 0]
  [0 0 0 1]
  [1 0 0 0]
  [1 0 0 1]
  [0 1 0 0]
  [0 1 0 1]
  [1 1 0 0]
  [1 1 0 1]]]
INFO:tensorflow:device_assignment.core_assignment: [[[0 0 0 0]]

 [[0 0 0 1]]

 [[1 0 0 0]]

 [[1 0 0 1]]

 [[0 1 0 0]]

 [[0 1 0 1]]

 [[1 1 0 0]]

 [[1 1 0 1]]]
INFO:tensorflow:auto_logical_to_physical_tpu logical_shape=[4, 2] physical_shape=[2, 2, 2]
INFO:tensorflow:auto_logical_to_physical_tpu logical_shape=[2] physical_shape=[1, 1, 2]
INFO:tensorflow:auto_logical_to_physical_tpu logical_to_physical = [(0, 0, 0), (0, 0, 1)]
INFO:tensorflow:auto_logical_to_physical_tpu logical_to_physical = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1)]
INFO:tensorflow:SimdMeshImpl init: Shape[batch=4, model=2] LayoutRules{('heads', 'model'), ('vocab', 'model'), ('experts', 'batch'

## Evaluate

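We now evaluate on the validation split of `canard_task`. Because `checkpoint_steps="all"`, every checkpoint in `MODEL_DIR` is evaluated. In the logs below, step 999900 appears to be the pre-trained checkpoint the run started from, and step 1004900 (999900 + `FINETUNE_STEPS`) is the fine-tuned model, which explains the large jump in ROUGE between them. The metrics configured on the Task (accuracy, BLEU, and ROUGE) are logged and added to the TensorBoard above. The following cells repeat the evaluation on the `test` split and on the TREC CAsT evaluation set (`cast_test`).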

In [None]:
# Use a larger batch size for evaluation, which requires less memory.
model.batch_size = train_batch_size * 4
model.eval(
    mixture_or_task_name="canard_task",
    checkpoint_steps="all",
    split="validation"
)

INFO:tensorflow:Checkpoint path gs://t5testbucket/models/base_temperature_0/model.ckpt-999900
INFO:tensorflow:Querying Tensorflow master (grpc://10.53.192.178:8470) for TPU system metadata.
INFO:tensorflow:Initializing TPU system (master: grpc://10.53.192.178:8470) to fetch topology for model parallelism. This might take a while.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, -7965212499576698751)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, -3539072570157500835)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, -406844559197703334)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU,

INFO:absl:rouge1 = 21.72, 95% confidence [21.02, 22.44]
INFO:absl:rouge2 = 11.90, 95% confidence [11.37, 12.43]
INFO:absl:rougeLsum = 21.30, 95% confidence [20.63, 21.95]


INFO:tensorflow:eval/canard_task/rouge1 at step 999900: 21.723
INFO:tensorflow:eval/canard_task/rouge2 at step 999900: 11.902
INFO:tensorflow:eval/canard_task/rougeLsum at step 999900: 21.299
INFO:tensorflow:Checkpoint path gs://t5testbucket/models/base_temperature_0/model.ckpt-1004900
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Padding 'canard_task' with sequence lengths: {'inputs': 256, 'targets': 64}
INFO:tensorflow:auto_logical_to_physical_tpu logical_shape=[4, 2] physical_shape=[2, 2, 2]
INFO:tensorflow:auto_logical_to_physical_tpu logical_shape=[2] physical_shape=[1, 1, 2]
INFO:tensorflow:auto_logical_to_physical_tpu logical_to_physical = [(0, 0, 0), (0, 0, 1)]
INFO:tensorflow:auto_logical_to_physical_tpu logical_to_physical = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1)]
INFO:tensorflow:SimdMeshImpl init: Shape[batch=4, model=2] LayoutRules{('heads', 'model'), ('vocab', 'model'), ('experts', 'batch'), ('batch', 'batch'), ('d_ff', 

INFO:absl:rouge1 = 82.81, 95% confidence [82.19, 83.41]
INFO:absl:rouge2 = 71.67, 95% confidence [70.74, 72.47]
INFO:absl:rougeLsum = 80.10, 95% confidence [79.46, 80.77]


INFO:tensorflow:eval/canard_task/rouge1 at step 1004900: 82.806
INFO:tensorflow:eval/canard_task/rouge2 at step 1004900: 71.668
INFO:tensorflow:eval/canard_task/rougeLsum at step 1004900: 80.100


In [None]:
# Use a larger batch size for evaluation, which requires less memory.
model.batch_size = train_batch_size * 4
model.eval(
    mixture_or_task_name="canard_task",
    checkpoint_steps="all",
    split="test"
)

INFO:tensorflow:Checkpoint path gs://t5testbucket/models/base_temperature_0/model.ckpt-999900
INFO:tensorflow:Querying Tensorflow master (grpc://10.53.192.178:8470) for TPU system metadata.
INFO:tensorflow:Initializing TPU system (master: grpc://10.53.192.178:8470) to fetch topology for model parallelism. This might take a while.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, -7965212499576698751)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, -3539072570157500835)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, -406844559197703334)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU,

INFO:absl:rouge1 = 20.08, 95% confidence [19.55, 20.60]
INFO:absl:rouge2 = 10.09, 95% confidence [9.68, 10.48]
INFO:absl:rougeLsum = 19.71, 95% confidence [19.22, 20.23]


INFO:tensorflow:eval/canard_task/rouge1 at step 999900: 20.081
INFO:tensorflow:eval/canard_task/rouge2 at step 999900: 10.090
INFO:tensorflow:eval/canard_task/rougeLsum at step 999900: 19.712
INFO:tensorflow:Checkpoint path gs://t5testbucket/models/base_temperature_0/model.ckpt-1004900
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Padding 'canard_task' with sequence lengths: {'inputs': 256, 'targets': 64}
INFO:tensorflow:auto_logical_to_physical_tpu logical_shape=[4, 2] physical_shape=[2, 2, 2]
INFO:tensorflow:auto_logical_to_physical_tpu logical_shape=[2] physical_shape=[1, 1, 2]
INFO:tensorflow:auto_logical_to_physical_tpu logical_to_physical = [(0, 0, 0), (0, 0, 1)]
INFO:tensorflow:auto_logical_to_physical_tpu logical_to_physical = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1)]
INFO:tensorflow:SimdMeshImpl init: Shape[batch=4, model=2] LayoutRules{('heads', 'model'), ('vocab', 'model'), ('experts', 'batch'), ('batch', 'batch'), ('d_ff', 

INFO:absl:rouge1 = 81.40, 95% confidence [80.92, 81.89]
INFO:absl:rouge2 = 69.09, 95% confidence [68.37, 69.79]
INFO:absl:rougeLsum = 78.58, 95% confidence [77.99, 79.11]


INFO:tensorflow:eval/canard_task/rouge1 at step 1004900: 81.404
INFO:tensorflow:eval/canard_task/rouge2 at step 1004900: 69.091
INFO:tensorflow:eval/canard_task/rougeLsum at step 1004900: 78.580


In [None]:
# Use a larger batch size for evaluation, which requires less memory.
model.batch_size = train_batch_size * 4
model.eval(
    mixture_or_task_name="canard_task",
    checkpoint_steps="all",
    split="cast_test"
)

INFO:tensorflow:Checkpoint path gs://t5testbucket/models/base_temperature_0/model.ckpt-999900
INFO:tensorflow:Querying Tensorflow master (grpc://10.53.192.178:8470) for TPU system metadata.
INFO:tensorflow:Initializing TPU system (master: grpc://10.53.192.178:8470) to fetch topology for model parallelism. This might take a while.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, -7965212499576698751)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, -3539072570157500835)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, -406844559197703334)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU,

INFO:absl:rouge1 = 11.80, 95% confidence [10.04, 13.72]
INFO:absl:rouge2 = 6.43, 95% confidence [5.16, 7.97]
INFO:absl:rougeLsum = 11.79, 95% confidence [10.04, 13.81]


INFO:tensorflow:eval/canard_task/rouge1 at step 999900: 11.798
INFO:tensorflow:eval/canard_task/rouge2 at step 999900: 6.429
INFO:tensorflow:eval/canard_task/rougeLsum at step 999900: 11.792
INFO:tensorflow:Checkpoint path gs://t5testbucket/models/base_temperature_0/model.ckpt-1004900
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Padding 'canard_task' with sequence lengths: {'inputs': 256, 'targets': 64}
INFO:tensorflow:auto_logical_to_physical_tpu logical_shape=[4, 2] physical_shape=[2, 2, 2]
INFO:tensorflow:auto_logical_to_physical_tpu logical_shape=[2] physical_shape=[1, 1, 2]
INFO:tensorflow:auto_logical_to_physical_tpu logical_to_physical = [(0, 0, 0), (0, 0, 1)]
INFO:tensorflow:auto_logical_to_physical_tpu logical_to_physical = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1)]
INFO:tensorflow:SimdMeshImpl init: Shape[batch=4, model=2] LayoutRules{('heads', 'model'), ('vocab', 'model'), ('experts', 'batch'), ('batch', 'batch'), ('d_ff', '

INFO:absl:rouge1 = 92.15, 95% confidence [90.97, 93.19]
INFO:absl:rouge2 = 84.88, 95% confidence [82.91, 86.77]
INFO:absl:rougeLsum = 90.91, 95% confidence [89.67, 92.15]


INFO:tensorflow:eval/canard_task/rouge1 at step 1004900: 92.154
INFO:tensorflow:eval/canard_task/rouge2 at step 1004900: 84.883
INFO:tensorflow:eval/canard_task/rougeLsum at step 1004900: 90.909
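Each `model.eval` call above prints summary lines of the form `eval/<task>/<metric> at step <N>: <value>`. If you save the cell output to a text file, a small hypothetical helper like the one below can collect those lines into a per-checkpoint table for comparison outside TensorBoard. It relies only on the line format visible in the logs above and assumes a single task, since metric names are not prefixed with the task name.

```py
import re
from collections import defaultdict
from typing import Dict, Iterable

# Matches e.g. "INFO:tensorflow:eval/canard_task/rouge1 at step 1004900: 82.806"
_EVAL_LINE = re.compile(
    r"eval/(?P<task>[^/\s]+)/(?P<metric>\S+) at step (?P<step>\d+): "
    r"(?P<value>[-+0-9.eE]+)")


def parse_eval_lines(lines: Iterable[str]) -> Dict[int, Dict[str, float]]:
  """Map checkpoint step -> {metric name: value} from eval log lines."""
  results: Dict[int, Dict[str, float]] = defaultdict(dict)
  for line in lines:
    match = _EVAL_LINE.search(line)
    if match:
      results[int(match["step"])][match["metric"]] = float(match["value"])
  return dict(results)


log = """\
INFO:absl:rouge1 = 21.72, 95% confidence [21.02, 22.44]
INFO:tensorflow:eval/canard_task/rouge1 at step 999900: 21.723
INFO:tensorflow:eval/canard_task/rouge1 at step 1004900: 82.806
INFO:tensorflow:eval/canard_task/rouge2 at step 1004900: 71.668
""".splitlines()
print(parse_eval_lines(log))
```

The `INFO:absl:` lines with confidence intervals are skipped, so each metric is counted once per checkpoint.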


When inspecting predictions from the validation set, keep in mind that accuracy is measured as an *exact match* between the predicted rewrite and the reference rewrite (after lowercasing). As a result, some rewrites that are semantically correct are counted as wrong, which is one reason BLEU and ROUGE are reported as well.
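To make the exact-match caveat concrete, the sketch below lowercases both strings (as the `lower_text` postprocessor configured on the Task does for metrics) and compares exact-match accuracy with a simple unigram-overlap F1. The unigram F1 is only a crude stand-in for ROUGE-1, not the scorer T5 actually uses, and the prediction string is invented for illustration.

```py
from collections import Counter
from typing import List


def exact_match(targets: List[str], predictions: List[str]) -> float:
  """Percentage of predictions identical to their target after lowercasing."""
  hits = sum(t.lower() == p.lower() for t, p in zip(targets, predictions))
  return 100.0 * hits / len(targets)


def unigram_f1(target: str, prediction: str) -> float:
  """Whitespace-token overlap F1 (a rough proxy for ROUGE-1), in percent."""
  t_tokens = target.lower().split()
  p_tokens = prediction.lower().split()
  overlap = sum((Counter(t_tokens) & Counter(p_tokens)).values())
  if overlap == 0:
    return 0.0
  precision = overlap / len(p_tokens)
  recall = overlap / len(t_tokens)
  return 100.0 * 2 * precision * recall / (precision + recall)


target = "When did Zappa and the Mothers of Invention disband?"
prediction = "When did the Mothers of Invention disband?"  # hypothetical output
print(exact_match([target], [prediction]))      # counted as a miss
print(round(unigram_f1(target, prediction), 1))  # still high token overlap
```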

## Predict

Now that we have fine-tuned the model, we can feed T5 the conversational questions and the context and have it predict the rewritten question!

There is a significant amount of overhead in initializing the model, so this may take a few minutes to run each time, even though the prediction itself is quite fast.


To avoid this overhead, you might consider exporting a `SavedModel` and running it on [Cloud ML Engine](https://cloud.google.com/ml-engine/).
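The example questions in the next cell grow by one question/answer pair per turn. The hypothetical helper below builds such strings from a topic title, a section title, and the previous turns. The `<question> [CTX] <title> [TURN] <section> [TURN] <q1> [TURN] <a1> ...` layout is inferred from the CANARD examples printed earlier, so treat it as an assumption; whichever layout was used for the training TSV files should be used at prediction time too.

```py
from typing import List, Tuple


def build_query(question: str, title: str, section: str,
                turns: List[Tuple[str, str]]) -> str:
  """Return '<question> [CTX] <title> [TURN] <section> [TURN] q1 [TURN] a1 ...'."""
  history = [title, section]
  for previous_question, answer in turns:
    history.extend([previous_question, answer])
  return question + " [CTX] " + " [TURN] ".join(history)


title = "Anna Politkovskaya"
section = "The murder remains unsolved, 2016"
# First turn: no previous question/answer pairs yet.
print(build_query("Did they have any clues?", title, section, []))
# Second turn: the first question and its (shortened) answer join the history.
print(build_query("How did they target her email?", title, section,
                  [("Did they have any clues?", "probably FSB)")]))
```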



In [None]:
question_1 = "Did they have any clues? [CTX] Anna Politkovskaya [TURN] The murder remains unsolved, 2016" #@param {type:"string"}
question_2 = "How did they target her email? [CTX] Anna Politkovskaya [TURN] The murder remains unsolved, 2016 [TURN] Did they have any clues? [TURN] probably FSB) are known to have targeted the webmail account of the murdered Russian journalist Anna Politkovskaya." #@param {type:"string"}
question_3 = "Did they have any murder suspects? [CTX] Anna Politkovskaya [TURN] The murder remains unsolved, 2016 [TURN] Did they have any clues? [TURN] probably FSB) are known to have targeted the webmail account of the murdered Russian journalist Anna Politkovskaya. [TURN] How did they target her email? [TURN] On 5 December 2005, RFIS initiated an attack against the account annapolitovskaya@US Provider1, by deploying malicious software [TURN] Did they get into trouble for that? [TURN] I don't know." #@param {type:"string"}
question_4 = "Is there anything else interesting in the article? [CTX] Anna Politkovskaya [TURN] The murder remains unsolved, 2016 [TURN] Did they have any clues? [TURN] probably FSB) are known to have targeted the webmail account of the murdered Russian journalist Anna Politkovskaya. [TURN] How did they target her email? [TURN] On 5 December 2005, RFIS initiated an attack against the account annapolitovskaya@US Provider1, by deploying malicious software [TURN] Did they get into trouble for that? [TURN] I don't know. [TURN] Did they have any murder suspects? [TURN] After the three Makhmudov brothers, Khadjikurbanov and Lom-Ali Gaitukayev were convicted in 2014, [TURN] Did they go to jail? [TURN] I don't know." #@param {type:"string"}

questions = [question_1, question_2, question_3, question_4]

now = time.time()
# Write out the supplied questions to text files.
predict_inputs_path = os.path.join(MODEL_DIR, "predict_inputs_%d.txt" % now)
predict_outputs_path = os.path.join(MODEL_DIR, "predict_outputs_%d.txt" % now)
# Manually apply preprocessing by prepending "canard context:".
with tf.io.gfile.GFile(predict_inputs_path, "w") as f:
  for q in questions:
    f.write("canard context: %s\n" % q.lower())

# Ignore any logging so that we only see the model's answers to the questions.
with tf_verbosity_level('ERROR'):
  model.batch_size = 8
  model.predict(
      input_file=predict_inputs_path,
      output_file=predict_outputs_path,
      # Select the most probable output token at each step.
      temperature=0,
  )

# The output filename will have the checkpoint appended so we glob to get 
# the latest.
prediction_files = sorted(tf.io.gfile.glob(predict_outputs_path + "*"))
print("\nPredictions using checkpoint %s:\n" % prediction_files[-1].split("-")[-1])
with tf.io.gfile.GFile(prediction_files[-1]) as f:
  for q, a in zip(questions, f):
    if q:
      print("Context: " + q)
      print("rewrite: " + a)
      print()


Predictions using checkpoint 1004900:

Context: Did they have any clues? [CTX] Anna Politkovskaya [TURN] The murder remains unsolved, 2016
rewrite: did police have any clues about anna politkovskaya's murder?


Context: How did they target her email? [CTX] Anna Politkovskaya [TURN] The murder remains unsolved, 2016 [TURN] Did they have any clues? [TURN] probably FSB) are known to have targeted the webmail account of the murdered Russian journalist Anna Politkovskaya.
rewrite: how did fsb target anna politkovskaya's email?


Context: Did they have any murder suspects? [CTX] Anna Politkovskaya [TURN] The murder remains unsolved, 2016 [TURN] Did they have any clues? [TURN] probably FSB) are known to have targeted the webmail account of the murdered Russian journalist Anna Politkovskaya. [TURN] How did they target her email? [TURN] On 5 December 2005, RFIS initiated an attack against the account annapolitovskaya@US Provider1, by deploying malicious software [TURN] Did they get into troubl

# Export Model for Serving

As mentioned in the previous section, exporting a [`SavedModel`](https://www.tensorflow.org/guide/saved_model) can be useful for improving performance during inference or allowing your model to be deployed on a variety of platforms (e.g., TFLite, TensorFlow.js, TensorFlow Serving, or TensorFlow Hub).

**Note:** we currently only support exporting a SavedModel that runs on both CPU and GPU, not TPU.

## Export SavedModel

We first export the SavedModel. We set a batch size of 1 for simplicity, but it may be more efficient to use a larger batch size if you want to handle multiple requests per call.
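If you export with a larger `batch_size`, incoming requests can be grouped into chunks of that size before each call. A minimal sketch, assuming the exported signature accepts a list of at most `batch_size` strings per call and returns one UTF-8 byte string per input; if it instead requires exactly `batch_size` inputs, the last chunk would need padding, which this sketch does not add. `chunks` and `predict_in_batches` are hypothetical helpers, not part of the T5 API.

```python
def chunks(items, batch_size):
  # Yield consecutive slices of at most batch_size items.
  for start in range(0, len(items), batch_size):
    yield items[start:start + batch_size]


def predict_in_batches(predict_fn, inputs, batch_size):
  # Assumes predict_fn maps a list of input strings to an equal-length
  # sequence of UTF-8 bytes, like the predict_fn built from the SavedModel.
  # The last chunk may be smaller than batch_size; no padding is added.
  outputs = []
  for chunk in chunks(inputs, batch_size):
    outputs.extend(output.decode("utf-8") for output in predict_fn(chunk))
  return outputs


# Usage with a stand-in predict_fn, so the sketch runs without a model.
fake_predict_fn = lambda batch: [text.upper().encode("utf-8") for text in batch]
print(predict_in_batches(fake_predict_fn, ["a", "b", "c"], batch_size=2))
# ['A', 'B', 'C']
```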

For the 3B and 11B models, the export takes approximately 30-45 minutes.

In [None]:
export_dir = os.path.join(MODEL_DIR, "export")

model.batch_size = 1 # make one prediction per call
saved_model_path = model.export(
    export_dir,
    checkpoint_step=-1,  # use most recent
    beam_size=1,  # no beam search
    temperature=0.0,  # greedy decoding: always pick the most probable token
)
print("Model saved to:", saved_model_path)


## Load SavedModel

One way to test our model is to load it either in eager mode or a TF 1.x session so that we can repeatedly predict from the model without the overhead of loading the graph and weights each time.

We pay the overhead once here, but it shouldn't take more than a few minutes.
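The loading cell below expects `saved_model_path` to name a single export subdirectory (the `<checkpoint>` placeholder), and after a runtime switch the path returned by `model.export` is no longer in memory. A hypothetical helper could pick the newest subdirectory instead. This sketch assumes the subdirectories of `export` have purely numeric names and that you pass `tf.io.gfile.listdir` for `gs://` paths; re-run it after any runtime switch.

```python
import os


def latest_export_path(export_dir, listdir=os.listdir):
  # Hypothetical helper: return the export subdirectory with the largest
  # numeric name. Trailing slashes are stripped in case listdir adds them.
  names = [name.rstrip("/") for name in listdir(export_dir)]
  numeric = [name for name in names if name.isdigit()]
  if not numeric:
    raise ValueError("No numeric export subdirectories in %s" % export_dir)
  return os.path.join(export_dir, max(numeric, key=int))


# In Colab (after `import tensorflow as tf`), for example:
# saved_model_path = latest_export_path(
#     "gs://<name of bucket>/models/base_temperature_0/export",
#     listdir=tf.io.gfile.listdir)
```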


### Optional: Switch to GPU Runtime

Changing the runtime type to GPU in the `Runtime` menu above before loading the SavedModel will speed up inference by using the GPU instead of CPU.



In [None]:
#@title Optional: Run this cell to re-initialize if you switched to GPU runtime.
%tensorflow_version 2.x
!pip install tensorflow-text
from google.colab import auth
auth.authenticate_user()

Collecting tensorflow-text
  Downloading https://files.pythonhosted.org/packages/28/b2/2dbd90b93913afd07e6101b8b84327c401c394e60141c1e98590038060b3/tensorflow_text-2.3.0-cp36-cp36m-manylinux1_x86_64.whl (2.6MB)
Installing collected packages: tensorflow-text
Successfully installed tensorflow-text-2.3.0


In [None]:
import tensorflow as tf
import tensorflow_text  # Required to run exported model.

# TODO change this to load a different model
saved_model_path = "gs://<name of bucket>/models/base_temperature_0/export/<checkpoint>"

def load_predict_fn(model_path):
  if tf.executing_eagerly():
    print("Loading SavedModel in eager mode.")
    imported = tf.saved_model.load(model_path, ["serve"])
    return lambda x: imported.signatures['serving_default'](tf.constant(x))['outputs'].numpy()
  else:
    print("Loading SavedModel in tf 1.x graph mode.")
    tf.compat.v1.reset_default_graph()
    sess = tf.compat.v1.Session()
    meta_graph_def = tf.compat.v1.saved_model.load(sess, ["serve"], model_path)
    signature_def = meta_graph_def.signature_def["serving_default"]
    return lambda x: sess.run(
        fetches=signature_def.outputs["outputs"].name, 
        feed_dict={signature_def.inputs["input"].name: x}
    )

predict_fn = load_predict_fn(saved_model_path)

Loading SavedModel in eager mode.


## Predict

We can now call `predict_fn` repeatedly with different inputs and get results relatively quickly.

In [None]:
def answer(question):
  return predict_fn([question])[0].decode('utf-8')

In [None]:
targets =  [
           "Aside from the discontinued case against a convicted killer, how is the murder of Anna Politkovskaya similar to any other cases?",
           "Why did Superstar Billy Graham return to the WWWF?",
           "What was Superstar Billy Graham's agreement with McMahon?",
           "How did people respond to Superstar Billy Graham's return?"
]

questions = ["canard context: Is it similar to any other cases? [CTX] Anna Politkovskaya [TURN] The murder remains unsolved, 2016 [TURN] Did they have any clues? [TURN] probably FSB) are known to have targeted the webmail account of the murdered Russian journalist Anna Politkovskaya. [TURN] How did they target her email? [TURN] On 5 December 2005, RFIS initiated an attack against the account annapolitovskaya@US Provider1, by deploying malicious software [TURN] Did they get into trouble for that? [TURN] I don't know. [TURN] Did they have any murder suspects? [TURN] After the three Makhmudov brothers, Khadjikurbanov and Lom-Ali Gaitukayev were convicted in 2014, [TURN] Did they go to jail? [TURN] I don't know. [TURN] Is there anything else interesting in the article? [TURN] In accordance with Russian law there is a 15-year statute of limitation for the 'particularly grave' crime of first degree murder. [TURN] Are they close to solving it? [TURN] In May that year the case against him was discontinued because the statute of limitations had expired.",
             "canard context: Why did he return to the WWWF? [CTX] Superstar Billy Graham [TURN] Return to WWWF (1977-1981)",
             "canard context: What was his agreement with McMahon? [CTX] Superstar Billy Graham [TURN] Return to WWWF (1977-1981) [TURN] Why did he return to the WWWF? [TURN] an agreement with promoter Vincent J. McMahon (Senior",
             "canard context: How did people respond to his return? [CTX] Superstar Billy Graham [TURN] Return to WWWF (1977-1981) [TURN] Why did he return to the WWWF? [TURN] an agreement with promoter Vincent J. McMahon (Senior [TURN] What was his agreement with McMahon? [TURN] I don't know."
            ]

def test_examples():
  # Print each target rewrite next to the model's rewrite of the same input.
  for target, question in zip(targets, questions):
    print("Target: " + target)
    print("Rewrite: " + answer(question))
    print()

In [None]:
test_examples()

Target: Aside from the discontinued case against a convicted killer, how is the murder of Anna Politkovskaya similar to any other cases?
Rewrite: Is the case similar to any other cases besides the murder of Anna Politkovskaya?

Target: Why did Superstar Billy Graham return to the WWWF?
Rewrite: Why did Billy Graham return to the WWWF?

Target: What was Superstar Billy Graham's agreement with McMahon?
Rewrite: What was Billy Graham's agreement with McMahon?

Target: How did people respond to Superstar Billy Graham's return?
Rewrite: How did people respond to Billy Graham's return to WWWF?



## Deploy SavedModel

You can now deploy your SavedModel for serving (e.g., with [TensorFlow Serving](https://www.tensorflow.org/tfx/tutorials/serving/rest_simple)).
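Once a model server is running, clients send the same `canard context: ...` strings over HTTP. A minimal client sketch using only the Python standard library; it assumes TensorFlow Serving's REST predict endpoint on its default port 8501, a hypothetical model name `t5` (chosen with `--model_name` when starting the server), and a `serving_default` signature with a single string input, as in the loading cell above.

```python
import json
import urllib.request


def build_predict_request(inputs, signature_name="serving_default"):
  # Row-format request body for TensorFlow Serving's REST predict API.
  # With a single string input, each instance is just the raw input string.
  return {"signature_name": signature_name, "instances": list(inputs)}


def rest_predict(inputs, url="http://localhost:8501/v1/models/t5:predict"):
  # Hypothetical endpoint and model name; adjust to your server.
  body = json.dumps(build_predict_request(inputs)).encode("utf-8")
  request = urllib.request.Request(
      url, data=body, headers={"Content-Type": "application/json"})
  with urllib.request.urlopen(request) as response:
    # Assumes the server returns one decoded string per input under "predictions".
    return json.loads(response.read().decode("utf-8"))["predictions"]
```

Building the request body separately keeps the payload format easy to inspect without a running server.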

## Predict on CAsT

We can now rewrite the queries of the TREC CAsT 2019 evaluation set.
Before running the cells below, upload the following files, available in the repo and at https://www.treccast.ai/, to Google Colab:
   * trec_cast_evaluation.tsv
   * evaluation_topics_annotated_resolved_v1.0.tsv
   * evaluation_topics_v1.0.json

In [None]:
import csv
import json

def answer_trec_cast():
  # Read the evaluation files and write the predicted rewrites to a TSV file.
  # trec_cast_evaluation.tsv: column 0 = model input, column 1 = target rewrite.
  # evaluation_topics_annotated_resolved_v1.0.tsv: column 0 = topic id.
  # `with` guarantees the output file is flushed and closed.
  with open("trec_cast_evaluation.tsv", newline="") as file_a, \
       open("evaluation_topics_annotated_resolved_v1.0.tsv", newline="") as file_b, \
       open("trec_cast_evaluation_predicted_t5.tsv", "w", newline="") as file_out:
    reader_a = csv.reader(file_a, delimiter='\t')
    reader_b = csv.reader(file_b, delimiter='\t')
    tsv_writer = csv.writer(file_out, delimiter='\t')
    for line_a, line_b in zip(reader_a, reader_b):
      predicted = answer("canard context: " + line_a[0])
      print("Topic: " + line_b[0])
      print("Target:    " + line_a[1])
      print("Predicted: " + predicted)
      tsv_writer.writerow([line_b[0], predicted])

# TODO upload trec_cast_evaluation.tsv and evaluation_topics_annotated_resolved_v1.0.tsv to colab before running this cell
answer_trec_cast()


def create_json_t5_predictions(input_json_file, output_file):
    # Rewrite every turn of the CAsT evaluation topics with T5 and save the
    # rewrites as {topic_number: {turn_number: query}} for use in retrieval.
    with open(input_json_file, "r") as input_file, open(output_file, "w") as out_file:
        output_dic = {}
        input_json = json.load(input_file)
        for topic in input_json:
            topic_number = topic["number"]
            # Rewrites of the previous turns, used as the conversation history.
            resolved_utterances_array = []
            for turn in topic["turn"]:
                turn_number = turn["number"]
                if turn_number == 1:
                    # The first turn has no history, so it is kept unchanged.
                    predicted_query = turn["raw_utterance"]
                    output_dic[str(topic_number)] = {str(turn_number): predicted_query}
                else:
                    history = " [TURN] ".join(resolved_utterances_array)
                    question_orig = turn["raw_utterance"]
                    context = question_orig + " [CTX] " + history
                    predicted_query = answer("canard context: " + context)
                    output_dic[str(topic_number)][str(turn_number)] = predicted_query
                resolved_utterances_array.append(predicted_query)
        json.dump(output_dic, out_file, indent=2)

# TODO upload evaluation_topics_v1.0.json to colab before running this cell
create_json_t5_predictions("evaluation_topics_v1.0.json", "trec_cast_evaluation_t5_real_time.json")
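Some retrieval pipelines expect one query per line rather than nested JSON. A minimal sketch that flattens the `{topic: {turn: query}}` file written above into `<topic>_<turn>\t<query>` lines, reusing the `topic_turn` id style; the output layout and the helper names `flatten_predictions` and `write_queries_tsv` are assumptions, and queries are assumed to contain no tabs or newlines.

```python
import json


def flatten_predictions(predictions):
  # Turn {"<topic>": {"<turn>": query}} into ("<topic>_<turn>", query) rows,
  # sorted numerically by topic and then by turn.
  rows = []
  for topic in sorted(predictions, key=int):
    turns = predictions[topic]
    for turn in sorted(turns, key=int):
      rows.append(("%s_%s" % (topic, turn), turns[turn]))
  return rows


def write_queries_tsv(json_path, tsv_path):
  # Hypothetical output layout: one "<topic>_<turn>\t<query>" line per turn.
  with open(json_path) as f:
    predictions = json.load(f)
  with open(tsv_path, "w") as out:
    for query_id, query in flatten_predictions(predictions):
      out.write("%s\t%s\n" % (query_id, query))


# Example (after the cell above has run):
# write_queries_tsv("trec_cast_evaluation_t5_real_time.json",
#                   "trec_cast_evaluation_t5_queries.tsv")
```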