<a href="https://colab.research.google.com/github/google-research/language/blob/master/language/decontext/decontextualization_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Decontextualization Demo

This colab contains a T5 model for decontextualizing individual 
sentences. The decontextualization task is described in
[Decontextualization: Making Sentences Stand-Alone]().

Please cite as:

```
@article{choi2021making,
  title = {Decontextualization: Making Sentences Stand-Alone},
  author = {Eunsol Choi and Jennimaria Palomaki and Matthew Lamm and Tom Kwiatkowski and Dipanjan Das and Michael Collins},
  year = {2021},
  journal = {Transactions of the Association for Computational Linguistics}
}
```

## Input format
The Decontextualization model is trained on Wikipedia pages. The input is made
up of the page title; the (possibly empty) section titles; and a paragraph that is split into a prefix, the target sentence (to be decontextualized), and a
suffix. The model input should have the form:

```<page title> [SEP] <section title> [SEP] <preceding sentences> [SEP] <target sentence> [SEP] <succeeding sentences>```

where any of the fields apart from `<target sentence>` may be empty, but all of
the `[SEP]` tokens should be included.
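
As a minimal sketch of this format, the snippet below joins the five fields and keeps empty fields as empty strings so that all four `[SEP]` tokens are still present. The helper name `format_model_input` and the example sentences are illustrative only; the single space on each side of `[SEP]` mirrors the joining done by `create_input` later in this colab.

```python
SEP = " [SEP] "  # One space on each side, matching create_input in this colab.


def format_model_input(page_title, section_title, prefix, target, suffix):
    """Joins the five fields in the order given above (hypothetical helper)."""
    if not target:
        raise ValueError("The target sentence must not be empty.")
    # Empty fields stay in place as empty strings, so every [SEP] is kept.
    return SEP.join((page_title, section_title, prefix, target, suffix))


# Example with an empty section title and an empty suffix.
example = format_model_input(
    page_title="Yuri Gagarin",
    section_title="",
    prefix="Gagarin was a keen sportsman.",
    target="He was also a basketball fan.",
    suffix="",
)
print(example)
print(example.count("[SEP]"))  # 4
```

Even with two empty fields, the resulting string still contains four separators, which is what the model input format requires.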

## Load a tuned T5 model

Choose which model you'd like to load, and define a prediction function.

These models are tuned versions of the
[released T5 models](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints).
Details of the model tuning are available in [the paper]().
Be warned that T5-11B is very slow on CPU.

In [11]:
print("Installing dependencies...")
!pip install -q tensorflow_text

from os import path

import tensorflow as tf
import tensorflow_text  # Required to run exported model.

MODEL_SIZE = "base" #@param["base", "3B", "11B"]

DATASET_BUCKET = 'gs://decontext_dataset'

SAVED_MODELS = {
  "base": f'{DATASET_BUCKET}/t5_base/1611267950',
  "3B": f'{DATASET_BUCKET}/t5_3B/1611333896',
  "11B": f'{DATASET_BUCKET}/t5_11B/1605298402'
}

# Use the checkpoint that matches the MODEL_SIZE selected above.
SAVED_MODEL_PATH = SAVED_MODELS[MODEL_SIZE]
DEV = path.join(DATASET_BUCKET, 'decontext_dev.jsonl')

def load_predict_fn(model_path):
  print("Loading SavedModel in eager mode.")
  imported = tf.saved_model.load(model_path, ["serve"])
  return lambda x: imported.signatures['serving_default'](
      tf.constant(x))['outputs'].numpy()

predict_fn = load_predict_fn(SAVED_MODEL_PATH)

def decontextualize(model_input):
  """Runs the model on one formatted input string and returns its prediction."""
  return predict_fn([model_input])[0].decode('utf-8')

Loading SavedModel in eager mode.


## Try on some of your own input

Type in a paragraph, one sentence per line, as well as the page title and 
any section titles.
Then, indicate which sentence you would like to decontextualize and run the 
model in prediction mode.

In [12]:
paragraph = [
  "Gagarin was a keen sportsman and played ice hockey as a goalkeeper.",
  "He was also a basketball fan and coached the Saratov Industrial Technical School team, as well as being a referee.",
  "In 1957, while a cadet in flight school, Gagarin met Valentina Goryacheva at the May Day celebrations at the Red Square in Moscow.",
  "She was a medical technician who had graduated from Orenburg Medical School.",
  "They were married on 7 November of the same year, the same day Gagarin graduated from his flight school, and they had two daughters.",
  "Yelena Yurievna Gagarina, born 1959, is an art historian who has worked as the director-general of the Moscow Kremlin Museums since 2001; and Galina Yurievna Gagarina, born 1961, is a professor of economics and the department chair at Plekhanov Russian University of Economics in Moscow."
]

page_title = 'Yuri Gagarin'
section_title = 'Personal Life'  # can be empty
target_sentence_idx = 4  # zero-based index


if target_sentence_idx >= len(paragraph) or target_sentence_idx < 0:
  raise ValueError(
      f'Target sentence index must be in range [0, {len(paragraph) - 1}].')


def create_input(paragraph,
                 target_sentence_idx,
                 page_title='',
                 section_title=''):
  """Creates a single Decontextualization example input for T5.

  Args:
    paragraph: List of strings. Each string is a single sentence.
    target_sentence_idx: Integer index into `paragraph` indicating which
      sentence should be decontextualized.
    page_title: Optional title string. Usually Wikipedia page title.
    section_title: Optional title of section within page.

  Returns:
    A single string with the five fields joined by ' [SEP] '.
  """
  prefix = ' '.join(paragraph[:target_sentence_idx])
  target = paragraph[target_sentence_idx]
  suffix = ' '.join(paragraph[target_sentence_idx + 1:])
  return ' [SEP] '.join((page_title, section_title, prefix, target, suffix))

d = decontextualize(
        create_input(paragraph, target_sentence_idx, page_title,
                     section_title))
print(f'Original sentence:         {paragraph[target_sentence_idx]}\n'
      f'Decontextualized sentence: {d}')

Original sentence:         They were married on 7 November of the same year, the same day Gagarin graduated from his flight school, and they had two daughters.
Decontextualized sentence: DONE #### Yuri Gagarin and Valentina Goryacheva were married on 7 November of the same year, the same day Gagarin graduated from his flight school, and they had two daughters.
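
The prediction above starts with `DONE ####` before the rewritten sentence. Below is a minimal sketch for separating that leading label from the sentence, assuming every prediction has the form `<label> #### <sentence>` as in the output shown. The helper name `split_prediction` is hypothetical, and labels other than `DONE` are not covered here.

```python
def split_prediction(prediction, delimiter="####"):
    """Splits a raw prediction into (label, sentence).

    Assumes the '<label> #### <sentence>' layout seen in the output above.
    If the delimiter is missing, the label is returned as an empty string.
    """
    label, found, sentence = prediction.partition(delimiter)
    if not found:
        return "", prediction.strip()
    return label.strip(), sentence.strip()


# Input string shortened from the prediction printed above.
label, sentence = split_prediction(
    "DONE #### Yuri Gagarin and Valentina Goryacheva were married on "
    "7 November of the same year."
)
print(label)     # DONE
print(sentence)
```

In the colab itself, the same helper could be applied to the string returned by `decontextualize(...)` before printing.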
