<a href="https://colab.research.google.com/github/miweru/colab-experiments/blob/master/forensic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to the [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor) Colab

Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and [accelerate ML research](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html). T2T is actively used and maintained by researchers and engineers within the [Google Brain team](https://research.google.com/teams/brain/) and a community of users. This colab shows you some datasets we have in T2T, how to download and use them, some models we have, how to download pre-trained models and use them, and how to create and train your own models.

In [0]:
#@title
# Copyright 2018 Google LLC.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

# https://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [2]:
!pip install smart_open




In [0]:
from smart_open import smart_open

In [4]:
!wget https://www.uni-weimar.de/medien/webis/corpora/corpus-pan-labs-09-today/pan-19/pan19-data/pan19-celebrity-profiling-training-dataset-2019-01-31.zip

--2019-06-19 10:59:43--  https://www.uni-weimar.de/medien/webis/corpora/corpus-pan-labs-09-today/pan-19/pan19-data/pan19-celebrity-profiling-training-dataset-2019-01-31.zip
Resolving www.uni-weimar.de (www.uni-weimar.de)... 141.54.1.34
Connecting to www.uni-weimar.de (www.uni-weimar.de)|141.54.1.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3169542881 (3.0G) [application/zip]
Saving to: ‘pan19-celebrity-profiling-training-dataset-2019-01-31.zip.1’


2019-06-19 11:04:52 (9.81 MB/s) - ‘pan19-celebrity-profiling-training-dataset-2019-01-31.zip.1’ saved [3169542881/3169542881]



In [0]:
import zipfile
import json

In [0]:
zipf = zipfile.ZipFile("pan19-celebrity-profiling-training-dataset-2019-01-31.zip")

In [0]:
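# Index the label records by celebrity id (labels.ndjson holds one JSON object per line).
with zipf.open("pan19-celebrity-profiling-training-dataset-2019-01-31/labels.ndjson") as labels:
  rdict = {di["id"]: di for di in map(json.loads, labels)}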

In [8]:
zipf.filelist

[<ZipInfo filename='pan19-celebrity-profiling-training-dataset-2019-01-31/' filemode='drwxrwxr-x' external_attr=0x10>,
 <ZipInfo filename='pan19-celebrity-profiling-training-dataset-2019-01-31/labels.ndjson' compress_type=deflate filemode='-rw-r--r--' file_size=3174942 compress_size=244802>,
 <ZipInfo filename='pan19-celebrity-profiling-training-dataset-2019-01-31/feeds.ndjson' compress_type=deflate filemode='-rw-r--r--' file_size=7962789753 compress_size=3169297191>]
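
The file list shows that `feeds.ndjson` is close to 8 GB uncompressed, so it helps that an archive member can be read line by line without extracting it first. The following is a minimal sketch of that pattern on a tiny in-memory archive; the member name `demo/labels.ndjson`, the two records and the helper `read_ndjson` are made up for illustration and are not part of the dataset.

```python
import io
import json
import zipfile

# Build a tiny in-memory archive; the name and records are illustrative only.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "demo/labels.ndjson",
        '{"id": 1, "gender": "female"}\n{"id": 2, "gender": "male"}\n',
    )

def read_ndjson(archive, name):
    """Yield one parsed JSON object per non-empty line of a zip member."""
    with archive.open(name) as fh:
        for raw in io.TextIOWrapper(fh, encoding="utf-8"):
            if raw.strip():
                yield json.loads(raw)

with zipfile.ZipFile(buf) as zf:
    labels_by_id = {rec["id"]: rec for rec in read_ndjson(zf, "demo/labels.ndjson")}

print(labels_by_id[2]["gender"])
```

Because `read_ndjson` is a generator, only one line is held in memory at a time, which is the property that matters for the large feeds file.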

In [0]:
def tweetgen():
  """Yield (label, tweet) pairs; the label joins the four profile attributes."""
  for line in zipf.open("pan19-celebrity-profiling-training-dataset-2019-01-31/feeds.ndjson"):
    item = json.loads(line)
    info = rdict[item["id"]]  # look up the labels once per celebrity
    label = "{0}_{1}_{2}_{3}".format(
        info["birthyear"],
        info["fame"],
        info["occupation"],
        info["gender"])
    for tweet in item["text"]:
      yield (label, tweet)



In [10]:
for line in tweetgen():
  print(line)
  break

('1984_superstar_performer_female', "Back at it with @americanidol looking for...her 🎤💫 Circa early 2000’s @ Coeur d'Alene, Idaho https://t.co/yiZzHb2RVD")


In [11]:
!pip install flair



In [0]:
from flair.embeddings import FlairEmbeddings

In [0]:
embeddings=FlairEmbeddings("mix-forward")

In [0]:
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings

document_embedding=DocumentRNNEmbeddings([embeddings],
                                                                     hidden_size=512,
                                                                     reproject_words=True,
                                                                     reproject_words_dimension=256,
                                                                     )

In [0]:
from flair.embeddings import DocumentPoolEmbeddings, BytePairEmbeddings

In [16]:
embed = DocumentPoolEmbeddings([BytePairEmbeddings('en')])



In [17]:
?Dataset

Object `Dataset` not found.


In [18]:
from flair.data import Sentence  # Sentence is defined in flair.data
sentence = Sentence('The grass is green . And the sky is blue .')

embed.embed(sentence)

print(sentence.get_embedding())

tensor([-0.2471,  0.0985, -0.4170,  0.5155, -0.0757,  0.0968,  0.0870,  0.6644,
        -0.2122,  0.3244, -0.1085, -0.0676, -0.7510,  0.1455, -0.1477,  0.0505,
        -0.2138,  0.2538, -0.3236,  0.2343, -0.0443, -0.0027, -0.3318,  0.3482,
         0.1479,  0.0939, -1.0260,  0.0993,  0.2405,  0.0881, -0.2861,  0.4035,
        -0.1264,  0.0233,  0.1348, -0.1883, -0.2646,  0.0958, -0.2776, -0.1250,
         0.2242, -0.3398,  0.2380,  0.3410, -1.1524, -0.0839,  0.0518, -0.1196,
         0.1365,  0.1209, -0.2471,  0.0985, -0.4170,  0.5155, -0.0757,  0.0968,
         0.0870,  0.6644, -0.2122,  0.3244, -0.1085, -0.0676, -0.7510,  0.1455,
        -0.1477,  0.0505, -0.2138,  0.2538, -0.3236,  0.2343, -0.0443, -0.0027,
        -0.3318,  0.3482,  0.1479,  0.0939, -1.0260,  0.0993,  0.2405,  0.0881,
        -0.2861,  0.4035, -0.1264,  0.0233,  0.1348, -0.1883, -0.2646,  0.0958,
        -0.2776, -0.1250,  0.2242, -0.3398,  0.2380,  0.3410, -1.1524, -0.0839,
         0.0518, -0.1196,  0.1365,  0.12
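
`DocumentPoolEmbeddings` reduces the per-token vectors of a text to one fixed-size vector. Assuming the default mean pooling, the idea can be sketched in plain NumPy; the token vectors below are made up, and `mean_pool` is a hypothetical helper, not a flair function.

```python
import numpy as np

# Made-up token vectors: 4 tokens, 3 dimensions each.
token_vectors = np.array([
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [0.0, 4.0, 2.0],
    [0.0, 2.0, 4.0],
])

def mean_pool(vectors):
    """Average token vectors into a single document vector."""
    return vectors.mean(axis=0)

doc_vector = mean_pool(token_vectors)
print(doc_vector)  # one vector with the same width as a token vector
```

The pooled vector has the same width regardless of how many tokens the text contains, which is what allows tweets of different lengths to be stacked into one array later on.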

In [0]:
import numpy as np

!pip install -q tensorflow==2.0.0-beta1
import tensorflow as tf

In [20]:
rdict

{31448: {'birthyear': 1943,
  'fame': 'star',
  'gender': 'male',
  'id': 31448,
  'occupation': 'performer'},
 25082: {'birthyear': 1961,
  'fame': 'star',
  'gender': 'male',
  'id': 25082,
  'occupation': 'politics'},
 15880: {'birthyear': 1960,
  'fame': 'star',
  'gender': 'male',
  'id': 15880,
  'occupation': 'politics'},
 8821: {'birthyear': 1971,
  'fame': 'star',
  'gender': 'female',
  'id': 8821,
  'occupation': 'creator'},
 14515: {'birthyear': 1983,
  'fame': 'star',
  'gender': 'male',
  'id': 14515,
  'occupation': 'sports'},
 24831: {'birthyear': 1987,
  'fame': 'star',
  'gender': 'female',
  'id': 24831,
  'occupation': 'creator'},
 39822: {'birthyear': 1979,
  'fame': 'star',
  'gender': 'male',
  'id': 39822,
  'occupation': 'science'},
 12598: {'birthyear': 1970,
  'fame': 'star',
  'gender': 'male',
  'id': 12598,
  'occupation': 'science'},
 35004: {'birthyear': 1990,
  'fame': 'superstar',
  'gender': 'male',
  'id': 35004,
  'occupation': 'performer'},
 18431:

In [21]:
rde = {}
for item in rdict.values():
  for k, v in item.items():
    if k not in rde:
      rde[k]=set()
    rde[k].add(v)
rde.pop("id")

{1,
 2,
 3,
 4,
 5,
 ...


In [22]:
rde

{'birthyear': {1940,
  1941,
  1942,
  1943,
  1944,
  1945,
  1946,
  1947,
  1948,
  1949,
  1950,
  1951,
  1952,
  1953,
  1954,
  1955,
  1956,
  1957,
  1958,
  1959,
  1960,
  1961,
  1962,
  1963,
  1964,
  1965,
  1966,
  1967,
  1968,
  1969,
  1970,
  1971,
  1972,
  1973,
  1974,
  1975,
  1976,
  1977,
  1978,
  1979,
  1980,
  1981,
  1982,
  1983,
  1984,
  1985,
  1986,
  1987,
  1988,
  1989,
  1990,
  1991,
  1992,
  1993,
  1994,
  1995,
  1996,
  1997,
  1998,
  1999,
  2000,
  2001,
  2002,
  2003,
  2004,
  2005,
  2006,
  2007,
  2008,
  2011},
 'fame': {'rising', 'star', 'superstar'},
 'gender': {'female', 'male', 'nonbinary'},
 'occupation': {'creator',
  'manager',
  'performer',
  'politics',
  'professional',
  'religious',
  'science',
  'sports'}}

In [0]:
bset=sorted(list(rde["birthyear"]))

In [24]:
dict(zip(bset,[(y-min(bset))/(max(bset)-min(bset)) for y in bset]))

{1940: 0.0,
 1941: 0.014084507042253521,
 1942: 0.028169014084507043,
 1943: 0.04225352112676056,
 1944: 0.056338028169014086,
 1945: 0.07042253521126761,
 1946: 0.08450704225352113,
 1947: 0.09859154929577464,
 1948: 0.11267605633802817,
 1949: 0.1267605633802817,
 1950: 0.14084507042253522,
 1951: 0.15492957746478872,
 1952: 0.16901408450704225,
 1953: 0.18309859154929578,
 1954: 0.19718309859154928,
 1955: 0.2112676056338028,
 1956: 0.22535211267605634,
 1957: 0.23943661971830985,
 1958: 0.2535211267605634,
 1959: 0.2676056338028169,
 1960: 0.28169014084507044,
 1961: 0.29577464788732394,
 1962: 0.30985915492957744,
 1963: 0.323943661971831,
 1964: 0.3380281690140845,
 1965: 0.352112676056338,
 1966: 0.36619718309859156,
 1967: 0.38028169014084506,
 1968: 0.39436619718309857,
 1969: 0.4084507042253521,
 1970: 0.4225352112676056,
 1971: 0.43661971830985913,
 1972: 0.4507042253521127,
 1973: 0.4647887323943662,
 1974: 0.4788732394366197,
 1975: 0.49295774647887325,
 1976: 0.5070422535
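
The mapping above is plain min-max scaling of the birth year to the range [0, 1]. Since a regression output on that scale has to be mapped back to a year, a sketch with both directions may be useful; the function names are made up for this example.

```python
def min_max_scale(value, lo, hi):
    """Map value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

def min_max_unscale(scaled, lo, hi):
    """Invert min_max_scale."""
    return scaled * (hi - lo) + lo

LO, HI = 1940, 2011  # smallest and largest birth year in the label set

scaled = min_max_scale(1984, LO, HI)
print(round(scaled, 8))
print(round(min_max_unscale(scaled, LO, HI)))
```

For 1984 this gives 44 / 71, about 0.6197.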

In [0]:
encodedF = dict()
for k, v in rde.items():
  if k == "birthyear":
    bset=sorted(list(v))
    encodedF[k]=dict(zip(bset,[np.array((y-min(bset))/(max(bset)-min(bset))) for y in bset]))
  else:
    its=sorted(list(v))
    # One-hot encode each category by its position in the sorted list.
    encodedF[k]={cat:np.array([1 if c==x else 0 for x in range(len(its))], dtype=bool) for c, cat in enumerate(its)}

In [26]:
encodedF

{'birthyear': {1940: array(0.),
  1941: array(0.01408451),
  1942: array(0.02816901),
  1943: array(0.04225352),
  1944: array(0.05633803),
  1945: array(0.07042254),
  1946: array(0.08450704),
  1947: array(0.09859155),
  1948: array(0.11267606),
  1949: array(0.12676056),
  1950: array(0.14084507),
  1951: array(0.15492958),
  1952: array(0.16901408),
  1953: array(0.18309859),
  1954: array(0.1971831),
  1955: array(0.21126761),
  1956: array(0.22535211),
  1957: array(0.23943662),
  1958: array(0.25352113),
  1959: array(0.26760563),
  1960: array(0.28169014),
  1961: array(0.29577465),
  1962: array(0.30985915),
  1963: array(0.32394366),
  1964: array(0.33802817),
  1965: array(0.35211268),
  1966: array(0.36619718),
  1967: array(0.38028169),
  1968: array(0.3943662),
  1969: array(0.4084507),
  1970: array(0.42253521),
  1971: array(0.43661972),
  1972: array(0.45070423),
  1973: array(0.46478873),
  1974: array(0.47887324),
  1975: array(0.49295775),
  1976: array(0.50704225),
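
The categorical attributes are one-hot encoded by their position in the sorted list of values. A compact sketch of the same idea, together with the reverse step of turning a score vector back into a category; `one_hot_table` is a hypothetical helper and the scores are made up.

```python
import numpy as np

def one_hot_table(categories):
    """Map each category to a boolean one-hot vector, indexed by sorted order."""
    ordered = sorted(categories)
    eye = np.eye(len(ordered), dtype=bool)
    return {cat: eye[i] for i, cat in enumerate(ordered)}

fame = one_hot_table({"rising", "star", "superstar"})
print(fame["superstar"])

# Decoding: the index of the largest score selects the category.
scores = np.array([0.1, 0.7, 0.2])  # made-up softmax output
decoded = sorted(fame)[int(np.argmax(scores))]
print(decoded)
```

Sorting the categories first keeps the encoding stable across runs, because Python sets have no guaranteed iteration order.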

In [0]:
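# Fixed target order, matching the model heads: occupation, gender, fame, birthyear.
TARGET_KEYS = ("occupation", "gender", "fame", "birthyear")

def encode_mdi(m):
  # Read the keys explicitly instead of popping "id", so rdict is not mutated
  # and the output order does not depend on the key order of the JSON records.
  return [encodedF[k][m[k]] for k in TARGET_KEYS]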

In [35]:
zipf.extract("pan19-celebrity-profiling-training-dataset-2019-01-31/feeds.ndjson")

Archive:  pan19-celebrity-profiling-training-dataset-2019-01-31.zip
  inflating: pan19-celebrity-profiling-training-dataset-2019-01-31/labels.ndjson  
replace pan19-celebrity-profiling-training-dataset-2019-01-31/feeds.ndjson? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

In [36]:
!ls -l pan19-celebrity-profiling-training-dataset-2019-01-31

total 7779272
-rw-r--r-- 1 root root 7962789753 Jun 19 11:07 feeds.ndjson
-rw-r--r-- 1 root root    3174942 Jan 31 09:25 labels.ndjson


In [0]:
def datagen():
  """Yield one (50, embedding_dim) array plus encoded labels per 50 tweets."""
  # Alternative: stream from the archive with
  # zipf.open("pan19-celebrity-profiling-training-dataset-2019-01-31/feeds.ndjson").
  for line in open("pan19-celebrity-profiling-training-dataset-2019-01-31/feeds.ndjson"):
    item = json.loads(line)
    iid = item["id"]
    returno = []
    for sent in item["text"]:
      if len(returno)>49:
        yield np.vstack(returno),encode_mdi(rdict[iid])
        returno = []
      s = Sentence(sent)
      embed.embed(s)
      returno.append(s.get_embedding().detach().numpy())
    # Embeddings left over when a feed ends (up to 50) are dropped.




In [38]:
for v,t in datagen():
  print(v)
  print(t)
  break

[[-0.10173237  0.22410418 -0.35566103 ... -0.02321319 -0.00179137
  -0.11675963]
 [-0.25170866  0.4317423  -0.37286314 ... -0.0295395   0.02431801
   0.09393979]
 [-0.26801372 -0.12099776 -1.3368626  ...  0.488567   -0.30842376
  -0.42439324]
 ...
 [-0.21317063  0.0617504  -0.28168342 ... -0.10483559  0.1343348
  -0.28896642]
 [-0.04478863  0.53605384 -0.05588774 ...  0.0303     -0.10340837
   0.01616913]
 [ 0.10418486  0.06207187 -0.49881604 ...  0.17071672 -0.07126286
   0.00425371]]
[array([False, False,  True, False, False, False, False, False]), array([ True, False, False]), array([False, False,  True]), array(0.61971831)]
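
Each sample is a stack of 50 tweet embeddings. The sketch below shows the chunking step in isolation, with random vectors standing in for the embeddings. Unlike `datagen`, which only yields once a 51st tweet arrives, this variant yields as soon as a chunk is full; `chunk_rows` is a hypothetical helper.

```python
import numpy as np

def chunk_rows(rows, size):
    """Yield arrays of shape (size, dim); a shorter remainder is dropped."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield np.vstack(chunk)
            chunk = []

rng = np.random.default_rng(0)
vectors = [rng.normal(size=100) for _ in range(120)]  # stand-in for tweet embeddings
chunks = list(chunk_rows(vectors, 50))
print(len(chunks), chunks[0].shape)
```

With 120 vectors and a chunk size of 50, two full chunks are produced and the last 20 vectors are discarded.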


In [0]:
def batchy(batch_size=500):
  """Group datagen() samples into batches of `batch_size` for Keras."""
  inputs = []
  targets = [[], [], [], []]  # one list per output head
  for i, o in datagen():
    inputs.append(i)
    for head, value in zip(targets, o):
      head.append(value)
    if len(inputs)==batch_size:
      yield (np.array(inputs),[np.array(v) for v in targets])
      inputs = []
      targets = [[], [], [], []]

In [40]:
v.shape

(50, 100)
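
A multi-output Keras model expects each batch as one input array plus a list with one target array per head. The following sketch shows that regrouping on made-up samples with the same shapes as above; `batches` and `fake_samples` are hypothetical names used only here.

```python
import numpy as np

def batches(samples, batch_size):
    """Group (x, [y1, y2, ...]) samples into (X, [Y1, Y2, ...]) batches."""
    xs, ys = [], []
    for x, targets in samples:
        xs.append(x)
        ys.append(targets)
        if len(xs) == batch_size:
            # zip(*ys) regroups the targets head by head.
            yield np.array(xs), [np.array(head) for head in zip(*ys)]
            xs, ys = [], []

def fake_samples(n):
    """Made-up samples: (50, 100) inputs and four targets per sample."""
    for _ in range(n):
        yield np.zeros((50, 100)), [np.zeros(8), np.zeros(3), np.zeros(3), np.array(0.5)]

X, Y = next(batches(fake_samples(10), batch_size=4))
print(X.shape, [y.shape for y in Y])
```

The input batch gains a leading batch dimension, and each head receives its own array with the same leading dimension.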

In [0]:
from tensorflow import keras

In [0]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, LSTM


In [43]:
i = Input(shape=(50,100))
x = LSTM(100, return_sequences=False)(i)
#x = LSTM(100, return_sequences=True)(x)
#x = LSTM(100, return_sequences=False)(x)
b = Dense(100)(x)
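o1 = Dense(8, activation='softmax')(b)  # occupation (8 classes)
o2 = Dense(3, activation='softmax')(b)  # gender (3 classes)
o3 = Dense(3, activation='softmax')(b)  # fame (3 classes)
o4 = Dense(1)(b)  # normalized birth year (regression)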
model = Model(inputs=i, outputs=[o1,o2,o3,o4])
model.compile("adam", loss=['categorical_crossentropy','categorical_crossentropy','categorical_crossentropy',"mse"])
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 50, 100)]    0                                            
__________________________________________________________________________________________________
lstm (LSTM)                     (None, 100)          80400       input_1[0][0]                    
__________________________________________________________________________________________________
dense (Dense)                   (None, 100)          10100       lstm[0][0]                       
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 8)            808         dense[0][0]                      
______________________________________________________________________________________________
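
The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with input weights, recurrent weights and a bias; a dense layer has a weight matrix and a bias. A short sketch of the arithmetic, with helper names made up for this example:

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weight matrix plus bias."""
    return input_dim * units + units

print(lstm_params(100, 100))   # lstm
print(dense_params(100, 100))  # dense
print(dense_params(100, 8))    # dense_1
```

These reproduce the 80400, 10100 and 808 parameters listed for `lstm`, `dense` and `dense_1`.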

In [0]:
model.fit_generator(batchy(), steps_per_epoch=4800)

2019-06-19 11:43:34,858 From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
  26/4800 [..............................] - ETA: 47:11:24 - loss: 2.4226 - dense_1_loss: 1.3933 - dense_2_loss: 0.7412 - dense_3_loss: 0.2293 - dense_4_loss: 0.0589
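
The reported `loss` is the sum of the four head losses, since no `loss_weights` were passed to `compile`. The check below uses the numbers from the progress line; the weight of 10 for the regression head in the second part is purely hypothetical and only illustrates how a weighting would change the total.

```python
# Per-head losses copied from the progress line above.
head_losses = {
    "dense_1_loss": 1.3933,
    "dense_2_loss": 0.7412,
    "dense_3_loss": 0.2293,
    "dense_4_loss": 0.0589,
}
total = sum(head_losses.values())
print(round(total, 4))  # matches the logged loss up to rounding

# Hypothetical weighting that emphasizes the (small) MSE term.
weights = {"dense_4_loss": 10.0}
weighted = sum(weights.get(name, 1.0) * value for name, value in head_losses.items())
print(round(weighted, 4))
```

Because the mean squared error on a [0, 1] target is much smaller than the cross-entropy terms, the birth year head contributes little to the unweighted total.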

In [0]:
# A Problem is a dataset together with some fixed pre-processing.
# It could be a translation dataset with a specific tokenization,
# or an image dataset with a specific resolution.
#
# There are many problems available in Tensor2Tensor
problems.available()

In [0]:
# Fetch the MNIST problem
mnist_problem = problems.problem("image_mnist")
# The generate_data method of a problem will download data and process it into
# a standard format ready for training and evaluation.
mnist_problem.generate_data(data_dir, tmp_dir)

In [0]:
# Now let's see the training MNIST data as Tensors.
mnist_example = tfe.Iterator(mnist_problem.dataset(Modes.TRAIN, data_dir)).next()
image = mnist_example["inputs"]
label = mnist_example["targets"]

plt.imshow(image.numpy()[:, :, 0].astype(np.float32), cmap=plt.get_cmap('gray'))
print("Label: %d" % label.numpy())

# Translate from English to German with a pre-trained model

In [0]:
# Fetch the problem
ende_problem = problems.problem("forensic_p")

# Copy the vocab file locally so we can encode inputs and decode model outputs
# All vocabs are stored on GCS
vocab_name = "vocab.forensic_p.subwords"
vocab_file = os.path.join(gs_data_dir, vocab_name)
#!gsutil cp {vocab_file} {data_dir}

# Get the encoders from the problem
encoders = ende_problem.feature_encoders(data_dir)

# Setup helper functions for encoding and decoding
def encode(input_str, output_str=None):
  """Input str to features dict, ready for inference"""
  inputs = encoders["inputs"].encode(input_str) + [1]  # add EOS id
  batch_inputs = tf.reshape(inputs, [1, -1, 1])  # Make it 3D.
  return {"inputs": batch_inputs}

def decode(integers):
  """List of ints to str"""
  integers = list(np.squeeze(integers))
  if 1 in integers:
    integers = integers[:integers.index(1)]
  return encoders["inputs"].decode(np.squeeze(integers))
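
The helpers above append id 1 as the end-of-sequence marker when encoding and cut the output at the first 1 when decoding. The sketch below replays that logic without Tensor2Tensor; `ToyEncoder` is a hypothetical stand-in for a T2T text encoder, not its real API.

```python
EOS_ID = 1  # same end-of-sequence id as in the helpers above

class ToyEncoder:
    """Hypothetical stand-in for a T2T text encoder: one id per known word."""

    def __init__(self, words):
        # Ids 0 and 1 are kept free for padding and EOS.
        self._ids = {w: i + 2 for i, w in enumerate(words)}
        self._words = {i: w for w, i in self._ids.items()}

    def encode(self, text):
        return [self._ids[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self._words[i] for i in ids)

def strip_after_eos(ids):
    """Drop the first EOS id and everything after it."""
    ids = list(ids)
    return ids[:ids.index(EOS_ID)] if EOS_ID in ids else ids

enc = ToyEncoder(["the", "sky", "is", "blue"])
ids = enc.encode("the sky is blue") + [EOS_ID]
padded = ids + [0, 0]  # model output often carries padding after EOS
print(enc.decode(strip_after_eos(padded)))
```

Cutting at the first EOS also removes any padding ids that follow it, so the decoder never sees them.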

In [0]:
ende_problem.get_or_create_vocab(data_dir,tmp_dir)

In [0]:
# Generate and view the data.
# Data generation can take hours; only the print statements below are commented out.

ende_problem.generate_data(data_dir, tmp_dir)
example = tfe.Iterator(ende_problem.dataset(Modes.TRAIN, data_dir)).next()
inputs = [int(x) for x in example["inputs"].numpy()] # Cast to ints.
targets = [int(x) for x in example["targets"].numpy()] # Cast to ints.



# # Example inputs as int-tensor.
# print("Inputs, encoded:")
# print(inputs)
# print("Inputs, decoded:")
# # Example inputs as a sentence.
# print(decode(inputs))
# # Example targets as int-tensor.
# print("Targets, encoded:")
# print(targets)
# # Example targets as a sentence.
# print("Targets, decoded:")
# print(decode(targets))

In [0]:
# There are many models available in Tensor2Tensor
registry.list_models()

In [0]:
# Create hparams and the model
model_name = "transformer"
hparams_set = "transformer_base"

hparams = trainer_lib.create_hparams(hparams_set, data_dir=data_dir, problem_name="translate_ende_wmt32k")

# NOTE: Only create the model once when restoring from a checkpoint; it's a
# Layer and so subsequent instantiations will have different variable scopes
# that will not match the checkpoint.
translate_model = registry.model(model_name)(hparams, Modes.EVAL)

In [0]:
# Copy the pretrained checkpoint locally
ckpt_name = "transformer_ende_test"
gs_ckpt = os.path.join(gs_ckpt_dir, ckpt_name)
!gsutil -q cp -R {gs_ckpt} {checkpoint_dir}
ckpt_path = tf.train.latest_checkpoint(os.path.join(checkpoint_dir, ckpt_name))
ckpt_path

In [0]:
# Restore and translate!
def translate(inputs):
  encoded_inputs = encode(inputs)
  with tfe.restore_variables_on_create(ckpt_path):
    model_output = translate_model.infer(encoded_inputs)["outputs"]
  return decode(model_output)

inputs = "The animal didn't cross the street because it was too tired"
outputs = translate(inputs)

print("Inputs: %s" % inputs)
print("Outputs: %s" % outputs)

## Attention Viz Utils

In [0]:
from tensor2tensor.visualization import attention
from tensor2tensor.data_generators import text_encoder

SIZE = 35

def encode_eval(input_str, output_str):
  inputs = tf.reshape(encoders["inputs"].encode(input_str) + [1], [1, -1, 1, 1])  # Make it 4D.
  outputs = tf.reshape(encoders["inputs"].encode(output_str) + [1], [1, -1, 1, 1])  # Make it 4D.
  return {"inputs": inputs, "targets": outputs}

def get_att_mats():
  enc_atts = []
  dec_atts = []
  encdec_atts = []

  for i in range(hparams.num_hidden_layers):
    enc_att = translate_model.attention_weights[
      "transformer/body/encoder/layer_%i/self_attention/multihead_attention/dot_product_attention" % i][0]
    dec_att = translate_model.attention_weights[
      "transformer/body/decoder/layer_%i/self_attention/multihead_attention/dot_product_attention" % i][0]
    encdec_att = translate_model.attention_weights[
      "transformer/body/decoder/layer_%i/encdec_attention/multihead_attention/dot_product_attention" % i][0]
    enc_atts.append(resize(enc_att))
    dec_atts.append(resize(dec_att))
    encdec_atts.append(resize(encdec_att))
  return enc_atts, dec_atts, encdec_atts

def resize(np_mat):
  # Crop to at most SIZE x SIZE positions
  np_mat = np_mat[:, :SIZE, :SIZE]
  # Sum across heads
  row_sums = np.sum(np_mat, axis=0)
  # Normalize
  layer_mat = np_mat / row_sums[np.newaxis, :]
  lsh = layer_mat.shape
  # Add extra dim for viz code to work.
  layer_mat = np.reshape(layer_mat, (1, lsh[0], lsh[1], lsh[2]))
  return layer_mat

def to_tokens(ids):
  ids = np.squeeze(ids)
  subtokenizer = hparams.problem_hparams.vocabulary['targets']
  tokens = []
  for _id in ids:
    if _id == 0:
      tokens.append('<PAD>')
    elif _id == 1:
      tokens.append('<EOS>')
    elif _id == -1:
      tokens.append('<NULL>')
    else:
      tokens.append(subtokenizer._subtoken_id_to_subtoken_string(_id))
  return tokens
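
`resize` divides every attention weight by the sum of the weights at the same (query, key) position across all heads, so the head axis is kept and the heads sum to 1 at each position. A standalone NumPy sketch on random data, assuming the input has shape (heads, query, key); `normalize_over_heads` and the fake weights are made up for this example.

```python
import numpy as np

SIZE = 35  # same crop size as above

def normalize_over_heads(att, size=SIZE):
    """Crop a (heads, query, key) array and divide by the sum over heads."""
    att = att[:, :size, :size]
    totals = att.sum(axis=0)                  # shape (query, key)
    normalized = att / totals[np.newaxis, :]  # broadcast over heads
    return normalized[np.newaxis, ...]        # leading dim of 1 for the viz code

rng = np.random.default_rng(0)
fake_att = rng.random((8, 40, 40)) + 0.01  # made-up weights: 8 heads, 40 positions
out = normalize_over_heads(fake_att)
print(out.shape)
print(np.allclose(out[0].sum(axis=0), 1.0))
```

Sequences longer than `SIZE` are truncated for display, and shorter ones pass through unchanged because slicing past the end of an array is harmless.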

In [0]:
def call_html():
  import IPython
  display(IPython.core.display.HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              "d3": "https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.8/d3.min",
              jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
            },
          });
        </script>
        '''))

## Display Attention

In [0]:
# Convert inputs and outputs to subwords
inp_text = to_tokens(encoders["inputs"].encode(inputs))
out_text = to_tokens(encoders["inputs"].encode(outputs))

# Run eval to collect attention weights
example = encode_eval(inputs, outputs)
with tfe.restore_variables_on_create(tf.train.latest_checkpoint(checkpoint_dir)):
  translate_model.set_mode(Modes.EVAL)
  translate_model(example)
# Get normalized attention weights for each layer
enc_atts, dec_atts, encdec_atts = get_att_mats()

call_html()
attention.show(inp_text, out_text, enc_atts, dec_atts, encdec_atts)

# Train a custom model on MNIST

In [0]:
# Create your own model

class MySimpleModel(t2t_model.T2TModel):

  def body(self, features):
    inputs = features["inputs"]
    filters = self.hparams.hidden_size
    h1 = tf.layers.conv2d(inputs, filters,
                          kernel_size=(5, 5), strides=(2, 2))
    h2 = tf.layers.conv2d(tf.nn.relu(h1), filters,
                          kernel_size=(5, 5), strides=(2, 2))
    return tf.layers.conv2d(tf.nn.relu(h2), filters,
                            kernel_size=(3, 3))

hparams = trainer_lib.create_hparams("basic_1", data_dir=data_dir, problem_name="image_mnist")
hparams.hidden_size = 64
model = MySimpleModel(hparams, Modes.TRAIN)
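
The three convolutions shrink the image quickly. Assuming the default `padding='valid'` of `tf.layers.conv2d` and 28 x 28 MNIST inputs, the spatial size after each layer can be worked out as follows; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    return (size - kernel) // stride + 1

size = 28  # MNIST images are 28 x 28
sizes = []
for kernel, stride in [(5, 2), (5, 2), (3, 1)]:
    size = conv_out(size, kernel, stride)
    sizes.append(size)
print(sizes)
```

Under these assumptions the feature map goes from 28 to 12 to 4 to 2 pixels per side, each with `hidden_size` channels.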

In [0]:
# Prepare for the training loop

# In Eager mode, opt.minimize must be passed a loss function wrapped with
# implicit_value_and_gradients
@tfe.implicit_value_and_gradients
def loss_fn(features):
  _, losses = model(features)
  return losses["training"]

# Setup the training data
BATCH_SIZE = 128
mnist_train_dataset = mnist_problem.dataset(Modes.TRAIN, data_dir)
mnist_train_dataset = mnist_train_dataset.repeat(None).batch(BATCH_SIZE)

optimizer = tf.train.AdamOptimizer()

In [0]:
# Train
NUM_STEPS = 500

for count, example in enumerate(tfe.Iterator(mnist_train_dataset)):
  example["targets"] = tf.reshape(example["targets"], [BATCH_SIZE, 1, 1, 1])  # Make it 4D.
  loss, gv = loss_fn(example)
  optimizer.apply_gradients(gv)

  if count % 50 == 0:
    print("Step: %d, Loss: %.3f" % (count, loss.numpy()))
  if count >= NUM_STEPS:
    break

In [0]:
model.set_mode(Modes.EVAL)
mnist_eval_dataset = mnist_problem.dataset(Modes.EVAL, data_dir)

# Create eval metric accumulators for accuracy (ACC) and accuracy in
# top 5 (ACC_TOP5)
metrics_accum, metrics_result = metrics.create_eager_metrics(
    [metrics.Metrics.ACC, metrics.Metrics.ACC_TOP5])

for count, example in enumerate(tfe.Iterator(mnist_eval_dataset)):
  if count >= 200:
    break

  # Make the inputs and targets 4D
  example["inputs"] = tf.reshape(example["inputs"], [1, 28, 28, 1])
  example["targets"] = tf.reshape(example["targets"], [1, 1, 1, 1])

  # Call the model
  predictions, _ = model(example)

  # Compute and accumulate metrics
  metrics_accum(predictions, example["targets"])

# Print out the averaged metric values on the eval data
for name, val in metrics_result().items():
  print("%s: %.2f" % (name, val))
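
The two metrics above are plain accuracy and top-5 accuracy. As a rough illustration of what top-k accuracy measures, here is a NumPy sketch on made-up scores; it is not the Tensor2Tensor implementation, and `top_k_accuracy` is a hypothetical helper.

```python
import numpy as np

def top_k_accuracy(logits, labels, k):
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(logits, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return hits.mean()

# Made-up scores for 3 examples over 4 classes.
logits = np.array([
    [0.1, 0.5, 0.3, 0.1],
    [0.6, 0.05, 0.25, 0.1],
    [0.2, 0.3, 0.4, 0.1],
])
labels = np.array([1, 2, 1])

top1 = top_k_accuracy(logits, labels, k=1)
top2 = top_k_accuracy(logits, labels, k=2)
print(top1, top2)
```

Only the first example is right at k=1, while all three true labels are among the two best scores, so top-k accuracy can never be lower than plain accuracy.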