# Autograph: Intuitive Data-Driven Control At Last

Continuing our series of blogs, we shift our focus to yet another new feature in TF: the `autograph` functionality.

Previously, as far as the intuitive expression of code was concerned, "graph ops" efficiently solved complex calculations but failed at simple, sequential control.

By now generating Python code on demand, `autograph` transparently patches all the necessary graph ops together and packages the result into a "python op".

While the generated new ops are potentially faster than the code before them, in this blog we are more interested in the new expressive powers of the `autograph` package.

Specifically, we look at what becomes possible when decorating our functions with the new `tf.function` decorator, since doing so invokes the `autograph` functionality by default.
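
As a minimal sketch of what this means in practice, the snippet below wraps a plain Python function with `tf.function` and then uses `tf.autograph.to_code` to inspect the Python source that `autograph` generates for it. The `sign_flip` function is a hypothetical example and is not part of this blog's model.

```python
import tensorflow as tf


def sign_flip(x):
    # A plain Python `if` on a tensor value; autograph rewrites it
    # into a graph-level conditional when the function is traced.
    if tf.reduce_sum(x) > 0:
        y = x
    else:
        y = -x
    return y


# Wrapping with tf.function enables autograph by default.
graph_fn = tf.function(sign_flip)

# The generated source is returned as a plain string.
generated = tf.autograph.to_code(sign_flip)

flipped = graph_fn(tf.constant([-1.0, -2.0]))  # negative sum: negated
kept = graph_fn(tf.constant([1.0, 2.0]))       # positive sum: unchanged
```

Printing `generated` shows the rewritten function; its exact text is likely to differ between TF versions.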

Our objective is to ultimately arrive at a model as represented by the [graph](./autograph.pdf).

Just as before, we need to prep our environment to run any meaningful code:

In [1]:
import numpy as np
import tensorflow as tf
import dataset as qd
import custom as qc
import layers as ql
ks = tf.keras
kl = ks.layers

Next, we borrow the `pos_timing` function from our previous blogs, and override it to return a constant "timing signal" tensor, depending on the `width` and `depth` arguments.

As our first task is to implement a "python branch" in our new `Embed` op, we will be using two different "timing" tensors, one for the `encode` input and the other for the `decode` input.

In [2]:
def pos_timing(width, depth):
    t = ql.pos_timing(width, depth)
    t = tf.constant(t, dtype=tf.float32)
    return t
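
The body of `ql.pos_timing` is not repeated in this blog. As a rough sketch only, and assuming it follows the common sinusoidal scheme with concatenated sine and cosine halves, a `[width, depth]` timing signal could be computed with NumPy as follows. The name `sinusoid_timing` and the `max_scale` parameter are hypothetical.

```python
import numpy as np


def sinusoid_timing(width, depth, max_scale=10000.0):
    # Hypothetical stand-in for ql.pos_timing: one row per position,
    # first half of the columns is sine, second half is cosine.
    assert depth % 2 == 0, "this sketch assumes an even depth"
    pos = np.arange(width, dtype=np.float32)[:, None]       # [width, 1]
    idx = np.arange(depth // 2, dtype=np.float32)[None, :]  # [1, depth/2]
    angles = pos / np.power(max_scale, 2.0 * idx / depth)   # [width, depth/2]
    signal = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return signal.astype(np.float32)                        # [width, depth]


timing = sinusoid_timing(width=5, depth=8)
```

Such an array can then be passed to `tf.constant`, as `pos_timing` does above.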

The `Embed` layer will thus create the two constant tensors to be sourced in the subsequent `call` methods.

Our model will call the shared `Embed` instance for both of our stacks. As we have decorated its `call` method with `tf.function`, we can use familiar and intuitive Python comparisons to branch on the value of tensors on-the-fly, during graph execution.

Clearly, our two stacks, while having the same `depth`, have different `width`s. The constant "timing" tensors have different `width`s as well.

Yet we are still able to mix and match the otherwise incompatible tensors and successfully add them together, all depending on the actual `width` of our "current" input tensor:

In [3]:
class Embed(qc.Embed):
    def __init__(self, ps):
        super().__init__(ps)
        self.enc_p = pos_timing(ps.width_enc, ps.dim_hidden)
        self.dec_p = pos_timing(ps.width_dec, ps.dim_hidden)

    @tf.function(input_signature=[[
        tf.TensorSpec(shape=[None, None], dtype=tf.int32),
        tf.TensorSpec(shape=[None], dtype=tf.int32)
    ]])
    def call(self, x):
        y, lens = x
        y = tf.nn.embedding_lookup(self.emb, y)
        s = tf.shape(y)
        if s[-2] == self.ps.width_enc:
            y += tf.broadcast_to(self.enc_p, s)
        elif s[-2] == self.ps.width_dec:
            y += tf.broadcast_to(self.dec_p, s)
        else:
            pass
        y *= tf.cast(s[-1], tf.float32)**0.5
        return [y, lens]
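
The same branching idea can be tried in isolation. The following is a simplified sketch, not the `Embed` layer itself: the widths, the depth and the constant "timing" tensors (all ones and all twos) are made-up values chosen so that the selected branch is easy to see.

```python
import tensorflow as tf

# Hypothetical sizes and stand-in "timing" tensors.
ENC_WIDTH, DEC_WIDTH, DEPTH = 4, 6, 3
enc_p = tf.ones([ENC_WIDTH, DEPTH])
dec_p = 2.0 * tf.ones([DEC_WIDTH, DEPTH])


@tf.function(input_signature=[
    tf.TensorSpec(shape=[None, None, DEPTH], dtype=tf.float32)
])
def add_timing(y):
    s = tf.shape(y)
    # The width is only known at run time, so autograph turns
    # these Python comparisons into graph conditionals.
    if s[-2] == ENC_WIDTH:
        y += tf.broadcast_to(enc_p, s)
    elif s[-2] == DEC_WIDTH:
        y += tf.broadcast_to(dec_p, s)
    return y


enc_out = add_timing(tf.zeros([2, ENC_WIDTH, DEPTH]))  # adds enc_p
dec_out = add_timing(tf.zeros([2, DEC_WIDTH, DEPTH]))  # adds dec_p
other_out = add_timing(tf.zeros([2, 5, DEPTH]))        # left unchanged
```

A single traced function serves all three calls, because the `input_signature` leaves both the batch size and the width unspecified.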

Next we demonstrate how on-the-fly "python ops" can also provide insights into inner processes and data flows.

We borrow our `Frames` layer from the previous blog and override its `call` method with a new `tf.function`-decorated version that, besides calling `super().call()`, also calls a new `print_row` Python function on every row in our batch.

Yes, we are calling a Python function and printing its results in a TF graph op while never leaving our intuitive and familiar Python environment! Isn't that great?

The `print_row` function itself is simple: it iterates through the tokens of the "row", looks up each token in our `vocab` "table" to find the actual character it represents, then "joins" all the characters and prints the resulting string.

And, if we scroll down to the listing of our training session, we can actually see the "sliding context" of our samples as they fly by during our training.

Needless to say, the listing confirms that our `Frames` layer does a good job of concatenating the variable-length sample inputs, the target results, as well as the necessary separators.

As a result, a simple Python function, usable during graph ops, provides us with invaluable insights deep into our inner processes and data flow.

In [4]:
class Frames(qc.Frames):
    @tf.function
    def call(self, x):
        y = super().call.python_function(x)
        tf.print()

        def print_row(r):
            tf.print(
                tf.numpy_function(
                    lambda ts: ''.join([qd.vocab[t] for t in ts]),
                    [r],
                    Tout=[tf.string],
                ))
            return r

        tf.map_fn(print_row, self.prev)
        return y
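
The lookup-and-join step wrapped by `tf.numpy_function` above is ordinary Python. A minimal sketch of it outside of any graph is shown below; the `vocab` tuple here is a hypothetical stand-in and not the actual `qd.vocab`.

```python
# Hypothetical vocabulary: a token id is an index into this tuple.
vocab = (' ', ':', '|', 'x', 'y', '=', ',', '+', '-', '*') + tuple('0123456789')


def decode_row(tokens):
    # Map every token id to its character and join them into one string.
    return ''.join(vocab[int(t)] for t in tokens)


row = [3, 5, 11, 6, 4, 5, 12]
text = decode_row(row)  # 'x=1,y=2'
```

The same function also accepts a NumPy array of token ids, which is the form in which `tf.numpy_function` hands over its inputs.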

Our next new layer is the partial `Deduce` layer, showcasing how control is intuitive at last, from data-driven branching to searching.

This layer will be used in the next group of blogs as a replacement for our previous `Debed` layer. It contains a tensor-dependent `for` loop to iteratively replace our masked characters with "deduced" ones.
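
To illustrate a tensor-dependent `for` loop with an early `break` on its own, here is a small sketch that is unrelated to the actual `Deduce` logic. The hypothetical `steps_until` function accumulates values until a running total reaches a limit.

```python
import tensorflow as tf


@tf.function
def steps_until(xs, limit):
    total = tf.constant(0, dtype=tf.int32)
    steps = tf.constant(0, dtype=tf.int32)
    # Both the loop bound and the exit condition depend on tensor
    # values; autograph converts the loop into a graph-level loop.
    for i in tf.range(tf.shape(xs)[0]):
        if total >= limit:
            break
        total += xs[i]
        steps += 1
    return steps, total


steps, total = steps_until(tf.constant([3, 4, 5, 6]), 7)
```

Here the loop stops after two elements, once the running total has reached the limit of 7.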

The future `Probe` layer, building on the `Deduce` scheme, implements an approximation of "Beam Search", see [paper](https://arxiv.org/pdf/1702.01806.pdf).

It effectively iterates through the hidden dimensions of the output and, based on parallel `topk` searches that compare various choices for "debedding" the output, settles on an "optimal" debedding and thus the final token output for our `decoder`.
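
For readers unfamiliar with the technique, the following is a generic, heavily simplified beam search in NumPy. It is not the future `Probe` implementation: it assumes that per-step log-probabilities are given up front and are independent of earlier choices, and the name `beam_search` and its parameters are hypothetical.

```python
import numpy as np


def beam_search(log_probs, beam_width):
    # log_probs: [steps, vocab] array of per-step log-probabilities.
    beams = [((), 0.0)]  # (token sequence, accumulated score)
    for step in log_probs:
        # Only the top `beam_width` tokens of this step are considered.
        top = np.argsort(step)[::-1][:beam_width]
        candidates = [(seq + (int(t),), score + float(step[t]))
                      for seq, score in beams for t in top]
        # Keep the best `beam_width` partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


log_probs = np.log(np.array([[0.1, 0.7, 0.2],
                             [0.6, 0.3, 0.1]]))
beams = beam_search(log_probs, beam_width=2)
best_seq, best_score = beams[0]
```

In a real decoder the scores of each step depend on the tokens chosen so far, which is what makes keeping several candidate sequences worthwhile.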

Without `autograph` the data-driven looping/branching graph ops would have to be expressed in a much more convoluted manner:

In [5]:
"""
class Deduce(Layer):
    @tf.function
    def call(self, x):
        toks, *x = x
        if self.cfg.runtime.print_toks:
            qu.print_toks(toks, qd.vocab)
        y = self.deduce([toks] + x)
        n = tf.shape(y)[1]
        p = tf.shape(toks)[1] - n
        for i in tf.range(n):
            t = toks[:, :n]
            m = tf.equal(t, qd.MSK)
            if tf.equal(tf.reduce_any(m), True):
                t = self.update(t, m, y)
                if self.cfg.runtime.print_toks:
                    qu.print_toks(t, qd.vocab)
                toks = tf.pad(t, [[0, 0], [0, p]])
                y = self.deduce([toks] + x)
            else:
                e = tf.equal(t, qd.EOS)
                e = tf.math.count_nonzero(e, axis=1)
                if tf.equal(tf.reduce_any(tf.not_equal(e, 1)), False):
                    break
        return y
"""
class Probe(ql.Layer):
    def __init__(self, ps):
        super().__init__(ps)
        self.dbd = qc.Dense(self, 'dbd', [ps.dim_hidden, ps.dim_vocab])

    @tf.function
    def call(self, x):
        y, lens = x
        s = tf.shape(y)
        y = tf.reshape(y, [s[0] * s[1], -1])
        y = self.dbd(y)
        y = tf.reshape(y, [s[0], s[1], -1])
        y = y[:, :tf.math.reduce_max(lens), :]
        return y
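
The reshaping in `Probe.call` above can be followed more easily with concrete shapes. The NumPy sketch below mirrors those steps under the assumption that the `dbd` layer acts as a plain matrix multiplication (any bias is ignored); `project_and_trim` and `kernel` are hypothetical names.

```python
import numpy as np


def project_and_trim(y, lens, kernel):
    batch, width, hidden = y.shape
    flat = y.reshape(batch * width, hidden)    # merge batch and width
    logits = flat @ kernel                     # [batch * width, vocab]
    logits = logits.reshape(batch, width, -1)  # restore batch and width
    return logits[:, :int(lens.max()), :]      # trim to the longest row


y = np.ones((2, 5, 3), dtype=np.float32)       # [batch, width, hidden]
kernel = np.ones((3, 4), dtype=np.float32)     # [hidden, vocab]
lens = np.array([2, 4])
out = project_and_trim(y, lens, kernel)
```

With these inputs the result has shape `(2, 4, 4)`: the width of 5 is cut down to the longest length in the batch.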

Our model needs to be updated as well to use the newly defined components.

Other than that, we are ready to start training:

In [6]:
def model_for(ps):
    x = [ks.Input(shape=(), dtype='int32'), ks.Input(shape=(), dtype='int64')]
    x += [ks.Input(shape=(), dtype='int32'), ks.Input(shape=(), dtype='int64')]
    x += [ks.Input(shape=(), dtype='int32'), ks.Input(shape=(), dtype='int64')]
    y = qc.ToRagged()(x)
    y = Frames(ps)(y)
    embed = Embed(ps)
    ye = qc.Encode(ps)(embed(y[:2]))
    yd = qc.Decode(ps)(embed(y[2:]) + [ye[0]])
    y = Probe(ps)(yd)
    m = ks.Model(inputs=x, outputs=y)
    m.compile(optimizer=ps.optimizer, loss=ps.loss, metrics=[ps.metric])
    m.summary()  # summary() prints the table itself and returns None
    return m

By firing up our training session, we can confirm the model's layers and connections. The listing of a short session follows.

We can easily adjust the parameters to tailor the length of the sessions to our objectives.

In [10]:
ps = qd.Params(**qc.params)
ps.num_epochs = 1
import masking as qm
qm.main_graph(ps, qc.dset_for(ps).take(10), model_for(ps))

Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_19 (InputLayer)           [(None,)]            0                                            
__________________________________________________________________________________________________
input_20 (InputLayer)           [(None,)]            0                                            
__________________________________________________________________________________________________
input_21 (InputLayer)           [(None,)]            0                                            
__________________________________________________________________________________________________
input_22 (InputLayer)           [(None,)]            0                                            
____________________________________________________________________________________________

With our TensorBoard `callback` in place, the model's `fit` method will generate the standard summaries that TB can conveniently visualize.

If you haven't run the code below, an already generated graph is [here](./autograph.pdf).

In [1]:
#%load_ext tensorboard
#%tensorboard --logdir /tmp/q/logs

This concludes our blog. To see how to customize the losses and metrics driving the training, please click on the next blog.