# Intro to Thinc's `Model` class, model definition and methods

TODO: intro

In [25]:
from thinc.api import Model, chain, clone, concatenate, Linear, ReLu, Softmax, Dropout, with_list, glorot_uniform_init, zero_init

n_hidden = 10
depth = 4

with Model.define_operators({">>": chain, "**": clone}):
    model = (
        (Linear(n_hidden, init_W=glorot_uniform_init) >> ReLu()) ** depth
        >> Linear(n_hidden, init_W=zero_init)
        >> Softmax()
    )

**Building up to that, step by step.** Thinc provides a variety of [layers](#), functions that create `Model` instances. (Thinc tries to avoid inheritance, preferring function composition.) The `Linear` function gives you a model that computes `Y = X @ W.T + b` (the function is defined in `thinc.layers.linear.forward`).

In [5]:
import numpy

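# Sample input and output batches; only their widths are used below
n_in = numpy.zeros((128, 16), dtype="f")
n_out = numpy.zeros((128, 10), dtype="f")

# Pass the feature widths (ndim would be 2 for both arrays)
model = Linear(nO=n_out.shape[1], nI=n_in.shape[1], init_W=zero_init)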
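To make the formula `Y = X @ W.T + b` concrete, here is a minimal NumPy sketch of the forward computation. It is not Thinc's implementation; the function name `linear_forward` is hypothetical, and the weight layout `(nO, nI)` is assumed from the transpose in the formula.

```python
import numpy

def linear_forward(X, W, b):
    """Compute Y = X @ W.T + b for a batch X of shape (n_samples, nI)."""
    return X @ W.T + b

X = numpy.ones((128, 16), dtype="f")   # batch of 128 inputs, nI = 16
W = numpy.zeros((10, 16), dtype="f")   # weights, assumed shape (nO, nI)
b = numpy.zeros((10,), dtype="f")      # one bias value per output
Y = linear_forward(X, W, b)            # one row of nO outputs per input row
```

With zero-initialized weights and biases, every output is zero regardless of the input, which is why the zero-initialized models in this notebook predict arrays of zeros.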

Models support **dimension inference from data**. You can defer some or all of the dimensions.

In [6]:
model = Linear(init_W=zero_init)
assert model.has_dim("nO") is None
assert model.has_dim("nI") is None
X = numpy.zeros((128, 16), dtype="f")
Y = numpy.zeros((128, 10), dtype="f")
model.initialize(X=X, Y=Y)
assert model.get_dim("nI") == 16
assert model.get_dim("nO") == 10
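The inference step can be pictured as reading the missing widths off the sample data. The sketch below is a simplified illustration in plain NumPy, not Thinc's code; `infer_linear_dims` is a hypothetical helper, and it assumes that explicitly provided dimensions take priority over inferred ones.

```python
import numpy

def infer_linear_dims(X=None, Y=None, nI=None, nO=None):
    """Fill in missing dimensions from the width of sample arrays."""
    if nI is None and X is not None:
        nI = X.shape[1]  # input width from the sample inputs
    if nO is None and Y is not None:
        nO = Y.shape[1]  # output width from the sample outputs
    return nI, nO

X = numpy.zeros((128, 16), dtype="f")
Y = numpy.zeros((128, 10), dtype="f")
dims = infer_linear_dims(X=X, Y=Y)   # both dimensions inferred
partial = infer_linear_dims(X=X)     # output width stays unknown
```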

The `chain` function wires two model instances together, with a feed-forward relationship. Dimension inference is especially helpful here.

In [7]:
n_hidden = 128
X = numpy.zeros((128, 16), dtype="f")
Y = numpy.zeros((128, 10), dtype="f")

model = chain(
    Linear(n_hidden, init_W=glorot_uniform_init),
    Linear(init_W=zero_init),
)
model.initialize(X=X, Y=Y)
assert model.get_dim("nI") == 16
assert model.get_dim("nO") == 10
assert model.layers[0].get_dim("nO") == n_hidden
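The feed-forward relationship can be sketched as plain function composition, where each layer returns its output together with a callback for the backward pass. This is an illustrative NumPy sketch under simplifying assumptions (no bias, no parameter gradients); `make_linear` and `chain_two` are hypothetical names, not Thinc functions.

```python
import numpy

def make_linear(W):
    """Return a forward function for Y = X @ W.T (bias omitted for brevity)."""
    def forward(X):
        def backprop(dY):
            return dY @ W  # gradient with respect to the input
        return X @ W.T, backprop
    return forward

def chain_two(first, second):
    """Compose two forward functions; gradients flow back in reverse order."""
    def forward(X):
        hidden, backprop_first = first(X)
        Y, backprop_second = second(hidden)
        def backprop(dY):
            return backprop_first(backprop_second(dY))
        return Y, backprop
    return forward

W1 = numpy.ones((128, 16), dtype="f")   # 16 inputs -> 128 hidden units
W2 = numpy.ones((10, 128), dtype="f")   # 128 hidden units -> 10 outputs
composed = chain_two(make_linear(W1), make_linear(W2))
Y, backprop = composed(numpy.ones((4, 16), dtype="f"))
dX = backprop(numpy.ones((4, 10), dtype="f"))
```

The output of the first layer is the input of the second, so only the outermost input and output widths have to match the data. The hidden width is an internal choice.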

We call functions like `chain` **combinators**. Combinators take one or more models as arguments, and return another model instance, without introducing any new weight parameters. Another useful combinator is `concatenate`:

In [12]:
model = concatenate(Linear(n_hidden), Linear(n_hidden))
model.initialize(X=X)
assert model.get_dim("nI") == X.shape[1]
assert model.get_dim("nO") == n_hidden * 2

The `concatenate` function produces a layer that **runs the child layers separately**, and then **concatenates their outputs together**. This is often useful for combining features from different sources. For instance, we use this all the time to build [spaCy](https://spacy.io)'s embedding layers.
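As a rough illustration of that behavior, the sketch below runs two child functions on the same input and joins their outputs along the feature axis. It uses plain NumPy and hypothetical names (`concatenate_two`, `double`, `negate`), and only covers the forward pass.

```python
import numpy

def concatenate_two(first, second):
    """Run both children on the same input and join their outputs column-wise."""
    def forward(X):
        return numpy.hstack([first(X), second(X)])
    return forward

def double(X):
    return X * 2

def negate(X):
    return -X

both = concatenate_two(double, negate)
X = numpy.ones((128, 16), dtype="f")
Y = both(X)  # the first 16 columns come from double, the last 16 from negate
```

The output width is the sum of the children's output widths, which matches the `n_hidden * 2` check in the cell above.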

Some combinators work on a layer and a numeric argument. For instance, the `clone` combinator creates a number of copies of a layer and chains them together into a deep feed-forward network. Shape inference is especially handy here: the first and last layers need different shapes, so we leave all dimensions out of the layer we clone. We then only have to set the first layer's output size, and the remaining dimensions are inferred from the data.

In [13]:
model = clone(Linear(), 5)
model.layers[0].set_dim("nO", n_hidden)
model.initialize(X=X, Y=Y)

ValueError: Cannot get dimension 'nO' for model 'linear': value unset

We can apply `clone` to model instances that have child layers, making it easy to define more complex architectures. For instance, we often want to attach an activation function and dropout to a linear layer, and then repeat that substructure a number of times. Of course, you can write whatever intermediate functions you find helpful.

In [16]:
def Hidden(dropout=0.2):
    return chain(Linear(), ReLu(), Dropout(dropout))

model = clone(Hidden(0.2), 5)
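Conceptually, cloning amounts to copying a layer several times and chaining the copies. The sketch below shows that idea with a toy layer in plain NumPy. `Scale` and `clone_layer` are hypothetical names, and the sketch assumes each copy gets its own independent weights.

```python
import copy
import numpy

class Scale:
    """Toy layer with its own weight, standing in for a real layer."""
    def __init__(self, factor):
        self.factor = numpy.asarray(factor, dtype="f")

    def __call__(self, X):
        return X * self.factor

def clone_layer(layer, n):
    """Make n independent copies of a layer and apply them one after another."""
    copies = [copy.deepcopy(layer) for _ in range(n)]

    def forward(X):
        for child in copies:
            X = child(X)
        return X

    forward.layers = copies  # expose the copies, loosely like model.layers
    return forward

deep = clone_layer(Scale(2.0), 5)
Y = deep(numpy.ones((3, 4), dtype="f"))  # each of the 5 copies doubles its input
```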

Some combinators are unary functions: they take only one model. These are usually **input and output transformations**. For instance, the `with_list` combinator produces a model that flattens lists of arrays into a single array, and then calls the child layer to get the flattened output. It then reverses the transformation on the output.

In [20]:
model = with_list(Linear(4, 2))
Xs = [model.ops.alloc_f2d(10, 2, dtype="f")]
Ys = model.predict(Xs)
assert Ys[0].shape == (10, 4)

TypeError: a bytes-like object is required, not 'list'
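The flatten, apply, and un-flatten pattern described above can be sketched independently of Thinc. The code below is plain NumPy with hypothetical names (`with_flattened`, `project`); it illustrates the transformation only and is not a fix for the cell above.

```python
import numpy

def with_flattened(layer):
    """Wrap a layer so it accepts a list of 2d arrays instead of a single array."""
    def forward(Xs):
        lengths = [len(X) for X in Xs]
        flat = numpy.concatenate(Xs, axis=0)        # stack all rows into one array
        flat_Y = layer(flat)                        # run the child once on everything
        split_points = numpy.cumsum(lengths)[:-1]   # row offsets where each item ends
        return numpy.split(flat_Y, split_points, axis=0)
    return forward

W = numpy.zeros((4, 2), dtype="f")  # maps 2 input features to 4 outputs

def project(X):
    return X @ W.T

wrapped = with_flattened(project)
Xs = [numpy.ones((10, 2), dtype="f"), numpy.ones((3, 2), dtype="f")]
Ys = wrapped(Xs)  # a list again, with the same lengths as the inputs
```

Running the child layer once on the stacked rows is usually cheaper than calling it separately for each item in the list.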

The combinator system makes it easy to wire together complex models very concisely. Concise notation is a big advantage: it lets you read and review your model with less clutter, which makes mistakes easier to spot and changes easier to make. For the most concise notation, you can also use Thinc's **operator overloading**, which lets you write models in infix notation. Operator overloading can lead to unexpected results, so you have to enable it explicitly **in a context manager**. This also lets you control how the operators are bound, which makes it easy to use the feature with your own combinators. For instance, here is a definition for a text classification network:

In [27]:
from thinc.api import Model, chain, concatenate, add, clone, with_list, Embed, Maxout, LayerNorm, Dropout, reduce_max, reduce_mean, residual, ReLu, Softmax

nH = 5

with Model.define_operators({">>": chain, "|": concatenate, "+": add, "**": clone}):
    model = (
        with_list(
            (Embed(128, column=0) + Embed(64, column=1))
            >> Maxout(nH)
            >> LayerNorm()
            >> Dropout(0.2)
        )
        >> (reduce_max() | reduce_mean())
        >> residual(ReLu() >> Dropout(0.2)) ** 2
        >> Softmax()
    )

The network above expects a list of arrays as input, where each array has two columns holding different numeric identifier features. The two features are embedded using separate embedding tables, and the two vectors are added and passed through a `Maxout` layer with layer normalization and dropout. The sequences then pass through two pooling functions, and the concatenated results are passed through two `ReLu` layers with dropout and residual connections. Finally, the sequence vectors are passed through an output layer with a `Softmax` activation.
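To show how operators can be bound only inside a context manager, here is a small self-contained sketch. It is a toy illustration, not Thinc's code: the `Layer` class and this `chain` function are hypothetical stand-ins, and only `>>` is supported.

```python
from contextlib import contextmanager

class Layer:
    """Toy layer wrapper whose '>>' operator works only when explicitly bound."""
    _operators = {}  # empty outside the context manager

    def __init__(self, func):
        self.func = func

    def __call__(self, x):
        return self.func(x)

    def __rshift__(self, other):
        if ">>" not in Layer._operators:
            raise TypeError("'>>' is not bound; use define_operators")
        return Layer._operators[">>"](self, other)

    @classmethod
    @contextmanager
    def define_operators(cls, operators):
        previous = cls._operators
        cls._operators = dict(operators)
        try:
            yield
        finally:
            cls._operators = previous  # restore the old bindings on exit

def chain(first, second):
    """Feed the output of the first layer into the second."""
    return Layer(lambda x: second(first(x)))

add_one = Layer(lambda x: x + 1)
double = Layer(lambda x: x * 2)

with Layer.define_operators({">>": chain}):
    pipeline = add_one >> double

result = pipeline(3)  # (3 + 1) * 2
```

Because the bindings are restored when the block exits, using `>>` on these objects outside the `with` block raises a `TypeError`, which keeps the overloading from leaking into unrelated code.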

---

## Using a model

TODO: intro

In [28]:
from thinc.api import Linear, Adam
import numpy

X = numpy.zeros((128, 10), dtype="f")
dY = numpy.zeros((128, 10), dtype="f")

model = Linear(10, 10)

Run the model over some data:

In [29]:
Y = model.predict(X)
Y

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

Get a callback to backpropagate:

In [30]:
Y, backprop = model.begin_update(X)
Y, backprop

(array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32),
 <function thinc.layers.linear.forward.<locals>.backprop(dY: thinc.types.Array2d) -> thinc.types.Array2d>)

Run the callback to calculate the gradient with respect to the inputs. If the model has trainable parameters, gradients for the parameters are accumulated internally as a side effect.

In [31]:
dX = backprop(dY)
dX

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
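For a linear layer, the callback's work can be written out directly. The following is a NumPy sketch under the assumption that the layer computes `Y = X @ W.T + b`. The `grads` dictionary is a hypothetical stand-in for the model's internal gradient storage.

```python
import numpy

def linear_forward(X, W, b, grads):
    """Return Y and a callback that accumulates parameter gradients."""
    Y = X @ W.T + b

    def backprop(dY):
        grads["W"] += dY.T @ X        # accumulate the weight gradient
        grads["b"] += dY.sum(axis=0)  # accumulate the bias gradient
        return dY @ W                 # gradient with respect to the input

    return Y, backprop

W = numpy.ones((10, 10), dtype="f")
b = numpy.zeros((10,), dtype="f")
grads = {"W": numpy.zeros_like(W), "b": numpy.zeros_like(b)}

X = numpy.ones((128, 10), dtype="f")
Y, backprop = linear_forward(X, W, b, grads)
dX = backprop(numpy.ones((128, 10), dtype="f"))
```

The callback returns only `dX`. The parameter gradients are added to `grads`, so calling `backprop` for several batches before an update sums their contributions.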

The `backprop()` callback only increments the parameter gradients; it doesn't change the weights. To update the weights, call `model.finish_update()`, passing it an optimizer:

In [32]:
optimizer = Adam()
model.finish_update(optimizer)
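The update step itself can be sketched with plain gradient descent. Note that this swaps in simple SGD rather than Adam to keep the example short. `sgd_step` is a hypothetical helper, and resetting the gradients after the update is an assumption of this sketch.

```python
import numpy

def sgd_step(params, grads, learn_rate=0.001):
    """Apply one gradient-descent update in place, then clear the gradients."""
    for name in params:
        params[name] -= learn_rate * grads[name]
        grads[name].fill(0)  # assumed: start accumulating from zero again

params = {"W": numpy.ones((2, 2), dtype="f")}
grads = {"W": numpy.full((2, 2), 2.0, dtype="f")}
sgd_step(params, grads, learn_rate=0.5)  # W becomes 1 - 0.5 * 2
```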

You can get and set dimensions, parameters and attributes by name:

In [33]:
dim = model.get_dim("nO")
W = model.get_param("W")
model.set_attr("hello", "world")
assert model.get_attr("hello") == "world"

You can also retrieve parameter gradients, and increment them explicitly:

In [34]:
dW = model.get_grad("W")
model.inc_grad("W", dW * 0.1)

Finally, you can serialize models using the `model.to_bytes` and `model.to_disk` methods, and load them back with `from_bytes` and `from_disk`.

In [35]:
model_bytes = model.to_bytes()