If you've written a layer or model definition you're happy with, you can use Thinc's function registry to register it and assign it a string name. Your function can take any arguments that can later be defined in the config. Adding **type hints** ensures that config settings will be **parsed and validated** before they're passed into the function, so you don't end up with incompatible settings and confusing failures later on. Here's the MNIST model, defined as a custom layer:

In [1]:
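import thinc
from thinc.api import chain, Relu, Softmax

@thinc.registry.layers("MNIST.v1")
def create_mnist(nO: int, dropout: float):
    # Two hidden ReLU layers of width nO, followed by a softmax output layer
    return chain(
        Relu(nO, dropout=dropout),
        Relu(nO, dropout=dropout),
        Softmax()
    )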

In the config, we can now refer to it by name and set its arguments. This makes the config maintainable and compact, while still allowing you to change and record the hyperparameters.

In [4]:
from thinc.api import Config, registry

CONFIG1 = """
[model]
@layers = "MNIST.v1"
nO = 32
dropout = 0.2

[optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001

[training]
n_iter = 10
batch_size = 128
"""

config = Config().from_str(CONFIG1)
config

{'model': {'@layers': 'MNIST.v1', 'nO': 32, 'dropout': 0.2},
 'optimizer': {'@optimizers': 'Adam.v1', 'learn_rate': 0.001},
 'training': {'n_iter': 10, 'batch_size': 128}}
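The mapping from config text to a nested dictionary can be approximated with the standard library alone. The sketch below assumes an INI-style layout in which each value is valid JSON and a dotted section name such as `training.data` denotes nesting. `parse_config` and `TOY_CONFIG` are hypothetical names, and this is not Thinc's parser: it skips features such as variable interpolation and validation.

```python
import configparser
import json
from typing import Any, Dict

TOY_CONFIG = """
[model]
@layers = "MNIST.v1"
nO = 32
dropout = 0.2

[training]
n_iter = 10

[training.data]
@datasets = "mnist_data.v1"
"""


def parse_config(text: str) -> Dict[str, Any]:
    """Parse INI-style text into nested dicts, decoding each value as JSON."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive (for example nO)
    parser.read_string(text)
    result: Dict[str, Any] = {}
    for section in parser.sections():
        node = result
        # A dotted section name becomes a chain of nested dicts.
        for part in section.split("."):
            node = node.setdefault(part, {})
        for key, value in parser.items(section):
            node[key] = json.loads(value)
    return result


parsed = parse_config(TOY_CONFIG)
print(parsed["model"])
print(parsed["training"]["data"])
```

Decoding values as JSON is what turns `32` into an integer and `0.2` into a float while quoted values stay strings, which matches the types visible in the printed config above.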

You can also wrap the dataset in a registry function.
Before you do, make sure all the dependencies are in place: for this example, install the `ml_datasets` package and import every object used in the configuration.


In [10]:
!pip install ml_datasets

In [11]:
import ml_datasets

@thinc.registry.datasets("mnist_data.v1")
def mnist():
    # Loads the MNIST data when the config is resolved
    return ml_datasets.mnist()

In [12]:
from thinc.api import chain, Relu, Softmax
import ml_datasets


CONFIG2 = """
[model]
@layers = "MNIST.v1"
nO = 32
dropout = 0.2

[optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001

[training]
n_iter = 10
batch_size = 128

[training.data]
@datasets = "mnist_data.v1"
"""

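config = Config().from_str(CONFIG2)
# Resolve every "@" reference by calling the registered function with its settings.
# Note: in later Thinc releases this method is named registry.resolve.
loaded_config = registry.make_from_config(config)
loaded_config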

{'model': <thinc.model.Model at 0x12df96f28>,
 'optimizer': <thinc.optimizers.Optimizer at 0x12df9c780>,
 'training': {'n_iter': 10,
  'batch_size': 128,
  'data': ((array([[0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           ...,
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.]], dtype=float32),
    array([[0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           ...,
           [0., 0., 0., ..., 1., 0., 0.],
           [0., 0., 1., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 1.]], dtype=float32)),
   (array([[0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           ...,
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.]]

In [17]:
# Now you can use the objects created from the config:

def train_model(data, model, optimizer, n_iter, batch_size):
    print("Pretending to train the model ...")
    print("Done pretending.")

model = loaded_config["model"]
optimizer = loaded_config["optimizer"]
n_iter = loaded_config["training"]["n_iter"]
batch_size = loaded_config["training"]["batch_size"]
data = (train_X, train_Y), (dev_X, dev_Y) = loaded_config["training"]["data"]

# After loading the data from the config, the arrays might still need to be moved to the right device
train_X = model.ops.asarray(train_X)
train_Y = model.ops.asarray(train_Y)
dev_X = model.ops.asarray(dev_X)
dev_Y = model.ops.asarray(dev_Y)

model.initialize(X=train_X[:5], Y=train_Y[:5])
train_model(data, model, optimizer, n_iter, batch_size)

Pretending to train the model ...
Done pretending.
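
The step that turns a parsed config into live objects can be sketched in a few lines: walk the nested dictionary bottom-up and, whenever a block contains a key starting with `@`, look up the named function and call it with the block's remaining settings. The `REGISTRIES` table, the `resolve` helper and the `toy_*` names below are hypothetical illustrations, not Thinc's implementation, which also validates the settings against type hints.

```python
from typing import Any, Callable, Dict

# Hypothetical registries: one dict of named functions per registry type.
REGISTRIES: Dict[str, Dict[str, Callable[..., Any]]] = {
    "layers": {"toy_mlp.v1": lambda nO, dropout: ("mlp", nO, dropout)},
    "datasets": {"toy_data.v1": lambda: ([1, 2, 3], [0, 1, 0])},
}


def resolve(block: Dict[str, Any]) -> Any:
    """Resolve nested blocks first, then call the function a block refers to."""
    resolved = {
        key: resolve(value) if isinstance(value, dict) else value
        for key, value in block.items()
    }
    refs = [key for key in resolved if key.startswith("@")]
    if not refs:
        return resolved  # plain settings block, returned as a dict
    if len(refs) > 1:
        raise ValueError(f"block refers to more than one function: {refs}")
    registry_name = refs[0][1:]           # "@layers" -> "layers"
    func_name = resolved.pop(refs[0])     # for example "toy_mlp.v1"
    func = REGISTRIES[registry_name][func_name]
    return func(**resolved)               # remaining keys become arguments


toy_config = {
    "model": {"@layers": "toy_mlp.v1", "nO": 32, "dropout": 0.2},
    "training": {"n_iter": 10, "data": {"@datasets": "toy_data.v1"}},
}
objects = resolve(toy_config)
print(objects["model"])
print(objects["training"]["data"])
```

Resolving the innermost blocks first means a registered function can receive already-built objects as arguments, which is why the dataset appears as loaded arrays under `training.data` in the resolved config.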
