# Configuring the MNIST model

Let's configure a model equivalent to the following:

```python
n_hidden = 32
dropout = 0.2
model = chain(
    Relu(nO=n_hidden, dropout=dropout), 
    Relu(nO=n_hidden, dropout=dropout), 
    Softmax()
)
``` 

We will also use the Adam optimizer with `learn_rate = 0.001` and train for 10 iterations over batches of 128 examples.

In [1]:
!pip install "thinc>=8.0.0a0" ml_datasets "tqdm>=4.41"



In [11]:
from thinc.api import prefer_gpu
from thinc.api import Config, registry
import ml_datasets

# Perform operations on GPU if available
prefer_gpu()

False

In [12]:
# Get the dataset
(train_X, train_Y), (test_X, test_Y) = ml_datasets.mnist()
print(f"Training size={len(train_X)}, test size={len(test_X)}")



Training size=54000, test size=10000


In [13]:
CONFIG = """
[hyper_params]
n_hidden = 32
dropout = 0.2
learn_rate = 0.001

[model]
@layers = "chain.v1"

[model.*.relu1]
@layers = "Relu.v1"
nO = ${hyper_params:n_hidden}
dropout = ${hyper_params:dropout}

[model.*.relu2]
@layers = "Relu.v1"
nO = ${hyper_params:n_hidden}
dropout = ${hyper_params:dropout}

[model.*.softmax]
@layers = "Softmax.v1"

[optimizer]
@optimizers = "Adam.v1"
learn_rate = ${hyper_params:learn_rate}

[training]
n_iter = 10
batch_size = 128
"""

config = Config().from_str(CONFIG)
config

{'hyper_params': {'n_hidden': 32, 'dropout': 0.2, 'learn_rate': 0.001},
 'model': {'@layers': 'chain.v1',
  '*': {'relu1': {'@layers': 'Relu.v1', 'nO': 32, 'dropout': 0.2},
   'relu2': {'@layers': 'Relu.v1', 'nO': 32, 'dropout': 0.2},
   'softmax': {'@layers': 'Softmax.v1'}}},
 'optimizer': {'@optimizers': 'Adam.v1', 'learn_rate': 0.001},
 'training': {'n_iter': 10, 'batch_size': 128}}
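
The output above shows that the `${hyper_params:n_hidden}` style references were replaced by the values from the `[hyper_params]` section when the string was parsed. Python's standard `configparser` supports the same `${section:option}` syntax through `ExtendedInterpolation`, so it can serve as a small stand-in to see the substitution in isolation. This is only a sketch: the section name `relu1` and the JSON parsing step are simplifications chosen for illustration, and Thinc's own `Config` class does more than this.

```python
import json
from configparser import ConfigParser, ExtendedInterpolation

TEXT = """
[hyper_params]
n_hidden = 32
dropout = 0.2

[relu1]
nO = ${hyper_params:n_hidden}
dropout = ${hyper_params:dropout}
"""

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.optionxform = str  # keep option names case-sensitive (e.g. "nO")
parser.read_string(TEXT)

# Values come back as interpolated strings; parse them as JSON to get numbers.
relu1 = {key: json.loads(value) for key, value in parser["relu1"].items()}
print(relu1)
```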

In [14]:
loaded_config = registry.make_from_config(config)
loaded_config

{'hyper_params': {'n_hidden': 32, 'dropout': 0.2, 'learn_rate': 0.001},
 'model': <thinc.model.Model at 0x14be839d8>,
 'optimizer': <thinc.optimizers.Optimizer at 0x14be9d978>,
 'training': {'n_iter': 10, 'batch_size': 128}}

When you call `registry.make_from_config`, Thinc first creates the three layers, with their arguments filled in from the hyperparameters.
It then passes the return values (the layer objects) to `chain`.
It also creates the optimizer. All other values, such as the training settings, are passed through as a regular dict.
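
The bottom-up order described above can be sketched with a toy resolver. This is not Thinc's implementation: `resolve`, `REGISTRIES`, and the tuple-building lambdas are hypothetical stand-ins that only mimic the behaviour described here (children are resolved first, entries under `*` become positional arguments, and blocks without an `@` key stay plain dicts).

```python
def resolve(node, registries):
    """Resolve a nested config bottom-up (toy version, not Thinc's code)."""
    if not isinstance(node, dict):
        return node  # plain value: pass through unchanged
    # Resolve children first, so nested blocks become objects before use.
    resolved = {key: resolve(value, registries) for key, value in node.items()}
    refs = [key for key in resolved if key.startswith("@")]
    if not refs:
        return resolved  # no registry reference: stays a regular dict
    ref = refs[0]
    factory = registries[ref[1:]][resolved.pop(ref)]
    # Children under "*" are passed as positional arguments, in order.
    positional = list(resolved.pop("*", {}).values())
    return factory(*positional, **resolved)


# Hypothetical stand-ins for registered functions: they just build tuples.
REGISTRIES = {
    "layers": {
        "chain.v1": lambda *layers: ("chain", layers),
        "Relu.v1": lambda nO, dropout: ("relu", nO, dropout),
        "Softmax.v1": lambda: ("softmax",),
    }
}

toy_config = {
    "model": {
        "@layers": "chain.v1",
        "*": {
            "relu1": {"@layers": "Relu.v1", "nO": 32, "dropout": 0.2},
            "softmax": {"@layers": "Softmax.v1"},
        },
    },
    "training": {"n_iter": 10, "batch_size": 128},
}

toy_resolved = resolve(toy_config, REGISTRIES)
print(toy_resolved["model"])     # the "chain" object built from its children
print(toy_resolved["training"])  # untouched plain dict
```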
Your training code can now look like this:

In [15]:
model = loaded_config["model"]
optimizer = loaded_config["optimizer"]
n_iter = loaded_config["training"]["n_iter"]
batch_size = loaded_config["training"]["batch_size"]


# Make sure the data is on the right device
train_X = model.ops.asarray(train_X)
train_Y = model.ops.asarray(train_Y)
dev_X = model.ops.asarray(test_X)
dev_Y = model.ops.asarray(test_Y)

# Initialize the model to infer the missing information
model.initialize(X=train_X[:5], Y=train_Y[:5])

<thinc.model.Model at 0x14be839d8>

The function that trains the model is defined in the `src/train.py` module so that it can be reused.
Its code is shown here for clarity:

```python
from tqdm import tqdm


def train_model(data, model, optimizer, n_iter, batch_size):
    (train_X, train_Y), (test_X, test_Y) = data
    for i in range(n_iter):
        # One pass over the shuffled training data, in mini-batches
        batches = model.ops.multibatch(batch_size, train_X, train_Y, shuffle=True)
        for X, Y in tqdm(batches, leave=False):
            Yh, backprop = model.begin_update(X)
            # Backpropagate the difference between predictions and targets
            backprop(Yh - Y)
            model.finish_update(optimizer)
        # Evaluate and print progress
        correct = 0
        total = 0
        for X, Y in model.ops.multibatch(batch_size, test_X, test_Y):
            Yh = model.predict(X)
            # Labels are one-hot encoded, so compare the argmax of each row
            correct += (Yh.argmax(axis=1) == Y.argmax(axis=1)).sum()
            total += Yh.shape[0]
        score = correct / total
        print(f" {i} {float(score):.3f}")
```
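
The training loop passes `Yh - Y` to `backprop`. One common reading, and it is an assumption here because the loss is not stated explicitly, is that this is the gradient of the categorical cross-entropy loss with respect to the scores fed into the softmax. The following pure-Python sketch checks that identity numerically on a single made-up example; `softmax`, `cross_entropy`, and the sample values are illustrative and not part of `src/train.py`.

```python
import math


def softmax(logits):
    # Subtract the maximum for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


logits = [0.5, -1.0, 2.0]  # made-up pre-softmax scores
target = [0.0, 1.0, 0.0]   # one-hot label

# Analytic gradient: predicted probabilities minus the one-hot target.
analytic = [p - t for p, t in zip(softmax(logits), target)]

# Numerical gradient by central finite differences.
eps = 1e-6
numeric = []
for k in range(len(logits)):
    plus = list(logits)
    minus = list(logits)
    plus[k] += eps
    minus[k] -= eps
    diff = cross_entropy(plus, target) - cross_entropy(minus, target)
    numeric.append(diff / (2 * eps))

max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)  # should be very close to zero
```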



In [16]:
import sys
import os
from pathlib import PurePath

# Add the root of the custom Python modules to the module search path
root_path = PurePath(os.getcwd()).parents[0]
print(root_path)
src_path = str(root_path.joinpath('src'))

if src_path not in sys.path:
    sys.path.insert(0, src_path)

print(sys.path)


/Users/jean.metz/workspace/jmetzz/sandbox-thinc.ai
['/Users/jean.metz/workspace/jmetzz/sandbox-thinc.ai/src', '/Users/jean.metz/workspace/jmetzz/sandbox-thinc.ai/notebooks', '/Users/jean.metz/miniconda/envs/thinc.ai/lib/python37.zip', '/Users/jean.metz/miniconda/envs/thinc.ai/lib/python3.7', '/Users/jean.metz/miniconda/envs/thinc.ai/lib/python3.7/lib-dynload', '', '/Users/jean.metz/miniconda/envs/thinc.ai/lib/python3.7/site-packages', '/Users/jean.metz/miniconda/envs/thinc.ai/lib/python3.7/site-packages/IPython/extensions', '/Users/jean.metz/.ipython']


In [18]:
from train import train_model
from tqdm.notebook import tqdm
train_model(((train_X, train_Y), (dev_X, dev_Y)), model, optimizer, n_iter, batch_size)


0	5688.01	0.929
1	5462.72	0.929
2	5411.01	0.932
3	5195.46	0.932
4	5142.77	0.935
5	4975.33	0.933
6	4884.98	0.934
7	4830.40	0.934
8	4833.06	0.934
9	4627.79	0.938
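
With the 54000 training examples loaded earlier and a batch size of 128, each iteration covers 422 mini-batches, the last of which is smaller than the rest. The sketch below reproduces that arithmetic with a hypothetical `minibatches` helper in plain Python; it only illustrates the idea of shuffled mini-batching, and Thinc's `multibatch` may differ in its details.

```python
import random


def minibatches(n_examples, batch_size, shuffle=False, seed=0):
    """Yield lists of example indices, one list per mini-batch."""
    indices = list(range(n_examples))
    if shuffle:
        # Fixed seed so the illustration is reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]


batch_sizes = [len(batch) for batch in minibatches(54000, 128, shuffle=True)]
print(len(batch_sizes))  # 422 batches per iteration
print(batch_sizes[-1])   # 112 examples in the final, partial batch
```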


