# Flying Solo

Let's create an MNIST classifier.

In [1]:
from sklearn.datasets import fetch_openml
import pickle
import os

# cache locally
if not os.path.exists('./mnist.pickle'):
    print('No cached MNIST data, downloading...')
    mnist = fetch_openml('mnist_784', data_home='../datasets')
    # save the fetched dataset object so later runs can skip the download
    with open('./mnist.pickle', 'wb') as f:
        pickle.dump(mnist, f)
else:
    print('Unpacking MNIST from cache...')
    with open('./mnist.pickle', 'rb') as f:
        mnist = pickle.load(f)
    
print(mnist.DESCR)

Unpacking MNIST from cache...
**Author**: Yann LeCun, Corinna Cortes, Christopher J.C. Burges  
**Source**: [MNIST Website](http://yann.lecun.com/exdb/mnist/) - Date unknown  
**Please cite**:  

The MNIST database of handwritten digits with 784 features, raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split in a training set of the first 60,000 examples, and a test set of 10,000 examples  

It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images we

In [2]:
import numpy as np

# training data
X_trn = mnist.data.iloc[0:60000].to_numpy().astype('float32')
y_trn = mnist.target.iloc[0:60000].to_numpy().astype('float32')
y_trn = y_trn.reshape(-1, 1) # column vector of raw digit labels, shape (60000, 1)

# test data
X_tst = mnist.data.iloc[60000:70000].to_numpy().astype('float32')
y_tst = mnist.target.iloc[60000:70000].to_numpy().astype('float32')
y_tst = y_tst.reshape(-1, 1) # column vector of raw digit labels, shape (10000, 1)

In [3]:
from thinc.api import prefer_gpu
use_gpu = prefer_gpu() # switches to the GPU if one is available and returns whether it did
print(f'Using gpu: {use_gpu}')

Using gpu: False


In [4]:
from thinc.api import chain, Relu, Softmax
 
n_hidden = 32
dropout = 0.2

model = chain(
    Relu(nO=n_hidden, dropout=dropout), 
    Relu(nO=n_hidden, dropout=dropout), 
    Softmax()
)
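To make the architecture concrete, here is a rough NumPy sketch of the forward pass that a network with two ReLU hidden layers and a softmax output computes. It assumes randomly drawn weights, ten output classes, and no dropout; the names `forward`, `W1`, `b1` and so on are illustrative and do not reflect how thinc stores or initializes its parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 784, 32, 10  # n_out=10 is an assumption (ten digit classes)

# Illustrative random parameters, not thinc's initialization scheme.
W1, b1 = rng.normal(0, 0.01, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.01, (n_hidden, n_hidden)), np.zeros(n_hidden)
W3, b3 = rng.normal(0, 0.01, (n_hidden, n_out)), np.zeros(n_out)

def forward(X):
    """Affine -> ReLU, twice, then affine -> softmax over the class axis."""
    h1 = np.maximum(X @ W1 + b1, 0)
    h2 = np.maximum(h1 @ W2 + b2, 0)
    logits = h2 @ W3 + b3
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

probs = forward(rng.random((5, n_in)))
print(probs.shape)  # (5, 10)
```

Each row of `probs` sums to 1, so it can be read as a distribution over the ten digits.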

In [5]:
model.initialize(X=X_trn[:5], Y=y_trn[:5])

nI = model.get_dim("nI")
nO = model.get_dim("nO")
print(f"Initialized model with input dimension nI={nI} and output dimension nO={nO}")

Initialized model with input dimension nI=784 and output dimension nO=1
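The reported `nO=1` appears to follow from the shape of `y_trn`: it is a single column of raw digit values, so shape inference sees one output unit rather than ten classes. Below is a minimal sketch of one-hot encoding the labels with plain NumPy, assuming ten digit classes. The helper name `one_hot` is hypothetical and is not part of thinc or scikit-learn.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    """Return a (len(labels), n_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels).astype('int64').ravel()
    return np.eye(n_classes, dtype='float32')[labels]

# Stand-in for a few MNIST targets stored as a float column vector.
y_demo = np.array([[5.0], [0.0], [4.0]], dtype='float32')
Y_demo = one_hot(y_demo)
print(Y_demo.shape)           # (3, 10)
print(Y_demo.argmax(axis=1))  # [5 0 4]
```

Passing labels in this shape to `model.initialize` should let the output dimension be inferred as 10, although that has not been run here. thinc also exports a `to_categorical` utility that may serve the same purpose.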


In [6]:
from thinc.api import Adam, fix_random_seed
from tqdm.notebook import tqdm

fix_random_seed(0)
optimizer = Adam(0.001)
batch_size = 128
print("Measuring performance across iterations:")

for i in range(10):
    batches = model.ops.multibatch(batch_size, X_trn, y_trn, shuffle=True)
    for X, Y in tqdm(batches, leave=False):
        Yh, backprop = model.begin_update(X)
        backprop(Yh - Y)
        model.finish_update(optimizer)
    # Evaluate and print progress
    correct = 0
    total = 0
    for X, Y in model.ops.multibatch(batch_size, X_tst, y_tst):
        Yh = model.predict(X)
        correct += (Yh.argmax(axis=1) == Y.argmax(axis=1)).sum()
        total += Yh.shape[0]
    score = correct / total
    print(f" {i} {float(score):.3f}")

Measuring performance across iterations:
 0 1.000
 1 1.000
 2 1.000
 3 1.000
 4 1.000
 5 1.000
 6 1.000
 7 1.000
 8 1.000
 9 1.000
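A score of exactly 1.000 in every epoch is a warning sign rather than a result. With a single output column, `argmax(axis=1)` returns 0 for every row of both `Yh` and `Y`, so the comparison is always true regardless of what the model has learned. A small NumPy sketch with made-up arrays (not the notebook's data) reproduces the effect:

```python
import numpy as np

# Made-up stand-ins: one-column predictions and one-column raw labels.
Yh_demo = np.ones((4, 1), dtype='float32')  # softmax over a single unit is always 1
Y_demo = np.array([[5.0], [0.0], [4.0], [1.0]], dtype='float32')

# Both argmax calls can only ever return index 0 when there is one column.
matches = Yh_demo.argmax(axis=1) == Y_demo.argmax(axis=1)
print(matches.tolist())       # [True, True, True, True]
print(float(matches.mean()))  # 1.0
```

For this evaluation to be meaningful, both `Yh` and `Y` would need one column per class, for example one-hot labels of shape `(n, 10)`.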
