Currently it is very cumbersome to model a neural network, as we have to initialize weights and biases manually. For the MNIST example we want to go from this:
Old API
# Config (API is not finished)

let
  # We randomly initialize all weights and biases between [-0.5, 0.5]
  # In the future requires_grad will be automatically set for neural network layers
  cv1_w = ctx.variable(
    randomTensor(20, 1, 5, 5, 1'f32) .- 0.5'f32,   # Weight of 1st convolution
    requires_grad = true
  )
  cv1_b = ctx.variable(
    randomTensor(20, 1, 1, 1'f32) .- 0.5'f32,      # Bias of 1st convolution
    requires_grad = true
  )
  cv2_w = ctx.variable(
    randomTensor(50, 20, 5, 5, 1'f32) .- 0.5'f32,  # Weight of 2nd convolution
    requires_grad = true
  )
  cv2_b = ctx.variable(
    randomTensor(50, 1, 1, 1'f32) .- 0.5'f32,      # Bias of 2nd convolution
    requires_grad = true
  )
  fc3 = ctx.variable(
    randomTensor(500, 800, 1'f32) .- 0.5'f32,      # Fully connected: 800 in, 500 out
    requires_grad = true
  )
  classifier = ctx.variable(
    randomTensor(10, 500, 1'f32) .- 0.5'f32,       # Fully connected: 500 in, 10 classes out
    requires_grad = true
  )

proc model[TT](x: Variable[TT]): Variable[TT] =
  # The formula of the output size of convolutions and maxpools is:
  #   H_out = (H_in + (2*padding.height) - kernel.height) / stride.height + 1
  #   W_out = (W_in + (2*padding.width) - kernel.width) / stride.width + 1
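  # For example, for this model (padding 0):
  #   Conv1 (kernel 5, stride 1):    (28 + 0 - 5) / 1 + 1 = 24
  #   Maxpool1 (kernel 2, stride 2): (24 + 0 - 2) / 2 + 1 = 12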
  let cv1 = x.conv2d(cv1_w, cv1_b).relu()       # Conv1: [N, 1, 28, 28] --> [N, 20, 24, 24] (kernel: 5, padding: 0, strides: 1)
  let mp1 = cv1.maxpool2D((2,2), (0,0), (2,2))  # Maxpool1: [N, 20, 24, 24] --> [N, 20, 12, 12] (kernel: 2, padding: 0, strides: 2)
  let cv2 = mp1.conv2d(cv2_w, cv2_b).relu()     # Conv2: [N, 20, 12, 12] --> [N, 50, 8, 8] (kernel: 5, padding: 0, strides: 1)
  let mp2 = cv2.maxpool2D((2,2), (0,0), (2,2))  # Maxpool2: [N, 50, 8, 8] --> [N, 50, 4, 4] (kernel: 2, padding: 0, strides: 2)
  let f = mp2.flatten                           # [N, 50, 4, 4] -> [N, 800]
  let hidden = f.linear(fc3).relu               # [N, 800] -> [N, 500]
  result = hidden.linear(classifier)            # [N, 500] -> [N, 10]
# Stochastic Gradient Descent (API will change)
let optim = newSGD[float32](
  cv1_w, cv1_b, cv2_w, cv2_b, fc3, classifier, 0.01f  # 0.01 is the learning rate
)

to something like this:
Potential new API
# Config
# Assuming MNIST input [batch_size, color, height, width]
# For N-sized batches: [N, 1, 28, 28]

network model_name:
  context: ctx
  input_shape: [1, 28, 28]              # Real shape [N, 1, 28, 28]
  layers:
    cv1: Conv2D(20, 5, 5)               # Output shape [N, 20, 24, 24] (kernel 5x5, padding 0, stride 1)
    mp1: MaxPool2D((2,2), (0,0), (2,2)) # Output shape [N, 20, 12, 12] (kernel 2x2, padding 0, stride 2)
    cv2: Conv2D(50, 5, 5)               # Output shape [N, 50, 8, 8] (kernel 5x5, padding 0, stride 1)
    mp2: MaxPool2D((2,2), (0,0), (2,2)) # Output shape [N, 50, 4, 4] (kernel 2x2, padding 0, stride 2)
    # Flatten [N, 800]
    hidden: Linear(500)                 # Output shape [N, 500]
    classifier: Linear(10)              # Output shape [N, 10]
  forward x:
    x.cv1.relu.mp1.cv2.relu.mp2.flatten.hidden.relu.classifier

# Stochastic Gradient Descent
let optim = newSGD[float32](model_name, 0.01f)

Importantly, similar to Keras, only the output shape is declared for each layer: Linear(500) instead of [500, 800]. This greatly eases usage, as the input shape is inferred from the sequence of layers.
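To make that inference concrete, the network macro essentially has to walk the layer sequence and apply the convolution/maxpool output-size formula from the old API's comments. Below is a minimal sketch of that bookkeeping with plain Nim tuples; convOutputShape and maxpoolOutputShape are hypothetical helper names, not an existing API:

# Hypothetical shape bookkeeping the network macro could perform (illustrative only)
type Shape3D = tuple[c, h, w: int]

proc convOutputShape(input: Shape3D, outChannels, kH, kW: int,
                     padding = (0, 0), stride = (1, 1)): Shape3D =
  ## Applies H_out = (H_in + 2*padding - kernel) div stride + 1 (same for W)
  result = (c: outChannels,
            h: (input.h + 2*padding[0] - kH) div stride[0] + 1,
            w: (input.w + 2*padding[1] - kW) div stride[1] + 1)

proc maxpoolOutputShape(input: Shape3D, kernel, padding, stride: (int, int)): Shape3D =
  ## Same formula as convolutions, channel count unchanged
  result = (c: input.c,
            h: (input.h + 2*padding[0] - kernel[0]) div stride[0] + 1,
            w: (input.w + 2*padding[1] - kernel[1]) div stride[1] + 1)

when isMainModule:
  var s: Shape3D = (c: 1, h: 28, w: 28)          # input_shape: [1, 28, 28]
  s = s.convOutputShape(20, 5, 5)                # cv1 -> (20, 24, 24)
  s = s.maxpoolOutputShape((2,2), (0,0), (2,2))  # mp1 -> (20, 12, 12)
  s = s.convOutputShape(50, 5, 5)                # cv2 -> (50, 8, 8)
  s = s.maxpoolOutputShape((2,2), (0,0), (2,2))  # mp2 -> (50, 4, 4)
  echo s.c * s.h * s.w                           # flatten -> 800, the inferred input size of Linear(500)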
Implementation
Concretely, that means the network macro should produce something similar to the following:
type ModelName = object
  ctx: Context
  cv1_weight: Variable[subtype(ctx)]
  cv1_bias: Variable[subtype(ctx)]
  mp1: tuple[kernel, padding, stride: Size2D]
  cv2_weight: Variable[subtype(ctx)]
  cv2_bias: Variable[subtype(ctx)]
  mp2: tuple[kernel, padding, stride: Size2D]
  hidden_weight: Variable[subtype(ctx)]
  classifier_weight: Variable[subtype(ctx)]

proc newModelName(): ModelName =
  result.cv1_weight = ctx.variable(
    randomTensor(20, InputSize_1, 5, 5, 1'f32) .- 0.5'f32,
    requires_grad = true
  )
  result.cv1_bias = ctx.variable(
    randomTensor(20, 1, 1, 1'f32) .- 0.5'f32,
    requires_grad = true
  )
  result.cv2_weight = ctx.variable(
    randomTensor(50, InputSize_2, 5, 5, 1'f32) .- 0.5'f32,
    requires_grad = true
  )
  result.cv2_bias = ctx.variable(
    randomTensor(50, 1, 1, 1'f32) .- 0.5'f32,
    requires_grad = true
  )
  ...
  # Note that we must automatically determine InputSize_1 and InputSize_2
  # depending on "input_shape" and the previous layers
proc forward(self: ModelName, x: Variable[T]): Variable[T] =
  template cv1(x: Variable[T]): Variable[T] =
    x.conv2d(self.cv1_weight, self.cv1_bias)
  template cv2(x: Variable[T]): Variable[T] =
    x.conv2d(self.cv2_weight, self.cv2_bias)
  ...
  x.cv1.relu.mp1.cv2.relu.mp2.flatten.hidden.relu.classifier
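From the user's point of view, the generated code could then be used roughly like this (names assumed from the sketch above; the exact generated API is still open):

# Hypothetical usage of the generated object
let model = newModelName()

# x is a Variable wrapping a [N, 1, 28, 28] input batch
let output = model.forward(x)                # output: Variable of shape [N, 10]

# The optimizer from the proposal gathers the model's trainable variables
let optim = newSGD[float32](model, 0.01'f32)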
Things to keep in mind:
- The model must be serializable so we can save a trained model to disk
- It should be possible to mix stateless functions like relu, sigmoid and flatten in the forward proc.
- The forward proc should allow a variadic number of inputs, for example to combine video and audio.
- We can have temporary values in the forward proc (see the sketch after this list).
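To illustrate the last points, here is a hand-written sketch of a forward proc with a temporary value, mixing stateless functions with stateful layers (it reuses the names from the Implementation sketch; the multi-input variant is purely illustrative):

# Hand-written sketch: temporary value + stateless functions in forward
proc forward(self: ModelName, x: Variable[T]): Variable[T] =
  let features = x.cv1.relu.mp1.cv2.relu.mp2.flatten  # temporary holding the convolutional features
  result = features.hidden.relu.classifier            # stateless relu mixed with stateful linear layers

# Several inputs would simply be extra parameters, e.g.:
# proc forward(self: AVModel, video, audio: Variable[T]): Variable[T] = ...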
References:
- Neural-Network in Nim syntax: https://github.com/t8m8/Neural-Network-in-Nim/blob/4bd06420fd7c57afb665c6b4e5148527eda6a410/examples/xor.nim#L14-L21 (includes input and output shape)
- Keras syntax (Sequential and Functional): https://keras.io/getting-started/functional-api-guide/ (only output shape, input is inferred)
- PyTorch syntax: https://github.com/pytorch/examples/blob/dcdabc22b305d2f2989c6f03570dfcd3919e8a5b/mnist/main.py#L52-L68 (input and output shape)
- Tensorflow eager syntax: https://www.tensorflow.org/get_started/eager (infers input shape when going through Keras)
- Grenade (Haskell): https://github.com/HuwCampbell/grenade
type MNIST
  = Network
    '[ Convolution 1 10 5 5 1 1, Pooling 2 2 2 2, Relu
     , Convolution 10 16 5 5 1 1, Pooling 2 2 2 2, Reshape, Relu
     , FullyConnected 256 80, Logit, FullyConnected 80 10, Logit]
    '[ 'D2 28 28, 'D3 24 24 10, 'D3 12 12 10, 'D3 12 12 10
     , 'D3 8 8 16, 'D3 4 4 16, 'D1 256, 'D1 256
     , 'D1 80, 'D1 80, 'D1 10, 'D1 10]

randomMnist :: MonadRandom m => m MNIST
randomMnist = randomNetwork

Future
The training loop and validation could also be syntactic-sugared with fit and predict.
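For example, a Keras-like surface could look as follows (fit, predict and their parameters are purely hypothetical, nothing here exists yet):

# Hypothetical Keras-style sugar on top of a generated model
model.fit(x_train, y_train, epochs = 5, batch_size = 64)  # training loop
let predictions = model.predict(x_test)                   # e.g. [N, 10] class scores for MNIST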