Create a neural net Domain Specific Language #214

@mratsim

Description

Currently it is very cumbersome to model a neural network, as we have to initialize weights and biases manually. For the MNIST example, we want to go from this:

Old API
# Config (API is not finished)
let
  # We randomly initialize all weights and bias between [-0.5, 0.5]
  # In the future requires_grad will be automatically set for neural network layers

  cv1_w = ctx.variable(
    randomTensor(20, 1, 5, 5, 1'f32) .- 0.5'f32,    # Weight of 1st convolution
    requires_grad = true
    )
  cv1_b = ctx.variable(
    randomTensor(20, 1, 1, 1'f32) .- 0.5'f32,       # Bias of 1st convolution
    requires_grad = true
    )

  cv2_w = ctx.variable(
    randomTensor(50, 20, 5, 5, 1'f32) .- 0.5'f32,   # Weight of 2nd convolution
    requires_grad = true
    )

  cv2_b = ctx.variable(
    randomTensor(50, 1, 1, 1'f32) .- 0.5'f32,       # Bias of 2nd convolution
    requires_grad = true
    )

  fc3 = ctx.variable(
    randomTensor(500, 800, 1'f32) .- 0.5'f32,       # Fully connected: 800 in, 500 out
    requires_grad = true
    )

  classifier = ctx.variable(
    randomTensor(10, 500, 1'f32) .- 0.5'f32,        # Fully connected: 500 in, 10 classes out
    requires_grad = true
    )

proc model[TT](x: Variable[TT]): Variable[TT] =
  # The formula of the output size of convolutions and maxpools is:
  #   H_out = (H_in + (2*padding.height) - kernel.height) / stride.height + 1
  #   W_out = (W_in + (2*padding.width) - kernel.width) / stride.width + 1

  let cv1 = x.conv2d(cv1_w, cv1_b).relu()      # Conv1: [N, 1, 28, 28] --> [N, 20, 24, 24]     (kernel: 5, padding: 0, strides: 1)
  let mp1 = cv1.maxpool2D((2,2), (0,0), (2,2)) # Maxpool1: [N, 20, 24, 24] --> [N, 20, 12, 12] (kernel: 2, padding: 0, strides: 2)
  let cv2 = mp1.conv2d(cv2_w, cv2_b).relu()    # Conv2: [N, 20, 12, 12] --> [N, 50, 8, 8]      (kernel: 5, padding: 0, strides: 1)
  let mp2 = cv2.maxpool2D((2,2), (0,0), (2,2)) # Maxpool2: [N, 50, 8, 8] --> [N, 50, 4, 4]     (kernel: 2, padding: 0, strides: 2)

  let f = mp2.flatten                          # [N, 50, 4, 4] -> [N, 800]
  let hidden = f.linear(fc3).relu              # [N, 800]      -> [N, 500]

  result = hidden.linear(classifier)           # [N, 500]      -> [N, 10]

# Stochastic Gradient Descent (API will change)
let optim = newSGD[float32](
  cv1_w, cv1_b, cv2_w, cv2_b, fc3, classifier, 0.01f # 0.01 is the learning rate
)

to something like this:

Potential new API
# Config
# Assuming MNIST input [batch_size, color, height, width]
# For N-sized batches: [N, 1, 28, 28]

network model_name:
  context: ctx
  input_shape: [1, 28, 28]              # Real shape [N, 1, 28, 28]
  layers:
    cv1: Conv2D(20, 5, 5)               # Output shape [N, 20, 24, 24] (kernel 5x5, padding 0, stride 1)
    mp1: MaxPool2D((2,2), (0,0), (2,2)) # Output shape [N, 20, 12, 12] (kernel 2x2, padding 0, stride 2)
    cv2: Conv2D(50, 5, 5)               # Output shape [N, 50,  8,  8] (kernel 5x5, padding 0, stride 1)
    mp2: MaxPool2D((2,2), (0,0), (2,2)) # Output shape [N, 50,  4,  4] (kernel 2x2, padding 0, stride 2)
                                        # Flatten      [N, 800]
    hidden: Linear(500)                 # Output shape [N, 500]
    classifier: Linear(10)              # Output shape [N, 10]
  forward x:
    x.cv1.relu.mp1.cv2.relu.mp2.flatten.hidden.relu.classifier

# Stochastic Gradient Descent
let optim = newSGD[float32](model_name, 0.01f)

Importantly, similar to Keras, only the output shape is declared for each layer: Linear(500) instead of [500, 800]. This greatly eases usage, since input shapes are inferred from the sequence of preceding layers.
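To make that inference concrete, the macro can thread a (channels, height, width) shape through the layer list using the output-size formula already quoted in the model comments above. A minimal sketch (Shape3D and the helper names are hypothetical, not existing Arraymancer API):

type Shape3D = tuple[c, h, w: int]

proc conv2dOutShape(input: Shape3D, outChannels, kH, kW: int,
                    padding = (0, 0), stride = (1, 1)): Shape3D =
  # H_out = (H_in + 2*padding - kernel) div stride + 1, likewise for W
  result.c = outChannels
  result.h = (input.h + 2*padding[0] - kH) div stride[0] + 1
  result.w = (input.w + 2*padding[1] - kW) div stride[1] + 1

proc maxpool2dOutShape(input: Shape3D, kernel, padding, stride: (int, int)): Shape3D =
  result.c = input.c
  result.h = (input.h + 2*padding[0] - kernel[0]) div stride[0] + 1
  result.w = (input.w + 2*padding[1] - kernel[1]) div stride[1] + 1

when isMainModule:
  var s: Shape3D = (c: 1, h: 28, w: 28)            # input_shape [1, 28, 28]
  s = s.conv2dOutShape(20, 5, 5)                   # -> (20, 24, 24)
  s = s.maxpool2dOutShape((2, 2), (0, 0), (2, 2))  # -> (20, 12, 12)
  s = s.conv2dOutShape(50, 5, 5)                   # -> (50, 8, 8)
  s = s.maxpool2dOutShape((2, 2), (0, 0), (2, 2))  # -> (50, 4, 4)
  echo s.c * s.h * s.w                             # 800: the inferred input size for Linear(500)

This is exactly the information needed to fill in the convolution input channels and the 800 of the fully connected layer below.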

Implementation

Concretely, this means the network macro should produce something similar to the following:

type ModelName = object
  ctx: Context
  cv1_weight: Variable[subtype(ctx)]
  cv1_bias: Variable[subtype(ctx)]
  mp1: tuple[kernel, padding, stride: Size2D]
  cv2_weight: Variable[subtype(ctx)]
  cv2_bias: Variable[subtype(ctx)]
  mp2: tuple[kernel, padding, stride: Size2D]
  hidden_weight: Variable[subtype(ctx)]
  classifier_weight: Variable[subtype(ctx)]

proc newModelName(ctx: Context): ModelName =
  result.ctx = ctx
  result.cv1_weight = ctx.variable(
    randomTensor(20, InputSize_1, 5, 5, 1'f32) .- 0.5'f32,
    requires_grad = true
    )
  result.cv1_bias = ctx.variable(
    randomTensor(20, 1, 1, 1'f32) .- 0.5'f32,
    requires_grad = true
    )

  result.cv2_weight = ctx.variable(
    randomTensor(50, InputSize_2, 5, 5, 1'f32) .- 0.5'f32,
    requires_grad = true
    )

  result.cv2_bias = ctx.variable(
    randomTensor(50, 1, 1, 1'f32) .- 0.5'f32,
    requires_grad = true
    )

  ...
  # Note that we must automatically determine InputSize_1 and InputSize_2
  # depending on "input_shape" and the previous layers

proc forward[T](self: ModelName, x: Variable[T]): Variable[T] =
  template cv1(x: Variable[T]): Variable[T] =
    x.conv2d(self.cv1_weight, self.cv1_bias)
  template cv2(x: Variable[T]): Variable[T] =
    x.conv2d(self.cv2_weight, self.cv2_bias)

  ...

  x.cv1.relu.mp1.cv2.relu.mp2.flatten.hidden.relu.classifier
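As for the network macro itself, a first step could be to split the DSL body into its sections before generating anything. A bare skeleton (the section handling is an assumption based on the syntax proposed above; all code generation is elided):

import macros

macro network(name: untyped, body: untyped): untyped =
  ## Skeleton only: dispatch on the DSL sections, generate nothing yet.
  var ctxSym, inputShape, layers, fwdBody: NimNode
  for section in body:
    case $section[0]            # section name: context / input_shape / layers / forward
    of "context":     ctxSym     = section[1]
    of "input_shape": inputShape = section[1]
    of "layers":      layers     = section[1]
    of "forward":     fwdBody    = section[^1]  # last child holds the forward body
    else: error("unknown section: " & $section[0], section)
  # From here: propagate input_shape through the layers to size every weight
  # (e.g. with the shape helpers sketched earlier), then emit the object type,
  # newModelName and forward.
  result = newStmtList()        # placeholder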
Things to keep in mind:
  • The model must be serializable so we can save a trained model to disk.
  • It should be possible to mix stateless functions like relu, sigmoid and flatten in the forward proc.
  • The forward proc should accept a variadic number of inputs, for example to combine video and audio (see the sketch after this list).
  • We can have temporary values in the forward proc.
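For the multi-input case, the forward section could take several parameters and bind temporaries with let. A purely illustrative sketch of the DSL (the layer names and the combination step are made up):

network multimodal_name:
  context: ctx
  ...
  forward video, audio:
    let v = video.cv_video.relu.flatten   # temporary value from the video branch
    let a = audio.fc_audio.relu           # temporary value from the audio branch
    (v + a).hidden.relu.classifier        # merge both branches, then classify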
References: Grenade, a Haskell library with a similar type-level network DSL:
type MNIST
  = Network
    '[ Convolution 1 10 5 5 1 1, Pooling 2 2 2 2, Relu
     , Convolution 10 16 5 5 1 1, Pooling 2 2 2 2, Reshape, Relu
     , FullyConnected 256 80, Logit, FullyConnected 80 10, Logit]
    '[ 'D2 28 28, 'D3 24 24 10, 'D3 12 12 10, 'D3 12 12 10
     , 'D3 8 8 16, 'D3 4 4 16, 'D1 256, 'D1 256
     , 'D1 80, 'D1 80, 'D1 10, 'D1 10]

randomMnist :: MonadRandom m => m MNIST
randomMnist = randomNetwork
Future

The training loop and validation can also be wrapped in syntactic sugar such as fit and predict.
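A hypothetical sketch (none of these procs exist yet; names are placeholders):

let model = newModelName(ctx)
model.fit(X_train, y_train, epochs = 5, batch_size = 64)   # hypothetical sugar
let predictions = model.predict(X_test)                    # hypothetical sugar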
