Merlin: deep learning framework for Julia

Merlin is a deep learning framework written in Julia.

It aims to provide a fast, flexible and compact deep learning library for machine learning.

Merlin is tested against Julia 0.6 on Linux, OS X, and Windows (x64).




Requirements

  • Julia 0.6
  • g++ (for OS X or Linux)


Installation

julia> Pkg.add("Merlin")

Quick Start


  1. Wrap your data with Var (Variable type).
  2. Apply functions to Var.
    Var memorizes a history of function calls for auto-differentiation.
  3. Compute gradients if necessary.
  4. Update the parameters with an optimizer.

Merlin supports both static and dynamic evaluation of neural networks. The examples below build a three-layer network in each style.

Dynamic Evaluation

using Merlin

T = Float32
x = zerograd(rand(T,10,5)) # instantiate Var with zero gradients
y = Linear(T,10,7)(x)
y = relu(y)
y = Linear(T,7,3)(y)

params = gradient!(y) # compute gradients through the recorded history

opt = SGD(0.01)
foreach(opt, params) # update each parameter with the optimizer

If you don't need gradients of x, use x = Var(rand(T,10,5)) instead; in that case x.grad is set to nothing.
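
To make the idea of a Var that "memorizes a history of function calls" more concrete, here is a minimal, self-contained sketch of reverse-mode differentiation on scalars. It does not use Merlin and is not Merlin's implementation: ToyVar, toy_mul, toy_relu and toy_gradient! are hypothetical names invented for this illustration, and it only mirrors the four Quick Start steps under that assumption.

```julia
# Toy reverse-mode autodiff on scalars (hypothetical names, not Merlin's code).
mutable struct ToyVar
    data::Float64
    grad::Float64
    parents::Vector{ToyVar}   # history: the inputs this value was computed from
    backward::Function        # pushes this node's grad to its parents
end

# Step 1: wrap data. A leaf has no parents and nothing to propagate.
ToyVar(x::Real) = ToyVar(Float64(x), 0.0, ToyVar[], () -> nothing)

# Step 2: each function records its inputs and how to backpropagate.
function toy_mul(a::ToyVar, b::ToyVar)
    out = ToyVar(a.data * b.data, 0.0, [a, b], () -> nothing)
    out.backward = () -> begin
        a.grad += b.data * out.grad
        b.grad += a.data * out.grad
    end
    return out
end

function toy_relu(a::ToyVar)
    out = ToyVar(max(a.data, 0.0), 0.0, [a], () -> nothing)
    out.backward = () -> (a.grad += (a.data > 0 ? 1.0 : 0.0) * out.grad)
    return out
end

# Step 3: walk the history in reverse topological order and return the leaves.
function toy_gradient!(y::ToyVar)
    order = ToyVar[]
    visited = Set{ToyVar}()
    function visit(v)
        v in visited && return
        push!(visited, v)
        foreach(visit, v.parents)
        push!(order, v)
    end
    visit(y)
    y.grad = 1.0
    for v in reverse(order)
        v.backward()
    end
    return [v for v in order if isempty(v.parents)]
end

w = ToyVar(3.0)
x = ToyVar(2.0)
y = toy_relu(toy_mul(w, x))
params = toy_gradient!(y)     # dy/dw = x = 2.0, dy/dx = w = 3.0

# Step 4: a plain gradient-descent update applied to every leaf.
sgd = p -> (p.data -= 0.01 * p.grad)
foreach(sgd, params)
```

The update at the end has the same shape as foreach(opt, params) above: the optimizer is simply a callable applied to each parameter.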

Static Evaluation

For static evaluation, the process is as follows.

  1. Construct a Graph.
  2. Feed your data to the graph.

When you apply a function to a Node, the call is evaluated lazily: nothing is computed until data is fed to the graph.

using Merlin

T = Float32
n = Node(name="x")
n = Linear(T,10,7)(n)
n = relu(n)
n = Linear(T,7,3)(n)
@assert typeof(n) == Node
g = Graph(n) # construct the graph

x = zerograd(rand(T,10,10))
y = g("x"=>x) # feed data to the graph

params = gradient!(y)

opt = SGD(0.01)
foreach(opt, params)

When the network structure is static, this style is recommended.
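
The lazy evaluation used in the static style can be illustrated with a small standalone sketch. It does not use Merlin: ToyNode, placeholder, lazy and evaluate are hypothetical names made up for this example, and Merlin's Node and Graph may well work differently internally. The sketch only shows the general pattern of recording calls first and running them once data is supplied.

```julia
# Toy lazy graph (hypothetical names, not Merlin's Node/Graph).
struct ToyNode
    f::Function              # operation to run later
    args::Vector{ToyNode}    # upstream nodes
    name::String             # non-empty only for input placeholders
end

placeholder(name::String) = ToyNode(identity, ToyNode[], name)

# Applying a function to nodes only records the call; nothing is computed yet.
lazy(f::Function, args::ToyNode...) = ToyNode(f, ToyNode[args...], "")

# Feeding data: placeholders are looked up by name, other nodes are computed.
function evaluate(n::ToyNode, inputs::Dict)
    isempty(n.name) || return inputs[n.name]
    vals = [evaluate(a, inputs) for a in n.args]
    return n.f(vals...)
end

n = placeholder("x")
n = lazy(v -> 2 .* v, n)       # recorded, not executed
n = lazy(v -> max.(v, 0), n)   # recorded, not executed

out = evaluate(n, Dict("x" => [-1.0, 2.0]))
```

Because the structure is fixed before any data arrives, the same recorded graph can be evaluated repeatedly with different inputs.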




This is an example of a batched LSTM.

using Merlin

T = Float32
a = rand(T,20,3)
b = rand(T,20,2)
c = rand(T,20,5)
x = Var(cat(2,a,b,c))
lstm = LSTM(T, 20, 20) # input size: 20, output size: 20
y = lstm(x, [3,2,5])
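
The following plain-Julia sketch shows how the concatenated input relates to the vector [3,2,5]. It assumes, based only on the shapes in the example above, that each entry is the number of columns belonging to one sequence; this reading is not confirmed by the Merlin documentation. No Merlin calls are used, and hcat is used here because it gives the same result as cat(2,a,b,c) on Julia 0.6.

```julia
T = Float32
a = rand(T, 20, 3)
b = rand(T, 20, 2)
c = rand(T, 20, 5)

x = hcat(a, b, c)                  # 20 x 10 matrix, sequences side by side
lengths = [3, 2, 5]                # assumed: columns per sequence

stops = cumsum(lengths)            # last column of each sequence
starts = stops .- lengths .+ 1     # first column of each sequence

# Recover the individual sequences from the concatenated matrix.
seqs = [x[:, starts[i]:stops[i]] for i in 1:length(lengths)]
```

Under this layout, sequences of different lengths are packed into a single matrix without padding, and the length vector is what tells them apart.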

More examples can be found in the examples directory.