Morello is a synthesizer that generates fast neural network pipelines and kernels for x86 and ARM CPUs. It consumes a neural network specification ("Spec" for short) and generates a C implementation of that specification.
The easiest way to get started running Morello locally is to git clone the project and, from the cloned source directory, synthesize one of the predefined specifications.
For example, to synthesize a 2x2x2 matrix multiplication:
cargo r --release -- matmul 2
(Run cargo r --release -- --help for a list of predefined specifications.)
A good alternative is to launch a GitHub Codespace. This repository has a Dev Container configuration, so launching a Codespace connects you to an environment already set up for Morello development (Rust toolchain, Clang, etc.).
Synthesizing larger sizes (e.g., 16x16x16) can take a long time (hours or even days). To speed up subsequent executions, Morello can memoize optimization decisions to disk when given the --db flag. For example:
cargo r --release -- --db morello.db matmul 2
This stores the optimal implementation for a 2x2x2 matrix multiplication as well as its dependencies, including the optimal 1x1x1 matrix multiplication, optimal kernels for moving data from global memory to registers, and many others. The next time you run Morello to synthesize a 2x2x2 matrix multiplication, it will be near-instantaneous. Reusing that database also gives you a head start when synthesizing a 4x4x4 matrix multiplication or a pipeline of matrix multiplications.
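To see why a shared database helps with larger problems, here is a minimal sketch of memoized top-down search in Rust. It is not Morello's implementation: the Memo type, the solve method, the halving decomposition, and the toy cost formula are all hypothetical, and the table lives in memory rather than on disk. It only illustrates that a larger problem reuses the stored results of its smaller dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical memo table: maps a cubic matmul size to the cost of its
/// best implementation. Not Morello's actual database format.
struct Memo {
    table: HashMap<u32, u64>,
    /// Number of sizes that had to be solved from scratch.
    misses: usize,
}

impl Memo {
    fn new() -> Self {
        Memo { table: HashMap::new(), misses: 0 }
    }

    /// Toy search (an assumption for illustration): a size-n problem
    /// depends on the size-n/2 problem, and size 1 is the base case.
    fn solve(&mut self, n: u32) -> u64 {
        // Reuse a stored decision if one exists.
        if let Some(&cost) = self.table.get(&n) {
            return cost;
        }
        self.misses += 1;
        let cost = if n <= 1 { 1 } else { 8 * self.solve(n / 2) + 1 };
        self.table.insert(n, cost);
        cost
    }
}

fn main() {
    let mut memo = Memo::new();

    // First run: sizes 2 and 1 are both solved and stored.
    memo.solve(2);
    assert_eq!(memo.misses, 2);

    // Repeating the same query hits the table and does no new work.
    memo.solve(2);
    assert_eq!(memo.misses, 2);

    // A larger problem only has to solve the one size it has not seen.
    memo.solve(4);
    assert_eq!(memo.misses, 3);

    println!("sizes solved from scratch: {}", memo.misses);
}
```

In this sketch the third call does one unit of new work instead of three, which is the same kind of head start the database gives when moving from 2x2x2 to 4x4x4.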
Morello logs additional useful information via the log crate. Consider setting RUST_LOG=info in your shell environment to see these logs.
While Morello is primarily intended as a synthesizer, its IR can also be a convenient way of manually lowering a specification to C. An example of manually scheduling a matrix multiplication is given in morello/examples/simple_matmul_x86.rs. To run it:
cargo r --example simple_matmul_x86