A modular C++ tool for generating HLO and NEFF files from GGML/llama.cpp kernels for AWS Neuron (Inferentia/Trainium).
- Modular Architecture: Clean base class with extensible kernel implementations
- GGML Support: Essential kernels for llama.cpp backend development
- Mock Testing: Build and test without a full Neuron SDK installation
- Docker Ready: Designed for the Neuron SDK Docker environment
- Automated Generation: Scripts to generate all kernels with organized output
- `add` - Element-wise addition
- `mul` - Element-wise multiplication
- `sub` - Element-wise subtraction

- `matmul` - Matrix multiplication
- `transpose` - Matrix transpose
- `reshape` - Tensor reshape

- `relu` - Rectified Linear Unit
- `gelu` - Gaussian Error Linear Unit
- `silu` - Sigmoid Linear Unit (Swish)
- `softmax` - Softmax normalization
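For orientation, the sketch below shows how these kernels might line up with ggml ops in a llama.cpp backend. The `GGML_OP_*` and `GGML_UNARY_OP_*` enums are real ggml identifiers; the mapping itself is an illustration, not code from this repo.

```cpp
// Illustrative mapping from ggml ops to the generated kernel names.
// The enum values come from ggml.h; the mapping is an assumption.
#include <map>
#include <string>
#include "ggml.h"

static const std::map<enum ggml_op, std::string> kKernelForOp = {
    {GGML_OP_ADD,       "add"},
    {GGML_OP_MUL,       "mul"},
    {GGML_OP_SUB,       "sub"},
    {GGML_OP_MUL_MAT,   "matmul"},
    {GGML_OP_TRANSPOSE, "transpose"},
    {GGML_OP_RESHAPE,   "reshape"},
    {GGML_OP_SOFT_MAX,  "softmax"},
    // relu/gelu/silu arrive as GGML_OP_UNARY with GGML_UNARY_OP_RELU,
    // GGML_UNARY_OP_GELU, and GGML_UNARY_OP_SILU respectively.
};
```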
# Local generation
./generate_all_kernels.sh trn1
# Docker generation
./build_and_generate.sh trn1 true
# Docker Compose
docker-compose up
mkdir build && cd build
cmake ..
make
# Generate individual kernels
./kernel-generator add 1024 1024 my_add_kernel
./kernel-generator matmul 512 512 llama_matmul
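For readers wiring this into their own tooling, here is a minimal, hedged sketch of the CLI dispatch implied by the usage above (`kernel-generator <kernel> <dim0> <dim1> [output_name]`); the real argument handling lives in `kernel-generator.cpp` and may differ.

```cpp
// A minimal sketch of the CLI entry point implied by the usage above.
// Argument order and the export step are assumptions about this tool.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 4) {
        std::cerr << "usage: kernel-generator <kernel> <dim0> <dim1> [name]\n";
        return 1;
    }
    const std::string kernel = argv[1];  // e.g. "add", "matmul"
    const std::vector<int64_t> shape = {std::stoll(argv[2]), std::stoll(argv[3])};
    const std::string name = (argc > 4) ? argv[4] : kernel;  // output file stem

    // The real tool would look the kernel up in its factory table (see
    // "Adding New Kernels"), build the HLO module, and write it under
    // output/generic/hlo/<name>.hlo.
    return 0;
}
```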
output/
├── generic/hlo/ # HLO files (device-independent)
│ ├── add.hlo
│ ├── matmul.hlo
│ └── ...
└── trn1/neff/ # NEFF files (device-specific)
├── add.neff
├── matmul.neff
└── ...
# Build image
docker build -t neuron-kernel-generator .
# Generate for trn1
docker run --rm -v $(pwd)/output:/workspace/output neuron-kernel-generator
# Generate for inf2
docker run --rm -v $(pwd)/output:/workspace/output neuron-kernel-generator ./generate_all_kernels.sh inf2
# Generate for trn1
docker-compose up neuron-kernel-generator
# Generate for inf2
docker-compose up neuron-kernel-generator-inf2
kernel-generator.cpp # Main entry point
base_kernel.h/cpp # Abstract base class
mock_xla.h # Mock XLA for testing
kernels/
├── arithmetic_kernels.h/cpp # Add, mul, sub
├── matrix_kernels.h/cpp # MatMul, transpose, reshape
└── activation_kernels.h/cpp # ReLU, GELU, SiLU, softmax
scripts/
├── generate_all_kernels.sh # Generate all kernels
├── build_and_generate.sh # Build and generate wrapper
└── docker-compose.yml # Docker orchestration
- Create a kernel class (a sketch of a `build_hlo()` body follows this list):

class MyKernel : public BaseKernel {
public:
    MyKernel(const std::vector<int64_t>& shape) : BaseKernel(shape, "my_kernel") {}

protected:
    std::unique_ptr<xla::HloModule> build_hlo() override;
};
- Register in factory:
{"my_kernel", [](const std::vector<int64_t>& shape) {
return std::make_unique<MyKernel>(shape);
}}
- Add to generation script:
# In generate_all_kernels.sh
KERNELS="... my_kernel:1024,1024"
- `output/generic/hlo/<kernel>.hlo` - HLO intermediate representation
- `output/<device>/neff/<kernel>.neff` - Neuron Executable File Format
- Development: CMake 3.16+, C++17
- Production: AWS Neuron SDK with neuronx-cc compiler
- Testing: Mock XLA headers (included)
- Docker: Docker Engine for containerized builds
This tool generates the NEFF files needed for a GGML Neuron backend. Each kernel corresponds to a GGML operation used in llama.cpp.
Generated files can be directly integrated into your GGML backend by loading the appropriate NEFF files for your target device.
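As a concrete starting point, here is a minimal sketch of the device-selection side of that integration: it only picks the right file from the output tree above and reads it into memory. The helper name `load_neff` and the hand-off to the Neuron runtime are assumptions.

```cpp
// Hypothetical helper for a GGML backend: pick the NEFF generated for the
// target device (e.g. "trn1" or "inf2") from the output tree and read it
// into memory, ready to hand to the Neuron runtime loader.
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<char> load_neff(const std::string& device, const std::string& kernel) {
    const std::string path = "output/" + device + "/neff/" + kernel + ".neff";
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("NEFF not found: " + path);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Example: const auto neff = load_neff("trn1", "matmul");
```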
Reach out to us with any feedback or contributions.
Enjoy!
- Author: Karthik Kumar Viswanathan
- Web: https://karthikkumar.org
- Email: me@karthikkumar.org