

🍋 Lemonade SDK: Quickly serve, benchmark and deploy LLMs

The Lemonade SDK makes it easy to run Large Language Models (LLMs) on your PC. Our focus is on using the best acceleration available, such as neural processing units (NPUs) and Vulkan GPU acceleration, to maximize LLM speed and responsiveness.

Lemonade Demo

Features

The Lemonade SDK comprises the following:

  • 🌐 Lemonade Server: A local LLM server for running ONNX and GGUF models using the OpenAI API standard. Install it and give your applications NPU and GPU acceleration in minutes.
  • 🐍 Lemonade API: High-level Python API to directly integrate Lemonade LLMs into Python applications.
  • 🖥️ Lemonade CLI: The lemonade CLI lets you mix-and-match LLMs (ONNX, GGUF, SafeTensors) with measurement tools to characterize your models on your hardware. The available tools are:
    • Prompting with templates.
    • Measuring accuracy with a variety of tests.
    • Benchmarking to measure time-to-first-token and tokens per second.
    • Profiling the memory utilization.
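Because Lemonade Server exposes the OpenAI API standard, any OpenAI-compatible client can talk to it. The sketch below builds a standard chat-completions request body; the base URL, port, and model name are illustrative assumptions, not values defined by this README — substitute whatever your local server reports.

```python
import json

# Assumed local address for illustration; check your own server's settings.
BASE_URL = "http://localhost:8000/api/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("example-model", "Say hello in one sentence.")
print(json.dumps(payload, indent=2))

# With the server running, an OpenAI client could send this payload, e.g.:
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key="-")
#   reply = client.chat.completions.create(**payload)
```

The point of the OpenAI-compatible surface is exactly this: applications already written against the OpenAI client need only a changed base URL to run against local, accelerated models.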

Supported Configurations

Maximum LLM performance requires the right hardware accelerator with the right inference engine for your scenario. Lemonade supports the following configurations, while also making it easy to switch between them at runtime.

| Hardware 🛠️ | OGA | llamacpp | HF |
| --- | --- | --- | --- |
| 🧠 CPU | All platforms | All platforms | All platforms |
| 🎮 GPU | | Vulkan: All platforms<br>Focus: Radeon™ 7000/9000 | |
| 🤖 NPU | AMD Ryzen™ AI 300 series | | |

Supported operating systems: Windows and Linux (x86/x64).
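When comparing these configurations, the benchmarking tool in the lemonade CLI reports time-to-first-token (TTFT) and tokens per second. As a minimal sketch of how those two metrics relate to raw timings (the function and variable names are illustrative, not Lemonade's API):

```python
def ttft_and_tps(request_start: float, first_token_time: float,
                 last_token_time: float, tokens_generated: int) -> tuple[float, float]:
    """Compute time-to-first-token (seconds) and decode throughput (tokens/sec)."""
    ttft = first_token_time - request_start          # prefill latency
    decode_time = last_token_time - first_token_time # time spent generating
    # First token is attributed to prefill, so the decode rate covers the rest.
    tps = (tokens_generated - 1) / decode_time if decode_time > 0 else float("inf")
    return ttft, tps

# Example: 100 tokens, first token after 0.5 s, generation finished at 5.45 s.
ttft, tps = ttft_and_tps(0.0, 0.5, 5.45, 100)
print(f"TTFT: {ttft:.2f} s, TPS: {tps:.1f} tok/s")
```

TTFT is dominated by prompt processing (prefill), while tokens per second reflects steady-state decoding; the two can favor different hardware, which is why both are reported.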

Inference Engines Overview

| Engine | Description |
| --- | --- |
| OnnxRuntime GenAI (OGA) | Microsoft engine that runs `.onnx` models and enables hardware vendors to provide their own execution providers (EPs) to support specialized hardware, such as neural processing units (NPUs). |
| llamacpp | Community-driven engine with strong GPU acceleration, support for thousands of `.gguf` models, and advanced features such as vision-language models (VLMs) and mixture-of-experts (MoEs). |
| Hugging Face (HF) | Hugging Face's `transformers` library can run the original `.safetensors` trained weights on Meta's PyTorch engine, providing a source of truth for accuracy measurement. |
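As the table above suggests, each engine targets a different checkpoint format. A minimal sketch of that pairing (the helper function is hypothetical, written for illustration, and is not part of the Lemonade API):

```python
from pathlib import Path

# Hypothetical mapping from checkpoint file extension to the engine
# described in the table above.
ENGINE_BY_SUFFIX = {
    ".onnx": "OGA",          # OnnxRuntime GenAI
    ".gguf": "llamacpp",     # llama.cpp
    ".safetensors": "HF",    # Hugging Face transformers on PyTorch
}

def guess_engine(checkpoint: str) -> str:
    """Pick an inference engine from a checkpoint's file extension."""
    suffix = Path(checkpoint).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No known engine for {suffix!r} files")

print(guess_engine("model.q4_k_m.gguf"))  # llamacpp
```

In practice the choice also depends on the hardware you want to target (see the Supported Configurations table), but the file format determines which engines are candidates at all.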

Contributing

We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.

Maintainers

This project is sponsored by AMD. It is maintained by @danielholanda, @jeremyfowers, @ramkrishna, and @vgodsoe in equal measure. You can reach us by filing an issue or emailing lemonade@amd.com.

License

This project is licensed under the Apache 2.0 License. Portions of the project are licensed as described in NOTICE.md.
