oRKLLM (pronounced "ORC-EL-EL-EM")

OpenAI-compatible LLM inference for Rockchip NPU. No cloud. No nonsense. Just efficient NPU inference.

oRKLLM is an energy-efficient, OpenAI API-compatible local LLM inference server and premium admin console designed specifically for Rockchip NPU-powered platforms (such as the RK3576 found in the NanoPi M5 and RK3588 series SBCs).
Inspired by jundot/oMLX (which does the same for Apple Silicon), oRKLLM is adaptively re-engineered to run on the Rockchip RKLLM runtime (librkllmrt.so) with its unique hardware and concurrency constraints.
- OpenAI API Compatibility: Drop-in /v1/chat/completions, /v1/models, and /v1/embeddings endpoints that work with Open WebUI, Claude Code, and any OpenAI-compatible client.
- Full Admin Console: Built with Vue 3 and Vuetify 3, with six dedicated pages:
  - Dashboard: live CPU/NPU/RAM/Temperature gauges, serving stats, inference playground
  - Models: local model manager, HuggingFace search, collection browser, direct downloader
  - Settings: inference defaults, HF token, prefix cache config, trusted proxy
  - Logs: full-page real-time log terminal over WebSocket
  - Bench: inference benchmark (TTFT, prefill tok/s, generation tok/s)
  - Chat: full streaming chat UI with system prompt, model selector, and parameter controls
- Multi-User Auth & RBAC: Local accounts or federated SSO via OIDC/SAML (Keycloak, Google, Azure AD). Two roles: admin and user. Site Management UI for user CRUD, auth provider config, and audit log.
- OIDC / SAML SSO: Standard Flow with PKCE for public clients (no secret required). Group-to-role mapping from IdP claims. Routes at /auth/oidc/* and /auth/saml/*.
- HuggingFace Integration: Search the HF Hub, browse collections (e.g. huggingface.co/collections/Qwen/qwen3-...), and download .rkllm models directly from the admin console.
- Prefix KV Cache: Tiered SSD hot/cold LRU cache saves KV state between conversation turns, skipping re-prefill of repeated prefixes. A sliding context window prevents NPU OOM on long conversations.
- Process-Isolated Execution: Inference engine runs in a dedicated child process. Model unload/swap terminates the process, guaranteeing full NPU driver memory cleanup.
- Smart Resource Management: Single active model lock, auto-swap, configurable idle timeout.
- Database Migrations: A PRAGMA user_version migration runner applies schema changes automatically on startup, safe across upgrades from any previous version.
- Seamless Mock Fallback: On non-ARM64/non-Linux platforms, oRKLLM falls back to a JS mock engine, allowing rapid UI development on macOS/Windows without a board.
- Dynamic N-API Bindings: The C++ addon uses dlopen/dlsym, so there is no compile-time dependency on librkllmrt.so.
- Secure Auth: PBKDF2-HMAC-SHA256 password hashing, signed session cookies (userId|username|role|expires|HMAC), backward-compatible with single-user installs.
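
The signed session cookie layout named in the last bullet (userId|username|role|expires|HMAC) could be produced and verified roughly as follows. This is a minimal sketch, not oRKLLM's actual code: the function names signSession and verifySession, the choice of HMAC-SHA256 with hex encoding, and an expiry expressed in epoch seconds are all assumptions made for illustration.

```javascript
const nodeCrypto = require('node:crypto');

// Hypothetical helper: build "userId|username|role|expires|HMAC".
// Assumes no field contains the '|' separator.
function signSession({ userId, username, role, expires }, secret) {
  const payload = [userId, username, role, expires].join('|');
  const mac = nodeCrypto.createHmac('sha256', secret).update(payload).digest('hex');
  return `${payload}|${mac}`;
}

// Hypothetical helper: return the session fields, or null if the cookie is
// malformed, tampered with, or expired.
function verifySession(cookie, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(cookie).split('|');
  if (parts.length !== 5) return null;
  const [userId, username, role, expires, mac] = parts;
  const payload = [userId, username, role, expires].join('|');
  const expected = nodeCrypto.createHmac('sha256', secret).update(payload).digest('hex');
  const given = Buffer.from(mac, 'utf8');
  const wanted = Buffer.from(expected, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== wanted.length || !nodeCrypto.timingSafeEqual(given, wanted)) return null;
  if (!/^\d+$/.test(expires) || Number(expires) <= nowSeconds) return null;
  return { userId, username, role, expires: Number(expires) };
}
```

Comparing the MAC with a constant-time function avoids leaking how many leading characters matched. Because the role is covered by the MAC, a client cannot promote itself from user to admin by editing the cookie.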
graph TD
Client[HTTP Client / Open WebUI] -->|REST API| Fastify[Fastify Server]
Fastify -->|Admin SPA| Admin[Vue 3 / Vuetify Admin]
Fastify -->|OpenAI Routes| API[OpenAI API Router]
API -->|Queue Request| Pool[Engine Pool & Resource Manager]
Pool -->|Spawn / Message| Worker[Worker Process]
Worker -->|N-API Addon| Addon[orkllm_napi.node]
Addon -->|Dynamic dlopen| C_API[librkllmrt.so C API]
C_API -->|NPU Driver| NPU[Rockchip NPU Hardware]
Admin -->|WebSocket Telemetry| Monitor[Telemetry Monitor]
Monitor -->|/sys/kernel/debug/rknpu| Linux[Linux Kernel]
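
The "Queue Request" edge into the Engine Pool, together with the single active model lock, suggests that requests are serialized before they reach the worker process. One way such serialization might look is sketched below. SerialQueue is a hypothetical name, and the real pool may behave differently (for example with timeouts or cancellation).

```javascript
// Hypothetical serial queue: tasks run one at a time, in submission order.
class SerialQueue {
  constructor() {
    // Tail of the promise chain; every new task is appended after it.
    this.tail = Promise.resolve();
  }

  // Enqueue an async task. The returned promise settles with that task's
  // own result or error.
  run(task) {
    const result = this.tail.then(() => task());
    // Swallow errors on the chain itself so one failed request does not
    // block the requests queued behind it.
    this.tail = result.catch(() => {});
    return result;
  }
}
```

Each API request would call run() with a function that messages the worker, so a second request cannot start generating until the first has finished.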
| Layer | Technology |
|---|---|
| API Server | Node.js + Fastify (ES Modules) |
| Native Bindings | C++ N-API addon (node-addon-api) with dlopen/dlsym |
| Mock Fallback | Pure JS mock engine (auto-enabled on non-ARM64/non-Linux) |
| Frontend | Vue 3 + Vuetify 3 SPA, built with Vite, route-based code splitting |
| Database | SQLite via node:sqlite (Node ≥ 22.5) or better-sqlite3 (Node 20) |
| Auth | Local PBKDF2 + OIDC (PKCE) + SAML 2.0 |
| Testing | Playwright E2E (33 tests), mock OIDC service container in CI |
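
For the Database layer, the PRAGMA user_version migration runner mentioned in the feature list can be illustrated with a short sketch. Everything here is assumed for illustration: the migration SQL, the migrate function, and the small db adapter (getUserVersion and exec), which stands in for either SQLite driver rather than reproducing its real API.

```javascript
// Hypothetical migration list: entry i upgrades schema version i to i + 1.
const MIGRATIONS = [
  'CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE, role TEXT)',
  'ALTER TABLE users ADD COLUMN created_at INTEGER',
];

// `db` is an assumed adapter: getUserVersion() would read
// `PRAGMA user_version`, and exec(sql) runs one statement.
function migrate(db, migrations = MIGRATIONS) {
  const current = db.getUserVersion();
  for (let v = current; v < migrations.length; v++) {
    db.exec('BEGIN');
    try {
      db.exec(migrations[v]);
      // PRAGMA statements do not take bound parameters, so the integer
      // version is inlined into the SQL text.
      db.exec(`PRAGMA user_version = ${v + 1}`);
      db.exec('COMMIT');
    } catch (err) {
      db.exec('ROLLBACK');
      throw err;
    }
  }
  // Resulting schema version (unchanged if the database was already newer).
  return Math.max(current, migrations.length);
}
```

Because each step starts from the stored version, a database at any earlier version only runs the migrations it has not seen yet, and running the function again is a no-op.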
Pre-built .deb packages for ARM64 are available via the oRKLLM APT repository or directly from the GitHub Releases page.
# Trust the oRKLLM signing key
curl -fsSL https://mafischer.github.io/oRKLLM/orkllm.gpg \
| sudo gpg --dearmor -o /usr/share/keyrings/orkllm.gpg
# Add the repository
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/orkllm.gpg] \
https://mafischer.github.io/oRKLLM stable main" \
| sudo tee /etc/apt/sources.list.d/orkllm.list
sudo apt update && sudo apt install orkllm

# Alternatively, install the .deb directly from GitHub Releases
wget https://github.com/mafischer/oRKLLM/releases/latest/download/orkllm_VERSION_arm64.deb
sudo dpkg -i orkllm_VERSION_arm64.deb

# Edit the configuration
sudo nano /etc/orkllm/orkllm.conf

# Example settings in /etc/orkllm/orkllm.conf
ORKLLM_HOST=0.0.0.0
ORKLLM_PORT=8000
ORKLLM_LIB_PATH=/usr/lib/librkllmrt.so
ORKLLM_MODELS_DIR=/var/lib/orkllm/models
ORKLLM_DB_PATH=/var/lib/orkllm/orkllm.db

# Copy a model into place and start the service
sudo cp your_model.rkllm /var/lib/orkllm/models/
sudo systemctl start orkllm

Admin console: http://<device-ip>:8000/admin
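
Once the service is running, any OpenAI-style client should be able to reach the /v1/chat/completions endpoint. The sketch below uses Node's built-in fetch (Node 18 or later). The device address and model name passed in by the caller are placeholders, and the example assumes a non-streaming request with no API key header, which may not match a given install's auth settings.

```javascript
// Build the URL and fetch() options for an OpenAI-style chat request.
function buildChatRequest(baseUrl, model, userText) {
  return {
    url: new URL('/v1/chat/completions', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: userText }],
        stream: false,
      }),
    },
  };
}

// Send the request and return the assistant's reply text, following the
// standard OpenAI chat completion response shape.
async function chat(baseUrl, model, userText) {
  const { url, init } = buildChatRequest(baseUrl, model, userText);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

A call such as chat('http://<device-ip>:8000', '<model-name>', 'Hello') would then resolve to the generated text, with both placeholders replaced by real values.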
sudo systemctl start|stop|restart|status orkllm
journalctl -u orkllm -f

- Node.js ≥ 18 (≥ 22.5 preferred for native node:sqlite)
- node-gyp dependencies: Python 3, C++ compiler (Xcode CLT on macOS, build-essential on Linux)
- A compiled .rkllm model (use rkllm-toolkit to convert from HuggingFace)
- librkllmrt.so on the target board (typically at /usr/lib/librkllmrt.so)
# Install all dependencies (compiles native addon)
npm install
# Build Vue frontend
npm run build:frontend
# Start development server (mock engine auto-enabled on macOS)
npm run dev:server
# → http://localhost:8000/admin

| Variable | Default | Description |
|---|---|---|
| ORKLLM_HOST | 127.0.0.1 | Listen address (0.0.0.0 for LAN) |
| ORKLLM_PORT | 8000 | Listen port |
| ORKLLM_LIB_PATH | /usr/lib/librkllmrt.so | Path to Rockchip RKLLM runtime |
| ORKLLM_MODELS_DIR | ./models | Directory scanned for .rkllm files |
| ORKLLM_DB_PATH | ~/.config/orkllm/auth.db | SQLite database path |
| ORKLLM_TRUSTED_PROXY | (unset) | true or CIDR to trust X-Forwarded-* headers |
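
As a rough illustration of how these variables and defaults fit together, the sketch below resolves a configuration object from an environment map. The loadConfig function and the returned property names are hypothetical. Only the variable names and default values come from the table.

```javascript
// Defaults taken from the environment variable table.
const DEFAULTS = {
  ORKLLM_HOST: '127.0.0.1',
  ORKLLM_PORT: '8000',
  ORKLLM_LIB_PATH: '/usr/lib/librkllmrt.so',
  ORKLLM_MODELS_DIR: './models',
  ORKLLM_DB_PATH: '~/.config/orkllm/auth.db', // '~' is not expanded here
};

// Hypothetical loader: environment values win, defaults fill the gaps.
function loadConfig(env = process.env) {
  const get = (key) =>
    env[key] !== undefined && env[key] !== '' ? env[key] : DEFAULTS[key];
  const port = Number(get('ORKLLM_PORT'));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid ORKLLM_PORT: ${env.ORKLLM_PORT}`);
  }
  return {
    host: get('ORKLLM_HOST'),
    port,
    libPath: get('ORKLLM_LIB_PATH'),
    modelsDir: get('ORKLLM_MODELS_DIR'),
    dbPath: get('ORKLLM_DB_PATH'),
    // Unset means X-Forwarded-* headers are not trusted.
    trustedProxy: env.ORKLLM_TRUSTED_PROXY || null,
  };
}
```

Passing the environment as an argument keeps the function easy to test, for example loadConfig({ ORKLLM_HOST: '0.0.0.0' }) to simulate a LAN-facing install.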
# Full E2E suite (mock mode, no board required)
npm test
# SSO integration tests using local Keycloak container (same as CI)
npm run test:sso # starts Keycloak + runs SSO tests
npm run test:sso:down # tear down Keycloak when done

CI runs the full suite, including OIDC SSO, via a containerised Keycloak instance with a pre-configured orkllm realm.
Set these in .env locally (gitignored) or as GitHub Actions secrets/variables. The .env file is loaded automatically by Playwright.
| Variable | Where | Description |
|---|---|---|
| ORKLLM_TEST_ADMIN_USER | Secret | Admin username created during test setup |
| ORKLLM_TEST_ADMIN_PASS | Secret | Admin password |
| ORKLLM_TEST_OIDC_ISSUER | Secret | Real Keycloak issuer URL (for ORKLLM_TEST_LIVE=1) |
| ORKLLM_TEST_OIDC_CLIENT_ID | Secret | OIDC client ID (orkllm-oidc) |
| ORKLLM_TEST_SAML_METADATA_URL | Secret | Real Keycloak SAML metadata URL |
| ORKLLM_TEST_OIDC_USER | Secret | Keycloak test user (testuser) |
| ORKLLM_TEST_OIDC_USER_PASS | Secret | Keycloak test user password |
| ORKLLM_TEST_OIDC_ADMIN_USER | Secret | Keycloak admin test user (testadminuser) |
| ORKLLM_TEST_OIDC_ADMIN_PASS | Secret | Keycloak admin test user password |
| ORKLLM_TEST_MOCK_OIDC_URL | Auto-set | Issuer URL of CI Keycloak container (http://localhost:8080/realms/orkllm) |
| ORKLLM_TEST_REDIRECT_BASE | Auto-set | Base URL from which the OIDC redirect_uri is derived, so the protocol is correct (http:// in CI, https:// live) |
| ORKLLM_TEST_LIVE | Variable | Set to 1 to run SSO tests against real Keycloak on LAN |
| ORKLLM_TEST_LIVE_URL | Variable | Live server URL (e.g. https://orkllm.fischerapps.com) |
When E2E tests fail in CI, Playwright uploads screenshots and error context as an artifact named playwright-report (retained 7 days).
Download via CLI:
gh run download <run-id> --name playwright-report -D /tmp/report
# Find the run ID with: gh run list --limit 5

Download via browser: GitHub Actions run → Summary → Artifacts section at the bottom → download playwright-report.zip.
Each failed test has a test-failed-1.png screenshot and an error-context.md with the stack trace, making it easy to see exactly what the browser showed at the point of failure.
- jundot/oMLX: Inspired the dashboard layout, metrics design, single-model lifecycle, and OpenAI compatibility structures.
- Rockchip: SDKs and runtime libraries (librkllmrt.so) powering local NPU inference.