oRKLLM

              )       (
             ( \     / )          ██████╗ ██████╗ ██╗  ██╗██╗     ██╗     ███╗   ███╗
              \_\   /_/          ██╔═══██╗██╔══██╗██║ ██╔╝██║     ██║     ████╗ ████║
            .-----------.        ██║   ██║██████╔╝█████╔╝ ██║     ██║     ██╔████╔██║
           /  [*]   [*]  \       ██║   ██║██╔══██╗██╔═██╗ ██║     ██║     ██║╚██╔╝██║
          |    \  ω  /    |      ╚██████╔╝██║  ██║██║  ██╗███████╗███████╗██║ ╚═╝ ██║
           \  .-------.  /        ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝╚══════╝╚═╝     ╚═╝
          _/\/  #####  \/\_
         /  /   #####   \  \      Pronounced "ORC-EL-EL-EM"
        / ,/    #####    \, \     OpenAI-compatible LLM inference for Rockchip NPU.
       | / |  .-------.  | \ |    No cloud. No nonsense. Just efficient NPU inference.
       |/  '--[=======]--'  \|
       |       |     |       |
        \   ,  |     |  ,   /
         \  \. |     | ./  /
          '--' |     | '--'
               |     |
              / \   / \
             '   '-'   '

oRKLLM is an energy-efficient, OpenAI API-compatible local LLM inference server and premium admin console designed specifically for Rockchip NPU-powered platforms, such as the RK3576 (found in the NanoPi M5) and RK3588-series SBCs.

Inspired by jundot/oMLX (which does the same for Apple Silicon), oRKLLM is re-engineered to run on the Rockchip RKLLM runtime (librkllmrt.so) and to work within its hardware and concurrency constraints.


🚀 Key Features

  • OpenAI API Compatibility: Drop-in /v1/chat/completions, /v1/models, and /v1/embeddings endpoints that work with Open WebUI, Claude Code, and any OpenAI-compatible client.
  • Full Admin Console: Built with Vue 3 and Vuetify 3, with six dedicated pages:
    • Dashboard: live CPU/NPU/RAM/temperature gauges, serving stats, inference playground
    • Models: local model manager, HuggingFace search, collection browser, direct downloader
    • Settings: inference defaults, HF token, prefix cache config, trusted proxy
    • Logs: full-page real-time log terminal over WebSocket
    • Bench: inference benchmark (TTFT, prefill tok/s, generation tok/s)
    • Chat: full streaming chat UI with system prompt, model selector, and parameter controls
  • Multi-User Auth & RBAC: Local accounts or federated SSO via OIDC/SAML (Keycloak, Google, Azure AD). Two roles: admin and user. Site Management UI for user CRUD, auth provider config, and audit log.
  • OIDC / SAML SSO: Standard Flow with PKCE for public clients (no secret required). Group-to-role mapping from IdP claims. Routes at /auth/oidc/* and /auth/saml/*.
  • HuggingFace Integration: Search the HF Hub, browse collections (e.g. huggingface.co/collections/Qwen/qwen3-...), download .rkllm models directly from the admin console.
  • Prefix KV Cache: Tiered SSD hot/cold LRU cache saves KV state between conversation turns, skipping re-prefill of repeated prefixes. Sliding context window prevents NPU OOM on long conversations.
  • Process-Isolated Execution: Inference engine runs in a dedicated child process. Model unload/swap terminates the process, guaranteeing full NPU driver memory cleanup.
  • Smart Resource Management: Single active model lock, auto-swap, configurable idle timeout.
  • Database Migrations: PRAGMA user_version migration runner. Schema changes apply automatically on startup, safe across upgrades from any previous version.
  • Seamless Mock Fallback: On non-ARM64/non-Linux platforms, oRKLLM falls back to a JS mock engine, which allows rapid UI development on macOS/Windows without a board.
  • Dynamic N-API Bindings: The C++ addon uses dlopen/dlsym, so there is no compile-time dependency on librkllmrt.so.
  • Secure Auth: PBKDF2-HMAC-SHA256 password hashing, signed session cookies (userId|username|role|expires|HMAC), backward-compatible with single-user installs.
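
The signed session cookie layout named in the last bullet (userId|username|role|expires|HMAC) can be illustrated with a minimal Node.js sketch. This is not oRKLLM's actual code: the function names, the choice of HMAC-SHA256 with hex encoding, and the treatment of `expires` as a millisecond Unix timestamp are all assumptions made for illustration.

```javascript
// CommonJS for brevity; use `import` in an ES module project.
const { createHmac, timingSafeEqual } = require('node:crypto');

// Hypothetical helper: HMAC-SHA256 over the pipe-joined payload, hex-encoded.
function mac(payload, secret) {
  return createHmac('sha256', secret).update(payload).digest('hex');
}

// Build "userId|username|role|expires|HMAC". Assumes no field contains "|".
function signSession({ userId, username, role, expires }, secret) {
  const payload = [userId, username, role, expires].join('|');
  return `${payload}|${mac(payload, secret)}`;
}

// Return the session fields if the signature is valid and unexpired, else null.
function verifySession(cookie, secret, now = Date.now()) {
  const parts = cookie.split('|');
  if (parts.length !== 5) return null;
  const [userId, username, role, expires, sig] = parts;
  const expected = Buffer.from(mac(parts.slice(0, 4).join('|'), secret), 'hex');
  const actual = Buffer.from(sig, 'hex');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  if (!/^\d+$/.test(expires) || Number(expires) <= now) return null;
  return { userId, username, role, expires: Number(expires) };
}
```

Comparing signatures with `timingSafeEqual` rather than `===` avoids leaking how many leading characters of a forged signature were correct.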

🛠️ Architecture & Tech Stack

graph TD
    Client[HTTP Client / Open WebUI] -->|REST API| Fastify[Fastify Server]
    Fastify -->|Admin SPA| Admin[Vue 3 / Vuetify Admin]
    Fastify -->|OpenAI Routes| API[OpenAI API Router]

    API -->|Queue Request| Pool[Engine Pool & Resource Manager]
    Pool -->|Spawn / Message| Worker[Worker Process]
    Worker -->|N-API Addon| Addon[orkllm_napi.node]
    Addon -->|Dynamic dlopen| C_API[librkllmrt.so C API]
    C_API -->|NPU Driver| NPU[Rockchip NPU Hardware]

    Admin -->|WebSocket Telemetry| Monitor[Telemetry Monitor]
    Monitor -->|/sys/kernel/debug/rknpu| Linux[Linux Kernel]

| Layer | Technology |
|---|---|
| API Server | Node.js + Fastify (ES Modules) |
| Native Bindings | C++ N-API addon (node-addon-api) with dlopen/dlsym |
| Mock Fallback | Pure JS mock engine (auto-enabled on non-ARM64/non-Linux) |
| Frontend | Vue 3 + Vuetify 3 SPA, built with Vite, route-based code splitting |
| Database | SQLite via node:sqlite (Node ≥22.5) or better-sqlite3 (Node 20) |
| Auth | Local PBKDF2 + OIDC (PKCE) + SAML 2.0 |
| Testing | Playwright E2E (33 tests), mock OIDC service container in CI |
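
The PRAGMA user_version migration approach listed under Key Features could look roughly like the sketch below. The migration SQL, the `MIGRATIONS` array, and the `migrate` function are hypothetical and do not reflect oRKLLM's real schema. The sketch only assumes a database handle with `exec(sql)` and `prepare(sql).get()`, which both `DatabaseSync` from node:sqlite and better-sqlite3 expose.

```javascript
// Ordered migrations: entry i upgrades the schema from version i to i + 1.
// The SQL is an invented example schema, used only for illustration.
const MIGRATIONS = [
  'CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL)',
  "ALTER TABLE users ADD COLUMN role TEXT NOT NULL DEFAULT 'user'",
];

// Apply every migration newer than the stored user_version, one transaction each.
function migrate(db, migrations = MIGRATIONS) {
  const { user_version: current } = db.prepare('PRAGMA user_version').get();
  for (let v = current; v < migrations.length; v++) {
    db.exec('BEGIN');
    try {
      db.exec(migrations[v]);
      // PRAGMA values cannot be bound as parameters; v + 1 is an integer we control.
      db.exec(`PRAGMA user_version = ${v + 1}`);
      db.exec('COMMIT');
    } catch (err) {
      db.exec('ROLLBACK');
      throw err;
    }
  }
  return Math.max(current, migrations.length);
}
```

On Node ≥ 22.5 this could be called as `migrate(new DatabaseSync(dbPath))` after `const { DatabaseSync } = require('node:sqlite')`. Because the version bump sits in the same transaction as the schema change, a failed migration leaves the database at its previous version.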

📦 Installing from a Release Package (Ubuntu / Armbian ARM64)

Pre-built .deb packages for ARM64 are available via the oRKLLM APT repository or directly from the GitHub Releases page.

Option A: APT repository (recommended)

# Trust the oRKLLM signing key
curl -fsSL https://mafischer.github.io/oRKLLM/orkllm.gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/orkllm.gpg

# Add the repository
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/orkllm.gpg] \
  https://mafischer.github.io/oRKLLM stable main" \
  | sudo tee /etc/apt/sources.list.d/orkllm.list

sudo apt update && sudo apt install orkllm

Option B: Direct download

# Replace VERSION with the version number shown on the GitHub Releases page
wget https://github.com/mafischer/oRKLLM/releases/latest/download/orkllm_VERSION_arm64.deb
sudo dpkg -i orkllm_VERSION_arm64.deb

Configure

sudo nano /etc/orkllm/orkllm.conf
ORKLLM_HOST=0.0.0.0
ORKLLM_PORT=8000
ORKLLM_LIB_PATH=/usr/lib/librkllmrt.so
ORKLLM_MODELS_DIR=/var/lib/orkllm/models
ORKLLM_DB_PATH=/var/lib/orkllm/orkllm.db

Add models and start

sudo cp your_model.rkllm /var/lib/orkllm/models/
sudo systemctl start orkllm

Admin console: http://<device-ip>:8000/admin
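
Once the service is running, the OpenAI-compatible routes should be reachable on the same host and port as the admin console. The sketch below shows one way to call /v1/chat/completions from Node.js 18+ using the global `fetch`. The helper names are hypothetical, the request body follows the standard OpenAI chat format, and the README does not say whether an API key is required, so no Authorization header is sent here.

```javascript
// Build the request for the OpenAI-compatible chat endpoint.
// `model` is expected to match a name returned by GET /v1/models.
function buildChatRequest(baseUrl, model, userPrompt) {
  return {
    url: new URL('/v1/chat/completions', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: userPrompt }],
        stream: false,
      }),
    },
  };
}

// Send the request and return the assistant's reply text.
async function chat(baseUrl, model, userPrompt) {
  const { url, init } = buildChatRequest(baseUrl, model, userPrompt);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

A call such as `chat('http://<device-ip>:8000', '<model-name>', 'Hello')` would then resolve to the model's reply, with both placeholders replaced by your own values.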

Service management

sudo systemctl start|stop|restart|status orkllm
journalctl -u orkllm -f

⚙️ Installation from Source

Prerequisites

  • Node.js ≥ 18 (≥ 22.5 preferred for native node:sqlite)
  • node-gyp dependencies: Python 3, C++ compiler (Xcode CLT on macOS, build-essential on Linux)
  • A compiled .rkllm model (use rkllm-toolkit to convert from HuggingFace)
  • librkllmrt.so on the target board (typically at /usr/lib/librkllmrt.so)

Setup & Run

# Install all dependencies (compiles native addon)
npm install

# Build Vue frontend
npm run build:frontend

# Start development server (mock engine auto-enabled on macOS)
npm run dev:server
# → http://localhost:8000/admin
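
The automatic mock fallback mentioned above amounts to a platform check at startup. The sketch below shows the general idea. The function name and the 'native'/'mock' return values are hypothetical, and the real project may apply additional checks (for example, whether librkllmrt.so can actually be loaded).

```javascript
// Decide which engine to load; the defaults read the current process.
// The native engine is only attempted on Linux + ARM64, where the Rockchip
// runtime can exist. Everything else gets the JS mock.
function selectEngine(platform = process.platform, arch = process.arch) {
  const canUseNpu = platform === 'linux' && arch === 'arm64';
  return canUseNpu ? 'native' : 'mock';
}
```

An Apple Silicon Mac reports `arm64` but not `linux`, so it still selects the mock engine under this rule.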

Environment Variables

| Variable | Default | Description |
|---|---|---|
| ORKLLM_HOST | 127.0.0.1 | Listen address (0.0.0.0 for LAN) |
| ORKLLM_PORT | 8000 | Listen port |
| ORKLLM_LIB_PATH | /usr/lib/librkllmrt.so | Path to Rockchip RKLLM runtime |
| ORKLLM_MODELS_DIR | ./models | Directory scanned for .rkllm files |
| ORKLLM_DB_PATH | ~/.config/orkllm/auth.db | SQLite database path |
| ORKLLM_TRUSTED_PROXY | (unset) | true or CIDR to trust X-Forwarded-* headers |
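
To make the defaults above concrete, here is a minimal sketch of how a server might resolve these variables. The `loadConfig` function and the shape of its return value are assumptions for illustration, not oRKLLM's actual configuration code. Only the variable names and default values come from the table.

```javascript
// CommonJS for brevity; use `import` in an ES module project.
const os = require('node:os');
const path = require('node:path');

// Resolve settings from an env-like object, falling back to the documented defaults.
function loadConfig(env = process.env) {
  const port = Number(env.ORKLLM_PORT ?? 8000);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid ORKLLM_PORT: ${env.ORKLLM_PORT}`);
  }
  // ORKLLM_TRUSTED_PROXY: unset -> false, "true" -> true, anything else -> CIDR string.
  const proxy = env.ORKLLM_TRUSTED_PROXY;
  let trustedProxy = false;
  if (proxy === 'true') trustedProxy = true;
  else if (proxy) trustedProxy = proxy;
  return {
    host: env.ORKLLM_HOST ?? '127.0.0.1',
    port,
    libPath: env.ORKLLM_LIB_PATH ?? '/usr/lib/librkllmrt.so',
    modelsDir: path.resolve(env.ORKLLM_MODELS_DIR ?? './models'),
    dbPath: env.ORKLLM_DB_PATH ?? path.join(os.homedir(), '.config', 'orkllm', 'auth.db'),
    trustedProxy,
  };
}
```

Validating the port up front turns a typo such as `ORKLLM_PORT=80O0` into an immediate startup error instead of a confusing bind failure later.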

🧪 Running Tests

# Full E2E suite (mock mode, no board required)
npm test

# SSO integration tests using local Keycloak container (same as CI)
npm run test:sso        # starts Keycloak + runs SSO tests
npm run test:sso:down   # tear down Keycloak when done

CI runs the full suite including OIDC SSO via a containerised Keycloak instance with a pre-configured orkllm realm.

Test environment variables

Set these in .env locally (gitignored) or as GitHub Actions secrets/variables. The .env file is loaded automatically by Playwright.

| Variable | Where | Description |
|---|---|---|
| ORKLLM_TEST_ADMIN_USER | Secret | Admin username created during test setup |
| ORKLLM_TEST_ADMIN_PASS | Secret | Admin password |
| ORKLLM_TEST_OIDC_ISSUER | Secret | Real Keycloak issuer URL (for ORKLLM_TEST_LIVE=1) |
| ORKLLM_TEST_OIDC_CLIENT_ID | Secret | OIDC client ID (orkllm-oidc) |
| ORKLLM_TEST_SAML_METADATA_URL | Secret | Real Keycloak SAML metadata URL |
| ORKLLM_TEST_OIDC_USER | Secret | Keycloak test user (testuser) |
| ORKLLM_TEST_OIDC_USER_PASS | Secret | Keycloak test user password |
| ORKLLM_TEST_OIDC_ADMIN_USER | Secret | Keycloak admin test user (testadminuser) |
| ORKLLM_TEST_OIDC_ADMIN_PASS | Secret | Keycloak admin test user password |
| ORKLLM_TEST_MOCK_OIDC_URL | Auto-set | Issuer URL of CI Keycloak container (http://localhost:8080/realms/orkllm) |
| ORKLLM_TEST_REDIRECT_BASE | Auto-set | Base URL for the OIDC redirect_uri; the redirect URI is derived from this so the protocol is correct (http:// in CI, https:// live) |
| ORKLLM_TEST_LIVE | Variable | Set to 1 to run SSO tests against real Keycloak on LAN |
| ORKLLM_TEST_LIVE_URL | Variable | Live server URL (e.g. https://orkllm.fischerapps.com) |
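
How these variables might combine is sketched below. This is a guess at the selection logic rather than the project's actual test setup: the `resolveTestTarget` function, the `http://localhost:8000` fallback, and the `/auth/oidc/callback` path are all hypothetical (the README only states that OIDC routes live under /auth/oidc/*).

```javascript
// Pick the OIDC issuer and redirect URI for a test run from an env-like object.
function resolveTestTarget(env = process.env) {
  const live = env.ORKLLM_TEST_LIVE === '1';
  // Live runs use the real issuer; otherwise fall back to the CI Keycloak container.
  const issuer = live
    ? env.ORKLLM_TEST_OIDC_ISSUER
    : (env.ORKLLM_TEST_MOCK_OIDC_URL ?? 'http://localhost:8080/realms/orkllm');
  // Assumed fallback chain for the base URL; the localhost default is a guess.
  const redirectBase =
    env.ORKLLM_TEST_REDIRECT_BASE ?? (live ? env.ORKLLM_TEST_LIVE_URL : 'http://localhost:8000');
  if (!issuer || !redirectBase) throw new Error('Missing SSO test configuration');
  // Deriving redirect_uri from the base keeps the scheme (http/https) consistent.
  const redirectUri = new URL('/auth/oidc/callback', redirectBase).toString(); // hypothetical path
  return { live, issuer, redirectUri };
}
```

Building the redirect URI from a single base value means a live run over https:// and a CI run over http:// never end up with a mismatched scheme, which identity providers typically reject.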

Debugging failed CI tests

When E2E tests fail in CI, Playwright uploads screenshots and error context as an artifact named playwright-report (retained 7 days).

Download via CLI:

gh run download <run-id> --name playwright-report -D /tmp/report
# Find the run ID with: gh run list --limit 5

Download via browser: GitHub Actions run → Summary → Artifacts section at the bottom → download playwright-report.zip.

Each failed test has a test-failed-1.png screenshot and an error-context.md with the stack trace, making it easy to see exactly what the browser showed at the point of failure.


🤝 Credits & Acknowledgements

  • jundot/oMLX: Inspired the dashboard layout, metrics design, single-model lifecycle, and OpenAI compatibility structures.
  • Rockchip: SDKs and runtime libraries (librkllmrt.so) powering local NPU inference.