A Dart/Flutter plugin for llama.cpp. Run LLM inference directly in Dart and Flutter applications using GGUF models with hardware acceleration.
Actively Under Development. The core features are implemented and running. Many more features are in the pipeline, including:
- High-level APIs for easier integration.
- Multi-modality support (Vision/LLaVA).
We welcome contributors to help us test on more platforms (especially Windows)!
| Platform | Architecture(s) | GPU Backend | Status |
|---|---|---|---|
| macOS | Universal (arm64, x86_64) | Metal | ✅ Tested (CPU, Metal) |
| iOS | arm64 (Device), x86_64/arm64 (Sim) | Metal (Device), CPU (Sim) | ✅ Tested (CPU, Metal) |
| Android | arm64-v8a, x86_64 | Vulkan (if supported) | ✅ Tested (CPU, Vulkan) |
| Linux | x86_64 | CUDA / Vulkan | ❓ Needs Testing |
| Windows | x86_64 | CUDA / Vulkan | ❓ Needs Testing |
| Web | WASM | CPU (WASM) | ✅ Tested (WASM) |
Add `llamadart` to your `pubspec.yaml`:

```yaml
dependencies:
  llamadart: ^0.1.0
```
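Then run `flutter pub get`. The package is imported with a single line (the same import used in the quick start below):

```dart
import 'package:llamadart/llamadart.dart';
```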
iOS: No manual setup required. The plugin automatically builds llama.cpp for iOS (Device/Simulator) when you run `flutter build ios`.
Note: The first build will take a few minutes to compile the C++ libraries.
Desktop (macOS/Linux/Windows): The package handles native builds automatically via CMake.
- macOS: Metal acceleration is enabled by default.
- Linux/Windows: CPU inference is supported.
Android: No manual setup required. The plugin uses CMake to compile the native library automatically.
- Ensure you have the Android NDK installed via Android Studio.
- The first build will take a few minutes to compile the llama.cpp libraries for your target device's architecture.
Web: Zero-config by default (uses jsDelivr CDN for wllama).
- Import and use `LlamaService`.
- Enable WASM support in Flutter web:

```sh
flutter run -d chrome --wasm
# OR build with wasm
flutter build web --wasm
```
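With the default CDN setup, no constructor arguments are needed. A minimal sketch (the no-argument constructor is the same one used in the quick start below):

```dart
import 'package:llamadart/llamadart.dart';

// With no wllamaPath/wasmPath provided, the wllama runtime
// is fetched from the jsDelivr CDN.
final service = LlamaService();
```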
Offline / Bundled Usage (Optional):
- Download assets to your `assets/` directory:

```sh
dart run llamadart:download_wllama
```
- Add the folder to your `pubspec.yaml`:

```yaml
flutter:
  assets:
    - assets/wllama/single-thread/
```
- Initialize with local asset paths:
```dart
final service = LlamaService(
  wllamaPath: 'assets/wllama/single-thread/wllama.js',
  wasmPath: 'assets/wllama/single-thread/wllama.wasm',
);
```
macOS / iOS:
- Metal: Acceleration is enabled by default on physical devices.
- Simulator: Runs on CPU (x86_64 or arm64).
- Sandboxing: Add these entitlements to `macos/Runner/DebugProfile.entitlements` and `Release.entitlements` for network access (model downloading):

```xml
<key>com.apple.security.network.client</key>
<true/>
```
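For illustration, here is a minimal download helper using `dart:io` (the `downloadModel` name and its parameters are hypothetical; only `LlamaService.init` comes from this package):

```dart
import 'dart:io';

import 'package:llamadart/llamadart.dart';

/// Hypothetical helper: streams a GGUF file to disk, then loads it.
/// On macOS, this needs the network-client entitlement shown above.
Future<void> downloadModel(LlamaService service, Uri url, String path) async {
  final client = HttpClient();
  final request = await client.getUrl(url);
  final response = await request.close();
  await response.pipe(File(path).openWrite());
  client.close();
  await service.init(path);
}
```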
Android:
- Architectures: arm64-v8a (most devices) and x86_64 (emulators).
- Vulkan: GPU acceleration is enabled by default on devices with Vulkan support.
- NDK: Requires Android NDK 26+ installed (usually handled by Android Studio).
GPU backends are enabled by default where available. Use the options below to customize.
Control GPU usage at runtime via `ModelParams`:

```dart
// Use GPU with automatic backend selection (default)
await service.init('model.gguf', modelParams: ModelParams(
  gpuLayers: 99, // Offload all layers to GPU
  preferredBackend: GpuBackend.auto,
));

// Force CPU-only inference
await service.init('model.gguf', modelParams: ModelParams(
  gpuLayers: 0, // No GPU offloading
  preferredBackend: GpuBackend.cpu,
));

// Request a specific backend (if compiled in)
await service.init('model.gguf', modelParams: ModelParams(
  preferredBackend: GpuBackend.vulkan,
));
```

Available backends: `auto`, `cpu`, `cuda`, `vulkan`, `metal`.
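If a requested backend turns out to be unavailable at runtime, one option is to retry on CPU. A sketch, assuming `init` throws when the backend fails (that behavior is an assumption, not documented above):

```dart
Future<void> initWithFallback(LlamaService service, String modelPath) async {
  try {
    // Try full GPU offload with automatic backend selection first.
    await service.init(modelPath,
        modelParams: ModelParams(gpuLayers: 99, preferredBackend: GpuBackend.auto));
  } catch (_) {
    // Assumed: init throws on backend failure; fall back to CPU-only.
    await service.init(modelPath,
        modelParams: ModelParams(gpuLayers: 0, preferredBackend: GpuBackend.cpu));
  }
}
```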
To disable GPU backends at build time:
Android (in `android/gradle.properties`):

```properties
LLAMA_DART_NO_VULKAN=true
```

Desktop (CMake flags):
```sh
# Disable CUDA
cmake -DLLAMA_DART_NO_CUDA=ON ...

# Disable Vulkan
cmake -DLLAMA_DART_NO_VULKAN=ON ...
```

Quick start:

```dart
import 'dart:io';

import 'package:llamadart/llamadart.dart';

void main() async {
  final service = LlamaService();
  try {
    // 1. Initialize with model path (GGUF)
    // On iOS/macOS, ensures Metal is used if available.
    await service.init('models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf');

    // 2. Generate text (streaming)
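    // Note: the prompt must follow the model's chat template;
    // the Gemma-style markers below are only an example.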
    final prompt = "<start_of_turn>user\nTell me a story about a llama.<end_of_turn>\n<start_of_turn>model\n";
    await for (final token in service.generate(prompt)) {
      stdout.write(token);
    }
  } finally {
    // 3. Always dispose to free native memory
    service.dispose();
  }
}
```
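For multi-turn conversations, a small helper can assemble the turn markers used above. A sketch (`buildPrompt` is a hypothetical name; adjust the markers to your model's chat template):

```dart
/// Hypothetical helper: builds a Gemma-style prompt from chat turns.
String buildPrompt(List<({String role, String content})> turns) {
  final buffer = StringBuffer();
  for (final turn in turns) {
    buffer.write('<start_of_turn>${turn.role}\n${turn.content}<end_of_turn>\n');
  }
  // Leave the model turn open so generation continues from here.
  buffer.write('<start_of_turn>model\n');
  return buffer.toString();
}
```

Passing the result to `service.generate` streams the reply exactly as in the quick start.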
Examples:
- Flutter Chat App: `example/chat_app` - A full-featured chat interface with real-time streaming, GPU acceleration support, and model management.
- Basic Console App: `example/basic_app` - A minimal example demonstrating model download and basic inference.
See CONTRIBUTING.md for detailed instructions on:
- Setting up the development environment.
- Building the native libraries.
- Running tests and examples.
License: MIT