Add Flare as a third inference engine backend #293

@sauravpanda

Summary

Add Flare LLM as a third inference engine option alongside MLC WebLLM and Transformers.js. Flare is a pure Rust → WASM inference engine with WebGPU acceleration that loads standard GGUF files directly (no TVM compilation step).

Why

                      MLC WebLLM         Transformers.js    Flare
Model format          TVM artifacts      ONNX               Standard GGUF
Compilation           Needs TVM compile  Needs ONNX export  None (direct HuggingFace GGUF)
Language              C++ (Emscripten)   C++ (ONNX RT)      Pure Rust → WASM
Progressive loading   No                 No                 Yes
LoRA hot-swap         No                 No                 Yes
BitNet ternary        No                 No                 Yes
Speculative decoding  No                 No                 Yes
WASM binary size      ~15 MB             ~10 MB             ~5 MB (est.)

Key advantage: users can grab any GGUF model from HuggingFace and use it immediately — no conversion pipeline.
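For illustration, a minimal sketch of what that looks like end to end. The `engine: 'flare'` option and the `loadModel()`/`generateText()` names come from this issue; the import path and the example model id are assumptions:

```ts
// Sketch only: '@browserai/browserai' and the model id below are
// illustrative assumptions, not a confirmed API surface.
import { BrowserAI } from '@browserai/browserai';

const ai = new BrowserAI({ engine: 'flare' }); // engine option proposed in this issue

// Any GGUF straight from HuggingFace, no conversion pipeline:
await ai.loadModel('TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF');

const reply = await ai.generateText('Explain WebGPU in one sentence.');
console.log(reply);
```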

Integration plan

  1. Publish @aspect/flare npm package via wasm-pack
  2. Create FlareEngine adapter implementing the BrowserAI engine interface
  3. Map the BrowserAI API to the Flare WASM API (see the adapter sketch after this list):
    • loadModel() → FlareEngine.load() + init_gpu()
    • generateText() → begin_stream() + next_token() loop
    • transcribeAudio() → N/A (Flare doesn't do STT yet)
  4. Add GGUF models to model registry
  5. Allow engine selection: new BrowserAI({ engine: 'flare' })
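A rough sketch of the adapter from steps 2–3, assuming @aspect/flare is a wasm-pack/wasm-bindgen package exposing the load(), init_gpu(), begin_stream(), and next_token() calls named above. The engine-interface shape here is illustrative, not BrowserAI's actual interface definition:

```ts
// Sketch only: assumes @aspect/flare exports a default init() for the WASM
// module plus a FlareEngine class with the calls named in this issue.
import init, { FlareEngine as FlareWasm } from '@aspect/flare';

export class FlareEngineAdapter {
  private engine: FlareWasm | null = null;

  // loadModel() → FlareEngine.load() + init_gpu()
  async loadModel(modelUrl: string): Promise<void> {
    await init();                                 // instantiate the WASM module
    this.engine = await FlareWasm.load(modelUrl); // fetch + parse the GGUF
    await this.engine.init_gpu();                 // set up WebGPU pipelines
  }

  // generateText() → begin_stream() + next_token() loop
  async generateText(prompt: string, onToken?: (tok: string) => void): Promise<string> {
    if (!this.engine) throw new Error('Call loadModel() first');
    this.engine.begin_stream(prompt);
    let text = '';
    for (;;) {
      const tok = this.engine.next_token(); // assumed to return null at end-of-stream
      if (tok == null) break;
      text += tok;
      onToken?.(tok);
    }
    return text;
  }

  // transcribeAudio() → N/A: Flare has no STT, so the adapter rejects it
  async transcribeAudio(_audio: Blob): Promise<string> {
    throw new Error('transcribeAudio is not supported by the Flare engine');
  }
}
```

Keeping the token loop on the adapter side would let BrowserAI surface streaming callbacks without any change at the WASM boundary.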

Depends on

  • #[flare-npm] Publish @aspect/flare npm package
  • #[flare-adapter] FlareEngine adapter implementation
  • #[flare-models] Add GGUF models to registry
