Browser LLM Chat is a fully local-first React application that runs language and vision models directly in the browser with WebGPU. There is no backend inference layer, no API key requirement, and no server-side chat state.
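Everything runs client-side, so a natural first step is feature-detecting WebGPU before offering any model downloads. A minimal sketch (the helper name is hypothetical, not from the codebase; strict TypeScript setups may need `@webgpu/types` for the `navigator.gpu` typings):

```ts
// Hypothetical helper: probe for a usable WebGPU adapter before loading models.
export async function hasWebGpu(): Promise<boolean> {
  // navigator.gpu is undefined in browsers without WebGPU support.
  if (!("gpu" in navigator)) return false;
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await navigator.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}
```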
## Features

- Local-first inference with `@huggingface/transformers`
- Web Worker-based model loading and token streaming (a sketch follows this list)
- Shared app state managed with Zustand
- Chat history persisted locally in IndexedDB, with a localStorage fallback
- Curated browser-ready models plus searchable Hugging Face discovery
- Built-in settings for generation controls and downloaded-model cleanup
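The first two items combine into a simple pattern: a module worker loads a transformers.js pipeline once and streams decoded tokens back over `postMessage`. A hedged sketch, not the app's actual `src/model.worker.ts` — the model id, message shapes, and generation options are placeholders:

```ts
// Worker sketch: load the pipeline once, stream tokens to the main thread.
import { pipeline, TextStreamer } from "@huggingface/transformers";

// The download starts as soon as the worker boots; handlers await the shared promise.
const generatorPromise = pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct", // placeholder model id
  { device: "webgpu" },
);

self.onmessage = async (event: MessageEvent<{ prompt: string }>) => {
  const generator = await generatorPromise;

  // TextStreamer invokes the callback with each decoded chunk as it is produced.
  const streamer = new TextStreamer(generator.tokenizer, {
    skip_prompt: true,
    callback_function: (text: string) => self.postMessage({ type: "token", text }),
  });

  await generator(event.data.prompt, { max_new_tokens: 256, streamer });
  self.postMessage({ type: "done" });
};
```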
## Tech Stack

- React 19
- Vite
- TypeScript
- Vanilla CSS
- `@huggingface/transformers` 4.0.0-next.x
## Requirements

- Node.js 22.x
- A WebGPU-capable browser (recent Chrome or Edge desktop builds are recommended)
## Getting Started

```bash
npm install
npm run dev
```

Open http://localhost:5173.
## Scripts

- `npm run dev`: start the Vite dev server
- `npm run build`: typecheck and build the production bundle
- `npm run preview`: preview the production build locally
- `npm run lint`: run ESLint with zero warnings allowed
- `npm run lint:fix`: apply safe ESLint autofixes
- `npm run format`: format the repo with Prettier
- `npm run format:check`: verify formatting without rewriting files
- `npm run typecheck`: run TypeScript without emitting
- `npm run test`: run the Vitest suite
- `npm run test:watch`: run Vitest in watch mode
- `npm run check`: run lint, typecheck, tests, and build
## Architecture

- `src/App.tsx` is the top-level composition layer for the SPA shell.
- Shared cross-screen UI, chat, and model state lives in `src/store/app-store.ts`.
- Heavy inference work stays in `src/model.worker.ts`, plus the focused worker helpers in `src/worker/`.
- Chat persistence is handled locally through `src/chat-store.ts` (sketched after this list).
- Lightweight preferences, storage helpers, and storage feedback live in `src/storage.ts`.
- Tests live in `src/test/` so contributors have one place to look for coverage.
- There is intentionally no router or backend inference path.
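The persistence layer's IndexedDB-first, localStorage-fallback behaviour can be pictured as below. This is a minimal sketch of the pattern only — the database names and the `saveChat` helper are hypothetical, and the real `src/chat-store.ts` may shape its records differently:

```ts
// Hypothetical persistence helper illustrating IndexedDB with a localStorage fallback.
const DB_NAME = "chat-db"; // placeholder database/store names
const STORE = "chats";

function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open(DB_NAME, 1);
    request.onupgradeneeded = () =>
      request.result.createObjectStore(STORE, { keyPath: "id" });
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

export async function saveChat(chat: { id: string; messages: unknown[] }): Promise<void> {
  try {
    const db = await openDb();
    await new Promise<void>((resolve, reject) => {
      const tx = db.transaction(STORE, "readwrite");
      tx.objectStore(STORE).put(chat);
      tx.oncomplete = () => resolve();
      tx.onerror = () => reject(tx.error);
    });
  } catch {
    // Fall back to localStorage when IndexedDB is unavailable (e.g. some private modes).
    localStorage.setItem(`chat:${chat.id}`, JSON.stringify(chat));
  }
}
```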
## Contributing

Pull requests are expected to pass:

```bash
npm run lint
npm run typecheck
npm run test
npm run build
```

GitHub Actions runs the same checks automatically.
## Known Limitations

- First-time model downloads can be large and slow on constrained networks (see the cache note after this list).
- Larger models remain sensitive to browser, VRAM, and device class.
- WebGPU support is required; browsers without it are unsupported.
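Because the first-time download is the expensive step, transformers.js caches fetched weights in the browser's Cache API, which is also what makes the settings screen's downloaded-model cleanup possible. A hedged sketch — `transformers-cache` is the library's default cache name at the time of writing, but confirm it against the pinned version:

```ts
// Clear cached model weights (assumes transformers.js's default browser cache name).
export async function clearDownloadedModels(): Promise<boolean> {
  // caches.delete() resolves true if the named cache existed and was removed.
  return caches.delete("transformers-cache");
}
```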
## License

MIT