Awesome WebLLM

This page contains a curated list of examples, tutorials, and blog posts about WebLLM use cases. Please send a pull request if you find something that belongs here.

Tutorial Examples

Note that all examples below run in-browser and use WebGPU as a backend.

Basic Chat Completion

Advanced OpenAI API Capabilities

These examples demonstrate various capabilities via WebLLM's OpenAI-like API.

  • streaming: return output as chunks in real time in the form of an AsyncGenerator
  • json-mode: efficiently ensure the output is in JSON format; see the OpenAI Reference for more
  • json-schema: besides guaranteeing the output is valid JSON, ensure it adheres to a specific JSON schema specified by the user
  • function-calling (WIP): function calling with the tools and tool_choice fields
  • seed-to-reproduce: use the seed field to ensure reproducible output
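
The capabilities above can be combined in a single request. A minimal sketch, assuming the @mlc-ai/web-llm package with its CreateMLCEngine factory and OpenAI-style chat.completions.create API; the model id is illustrative, and the WebGPU guard keeps the sketch inert outside a browser:

```javascript
// Request payload using the OpenAI-like fields described above.
const request = {
  stream: true, // yields an AsyncGenerator of chunks instead of one response
  seed: 42,     // fixed seed so repeated runs produce the same output
  messages: [{ role: "user", content: "Tell me a short joke." }],
};

async function main() {
  // Assumed API surface of @mlc-ai/web-llm; loaded lazily so this
  // sketch can be parsed in non-browser environments too.
  const webllm = await import("@mlc-ai/web-llm");
  const engine = await webllm.CreateMLCEngine(
    "Llama-3.1-8B-Instruct-q4f32_1-MLC" // illustrative prebuilt model id
  );
  const chunks = await engine.chat.completions.create(request);
  let reply = "";
  for await (const chunk of chunks) {
    // Each chunk carries a delta, as in OpenAI's streaming format.
    reply += chunk.choices[0]?.delta?.content ?? "";
  }
  console.log(reply);
}

// WebGPU is browser-only; only run the engine when it is available.
if (typeof navigator !== "undefined" && navigator.gpu) {
  main();
}
```

For json-mode or json-schema, the same request would additionally carry a response_format field, per the OpenAI reference.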

Chrome Extension

Others

  • logit-processor: while logit_bias is supported, we additionally support stateful logit processing, where users can specify their own rules. We also expose the low-level API forwardTokensAndSample().
  • cache-usage: demonstrates how WebLLM supports both the Cache API and IndexedDB caching; users can choose between them with appConfig.useIndexedDBCache. Also demonstrates various cache utilities, such as checking whether a model is cached, deleting a model's weights from the cache, deleting a model library wasm from the cache, etc.
  • simple-chat-upload: demonstrates how to upload local models to WebLLM instead of downloading them from a URL
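
As a sketch of the cache-backend choice, assuming the appConfig.useIndexedDBCache option and a hasModelInCache helper exported by @mlc-ai/web-llm (both names taken from the example's description, not verified here); the model id is illustrative:

```javascript
// Partial app config: true selects IndexedDB, false (the default)
// selects the Cache API for storing model weights and wasm.
const appConfig = {
  useIndexedDBCache: true,
};

async function main() {
  const webllm = await import("@mlc-ai/web-llm");
  const modelId = "Llama-3.1-8B-Instruct-q4f32_1-MLC"; // illustrative id
  // Check whether this model's artifacts are already in the chosen cache
  // before triggering a (potentially large) download.
  const cached = await webllm.hasModelInCache(modelId, appConfig);
  console.log(cached ? "model already cached" : "model will be downloaded");
}

// Cache and engine APIs are browser-side; guard on WebGPU availability.
if (typeof navigator !== "undefined" && navigator.gpu) {
  main();
}
```

Note that switching useIndexedDBCache after a model has been cached means the previously cached copy lives in the other backend, so the check above is per-backend.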

Demo Spaces

  • web-llm-embed: document chat prototype using react-llm with transformers.js embeddings
  • DeVinci: an AI chat app based on WebLLM, hosted on a decentralized cloud platform