Host and Run Entire LLMs Locally, Directly in the Browser

What is WebLLM?

  • WebLLM is a modular and customizable JavaScript package that brings hardware-accelerated language model chat directly into the web browser. Everything runs inside the browser with no server support and is accelerated with WebGPU.

  • WebLLM is fully compatible with the OpenAI API. That is, you can use the same OpenAI API with any supported open-source model, running locally (see the sketch at the end of this section).

All made possible by the great work done by the folks at the WebLLM GitHub repo.
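
As a rough illustration, here is a minimal sketch of loading a model and chatting with it through the OpenAI-style API. It assumes the @mlc-ai/web-llm package's CreateMLCEngine entry point and an illustrative model ID; check the WebLLM repo for current model names.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Model ID is an assumption -- pick one from WebLLM's prebuilt model list.
const modelId = "Llama-3-8B-Instruct-q4f32_1-MLC";

// Downloads (or reads from the browser cache) the model weights and
// prepares the WebGPU runtime; the callback reports loading progress.
const engine = await CreateMLCEngine(modelId, {
  initProgressCallback: (report) => console.log(report.text),
});

// Same request shape as OpenAI's chat completions endpoint.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is WebGPU?" }],
});

console.log(reply.choices[0].message.content);
```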

WebGPU Browser Support

  • Microsoft Edge
  • Google Chrome
  • Generally, Chromium-based browsers (YMMV)
  • Firefox Nightly builds (roughly 50/50 whether it works; see the feature-detection sketch below)
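
Since support varies, a page can feature-detect WebGPU before trying to load a model. A minimal sketch using the standard navigator.gpu check (assumes @webgpu/types or equivalent typings; the fallback message is just illustrative):

```typescript
// Feature-detect WebGPU before attempting to load a model.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false;
  // Even when navigator.gpu exists, adapter acquisition can fail
  // (e.g. blocklisted drivers), so request one to be sure.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}

if (!(await hasWebGPU())) {
  console.warn("WebGPU is unavailable; WebLLM will not run in this browser.");
}
```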

Notes

  • Downloading and consequently caching full models can take up a significant chunk of storage space, so be aware of that (see the storage sketch below).
  • Performance is highly dependent on the specs of the machine loading and hosting the model.
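
To see how much of the browser's storage quota cached model weights are consuming, the standard Storage API can help. A minimal sketch using navigator.storage.estimate() (the GiB formatting is just illustrative):

```typescript
// Report how much of the browser's storage quota is in use.
// Cached model weights count toward this usage.
async function logStorageUsage(): Promise<void> {
  if (!navigator.storage?.estimate) {
    console.warn("Storage estimation is not supported in this browser.");
    return;
  }
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  const toGiB = (bytes: number) => (bytes / 1024 ** 3).toFixed(2);
  console.log(`Using ${toGiB(usage)} GiB of ${toGiB(quota)} GiB quota.`);
}

await logStorageUsage();
```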