# 🌐 LMFast: Browser Deployment

**Run your SLM entirely in the user's browser! No server costs!**

## What You'll Learn
- Export models for WebLLM (WebGPU accelerated)
- Export models for ONNX Runtime Web
- Generate a complete HTML/JS demo app automatically
- Host your AI app for free (GitHub Pages / Netlify)

## Technologies
- **WebGPU**: Near-native GPU performance in Chrome/Edge
- **WebAssembly (Wasm)**: Near-native CPU execution, used as the fallback when WebGPU is unavailable

**Time to complete:** ~15 minutes

## 1️⃣ Setup

In [None]:
# Quote the extras so the shell does not treat the square brackets as a glob pattern
!pip install -q "lmfast[all]" "optimum[onnxruntime]"

import lmfast
lmfast.setup_colab_env()

## 2️⃣ Select a Model

We need a small model that fits in browser memory (RAM + VRAM).
SmolLM-135M-Instruct is a good fit (only ~100MB compressed!).

In [None]:
MODEL_ID = "HuggingFaceTB/SmolLM-135M-Instruct"
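As a rough sanity check on the memory budget, the size of the raw weights can be estimated from the parameter count and the number of bits stored per weight. The sketch below assumes a nominal 135 million parameters (taken from the model name) and ignores tokenizer files, metadata, and any layers an exporter keeps at higher precision, so treat the numbers as ballpark figures rather than the size of the exported files. `estimate_weights_mb` is a helper written for this example and is not part of LMFast.

```python
def estimate_weights_mb(num_params: int, bits_per_weight: int) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


# Assumption: nominal parameter count implied by the "135M" in the model name.
NUM_PARAMS = 135_000_000

# Compare common storage precisions, from full precision down to 4-bit.
for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: ~{estimate_weights_mb(NUM_PARAMS, bits):.1f} MB")
```

Halving the bits per weight halves the download, which is why quantization matters so much for browser delivery.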

## 3️⃣ Export for Web (ONNX)

We'll export to ONNX format first, which works on almost all devices (even without WebGPU).

In [None]:
from lmfast.deployment import export_for_browser

print("🌍 Exporting to ONNX for Browser...")
export_for_browser(
    model_path=MODEL_ID,
    output_dir="./my_web_ai",
    target="onnx",
    quantization="int8", # 8-bit quantization is a good balance
    create_demo=True
)

print("✅ Export Complete!")

## 4️⃣ Explore the Generated App

LMFast automatically created a `demo/` folder with a working chat interface.

In [None]:
import os

print("📂 Generated Files:")
for root, dirs, files in os.walk("./my_web_ai"):
    level = root.replace("./my_web_ai", "").count(os.sep)
    indent = " " * 4 * (level)
    print(f"{indent}{os.path.basename(root)}/")
    subindent = " " * 4 * (level + 1)
    for f in files:
        print(f"{subindent}{f}")

## 5️⃣ Preview the App (Colab Trick)

We can't run a web server easily in Colab, but we can inspect the HTML code.

In [None]:
# Show the page source, not the running app (limitations of Colab's iframe context)
with open("./my_web_ai/demo/index.html", "r", encoding="utf-8") as f:
    html_content = f.read()

print("📄 HTML Source Preview (First 500 chars):")
print(html_content[:500] + "...")
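Beyond reading the first few hundred characters, it can help to list which scripts and stylesheets the page loads. The following is a minimal sketch using only the standard library's `html.parser`. `AssetLister` and `list_assets` are names invented for this example, and the structure of the generated `index.html` is not assumed beyond it being ordinary HTML.

```python
from html.parser import HTMLParser


class AssetLister(HTMLParser):
    """Collect URLs referenced by <script src=...> and <link href=...> tags."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("src"):
            self.assets.append(attr_map["src"])
        elif tag == "link" and attr_map.get("href"):
            self.assets.append(attr_map["href"])


def list_assets(html_text):
    """Return the referenced asset URLs in document order."""
    parser = AssetLister()
    parser.feed(html_text)
    return parser.assets
```

Calling `list_assets(html_content)` on the string read in the cell above returns the asset URLs in the order they appear, which shows at a glance whether the demo pulls its JavaScript from local files or from a CDN.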

## 6️⃣ Advanced: WebLLM (WebGPU)

For maximum performance, target WebLLM (MLC).
*Note: This requires the `mlc-llm` CLI tool, which is a heavy install, so this step only generates the config for you.*

In [None]:
print("🚀 Generating WebLLM Config...")
export_for_browser(
    model_path=MODEL_ID,
    output_dir="./my_web_gpu",
    target="webllm",
    create_demo=True
)

print("✅ WebLLM Config Ready!")
print("Check ./my_web_gpu/mlc/README.md for conversion steps.")

## 7️⃣ Deployment Instructions

To make your AI live:

1. **Download** the `./my_web_ai` folder.
2. **Upload** it to a GitHub repository.
3. **Enable GitHub Pages** in the repository settings.
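Before uploading, it is worth checking file sizes, because GitHub blocks any single file larger than 100 MiB and model weights can approach that limit. The helper below is a hypothetical pre-upload check written for this tutorial. It is not part of LMFast and makes no assumption about which files the export produced.

```python
from pathlib import Path

# GitHub rejects pushes containing a file larger than 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized_files(folder, limit_bytes=GITHUB_FILE_LIMIT_BYTES):
    """Return (relative_path, size_in_bytes) for every file above limit_bytes."""
    root = Path(folder)
    oversized = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                oversized.append((str(path.relative_to(root)), size))
    return oversized
```

Running `find_oversized_files("./my_web_ai")` returns an empty list when everything fits. Any entries it does return would need a stronger quantization setting, Git LFS, or a different host.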

Or simply run locally:
```bash
cd my_web_ai/demo
python -m http.server 8000
```
Then visit `http://localhost:8000`
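If the demo relies on multi-threaded WebAssembly, the browser only exposes `SharedArrayBuffer` on cross-origin isolated pages, which the plain `http.server` command does not provide. The sketch below is a small standard-library alternative that adds the two isolation headers and serves `.wasm` files with the expected MIME type. Whether the generated demo actually needs this is an assumption, and `IsolatedHandler` and `make_server` are names made up for this example. Note that `Cross-Origin-Embedder-Policy: require-corp` can block assets loaded from other origins that do not opt in.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds cross-origin isolation headers."""

    # Serve .wasm files with the MIME type browsers expect for streaming compilation.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".wasm": "application/wasm",
    }

    def end_headers(self):
        # These two headers together make the page cross-origin isolated.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()


def make_server(directory, port=8000):
    """Build (but do not start) a local server rooted at `directory`."""
    handler = functools.partial(IsolatedHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To use it, call `make_server("my_web_ai/demo").serve_forever()` from a script and open `http://localhost:8000` as before.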

## 🎉 Summary

You just built a **serverless AI application**!
- The model runs inside the user's device.
- Zero cloud costs.
- Complete privacy: prompts never leave the device.

### Next Steps
- Customize the HTML/CSS in `demo/style.css`
- Add RAG support manually in JS
- Deploy to Netlify/Vercel