✨ Komparator is a local app for comparing HTML results from different LLMs on the same prompt.
It helps you collect outputs, review them visually, compare them side by side, and rate which model did better.
🌍 Try it here: https://llm-komparator.vercel.app/
- 🧪 Create experiments for a specific task or prompt
- 🤖 Save results from different models and providers
- 👀 Preview generated HTML directly inside the app
- ⚖️ Compare two results side by side
- ⭐ Rate results on a 1-10 scale
- 🗂️ Organize experiments by category
- 🧭 Keep your own working model list separate from the reference catalog
- 📥 Load popular models quickly or import updated model catalogs
- ✏️ Fix a saved result if it was attached to the wrong model
- 📊 View aggregate stats across your workspace
- 💾 Export and restore your local data
- Create an experiment.
- Add or edit the prompt for that experiment.
- Copy the prepared prompt into one or more LLM chats.
- Choose the model you used from your own model list or from the catalog.
- Paste the returned HTML result back into Komparator.
- Review the render, compare outputs, and assign ratings.
- If you picked the wrong model by mistake, update it later from the saved result.
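The last step, correcting the model on a saved result, can be pictured with a minimal sketch. The `SavedResult` shape and the `reassignModel` helper are hypothetical names used for illustration only; they are not taken from Komparator's code.

```typescript
// Hypothetical shape of a saved result; field names are assumptions.
interface SavedResult {
  id: string;
  experimentId: string;
  modelId: string;
  html: string;          // raw HTML pasted back from the LLM chat
  rating: number | null; // 1-10, or null while unrated
}

// Return a copy of the result linked to a different model.
// The HTML and rating are left untouched; only the model link changes.
function reassignModel(result: SavedResult, newModelId: string): SavedResult {
  return { ...result, modelId: newModelId };
}

const saved: SavedResult = {
  id: "r1",
  experimentId: "e1",
  modelId: "model-a",
  html: "<h1>Hello</h1>",
  rating: 7,
};

const corrected = reassignModel(saved, "model-b");
```

Returning a copy rather than mutating in place keeps the original record available until the change is saved, which is one reasonable way to make the correction easy to undo.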
Your main workspace.
Here you can:
- browse all experiments
- search and sort them
- group them by category
- open an experiment to review results
Your model workspace and the reference catalog live together here.
Use this section to:
- manage your own working list of models
- load popular models into your workspace
- browse the catalog separately from your own models
- keep provider names and colors consistent
- review possible duplicate models
"My models" is your actual working list.
Catalog is a reference source you can use to add new models without cluttering your existing workspace.
Reusable prompt wrappers.
Wrappers help you standardize how prompts are sent to different LLMs. For example, you can enforce a specific output format or add shared instructions around the main task.
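As a rough illustration of the idea, a wrapper can be thought of as shared text placed around the main task. The `PromptWrapper` type and `applyWrapper` function below are assumptions made for this sketch, not Komparator's actual data model.

```typescript
// Hypothetical wrapper shape; field names are assumptions.
interface PromptWrapper {
  label: string;
  prefix: string; // shared instructions placed before the task
  suffix: string; // output-format rules placed after the task
}

// Surround the task with the wrapper text, skipping any empty part.
function applyWrapper(wrapper: PromptWrapper, task: string): string {
  return [wrapper.prefix, task.trim(), wrapper.suffix]
    .filter((part) => part.length > 0)
    .join("\n\n");
}

const htmlOnly: PromptWrapper = {
  label: "HTML only",
  prefix: "You are generating a single self-contained HTML page.",
  suffix: "Return only HTML, with no Markdown fences or commentary.",
};

const prepared = applyWrapper(htmlOnly, "Build a pricing page with three tiers.");
```

Sending the same `prepared` text to every model keeps the comparison fair, because each model receives identical instructions around the task.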
Overview of your workspace performance.
You can see:
- how many experiments and results you have
- average ratings
- model leaderboard
- provider breakdown
- category and history patterns
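To make the leaderboard idea concrete, here is a minimal sketch of averaging ratings per model. The `RatedResult` shape and `buildLeaderboard` function are hypothetical, and the app's real statistics may be computed differently.

```typescript
// Hypothetical input shape; field names are assumptions.
interface RatedResult {
  model: string;
  rating: number | null; // 1-10, or null while unrated
}

interface LeaderboardRow {
  model: string;
  average: number;
  count: number;
}

// Average the ratings per model and sort from best to worst.
// Unrated results are skipped so they do not drag averages down.
function buildLeaderboard(results: RatedResult[]): LeaderboardRow[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const r of results) {
    if (r.rating === null) continue;
    const entry = totals.get(r.model) ?? { sum: 0, count: 0 };
    entry.sum += r.rating;
    entry.count += 1;
    totals.set(r.model, entry);
  }
  return Array.from(totals.entries())
    .map(([model, t]) => ({ model, average: t.sum / t.count, count: t.count }))
    .sort((a, b) => b.average - a.average);
}

const leaderboard = buildLeaderboard([
  { model: "model-a", rating: 9 },
  { model: "model-b", rating: 6 },
  { model: "model-a", rating: 8 },
  { model: "model-c", rating: null },
]);
```

In this sample, `model-a` averages 8.5 over two rated results and ranks first, while `model-c` does not appear because it has no rating yet.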
Used for backup and restore.
Use this section to export your local data or restore it later.
🟡 Komparator stores data locally in your browser.
That means:
- your experiments, models, wrappers, results, and ratings stay on the current browser profile and device
- data is not automatically synced to another computer
- opening the app in a different browser may show an empty workspace
- clearing browser storage may remove your saved data
If the data matters, export a backup regularly.
The model catalog can be updated inside the app, but your own saved models and experiment results stay separate from that catalog.
Go to Settings and export your workspace as a ZIP backup.
This is the safest way to keep a copy of:
- experiments
- prompt versions
- models and providers
- wrappers
- saved HTML results
- ratings and notes
You can restore a previously exported ZIP backup from Settings.
Important:
- restore replaces the current local workspace
- use it carefully if you already have data in the current browser
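The difference between replacing and merging can be shown with a small sketch. The `Workspace` shape and `restoreWorkspace` function are hypothetical and deliberately simplified; they only illustrate the replace behavior described above.

```typescript
// Hypothetical, simplified workspace shape; field names are assumptions.
interface Workspace {
  experiments: string[];
  models: string[];
}

// Restore by replacement: the current workspace is intentionally ignored,
// so nothing from it survives unless it is also present in the backup.
function restoreWorkspace(_current: Workspace, backup: Workspace): Workspace {
  return {
    experiments: backup.experiments.slice(),
    models: backup.models.slice(),
  };
}

const currentWorkspace: Workspace = {
  experiments: ["landing page"],
  models: ["model-a"],
};
const backupWorkspace: Workspace = {
  experiments: ["pricing table"],
  models: ["model-b"],
};

const restored = restoreWorkspace(currentWorkspace, backupWorkspace);
```

After this call, `restored` contains only the "pricing table" experiment; the "landing page" experiment from the current workspace is gone. Exporting the current workspace first is the way to keep both.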
- 🌐 The app is meant for HTML-based comparison workflows
- 🧾 Results are stored as raw HTML so you can inspect and compare actual renders
- 🗃️ The built-in catalog helps you find common models faster, but you can always keep custom models in your own list
- 🔁 If you linked a result to the wrong model, you can correct it later from the result editor
- 🔒 The project works without a backend; your workspace stays local unless you export it
- 🖥️ The app works best on a desktop or laptop, because preview and side-by-side comparison need screen space
- Create one experiment per task
- Keep prompt versions when you iterate on wording
- Save multiple attempts from the same model when output consistency matters
- Rate results consistently; otherwise, stats become less useful
- Export backups before clearing browser data, changing machines, or doing browser cleanup
Komparator is a private local comparison tool for HTML outputs from LLMs.
It works best when you want a repeatable workflow:
- one task
- multiple model outputs
- visual review
- structured comparison
- local ownership of data
