EvalDesk

Test AI agents without writing code.

Quick Start · Features · Use Cases · How It Works · Tech Stack

Open-source evaluation tool that lets domain experts — doctors, lawyers, teachers, compliance officers — test and rate AI agent answers. No JSON. No Python scripts. No engineering required.

Why EvalDesk?

Most AI evaluation tools require engineers to write code. No-code alternatives charge $500+/month and lock you in. EvalDesk is open source, self-hostable, and no-code, a combination that none of the tools in the Comparison section offers in full.

Quick Start

git clone https://github.com/ramandagar/EvalDesk.git
cd EvalDesk
docker compose up -d

Open http://localhost:3000 — that's it. No cloud dependency. Your data stays on your server.

Features

Plain English test cases — Write questions and expected answers in normal text
One-click agent testing — Paste your agent URL, hit Run
Human rating interface — Pass / Fail / Partial with keyboard shortcuts (1 / 2 / 3)
Quality dashboard — Track pass rate over time, spot regressions
LLM-as-Judge — Optional auto-scoring with GPT-4 or any LLM
Team collaboration — Invite domain experts by email, no GitHub account needed
Self-hostable — One Docker command, your infrastructure, your data
CI/CD integration — GitHub Action included, fail PRs below your quality threshold
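
The quality dashboard and the CI/CD threshold both come down to a pass rate computed from human ratings. The sketch below is illustrative only: the type names, the half credit given to Partial ratings, and the 5-point regression tolerance are assumptions, not EvalDesk's actual schema or scoring rules.

```typescript
// Hypothetical types for illustration; not EvalDesk's real data model.
type Rating = "pass" | "fail" | "partial";

interface RunSummary {
  runId: string;
  ratings: Rating[];
}

// Score a run between 0 and 1. "partial" earns partialWeight credit
// (0.5 is an assumed default; the README does not define how Partial is scored).
function passRate(ratings: Rating[], partialWeight = 0.5): number {
  if (ratings.length === 0) return 0;
  let score = 0;
  for (const rating of ratings) {
    if (rating === "pass") score += 1;
    else if (rating === "partial") score += partialWeight;
  }
  return score / ratings.length;
}

// CI gate: true when the run scores below the quality threshold.
function failsThreshold(run: RunSummary, threshold: number): boolean {
  return passRate(run.ratings) < threshold;
}

// Regression check: true when the latest run drops by more than `tolerance`
// compared with the previous run.
function isRegression(previous: RunSummary, latest: RunSummary, tolerance = 0.05): boolean {
  return passRate(previous.ratings) - passRate(latest.ratings) > tolerance;
}

const lastWeek: RunSummary = { runId: "run-1", ratings: ["pass", "pass", "pass", "partial"] };
const today: RunSummary = { runId: "run-2", ratings: ["pass", "fail", "partial", "fail"] };

console.log(passRate(lastWeek.ratings)); // 0.875
console.log(passRate(today.ratings)); // 0.375
console.log(failsThreshold(today, 0.8)); // true
console.log(isRegression(lastWeek, today)); // true
```

A CI job built on this idea would exit with a non-zero status whenever `failsThreshold` returns true, which is what blocks the pull request.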

Use Cases

Who	What they test
Doctors	Medical triage bots, diagnostic assistants
Lawyers	Contract review agents, legal research tools
Teachers	Educational AI tutors, grading assistants
Compliance	Banking chatbots, insurance claim processors
Product Managers	Customer support bots, FAQ agents
QA Teams	Regression testing for AI agent updates

How It Works

1. Create a project     → Name it, paste your agent's endpoint URL
2. Write test questions → Type what you'd ask the AI in plain English
3. Run & rate           → Each answer gets Pass/Fail/Partial with keyboard shortcuts
4. Track quality        → See pass rate trends, catch regressions before production
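
The first three steps can be sketched as a small loop: send each question to the agent, show the answer to a reviewer, and record the rating. Everything here is an assumption made for illustration, including the `TestCase` and `RatedAnswer` shapes, the `{ question }` / `{ answer }` JSON contract in `httpAgent`, and the mapping of keys 1 / 2 / 3 to Pass / Fail / Partial (inferred from the order in the Features list).

```typescript
// Hypothetical shapes; EvalDesk's real data model and agent protocol may differ.
type Rating = "pass" | "fail" | "partial";

interface TestCase {
  id: string;
  question: string;
  expected: string;
}

interface RatedAnswer {
  testCaseId: string;
  answer: string;
  rating: Rating;
}

// An agent is anything that turns a question into an answer.
type Agent = (question: string) => Promise<string>;

// Assumed HTTP contract: POST { question } and read { answer } from the JSON response.
function httpAgent(endpointUrl: string): Agent {
  return async (question) => {
    const res = await fetch(endpointUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    });
    if (!res.ok) throw new Error(`Agent returned HTTP ${res.status}`);
    const data = (await res.json()) as { answer: string };
    return data.answer;
  };
}

// Assumed shortcut mapping: 1 = Pass, 2 = Fail, 3 = Partial.
const SHORTCUTS: Record<string, Rating> = { "1": "pass", "2": "fail", "3": "partial" };

// Ask the agent each question, then let the reviewer press a key to rate the answer.
async function runAndRate(
  cases: TestCase[],
  agent: Agent,
  pressKey: (testCase: TestCase, answer: string) => string,
): Promise<RatedAnswer[]> {
  const results: RatedAnswer[] = [];
  for (const testCase of cases) {
    const answer = await agent(testCase.question);
    const key = pressKey(testCase, answer);
    const rating = SHORTCUTS[key];
    if (rating === undefined) throw new Error(`Unknown rating key: ${key}`);
    results.push({ testCaseId: testCase.id, answer, rating });
  }
  return results;
}

// Stand-ins so the sketch runs without a live agent or a human reviewer.
const sampleCases: TestCase[] = [
  { id: "t1", question: "What is the refund window?", expected: "30 days" },
  { id: "t2", question: "Do you ship overseas?", expected: "Yes" },
];
const stubAgent: Agent = async (question) =>
  question.includes("refund") ? "Refunds are accepted within 30 days." : "I am not sure.";
const stubReviewer = (testCase: TestCase, answer: string): string =>
  answer.includes(testCase.expected) ? "1" : "2";

runAndRate(sampleCases, stubAgent, stubReviewer).then((results) => {
  console.log(results.map((r) => `${r.testCaseId}: ${r.rating}`).join(", ")); // t1: pass, t2: fail
});
```

Passing the agent in as a function keeps the loop independent of the transport: swapping `stubAgent` for `httpAgent("...")` with a real endpoint URL would exercise a deployed agent, provided it speaks the assumed JSON contract.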

Tech Stack

Layer	Tech
Frontend	Next.js 15, React 19, Tailwind CSS
Backend	Next.js API routes, Drizzle ORM
Database	SQLite (self-hosted), Postgres (cloud)
Auth	NextAuth.js
Deploy	Docker, docker-compose
CI/CD	GitHub Actions

Development

npm install
cp .env.example .env.local
npx drizzle-kit generate && npx drizzle-kit migrate
npm run dev

Comparison

Feature	EvalDesk	DeepEval	Langfuse	Confident AI
Open source	Yes	Yes	Yes	No
Self-hostable	Yes	Partial	Yes	No
No-code UI	Yes	No	Partial	Yes
Price	Free	Free	Free	$500+/mo

Contributing

PRs welcome. Fork, branch, open a pull request.

License

MIT — use it, fork it, modify it, self-host it. No strings attached.

Built for the people who actually know if an AI answer is correct.

