A lightweight platform for evaluating and comparing LLM prompts using objective metrics. Treat prompts as testable, version-controlled artifacts.
Run without API keys or dependencies:
python demo_standalone.py

This evaluates 3 prompt versions using built-in heuristics and displays a leaderboard.
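If you are curious what heuristic scoring can look like, here is a minimal, self-contained sketch of the idea (a keyword-overlap score plus a sorted leaderboard). It is an illustration only, not the actual logic inside demo_standalone.py, and the sample answers are made up:

```python
import re

def keyword_overlap(answer: str, reference: str) -> float:
    """Fraction of reference words that also appear in the answer (rough heuristic)."""
    ref = set(re.findall(r"[a-z0-9']+", reference.lower()))
    ans = set(re.findall(r"[a-z0-9']+", answer.lower()))
    return len(ref & ans) / len(ref) if ref else 0.0

reference = "Paris is the capital of France."
answers = {  # made-up candidate answers from three prompt versions
    "prompt_v1": "The capital of France is Paris.",
    "prompt_v2": "France's capital city is Paris, on the Seine.",
    "prompt_v3": "It might be Lyon.",
}

# Rank prompt versions by their heuristic score and print a simple leaderboard.
leaderboard = sorted(answers.items(), key=lambda kv: keyword_overlap(kv[1], reference), reverse=True)
for rank, (name, answer) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: {keyword_overlap(answer, reference):.2f}")
```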
# 1. Set up the environment
python -m venv venv
.\venv\Scripts\activate      # Windows; on macOS/Linux use: source venv/bin/activate
pip install -r requirements.txt
# 2. Add API key
copy .env.example .env       # Windows; on macOS/Linux use: cp .env.example .env
# Edit .env: OPENAI_API_KEY=your_key_here
# 3. Run CLI or web dashboard
python src/runner.py # CLI mode
python app.py          # Web UI at http://localhost:5000

Prompts are scored on:
- Semantic Similarity - Closeness to reference answers
- Accuracy - Factual correctness
- Faithfulness - Avoids hallucinations
- Completeness - Covers all key points
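The standalone demo approximates these metrics with heuristics, while the full version uses embeddings. As a minimal illustration of the semantic-similarity idea, a cosine similarity over embedding vectors might look like the sketch below; the toy vectors stand in for real embeddings, and the actual metrics.py implementation may differ:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings of a model answer and a reference answer.
answer_vec = [0.12, 0.80, 0.33, 0.05]
reference_vec = [0.10, 0.75, 0.40, 0.02]
print(f"semantic similarity ~ {cosine_similarity(answer_vec, reference_vec):.3f}")
```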
Prompt_Eval_Lab/
├── demo_standalone.py        ⭐ Zero-dependency demo
├── app.py                    Web dashboard
├── datasets/qa_test.json     Sample Q&A dataset (15 questions)
├── prompts/                  prompt_v1, v2, v3 to compare
├── src/                      Evaluation engine
│   ├── evaluator.py          Core evaluation logic
│   ├── metrics.py            Scoring functions
│   └── runner.py             CLI runner
├── static/templates/         Web UI
└── tests/                    30+ pytest tests
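To see how these pieces fit together, here is a minimal sketch that loads the sample dataset and fills a prompt template. The field names ("question", "context") and the prompt filename are assumptions for illustration; check datasets/qa_test.json and prompts/ for the actual schema and filenames:

```python
import json
from pathlib import Path

# Assumes the dataset is a JSON list of objects with "question"/"context" fields
# and that prompt files are plain text with {question}/{context} placeholders.
dataset = json.loads(Path("datasets/qa_test.json").read_text(encoding="utf-8"))
template = Path("prompts/prompt_v1.txt").read_text(encoding="utf-8")

item = dataset[0]
prompt = template.format(question=item["question"], context=item.get("context", ""))
print(prompt)
```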
# Quick start
docker-compose up -d
# Production
docker build -t prompt-eval:latest .
docker run -d -p 5000:5000 -e OPENAI_API_KEY=your_key_here prompt-eval:latest

Standalone Demo:
- ✅ Zero setup - works immediately
- ✅ No API costs
- ✅ Heuristic-based scoring
Full Version:
- ✅ Real LLM API integration
- ✅ Embeddings-based similarity
- ✅ GPT-4 as judge
- ✅ Web dashboard
- ✅ Rate limiting & CORS security
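For the GPT-4-as-judge step, a call through the official openai Python client might look roughly like this. The function name, grading rubric, and model string are illustrative and not the repo's actual evaluator.py:

```python
from openai import OpenAI  # needs the openai package and OPENAI_API_KEY set

client = OpenAI()

def judge_answer(question: str, context: str, answer: str, model: str = "gpt-4") -> str:
    """Ask the judge model to grade faithfulness on a 1-5 scale with a short reason."""
    grading_prompt = (
        "Grade the answer for faithfulness to the context.\n"
        f"Question: {question}\nContext: {context}\nAnswer: {answer}\n"
        "Reply with a score from 1 to 5 and a one-sentence justification."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": grading_prompt}],
    )
    return response.choices[0].message.content
```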
- Create prompts/prompt_v4.txt (see the example below)
- Use placeholders: {question} and {context}
- Run the evaluation to compare: python demo_standalone.py
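A new prompt file might look like the following; the wording is only an example, so adapt it to your use case:

```text
You are a careful assistant. Use only the provided context to answer.

Context:
{context}

Question: {question}

Answer concisely and cite the relevant part of the context.
```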
# Install dev dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/ -v --cov=src
# Linting
flake8 src/ app.py
black src/ app.py

| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | No | - | Falls back to demo mode without it |
| FLASK_DEBUG | No | False | Enable debug mode |
| FLASK_PORT | No | 5000 | Server port |
| CORS_ORIGINS | No | * | Allowed origins |
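A sketch of how these variables could be read at startup, assuming python-dotenv (suggested by the .env workflow above); the real app.py may differ:

```python
import os
from dotenv import load_dotenv  # python-dotenv; assumed because the project uses a .env file

load_dotenv()  # read .env into the process environment

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")                       # missing -> demo mode
FLASK_DEBUG = os.getenv("FLASK_DEBUG", "False").lower() == "true"
FLASK_PORT = int(os.getenv("FLASK_PORT", "5000"))
CORS_ORIGINS = os.getenv("CORS_ORIGINS", "*")
```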
Most teams judge prompt quality subjectively. This platform provides objective, repeatable measurements to:
- Track improvements over time
- A/B test different approaches
- Catch quality regressions
- Make data-driven decisions
Think of it as unit tests for prompts.
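For example, a regression check can gate a CI build on the leaderboard scores. This is a hypothetical sketch; the threshold, score format, and function are illustrative and not part of this repo:

```python
THRESHOLD = 0.75  # illustrative minimum acceptable score

def check_regression(scores: dict[str, float], threshold: float = THRESHOLD) -> None:
    """Fail the build if even the best-scoring prompt drops below the threshold."""
    best_name, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score < threshold:
        raise SystemExit(f"Regression: best prompt {best_name} scored {best_score:.2f} < {threshold}")
    print(f"OK: {best_name} scored {best_score:.2f}")

# Made-up scores for illustration.
check_regression({"prompt_v1": 0.71, "prompt_v2": 0.83, "prompt_v3": 0.78})
```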
MIT - Use freely in your projects!
Try it now: python demo_standalone.py 🚀