PromptTrace


Stop losing your best prompts.

Lightweight prompt versioning & evaluation tracker for LLM engineers.
One decorator. Automatic versioning. Local SQLite. Beautiful dashboard.



Quick Start • Features • Dashboard • API Reference • Configuration



[Screenshot: PromptTrace dashboard]

The Problem

You iterate on prompts 50 times a day. You had a great system prompt last Tuesday that got 92% accuracy, but you lost it. You changed one word and everything broke, but you can't remember which word.

Your eval scores live in scattered notebooks and print() statements.

PromptTrace fixes this. → pip install prompttrace → done.


📦 Installation

pip install prompttrace

Requirements: Python 3.9+ · Single dependency: rich


🚀 Quick Start

1 → Decorate your LLM calls

import openai

from prompttrace import trace

@trace(experiment="my-chatbot", model="gpt-4o")
def generate(prompt, temperature=0.7):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Every call is now automatically tracked
generate("Explain quantum computing in one sentence.", temperature=0.3)
generate("Explain quantum computing in one sentence.", temperature=0.9)

2 → Launch the dashboard

from prompttrace import dashboard

dashboard()  # → http://127.0.0.1:8777

Or from the terminal:

prompttrace

That's it. Every prompt, output, latency, model, and generation parameter is logged and visualized.


✨ Features

🎯 @trace decorator: Wrap any LLM call; auto-logs prompt, output, latency, and params
📝 log_call() function: Manual logging for when you can't use a decorator
📊 Auto eval: Pass an eval_fn to score outputs automatically
🔀 Prompt versioning: Every unique prompt gets a hash, so you can see how changes affect results (see the sketch after this list)
⚖️ Side-by-side compare: Diff two prompts word by word, with outputs and metrics
🖥️ Web dashboard: Modern UI with animated charts, tables, and filters; zero JS dependencies
🔒 Local-only: Everything lives in SQLite. No cloud. No API keys. No telemetry
🎨 Rich terminal logs: Colorful, emoji-powered console output via rich
🔄 Real-time updates: Dashboard auto-refreshes every 2s; no manual reload
🗑️ Experiment management: Delete experiments, filter the dashboard by experiment
📤 CSV export: One-click export of all traces for external analysis
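
To make the versioning idea concrete, here is a minimal sketch of content-addressing a prompt with a hash. The prompt_version helper below is hypothetical and illustrates the concept only; it is not how PromptTrace computes its hashes internally.

import hashlib

def prompt_version(prompt: str) -> str:
    # Hypothetical helper (not PromptTrace's internal scheme):
    # normalize whitespace, then hash the text so any wording change
    # yields a new, comparable version ID.
    normalized = " ".join(prompt.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]

v1 = prompt_version("Explain quantum computing in one sentence.")
v2 = prompt_version("Explain quantum computing in two sentences.")
print(v1 == v2)  # False: a one-word change produces a different version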

🖥️ Dashboard

Launch with the prompttrace command, or call dashboard() from Python (from prompttrace import dashboard).

Three views:

Dashboard: Stats cards, latency chart, status donut, model usage; filterable by experiment
Traces: Full table of all logged calls with search, filter, delete, and CSV export
Compare: Select two prompts → word-level diff highlighting, with outputs side by side (see the sketch below)
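
As an illustration of what the Compare view shows (this sketch uses Python's standard difflib, not PromptTrace's own diff code), a word-level diff between two prompts can be computed like this:

import difflib

a = "Explain quantum computing in one sentence.".split()
b = "Explain quantum computing in two short sentences.".split()

# Lines starting with "-" are words only in the first prompt,
# "+" only in the second; unprefixed words appear in both.
for line in difflib.ndiff(a, b):
    print(line)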
[Screenshots: Dashboard, Traces, and Compare views]

📖 Usage Guide

The @trace Decorator

from prompttrace import trace

@trace(
    experiment="summarizer",       # Group related traces
    model="claude-3-sonnet",       # Model identifier
    tags=["prod", "v2"],           # Optional tags
    description="Q3 summary bot",  # Optional experiment description
)
def summarize(prompt, temperature=0.5, max_tokens=500):
    # Your LLM call here
    return llm_response

What gets logged automatically:

Prompt text · Output · Latency · Generation parameters (temperature, top_p, max_tokens, etc.) · Input variables · Status (success / error) · Error messages · Approximate token counts
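
Failures are captured too. The sketch below assumes, as a reasonable guess rather than documented behavior, that @trace records the error status and message and then re-raises the original exception:

from prompttrace import trace

@trace(experiment="summarizer", model="claude-3-sonnet")
def summarize(prompt):
    # Simulate a failing LLM call
    raise RuntimeError("upstream API timed out")

try:
    summarize("Summarize the Q3 report.")
except RuntimeError:
    # Assumption: the call was logged with status="error" and its
    # error message before the exception propagated here.
    pass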


Returning Metadata

Return a dict to include token counts:

@trace(experiment="qa", model="gpt-4o")
def answer(prompt):
    resp = openai.chat.completions.create(...)
    return {
        "output": resp.choices[0].message.content,
        "token_count_input": resp.usage.prompt_tokens,
        "token_count_output": resp.usage.completion_tokens,
    }

Auto Evaluation

Pass an eval_fn to score every output automatically:

def my_eval(prompt, output):
    """Return a dict of metric_name: score."""
    return {
        "relevance": compute_relevance(prompt, output),
        "length_ok": 1.0 if 50 < len(output) < 500 else 0.0,
        "has_citation": 1.0 if "[source]" in output else 0.0,
    }

@trace(experiment="research-bot", model="gpt-4o", eval_fn=my_eval)
def research(prompt):
    return call_llm(prompt)

Metrics appear in the terminal and the dashboard.


Manual Logging with log_call()

For cases where a decorator doesn't fit:

from prompttrace import log_call
import time

start = time.perf_counter()
output = my_llm_pipeline(prompt)
elapsed = (time.perf_counter() - start) * 1000

log_call(
    prompt="Translate to French: Hello world",
    output="Bonjour le monde",
    experiment="translation",
    model="gpt-4o-mini",
    generation_params={"temperature": 0.2},
    latency_ms=elapsed,
    token_count_input=8,
    token_count_output=5,
    tags=["translation", "french"],
    eval_metrics={"bleu": 0.95, "fluency": 0.88},
)
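
When a manually logged call fails, the same function can record the failure; status and error_message are parameters listed in the API reference below, while my_llm_pipeline and the surrounding error handling here are just a sketch:

from prompttrace import log_call

prompt = "Translate to French: Hello world"
try:
    output = my_llm_pipeline(prompt)  # hypothetical pipeline call
except Exception as exc:
    # Record the failed call, then let the exception propagate.
    log_call(
        prompt=prompt,
        output="",
        experiment="translation",
        model="gpt-4o-mini",
        status="error",
        error_message=str(exc),
    )
    raise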

CLI

# Default (localhost:8777)
prompttrace

# Custom port
prompttrace --port 9000

# Accessible from network
prompttrace --host 0.0.0.0 --port 8777

📋 API Reference

@trace(...)

Parameter     Type        Default     Description
experiment    str         "default"   Experiment name for grouping
model         str         "unknown"   Model identifier
tags          list[str]   None        Optional tags
eval_fn       callable    None        fn(prompt, output) → dict[str, float]
description   str         ""          Experiment description

log_call(...)

Parameter            Type        Default     Description
prompt               str         required    The prompt template
output               str         required    The LLM output
experiment           str         "default"   Experiment name
model                str         "unknown"   Model identifier
generation_params    dict        None        e.g. {"temperature": 0.7}
input_variables      dict        None        Template variables
latency_ms           float       0           Response time in ms
token_count_input    int         0           Input token count
token_count_output   int         0           Output token count
status               str         "success"   "success" or "error"
error_message        str         ""          Error details
tags                 list[str]   None        Optional tags
eval_metrics         dict        None        {"metric": score}

dashboard(host, port)

Launches the web UI. Blocks until Ctrl+C.
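
For example, to mirror the CLI invocation prompttrace --host 0.0.0.0 --port 9000 from Python (the keyword names host and port are taken from the signature above):

from prompttrace import dashboard

# Serve the dashboard on all interfaces on a custom port; blocks until Ctrl+C.
dashboard(host="0.0.0.0", port=9000)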


⚙️ Configuration

Database Location

By default, traces are stored in .prompttrace/traces.db in the current directory.

# Override via environment variable (shell)
export PROMPTTRACE_DB=/path/to/my/traces.db

# Override programmatically (Python)
from prompttrace import set_db_path
set_db_path("/path/to/my/traces.db")

πŸ“ Project Structure

your-project/
├── pyproject.toml
├── README.md
├── example.py
└── prompttrace/
    ├── __init__.py          # Public API exports
    ├── core.py              # @trace decorator, log_call, dashboard launcher
    ├── db.py                # SQLite database layer
    ├── server.py            # Built-in HTTP server + JSON API
    ├── cli.py               # CLI entry point
    ├── dashboard.html       # Single-file web dashboard (zero JS deps)
    └── logo.png             # App logo

📄 License

MIT. Use it however you want.




