A lightweight research platform for running trust-calibration A/B experiments. Participants see an AI assistant recommendation under one of several condition variants (name, tone, confidence framing) and decide whether to accept it. All behavioral events and latencies are logged for offline analysis.
- Serves a browser-based task from `web/`
- Deterministically assigns each participant to a condition via SHA-256 hash
- Accepts `POST /api/events` payloads, validates them, and persists to JSONL+CSV (or SQLite)
- Exposes `/api/conditions`, `/api/assign`, `/api/health`, and `/api/metrics`
- Per-IP sliding-window rate limiting and basic server metrics out of the box
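
The per-IP sliding-window rate limiting mentioned above could look roughly like the sketch below. This is an illustration only, not the project's middleware: the class name `SlidingWindowRateLimiter`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per IP within the last `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock                  # injectable so tests can fake time
        self._hits = defaultdict(deque)      # ip -> timestamps of recent requests
        self._lock = threading.Lock()        # handlers may run on several threads

    def allow(self, ip):
        now = self._clock()
        with self._lock:
            hits = self._hits[ip]
            # Drop timestamps that have slid out of the window.
            while hits and now - hits[0] >= self.window_seconds:
                hits.popleft()
            if len(hits) >= self.max_requests:
                return False                 # over the limit: caller would send 429
            hits.append(now)
            return True
```

A handler would call `limiter.allow(client_ip)` before processing a request and reject the request when it returns `False`. The lock is there because a threaded server may serve several requests from the same IP concurrently.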
src/trustlab/
├── api/ HTTP handler + middleware (CORS, rate-limit, metrics)
├── config/ Environment-driven settings
├── core/ Domain types: EventRecord, ParticipantSession, validation
├── services/ ConditionAssignmentService, SessionRegistry
├── storage/ EventStore ABC + FileEventStore / SQLiteEventStore
└── utils/ condition_loader
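
To make the `storage/` layer concrete, here is a minimal sketch of an abstract event store with a file-backed implementation that writes JSONL and CSV side by side. Only the class names come from the layout above. The `append` method, the file names, and the CSV columns in `FIELDS` are assumptions for illustration and may not match the real code.

```python
import csv
import json
from abc import ABC, abstractmethod
from pathlib import Path


class EventStore(ABC):
    """Assumed interface: a store only needs to append one event at a time."""

    @abstractmethod
    def append(self, event: dict) -> None: ...


class FileEventStore(EventStore):
    # Hypothetical column set; the real event schema is not shown in this README.
    FIELDS = ["participant_id", "condition", "event_type", "timestamp"]

    def __init__(self, directory):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.jsonl_path = self.dir / "events.jsonl"   # assumed file name
        self.csv_path = self.dir / "events.csv"       # assumed file name

    def append(self, event: dict) -> None:
        # JSONL keeps the full event, one JSON object per line.
        with self.jsonl_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
        # CSV keeps only the fixed columns; the header is written once.
        write_header = not self.csv_path.exists()
        with self.csv_path.open("a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=self.FIELDS, extrasaction="ignore")
            if write_header:
                writer.writeheader()
            writer.writerow(event)
```

Keeping the interface this small means a SQLite-backed store would only need to implement the same `append` method.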
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # edit if needed
python app.py --port 8003

Open http://127.0.0.1:8003. For a fixed participant (reproducible condition assignment):
http://127.0.0.1:8003/?participant_id=P-DEMO0001
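
The same `participant_id` always lands in the same condition because the assignment depends only on a hash of the ID. A minimal sketch of that idea follows. The function name `assign_condition`, the condition labels, and the choice to use the first 8 bytes of the digest are assumptions for illustration; the actual `ConditionAssignmentService` may map the hash differently.

```python
import hashlib


def assign_condition(participant_id: str, conditions: list[str]) -> str:
    """Map a participant ID to one condition, stably across runs and machines."""
    digest = hashlib.sha256(participant_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it
    # modulo the number of conditions to pick an index.
    index = int.from_bytes(digest[:8], "big") % len(conditions)
    return conditions[index]


# Hypothetical condition labels, used only for this example.
conditions = ["control", "warm_tone", "high_confidence"]
first = assign_condition("P-DEMO0001", conditions)
second = assign_condition("P-DEMO0001", conditions)
print(first == second)  # the mapping is a pure function of the ID
```

Note that the result depends on the order and length of the condition list, so changing the set of conditions mid-study would reassign participants.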
docker compose up

pytest

- No framework: stdlib `ThreadingHTTPServer` keeps the dependency list empty at runtime and is sufficient for pilot-scale load.
- Deterministic assignment: SHA-256 of `participant_id` gives stable, reproducible condition mapping without a database.
- Dual-format logging: JSONL is the primary store (easy to stream); CSV is written in parallel for quick spreadsheet access.
- SQLite alternative: set `STORAGE_BACKEND=sqlite` for stronger durability guarantees.
- Timestamp validation: server rejects events whose client timestamp drifts more than `TIMESTAMP_TOLERANCE_SECONDS` from server time, catching stale replays.
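
The timestamp check in the last item can be sketched as a simple drift comparison. This assumes client timestamps arrive as Unix epoch seconds and that the tolerance is read from the environment with a fallback of 300 seconds. Both the representation and the fallback value are assumptions, and `is_timestamp_fresh` is a hypothetical helper name.

```python
import os
import time

# Assumed fallback of 300 s; the project's actual default is not stated here.
TIMESTAMP_TOLERANCE_SECONDS = float(os.environ.get("TIMESTAMP_TOLERANCE_SECONDS", "300"))


def is_timestamp_fresh(client_ts, server_ts=None, tolerance=TIMESTAMP_TOLERANCE_SECONDS):
    """Return True if the client timestamp is within `tolerance` of server time."""
    if server_ts is None:
        server_ts = time.time()
    # abs() rejects events that are too old (stale replays) as well as
    # events stamped too far in the future (badly skewed client clocks).
    return abs(server_ts - client_ts) <= tolerance
```

An event that fails this check would be rejected before it reaches storage.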
python analysis/analyze_events.py