"Point, Click, Test" โ Make AI agent testing as intuitive as Postman made API testing
Quick Start • Features • Documentation • Roadmap • Contributing
Sentinel is a visual-first testing and evaluation platform for AI agents, designed for frontier AI labs, research teams, and agent builders. Build tests with an intuitive drag-and-drop canvas or write declarative YAML specs: your choice.
```mermaid
graph LR
    A[Click Components] --> B[Visual Canvas]
    B --> C[Auto-Generate YAML]
    C --> D[Run Tests]
    D --> E[Compare Results]
    style B fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff
```
- **Build tests by clicking, not coding.** No YAML knowledge required.
- **Visual changes = clean YAML diffs.** Perfect for version control and CI/CD.
- **Built for frontier AI labs** with deterministic, repeatable testing.
Target Positioning: "Postman for AI Agents" with research-grade rigor and visual-first design
- **Frontend:** React 19 • React Flow 12.3 • Tauri 2.0
- **Backend:** FastAPI • Pydantic • Python 3.13
| Category | Metric | Status |
|---|---|---|
| Frontend Unit Tests | 473 tests across 27 test files | ✅ 100% passing |
| Frontend Coverage | Component, hooks, services, stores | ✅ 50%+ coverage |
| Backend Tests | pytest suite with comprehensive coverage | ✅ 85%+ coverage |
| TypeScript Strict Mode | 0 errors, only 4 `any` usages | ✅ Excellent type safety |
| Code Quality | Black, Ruff, MyPy, ESLint | ✅ All checks pass |
| Total Codebase | 57,581 LOC (project code only) | ✅ Well-documented (47% docs) |
Tech Stack Details
Frontend:
- Framework: React 19 + Vite 6.0
- Desktop: Tauri 2.0 (Rust-powered desktop app)
- Canvas: React Flow 12.3 (@xyflow/react)
- State: Zustand 5.0
- Styling: TailwindCSS + shadcn/ui
- Testing: Vitest + React Testing Library
- Type Safety: TypeScript 5.7 (strict mode, 0 errors)
- Icons: lucide-react

Backend:
- API: FastAPI 0.115+
- Schema: Pydantic v2 (type-safe validation)
- Database: SQLite (local) / PostgreSQL (server)
- Testing: pytest + pytest-cov
- Code Quality: Black (line-length: 100), Ruff, MyPy
- Python: 3.13+
- Anthropic API: Claude 3.5 Sonnet, Claude 3 Opus (≥0.43.1)
- OpenAI API: GPT-5.1 (default), GPT-5 Pro, GPT-5 Mini (≥1.59.6)
- Future: Amazon Bedrock, HuggingFace, Ollama
```bash
# Clone repository
git clone https://github.com/navam-io/sentinel.git
cd sentinel/frontend

# Install dependencies
npm install

# Launch desktop app (hot reload enabled)
npm run tauri:dev
```

That's it! The visual canvas opens with:
- Component palette on the left
- Interactive canvas in the center
- Library tab with 16+ templates
- Test suite organizer
Try it now:
- Click Library tab → Browse 16 built-in templates
- Click Load on any template → Canvas populates automatically
- Click Canvas tab → See visual node representation
- Click Test tab → View auto-generated YAML
- Click Run Test → Execute and see live results!
Option 2: Development Mode (Browser Only)
```bash
cd frontend
npm install
npm run dev  # Opens http://localhost:1420
```

Runs the Vite dev server without Tauri. Faster for UI-only development.
Option 3: Backend API (Python)
```bash
# Setup Python environment
cd backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -e ".[dev]"

# Run tests to verify
pytest -v  # ✅ All tests pass

# Start API server (optional)
uvicorn main:app --reload
# Visit http://localhost:8000/docs for API documentation
```
1. Browse Library
2. Load Template
3. Customize
4. Run & Validate
```yaml
# Auto-generated from visual canvas
name: "Geography Quiz"
version: "1.0"
description: "Test factual knowledge about world capitals"
category: "qa"

model:
  provider: "openai"
  model: "gpt-5.1"
  temperature: 0.7
  max_tokens: 1000

inputs:
  - type: "input"
    query: "What is the capital of France?"

assertions:
  - type: "must_contain"
    value: "Paris"
  - type: "output_type"
    value: "text"
  - type: "max_latency_ms"
    value: 2000

tags:
  - "canvas-generated"
  - "geography"
  - "qa-test"
```
- **Q&A (Blue)** - Knowledge validation: test factual knowledge and basic reasoning. `category: "qa"`; example assertions: `must_contain: "Paris"`, `max_latency_ms: 2000`
- **Code Generation (Purple)** - Syntax validation: validate code structure and quality. `category: "code-generation"`; example assertions: `regex_match: "def\\s+\\w+"`, `output_type: "code"`
- **Browser Agents (Green)** - Web automation: test browser interactions and scraping. `category: "browser-agents"`; example assertions: `must_call_tool: ["browser"]`, `output_type: "json"`
- **Multi-turn (Orange)** - Conversations: multi-step dialogue testing. `category: "multi-turn"`; example assertions: `must_contain: "context"`, `min_tokens: 50`
- **LangGraph (Cyan)** - Agentic workflows: test LangGraph state machines. `category: "langgraph"`; example assertions: `must_call_tool: ["state"]`, `output_type: "json"`
- **Safety (Red)** - Security testing: security and safety validation. `category: "safety"`; example assertions: `must_not_contain: "sensitive"`, `output_type: "text"`
See All 12 Categories
| Category | Color | Purpose | Example Use Cases |
|---|---|---|---|
| Q&A | Blue | Knowledge & reasoning tests | Fact-checking, trivia, simple Q&A |
| Code Generation | Purple | Code quality validation | Syntax checks, function detection, code structure |
| Browser Agents | Green | Web automation testing | Scraping, UI testing, browser tools |
| Multi-turn | Orange | Conversation flows | Dialogue testing, context retention |
| LangGraph | Cyan | Agentic workflows | State machines, workflow orchestration |
| Safety | Red | Security & safety | Prompt injection, content filtering |
| Data Analysis | Indigo | Data processing tasks | CSV parsing, data transformation |
| Reasoning | Pink | Logic & problem-solving | Chain-of-thought, math, puzzles |
| Tool Use | Yellow | Function calling tests | API calls, tool invocation |
| API Testing | Teal | REST endpoint validation | HTTP requests, API responses |
| UI Testing | Lime | Visual & interaction tests | Component rendering, user flows |
| Regression | Amber | Consistency testing | Version comparison, behavior stability |
See All 8 Assertion Types

| Type | Purpose | Example | Use Case |
|---|---|---|---|
| `must_contain` | Text presence check | `"Paris"` | Verify specific content appears |
| `must_not_contain` | Text absence check | `"London"` | Ensure unwanted content absent |
| `regex_match` | Pattern matching | `"def\\s+\\w+"` | Validate code/format structure |
| `must_call_tool` | Tool verification | `["browser", "calculator"]` | Verify agent tool usage |
| `output_type` | Format validation | `"json"`, `"code"`, `"text"` | Enforce output format |
| `max_latency_ms` | Performance check | `2000` | Ensure response time < 2s |
| `min_tokens` | Min output length | `50` | Require minimum detail |
| `max_tokens` | Max output length | `500` | Enforce conciseness |
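The semantics in the table above can be sketched in a few lines of Python. This is an illustrative re-implementation, not Sentinel's actual evaluator: `output_type` is omitted because format detection is more involved, and token counts are approximated by whitespace splitting.

```python
import re

def check(assertion_type, expected, output, latency_ms=0, tools_called=()):
    """Evaluate one assertion against an agent's output (illustrative only)."""
    if assertion_type == "must_contain":
        return expected in output
    if assertion_type == "must_not_contain":
        return expected not in output
    if assertion_type == "regex_match":
        return re.search(expected, output) is not None
    if assertion_type == "must_call_tool":
        # Pass when every required tool appears in the call trace.
        return all(tool in tools_called for tool in expected)
    if assertion_type == "max_latency_ms":
        return latency_ms <= expected
    if assertion_type == "min_tokens":
        # Rough proxy: whitespace-split word count, not a real tokenizer.
        return len(output.split()) >= expected
    if assertion_type == "max_tokens":
        return len(output.split()) <= expected
    raise ValueError(f"unsupported assertion type: {assertion_type}")

generated = "def add(a, b):\n    return a + b"
print(check("regex_match", r"def\s+\w+", generated))             # True
print(check("must_contain", "return", generated))                # True
print(check("max_latency_ms", 2000, generated, latency_ms=850))  # True
```

Each assertion is a pure predicate over the output plus run metadata (latency, tool calls), which is what keeps runs deterministic and easy to diff across versions.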
v0.22.0 - Unified Library Tab & 12-Category System (Nov 23, 2025) ⭐ Latest
Major UX Improvements & Template Expansion
- ✅ Unified Library Tab combining templates and user tests
- ✅ 12-Category Classification System (Q&A, Code Gen, Browser, Multi-turn, Safety, etc.)
- ✅ 10 New Templates (API Testing, Data Analysis, LangGraph, Safety, Reasoning, etc.)
- ✅ Enhanced search and filtering
- ✅ Category-based organization with color coding
- ✅ Tab restructure: Canvas, Test, Suite, Library
- ✅ Integrated run section in Test tab (collapsible)
- ✅ State persistence (run details, suite expansion)
- ✅ Refined LibraryCard component with icons
- ✅ Better visual hierarchy and information design
- ✅ Database schema updates (category, is_template fields)
- ✅ Updated API endpoints for category management
- ✅ Test renaming functionality
v0.21.0 - Test Suite Organizer (Nov 22, 2025)
- ✅ Test suite organizer with folders
- ✅ Suite search and filtering
- ✅ Drag-and-drop test organization
- ✅ Suite export/import
v0.20.0 - Enhanced Visual Canvas (Nov 2025)
- ✅ Test renaming functionality
- ✅ Improved drag-drop palette
- ✅ Auto-save improvements
- ✅ Better YAML synchronization
```mermaid
gantt
    title Sentinel 2026 Roadmap
    dateFormat YYYY-MM
    section Phase 4
    Model Execution :2026-01, 2m
    Result Storage :2026-02, 1m
    section Phase 5
    Advanced Providers :2026-03, 2m
    LangGraph Support :2026-05, 1m
    section Phase 6
    Analytics & CI/CD :2026-06, 3m
```
- v0.23.0 - Execution (Q1 2026)
- v0.24.0 - Providers (Q1 2026)
- v0.25.0 - LangGraph (Q2 2026)
Future Features (v0.26.0+)
Advanced Features (Q2-Q3 2026)
- v0.26.0: AI-assisted test generation
- v0.27.0: Visual assertion builder enhancements
- v0.28.0: Regression engine & comparison view
Enterprise Features (Q3-Q4 2026)
- v0.29.0: Collaborative workspaces
- v0.30.0: Advanced safety scenarios
- v0.31.0: Dashboard & analytics platform
- v0.32.0: CI/CD integration & automation
- Visual-First Interface
- DSL Mode + Programmatic
| Metric | Value | Metric | Value |
|---|---|---|---|
| Version | 0.22.0 | Release Date | Nov 23, 2025 |
| Total Tests | 473 passing | Test Pass Rate | 100% ✅ |
| Frontend Tests | 27 files, 473 tests | Backend Tests | 6 files, comprehensive coverage |
| Frontend Coverage | 50%+ | Backend Coverage | 85%+ |
| Node Types | 6+ production | Templates | 16 ready-to-use |
| Categories | 12 classifications | Assertion Types | 8 validators |
| Frontend LOC | 13,536 (100 files) | Backend LOC | 3,234 (34 files) |
| TypeScript Errors | 0 (strict mode) | TypeScript `any` | Only 4 instances |
| Components | 68 React components | Documentation | 27,157 LOC (47% of codebase) |
Tech Metrics:
- Build Time: ~3s (Vite HMR)
- Desktop App: Tauri 2.0 (lightweight, fast startup)
- Test Execution: 2.24s for 473 unit tests
- Type Safety: TypeScript strict mode, Pydantic v2
Code Quality:
- ✅ Frontend: ESLint, TypeScript strict (0 errors)
- ✅ Backend: Black (line-length: 100), Ruff, MyPy
- ✅ Codebase: 57,581 LOC (project code only, excluding dependencies)
- ✅ Documentation: Exceptional (47.2% of codebase is documentation)
We welcome contributions! Sentinel is in active development and we'd love your help.
- Found a bug? Let us know!
- Have ideas? Start a discussion!
- Help make docs better!
```bash
# Backend Development
cd backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e ".[dev]"

# Run backend tests
pytest -v --cov=backend
black . && ruff check . && mypy .
```

```bash
# Frontend Development
cd frontend
npm install

# Run frontend tests
npm test            # Unit tests (Vitest) - 473 tests
npm run lint        # ESLint
npm run type-check  # TypeScript (0 errors)

# Run dev server
npm run dev        # Browser only
npm run tauri:dev  # Desktop app (recommended)

# Code quality checks
npm run lint && npm run type-check
```

Contributing Guidelines
How to Contribute:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
Code Style:
- Frontend: ESLint + TypeScript strict mode (0 errors required)
- Backend: Black (line-length: 100) + Ruff + MyPy
- Tests: Required for all features (unit tests where applicable)
- Commits: Conventional Commits format (`feat:`, `fix:`, `docs:`, etc.)
Testing Requirements:
- Unit tests for all new components/functions
- 100% test pass rate before merge
- No TypeScript errors allowed
- Maintain or improve test coverage
Review Process:
- All PRs require 1 approval
- CI/CD checks must pass (tests, linting, type checking)
- Documentation updates for new features
- **GUI is primary interface** - DSL for interoperability
- **Desktop-first architecture** - Self-hosted, air-gapped support
- **Deterministic & reproducible** - Built for frontier AI labs
- **No coding required** - Everyone can test agents
Core Philosophy: "Point, Click, Test"
Make AI agent testing as intuitive as Postman made API testing, as visual as Langflow made LLM workflows, and as powerful as LangSmith made observability.
Sentinel's design is inspired by industry-leading tools:
Langflow • n8n • Postman • Playwright • LangSmith
Special Thanks:
- React Flow team for production-ready canvas library
- Tauri team for lightweight desktop framework
- shadcn/ui for beautiful component library
- Open source community for inspiration and support
MIT License - see LICENSE file for details.
Copyright (c) 2025 Navam
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- Email: hello@navam.io
- Twitter: @navam_io
- GitHub: navam-io/sentinel
- Website: navam.io
Version: 0.22.0 (Released November 23, 2025)
Status: Unified Library Tab with 12-Category System ✅
Next Milestone: v0.23.0 - Model Execution & Result Storage (Q1 2026)
Production Ready Features:
- ✅ Visual canvas with 6+ node types
- ✅ Real-time Visual ↔ YAML synchronization
- ✅ Library with 16 categorized templates
- ✅ 12-category classification system
- ✅ Test suite organizer
- ✅ Desktop app (Tauri 2.0)
- ✅ 473 tests with 100% pass rate
- ✅ Comprehensive documentation (47% of codebase)
Built with ❤️ by the Navam Team for frontier AI labs, researchers, and agent builders
⭐ Star this repo if you find it helpful!