Sentinel

Visual-First AI Agent Testing Platform

"Point, Click, Test" — Make AI agent testing as intuitive as Postman made API testing

Quick Start • Features • Documentation • Roadmap • Contributing

🎯 What is Sentinel?

Sentinel is a visual-first testing and evaluation platform for AI agents, designed for frontier AI labs, research teams, and agent builders. Build tests with an intuitive drag-and-drop canvas or write declarative YAML specs—your choice.

graph LR
    A[👆 Click Components] --> B[🎨 Visual Canvas]
    B --> C[📝 Auto-Generate YAML]
    C --> D[✅ Run Tests]
    D --> E[📊 Compare Results]
    style B fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff

Why Sentinel?

🎨 Visual First

Build tests by clicking, not coding. No YAML knowledge required.

🔄 Git Friendly

Visual changes = clean YAML diffs. Perfect for version control and CI/CD.

🧪 Research Grade

Built for frontier AI labs with deterministic, repeatable testing.

Target Positioning: "Postman for AI Agents" with research-grade rigor and visual-first design

✨ Key Features

Current Release (v0.22.0 - November 23, 2025)

🎨 Visual Canvas & Library

React 19 • React Flow 12.3 • Tauri 2.0

✅ Unified Library Tab: Templates + user tests in one interface
✅ 12-Category System: Q&A, Code Gen, Browser Agents, Multi-turn, Safety, and more
✅ 16 Built-in Templates: Production-ready test templates
✅ 6+ Node Types: Input, Model, Assertion, Tool, System, Output
✅ Real-time YAML generation from canvas
✅ Smart Positioning with auto-layout
✅ Desktop App (Tauri 2.0) for local-first workflow
✅ 473 Unit Tests (100% pass rate)

🔧 Type-Safe Backend & Execution

FastAPI • Pydantic • Python 3.13

✅ 8 Assertion Types (text presence/absence, regex, tool calls, output format, latency, output length)
✅ Round-Trip Conversion (Visual ↔ YAML, zero data loss)
✅ Schema Validation with clear error messages
✅ Backend Tests: pytest suite with 85%+ coverage
✅ Model Providers: Anthropic Claude, OpenAI GPT-5
✅ FastAPI Backend with SQLite/PostgreSQL support
✅ Static Checks: Black, Ruff, MyPy, TypeScript strict mode
✅ Test Suites with folder organization
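What "round-trip with zero data loss" means can be shown with a small sketch. The node shape (`id`, `type`, `data`), the `meta` node that carries name/version/tags, and both function names are hypothetical and not taken from Sentinel's code; the point is only that converting spec → nodes → spec should return exactly the original spec.

```python
from copy import deepcopy
from typing import Any

# Keys that get dedicated canvas nodes in this sketch; everything else
# (name, version, tags, ...) rides along on one hypothetical "meta" node.
NODE_KEYS = ("model", "inputs", "assertions")


def spec_to_nodes(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a test spec into canvas nodes (hypothetical node shape)."""
    nodes: list[dict[str, Any]] = []
    if "model" in spec:
        nodes.append({"id": "model-0", "type": "model", "data": deepcopy(spec["model"])})
    for i, item in enumerate(spec.get("inputs", [])):
        nodes.append({"id": f"input-{i}", "type": "input", "data": deepcopy(item)})
    for i, item in enumerate(spec.get("assertions", [])):
        nodes.append({"id": f"assertion-{i}", "type": "assertion", "data": deepcopy(item)})
    meta = {k: deepcopy(v) for k, v in spec.items() if k not in NODE_KEYS}
    nodes.append({"id": "meta", "type": "meta", "data": meta})
    return nodes


def nodes_to_spec(nodes: list[dict[str, Any]]) -> dict[str, Any]:
    """Rebuild the spec from nodes; input and assertion order follows node order."""
    spec: dict[str, Any] = {}
    inputs: list[Any] = []
    assertions: list[Any] = []
    for node in nodes:
        data = deepcopy(node["data"])
        if node["type"] == "meta":
            spec.update(data)
        elif node["type"] == "model":
            spec["model"] = data
        elif node["type"] == "input":
            inputs.append(data)
        elif node["type"] == "assertion":
            assertions.append(data)
    if inputs:
        spec["inputs"] = inputs
    if assertions:
        spec["assertions"] = assertions
    return spec


spec = {
    "name": "Geography Quiz",
    "model": {"provider": "openai", "model": "gpt-5.1", "temperature": 0.7},
    "inputs": [{"type": "input", "query": "What is the capital of France?"}],
    "assertions": [{"type": "must_contain", "value": "Paris"}],
    "tags": ["geography"],
}
assert nodes_to_spec(spec_to_nodes(spec)) == spec  # the round trip is lossless here
```

One caveat of this simplified sketch: an explicitly empty `inputs: []` list would not survive the trip, which is the kind of edge case a real converter's tests need to cover.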

🧪 Test Coverage & Quality

Category	Metric	Status
Frontend Unit Tests	473 tests across 27 test files	✅ 100% passing
Frontend Coverage	Components, hooks, services, stores	✅ 50%+ coverage
Backend Tests	pytest suite with comprehensive coverage	✅ 85%+ coverage
TypeScript Strict Mode	0 errors, only 4 `any` usages	✅ Excellent type safety
Code Quality	Black, Ruff, MyPy, ESLint	✅ All checks pass
Total Codebase	57,581 LOC (excluding dependencies)	✅ Well-documented (47% docs)

📦 Tech Stack Details

Frontend

Framework: React 19 + Vite 6.0
Desktop: Tauri 2.0 (Rust-powered desktop app)
Canvas: React Flow 12.3 (@xyflow/react)
State: Zustand 5.0
Styling: TailwindCSS + shadcn/ui
Testing: Vitest + React Testing Library
Type Safety: TypeScript 5.7 (strict mode, 0 errors)
Icons: lucide-react

Backend

API: FastAPI 0.115+
Schema: Pydantic v2 (type-safe validation)
Database: SQLite (local) / PostgreSQL (server)
Testing: pytest + pytest-cov
Code Quality: Black (line-length: 100), Ruff, MyPy
Python: 3.13+

Model Providers (Pluggable)

Anthropic API (anthropic SDK ≥0.43.1): Claude 3.5 Sonnet, Claude 3 Opus
OpenAI API (openai SDK ≥1.59.6): GPT-5.1 (default), GPT-5 Pro, GPT-5 Mini
Future: Amazon Bedrock, HuggingFace, Ollama

🚀 Quick Start

Option 1: Visual Canvas (Desktop App) — Recommended ⭐

# Clone repository
git clone https://github.com/navam-io/sentinel.git
cd sentinel/frontend

# Install dependencies
npm install

# Launch desktop app (hot reload enabled)
npm run tauri:dev

🎉 That's it! The visual canvas opens with:

Component palette on the left
Interactive canvas in the center
Library tab with 16+ templates
Test suite organizer

Try it now:

Click Library tab → Browse 16 built-in templates
Click Load on any template → Canvas populates automatically
Click Canvas tab → See visual node representation
Click Test tab → View auto-generated YAML
Click Run Test → Execute and see live results!

Option 2: Development Mode (Browser Only)

cd frontend
npm install
npm run dev  # Opens http://localhost:1420

Runs the Vite dev server without Tauri, which is faster for UI-only development.

Option 3: Backend API (Python)

# Setup Python environment
cd backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -e ".[dev]"

# Run tests to verify
pytest -v  # ✅ All tests pass

# Start API server (optional)
uvicorn main:app --reload
# Visit http://localhost:8000/docs for API documentation

🎬 Visual Canvas Demo

Building Your First Test in 60 Seconds

1. Browse Library

📚 16+ templates
12 categories
Search & filter

2. Load Template

👁️ Click "Load"
Canvas auto-populates
Nodes connected

3. Customize

✏️ Edit node values
Add/remove nodes
Real-time YAML sync

4. Run & Validate

▶️ Execute test
Live results
Pass/fail indicators

Generated YAML Example

# Auto-generated from visual canvas
name: "Geography Quiz"
version: "1.0"
description: "Test factual knowledge about world capitals"
category: "qa"

model:
  provider: "openai"
  model: "gpt-5.1"
  temperature: 0.7
  max_tokens: 1000

inputs:
  - type: "input"
    query: "What is the capital of France?"

assertions:
  - type: "must_contain"
    value: "Paris"
  - type: "output_type"
    value: "text"
  - type: "max_latency_ms"
    value: 2000

tags:
  - "canvas-generated"
  - "geography"
  - "qa-test"

💡 12-Category Test Classification

📝 Q&A Testing

Blue • Knowledge validation

Test factual knowledge and basic reasoning

category: "qa"
assertions:
  - must_contain: "Paris"
  - max_latency_ms: 2000

Use Cases:

Knowledge validation
Fact-checking
Simple reasoning
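The category snippets use a compact `- <type>: <value>` form, while the generated YAML example above uses explicit `type`/`value` keys. This README does not say whether the DSL accepts both; assuming it does, one way to normalize the compact form into the explicit one is sketched below (`normalize_assertion` is a hypothetical helper).

```python
from typing import Any


def normalize_assertion(item: dict[str, Any]) -> dict[str, Any]:
    """Convert a compact {kind: value} assertion into explicit {type, value} form."""
    if "type" in item:
        return dict(item)  # already explicit; return a copy
    if len(item) != 1:
        raise ValueError(f"compact assertion needs exactly one key, got {sorted(item)}")
    ((kind, value),) = item.items()
    return {"type": kind, "value": value}


compact = [{"must_contain": "Paris"}, {"max_latency_ms": 2000}]
print([normalize_assertion(a) for a in compact])
# [{'type': 'must_contain', 'value': 'Paris'}, {'type': 'max_latency_ms', 'value': 2000}]
```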

💻 Code Generation

Purple • Syntax validation

Validate code structure and quality

category: "code-generation"
assertions:
  - regex_match: "def\\s+\\w+"
  - output_type: "code"

Use Cases:

Code quality checks
Syntax validation
Function detection
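In a double-quoted YAML string, `\\` is the escape for a single backslash, so `"def\\s+\\w+"` reaches the regex engine as `def\s+\w+`. A minimal sketch of a `regex_match` check follows; it assumes "match anywhere in the output" (`re.search`) rather than a full-string match, which this README does not specify.

```python
import re

# Same escaping as the YAML value above: this Python literal is the pattern def\s+\w+
PATTERN = "def\\s+\\w+"


def regex_match(output: str, pattern: str) -> bool:
    """Pass if the pattern matches anywhere in the model output (assumed semantics)."""
    return re.search(pattern, output) is not None


generated = "Here you go:\n\ndef add(a, b):\n    return a + b\n"
print(regex_match(generated, PATTERN))             # True
print(regex_match("lambda a, b: a + b", PATTERN))  # False
```

If full-output matching were intended instead, `re.fullmatch` would be the stricter alternative, and a pattern like this one would then need `.*` around it.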

🌐 Browser Agents

Green • Web automation

Test browser interactions and scraping

category: "browser-agents"
assertions:
  - must_call_tool: ["browser"]
  - output_type: "json"

Use Cases:

Web scraping tests
UI automation
Browser tool usage
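`must_call_tool` takes a list of tool names. The sketch below assumes the check passes when every listed tool appears at least once among the agent's tool calls, in any order and possibly alongside other tools. The call-record shape (`{"name": ...}`) and the function are hypothetical; reporting which tools are missing makes failures easier to debug.

```python
from typing import Any


def must_call_tool(tool_calls: list[dict[str, Any]], required: list[str]) -> tuple[bool, list[str]]:
    """Return (passed, missing_tools) for a must_call_tool assertion (assumed semantics)."""
    called = {call["name"] for call in tool_calls}
    missing = [name for name in required if name not in called]
    return (not missing, missing)


calls = [{"name": "browser", "args": {"url": "https://example.com"}}, {"name": "calculator"}]
print(must_call_tool(calls, ["browser"]))            # (True, [])
print(must_call_tool(calls, ["browser", "search"]))  # (False, ['search'])
```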

🔄 Multi-turn

Orange • Conversations

Multi-step dialogue testing

category: "multi-turn"
assertions:
  - must_contain: "context"
  - min_tokens: 50

Use Cases:

Conversation flows
Context retention
Multi-step reasoning

🔗 LangGraph

Cyan • Agentic workflows

Test LangGraph state machines

category: "langgraph"
assertions:
  - must_call_tool: ["state"]
  - output_type: "json"

Use Cases:

Workflow testing
State management
Agent coordination

🛡️ Safety

Red • Security testing

Security and safety validation

category: "safety"
assertions:
  - must_not_contain: "sensitive"
  - output_type: "text"

Use Cases:

Prompt injection tests
Content safety
Security validation

📚 See All 12 Categories

Category	Color	Purpose	Example Use Cases
Q&A	Blue	Knowledge & reasoning tests	Fact-checking, trivia, simple Q&A
Code Generation	Purple	Code quality validation	Syntax checks, function detection, code structure
Browser Agents	Green	Web automation testing	Scraping, UI testing, browser tools
Multi-turn	Orange	Conversation flows	Dialogue testing, context retention
LangGraph	Cyan	Agentic workflows	State machines, workflow orchestration
Safety	Red	Security & safety	Prompt injection, content filtering
Data Analysis	Indigo	Data processing tasks	CSV parsing, data transformation
Reasoning	Pink	Logic & problem-solving	Chain-of-thought, math, puzzles
Tool Use	Yellow	Function calling tests	API calls, tool invocation
API Testing	Teal	REST endpoint validation	HTTP requests, API responses
UI Testing	Lime	Visual & interaction tests	Component rendering, user flows
Regression	Amber	Consistency testing	Version comparison, behavior stability

📚 See All 8 Assertion Types

Type	Purpose	Example	Use Case
`must_contain`	Text presence check	`"Paris"`	Verify specific content appears
`must_not_contain`	Text absence check	`"London"`	Ensure unwanted content absent
`regex_match`	Pattern matching	`"def\\s+\\w+"`	Validate code/format structure
`must_call_tool`	Tool verification	`["browser", "calculator"]`	Verify agent tool usage
`output_type`	Format validation	`"json"`, `"code"`, `"text"`	Enforce output format
`max_latency_ms`	Performance check	`2000`	Ensure response time < 2s
`min_tokens`	Min output length	`50`	Require minimum detail
`max_tokens`	Max output length	`500`	Enforce conciseness
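To make the table concrete, here is a sketch of an evaluator that dispatches each `{type, value}` assertion to a check function. Everything here is an assumption for illustration: the `RunResult` record, the provider-reported `output_tokens`, the inclusive latency and length bounds, case-sensitive substring matching, and the crude "code" heuristic. It is not Sentinel's implementation.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RunResult:
    """Hypothetical record of a single model run."""
    output: str
    latency_ms: float
    output_tokens: int  # as reported by the provider (assumed available)
    tool_calls: list[str] = field(default_factory=list)


def _is_json(text: str) -> bool:
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def _output_type(r: RunResult, expected: str) -> bool:
    if expected == "json":
        return _is_json(r.output)
    if expected == "code":
        # Crude heuristic (assumption): some line starts like a definition or import.
        starts = ("def ", "class ", "function ", "import ")
        return any(line.lstrip().startswith(starts) for line in r.output.splitlines())
    if expected == "text":
        return bool(r.output.strip())
    raise ValueError(f"unsupported output_type: {expected!r}")


def _as_list(value: Any) -> list[Any]:
    return [value] if isinstance(value, str) else list(value)


CHECKS: dict[str, Callable[[RunResult, Any], bool]] = {
    "must_contain": lambda r, v: v in r.output,
    "must_not_contain": lambda r, v: v not in r.output,
    "regex_match": lambda r, v: re.search(v, r.output) is not None,
    "must_call_tool": lambda r, v: set(_as_list(v)) <= set(r.tool_calls),
    "output_type": _output_type,
    "max_latency_ms": lambda r, v: r.latency_ms <= v,
    "min_tokens": lambda r, v: r.output_tokens >= v,
    "max_tokens": lambda r, v: r.output_tokens <= v,
}


def evaluate(result: RunResult, assertions: list[dict[str, Any]]) -> list[tuple[str, bool]]:
    """Run each {type, value} assertion against one result; unknown types are errors."""
    report = []
    for a in assertions:
        check = CHECKS.get(a["type"])
        if check is None:
            raise ValueError(f"unknown assertion type: {a['type']!r}")
        report.append((a["type"], check(result, a["value"])))
    return report


result = RunResult(output="The capital of France is Paris.", latency_ms=850.0, output_tokens=8)
print(evaluate(result, [
    {"type": "must_contain", "value": "Paris"},
    {"type": "must_not_contain", "value": "London"},
    {"type": "max_latency_ms", "value": 2000},
]))
# [('must_contain', True), ('must_not_contain', True), ('max_latency_ms', True)]
```

A dispatch table keeps each assertion type independent, so adding a ninth type means adding one entry rather than another branch.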

📖 Documentation

🚀 Getting Started

Installation Guide - Set up in 5 minutes
Visual Canvas Guide - Build tests visually
Quick Start Tutorial - Your first test
Code Metrics Report - Comprehensive codebase analysis

📚 Examples & Templates

16 Built-in Templates - Production-ready examples
Example Walkthroughs - Detailed guides
12 Test Categories - Classification guide

📘 API Reference

DSL Reference - Complete YAML spec
Python API Docs - Backend API
Schema Reference - Pydantic models

🔧 Advanced

Best Practices - Writing effective tests
CI/CD Integration - Automation guide
Architecture Guide - Technical deep-dive

🗺️ Roadmap

✅ Recently Released

v0.22.0 - Unified Library Tab & 12-Category System (Nov 23, 2025) ⭐ Latest

Major UX Improvements & Template Expansion

Library Tab & Category System

✅ Unified Library Tab combining templates and user tests
✅ 12-Category Classification System (Q&A, Code Gen, Browser, Multi-turn, Safety, etc.)
✅ 10 New Templates (API Testing, Data Analysis, LangGraph, Safety, Reasoning, etc.)
✅ Enhanced search and filtering
✅ Category-based organization with color coding

UI/UX Improvements

✅ Tab restructure: Canvas, Test, Suite, Library
✅ Integrated run section in Test tab (collapsible)
✅ State persistence (run details, suite expansion)
✅ Refined LibraryCard component with icons
✅ Better visual hierarchy and information design

Backend Enhancements

✅ Database schema updates (category, is_template fields)
✅ Updated API endpoints for category management
✅ Test renaming functionality

v0.21.0 - Test Suite Organizer (Nov 22, 2025)

✅ Test suite organizer with folders
✅ Suite search and filtering
✅ Drag-and-drop test organization
✅ Suite export/import

v0.20.0 - Enhanced Visual Canvas (Nov 2025)

✅ Test renaming functionality
✅ Improved drag-drop palette
✅ Auto-save improvements
✅ Better YAML synchronization

🚧 In Progress

gantt
    title Sentinel 2026 Roadmap
    dateFormat  YYYY-MM
    section Phase 4
    Model Execution         :2026-01, 2m
    Result Storage          :2026-02, 1m
    section Phase 5
    Advanced Providers      :2026-03, 2m
    LangGraph Support       :2026-05, 1m
    section Phase 6
    Analytics & CI/CD       :2026-06, 3m

v0.23.0 - Execution (Q1 2026)

Live execution dashboard
Result storage (SQLite)
Test run history
Metrics & analytics
Performance tracking

v0.24.0 - Providers (Q1 2026)

Bedrock integration
HuggingFace support
Ollama local models
Provider comparison
Cost tracking

v0.25.0 - LangGraph (Q2 2026)

LangGraph framework support
State machine testing
Multi-agent workflows
Workflow visualization
Debug tools

🔮 Future Features (v0.26.0+)

Advanced Features (Q2-Q3 2026)

v0.26.0: AI-assisted test generation
v0.27.0: Visual assertion builder enhancements
v0.28.0: Regression engine & comparison view

Enterprise Features (Q3-Q4 2026)

v0.29.0: Collaborative workspaces
v0.30.0: Advanced safety scenarios
v0.31.0: Dashboard & analytics platform
v0.32.0: CI/CD integration & automation

→ Full roadmap with detailed specs

👥 Who Uses Sentinel?

🎯 Primary Users

Visual-First Interface

📊 Product Managers - Validate agents without code
🔬 Research Scientists - Build evaluation suites visually
🧪 QA Engineers - Create and debug tests with clicks
🛡️ Safety Teams - Collaborative safety testing
🏢 Frontier Labs - Test model releases
🧬 Neo-labs - Agent-focused research

⚡ Advanced Users

DSL Mode + Programmatic

💻 Model Engineers - Direct YAML editing, programmatic tests
⚙️ DevOps Engineers - CI/CD integration, automation
🏗️ Infrastructure Teams - Enterprise testing at scale
🤖 Agent Builders - Production validation
🔧 Framework Developers - Integration testing
📈 MLOps Teams - Regression detection

📊 Project Stats

Metric	Value	Metric	Value
Version	0.22.0	Release Date	Nov 23, 2025
Total Tests	473 passing	Test Pass Rate	100% ✅
Frontend Tests	27 files, 473 tests	Backend Tests	6 files, comprehensive coverage
Frontend Coverage	50%+	Backend Coverage	85%+
Node Types	6+ production	Templates	16 ready-to-use
Categories	12 classifications	Assertion Types	8 validators
Frontend LOC	13,536 (100 files)	Backend LOC	3,234 (34 files)
TypeScript Errors	0 (strict mode)	TypeScript `any`	Only 4 instances
Components	68 React components	Documentation	27,157 LOC (47% of codebase)

Tech Metrics:

Build Time: ~3s (Vite HMR)
Desktop App: Tauri 2.0 (lightweight, fast startup)
Test Execution: 2.24s for 473 unit tests
Type Safety: TypeScript strict mode, Pydantic v2

Code Quality:

✅ Frontend: ESLint, TypeScript strict (0 errors)
✅ Backend: Black (line-length: 100), Ruff, MyPy
✅ Codebase: 57,581 LOC (project code only, excluding dependencies)
✅ Documentation: 47.2% of the codebase (27,157 LOC)

🤝 Contributing

We welcome contributions! Sentinel is in active development and we'd love your help.

🐛 Report Bugs

GitHub Issues

Found a bug? Let us know!

💡 Suggest Features

Discussions

Have ideas? Start a discussion!

📖 Improve Docs

Submit PRs

Help make docs better!

Development Setup

# Backend Development
cd backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e ".[dev]"

# Run backend tests
pytest -v --cov=backend
black . && ruff check . && mypy .

# Frontend Development
cd ../frontend              # from backend/
npm install

# Run frontend tests
npm test                    # Unit tests (Vitest) - 473 tests
npm run lint                # ESLint
npm run type-check          # TypeScript (0 errors)

# Run dev server
npm run dev                 # Browser only
npm run tauri:dev           # Desktop app (recommended)

# Code quality checks
npm run lint && npm run type-check

📋 Contributing Guidelines

How to Contribute:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Style:

Frontend: ESLint + TypeScript strict mode (0 errors required)
Backend: Black (line-length: 100) + Ruff + MyPy
Tests: Required for all features (unit tests where applicable)
Commits: Conventional Commits format (feat:, fix:, docs:, etc.)

Testing Requirements:

Unit tests for all new components/functions
100% test pass rate before merge
No TypeScript errors allowed
Maintain or improve test coverage

Review Process:

All PRs require 1 approval
CI/CD checks must pass (tests, linting, type checking)
Documentation updates for new features

🎨 Design Principles

🎯 Visual First

GUI is the primary interface
DSL for interoperability

🔒 Security First

Desktop-first architecture
Self-hosted, air-gapped support

🔬 Research Grade

Deterministic & reproducible
Built for frontier AI labs

♿ Accessible

No coding required
Everyone can test agents

Core Philosophy: "Point, Click, Test"

Make AI agent testing as intuitive as Postman made API testing, as visual as Langflow made LLM workflows, and as powerful as LangSmith made observability.

🙏 Acknowledgments

Sentinel's design is inspired by industry-leading tools:

Langflow

Node-based LLM workflows
Visual-first design

n8n

Visual automation
Drag-and-drop UX

Postman

API testing UX
Developer experience

Playwright

Record/replay pattern
E2E testing

LangSmith

LLM observability
Evaluation platform

Special Thanks:

React Flow team for the production-ready canvas library
Tauri team for the lightweight desktop framework
shadcn/ui for the beautiful component library
Open source community for inspiration and support

📄 License

MIT License - see LICENSE file for details.

Copyright (c) 2025 Navam

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

💬 Community & Support

📧 Contact

Email: hello@navam.io
Twitter: @navam_io
GitHub: navam-io/sentinel
Website: navam.io

🚀 Current Status

Version: 0.22.0 (Released November 23, 2025)
Status: Unified Library Tab with 12-Category System ✅
Next Milestone: v0.23.0 - Model Execution & Result Storage (Q1 2026)

Production Ready Features:

✅ Visual canvas with 6+ node types
✅ Real-time Visual ↔ YAML synchronization
✅ Library with 16 categorized templates
✅ 12-category classification system
✅ Test suite organizer
✅ Desktop app (Tauri 2.0)
✅ 473 tests with 100% pass rate
✅ Comprehensive documentation (47% of codebase)


Built with ❤️ by the Navam Team for frontier AI labs, researchers, and agent builders

⭐ Star this repo if you find it helpful!
