Skip to content

navam-io/sentinel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Sentinel Logo

Sentinel

Visual-First AI Agent Testing Platform

"Point, Click, Test" โ€” Make AI agent testing as intuitive as Postman made API testing

Version License Tests Coverage TypeScript React Python

Quick Start โ€ข Features โ€ข Documentation โ€ข Roadmap โ€ข Contributing


๐ŸŽฏ What is Sentinel?

Sentinel is a visual-first testing and evaluation platform for AI agents, designed for frontier AI labs, research teams, and agent builders. Build tests with an intuitive drag-and-drop canvas or write declarative YAML specsโ€”your choice.

graph LR
    A[๐Ÿ‘† Click Components] --> B[๐ŸŽจ Visual Canvas]
    B --> C[๐Ÿ“ Auto-Generate YAML]
    C --> D[โœ… Run Tests]
    D --> E[๐Ÿ“Š Compare Results]
    style B fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff
Loading

Why Sentinel?

๐ŸŽจ Visual First

Build tests by clicking, not coding. No YAML knowledge required.

๐Ÿ”„ Git Friendly

Visual changes = clean YAML diffs. Perfect for version control and CI/CD.

๐Ÿงช Research Grade

Built for frontier AI labs with deterministic, repeatable testing.

Target Positioning: "Postman for AI Agents" with research-grade rigor and visual-first design


โœจ Key Features

Current Release (v0.22.0 - November 23, 2025)

๐ŸŽจ Visual Canvas & Library

React 19 โ€ข React Flow 12.3 โ€ข Tauri 2.0

  • โœ… Unified Library Tab: Templates + user tests in one interface
  • โœ… 12-Category System: Q&A, Code Gen, Browser Agents, Multi-turn, Safety, and more
  • โœ… 16 Built-in Templates: Production-ready test templates
  • โœ… 6+ Node Types: Input, Model, Assertion, Tool, System, Output
  • โœ… Real-time YAML generation from canvas
  • โœ… Smart Positioning with auto-layout
  • โœ… Desktop App (Tauri 2.0) for local-first workflow
  • โœ… 473 Unit Tests (100% pass rate)

๐Ÿ”ง Type-Safe Backend & Execution

FastAPI โ€ข Pydantic โ€ข Python 3.13

  • โœ… 8 Assertion Types (text, regex, tools, format, perf)
  • โœ… Round-Trip Conversion (Visual โ†” YAML, zero data loss)
  • โœ… Schema Validation with clear error messages
  • โœ… Backend Tests with comprehensive coverage
  • โœ… Model Providers: Anthropic Claude, OpenAI GPT-5
  • โœ… FastAPI Backend with SQLite/PostgreSQL support
  • โœ… Type Safety: Black, Ruff, MyPy, TypeScript strict mode
  • โœ… Test Suites with folder organization

๐Ÿงช Test Coverage & Quality

Category Metric Status
Frontend Unit Tests 473 tests across 27 test files โœ… 100% passing
Frontend Coverage Component, hooks, services, stores โœ… 50%+ coverage
Backend Tests pytest suite with comprehensive coverage โœ… 85%+ coverage
TypeScript Strict Mode 0 errors, only 4 any usages โœ… Excellent type safety
Code Quality Black, Ruff, MyPy, ESLint โœ… All checks pass
Total Codebase 57,581 LOC (project code only) โœ… Well-documented (47% docs)
๐Ÿ“ฆ Tech Stack Details

Frontend

  • Framework: React 19 + Vite 6.0
  • Desktop: Tauri 2.0 (Rust-powered desktop app)
  • Canvas: React Flow 12.3 (@xyflow/react)
  • State: Zustand 5.0
  • Styling: TailwindCSS + shadcn/ui
  • Testing: Vitest + React Testing Library
  • Type Safety: TypeScript 5.7 (strict mode, 0 errors)
  • Icons: lucide-react

Backend

  • API: FastAPI 0.115+
  • Schema: Pydantic v2 (type-safe validation)
  • Database: SQLite (local) / PostgreSQL (server)
  • Testing: pytest + pytest-cov
  • Code Quality: Black (line-length: 100), Ruff, MyPy
  • Python: 3.13+

Model Providers (Pluggable)

  • Anthropic API: Claude 3.5 Sonnet, Claude 3 Opus (โ‰ฅ0.43.1)
  • OpenAI API: GPT-5.1 (default), GPT-5 Pro, GPT-5 Mini (โ‰ฅ1.59.6)
  • Future: Amazon Bedrock, HuggingFace, Ollama

๐Ÿš€ Quick Start

Option 1: Visual Canvas (Desktop App) โ€” Recommended โญ

# Clone repository
git clone https://github.com/navam-io/sentinel.git
cd sentinel/frontend

# Install dependencies
npm install

# Launch desktop app (hot reload enabled)
npm run tauri:dev

๐ŸŽ‰ That's it! The visual canvas opens with:

  • Component palette on the left
  • Interactive canvas in the center
  • Library tab with 16+ templates
  • Test suite organizer

Try it now:

  1. Click Library tab โ†’ Browse 16 built-in templates
  2. Click Load on any template โ†’ Canvas populates automatically
  3. Click Canvas tab โ†’ See visual node representation
  4. Click Test tab โ†’ View auto-generated YAML
  5. Click Run Test โ†’ Execute and see live results!
Option 2: Development Mode (Browser Only)
cd frontend
npm install
npm run dev  # Opens http://localhost:1420

Runs Vite dev server without Tauri. Faster for UI-only development.

Option 3: Backend API (Python)
# Setup Python environment
cd backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -e ".[dev]"

# Run tests to verify
pytest -v  # โœ… All tests pass

# Start API server (optional)
uvicorn main:app --reload
# Visit http://localhost:8000/docs for API documentation

๐ŸŽฌ Visual Canvas Demo

Building Your First Test in 60 Seconds

1. Browse Library

๐Ÿ“š 16+ templates
12 categories
Search & filter

2. Load Template

๐Ÿ‘๏ธ Click "Load"
Canvas auto-populates
Nodes connected

3. Customize

โœ๏ธ Edit node values
Add/remove nodes
Real-time YAML sync

4. Run & Validate

โ–ถ๏ธ Execute test
Live results
Pass/fail indicators

Generated YAML Example

# Auto-generated from visual canvas
name: "Geography Quiz"
version: "1.0"
description: "Test factual knowledge about world capitals"
category: "qa"

model:
  provider: "openai"
  model: "gpt-5.1"
  temperature: 0.7
  max_tokens: 1000

inputs:
  - type: "input"
    query: "What is the capital of France?"

assertions:
  - type: "must_contain"
    value: "Paris"
  - type: "output_type"
    value: "text"
  - type: "max_latency_ms"
    value: 2000

tags:
  - "canvas-generated"
  - "geography"
  - "qa-test"

๐Ÿ’ก 12-Category Test Classification

๐Ÿ“ Q&A Testing

Blue โ€ข Knowledge validation

Test factual knowledge and basic reasoning

category: "qa"
assertions:
  - must_contain: "Paris"
  - max_latency_ms: 2000

Use Cases:

  • Knowledge validation
  • Fact-checking
  • Simple reasoning

๐Ÿ’ป Code Generation

Purple โ€ข Syntax validation

Validate code structure and quality

category: "code-generation"
assertions:
  - regex_match: "def\\s+\\w+"
  - output_type: "code"

Use Cases:

  • Code quality checks
  • Syntax validation
  • Function detection

๐ŸŒ Browser Agents

Green โ€ข Web automation

Test browser interactions and scraping

category: "browser-agents"
assertions:
  - must_call_tool: ["browser"]
  - output_type: "json"

Use Cases:

  • Web scraping tests
  • UI automation
  • Browser tool usage

๐Ÿ”„ Multi-turn

Orange โ€ข Conversations

Multi-step dialogue testing

category: "multi-turn"
assertions:
  - must_contain: "context"
  - min_tokens: 50

Use Cases:

  • Conversation flows
  • Context retention
  • Multi-step reasoning

๐Ÿ”— LangGraph

Cyan โ€ข Agentic workflows

Test LangGraph state machines

category: "langgraph"
assertions:
  - must_call_tool: ["state"]
  - output_type: "json"

Use Cases:

  • Workflow testing
  • State management
  • Agent coordination

๐Ÿ›ก๏ธ Safety

Red โ€ข Security testing

Security and safety validation

category: "safety"
assertions:
  - must_not_contain: "sensitive"
  - output_type: "text"

Use Cases:

  • Prompt injection tests
  • Content safety
  • Security validation
๐Ÿ“š See All 12 Categories
Category Color Purpose Example Use Cases
Q&A Blue Knowledge & reasoning tests Fact-checking, trivia, simple Q&A
Code Generation Purple Code quality validation Syntax checks, function detection, code structure
Browser Agents Green Web automation testing Scraping, UI testing, browser tools
Multi-turn Orange Conversation flows Dialogue testing, context retention
LangGraph Cyan Agentic workflows State machines, workflow orchestration
Safety Red Security & safety Prompt injection, content filtering
Data Analysis Indigo Data processing tasks CSV parsing, data transformation
Reasoning Pink Logic & problem-solving Chain-of-thought, math, puzzles
Tool Use Yellow Function calling tests API calls, tool invocation
API Testing Teal REST endpoint validation HTTP requests, API responses
UI Testing Lime Visual & interaction tests Component rendering, user flows
Regression Amber Consistency testing Version comparison, behavior stability
๐Ÿ“š See All 8 Assertion Types
Type Purpose Example Use Case
must_contain Text presence check "Paris" Verify specific content appears
must_not_contain Text absence check "London" Ensure unwanted content absent
regex_match Pattern matching "def\\s+\\w+" Validate code/format structure
must_call_tool Tool verification ["browser", "calculator"] Verify agent tool usage
output_type Format validation "json", "code", "text" Enforce output format
max_latency_ms Performance check 2000 Ensure response time < 2s
min_tokens Min output length 50 Require minimum detail
max_tokens Max output length 500 Enforce conciseness

๐Ÿ“– Documentation

๐Ÿš€ Getting Started

๐Ÿ“š Examples & Templates

๐Ÿ“˜ API Reference

๐Ÿ”ง Advanced


๐Ÿ—บ๏ธ Roadmap

โœ… Recently Released

v0.22.0 - Unified Library Tab & 12-Category System (Nov 23, 2025) โญ Latest

Major UX Improvements & Template Expansion

Library Tab & Category System

  • โœ… Unified Library Tab combining templates and user tests
  • โœ… 12-Category Classification System (Q&A, Code Gen, Browser, Multi-turn, Safety, etc.)
  • โœ… 10 New Templates (API Testing, Data Analysis, LangGraph, Safety, Reasoning, etc.)
  • โœ… Enhanced search and filtering
  • โœ… Category-based organization with color coding

UI/UX Improvements

  • โœ… Tab restructure: Canvas, Test, Suite, Library
  • โœ… Integrated run section in Test tab (collapsible)
  • โœ… State persistence (run details, suite expansion)
  • โœ… Refined LibraryCard component with icons
  • โœ… Better visual hierarchy and information design

Backend Enhancements

  • โœ… Database schema updates (category, is_template fields)
  • โœ… Updated API endpoints for category management
  • โœ… Test renaming functionality
v0.21.0 - Test Suite Organizer (Nov 22, 2025)
  • โœ… Test suite organizer with folders
  • โœ… Suite search and filtering
  • โœ… Drag-and-drop test organization
  • โœ… Suite export/import
v0.20.0 - Enhanced Visual Canvas (Nov 2025)
  • โœ… Test renaming functionality
  • โœ… Improved drag-drop palette
  • โœ… Auto-save improvements
  • โœ… Better YAML synchronization

๐Ÿšง In Progress

gantt
    title Sentinel 2026 Roadmap
    dateFormat  YYYY-MM
    section Phase 4
    Model Execution         :2026-01, 2m
    Result Storage          :2026-02, 1m
    section Phase 5
    Advanced Providers      :2026-03, 2m
    LangGraph Support       :2026-05, 1m
    section Phase 6
    Analytics & CI/CD       :2026-06, 3m
Loading

v0.23.0 - Execution Q1 2026

  • Live execution dashboard
  • Result storage (SQLite)
  • Test run history
  • Metrics & analytics
  • Performance tracking

v0.24.0 - Providers Q1 2026

  • Bedrock integration
  • HuggingFace support
  • Ollama local models
  • Provider comparison
  • Cost tracking

v0.25.0 - LangGraph Q2 2026

  • LangGraph framework support
  • State machine testing
  • Multi-agent workflows
  • Workflow visualization
  • Debug tools
๐Ÿ”ฎ Future Features (v0.26.0+)

Advanced Features (Q2-Q3 2026)

  • v0.26.0: AI-assisted test generation
  • v0.27.0: Visual assertion builder enhancements
  • v0.28.0: Regression engine & comparison view

Enterprise Features (Q3-Q4 2026)

  • v0.29.0: Collaborative workspaces
  • v0.30.0: Advanced safety scenarios
  • v0.31.0: Dashboard & analytics platform
  • v0.32.0: CI/CD integration & automation

โ†’ Full roadmap with detailed specs


๐Ÿ‘ฅ Who Uses Sentinel?

๐ŸŽฏ Primary Users

Visual-First Interface

  • ๐Ÿ“Š Product Managers - Validate agents without code
  • ๐Ÿ”ฌ Research Scientists - Build evaluation suites visually
  • ๐Ÿงช QA Engineers - Create and debug tests with clicks
  • ๐Ÿ›ก๏ธ Safety Teams - Collaborative safety testing
  • ๐Ÿข Frontier Labs - Test model releases
  • ๐Ÿงฌ Neo-labs - Agent-focused research

โšก Advanced Users

DSL Mode + Programmatic

  • ๐Ÿ’ป Model Engineers - Direct YAML editing, programmatic tests
  • โš™๏ธ DevOps Engineers - CI/CD integration, automation
  • ๐Ÿ—๏ธ Infrastructure Teams - Enterprise testing at scale
  • ๐Ÿค– Agent Builders - Production validation
  • ๐Ÿ”ง Framework Developers - Integration testing
  • ๐Ÿ“ˆ MLOps Teams - Regression detection

๐Ÿ“Š Project Stats

Metric Value Metric Value
Version 0.22.0 Release Date Nov 23, 2025
Total Tests 473 passing Test Pass Rate 100% โœ…
Frontend Tests 27 files, 473 tests Backend Tests 6 files, comprehensive coverage
Frontend Coverage 50%+ Backend Coverage 85%+
Node Types 6+ production Templates 16 ready-to-use
Categories 12 classifications Assertion Types 8 validators
Frontend LOC 13,536 (100 files) Backend LOC 3,234 (34 files)
TypeScript Errors 0 (strict mode) TypeScript any Only 4 instances
Components 68 React components Documentation 27,157 LOC (47% of codebase)

Tech Metrics:

  • Build Time: ~3s (Vite HMR)
  • Desktop App: Tauri 2.0 (lightweight, fast startup)
  • Test Execution: 2.24s for 473 unit tests
  • Type Safety: TypeScript strict mode, Pydantic v2

Code Quality:

  • โœ… Frontend: ESLint, TypeScript strict (0 errors)
  • โœ… Backend: Black (line-length: 100), Ruff, MyPy
  • โœ… Codebase: 57,581 LOC (project code only, excluding dependencies)
  • โœ… Documentation: Exceptional (47.2% of codebase is documentation)

๐Ÿค Contributing

We welcome contributions! Sentinel is in active development and we'd love your help.

๐Ÿ› Report Bugs

GitHub Issues

Found a bug? Let us know!

๐Ÿ’ก Suggest Features

Discussions

Have ideas? Start a discussion!

๐Ÿ“– Improve Docs

Submit PRs

Help make docs better!

Development Setup

# Backend Development
cd backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e ".[dev]"

# Run backend tests
pytest -v --cov=backend
black . && ruff check . && mypy .

# Frontend Development
cd frontend
npm install

# Run frontend tests
npm test                    # Unit tests (Vitest) - 473 tests
npm run lint                # ESLint
npm run type-check          # TypeScript (0 errors)

# Run dev server
npm run dev                 # Browser only
npm run tauri:dev           # Desktop app (recommended)

# Code quality checks
npm run lint && npm run type-check
๐Ÿ“‹ Contributing Guidelines

How to Contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Code Style:

  • Frontend: ESLint + TypeScript strict mode (0 errors required)
  • Backend: Black (line-length: 100) + Ruff + MyPy
  • Tests: Required for all features (unit tests where applicable)
  • Commits: Conventional Commits format (feat:, fix:, docs:, etc.)

Testing Requirements:

  • Unit tests for all new components/functions
  • 100% test pass rate before merge
  • No TypeScript errors allowed
  • Maintain or improve test coverage

Review Process:

  • All PRs require 1 approval
  • CI/CD checks must pass (tests, linting, type checking)
  • Documentation updates for new features

๐ŸŽจ Design Principles

๐ŸŽฏ Visual First


GUI is primary interface
DSL for interoperability

๐Ÿ”’ Security First


Desktop-first architecture
Self-hosted, air-gapped support

๐Ÿ”ฌ Research Grade


Deterministic & reproducible
Built for frontier AI labs

โ™ฟ Accessible


No coding required
Everyone can test agents

Core Philosophy: "Point, Click, Test"

Make AI agent testing as intuitive as Postman made API testing, as visual as Langflow made LLM workflows, and as powerful as LangSmith made observability.


๐Ÿ™ Acknowledgments

Sentinel's design is inspired by industry-leading tools:

Langflow

Node-based LLM workflows
Visual-first design

n8n

Visual automation
Drag-and-drop UX

Postman

API testing UX
Developer experience

Playwright

Record/replay pattern
E2E testing

LangSmith

LLM observability
Evaluation platform

Special Thanks:

  • React Flow team for production-ready canvas library
  • Tauri team for lightweight desktop framework
  • shadcn/ui for beautiful component library
  • Open source community for inspiration and support

๐Ÿ“„ License

MIT License - see LICENSE file for details.

Copyright (c) 2025 Navam

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

๐Ÿ’ฌ Community & Support

Documentation Metrics Report GitHub Issues Discussions Twitter

๐Ÿ“ง Contact


๐Ÿš€ Current Status

Version: 0.22.0 (Released November 23, 2025)
Status: Unified Library Tab with 12-Category System โœ…
Next Milestone: v0.23.0 - Model Execution & Result Storage (Q1 2026)

Production Ready Features:

  • โœ… Visual canvas with 6+ node types
  • โœ… Real-time Visual โ†” YAML synchronization
  • โœ… Library with 16 categorized templates
  • โœ… 12-category classification system
  • โœ… Test suite organizer
  • โœ… Desktop app (Tauri 2.0)
  • โœ… 473 tests with 100% pass rate
  • โœ… Comprehensive documentation (47% of codebase)

โฌ† Back to Top

Built with โค๏ธ by the Navam Team for frontier AI labs, researchers, and agent builders

โญ Star this repo if you find it helpful!

About

Unified Agent Regression + Evaluation Platform

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •