Open-source AI assurance framework for models, MCPs, and agents
TrustVector is an evidence-based evaluation framework for AI systems, providing transparent, multi-dimensional trust scores across performance, security, privacy and compliance, trust and transparency, and operational excellence.
Powered and supported by Guard0.ai
🌐 Website: trustvector.dev 📖 Documentation: /methodology 🤝 Contributing: /contribute 🗺️ Roadmap: ROADMAP.md 🔒 Security: SECURITY.md 📜 Code of Conduct: CODE_OF_CONDUCT.md
Unlike simple benchmarks, TrustVector evaluates AI systems holistically across 5 critical dimensions (a data sketch follows the list):
- Performance & Reliability - Accuracy, latency, uptime
- Security - Prompt injection resistance, jailbreak defense, data leakage
- Privacy & Compliance - Data residency, retention, HIPAA/GDPR
- Trust & Transparency - Explainability, hallucination rate, bias
- Operational Excellence - API quality, SDKs, ecosystem maturity
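Each evaluation stores one block per dimension. The snippet below is a rough sketch of that shape, using the dimension keys and the overall_score field that appear in the code examples later in this README; the scores themselves are made up.

```typescript
// Hypothetical shape of an evaluation's trust_vector: one entry per dimension.
// Keys and overall_score mirror the examples in this README; values are illustrative.
const exampleTrustVector = {
  performance_reliability: { overall_score: 88 },
  security: { overall_score: 91 },
  privacy_compliance: { overall_score: 84 },
  trust_transparency: { overall_score: 79 },
  operational_excellence: { overall_score: 86 },
};
```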
- Evidence-Based: Every score backed by verifiable sources
- CVSS-like Weighting: Customize dimension importance for your use case
- Transparent: Full methodology and confidence levels disclosed
- Open-Source: Community-driven, GitHub-based contribution workflow
- Actionable: Use case specific recommendations
Browse existing evaluations at trustvector.dev, or run the site locally:
# Clone the repository
git clone https://github.com/Guard0-Security/TrustVector.git
cd TrustVector
# Install dependencies
npm install
# Run the website locally
npm run dev
# Open http://localhost:3000

All evaluations are structured JSON files in /data:
import claudeSonnet from './data/models/claude-sonnet-4-5.json';
// Get overall score
const overallScore = calculateOverallScore(claudeSonnet);
// Access dimension scores
const securityScore = claudeSonnet.trust_vector.security.overall_score;
// Custom weighted score (CVSS-style)
const customScore = calculateCustomScore(claudeSonnet, {
performance_reliability: 0.15,
security: 0.35,
privacy_compliance: 0.30,
trust_transparency: 0.10,
operational_excellence: 0.10,
});

106 Total Evaluations across 3 categories:
Frontier Models:
- ✅ Claude Sonnet 4.5, Claude Opus 4.1, Claude 3.7 Sonnet, Claude 3.5 Haiku (Anthropic)
- ✅ GPT-5, GPT-4.5, GPT-4.1, GPT-4o, GPT-4o Mini (OpenAI)
- ✅ o1, o1 Mini, o3, o3 Mini (OpenAI Reasoning)
- ✅ Gemini 2.5 Pro, Gemini 2.0 Flash (Google)
- ✅ Llama 4 Behemoth, Llama 4 Maverick, Llama 4 Scout, Llama 3.3 70B, Llama 3.1 405B (Meta)
- ✅ Grok 3 Beta (xAI)
- ✅ DeepSeek R1, DeepSeek V3 (DeepSeek)
Specialized & Open Source:
- ✅ Gemma 3 27B (Google)
- ✅ Qwen2.5-VL 32B (Alibaba)
- ✅ Nemotron Ultra 253B (NVIDIA)
- ✅ Nova Pro (Amazon)
Enterprise Platforms:
- ✅ Amazon Bedrock Agents, Azure Bot Service, Google Agent Builder
- ✅ IBM Watson Assistant, Google Dialogflow, Amazon Lex
Developer Frameworks:
- ✅ LangGraph Agent, LlamaIndex Agent, CrewAI, AutoGen
- ✅ Haystack, LangFlow, Flowise, E2B Agents
Autonomous Agents:
- ✅ AutoGPT, BabyAGI, AgentGPT, Adala
- ✅ And 15+ more...
Cloud & Infrastructure:
- ✅ AWS, Azure, Cloudflare, Docker, Kubernetes
Development Tools:
- ✅ GitHub, Git, Filesystem, Memory
Productivity & Business:
- ✅ Gmail, Google Drive, Calendar, Linear, Atlassian
- ✅ Datadog, Elasticsearch, MongoDB
Utilities:
- ✅ Brave Search, Fetch, Everything
- ✅ And 15+ more...
We welcome contributions! Here's how to add an evaluation:
Pick an unevaluated AI system (model, MCP, or agent).
For each criterion, collect the following evidence (see the sketch after this list):
- Source: Benchmark name, paper, or documentation
- URL: Link to evidence
- Date: When published
- Value: What it shows (e.g., "92.3% on MATH-500")
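For example, a single evidence entry might be recorded like this. Field names mirror the list above; the exact keys live in the schema at /framework/schema/types.ts, so treat this as a sketch rather than the real format.

```typescript
// Illustrative evidence record; keys mirror the list above, not necessarily
// the exact schema in /framework/schema/types.ts.
const evidence = {
  source: "MATH-500 benchmark",         // benchmark name, paper, or documentation
  url: "https://example.com/math-500",  // placeholder link to the evidence
  date: "2025-03-01",                   // when it was published
  value: "92.3% on MATH-500",           // what it shows
};
```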
Use our template:
npm run create-template -- --type model --id your-model-name

Fill in the JSON following the schema in /framework/schema/types.ts.
npm run validate

This ensures your evaluation meets quality standards.
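The project lists Zod for validation, so a schema check could look roughly like the sketch below. This is not the real schema from /framework/schema/types.ts, and the file path is only an example.

```typescript
// Illustrative only: validate an evaluation file against a toy Zod schema.
import { readFileSync } from "node:fs";
import { z } from "zod";

const Dimension = z.object({ overall_score: z.number().min(0).max(100) });
const Evaluation = z.object({
  trust_vector: z.object({
    performance_reliability: Dimension,
    security: Dimension,
    privacy_compliance: Dimension,
    trust_transparency: Dimension,
    operational_excellence: Dimension,
  }),
});

const data = JSON.parse(readFileSync("data/models/your-model-name.json", "utf8"));
const result = Evaluation.safeParse(data);
if (!result.success) {
  console.error(result.error.issues); // report every schema violation
  process.exit(1);
}
```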
git checkout -b evaluation/your-model-name
git add data/models/your-model-name.json
git commit -m "Add evaluation for Your Model Name"
git push origin evaluation/your-model-name

Open a PR! We'll review within 48 hours.
See CONTRIBUTING.md for detailed guidelines.
- 90-100: Exceptional - Industry leading
- 75-89: Strong - Meets enterprise requirements
- 60-74: Adequate - Usable with caveats
- 40-59: Concerning - Significant gaps
- 0-39: Poor - Not recommended
- High: Multiple independent sources, recent testing
- Medium: Single authoritative source or slightly dated
- Low: Inferred, vendor claims only, needs verification
Every score requires:
- Primary source (benchmark, paper, official doc)
- Date of evidence
- Methodology used to derive score
- Last verified timestamp
See METHODOLOGY.md for full details.
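For illustration, mapping a 0-100 score to the interpretation bands listed above takes only a few lines; the function name below is hypothetical and not part of the framework API.

```typescript
// Map a 0-100 score to the interpretation bands listed above (illustrative helper).
function scoreBand(score: number): string {
  if (score >= 90) return "Exceptional - Industry leading";
  if (score >= 75) return "Strong - Meets enterprise requirements";
  if (score >= 60) return "Adequate - Usable with caveats";
  if (score >= 40) return "Concerning - Significant gaps";
  return "Poor - Not recommended";
}
```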
Like CVSS, you can weight dimensions based on your priorities:
import { calculateCustomScore, WEIGHTING_PROFILES } from '@/framework/calculator/custom-score';
// Use a predefined profile
const score = calculateCustomScore(entity, WEIGHTING_PROFILES.security_first);
// Or create your own weights
const customWeights = {
performance_reliability: 0.20,
security: 0.30,
privacy_compliance: 0.25,
trust_transparency: 0.15,
operational_excellence: 0.10,
};
const myScore = calculateCustomScore(entity, customWeights);

Predefined weighting profiles:
- balanced - Equal weight (20% each)
- security_first - Security & privacy prioritized
- performance_focused - Speed & reliability first
- enterprise - Balanced security, compliance, ops
- healthcare - HIPAA compliance focused
- financial - Security & compliance heavy
- startup - Performance & cost optimized
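Conceptually, the custom score is a weight-normalized sum of the five dimension scores. The sketch below illustrates the idea; field access mirrors the examples above, but this is not the actual implementation in framework/calculator/custom-score.

```typescript
// Minimal sketch of a CVSS-style weighted score: weight-normalized sum of dimension scores.
// Not the real calculator; field names follow the README examples.
function weightedScore(entity: any, weights: Record<string, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    total += weight * (entity.trust_vector[dimension]?.overall_score ?? 0);
    weightSum += weight;
  }
  return weightSum > 0 ? total / weightSum : 0;
}
```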
trust-vector/
├── data/ # Evaluation data
│ ├── models/ # Model evaluations (JSON)
│ ├── mcps/ # MCP evaluations
│ ├── agents/ # Agent evaluations
│ └── use-cases/ # Use case taxonomy
├── framework/ # Core framework
│ ├── schema/ # TypeScript types & validation
│ └── calculator/ # Custom score calculator
├── website/ # Next.js website
│ ├── app/ # Pages and routes
│ ├── components/ # React components
│ └── lib/ # Utilities
├── docs/ # Documentation
│ ├── CONTRIBUTING.md
│ ├── METHODOLOGY.md
│ └── examples/
└── scripts/ # Validation & tools
└── validate-data.ts # CI validation
TrustVector itself:
- ✅ No user data collection
- ✅ No cookies or tracking
- ✅ Static site generation (no runtime deps)
- ✅ All evaluations version-controlled
- ✅ Schema validation on every PR
| Feature | TrustVector | Typical Benchmarks |
|---|---|---|
| Multi-dimensional | ✅ 5 dimensions | ❌ 1-2 metrics |
| Evidence-based | ✅ Every score sourced | ❌ |
| Confidence levels | ✅ High/Med/Low | ❌ No |
| Security evaluation | ✅ Comprehensive | ❌ |
| Privacy assessment | ✅ Detailed | ❌ Rare |
| Use case specific | ✅ 10+ use cases | ❌ |
| Custom weighting | ✅ CVSS-style | ❌ No |
| Open-source | ✅ MIT | |
MIT License - see LICENSE for details.
Inspired by:
- CVSS - Vulnerability scoring methodology
- RiskRubric.ai - AI model evaluation
- LMSYS Chatbot Arena - Crowdsourced benchmarking
- OWASP LLM Top 10 - LLM security
Built with:
- Next.js 14 & React 18
- TypeScript
- Tailwind CSS
- shadcn/ui
- Recharts
- Zod (validation)
If you find TrustVector useful:
- ⭐ Star the repository to show support
- 🐛 Report bugs via Issues
- 💡 Request features via Discussions
- 🤝 Contribute evaluations (see CONTRIBUTING.md)
- 📢 Share with your network
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Feature Requests: Roadmap
- Security: SECURITY.md
⭐ Star on GitHub · 🤝 Contribute · 📖 Read Docs
Made with ❤️ by Guard0.ai and the TrustVector community