Open-source AI assurance framework for models, MCPs, and agents
TrustVector is an evidence-based evaluation framework for AI systems, providing transparent, multi-dimensional trust scores across performance, security, privacy and compliance, trust and transparency, and operational excellence.
Powered and supported by Guard0.ai
🌐 Website: trustvector.dev 📖 Documentation: /methodology 🤝 Contributing: /contribute 🗺️ Roadmap: ROADMAP.md 🔒 Security: SECURITY.md 📜 Code of Conduct: CODE_OF_CONDUCT.md
Unlike simple benchmarks, TrustVector evaluates AI systems holistically across 5 critical dimensions (a data sketch follows the list):
- Performance & Reliability - Accuracy, latency, uptime
- Security - Prompt injection resistance, jailbreak defense, data leakage
- Privacy & Compliance - Data residency, retention, HIPAA/GDPR
- Trust & Transparency - Explainability, hallucination rate, bias
- Operational Excellence - API quality, SDKs, ecosystem maturity
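Each evaluation stores one block per dimension. The snippet below is a rough sketch of that shape, using the dimension keys and the overall_score field that appear in the code examples later in this README; the scores themselves are made up.

```typescript
// Hypothetical shape of an evaluation's trust_vector: one entry per dimension.
// Keys and overall_score mirror the examples in this README; values are illustrative.
const exampleTrustVector = {
  performance_reliability: { overall_score: 88 },
  security: { overall_score: 91 },
  privacy_compliance: { overall_score: 84 },
  trust_transparency: { overall_score: 79 },
  operational_excellence: { overall_score: 86 },
};
```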
- Evidence-Based: Every score backed by verifiable sources
- CVSS-like Weighting: Customize dimension importance for your use case
- Transparent: Full methodology and confidence levels disclosed
- Open-Source: Community-driven, GitHub-based contribution workflow
- Actionable: Use case specific recommendations
Browse existing evaluations at trustvector.dev, or run the site locally:
# Clone the repository
git clone https://github.com/Guard0-Security/TrustVector.git
cd TrustVector
# Install dependencies
npm install
# Run the website locally
npm run dev
# Open http://localhost:3000

All evaluations are structured JSON files in /data:
import claudeSonnet from './data/models/claude-sonnet-4-5.json';
// Get overall score
const overallScore = calculateOverallScore(claudeSonnet);
// Access dimension scores
const securityScore = claudeSonnet.trust_vector.security.overall_score;
// Custom weighted score (CVSS-style)
const customScore = calculateCustomScore(claudeSonnet, {
performance_reliability: 0.15,
security: 0.35,
privacy_compliance: 0.30,
trust_transparency: 0.10,
operational_excellence: 0.10,
});

106 Total Evaluations across 3 categories:
Frontier Models:
- ✅ Claude Sonnet 4.5, Claude Opus 4.1, Claude 3.7 Sonnet, Claude 3.5 Haiku (Anthropic)
- ✅ GPT-5, GPT-4.5, GPT-4.1, GPT-4o, GPT-4o Mini (OpenAI)
- ✅ o1, o1 Mini, o3, o3 Mini (OpenAI Reasoning)
- ✅ Gemini 2.5 Pro, Gemini 2.0 Flash (Google)
- ✅ Llama 4 Behemoth, Llama 4 Maverick, Llama 4 Scout, Llama 3.3 70B, Llama 3.1 405B (Meta)
- ✅ Grok 3 Beta (xAI)
- ✅ DeepSeek R1, DeepSeek V3 (DeepSeek)
Specialized & Open Source:
- ✅ Gemma 3 27B (Google)
- ✅ Qwen2.5-VL 32B (Alibaba)
- ✅ Nemotron Ultra 253B (NVIDIA)
- ✅ Nova Pro (Amazon)
Enterprise Platforms:
- ✅ Amazon Bedrock Agents, Azure Bot Service, Google Agent Builder
- ✅ IBM Watson Assistant, Google Dialogflow, Amazon Lex
Developer Frameworks:
- ✅ LangGraph Agent, LlamaIndex Agent, CrewAI, AutoGen
- ✅ Haystack, LangFlow, Flowise, E2B Agents
Autonomous Agents:
- ✅ AutoGPT, BabyAGI, AgentGPT, Adala
- ✅ And 15+ more...
Cloud & Infrastructure:
- ✅ AWS, Azure, Cloudflare, Docker, Kubernetes
Development Tools:
- ✅ GitHub, Git, Filesystem, Memory
Productivity & Business:
- ✅ Gmail, Google Drive, Calendar, Linear, Atlassian
- ✅ Datadog, Elasticsearch, MongoDB
Utilities:
- ✅ Brave Search, Fetch, Everything
- ✅ And 15+ more...
We welcome contributions! Here's how to add an evaluation:
Pick an unevaluated AI system (model, MCP, or agent).
For each criterion, collect the following evidence (see the sketch after this list):
- Source: Benchmark name, paper, or documentation
- URL: Link to evidence
- Date: When published
- Value: What it shows (e.g., "92.3% on MATH-500")
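For example, a single evidence entry might be recorded like this. Field names mirror the list above; the exact keys live in the schema at /framework/schema/types.ts, so treat this as a sketch rather than the real format.

```typescript
// Illustrative evidence record; keys mirror the list above, not necessarily
// the exact schema in /framework/schema/types.ts.
const evidence = {
  source: "MATH-500 benchmark",         // benchmark name, paper, or documentation
  url: "https://example.com/math-500",  // placeholder link to the evidence
  date: "2025-03-01",                   // when it was published
  value: "92.3% on MATH-500",           // what it shows
};
```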
Use our template:
npm run create-template -- --type model --id your-model-name

Fill in the JSON following the schema in /framework/schema/types.ts.
npm run validate

This ensures your evaluation meets quality standards.
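The project lists Zod for validation, so a schema check could look roughly like the sketch below. This is not the real schema from /framework/schema/types.ts, and the file path is only an example.

```typescript
// Illustrative only: validate an evaluation file against a toy Zod schema.
import { readFileSync } from "node:fs";
import { z } from "zod";

const Dimension = z.object({ overall_score: z.number().min(0).max(100) });
const Evaluation = z.object({
  trust_vector: z.object({
    performance_reliability: Dimension,
    security: Dimension,
    privacy_compliance: Dimension,
    trust_transparency: Dimension,
    operational_excellence: Dimension,
  }),
});

const data = JSON.parse(readFileSync("data/models/your-model-name.json", "utf8"));
const result = Evaluation.safeParse(data);
if (!result.success) {
  console.error(result.error.issues); // report every schema violation
  process.exit(1);
}
```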
git checkout -b evaluation/your-model-name
git add data/models/your-model-name.json
git commit -m "Add evaluation for Your Model Name"
git push origin evaluation/your-model-name

Open a PR! We'll review within 48 hours.
See CONTRIBUTING.md for detailed guidelines.
- 90-100: Exceptional - Industry leading
- 75-89: Strong - Meets enterprise requirements
- 60-74: Adequate - Usable with caveats
- 40-59: Concerning - Significant gaps
- 0-39: Poor - Not recommended
- High: Multiple independent sources, recent testing
- Medium: Single authoritative source or slightly dated
- Low: Inferred, vendor claims only, needs verification
Every score requires:
- Primary source (benchmark, paper, official doc)
- Date of evidence
- Methodology used to derive score
- Last verified timestamp
See METHODOLOGY.md for full details.
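For illustration, mapping a 0-100 score to the interpretation bands listed above takes only a few lines; the function name below is hypothetical and not part of the framework API.

```typescript
// Map a 0-100 score to the interpretation bands listed above (illustrative helper).
function scoreBand(score: number): string {
  if (score >= 90) return "Exceptional - Industry leading";
  if (score >= 75) return "Strong - Meets enterprise requirements";
  if (score >= 60) return "Adequate - Usable with caveats";
  if (score >= 40) return "Concerning - Significant gaps";
  return "Poor - Not recommended";
}
```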
Like CVSS, you can weight dimensions based on your priorities:
import { calculateCustomScore, WEIGHTING_PROFILES } from '@/framework/calculator/custom-score';
// Use a predefined profile
const score = calculateCustomScore(entity, WEIGHTING_PROFILES.security_first);
// Or create your own weights
const customWeights = {
performance_reliability: 0.20,
security: 0.30,
privacy_compliance: 0.25,
trust_transparency: 0.15,
operational_excellence: 0.10,
};
const myScore = calculateCustomScore(entity, customWeights);

Predefined weighting profiles:
- balanced - Equal weight (20% each)
- security_first - Security & privacy prioritized
- performance_focused - Speed & reliability first
- enterprise - Balanced security, compliance, ops
- healthcare - HIPAA compliance focused
- financial - Security & compliance heavy
- startup - Performance & cost optimized
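Conceptually, the custom score is a weight-normalized sum of the five dimension scores. The sketch below illustrates the idea; field access mirrors the examples above, but this is not the actual implementation in framework/calculator/custom-score.

```typescript
// Minimal sketch of a CVSS-style weighted score: weight-normalized sum of dimension scores.
// Not the real calculator; field names follow the README examples.
function weightedScore(entity: any, weights: Record<string, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    total += weight * (entity.trust_vector[dimension]?.overall_score ?? 0);
    weightSum += weight;
  }
  return weightSum > 0 ? total / weightSum : 0;
}
```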
trust-vector/
├── data/ # Evaluation data
│ ├── models/ # Model evaluations (JSON)
│ ├── mcps/ # MCP evaluations
│ ├── agents/ # Agent evaluations
│ └── use-cases/ # Use case taxonomy
├── framework/ # Core framework
│ ├── schema/ # TypeScript types & validation
│ └── calculator/ # Custom score calculator
├── website/ # Next.js website
│ ├── app/ # Pages and routes
│ ├── components/ # React components
│ └── lib/ # Utilities
├── docs/ # Documentation
│ ├── CONTRIBUTING.md
│ ├── METHODOLOGY.md
│ └── examples/
└── scripts/ # Validation & tools
└── validate-data.ts # CI validation
TrustVector itself:
- ✅ No user data collection
- ✅ No cookies or tracking
- ✅ Static site generation (no runtime deps)
- ✅ All evaluations version-controlled
- ✅ Schema validation on every PR
| Feature | TrustVector | Typical Benchmarks |
|---|---|---|
| Multi-dimensional | ✅ 5 dimensions | ❌ 1-2 metrics |
| Evidence-based | ✅ Every score sourced | ❌ |
| Confidence levels | ✅ High/Med/Low | ❌ No |
| Security evaluation | ✅ Comprehensive | ❌ |
| Privacy assessment | ✅ Detailed | ❌ Rare |
| Use case specific | ✅ 10+ use cases | ❌ |
| Custom weighting | ✅ CVSS-style | ❌ No |
| Open-source | ✅ MIT | |
MIT License - see LICENSE for details.
Inspired by:
- CVSS - Vulnerability scoring methodology
- RiskRubric.ai - AI model evaluation
- LMSYS Chatbot Arena - Crowdsourced benchmarking
- OWASP LLM Top 10 - LLM security
Built with:
- Next.js 14 & React 18
- TypeScript
- Tailwind CSS
- shadcn/ui
- Recharts
- Zod (validation)
If you find TrustVector useful:
- ⭐ Star the repository to show support
- 🐛 Report bugs via Issues
- 💡 Request features via Discussions
- 🤝 Contribute evaluations (see CONTRIBUTING.md)
- 📢 Share with your network
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Feature Requests: Roadmap
- Security: SECURITY.md
⭐ Star on GitHub · 🤝 Contribute · 📖 Read Docs
Made with ❤️ by Guard0.ai and the TrustVector community