# Trust Scoring Exploration

This notebook provides an interactive walkthrough of the **agent-mesh trust scoring system**.
You'll learn how agents earn trust, how scores decay over time, and how policy violations
affect an agent's ability to operate within the mesh.

## The 5-Dimension Model

| Dimension | Weight | What it measures |
|-----------|--------|------------------|
| Policy Compliance | 0.25 | Did the agent follow declared policies? |
| Security Posture | 0.25 | Did the agent stay within trust boundaries? |
| Output Quality | 0.20 | Did downstream consumers accept the output? |
| Resource Efficiency | 0.15 | Was token/compute usage proportionate? |
| Collaboration Health | 0.15 | Did inter-agent handoffs complete? |

Scores range from **0 to 1000** and map to trust tiers:
- **Verified Partner** (900+)
- **Trusted** (700-899)
- **Standard** (500-699)
- **Probationary** (300-499)
- **Untrusted** (below 300)
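
To make the table and tier list concrete, here is a minimal standalone sketch of a weighted total and a tier lookup. The weights and cut-offs are copied from above; the function names, the dictionary keys, and the assumption that dimension scores in the 0.0 to 1.0 range are scaled by 1000 are illustrative, not the `agentmesh` implementation.

```python
# Illustrative only: weights and tier cut-offs come from the table and list
# above; names and the x1000 scaling are assumptions, not the agentmesh API.
WEIGHTS = {
    "policy_compliance": 0.25,
    "security_posture": 0.25,
    "output_quality": 0.20,
    "resource_efficiency": 0.15,
    "collaboration_health": 0.15,
}

# Ordered from highest to lowest so the first match wins.
TIERS = [
    (900, "Verified Partner"),
    (700, "Trusted"),
    (500, "Standard"),
    (300, "Probationary"),
]


def weighted_total(dimension_scores: dict) -> int:
    """Combine per-dimension scores in [0.0, 1.0] into a 0-1000 total."""
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return round(total * 1000)


def tier_for(score: int) -> str:
    """Return the tier name for a 0-1000 score."""
    for threshold, name in TIERS:
        if score >= threshold:
            return name
    return "Untrusted"


# A neutral agent: every dimension sits at the midpoint.
neutral = {name: 0.5 for name in WEIGHTS}
total = weighted_total(neutral)
print(total, tier_for(total))
```

Under these assumptions, a midpoint score in every dimension lands on the Standard tier boundary, which matches the default score described later in this notebook.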

In [None]:
import sys, os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..', 'src')))

from agentmesh.reward.engine import RewardEngine, RewardConfig
from agentmesh.reward.scoring import DimensionType, TrustScore, ScoreThresholds
from agentmesh.reward.trust_decay import NetworkTrustEngine, TrustEvent
from agentmesh.constants import (
    TRUST_SCORE_DEFAULT, TRUST_REVOCATION_THRESHOLD,
    TIER_VERIFIED_PARTNER_THRESHOLD, TIER_TRUSTED_THRESHOLD,
    TIER_STANDARD_THRESHOLD, TIER_PROBATIONARY_THRESHOLD,
)

# Create three agents with different profiles
AGENTS = {
    "did:mesh:alice": "Reliable data processor",
    "did:mesh:bob": "New agent, still proving itself",
    "did:mesh:carol": "Agent that will misbehave",
}

engine = RewardEngine()

print("Registered agents:")
for did, role in AGENTS.items():
    score = engine.get_agent_score(did)
    print(f"  {did} — {role}")
    print(f"    Score: {score.total_score}  Tier: {score.tier}")

## How Trust Scores Work

Every agent starts at a **default score of 500** (Standard tier). The `RewardEngine`
records signals across 5 dimensions and recalculates a weighted total score.

Each signal has a value between **0.0** (bad) and **1.0** (good). Dimension scores
use an exponential moving average, so recent signals matter more than old ones.
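
The exponential moving average can be sketched in a few lines. The smoothing factor `alpha=0.2` and the starting value of 0.5 are assumptions chosen for illustration; the engine's actual parameters are not shown in this notebook.

```python
def ema_update(previous: float, signal: float, alpha: float = 0.2) -> float:
    """Blend a new signal into the running dimension score.

    A larger alpha (an assumed parameter) gives recent signals more weight.
    """
    return alpha * signal + (1 - alpha) * previous


# Three good signals followed by one bad one.
score = 0.5
for signal in [1.0, 1.0, 1.0, 0.0]:
    score = ema_update(score, signal)
    print(f"signal={signal:.1f} -> dimension score={score:.3f}")
```

Because each update keeps only a fraction of the previous value, a single bad signal pulls the score down immediately, while older good signals fade geometrically.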

In [None]:
# Show the default score breakdown for a fresh agent
explanation = engine.get_score_explanation("did:mesh:alice")

print(f"Agent: {explanation['agent_did']}")
print(f"Total Score: {explanation['total_score']}")
print(f"Trend: {explanation['trend']}")
print(f"Revoked: {explanation['revoked']}")
print()

print("Dimension Weights:")
for dim in DimensionType:
    print(f"  {dim.value:.<30} weight={getattr(engine.config, f'{dim.value}_weight', 0):.2f}")
print()

print("Score Thresholds:")
thresholds = ScoreThresholds()
for tier, val in [
    ("Verified Partner", thresholds.verified_partner),
    ("Trusted", thresholds.trusted),
    ("Standard", thresholds.standard),
    ("Probationary", thresholds.probationary),
    ("Revocation", thresholds.revocation_threshold),
]:
    print(f"  {tier:.<25} {val}")

## Simulating Agent Interactions

Let's simulate **Alice** performing well — completing tasks, staying in policy,
and collaborating successfully. Watch her score climb.

In [None]:
alice = "did:mesh:alice"
score_history = []

# Record 20 rounds of positive signals across all dimensions
for i in range(20):
    engine.record_policy_compliance(alice, compliant=True, policy_name="data-access")
    engine.record_security_event(alice, within_boundary=True, event_type="api_call")
    engine.record_output_quality(alice, accepted=True, consumer="did:mesh:bob")
    engine.record_resource_usage(alice, tokens_used=800, tokens_budget=1000,
                                  compute_ms=150, compute_budget_ms=200)
    engine.record_collaboration(alice, handoff_successful=True, peer_did="did:mesh:bob")

    # Recalculate and record
    score = engine._recalculate_score(alice)
    score_history.append(score.total_score)

print("Alice's score over 20 interaction rounds:")
print()
for i, s in enumerate(score_history):
    bar = '█' * (s // 20)
    print(f"  Round {i+1:2d}: {s:4d} {bar}")

final = engine.get_agent_score(alice)
print(f"\nFinal: score={final.total_score}, tier={final.tier}")

## Trust Decay

The `NetworkTrustEngine` applies **linear temporal decay** — if an agent has no
positive signals, its score decreases at a configurable rate (default: 2 points/hour).

This prevents stale agents from retaining high trust indefinitely.
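
As a rough sketch, linear decay subtracts a fixed number of points per idle hour. The floor of 100 is taken from the note printed by the demo cell in this section; the function name and signature are hypothetical, and how the real engine tracks idle time is not shown here.

```python
def linear_decay(score: float, hours_idle: float,
                 rate_per_hour: float = 2.0, floor: float = 100.0) -> float:
    """Subtract rate_per_hour points per idle hour, never decaying below floor.

    A score that is already under the floor is left unchanged rather than
    being raised to it.
    """
    decayed = score - rate_per_hour * hours_idle
    return max(decayed, min(score, floor))


for hours in (0, 24, 48, 400):
    print(f"{hours:3d} h idle: {linear_decay(800.0, hours):.1f}")
```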

In [None]:
import time

decay_engine = NetworkTrustEngine(decay_rate=2.0)

# Start an agent at score 800
agent = "did:mesh:decay-demo"
decay_engine.set_score(agent, 800)

# Simulate 48 hours of inactivity (no positive signals)
base_time = time.time()
decay_engine._last_positive[agent] = base_time

decay_log = [(0, 800.0)]
for hour in range(1, 49):
    sim_time = base_time + hour * 3600
    decay_engine.apply_temporal_decay(now=sim_time)
    current = decay_engine.get_score(agent)
    decay_log.append((hour, current))

print("Trust decay over 48 hours of inactivity (rate=2.0/hr):")
print()
for hour, score in decay_log[::4]:  # Every 4 hours
    bar = '█' * int(score // 20)
    print(f"  Hour {hour:3d}: {score:6.1f} {bar}")

print(f"\nScore dropped from 800.0 → {decay_log[-1][1]:.1f} over 48 hours")
print("Note: decay floors at 100 to allow recovery")

## Policy Violations & Trust Impact

When an agent violates policies or causes security events, trust drops sharply.
Critical signals (value < 0.3) trigger **immediate recalculation**.

If the score falls below **300**, automatic credential revocation is triggered.
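
The two thresholds above, plus the callback pattern used for revocation, can be sketched as follows. `RevocationNotifier` and `is_critical` are hypothetical stand-ins written for this example; only the threshold values and the `(did, reason)` callback shape come from this notebook.

```python
from typing import Callable, List

CRITICAL_SIGNAL_THRESHOLD = 0.3  # from the text: signals below this are critical
REVOCATION_THRESHOLD = 300       # from the text: scores below this are revoked

RevocationCallback = Callable[[str, str], None]


def is_critical(signal_value: float) -> bool:
    """True when a signal should force an immediate recalculation."""
    return signal_value < CRITICAL_SIGNAL_THRESHOLD


class RevocationNotifier:
    """Hypothetical stand-in for the engine's revocation hook."""

    def __init__(self) -> None:
        self._callbacks: List[RevocationCallback] = []

    def on_revocation(self, callback: RevocationCallback) -> None:
        self._callbacks.append(callback)

    def check(self, agent_did: str, total_score: int) -> bool:
        """Notify every callback if the score is below the threshold."""
        if total_score >= REVOCATION_THRESHOLD:
            return False
        reason = f"score {total_score} below {REVOCATION_THRESHOLD}"
        for callback in self._callbacks:
            callback(agent_did, reason)
        return True


events = []
demo = RevocationNotifier()
demo.on_revocation(lambda did, reason: events.append((did, reason)))
demo.check("did:mesh:example", 420)  # above threshold: no callback
demo.check("did:mesh:example", 280)  # below threshold: callback fires
print(events)
```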

In [None]:
carol = "did:mesh:carol"
revocations = []

# Track revocations
engine.on_revocation(lambda did, reason: revocations.append((did, reason)))

# Carol starts with some good behavior
for _ in range(5):
    engine.record_policy_compliance(carol, compliant=True)
    engine.record_security_event(carol, within_boundary=True, event_type="normal")
engine._recalculate_score(carol)
print(f"Carol after good behavior: {engine.get_agent_score(carol).total_score}")

# Now Carol starts violating policies
violation_scores = []
violations = [
    ("Policy violation: unauthorized data access", DimensionType.POLICY_COMPLIANCE, 0.0),
    ("Security breach: left trust boundary", DimensionType.SECURITY_POSTURE, 0.0),
    ("Output rejected by consumer", DimensionType.OUTPUT_QUALITY, 0.1),
    ("Resource abuse: 10x over budget", DimensionType.RESOURCE_EFFICIENCY, 0.0),
    ("Failed handoff to peer", DimensionType.COLLABORATION_HEALTH, 0.0),
    ("Critical: attempted privilege escalation", DimensionType.SECURITY_POSTURE, 0.0),
]

for desc, dim, val in violations:
    engine.record_signal(carol, dim, val, source="monitor", details=desc)
    score = engine._recalculate_score(carol)
    violation_scores.append((desc, score.total_score, score.tier))
    print(f"  ✗ {desc}")
    print(f"    → Score: {score.total_score}, Tier: {score.tier}")

print()
if revocations:
    print(f"⚠ REVOCATION triggered: {revocations[-1][1]}")
else:
    print("No revocation triggered (score stayed above threshold)")

## Trust Threshold Gates

The `ScoreThresholds` class provides three gate decisions (**allow**, **warn**, and **revoke**)
based on the current score. The mesh uses these gates to decide whether an agent can perform actions.
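
The three gates can be read as independent predicates over the score. The sketch below mirrors the comparison directions that this notebook prints as its gate logic; `GateThresholds` is a hypothetical class, and the `allow` and `warn` values passed in are placeholders, since the library's defaults are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    """Hypothetical container; not the agentmesh ScoreThresholds class."""

    allow: int             # placeholder, supplied by the caller
    warn: int              # placeholder, supplied by the caller
    revocation: int = 300  # from the text

    def should_allow(self, score: int) -> bool:
        return score >= self.allow

    def should_warn(self, score: int) -> bool:
        return score < self.warn

    def should_revoke(self, score: int) -> bool:
        return score < self.revocation


# Illustrative values only.
gates = GateThresholds(allow=300, warn=500)
for score in (750, 400, 200):
    print(score, gates.should_allow(score),
          gates.should_warn(score), gates.should_revoke(score))
```

Keeping the predicates separate means a score can be allowed and warned at the same time, which is how a Probationary agent would behave under these placeholder values.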

In [None]:
thresholds = ScoreThresholds()

test_scores = [950, 750, 500, 400, 300, 200, 100]

print(f"{'Score':>6}  {'Tier':<20} {'Allow?':<8} {'Warn?':<8} {'Revoke?':<8}")
print(f"{'─'*6}  {'─'*20} {'─'*8} {'─'*8} {'─'*8}")

for score in test_scores:
    tier = thresholds.get_tier(score)
    allow = thresholds.should_allow(score)
    warn = thresholds.should_warn(score)
    revoke = thresholds.should_revoke(score)
    print(f"{score:>6}  {tier:<20} {'✓' if allow else '✗':<8} {'⚠' if warn else '—':<8} {'☠' if revoke else '—':<8}")

print()
print("Gate logic:")
print(f"  Allow  → score >= {thresholds.allow_threshold}")
print(f"  Warn   → score <  {thresholds.warn_threshold}")
print(f"  Revoke → score <  {thresholds.revocation_threshold}")

## Visualization

Let's visualize how trust scores evolve for agents with different behavior patterns.

In [None]:
try:
    import matplotlib.pyplot as plt
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False

# Build score timelines for all three agents
viz_engine = RewardEngine()

timelines = {"Alice (reliable)": [], "Bob (mixed)": [], "Carol (violator)": []}
agents_map = {
    "Alice (reliable)": "did:mesh:alice-viz",
    "Bob (mixed)": "did:mesh:bob-viz",
    "Carol (violator)": "did:mesh:carol-viz",
}

for step in range(30):
    # Alice: consistently good
    viz_engine.record_policy_compliance(agents_map["Alice (reliable)"], compliant=True)
    viz_engine.record_security_event(agents_map["Alice (reliable)"], True, "normal")
    viz_engine.record_output_quality(agents_map["Alice (reliable)"], True, "peer")

    # Bob: mostly good, occasional issues
    good = step % 4 != 0  # Fails every 4th round
    viz_engine.record_policy_compliance(agents_map["Bob (mixed)"], compliant=good)
    viz_engine.record_security_event(agents_map["Bob (mixed)"], good, "api_call")
    viz_engine.record_output_quality(agents_map["Bob (mixed)"], good, "peer")

    # Carol: starts ok, then increasingly bad
    carol_good = step < 10
    viz_engine.record_policy_compliance(agents_map["Carol (violator)"], compliant=carol_good)
    viz_engine.record_security_event(agents_map["Carol (violator)"], carol_good, "check")
    if step >= 15:
        viz_engine.record_signal(agents_map["Carol (violator)"],
                                  DimensionType.SECURITY_POSTURE, 0.0, "monitor")

    for label, did in agents_map.items():
        s = viz_engine._recalculate_score(did)
        timelines[label].append(s.total_score)

if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(10, 5))
    for label, scores in timelines.items():
        ax.plot(range(1, 31), scores, marker='o', markersize=3, label=label)

    # Threshold lines
    # Labels are built from the constants so they stay correct if thresholds change
    ax.axhline(y=TIER_TRUSTED_THRESHOLD, color='green', linestyle='--', alpha=0.5,
               label=f'Trusted ({TIER_TRUSTED_THRESHOLD})')
    ax.axhline(y=TIER_STANDARD_THRESHOLD, color='orange', linestyle='--', alpha=0.5,
               label=f'Standard ({TIER_STANDARD_THRESHOLD})')
    ax.axhline(y=TRUST_REVOCATION_THRESHOLD, color='red', linestyle='--', alpha=0.5,
               label=f'Revocation ({TRUST_REVOCATION_THRESHOLD})')

    ax.set_xlabel('Interaction Round')
    ax.set_ylabel('Trust Score (0-1000)')
    ax.set_title('Trust Score Evolution by Agent Behavior')
    ax.legend(loc='lower left', fontsize=8)
    ax.set_ylim(0, 1050)
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
else:
    # Text-based fallback
    print("Trust Score Evolution (matplotlib not installed — text fallback)")
    print()
    for label, scores in timelines.items():
        print(f"{label}:")
        for i in range(0, 30, 5):
            bar = '█' * (scores[i] // 25)
            print(f"  Round {i+1:2d}: {scores[i]:4d} {bar}")
        print()