feat: add JSON schema validation for governance policies #305 by jhawpetoss6-collab · Pull Request #367 · microsoft/agent-governance-toolkit

jhawpetoss6-collab · 2026-03-24T02:06:40Z

This PR introduces Pydantic-based schema validation for governance policies (#305).

Changes:

- Adds PolicySchema in packages/agent-compliance/schemas/.
- Supports field-level validation and descriptive error messages.
- Improves misconfiguration detection at policy load time.

/claim #305

github-actions · 2026-03-24T02:06:58Z

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

github-actions

🤖 AI Agent: code-reviewer

Review of PR #305: JSON Schema Validation for Governance Policies

This PR introduces a PolicySchema using Pydantic to validate governance policies. The addition of schema validation is a significant improvement for detecting misconfigurations at policy load time. Below is a detailed review of the changes:

🔴 CRITICAL

Validation of rules Field:

The rules field is defined as List[Dict[str, Any]], which is too permissive. This could allow arbitrary data structures to pass validation, potentially leading to security vulnerabilities or policy misinterpretation.

Actionable Fix: Define a stricter schema for the rules field. For example:

class RuleSchema(BaseModel):
    type: str = Field(..., description="Type of the rule")
    condition: Dict[str, Any] = Field(..., description="Condition for the rule")
    action: str = Field(..., description="Action to be taken")

rules: List[RuleSchema] = Field(default_factory=list)

Lack of Validation for id Field:
- The id field is defined as a str but lacks constraints such as format validation (e.g., UUID or specific pattern). This could lead to non-unique or invalid identifiers being accepted.
- Actionable Fix: Add a regex constraint or use a UUID type:
```
# `regex=` is the Pydantic v1 keyword; Pydantic v2 renamed it to `pattern=`.
id: str = Field(..., regex=r"^[a-zA-Z0-9_-]+$", description="Unique policy identifier")
```
Potential Injection in metadata Field:
- The metadata field allows arbitrary key-value pairs (Dict[str, Any]), which could be exploited for injection attacks if the metadata is used in sensitive contexts (e.g., logging, command execution).
- Actionable Fix: Validate the structure of metadata or sanitize its usage downstream.

🟡 WARNING

Backward Compatibility:
- Introducing schema validation could break existing policies that do not conform to the new schema. This is a breaking change if existing users have policies that do not meet the new requirements.
- Actionable Fix: Provide a migration guide or backward compatibility layer to handle older policy formats.
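
One possible shape for such a compatibility layer, as a minimal sketch only: report schema violations as warnings by default and raise only when strict mode is requested. The names `load_policy`, `strict`, and `require_name` are hypothetical and do not come from the PR; the `validate` callable stands in for whatever check wraps PolicySchema. The sketch uses only the standard library so it does not depend on a particular Pydantic version.

```python
import warnings
from typing import Any, Callable, Dict, List

# Hypothetical contract: a validator returns a list of human-readable
# problems, and an empty list when the policy is acceptable.
Validator = Callable[[Dict[str, Any]], List[str]]


def load_policy(data: Dict[str, Any], validate: Validator, strict: bool = False) -> Dict[str, Any]:
    """Return the policy unchanged; warn about violations, or raise if strict=True."""
    problems = validate(data)
    if problems and strict:
        raise ValueError("invalid policy: " + "; ".join(problems))
    policy_id = data.get("id", "<unknown>")
    for problem in problems:
        # Lenient mode: surface the problem without rejecting the policy.
        warnings.warn(f"policy {policy_id}: {problem}", UserWarning, stacklevel=2)
    return data


def require_name(data: Dict[str, Any]) -> List[str]:
    """Toy validator used only for illustration."""
    return [] if "name" in data else ["missing required field 'name'"]
```

Existing deployments could start in lenient mode, fix the reported policies, and then switch `strict` on once the warnings stop.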

💡 SUGGESTIONS

Descriptive Error Messages:

While Pydantic provides error messages, they may not always be user-friendly. Consider customizing error messages for common validation failures to improve usability.

Example:

import re
from pydantic import validator

# Defined inside PolicySchema
@validator("id")
def validate_id(cls, value):
    if not re.match(r"^[a-zA-Z0-9_-]+$", value):
        raise ValueError("Policy ID must be alphanumeric and can include '-' or '_'.")
    return value

Schema Versioning:
- The version field defaults to "1.0.0", but there is no mechanism to enforce or validate schema versions. This could lead to issues when policies evolve.
- Actionable Fix: Add a version validation mechanism or use an enum for supported versions:
```
from enum import Enum

class PolicyVersion(str, Enum):
    v1_0_0 = "1.0.0"
    v1_1_0 = "1.1.0"

version: PolicyVersion = Field(PolicyVersion.v1_0_0)
```
Test Coverage:
- Ensure that comprehensive tests are added for the new schema validation, including edge cases (e.g., missing fields, invalid types, malformed rules).
- Actionable Fix: Add unit tests in the corresponding test suite, e.g., tests/test_policy_schema.py.
Documentation:
- Update the documentation to include examples of valid and invalid policies, along with the expected error messages for invalid cases.

Summary of Feedback

🔴 CRITICAL: Tighten validation for rules, id, and metadata fields to prevent security vulnerabilities.
🟡 WARNING: Address potential backward compatibility issues by providing a migration guide or compatibility layer.
💡 SUGGESTION: Improve error messages, add schema versioning, and ensure comprehensive test coverage.

This PR is a solid step forward in improving policy validation but requires additional safeguards to ensure correctness and security.

github-actions · 2026-03-24T02:07:11Z

🤖 AI Agent: security-scanner — Security Review of PR #305: JSON Schema Validation for Governance Policies

Summary

This PR introduces a Pydantic-based schema validation for governance policies. While this is a positive step toward ensuring that policies conform to a defined structure, there are potential security concerns that need to be addressed.

Findings

1. Deserialization Attacks

Severity: 🔴 CRITICAL
Issue: The validate_policy function directly deserializes untrusted input (data) into a PolicySchema object using PolicySchema(**data). If the data contains malicious payloads, this could lead to security vulnerabilities, especially if the rules or metadata fields are used in downstream logic without further sanitization.
Attack Vector: An attacker could craft a malicious JSON payload that exploits downstream logic, such as injecting unexpected data types or structures into rules or metadata. For example, if rules are expected to contain specific key-value pairs, an attacker could inject unexpected keys or values that bypass policy enforcement.
Fix:

Add stricter validation for the rules and metadata fields. For example, define a specific schema for rules instead of using Dict[str, Any].
Use Pydantic's Strict* types (e.g., StrictStr, StrictInt) to enforce type safety.
Add custom validators for fields that require additional checks (e.g., ensuring rules contain only allowed keys).
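
The allowed-keys check mentioned above could look roughly like the following. This is a sketch under assumptions: the key names are borrowed from the code-reviewer's RuleSchema example (`type`, `condition`, `action`) and are not confirmed against the actual policy structures, and `check_rule_keys` is a hypothetical helper. The same logic could be moved into a Pydantic validator.

```python
from typing import Any, List

# Assumed rule shape, taken from the reviewer's RuleSchema example above.
ALLOWED_RULE_KEYS = frozenset({"type", "condition", "action"})
REQUIRED_RULE_KEYS = frozenset({"type", "action"})


def check_rule_keys(rule: Any) -> List[str]:
    """Return a list of problems with the rule's keys; empty means the keys are fine."""
    if not isinstance(rule, dict):
        return [f"rule must be an object, got {type(rule).__name__}"]
    problems: List[str] = []
    present = set(rule)
    unknown = sorted(str(key) for key in present - ALLOWED_RULE_KEYS)
    missing = sorted(REQUIRED_RULE_KEYS - present)
    if unknown:
        problems.append("unknown keys: " + ", ".join(unknown))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    return problems
```

Rejecting unknown keys outright, rather than silently dropping them, makes an attempted injection visible at load time.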

2. Prompt Injection Defense Bypass

Severity: 🟠 HIGH
Issue: The rules field is defined as a list of dictionaries with arbitrary keys and values (Dict[str, Any]). This could allow an attacker to inject malicious prompts or commands into the policy rules, potentially bypassing prompt injection defenses if these rules are used to generate prompts dynamically.
Attack Vector: If the rules are used to construct prompts for an AI agent, an attacker could inject malicious instructions into the rules field, leading to unintended behavior by the agent.
Fix:

Define a strict schema for rules that limits the structure and content of the dictionaries.
Implement sanitization and escaping for any user-provided strings that may be used in prompt generation.

3. Policy Engine Circumvention

Severity: 🟡 MEDIUM
Issue: The id and version fields are not validated beyond basic type checks. An attacker could potentially exploit this to inject invalid or conflicting policy versions, leading to policy engine misbehavior or circumvention.
Attack Vector: An attacker could submit a policy with a duplicate id but a higher version number, potentially overriding an existing policy with malicious rules.
Fix:

Enforce uniqueness for the id field at the database or policy storage layer.
Validate the version field to ensure it follows semantic versioning and does not conflict with existing versions.
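
A rough illustration of both fixes, assuming an in-memory store: `PolicyRegistry`, `register`, and `allow_update` are hypothetical names, and a real deployment would enforce the same rule in the database or policy storage layer. The version check covers only the core MAJOR.MINOR.PATCH form, not pre-release or build metadata.

```python
import re
from typing import Dict, Tuple

# Simplified pattern: core MAJOR.MINOR.PATCH only.
_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    match = _VERSION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


class PolicyRegistry:
    """Hypothetical in-memory registry keyed by policy id."""

    def __init__(self) -> None:
        self._versions: Dict[str, Tuple[int, int, int]] = {}

    def register(self, policy_id: str, version: str, allow_update: bool = False) -> None:
        new = parse_version(version)
        current = self._versions.get(policy_id)
        if current is not None:
            # A duplicate id is rejected unless the caller opts in explicitly.
            if not allow_update:
                raise ValueError(f"policy id {policy_id!r} is already registered")
            if new <= current:
                raise ValueError(f"version {version} does not supersede the registered version")
        self._versions[policy_id] = new
```

Requiring an explicit `allow_update` means a higher version number alone is not enough to replace an existing policy, which is the override scenario described in the attack vector.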

4. Credential Exposure

Severity: 🔵 LOW
Issue: The metadata field allows arbitrary key-value pairs, which could inadvertently include sensitive information such as API keys or credentials. If this data is logged or exposed, it could lead to credential leakage.
Attack Vector: If a user includes sensitive information in the metadata field and this field is logged or exposed in error messages, an attacker could gain access to these credentials.
Fix:

Explicitly disallow sensitive keys (e.g., api_key, password) in the metadata field using a custom validator.
Avoid logging the contents of the metadata field or sanitize it before logging.
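
Both fixes can be sketched in plain Python. The helpers below are hypothetical (`find_sensitive_keys`, `redact`), and the marker list is illustrative rather than a vetted deny-list. Unlike an exact top-level match, this version is case-insensitive and also walks nested dictionaries and lists.

```python
from typing import Any, List

# Illustrative markers only; a real deny-list would be project-specific.
SENSITIVE_MARKERS = ("api_key", "password", "secret", "token")


def _is_sensitive(key: str) -> bool:
    lowered = key.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def find_sensitive_keys(value: Any, path: str = "metadata") -> List[str]:
    """Return paths of keys that look sensitive, searching nested dicts and lists."""
    found: List[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if _is_sensitive(str(key)):
                found.append(child_path)
            found.extend(find_sensitive_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_sensitive_keys(child, f"{path}[{index}]"))
    return found


def redact(value: Any) -> Any:
    """Return a copy that is safer to log: values under sensitive-looking keys are masked."""
    if isinstance(value, dict):
        return {k: "***" if _is_sensitive(str(k)) else redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value
```

Substring matching will produce false positives (for example a key named `tokenizer`), so whether to reject or merely redact is a design choice for the maintainers.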

Recommendations

Harden Schema Definitions: Replace Dict[str, Any] with stricter schemas for rules and metadata. Use Pydantic's Strict* types and custom validators to enforce constraints.
Sanitize Inputs: Ensure that all user-provided inputs are sanitized and escaped, especially if they are used in prompt generation or other downstream logic.
Enforce Uniqueness and Versioning: Implement checks to ensure id uniqueness and validate version against semantic versioning standards.
Avoid Logging Sensitive Data: Ensure that sensitive information in metadata is not logged or exposed in error messages.

Suggested Fix (Code Example)

Here’s an updated version of the PolicySchema with stricter validation:

import re
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field, StrictStr, validator

class RuleSchema(BaseModel):
    action: StrictStr = Field(..., description="Action to be taken")
    resource: StrictStr = Field(..., description="Resource to which the rule applies")
    condition: Optional[Dict[StrictStr, Any]] = Field(default_factory=dict, description="Conditions for the rule")

    @validator("condition", pre=True, always=True)
    def validate_condition(cls, value):
        # Add custom validation logic for conditions if needed
        return value

class PolicySchema(BaseModel):
    """
    JSON Schema for governance policies.
    Addresses request for better validation (#305).
    """
    id: StrictStr = Field(..., description="Unique policy identifier")
    name: StrictStr = Field(..., description="Human-readable policy name")
    version: StrictStr = Field("1.0.0", description="Policy version in semantic versioning format")
    rules: List[RuleSchema] = Field(default_factory=list, description="List of policy rules")
    metadata: Optional[Dict[StrictStr, Any]] = Field(default_factory=dict, description="Additional metadata")

    @validator("version")
    def validate_version(cls, value):
        # Validate semantic versioning (e.g., using a regex or a library)
        if not re.match(r"^\d+\.\d+\.\d+$", value):
            raise ValueError("Version must follow semantic versioning (e.g., 1.0.0)")
        return value

    @validator("metadata", pre=True, always=True)
    def sanitize_metadata(cls, value):
        # Treat a missing or null metadata value as empty
        if value is None:
            return {}
        # Disallow sensitive keys in metadata
        sensitive_keys = {"api_key", "password", "secret"}
        if isinstance(value, dict) and any(key in sensitive_keys for key in value):
            raise ValueError("Metadata contains sensitive keys")
        return value

def validate_policy(data: Dict[str, Any]):
    return PolicySchema(**data)

Final Assessment

Merge Readiness: 🚫 Not Ready
Required Changes: Address the critical and high-severity issues before merging.

imran-siddique

Good foundation — Pydantic is the right approach and correct directory. Needs expansion per #305: add separate schemas for access/content/cost/trust policies, add PolicyValidationError custom exception, integrate with policy loading path, add tests with valid/invalid policy files. Check packages/agent-compliance/src/ for actual policy structures.
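
As a rough sketch of the requested exception and load-path integration, assuming policies arrive as JSON text: the attributes on `PolicyValidationError`, the `load_policy_text` helper, and its `validate` parameter are all hypothetical and are not taken from the merged code. The `except` clause is assumed to cover Pydantic's ValidationError, which derives from ValueError.

```python
import json
from typing import Any, Callable, Dict


class PolicyValidationError(Exception):
    """Raised when a policy cannot be parsed or fails validation (hypothetical shape)."""

    def __init__(self, source: str, reason: str) -> None:
        super().__init__(f"{source}: {reason}")
        self.source = source
        self.reason = reason


def load_policy_text(
    text: str,
    validate: Callable[[Dict[str, Any]], Any],
    source: str = "<string>",
) -> Any:
    """Parse JSON text and validate it, converting failures into PolicyValidationError."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise PolicyValidationError(source, f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise PolicyValidationError(source, "top-level value must be an object")
    try:
        return validate(data)
    except (ValueError, TypeError) as exc:
        # Keep the original exception as __cause__ for debugging.
        raise PolicyValidationError(source, str(exc)) from exc
```

Callers on the loading path then need to catch only one exception type, and the `source` field tells them which file was at fault.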

PR microsoft#367 (PolicySchema): Add missing __init__.py for schemas package, add PolicyValidationError exception class, add return type hint and docstring to validate_policy(), wrap in try/except per issue microsoft#305 acceptance criteria. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: implement JSON schema validation for governance policies micros…

e4e14f0

…oft#305

jhawpetoss6-collab requested a review from imran-siddique as a code owner March 24, 2026 02:06

github-actions bot added the size/S Small PR (< 50 lines) label Mar 24, 2026

github-actions bot reviewed Mar 24, 2026

imran-siddique requested changes Mar 24, 2026

imran-siddique merged commit de6a485 into microsoft:main Mar 24, 2026
9 of 10 checks passed

imran-siddique mentioned this pull request Mar 24, 2026

fix: followup cleanup for merged community PRs #393

Merged

imran-siddique mentioned this pull request Mar 24, 2026

fix: post-merge review — dep confusion + pydantic dependency #411

Merged

imran-siddique pushed a commit to imran-siddique/agent-governance-toolkit that referenced this pull request Mar 24, 2026

feat: implement JSON schema validation for governance policies micros…

979c133

…oft#305 (microsoft#367) Co-authored-by: Justin Skywork <justin@skywork.ai>

imran-siddique merged 1 commit into microsoft:main from jhawpetoss6-collab:strike/policy-schema-validation
