feat: add JSON schema validation for governance policies #305 (#367)

Merged
imran-siddique merged 1 commit into microsoft:main from jhawpetoss6-collab:strike/policy-schema-validation on Mar 24, 2026

Conversation

@jhawpetoss6-collab
Contributor

This PR introduces Pydantic-based schema validation for governance policies (#305).

Changes:

  • Added PolicySchema in packages/agent-compliance/schemas/.
  • Support for field-level validation and descriptive error messages.
  • Improves misconfiguration detection at policy load time.

/claim #305

@github-actions

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

@github-actions github-actions bot added the size/S Small PR (< 50 lines) label Mar 24, 2026

@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review of PR #367 (issue #305): JSON Schema Validation for Governance Policies

This PR introduces a PolicySchema using Pydantic to validate governance policies. The addition of schema validation is a significant improvement for detecting misconfigurations at policy load time. Below is a detailed review of the changes:


🔴 CRITICAL

  1. Validation of rules Field:

    • The rules field is defined as List[Dict[str, Any]], which is too permissive. This could allow arbitrary data structures to pass validation, potentially leading to security vulnerabilities or policy misinterpretation.
    • Actionable Fix: Define a stricter schema for the rules field. For example:
      from typing import Any, Dict, List
      from pydantic import BaseModel, Field

      class RuleSchema(BaseModel):
          type: str = Field(..., description="Type of the rule")
          condition: Dict[str, Any] = Field(..., description="Condition for the rule")
          action: str = Field(..., description="Action to be taken")

      class PolicySchema(BaseModel):  # other fields omitted
          rules: List[RuleSchema] = Field(default_factory=list)
  2. Lack of Validation for id Field:

    • The id field is defined as a str but lacks constraints such as format validation (e.g., UUID or specific pattern). This could lead to non-unique or invalid identifiers being accepted.
    • Actionable Fix: Add a pattern constraint (the regex= argument in Pydantic v1, pattern= in Pydantic v2) or use a UUID type:
      id: str = Field(..., regex=r"^[a-zA-Z0-9_-]+$", description="Unique policy identifier")
  3. Potential Injection in metadata Field:

    • The metadata field allows arbitrary key-value pairs (Dict[str, Any]), which could be exploited for injection attacks if the metadata is used in sensitive contexts (e.g., logging, command execution).
    • Actionable Fix: Validate the structure of metadata or sanitize its usage downstream.

🟡 WARNING

  1. Backward Compatibility:
    • Introducing schema validation could break existing policies that do not conform to the new schema. This is a breaking change if existing users have policies that do not meet the new requirements.
    • Actionable Fix: Provide a migration guide or backward compatibility layer to handle older policy formats.
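
A compatibility layer of the kind suggested above could be as small as a function that reshapes older policies before they reach the validator. The sketch below is illustrative only and uses just the standard library: the legacy field name `policy_id` and the helper `upgrade_legacy_policy` are hypothetical, and the real legacy formats would have to be taken from the repository.

```python
from typing import Any, Dict

# Default version named in the review; treated here as the current schema version.
CURRENT_VERSION = "1.0.0"


def upgrade_legacy_policy(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `data` shaped like the new schema.

    Assumption (hypothetical): older policies used `policy_id` instead of
    `id` and were allowed to omit `version`, `rules`, and `metadata`.
    """
    upgraded = dict(data)  # shallow copy so the caller's dict is not mutated
    if "id" not in upgraded and "policy_id" in upgraded:
        upgraded["id"] = upgraded.pop("policy_id")
    upgraded.setdefault("version", CURRENT_VERSION)
    upgraded.setdefault("rules", [])
    upgraded.setdefault("metadata", {})
    return upgraded
```

Running the upgrade first and validating second keeps the schema itself strict while older files continue to load.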

💡 SUGGESTIONS

  1. Descriptive Error Messages:

    • While Pydantic provides error messages, they may not always be user-friendly. Consider customizing error messages for common validation failures to improve usability.
    • Example:
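      import re
      from pydantic import BaseModel, validator

      class PolicySchema(BaseModel):
          id: str  # other fields omitted

          @validator("id")
          def validate_id(cls, value):
              if not re.fullmatch(r"[a-zA-Z0-9_-]+", value):
                  raise ValueError("Policy ID must be alphanumeric and can include '-' or '_'.")
              return value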
  2. Schema Versioning:

    • The version field defaults to "1.0.0", but there is no mechanism to enforce or validate schema versions. This could lead to issues when policies evolve.
    • Actionable Fix: Add a version validation mechanism or use an enum for supported versions:
      from enum import Enum
      
      class PolicyVersion(str, Enum):
          v1_0_0 = "1.0.0"
          v1_1_0 = "1.1.0"
      
      version: PolicyVersion = Field(PolicyVersion.v1_0_0)
  3. Test Coverage:

    • Ensure that comprehensive tests are added for the new schema validation, including edge cases (e.g., missing fields, invalid types, malformed rules).
    • Actionable Fix: Add unit tests in the corresponding test suite, e.g., tests/test_policy_schema.py.
  4. Documentation:

    • Update the documentation to include examples of valid and invalid policies, along with the expected error messages for invalid cases.

Summary of Feedback

  • 🔴 CRITICAL: Tighten validation for rules, id, and metadata fields to prevent security vulnerabilities.
  • 🟡 WARNING: Address potential backward compatibility issues by providing a migration guide or compatibility layer.
  • 💡 SUGGESTION: Improve error messages, add schema versioning, and ensure comprehensive test coverage.

This PR is a solid step forward in improving policy validation but requires additional safeguards to ensure correctness and security.

@github-actions

🤖 AI Agent: security-scanner

Security Review of PR #367 (issue #305): JSON Schema Validation for Governance Policies

Summary

This PR introduces a Pydantic-based schema validation for governance policies. While this is a positive step toward ensuring that policies conform to a defined structure, there are potential security concerns that need to be addressed.


Findings

1. Deserialization Attacks

Severity: 🔴 CRITICAL
Issue: The validate_policy function directly deserializes untrusted input (data) into a PolicySchema object using PolicySchema(**data). If the data contains malicious payloads, this could lead to security vulnerabilities, especially if the rules or metadata fields are used in downstream logic without further sanitization.
Attack Vector: An attacker could craft a malicious JSON payload that exploits downstream logic, such as injecting unexpected data types or structures into rules or metadata. For example, if rules are expected to contain specific key-value pairs, an attacker could inject unexpected keys or values that bypass policy enforcement.
Fix:

  • Add stricter validation for the rules and metadata fields. For example, define a specific schema for rules instead of using Dict[str, Any].
  • Use Pydantic's Strict* types (e.g., StrictStr, StrictInt) to enforce type safety.
  • Add custom validators for fields that require additional checks (e.g., ensuring rules contain only allowed keys).
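
The "only allowed keys" check in the last bullet can be illustrated without Pydantic. The following is a rough standard-library sketch, not the PR's code: the key sets are assumptions borrowed from the RuleSchema suggested later in this review, and `check_rules` is a hypothetical helper name.

```python
from typing import Any, List

# Assumed key sets, based on the reviewer's suggested RuleSchema fields.
ALLOWED_RULE_KEYS = {"action", "resource", "condition"}
REQUIRED_RULE_KEYS = {"action", "resource"}


def check_rules(rules: List[Any]) -> List[str]:
    """Return human-readable problems; an empty list means the rules pass."""
    problems = []
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(
                f"rules[{index}]: expected an object, got {type(rule).__name__}"
            )
            continue
        keys = {str(key) for key in rule}
        unknown = sorted(keys - ALLOWED_RULE_KEYS)
        missing = sorted(REQUIRED_RULE_KEYS - keys)
        if unknown:
            problems.append(f"rules[{index}]: unknown keys {unknown}")
        if missing:
            problems.append(f"rules[{index}]: missing keys {missing}")
    return problems
```

Collecting every problem instead of stopping at the first one lets a policy author fix a file in a single pass.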

2. Prompt Injection Defense Bypass

Severity: 🟠 HIGH
Issue: The rules field is defined as a list of dictionaries with arbitrary keys and values (Dict[str, Any]). This could allow an attacker to inject malicious prompts or commands into the policy rules, potentially bypassing prompt injection defenses if these rules are used to generate prompts dynamically.
Attack Vector: If the rules are used to construct prompts for an AI agent, an attacker could inject malicious instructions into the rules field, leading to unintended behavior by the agent.
Fix:

  • Define a strict schema for rules that limits the structure and content of the dictionaries.
  • Implement sanitization and escaping for any user-provided strings that may be used in prompt generation.

3. Policy Engine Circumvention

Severity: 🟡 MEDIUM
Issue: The id and version fields are not validated beyond basic type checks. An attacker could potentially exploit this to inject invalid or conflicting policy versions, leading to policy engine misbehavior or circumvention.
Attack Vector: An attacker could submit a policy with a duplicate id but a higher version number, potentially overriding an existing policy with malicious rules.
Fix:

  • Enforce uniqueness for the id field at the database or policy storage layer.
  • Validate the version field to ensure it follows semantic versioning and does not conflict with existing versions.
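
As a hedged illustration of the two fixes above, the sketch below parses a simplified MAJOR.MINOR.PATCH version and refuses to register a policy whose version does not supersede the one already stored for the same id. The in-memory `registry` dict and the `register_policy` helper are hypothetical stand-ins for whatever storage layer the toolkit uses. The check does not decide who is allowed to publish a newer version, which the attack vector above would also require.

```python
import re
from typing import Dict, Tuple

Version = Tuple[int, int, int]

# Simplified semantic version: no pre-release or build metadata, no leading zeros.
_SEMVER = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints, or raise ValueError."""
    match = _SEMVER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def register_policy(registry: Dict[str, Version], policy_id: str, version: str) -> None:
    """Record `version` for `policy_id` only if it is strictly newer."""
    new_version = parse_version(version)
    current = registry.get(policy_id)
    if current is not None and new_version <= current:
        raise ValueError(
            f"policy {policy_id!r}: version {version} does not supersede "
            "the registered version"
        )
    registry[policy_id] = new_version
```

Comparing integer tuples rather than strings matters here: as strings, "1.9.0" would sort after "1.10.0".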

4. Credential Exposure

Severity: 🔵 LOW
Issue: The metadata field allows arbitrary key-value pairs, which could inadvertently include sensitive information such as API keys or credentials. If this data is logged or exposed, it could lead to credential leakage.
Attack Vector: If a user includes sensitive information in the metadata field and this field is logged or exposed in error messages, an attacker could gain access to these credentials.
Fix:

  • Explicitly disallow sensitive keys (e.g., api_key, password) in the metadata field using a custom validator.
  • Avoid logging the contents of the metadata field or sanitize it before logging.
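
For the second bullet, one possible approach is to mask sensitive values before metadata reaches a log line. This is a minimal standard-library sketch under stated assumptions: the deny-list extends the review's examples (api_key, password) with a few guesses, and `redact_metadata` is a hypothetical helper, not part of the PR.

```python
from typing import Any, Dict

# Assumed deny-list; the review names api_key and password as examples.
SENSITIVE_KEYS = {"api_key", "password", "secret", "token"}
MASK = "***"


def redact_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `metadata` with sensitive values masked.

    Keys are compared case-insensitively, and nested dicts are redacted
    recursively. The input dict is left unchanged.
    """
    redacted: Dict[str, Any] = {}
    for key, value in metadata.items():
        if str(key).lower() in SENSITIVE_KEYS:
            redacted[key] = MASK
        elif isinstance(value, dict):
            redacted[key] = redact_metadata(value)
        else:
            redacted[key] = value
    return redacted
```

A key-name deny-list catches only obvious cases; secrets stored under unexpected keys or inside lists pass through unmasked.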

Recommendations

  1. Harden Schema Definitions: Replace Dict[str, Any] with stricter schemas for rules and metadata. Use Pydantic's Strict* types and custom validators to enforce constraints.
  2. Sanitize Inputs: Ensure that all user-provided inputs are sanitized and escaped, especially if they are used in prompt generation or other downstream logic.
  3. Enforce Uniqueness and Versioning: Implement checks to ensure id uniqueness and validate version against semantic versioning standards.
  4. Avoid Logging Sensitive Data: Ensure that sensitive information in metadata is not logged or exposed in error messages.

Suggested Fix (Code Example)

Here’s an updated version of the PolicySchema with stricter validation:

import re
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field, StrictStr, validator

class RuleSchema(BaseModel):
    action: StrictStr = Field(..., description="Action to be taken")
    resource: StrictStr = Field(..., description="Resource to which the rule applies")
    condition: Optional[Dict[StrictStr, Any]] = Field(default_factory=dict, description="Conditions for the rule")

    @validator("condition", pre=True, always=True)
    def validate_condition(cls, value):
        # Placeholder: add custom validation logic for conditions if needed
        return value

class PolicySchema(BaseModel):
    """
    JSON Schema for governance policies.
    Addresses request for better validation (#305).
    """
    id: StrictStr = Field(..., description="Unique policy identifier")
    name: StrictStr = Field(..., description="Human-readable policy name")
    version: StrictStr = Field("1.0.0", description="Policy version in semantic versioning format")
    rules: List[RuleSchema] = Field(default_factory=list, description="List of policy rules")
    metadata: Optional[Dict[StrictStr, Any]] = Field(default_factory=dict, description="Additional metadata")

    @validator("version")
    def validate_version(cls, value):
        # Accept only MAJOR.MINOR.PATCH; fullmatch also rejects a trailing newline
        if not re.fullmatch(r"\d+\.\d+\.\d+", value):
            raise ValueError("Version must follow semantic versioning (e.g., 1.0.0)")
        return value

    @validator("metadata", pre=True, always=True)
    def sanitize_metadata(cls, value):
        # pre=True runs before type checks, so value may be None or a non-dict;
        # leave those cases to the field's own type validation
        if not isinstance(value, dict):
            return value
        # Disallow sensitive keys in metadata
        sensitive_keys = {"api_key", "password", "secret"}
        if any(key in sensitive_keys for key in value):
            raise ValueError("Metadata contains sensitive keys")
        return value

def validate_policy(data: Dict[str, Any]) -> PolicySchema:
    return PolicySchema(**data)

Final Assessment

  • Merge Readiness: 🚫 Not Ready
  • Required Changes: Address the critical and high-severity issues before merging.

Member

@imran-siddique imran-siddique left a comment


Good foundation — Pydantic is the right approach and correct directory. Needs expansion per #305: add separate schemas for access/content/cost/trust policies, add PolicyValidationError custom exception, integrate with policy loading path, add tests with valid/invalid policy files. Check packages/agent-compliance/src/ for actual policy structures.
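
The requested PolicyValidationError and its integration with the loading path might look roughly like the sketch below. It is a sketch under stated assumptions, not the code that was merged: it assumes policies are stored as JSON files, `load_policy` is a hypothetical helper, and the `validate` callable stands in for the PR's validate_policy. Pydantic's ValidationError should be caught by the `ValueError` clause, since it derives from ValueError.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Union


class PolicyValidationError(Exception):
    """Raised when a policy file cannot be read, parsed, or validated."""

    def __init__(self, source: str, reason: str) -> None:
        super().__init__(f"{source}: {reason}")
        self.source = source
        self.reason = reason


def load_policy(
    path: Union[str, Path],
    validate: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Read a JSON policy file and return whatever `validate` returns.

    Every failure is reported as PolicyValidationError so that callers
    handle a single exception type at policy load time.
    """
    source = str(path)
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
        raise PolicyValidationError(source, f"cannot read policy: {exc}") from exc
    if not isinstance(data, dict):
        raise PolicyValidationError(source, "top-level JSON value must be an object")
    try:
        return validate(data)
    except (ValueError, TypeError) as exc:
        raise PolicyValidationError(source, str(exc)) from exc
```

Keeping the file path in the exception message tells the operator which policy file to fix when several are loaded at startup.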

@imran-siddique imran-siddique merged commit de6a485 into microsoft:main Mar 24, 2026
9 of 10 checks passed
imran-siddique added a commit to imran-siddique/agent-governance-toolkit that referenced this pull request Mar 24, 2026
PR microsoft#367 (PolicySchema): Add missing __init__.py for schemas package,
add PolicyValidationError exception class, add return type hint and
docstring to validate_policy(), wrap in try/except per issue microsoft#305
acceptance criteria.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
imran-siddique added a commit that referenced this pull request Mar 24, 2026
PR #367 (PolicySchema): Add missing __init__.py for schemas package,
add PolicyValidationError exception class, add return type hint and
docstring to validate_policy(), wrap in try/except per issue #305
acceptance criteria.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
imran-siddique pushed a commit to imran-siddique/agent-governance-toolkit that referenced this pull request Mar 24, 2026