feat: add JSON schema validation for governance policies #305#367
Conversation
|
Welcome to the Agent Governance Toolkit! Thanks for your first pull request. |
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review of PR #305: JSON Schema Validation for Governance Policies
This PR introduces a PolicySchema using Pydantic to validate governance policies. The addition of schema validation is a significant improvement for detecting misconfigurations at policy load time. Below is a detailed review of the changes:
🔴 CRITICAL
-
Validation of
rulesField:- The
rulesfield is defined asList[Dict[str, Any]], which is too permissive. This could allow arbitrary data structures to pass validation, potentially leading to security vulnerabilities or policy misinterpretation. - Actionable Fix: Define a stricter schema for the
rulesfield. For example:class RuleSchema(BaseModel): type: str = Field(..., description="Type of the rule") condition: Dict[str, Any] = Field(..., description="Condition for the rule") action: str = Field(..., description="Action to be taken") rules: List[RuleSchema] = Field(default_factory=list)
- The
-
Lack of Validation for
idField:- The
idfield is defined as astrbut lacks constraints such as format validation (e.g., UUID or specific pattern). This could lead to non-unique or invalid identifiers being accepted. - Actionable Fix: Add a regex constraint or use a UUID type:
id: str = Field(..., regex=r"^[a-zA-Z0-9-_]+$", description="Unique policy identifier")
- The
-
Potential Injection in
metadataField:- The
metadatafield allows arbitrary key-value pairs (Dict[str, Any]), which could be exploited for injection attacks if the metadata is used in sensitive contexts (e.g., logging, command execution). - Actionable Fix: Validate the structure of
metadataor sanitize its usage downstream.
- The
🟡 WARNING
- Backward Compatibility:
- Introducing schema validation could break existing policies that do not conform to the new schema. This is a breaking change if existing users have policies that do not meet the new requirements.
- Actionable Fix: Provide a migration guide or backward compatibility layer to handle older policy formats.
💡 SUGGESTIONS
-
Descriptive Error Messages:
- While Pydantic provides error messages, they may not always be user-friendly. Consider customizing error messages for common validation failures to improve usability.
- Example:
@validator("id") def validate_id(cls, value): if not re.match(r"^[a-zA-Z0-9-_]+$", value): raise ValueError("Policy ID must be alphanumeric and can include '-' or '_'.") return value
-
Schema Versioning:
- The
versionfield defaults to"1.0.0", but there is no mechanism to enforce or validate schema versions. This could lead to issues when policies evolve. - Actionable Fix: Add a version validation mechanism or use an enum for supported versions:
from enum import Enum class PolicyVersion(str, Enum): v1_0_0 = "1.0.0" v1_1_0 = "1.1.0" version: PolicyVersion = Field(PolicyVersion.v1_0_0)
- The
-
Test Coverage:
- Ensure that comprehensive tests are added for the new schema validation, including edge cases (e.g., missing fields, invalid types, malformed
rules). - Actionable Fix: Add unit tests in the corresponding test suite, e.g.,
tests/test_policy_schema.py.
- Ensure that comprehensive tests are added for the new schema validation, including edge cases (e.g., missing fields, invalid types, malformed
-
Documentation:
- Update the documentation to include examples of valid and invalid policies, along with the expected error messages for invalid cases.
Summary of Feedback
- 🔴 CRITICAL: Tighten validation for
rules,id, andmetadatafields to prevent security vulnerabilities. - 🟡 WARNING: Address potential backward compatibility issues by providing a migration guide or compatibility layer.
- 💡 SUGGESTION: Improve error messages, add schema versioning, and ensure comprehensive test coverage.
This PR is a solid step forward in improving policy validation but requires additional safeguards to ensure correctness and security.
🤖 AI Agent: security-scanner — Security Review of PR #305: JSON Schema Validation for Governance PoliciesSecurity Review of PR #305: JSON Schema Validation for Governance PoliciesSummaryThis PR introduces a Pydantic-based schema validation for governance policies. While this is a positive step toward ensuring that policies conform to a defined structure, there are potential security concerns that need to be addressed. Findings1. Deserialization AttacksSeverity: 🔴 CRITICAL
2. Prompt Injection Defense BypassSeverity: 🟠 HIGH
3. Policy Engine CircumventionSeverity: 🟡 MEDIUM
4. Credential ExposureSeverity: 🔵 LOW
Recommendations
Suggested Fix (Code Example)Here’s an updated version of the from pydantic import BaseModel, Field, StrictStr, StrictInt, validator
from typing import List, Optional, Dict, Any
class RuleSchema(BaseModel):
action: StrictStr = Field(..., description="Action to be taken")
resource: StrictStr = Field(..., description="Resource to which the rule applies")
condition: Optional[Dict[StrictStr, Any]] = Field(default_factory=dict, description="Conditions for the rule")
@validator("condition", pre=True, always=True)
def validate_condition(cls, value):
# Add custom validation logic for conditions if needed
return value
class PolicySchema(BaseModel):
"""
JSON Schema for governance policies.
Addresses request for better validation (#305).
"""
id: StrictStr = Field(..., description="Unique policy identifier")
name: StrictStr = Field(..., description="Human-readable policy name")
version: StrictStr = Field("1.0.0", description="Policy version in semantic versioning format")
rules: List[RuleSchema] = Field(default_factory=list, description="List of policy rules")
metadata: Optional[Dict[StrictStr, Any]] = Field(default_factory=dict, description="Additional metadata")
@validator("version")
def validate_version(cls, value):
# Validate semantic versioning (e.g., using a regex or a library)
if not re.match(r"^\d+\.\d+\.\d+$", value):
raise ValueError("Version must follow semantic versioning (e.g., 1.0.0)")
return value
@validator("metadata", pre=True, always=True)
def sanitize_metadata(cls, value):
# Disallow sensitive keys in metadata
sensitive_keys = {"api_key", "password", "secret"}
if any(key in sensitive_keys for key in value.keys()):
raise ValueError("Metadata contains sensitive keys")
return value
def validate_policy(data: Dict[str, Any]):
return PolicySchema(**data)Final Assessment
|
imran-siddique
left a comment
There was a problem hiding this comment.
Good foundation — Pydantic is the right approach and correct directory. Needs expansion per #305: add separate schemas for access/content/cost/trust policies, add PolicyValidationError custom exception, integrate with policy loading path, add tests with valid/invalid policy files. Check packages/agent-compliance/src/ for actual policy structures.
PR microsoft#367 (PolicySchema): Add missing __init__.py for schemas package, add PolicyValidationError exception class, add return type hint and docstring to validate_policy(), wrap in try/except per issue microsoft#305 acceptance criteria. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oft#305 (microsoft#367) Co-authored-by: Justin Skywork <justin@skywork.ai>
PR microsoft#367 (PolicySchema): Add missing __init__.py for schemas package, add PolicyValidationError exception class, add return type hint and docstring to validate_policy(), wrap in try/except per issue microsoft#305 acceptance criteria. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This PR introduces Pydantic-based schema validation for governance policies (#305).
Changes:
PolicySchemainpackages/agent-compliance/schemas/./claim #305