feat: Removed python schema files & support JSON Schema uploads and updates to related deployment scripts #566
Merged
Prajwal-Microsoft merged 14 commits into dev on May 6, 2026
added 3 commits
April 28, 2026 10:05
Adds a parallel JSON Schema upload path so schemas can be authored as data instead of executable Python. The worker materialises Pydantic models from JSON in memory (no exec) via the new remote_schema_loader. Legacy .py uploads continue to work unchanged. M1 of the migration plan.
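The "no exec" idea in this commit can be illustrated with a small sketch. This is not the PR's remote_schema_loader: it uses the standard library's dataclasses.make_dataclass as a stand-in for Pydantic model creation, handles only flat object schemas with primitive types, and the names build_model and the sample fields are hypothetical.

```python
import json
from dataclasses import field, make_dataclass
from typing import Any, Optional

# Simplified mapping from JSON Schema primitive types to Python types (assumption).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def build_model(schema_text: str) -> type:
    """Build a dataclass from a flat JSON Schema document without executing code."""
    schema = json.loads(schema_text)  # parsed purely as data; nothing is imported or exec'd
    if schema.get("type") != "object":
        raise ValueError("only object schemas are supported in this sketch")

    required = set(schema.get("required", []))
    required_fields = []
    optional_fields = []
    for name, spec in schema.get("properties", {}).items():
        if not name.isidentifier():
            raise ValueError(f"invalid field name: {name!r}")
        py_type = _JSON_TYPES.get(spec.get("type"), Any)
        if name in required:
            required_fields.append((name, py_type))
        else:
            optional_fields.append((name, Optional[py_type], field(default=None)))

    # Fields without defaults must precede fields with defaults.
    return make_dataclass(schema.get("title", "Model"), required_fields + optional_fields)


# Hypothetical sample schema; field names are illustrative only.
sample = json.dumps({
    "title": "PoliceReport",
    "type": "object",
    "properties": {
        "report_number": {"type": "string"},
        "officer": {"type": "string"},
    },
    "required": ["report_number"],
})
PoliceReport = build_model(sample)
report = PoliceReport(report_number="A-1")
```

Because the schema is only ever parsed with json.loads, a malicious upload can at worst produce a rejected or odd-looking model; it cannot run code in the worker process.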
…_schema.py supports .json
- Adds damagedcarimage.json, policereport.json, repairestimate.json (generated via scripts/py_schema_to_json.py).
- register_schema.py now picks the correct content-type per extension (.py -> text/x-python, .json -> application/json).
- Manifest unchanged for now; flip to .json files when ready to deprecate the legacy Python path.
…json schemas
- schema_info.json manifest now lists *.json files (was *.py).
- post_deployment.sh and post_deployment.ps1 derive multipart Content-Type per file extension (.json -> application/json, .py -> text/x-python).
- test_http/schema_API.http examples updated to upload .json samples.
- docs/CustomizeSchemaData.md sample table, mermaid diagram, and manifest example refer to .json files.
- register_schema.py docstring example updated.
Legacy .py uploads still work end-to-end; the change just flips the default authored format.
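The per-extension Content-Type selection described in these two commits could look roughly like the sketch below. The mapping values come from the commit messages; the function name content_type_for, the case-insensitive match, and the error on unknown extensions are assumptions, not the actual register_schema.py code.

```python
from pathlib import Path

# Extension -> MIME type mapping as stated in the commit messages.
CONTENT_TYPES = {
    ".json": "application/json",
    ".py": "text/x-python",
}


def content_type_for(filename: str) -> str:
    """Return the multipart Content-Type for a schema file (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match is an assumption
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        # Failing loudly on unknown extensions is an assumption of this sketch.
        raise ValueError(f"unsupported schema file extension: {suffix or '(none)'}") from None
```

Keeping the mapping in one table makes the later switch to JSON-only a matter of deleting the .py entry.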
Pull request overview
This PR adds first-class support for registering and using JSON Schema (.json) schema artifacts alongside legacy executable Python (.py) schemas, updating the API, worker, deployment tooling, samples, and docs to enable a safer “data-only” schema workflow.
Changes:
- Add JSON Schema upload validation + class-name derivation in the Schema Vault API, and persist a new Format field (python|json) in schema metadata.
- Update the worker map handler to materialize JSON Schema into in-memory Pydantic models (no code execution) while preserving legacy Python loading.
- Refresh deployment/scripts/docs/samples/tests to use .json schemas by default and to upload the correct MIME types.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/ContentProcessorAPI/test_http/schema_API.http | Updates REST client examples to upload .json schema files. |
| src/ContentProcessorAPI/samples/schemas/schema_info.json | Points sample manifest to .json schema artifacts. |
| src/ContentProcessorAPI/samples/schemas/autoclaim.json | Adds JSON Schema version of the autoclaim sample. |
| src/ContentProcessorAPI/samples/schemas/damagedcarimage.json | Adds JSON Schema version of the damaged-car-image sample. |
| src/ContentProcessorAPI/samples/schemas/policereport.json | Adds JSON Schema version of the policereport sample. |
| src/ContentProcessorAPI/samples/schemas/repairestimate.json | Adds JSON Schema version of the repairestimate sample. |
| src/ContentProcessorAPI/samples/schemas/register_schema.py | Updates schema registration helper to handle .json + correct MIME types. |
| src/ContentProcessorAPI/requirements.txt | Adds jsonschema dependency for server-side JSON Schema validation. |
| src/ContentProcessorAPI/app/tests/routers/test_schemavault.py | Extends router tests to cover JSON upload/update paths and legacy .py acceptance. |
| src/ContentProcessorAPI/app/tests/logics/test_schema_validator.py | Adds unit tests for JSON schema validation and class-name derivation. |
| src/ContentProcessorAPI/app/routers/schemavault.py | Adds extension/size validation, JSON schema validation path, and Format/ContentType handling. |
| src/ContentProcessorAPI/app/routers/models/schmavault/model.py | Extends API Schema model with Format: Literal['python','json']. |
| src/ContentProcessorAPI/app/routers/logics/schemavault.py | Extends update logic to persist Format (and continues updating metadata). |
| src/ContentProcessorAPI/app/routers/logics/schema_validator.py | Introduces JSON Schema validator + extension keyword allowlist. |
| src/ContentProcessor/tests/unit/utils/test_remote_schema_loader.py | Adds unit tests for JSON-schema-to-Pydantic model translation and a golden sample check. |
| src/ContentProcessor/src/libs/utils/remote_schema_loader.py | Adds safe JSON-schema-based loader that builds Pydantic models without executing code. |
| src/ContentProcessor/src/libs/pipeline/handlers/map_handler.py | Switches schema loading based on Schema.Format (json vs python). |
| src/ContentProcessor/src/libs/pipeline/entities/schema.py | Adds Format field to worker-side Schema entity. |
| src/ContentProcessor/requirements.txt | Adds jsonschema dependency to the worker requirements. |
| scripts/py_schema_to_json.py | Adds a local conversion helper from legacy .py Pydantic models to .json schema. |
| infra/scripts/post_deployment.sh | Uploads .json schemas with application/json and .py with text/x-python. |
| infra/scripts/post_deployment.ps1 | Same as above for PowerShell deployments. |
| docs/CustomizeSchemaData.md | Updates docs to recommend JSON Schema, documents workflow, and updates sample references. |
Container image was failing at import time with ModuleNotFoundError: 'jsonschema'. The Dockerfile installs from uv.lock via 'uv sync --frozen', so requirements.txt alone was not enough; the dep had to land in pyproject.toml + uv.lock. ContentProcessorAPI: adds jsonschema (+ specifications, referencing, rpds-py). ContentProcessor: pins jsonschema to 4.25.1 (was a 4.26.0 transitive).
BREAKING CHANGE: schema vault no longer accepts Python (.py) schema files.
- API rejects .py uploads with HTTP 415; only .json (JSON Schema Draft 2020-12) is accepted.
- Worker (map_handler) refuses to process schemas with Format='python'; existing Cosmos records must be re-registered as JSON.
- Deleted libs/utils/remote_module_loader.py (the exec/importlib loader that was the original RCE primitive).
- Deleted sample .py schemas; .json equivalents have been the default since the previous commit.
- register_schema.py, post_deployment.sh/ps1, .http examples, and CustomizeSchemaData.md all updated to JSON-only.
- Schema model defaults Format to 'json'; API model Literal restricted to 'json' only.
- Test suite updated: previous .py-accepting tests now assert .py is rejected.
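The JSON-only behaviour described in this commit can be sketched as two small checks. The 415 status for .py uploads and the refusal of Format='python' come from the commit message; the function names, the 400 status for malformed JSON, and the messages are assumptions. The sketch also skips the Draft 2020-12 validation that the real API performs with the jsonschema package.

```python
import json
from pathlib import Path
from typing import Tuple


def check_upload(filename: str, body: bytes) -> Tuple[int, str]:
    """Return an (HTTP status, message) pair for a schema upload (hypothetical helper)."""
    if Path(filename).suffix.lower() != ".json":
        # Per the commit message, non-JSON uploads such as .py get HTTP 415.
        return 415, "only .json schema files are accepted"
    try:
        document = json.loads(body)  # parsed as data, never executed
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 400, "file is not valid JSON"  # 400 is an assumption of this sketch
    if not isinstance(document, dict):
        return 400, "a JSON Schema document must be a JSON object"
    return 200, "ok"


def ensure_processable(schema_format: str) -> None:
    """Worker-side guard: refuse legacy records that still say Format='python'."""
    if schema_format != "json":
        raise ValueError(
            f"schema format {schema_format!r} is not supported; re-register the schema as JSON"
        )
```

Checking the extension before parsing means a .py file is rejected without its contents ever being read as anything other than bytes.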
Roopan-Microsoft
approved these changes
May 6, 2026
Purpose
This pull request migrates the schema management system from executable Python (.py) Pydantic classes to declarative JSON Schema (.json) documents, so registering and processing a schema no longer requires uploading or executing Python code. This removes the remote code execution risk and simplifies schema authoring and registration. The documentation, deployment scripts, and helper tools have all been updated to reflect and enforce the new approach.

Key changes by theme:

1. Schema Management and Security
- Schemas are now uploaded as JSON Schema (.json) files instead of Python (.py) files. The system no longer accepts .py uploads, and all references in docs and scripts have been updated accordingly.

2. Documentation Updates
- The docs/CustomizeSchemaData.md file has been thoroughly revised to describe the new workflow: authoring schemas as JSON Schema documents, registration steps, and the removal of Python class-based schemas. It includes updated diagrams, code snippets, and instructions for both individual and bulk schema registration.

3. Deployment Script Enforcement
- The PowerShell (post_deployment.ps1) and Bash (post_deployment.sh) deployment scripts now enforce JSON-only schema uploads, rejecting any non-.json files and setting the correct Content-Type.

4. Migration Helper Tool
- Added scripts/py_schema_to_json.py, a helper script that converts legacy Pydantic .py schemas into JSON Schema format for upload, supporting a smooth migration for existing users.

5. Minor Codebase Update
- Updated src/ContentProcessor/src/libs/pipeline/entities/schema.py to use Literal for improved type safety.

Does this introduce a breaking change?
Golden Path Validation
Deployment Validation
What to Check
Verify that the following are valid
Other Information