feat: Removed python schema files & support JSON Schema uploads and updates to related deployment scripts#566

Merged
Prajwal-Microsoft merged 14 commits into dev from feature/json-schema-support on May 6, 2026

@Prajwal-Microsoft Prajwal-Microsoft commented Apr 28, 2026

Purpose

This pull request migrates the schema management system from using executable Python (.py) Pydantic classes to declarative JSON Schema (.json) documents, eliminating the need to upload or execute Python code for schema registration and processing. This change improves security by removing remote code execution risks and simplifies the schema authoring and registration process. The documentation, deployment scripts, and helper tools have all been updated to reflect and enforce this new approach.

Key changes by theme:

1. Schema Management and Security

  • All schema registration, storage, and processing now require JSON Schema (.json) files instead of Python (.py) files. The system no longer accepts .py uploads, and all references in docs and scripts have been updated accordingly. [1] [2] [3] [4] [5] [6] [7] [8] [9]

2. Documentation Updates

  • The docs/CustomizeSchemaData.md file has been thoroughly revised to describe the new workflow: authoring schemas as JSON Schema documents, registration steps, and the removal of Python class-based schemas. It includes updated diagrams, code snippets, and instructions for both individual and bulk schema registration. [1] [2] [3] [4] [5] [6] [7]

3. Deployment Script Enforcement

  • Both PowerShell (post_deployment.ps1) and Bash (post_deployment.sh) deployment scripts now enforce JSON-only schema uploads, rejecting any non-.json files and setting the correct Content-Type. [1] [2] [3]
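
As a rough illustration of the JSON-only rule above, the following Python sketch mirrors the check a deployment script might perform before uploading: reject anything that is not a .json file and pick the multipart Content-Type. The actual scripts are PowerShell and Bash and are not shown in this conversation; the function name `schema_upload_content_type` and the error wording are hypothetical.

```python
from pathlib import Path

# Assumption: after this PR only .json is allowed, uploaded as application/json.
ALLOWED_CONTENT_TYPES = {".json": "application/json"}


def schema_upload_content_type(filename: str) -> str:
    """Return the Content-Type for a schema file, or raise if it is not .json."""
    suffix = Path(filename).suffix.lower()
    try:
        return ALLOWED_CONTENT_TYPES[suffix]
    except KeyError:
        # Hypothetical error message; the real scripts may word this differently.
        raise ValueError(
            f"Only .json schema files are accepted, got: {filename!r}"
        ) from None
```

A caller would run this once per manifest entry and skip or fail on any file that raises, so a stray .py file never reaches the upload step.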

4. Migration Helper Tool

  • Added scripts/py_schema_to_json.py, a helper script that converts legacy Pydantic .py schemas into JSON Schema format for upload, supporting a smooth migration for existing users.

5. Minor Codebase Update

  • Updated type hints in src/ContentProcessor/src/libs/pipeline/entities/schema.py to use Literal for improved type safety.
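
To make the Literal change concrete, here is a minimal sketch of a record whose Format field is typed with `Literal` and also checked at runtime. It uses a standard-library dataclass rather than the project's own entity class, and the field names `ClassName` and `FileName` are assumptions for illustration; only `Format` and its 'json' default are taken from this PR.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Per the PR, 'json' is the only value that remains valid.
SchemaFormat = Literal["json"]


@dataclass
class SchemaRecord:
    """Hypothetical stand-in for the worker-side Schema entity."""

    ClassName: str  # assumed field name
    FileName: str  # assumed field name
    Format: SchemaFormat = "json"

    def __post_init__(self) -> None:
        # Literal is only a static hint for type checkers, so the value
        # is checked again at runtime here.
        if self.Format not in get_args(SchemaFormat):
            raise ValueError(f"Unsupported schema format: {self.Format!r}")
```

A type checker flags `Format="python"` at authoring time, and the runtime check catches the same value when it arrives from stored data.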

Does this introduce a breaking change?

  • Yes
  • No

Golden Path Validation

  • I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

  • I have validated the deployment process successfully and all services are running as expected with this change.

What to Check

Verify that the following are valid

  • ...

Other Information

JSON Schema Migration added 3 commits April 28, 2026 10:05
Adds a parallel JSON Schema upload path so schemas can be authored as data instead of executable Python. The worker materialises Pydantic models from JSON in memory (no exec) via the new remote_schema_loader. Legacy .py uploads continue to work unchanged. M1 of the migration plan.
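
The idea of materialising a model from JSON "in memory (no exec)" can be sketched as follows. This is not the PR's remote_schema_loader: it uses standard-library dataclasses as a dependency-free stand-in for Pydantic models, handles only flat objects with primitive property types, and the name `build_model` and the sample field names are hypothetical.

```python
import json
from dataclasses import field, make_dataclass
from typing import Any, Optional

# Assumption: a minimal mapping from JSON Schema types to Python types.
_JSON_TO_PY = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def build_model(schema: dict[str, Any]) -> type:
    """Build a dataclass type from a flat JSON Schema object, without exec/eval."""
    required = set(schema.get("required", []))
    required_fields = []
    optional_fields = []
    for name, prop in schema.get("properties", {}).items():
        py_type = _JSON_TO_PY.get(prop.get("type"), Any)
        if name in required:
            required_fields.append((name, py_type))
        else:
            optional_fields.append((name, Optional[py_type], field(default=None)))
    # Fields without defaults must come before fields with defaults.
    return make_dataclass(schema.get("title", "Model"), required_fields + optional_fields)


# Hypothetical sample schema; the real sample files are not shown here.
schema_text = """
{
  "title": "PoliceReport",
  "type": "object",
  "properties": {
    "report_number": {"type": "string"},
    "officer": {"type": "string"}
  },
  "required": ["report_number"]
}
"""
PoliceReport = build_model(json.loads(schema_text))
report = PoliceReport(report_number="R-1")
```

The schema is parsed as data with `json.loads` and the class is assembled from that data, so nothing in the uploaded file is ever executed.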
…_schema.py supports .json

- Adds damagedcarimage.json, policereport.json, repairestimate.json (generated via scripts/py_schema_to_json.py).
- register_schema.py now picks the correct content-type per extension (.py -> text/x-python, .json -> application/json).
- Manifest unchanged for now; flip to .json files when ready to deprecate the legacy Python path.
…json schemas

- schema_info.json manifest now lists *.json files (was *.py).
- post_deployment.sh and post_deployment.ps1 derive multipart Content-Type per file extension (.json -> application/json, .py -> text/x-python).
- test_http/schema_API.http examples updated to upload .json samples.
- docs/CustomizeSchemaData.md sample table, mermaid diagram, and manifest example refer to .json files.
- register_schema.py docstring example updated.

Legacy .py uploads still work end-to-end; the change just flips the default authored format.
Comment thread src/ContentProcessor/src/libs/utils/remote_schema_loader.py Fixed
Copilot AI left a comment

Pull request overview

This PR adds first-class support for registering and using JSON Schema (.json) schema artifacts alongside legacy executable Python (.py) schemas, updating the API, worker, deployment tooling, samples, and docs to enable a safer “data-only” schema workflow.

Changes:

  • Add JSON Schema upload validation + class-name derivation in the Schema Vault API, and persist a new Format field (python | json) in schema metadata.
  • Update the worker map handler to materialize JSON Schema into in-memory Pydantic models (no code execution) while preserving legacy Python loading.
  • Refresh deployment/scripts/docs/samples/tests to use .json schemas by default and to upload the correct MIME types.
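
The review summary mentions class-name derivation without stating the rule. One plausible approach is sketched below, assuming the name comes from the schema's `title` and falls back to the file name stem; both the rule and the function name `derive_class_name` are assumptions, not the PR's implementation.

```python
import re
from pathlib import Path
from typing import Any


def derive_class_name(schema: dict[str, Any], filename: str) -> str:
    """Derive a PascalCase class name from the schema title or the file stem."""
    raw = schema.get("title") or Path(filename).stem
    # Split on anything that is not a letter or digit, then capitalise each word.
    words = re.findall(r"[A-Za-z0-9]+", str(raw))
    name = "".join(word[:1].upper() + word[1:] for word in words)
    if not name or name[0].isdigit():
        raise ValueError(f"Cannot derive a class name from {raw!r}")
    return name
```

For example, a title of "auto claim" would yield `AutoClaim`, and a file named police_report.json with no title would yield `PoliceReport`.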

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/ContentProcessorAPI/test_http/schema_API.http | Updates REST client examples to upload .json schema files. |
| src/ContentProcessorAPI/samples/schemas/schema_info.json | Points sample manifest to .json schema artifacts. |
| src/ContentProcessorAPI/samples/schemas/autoclaim.json | Adds JSON Schema version of the autoclaim sample. |
| src/ContentProcessorAPI/samples/schemas/damagedcarimage.json | Adds JSON Schema version of the damaged-car-image sample. |
| src/ContentProcessorAPI/samples/schemas/policereport.json | Adds JSON Schema version of the policereport sample. |
| src/ContentProcessorAPI/samples/schemas/repairestimate.json | Adds JSON Schema version of the repairestimate sample. |
| src/ContentProcessorAPI/samples/schemas/register_schema.py | Updates schema registration helper to handle .json + correct MIME types. |
| src/ContentProcessorAPI/requirements.txt | Adds jsonschema dependency for server-side JSON Schema validation. |
| src/ContentProcessorAPI/app/tests/routers/test_schemavault.py | Extends router tests to cover JSON upload/update paths and legacy .py acceptance. |
| src/ContentProcessorAPI/app/tests/logics/test_schema_validator.py | Adds unit tests for JSON schema validation and class-name derivation. |
| src/ContentProcessorAPI/app/routers/schemavault.py | Adds extension/size validation, JSON schema validation path, and Format/ContentType handling. |
| src/ContentProcessorAPI/app/routers/models/schmavault/model.py | Extends API Schema model with Format: Literal['python','json']. |
| src/ContentProcessorAPI/app/routers/logics/schemavault.py | Extends update logic to persist Format (and continues updating metadata). |
| src/ContentProcessorAPI/app/routers/logics/schema_validator.py | Introduces JSON Schema validator + extension keyword allowlist. |
| src/ContentProcessor/tests/unit/utils/test_remote_schema_loader.py | Adds unit tests for JSON-schema-to-Pydantic model translation and a golden sample check. |
| src/ContentProcessor/src/libs/utils/remote_schema_loader.py | Adds safe JSON-schema-based loader that builds Pydantic models without executing code. |
| src/ContentProcessor/src/libs/pipeline/handlers/map_handler.py | Switches schema loading based on Schema.Format (json vs python). |
| src/ContentProcessor/src/libs/pipeline/entities/schema.py | Adds Format field to worker-side Schema entity. |
| src/ContentProcessor/requirements.txt | Adds jsonschema dependency to the worker requirements. |
| scripts/py_schema_to_json.py | Adds a local conversion helper from legacy .py Pydantic models to .json schema. |
| infra/scripts/post_deployment.sh | Uploads .json schemas with application/json and .py with text/x-python. |
| infra/scripts/post_deployment.ps1 | Same as above for PowerShell deployments. |
| docs/CustomizeSchemaData.md | Updates docs to recommend JSON Schema, documents workflow, and updates sample references. |

Comment thread src/ContentProcessorAPI/app/routers/logics/schemavault.py
Comment thread src/ContentProcessorAPI/app/routers/schemavault.py Outdated
Comment thread src/ContentProcessorAPI/app/routers/schemavault.py Outdated
Comment thread src/ContentProcessor/requirements.txt Outdated
Comment thread docs/CustomizeSchemaData.md Outdated
Container image was failing at import time with ModuleNotFoundError: 'jsonschema'.
The Dockerfile installs from uv.lock via 'uv sync --frozen', so requirements.txt alone was not enough; the dep had to land in pyproject.toml + uv.lock.

ContentProcessorAPI: adds jsonschema (+ specifications, referencing, rpds-py).
ContentProcessor: pins jsonschema to 4.25.1 (was a 4.26.0 transitive).
BREAKING CHANGE: schema vault no longer accepts Python (.py) schema files.

- API rejects .py uploads with HTTP 415; only .json (JSON Schema Draft 2020-12) is accepted.
- Worker (map_handler) refuses to process schemas with Format='python'; existing Cosmos records must be re-registered as JSON.
- Deleted libs/utils/remote_module_loader.py (the exec/importlib loader that was the original RCE primitive).
- Deleted sample .py schemas; .json equivalents have been the default since the previous commit.
- register_schema.py, post_deployment.sh/ps1, .http examples, and CustomizeSchemaData.md all updated to JSON-only.
- Schema model defaults Format to 'json'; API model Literal restricted to 'json' only.
- Test suite updated: previous .py-accepting tests now assert .py is rejected.
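
The two enforcement points in this commit (the API rejecting .py uploads with HTTP 415, and the worker refusing Format='python') can be sketched as below. This is a framework-free illustration: `UploadRejected`, `check_upload`, and `check_worker_format` are hypothetical names, and the real router and map_handler code is not shown in this conversation.

```python
from http import HTTPStatus
from pathlib import Path


class UploadRejected(Exception):
    """Hypothetical error carrying the HTTP status the API would return."""

    def __init__(self, status: HTTPStatus, detail: str) -> None:
        super().__init__(detail)
        self.status = status


def check_upload(filename: str) -> None:
    """API side: accept only .json uploads, otherwise signal HTTP 415."""
    if Path(filename).suffix.lower() != ".json":
        raise UploadRejected(
            HTTPStatus.UNSUPPORTED_MEDIA_TYPE,
            f"Schema files must be JSON Schema (.json), got {filename!r}",
        )


def check_worker_format(schema_format: str) -> None:
    """Worker side: refuse any stored schema whose Format is not 'json'."""
    if schema_format != "json":
        raise RuntimeError(
            f"Schema format {schema_format!r} is no longer supported; "
            "re-register the schema as JSON"
        )
```

Checking in both places means a legacy record already stored with Format='python' is still refused at processing time, even though it can no longer be uploaded.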
Comment thread src/ContentProcessor/src/libs/utils/remote_schema_loader.py Dismissed
Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 31 changed files in this pull request and generated 8 comments.

Comment thread src/ContentProcessorAPI/app/routers/schemavault.py Outdated
Comment thread src/ContentProcessorAPI/app/routers/schemavault.py Outdated
Comment thread src/ContentProcessorAPI/app/routers/logics/schema_validator.py Outdated
Comment thread src/ContentProcessor/src/libs/pipeline/entities/schema.py Outdated
Comment thread src/ContentProcessor/src/libs/pipeline/handlers/map_handler.py
Comment thread docs/CustomizeSchemaData.md Outdated
Comment thread docs/CustomizeSchemaData.md
Comment thread infra/scripts/post_deployment.sh
Copilot AI review requested due to automatic review settings May 6, 2026 13:08
Copilot AI left a comment

Pull request overview

Copilot reviewed 34 out of 36 changed files in this pull request and generated 2 comments.

Comment thread src/ContentProcessorAPI/app/routers/logics/schemavault.py
Comment thread src/ContentProcessorAPI/app/routers/schemavault.py
@Prajwal-Microsoft Prajwal-Microsoft changed the title from "feat: Support JSON Schema uploads and update related deployment scripts" to "feat: Removed python schema files & support JSON Schema uploads and updates to related deployment scripts" on May 6, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 35 changed files in this pull request and generated 4 comments.

Comment thread src/ContentProcessorAPI/test_http/schema_API.http
Comment thread docs/CustomizeSchemaData.md
Comment thread src/ContentProcessor/src/libs/utils/remote_schema_loader.py
Comment thread src/ContentProcessor/src/libs/pipeline/handlers/map_handler.py
Copilot AI review requested due to automatic review settings May 6, 2026 14:26
Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 38 changed files in this pull request and generated 4 comments.

Comment thread src/ContentProcessor/src/libs/pipeline/handlers/map_handler.py
Comment thread src/ContentProcessorAPI/app/routers/schemavault.py
Comment thread src/ContentProcessor/src/libs/utils/remote_schema_loader.py
Comment thread src/ContentProcessorAPI/app/routers/logics/schema_validator.py
@Prajwal-Microsoft Prajwal-Microsoft merged commit 4930a1d into dev on May 6, 2026
13 of 14 checks passed
4 participants