[ML] Revert: Harden pytorch_inference with TorchScript model graph validation #3006

Merged
edsavage merged 1 commit into elastic:main from edsavage:revert/pr-2999-graph-validation
Mar 20, 2026

@edsavage
Contributor

prodsecmachine commented Mar 20, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|---|
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| | Licenses | 0 | 0 | 0 | 0 | 0 issues |


wwang500 self-requested a review March 20, 2026 04:52

wwang500 left a comment

LGTM 👍

edsavage merged commit ceabc9b into elastic:main Mar 20, 2026
24 checks passed
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 22, 2026
edsavage added a commit that referenced this pull request Mar 24, 2026
…alidation (#3008)

Reapply "[ML] Harden pytorch_inference with TorchScript model graph validation (#2999)" (#3006)

This reverts commit ceabc9b.

- Adds a static TorchScript graph validation layer (CModelGraphValidator, CSupportedOperations) that rejects models containing operations not observed in supported transformer architectures, reducing the attack surface by ensuring only known-safe operation sets are permitted.
- Includes aten::mul_ and quantized::linear_dynamic in the allowed operations for dynamically quantized models (e.g. ELSER v2 imported via Eland).
- Adds Python extraction tooling (dev-tools/extract_model_ops/) to trace reference HuggingFace models and collect their op sets, with support for quantized variants.
- Adds reference_model_ops.json golden file and C++ drift test to detect allowlist staleness on PyTorch upgrades.
- Adds adversarial "evil model" integration tests to verify rejection of forbidden operations.
- Adds CHANGELOG entry.
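
The allowlist check described in the bullets above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the C++ code in CModelGraphValidator or CSupportedOperations: the function names and the small `ALLOWED_OPS` set are hypothetical, and the exact error format for more than one unrecognised op is an assumption.

```python
from typing import Iterable, List, Set

# Hypothetical allowlist for illustration only. The real list lives in the
# C++ CSupportedOperations class and is much larger.
ALLOWED_OPS: Set[str] = {
    "aten::linear",
    "aten::mul_",
    "quantized::linear_dynamic",
    "aten::linalg_vector_norm",
}


def find_unrecognised_ops(observed_ops: Iterable[str], allowed_ops: Set[str]) -> List[str]:
    """Return the observed op names that are not in the allowlist, sorted."""
    return sorted(set(observed_ops) - allowed_ops)


def validate_graph_ops(observed_ops: Iterable[str], allowed_ops: Set[str]) -> None:
    """Raise ValueError if any observed op is outside the allowlist."""
    unrecognised = find_unrecognised_ops(observed_ops, allowed_ops)
    if unrecognised:
        # Joining several ops with ", " is an assumption about the format.
        raise ValueError("Unrecognised operations: " + ", ".join(unrecognised))
```

Under these assumptions, `validate_graph_ops(["aten::linear", "aten::mul_"], ALLOWED_OPS)` returns without error, while a graph containing an op outside the set is rejected before the model is used. The design point is that the check is static: it inspects op names in the graph rather than executing the model.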

- Add aten::norm to graph validator allowlist

The prepacked .multilingual-e5-small model uses aten::norm for
normalization, which was not in the allowlist. This caused the
model to be rejected with "Unrecognised operations: aten::norm".

- Add multilingual-e5-small model ops to reference files

Extracted ops from intfloat/multilingual-e5-small (base and Eland
text_embedding variant) and added both to the reference golden file.

The base model uses standard XLM-RoBERTa ops. The Eland variant adds
pooling/normalization ops (linalg_vector_norm, clamp, etc.). The
prepacked .multilingual-e5-small model bundled with Elasticsearch uses
aten::norm (added to the allowlist in the previous commit).
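
A drift check against the reference golden file might look like the sketch below. The schema of reference_model_ops.json is not shown in this PR, so the shape used here (model name mapped to a list of op names) is an assumption, as are the function name, the sample data, and the sample allowlist. The actual drift test is written in C++.

```python
import json
from typing import Dict, List, Set


def ops_missing_from_allowlist(golden_json: str, allowed_ops: Set[str]) -> Dict[str, List[str]]:
    """Return, per reference model, the ops that the allowlist does not cover."""
    # Assumed shape: {"model name": ["op", ...]}
    reference = json.loads(golden_json)
    missing: Dict[str, List[str]] = {}
    for model_name, ops in reference.items():
        gap = sorted(set(ops) - allowed_ops)
        if gap:
            missing[model_name] = gap
    return missing


# Hypothetical golden-file content and allowlist, for illustration only.
SAMPLE_GOLDEN = json.dumps({
    "intfloat/multilingual-e5-small": ["aten::linear", "aten::softmax"],
    "intfloat/multilingual-e5-small (Eland text_embedding)": [
        "aten::linear", "aten::linalg_vector_norm", "aten::clamp",
    ],
})
SAMPLE_ALLOWLIST = {"aten::linear", "aten::softmax", "aten::linalg_vector_norm"}

drift = ops_missing_from_allowlist(SAMPLE_GOLDEN, SAMPLE_ALLOWLIST)
```

An empty result would mean every op seen in the reference models is allowed. A non-empty result names the models and ops that need attention, for example after a PyTorch upgrade changes how an operation is decomposed.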

- Add graph validator test for prepacked e5 model with aten::norm

The prepacked .multilingual-e5-small model uses aten::norm, which was
missing from the allowlist and caused production failures. This test
loads a tiny (24KB) model that mirrors the real prepacked model's graph
structure (including aten::norm) and verifies graph validation passes.

The test model was created by tracing a minimal XLM-RoBERTa-like
architecture with normalization, then patching the TorchScript IR to
use aten::norm (which modern PyTorch decomposes into
aten::linalg_vector_norm, so it can't be generated via tracing).

Made-with: Cursor