Fix heap OOB write in EmbedLayerNormalizationShapeInference #28176

Merged
xadupre merged 4 commits into main from copilot/fix-heap-oob-write-onnxmodel on Apr 24, 2026

Contributor

Copilot AI commented Apr 22, 2026

Description

EmbedLayerNormalizationShapeInference wrote to output index 2 whenever getNumOutputs() == 2 && mask_index_type == 0, even though only outputs 0 and 1 exist in that case. The result was a heap out-of-bounds write during model loading, with no session.run() required.

Fix: Replace the flawed condition with a simple bounds check:

// Before (vulnerable):
if (ctx.getNumOutputs() == 3 || (ctx.getNumOutputs() == 2 && mask_index_type == 0)) {
    updateOutputShape(ctx, 2, output_shape);  // OOB when numOutputs == 2
    propagateElemTypeFromInputToOutput(ctx, 0, 2);
}

// After (fixed):
if (ctx.getNumOutputs() > 2) {
    updateOutputShape(ctx, 2, output_shape);
    propagateElemTypeFromInputToOutput(ctx, 0, 2);
}
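
To make the difference between the two conditions concrete, here is a minimal sketch that restates them as pure functions over plain integers. The function names (OldWritesOutput2, NewWritesOutput2, IsOutOfBounds) are hypothetical and exist only for illustration; they are not part of onnxruntime and do not use the real InferenceContext.

```cpp
#include <cstddef>

// Hypothetical stand-in for the condition before the fix.
// Returns true when the old code would write to output index 2.
bool OldWritesOutput2(std::size_t num_outputs, int mask_index_type) {
  return num_outputs == 3 || (num_outputs == 2 && mask_index_type == 0);
}

// Hypothetical stand-in for the condition after the fix.
// Output index 2 is only touched when it actually exists.
bool NewWritesOutput2(std::size_t num_outputs) {
  return num_outputs > 2;
}

// A write to index 2 is out of bounds unless at least 3 outputs are declared.
bool IsOutOfBounds(bool writes_output2, std::size_t num_outputs) {
  return writes_output2 && num_outputs <= 2;
}
```

Under these assumptions, the only combination where the old predicate reports a write that IsOutOfBounds flags is num_outputs == 2 with mask_index_type == 0, which is the case described above. The new predicate never reports a write for fewer than 3 outputs.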

A regression test EmbedLayerNormBatch1_NoMaskIndex_NoSumOutput has been added to embed_layer_norm_op_test.cc to cover the previously vulnerable path: mask_index_type=0 with exactly 2 outputs (no embedding_sum).

Motivation and Context

A crafted ONNX model with an EmbedLayerNormalization node declaring 2 outputs and mask_index_type=0 triggers the vulnerable path. getOutputType(2) returns a pointer one past the end of the internal node_output_types_ vector; subsequent writes through that pointer corrupt adjacent heap memory. In release builds this is silent: there is no assertion and no crash, and the corruption is exploitable via heap shaping.
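
The hazard described here can be modeled with a small, self-contained mock. This is a sketch under stated assumptions, not the onnxruntime implementation: FakeContext, FakeTypeProto, getOutputTypeChecked, and TrySetElemType are all hypothetical names. The mock stores per-output type slots in a vector, as the description says the real context does, and shows a checked accessor that refuses an index past the end instead of handing back a pointer to it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-output type record.
struct FakeTypeProto {
  int elem_type = 0;
};

// Hypothetical mock of an inference context whose output types live in a vector.
class FakeContext {
 public:
  explicit FakeContext(std::size_t num_outputs) : node_output_types_(num_outputs) {}

  std::size_t getNumOutputs() const { return node_output_types_.size(); }

  // Checked accessor: returns nullptr for an index that does not exist,
  // rather than a pointer past the end of the vector.
  FakeTypeProto* getOutputTypeChecked(std::size_t index) {
    return index < node_output_types_.size() ? &node_output_types_[index] : nullptr;
  }

 private:
  std::vector<FakeTypeProto> node_output_types_;
};

// Writes the element type only if the output slot exists; reports success.
bool TrySetElemType(FakeContext& ctx, std::size_t index, int elem_type) {
  FakeTypeProto* type = ctx.getOutputTypeChecked(index);
  if (type == nullptr) {
    return false;  // Index is out of range: nothing is written.
  }
  type->elem_type = elem_type;
  return true;
}
```

With a FakeContext constructed for 2 outputs, TrySetElemType succeeds for index 1 and returns false for index 2 without touching memory. The merged fix takes the simpler route of checking getNumOutputs() at the call site.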

The embedding_sum output is always at index 2 by definition (confirmed in the CPU kernel). The old special-casing for mask_index_type == 0 was both incorrect and unnecessary.

The condition `ctx.getNumOutputs() == 2 && mask_index_type == 0` tried
to write to output index 2 when only outputs 0 and 1 existed, causing a
heap out-of-bounds write during model loading. Fix by checking
ctx.getNumOutputs() > 2 before accessing output index 2.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/030beaf6-1876-4192-baf4-48058a0248c6

Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix heap OOB write in EmbedLayerNormalization shape inference" to "Fix heap OOB write in EmbedLayerNormalizationShapeInference" on Apr 22, 2026
Copilot AI requested a review from xadupre April 22, 2026 08:50
Member

xadupre commented Apr 22, 2026

@copilot can you add a unit test?

…eInference

Test verifies that loading a model with EmbedLayerNormalization having
mask_index_type=0 and exactly 2 outputs (no embedding_sum) no longer
triggers a heap out-of-bounds write during shape inference.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/1c68348d-6d40-4267-a7b7-5bbd0e2e76d4

Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 22, 2026

> @copilot can you add a unit test?

Added a C++ regression test EmbedLayerNormBatch1_NoMaskIndex_NoSumOutput in embed_layer_norm_op_test.cc (commit cc9ab1e). It reuses the existing EmbedLayerNormBatch1_EmbeddingSum_NoMaskIndex data (which has mask_index_type=0) but calls RunTest with sum_output=false, producing exactly 2 outputs. This is the configuration that triggered the heap OOB write in EmbedLayerNormalizationShapeInference before the fix.

Contributor

vraspar commented Apr 22, 2026

Should we also consider guarding output-1 accesses with ctx.getNumOutputs() > 1 to prevent the same class of bug?
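
One way to read this suggestion is as a single guard applied to every output index rather than a separate check per output. The helper below is a hypothetical sketch of that idea, not code from the PR or from onnxruntime: WithOutputIfPresent is an invented name, and it takes the output count as a plain integer instead of a real InferenceContext.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: runs fn(index) only when the node declares that output.
// Returns true if fn was invoked, false if the index was skipped.
template <typename Fn>
bool WithOutputIfPresent(std::size_t num_outputs, std::size_t index, Fn&& fn) {
  if (index >= num_outputs) {
    return false;  // Output does not exist: skip the write entirely.
  }
  std::forward<Fn>(fn)(index);
  return true;
}
```

In this sketch, a node with a single declared output would skip the callback for index 1, and a node with two outputs would skip it for index 2, so the same check covers both cases.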

xadupre marked this pull request as ready for review April 24, 2026 13:54
xadupre merged commit d762130 into main Apr 24, 2026
91 checks passed
xadupre deleted the copilot/fix-heap-oob-write-onnxmodel branch April 24, 2026 13:55
Development

Successfully merging this pull request may close these issues.

onnxruntime::contrib::EmbedLayerNormalizationShapeInference - Heap OOB write via TensorShapeProto pointer

3 participants