chore: change default static input shapes for coreml #206

Merged
jamjamjon merged 1 commit into jamjamjon:main from wep21:fix-coreml
Jan 21, 2026

Conversation

Contributor

@wep21 wep21 commented Jan 19, 2026

  • apply static shape config if device is coreml
  • update ort api for coreml

@jamjamjon
Owner

@wep21 Hi, thanks a lot for the PR and for working on the CoreML integration — really appreciate it 🙏

I agree that enabling static input shapes can improve CoreML performance in cases where input shapes are fixed.

That said, I wanted to double-check one scenario before merging this: in this project we also support super-resolution models, where the input H/W may vary per image.

My understanding is that forcing static_input_shapes = true in such cases may cause CoreML EP to skip parts of the graph (or even the whole graph) and fall back to CPU, potentially hurting performance — but I’d like to confirm this with you.

Have you tested this configuration with dynamic-resolution inputs (e.g. SR models with varying image sizes)? If so, I’d be very interested in the results.

In the meantime, I’ll also take a closer look at the relevant CoreML / ORT APIs to better understand the exact behavior here.

Thanks again for the contribution!

Contributor Author

wep21 commented Jan 20, 2026

I encountered errors when trying some of the examples, so I've applied static shapes to all of them, even though some may not actually trigger the error.

dwpose

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.34s
     Running `target/debug/examples/pose-estimation dwpose --dtype f16 --device coreml`
  DryRun yolo/v8-n-det |███████             | 1/3 (124.479/s)
2026-01-20 10:48:02.387 pose-estimation[13363:11260759] 2026-01-20 10:48:02.387131 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running 6606494152137530418_CoreML_6606494152137530418_4 node. Name:'CoreMLExecutionProvider_6606494152137530418_CoreML_6606494152137530418_4_4' Status Message: output_features has no value for _model_22_dfl_conv_Conv_output_0
Error: Non-zero status code returned while running 6606494152137530418_CoreML_6606494152137530418_4 node. Name:'CoreMLExecutionProvider_6606494152137530418_CoreML_6606494152137530418_4_4' Status Message: output_features has no value for _model_22_dfl_conv_Conv_output_0

rtdetr

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.46s
     Running `target/debug/examples/object-detection rtdetr --dtype fp16 --device coreml`
2026-01-20 10:58:32.534 object-detection[14003:11268733] 2026-01-20 10:58:32.534044 [E:onnxruntime:, inference_session.cc:2544 operator()] Exception during initialization: /Users/runner/work/ort-artifacts/ort-artifacts/onnxruntime/onnxruntime/core/graph/graph_utils.cc:30 int onnxruntime::graph_utils::GetIndexFromName(const Node &, const std::string &, bool) itr != node_args.end() was false. Attempting to get index by a name which does not exist:InsertedPrecisionFreeCast_/model/encoder/encoder.0/layers.0/norm1/Constant_output_0for node: /model/encoder/encoder.0/layers.0/norm2/Mul_75736c73/SimplifiedLayerNormFusion/
Error: Exception during initialization: /Users/runner/work/ort-artifacts/ort-artifacts/onnxruntime/onnxruntime/core/graph/graph_utils.cc:30 int onnxruntime::graph_utils::GetIndexFromName(const Node &, const std::string &, bool) itr != node_args.end() was false. Attempting to get index by a name which does not exist:InsertedPrecisionFreeCast_/model/encoder/encoder.0/layers.0/norm1/Constant_output_0for node: /model/encoder/encoder.0/layers.0/norm2/Mul_75736c73/SimplifiedLayerNormFusion/

Have you tested this configuration with dynamic-resolution inputs (e.g. SR models with varying image sizes)? If so, I’d be very interested in the results.

I will try this configuration.

@wep21 wep21 marked this pull request as draft January 20, 2026 04:02
@jamjamjon
Owner

@wep21 Hi! I ran some tests; all of them were performed on an Apple M4 Mac mini using the ONNX Runtime CoreML Execution Provider.

Overview

| Model | static_input_shapes | FP32 | FP16 | Notes |
|---|---|---|---|---|
| DWpose (RTMPose) | false | ✅ | ✅ | Stable |
| YOLO v26 | false | ✅ | ✅ | Stable (FP16 / FP32 / Q8) |
| RT-DETR v4 | false | ❌ | ❌ | Not usable |
| RT-DETR v4 | true | ✅ | ❌ | FP32 only |

Detailed Findings

1. DWpose (Pose Estimation)

  • Tested with static_input_shapes = false
  • Both FP32 and FP16 run successfully
  • No CoreML compilation or ONNX Runtime initialization issues

Conclusion:
DWpose is fully compatible with CoreML EP under dynamic shapes and mixed precision.

2. YOLO v26 (Object Detection)

  • Tested with static_input_shapes = false
  • All tested precisions work:
  • FP32
  • FP16
  • Q8

Conclusion:
YOLO models are stable and well supported by CoreML EP across different precisions.

3. RT-DETR v4 (Transformer-based Detector)

static_input_shapes = false

  • FP32:
    Fails during CoreML compilation with: mps.matmul contracting dimensions differ (1 vs 256)
    Indicates unsupported matmul broadcast semantics in CoreML / MPSGraph.

  • FP16:
    Fails during ONNX Runtime session initialization due to: InsertedPrecisionFreeCast + SimplifiedLayerNormFusion
    This is a known CoreML EP FP16 graph rewrite issue.

static_input_shapes = true

  • FP32: Runs successfully
  • FP16: Same ORT initialization failure as above

Conclusion:
RT-DETR can only run on CoreML EP in FP32 with static input shapes enabled.
FP16 is currently not usable due to CoreML EP graph transformation limitations.

Overall Conclusions

  • CoreML EP works reliably for CNN-based models (YOLO, DWpose).
  • Transformer-based models (RT-DETR) expose multiple CoreML limitations:
      1. Unsupported matmul broadcasting during CoreML compilation
      2. FP16 precision-cast and LayerNorm fusion conflict in ONNX Runtime
  • These issues originate from CoreML EP / ONNX Runtime internals, not from usls.

Recommendation

  • Use CoreML EP for CNN-style models.
  • Use CPU / CUDA / TensorRT for RT-DETR or other Transformer-based detectors.
  • Avoid FP16 + CoreML EP for Transformer models until upstream issues are resolved.

Additional Notes

Thank you very much for the PR — I really appreciate you taking the time to investigate this and propose a solution.

Enabling static_input_shapes = true is definitely the right direction, and your PR helped clarify where the CoreML limitation actually comes from. That was very helpful.

That said, I think the current implementation is a bit more complex than necessary. To make this easier to maintain and more consistent with the existing config design, we can configure the CoreML ONNX engine more directly using the following APIs:

  • Config::dwpose_133_t().with_model_coreml_static_input_shapes(true)
  • Config::dwpose_133_t().with_module_coreml_static_input_shapes(Module::Model, true)
  • Config::dwpose_133_t().with_coreml_static_input_shapes_all(true)

These options provide a simpler and more explicit way to enable static input shapes for CoreML.

In addition to the three configuration APIs listed above, another option would be to change the default behavior and enable static_input_shapes = true by default in the CoreML EP config. The relevant code is here:

/// Apple CoreML execution provider configuration.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CoreMlConfig {
    /// Use static input shapes for optimization.
    pub static_input_shapes: bool,
    /// Enable subgraph running mode.
    pub subgraph_running: bool,
    /// Model format: MLProgram or NeuralNetwork.
    pub model_format: u8,
    /// Compute units: All, CPUAndGPU, CPUAndNeuralEngine, or CPUOnly.
    pub compute_units: u8,
    /// Specialization strategy: Default, FastPrediction, or FastCompilation.
    pub specialization_strategy: u8,
}

If you’re interested, you’re very welcome to update the PR to take this approach, or we can iterate on it together. I’d be happy to review and help refine the change.

Thanks again for the contribution!

Signed-off-by: wep21 <daisuke.nishimatsu1021@gmail.com>
@wep21 wep21 changed the title from "fix: update ort api for coreml" to "chore: change default static input shapes for coreml" Jan 20, 2026
@wep21 wep21 marked this pull request as ready for review January 20, 2026 16:58
@wep21
Contributor Author

wep21 commented Jan 20, 2026

@jamjamjon Thank you for the further investigation and detailed implementation suggestion.
In this PR, I decided to just change the default static input shapes setting for CoreML, and I've confirmed it works with RT-DETR v4 in FP32 precision.

@jamjamjon jamjamjon merged commit 2d052ce into jamjamjon:main Jan 21, 2026
17 checks passed
@wep21 wep21 deleted the fix-coreml branch January 21, 2026 00:42