chore: change default static input shapes for coreml #206

Merged
jamjamjon merged 1 commit into jamjamjon:main from wep21:fix-coreml
Jan 21, 2026

Conversation

Contributor

@wep21 wep21 commented Jan 19, 2026

  • apply static shape config if device is coreml
  • update ort api for coreml

@jamjamjon
Owner

@wep21 Hi, thanks a lot for the PR and for working on the CoreML integration — really appreciate it 🙏

I agree that enabling static input shapes can improve CoreML performance in cases where input shapes are fixed.

That said, I wanted to double-check one scenario before merging this: in this project we also support super-resolution models, where the input H/W may vary per image.

My understanding is that forcing static_input_shapes = true in such cases may cause CoreML EP to skip parts of the graph (or even the whole graph) and fall back to CPU, potentially hurting performance — but I’d like to confirm this with you.

Have you tested this configuration with dynamic-resolution inputs (e.g. SR models with varying image sizes)? If so, I’d be very interested in the results.

In the meantime, I’ll also take a closer look at the relevant CoreML / ORT APIs to better understand the exact behavior here.

Thanks again for the contribution!

Contributor Author

wep21 commented Jan 20, 2026

I encountered errors when trying some of the examples, so I've applied static shapes to all of them, even though some may not actually trigger the error.

dwpose

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.34s
     Running `target/debug/examples/pose-estimation dwpose --dtype f16 --device coreml`
  DryRun yolo/v8-n-det |███████             | 1/3 (124.479/s)
2026-01-20 10:48:02.387 pose-estimation[13363:11260759] 2026-01-20 10:48:02.387131 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running 6606494152137530418_CoreML_6606494152137530418_4 node. Name:'CoreMLExecutionProvider_6606494152137530418_CoreML_6606494152137530418_4_4' Status Message: output_features has no value for _model_22_dfl_conv_Conv_output_0
Error: Non-zero status code returned while running 6606494152137530418_CoreML_6606494152137530418_4 node. Name:'CoreMLExecutionProvider_6606494152137530418_CoreML_6606494152137530418_4_4' Status Message: output_features has no value for _model_22_dfl_conv_Conv_output_0

rtdetr

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.46s
     Running `target/debug/examples/object-detection rtdetr --dtype fp16 --device coreml`
2026-01-20 10:58:32.534 object-detection[14003:11268733] 2026-01-20 10:58:32.534044 [E:onnxruntime:, inference_session.cc:2544 operator()] Exception during initialization: /Users/runner/work/ort-artifacts/ort-artifacts/onnxruntime/onnxruntime/core/graph/graph_utils.cc:30 int onnxruntime::graph_utils::GetIndexFromName(const Node &, const std::string &, bool) itr != node_args.end() was false. Attempting to get index by a name which does not exist:InsertedPrecisionFreeCast_/model/encoder/encoder.0/layers.0/norm1/Constant_output_0for node: /model/encoder/encoder.0/layers.0/norm2/Mul_75736c73/SimplifiedLayerNormFusion/
Error: Exception during initialization: /Users/runner/work/ort-artifacts/ort-artifacts/onnxruntime/onnxruntime/core/graph/graph_utils.cc:30 int onnxruntime::graph_utils::GetIndexFromName(const Node &, const std::string &, bool) itr != node_args.end() was false. Attempting to get index by a name which does not exist:InsertedPrecisionFreeCast_/model/encoder/encoder.0/layers.0/norm1/Constant_output_0for node: /model/encoder/encoder.0/layers.0/norm2/Mul_75736c73/SimplifiedLayerNormFusion/

Have you tested this configuration with dynamic-resolution inputs (e.g. SR models with varying image sizes)? If so, I’d be very interested in the results.

I will try this configuration.

@wep21 wep21 marked this pull request as draft January 20, 2026 04:02
@jamjamjon
Owner

@wep21 Hi! I ran some tests; all of them were performed on an Apple M4 Mac mini using the ONNX Runtime CoreML Execution Provider.

Overview

| Model | static_input_shapes | FP32 | FP16 | Notes |
|---|---|---|---|---|
| DWpose (RTMPose) | false | ✅ | ✅ | Stable |
| YOLO v26 | false | ✅ | ✅ | Stable (FP16 / FP32 / Q8) |
| RT-DETR v4 | false | ❌ | ❌ | Not usable |
| RT-DETR v4 | true | ✅ | ❌ | FP32 only |

Detailed Findings

1. DWpose (Pose Estimation)

  • Tested with static_input_shapes = false
  • Both FP32 and FP16 run successfully
  • No CoreML compilation or ONNX Runtime initialization issues

Conclusion:
DWpose is fully compatible with CoreML EP under dynamic shapes and mixed precision.

2. YOLO v26 (Object Detection)

  • Tested with static_input_shapes = false
  • All tested precisions work:
  • FP32
  • FP16
  • Q8

Conclusion:
YOLO models are stable and well supported by CoreML EP across different precisions.

3. RT-DETR v4 (Transformer-based Detector)

static_input_shapes = false

  • FP32:
    Fails during CoreML compilation with: mps.matmul contracting dimensions differ (1 vs 256)
    Indicates unsupported matmul broadcast semantics in CoreML / MPSGraph.

  • FP16:
    Fails during ONNX Runtime session initialization due to: InsertedPrecisionFreeCast + SimplifiedLayerNormFusion
    This is a known CoreML EP FP16 graph rewrite issue.

static_input_shapes = true

  • FP32: Runs successfully
  • FP16: Same ORT initialization failure as above

Conclusion:
RT-DETR can only run on CoreML EP in FP32 with static input shapes enabled.
FP16 is currently not usable due to CoreML EP graph transformation limitations.

Overall Conclusions

  • CoreML EP works reliably for CNN-based models (YOLO, DWpose).
  • Transformer-based models (RT-DETR) expose multiple CoreML limitations:
      1. Unsupported matmul broadcasting during CoreML compilation
      2. FP16 precision-cast and LayerNorm fusion conflict in ONNX Runtime
  • These issues originate from CoreML EP / ONNX Runtime internals, not from usls.

Recommendation

  • Use CoreML EP for CNN-style models.
  • Use CPU / CUDA / TensorRT for RT-DETR or other Transformer-based detectors.
  • Avoid FP16 + CoreML EP for Transformer models until upstream issues are resolved.

Additional Notes

Thank you very much for the PR — I really appreciate you taking the time to investigate this and propose a solution.

Enabling static_input_shapes = true is definitely the right direction, and your PR helped clarify where the CoreML limitation actually comes from. That was very helpful.

That said, I think the current implementation is a bit more complex than necessary. To make this easier to maintain and more consistent with the existing config design, we can configure the CoreML ONNX engine more directly using the following APIs:

  • Config::dwpose_133_t().with_model_coreml_static_input_shapes(true)
  • Config::dwpose_133_t().with_module_coreml_static_input_shapes(Module::Model, true)
  • Config::dwpose_133_t().with_coreml_static_input_shapes_all(true)

These options provide a simpler and more explicit way to enable static input shapes for CoreML.

In addition to the three configuration APIs listed above, another option would be to change the default behavior and enable static_input_shapes = true by default in the CoreML EP config. The relevant code is here:

/// Apple CoreML execution provider configuration.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CoreMlConfig {
    /// Use static input shapes for optimization.
    pub static_input_shapes: bool,
    /// Enable subgraph running mode.
    pub subgraph_running: bool,
    /// Model format: MLProgram or NeuralNetwork.
    pub model_format: u8,
    /// Compute units: All, CPUAndGPU, CPUAndNeuralEngine, or CPUOnly.
    pub compute_units: u8,
    /// Specialization strategy: Default, FastPrediction, or FastCompilation.
    pub specialization_strategy: u8,
}

If you’re interested, you’re very welcome to update the PR to take this approach, or we can iterate on it together. I’d be happy to review and help refine the change.

Thanks again for the contribution!

Signed-off-by: wep21 <daisuke.nishimatsu1021@gmail.com>
@wep21 wep21 changed the title from "fix: update ort api for coreml" to "chore: change default static input shapes for coreml" Jan 20, 2026
@wep21 wep21 marked this pull request as ready for review January 20, 2026 16:58
@wep21
Contributor Author

wep21 commented Jan 20, 2026

@jamjamjon Thank you for the further investigation and detailed implementation suggestion.
In this PR, I decided to just change the default static input shapes setting for CoreML, and I've confirmed it works with RT-DETR v4 in FP32 precision.

@jamjamjon jamjamjon merged commit 2d052ce into jamjamjon:main Jan 21, 2026
17 checks passed
@wep21 wep21 deleted the fix-coreml branch January 21, 2026 00:42