
[ET-VK] Add Vulkan ops for skin segmentation and EdgeTAM models#17709

Merged
SS-JIA merged 1 commit into gh/SS-JIA/451/base from gh/SS-JIA/451/head
Feb 25, 2026

Conversation

Contributor

@SS-JIA SS-JIA commented Feb 25, 2026

Stack from ghstack (oldest at bottom):

Implement several missing Vulkan operators needed to reduce graph
fragmentation in the skin segmentation and EdgeTAM models.

Skin segmentation ops:

  • aten.where.self: already had C++ and GLSL implementations but was
    missing the Python partitioner registration.
  • aten.bitwise_and.Tensor: added as a new binary_op shader variant
    operating on uint8 (bool) tensors.
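
As a rough illustration of why a missing registration fragments the graph, the sketch below counts how many separate delegated segments a toy partitioner would produce before and after the two ops above are registered. The registry sets, the linear op list, and `count_delegated_segments` are hypothetical stand-ins, not the actual ExecuTorch partitioner API, and real graphs are DAGs rather than flat lists.

```python
from itertools import groupby

# Hypothetical registries: op names the toy partitioner may delegate.
REGISTERED_BEFORE = {"aten.add.Tensor", "aten.mul.Tensor", "aten.sigmoid.default"}
REGISTERED_AFTER = REGISTERED_BEFORE | {"aten.where.self", "aten.bitwise_and.Tensor"}


def count_delegated_segments(ops, registry):
    """Count maximal runs of consecutive ops that are all in the registry."""
    supported_flags = (op in registry for op in ops)
    return sum(1 for is_supported, _run in groupby(supported_flags) if is_supported)


# A made-up op sequence, used only to show the effect of the gaps.
toy_graph = [
    "aten.sigmoid.default",
    "aten.bitwise_and.Tensor",
    "aten.mul.Tensor",
    "aten.where.self",
    "aten.add.Tensor",
]

segments_before = count_delegated_segments(toy_graph, REGISTERED_BEFORE)
segments_after = count_delegated_segments(toy_graph, REGISTERED_AFTER)
print(segments_before, segments_after)  # prints "3 1"
```

In this toy model, each unregistered op splits the delegated region in two, so registering both ops collapses three segments into one.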

EdgeTAM partitioning fixes:

  • Comparison ops (eq, lt, le, gt, ge): were registered under the
    generic BinaryOp features, which inherited FP_INT_T as the output
    dtype set. The partitioner correctly rejected these ops because
    their outputs are bool tensors. Split them into a dedicated
    register_comparison_ops registration with outputs_dtypes=BOOL_T. The
    binary_op.glsl shader already handles bool output via the
    IS_COMPARISON_OP path (uint8 storage), so no shader changes are
    needed.
  • aten.copy.default: was not in the op registry, causing a subgraph
    break in the first-frame model. This op appears when
    valid_num_points.to() is called with a matching dtype (a no-op
    cast). Add it to RemoveRedundantOpsTransform so it is eliminated
    before the partitioner runs. Also register it as an ephemeral op as
    a fallback. The removal logic requires a _src_arg1_ops set to handle
    the copy.default(self, src) argument order, where the replacement
    target is args[1] (src) rather than args[0] (self) as in all other
    redundant ops.
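
The two fixes above can be sketched together in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR: the contents of `FP_INT_T` and `BOOL_T`, the `Node` class, the membership of `REDUNDANT_OPS`, and both helper functions are hypothetical. It also assumes `src` already has the same shape and dtype as `self`, so forwarding `src` is equivalent to the copy.

```python
from dataclasses import dataclass, field

# Hypothetical dtype sets; the names follow the description, the contents are assumed.
FP_INT_T = frozenset({"float16", "float32", "int32"})
BOOL_T = frozenset({"bool"})


def output_dtype_allowed(out_dtype, outputs_dtypes):
    """Toy partitioner check: accept a node only if its output dtype is registered."""
    return out_dtype in outputs_dtypes


@dataclass
class Node:
    """Minimal stand-in for a graph node: an op name plus its input nodes."""
    op: str
    args: list = field(default_factory=list)


# Assumed set of removable ops. Most forward args[0]; copy.default forwards args[1].
REDUNDANT_OPS = {"aten.clone.default", "aten.alias.default", "aten.copy.default"}
SRC_ARG1_OPS = {"aten.copy.default"}  # copy.default(self, src): forward src


def remove_redundant_ops(nodes):
    """Drop redundant nodes and rewire their users to the forwarded input."""
    replacement = {}  # id(removed node) -> node that replaces it
    kept = []
    for node in nodes:  # nodes are assumed to be in topological order
        node.args = [replacement.get(id(a), a) for a in node.args]
        if node.op in REDUNDANT_OPS:
            src_index = 1 if node.op in SRC_ARG1_OPS else 0
            replacement[id(node)] = node.args[src_index]
        else:
            kept.append(node)
    return kept


# A comparison op produces bool, so it only passes with the BOOL_T output set.
rejected = output_dtype_allowed("bool", FP_INT_T)
accepted = output_dtype_allowed("bool", BOOL_T)

dst = Node("placeholder_dst")
src = Node("placeholder_src")
copy_node = Node("aten.copy.default", [dst, src])
user = Node("aten.add.Tensor", [copy_node, src])
result = remove_redundant_ops([dst, src, copy_node, user])
print(rejected, accepted, [n.op for n in result], user.args[0] is src)
```

Without the `SRC_ARG1_OPS` special case, the user of the copy would be rewired to `dst` (args[0]) instead of `src`, which would silently change the values flowing through the graph.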

Differential Revision: D94364641

cc @manuelcandales @digantdesai @cbilgin

@pytorch-bot pytorch-bot Bot added the module: vulkan Issues related to the Vulkan delegate and code under backends/vulkan/ label Feb 25, 2026

pytorch-bot Bot commented Feb 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17709

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit 88544f6 with merge base 63f9724:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@SS-JIA SS-JIA merged commit 112a344 into gh/SS-JIA/451/base Feb 25, 2026
202 of 208 checks passed
@SS-JIA SS-JIA deleted the gh/SS-JIA/451/head branch February 25, 2026 19:12
@SS-JIA SS-JIA temporarily deployed to cherry-pick-bot February 25, 2026 19:12 — with GitHub Actions Inactive
SS-JIA pushed a commit that referenced this pull request Feb 25, 2026
ghstack-source-id: 344667759
Pull Request resolved: #17709
Labels

CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
module: vulkan: Issues related to the Vulkan delegate and code under backends/vulkan/

2 participants