@Nicoshev Nicoshev commented Aug 4, 2025

Introduce SVE128 SIMD batch box-cox computation.

We've seen about 65% throughput improvement.

Privacy Context Container: L1196524

This is a no-op from an OSS point of view, so it could be landed without tests (see the precedent set by #143627), but we should delete those at some point.
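
For readers unfamiliar with the operation, the two-parameter Box-Cox transform maps each value x to ((x + λ2)^λ1 − 1) / λ1 when λ1 ≠ 0 and to ln(x + λ2) when λ1 = 0. Below is a minimal scalar sketch of the batched form, assuming a row-major N x D input with one (λ1, λ2) pair per column. The name `box_cox_reference` and its signature are hypothetical and are not taken from this PR; the SVE128 kernel itself is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference, NOT the SVE128 kernel from this PR.
// `data` is row-major N x D; `lambda1` and `lambda2` hold one value per column.
std::vector<float> box_cox_reference(
    std::size_t N,
    std::size_t D,
    const std::vector<float>& data,
    const std::vector<float>& lambda1,
    const std::vector<float>& lambda2) {
  assert(data.size() == N * D);
  assert(lambda1.size() == D && lambda2.size() == D);

  std::vector<float> out(N * D);
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < D; ++j) {
      // Assumption: the shifted value is strictly positive.
      const float shifted = data[i * D + j] + lambda2[j];
      out[i * D + j] = (lambda1[j] == 0.0f)
          ? std::log(shifted)
          : (std::pow(shifted, lambda1[j]) - 1.0f) / lambda1[j];
    }
  }
  return out;
}
```

A vectorized kernel would presumably compute the same per-element result while processing several floats per instruction; a scalar version like this one is the kind of baseline such a kernel could be checked against.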

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

pytorch-bot bot commented Aug 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159778

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6b223ae with merge base 01c3c89 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented Aug 4, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Nicoshev / name: Nicolas De Carli (6b223ae)

@pytorch-bot added the ciflow/inductor, module: cpu, module: inductor, and release notes: sparse labels on Aug 4, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D78994871

Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Aug 5, 2025
Summary:
Pull Request resolved: pytorch#159778

We are introducing SVE128 perfkernels

As a first translation, we are implementing float32 batch Box-Cox for SVE128

Test Plan:
Sigrid Predictor canary

Rollback Plan:

Differential Revision:
D78994871

Privacy Context Container: L1196524
@facebook-github-bot

@Nicoshev has exported this pull request. If you are a Meta employee, you can view the originating diff in D78994871.

@Nicoshev Nicoshev requested a review from malfet September 15, 2025 17:53

@malfet malfet left a comment

  • Please use auto to avoid some hard-to-spot casting errors
  • Use C10_LIKELY/C10_UNLIKELY instead of __builtin_expect
  • Add a new unit test, or mention in the PR which existing test validates this change
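
One possible reading of the first two review points, as a sketch rather than the code under review: `auto` lets a variable take the exact type of its initializer instead of silently narrowing, and a named hint macro replaces a raw `__builtin_expect` call. The `SKETCH_LIKELY`/`SKETCH_UNLIKELY` macros below are local stand-ins so the example compiles outside PyTorch; in-tree code would use `C10_LIKELY`/`C10_UNLIKELY` from `c10/macros/Macros.h`. The helper `count_zero_lambdas` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins (assumption): these mimic the usual shape of branch-hint
// macros. In PyTorch, use C10_LIKELY / C10_UNLIKELY instead.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 1))
#define SKETCH_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
#else
#define SKETCH_LIKELY(expr) (expr)
#define SKETCH_UNLIKELY(expr) (expr)
#endif

// Hypothetical helper: counts the columns that would take the log branch.
std::int64_t count_zero_lambdas(const std::vector<float>& lambda1) {
  std::int64_t zeros = 0;
  // `auto` deduces std::vector<float>::size_type here; writing `int n = ...`
  // instead would narrow silently for very large inputs.
  const auto n = lambda1.size();
  for (std::size_t j = 0; j < n; ++j) {
    // Hint that lambda1 == 0 is expected to be the rare case.
    if (SKETCH_UNLIKELY(lambda1[j] == 0.0f)) {
      ++zeros;
    }
  }
  return zeros;
}
```

The hint only influences code layout and branch prediction heuristics; the function returns the same result with or without it.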

Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Sep 15, 2025
Summary:

We are introducing SVE128 perfkernels

As a first translation, we are implementing float32 batch Box-Cox for SVE128

Test Plan:
Sigrid Predictor canary

Rollback Plan:

Differential Revision:
D78994871

Privacy Context Container: L1196524
@Nicoshev Nicoshev requested a review from malfet September 15, 2025 19:13

@facebook-github-bot

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot added the ciflow/trunk label on Sep 16, 2025
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Introduce SVE128 SIMD batch box-cox computation.

We've seen about 65% throughput improvement.

Privacy Context Container: L1196524

This is a no-op from an OSS point of view, so it could be landed without tests (see the precedent set by pytorch#143627), but we should delete those at some point.

Pull Request resolved: pytorch#159778
Approved by: https://github.com/malfet
Labels

ciflow/inductor, ciflow/trunk, fb-exported, Merged, meta-exported, module: cpu, module: inductor, release notes: sparse

4 participants