[CPU] optimize Lp norm for 1-dimensional vector #122143

min-jean-cho · 2024-03-18T23:32:47Z

Fixes #120229

- Optimize the vector norm for a 1-dimensional vector, i.e. a reduction over a single element (such as reducing dim -1 of a tensor of shape (N, 1)), by simplifying the norm formula.
- For such a vector the norm simplifies to `abs(x)` for every `ord` except `ord = 0`, where it is the count of nonzero entries, `x != 0`. See below for proof.
- As a next step, the matrix norm (`torch.linalg.matrix_norm`) could be optimized similarly for 1 x 1 matrices.
- The shortcut also avoids overflow in the power `abs(x) ** p` for large `p` or `x`.
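A minimal sketch of the shortcut, assuming a real floating-point input. `vector_norm_size_one` and `naive_vector_norm` are hypothetical helpers written for illustration, not the ATen code changed in this PR. The sketch compares the shortcut against `torch.linalg.vector_norm` and shows the float32 overflow that the general formula runs into:

```python
import torch


def vector_norm_size_one(x: torch.Tensor, ord: float, dim: int, keepdim: bool = False) -> torch.Tensor:
    """Hypothetical helper: Lp vector norm over a dimension of size 1.

    Assumes a real floating-point `x`. With one element along `dim` no
    reduction is needed: ord == 0 counts nonzeros, which is (x != 0), and
    every other ord (positive, negative, +/-inf) gives abs(x).
    """
    if x.size(dim) != 1:
        raise ValueError("shortcut only applies when x.size(dim) == 1")
    if ord == 0:
        out = (x != 0).to(x.dtype)  # 1.0 where x is nonzero, else 0.0
    else:
        out = x.abs()  # (|x|^p)^(1/p) == |x|, without forming |x|^p
    return out if keepdim else out.squeeze(dim)


def naive_vector_norm(x: torch.Tensor, p: float, dim: int) -> torch.Tensor:
    """General formula (sum |x|^p)^(1/p) for finite nonzero p, for comparison only."""
    return x.abs().pow(p).sum(dim).pow(1.0 / p)


if __name__ == "__main__":
    x = torch.randn(4, 1)
    print(vector_norm_size_one(x, 2.0, -1))
    print(torch.linalg.vector_norm(x, 2.0, -1))  # same values up to rounding

    # 3e30 ** 2 = 9e60 exceeds the float32 maximum (about 3.4e38).
    big = torch.tensor([[3e30]], dtype=torch.float32)
    print(naive_vector_norm(big, 2.0, -1))  # tensor([inf])
    print(vector_norm_size_one(big, 2.0, -1))  # 3e30, no overflow
```

Skipping the power entirely is what removes both the reduction cost and the overflow: the result never depends on an intermediate `|x|^p`.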

Performance

Average latency (ms) of `torch.norm` and `torch.linalg.vector_norm`, measured with
`torch.norm(torch.randn(2**18, 1), ord, -1)`
`torch.linalg.vector_norm(torch.randn(2**18, 1), ord, -1)`
Tested on Skylake, 1 socket with 28 physical cores.
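The PR does not include its benchmark script. Below is a sketch of how latencies like the ones in the table could be collected with `torch.utils.benchmark.Timer`. `time_norms` is a hypothetical helper, and the thread count and timing parameters are assumptions, so its numbers will not reproduce the table exactly:

```python
import torch
import torch.utils.benchmark as benchmark


def time_norms(p, rows=2**18, num_threads=28):
    """Mean latency (ms) of torch.norm / torch.linalg.vector_norm over dim -1 of a (rows, 1) tensor."""
    x = torch.randn(rows, 1)
    stmts = {"torch.norm": "torch.norm(x, p, -1)"}
    # torch.linalg.vector_norm only accepts numeric ords, so "fro" is timed for torch.norm only.
    if not isinstance(p, str):
        stmts["torch.linalg.vector_norm"] = "torch.linalg.vector_norm(x, p, -1)"
    results = {}
    for label, stmt in stmts.items():
        timer = benchmark.Timer(
            stmt=stmt,
            globals={"torch": torch, "x": x, "p": p},
            num_threads=num_threads,
        )
        # blocked_autorange chooses the block size itself; Measurement.mean is in seconds.
        results[label] = timer.blocked_autorange(min_run_time=0.2).mean * 1e3
    return results


if __name__ == "__main__":
    for p in ["fro", float("inf"), float("-inf"), 0, 1, -1, 2, -2]:
        print(p, time_norms(p))
```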

| op | input shape | dim | ord | baseline (master), avg latency (ms) | optimized (`7102f1e`), avg latency (ms) | speedup ratio (baseline/optimized) |
|---|---|---|---|---|---|---|
| torch.norm | (2**18, 1) | -1 | fro | 34.3755531 | 0.0125408 | 2741.094 |
| | | | inf | 34.0952635 | 0.0122237 | 2789.271 |
| | | | -inf | 34.3674493 | 0.0120759 | 2845.953 |
| | | | 0 | 34.1004515 | 0.0175261 | 1945.69 |
| | | | 1 | 34.1688442 | 0.0121593 | 2810.089 |
| | | | -1 | 33.949492 | 0.0120282 | 2822.487 |
| | | | 2 | 34.3669581 | 0.0120401 | 2854.366 |
| | | | -2 | 33.9252067 | 0.0121069 | 2802.139 |
| torch.linalg.vector_norm | (2**18, 1) | -1 | inf | 34.090879 | 0.0095105 | 3584.545 |
| | | | -inf | 34.3708754 | 0.0099111 | 3467.931 |
| | | | 0 | 34.0880775 | 0.0141716 | 2405.38 |
| | | | 1 | 34.1392851 | 0.0093174 | 3664.036 |
| | | | -1 | 33.925395 | 0.0092483 | 3668.302 |
| | | | 2 | 34.3854165 | 0.0092459 | 3719.002 |
| | | | -2 | 33.932972 | 0.0093007 | 3648.429 |
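The speedup column is the baseline latency divided by the optimized latency. A quick consistency check of the reported ratios against the latencies in the table (they agree to within 0.01%, presumably differing only through rounding of the printed latencies); `speedup_mismatches` is a helper written for this check:

```python
# (op, ord, baseline_ms, optimized_ms, reported_speedup), copied from the table above.
ROWS = [
    ("torch.norm", "fro", 34.3755531, 0.0125408, 2741.094),
    ("torch.norm", "inf", 34.0952635, 0.0122237, 2789.271),
    ("torch.norm", "-inf", 34.3674493, 0.0120759, 2845.953),
    ("torch.norm", "0", 34.1004515, 0.0175261, 1945.69),
    ("torch.norm", "1", 34.1688442, 0.0121593, 2810.089),
    ("torch.norm", "-1", 33.949492, 0.0120282, 2822.487),
    ("torch.norm", "2", 34.3669581, 0.0120401, 2854.366),
    ("torch.norm", "-2", 33.9252067, 0.0121069, 2802.139),
    ("torch.linalg.vector_norm", "inf", 34.090879, 0.0095105, 3584.545),
    ("torch.linalg.vector_norm", "-inf", 34.3708754, 0.0099111, 3467.931),
    ("torch.linalg.vector_norm", "0", 34.0880775, 0.0141716, 2405.38),
    ("torch.linalg.vector_norm", "1", 34.1392851, 0.0093174, 3664.036),
    ("torch.linalg.vector_norm", "-1", 33.925395, 0.0092483, 3668.302),
    ("torch.linalg.vector_norm", "2", 34.3854165, 0.0092459, 3719.002),
    ("torch.linalg.vector_norm", "-2", 33.932972, 0.0093007, 3648.429),
]


def speedup_mismatches(rows, rel_tol=1e-4):
    """Rows whose reported speedup differs from baseline/optimized by more than rel_tol."""
    bad = []
    for op, ord_, base_ms, opt_ms, reported in rows:
        ratio = base_ms / opt_ms
        if abs(ratio - reported) > rel_tol * reported:
            bad.append((op, ord_, ratio, reported))
    return bad


if __name__ == "__main__":
    print(speedup_mismatches(ROWS))  # []
    speedups = [row[4] for row in ROWS]
    print(min(speedups), max(speedups))  # 1945.69 3719.002
```

So the speedup ranges from about 1946x (`torch.norm`, `ord = 0`) to about 3719x (`torch.linalg.vector_norm`, `ord = 2`).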

Proof

For those interested :)
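The proof itself was attached as images. A short derivation of the same claim, written independently for this text (not a transcription of those images), for a vector x = (x_1) with a single element:

```math
\begin{aligned}
\|x\|_p &= \Big(\sum_{i=1}^{1} |x_i|^p\Big)^{1/p} = \big(|x_1|^p\big)^{1/p} = |x_1|, && p \notin \{0, \pm\infty\},\\
\|x\|_{\infty} &= \max_i |x_i| = |x_1|, \qquad \|x\|_{-\infty} = \min_i |x_i| = |x_1|,\\
\|x\|_0 &= \sum_{i=1}^{1} \mathbf{1}[x_i \neq 0] = \mathbf{1}[x_1 \neq 0].
\end{aligned}
```

For `p < 0` and `x_1 = 0`, IEEE arithmetic gives `|x_1|^p = inf` and `inf^(1/p) = 0 = |x_1|`, so the `abs(x)` form still holds; `fro` is the `p = 2` case.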

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jianyuh @nikitaved @pearu @mruberry @walterddr @xwang233 @lezcano @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot · 2024-03-18T23:32:50Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122143

✅ No Failures

As of commit 4ff9918 with merge base 61ff41f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

test/test_foreach.py

lezcano

Fair, but the implementation can be improved and simplified.

Can you also add a test?

aten/src/ATen/native/LinearAlgebra.cpp

lezcano

Also, now that you are at it, can you update the decomposition as well?

min-jean-cho · 2024-03-20T20:19:44Z

@pytorchbot merge

pytorchmergebot · 2024-03-20T20:21:30Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2024-03-20T20:21:46Z

Merge failed

Reason: 8 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral), trunk / win-vs2019-cpu-py3 / test (default, 2, 3, windows.4xlarge.nonephemeral), trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral), trunk / linux-focal-cuda12.1-py3.10-gcc9 / test (nogpu_AVX512, 1, 1, linux.2xlarge), trunk / linux-focal-cuda12.1-py3.10-gcc9 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)

min-jean-cho · 2024-03-20T20:27:34Z

@pytorchbot merge

pytorchmergebot · 2024-03-20T20:29:25Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Proof images: https://github.com/pytorch/pytorch/assets/93151422/59b1e00b-8fcd-47cb-877d-d31403b5195b, https://github.com/pytorch/pytorch/assets/93151422/236bea15-2dd5-480b-9871-58b2e3b24322

Pull Request resolved: #122143
Approved by: https://github.com/lezcano

simplify Lp norm for 1-dimensional vector

fafa0eb

pytorch-bot bot added the release notes: linalg_frontend release notes category label Mar 18, 2024

min-jean-cho changed the title ~~simplify Lp norm for 1-dimensional vector~~ [CPU] optimize Lp norm for 1-dimensional vector Mar 18, 2024

min-jean-cho mentioned this pull request Mar 18, 2024

TensorIterator coalesce_dimensions to not coalesce reduction dim #121721

Closed

lint

18ca874

pytorchbot added the open source label Mar 18, 2024

min-jean-cho added 9 commits March 18, 2024 19:27

update

6f48310

update

f9f505e

update

b443e4d

update

15b6a2d

try fixing for test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16

1497515

Merge branch 'pytorch:main' into minjean/speedup_norm_one_dim

cc97f40

update

04b122e

Merge branch 'minjean/speedup_norm_one_dim' of https://github.com/min-jean-cho/pytorch into minjean/speedup_norm_one_dim

329bfb9

Merge branch 'pytorch:main' into minjean/speedup_norm_one_dim

1249b92

min-jean-cho commented Mar 19, 2024

test/test_foreach.py (resolved)

min-jean-cho marked this pull request as ready for review March 19, 2024 23:33

min-jean-cho requested review from IvanYashchuk, lezcano and nikitaved as code owners March 19, 2024 23:33

min-jean-cho added the module: cpu and module: linear algebra labels Mar 19, 2024

min-jean-cho requested review from jgong5 and mingfeima and removed request for IvanYashchuk, lezcano and nikitaved March 19, 2024 23:34

lezcano reviewed Mar 19, 2024

aten/src/ATen/native/LinearAlgebra.cpp (outdated, resolved)

aten/src/ATen/native/LinearAlgebra.cpp (outdated, resolved)

aten/src/ATen/native/LinearAlgebra.cpp (outdated, resolved)

lezcano reviewed Mar 19, 2024

min-jean-cho requested review from a team and jithunnair-amd as code owners March 20, 2024 19:56

pytorch-bot bot added ciflow/inductor module: dynamo module: inductor labels Mar 20, 2024

min-jean-cho removed module: inductor module: dynamo ciflow/inductor labels Mar 20, 2024

min-jean-cho force-pushed the minjean/speedup_norm_one_dim branch from 80a98bf to 10fc75d March 20, 2024 20:06

min-jean-cho removed request for avikchaudhuri, gmagogsfm, jeffdaily, jithunnair-amd, tugsbayasgalan and zhxchen17 March 20, 2024 20:16

pytorchmergebot added the merging label Mar 20, 2024

pytorchmergebot removed the merging label Mar 20, 2024

Merge branch 'pytorch:main' into minjean/speedup_norm_one_dim

4ff9918

pytorchmergebot added the merging label Mar 20, 2024

pytorchmergebot added the Merged label Mar 20, 2024

pytorchmergebot closed this in 057892f Mar 20, 2024

pytorchmergebot removed the merging label Mar 20, 2024

eqy mentioned this pull request Apr 29, 2024

[CUDA][AMP] Size-1 (scalar) norms are broken on CUDA + AMP following #122143 #125174

Closed

4 participants
