
fix issue of baddbmm when out has nan value for beta=0 #96086

Closed

Conversation

pytorch-bot bot commented Mar 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96086

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit df0aa8e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Comment on lines 1560 to 1561
-  r *= beta;
+  r = beta == scalar_t(0) ? scalar_t(0) : beta * r;
   for (const auto k : c10::irange(ks)) {
@nikitaved (Collaborator) commented Mar 6, 2023:

Oh, wow, I did not know that baddbmm has an in-house implementation...


I was sure matmuls were delegated to MKL or something of the sort.
This bug seems to have been there for the past 5 years; I'm amazed there haven't been reports of random failures in deployed CPU inference systems.

Contributor commented:

haha, maybe that's why :) everyone who cares about CPU inference made sure to use MKL and bypass this buggy impl :)

@nikitaved (Collaborator) commented Mar 6, 2023:

Yeah, unless this path is the fallback for when MKL is not available, I would assume dispatching to aten::mm could be a better strategy, unless that would be circular. If that is indeed the case, could this be done as a follow-up?

Reply:

@nikitaved Never tried, torch.mm was never powerful enough for my purposes (no broadcast, single matrix). Not for real world heavy lifting.

Reply:

According to the docs, that's not torch.mm. Probably torch.matmul.
I never figured why there are so many entrypoints.

@nikitaved (Collaborator) commented Mar 6, 2023:

Yeah, it is matmul indeed. mm appears to be a method (Tensor.mm) with the aforementioned limitations. Totally agree, seems like this redundancy is more confusing than it is helpful...

Collaborator commented:

baddbmm is usually dispatched to MKL, but in some cases (small matrices, small batches) it goes to a special implementation.

@@ -1557,7 +1557,7 @@ inline void baddbmm_cpu_kernel(const Tensor& result, const Tensor& self, const T
     r += s2[k] * m1[k][j];
   }
 } else {
-  r *= beta;
+  r = beta == scalar_t(0) ? scalar_t(0) : beta * r;
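
For reference, a minimal repro sketch of the behaviour this line addresses (not from the PR itself): with beta=0 the documented semantics are that the input is ignored and NaN/inf in it are not propagated, but the old `r *= beta` keeps a NaN accumulator, since NaN * 0 is NaN. The shapes below are kept small on the assumption that this routes baddbmm to the in-house CPU kernel rather than MKL.

import torch

# Assumption: small m/n/k and batch sizes exercise the fallback CPU kernel.
b1 = torch.randn(2, 1, 1)
b2 = torch.randn(2, 1, 300)
inp = torch.full((2, 1, 300), float("nan"))  # NaNs that beta=0 must ignore

y = torch.baddbmm(inp, b1, b2, beta=0.0)
print(y.isnan().any())  # expected: tensor(False) once beta == 0 is special-cased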
Collaborator commented:

Please leave a comment here.

Collaborator commented:

And not that it matters much given that this implementation will not be particularly efficient, but you may want to do this up front by calling result.zero_() in this case and putting an if here, so that this part is vectorized. Similar to how it's done for this path:

if (is_bmm_out || (beta.to<c10::complex<double>>() == 0.0)) {
self_or_result.zero_();
return;

Collaborator (Author) commented:

I don't think calling result.zero_() will give better performance: for this path, m*n*k is a small number, and if we use result.zero_() there are two steps (no fusion, plus extra dispatch overhead).

Collaborator (Author) commented:

I did some performance tests for m=1, n=1, k=300 (average time per call, in ms):

threads = 20

| batch_size | this PR | your proposal (result.zero_()) |
| --- | --- | --- |
| 1 | 0.006 | 0.013 |
| 20 | 0.011 | 0.019 |
| 40 | 0.011 | 0.018 |
| 80 | 0.012 | 0.023 |
| 160 | 0.015 | 0.029 |

threads = 1

| batch_size | this PR | your proposal (result.zero_()) |
| --- | --- | --- |
| 1 | 0.013 | 0.010 |
| 20 | 0.024 | 0.024 |
| 40 | 0.036 | 0.038 |
| 80 | 0.069 | 0.072 |
| 160 | 0.137 | 0.142 |

test code:

import time

import torch

num_iter = 600

def fn(a, b, c):
    return torch.baddbmm(a, b, c, beta=0.0)

for batch_size in [1, 20, 40, 80, 160]:
    m = 1
    n = 1
    p = 300
    a = torch.randn((batch_size, n, p))
    b = torch.randn((batch_size, n, m))
    c = torch.randn((batch_size, m, p))

    # warmup iterations
    with torch.no_grad():
        for i in range(300):
            y = fn(a, b, c)

    # timed iterations
    fwd = 0
    with torch.no_grad():
        t1 = time.time()
        for i in range(num_iter):
            y = fn(a, b, c)
        t2 = time.time()
        fwd = fwd + (t2 - t1)

    avg_time = fwd / num_iter * 1000
    print("batch_size = %d, avg time is %0.3f (ms) fps: %f" % (batch_size, avg_time, batch_size * num_iter / fwd))


Fix #96037.

cc jgong5 mingfeima sanchitintel ashokei jingxu10

[ghstack-poisoned]
@XiaobingSuper XiaobingSuper added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 7, 2023
@IvanYashchuk IvanYashchuk removed their request for review March 7, 2023 11:11
@lezcano (Collaborator) left a comment:

Fair enough!

@XiaobingSuper (Collaborator, Author) commented:

@pytorchbot merge

@pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

Labels:
- ciflow/trunk (Trigger trunk jobs on your pull request)
- Merged
- module: cpu (CPU specific problem (e.g., perf, algorithm))
- open source
- release notes: linalg_frontend (release notes category)
Successfully merging this pull request may close these issues:

- baddbmm result affected by prior content of out tensor (NaNs preserved) (#96037)