
Conversation

pytorch-bot bot commented Feb 4, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94140

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit c9e2a65:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

peterbell10 added the topic: not user facing label Feb 4, 2023
peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Feb 4, 2023
peterbell10 marked this pull request as ready for review February 10, 2023 00:03
peterbell10 requested a review from ngimel February 10, 2023 00:04
peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Feb 10, 2023
ngimel (Collaborator) commented Feb 10, 2023

Did you check that inductor perf didn't change?

(torch.float16, torch.ops.aten._native_batch_norm_legit.no_stats): 1e-5,
(torch.bfloat16, torch.ops.aten.linalg_vector_norm.default): 1e-4,
(torch.float16, torch.ops.aten.linalg_vector_norm.default): 1e-4,
(torch.bfloat16, torch.ops.aten.var_mean.correction): 5e-7,
Collaborator commented

why does tolerance change here?

peterbell10 (Collaborator Author) commented

aten.var_mean uses a different algorithm from aten.mean, which ends up being slightly more precise. 5e-7 is still incredibly good for half-precision though. The default rtol for torch.testing is 1e-5.
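
A minimal sketch (not from the PR) of how this precision difference could be checked: compare a fused torch.var_mean against a separate mean-plus-squared-deviations reduction, each measured against a float64 reference. The shapes and dtype below are illustrative only; the gap is larger in reduced precision.

import torch

torch.manual_seed(0)
x = torch.randn(1024, 4096)  # float32 rows; variance of each row is ~1

# High-precision reference computed in float64.
ref_var = x.double().var(dim=-1, unbiased=True)

# Fused reduction producing variance and mean together.
var_fused, mean_fused = torch.var_mean(x, dim=-1)

# Separate mean, then sum of squared deviations (roughly the pattern described above).
mean = x.mean(dim=-1, keepdim=True)
var_split = (x - mean).square().sum(dim=-1) / (x.shape[-1] - 1)

def max_rel_err(v):
    return ((v.double() - ref_var).abs() / ref_var.abs()).max().item()

print("var_mean      max rel err:", max_rel_err(var_fused))
print("mean + sq dev max rel err:", max_rel_err(var_split))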

peterbell10 (Collaborator Author) commented

Did you check that inductor perf didn't change?

This improves perf by removing the duplicate mean calculation. However, the gain is very slight, since the second mean was being fused with the sum of squared deviations in the variance. In the following example, I see a 0.6% speedup, from 366 us to 364 us.

import torch
import torch._dynamo
from torch._inductor import config
config.debug = True

a = torch.nn.BatchNorm3d(10).train().cuda()
b = torch.rand(10, 10, 16, 64, 64, device="cuda")

@torch._dynamo.optimize()
def fn(x):
    return a(x)

_ = fn(b)      # warm up and trigger compilation
%timeit fn(b)  # IPython magic; run in an interactive session
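
Outside of IPython, roughly the same measurement can be made with CUDA events. This continues the snippet above and is only a sketch (not from the PR); the iteration counts are arbitrary.

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

for _ in range(10):  # extra warm-up after compilation
    fn(b)
torch.cuda.synchronize()

start.record()
for _ in range(100):
    fn(b)
end.record()
torch.cuda.synchronize()
print("avg per call (us):", start.elapsed_time(end) * 1000 / 100)  # elapsed_time is in ms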

facebook-github-bot deleted the gh/peterbell10/517/head branch June 8, 2023 18:25