Reference implementations for rsqrt and native_layer_norm #79413
Conversation
Dr. CI: ✅ No failures (0 pending) as of commit f7307e1 (more details on the Dr. CI page). 💚 Looks good so far! There are no failures yet. This comment was automatically generated by Dr. CI; please report bugs/suggestions to the (internal) Dr. CI Users group.
It would be nice to have a way to display reference-implementation code in the documentation; it could compensate for an unclear description (while the description is being improved): #51455
cc @Chillee for native_layer_norm |
def _normalize(a, norm_dims, eps):
    computation_dtype = utils.get_computation_dtype(a.dtype)
    a_acc = prims.convert_element_type(a, computation_dtype)
    biased_var, mean = var_mean(a_acc, dim=norm_dims, unbiased=False, keepdim=True)
Currently nvFuser doesn't understand var_mean, so this decomposition is not readily usable.
Same here.
Also, the reason the previous decomp used separate calls to var and mean was that it more closely matched the numerics of eager mode. @ngimel's opinion was that the discrepancy shouldn't matter, but just noting in case you need to adjust tolerances on any tests.
I will make mean work with the nvFuser executor in a separate PR. var will be implemented later, as it's currently a "prim" in PyTorch but a composite in nvFuser.
It is separate calls to var and mean under the hood; take a look at the var_mean function:
def var_mean(
    a: TensorLikeType,
    dim: Union[Optional[int], Optional[List[int]]] = None,
    unbiased: Optional[bool] = None,
    keepdim: bool = False,
    *,
    correction: Optional[int] = None,
):
    v = var(a, dim, unbiased, keepdim, correction=correction)
    m = mean(a, dim, keepdim)
    return v, m
I believe @peterbell10 once told me that var_mean had some speed and stability problems in PyTorch (?). I don't remember the details, so I'll let him discuss this, but perhaps it's relevant here.
On CPU, the mean from var_mean is less accurate than calling mean separately. mean is implemented roughly as sum(x, dim) / x.size(dim), where sum uses a low-error summation algorithm. var_mean, on the other hand, computes both the mean and the variance in a single pass, but with a naive summation that is less accurate.
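To make the gap concrete, here is an illustrative sketch (not code from this PR; the exact error magnitudes depend on the input and the build):

```python
import torch

# Compare the single-pass mean returned by var_mean against the separate
# mean, using a float64 computation as the reference.
x = torch.rand(10_000_000, dtype=torch.float32)

ref = x.double().mean()
_, mean_fused = torch.var_mean(x)   # var_mean returns (var, mean)
mean_separate = x.mean()            # uses the low-error sum kernel on CPU

print("var_mean mean error:", (mean_fused.double() - ref).abs().item())
print("separate mean error:", (mean_separate.double() - ref).abs().item())
```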
var_mean here calls into the reference decomposition, which in turn computes mean and var separately (mean via sum and division).
For similar reasons, can we change this call to torch.var_mean as well?
Well, then, unless you decompose torch.var_mean further, you'd get an inaccurate CPU result.
Then I think var_mean should probably call into torch.var and torch.mean.
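A hedged sketch of that suggestion (simplified signature, not the actual reference code): delegate to the torch-level ops so the more accurate eager kernels are used and backends can still intercept the calls.

```python
import torch

def var_mean_via_torch(a, dim, unbiased=False, keepdim=False):
    # Call torch.var / torch.mean rather than the refs, so mean reuses the
    # low-error sum kernel and compilers can still see both ops.
    v = torch.var(a, dim, unbiased=unbiased, keepdim=keepdim)
    m = torch.mean(a, dim, keepdim=keepdim)
    return v, m
```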
func = op.get_op()
for sample_input in samples:
    if requires_grad:
        if None in sample_input.args:
Why is this needed?
Because this test cannot handle None as a positional argument to native_layer_norm. It has the signature: native_layer_norm(Tensor input, int[] normalized_shape, Tensor? weight, Tensor? bias, float eps) -> (Tensor, Tensor, Tensor). Both weight and bias can be None, and I added sample inputs for these cases.
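For illustration, a minimal call with the optional tensors omitted (a hedged sketch; shapes and eps are arbitrary):

```python
import torch

x = torch.randn(2, 3, 4)
# weight and bias are optional; passing None exercises the new sample inputs.
out, mean, rstd = torch.native_layer_norm(x, [4], None, None, 1e-5)
print(out.shape, mean.shape, rstd.shape)
```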
torch/_refs/__init__.py
    computation_dtype = utils.get_computation_dtype(a.dtype)
    a_acc = prims.convert_element_type(a, computation_dtype)
    biased_var, mean = var_mean(a_acc, dim=norm_dims, unbiased=False, keepdim=True)
    rstd = rsqrt(biased_var + eps)
Can we just call torch.rsqrt here?
I prefer to use the torch._refs function here.
My understanding is that the current thinking is to prefer calling torch.foo vs. _refs.foo if convenient, as it's fairly easy to map from torch.foo => _refs.foo but not the other way around.
See @mruberry's comment here: #78689 (comment)
It's very easy to map _refs.rsqrt to torch.rsqrt because _refs.rsqrt is a wrapper over torch.rsqrt for regular and meta tensors, and test_decomp.py passing indicates that it's fine as it is.
I changed this to torch.rsqrt.
From the perspective of the "correctness" tests it's fine. But, if I understand correctly, _refs.rsqrt => prims.rsqrt, and then you're saying that prims.rsqrt simply calls torch.rsqrt under the hood.
But when we trace things out, using _refs.rsqrt leaves no interception point that would let us decompose native_layer_norm while leaving rsqrt itself un-decomposed.
TBH, for rsqrt this is a somewhat strange discussion haha, since _refs.rsqrt == prims.rsqrt == aten::rsqrt. But it matters more for var_mean.
For example, say that I'm a compiler that wants native_layer_norm decomposed, but I have special handling for var_mean. If we call _refs.var_mean, then our trace has no var_mean node to intercept; we'll just get var followed by mean.
If we use torch.var_mean though, then we'll have the choice to either further decompose it into var and mean or preserve var_mean in our trace.
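A rough sketch of the distinction (simplified names, not the actual PR code): calling the torch-level op leaves a single var_mean call site that a backend can either preserve or decompose further.

```python
import torch

def _normalize_sketch(a, norm_dims, eps):
    # torch.var_mean shows up as one interceptable call; a compiler with
    # special handling for var_mean can keep it fused in its trace, or let
    # it decompose into var followed by mean.
    biased_var, mean = torch.var_mean(a, dim=norm_dims, unbiased=False, keepdim=True)
    rstd = torch.rsqrt(biased_var + eps)
    return (a - mean) * rstd, mean, rstd
```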
Because of the way the current context mapping is implemented, it's a one-way torch.* -> refs.* transform, so we should get into the habit of preferring torch.* calls when possible. In some cases the torch operation doesn't have as much functionality as the ref; where that extra functionality is needed, the ref should be called explicitly instead.
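As a rough mental model only (hypothetical names; this is not the actual implementation): the tracing context can redirect torch-level calls to their references, but a direct _refs call bypasses that hook.

```python
import torch
import torch._refs as refs

# Hypothetical lookup table; the real mapping lives inside the tracing context.
TORCH_TO_REFS = {
    torch.rsqrt: refs.rsqrt,
    torch.var_mean: refs.var_mean,
}

def lookup_ref(torch_fn):
    # torch.foo -> _refs.foo is easy to look up; going the other way is not.
    return TORCH_TO_REFS.get(torch_fn, torch_fn)
```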
torch/_refs/__init__.py
def _normalize(a, norm_dims, eps):
    computation_dtype = utils.get_computation_dtype(a.dtype)
    a_acc = prims.convert_element_type(a, computation_dtype)
Let's make this conversion conditional.
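One way that could look (a sketch against the quoted lines, reusing the utils and prims helpers from the surrounding code):

```python
computation_dtype = utils.get_computation_dtype(a.dtype)
# Skip the element-type conversion when the input is already in the
# computation dtype.
a_acc = a if a.dtype == computation_dtype else prims.convert_element_type(a, computation_dtype)
```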
torch/_refs/__init__.py
    return prims.slice_in_dim(a, start, start + length, axis=dim)


def _normalize(a, norm_dims, eps):
Add a comment for this function -- what it's meant to be used for and how it's intended to be used -- and we should probably add type annotations to it.
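One possible shape for the requested comment and annotations (a sketch, assuming the module's existing imports and helpers; not necessarily the exact code that landed):

```python
def _normalize(
    a: Tensor, norm_dims: List[int], eps: float
) -> Tuple[Tensor, Tensor, Tensor]:
    """Computes the mean and 1/std of a tensor along norm_dims.

    Helper for the layer_norm / native_layer_norm references: returns the
    normalized tensor, the mean, and the reciprocal of the standard
    deviation, all in the upcast computation dtype.
    """
    computation_dtype = utils.get_computation_dtype(a.dtype)
    a_acc = prims.convert_element_type(a, computation_dtype)
    biased_var, mean = var_mean(a_acc, dim=norm_dims, unbiased=False, keepdim=True)
    rstd = rsqrt(biased_var + eps)
    out = (a_acc - mean) * rstd
    return out, mean, rstd
```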
torch/_refs/__init__.py
    weight: Optional[Tensor],
    bias: Optional[Tensor],
    eps: float,
) -> Tuple[TensorLikeType, TensorLikeType, TensorLikeType]:
With the latest updates I think it's just all Tensor now
All CI is passing now. Since this PR adds a new OpInfo, it revealed a few problems and some tests had to be skipped. I submitted an issue: #79705. Except for
Great work, @IvanYashchuk! Nice attention to the bug fix and the issue.
Let's ship this as soon as the tests are happy.
LGTM, besides the minor nit about torch.var and torch.mean.
I don't want to block this PR on that though - this PR has been through a bunch of rounds already :)
If it's an issue for using this decomposition in the future we can just modify it to torch.var/torch.mean then.
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here.
Hey @IvanYashchuk. |
…79413)

Summary:
This PR adds references for:
- `torch.rsqrt`
- `torch.native_layer_norm`
- `torch.nn.functional.layer_norm`

`native_layer_norm` had a different number of dimensions if the input was 0-sized. I fixed that.

Pull Request resolved: #79413
Approved by: https://github.com/mruberry, https://github.com/Chillee

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/bc1fef96aff4aeffe7a7b39ef8b4ff467860df28

Reviewed By: malfet
Differential Revision: D37254233
fbshipit-source-id: d712aabb1c26fccb0b19f703f7c75df46f503396
This PR adds references for:
- torch.rsqrt
- torch.native_layer_norm
- torch.nn.functional.layer_norm

native_layer_norm had a different number of dimensions if the input was 0-sized. I fixed that.
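As a quick illustrative check of how the three relate (hedged; shapes and eps are arbitrary), layer_norm's output should match the first output of native_layer_norm:

```python
import torch

x = torch.randn(2, 3, 4)
w = torch.ones(4)
b = torch.zeros(4)

out_f = torch.nn.functional.layer_norm(x, [4], w, b, eps=1e-5)
out_n, mean, rstd = torch.native_layer_norm(x, [4], w, b, 1e-5)
print(torch.allclose(out_f, out_n))  # expected: True
```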