
[Inductor] added aten.exponential_ decomp #91673

Conversation


pytorch-bot bot commented Jan 4, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91673

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1ba5577:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -1963,6 +1963,11 @@ def uniform(
)


@register_decomposition(aten.exponential_)
def exponential_(self, rate=1, generator=None):
return self.copy_(-1/rate * torch.log(1 - torch.rand_like(self)))
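For context, the decomposition is standard inverse-transform sampling (with λ = rate): if U ~ Uniform(0, 1), then X = -ln(1 - U) / λ satisfies

$$ P(X \le x) = P\left(U \le 1 - e^{-\lambda x}\right) = 1 - e^{-\lambda x}, \qquad x \ge 0, $$

which is exactly the CDF of an Exponential(λ) variable.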
Member

don't you have to use generator in the torch.rand_like call if it is not None?

Collaborator Author

Thanks @soumith, it looks like prims does not support generator yet; there is a TODO comment: https://github.com/pytorch/pytorch/blob/master/torch/_prims/__init__.py#L2669-L2670.

For now, I think we can just add `assert generator is None`.

Member

makes sense.

@min-jean-cho min-jean-cho marked this pull request as ready for review January 11, 2023 20:23
@lezcano lezcano (Collaborator) left a comment

This function also has an out-of-place variant and an out= variant. A better way to implement it would be to implement the out-of-place variant and then generate the in-place via register_inplace and the out= via the out wrapper.
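A minimal sketch of that layout, assuming the usual refs/decomp structure (illustrative only, not the exact code merged in this PR): the out-of-place variant carries the sampling math, and the in-place version is a thin wrapper over it.

import torch

# Out-of-place variant: all of the math lives here. In the refs machinery this
# is the function that would get registered and wrapped for out= support.
def exponential(self, rate=1, generator=None):
    assert generator is None
    return -1 / rate * torch.log1p(-torch.rand_like(self))

# In-place variant derived from the out-of-place one via copy_. In practice the
# refs/decomp helpers (an inplace-registration utility and the out wrapper)
# would generate these wrappers rather than writing them by hand.
def exponential_(self, rate=1, generator=None):
    return self.copy_(exponential(self, rate, generator))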

@register_decomposition(aten.exponential_)
def exponential_(self, rate=1, generator=None):
assert generator is None
return self.copy_(-1 / rate * torch.log1p(-torch.rand_like(self)))
Collaborator

Suggested change
return self.copy_(-1 / rate * torch.log1p(-torch.rand_like(self)))
return self.copy_(-1 / rate * torch.log(torch.rand_like(self)))

If x ~ U(0,1), 1-x ~ U(0,1).
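A quick illustration of the boundary behavior that the rest of this thread turns on: torch.rand draws from [0, 1), so the two formulas only differ at the endpoints (u exactly 0, or u so close to 1 that a low-precision log rounds the result away).

import torch

u = torch.tensor([0.0, 0.5])
print(torch.log(u))      # tensor([  -inf, -0.6931]); log(u) hits -inf at u == 0
print(torch.log1p(-u))   # tensor([0.0000, -0.6931]); log1p(-u) yields exactly 0 at u == 0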

Collaborator

This can go to refs?

Collaborator

Same for the other distributions, yep

@ngimel ngimel (Collaborator) Jan 13, 2023

@lezcano this won't work with triton (and generally with any fast log approximation), even if compute is properly done in fp32; this is explained in the comments in the eager code. The exponential distribution should not generate exact 0's (0 is excluded from its support), yet with half dtype the fast log approximation's result gets truncated to 0:

In [28]: max_rand = torch.rand(10000000000, device="cuda").amax()

In [29]: def fn(x):
    ...:     return x.log().half()
    ...: 

In [30]: opt_fn = torch.compile(fn)
/scratch/ngimel/work/pytorch/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(

In [31]: fn(max_rand)
Out[31]: tensor(-5.9605e-08, device='cuda:0', dtype=torch.float16)  # fine, eager log doesn't use the fast approximation
In [33]: opt_fn(max_rand)
Out[33]: tensor(-0., device='cuda:0', dtype=torch.float16)  # fast log approximation truncates to 0

Collaborator

Right. I guess we'd have similar issues even if we cast it to float?

Collaborator Author

Or is this a device-specific definition of exponential?

Collaborator

The current formula you have implemented should do. The comments in #91673 (review) and #91673 (comment) are still relevant, though.

Collaborator

Apparently CPU incorrectly implements exponential in eager, but CUDA exponential_ indeed doesn't produce zero, at least until a large lambda would cause underflow:

In [5]: torch.empty(100000000, device="cuda").exponential_().min()
Out[5]: tensor(5.9605e-08, device='cuda:0')

Collaborator

This is because the CPU implementation uses MKL (vdRngExponential(VSL_RNG_METHOD_EXPONENTIAL_ICDF, stream, len, ...)). Let me check with the MKL folks whether VSL vRngExponential excludes zero or not. If it doesn't, we should fix the MKL-based implementation from the PyTorch side. cc @CaoE

Collaborator Author

The link above says f(x) is defined for x >= a, where a is a displacement parameter.

vdRngExponential(VSL_RNG_METHOD_EXPONENTIAL_ICDF, stream, len,
(double *)(sample_ptr + begin), 0, 1./lambda);

Here a is 0, which is probably why the CPU generates 0.
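For reference, this is the oneMKL parameterization the thread is pointing at (stated here as an aside, based on the vRngExponential documentation described above): the density uses a displacement a and scale β, and in the call above a = 0 and β = 1/λ, so x = a = 0 lies inside the supported range.

$$ f(x) = \frac{1}{\beta}\, e^{-(x - a)/\beta}, \qquad x \ge a $$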

@@ -1963,6 +1963,12 @@ def uniform(
)


@register_decomposition(aten.exponential_)
Collaborator

what about casting halfs to higher precision for intermediate computations?

Collaborator

Decorating it with pw_cast_for_opmath should be good enough right?

Collaborator Author

Oh, thanks for the catch; pw_cast_for_opmath should do. I'm checking whether an ELEMENTWISE_TYPE_PROMOTION_KIND other than the default would be applicable.
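A rough sketch of what the opmath cast buys here, assuming half/bfloat16 inputs (pw_cast_for_opmath automates this pattern around the registered decomposition; the standalone function below is illustrative only):

import torch

def exponential_opmath_sketch(x: torch.Tensor, rate: float = 1.0) -> torch.Tensor:
    # Sample and take the log in float32 even when x is float16/bfloat16, so
    # values of u very close to 1 are not rounded away by a low-precision log.
    u = torch.rand(x.shape, device=x.device, dtype=torch.float32)
    sample = -torch.log1p(-u) / rate
    # Cast back to the original dtype only at the end.
    return sample.to(x.dtype)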

@lezcano lezcano (Collaborator) left a comment

Cool!

@min-jean-cho (Collaborator Author)

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 18, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


lezcano (Collaborator) commented Jan 18, 2023

Also, could you add the relevant OpInfo test?

pytorchmergebot pushed a commit that referenced this pull request Nov 15, 2023
…113195)

The range of the sampled random variable needs to be clarified for `torch.tensor.exponential_`, whose supported interval (0, inf) differs from the [0, inf] of the exponential distribution.

Background: #37984 (comment), #48841 (comment), #91673 (comment)

Pull Request resolved: #113195
Approved by: https://github.com/albanD
Development

Successfully merging this pull request may close these issues.

[RFC] [Inductor] decompose aten.exponential_
8 participants