Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get people to stop improperly using AT_ASSERT #20287

Closed
ezyang opened this issue May 8, 2019 · 17 comments
Closed

Get people to stop improperly using AT_ASSERT #20287

ezyang opened this issue May 8, 2019 · 17 comments
Labels
triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@ezyang
Copy link
Contributor

ezyang commented May 8, 2019

Somehow, we have ended up in a situation where a lot of people in the PyTorch codebase are using AT_ASSERT when they should be using AT_CHECK (just do a grep for AT_ASSERT in aten/src/ATen/native, I audited the first five files and every occurrence of AT_ASSERT was wrong). Just to review the intended semantics:

  • AT_ASSERT means "internal error, someone on the PyTorch dev team fucked up." If an AT_ASSERT triggers, that's a bug, and you should report a bug to PyTorch. In fact, the error message on AT_ASSERT says exactly this.
  • AT_CHECK means "user error, report a message to the user." Users might trigger AT_CHECK and that's fine, it just means they misused the API. Give them a nice, human-friendly message, and let them try something else in their Jupyter session, whatever,

Use of AT_ASSERT when you mean AT_CHECK is bad for a few reasons:

  1. At some point in the future, we may legitimately want to add a compilation mode that disables asserts (since, in principle, they should never be triggered). We cannot do this if half of the asserts are legitimate user input validation checks; disabling them will lead to hard to understand crashes on this "optimized" build.
  2. When an improperly tagged AT_ASSERT fails, we get spurious bug reports where users report a bug to PyTorch, because the error message told them to. Examples: C++ API, IValue toTensor() didn't work #13503, Tensor type mismatch, caller expects elements to be int, while tensor contains float #13052, isTensorList() ASSERT FAILED #17911
  3. I hate it!

Let's do some renaming to make the distinction clearer.

We propose the final state be just two macros (I'm lying; there's also an IndexError macro; I intend to preserve it but most people won't use it):

AT_INTERNAL_ASSERT(cond, ...); // panic if cond is not true; with optional message after
AT_CHECK(cond, ...); // error if cond is not true; with optional message after

Translation guide:

  • AT_ASSERT(cond) ==> AT_INTERNAL_ASSERT(cond)
  • AT_ASSERTM(cond, msg) ==> AT_INTERNAL_ASSERT(cond, msg)
  • AT_CHECK(cond, msg) ==> AT_CHECK(cond, msg)
  • AT_ERROR(msg) ==> AT_CHECK(false, msg)

I'm OK with bikeshedding names.

For BC-reasons, we will need to retain the old macros with deprecation warnings. We'll use a similar strategy to AT_DISPATCH_ here. The transition won't be done all in one go; at the same time we'll audit AT_ASSERT sites to see if they really are internal errors or not.

@ezyang
Copy link
Contributor Author

ezyang commented May 8, 2019

cc @gchanan

@ezyang
Copy link
Contributor Author

ezyang commented May 8, 2019

Proposal was modified to use AT_ instead of C10_ on the grounds that it is easier to type.

@gchanan
Copy link
Contributor

gchanan commented May 8, 2019

lgtm.

@smessmer
Copy link
Contributor

smessmer commented May 8, 2019

I fully agree we should fix this and choose names that better describe what these do.

Note that it's not always obvious if AT_ASSERT or AT_CHECK is the right choice.
For example, in my current work on the operator library, say a kernel uses the registration API wrongly. The registration API asserts against this and should (probably?) use AT_CHECK because the bad kernel could be a custom kernel supplied by the user. However, for internal kernels that are part of PyTorch core, you could also argue it should be AT_ASSERT.

@ezyang
Copy link
Contributor Author

ezyang commented May 8, 2019

For example, in my current work on the operator library, say a kernel uses the registration API wrongly. The registration API asserts against this and should (probably?) use AT_CHECK because the bad kernel could be a custom kernel supplied by the user. However, for internal kernels that are part of PyTorch core, you could also argue it should be AT_ASSERT.

Matters will always be murky for the C++ API, because we publish the very same API we use internally and so you can't distinguish between user and internal code. However, one way to solve this problem is to distinguish between the internal API, and the public API (as @mike7ant has asked us to do). Then you can put proper checks in the public API.

@Krovatkin
Copy link
Contributor

Krovatkin commented May 8, 2019

it would be nice if these macros all returned an ostream object for composing more comprehensive error messages.

ASSERT_AT(vec.size()) << "Vector size should never be equal to zero  context ( node =  " << *node <<  " )  some more useful information " ; 

EDIT: grammar

@ezyang
Copy link
Contributor Author

ezyang commented May 8, 2019

@Krovatkin The ASSERT macro, for internal implementation reasons, currently requires you use to a separate ASSERTM for messages.

AT_ASSERTM(vec.size(), "Vector size should never be equal to zero  context ( node =  ", *node,  " )  some more useful information ");

I plan to eliminate this inconsistency. (I'll have to construct two strings on error case because of variadic macro comma pasting shenanigans, but that's a small price to pay).

@suo
Copy link
Member

suo commented May 8, 2019

For informational purposes: the way people are trained to do in fbcode is glog-style: CHECK() will crash the process (not catchable), DCHECK() is the same but gets compiled out in opt. If you want to throw a user-facing error instead of crashing everything, you have to throw explicitly.

I don't think we should have a crashing CHECK(), since there's basically no circumstance where it's okay for library code to crash a production server, and it would be too easy for an errant CHECK() to slip through.

But having a macro vs. throw rule might be easier than remembering which of the macros you're supposed to use (or having an explicit-but-long name like AT_INTERNAL_ASSERT).

@ezyang
Copy link
Contributor Author

ezyang commented May 8, 2019

@suo Are you OK with the AT_CHECK name in that case? It sounds like it might be toxic for fbcode people, and we should change it to something that leads to a less visceral reaction (like AT_ENFORCE). (I personally prefer AT_CHECK but I'm willing to compromise on this.)

@zdevito
Copy link
Contributor

zdevito commented May 8, 2019

This seems like a great improvement. It would be nice if the error messages knew what do with source range information, so, for instance, AT_ASSERT(cond, node->range()) would give useful error reporting information.

@eellison
Copy link
Contributor

eellison commented May 8, 2019

There's JIT_SCRIPT_ASSERT in error_report.h, but no one ever uses that

@zdevito
Copy link
Contributor

zdevito commented May 8, 2019

I am just going to clean up SourceRange so it has an operator<<, that way it will work trivially with AT_CHECK and AT_ASSERT

@zdevito
Copy link
Contributor

zdevito commented May 9, 2019

This patch makes SourceRange printable, so AT_ASSERT(cond, "error message\n", node->sourceRange()) will print context information.
#20300

@ailzhang ailzhang added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 9, 2019
facebook-github-bot pushed a commit that referenced this issue May 13, 2019
Summary:
Pull Request resolved: #20321

First part of #20287

- Rename `AT_ASSERT` to `TORCH_INTERNAL_ASSERT`
- Make `TORCH_INTERNAL_ASSERT` work with variadic inputs
- Deprecated `AT_ASSERT` and `AT_ASSERTM`
- Rename `AT_CHECK` to `TORCH_CHECK`
- Make `TORCH_CHECK` give a better error message when no arguments are
  provided
- Deprecate `AT_ERROR` in favor of `TORCH_CHECK(false, ...)`
- Deprecate `AT_INDEX_ERROR` in favor of `TORCH_CHECK_INDEX(false, ...)`
- Rename `AT_WARN` to `TORCH_WARN`

No use sites are changed; I'll work on that in follow up patches
(or disable the deprecation, if necessary.)

Differential Revision: D15278439

fbshipit-source-id: 7e0ed489d4e89e5f56b8ad7eafa72cb9a06065ee
@ezyang
Copy link
Contributor Author

ezyang commented Sep 30, 2019

We fixed this!

@dashesy
Copy link
Contributor

dashesy commented Oct 22, 2019

Just a note. All the docs for cpp extensions still use AT_ASSERTM that is why many users (like me) hooked in using them.

Also, is AT_CHECK also deprecated?

warning: ‘void c10::detail::deprecated_AT_CHECK()’ is deprecated [-Wdeprecated-declarations] ::c10::detail::deprecated_AT_CHECK();

So all custom cpp extensions will have are deprecated now.

@ezyang
Copy link
Contributor Author

ezyang commented Oct 23, 2019

@dashesy Thank you for the note. See pytorch/tutorials#711

@onbigion13
Copy link

nice advice, and thanks for you explanation of the utilities of these two states. I'm exactly struggling understanding the difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

10 participants