
Make sure each warnings.warn only executes once inside TorchScript. #45382

Closed
wants to merge 1 commit into from

Conversation

@gmagogsfm (Contributor) commented Sep 26, 2020

  • Add a pass at the end of runCleanupPasses that annotates each aten::warn node with a unique id (see the sketch below)
  • Enhance the interpreter so that it tracks which aten::warn instructions have already executed and skips repeats
  • Improve insertInstruction so that it correctly checks for overflow

Fixes #45108
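
A minimal sketch of the annotation pass described above, assembled from the fragments quoted later in this thread; the recursive walk and the attr::warn_id attribute appear in the review, but the exact body here is an illustrative assumption, not the merged source:

// Sketch: walk every block and tag each aten::warn node with a unique id.
// The atomic counter reflects the thread-safety concern raised in review.
void AnnotateWarns(Block* block) {
  static std::atomic<int64_t> next_warn_id{0};
  for (Node* node : block->nodes()) {
    for (Block* sub_block : node->blocks()) {
      AnnotateWarns(sub_block); // recurse into if/loop bodies
    }
    if (node->kind() == aten::warn) {
      node->i_(attr::warn_id, next_warn_id.fetch_add(1));
    }
  }
}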

@facebook-github-bot added the "oncall: jit" label (Add this issue/PR to JIT oncall triage queue) Sep 26, 2020
dr-ci bot commented Sep 26, 2020

💊 CI failures summary and remediations

As of commit c214d52 (more details on the Dr. CI page):


Commit c214d52 was recently pushed. Waiting for builds...



@gmagogsfm gmagogsfm marked this pull request as ready for review September 28, 2020 22:14
@gmagogsfm (Contributor Author) commented:

There is still a doctest failing, but I wanted to get some early feedback on the approach, so I'm sending this out for review now.

namespace jit {

void AnnotateWarns(Block* b) {
  static int64_t idx = 0;
Collaborator:

you probably need to make it atomic as compilation might happen concurrently from different threads
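
For illustration, the suggested change is small; a sketch, assuming the counter had stayed (per the reply below, it was ultimately removed):

// Sketch: an atomic counter gives each concurrently-compiled graph a
// distinct id without a lock.
static std::atomic<int64_t> idx{0};
const int64_t warn_id = idx.fetch_add(1); // safe under concurrent compilation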

Contributor Author:

Ended up removing the idx per your suggestion.

@@ -857,7 +870,11 @@ struct CodeImpl {

  void emitWarn(Node* node) {
    emitLoadInputs(node->inputs());
    insertInstruction(WARN);
    int64_t idx = -1;
    if (node->hasAttribute(attr::warn_id)) {
Collaborator:

Just curious: why do we need to have separate indices? Would using just Node* as a key here be sufficient? Or are there some considerations with inlining that might change that? (I'm not a TS expert, so @suo probably has a better idea.)

Contributor Author:

Good point. Changed to using Node* as key.

Contributor Author:

New turn of events.

Using Node* as the key doesn't work for the ProfilingExecutor, which unrolls loops and thus creates many copies of the same node, resulting in many warning calls. Switched back to using a unique ID attached to aten::warn.

// TODO TODO this set should be graph specific, rather than global.
bool need_warn = true;
if (inst.X != -1) {
  auto inserted = warned_indices_.insert(inst.X);
Collaborator:

You probably need a lock for it, because AFAIU the same Code can run concurrently from several threads.

Contributor Author:

Added.

const auto msg = pop(stack).toStringRef();
if (need_warn) {
  TORCH_WARN(msg);
}
Collaborator:

It was the case with this code even earlier, but I find it suspicious that this branch didn't have drop(stack, 1). Do we actually have unit tests that verify it? It might be a bug, because I think the additional argument (i.e. stacklevel) shouldn't depend on the presence or absence of range information. A safer fix might be to record the number of node inputs in inst.N and use that to drop from the stack.
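
A rough sketch of that suggestion; pop and drop are the interpreter's stack helpers, and the three-argument insertInstruction form is assumed from the (op, X, N) snippet quoted later in this thread:

// Sketch: record the warn node's input count in inst.N when emitting...
insertInstruction(WARN, warn_id, node->inputs().size());

// ...so the WARN handler can always pop the message and then drop the
// remaining inputs (e.g. a stacklevel), with or without range info:
const auto msg = pop(stack).toStringRef();
drop(stack, inst.N - 1);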

Contributor Author:

I think adding the extra stack drop causes an interpreter error. It kind of makes sense, since the lack of file info means stack_level is not meaningful. Anyway, I will do some more investigation and address this issue in a later PR.

Collaborator:

Yeah, I'm just asking whether we actually ever have a codepath exercising this branch (as usually we have the file info).

instructions_.emplace_back(op, X, N);
instructions_.emplace_back(
    op,
    safe_narrow_cast<int32_t, int64_t>(X),
Collaborator:

Wait, won't it throw if there are more than INT32_MAX distinct warnings? It actually can happen, because you keep increasing the same counter across all models, so it might overflow eventually. You probably want an explicit static_cast in emitWarn in this case, as the truncation is OK.

Contributor Author:

Removed this logic since the index is no longer needed.

Contributor Author:

Added a static_cast at the time of emitting the warn op.
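
Presumably something along these lines; a sketch of the emit-time truncation, not the exact diff:

// Sketch: truncate the 64-bit warn id explicitly, since wrap-around only
// risks an occasional duplicate warning, unlike safe_narrow_cast's throw.
const int64_t idx =
    node->hasAttribute(attr::warn_id) ? node->i(attr::warn_id) : -1;
insertInstruction(WARN, static_cast<int32_t>(idx));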

@dzhulgakov (Collaborator) left a comment:

Looks good in its current form; the minor comments are optional.

// ensure each WARN instruction only executes once to mimic Python behavior.
struct WarnedNodes {
 public:
  bool contains(Node* n) {
Collaborator:

Just to be a bit paranoid (not that it matters in this case): there can be a race between two threads, between the calls to contains and insert, that causes the warning to be logged twice. You can get by with a single method should_log_once that tries to insert and returns a bool indicating whether the insert succeeded.

Given it's a cheap operation, a simple (not read-write) mutex is fine too.
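
A sketch of that combined helper, using the mutex suggested above and keyed on the int32_t warn id that the final code stores; the method name should_log_once is the reviewer's suggestion, not the merged code's:

#include <cstdint>
#include <mutex>
#include <unordered_set>

struct WarnedNodes {
 public:
  // Returns true exactly once per id: insert().second is false when the
  // id was already present, so check-and-record is a single step under
  // the lock, closing the contains/insert race.
  bool should_log_once(int32_t warn_id) {
    std::lock_guard<std::mutex> guard(mutex_);
    return warned_nodes_.insert(warn_id).second;
  }

 private:
  std::mutex mutex_;
  std::unordered_set<int32_t> warned_nodes_;
};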


@facebook-github-bot (Contributor) left a comment:

@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@gmagogsfm merged this pull request in d150d3e.

facebook-github-bot pushed a commit that referenced this pull request Oct 15, 2020
(#46369)

Summary:
This diff restores the previous behavior of silently allowing overflow when inserting instructions. The behavior was changed recently in #45382, but that started to break some existing use cases that have overflow problems.

This restores the original behavior but throws a warning, to unblock existing use cases where overflow happens.

Pull Request resolved: #46369

Reviewed By: kwanmacher, wanchaol, fbhuba

Differential Revision: D24324345

Pulled By: gmagogsfm

fbshipit-source-id: 1c0fac421d4de38f070e21059bbdc1b788575bdf
  std::unordered_set<int32_t> warned_nodes_;
};

WarnedNodes warned_nodes_;
Collaborator:

This is actually not what we want: multiple invocations of the same function should still warn only once (like Python does). Sorry for missing it in the review. You probably need to move it to the Code level or something like that.

Collaborator:

Another suggestion: instead of an unordered_set, just create a vector<atomic> inside Code and use it to modify the index directly. I think this way you don't need locking. Or you can still have an unordered_map, but pre-populate it with all possible indices beforehand so that you don't need the lock.
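
A sketch of the lock-free variant, assuming Code knows how many WARN instructions it emitted when it is built; the WarnFlags name is illustrative only:

#include <atomic>
#include <vector>

// Sketch: one pre-sized flag per WARN instruction; exchange() returns the
// previous value, so only the very first caller for an index gets true.
struct WarnFlags {
  explicit WarnFlags(size_t num_warns) : warned_(num_warns) {}

  bool should_warn(size_t warn_index) {
    return !warned_[warn_index].exchange(true);
  }

  std::vector<std::atomic<bool>> warned_; // value-initialized to false
};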

Contributor Author:

Thanks for the suggestion. I wonder if what Python does is actually desired. Think about the following use case:

@torch.jit.script
def issue_warning_wrong_dtype(dtype: str):
    warnings.warn("Incorrect data type " + dtype)

It may be called from various callsites, and they are all useful. IMHO, it is slightly better to emit a warning for each unique callsite. The number of warnings would still not be too spammy, because there likely won't be so many callsites that they feel spammy to users. What do you think?

I am still investigating the issue reported internally, which I feel is a multi-threading issue similar to #46684. Will update here once I find out more.

Collaborator:

Well, ideally we should respect the stacklevel argument, like Python does. I.e., in your issue_warning_wrong_dtype example, it'd be stacklevel=2 or something. However, that's much harder to implement, and frankly I don't think it's worth the effort, as warnings in TorchScript are kind of a niche use case in general.

Fixing the spammy warning is important though, because the typical case is inference, where the model is called many times in a loop and each call creates an independent InterpreterState. So the current implementation sadly will log on each invocation, spamming the logs. (That's pretty much the internal issue you're referring to.)

Contributor Author:

I thought about this problem a lot but couldn't think of a clean way to implement the "warn once" behavior, which requires maintaining a global state that is above ScriptModule. This sort of goes against TorchScript's philosophy of keeping a ScriptModule's state local to itself.

Here is an alternative solution: given what we talked about, it sounds like it is OK (or even preferred) to have zero warnings emitted during inference. If that's the case, we can expose a control knob that the predictor can toggle to disable warnings entirely inside a TorchScript module. What do you think?
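
Such a knob could be as small as a process-wide flag consulted by the WARN handler; a sketch with invented names (setTorchScriptWarningsEnabled is hypothetical, not an actual API):

#include <atomic>

// Sketch only: a hypothetical global switch; the names are illustrative.
static std::atomic<bool> torchscript_warnings_enabled{true};

void setTorchScriptWarningsEnabled(bool enabled) { // toggled by the predictor
  torchscript_warnings_enabled.store(enabled);
}

// In the interpreter's WARN handler:
//   if (torchscript_warnings_enabled.load() && need_warn) {
//     TORCH_WARN(msg);
//   }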

Collaborator:

Yeah, we can have a global knob :)

In general, I don't think that having this state across ScriptModule invocations is necessarily that bad. It's a very narrow kind of state, and it matches Python's warnings module behavior.

requires maintaining a global state that is above ScriptModule

I wonder whether we can just add a bit per original Warning IR Node somewhere. That'd be the closest match to what Python does and would also keep state "local" to the ScriptModule that owns the original graph.

Contributor Author:

I wonder whether we can just add a bit per original Warning IR Node somewhere. That'd be the closest match to what Python does and would also keep state "local" to the ScriptModule that owns the original graph.

Based on my rough understanding, that bit per Warning node would have to live in the predictor, which runs inference calls in a for loop. In that case, an implementation detail of TorchScript would leak out. Let me know what you think.

FileCheck() \
    .check_count(
        str="UserWarning: I am warning you",
        count=2,
Collaborator:

Yeah, so in this case it should be 1, not 2; that's what Python does.

Labels: Merged, oncall: jit (Add this issue/PR to JIT oncall triage queue)

Successfully merging this pull request may close these issues:

warnings.warn is too spammy in TorchScript

4 participants