[nnc][scripts] Add a script for bisecting the TE fuser pass #58357

bertmaher · 2021-05-15T22:52:18Z

Stack from ghstack:

- #58357 [nnc][scripts] Add a script for bisecting the TE fuser pass
- #58347 [nnc] Enable CPU fusion inside Facebook, take 2
- #58346 [nnc] Do not fuse unsqueeze with variable dim

Finding a miscompilation in a large program can be tedious; this
script automates the process of bisecting based on the number of fused
instructions. Since fusing aten::cat without the corresponding
prim::ListConstruct will cause an assertion failure, we treat that case as a
"skip" and ignore it for the purpose of bisection.

Differential Revision: D28463808
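
The bisection-with-skips idea described above can be sketched as follows. This is a simplified illustration, not the script from this PR: it picks the middle of the remaining unskipped limits rather than scanning forward from the midpoint. The result encoding (`SKIP`, `BAD`, `GOOD`), the function `bisect_with_skips`, and the stand-in `fake_test` are all assumptions made for the example.

```python
from typing import Callable, Set, Tuple

# Assumed result encoding: the PR's snippets only show -1 being used for "skip".
SKIP, BAD, GOOD = -1, 0, 1


def bisect_with_skips(
    test: Callable[[int], int], last_good: int, first_bad: int
) -> Tuple[int, int, Set[int]]:
    """Narrow (last_good, first_bad) until every limit in between is skipped."""
    skips: Set[int] = set()
    while True:
        # Limits strictly between the bounds that have not been skipped yet.
        candidates = [
            limit for limit in range(last_good + 1, first_bad) if limit not in skips
        ]
        if not candidates:
            return last_good, first_bad, skips
        limit = candidates[len(candidates) // 2]
        val = test(limit)
        if val == SKIP:
            skips.add(limit)  # untestable: remember it and try another limit
        elif val == BAD:
            first_bad = limit
        else:
            last_good = limit


def fake_test(limit: int) -> int:
    """Hypothetical stand-in for running the real command with a fusion limit."""
    if limit in (5, 6):
        return SKIP  # pretend these limits hit the assertion failure
    return GOOD if limit < 5 else BAD


print(bisect_with_skips(fake_test, 0, 16))
```

With this stand-in, the search ends with the culprit bracketed between the last good limit and the first bad one, and the skipped limits are reported separately so they can be inspected by hand.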

facebook-github-bot · 2021-05-15T22:52:21Z

💊 CI failures summary and remediations

As of commit d06d6a1 (more details on the Dr. CI page):

1/1 failures possibly* introduced in this PR
- 1/1 non-scanned failure(s)

Differential Revision: [D28463808](https://our.internmc.facebook.com/intern/diff/D28463808/)
ghstack-source-id: 129079484
Pull Request resolved: #58357

codecov · 2021-05-16T11:25:18Z

Codecov Report

Merging #58357 (d06d6a1) into gh/bertmaher/130/base (c48f6bc) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@                    Coverage Diff                    @@
##           gh/bertmaher/130/base   #58357      +/-   ##
=========================================================
- Coverage                  76.49%   76.48%   -0.02%     
=========================================================
  Files                       1992     1992              
  Lines                     199840   199902      +62     
=========================================================
+ Hits                      152877   152890      +13     
- Misses                     46963    47012      +49

huiguoo

Great debugging script! I wonder what the internal assert failures are.

huiguoo · 2021-05-17T17:32:15Z

torch/csrc/jit/tensorexpr/scripts/bisect.py

+
+        # Scan forward from mid towards bad.
+        while test_limit <= first_bad and val == -1:
+            val = test(cmd, test_limit)


A test_limit that has already been added to skips could be run again. The following check would avoid the duplicate run:

if test_limit in skips: test_limit = test_limit + 1

Or, if I understand correctly, whenever we hit a test_limit in skips, all test_limits in range(test_limit, first_bad) are in skips too. So should we break out of the while loop in this case?

if test_limit in skips: break

bertmaher · 2021-05-18

I'm not entirely sure about this case :). Skipping the re-test is probably a legitimate optimization, although I'm not sure that case is ever hit, because of the way the search space narrows on each iteration of the binary search...

huiguoo · 2021-05-19

In theory the case can happen, but in practice it is probably rare enough to ignore.

An example where one test_limit is hit twice, in theory: assume last_good, first_bad = 0, 100.

1st iteration of while keep_going():

mid = 50; all test_limits in (mid, 80) return -1, and test_limit = 80 returns 0, so last_good, first_bad = 0, 80.

2nd iteration of while keep_going():

mid = 40; all test_limits in (mid, 80) return -1, so it starts scanning back towards good ...

In the 2nd iteration, the test_limits in (50, 80) are tested a second time.
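
One way to make the repeated runs in the example above harmless is to cache each result by its limit, so a second request for the same test_limit never launches the command again. The sketch below assumes the test is deterministic for a given limit; `memoize_test` and `fake_run` are hypothetical names, and `fake_run` merely imitates the example by returning -1 for limits in [50, 80) and 0 otherwise.

```python
from typing import Callable, Dict, List


def memoize_test(run_test: Callable[[int], int]) -> Callable[[int], int]:
    """Wrap a test function so each limit is actually run at most once."""
    results: Dict[int, int] = {}

    def cached(limit: int) -> int:
        if limit not in results:
            results[limit] = run_test(limit)
        return results[limit]

    return cached


calls: List[int] = []  # records every real (uncached) run


def fake_run(limit: int) -> int:
    """Hypothetical stand-in for the expensive command."""
    calls.append(limit)
    return -1 if 50 <= limit < 80 else 0


cached_test = memoize_test(fake_run)

# Scan the same stretch of limits twice, as in the two iterations above.
for _ in range(2):
    for limit in range(50, 81):
        cached_test(limit)

print(len(calls))  # number of real runs
```

The second pass over limits 50 through 80 is answered entirely from the cache, so the number of real runs equals the number of distinct limits requested.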

huiguoo · 2021-05-17T18:05:21Z

torch/csrc/jit/tensorexpr/scripts/bisect.py

+    skips = set()
+
+    # Test if there are any unskipped commits in (last_good, first_bad)
+    def keep_going():


Is checking the limit next to last_good sufficient?

def keep_going(): return (last_good+1) not in skips and (last_good+1)!=first_bad

bertmaher · 2021-05-18

I'm not sure if this is actually good enough. I'm not saying it's not, but I found it really hard to be certain that what I was writing was correct, so I went with something that looked pretty obviously correct. I think the explicit range check is pretty easy to reason about, and even though it's O(n), it's a cheap O(n), since it's just checking set membership.
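
To make the difference between the two checks concrete, the sketch below writes both as pure functions of the search state and evaluates them on a few states. `keep_going_range` is a reconstruction of the explicit range check based only on the comment in the diff, and `keep_going_neighbor` is the one-line proposal from this thread; both function names are assumptions. Whether the script can actually reach a state on which the two disagree is not established here.

```python
from typing import Set


def keep_going_range(last_good: int, first_bad: int, skips: Set[int]) -> bool:
    """True if any limit strictly between the bounds has not been skipped."""
    return any(limit not in skips for limit in range(last_good + 1, first_bad))


def keep_going_neighbor(last_good: int, first_bad: int, skips: Set[int]) -> bool:
    """Proposed shortcut: look only at the limit right after last_good."""
    nxt = last_good + 1
    return nxt not in skips and nxt != first_bad


# The two predicates agree on these states...
assert keep_going_range(0, 10, set()) and keep_going_neighbor(0, 10, set())
assert not keep_going_range(0, 1, set()) and not keep_going_neighbor(0, 1, set())

# ...but differ when only the neighbor of last_good has been skipped.
print(keep_going_range(0, 10, {1}), keep_going_neighbor(0, 10, {1}))
```

On the state with `skips = {1}`, the range check still sees untested limits 2 through 9, while the shortcut reports that the search is finished.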

facebook-github-bot · 2021-05-18T23:11:38Z

This pull request has been merged in 9eee782.

facebook-github-bot added the oncall: jit and cla signed labels May 15, 2021

This was referenced May 15, 2021

[nnc] Do not fuse unsqueeze with variable dim #58346

Closed

[nnc] Enable CPU fusion inside Facebook, take 2 #58347

Closed

huiguoo self-requested a review May 17, 2021 17:35

huiguoo approved these changes May 17, 2021

huiguoo reviewed May 17, 2021

facebook-github-bot closed this in 9eee782 May 18, 2021

facebook-github-bot added the Merged label May 18, 2021

facebook-github-bot deleted the gh/bertmaher/130/head branch May 22, 2021 14:17
