-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[nnc][scripts] Add a script for bisecting the TE fuser pass #58357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Finding a miscompilation in a large program can be tedious; this script automates the process of bisecting based on the number of fused instructions. Since fusing aten::cat without the corresponding prim::ListConstruct will cause an assertion failure, we treat that case as a "skip" and ignore it for the purpose of bisection. Differential Revision: [D28463808](https://our.internmc.facebook.com/intern/diff/D28463808/) [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit d06d6a1 (more details on the Dr. CI page):
This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
Finding a miscompilation in a large program can be tedious; this script automates the process of bisecting based on the number of fused instructions. Since fusing aten::cat without the corresponding prim::ListConstruct will cause an assertion failure, we treat that case as a "skip" and ignore it for the purpose of bisection. Differential Revision: [D28463808](https://our.internmc.facebook.com/intern/diff/D28463808/) ghstack-source-id: 129079484 Pull Request resolved: #58357
Codecov Report
@@ Coverage Diff @@
## gh/bertmaher/130/base #58357 +/- ##
=========================================================
- Coverage 76.49% 76.48% -0.02%
=========================================================
Files 1992 1992
Lines 199840 199902 +62
=========================================================
+ Hits 152877 152890 +13
- Misses 46963 47012 +49 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great debugging script! Wonders what are the internal assert failures..
|
|
||
| # Scan forward from mid towards bad. | ||
| while test_limit <= first_bad and val == -1: | ||
| val = test(cmd, test_limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be duplicated runs for a test_limit that has already been added in skips. The following check can avoid the duplication:
if test_limit in skips:
test_limit = test_limit + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if I understand correctly, whenever we hit a test_limit in skips, then all test_limits in range(test_limit, first_bad) are in skips too. So break the while loop in this case?
if test_limit in skips:
break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure about this case :). skipping the re-test is probably a legit optimization although I'm not sure whether that case ends up being hit ever b/c of the way the search space narrows each iteration of binary search...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory the case can happen. But maybe in reality it's rare to happen and can be omitted.
An example for one test_limit being hit twice in theory: assuming last_good, first_bad = 0, 100
1st iteration of while keep_going():
mid = 50, all test_limits in (mid, 80) returns -1, and test_limit=80 returns 0 so last_good, first_bad = 0, 80
2nd iteration of while keep_going():
mid = 40, all test_limits in (mid, 80) returns -1 so it starts scanning back towards good ...
In the 2nd iteration, test_limits in (50, 80) are tested the second time.
| skips = set() | ||
|
|
||
| # Test if there are any unskipped commits in (last_good, first_bad) | ||
| def keep_going(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the limit next to last_good is sufficient?
def keep_going():
return (last_good+1) not in skips and (last_good+1)!=first_bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is actually good enough -- I'm not saying it's not, but I found it really hard to be certain that what I was writing was correct, so I went with something that looked pretty obviously correct. I think the explicit range check is pretty easy to reason about, and even though it's O(n), it's a pretty cheap O(n) since it's just checking set membership.
|
This pull request has been merged in 9eee782. |
Stack from ghstack:
Finding a miscompilation in a large program can be tedious; this
script automates the process of bisecting based on the number of fused
instructions. Since fusing aten::cat without the corresponding
prim::ListConstruct will cause an assertion failure, we treat that case as a
"skip" and ignore it for the purpose of bisection.
Differential Revision: D28463808