Skip to content

Conversation

@bertmaher
Copy link
Contributor

@bertmaher bertmaher commented May 15, 2021

Stack from ghstack:

Finding a miscompilation in a large program can be tedious; this
script automates the process of bisecting based on the number of fused
instructions. Since fusing aten::cat without the corresponding
prim::ListConstruct will cause an assertion failure, we treat that case as a
"skip" and ignore it for the purpose of bisection.

Differential Revision: D28463808

Finding a miscompilation in a large program can be tedious; this
script automates the process of bisecting based on the number of fused
instructions.  Since fusing aten::cat without the corresponding
prim::ListConstruct will cause an assertion failure, we treat that case as a
"skip" and ignore it for the purpose of bisection.

Differential Revision: [D28463808](https://our.internmc.facebook.com/intern/diff/D28463808/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented May 15, 2021

💊 CI failures summary and remediations

As of commit d06d6a1 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added oncall: jit Add this issue/PR to JIT oncall triage queue cla signed labels May 15, 2021
bertmaher added a commit that referenced this pull request May 15, 2021
Finding a miscompilation in a large program can be tedious; this
script automates the process of bisecting based on the number of fused
instructions.  Since fusing aten::cat without the corresponding
prim::ListConstruct will cause an assertion failure, we treat that case as a
"skip" and ignore it for the purpose of bisection.

Differential Revision: [D28463808](https://our.internmc.facebook.com/intern/diff/D28463808/)

ghstack-source-id: 129079484
Pull Request resolved: #58357
@codecov
Copy link

codecov bot commented May 16, 2021

Codecov Report

Merging #58357 (d06d6a1) into gh/bertmaher/130/base (c48f6bc) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@                    Coverage Diff                    @@
##           gh/bertmaher/130/base   #58357      +/-   ##
=========================================================
- Coverage                  76.49%   76.48%   -0.02%     
=========================================================
  Files                       1992     1992              
  Lines                     199840   199902      +62     
=========================================================
+ Hits                      152877   152890      +13     
- Misses                     46963    47012      +49     

@huiguoo huiguoo self-requested a review May 17, 2021 17:35
Copy link
Contributor

@huiguoo huiguoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great debugging script! Wonders what are the internal assert failures..


# Scan forward from mid towards bad.
while test_limit <= first_bad and val == -1:
val = test(cmd, test_limit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be duplicated runs for a test_limit that has already been added in skips. The following check can avoid the duplication:

if test_limit in skips:
  test_limit = test_limit + 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if I understand correctly, whenever we hit a test_limit in skips, then all test_limits in range(test_limit, first_bad) are in skips too. So break the while loop in this case?

if test_limit in skips:
  break

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure about this case :). skipping the re-test is probably a legit optimization although I'm not sure whether that case ends up being hit ever b/c of the way the search space narrows each iteration of binary search...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory the case can happen. But maybe in reality it's rare to happen and can be omitted.

An example for one test_limit being hit twice in theory: assuming last_good, first_bad = 0, 100
1st iteration of while keep_going():

mid = 50, all test_limits in (mid, 80) returns -1, and test_limit=80 returns 0 so last_good, first_bad = 0, 80

2nd iteration of while keep_going():

mid = 40, all test_limits in (mid, 80) returns -1 so it starts scanning back towards good ...

In the 2nd iteration, test_limits in (50, 80) are tested the second time.

skips = set()

# Test if there are any unskipped commits in (last_good, first_bad)
def keep_going():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the limit next to last_good is sufficient?

def keep_going():
  return (last_good+1) not in skips and (last_good+1)!=first_bad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is actually good enough -- I'm not saying it's not, but I found it really hard to be certain that what I was writing was correct, so I went with something that looked pretty obviously correct. I think the explicit range check is pretty easy to reason about, and even though it's O(n), it's a pretty cheap O(n) since it's just checking set membership.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 9eee782.

@facebook-github-bot facebook-github-bot deleted the gh/bertmaher/130/head branch May 22, 2021 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants