Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Recursive bisect strategy #1997
First, thank you very much for the new
While looking at the implementation recently, I came across a way I think it might be possible to improve the bisection strategy in
In the case where there is a single, singly-dependent failure (i.e. when there is a single expected failure, which depends upon a single preceding example), I believe the current strategy is optimal.
However, when there are multiple preceding examples that combine to cause the expected failure, then there are cases that the current strategy fails to reduce, and cases that it successfully reduces, but could reduce in fewer steps.
Since the problem cases involve expected failures that depend on multiple preceding examples, I first expanded the
I then added failing test cases:
The rounds-based strategy will not reduce the case
Also, the rounds-based strategy takes 70 runs to infer that the following is irreducible:
(My test case originally had 14 runs, because I hadn't accounted for the first step.)
Instead of searching amongst combinations of the candidate set, the recursive algorithm always splits its candidate set into two slices. Informally:
(Where "fixed" means while reducing one set, the other set is always included as part of each run.)
In case my lousy prose description makes no sense, here is a visual representation of a simple case where there are two culprit examples, one in each half of the candidate set:
I've implemented the above strategy in this PR, and the additional test cases now pass.
There remain some failures elsewhere in the suite, however, because the formatter output no longer matches the previous algorithm's, and because the recursive strategy doesn't proceed in rounds, it's difficult to restore these tests without changing the bisection output.
So, before I tackle that: is this something you'd be interested in merging? I'm happy to modify anything that's unclear. :)
Thanks for digging into this! I'm quite keen to see the bisector improved, and you've clearly thought a lot about this....so this looks great!
I do have a few questions/concerns:
In this instance the algorithms should be identical. The recursive strategy uses the same two initial slices as the rounds-based one, so if there is only a single failure, one of those slices can always be eliminated until the terminating condition is reached. It's only when both of the initial slices are implicated that the algorithms start to differ.
There's a minor difference in that the recursive strategy tests the expected failure(s) on their own at the start to eliminate totally independent failures, then proceeds with the knowledge that at least one culprit exists.
For singly-dependent failures this makes no difference to the number of runs since the rounds-based strategy also does a run with just the expected failure; it just does it at the end, rather than the beginning.
For totally independent failures, the recursive strategy will return in a single run, whereas the rounds-based strategy will take ~log2(n) runs.
For multiply-dependent failures the number of runs differs anyway.
This is my mistake - the diagram is incorrect at that point, and the last run would be skipped because of the
At the moment because the code is clearer when the recursive steps are independent. It's certainly an optimisation that could be applied, but my (brief) attempt to do so resulted in a bit of a mess, so I thought I'd submit the PR with the more understandable code. I'll have another look at this.
I'd love to try something like the diagram. :) The challenging bit would be handling cases where there are more examples than will fit in the terminal width (or 80 chars, or whatever). I'll certainly give it a go - is there a standard width you try to keep your formatter output to? I guess the output also needs to be comprehensible in a non-color terminal, so we couldn't rely solely on colors.
Yep, some real-world testing would definitely be nice. I have a good test project (private source unfortunately) that had a bunch of order-dependent failures I can resurrect and try bisecting, but broader testing would be nice, too. I don't suppose the rspec project has any example test suites lying around to hand that are used for this sort of thing? If not I could look at using the github API to gather some, or just make a twitter appeal. :)
We don't have any width we limit output to. In fact, our formatters commonly go over normal terminal widths and the terminal just wraps it. The dots for the progress formatter are just one line line that the terminal wraps, for example. For the doc formatter, the length is whatever the doc strings provided by the user are, and we do nothing to limit it.
Here it could be a problem for it to wrap if it causes it to run together and makes it hard to distinguish different runs.
Note that we also support multiple bisect formatters -- currently we have a normal one and a verbose one that is enabled via
referenced this pull request
Jun 17, 2015
Thanks for the code feedback, Myron. I have:
I'll definitely continue to work on getting a diagrammatic formatter going, but from my POV it'd be nice to do that as a separate PR after doing the real-world testing on the new algorithm and getting this merged (if it's accepted, of course :)). Would that be a good idea? If you'd like it all in one PR though, that's fine with me.
The full test suite now passes for me locally; I've fixed the travis failures I understand, but some of the suites seem to be failing despite having only pending failures.
Okay, feedback addressed so far:
For the latter issue, I figured that rather than try to cover formatter edge cases in either the
Okay, thanks for all the latest feedback - I think I've covered it all:
Edit: there's a single JRuby failure which appears to be related to my changes (at least, it appears in a bisect-related spec), but is pretty baffling. Don't suppose anyone's seen something similar in the suite before?
We periodically get file descriptor errors like that in our builds for JRuby. I have no idea why, but it's always transient so we just kick the build and run it again. That's what I just did.
It would be nice to not have to do that but since it has to do with a JVM resource limit and not a logic bug in RSpec or our specs, I'm not sure what to do about it :(.
Thanks, @urbanautomaton, this is fantastic!
I left two more comments that are cosmetic changes (formatting / comments). I'm completely happy (ecstatic, even) with this otherwise.
Do you want to rebase into a smaller number of commits before we merge? It's not strictly needed but given the back-and-forth and the fact that we're up to 26 commits it could help cleanup the history a bit.
Okay, remaining bits of feedback addressed - I've rebased into a smaller number of commits, too. The rebase isn't perfect - because of all the back-and-forth I wasn't able to do it totally cleanly, so there are a couple of additions in 6a22644 that only make complete sense in the light of cf1482a (relating to formatter notifications). If you're happy with that then we could just leave it - alternatively I could just squash those commits together and have one big commit that lands both the algorithm and formatter modifications at once.
Thanks once again for all the feedback - the PR is so much better for it. I've always been appreciative of the hard work that goes into maintaining RSpec, but I don't think I've really understood just how much work that actually is. So, cheers. :)
Thanks, @urbanautomaton! The way you've squashed this looks great to me. It tells a very clear story, which is quite helpful.
I did notice one of the commit messages that I think is slightly inaccurate (in a way that can cause confusion), and if I'm correct, I'd like the message amended before we merge. I'm happy to take care of amending the message (or you can take care of it; I don't particularly care) but can you first confirm for me whether or not I'm correct about the message? I'm talking specifically about the message for 6a22644:
Specifically, these lines:
It shows that it first discovers on that the failure happens with using the RH quarter of the RHS...and then for some reason it proceeds to also try the LH quarter of the RHS, when it shouldn't need to. Given that you use
...where the LH quarter of the RHS is tested before the RH quarter of the RHS.
Not trying to be anal here but the usefulness of the graphical diagram (which is really, really great, BTW!) depends on it being accurate or else it causes confusion so I think it's worth getting right. Can you confirm if my understanding is correct?
Hmm - I think you're right about the former, but not the latter. The
ids_to_ignore, duration = track_duration do [lhs, rhs].find do |ids| get_expected_failures_for?(remaining_ids - ids) end end
Edit: sorry, I keep trying to reply in too little time. So I think the following is correct:
and have amended the commit accordingly.