New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive bisect strategy #1997

Merged
merged 6 commits into from Jun 29, 2015

Conversation

Projects
None yet
3 participants
@urbanautomaton
Copy link
Contributor

urbanautomaton commented Jun 15, 2015

Hi,

First, thank you very much for the new --bisect feature - it's already saved me and a colleague hours of painstaking manual debugging, and it doesn't get much better than that. :)

While looking at the implementation recently, I came across a way I think it might be possible to improve the bisection strategy in ExampleMinimizer. Presently a round-based approach is used, which attempts to find ~50% of the current residual candidates to ignore in each round. It first searches for contiguous blocks of candidates to ignore, then for more complex combinations of the candidate set. If it can't find ~50% of the candidate set to ignore at any stage, it concludes that the run is irreducible.

In the case where there is a single, singly-dependent failure (i.e. when there is a single expected failure, which depends upon a single preceding example), I believe the current strategy is optimal.

However, when there are multiple preceding examples that combine to cause the expected failure, then there are cases that the current strategy fails to reduce, and cases that it successfully reduces, but could reduce in fewer steps.

Examples

Since the problem cases involve expected failures that depend on multiple preceding examples, I first expanded the FakeBisectRunner class to support simulating multiply-dependent failures.

I then added failing test cases:

The rounds-based strategy will not reduce the case C C C C C C C I F (where C = culprit, I = innocent, and F = expected failure), as the first round fails to remove 50% of the candidates.

Also, the rounds-based strategy takes 70 runs to infer that the following is irreducible: C C C C C C C C F. With a recursive strategy, I believe this should be possible in 1 + 2 + 4 + 8 = 15 runs:

  • 1 run to determine that at least one of the candidates is responsible
  • 2 runs to determine that both halves of the candidates are responsible
  • 4 runs to determine that all quarters of the candidates are responsible
  • 8 runs to determine that all of the candidates are responsible

(My test case originally had 14 runs, because I hadn't accounted for the first step.)

Recursive strategy

Instead of searching amongst combinations of the candidate set, the recursive algorithm always splits its candidate set into two slices. Informally:

  1. Is the candidate set a single example?
    • yes: return it as the culprit
    • no:
      1. Divide the candidate set in two
      2. Run each slice with the expected failure
      3. Can either set be eliminated?
        • yes: recurse on other set
        • no: recurse on both sets, retaining other set as "fixed" for each run

(Where "fixed" means while reducing one set, the other set is always included as part of each run.)

In case my lousy prose description makes no sense, here is a visual representation of a simple case where there are two culprit examples, one in each half of the candidate set:

I've implemented the above strategy in this PR, and the additional test cases now pass.

Formatter

There remain some failures elsewhere in the suite, however, because the formatter output no longer matches the previous algorithm's, and because the recursive strategy doesn't proceed in rounds, it's difficult to restore these tests without changing the bisection output.

So, before I tackle that: is this something you'd be interested in merging? I'm happy to modify anything that's unclear. :)

Cheers,
Simon

@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 16, 2015

Thanks for digging into this! I'm quite keen to see the bisector improved, and you've clearly thought a lot about this....so this looks great!

I do have a few questions/concerns:

  • How does this affect the common case where there is on culprit causing one or more other failures when it runs before them? You mention that the current algorithm is optimal for this common case, so presumably your recursive algorithm performs less optimally...any ideas what the affect is? (Particularly in terms of runtime and number of runs)
  • The diagram you provided is reaaallly helpful, but there are a couple things that surprise me about it:
    • For the "Recurse on RHS with LHS Fixed" section, why is the last run needed? It looks like you've identified the culprit on the RHS after the 3rd run so the 4th seems unnecessary (and wasteful).
    • For the "Recurse on LHS with RHS Fixed" section, why is the RHS fixed to all 4 examples when you've already identified which of the 4 examples is the culprit? The individual runs will take less time if we don't run the extra examples we've already identified as safe to ignore.
  • Given that this is a departure from a rounds-based strategy, and the formatter output is built around that...do you have thoughts on what the formatter output might look like for this strategy? The diagram is really cool and it makes me wonder if we could do something like that...
  • I'd be interested on seeing some A/B test results of the two strategies on real-world example repos. It'd help me understand how this affects users.

Thanks again!

@urbanautomaton

This comment has been minimized.

Copy link
Contributor

urbanautomaton commented Jun 16, 2015

Hi Myron,

How does this affect the common case where there is one culprit

In this instance the algorithms should be identical. The recursive strategy uses the same two initial slices as the rounds-based one, so if there is only a single failure, one of those slices can always be eliminated until the terminating condition is reached. It's only when both of the initial slices are implicated that the algorithms start to differ.

There's a minor difference in that the recursive strategy tests the expected failure(s) on their own at the start to eliminate totally independent failures, then proceeds with the knowledge that at least one culprit exists.

For singly-dependent failures this makes no difference to the number of runs since the rounds-based strategy also does a run with just the expected failure; it just does it at the end, rather than the beginning.

For totally independent failures, the recursive strategy will return in a single run, whereas the rounds-based strategy will take ~log2(n) runs.

For multiply-dependent failures the number of runs differs anyway.

For the "Recurse on RHS with LHS Fixed" section, why is the last run needed?

This is my mistake - the diagram is incorrect at that point, and the last run would be skipped because of the #find usage here, combined with the terminating condition that assumes any candidate set of length 1 must be the culprit.

why is the RHS fixed to all 4 examples when you've already identified which of the 4 examples is the culprit?

At the moment because the code is clearer when the recursive steps are independent. It's certainly an optimisation that could be applied, but my (brief) attempt to do so resulted in a bit of a mess, so I thought I'd submit the PR with the more understandable code. I'll have another look at this.

do you have thoughts on what the formatter output might look like for this strategy?

I'd love to try something like the diagram. :) The challenging bit would be handling cases where there are more examples than will fit in the terminal width (or 80 chars, or whatever). I'll certainly give it a go - is there a standard width you try to keep your formatter output to? I guess the output also needs to be comprehensible in a non-color terminal, so we couldn't rely solely on colors.

I'd be interested on seeing some A/B test results of the two strategies on real-world example repos.

Yep, some real-world testing would definitely be nice. I have a good test project (private source unfortunately) that had a bunch of order-dependent failures I can resurrect and try bisecting, but broader testing would be nice, too. I don't suppose the rspec project has any example test suites lying around to hand that are used for this sort of thing? If not I could look at using the github API to gather some, or just make a twitter appeal. :)

@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 16, 2015

I'd love to try something like the diagram. :) The challenging bit would be handling cases where there are more examples than will fit in the terminal width (or 80 chars, or whatever). I'll certainly give it a go - is there a standard width you try to keep your formatter output to? I guess the output also needs to be comprehensible in a non-color terminal, so we couldn't rely solely on colors.

We don't have any width we limit output to. In fact, our formatters commonly go over normal terminal widths and the terminal just wraps it. The dots for the progress formatter are just one line line that the terminal wraps, for example. For the doc formatter, the length is whatever the doc strings provided by the user are, and we do nothing to limit it.

Here it could be a problem for it to wrap if it causes it to run together and makes it hard to distinguish different runs.

Note that we also support multiple bisect formatters -- currently we have a normal one and a verbose one that is enabled via --bisect=verbose. So maybe the diagram could use --bisect=diagram or replace the verbose output or something.

@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 16, 2015

I don't suppose the rspec project has any example test suites lying around to hand that are used for this sort of thing?

Sadly, no.

end

def bisect_over_recursive(candidate_ids, fixed)
return candidate_ids if candidate_ids.length == 1

This comment has been minimized.

@myronmarston

myronmarston Jun 16, 2015

Member

one? works nicely for this case:

return candidate_ids if candidate_ids.one?
else
slices.permutation(2).map do |slice, other_slice|
bisect_over_recursive(slice, fixed + other_slice)
end.flatten(1)

This comment has been minimized.

@myronmarston

myronmarston Jun 16, 2015

Member

I had to think for a while about this to confirm that there's always only two permutations here. IMO, this would be easier to understand if you explicitly spelled out the two permutations rather than using permutation(2).map { }.flatten(1). This also works out well to let us name the two sides lhs and rhs which helped me understand the algorithm and your diagram:

lhs, rhs = candidate_ids.each_slice(slice_size).to_a

# ...

ids_to_ignore = [lhs, rhs].find do |ids|
  get_expected_failures_for?(candidate_ids - ids + fixed)
end

if ids_to_ignore
  # ...
else
  bisect_over_recursive(lhs, fixed + rhs) +
  bisect_over_recursive(rhs, fixed + lhs)
end

This simpler form may also help with this:

At the moment because the code is clearer when the recursive steps are independent. It's certainly an optimisation that could be applied, but my (brief) attempt to do so resulted in a bit of a mess, so I thought I'd submit the PR with the more understandable code. I'll have another look at this.

...since they are no longer generically run through a map block the 2nd recursive call can more easily be updated to account for what was learned from the 1st recursive call.

end

currently_needed_ids
if ids_to_ignore
self.remaining_ids = candidate_ids - ids_to_ignore

This comment has been minimized.

@myronmarston

myronmarston Jun 16, 2015

Member

Mutating this on self always made me a bit uneasy but it was the best way I could find to make it work before. If you find it works best for the new algorithm, feel free to keep it...but if you can find a clean way to not mutate the state like this, that would be great :).

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 16, 2015

Contributor

Yeah - I tried to eliminate it but ended up having problems with #repro_command_for_currently_needed_ids. I think to support the graceful abort on SIGINT feature the minimizer will have to store its current progress (since the execution state will be lost on interrupt), so I don't think it's going to be easy to get away from this bit of state.

Maybe it'd be good to write the algorithm purely with local variables, then add explicit #memoize_progress calls to write the state, just to make it a bit clearer why the object-level state is needed. I'll give that a go and see if it's any nicer.

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch 2 times, most recently from b909be7 to b600552 Jun 18, 2015

@urbanautomaton

This comment has been minimized.

Copy link
Contributor

urbanautomaton commented Jun 18, 2015

Thanks for the code feedback, Myron. I have:

  • Altered the algorithm to track eliminated ids rather than remaining ids, allowing the algorithm to use the knowledge gained from LHS recursion when recursing on the RHS (updated algorithm diagram).
  • Simplified the recursive logic to eliminate the #permutation call (don't know what I was thinking there, really :) )
  • Had another go at eliminating the object-level progress state, but it got messy again. In the end I think if it has to exist for the graceful-exit-on-SIGINT feature, it's less confusing to just accept and use it, rather than having multiple sources of truth for algorithm progress.
  • Updated the progress and verbose formatters as minimally as possible to make sense with the new algorithm - here's a gist with sample output.
  • Patched up the specs and features involving formatter output to match the new display (for the debug formatter, this had the nice side-effect of demonstrating that the algorithms are essentially identical for the singly-dependent case).

I'll definitely continue to work on getting a diagrammatic formatter going, but from my POV it'd be nice to do that as a separate PR after doing the real-world testing on the new algorithm and getting this merged (if it's accepted, of course :)). Would that be a good idea? If you'd like it all in one PR though, that's fine with me.

The full test suite now passes for me locally; I've fixed the travis failures I understand, but some of the suites seem to be failing despite having only pending failures.

@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 19, 2015

The build is failing due to missing code coverage. To see the missing coverage, run script/rspec_with_simplecov and the open coverage/index.html. It looks like the bisect formatter has some uncovered lines. Let me know if you want help with that.

Bisecting over non-failing examples 1-9 .. ignoring examples 6-9
Bisecting over non-failing examples 1-5 .. ignoring examples 4-5
Bisecting over non-failing examples 1-3 .. ignoring example 3
Bisecting over non-failing examples 1-2 . ignoring example 1

This comment has been minimized.

@myronmarston

myronmarston Jun 19, 2015

Member

The duration of each round was really nice before. Can we include the duration on these lines as well?

Also, how does this format it when the set being bisected over isn't contiguous? e.g. when you've fixed the LHS and are running part of the RHS.

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 19, 2015

Contributor

Yep, I'll look at putting that back. It dropped out because there wasn't a natural equivalent of :bisect_round_finished in the recursive scheme, but I'll work it back in.

There's an illustration of the more complex recursive case here - in the case where both sides need recursing on it just doesn't display any ignorable ids at the moment. I'm thinking of adding text like "Both sides implicated - splitting..." to indicate this situation. Will follow up with a proper test to illustrate.

bisect_over_recursive(candidate_ids)
end

def bisect_over_recursive(candidate_ids)

This comment has been minimized.

@myronmarston

myronmarston Jun 19, 2015

Member

Instead of bisect_over(candidate_ids) and bisect_over_recursive(candidate_ids), WDYT about calling the methods bisect() and bisect_over(candidate_ids)? The former method is called only once and already assumes the candidate_ids list is non_failing_example_ids so it's not really needed as an argument. Plus it performs the entire bisect procedure over the entire id set, not just over a subset, so bisect seems like a better method name. I also prefer not to put recursive in a method name -- that's obvious from seeing it call itself.

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 19, 2015

Contributor

Agreed, will rename as suggested.

@@ -8,7 +6,7 @@ module Bisect
# repeatedly running different subsets of the suite.
class ExampleMinimizer
attr_reader :runner, :reporter, :all_example_ids, :failed_example_ids
attr_accessor :remaining_ids
attr_accessor :eliminated_ids

This comment has been minimized.

@myronmarston

myronmarston Jun 19, 2015

Member

Why the switch from remaining_ids to eliminated_ids? They are just inverses (bearing the relationship remaining_ids = non_failing_example_ids - eliminated_ids) so it seems like whats possible or easy with one is just as possible or easy with the other, right? I think tracking remaining_ids has a few advantages over tracking eliminated_ids:

  • It's what was already in place, so sticking with it means less churn here...which would help keep the diff on the PR focused on behavioral changes rather than including an equivalent structural change.
  • remaining_ids seems more fundamental to me. After all, that's what the final repo command is based on. In addition, we keep passing remaining_ids as part of the notification events so it's clearly important externally. Contrast that to eliminated_ids -- as far as I can see it's only used within this class.
  • Having remaining_ids be a logical attribute that is computed on every call (e.g. for every notification that includes it) is going to be slower than if we tracked it directly. (Admittedly, this perf difference is not going to be noticable.)

Thoughts? If you still wanted to write code in terms of eliminating ids, you could define an eliminate_ids helper method that would remove the ids from remaining_ids.

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 19, 2015

Contributor

Hmm. I'd thought it was because it made calculating the working set for each run easier, but on reflection it doesn't at all. Will revert. :)

@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 19, 2015

This is looking really good, @urbanautomaton. I haven't had a chance to finish my review and have to get ready for work but I'll try to finish my review later today or tomorrow.

@urbanautomaton

This comment has been minimized.

Copy link
Contributor

urbanautomaton commented Jun 19, 2015

Thanks @myronmarston - I've only been able to grab 30 minutes here and there to work on this since submitting it, so a steady drip of feedback is perfect. :)

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch from bfb9348 to 080da9d Jun 21, 2015

@urbanautomaton

This comment has been minimized.

Copy link
Contributor

urbanautomaton commented Jun 21, 2015

Okay, feedback addressed so far:

  • Renamed bisect_over/bisect_over_recursive as suggested
  • Reverted to tracking remaining ids instead of eliminated ones
  • Restored durations to formatter output for both simple and complex cases
  • Fixed the coverage issues for the bisect formatter

For the latter issue, I figured that rather than try to cover formatter edge cases in either the Bisect::Coordinator spec or the feature tests, it made most sense to have a dedicated spec for BisectProgressFormatter. I added an example for the case I needed, but it now looks weirdly incomplete. Since I'm planning to try a diagrammatic version soon, though, I don't really want to exhaustively test the current behaviour only to immediately rewrite it. Are you happy for me to fill this spec out in a follow-up PR with the diagram formatter?

@@ -106,8 +107,10 @@ Feature: Bisect
- ./spec/calculator_7_spec.rb[1:1]
- ./spec/calculator_8_spec.rb[1:1]
- ./spec/calculator_9_spec.rb[1:1]
Checking that failures are order-dependent..
- Running: rspec ./spec/calculator_1_spec.rb[1:1] --seed 1234 (n.nnnn seconds) failure is order-dependent

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

The "failure is order dependent" bit isn't separated at all from the command so it kinda runs together. Thoughts on adding an elipses or something similar to separate?

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 23, 2015

Contributor

Sure, I'll sort something out there (might put it on a separate line):

Checking that failures are order-dependent...
 - Running: <snip>
Failure is order-dependent

(with the above-mentioned changes to the "is order-dependent" copy, of course)

Bisecting over non-failing examples 1-9 .. ignoring examples 6-9 (0.30166 seconds)
Bisecting over non-failing examples 1-5 .. ignoring examples 4-5 (0.30306 seconds)
Bisecting over non-failing examples 1-3 .. ignoring example 3 (0.33292 seconds)
Bisecting over non-failing examples 1-2 . ignoring example 1 (0.16476 seconds)

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

Numbering the rounds was kinda nice before. What do you think about adding back the Round x: label?

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 23, 2015

Contributor

I left it out because in this algorithm there aren't really sequential rounds (they're executed sequentially, but really the doubly-recursive case is a branch), so I couldn't think of a neat way to map the doubly-recursive case onto the old rounds-counting system.

I tried adding an object-level counter and incrementing it every time bisect_over is entered, but I didn't like the extra state (and it didn't really represent what the algorithm was doing very well). I'd be happy to put it back though if you think it's okay.

This comment has been minimized.

@myronmarston

myronmarston Jun 23, 2015

Member

Maybe the round counter can be kept in the formatter? It's really a formatting concern and it seems simpler to handle it there.

@@ -54,12 +54,12 @@ Feature: Bisect
Bisect started using options: "--seed 1234"
Running suite to find failures... (0.16755 seconds)
Starting bisect with 1 failing example and 9 non-failing examples.
Checking that failures are order-dependent... failure is order-dependent

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

Technically, we don't actually know that the failure is order dependent -- it merely appears to be. But it could flap for other reasons (e.g. based on randomness or some other indeterminacy) and it could just happen to fail with all examples and pass when run on its own. I don't feel strongly about this, but phrasing this as failure appears to be order-dependent is more accurate. Thoughts?

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 23, 2015

Contributor

Yep, will make that change.

@@ -5,8 +5,8 @@ module RSpec::Core::Formatters
BisectProgressFormatter = Class.new(remove_const :BisectProgressFormatter) do
RSpec::Core::Formatters.register self

def bisect_round_finished(notification)
return super unless notification.round == 3
def bisect_round_started(notification)

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

It looks like you've removed bisect_round_finished but kept bisect_round_started. Seems a little odd that they aren't paired. Not sure if there's anything to do about that though...

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 23, 2015

Contributor

Yeah, I don't really see a way around this one - there are two exit conditions for a round now, and because it's recursive we can only really say a round "ends" when all of its sub-rounds end, which doesn't really help much. So I ended up with bisect_round_started and two exit conditions, bisect_ignoring_ids and bisect_multiple_culprits_detected.

I'll try to think of better names for these to make it clear they're a related set of notifications.

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 26, 2015

Contributor

I've renamed these to give them all the same prefix:

  • bisect_round_started
  • bisect_round_ignoring_ids
  • bisect_round_detected_multiple_culprits

There's still the split for the different exit conditions, but at least now they're all more obviously a set.

ids_to_ignore = subsets.find do |ids|
notify(
:bisect_round_started,
:candidates_range => candidates_range(candidate_ids),

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

IMO, this would read more naturally as candidate_range rather than candidates_range -- kinda like you hear people say "age range" but not "ages range".

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 23, 2015

Contributor

Agreed, will change.

end

def bisect_dependency_check_finished(notification)
is_part = notification.dependent_failures ? "is" : "is not"

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

Seeing this conditional makes me think that maybe it would be better to have separate notifications for the two cases, e.g. bisect_dependency_check_succeeded and bisect_dependency_check_failed or similar. Thoughts?

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 23, 2015

Contributor

Heh, I actually had them separate initially exactly as you suggest, but joined them into one notification. Will put them back.

else
"examples #{notification.ignore_range.begin}-" \
"#{notification.ignore_range.end}"
end

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

I'm seeing duplication between the range_desc = [ ... ].join('-') bit in the method above and this bit here. WDYT about passing a custom type instead of a Range object that has a description method that takes care of this formatting? After all, we don't actually need any of the range semantics....just an object with a begin and an end.

end
end
end
end

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

This certainly works to cover the previously uncovered lines but I think I'd prefer to see them covered with a more integrated-style test (probably with the FakeRunner, though, for simplicity and faster speed). I see a couple specific benefits of a more integrated style test over this approach:

  • One of the big advantages I see from code coverage tools is that they can act as an automated dead code finder. Consider if you test this by having an example in coordinator_spec.rb that causes these events to be exercised, and asserts on the total output, rather than testing them directly here. In the future if we make changes that removes the need for these events, the code coverage will notify us these events are not covered and it'll serve it's purpose as a dead code finder. OTOH, when you test them directly, they will always be covered, even if we don't actually need or use them anymore.
  • Here you've tested that when the formatter receives the event with dependent_failures == false, it outputs "is not order-dependent"...but as far as I can see, you have no test that demonstrates that the event is ever sent with that value. By having a more integrated test you can easily cover both sides of the logic.

Obviously, I don't advocate always using a more integrated style (trade offs and all that) but it has some compelling advantages here, particular with the fact that we have the FakeRunner. Thoughts?

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 23, 2015

Contributor

Yep, very happy to put it in the coordinator spec instead.

@@ -1,3 +1,5 @@
require 'set'

This comment has been minimized.

@myronmarston

myronmarston Jun 22, 2015

Member

Do you mind using RSpec::Core::Set rather than stdlib Set?

We try hard to minimize how much of the stdlib we depend on (to limit situations where gem authors could have passing RSpec suites while forgetting to load a bit of stdlib they use) and defined RSpec::Core::Set specifically for situations where we want a set-like collection.

This comment has been minimized.

@JonRowe

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 23, 2015

Contributor

Sure, makes total sense. Will reimplement with arrays, I think (unless you're happy for me to start adding methods like #superset? to the substitute Set implementation...)

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch from 080da9d to c54fc0b Jun 23, 2015

)
end

it 'reduces a multiply-dependent failure' do

This comment has been minimized.

@myronmarston

myronmarston Jun 24, 2015

Member

Do you mean "multiple-dependent failure" instead of "multiply-dependent failure"?

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 24, 2015

Contributor

It's the adverb form (c.f. "singly") - happy to change it though if it's unusual in US English.

This comment has been minimized.

@myronmarston

myronmarston Jun 24, 2015

Member

Please do. I've never heard "multiply" used as anything other than a verb in the arithmetic sense. I assumed it was a typo and I think other US English speakers would think the same.

This comment has been minimized.

@JonRowe

JonRowe Jun 24, 2015

Member

I've never heard of singly either :P

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 24, 2015

Contributor

Gadzooks! All right then, away with it. :)

"Multiple-dependent" sounds odd to me in turn, though. How's it "reduces a failure with multiple dependencies"?

This comment has been minimized.

@JonRowe

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch 4 times, most recently from d235493 to ba8a16c Jun 26, 2015

@urbanautomaton

This comment has been minimized.

Copy link
Contributor

urbanautomaton commented Jun 26, 2015

Okay, thanks for all the latest feedback - I think I've covered it all:

Edit: there's a single JRuby failure which appears to be related to my changes (at least, it appears in a bisect-related spec), but is pretty baffling. Don't suppose anyone's seen something similar in the suite before?

@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 26, 2015

Edit: there's a single JRuby failure which appears to be related to my changes (at least, it appears in a bisect-related spec), but is pretty baffling. Don't suppose anyone's seen something similar in the suite before?

We periodically get file descriptor errors like that in our builds for JRuby. I have no idea why, but it's always transient so we just kick the build and run it again. That's what I just did.

It would be nice to not have to do that but since it has to do with a JVM resource limit and not a logic bug in RSpec or our specs, I'm not sure what to do about it :(.

it 'detects an unminimisable failure in the minimum number of runs' do
counting_minimizer.find_minimal_repro

expect(counting_reporter.round_count).to eq(15)

This comment has been minimized.

@myronmarston

myronmarston Jun 26, 2015

Member

It's unclear from this example why 15 is the minimum number of runs. It kinda feels like a magic number right now. Can you add a comment (or update the docstring, if the explanation is short enough) explaining why 15 is the minimum?

|
|The minimal reproduction command is:
| rspec 2.rb[1:1]
EOS

This comment has been minimized.

@myronmarston

myronmarston Jun 26, 2015

Member

Not a merge blocker, but I much prefer if the lines with | are indented 2 extra spaces so they nested under the expect line. Structurally I think of them as being "child" lines rather than "sibling" lines to the expectation line so indenting them 2 extra spaces makes that clear. I like the EOS where it is.

This comment has been minimized.

@urbanautomaton

urbanautomaton Jun 26, 2015

Contributor

Oh! Yeah, totally agreed, not sure how I managed that. Vim paste-with-autoindent maybe.

@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 26, 2015

Thanks, @urbanautomaton, this is fantastic!

I left two more comments that are cosmetic changes (formatting / comments). I'm completely happy (ecstatic, even) with this otherwise.

Do you want to rebase into a smaller number of commits before we merge? It's not strictly needed but given the back-and-forth and the fact that we're up to 26 commits it could help cleanup the history a bit.

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch from ba8a16c to 0a654a5 Jun 26, 2015

if start == finish
"example #{start}"
else
"examples #{[start, finish].join("-")}"

This comment has been minimized.

@myronmarston

myronmarston Jun 26, 2015

Member

This seems simpler as:

"examples #{start}-#{finish}"

...which should also be a bit faster -- it's fewer object allocations (no unnecessary array allocation or string allocation for "-") and fewer method calls (no join call).

Support multiple dependent failures in FakeRunner
To illustrate cases for which a recursive bisection strategy is faster than
subset enumeration, FakeRunner needs to support failures that are dependent on
multiple candidate examples.

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch from 8f58382 to fcaa214 Jun 28, 2015

urbanautomaton added some commits Jun 14, 2015

Test the number of rounds to 'reduce' pathological case
If an expected failure depends on a large number of the passing examples, the
subset enumeration strategy can take a large number of rounds to minimise the
run.

Add a test to count rounds in the worst case, where a failure depends on all
of the preceding examples. With 8 candidates preceding a failure, a recursive
strategy could return in 15 rounds (1 + 2 + 4 + 8).

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch 2 times, most recently from 7339f87 to cf1482a Jun 28, 2015

@urbanautomaton

This comment has been minimized.

Copy link
Contributor

urbanautomaton commented Jun 28, 2015

Okay, remaining bits of feedback addressed - I've rebased into a smaller number of commits, too. The rebase isn't perfect - because of all the back-and-forth I wasn't able to do it totally cleanly, so there are a couple of additions in 6a22644 that only make complete sense in the light of cf1482a (relating to formatter notifications). If you're happy with that then we could just leave it - alternatively I could just squash those commits together and have one big commit that lands both the algorithm and formatter modifications at once.

Thanks once again for all the feedback - the PR is so much better for it. I've always been appreciative of the hard work that goes into maintaining RSpec, but I don't think I've really understood just how much work that actually is. So, cheers. :)

@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 29, 2015

Thanks, @urbanautomaton! The way you've squashed this looks great to me. It tells a very clear story, which is quite helpful.

I did notice one of the commit messages that I think is slightly inaccurate (in a way that can cause confusion), and if I'm correct, I'd like the message amended before we merge. I'm happy to take care of amending the message (or you can take care of it; I don't particularly care) but can you first confirm for me whether or not I'm correct about the message? I'm talking specifically about the message for 6a22644:

[✘] - expected failure (failing)
[✔︎] - expected failure (passing)
.   - uninvolved example
%   - culprit

Failing run:
          . % . . . . % . [✘]

Divide candidates in two and test:
                  . . % . [✔︎]
          . % . .         [✔︎]
Both sides implicated

Recurse on LHS with RHS fixed
              . . . . % . [✔︎]
          . %     . . % . [✘]
            %     . . % . [✘]
 Identified ^

Recurse on RHS with LHS result
            %         % . [✘]
            %     . .     [✔︎]
            %           . [✔︎]
            %         %   [✘]
           Identified ^

Both sides returned, final results:
            %         %   [✘]

Specifically, these lines:

Recurse on RHS with LHS result
            %         % . [✘]
            %     . .     [✔︎]

It shows that it first discovers on that the failure happens with using the RH quarter of the RHS...and then for some reason it proceeds to also try the LH quarter of the RHS, when it shouldn't need to. Given that you use Enumerable#find to find test the two quarters, it should abort as soon as it finds a quarter that causes the expected failures so it wouldn't actually try the 2nd quarter, right?

Actually, the find is run against [lhs, rhs] so really I think that should be:

Recurse on RHS with LHS result
            %     . .     [✔︎]
            %         % . [✘]

...where the LH quarter of the RHS is tested before the RH quarter of the RHS.

Not trying to be anal here but the usefulness of the graphical diagram (which is really, really great, BTW!) depends on it being accurate or else it causes confusion so I think it's worth getting right. Can you confirm if my understanding is correct?

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch from cf1482a to 17087bb Jun 29, 2015

urbanautomaton added some commits Jun 14, 2015

Use recursive strategy for bisection
To speed up bisection of complex dependent failures, a recursive bisection
strategy is introduced in place of permutation searching. The strategy works
as follows:

1. Split the candidate set into two slices
2. Test each slice with the expected failure(s)
3. If either slice can be ignored:
  - recurse on the other slice
4. If neither slice can be ignored, for each slice:
  - recurse on slice, retaining other slice for test runs

This is easier to illustrate graphically - shown below is a case where a
single expected failure depends on two preceding non-failing examples:

    [✘] - expected failure (failing)
    [✔︎] - expected failure (passing)
    .   - uninvolved example
    %   - culprit

    Failing run:
              . % . . . . % . [✘]

    Divide candidates in two and test:
                      . . % . [✔︎]
              . % . .         [✔︎]
    Both sides implicated

    Recurse on LHS with RHS fixed
                  . . . . % . [✔︎]
              . %     . . % . [✘]
                %     . . % . [✘]
     Identified ^

    Recurse on RHS with LHS result
                %         % . [✘]
                %           . [✔︎]
                %         %   [✘]
               Identified ^

    Both sides returned, final results:
                %         %   [✘]
Basic recursive bisect formatter
This modifies BisectProgressFormatter to more closely model the recursive
bisection algorithm, fixing the associated BisectCoordinator specs.

@urbanautomaton urbanautomaton force-pushed the urbanautomaton:recursive-bisect-strategy branch from 17087bb to 5a2f476 Jun 29, 2015

@urbanautomaton

This comment has been minimized.

Copy link
Contributor

urbanautomaton commented Jun 29, 2015

Hmm - I think you're right about the former, but not the latter. The find runs on the lhs first, but the block tests whether the failure exists without the tested side:

ids_to_ignore, duration = track_duration do
  [lhs, rhs].find do |ids|
    get_expected_failures_for?(remaining_ids - ids)
  end
end

So I think your first version is correct:

Edit: sorry, I keep trying to reply in too little time. So I think the following is correct:

Recurse on RHS with LHS result
                %         % . [✘]
                %           . [✔︎]
                %         %   [✘]

and have amended the commit accordingly.

myronmarston added a commit that referenced this pull request Jun 29, 2015

@myronmarston myronmarston merged commit 86c4fd4 into rspec:master Jun 29, 2015

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@myronmarston

This comment has been minimized.

Copy link
Member

myronmarston commented Jun 29, 2015

👍 Merged. Thanks again!

myronmarston added a commit that referenced this pull request Jun 29, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment