Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poll for actions in structural equality loop. #12039

Closed
wants to merge 4 commits into from

Conversation

eutro
Copy link
Contributor

@eutro eutro commented Feb 24, 2023

Fixes #3921.

This PR updates the implementation of structural equality in compare.c to periodically poll for actions (outstanding signals, garbage collection interrupts), since this is an unbounded loop.

This ensures, for example, that the following comparison:

let rec x = 1 :: x
in x = x

can appropriately be interrupted with Ctrl-C at the top level.

It is possible that handling actions does not throw an exception, this happens in the case of a gc interrupt, or in the case that the signal handler (such as one registered by a user) does not raise an exception. It is also possible that a major garbage collection occurs:

Sys.set_signal
  Sys.sigusr1
  (Sys.Signal_handle
    (fun _sb -> 
      print_endline "Full major";
      Gc.full_major ();
      ()));;

Thus, the structural equality loop pessimistically saves the two compared values across the action handling, and restarts from the beginning if no exception was thrown. This is necessary for correctness when the comparison is not a loop, but just a very large tree.

# let make n x = List.init n (fun _ -> x);;
val make : int -> 'a -> 'a list = <fun>
# let ll = make 100 (make 100 (make 100 (make 100 (make 10 1))));;
val ll : int list list list list list = ...
# ll = ll;;
(* must be true, even with interrupts *)
# (ll, 0) = (ll, 0);;
(* must be true *)
# (ll, 0) = (ll, 1);;
(* must be false *)

Automated tests for this would likely be finnicky and unreliable, but please let me know if and how I should write them.

@gasche
Copy link
Member

gasche commented Feb 24, 2023

If I understand correctly, after an interrupt the PR aborts the comparison and restarts from the initial value. If we wanted to instead continue comparing the values on the current "compare stack" (a worklist of pairs of values to compare), we would be in trouble if a GC had moved some values around or freed them.

Naive question: why don't we make the compare stack a GC root, so that its elements remain alive during GCs? Is the worry that there would be a performance cost there? (Is it not clear to me that there would.) Or maybe just this was felt more complex to implement?

@eutro
Copy link
Contributor Author

eutro commented Feb 24, 2023

That's right, we restart from the initial values because of the GC.

Discussing with @stedolan, the biggest issue with marking the whole stack as a GC root is with the extra complexity it introduces. Restarting the comparison like this is much easier to verify, and should occur rarely enough that the performance penalty of restarting should not be observable by any reasonably written program.

@gasche
Copy link
Member

gasche commented Feb 24, 2023

One worry with restarting from the start is that it introduces a risk of non-termination (or at least arbitrary delays). You could be unlucky and compare two large values that take more time to compare than the time interval between two pending action notifications (maybe there is another domain that allocates a lot in the background), and you keep restarting indefinitely and never make progress.

@gasche
Copy link
Member

gasche commented Feb 24, 2023

This being said: the current PR has an attractive feature/weight ratio: it is simpler than registering the compare stack as a root, and it already provides what is arguably a nice improvement over the current state. I think it could make sense to review an implementation along the current lines first, consider merging, and then think about keeping the stack alive as a second step.

@gasche
Copy link
Member

gasche commented Feb 24, 2023

If I understand the comparison code correctly, comparing two constructor blocks K(v1, v2, v3) and K(w1, w2, w3) will push ([v2; v3], [w2; w3]) on the compare stack and jump directly to comparing v1 and w1. Your code inserts a poll whenever popping items from the compare stack. Unless I am missing something, there is no poll before comparing the first arguments, so we may in fact diverge without polling.

type nat = Z | S of nat
let rec infty = S infty
let () = ignore (infty = infty)

Other sources of large delays are the cases String_tag and Double_array_tag, which could run an inner loop for an arbitrary long time without polling.

A natural fix with the current implementation (that never returns from the poll) would be to use if (++signal_poll_timer >= limit) goto poll in all polling places (including: before the continue on first constructor arguments, and in each iteration of the Double_array_tag loop), with a shared poll: label that does the polling. (String_tag is more annoying as it does a memcmp, one should either leave it alone or split it in chunks.)

Note: instead of doing signal_poll_timer += 1 for each argument of a block, it may be possible to use signal_poll_timer += size for each block. This probably does not make much of a performance difference.

@eutro
Copy link
Contributor Author

eutro commented Feb 25, 2023

I've implemented polling for strings and double arrays in the commit above, but I don't think it's a particularly good idea, the extra returns increased the code size by a bit (sorry, I got some numbers here horribly wrong). I think it would be best for it to just churn through these and poll for signals before continuing. String and array size are bounded by memory anyway. Not to mention that the code is not pretty.

@eutro
Copy link
Contributor Author

eutro commented Feb 25, 2023

Actually I was mistaken, as long as the compiler behaves itself these changes should be fine performance-wise. Hopefully they are correct.

@eutro
Copy link
Contributor Author

eutro commented Feb 25, 2023

Regarding the signal_poll_timer += size suggestion: I don't think it matters much, any argument to a large arity constructor will either pop immediately (if it's a leaf), or itself also push (if it's another branch), either way the timer gets incremented.

@gasche
Copy link
Member

gasche commented Feb 25, 2023

I don't think it matters much, any argument to a large arity constructor will either pop immediately (if it's a leaf), or itself also push (if it's another branch), either way the timer gets incremented.

My argument is that if you increment-and-check once for each block header, instead of once for each block argument, you do it less often. But I agree that it does not matter much. (Most blocks have few arguments anyway, and the increment-and-check instructions are not costly.)

if (res < 0) return LESS;
if (res > 0) return GREATER;
if (last_block) break;
COMPARE_DO_POLL();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect COMPARE_INC_POLL_TIMER here, not COMPARE_DO_POLL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was going to be a COMPARE_INC_POLL_TIMER(COMPARE_SIGNAL_POLL_PERIOD) since that's how many bytes were checked, but of course then the comparison would always hold. I've changed it to that, since it makes intention clearer, and the compiler should be able to optimise it away.

signal_poll_timer = 0; \
if (compare_poll_actions(&sp, stk, &orig1, &orig2, &v1, &v2)) \
goto loop_start; \
} while (0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be just the source of the poll: label, with COMPARE_INC_POLL_TIMER doing a goto loop, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also: I'm not too excited by macros being defined at the toplevel that contains bound occurrences of names that are not currently in scope. I made suggestions that I think would let us get rid of COMPARE_DO_POLL, and I would rather write the code corresponding to COMPARE_IN_POLL_TIMER directly at each callsite, or possibly defining the macro within the body of the do_compare_val function.)

Copy link
Contributor Author

@eutro eutro Feb 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be just the source of the poll: label, with COMPARE_INC_POLL_TIMER doing a goto loop, right?

That's not quite true, we need it to carry on from where it left off if the poll finds nothing while comparing strings or double arrays. I could make COMPARE_INC_POLL_TIMER expand to an expression instead, and add goto loop_starts and continues explicitly at the call site to make intent clearer:

static intnat do_compare_val(/*...*/) {
#define COMPARE_INC_POLL_TIMER(inc)                                     \
  (((signal_poll_timer += (inc)) >= COMPARE_SIGNAL_POLL_PERIOD) &&      \
   (signal_poll_timer = 0,                                              \
    compare_poll_actions(&sp, stk, &orig1, &orig2, &v1, &v2)))
// ...
  if (COMPARE_INC_POLL_TIMER(1)) goto loop_start;
// ...
}

Or of course inline it at call sites, at the cost of some code duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course, I was missing the fact that we return normally in absence of pending actions. Sorry for the noise.

@gasche
Copy link
Member

gasche commented Feb 25, 2023

The new code is substantially more complex than the previous version. I think it would be nice if we could get back to a simpler version, possibly by giving up on polls in string comparisons.

runtime/compare.c Outdated Show resolved Hide resolved
runtime/compare.c Outdated Show resolved Hide resolved
{
value o1 = *orig1, o2 = *orig2;
if (caml_check_pending_actions()) {
compare_free_stack(stk);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caml_check_pending_actions() can return false positives. It is considered that spurious polls are cheap, which simplifies various aspects of the implementation. One worry of backtracking pessimistically is that spurious polls are no longer necessarily cheap. It should be possible to avoid this with an internal process_pending_actions variant that returns whether something has actually been done; I will have to look further into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may complicate things too much, since we'd have to free the stack after a possible garbage collection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clearer, I have in mind doing the following: 7c104c5

However, the alternate path of scanning the compare stack would let one avoid doing this altogether.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Your commit looks like it should update the implementation of caml_do_pending_actions_exn but does not.)

The change to caml_do_pending_actions looks like it won't be necessary for the present PR, but could be useful for other purposes in the future, so maybe it could be an independent PR of its own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, it's not finished indeed. At least you get the idea. I do not have time to finish it right now but I can do it if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to caml_do_pending_actions looks like it won't be necessary for the present PR

Review-wise, are we waiting to see where #12053 is headed, or is the plan to review and merge this one (with restarting) independently?

@gadmm
Copy link
Contributor

gadmm commented Feb 25, 2023

The new code is substantially more complex than the previous version. I think it would be nice if we could get back to a simpler version, possibly by giving up on polls in string comparisons.

I quite like the approach taken and do not find it too complex. Let me some time and I can review it.

@gasche
Copy link
Member

gasche commented Feb 25, 2023

Sure. I would be interested in your opinion of whether we should try to ensure that the compare stack gets scanned by the GC, instead of restarting.

@xavierleroy
Copy link
Contributor

I haven't read the code, but just to point out the obvious: if comparison restarts from zero at every interrupt, it means that with big enough data structures and frequent enough interrupts, comparison will never terminate, right?

For this reason I'm not in favor of checking interrupts while comparing strings : string comparison will terminate eventually, and rather quickly even for very big strings.

The most problematic case is comparison of cyclic data structures, which never terminates. In this case, being interruptible is important. No so in cases where termination is guaranteed.

As to giving the comparison stack to the GC : that's not possible with the current format of these stacks, which contain pointers inside heap objects, so at the minimum a different format should be used. Then it might be possible to use one of the GC's hooks to add a scanning function for comparison stacks.

@gasche
Copy link
Member

gasche commented Feb 25, 2023

As to giving the comparison stack to the GC : that's not possible with the current format of these stacks, which contain pointers inside heap objects, so at the minimum a different format should be used.

Good point. Could we just store the two containing blocks and an offset?

@xavierleroy
Copy link
Contributor

Good point. Could we just store the two containing blocks and an offset?

Yes, but you might also need the length (unless it's cheap enough to recompute it from the blocks?), so you now have 4 words per stack entry instead of 3.

@gasche
Copy link
Member

gasche commented Mar 21, 2023

I propose a draft version that does register the compare stack as a root block before servicing any interrupt request in #12128. My hope is that we can agree on the design and let @eutro integrate the code in the present PR.

@gasche
Copy link
Member

gasche commented Apr 8, 2023

In the end we did a lot of polish to #12128 and merged it. I am not sure whether @eutro wants to push a bit more refining the design, or stop there (in which case the current PR can be closed). The avenue I see for potential refinement -- it was done in this PR but not in #12128 -- is to make sure that very large blocks increase the polling pace.
It would even be nice to poll in the middle of very large strings or arrays, but doing this well is not obvious, one would need to duplicate the polling code or to have dedicated state to remember that we are in the middle of an a large object -- just the index at which to start could be used, and reset at 0 when leaving the block.

@eutro, I think you should decide whether you want to go further with this.

@eutro
Copy link
Contributor Author

eutro commented Apr 8, 2023

I think I'll close this PR since the main intention (of polling within a finite time) has been superseded. I may open a PR in the future to address the outstanding possible refinements.

@eutro eutro closed this Apr 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

structural equality for cyclic data structure cannot be interrupted
4 participants