Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ocamldebug became very slow #9606

Closed
garrigue opened this issue May 26, 2020 · 28 comments
Closed

ocamldebug became very slow #9606

garrigue opened this issue May 26, 2020 · 28 comments

Comments

@garrigue
Copy link
Contributor

I use frequently ocamldebug to debug ocamlc itself, and recently it became very slow.
I have to wait more than 10 seconds for things that took less than 1 second (in my qualitative memory...)

I am using trunk (4.12) on MacOSX Mojave.

@xavierleroy
Copy link
Contributor

Could you post a script that reproduces the observed slowness? I.e. an operation that takes > 10 seconds. Reports without a repro case are unusable.

@xavierleroy
Copy link
Contributor

(Clarification: by "script" I mean at least a transcript of an ocamldebug session. Not necessarily a fully automated shell script, even though that would be nice too.)

@garrigue
Copy link
Contributor Author

garrigue commented May 26, 2020

Just one line is enough:

echo run | env TERM=dumb time ./debugger/ocamldebug ./ocamlc -I stdlib -I utils -I parsing -I typing -c testsuite/tests/typing-gadts/pr5332.ml

Up to 4.09 this takes less than 1s to run and exit, but since 4.10 this takes about 30s of wall time (but only 0.7s of user time, and 0.1s of system time, which seem both unchanged).

Quite mysterious to me. This looks like the process is waiting a lot, doesn't it ?

@gasche
Copy link
Member

gasche commented May 26, 2020

This looks like something that one can use git bisect for, to determine the change that introduced the slowdown between 4.09 and 4.10.

@stedolan
Copy link
Contributor

It appears that the debugger is creating ~71k breakpoints using set_event, causing the debugged program to spend a long time in find_breakpoint (runtime/debugger.c). I'm not sure why it's creating the breakpoints, though.

@gasche
Copy link
Member

gasche commented May 26, 2020

Maybe the feature we merged to allow debugging of dynlinked code?

@gasche
Copy link
Member

gasche commented May 26, 2020

@stedolan
Copy link
Contributor

#8654 introduced the O(n) find_breakpoint function, so perhaps ocamldebug was always creating 71k breakpoints, but that used to be fast?

@xavierleroy
Copy link
Contributor

#8654 introduced the O(n) find_breakpoint function,

But n = 1 here, no? For a bytecode executable that does not use dynamic linking, there is only one code fragment.

so perhaps ocamldebug was always creating 71k breakpoints, but that used to be fast?

That sounds expensive, and I can't remember the original ocamldebug doing this.

@gasche
Copy link
Member

gasche commented May 27, 2020

If I read the code correctly, code fragments are indexed in an extensible table caml_code_fragments_table (and debugger events give the fragment index frag, so we have O(1) access in all cases), but "breakpoint saved instructions" used to be saved in a fixed table caml_saved_code[pos] and are now a in linear-searched bag breakpoints_table. When there are lots of breakpoints it would be much more efficient to go back to constant-time indexing for breakpoints, by having an extensible table that exactly mirrors the structure of the code-fragment table (same frag indices), and contains a pc-indexed table of breakpoints for each code fragment.

@gasche
Copy link
Member

gasche commented May 27, 2020

(If I understand the code correctly, set_event does not set a breakpoint, but it is caused by loading debug events. Creating 71K breakpoints would of course be wrong, but having 71K debug events is reasonable. I don't understand yet why we need the saved-code mechanism in this case (it's odd that debug event would be represented by overwriting instructions?), but it looks like the dynlink patch uses the same "breakpoint structure" logic in both cases, which is sensible if we do need to save the overwritten instruction here.)

@xavierleroy
Copy link
Contributor

Creating 71K breakpoints would of course be wrong, but having 71K debug events is reasonable. I don't understand yet why we need the saved-code mechanism in this case

Indeed, debug events are replaced by a special EVENT instruction, whose purpose is (among others) to decrease caml_event_count and do stuff like checkpointing or stopping execution when it reaches zero.

"breakpoint saved instructions" used to be saved in a fixed table caml_saved_code[pos] and are now a in linear-searched bag breakpoints_table.

This is atrocious, indeed. Someone failed algorithms 101.

@gasche
Copy link
Member

gasche commented May 27, 2020

It's probably more of a case of someone not realizing what the complexity constraints/requirement were for this particular piece of code. (In particular, it is natural to assume that there are few breakpoints set during a debugging session.)

@stedolan
Copy link
Contributor

(In particular, it is natural to assume that there are few breakpoints set during a debugging session.)

Yes, this surprised me also, to the point that I assumed the problem was in ocamldebug. (If I understand correctly now, what the C code calls "breakpoint" includes all debug events, regardless of whether the user has set breakpoints or not).

@bluddy
Copy link
Contributor

bluddy commented May 27, 2020

Would be good to document this with the fix so it doesn't happen again.

@gasche
Copy link
Member

gasche commented May 27, 2020

@bluddy would you be interested in submitting a fix for this issue? I would expect that moving from a linear-search bag to a (frag+pc)-indexed table for "saved instructions" is not very hard (I was considering giving it a go myself).

One would have to look at how the code-fragment table evolves (only in the bytecode runtime). Either we have to evolve the new table at the same time (this may require factorizing the changes to the code-fragment table into functions?), or maybe it is enough to create new entries in the saved-instruction table on-demand, on the first attempt to save an instruction in this fragment. (If the code-fragments table can shrink by removing elements, then the on-demand strategy would not be good, as we risk keeping those saved-instruction fragments indefinitely and having a leak.)

@xavierleroy
Copy link
Contributor

I think we can rescue the current implementation by using binary search and sorting the array on demand, taking advantage of the fact that we have sequences of insertions followed by long sequences of searches, so the cost of sorting is amortized.

@bluddy
Copy link
Contributor

bluddy commented May 28, 2020

I'll give it a shot @gasche. I'll try @xavierleroy's simple suggestion first.

@jhjourdan
Copy link
Contributor

I'm sorry to come after the battle, but I don't think @xavierleroy's idea is the way to go, for two reasons:

1- It will require to add an instruction to the interface between the debugger UI and the interpreter. This is not nice, because it will also require enforcing an additional invariant that the sorting instruction is emitted before any use of the table.

2- Unloading a code fragment will still require linear scans in the table, and thus a quadratic behavior in the worst case.

I think we should either use one dynamically allocated array for each fragment or a hash table.

@gasche
Copy link
Member

gasche commented May 28, 2020

I'm worried that the sorting approach could give code that is less clear, in the final version, than a copy-of-fragments approach (in addition to having a less interesting / more complex performance profile). But I guess that it is worth giving it a shot.

@gasche
Copy link
Member

gasche commented May 28, 2020

@jhjourdan I don'tt think we would need a sorting instruction. We could have a flag to remember if the array is sorted, set it when we sort and unset it when we add an element. But I agree that this is yet another piece of state to keep around.

@xavierleroy
Copy link
Contributor

We could have a flag to remember if the array is sorted, set it when we sort and unset it when we add an element.

Yes, I was thinking along these lines. But it is tricky to know when it is a good time to sort.

If nothing clever is proposed, I'll offer something based on the skip list library we already have (in globroots.c).

xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Jun 4, 2020
…ebugger

This avoids the quadratic behaviors of the previous array-based
implementation.

Closes: ocaml#9606
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Jun 4, 2020
…ebugger

This avoids the quadratic behaviors of the previous array-based
implementation.

Closes: ocaml#9606
@xavierleroy
Copy link
Contributor

xavierleroy commented Jun 4, 2020

I'll offer something based on the skip list library we already have (in globroots.c).

See PR#9635. @garrigue 's repro (#9606 (comment)) takes 0.6 seconds on my machine.

xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Jun 4, 2020
…ebugger

This avoids the quadratic behaviors of the previous array-based
implementation.

Closes: ocaml#9606
@jhjourdan
Copy link
Contributor

I note that Coq developpers asked for #8654 with tears, but apparently did not even tried it, since they would have noticed the bug. (That's not without having noticed them in #8654 (comment).).

Cc @ppedrot, @maximedenes

@gasche
Copy link
Member

gasche commented Jun 8, 2020

To play the "good cop" for once: let me speculate that traditional ocamldebug-users (among Coq developers) are using older OCaml version for development, using Emilio's static-linking workaround to debug plugins, and they haven't yet adapted their workflows to the new feature (... or, apparently, tried ocamldebug under OCaml 4.10).

This makes sense in the wider context: 4.10 has only been out for a bit more than three months, and the first report of this issue was only 13 days ago.

@ppedrot
Copy link
Contributor

ppedrot commented Jun 8, 2020

I confirm @gasche's speculations: the most recent OCaml version I personally use for daily Coq development is 4.08. Furthermore, Coq was made compatible with OCaml 4.10 only quite recently.

@gasche
Copy link
Member

gasche commented Jun 8, 2020

You miss all our good new stuff and also our bad new stuff.

xavierleroy added a commit that referenced this issue Jun 8, 2020
Introduce a library of skip lists and use them to fix a performance issue in the debugger (issue #9606)
@garrigue
Copy link
Contributor Author

garrigue commented Jun 9, 2020

@xavierleroy Thanks !

antalsz pushed a commit to antalsz/ocaml that referenced this issue Sep 22, 2020
Introduce a library of skip lists and use them to fix a performance issue in the debugger (issue ocaml#9606)
antalsz pushed a commit to antalsz/ocaml that referenced this issue Sep 22, 2020
Introduce a library of skip lists and use them to fix a performance issue in the debugger (issue ocaml#9606)
gasche pushed a commit that referenced this issue Jan 28, 2021
Introduce a library of skip lists and use them to fix a performance issue in the debugger (issue #9606)

(cherry picked from commit c61fc39)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants