-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Dynlink startup to avoid parsing bytecode sections twice #12599
Conversation
I haven't reviewed in details yet, but I have the impression that this PR is increasing the memory requirements and start-up times of all bytecode programs, not just those that use Dynlink. Is this correct? |
I don't think it adds to start-up time - the primitive table and DLLs were always processed before, but there is an increase in memory because those structures are then kept. |
I tried benchmarking I was not able to measure any slowdown. Slightly more work is being done: some extra binary sections are being read. They aren't parsed until Dynlink is initialised, though, and reading some bytes from a file into a buffer does not take long. Before and after this patch, Memory usage goes up by about 20k. This represents a ~0.2% increase in the memory use of |
a964f8c
to
55675db
Compare
Refactor Dynlink startup to avoid parsing bytecode sections twice. This removes the dependency from Symtable->Bytesections, because now Dynlink and toplevel startup can ask the runtime for the bytecode sections that were parsed at startup time, rather than re-parsing them in OCaml.
Refactor Dynlink startup to avoid parsing bytecode sections twice. This removes the dependency from Symtable->Bytesections, because now Dynlink and toplevel startup can ask the runtime for the bytecode sections that were parsed at startup time, rather than re-parsing them in OCaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good modulo a few suggestions, mostly comments that need updating.
runtime/caml/dynlink.h
Outdated
extern void caml_build_primitive_table(char_os * lib_path, | ||
char_os * libs, | ||
char * req_prims); | ||
extern void caml_init_dynlink(char_os * lib_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should update the comment above this declaration.
runtime/dynlink.c
Outdated
/* Build the table of primitives, given a search path and a list | ||
of shared libraries (both 0-separated in a char array). | ||
Abort the runtime system on error. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to update this comment.
runtime/startup_byt.c
Outdated
req_prims = read_section(fd, &trail, "PRIM", NULL); | ||
symb_section = read_section(fd, &trail, "SYMB", &symb_section_len); | ||
crcs_section = read_section(fd, &trail, "CRCS", &crcs_section_len); | ||
if (req_prims == NULL) caml_fatal_error("no PRIM section"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check for req_prims == NULL
should stay next to the assignment to req_prims
. (diff doesn't do a good job on this one)
req_prims = read_section(fd, &trail, "PRIM", NULL); | |
symb_section = read_section(fd, &trail, "SYMB", &symb_section_len); | |
crcs_section = read_section(fd, &trail, "CRCS", &crcs_section_len); | |
if (req_prims == NULL) caml_fatal_error("no PRIM section"); | |
req_prims = read_section(fd, &trail, "PRIM", NULL); | |
if (req_prims == NULL) caml_fatal_error("no PRIM section"); | |
symb_section = read_section(fd, &trail, "SYMB", &symb_section_len); | |
crcs_section = read_section(fd, &trail, "CRCS", &crcs_section_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming I didn't mess something else up, at the moment this breaks -output-complete-exe
and so forth subtly. My test is:
let () = Dynlink.allow_unsafe_modules true in
let () = Dynlink.loadfile (Filename.concat Config.standard_library "unix/unix.cma") in
print_endline "Hello, pointless world!"
and compile with -output-complete-exe
. It will fail to find dllunixbyt.so
This issue is that ld.conf
and CAML_LD_LIBRARY_PATH
are not inspected in the image-as-data path, and they now need to be - this will thread through as the field for dlpt
.
It's imperfect, given that at the moment the Symtable.init_toplevel
API model doesn't allow a program which doesn't use Dynlink to release memory, but it seems a fairly easy win to release all the memory held over at bytecode startup at the end of the first call to the primitive?
I apologize for this, but I'm still stuck at my question of 3 weeks ago:
to which @dra27 replied
DLPT, DLLS and PRIM sections were always processed before, but now every bytecode program is also reading and keeping in memory SYMB and CRCS sections, which are probably bigger. I'm not saying this is a show-stopper, and @stedolan's quick measurements suggest it is not, but let's present the facts accurately. For the same reasons, @stedolan's claim that
is not true, since the runtime doesn't need nor gather SYMB and CRCS sections, these are only for Dynlink and for the toplevel. Going back to the premises of this PR: what is wrong with reading sections off the bytecode executable in Dynlink and the toplevel? |
@stedolan's exposition does say most of which, not all; inaccuracies in my first assessment were covered by I don't think (which apparently I hadn't done enough of). The overall premise is to reduce the things which both ocaml/ocamlc and dynlink have to do in #11996, as that either increases code duplication between For the extra memory and processing, there's a reasonably straightforward approach which I hinted at in #12599 (comment). In making the bootstrap repeatable in #11149, I added the stripping of the The presence of a CRCS section can then be used by Footnotes
|
Maybe my message came out as harsher than I wanted it to be, sorry about that. Still, I remain more comfortable with the current approach, where Dynlink reads (possibly again) the bytecode sections it needs off the bytecode executable file, without interfering with bytecode program startup, which is quite complicated already. I think there are ways to keep doing this while cutting the dependency on compiler-libs, e.g. by duplicating the (small) OCaml code that reads bytecode sections, or by adding a C primitive to runtime/meta.c that wraps caml_read_section_descriptors / caml_seek_section /read_section, or by other ways to be discussed later. |
Ah, OK - the |
There are four sections that Dynlink needs the contents of:
The runtime requires PRIM and DLPT, regardless of whether Dynlink is in use. (I was confused on this point before because the logic that parses these lives in There are various ways to get this information to Dynlink. (I don't claim the list here is complete: if I'm missing someone's preferred option, say so):
I don't like the current option (1) because it is one of the few remaining things keeping dynlink_compilerlibs around. @xavierleroy doesn't like option (2) because it keeps the contents of SYMB and CRCS around pointlessly in programs not using Dynlink. I'm happy with anything that's not (1), but I'd now prefer (3): that way only Dynlink loads the SYMB and CRCS sections, but it doesn't have to duplicate the bytecode table-of-contents logic nor (worse) the ld.conf parsing. @xavierleroy does that seem reasonable to you? |
I didn't think of option (3) before, but it looks good to me, and a nice way to move forward on this PR. For what it's worth, I had a look at (4) but didn't go very far yet. The code to read sections off bytecode executable files is rather simple and easy to maintain: we have an implementation in Perl (!) in tools/ocamlsize that had its last nontrivial change in 2000 (when the "sections" mechanism was introduced)... @dra27 mentioned yet another alternative: (5) export some of the C code that reads TOC and sections off bytecode files as OCaml primitives, and use them to read sections in byte/dynlink.ml. I didn't look into this yet. |
8ad439e
to
cd4196b
Compare
@xavierleroy I've updated the code to approach, eh, (3.5): it reopens the bytecode executable and re-parses the section table (using the nice C API already present in the runtime), but keeps the primitive table and search path as previously computed by the runtime. (I still need to address @dra27 's comments about |
This is done now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The -output-complete-...
stuff indeed now works. I have thoroughly (hopefully!) re-reviewed the logic. A few minor suggestions, and a tedious tweak needed to list_of_ext_table
to cope with UCS-2 strings on Windows (that I should have spotted the first time round).
This removes the dependency from Symtable->Bytesections, because now Dynlink and toplevel startup can ask the runtime for the bytecode sections that were parsed at startup time, rather than re-parsing them in OCaml.
c8c781c
to
205ed09
Compare
Thanks for the review @dra27. Just pushed an updated version. I ended up deleting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thank you! Marking as approved to clear my "requesting changes", but this shouldn't be merged until the bootstrap commit is re-done.
Unfortunately, you've been stung on the rebase and the bootstrap isn't repeatable: boot/ocamlrun boot/ocamlc -vnum
reports 5.2.0+dev0-2023-04-11
where it ought to be reporting 5.3.0+dev0-2023-12-22
(I expect you didn't re-run configure
after the rebase; a fact which really the build system ought to complain about).
The bootstrap, as now, needs to be in a separate commit from the main change - I guess you're planning on squashing the 5 commits together and then the 6th commit?
205ed09
to
86ac518
Compare
From a side-channel discussion with @stedolan - the bootstrap part has been removed from here, so this PR can be squash-merged once CI catches up; the removal of the old primitive can then be put more cleanly in a separate PR, which is slightly less awkward. |
Dynlink startup currently re-parses the bytecode executable (via Symtable) to extract various bits of information, most of which the runtime has already gathered. This PR makes the runtime hang on to this information, and provide it directly to Dynlink startup. (The dependency between Symtable and bytecode parsing (Bytesections) caused some annoyance for @shindere in #11996, and is removed here)
For reviewers: Start by reading
bytecomp/symtable.ml
- the simplification ofinit_toplevel
is the point of this patch.(thanks @dra27 for help figuring out interactions with
-output-complete-obj
and other unusual cases of bytecode loading)