Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add padding before and after the atom table #9128

Merged
merged 3 commits into from
Jan 7, 2020

Conversation

jhjourdan
Copy link
Contributor

We need to make sure it does not share a page with code, otherwise the GC and the polymorphic hash and comparison functions will follow code pointers and potentially trigger a segfault.

This seems to be the cause of the segfaults that we observed recently on Alpine Linux. It seems like the linker and musl conspired by putting some bytecode next to the atom table. Then, the polymorphic hash function decided to follow the code pointer of a closure, which did not point to any valid block, finally triggering the segfault.

Even though this PR fixes the issue of Alpine, I am still slightly concerned by a similar issue : in native mode, static data is not surrounded by similar padding, so that, in theory, the OS could decide to map the static data in the same page as some code. I know most OSes would typically use different pages for code and data in order to give the different permissions, but there is no guarantee here. Also, when not in the no-naked-pointers mode, it is possible that the heap contains a pointer to some non-heap data in a page registered as a value page.

So, there is a possible extension to this fix: add padding before and after any static data segment in native mode. Of course, this means we increase by 8kB the size of the binary generated for any OCaml compilation unit...

@jhjourdan
Copy link
Contributor Author

Of course, this is a fix for a bug, which we may consider critical since it can cause segmentation faults. So we probably want to backport it to the versions of OCaml for which we consider releasing a bugfix release sometimes.

@gasche
Copy link
Member

gasche commented Nov 20, 2019

Backporting: yes! Alpine linux is heavily used for virtual machines and CI purpose, so I think a backport in 4.08 in addition to active versions makes sense. But first we need a review and approval.

@xavierleroy
Copy link
Contributor

Thanks for the detective work on the Alpine Linux crashes.

An alternate, perhaps simpler fix would be to allocate the atom table dynamically, using caml_stat_alloc_aligned_noexc. Accessing the atom table is not performance-critical, so an extra indirection would not hurt.

For native code, I'm confident that (read-only) code and (read-write) data will never end up in the same page. That might not be the case for readonly data, which is why we should not put initialized data in readonly areas until the page table goes away.

@stedolan
Copy link
Contributor

Nicely found!

The patch looks good, but when reading it I think I saw an off-by-one error in startup_aux.c that this patch doesn't fix. During startup, we do:

caml_page_table_add(In_static_data,
                    caml_atom_table, caml_atom_table + 256)

This adds &caml_atom_table[0] through &caml_atom_table[255] (inclusive) to the page table. However, since these are headers and object pointers point after headers, the actual value of Atom(255) is &caml_atom_table[256] (i.e. Val_hp (&caml_atom_table[255])), which isn't necessarily added to the page table if the atom table happens to end on a page boundary.

@gasche
Copy link
Member

gasche commented Nov 20, 2019

I was a bit surprised by @xavierleroy's comment that the atom table is not performance-critical, because I remembered seeing Atom(0) in the runtime in places that should go fast. In fact the comment is (of course!) fairly believable: it's important that Atom(n) does not allocate, but otherwise adding an extra indirection is probably not going to impact the observable performance of most runtime functions using Atom (typically to create an empty array in case where array-creation functions are called with size 0).

I noticed two specific use-cases that one could try to exploit to generate contradictory micro-benchmarks:

  • Atom(String_tag) is used when unmarshalling empty strings, and there are probably real-world workloads that receive marshalled values with a lot of empty strings.
  • Atom(0) is used to create a backtrace or callstack when the runtime determines that the available backtrace has size 0; I think this may happen in situations where users have "disabled" backtrace or callstack collection, and expect that calling these functions results in minimal overhead. But I guess that the time spent following the extra indirection is negligible compared to the cost of calling the runtime in this situation, so I don't expect the change to be an issue there.

@stedolan
Copy link
Contributor

I don't think there will be any performance effect: there is no extra indirection. Atom(0) remains a constant address, just a slightly different one than previously.

@gasche
Copy link
Member

gasche commented Nov 20, 2019

Sorry for the confusion; I was discussing Xavier's proposal to allocate the atom table dynamically at runtime startup.

@jhjourdan
Copy link
Contributor Author

For native code, I'm confident that (read-only) code and (read-write) data will never end up in the same page. That might not be the case for readonly data, which is why we should not put initialized data in readonly areas until the page table goes away.

And what about static OCaml data which end up next to data allocated by some C stub, to which OCaml heap block points to (without no-naked-pointers) ?

@jhjourdan
Copy link
Contributor Author

This adds &caml_atom_table[0] through &caml_atom_table[255] (inclusive) to the page table. However, since these are headers and object pointers point after headers, the actual value of Atom(255) is &caml_atom_table[256] (i.e. Val_hp (&caml_atom_table[255])), which isn't necessarily added to the page table if the atom table happens to end on a page boundary.

Indeed, fixed.

@jhjourdan
Copy link
Contributor Author

An alternate, perhaps simpler fix would be to allocate the atom table dynamically, using caml_stat_alloc_aligned_noexc. Accessing the atom table is not performance-critical, so an extra indirection would not hurt.

Indeed. I agree this is should not be performance critical. @gasche, do you confirm, shall I implement this?

@stedolan
Copy link
Contributor

@gasche thanks, I understand your comment now.

I thought that there would be a problem with this approach in compiling the expression [| |]. It turns out that this doesn't generate a reference to Atom(0), but instead compiles a new statically allocated zero-length object, with tag and length 0 which is not physically equal to Atom(0). (This is surprising to me, but I don't think anything depends on the only zero-length objects being Atom(x)).

@gasche
Copy link
Member

gasche commented Nov 20, 2019

I can't confirm because I know less than the three of you on how to assess the performance-impact of a runtime change. I looked at the uses of Atom in the runtime and the only way I could think of to propose a plausible micro-benchmark to assess the change is: create a marshalled value with a lot of empty strings (say, a long array of empty strings) and unmarshal it repeatedly.

@nojb
Copy link
Contributor

nojb commented Nov 20, 2019

Out of curiosity, is there a bug report anywhere about the Alpine crashes ?

@let-def
Copy link
Contributor

let-def commented Nov 20, 2019

I can't confirm because I know less than the three of you on how to assess the performance-impact of a runtime change. I looked at the uses of Atom in the runtime and the only way I could think of to propose a plausible micro-benchmark to assess the change is: create a marshalled value with a lot of empty strings (say, a long array of empty strings) and unmarshal it repeatedly.

An empty string has size 1... Empty arrays maybe?

@xavierleroy
Copy link
Contributor

xavierleroy commented Nov 20, 2019

I was a bit surprised by @xavierleroy's comment that the atom table is not performance-critical,

It's not performance-critical in the sense that allocation in the minor heap is performance-critical. The only use, as far as I can remember, is to evaluate the empty array [||] in bytecode. (EDIT: that, plus unmarshaling data containing empty arrays.) So, one extra load to get at the atom table is not a problem.

Also it opens the way to lazy allocation and initialization of the atom table the first time it's needed, which might actually save time and space in many programs (all those that don't use empty arrays). Just a thought.

I thought that there would be a problem with this approach in compiling the expression [| |]. It turns out that this doesn't generate a reference to Atom(0), but instead compiles a new statically allocated zero-length object, with tag and length 0 which is not physically equal to Atom(0)

Right. The main reason for the atom table is that we cannot allocate zero-sized objects in the minor heap (EDIT: because there would be no room for a forwarding pointer) nor in the major heap (EDIT: because there would be n room to maintain a free list), so static allocation is a must. In native code we have a solid general mechanism for statically-allocated data. In bytecode we don't so we need the atom table.

If you wonder whether [||] == [||] is allowed to evaluate to false, I say yes: physical equality is specified only for data that can be mutated in-place: "x == y is true if and only if a modification of x also modifies y. However, [||] cannot be modified!

@xavierleroy
Copy link
Contributor

And what about static OCaml data which end up next to data allocated by some C stub, to which OCaml heap block points to (without no-naked-pointers) ?

Naked pointers must go away, the sooner the better. (Mental note: update chapter 20 of the reference manual to say "don't use naked pointers".)

@jhjourdan jhjourdan force-pushed the caml_atom_table_padding branch 2 times, most recently from a34e2b1 to a2c06d9 Compare November 21, 2019 10:53
@jhjourdan
Copy link
Contributor Author

I've pushed a new version of the PR, which:

  • implements @xavierleroy's proposal
  • make sure the minor heap also have its own page (which may have the same problem, in theory, although it is not possible to trigger this issue with the default minor heap size)
  • do a minor cleanup (not a bugfix) in gc_ctrl.c, which is related to the question of pages of the major heap.

@jhjourdan
Copy link
Contributor Author

The test c-api/alloc_async_stub was failing on x86 32bits. This was a false alarm: the issue was in the test itself. This is now fixed.

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Some unimportant remarks and suggestions below. I still think this must be reviewed by @damiendoligez himself.

runtime/gc_ctrl.c Outdated Show resolved Hide resolved
.depend Outdated Show resolved Hide resolved
otherlibs/unix/.depend Outdated Show resolved Hide resolved
runtime/gc_ctrl.c Show resolved Hide resolved
runtime/gc_ctrl.c Outdated Show resolved Hide resolved
runtime/startup_aux.c Outdated Show resolved Hide resolved
testsuite/tests/c-api/alloc_async_stubs.c Show resolved Hide resolved
@jhjourdan jhjourdan force-pushed the caml_atom_table_padding branch 2 times, most recently from 857e7c1 to cb32991 Compare December 9, 2019 13:57
Copy link
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very small change, then good to go.

runtime/startup_aux.c Outdated Show resolved Hide resolved
@jhjourdan
Copy link
Contributor Author

jhjourdan commented Dec 18, 2019

Thanks @damiendoligez. I included your change. This should be ready to merge as soon as CI passes.

@gasche could you backport to 4.10 (and 4.09 ?)?

@gasche
Copy link
Member

gasche commented Dec 18, 2019

For 4.10, "yes of course". For 4.09, we need some sort of assessment of the regression-risk of the change. If you are confident that this is a low-risk change, it can go into a maintenance branch, but otherwise I would be hesitant to backport it, at least before it gets more testing in 4.10 and trunk. (To me, a non-runtime-expert, it looks like a change that might go wrong in unplanned ways?)

@jhjourdan
Copy link
Contributor Author

As you prefer. To me, it is low-risk, but I have a hard time to have a objective assessment of the risk of a change.

Moreover, it seems that without my memprof change, the memory mapping was somehow avoiding the issue. So we may decide not to backport on 4.09.

@damiendoligez
Copy link
Member

This is a large-ish diff, so if the bug was not observed in the wild on 4.09, I'd advise against backporting.

@gasche
Copy link
Member

gasche commented Dec 19, 2019

There are, in fact, reports in the wild of segfault of OCaml 4.09 on alpine+musl Docker images, which are the initial reason for @jhjourdan's investigation and the currently proposed fix. I haven't been able to find a link to the discussions just now, but I ended up on older issues, such as #7562, which may in fact be related.

@damiendoligez
Copy link
Member

Then let's backport.

@gasche
Copy link
Member

gasche commented Dec 20, 2019

I'll start by running precheck on this change and then merging it. Then I'm willing to be wooed by @damiendoligez's enthusiasm and backport.

@gasche
Copy link
Member

gasche commented Dec 20, 2019

(enthusiasm, which I hope is correlated with a careful code review!)

@jhjourdan
Copy link
Contributor Author

Is there anything blocking merging for this PR, now that we have agreed to backport to 4.09?

…me and inline it at the only place it is used.
We need to make sure they do not share a page with code, otherwise the
GC and the polymorphic hash and comparison functions will follow code
pointers and potentially trigger a segfault.
We must wait for two major cycles to finish in order to make sure that finalizers have indeed be examined.
@jhjourdan
Copy link
Contributor Author

(I have just fixed the conflict in Changes.)

@gasche
Copy link
Member

gasche commented Jan 6, 2020

Sorry, I just forgot about it. I'll try to remember to merge after the CI comes back green again. Thanks for pinging!

@gasche
Copy link
Member

gasche commented Jan 6, 2020

(The precheck job was https://ci.inria.fr/ocaml/job/precheck/329/ and passed.)

@gasche gasche merged commit 11b5182 into ocaml:trunk Jan 7, 2020
gasche added a commit that referenced this pull request Jan 8, 2020
Add padding before and after the atom table

(cherry picked from commit 11b5182)
gasche added a commit that referenced this pull request Jan 8, 2020
Add padding before and after the atom table

(cherry picked from commit 11b5182)
gasche added a commit that referenced this pull request Jan 8, 2020
@gasche
Copy link
Member

gasche commented Jan 8, 2020

I cherry-picked the PR into 4.09 ( f49db35 ) and 4.10 ( 586e7e3 ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants