Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add padding before and after the atom table #9128

Merged
merged 3 commits into from Jan 7, 2020

Conversation

@jhjourdan
Copy link
Contributor

jhjourdan commented Nov 19, 2019

We need to make sure it does not share a page with code, otherwise the GC and the polymorphic hash and comparison functions will follow code pointers and potentially trigger a segfault.

This seems to be the cause of the segfaults that we observed recently on Alpine Linux. It seems like the linker and musl conspired by putting some bytecode next to the atom table. Then, the polymorphic hash function decided to follow the code pointer of a closure, which did not point to any valid block, finally triggering the segfault.

Even though this PR fixes the issue of Alpine, I am still slightly concerned by a similar issue : in native mode, static data is not surrounded by similar padding, so that, in theory, the OS could decide to map the static data in the same page as some code. I know most OSes would typically use different pages for code and data in order to give the different permissions, but there is no guarantee here. Also, when not in the no-naked-pointers mode, it is possible that the heap contains a pointer to some non-heap data in a page registered as a value page.

So, there is a possible extension to this fix: add padding before and after any static data segment in native mode. Of course, this means we increase by 8kB the size of the binary generated for any OCaml compilation unit...

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Nov 19, 2019

Of course, this is a fix for a bug, which we may consider critical since it can cause segmentation faults. So we probably want to backport it to the versions of OCaml for which we consider releasing a bugfix release sometimes.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 20, 2019

Backporting: yes! Alpine linux is heavily used for virtual machines and CI purpose, so I think a backport in 4.08 in addition to active versions makes sense. But first we need a review and approval.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Nov 20, 2019

Thanks for the detective work on the Alpine Linux crashes.

An alternate, perhaps simpler fix would be to allocate the atom table dynamically, using caml_stat_alloc_aligned_noexc. Accessing the atom table is not performance-critical, so an extra indirection would not hurt.

For native code, I'm confident that (read-only) code and (read-write) data will never end up in the same page. That might not be the case for readonly data, which is why we should not put initialized data in readonly areas until the page table goes away.

@stedolan

This comment has been minimized.

Copy link
Contributor

stedolan commented Nov 20, 2019

Nicely found!

The patch looks good, but when reading it I think I saw an off-by-one error in startup_aux.c that this patch doesn't fix. During startup, we do:

caml_page_table_add(In_static_data,
                    caml_atom_table, caml_atom_table + 256)

This adds &caml_atom_table[0] through &caml_atom_table[255] (inclusive) to the page table. However, since these are headers and object pointers point after headers, the actual value of Atom(255) is &caml_atom_table[256] (i.e. Val_hp (&caml_atom_table[255])), which isn't necessarily added to the page table if the atom table happens to end on a page boundary.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 20, 2019

I was a bit surprised by @xavierleroy's comment that the atom table is not performance-critical, because I remembered seeing Atom(0) in the runtime in places that should go fast. In fact the comment is (of course!) fairly believable: it's important that Atom(n) does not allocate, but otherwise adding an extra indirection is probably not going to impact the observable performance of most runtime functions using Atom (typically to create an empty array in case where array-creation functions are called with size 0).

I noticed two specific use-cases that one could try to exploit to generate contradictory micro-benchmarks:

  • Atom(String_tag) is used when unmarshalling empty strings, and there are probably real-world workloads that receive marshalled values with a lot of empty strings.
  • Atom(0) is used to create a backtrace or callstack when the runtime determines that the available backtrace has size 0; I think this may happen in situations where users have "disabled" backtrace or callstack collection, and expect that calling these functions results in minimal overhead. But I guess that the time spent following the extra indirection is negligible compared to the cost of calling the runtime in this situation, so I don't expect the change to be an issue there.
@stedolan

This comment has been minimized.

Copy link
Contributor

stedolan commented Nov 20, 2019

I don't think there will be any performance effect: there is no extra indirection. Atom(0) remains a constant address, just a slightly different one than previously.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 20, 2019

Sorry for the confusion; I was discussing Xavier's proposal to allocate the atom table dynamically at runtime startup.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Nov 20, 2019

For native code, I'm confident that (read-only) code and (read-write) data will never end up in the same page. That might not be the case for readonly data, which is why we should not put initialized data in readonly areas until the page table goes away.

And what about static OCaml data which end up next to data allocated by some C stub, to which OCaml heap block points to (without no-naked-pointers) ?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Nov 20, 2019

This adds &caml_atom_table[0] through &caml_atom_table[255] (inclusive) to the page table. However, since these are headers and object pointers point after headers, the actual value of Atom(255) is &caml_atom_table[256] (i.e. Val_hp (&caml_atom_table[255])), which isn't necessarily added to the page table if the atom table happens to end on a page boundary.

Indeed, fixed.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Nov 20, 2019

An alternate, perhaps simpler fix would be to allocate the atom table dynamically, using caml_stat_alloc_aligned_noexc. Accessing the atom table is not performance-critical, so an extra indirection would not hurt.

Indeed. I agree this is should not be performance critical. @gasche, do you confirm, shall I implement this?

@stedolan

This comment has been minimized.

Copy link
Contributor

stedolan commented Nov 20, 2019

@gasche thanks, I understand your comment now.

I thought that there would be a problem with this approach in compiling the expression [| |]. It turns out that this doesn't generate a reference to Atom(0), but instead compiles a new statically allocated zero-length object, with tag and length 0 which is not physically equal to Atom(0). (This is surprising to me, but I don't think anything depends on the only zero-length objects being Atom(x)).

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 20, 2019

I can't confirm because I know less than the three of you on how to assess the performance-impact of a runtime change. I looked at the uses of Atom in the runtime and the only way I could think of to propose a plausible micro-benchmark to assess the change is: create a marshalled value with a lot of empty strings (say, a long array of empty strings) and unmarshal it repeatedly.

@nojb

This comment has been minimized.

Copy link
Contributor

nojb commented Nov 20, 2019

Out of curiosity, is there a bug report anywhere about the Alpine crashes ?

@let-def

This comment has been minimized.

Copy link
Contributor

let-def commented Nov 20, 2019

I can't confirm because I know less than the three of you on how to assess the performance-impact of a runtime change. I looked at the uses of Atom in the runtime and the only way I could think of to propose a plausible micro-benchmark to assess the change is: create a marshalled value with a lot of empty strings (say, a long array of empty strings) and unmarshal it repeatedly.

An empty string has size 1... Empty arrays maybe?

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Nov 20, 2019

I was a bit surprised by @xavierleroy's comment that the atom table is not performance-critical,

It's not performance-critical in the sense that allocation in the minor heap is performance-critical. The only use, as far as I can remember, is to evaluate the empty array [||] in bytecode. (EDIT: that, plus unmarshaling data containing empty arrays.) So, one extra load to get at the atom table is not a problem.

Also it opens the way to lazy allocation and initialization of the atom table the first time it's needed, which might actually save time and space in many programs (all those that don't use empty arrays). Just a thought.

I thought that there would be a problem with this approach in compiling the expression [| |]. It turns out that this doesn't generate a reference to Atom(0), but instead compiles a new statically allocated zero-length object, with tag and length 0 which is not physically equal to Atom(0)

Right. The main reason for the atom table is that we cannot allocate zero-sized objects in the minor heap (EDIT: because there would be no room for a forwarding pointer) nor in the major heap (EDIT: because there would be n room to maintain a free list), so static allocation is a must. In native code we have a solid general mechanism for statically-allocated data. In bytecode we don't so we need the atom table.

If you wonder whether [||] == [||] is allowed to evaluate to false, I say yes: physical equality is specified only for data that can be mutated in-place: "x == y is true if and only if a modification of x also modifies y. However, [||] cannot be modified!

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Nov 20, 2019

And what about static OCaml data which end up next to data allocated by some C stub, to which OCaml heap block points to (without no-naked-pointers) ?

Naked pointers must go away, the sooner the better. (Mental note: update chapter 20 of the reference manual to say "don't use naked pointers".)

@jhjourdan jhjourdan force-pushed the jhjourdan:caml_atom_table_padding branch 2 times, most recently from a34e2b1 to a2c06d9 Nov 21, 2019
@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Nov 21, 2019

I've pushed a new version of the PR, which:

  • implements @xavierleroy's proposal
  • make sure the minor heap also have its own page (which may have the same problem, in theory, although it is not possible to trigger this issue with the default minor heap size)
  • do a minor cleanup (not a bugfix) in gc_ctrl.c, which is related to the question of pages of the major heap.
@jhjourdan jhjourdan force-pushed the jhjourdan:caml_atom_table_padding branch from a2c06d9 to 4c4046f Nov 21, 2019
@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Nov 21, 2019

The test c-api/alloc_async_stub was failing on x86 32bits. This was a false alarm: the issue was in the test itself. This is now fixed.

@jhjourdan jhjourdan force-pushed the jhjourdan:caml_atom_table_padding branch 3 times, most recently from e0b63e5 to 9c7cc72 Nov 22, 2019
Copy link
Contributor

xavierleroy left a comment

Looks good to me. Some unimportant remarks and suggestions below. I still think this must be reviewed by @damiendoligez himself.

runtime/gc_ctrl.c Outdated Show resolved Hide resolved
.depend Outdated Show resolved Hide resolved
otherlibs/unix/.depend Outdated Show resolved Hide resolved
runtime/gc_ctrl.c Show resolved Hide resolved
runtime/gc_ctrl.c Outdated Show resolved Hide resolved
runtime/startup_aux.c Outdated Show resolved Hide resolved
@xavierleroy xavierleroy requested a review from damiendoligez Dec 7, 2019
@jhjourdan jhjourdan force-pushed the jhjourdan:caml_atom_table_padding branch 2 times, most recently from 857e7c1 to cb32991 Dec 9, 2019
Copy link
Member

damiendoligez left a comment

A very small change, then good to go.

runtime/startup_aux.c Outdated Show resolved Hide resolved
@jhjourdan jhjourdan force-pushed the jhjourdan:caml_atom_table_padding branch from 0d6c8bf to ad4595e Dec 18, 2019
@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Dec 18, 2019

Thanks @damiendoligez. I included your change. This should be ready to merge as soon as CI passes.

@gasche could you backport to 4.10 (and 4.09 ?)?

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Dec 18, 2019

For 4.10, "yes of course". For 4.09, we need some sort of assessment of the regression-risk of the change. If you are confident that this is a low-risk change, it can go into a maintenance branch, but otherwise I would be hesitant to backport it, at least before it gets more testing in 4.10 and trunk. (To me, a non-runtime-expert, it looks like a change that might go wrong in unplanned ways?)

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Dec 19, 2019

As you prefer. To me, it is low-risk, but I have a hard time to have a objective assessment of the risk of a change.

Moreover, it seems that without my memprof change, the memory mapping was somehow avoiding the issue. So we may decide not to backport on 4.09.

@damiendoligez

This comment has been minimized.

Copy link
Member

damiendoligez commented Dec 19, 2019

This is a large-ish diff, so if the bug was not observed in the wild on 4.09, I'd advise against backporting.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Dec 19, 2019

There are, in fact, reports in the wild of segfault of OCaml 4.09 on alpine+musl Docker images, which are the initial reason for @jhjourdan's investigation and the currently proposed fix. I haven't been able to find a link to the discussions just now, but I ended up on older issues, such as #7562, which may in fact be related.

@damiendoligez

This comment has been minimized.

Copy link
Member

damiendoligez commented Dec 20, 2019

Then let's backport.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Dec 20, 2019

I'll start by running precheck on this change and then merging it. Then I'm willing to be wooed by @damiendoligez's enthusiasm and backport.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Dec 20, 2019

(enthusiasm, which I hope is correlated with a careful code review!)

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Jan 6, 2020

Is there anything blocking merging for this PR, now that we have agreed to backport to 4.09?

jhjourdan added 3 commits Nov 21, 2019
…me and inline it at the only place it is used.
We need to make sure they do not share a page with code, otherwise the
GC and the polymorphic hash and comparison functions will follow code
pointers and potentially trigger a segfault.
We must wait for two major cycles to finish in order to make sure that finalizers have indeed be examined.
@jhjourdan jhjourdan force-pushed the jhjourdan:caml_atom_table_padding branch from ad4595e to d5d972d Jan 6, 2020
@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

jhjourdan commented Jan 6, 2020

(I have just fixed the conflict in Changes.)

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 6, 2020

Sorry, I just forgot about it. I'll try to remember to merge after the CI comes back green again. Thanks for pinging!

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 6, 2020

(The precheck job was https://ci.inria.fr/ocaml/job/precheck/329/ and passed.)

@gasche gasche merged commit 11b5182 into ocaml:trunk Jan 7, 2020
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
gasche added a commit that referenced this pull request Jan 8, 2020
Add padding before and after the atom table

(cherry picked from commit 11b5182)
gasche added a commit that referenced this pull request Jan 8, 2020
Add padding before and after the atom table

(cherry picked from commit 11b5182)
gasche added a commit that referenced this pull request Jan 8, 2020
@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 8, 2020

I cherry-picked the PR into 4.09 ( f49db35 ) and 4.10 ( 586e7e3 ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.