Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid page table lookup in Pervasives.compare with no-naked-pointers #2079

Merged
merged 2 commits into from Oct 2, 2018

Conversation

Projects
None yet
6 participants
@samwgoldman
Copy link
Contributor

commented Sep 29, 2018

Background

OCaml values are either (a) immediates (b) pointers into the OCaml heap, or (c) pointers to some off-heap data. The (c) case is also known as foreign or naked pointers.

To accommodate these naked pointers, the runtime takes care to look up addresses in the page table. On 64-bit architectures, the page table is implemented as a hash table.

There is a configure-time --no-naked-pointers option, which disables these page table lookups in the garbage collector. However, even with this option enabled, Pervasives.compare will still perform page table lookups.

This change

If the --no-naked-pointers option is enabled, the runtime can be sure that all pointers will point to some OCaml value. Like the GC, we will follow these pointers and assume the referenced data has the expected structure.

In workloads with large heaps that also perform many polymorphic comparisons, this change will eliminate the overhead of these lookups.

Justification

The current code checks if values pass the Is_in_value_area check. This check is defined as (Classify_addr(a) & (In_heap | In_young | In_static_data)) where Classify_addr is the page table lookup.

The compare code wraps this check around three sections:

  1. v1 = long && v2 = block

    This section does a tag check in case v2 is either a Forward_tag (lazy) or Custom_tag (custom compare function). These tag checks are unsafe if v2 is a naked pointer to some arbitrary data. Without this possibility, it is safe to perform the tag check.

  2. v1 = block && v2 = long

    The justification for this case is the same as for case 1.

  3. v1 = block && v2 = block

    In this section, we have determined that both v1 and v2 are pointers. The following code walks values structurally, which is unsafe if either v1 or v2 are naked pointers to arbitrary data. Without this possibility, it is safe to perform the tag checks directly.

No change entry needed

@gasche

This comment has been minimized.

Copy link
Member

commented Sep 29, 2018

Some comments:

  • To evaluate the change I would indeed expect a discussion of why the change matter, with particular some benchmark (hopefully not just a micro-benchmark but an example of code that is representative of the workloads you care about) to evaluate the impact.

  • It is not immediately clear ot me why you export more widely and use a NATIVE_CODE_AND_NO_NAKED_POINTERS macro, instead of just checking #if defined (NO_NAKED_POINTERS).

  • The patch looks very strange to me: you embed the Is_in_value_area checks under #ifdef NATIVE_CODE_AND_NO_NAKED_POINTERS tests, but isn't the idea of your change precisely to not check Is_in_value_area when NO_NAKED_POINTERS is configured? Did you mean to do the opposite, that is check Is_in_value_area only when NO_NAKED_POINTERS is not set?

@samwgoldman

This comment has been minimized.

Copy link
Contributor Author

commented Sep 29, 2018

To evaluate the change I would indeed expect some benchmark

Yeah, for sure. I can use Flow on a large JavaScript repository. Flow programs have large-ish heaps and have many uses of polymorphic compare.

It is not immediately clear ot me why you export more widely and use a NATIVE_CODE_AND_NO_NAKED_POINTERS macro

I just stole this change from #203. I imagine you are right. I am not certain about the interaction between the native code and naked pointers settings.

The patch looks very strange to me

Well, this is certainly embarrassing. Will fix.

@samwgoldman samwgoldman force-pushed the samwgoldman:compare-no-foreign-pointers branch from 61d6792 to b7f67ba Sep 29, 2018

@samwgoldman samwgoldman force-pushed the samwgoldman:compare-no-foreign-pointers branch from b7f67ba to e8a726c Sep 29, 2018

@samwgoldman

This comment has been minimized.

Copy link
Contributor Author

commented Sep 29, 2018

I have performed some benchmarks on a meaningful workload -- specifically running Flow over the main JavaScript codebase at my work. This workload runs multiple OCaml processes which communicate by marshaling and unmarshaling OCaml values into a region of shared memory, similar to the Hack type checker.

First I built two versions of the OCaml compiler. One was built from the 4.05 tag (we use 4.05 internally) with the --no-naked-pointers configuration. The other was this commit, except based off the 4.05 tag, also with the --no-naked-pointers configuration. I used opam-compiler-conf to install both builds simultaneously, as Flow requires a number of opam dependencies.

I then built two Flow binaries, one with my 4.05+no-naked-pointers switch, and one with 4.05+no-naked-pointers+GH2079. I then ran each binary over our codebase 3 times, recording our internal profiling data and also grabbing stack samples using perf record -f 99 -ag.

With 4.05+no-naked-pointers, caml_page_table_lookup appears in 3-4% of samples recorded with perf. Under 4.05+no-naked-pointers+GH2079, the function appears in 0.4% of samples.

Our profiling measures CPU time, which saw a 2.23% improvement.

@samwgoldman

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2018

I should add that I have an ulterior motive here, which I don't expect will be very convincing, but perhaps will help explain what I am doing here.

Like I mentioned above, Flow uses multiple processes, but the serialization overhead is significant. We explored storing OCaml values directly outside the heap in a shared memory segment†, but it was difficult to test due to our pervasive use of polymorphic compare (and hash).

It seemed like the runtime was moving away from naked pointers anyway, so it seemed appealing to make this change, as it would allow us to use off-heap values and polymorphic functions together.

† I appreciate that this is unsafe, but several invariants conspire to make it relatively less insane. Off-heap values are immutable and never refer to on-heap ones. Off-heap values are referenced from on-heap ones, but only in forked worker processes, and they are never moved or freed until the forked worker(s) die. We set the copied heap values to Color_black during the copy, so they are not walked by the GC.

@gasche

This comment has been minimized.

Copy link
Member

commented Sep 30, 2018

This makes sense, thanks for the evaluation!

(cc @mshinwell)

@dra27

This comment has been minimized.

Copy link
Contributor

commented Sep 30, 2018

I think this is self-evidently an improvement, but benchmarks are always good!

However, I wonder if we're missing a trick - with --no-naked-pointers, what prevents Is_in_value_area being short-circuited to 1? Polymorphic hash certainly benefits and the other instances look safe to me? I'm never one to criticise #ifdef-soup (those in glass houses, and all that), but it seems avoidable here and will benefit other code paths.

@dra27

This comment has been minimized.

Copy link
Contributor

commented Sep 30, 2018

-no-naked-pointers is only tested on Inria's CI, so the final version of this GPR should be run through precheck before merge.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Sep 30, 2018

I'd have to look properly, but I wouldn't be surprised if setting Is_in_value_area being set to 1 caused problems with function pointers in closures.

@gasche

This comment has been minimized.

Copy link
Member

commented Sep 30, 2018

Maybe a compromise approach would be to have a

#define has_naked_pointers 1

(and 0 in the no-naked-pointers case of course) and then change the test here to if (!has_naked_pointers || Is_in_value_area(...)) and trust the compiler optimize the constant away. This does not automagically improve all Is_in_value_area code (which we probably shouldn't if Leo is right), but it reduces the verbosity of the conditional logic.

@samwgoldman

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2018

Another compromise would be to invert the Is_in_value_area checks for the long/block cases, to avoid the need to wrap the closing brace in #ifdefs. That is, adding this patch to this diff.

If this change received a warm reaction, my hope was to also follow up with a similar change in hash.c. Maybe that might tip the scales more toward a different solution?

If someone with more experience in this codebase would prefer a different implementation approach, I am of course happy to close this PR in favor of something else. I would like to see this change make it into the runtime, but it doesn't need to be through this patch specifically. :)

@stedolan

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2018

Apart from issues of style, this looks like a good idea.

Another compromise would be to invert the Is_in_value_area checks for the long/block cases, to avoid the need to wrap the closing brace in #ifdefs. That is, adding this patch to this diff.

My preference is for this solution. I get worried when I see #ifdef blocks containing only braces!

Eventually I'd like to move to what @dra27 suggests, but I think that requires figuring out the interaction with code pointers and is best left to another PR.

@gasche

This comment has been minimized.

Copy link
Member

commented Oct 2, 2018

@samwgoldman I think it would be fine to go forward along the line that @stedolan suggests: apply your patch to this PR and wait for a maintainer to approve/merge it. (It looks like @stedolan may be willing to do that.)

@samwgoldman

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2018

Okay, updated!

@stedolan

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2018

Looks good to me! I'll merge once CI passes.

@stedolan stedolan merged commit f04bbaf into ocaml:trunk Oct 2, 2018

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@stedolan

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2018

The build succeeded, but Travis is complaining about the lack of Changes entry, which seems unnecessary for this. Merging!

@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Oct 3, 2018

@stedolan I think this should have a Changes entry, it isn't a completely trivial change.

@samwgoldman

This comment has been minimized.

Copy link
Contributor Author

commented Oct 3, 2018

@mshinwell I wasn't sure if I should include the change under standard library (affects behavior of pervasives compare if someone is using foreign pointers in no-naked-pointers mode) or runtime system (improves performance in no-naked-pointers mode).

@stedolan

This comment has been minimized.

Copy link
Contributor

commented Oct 5, 2018

@mshinwell OK! I'm still trying to figure out exactly what goes in the Changes file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.