Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for ocamlfdo #2768

Merged
merged 14 commits into from Oct 30, 2019
Merged

Initial support for ocamlfdo #2768

merged 14 commits into from Oct 30, 2019

Conversation

gretay-js
Copy link
Contributor

This PR adds new rules to dune for performing Feedback-Directed Optimization (FDO) of OCaml native code.

The basic approach of FDO is to collect execution profiles using Linux perf with low-overhead sampling, and then use the profiles to guide compiler optimization decisions. Currently, the only supported optimizations are code layout (function and basic block reordering). FDO is implemented in a tool called ocamlfdo that will soon be available via opam.

We observed significant performance improvements with FDO on low-latency applications at Jane Street (using jenga to build them). Integration with dune will make it easier for users outside of Jane Street to experiment with FDO.

user interface

To enable FDO, dune users create a new context in their dune-workspace, and speficy the target executable to optimize using the new fdo field:

(context (default
           (fdo src/foo.exe)))

The name of the context is derived automatically from the default name, unless explicitly specified in dune-workspace file.

compilation rules

Under the hood, dune compilation rules for FDO split the standard compilation into two phases: compile and emit. The compile phase stops ocamlopt after scheduling pass in the backend, and saves the intermediate representation into a file. The emit phase reads that file and continues to emit assembly, assemble, link, etc, depending on the options passed to ocamlopt.

In an FDO build, after compile phase, the intermediate representation is optimized using the profile, and then passed on to the emit phase. If the profile changes, there is no need to repeat compile phase to rebuild the target executable.

In summary, module compilation rules for FDO are composed of three phases:

  • compile phase: ocamlopt -stop-after scheduling -save-ir-after scheduling ... outputs .cmir-linear file and the usual compilation artifacts including .cmx, but not .S or .o.

  • fdo phase: ocamlfdo opt ... with the appropriate options including a profile, if available, reads .cmir-linear and writes .cmir-linear-fdo.

  • emit phase: ocamlopt -start-from emit .. reads .cmir-linear-fdo and produces the rest of the compilation targets and artifacts.

link rule

Function reordering is achieved using a linker script generated by ocamlfdo from a template and a profile. This is limited to gnu linker scripts at the moment, mainly because it requires named function
section that are not available or untested on other platforms for ocaml functions.

dependencies

The ocamlfdo tool is implemented via compiler-libs and relies on a handful of simple compiler changes, some of which have already been merged. The three remaining PRs are under review to add the command line options mentioned above:

ocaml/ocaml#8939
ocaml/ocaml#8938
ocaml/ocaml#9003

A compiler version with these PRs is required to use the proposed feature. This is checked via Ocaml_version.supports_split_at_emit function.

@gretay-js gretay-js requested a review from a user October 18, 2019 15:33
@rgrinberg
Copy link
Member

rgrinberg commented Oct 19, 2019

Looks very cool. Will give this a review next week. In the meantime could you give this a rebase first?

src/dune/fdo.ml Outdated
~hint:"try: opam install ocamlfdo"
in
match ocamlfdo with
| Error e -> Action.Prog.Not_found.raise e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we raise here? It seems better to raise when a rule that requires this ocamlfdo is invoked instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it, thanks.

It was used to find the location of the default template linker script to mark it as a dependency, but I think it is not needed because the default script that comes with ocamlfdo is not intended to be modified.

@@ -65,6 +73,8 @@ module Context = struct
field_o "host" (Dune_lang.Syntax.since syntax (1, 10) >>> string)
and+ toolchain =
field_o "toolchain" (Dune_lang.Syntax.since syntax (1, 5) >>> string)
and+ fdo_target_exe =
field_o "fdo" (Dune_lang.Syntax.since syntax (1, 12) >>> string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this will need to be adjusted before merging. It likely needs to be (2, 0) if not (2, 1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optimistically changed to (2, 0) and improved tests for fdo along the way.

src/dune/fdo.ml Outdated
| Error e -> Action.Prog.Not_found.raise e
| Ok _ -> ocamlfdo

(* CR gyorsh: this should also be cached *)
Copy link
Member

@rgrinberg rgrinberg Oct 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the imperative to cache this? This function seems to be completely pure and without IO that isn’t already cached. In summary, it seems to be performant enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just wanted to make sure we do not repeat File_tree.file_exists for every module we compile, because fdo_profile is the same per context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well File_tree.file_exists should be fairly cheap as it's already cached. However, perhaps something should indeed be done.

Some suggestions:

Introduce an Fdo.Mode with roughly this signature:

module Mode : sig
  type t = If_exists | Always | Never

  val of_context : Context.t -> t (* or [t option] *)
end

Wrap the profile_exists and fdo_profile_src calculation in a lazy (...).

By the way, regarding performance, I'm now indeed worried that we're running all this code for every module. Perhaps the absence of OCAMLFDO_USE_PROFILE should be the same as Never and we do this check in opt_rule before doing anything else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well File_tree.file_exists should be fairly cheap as it's already cached.

great! what about Super_context.resolve_program - is it also cached?

By the way, regarding performance, I'm now indeed worried that we're running all this code for every module.

These rules are only added for fdo context, so this should not affect normal compilation contexts.

Perhaps the absence of OCAMLFDO_USE_PROFILE should be the same as Never and we do this check in opt_rule before doing anything else.

I think the current default of if-exist is a good one from usability point of view. By default, there is no need for the user to set any env var. I think this check can still be done with opt_rule executes. I'll try to rewrite it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great! what about Super_context.resolve_program - is it also cached?

Yup. It's cached.

These rules are only added for fdo context, so this should not affect normal compilation contexts.

Got it. That's good to know.

I think the current default of if-exist is a good one from usability point of view. By default, there is no need for the user to set any env var. I think this check can still be done with opt_rule executes. I'll try to rewrite it.

By the way, I'm wondering if it should be possible to set this option directly in the workspace file as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to address your suggestions, as far as I understand them. I haven't been able to test it properly yet, because I am having trouble building packages from opam with the newest dune. What would be the best way to set up an opam switch with a version of dune 2.0 that is based off of an unreleased dune version?

  • Workspace: type of fdo_target_exe is Path.t instead of string.
  • Fdo.Mode to read and parse OCAMLFDO_USE_PROFILE env variable from
    context.
  • File_tree.file_exists is lazy, and forced only when needed,
    depending on Fdo.Mode.
  • Fdo.use_profile is computed dynamically (when opt_rule executes, not
    when it is constructed).
  • The result of Fdo.use_profile is now memoized based on context, to
    speed up module compilation under fdo.
  • Simplify linker_script_rule: remove linker_script_hot, which is not
    is intended for experiments and can specified via
    OCAMLFDO_LINKER_SCRIPT_FLAGS variable.
  • Fix tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I'm wondering if it should be possible to set this option directly in the workspace file as well.

I'm happy to try to add it, perhaps in a separate PR?
One way is to add another field, like this:

(context
 (default
   (fdo src/foo.exe)
   (fdo_use_profile if-exists)))

It may be better to have more structure:

(context
 (default
   (fdo
    (target_exe src/foo.exe)
    (use_profile if-exists))))

There are other fdo-related flags read from environment variables in fdo.ml. They could be specified in dune-workspace as well, but I am not sure if it's the right place for all of them. The usage scenarios for them are not entirely clear yet. There could be reasons to specify some of the fdo flags per directory for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the best way to set up an opam switch with a version of dune 2.0 that is based off of an unreleased dune version?

$ opam pin dune --dev should work I think.

This PR seems good enough for me to merge. One thing that is still needed is a short page in the manual describing this feature.

Also, this is up to your discretion, but you may state that this feature as experimental in the docs. The benefit of this is that you'll be able to break the user facing interface without having to support older versions.

As for the tests, I suppose we should disable them for versions of OCaml less than 4.0.10.

Copy link
Contributor Author

@gretay-js gretay-js Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ opam pin dune --dev should work I think.

It did before dune 2.0 (that's how I tested the first version of this PR), but breaks with dune 2.0. I think it has to do with the move of dune-configurator into a separate package. I pinned dune-configurator too, and it builds some, but not all packages (after a hack to opam files of dune to convince dune-private-libs and dune-configurator to be friends). I'll try again track it down a bit later today.

This PR seems good enough for me to merge.

Great, thank you!

One thing that is still needed is a short page in the manual describing this feature.
Also, this is up to your discretion, but you may state that this feature as experimental in the docs. The benefit of this is that you'll be able to break the user facing interface without having to support older versions.
As for the tests, I suppose we should disable them for versions of OCaml less than 4.0.10.

I'll add these changes shortly, thank you.

src/dune/workspace.ml Outdated Show resolved Hide resolved
@rgrinberg
Copy link
Member

rgrinberg commented Oct 29, 2019 via email

Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
- Workspace: type of fdo_target_exe is Path.t instead of string.
- Fdo.Mode to read and parse OCAMLFDO_USE_PROFILE env variable from
  context.
- File_tree.file_exists is lazy, and forced only when needed,
  depending on Fdo.Mode.
- Fdo.use_profile is computed dynamically (when opt_rule executes, not
  when it is constructed).
- The result of Fdo.use_profile is now memoized based on context, to
  speed up module compilation under fdo.
- Simplify linker_script_rule: remove linker_script_hot, which is not
  is intended for experiments and can specified via
  OCAMLFDO_LINKER_SCRIPT_FLAGS variable.
- Fix tests.

Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
@rgrinberg
Copy link
Member

@gretay-js I'm merging. Please update the docs in a separate PR.

Well, the error handling can be improved to take locations into account
but this will do for now.

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
@gretay-js
Copy link
Contributor Author

Thank you!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants