New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial support for ocamlfdo #2768
Conversation
Looks very cool. Will give this a review next week. In the meantime could you give this a rebase first? |
448a458
to
f08fb10
Compare
src/dune/fdo.ml
Outdated
~hint:"try: opam install ocamlfdo" | ||
in | ||
match ocamlfdo with | ||
| Error e -> Action.Prog.Not_found.raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should we raise here? It seems better to raise when a rule that requires this ocamlfdo
is invoked instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed it, thanks.
It was used to find the location of the default template linker script to mark it as a dependency, but I think it is not needed because the default script that comes with ocamlfdo is not intended to be modified.
src/dune/workspace.ml
Outdated
@@ -65,6 +73,8 @@ module Context = struct | |||
field_o "host" (Dune_lang.Syntax.since syntax (1, 10) >>> string) | |||
and+ toolchain = | |||
field_o "toolchain" (Dune_lang.Syntax.since syntax (1, 5) >>> string) | |||
and+ fdo_target_exe = | |||
field_o "fdo" (Dune_lang.Syntax.since syntax (1, 12) >>> string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this will need to be adjusted before merging. It likely needs to be (2, 0)
if not (2, 1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optimistically changed to (2, 0) and improved tests for fdo along the way.
src/dune/fdo.ml
Outdated
| Error e -> Action.Prog.Not_found.raise e | ||
| Ok _ -> ocamlfdo | ||
|
||
(* CR gyorsh: this should also be cached *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the imperative to cache this? This function seems to be completely pure and without IO that isn’t already cached. In summary, it seems to be performant enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I just wanted to make sure we do not repeat File_tree.file_exists
for every module we compile, because fdo_profile
is the same per context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well File_tree.file_exists
should be fairly cheap as it's already cached. However, perhaps something should indeed be done.
Some suggestions:
Introduce an Fdo.Mode
with roughly this signature:
module Mode : sig
type t = If_exists | Always | Never
val of_context : Context.t -> t (* or [t option] *)
end
Wrap the profile_exists
and fdo_profile_src
calculation in a lazy (...)
.
By the way, regarding performance, I'm now indeed worried that we're running all this code for every module. Perhaps the absence of OCAMLFDO_USE_PROFILE
should be the same as Never
and we do this check in opt_rule
before doing anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well
File_tree.file_exists
should be fairly cheap as it's already cached.
great! what about Super_context.resolve_program
- is it also cached?
By the way, regarding performance, I'm now indeed worried that we're running all this code for every module.
These rules are only added for fdo context, so this should not affect normal compilation contexts.
Perhaps the absence of OCAMLFDO_USE_PROFILE should be the same as Never and we do this check in opt_rule before doing anything else.
I think the current default of if-exist is a good one from usability point of view. By default, there is no need for the user to set any env var. I think this check can still be done with opt_rule executes. I'll try to rewrite it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great! what about Super_context.resolve_program - is it also cached?
Yup. It's cached.
These rules are only added for fdo context, so this should not affect normal compilation contexts.
Got it. That's good to know.
I think the current default of if-exist is a good one from usability point of view. By default, there is no need for the user to set any env var. I think this check can still be done with opt_rule executes. I'll try to rewrite it.
By the way, I'm wondering if it should be possible to set this option directly in the workspace file as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some changes to address your suggestions, as far as I understand them. I haven't been able to test it properly yet, because I am having trouble building packages from opam with the newest dune. What would be the best way to set up an opam switch with a version of dune 2.0 that is based off of an unreleased dune version?
- Workspace: type of fdo_target_exe is Path.t instead of string.
- Fdo.Mode to read and parse OCAMLFDO_USE_PROFILE env variable from
context. - File_tree.file_exists is lazy, and forced only when needed,
depending on Fdo.Mode. - Fdo.use_profile is computed dynamically (when opt_rule executes, not
when it is constructed). - The result of Fdo.use_profile is now memoized based on context, to
speed up module compilation under fdo. - Simplify linker_script_rule: remove linker_script_hot, which is not
is intended for experiments and can specified via
OCAMLFDO_LINKER_SCRIPT_FLAGS variable. - Fix tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I'm wondering if it should be possible to set this option directly in the workspace file as well.
I'm happy to try to add it, perhaps in a separate PR?
One way is to add another field, like this:
(context
(default
(fdo src/foo.exe)
(fdo_use_profile if-exists)))
It may be better to have more structure:
(context
(default
(fdo
(target_exe src/foo.exe)
(use_profile if-exists))))
There are other fdo-related flags read from environment variables in fdo.ml
. They could be specified in dune-workspace as well, but I am not sure if it's the right place for all of them. The usage scenarios for them are not entirely clear yet. There could be reasons to specify some of the fdo flags per directory for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the best way to set up an opam switch with a version of dune 2.0 that is based off of an unreleased dune version?
$ opam pin dune --dev
should work I think.
This PR seems good enough for me to merge. One thing that is still needed is a short page in the manual describing this feature.
Also, this is up to your discretion, but you may state that this feature as experimental in the docs. The benefit of this is that you'll be able to break the user facing interface without having to support older versions.
As for the tests, I suppose we should disable them for versions of OCaml less than 4.0.10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ opam pin dune --dev
should work I think.
It did before dune 2.0 (that's how I tested the first version of this PR), but breaks with dune 2.0. I think it has to do with the move of dune-configurator into a separate package. I pinned dune-configurator too, and it builds some, but not all packages (after a hack to opam files of dune to convince dune-private-libs and dune-configurator to be friends). I'll try again track it down a bit later today.
This PR seems good enough for me to merge.
Great, thank you!
One thing that is still needed is a short page in the manual describing this feature.
Also, this is up to your discretion, but you may state that this feature as experimental in the docs. The benefit of this is that you'll be able to break the user facing interface without having to support older versions.
As for the tests, I suppose we should disable them for versions of OCaml less than 4.0.10.
I'll add these changes shortly, thank you.
Btw, I’ve rebased your PR myself. My changes caused the conflict so I felt responsible. Regarding this pinning thing, could you ask the entire team on Slack? I’m not quite sure what’s the best way either anymore.
…On Oct 29, 2019 21:48 +0900, Greta Yorsh ***@***.***>, wrote:
@gretay-js commented on this pull request.
In src/dune/fdo.ml:
> +
+let cxx_flags = c_flags
+
+(* Location of ocamlfdo binary tool is independent of the module, but may
+ depend on the context. If it isn't cached elsewhere, we should do it here.
+ CR gyorsh: is it cached? *)
+let ocamlfdo_binary sctx dir =
+ let ocamlfdo =
+ Super_context.resolve_program sctx ~dir ~loc:None "ocamlfdo"
+ ~hint:"try: opam install ocamlfdo"
+ in
+ match ocamlfdo with
+ | Error e -> Action.Prog.Not_found.raise e
+ | Ok _ -> ocamlfdo
+
+(* CR gyorsh: this should also be cached *)
> $ opam pin dune --dev should work I think.
> It did before dune 2.0 (that's how I tested the first version of this PR), but breaks with dune 2.0. I think it has to do with the move of dune-configurator into a separate packaged. I pinned dune-configurator too, and it builds some, but not all packages (after a hack to opam files of dune to convince dune-private-libs and dune-configurator to be friends). I'll try again track it down a bit later today.
> This PR seems good enough for me to merge.
> Great, thank you!
> One thing that is still needed is a short page in the manual describing this feature.
> Also, this is up to your discretion, but you may state that this feature as experimental in the docs. The benefit of this is that you'll be able to break the user facing interface without having to support older versions.
> As for the tests, I suppose we should disable them for versions of OCaml less than 4.0.10.
> I'll add these changes shortly, thank you.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
- Workspace: type of fdo_target_exe is Path.t instead of string. - Fdo.Mode to read and parse OCAMLFDO_USE_PROFILE env variable from context. - File_tree.file_exists is lazy, and forced only when needed, depending on Fdo.Mode. - Fdo.use_profile is computed dynamically (when opt_rule executes, not when it is constructed). - The result of Fdo.use_profile is now memoized based on context, to speed up module compilation under fdo. - Simplify linker_script_rule: remove linker_script_hot, which is not is intended for experiments and can specified via OCAMLFDO_LINKER_SCRIPT_FLAGS variable. - Fix tests. Signed-off-by: Greta Yorsh <gyorsh@janestreet.com>
@gretay-js I'm merging. Please update the docs in a separate PR. |
Well, the error handling can be improved to take locations into account but this will do for now. Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
Thank you!! |
This PR adds new rules to dune for performing Feedback-Directed Optimization (FDO) of OCaml native code.
The basic approach of FDO is to collect execution profiles using Linux perf with low-overhead sampling, and then use the profiles to guide compiler optimization decisions. Currently, the only supported optimizations are code layout (function and basic block reordering). FDO is implemented in a tool called ocamlfdo that will soon be available via opam.
We observed significant performance improvements with FDO on low-latency applications at Jane Street (using jenga to build them). Integration with dune will make it easier for users outside of Jane Street to experiment with FDO.
user interface
To enable FDO, dune users create a new context in their dune-workspace, and speficy the target executable to optimize using the new
fdo
field:The name of the context is derived automatically from the default name, unless explicitly specified in dune-workspace file.
compilation rules
Under the hood, dune compilation rules for FDO split the standard compilation into two phases: compile and emit. The compile phase stops
ocamlopt
afterscheduling
pass in the backend, and saves the intermediate representation into a file. The emit phase reads that file and continues to emit assembly, assemble, link, etc, depending on the options passed to ocamlopt.In an FDO build, after compile phase, the intermediate representation is optimized using the profile, and then passed on to the emit phase. If the profile changes, there is no need to repeat compile phase to rebuild the target executable.
In summary, module compilation rules for FDO are composed of three phases:
compile phase:
ocamlopt -stop-after scheduling -save-ir-after scheduling ...
outputs .cmir-linear file and the usual compilation artifacts including .cmx, but not .S or .o.fdo phase:
ocamlfdo opt ...
with the appropriate options including a profile, if available, reads .cmir-linear and writes .cmir-linear-fdo.emit phase:
ocamlopt -start-from emit ..
reads .cmir-linear-fdo and produces the rest of the compilation targets and artifacts.link rule
Function reordering is achieved using a linker script generated by ocamlfdo from a template and a profile. This is limited to gnu linker scripts at the moment, mainly because it requires named function
section that are not available or untested on other platforms for ocaml functions.
dependencies
The ocamlfdo tool is implemented via compiler-libs and relies on a handful of simple compiler changes, some of which have already been merged. The three remaining PRs are under review to add the command line options mentioned above:
ocaml/ocaml#8939
ocaml/ocaml#8938
ocaml/ocaml#9003
A compiler version with these PRs is required to use the proposed feature. This is checked via Ocaml_version.supports_split_at_emit function.