Profile-guided optimizations (PGO) #1219

Merged
klickverbot merged 27 commits into ldc-developers:master on Jun 20, 2016

5 participants
@JohanEngelen
Member

JohanEngelen commented Nov 22, 2015

See this page on the wiki that documents the work: http://wiki.dlang.org/LDC_LLVM_profiling_instrumentation

This PR implements PGO as Clang/LLVM does it:

  1. Compile with instrumentation: ldc2 -fprofile-instr-generate test.d -of=test1
  2. Run the executable: ./test1
    This generates a "default.profraw" file.
  3. Run the llvm-profdata tool: llvm-profdata merge default.profraw -output test.profdata
  4. Compile again, now using profile data: ldc2 -profile-instr-use=test.profdata test.d -of=test2
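To make step 1 more concrete, here is a hand-written C++ sketch of the kind of counting the instrumentation performs: one counter for function entry and one for the "then" region of an `if`, with the "else" count derived from the two instead of being stored. That derivation is how Clang's front-end PGO (the model for this PR) treats if-statements. The struct and function names are hypothetical; real instrumented code keeps its counters in compiler-generated globals that the profile runtime writes out to default.profraw.

```cpp
#include <cstdint>

// Hypothetical counters for one instrumented function (names are illustrative).
// entryCount is bumped on every call; thenCount only when the "then" region runs.
struct AbsCounters {
    std::uint64_t entryCount = 0;
    std::uint64_t thenCount = 0;

    // The "else" count is not stored: it is derived from the parent (entry) count.
    std::uint64_t elseCount() const { return entryCount - thenCount; }
};

// Hand-instrumented version of a tiny function, roughly what
// -fprofile-instr-generate adds automatically (a sketch, not LDC's output).
int countedAbs(int x, AbsCounters& c) {
    ++c.entryCount;      // function-entry counter
    if (x < 0) {
        ++c.thenCount;   // "then" region counter
        return -x;
    }
    return x;            // "else" path: no counter of its own
}
```

Calling `countedAbs` for every `i` from -3 to 6 would record an entry count of 10 and a then-count of 3, from which the else-count of 7 follows. Storing fewer counters keeps the instrumentation overhead down.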

Instead of adding the LLVM profiler runtime to druntime like the PR does now, I think it should be a separate library that is linked in when -fprofile-instr-generate is used.

Member

klickverbot commented Dec 8, 2015

This random microbenchmark for std.regex does not compile with -fprofile-instr-generate on LLVM/LDC master:

import std.stdio, std.conv, std.array, std.regex, std.utf,
       std.algorithm, std.exception;

string reEncode(string s) {
    validate(s); // Throw if it's not a well-formed UTF string
    static string rep(Captures!string m) {
        auto c = canFind("0123456789#", m[1]) ? "#" ~ m[1] : m[1];
        return text(m.hit.length / m[1].length) ~ c;
    }
    return std.regex.replace!rep(s, regex(`(.|[\n\r\f])\1*`, "g"));
}


string reDecode(string s) {
    validate(s); // Throw if it's not a well-formed UTF string
    static string rep(Captures!string m) {
        string c = m[2];
        if (c.length > 1 && c[0] == '#')
            c = c[1 .. $];
        return replicate(c, to!int(m[1]));
    }
    auto r=regex(`(\d+)(#[0123456789#]|[\n\r\f]|[^0123456789#\n\r\f]+)`
                 , "g");
    return std.regex.replace!rep(s, r);
}

void rle() {
    pragma(LDC_never_inline);
    auto s = "??????????????\nWWWWWWWWWWWWBWWWWWWWWWWW" ~
             "WBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW\n" ~
             "11#222##333";
    enforce(s == reDecode(reEncode(s)));
}

void main() {
    foreach (_; 0..100000) {
        rle();
    }
}

Basic Block in function '_D3std5regex8internal6parser15__T6ParserTAyaZ6Parser13parseCharTermMFZS3std8typecons136__T5TupleTS3std3uni38__T13InversionListTS3std3uni8GcPolicyZ13InversionListTE3std5regex8internal6parser15__T6ParserTAyaZ6Parser8OperatorZ5Tuple' does not have terminator!
label %postinvoke14

Member

JohanEngelen commented Dec 8, 2015

@klickverbot Thanks for the testcase. I get a different error though: assert in PGO code. Traversing DMD's AST is non-trivial :/

gen/pgo.cpp
+//
+//===----------------------------------------------------------------------===//
+
+//===--- CodeGenPGO.cpp - PGO Instrumentation for LLVM CodeGen --*- C++ -*-===//

@klickverbot

klickverbot Dec 13, 2015

Member

Could you merge this into the preceding LDC header comment? Maybe just note that it is adapted from CodeGenPGO.cpp, and that it is under LLVM's license. The reference to LICENSE.TXT also doesn't make a lot of sense in our repository.

Member

JohanEngelen commented Jan 21, 2016

Johan, take a look at this too at some point: http://reviews.llvm.org/D15829

Member

smolt commented Jan 22, 2016

I talk to myself often too, but not yet on github.

Member

JohanEngelen commented Mar 3, 2016

Rebased onto master!

@JohanEngelen JohanEngelen changed the title from [WIP] Profile-guided optimizations to Profile-guided optimizations (PGO) Mar 5, 2016

Member

JohanEngelen commented Mar 5, 2016

Now also working for LLVM 3.9

Member

JohanEngelen commented Mar 6, 2016

@klickverbot @redstar The work is done! Please review, and merge ;-)

(This PR is becoming larger and larger. By now, so much time has gone into this that I am very biased towards it. It's been tested to safely compile Phobos/druntime. The frontend changes are not there for the fun of it. Some frontend changes have been submitted upstream, but dlang/dmd#5501 was met with a disappointing review, especially considering [time investment PGO] vs [apparent review time].)

etcimon commented Mar 6, 2016

Just curious, has anyone benchmarked this yet in favorable conditions? What are the potential gains?

Member

JohanEngelen commented Mar 7, 2016

@etcimon I did some simple tests, but have not been able to create testcases with big performance improvements. I too am very curious to see what the gains can be (and for what kind of code).

Member

JohanEngelen commented Mar 18, 2016

@klickverbot dlang/dmd#5501 is not going anywhere, what do you think?

Member

JohanEngelen commented Mar 18, 2016

@etcimon PGO is relatively new in LLVM, and I think that LLVM 3.8 actually does not use much of the profiling information during optimization (e.g. afaik it is not used for inlining). Development of PGO is pretty active in trunk though.

Member

smolt commented Mar 20, 2016

Curious whether it could make LDC faster: optimizing the D front-end with PGO, and using Clang with PGO to build a faster LLVM?

Member

JohanEngelen commented Mar 26, 2016

@klickverbot Rebased!

What needs some work in the future is PGO for exception handling statements (try/catch/finally/scope). It is not so easy, and I don't expect to work on it soon. Hope this can be merged without it. @rainers I will need your help to implement PGO for MSVC EH!

Member

klickverbot commented Mar 26, 2016

@JohanEngelen: Fails on Travis, unfortunately.

Edit: More specifically, the LLVM 3.5/3.6 builds do.

Member

JohanEngelen commented Mar 29, 2016

This is currently broken on Windows because of COMDAT "any" selection for all functions. I will have to ask on LLVMdev whether this can be worked around or not.

Member

klickverbot commented Apr 13, 2016

The build is unfortunately still broken on CircleCI.

Member

JohanEngelen commented Apr 14, 2016

CircleCI is broken because it uses the latest LLVM. I was waiting for #1420, but now I see it is for merge-2.071.

Member

JohanEngelen commented May 29, 2016

Still fails on Windows. It needs LLVM >= r270596 (#1513).

Edit: and it needs #1520

Member

JohanEngelen commented Jun 5, 2016

Yeah, I had seen it. It is sad: perhaps front-end PGO in LDC is doomed now (so much work... luckily some of the code is still relevant for coverage). I still think front-end PGO has its uses (e.g. the virtual call optimization is much harder in IR, but could perhaps be solved with some extra front-end IR annotations), but the case for it is a lot weaker once Clang stops doing front-end PGO. Some people argue that they want to use front-end PGO because it also gives coverage at the same time. I don't think that is a good argument: coverage tests corner cases, not the common behavior of your program that you want to optimize with PGO.

@MoritzMaxeiner MoritzMaxeiner referenced this pull request in gentoo/dlang Jun 18, 2016

Merged

Update for ldc 1.0.0 release #41

Johan Engelen and others added some commits Nov 16, 2015

[PGO] Add PGO to LDC. Supported for LLVM >= 3.7
Add the commandline options -fprofile-instr-generate[=filename] and -profile-instr-use=filename
-fprofile-instr-generate
-- Add instrumentation on branches, switches, and function entry; uses LLVM's InstrProf pass.
-- Link to profile runtime that writes instrumentation counters to a file.
-fprofile-instr-use
-- Read profile data from a file and apply branch weights to branches and switches, and annotate functions with entrycount in LLVM IR.
-- Functions with low or high entrycount are marked with 'cold' or 'inlinehint'.
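Applying branch weights means converting 64-bit profile counts into LLVM `branch_weights` metadata, whose values are 32-bit. The sketch below follows the approach of Clang's CodeGenPGO.cpp (which, per the review comment on gen/pgo.cpp in this thread, is what LDC's code is adapted from): divide all counts by a common scale so the largest fits in 32 bits, and add 1 so that no weight is zero. That LDC uses exactly this formula is an assumption, and the function names here are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Scale factor so that the largest 64-bit count fits into 32 bits
// (modeled on Clang's calculateWeightScale; assumed, not taken from LDC).
std::uint64_t weightScale(std::uint64_t maxWeight) {
    const std::uint64_t u32max = std::numeric_limits<std::uint32_t>::max();
    return maxWeight < u32max ? 1 : maxWeight / u32max + 1;
}

// Convert raw branch/switch counts into 32-bit weights. The "+ 1" keeps every
// weight non-zero, so a never-taken edge becomes very unlikely, not impossible.
std::vector<std::uint32_t> toBranchWeights(const std::vector<std::uint64_t>& counts) {
    if (counts.empty()) return {};
    const std::uint64_t maxCount = *std::max_element(counts.begin(), counts.end());
    const std::uint64_t scale = weightScale(maxCount);
    std::vector<std::uint32_t> weights;
    weights.reserve(counts.size());
    for (std::uint64_t c : counts)
        weights.push_back(static_cast<std::uint32_t>(c / scale + 1));
    return weights;
}
```

For counts {3, 7} this yields weights {4, 8}; only the ratio matters to the optimizer. In LLVM such a vector would typically be turned into metadata with `llvm::MDBuilder::createBranchWeights`.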

The only statement type without PGO yet is "try-finally".

A new pragma, `pragma(LDC_profile_instr, [ true | false ])`, is added to selectively disable/enable instrumentation of functions (granularity = whole functions).

The runtime library ldc-profile-rt is a copy of LLVM compiler-rt lib/profile. It has to be exactly in sync with the LLVM version, and thus we need a copy for each PGO-supported LLVM (>=3.7).
Use `import ldc.profile` for a D interface to ldc-profile-rt (for example, to reset execution counts after a program's startup phase).

The instrumentation data is mainly passed on to LLVM: function-entry counts and branch counts/probabilities. LDC marks functions as hot (inlinehint) when their execution count is at least 30% of the maximum function execution count, and marks functions as cold if their count is at most 1% of the maximum function execution count.
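The hot/cold rule can be sketched as follows. The 30% and 1% thresholds come from this commit message; reading them as "at least" and "at most", and the guard for a zero maximum, are assumptions modeled on the corresponding logic in Clang's CodeGenPGO. The enum and function names are hypothetical.

```cpp
#include <cstdint>

// Which function attribute to add, if any (hypothetical names).
enum class FunctionHint { None, InlineHint, Cold };

// Classify a function by its entry count relative to the hottest function.
FunctionHint classifyByEntryCount(std::uint64_t entryCount, std::uint64_t maxFunctionCount) {
    if (maxFunctionCount == 0)
        return FunctionHint::None;  // no useful profile data (assumed guard)
    const double maxCount = static_cast<double>(maxFunctionCount);
    // Hot: at least 30% of the maximum entry count -> 'inlinehint'.
    if (entryCount >= static_cast<std::uint64_t>(0.3 * maxCount))
        return FunctionHint::InlineHint;
    // Cold: at most 1% of the maximum entry count -> 'cold'.
    if (entryCount <= static_cast<std::uint64_t>(0.01 * maxCount))
        return FunctionHint::Cold;
    return FunctionHint::None;
}
```

With a maximum entry count of 1000, a function entered 300 times would get inlinehint, one entered 10 times would be marked cold, and one entered 100 times would get neither attribute.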

The source of LLVM's llvm-profdata tool is hereby included in LDC's repository (different source for each LLVM version), and the binary is included in the install bin folder.
The executable is named "ldc-profdata" to avoid clashing with llvm-profdata on the same machine. This is needed because the profdata executable has to be in sync with the LLVM version used to build LDC.

Maintenance burden: for trunk LLVM, we have to keep ldc-profile-rt and llvm-profdata in sync. There is no diff with upstream, but because of active development there are occasional API changes.
[PGO] Separate the statements involving exceptions from the other statements.

Because of the new MSVC EH, PGO is not yet implemented for try/catch/scope/finally on Windows.
[PGO] If a function is in a comdat group, set the linkage of __profn_ symbol to internal.

Attempt to fix PGO on Windows platform.

Member

JohanEngelen commented Jun 20, 2016

GREEN, also on AppVeyor! :-)

Member

kinke commented Jun 20, 2016

Congratz! 👍

Member

klickverbot commented Jun 20, 2016

Let's just merge this, then. What's the worst that could happen? ;P

(We can always back out the changes if needed, with only the negligible .git size overhead from all the compiler-rt files remaining.)

@klickverbot klickverbot merged commit 94fe1a6 into ldc-developers:master Jun 20, 2016

1 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
ci/circleci Your tests failed on CircleCI
continuous-integration/appveyor/pr AppVeyor build succeeded

Member

klickverbot commented Jun 20, 2016

Would it be a good idea to put up a CircleCI job that builds everything with instrumentation enabled?

Member

JohanEngelen commented Jun 20, 2016

Btw, for using IRPGO we also need compiler-rt. So if later on, IRPGO starts to rock more, we can pull out a bit of the PGO code and only keep it for coverage (LLVM coverage annotation is on my todo list, but it's too long already...).

Member

JohanEngelen commented Jun 20, 2016

> Would it be a good idea to put up a CircleCI job that builds everything with instrumentation enabled?

Maybe? Build LDC with instrumentation -> build Phobos -> rebuild LDC using instrumentation?

Member

JohanEngelen commented Jun 20, 2016

Really happy about this :) Now I need to write that promised blog post.

Member

klickverbot commented Jun 20, 2016

> Maybe? Build LDC with instrumentation -> build phobos -> rebuild LDC using instrumentation?

This would be a next step, I guess. I was more worried about just getting decent test coverage for the PGO code besides sporadically trying to build wekapp with it (specifically the AST-facing side of things on more complex code). Or did I miss something along these lines?

Member

JohanEngelen commented Jun 22, 2016

It's a good point. I think not all bugs I found during development made it into the testcases.

What we could do is add a test job that builds Phobos's unittests with instrumentation enabled. That should give decent coverage of complex ASTs with -fprofile-instr-generate. Then for testing -profile-instr-use with complex ASTs we could add a profile file to the repo, which is then used while building something (perhaps again the Phobos unittests). All this to say: we don't have to actually generate the profile on the tester.

Member

klickverbot commented Jun 22, 2016

> What we could do is add a test job that builds Phobos's unittests with instrumentation enabled. That should give decent coverage of complex ASTs with -profile-instr-generate. Then for testing -profile-instr-use with complex ASTs we could add a profile file to the repo, that is then used while building something (perhaps again the phobos unittests). All this to say that: we don't have to actually generate the profile on the tester.

Actually testing the profile-instr-use part is of course very welcome as well, but my suggestion for a first step was much more trivial: just add -fprofile-instr-generate to the global D flags and have everything build/run as normal.
