Performance regressions without LTO in LDC 1.3.0-beta[1|2] #2168

Open

jondegenhardt opened this issue Jun 16, 2017 · 113 comments

@jondegenhardt
Contributor

I'm seeing performance regressions in several of my standard benchmarks (https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md), specifically the tsv-filter, tsv-summarize, and csv2tsv tests. Performance is off 20-30% from previous LDC releases.

So far I've only tested on OS X (Xcode 8.3.3).

I won't have time to diagnose further for several days, but I'll see if I can narrow this down sometime during the week of June 19. I don't think it's likely to be due to changes in Phobos, but I can check that, as well as whether it is specific to OS X. If a fix is created for issue #2161 (performance degradation with boundscheck=off) I can check that as well. Testing with that option removed did show an improvement on one benchmark, but the overall benchmark results were worse.

@kinke
Member

kinke commented Jun 16, 2017

According to your benchmark page, the numbers were obtained using LDC 1.1, so have you compared 1.1 to the 1.3 betas, skipping 1.2? I'm asking since we switched to LLVM 4.0 (on non-Windows) starting with 1.2, and the regressions are most likely due to that, although so far we have only heard of general performance improvements (even > 10%) with LLVM 4.0.

@jondegenhardt
Contributor Author

jondegenhardt commented Jun 16, 2017

It's possible that I skipped 1.2; I don't remember off-hand. I'll test with 1.2 and update this report as soon as I can. It may be a couple of days, as I have some schedule conflicts at the moment.

I normally test every LDC release, though without updating the benchmarks page. And I've been using 1.2 since it came out, so it's likely I tested it also. However, I don't have any record of it.

@jondegenhardt
Contributor Author

jondegenhardt commented Jun 18, 2017

I've rerun the six benchmarks I use; there is definitely a performance regression in three of them between LDC 1.2 and LDC 1.3-beta2 on OS X (Xcode 8.3.3). The main deltas:

  • tsv-filter (numeric): 24% slower
  • tsv-summarize: 38% slower
  • csv2tsv: 27% slower

The other three tests are not materially changed. The regex version of the tsv-filter test has had a small performance decrease over several releases; this is likely due to changes in std.regex during those releases.

While running the tests, I discovered there is a delta when using the -flto=full switch. It generates speed gains on several tests.

I'll see what I can do to narrow this down further. The tsv-filter (numeric) and tsv-summarize tests have numeric processing in common, so there may be a clue there. The csv2tsv test does not do numeric processing. I'll still have schedule conflicts for several days, so it may be a bit before I have more.

Below is a table of performance numbers (run times in seconds) over several LDC releases. Each metric was run at least 5 times to ensure consistency. The "flto" versions are built with the -flto=full option. All source files are included in one command line. Options used:

  • Without flto: -release -O3 -boundscheck=off
  • With flto: -release -O3 -boundscheck=off -singleobj -flto=full
| Compilation | tsv-filter (numeric) | tsv-filter (regex) | tsv-select | tsv-summarize | tsv-join | csv2tsv |
|---|---|---|---|---|---|---|
| LDC 1.0 | 4.50 | 7.48 | 7.80 | 15.76 | 23.64 | 26.49 |
| LDC 1.1 | 4.55 | 7.32 | 4.24 | 15.93 | 20.73 | 29.05 |
| LDC 1.1 flto | 4.53 | 7.16 | 4.16 | 16.25 | 20.84 | 22.50 |
| LDC 1.2 | 4.47 | 7.95 | 4.26 | 15.37 | 20.75 | 29.70 |
| LDC 1.2 flto | 4.42 | 7.61 | 4.19 | 15.80 | 20.76 | 23.91 |
| LDC 1.3 | 5.50 | 8.59 | 4.25 | 21.43 | 20.84 | 38.26 |
| LDC 1.3 flto | 5.47 | 7.57 | 4.17 | 21.78 | 20.77 | 30.37 |
| DMD 2.074 | 5.71 | 11.34 | 9.71 | 18.58 | 40.59 | 57.38 |

LLVM and Phobos versions for each LDC release:

| LDC version | Phobos version | LLVM version |
|---|---|---|
| LDC 1.0 | 2.070.2 | 3.8.0 |
| LDC 1.1 | 2.071.2 | 3.9.1 |
| LDC 1.2 | 2.072.2 | 4.0.0 |
| LDC 1.3 beta2 | 2.073.2 | 4.0.0 |

@JohanEngelen
Member

@jondegenhardt Just a quick note: those numbers are kind of cool to report in a blog post somewhere :) Did you also try with -flto=thin?

@JohanEngelen
Member

Can you add the LLVM version to that table? Thanks.

@jondegenhardt
Contributor Author

@JohanEngelen Thanks! I thought the numbers might be of general interest. I did not try -flto=thin. I'll add the LLVM version to the table later today; I can't right now.

@kinke
Member

kinke commented Jun 19, 2017

I would have liked this to correlate with #2161 of course, but sadly it doesn't. It may be worth trying to recompile with LDC 1.3 and 2.072.2 druntime/Phobos (hopefully compatible) and see whether we have to blame the libs, the compiler or both. ;)

@kinke
Member

kinke commented Jun 19, 2017

[Btw, I don't know how fair that'd be for your benchmark comparison, but compiling with -mcpu=native may give an additional boost.]

@jondegenhardt
Contributor Author

jondegenhardt commented Jun 19, 2017

One specific possibility is that converting from char[] to double via std.conv.to could be slower. The "tsv-filter (numeric)" and "tsv-summarize" tests do this quite a bit. I ran some ad-hoc tests with the same tools, but running operations that do not need to convert to double, and did not see a performance hit. I'll generate some more definitive tests when I get a chance.

The "csv2tsv" benchmark does not do any converting, or any numeric operations for that matter. It would be something different.

@jondegenhardt
Contributor Author

I tried the easy thing: running DMD executables built with 2.072.2 and 2.073.2 and seeing if there are similar regressions. There weren't. The "tsv-filter (numeric)" and "tsv-summarize" tests are actually faster. This still doesn't rule out Phobos changes as the cause.

| Compilation | tsv-filter (numeric) | tsv-summarize | csv2tsv |
|---|---|---|---|
| DMD 2.072 | 6.15 | 25.32 | 55.38 |
| DMD 2.073 | 5.54 | 18.31 | 55.14 |

@p0nce
Contributor

p0nce commented Jun 20, 2017

Tested it on some audio processing (30 measures, arch x86_64)

Results 1.3:
 * minimum time: 617 ms => 34.2472 x real-time
 * median  time: 620 ms => 34.0815 x real-time
 * average time: 625.767 ms => 33.7674 x real-time

Results 1.2:
 * minimum time: 613 ms => 34.4707 x real-time
 * median  time: 619.5 ms => 34.109 x real-time
 * average time: 624.667 ms => 33.8269 x real-time

Results 1.0:
 * minimum time: 633 ms => 33.3816 x real-time
 * median  time: 637.5 ms => 33.146 x real-time
 * average time: 646.567 ms => 32.6812 x real-time

Here it's less than a 1% regression, so it's hard to tell.

@dnadlinger
Member

A 30% regression should be somewhat easy to localize by just comparing time profiles for the two programs.

@jondegenhardt
Contributor Author

@klickverbot Do you have suggestions for a profiling tool to use for this purpose? I tried looking at function call counts via LDC's -fprofile-instr-generate. I saw things I didn't expect, but no major differences between versions stood out. However, I'm not sure what tools are available to give insight into time allocation in an OS X LDC build.

@p0nce
Contributor

p0nce commented Jun 21, 2017

CodeXL is pretty great.

@dnadlinger
Member

On OS X, you can also simply use Xcode's Instruments for a rudimentary sampling profiler.

@jondegenhardt
Contributor Author

Update: I've taken a few more looks at this, but so far have only eliminated things; I haven't pinpointed specific causes. The main bottleneck is that I've got other engagements taking precedence and very limited time to investigate. That won't change for a few days. I did try Xcode's profiler, but there wasn't enough detail to identify specifics. I also tried simpler versions of some of the functionality that seems troublesome, but those attempts were not revealing. I'll continue investigating as I have time.

@JohanEngelen
Member

@jondegenhardt It's also interesting to know what you have eliminated! :)

@jondegenhardt
Contributor Author

jondegenhardt commented Jun 24, 2017

@JohanEngelen The main thing eliminated was char[] to double conversion via std.conv.to, or at least eliminated for simple uses.

Background: for the pair of tests "tsv-filter (numeric)" and tsv-summarize, the obvious thing is that they are slower on operations involving numbers (doubles). Beyond the published benchmarks, every operation they support that requires converting the text in the input files to a number is slower. However, for operations on strings, like string equality, there is no performance degradation.

There's not a lot going on in numeric operations that does not occur in string operations, except for converting to numbers and using numeric operations like less-than or plus. So I figured that constructing simpler code involving large numbers of char[] to double conversions and simple numeric operations should see similar degradations, but I wasn't able to show this.
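For reference, a minimal sketch of the kind of simpler conversion-heavy test described above (not the actual tsv-utils code; the input file name, field layout, and filter threshold are hypothetical):

```d
// Hypothetical micro-benchmark: many char[] -> double conversions via
// std.conv.to plus a simple numeric comparison, roughly the suspected hot path.
import std.algorithm : splitter;
import std.conv : to;
import std.stdio;

void main()
{
    size_t passCount = 0;
    foreach (char[] line; File("numbers.tsv").byLine)  // assumed: tab-separated numeric fields
    {
        foreach (field; line.splitter('\t'))
        {
            if (field.length == 0) continue;           // skip empty fields rather than throwing
            immutable x = field.to!double;             // the conversion under suspicion
            if (x < 100.0) ++passCount;                // simple numeric test, as in tsv-filter
        }
    }
    writeln("fields passing the filter: ", passCount);
}
```

Built with the same flags as the benchmarks (-release -O3 -boundscheck=off) and timed externally across LDC releases, a loop like this isolates the std.conv.to path from the rest of the tools.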

These operations could still be the cause, but perhaps with a more complex trigger. I've been imagining that perhaps inlining that took place in the LDC 1.2 builds is no longer happening, so perhaps that's still the problem.

Using a better profiling tool might show this, but the OS X profiler doesn't seem to know the function names produced by LDC. Looking at the assembly might do it; I haven't gotten to that step yet.

The csv2tsv tool does not convert strings to numbers or do numeric comparisons. However, it does many things the other tools do not. It could easily be slower due to something different from what is affecting tsv-filter and tsv-summarize.

@JohanEngelen
Member

@jondegenhardt Can you try again with #2180? Thanks!

@dnadlinger
Member

@jondegenhardt: The commit in question is in master now.

As for mangled names in profiles, what I meant is to just compare the profiles structurally (as in, just look which bits of the call stack take more time than before). If your program is sufficiently long-running so that the statistical noise is low, this should give you a pretty good starting point, pointing us to the area where things got slower (or to the fact that everything got slower uniformly).

@jondegenhardt
Contributor Author

Building with the latest (6c97a02, which should include #2180), the tests are mostly unchanged. The csv2tsv test without the -flto=full option improved; nothing else did. Do I need to pass additional options to the compiler?

I ran the three tests in question, with no changes to compiler command line:

  • tsv-filter numeric is unchanged (same as 1.3-beta2), with and without the 'flto=full' option
  • tsv-summarize is unchanged, with and without the 'flto=full' option
  • csv2tsv is materially faster than 1.3-beta2, nearly back to 1.2 speed. This is without the flto=full option. However, with the flto=full option on, the gains from flto=full disappear. The new 1.3 is about the same with and without flto=full, and almost as fast as 1.2 without flto=full. 1.2 with flto=full is materially faster.

I'll run the full test suite and report any other findings. It will take a couple hours.

@JohanEngelen
Member

@jondegenhardt Can you try with LLVM 3.9 + LDC master? (I can't tell from the thread here, but it looks like you've been using LLVM 4.0 so far with 1.3, right?)

@jondegenhardt
Contributor Author

Yes, I used LLVM 4.0. I'll also try with LLVM 3.9.

@dnadlinger
Member

(Thank you very much for all the work on tracking this down and your persistence, Jon!)

@jondegenhardt
Contributor Author

jondegenhardt commented Jun 25, 2017

You're welcome. I very much appreciate that these findings are being taken seriously in release evaluation. The tsv toolkit is by itself a pretty minor piece of software, but I do hope that these findings apply more widely than just the toolkit.

@jondegenhardt
Contributor Author

jondegenhardt commented Jun 25, 2017

Here are the results for current master with LLVM 4.0. I'll try LLVM 3.9 next. Each metric was run 3 times to ensure consistency.

Update: Now with the LLVM 3.9 times.

| Compilation | tsv-filter (numeric) | tsv-filter (regex) | tsv-select | tsv-summarize | tsv-join | csv2tsv |
|---|---|---|---|---|---|---|
| LDC 1.2 | 4.40 | 7.98 | 4.16 | 15.46 | 21.09 | 30.03 |
| LDC 1.2 flto | 4.36 | 7.61 | 4.18 | 15.83 | 21.13 | 24.05 |
| LDC 1.3 beta-2 (LLVM 4.0) | 5.49 | 8.58 | 4.21 | 21.53 | 21.18 | 38.50 |
| LDC 1.3 beta-2 (LLVM 4.0) flto | 5.40 | 7.56 | 4.18 | 21.75 | 21.23 | 30.59 |
| LDC 1.3 (6c97a02, LLVM 4.0) | 5.54 | 7.97 | 4.15 | 21.41 | 21.05 | 32.18 |
| LDC 1.3 (6c97a02, LLVM 4.0) flto | 5.54 | 7.53 | 4.15 | 21.83 | 21.05 | 28.79 |
| LDC 1.3 (6c97a02, LLVM 3.9) | 5.48 | 7.64 | 4.16 | 23.31 | 20.91 | 30.02 |
| LDC 1.3 (6c97a02, LLVM 3.9) flto | 5.46 | 7.54 | 4.13 | 22.52 | 20.94 | 28.42 |

@jondegenhardt
Contributor Author

I've updated the table above to include the LLVM 3.9 times. LLVM 3.9 shows only minor differences from LLVM 4.0. Most tests are unchanged; tsv-summarize is a little faster with LLVM 4.0, and csv2tsv is a little faster with LLVM 3.9.

@kinke
Member

kinke commented Jun 26, 2017

Thanks for the interesting numbers. I'd be way more interested in a comparison with 2.072 Phobos (or LDC 1.2 with 2.073 Phobos). If you managed to build LDC yourself, swapping out the druntime/Phobos sources should be simple.

Of course the other option is to provide us with an easy way of reproducing the issue. I already built tsv-filter, but didn't want to download that huge dataset, so I took a smaller one (Alaska only), but with that the filter you use didn't yield any records at all...

@kinke
Member

kinke commented Jul 6, 2017

That hack shouldn't be necessary anymore; I'd test with it enabled (vanilla).

@kinke removed the A-blocker label Jul 6, 2017
@jondegenhardt
Contributor Author

Well, it didn't work on the first try. Suggestions?

[ 45%] Building ASM object runtime/CMakeFiles/druntime-ldc-debug-shared.dir/druntime/src/ldc/eh_asm.S.o
[ 45%] Linking C shared library ../lib/libdruntime-ldc-debug-shared.dylib
ld: Invalid record (Producer: 'LLVM4.0.0' Reader: 'LLVM APPLE_1_802.0.42_0') for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [lib/libdruntime-ldc-debug-shared.2.0.73.dylib] Error 1
make[1]: *** [runtime/CMakeFiles/druntime-ldc-debug-shared.dir/all] Error 2
make: *** [all] Error 2

@kinke
Member

kinke commented Jul 6, 2017

This is related to #2077. I think the clang version needs to match up with the LLVM libs used to build LDC.

@JohanEngelen
Member

JohanEngelen commented Jul 6, 2017

@jondegenhardt You need to use the newer LLVM version of clang or LDC for linking. In your case, I guess we use clang for linking, but LDC is built with a newer LLVM version. You can specify which LTO plugin to use for linking (such that it uses the newer LTO plugin version): try adding -lto_library path/to/ldc/lib/libLTO-ldc.dylib or something like that to the linker flags. You have to override what clang is adding to the linker line. You can also copy-rename libLTO-ldc.dylib on top of clang's libLTO.dylib ;-)

@jondegenhardt
Contributor Author

jondegenhardt commented Jul 6, 2017

Should the linker flag have been set automatically? I'm specifying the LLVM installation on the cmake line (LLVM_ROOT_DIR); it appears to trigger this block: https://github.com/ldc-developers/ldc/blob/master/CMakeLists.txt#L714-L736, and indeed there is a cmake message:

-- Also installing LTO binary: /opt/local/libexec/llvm-4.0/lib/libLTO.dylib

@kinke
Member

kinke commented Jul 6, 2017

Should the linker flag have been set automatically?

Nope, it was removed here. When also building the C parts with clang and LTO support, the clang used (CC env variable) most likely needs to match the LLVM version used for LDC anyway. Here, when only compiling the D modules with LTO support via D_FLAGS, you should be able to get by simply by replacing your clang's libLTO.dylib, as Johan suggested.

[The other alternative would be building LLVM incl. clang source and using that clang binary + LTO plugin.]

@jondegenhardt
Contributor Author

jondegenhardt commented Jul 6, 2017

Here, when only compiling the D modules with LTO support via D_FLAGS, you should be able to get by by simply replacing your clang's libLTO.dylib as Johan suggested.

Yeah, um, I'm not going to modify my Xcode installation, even temporarily; I rely on it too much for other work. If there's a cmake variable I can set, I'll do that, or if there's a mod I can make to one of the CMakeLists.txt files, I'll do that. The CMakeLists.txt files are just involved enough that it's not immediately clear what change to make.

@kinke
Member

kinke commented Jul 6, 2017

Alright. ;) Simply restoring the previous block cmake/Modules/HandleLTOPGOBuildOptions.cmake:36-45 should do the trick.
Edit: Nope, simply add list(APPEND LLVM_LDFLAGS "-Wl,-lto_library,/opt/local/libexec/llvm-4.0/lib/libLTO.dylib") after new line 34.
Edit2: Sorry, still wrong, those flags are just for LDC, not for the libs. Inserting append("-Wl,-lto_library,/opt/local/libexec/llvm-4.0/lib/libLTO.dylib" LD_FLAGS) here hopefully works.

@jondegenhardt
Contributor Author

jondegenhardt commented Jul 7, 2017

Wow! The results are remarkable. My system isn't quiet enough right now to do a reliable benchmark, but I ran the tests by hand to get a preview. There are improvements in most programs, in some cases quite dramatic. I'll generate a full benchmark later, but here's the quick preview:

  • csv2tsv: 20.20 sec (down from 24.05)
  • tsv-filter (numeric): 3.70 sec (down from 4.36).
  • tsv-filter (regex) 7.10 sec (down from 7.53)
  • tsv-summarize: 9.70 sec (down from 15.46)
  • tsv-select: 4.10 sec (no change, may become a delta on a quiet system)
  • tsv-join: 20.90 sec (no change, may become a delta on a quiet system)

Executable sizes are smaller, by 2-4x.

My immediate reaction is to say that if there's a reliable way to bundle this, it'd certainly be worthwhile.

@JohanEngelen
Member

@jondegenhardt The build time probably went up quite a lot too? ;) Can you try with -flto=thin as well? That should have build time comparable to normal non-LTO builds.

@kinke
Member

kinke commented Jul 7, 2017

Thx Jon, excellent results! :)

@jondegenhardt
Contributor Author

Results of the full suite are below (best time of seven runs). The last row is with Phobos compiled with -flto=full, and the app code as well. It's a clear win: best times for all benchmarks, significantly so in several cases, with no signs of degradation. I ran a number of variants so you can see how they compare.

Sorry @JohanEngelen, I didn't think to do an -flto=thin variant. For these programs compiles are pretty quick; using -flto=full didn't change that enough for me to notice. Still, larger programs might really benefit from -flto=thin, so it'd be useful to compare those as well. I'll try to run this some time in the next few days.

Overall very impressive results!

| Compilation | csv2tsv | tsv-summarize | tsv-filter (numeric) | tsv-filter (regex) | tsv-select | tsv-join |
|---|---|---|---|---|---|---|
| LDC 1.2 | 29.85 | 15.30 | 4.32 | 7.85 | 4.10 | 20.83 |
| LDC 1.2 -flto=full | 23.88 | 15.61 | 4.27 | 7.54 | 4.06 | 20.73 |
| LDC 1.3 beta-2 | 38.31 | 21.26 | 5.39 | 8.55 | 4.13 | 20.81 |
| LDC 1.3 beta-2; flto=full | 30.37 | 21.61 | 5.32 | 7.49 | 4.10 | 20.82 |
| LDC 1.3 (6c97a02) | 31.97 | 21.22 | 5.47 | 7.88 | 4.11 | 20.88 |
| LDC 1.3 (6c97a02); flto=full | 28.55 | 21.56 | 5.46 | 7.45 | 4.04 | 20.69 |
| LDC 1.3 (6c97a02; tnext disabled) | 24.98 | 22.48 | 5.36 | 7.12 | 4.10 | 21.02 |
| LDC 1.3 (6c97a02; tnext disabled) flto=full | 22.95 | 22.00 | 5.22 | 7.04 | 4.06 | 20.75 |
| LDC 1.3 (6c97a02); Phobos & App: flto=full | 19.58 | 9.34 | 3.55 | 6.91 | 3.95 | 20.43 |

@JohanEngelen
Member

Great stuff. Forum post? ;-)

@jondegenhardt
Contributor Author

Great stuff. Forum post? ;-)

I agree, but perhaps you or the LDC team should write it :) . Myself, I find these results eye-opening and I wonder if others in the community would as well.

Of course, the tsv utilities may be more amenable to this type of optimization than other apps (they typically repeat the same operations large numbers of times in tight loops). But still, this is really promising.

If it's not easy to bundle Phobos LTO nicely as a released feature, then perhaps a useful step would be to make it easy for others in the D community to try it and benchmark their own apps. It'd be worth getting some additional data points.

@jondegenhardt
Contributor Author

jondegenhardt commented Jul 8, 2017

Follow-up to the previous benchmark results: executable sizes of the tsv utility tools for a couple of different builds. The last row in each table is with Phobos built with -flto=full. The size reduction with Phobos built with LTO is dramatic, typically 3-4x. It'd be interesting to know how larger apps behave. (Sizes in bytes; builds are on OS X.)

| Compilation | csv2tsv | keep-header | number-lines | tsv-append | tsv-filter |
|---|---|---|---|---|---|
| LDC 1.2 | 3841420 | 3872228 | 3926696 | 3944512 | 6217924 |
| LDC 1.2 flto=full | 3764380 | 3749728 | 3816512 | 3820664 | 4463724 |
| LDC 1.2 flto=thin | 3764120 | 3748300 | 3815084 | 3819236 | 4461224 |
| LDC 1.3 (6c97a02) | 3787544 | 3822940 | 3873668 | 3891644 | 6174932 |
| LDC 1.3 (6c97a02) flto=full | 3714620 | 3692856 | 3764040 | 3764144 | 4414060 |
| LDC 1.3 (6c97a02) flto=thin | 3712256 | 3637100 | 3708112 | 3708152 | 4363064 |
| LDC 1.3 (6c97a02), Phobos & App: flto=full | 913712 | 856736 | 914372 | 927520 | 1630172 |

| Compilation | tsv-join | tsv-sample | tsv-select | tsv-summarize | tsv-uniq |
|---|---|---|---|---|---|
| LDC 1.2 | 4118700 | 4726936 | 4066664 | 5144720 | 4067560 |
| LDC 1.2 flto=full | 3849832 | 3978708 | 3842376 | 4152988 | 3841472 |
| LDC 1.2 flto=thin | 3849564 | 3977936 | 3841712 | 4151600 | 3840060 |
| LDC 1.3 (6c97a02) | 4068776 | 4673884 | 4016316 | 5125876 | 4017164 |
| LDC 1.3 (6c97a02) flto=full | 3802352 | 3916624 | 3790200 | 4073192 | 3789248 |
| LDC 1.3 (6c97a02) flto=thin | 3746640 | 3861432 | 3738328 | 4017416 | 3737136 |
| LDC 1.3 (6c97a02), Phobos & App: flto=full | 972280 | 995448 | 959700 | 1212152 | 970620 |

Update: Added several -flto=thin builds.

@kinke
Member

kinke commented Jul 8, 2017

The large size for non-LTO builds (and when linking in the static runtime libs) is most likely also caused by the template culling mechanism. (One of?) the Phobos object(s) containing the template instantiation definition will be dragged in, and with it all its dependencies, so that you end up dragging in large parts of Phobos.

I'm currently leaning towards focusing on LTO for optimal release builds instead of trying to tweak the template culling.

@jondegenhardt
Contributor Author

I'm currently leaning towards focusing on LTO for optimal release builds instead of trying to tweak the template culling.

Makes sense. Getting performance gains via template culling sounds challenging to do well, while the LTO path looks like it holds quite a bit more potential. There might be short-term performance opportunities missed, but it should be worth it.

@jondegenhardt
Contributor Author

Regarding -flto=thin: this causes linker failures when used with the LDC build that has Phobos LTO support. Example:

ld: internal error: atom not found in symbolIndex(__D11TypeInfo_xa6__initZ) for architecture x86_64

Compiler line used: -release -O3 -boundscheck=off -singleobj -flto=thin. Using -flto=full or excluding the flto option works fine.

@JohanEngelen
Member

@jondegenhardt ThinLTO is pretty new in LLVM, so there may be some bugs lurking. We could test with LLVM trunk and see if there is the same linker failure. If so, dustmite, etc. work :(

@jondegenhardt
Contributor Author

jondegenhardt commented Jul 9, 2017

Here is a benchmark run comparing -flto=thin to -flto=full for LDC 1.2 and LDC 1.3. Couldn't test a version of -flto=thin for the LDC build with Phobos compiled with LTO due to the linker failure. Best time of five runs for each metric.

For LDC 1.2, 'thin' and 'full' had nearly identical results. For LDC 1.3, 'thin' was better than 'full' in several benchmarks. It appears to have done a better job avoiding the performance regressions that surfaced with LDC 1.3, though not all of them.

Executable sizes were not materially different (updated in the table a few messages back).

| Compiler | csv2tsv | tsv-summarize | tsv-filter (numeric) | tsv-filter (regex) | tsv-select | tsv-join |
|---|---|---|---|---|---|---|
| LDC 1.2 | 29.93 | 15.38 | 4.39 | 7.89 | 4.18 | 21.90 |
| LDC 1.2 flto=full | 24.04 | 15.71 | 4.34 | 7.58 | 4.16 | 21.61 |
| LDC 1.2 flto=thin | 24.12 | 15.78 | 4.34 | 7.57 | 4.14 | 21.29 |
| LDC 1.3 (6c97a02) | 32.15 | 21.35 | 5.55 | 7.93 | 4.14 | 21.25 |
| LDC 1.3 (6c97a02) flto=full | 28.76 | 21.67 | 5.56 | 7.48 | 4.13 | 21.76 |
| LDC 1.3 (6c97a02) flto=thin | 24.06 | 21.62 | 5.26 | 6.98 | 4.13 | 21.61 |
| LDC 1.3 (6c97a02), Phobos & App: flto=full | 19.71 | 9.42 | 3.60 | 6.94 | 4.01 | 20.93 |

Note: These runs were a touch slower than the previous benchmark, likely due to a somewhat higher level of other activity on the machine. This is why I always run all the metrics needed to do a relative comparison. Run-to-run times were quite consistent, so relative comparisons should be valid.

@jondegenhardt
Contributor Author

Compile times for the different LTO build options. These are for the full tsv utilities library (8 executable builds). Perhaps a surprise, but the -flto=thin option was more expensive than -flto=full. Perhaps this is due to the small size of the apps. As expected, the compile times increased when Phobos was built with LTO support. Times in seconds.

| Compilation | no flto | -flto=full | -flto=thin |
|---|---|---|---|
| ldc 1.2 | 30.44 | 32.97 | 40.17 |
| ldc 1.3 (6c97a02) | 36.36 | 39.62 | 43.40 |
| ldc 1.3 (6c97a02) w/ Phobos LTO | | 73.55 | |

@JohanEngelen
Member

Thanks for more data :-)
As you suspect, I also think that ThinLTO would make more difference with large LTO code sizes, like compiling with Phobos LTO.
