Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8283323: libharfbuzz optimization level results in extreme build times #7919

Closed
wants to merge 3 commits into from

Conversation

magicus
Copy link
Member

@magicus magicus commented Mar 23, 2022

JDK-8247872 (upgrade HarfBuzz to 2.7.2) caused build time to go up with 24 seconds on my reference linux machine. This was one of the four culprits that caused a 25-30% build time regression over the last two years.

The problem here was that the new HarfBuzz code caught really bad behaviour from gcc when compiling with optimizations. The official HarfBuzz build does not use any -O flags at all for gcc, so presumably the HarfBuzz team is:

a) not thinking compiler optimization is important for the performance of this library, and
b) unaware that their code causes such a headache for gcc.

(Other compilers fare much better: visual studio makes no difference at all, and for clang just a small regression was observed.)

The current optimization level was introduced by JDK-8255790, which were really about moving libharfbuzz compilation back into libfontmanager. I could find no comments/discussion relating to the change of optimization level, so I assume it was incidental, and just seemed good at the time.

This patch changes the optimization level to SIZE (which is the closest thing we have to no optimization level) on gcc.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8283323: libharfbuzz optimization level results in extreme build times

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7919/head:pull/7919
$ git checkout pull/7919

Update a local copy of the PR:
$ git checkout pull/7919
$ git pull https://git.openjdk.java.net/jdk pull/7919/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7919

View PR using the GUI difftool:
$ git pr show -t 7919

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7919.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 23, 2022

👋 Welcome back ihse! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 23, 2022
@openjdk
Copy link

openjdk bot commented Mar 23, 2022

@magicus The following labels will be automatically applied to this pull request:

  • build
  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added build build-dev@openjdk.org client client-libs-dev@openjdk.org labels Mar 23, 2022
@mlbridge
Copy link

mlbridge bot commented Mar 23, 2022

Webrevs

@openjdk
Copy link

openjdk bot commented Mar 23, 2022

@magicus This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8283323: libharfbuzz optimization level results in extreme build times

Reviewed-by: erikj, prr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 10 new commits pushed to the master branch:

  • 0ee65e1: 8283465: Character.UnicodeBlock.NUM_ENTITIES is out of date
  • f9137cb: 8280896: java/nio/file/Files/probeContentType/Basic.java fails on Windows 11
  • 0b11b57: 8283222: improve diagnosability of runtime/8176717/TestInheritFD.java timeouts
  • 138460c: 8163327: Remove 3DES from the default enabled cipher suites list
  • f017739: 8282241: Invalid generic signature for redefined classes
  • 3e73a0b: 8283237: CallSite should be a sealed class
  • a771600: 8283279: [Testbug] Improve TestGetSwapSpaceSize
  • 2b291d8: 8282536: java.net.InetAddress should be a sealed class
  • dc45b0a: 8283513: Parallel: Skip the card marking in PSRootsClosure
  • 78ef2fd: 8283562: JDK-8282306 breaks gtests on zero

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 23, 2022
@shipilev
Copy link
Member

I have a question, though, because I suppose font rendering engine is performance sensitive. AFAICS, the LIBFONTMANAGER_OPTIMIZATION was changed from HIGH to HIGHEST here: 05fe06a6#diff-c75e2d581b3730cf900075cedb66ac72eaba40fbbd92eaac5710161830b572b4L534-R491

Does it mean we should instead downgrade to HIGH, not SIZE? Would that recover the build times?

@magicus
Copy link
Member Author

magicus commented Mar 23, 2022

No, unfortunately it won't. :( The problem started with the update to HarfBuzz 2.7.2, and there the opt level was HIGH. I also tested manually compiling with -O1 (which we don't really have a level for in the build system) and even that was too problematic for gcc.

But another approach is possible. I talked off-list with Erik, and he had noticed (as had I) that a single file in harfbuzz was responsible for the lion's share of the build time regression. What he did, though, that I didn't come to think of, was to try to change optimization level for just that file. This fixes most of the build time regression, while being a much less intrusive fix. It also makes sense to do this fix for clang, since the 8 second regression seen there is basically also completely due to this file.

So I'll upload an update too this fix that is more conservative, and only changes opt level for a single file.

@erikj79
Copy link
Member

erikj79 commented Mar 23, 2022

I have a question, though, because I suppose font rendering engine is performance sensitive. AFAICS, the LIBFONTMANAGER_OPTIMIZATION was changed from HIGH to HIGHEST here: 05fe06a6#diff-c75e2d581b3730cf900075cedb66ac72eaba40fbbd92eaac5710161830b572b4L534-R491

Does it mean we should instead downgrade to HIGH, not SIZE? Would that recover the build times?

For GCC on Linux, that doesn't make a difference as both HIGH and HIGHEST translate to -O3.

I've been looking a bit at this too, and there are a handful of files that stick out, hb-subset.cc is the worst one. Here are the top three on my Linux host (using 31 threads), first with -O3 and then with Os:

[TIME:0:18.36] hb-subset-plan.cc
[TIME:0:21.53] hb-ot-layout.cc
[TIME:0:35.63] hb-subset.cc

[TIME:0:12.51] hb-subset-plan.cc
[TIME:0:14.81] hb-ot-layout.cc
[TIME:0:21.40] hb-subset.cc

On my mac (6 threads, but single core perf is higher), the corresponding numbers are:

[TIME:0:14.46] hb-subset-plan.cc
[TIME:0:19.64] hb-ot-layout.cc
[TIME:0:27.45] hb-subset.cc

[TIME:0:11.69] hb-subset-plan.cc
[TIME:0:16.00] hb-ot-layout.cc
[TIME:0:20.68] hb-subset.cc

This is certainly not quite as bad, but still a clear difference.

The reason this has such a high impact on total compilation time is due to how java.desktop ends up being one of the last top level targets on the critical path of possible concurrency. So a single compilation unit here will stall the whole process on a machine with many threads (32 in my case). On a smaller machine, this impact will be much less pronounced. Here are the total compilation times for "make exploded-image" with a few different settings:

all libfontmanager HIGHEST
linux: 4m14.174s
macosx: 7m8.736s

all libfontmanager SIZE
linux: 3m56.134s
macosx: 7m5.966s

just the above 3 files SIZE
linux: 3m57.009s
macosx: (skipped, no point)

The difference on the smaller macosx machine is negligible. Most of the compilation time gain on the big machine is gained by just dropping the optimization level on a few files.

One could certainly argue that dropping the optimization level for libfontmanager is only helping a rather niche usecase. There could certainly be other compilation units that would have a similar effect on total build time due to how they end up on the critical path, I would be surprised if there aren't. The question is which ones we can safely drop the optimization level for without adversely affect performance of the finished product.

For this particular change it's probably better to at least just drop it for one or a few files.

@prrace
Copy link
Contributor

prrace commented Mar 23, 2022

We definitely do not want to downgrade the optimisation level of the entire library.
Performance of this library actually matters quite a lot.
When we added some additional checking to the previous library we used (ICU) we had complaints of as much as 20% loss
in PDF conversion (a backend process) and we've had escalations on user responsiveness in editors due to the computational complexity of layout vs just counting glyph advances.
Getting that performance back was a lot of work .. except increasing compiler optimisation was the easy one that made a big difference.
To see the performance degradation may require a specific combination of font (Linux fonts vary widely) and the text being laid out.
The example we used in the previous case required a windows font, so I don't have a handy test case I know will show this .. and it was years ago ..

Looking at the list of files from Erik, I think we don't use the subset functionality. Certainly not directly and likely not at all.
So downgrading that probably won't have any impact at all at runtime.

But hb-ot-layout.cc is one of the most important files in the library, maybe the most important.

I strongly suggest that one not be downgraded just for 7 seconds of build time.

@magicus
Copy link
Member Author

magicus commented Mar 23, 2022

@prrace If we're not using the subset functionality, can we remove the files entirely, or exclude them from compilation?

(In the meantime, I'm preparing a patch which only limits optimization to the two subset files Erik listed)

@prrace
Copy link
Contributor

prrace commented Mar 23, 2022

harfbuzz does not make it easy to subset the build. So no.

@magicus
Copy link
Member Author

magicus commented Mar 23, 2022

Changing just these two files removes most of the build time regression. I'm okay with that.

Going forward I'd like to understand why this is on the critical path, and if there's anything we can do about it. But that's much more complex than just addressing this regression in build times. I'm also pondering if we could help make somehow by hinting that files like subset and layout take a long time to compile, and schedule them early, so we don't suddenly find us inn the position of waiting for just them to be able to link the library. I'm suspecting that if the file was named hb-aarrdwark.cc it would not have impacted the total build time as much as now...

Edit: I realized this was actually a testable hypothesis. :) I renamed the two files, but it did not help. The time of compilation of hb-subset.cc is in the magnitude of the time of compiling all the other files, so it's still on the long pole (without SIZE). Anyway. Worth a try. :)

@mrserb
Copy link
Member

mrserb commented Mar 23, 2022

Do we really want to change the product performance characteristics based on the compilation time? Can we try parallelize/cache or something first?

@magicus
Copy link
Member Author

magicus commented Mar 24, 2022

@mrserb Phil said that we don't even use the subset part of harfbuzz.

And unless you're backing up your claim that this patch is changing runtime characteristic with data, that's just a guess, just like the initial optimization level of this library (and as most of the native libraries in the JDK... :-() was just a guess. Otoh, that we get a build time regression for these two files is a proven fact.

What is it you want to cache?

@prrace
Copy link
Contributor

prrace commented Mar 24, 2022

@mrserb Phil said that we don't even use the subset part of harfbuzz.

And unless you're backing up your claim that this patch is changing runtime characteristic with data, that's just a guess, just like the initial optimization level of this library (and as most of the native libraries in the JDK... :-() was just a guess. Otoh, that we get a build time regression for these two files is a proven fact.

What is it you want to cache?

It is a reasonable question. Build times might seem important to a small number of developers but runtime performance is King. That's why I asked to reduce it to just the files we don't care about.
Would we give up even 0.1% of hotspot GC performance on Linux to avoid 30 seconds of compile time ?
I doubt it.
And haven't we improved compile time a lot with the parallelism ? So are we now over-optimising in the wrong place ?
I mean good to keep an eye on it but I mean if a previous build arch took 15 mins and now with parallelism we take 3 mins and 45 secs .. is it so bad to be back up to 4 mins for more runtime benefit ? (Numbers entirely made up)
There's always going to be a long pole (something that is last) and the questions are whether its understood and for a good reason and so forth. I mean we can doubtless try to improve but at some point it is diminishing returns.
Perhaps someone can ask gcc devs why they are so slow on this code ? Maybe there's a good reason that helps runtime or maybe they just have design issues. They may want to know.

@mrserb
Copy link
Member

mrserb commented Mar 24, 2022

And unless you're backing up your claim that this patch is changing runtime characteristic with data, that's just a guess, just like the initial optimization level of this library (and as most of the native libraries in the JDK... :-() was just a guess. Otoh, that we get a build time regression for these two files is a proven fact.

All java2d libraries are compiled using HIGHEST, that option was bumped to this level over time because it was proved that they affect the performance, an example the last change: https://bugs.openjdk.java.net/browse/JDK-8264846
So I think we should check first that it does not affect performance, even if it is not used now it can be used in future version of harfbuzz.

@magicus
Copy link
Member Author

magicus commented Mar 24, 2022

@mrserb As Phil has stated twice, this code is not used by the JDK, so will not affect performance.

It is impossible to predict what future changes in harfbuzz will do to our performance. That will of course needed to be tested when we update harfbuzz. If it turns out that setting this optimization level is detrimental to our performance at that point of time, we'll of course do what fixes needs to be done, at that point.

@prrace The sad truth is that java.desktop-libs is at the end of the long pole, due to how different modules are interrelated. :-( That means that the compilation time for these native libraries tend to dominate the entire build time, regardless of how parallelized the rest of the build can be made. (cf. Amdahl's law)

I'll look into once more if we can do some shenanigans to try to get around some of these dependencies, but I'm not sure it's possible.

I encourage you to report this to both upstream harfbuzz and gcc.

@magicus
Copy link
Member Author

magicus commented Mar 24, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Mar 24, 2022

Going to push as commit 2c43ecb.
Since your change was applied there have been 21 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 24, 2022
@openjdk openjdk bot closed this Mar 24, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 24, 2022
@openjdk
Copy link

openjdk bot commented Mar 24, 2022

@magicus Pushed as commit 2c43ecb.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@magicus magicus deleted the fix-harfbuzz-build-regression branch March 31, 2022 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build build-dev@openjdk.org client client-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants