Faster file stat #1183

Merged
merged 5 commits into from Jan 19, 2017

melix commented Jan 16, 2017

While investigating performance regressions on master, I discovered that file stat (in particular because of annotation processor detection) was responsible for a large proportion of the performance regression. It is clearly visible in the following flame graph:

selection_001

This commit adds a new Jdk7FileMetadataAccessor which uses the JDK 7 NIO API to get file stats. With it, we can get the same information in just 2 calls:

  • one to check exists
  • one to get all file stats (directory, length, last modified)

The difference is great:

selection_002

The regression goes from 10% slower to 4% slower (without the fixes I made in the other PR). We still need to make 2 calls, because `Files.exists` in JDK 7/8 is crazily expensive (it internally throws an exception!). But this makes me wonder why native-platform would be slower than this. It makes no sense.
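To make the two calls concrete, here is a minimal sketch of the approach, not the actual `Jdk7FileMetadataAccessor` code. `StatResult` and `TwoCallStat` are hypothetical names used only for illustration; the only real APIs involved are `File#exists()` and `Files.readAttributes`.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical value holder for illustration; not a Gradle type.
final class StatResult {
    final boolean exists;
    final boolean directory;
    final long length;
    final long lastModified;

    StatResult(boolean exists, boolean directory, long length, long lastModified) {
        this.exists = exists;
        this.directory = directory;
        this.length = length;
        this.lastModified = lastModified;
    }
}

final class TwoCallStat {
    static StatResult stat(File f) {
        // Call 1: File#exists() returns false for a missing file without throwing.
        if (!f.exists()) {
            return new StatResult(false, false, 0L, 0L);
        }
        try {
            // Call 2: one bulk read for type, size and timestamp.
            BasicFileAttributes attrs = Files.readAttributes(f.toPath(), BasicFileAttributes.class);
            return new StatResult(true, attrs.isDirectory(), attrs.size(), attrs.lastModifiedTime().toMillis());
        } catch (IOException e) {
            // Also reached if the file disappears between the two calls.
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping the `IOException` is a choice made for the sketch only; a real accessor may prefer to report a file that vanished between the two calls as missing.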

melix commented Jan 17, 2017

The commit above adds a JMH benchmark of the various implementations we have so far.

There are 4 different implementations:

  • Fallback uses the good old File Java API, and consists of successive calls to exists(), isDirectory(), lastModified() and length()
  • NativePlatformBacked is the version which uses @adammurdoch's native-platform library
  • Jdk7 is the version which is added by this pull request. It consists of a first call to File#exists(), then, if the file exists, uses NIO's BasicFileAttributes to get the rest of the metadata (2 stat calls)
  • for comparison, nio is the pure NIO version, which does everything in a single call
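For reference, the Fallback variant in the list above amounts to something like the following. This is a minimal sketch under assumptions, not the code in the repository: `LegacyStat` and its fields are hypothetical names, and only the four `java.io.File` calls come from the description.

```java
import java.io.File;

// Hypothetical value holder for illustration; not a Gradle type.
final class LegacyStat {
    final boolean exists;
    final boolean directory;
    final long lastModified;
    final long length;

    private LegacyStat(boolean exists, boolean directory, long lastModified, long length) {
        this.exists = exists;
        this.directory = directory;
        this.lastModified = lastModified;
        this.length = length;
    }

    // Each File method below queries the file system on its own,
    // so a snapshot of an existing file costs four separate lookups.
    static LegacyStat of(File f) {
        if (!f.exists()) {
            return new LegacyStat(false, false, 0L, 0L);
        }
        return new LegacyStat(true, f.isDirectory(), f.lastModified(), f.length());
    }
}
```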

Here is the result on my laptop (Linux, ext4, SSD):

selection_003

What we can see is that the NIO version is significantly faster, as long as the file exists. Its performance is catastrophic if the file doesn't exist, which is exactly why the jdk7 version makes a first call to File#exists: it avoids the internal exception that NIO triggers when the file is missing.
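The single-call variant, and the reason a missing file is costly for it, can be sketched like this. It is an illustration only: `SingleCallStat` and the "return null when missing" convention are assumptions, not what the benchmarked code does.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.attribute.BasicFileAttributes;

final class SingleCallStat {
    // Returns null for a missing file (hypothetical convention for this sketch).
    static BasicFileAttributes statOrNull(File f) {
        try {
            // One call covers existence, type, size and timestamp...
            return Files.readAttributes(f.toPath(), BasicFileAttributes.class);
        } catch (NoSuchFileException e) {
            // ...but a missing file is reported by throwing, so every miss
            // pays for creating and unwinding an exception.
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```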

For existing files, native-platform and jdk7 have nearly the same performance, while the fallback version is significantly slower.

In short, given those results, I think it makes sense to use jdk7 by default under Linux. It would be nice to run this benchmark under OS X and Windows to see the difference. Ultimately, it would be nice to find out why the JDK can be significantly faster than native-platform, because if we can remove this difference, then using native-platform everywhere would be the best solution.

/cc @oehme for the record

melix commented Jan 17, 2017

Note that we could consider using the pure NIO implementation if we believe there won't be many "missing" files, which I guess is adventurous as soon as you consider the "clean" case.

melix commented Jan 17, 2017

Ah, I almost forgot I had a MacBook Pro at hand :D I'm going to run the same benchmark on it. BTW, the command is:

./gradlew native:jmhReport

melix commented Jan 17, 2017

So here's the result for Mac OS:

selection_004

First we can see that it's almost twice as slow as under Linux, but native-platform is a clear winner here. So it's really different. Using the jdk7 version would not be a bad choice compared to the fallback version, though, since it's always at least twice as fast, for the cases where the file actually exists.

@@ -281,6 +287,28 @@ public static boolean propertyExists(Object target, String propertyName) {
        return true;
    }
    public static <T> T newInstanceOrFallback(String jdk7Type, ClassLoader loader, Class<? extends T> fallbackType) {
        // Use java 7 APIs, if available

bsideup Jan 17, 2017

Isn't Gradle's base Java version 7?

melix Jan 17, 2017

Some services still need to run on older JDKs (for example, executing tests in a forked environment with JDK 5).
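The idea behind `newInstanceOrFallback` can be sketched as follows. Only the signature is taken from the diff above; the body is an assumption about how such a helper might work, and `ReflectiveFallback` is a hypothetical holder class. The point is that the Java 7 type is referenced by name only, so the calling class still loads on an older JDK.

```java
final class ReflectiveFallback {
    static <T> T newInstanceOrFallback(String jdk7Type, ClassLoader loader, Class<? extends T> fallbackType) {
        try {
            // Load the Java 7 implementation by name so that this class has
            // no static link to it and can still be loaded on older JDKs.
            Class<?> preferred = loader.loadClass(jdk7Type);
            @SuppressWarnings("unchecked")
            T instance = (T) preferred.getDeclaredConstructor().newInstance();
            return instance;
        } catch (ReflectiveOperationException | LinkageError e) {
            // Type or its dependencies unavailable: use the fallback below.
        }
        try {
            return fallbackType.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate fallback " + fallbackType.getName(), e);
        }
    }
}
```

Catching `LinkageError` as well covers the case where the class file is found but cannot be linked on the running JVM, for example because it was compiled for a newer class file version.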

melix commented Jan 17, 2017

Thanks for the links @shipilev :) It's good to know it's on the radar, yet we need solutions for everybody :)

@melix melix changed the base branch from master to release Jan 18, 2017

melix commented Jan 18, 2017

So here are the latest results, after rebasing on release, and using the lotProjectDependencies test that was failing.

With jdk7 as the default accessor:

Speed Results for test project 'lotProjectDependencies' with tasks resolveDependencies: AWESOME! we're faster than 3.3 :D
Difference: 229.5 ms faster (229.5 ms), -6.53%, max regression: 123.938 ms
  Current Gradle median: 3.286 s min: 3.118 s, max: 3.591 s, se: 112.14 ms, sem: 15.859 ms
  > [3.306 s, 3.295 s, 3.285 s, 3.345 s, 3.431 s, 3.591 s, 3.23 s, 3.243 s, 3.315 s, 3.181 s, 3.22 s, 3.302 s, 3.204 s, 3.287 s, 3.194 s, 3.514 s, 3.413 s, 3.468 s, 3.424 s, 3.405 s, 3.238 s, 3.166 s, 3.238 s, 3.295 s, 3.137 s, 3.149 s, 3.126 s, 3.155 s, 3.118 s, 3.216 s, 3.132 s, 3.179 s, 3.143 s, 3.154 s, 3.177 s, 3.132 s, 3.236 s, 3.145 s, 3.195 s, 3.303 s, 3.357 s, 3.329 s, 3.339 s, 3.356 s, 3.386 s, 3.352 s, 3.335 s, 3.311 s, 3.411 s, 3.427 s]
  Gradle 3.3 median: 3.516 s min: 3.397 s, max: 3.99 s, se: 168.3 ms, sem: 23.801 ms
  > [3.751 s, 3.691 s, 3.6 s, 3.637 s, 3.605 s, 3.854 s, 3.566 s, 3.5 s, 3.593 s, 3.518 s, 3.564 s, 3.747 s, 3.591 s, 3.531 s, 3.511 s, 3.752 s, 3.925 s, 3.99 s, 3.781 s, 3.489 s, 3.479 s, 3.513 s, 3.47 s, 3.497 s, 3.416 s, 3.478 s, 3.487 s, 3.488 s, 3.441 s, 3.635 s, 3.884 s, 3.881 s, 3.936 s, 3.867 s, 3.49 s, 3.478 s, 3.474 s, 3.397 s, 3.411 s, 3.413 s, 3.463 s, 3.451 s, 3.412 s, 3.525 s, 3.513 s, 3.861 s, 3.825 s, 3.443 s, 3.397 s, 3.463 s]

Flame graph:

jdk7

Using native-platform as the default accessor:

Speed Results for test project 'lotProjectDependencies' with tasks resolveDependencies: AWESOME! we're faster than 3.3 :D
Difference: 177.5 ms faster (177.5 ms), -5.11%, max regression: 112.24 ms
  Current Gradle median: 3.294 s min: 3.192 s, max: 3.658 s, se: 112.26 ms, sem: 15.876 ms
  > [3.383 s, 3.424 s, 3.403 s, 3.488 s, 3.444 s, 3.433 s, 3.329 s, 3.318 s, 3.548 s, 3.536 s, 3.537 s, 3.658 s, 3.535 s, 3.294 s, 3.301 s, 3.422 s, 3.276 s, 3.378 s, 3.255 s, 3.264 s, 3.261 s, 3.251 s, 3.269 s, 3.318 s, 3.242 s, 3.258 s, 3.248 s, 3.209 s, 3.247 s, 3.237 s, 3.203 s, 3.342 s, 3.487 s, 3.49 s, 3.452 s, 3.436 s, 3.418 s, 3.22 s, 3.273 s, 3.284 s, 3.211 s, 3.31 s, 3.239 s, 3.29 s, 3.242 s, 3.199 s, 3.295 s, 3.192 s, 3.271 s, 3.27 s]
  Gradle 3.3 median: 3.472 s min: 3.372 s, max: 4.068 s, se: 141.71 ms, sem: 20.041 ms
  > [3.975 s, 4.068 s, 3.557 s, 3.638 s, 3.591 s, 3.645 s, 3.615 s, 3.472 s, 3.605 s, 3.443 s, 3.505 s, 3.699 s, 3.543 s, 3.826 s, 3.427 s, 3.509 s, 3.46 s, 3.531 s, 3.461 s, 3.431 s, 3.441 s, 3.473 s, 3.492 s, 3.508 s, 3.753 s, 3.432 s, 3.463 s, 3.462 s, 3.462 s, 3.479 s, 3.452 s, 3.693 s, 3.676 s, 3.453 s, 3.483 s, 3.399 s, 3.448 s, 3.433 s, 3.445 s, 3.472 s, 3.379 s, 3.461 s, 3.448 s, 3.423 s, 3.458 s, 3.655 s, 3.46 s, 3.375 s, 3.372 s, 3.501 s]

Flame graph:

native-platform

There's barely a difference, but it does seem to confirm that native-platform is slower for this scenario. The good news is that we're faster than 3.3 now :)

melix commented Jan 18, 2017

Here are results of the benchmarks executed on the CI server build agents.

Linux:

ci-bench-linux

Windows:

ci-bench-windows

Apart from the fact that Windows is considerably slower (is that a surprise?), it seems to confirm that we should use:

  • native-platform for OS X and Windows
  • jdk7 for Linux

And it would be interesting to know why native-platform is slower than the JDK version for Linux.
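The per-OS choice above could be expressed roughly like this. It is a sketch only: `AccessorChoice`, `Kind` and the matching on `os.name` substrings are assumptions for illustration, and platforms that were not benchmarked are deliberately left undecided.

```java
import java.util.Locale;

final class AccessorChoice {
    enum Kind { NATIVE_PLATFORM, JDK7, UNMEASURED }

    // osName is expected to be the value of the "os.name" system property.
    static Kind forOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) {
            return Kind.JDK7;
        }
        if (os.contains("mac") || os.contains("windows")) {
            return Kind.NATIVE_PLATFORM;
        }
        // No benchmark data for other platforms in this thread.
        return Kind.UNMEASURED;
    }
}
```

Usage would be along the lines of `AccessorChoice.forOs(System.getProperty("os.name"))`.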

melix added some commits Jan 16, 2017

Remove logging as it doesn't bring much value and log4j is not always available

There was an error in `buildSrc` compilation for some performance tests due to missing log4j dependency.

Signed-off-by: Cedric Champeau <cedric@gradle.com>
@adammurdoch

Looks good.

adammurdoch commented Jan 18, 2017

Maybe also add a story to remind us to revisit whether we need the initial File.exists() call when running on Java 9 or later.

Use `native-platform` for file stat under Windows
Based on the results of the benchmark.

@melix melix merged commit fc77b3f into release Jan 19, 2017

@melix melix deleted the cc-java7-filestat branch Jan 19, 2017
