
Performance bottlenecks #740

Closed
ben-manes opened this issue Jul 8, 2021 · 10 comments

ben-manes commented Jul 8, 2021

I am observing high CPU usage due to extensive use of JsonPath. Profiling a production machine with Java Flight Recorder, these areas stand out in JProfiler as CPU hotspots.

  1. JsonPath.getPath()
    This method recursively evaluates the PathToken chain to build the string representation. In one case we keep these in a TreeSet to order the evaluations, and the comparator is surprisingly expensive. Since the JsonPath is compiled and cached, it would make sense for the string representation to be lazily cached (à la String's hashCode). It may also be faster to replace the recursive descent with a loop.

  2. JsonContext.pathFromCache(path)
    This method computes the cache key very inefficiently, resulting in excessive CPU usage and allocations.

    • When there are no filters, the path can be used directly as the key.
    • Because the filters are passed as a second argument, "[]", the slow concatenation path is used.
    • There is no reason to wrap the list in a LinkedList, as both have the same toString implementation.
    • Arrays.toString would be a more efficient conversion.
    • The operation could use computeIfAbsent to avoid redundant work when races cause a cache miss.
  3. JsonPath.compile(path)
    This user-facing method creates a new JsonPath instead of consulting the cache. Therefore, if one tries to bypass the slow pathFromCache call, there is high overhead. Compilation should return the cached copy, allowing the application to benefit from the configured cache.

  4. JsonPath.read(Object, Configuration)
    This method uses exceptions as ordinary control flow. That is bad practice because creating many exceptions is slow: filling in the stack trace is a costly operation. The internals should instead return a result object that can be evaluated to the suppressed value.
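The lazy caching suggested in item 1 could be sketched roughly as below. This is a hypothetical illustration, not JsonPath's actual internals; the PathToken interface, field names, and method names here are assumptions.

```java
// Minimal stand-ins for the compiled path internals (names are assumptions):
interface PathToken {
  String getPathFragment();
  PathToken next();  // null at the end of the chain
}

final class CompiledPath {
  private final PathToken root;
  private transient String cachedPath;  // lazily computed; benign race, like String.hashCode

  CompiledPath(PathToken root) { this.root = root; }

  String getPath() {
    String path = cachedPath;
    if (path == null) {
      // A loop over the token chain replaces the recursive descent.
      StringBuilder sb = new StringBuilder();
      for (PathToken t = root; t != null; t = t.next()) {
        sb.append(t.getPathFragment());
      }
      cachedPath = path = sb.toString();
    }
    return path;
  }
}
```

As with String's hashCode, two racing threads may both compute the value, but the result is identical and idempotent, so no locking is needed.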
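The key-construction fixes in item 2 might look like the following sketch. The class and method names are assumptions for illustration, and a string stands in for the compiled JsonPath.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a cheaper cache-key scheme (not the library's code):
final class PathCache {
  private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

  Object pathFromCache(String path, Object[] filters) {
    // No filters: the path itself is the key; no concatenation at all.
    String key = (filters == null || filters.length == 0)
        ? path
        : path + Arrays.toString(filters);  // single pass, no LinkedList wrapper
    // computeIfAbsent compiles at most once per key, even under racing threads.
    return cache.computeIfAbsent(key, k -> compile(path));
  }

  private Object compile(String path) {
    return "compiled:" + path;  // stand-in for the actual compilation step
  }
}
```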
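The result-object pattern proposed in item 4 could be sketched as below; the class and its API are invented here purely to show the shape of the idea.

```java
// Hypothetical result object returned instead of throwing on a missing path.
// The caller substitutes the suppressed/default value without any exception
// ever being constructed (and so without any stack trace being filled in).
final class EvaluationResult {
  private static final EvaluationResult NOT_FOUND = new EvaluationResult(null, false);
  private final Object value;
  private final boolean found;

  private EvaluationResult(Object value, boolean found) {
    this.value = value;
    this.found = found;
  }

  static EvaluationResult of(Object value) { return new EvaluationResult(value, true); }
  static EvaluationResult notFound() { return NOT_FOUND; }

  // Mirrors a "suppress exceptions" option: a missing path maps to a default.
  Object orElse(Object defaultValue) { return found ? value : defaultValue; }
}
```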

richardstartin (Contributor) commented Sep 8, 2021

JsonContext.pathFromCache(path) was also observed by an Apache Pinot user (see apache/pinot#7403); it's beneficial to use the CacheProvider SPI to replace the implementation. My PR addresses some of the other issues (though I didn't replace the LinkedList wrapping): #750

ben-manes (Author) commented

Good point @richardstartin, I did that from the get-go using Caffeine. Unfortunately, the inefficient API above the cache is still problematic. Thanks for the PR; I hope we see some of these fixes merged in.

richardstartin (Contributor) commented

Something else I found in this vein: JSON paths that don't match the document throw PathNotFoundException, which really adds up. If PathNotFoundException overrode fillInStackTrace to return this, the performance cost could be remedied easily; with a little more effort, it would be better to return whether the path matched the document, i.e. return false when it doesn't.
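The cheap half of that suggestion can be sketched as below. This is an illustrative class (the name is made up), not JsonPath's actual exception; the technique of overriding fillInStackTrace to skip stack capture is standard Java.

```java
// An exception that skips the expensive native stack walk on construction.
// Overriding fillInStackTrace to return `this` makes creating the exception
// cheap, though returning a boolean match result avoids the exception entirely.
class CheapPathNotFoundException extends RuntimeException {
  CheapPathNotFoundException(String message) {
    super(message);
  }

  @Override
  public synchronized Throwable fillInStackTrace() {
    return this;  // no stack trace is captured, so construction stays fast
  }
}
```

An equivalent option since Java 7 is the protected RuntimeException constructor that takes a writableStackTrace flag and is passed false.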

(screenshot: profiler output, 2021-09-08)

ben-manes (Author) commented

Yes, I think you are finding the same problem as my 4th item. The exception-based control flow is pretty bad, as you show.

richardstartin (Contributor) commented

I think these issues are mostly addressed in the latest release.

ben-manes (Author) commented

Thanks @richardstartin. I'll close this then. 🙂

Aitozi commented Aug 3, 2022

Hi @ben-manes, I see the code on the master branch still uses LRUCache, which is somewhat inefficient. Should we replace the default implementation with a Guava/Caffeine cache?

ben-manes (Author) commented

@Aitozi I replaced it in my own usage, so I imagine others would want to as well. LRUCache locks on read, so it is not highly concurrent. A simple wrapper like the one below is all that is needed.

import com.github.benmanes.caffeine.cache.Caffeine;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.spi.cache.CacheProvider;

com.github.benmanes.caffeine.cache.Cache<String, JsonPath> cache =
    Caffeine.newBuilder().maximumSize(10_000).build();  // size bound is an example
CacheProvider.setCache(new com.jayway.jsonpath.spi.cache.Cache() {
  @Override public JsonPath get(String key) {
    return cache.getIfPresent(key);
  }
  @Override public void put(String key, JsonPath value) {
    cache.put(key, value);
  }
});

Aitozi commented Aug 3, 2022

@ben-manes Thanks for your quick response. I tried the same approach you mentioned, and it works. I also ran a benchmark and found roughly a 10x performance difference.

Benchmark                 Mode  Cnt     Score     Error   Units
GuavaCacheBenchmark.get  thrpt   30  4480.563 ± 203.311  ops/ms
GuavaCacheBenchmark.put  thrpt   30  1774.769 ± 119.198  ops/ms

LRUCacheBenchmark.get  thrpt   30  441.239 ±  2.812  ops/ms
LRUCacheBenchmark.put  thrpt   30  350.549 ± 12.285  ops/ms

And I think, as a common library, it would be nice if this could be improved internally (as this is a critical code path).

ben-manes (Author) commented

You'll of course find Caffeine is faster than Guava, too =)

If the project wants a concurrent LRU without dependencies, there are two quick, good options.

  • Shade a copy of ConcurrentLinkedHashMap as Groovy, the MS SQL JDBC driver, etc. do. A trimmed-down version is in Jackson. This library was popular in its day, Java 5-based, and the basis for Guava's and Caffeine's approach to size eviction.
  • Write a simple Second Chance (Clock) LRU cache. This is FIFO-based, so reads are lock-free and it locks only on writes. A boolean flag is set on read and cleared while cycling through the FIFO to find a victim, making the worst case O(n). For a small cache like this that is fine, and it mirrors LRU in hit rates. A pre-1.0 version of CLHM used this as a testbed; it was released because users were building the then-alpha version directly and running into concurrency bugs. That could quickly be trimmed down to replace LRUCache.
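The Second Chance (Clock) option can be sketched roughly as below. This is a minimal illustration of the policy described above, not CLHM's actual code, and the class name is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Second Chance (Clock) cache sketch: reads are lock-free (they only set a
// flag); eviction cycles through FIFO order, recycling any entry that was
// referenced since insertion and evicting the first one that was not.
final class ClockCache<K, V> {
  private static final class Node<K, V> {
    final K key; final V value;
    final AtomicBoolean referenced = new AtomicBoolean();
    Node(K key, V value) { this.key = key; this.value = value; }
  }

  private final ConcurrentHashMap<K, Node<K, V>> map = new ConcurrentHashMap<>();
  private final ConcurrentLinkedQueue<Node<K, V>> fifo = new ConcurrentLinkedQueue<>();
  private final int maximumSize;

  ClockCache(int maximumSize) { this.maximumSize = maximumSize; }

  V get(K key) {
    Node<K, V> node = map.get(key);
    if (node == null) return null;
    node.referenced.set(true);  // lock-free read path: just mark the node
    return node.value;
  }

  synchronized void put(K key, V value) {  // locks only on writes
    Node<K, V> node = new Node<>(key, value);
    Node<K, V> prior = map.put(key, node);
    if (prior != null) fifo.remove(prior);
    fifo.add(node);
    while (map.size() > maximumSize) {
      Node<K, V> candidate = fifo.poll();
      if (candidate.referenced.compareAndSet(true, false)) {
        fifo.add(candidate);  // second chance: clear the flag, recycle to the back
      } else {
        map.remove(candidate.key, candidate);  // victim: evict
      }
    }
  }
}
```

The worst case is one full cycle over the FIFO when every entry has been referenced, which matches the O(n) caveat above and is acceptable for a small path cache.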
