
Performance bottlenecks #740

Closed
ben-manes opened this issue Jul 8, 2021 · 10 comments

ben-manes commented Jul 8, 2021

I am observing high CPU usage due to extensive use of JsonPath. Profiling a production machine with Java Flight Recorder, these areas stand out in JProfiler as CPU hotspots.

  1. JsonPath.getPath()
    This method recursively evaluates the PathToken chain to build the string representation. In one case we keep these in a TreeSet to order the evaluations, and the comparator is surprisingly expensive. Since the JsonPath is compiled and cached, it would make sense for the string representation to be lazily cached (à la String's hashCode). It may also be faster to replace the recursive descent with a loop.

  2. JsonContext.pathFromCache(path)
    This method computes the cache key very inefficiently, resulting in excessive CPU usage and allocations.

    • When there are no filters, the path can be used directly as the key.
    • Because the filters are passed as a second argument, "[]", the slow concatenation path is used.
    • There is no reason to wrap the list in a LinkedList, as both have the same toString implementation.
    • Arrays.toString would be a more efficient conversion.
    • The operation could use computeIfAbsent to avoid redundant work when races cause a cache miss.
  3. JsonPath.compile(path)
    This user-facing method creates a new JsonPath instead of consulting the cache. Therefore, if one tries to bypass the slow pathFromCache call, there is high overhead. Compilation should return the cached copy, allowing the application to benefit from the configured cache.

  4. JsonPath.read(Object, Configuration)
    This method uses exceptions as ordinary control flow. That is bad practice because creating many exceptions is slow: filling in the stack trace is a costly operation. The internals should instead return a result object that can be evaluated to the suppressed value.
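The lazy caching suggested in item 1 could be sketched roughly as below. This is a hypothetical illustration, not JsonPath's actual internals; the PathToken interface, field names, and method names here are assumptions.

```java
// Minimal stand-ins for the compiled path internals (names are assumptions):
interface PathToken {
  String getPathFragment();
  PathToken next();  // null at the end of the chain
}

final class CompiledPath {
  private final PathToken root;
  private transient String cachedPath;  // lazily computed; benign race, like String.hashCode

  CompiledPath(PathToken root) { this.root = root; }

  String getPath() {
    String path = cachedPath;
    if (path == null) {
      // A loop over the token chain replaces the recursive descent.
      StringBuilder sb = new StringBuilder();
      for (PathToken t = root; t != null; t = t.next()) {
        sb.append(t.getPathFragment());
      }
      cachedPath = path = sb.toString();
    }
    return path;
  }
}
```

As with String's hashCode, two racing threads may both compute the value, but the result is identical and idempotent, so no locking is needed.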
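The key-construction fixes in item 2 might look like the following sketch. The class and method names are assumptions for illustration, and a string stands in for the compiled JsonPath.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a cheaper cache-key scheme (not the library's code):
final class PathCache {
  private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

  Object pathFromCache(String path, Object[] filters) {
    // No filters: the path itself is the key; no concatenation at all.
    String key = (filters == null || filters.length == 0)
        ? path
        : path + Arrays.toString(filters);  // single pass, no LinkedList wrapper
    // computeIfAbsent compiles at most once per key, even under racing threads.
    return cache.computeIfAbsent(key, k -> compile(path));
  }

  private Object compile(String path) {
    return "compiled:" + path;  // stand-in for the actual compilation step
  }
}
```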
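The result-object pattern proposed in item 4 could be sketched as below; the class and its API are invented here purely to show the shape of the idea.

```java
// Hypothetical result object returned instead of throwing on a missing path.
// The caller substitutes the suppressed/default value without any exception
// ever being constructed (and so without any stack trace being filled in).
final class EvaluationResult {
  private static final EvaluationResult NOT_FOUND = new EvaluationResult(null, false);
  private final Object value;
  private final boolean found;

  private EvaluationResult(Object value, boolean found) {
    this.value = value;
    this.found = found;
  }

  static EvaluationResult of(Object value) { return new EvaluationResult(value, true); }
  static EvaluationResult notFound() { return NOT_FOUND; }

  // Mirrors a "suppress exceptions" option: a missing path maps to a default.
  Object orElse(Object defaultValue) { return found ? value : defaultValue; }
}
```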

richardstartin (Contributor) commented Sep 8, 2021

JsonContext.pathFromCache(path) was also observed by an Apache Pinot user (see apache/pinot#7403); it's beneficial to use the CacheProvider SPI to replace the implementation. My PR addresses some of the other issues (though I didn't replace the LinkedList wrapping): #750

ben-manes (Author) commented

Good point @richardstartin, I did that from the get-go using Caffeine. Unfortunately, the inefficient API above the cache is still problematic. Thanks for the PR; I hope we see some of these fixes merged in.

richardstartin (Contributor) commented

Something else I found in this vein: JSON paths that don't match the document throw PathNotFoundException, which really adds up. If PathNotFoundException overrode fillInStackTrace to return this, the performance cost could be remedied easily; with a little more effort, it would be better to return whether the path matched the document, i.e. return false when it doesn't.
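The cheap half of that suggestion can be sketched as below. This is an illustrative class (the name is made up), not JsonPath's actual exception; the technique of overriding fillInStackTrace to skip stack capture is standard Java.

```java
// An exception that skips the expensive native stack walk on construction.
// Overriding fillInStackTrace to return `this` makes creating the exception
// cheap, though returning a boolean match result avoids the exception entirely.
class CheapPathNotFoundException extends RuntimeException {
  CheapPathNotFoundException(String message) {
    super(message);
  }

  @Override
  public synchronized Throwable fillInStackTrace() {
    return this;  // no stack trace is captured, so construction stays fast
  }
}
```

An equivalent option since Java 7 is the protected RuntimeException constructor that takes a writableStackTrace flag and is passed false.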

(screenshot: profiler output, 2021-09-08)

ben-manes (Author) commented

Yes, I think you are finding the same problem as my 4th item. The exception-based control flow is pretty bad, as you show.

richardstartin (Contributor) commented

I think these issues are mostly addressed in the latest release.

ben-manes (Author) commented

Thanks @richardstartin. I'll close this then. 🙂

Aitozi commented Aug 3, 2022

Hi @ben-manes, I see the code on the master branch still uses LRUCache, which is somewhat inefficient. Should we replace the default implementation with a Guava/Caffeine cache?

ben-manes (Author) commented

@Aitozi I replaced it in my own usage, so I imagine others would want to as well. LRUCache locks on read, so it is not highly concurrent. A simple wrapper like the one below is all that is needed.

import com.github.benmanes.caffeine.cache.Caffeine;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.spi.cache.CacheProvider;

com.github.benmanes.caffeine.cache.Cache<String, JsonPath> cache =
    Caffeine.newBuilder().maximumSize(10_000).build();  // size bound is an example
CacheProvider.setCache(new com.jayway.jsonpath.spi.cache.Cache() {
  @Override public JsonPath get(String key) {
    return cache.getIfPresent(key);
  }
  @Override public void put(String key, JsonPath value) {
    cache.put(key, value);
  }
});

Aitozi commented Aug 3, 2022

@ben-manes Thanks for your quick response. I tried the same approach you mentioned, and it works. I also ran a benchmark and found roughly a 10x performance difference.

Benchmark                 Mode  Cnt     Score     Error   Units
GuavaCacheBenchmark.get  thrpt   30  4480.563 ± 203.311  ops/ms
GuavaCacheBenchmark.put  thrpt   30  1774.769 ± 119.198  ops/ms

LRUCacheBenchmark.get  thrpt   30  441.239 ±  2.812  ops/ms
LRUCacheBenchmark.put  thrpt   30  350.549 ± 12.285  ops/ms

And I think, as a common library, it would be nice if this could be improved internally (as this is a critical code path).

ben-manes (Author) commented

You'll of course find Caffeine is faster than Guava, too =)

If the project wants a concurrent LRU without dependencies, there are two quick, good options.

  • Shade a copy of ConcurrentLinkedHashMap as Groovy, the MS SQL JDBC driver, etc. do. A trimmed-down version is in Jackson. This library was popular in its day, Java 5-based, and the basis for Guava's and Caffeine's approach to size eviction.
  • Write a simple Second Chance (Clock) LRU cache. This is FIFO-based, so reads are lock-free and it locks only on writes. A boolean flag is set on read and cleared while cycling through the FIFO to find a victim, making the worst case O(n). For a small cache like this that is fine, and it mirrors LRU in hit rates. A pre-1.0 version of CLHM used this as a testbed; it was released because users were building the then-alpha version directly and running into concurrency bugs. That could quickly be trimmed down to replace LRUCache.
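The Second Chance (Clock) option can be sketched roughly as below. This is a minimal illustration of the policy described above, not CLHM's actual code, and the class name is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Second Chance (Clock) cache sketch: reads are lock-free (they only set a
// flag); eviction cycles through FIFO order, recycling any entry that was
// referenced since insertion and evicting the first one that was not.
final class ClockCache<K, V> {
  private static final class Node<K, V> {
    final K key; final V value;
    final AtomicBoolean referenced = new AtomicBoolean();
    Node(K key, V value) { this.key = key; this.value = value; }
  }

  private final ConcurrentHashMap<K, Node<K, V>> map = new ConcurrentHashMap<>();
  private final ConcurrentLinkedQueue<Node<K, V>> fifo = new ConcurrentLinkedQueue<>();
  private final int maximumSize;

  ClockCache(int maximumSize) { this.maximumSize = maximumSize; }

  V get(K key) {
    Node<K, V> node = map.get(key);
    if (node == null) return null;
    node.referenced.set(true);  // lock-free read path: just mark the node
    return node.value;
  }

  synchronized void put(K key, V value) {  // locks only on writes
    Node<K, V> node = new Node<>(key, value);
    Node<K, V> prior = map.put(key, node);
    if (prior != null) fifo.remove(prior);
    fifo.add(node);
    while (map.size() > maximumSize) {
      Node<K, V> candidate = fifo.poll();
      if (candidate.referenced.compareAndSet(true, false)) {
        fifo.add(candidate);  // second chance: clear the flag, recycle to the back
      } else {
        map.remove(candidate.key, candidate);  // victim: evict
      }
    }
  }
}
```

The worst case is one full cycle over the FIFO when every entry has been referenced, which matches the O(n) caveat above and is acceptable for a small path cache.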
