Optimise validation performance #119

lmontrieux · 2017-11-05T19:48:11Z

Context

This pull request improves validation performance by reducing the number of objects created during validation.

I have a particular use case, where I need to validate a lot of JSON objects against a relatively small number of schemas.
I noticed that this library creates a lot of objects during validation, which seems to trigger garbage collection very often, slowing down our application.

Contribution

After some digging with a profiler, I found two low-hanging fruits that could be addressed without too much effort:

reduce the number of lists created to contain validation errors. This is especially important for us, as most of our JSON objects are actually valid
replace the streams API with boring (but fast) for-loops and if-statements

Measurement

To measure the improvement, I wrote a simple test program that validates the same large JSON object against a complex schema 10,000 times in a row (after a warmup round of 1,000 validations). I timed the 10,000 validation. I ran it against version 1.6.0, then against this PR. I ran each measurement 3 times. Here are the results:

With version 1.6.0

run 1

Average time: 1221480 ns
Total time: 122227550895 ns

run 2

Average time: 1270597 ns
Total time: 127142316236 ns

run 3

Average time: 1300015 ns
Total time: 130086278556 ns

With this PR

run 1

Average time: 542332 ns
Total time: 54269496513 ns

run 2

Average time: 546312 ns
Total time: 54668285530 ns

run 3

Average time: 539079 ns
Total time: 53944259960 ns

This commit reduces memory usage by only creating a list of ValidationException objects when the first one in encountered. JSONs that are valid will gain the most in terms of memory usage, as no lists will be created. JSONs that are only partially valid will still see memory usage improvements.

Using the stream API results in the creation of quite a lot of objects. In scenarios where a lot of JSONs are validated, this causes frequent, and large, garbage collection. In this commit, maps are replaced with for loops, and filters with if statements.

Catch up with project

…ayLists # Conflicts: # core/src/test/java/org/everit/json/schema/NumberSchemaTest.java

coveralls · 2017-11-05T19:50:21Z

Coverage increased (+0.09%) to 91.842% when pulling fb48aa4 on lmontrieux:ARUHA-1271-less-ArrayLists into 724b94a on everit-org:master.

erosb · 2017-11-05T20:22:21Z

Hello @lmontrieux ,

first of all thank you for putting effort into this. The performance gain is quite considerable. Could you please share the test program you used so I can reproduce it?

Although in general I don't mind using validationExceptions as collecting parameters and forgetting immutability for a while, I'd much more prefer keeping the streams and optional mappings in their current state, unless they cause significant performance drawbacks (I doubt so). Could you please run your test suite again with reverting these control flow changes and keeping only the validationException parameters?
Also, please don't reorganize imports.

coveralls · 2017-11-06T16:12:07Z

Coverage increased (+0.08%) to 91.834% when pulling 15f0f5a on lmontrieux:ARUHA-1271-less-ArrayLists into 724b94a on everit-org:master.

lmontrieux · 2017-11-06T16:16:08Z

Hi @erosb ,

I have fixed the imports issue in 15f0f5a - sorry, my IDE did that on its own, I should have been more careful.

Regarding the test I used, you can find the program in this gist: https://gist.github.com/lmontrieux/5393763bea6693c7186c4f0c822a81f9

Unfortunately it is difficult for me to share the schema and object that I used, as they contain confidential data. They are quite big (>300 lines for the schema, >2700 lines for the object), but mostly feature objects, Strings, array, and a number of Dates.

Regarding the streams API, I'm afraid it is removing it that causes the bulk of the performance improvement. Details will follow in the next comment, in a few minutes.

lmontrieux · 2017-11-06T16:30:43Z

For each commit, I ran the same test program, with the same JSON document and schema. Starting with version 1.6.0, then a version where Lists of ValidationExceptions are only created on first exception (this one is not thread safe), then the replacement of streams with for-loops and if-statements, then the reintroduction of some Lists to make the library thread-safe again. As you can see, the performance gains come mostly from the streams replacement.

Version 1.6.0

run 1

Average time: 1221480 ns
Total time: 122227550895 ns

run 2

Average time: 1270597 ns
Total time: 127142316236 ns

run 3

Average time: 1300015 ns
Total time: 130086278556 ns

No Lists before first ValidationException

run 1

Average time: 1161968 ns
Total time: 116276581734 ns

run 2

Average time: 1086089 ns
Total time: 108684575539 ns

run 3

Average time: 1120900 ns
Total time: 112158748588 ns

No stream API

run 1

Average time: 569080 ns
Total time: 56953024097 ns

run 2

Average time: 552631 ns
Total time: 55301991987 ns

run 3

Average time: 526898 ns
Total time: 52727318890 ns

Thread safe again

run 1

Average time: 542332 ns
Total time: 54269496513 ns

run 2

Average time: 546312 ns
Total time: 54668285530 ns

run 3

Average time: 539079 ns
Total time: 53944259960 ns

lmontrieux · 2017-11-06T16:32:33Z

Also, here is a snapshot from running the same test program with VisualVM, which shows a lot of java.util.stream -related objects in memory (version 1.6.0)

:

lmontrieux · 2017-11-06T16:33:30Z

I hope this helps. If you need more details, please let me know, and I will try to provide them asap.

erosb · 2017-11-06T16:59:36Z

Wow. This performance increase is just shocking. I merge this PR, thanks a lot for your time and efforts.

lmontrieux · 2017-11-06T17:03:41Z

Thanks @erosb . Out of curiosity, do you have plans to release a new version anytime soon?

erosb · 2017-11-06T17:11:07Z

@lmontrieux this PR probably qualifies for a new release on its own today / tomorrow, but anyway, you can start using the HEAD rev by referring to the commit hash as the version number in your pom (jitpack will handle it):

        <dependency>
            <groupId>com.github.everit-org.json-schema</groupId>
            <artifactId>org.everit.json.schema</artifactId>
            <version>b01b8aa6d213e8e051e70c16b7826cdb17aec9f6</version>
        </dependency>

lmontrieux · 2017-11-06T17:20:08Z

@erosb thanks a lot!

Lionel Montrieux and others added 5 commits November 2, 2017 16:47

Merge pull request #1 from everit-org/master

5e5331b

Catch up with project

Merge remote-tracking branch 'origin/master' into ARUHA-1271-less-Arr…

bdc0add

…ayLists # Conflicts: # core/src/test/java/org/everit/json/schema/NumberSchemaTest.java

Keep schema validation classes thread safe

fb48aa4

Fix imports order and a couple of unnecessary statements

15f0f5a

erosb merged commit b01b8aa into everit-org:master Nov 6, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimise validation performance #119

Optimise validation performance #119

lmontrieux commented Nov 5, 2017

coveralls commented Nov 5, 2017

erosb commented Nov 5, 2017

coveralls commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

erosb commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

erosb commented Nov 6, 2017 •

edited

Loading

lmontrieux commented Nov 6, 2017

Optimise validation performance #119

Optimise validation performance #119

Conversation

lmontrieux commented Nov 5, 2017

Context

Contribution

Measurement

With version 1.6.0

run 1

run 2

run 3

With this PR

run 1

run 2

run 3

coveralls commented Nov 5, 2017

erosb commented Nov 5, 2017

coveralls commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

Version 1.6.0

run 1

run 2

run 3

No Lists before first ValidationException

run 1

run 2

run 3

No stream API

run 1

run 2

run 3

Thread safe again

run 1

run 2

run 3

lmontrieux commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

erosb commented Nov 6, 2017

lmontrieux commented Nov 6, 2017

erosb commented Nov 6, 2017 • edited Loading

lmontrieux commented Nov 6, 2017

erosb commented Nov 6, 2017 •

edited

Loading