Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise validation performance #119

Merged
merged 6 commits into from
Nov 6, 2017

Conversation

lmontrieux
Copy link
Contributor

Context

This pull request improves validation performance by reducing the number of objects created during validation.

I have a particular use case, where I need to validate a lot of JSON objects against a relatively small number of schemas.
I noticed that this library creates a lot of objects during validation, which seems to trigger garbage collection very often, slowing down our application.

Contribution

After some digging with a profiler, I found two low-hanging fruits that could be addressed without too much effort:

  • reduce the number of lists created to contain validation errors. This is especially important for us, as most of our JSON objects are actually valid
  • replace the streams API with boring (but fast) for-loops and if-statements

Measurement

To measure the improvement, I wrote a simple test program that validates the same large JSON object against a complex schema 10,000 times in a row (after a warmup round of 1,000 validations). I timed the 10,000 validation. I ran it against version 1.6.0, then against this PR. I ran each measurement 3 times. Here are the results:

With version 1.6.0

run 1

Average time: 1221480 ns
Total time: 122227550895 ns

run 2

Average time: 1270597 ns
Total time: 127142316236 ns

run 3

Average time: 1300015 ns
Total time: 130086278556 ns

With this PR

run 1

Average time: 542332 ns
Total time: 54269496513 ns

run 2

Average time: 546312 ns
Total time: 54668285530 ns

run 3

Average time: 539079 ns
Total time: 53944259960 ns

Lionel Montrieux and others added 5 commits November 2, 2017 16:47
This commit reduces memory usage by only creating a list of
ValidationException objects when the first one in encountered. JSONs
that are valid will gain the most in terms of memory usage, as no lists
will be created. JSONs that are only partially valid will still see
memory usage improvements.
Using the stream API results in the creation of quite a lot of objects.
In scenarios where a lot of JSONs are validated, this causes frequent,
and large, garbage collection. In this commit, maps are replaced with
for loops, and filters with if statements.
…ayLists

# Conflicts:
#	core/src/test/java/org/everit/json/schema/NumberSchemaTest.java
@coveralls
Copy link

Coverage Status

Coverage increased (+0.09%) to 91.842% when pulling fb48aa4 on lmontrieux:ARUHA-1271-less-ArrayLists into 724b94a on everit-org:master.

@erosb
Copy link
Contributor

erosb commented Nov 5, 2017

Hello @lmontrieux ,

first of all thank you for putting effort into this. The performance gain is quite considerable. Could you please share the test program you used so I can reproduce it?

Although in general I don't mind using validationExceptions as collecting parameters and forgetting immutability for a while, I'd much more prefer keeping the streams and optional mappings in their current state, unless they cause significant performance drawbacks (I doubt so). Could you please run your test suite again with reverting these control flow changes and keeping only the validationException parameters?
Also, please don't reorganize imports.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 91.834% when pulling 15f0f5a on lmontrieux:ARUHA-1271-less-ArrayLists into 724b94a on everit-org:master.

@lmontrieux
Copy link
Contributor Author

Hi @erosb ,

I have fixed the imports issue in 15f0f5a - sorry, my IDE did that on its own, I should have been more careful.

Regarding the test I used, you can find the program in this gist: https://gist.github.com/lmontrieux/5393763bea6693c7186c4f0c822a81f9

Unfortunately it is difficult for me to share the schema and object that I used, as they contain confidential data. They are quite big (>300 lines for the schema, >2700 lines for the object), but mostly feature objects, Strings, array, and a number of Dates.

Regarding the streams API, I'm afraid it is removing it that causes the bulk of the performance improvement. Details will follow in the next comment, in a few minutes.

@lmontrieux
Copy link
Contributor Author

For each commit, I ran the same test program, with the same JSON document and schema. Starting with version 1.6.0, then a version where Lists of ValidationExceptions are only created on first exception (this one is not thread safe), then the replacement of streams with for-loops and if-statements, then the reintroduction of some Lists to make the library thread-safe again. As you can see, the performance gains come mostly from the streams replacement.

Version 1.6.0

run 1

Average time: 1221480 ns
Total time: 122227550895 ns

run 2

Average time: 1270597 ns
Total time: 127142316236 ns

run 3

Average time: 1300015 ns
Total time: 130086278556 ns

No Lists before first ValidationException

run 1

Average time: 1161968 ns
Total time: 116276581734 ns

run 2

Average time: 1086089 ns
Total time: 108684575539 ns

run 3

Average time: 1120900 ns
Total time: 112158748588 ns

No stream API

run 1

Average time: 569080 ns
Total time: 56953024097 ns

run 2

Average time: 552631 ns
Total time: 55301991987 ns

run 3

Average time: 526898 ns
Total time: 52727318890 ns

Thread safe again

run 1

Average time: 542332 ns
Total time: 54269496513 ns

run 2

Average time: 546312 ns
Total time: 54668285530 ns

run 3

Average time: 539079 ns
Total time: 53944259960 ns

@lmontrieux
Copy link
Contributor Author

Also, here is a snapshot from running the same test program with VisualVM, which shows a lot of java.util.stream -related objects in memory (version 1.6.0)
mem-snapshot
:

@lmontrieux
Copy link
Contributor Author

I hope this helps. If you need more details, please let me know, and I will try to provide them asap.

@erosb
Copy link
Contributor

erosb commented Nov 6, 2017

Wow. This performance increase is just shocking. I merge this PR, thanks a lot for your time and efforts.

@erosb erosb merged commit b01b8aa into everit-org:master Nov 6, 2017
@lmontrieux
Copy link
Contributor Author

Thanks @erosb . Out of curiosity, do you have plans to release a new version anytime soon?

@erosb
Copy link
Contributor

erosb commented Nov 6, 2017

@lmontrieux this PR probably qualifies for a new release on its own today / tomorrow, but anyway, you can start using the HEAD rev by referring to the commit hash as the version number in your pom (jitpack will handle it):

        <dependency>
            <groupId>com.github.everit-org.json-schema</groupId>
            <artifactId>org.everit.json.schema</artifactId>
            <version>b01b8aa6d213e8e051e70c16b7826cdb17aec9f6</version>
        </dependency>

@lmontrieux
Copy link
Contributor Author

@erosb thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants