Optimise validation performance #119
Conversation
This commit reduces memory usage by only creating a list of ValidationException objects when the first one is encountered. JSONs that are valid will gain the most in terms of memory usage, as no lists will be created. JSONs that are only partially valid will still see memory usage improvements.
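A minimal sketch of the lazy-allocation idea described above; the class and method names are illustrative, not the library's actual internals:

```java
import java.util.ArrayList;
import java.util.List;

import org.everit.json.schema.ValidationException;

// Hypothetical failure collector: the list is only allocated once the first
// ValidationException is encountered, so fully valid documents never pay
// for an empty ArrayList.
class FailureCollector {

    private List<ValidationException> failures; // stays null while everything is valid

    void reportFailure(ValidationException exc) {
        if (failures == null) {
            failures = new ArrayList<>(); // first failure: allocate lazily
        }
        failures.add(exc);
    }

    boolean isValid() {
        return failures == null; // no list was ever needed
    }
}
```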
Using the stream API results in the creation of quite a lot of objects. In scenarios where many JSONs are validated, this causes frequent, large garbage collections. In this commit, maps are replaced with for loops, and filters with if statements.
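To illustrate the kind of change this describes, here is a hedged sketch (a hypothetical helper, not the actual library code) of rewriting a stream pipeline over subschemas into a plain loop:

```java
import java.util.List;

import org.everit.json.schema.Schema;
import org.everit.json.schema.ValidationException;

class SubschemaValidator {

    // Stream-based version: allocates a Stream and lambda objects on every call.
    long countFailingSchemasWithStreams(List<Schema> subschemas, Object subject) {
        return subschemas.stream()
                .filter(schema -> fails(schema, subject))
                .count();
    }

    // Loop-based version: same result, no intermediate objects.
    long countFailingSchemasWithLoop(List<Schema> subschemas, Object subject) {
        long count = 0;
        for (Schema schema : subschemas) {
            if (fails(schema, subject)) {
                ++count;
            }
        }
        return count;
    }

    private boolean fails(Schema schema, Object subject) {
        try {
            schema.validate(subject);
            return false;
        } catch (ValidationException e) {
            return true;
        }
    }
}
```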
Catch up with project
…ayLists # Conflicts: # core/src/test/java/org/everit/json/schema/NumberSchemaTest.java
Hello @lmontrieux, first of all thank you for putting effort into this. The performance gain is quite considerable. Could you please share the test program you used so I can reproduce it? Although in general I don't mind using the stream API, …
Hi @erosb, I have fixed the imports issue in 15f0f5a - sorry, my IDE did that on its own; I should have been more careful. Regarding the test I used, you can find the program in this gist: https://gist.github.com/lmontrieux/5393763bea6693c7186c4f0c822a81f9 Unfortunately it is difficult for me to share the schema and object that I used, as they contain confidential data. They are quite big (>300 lines for the schema, >2700 lines for the object), but mostly feature objects, Strings, arrays, and a number of Dates. Regarding the streams API, I'm afraid removing it is what causes the bulk of the performance improvement. Details will follow in the next comment, in a few minutes.
For each commit, I ran the same test program, with the same JSON document and schema. I started with version 1.6.0, then a version where Lists of ValidationExceptions are only created on the first exception (this one is not thread safe), then the replacement of streams with for loops and if statements, then the reintroduction of some Lists to make the library thread-safe again. As you can see, the performance gains come mostly from the streams replacement.

Version 1.6.0
run 1: Average time: 1221480 ns
run 2: Average time: 1270597 ns
run 3: Average time: 1300015 ns

No Lists before first ValidationException
run 1: Average time: 1161968 ns
run 2: Average time: 1086089 ns
run 3: Average time: 1120900 ns

No stream API
run 1: Average time: 569080 ns
run 2: Average time: 552631 ns
run 3: Average time: 526898 ns

Thread safe again
run 1: Average time: 542332 ns
run 2: Average time: 546312 ns
run 3: Average time: 539079 ns
I hope this helps. If you need more details, please let me know, and I will try to provide them ASAP.
Wow. This performance increase is just shocking. I'll merge this PR; thanks a lot for your time and efforts.
Thanks @erosb. Out of curiosity, do you have plans to release a new version anytime soon?
@lmontrieux this PR probably qualifies for a new release on its own today / tomorrow, but anyway, you can start using the HEAD rev by referring to the commit hash as the version number in your pom (jitpack will handle it):
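The snippet referred to above is not preserved here; the following is only a sketch of the usual JitPack setup, with the Maven coordinates assumed from the project's README and COMMIT_HASH as a placeholder for the actual commit hash:

```xml
<!-- Sketch only: add the JitPack repository, then use the commit hash as
     the version. Coordinates are assumed, not taken from this comment. -->
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.github.everit-org.json-schema</groupId>
        <artifactId>org.everit.json.schema</artifactId>
        <version>COMMIT_HASH</version>
    </dependency>
</dependencies>
```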
@erosb thanks a lot!
Context
This pull request improves validation performance by reducing the number of objects created during validation.
I have a particular use case, where I need to validate a lot of JSON objects against a relatively small number of schemas.
I noticed that this library creates a lot of objects during validation, which seems to trigger garbage collection very often, slowing down our application.
Contribution
After some digging with a profiler, I found two low-hanging fruit that could be addressed without too much effort:
1. Only create the list of ValidationException objects when the first failure is encountered, so fully valid JSON documents allocate no lists at all.
2. Replace the stream API usage (maps and filters) with plain for loops and if statements, which avoids creating large numbers of short-lived objects during validation.
Measurement
To measure the improvement, I wrote a simple test program that validates the same large JSON object against a complex schema 10,000 times in a row (after a warmup round of 1,000 validations). I timed the 10,000 validations. I ran it against version 1.6.0, then against this PR. I ran each measurement 3 times. A sketch of the measurement loop is included after the results. Here are the results:
With version 1.6.0
run 1
Average time: 1221480 ns
Total time: 122227550895 ns
run 2
Average time: 1270597 ns
Total time: 127142316236 ns
run 3
Average time: 1300015 ns
Total time: 130086278556 ns
With this PR
run 1
Average time: 542332 ns
Total time: 54269496513 ns
run 2
Average time: 546312 ns
Total time: 54668285530 ns
run 3
Average time: 539079 ns
Total time: 53944259960 ns
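For reference, this is roughly what the measurement loop looks like. It is a sketch only: the schema and document used in the runs above are confidential, so the file names are placeholders, and the loading code assumes the library's standard SchemaLoader API.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.everit.json.schema.Schema;
import org.everit.json.schema.ValidationException;
import org.everit.json.schema.loader.SchemaLoader;
import org.json.JSONObject;
import org.json.JSONTokener;

public class ValidationBenchmark {

    public static void main(String[] args) throws Exception {
        // Hypothetical file names; the real schema and document are not public.
        Schema schema;
        try (InputStream in = new FileInputStream("schema.json")) {
            schema = SchemaLoader.load(new JSONObject(new JSONTokener(in)));
        }
        JSONObject subject;
        try (InputStream in = new FileInputStream("document.json")) {
            subject = new JSONObject(new JSONTokener(in));
        }

        // Warmup round so the JIT compiler has settled before measuring.
        runValidations(schema, subject, 1_000);

        // Timed round: 10,000 validations of the same document.
        int iterations = 10_000;
        long start = System.nanoTime();
        runValidations(schema, subject, iterations);
        long total = System.nanoTime() - start;

        System.out.println("Total time: " + total + " ns");
        System.out.println("Average time: " + (total / iterations) + " ns");
    }

    private static void runValidations(Schema schema, JSONObject subject, int count) {
        for (int i = 0; i < count; i++) {
            try {
                schema.validate(subject);
            } catch (ValidationException e) {
                // The benchmark document is expected to validate successfully.
            }
        }
    }
}
```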