Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use conversion to sets to check uniqueItems (performance improvement) #116

Merged

Conversation

germanblanco
Copy link
Contributor

@germanblanco germanblanco commented Jan 20, 2022

This pull request addresses #115

The current implementation of the check for uniqueItems condition in a JSON array seems to be inefficient, with an implementation that has approximately O(n^2) complexity on the number of elements in the array.

This proposal is to change the implementation by using a conversion to a set of the array under uniqueItems condition, and in this way eliminating duplicates efficiently. In order to do this, the format for JSON objects within the array is normalized to Erlang maps.

Implementation of normalization and is_equal is moved to jesse_lib.

@germanblanco germanblanco force-pushed the performance_improvement_uniqueitems branch 3 times, most recently from f0c6c74 to 67cfcaa Compare January 20, 2022 10:47
@germanblanco
Copy link
Contributor Author

There was an issue with the macro and OTP releases 20 and lower. I hope it is fixed now.

Copy link
Collaborator

@seriyps seriyps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some issues with CI. Seems latest version of proper requires at least OTP-20. I think the options are:

  • Use older version of proper
  • Create a separate rebar.config profile and make CI to only run proper on OTP >= 20
  • Drop support for OTP-19 (need confirmation from @andreineculau for that)

normalize_and_sort_object(Value) when is_map(Value)->
Value;
normalize_and_sort_object(Value) ->
maps:from_list(jesse_json_path:unwrap_value(Value)).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
maps:from_list(jesse_json_path:unwrap_value(Value)).
maps:from_list([{Key, normalize_and_sort(Value)} || {Key, Value} <- jesse_json_path:unwrap_value(Value)]).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you for spotting that. I have added a test to make sure that this error doesn't come up again. The only thing with your suggestion is that Values that are maps also need this "normalize_and_sort" in their values, so my proposal in the next commit will be to apply the normalization in the is_map function.

@@ -749,26 +749,42 @@ check_unique_items(_, false, State) ->
State;
check_unique_items([], true, State) ->
State;
check_unique_items([_], true, State) ->
State;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea! 👍

do_test("uniqueItemsPerformance", Config),
End = erlang:system_time(millisecond),
ElapsedMilliseconds = End - Start,
?assert(ElapsedMilliseconds < 100).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this has a risk of being unstable if CI agent is not fast enough... Do you think there is enough margin for it to not be flaky?

Copy link
Collaborator

@seriyps seriyps Jan 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it failed in CI at least once because it took more than 100ms. I think we need to either raise the threshold or not run this test on CI. Also, would be nice to log the ElapsedMilliseconds to see how much it have been exceeded

Suggested change
?assert(ElapsedMilliseconds < 100).
?assert(ElapsedMilliseconds < 100, [{elapsed, ElapsedMilliseconds}]).

but I don't remember if ?assert/2 is supported on older OTP versions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately assert with comment seems to have been added in OTP20

erlang/otp@d2be06f#diff-9b2a91139c4c8fff63934580389d53a634eea53927abba13e0a49a59f239a376

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a bit ugly to have this performance tests within a CT suite.
I thought it would be safe to put it with 100 ms. since in my laptop it takes below 20 ms to run the tests with the improvement and around 800 ms without the improvement. But I see now that it doesn't pass for the CI workers.
If it is ok with you, I will remove it.

@germanblanco
Copy link
Contributor Author

Some issues with CI. Seems latest version of proper requires at least OTP-20. I think the options are:

  • Use older version of proper
  • Create a separate rebar.config profile and make CI to only run proper on OTP >= 20
  • Drop support for OTP-19 (need confirmation from @andreineculau for that)

I think the easiest thing is to use Proper version 1.2.0 for OTP >= 20 and 1.4.0 otherwise. I am going to assume that you mean to crete "rebar.OTPXX.config" files for each release as there are now rebar3.OTPXX files and use links for those releases that are the same, right? i am going to go ahead and try to change the Makefile in that direction.

@germanblanco germanblanco force-pushed the performance_improvement_uniqueitems branch 2 times, most recently from 339fdd4 to fde5e93 Compare January 20, 2022 12:58
Copy link
Collaborator

@seriyps seriyps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

@@ -1,5 +1,5 @@
%%-*- mode: erlang -*-
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please let's keep rebar.config file as well (maybe having it as a symlink to rebar.OTP20.config or other way around). I'm afraid it might cause issues when jesse is added as a dependency if there is no rebar.config

Makefile Outdated
@@ -69,38 +71,42 @@ test/JSON-Schema-Test-Suite/tests:
.PHONY: test
# Would be nice to include elvis to test, but it fails on OTP-18
# test: elvis
test: eunit ct xref dialyzer cover
test: eunit ct xref dialyzer cover proper
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's put proper before cover, so coverage generated by running proper is also included to cover report

@germanblanco germanblanco force-pushed the performance_improvement_uniqueitems branch from fde5e93 to 9996923 Compare January 20, 2022 15:44
Copy link
Collaborator

@seriyps seriyps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it, thank you!
@andreineculau any opinion?
I'll merge, say, 25th of January if noone complains.

@seriyps seriyps merged commit 83281a5 into for-GET:master Jan 25, 2022
@seriyps
Copy link
Collaborator

seriyps commented Jan 25, 2022

Merged. Thank you for your contribution and patience =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants