Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster JSON encoder/decoder #459

Closed
wants to merge 5 commits into from
Closed

Faster JSON encoder/decoder #459

wants to merge 5 commits into from

Conversation

tiran
Copy link
Member

@tiran tiran commented Feb 13, 2017

Improve performance of FreeIPA's JSON serializer and deserializer.

  • Don't indent and sort keys. Both options trigger a slow path in
    Python's json package. Without indention and sorting, encoding
    mostly happens in optimized C code.
  • Replace O(n) type checks with O(1) type lookup and eliminate
    the use of isinstance().
  • Check each client capability only once for every conversion.
  • Use decoder's obj_hook feature to traverse the object tree once and
    to eliminate calls to isinstance().

Closes: https://fedorahosted.org/freeipa/ticket/6655
Signed-off-by: Christian Heimes cheimes@redhat.com

@tiran tiran changed the title Faster JSON encoder/decoder [WIP] Faster JSON encoder/decoder Feb 13, 2017
@pvoborni
Copy link
Member

Is there a way(I did not read changes thoroughly) to enable sorting and indentation, e.g. for testing purposes?

@tiran
Copy link
Member Author

tiran commented Feb 13, 2017

Why would you want to sort or indent the raw output? The extra verbose output of ipa just loads and dumps the output a second time. It's less efficient but who cares about minor efficiency issues of a debug feature? For browser testing, any web developer tool will give you nicely formatted JSON, too.

@abbra
Copy link
Contributor

abbra commented Feb 13, 2017

Right, as long as ipa CLI is capable to print formatted debug output, that's enough.

@pvoborni
Copy link
Member

It's usually quicker to read raw response in browser than the folded "preview" because everything is visible and no clicking is required. Same for curl testing. But for curl I can imagine piping it to some tool.

@tiran
Copy link
Member Author

tiran commented Feb 13, 2017

curl url | python -m json.tool

@pvoborni
Copy link
Member

As mention on meeting, if rpcserver prettyprints into output in debug mode then it is fine.

@tiran
Copy link
Member Author

tiran commented Feb 14, 2017

@pvoborni I have modified the PR and added a pretty_print option. JSON is now pretty printed for verbose level 2 and higher.

The old implementation converted all list to tuples. With obj_hook, only lists in a JSON objects are converted at the moment. Nested lists are not fully converted, which causes a test failure. I wonder why we decided to convert lists to tuples in the first place? Can we drop the conversion and just use lists here?

"""
__slots__ = ('version', '_cap_datetime', '_cap_dnsname')

_identity = object()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a comment explaining that _identity = object() with if func is _identity then arg else func(arg) is faster than _identity = lambda x: x with func(arg) so that no one comes in the future and replaces the former with the latter because it's duplicate code?

:see: _ipa_obj_hook, _JSONPrimer
"""
if isinstance(val, bytes):
val = val.decode('utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO decoding the value should be the caller's responsibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be handled by the JSON library if possible. The approach allows us to replace the JSON library with a more efficient one without breaking API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@MartinBasti
Copy link
Contributor

LGTM

:note: pretty printing triggers a slow path in Python's JSON module. Only
use pretty_print in debug mode.
"""
result = _JSONPrimer(version).convert(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be a little bit faster if we kept the type - encoder mapping in a global object initialized once rather than reconstructing it in every json_encode_binary() call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it might be a tiny bit more efficient, but also more complicated. I would have to find another way to handle client versions difference efficiently. It's a constant startup cost any way. __hash__ and __eq__ are efficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@MartinBasti MartinBasti changed the title [WIP] Faster JSON encoder/decoder Faster JSON encoder/decoder Feb 15, 2017
Improve performance of FreeIPA's JSON serializer and deserializer.

* Don't indent and sort keys. Both options trigger a slow path in
  Python's json package. Without indention and sorting, encoding
  mostly happens in optimized C code.
* Replace O(n) type checks with O(1) type lookup and eliminate
  the use of isinstance().
* Check each client capability only once for every conversion.
* Use decoder's obj_hook feature to traverse the object tree once and
  to eliminate calls to isinstance().

Closes: https://fedorahosted.org/freeipa/ticket/6655
Signed-off-by: Christian Heimes <cheimes@redhat.com>
Some tests assume that JSON deserializier returns tuples instead of
lists. I don't think it is necessary but let's pass the tests for now.

Signed-off-by: Christian Heimes <cheimes@redhat.com>
Signed-off-by: Christian Heimes <cheimes@redhat.com>
Signed-off-by: Christian Heimes <cheimes@redhat.com>
@MartinBasti MartinBasti added the ack Pull Request approved, can be merged label Feb 15, 2017
@MartinBasti MartinBasti added the pushed Pull Request has already been pushed label Feb 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ack Pull Request approved, can be merged pushed Pull Request has already been pushed
Projects
None yet
5 participants