Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pretty printing, key sorting, and better performance to to_json in Jinja #91253

Merged
merged 11 commits into from Apr 12, 2023

Conversation

depoll
Copy link
Contributor

@depoll depoll commented Apr 11, 2023

Breaking change

The ensure_ascii argument for to_json in Jinja templates now defaults to False, allowing us to use a faster JSON encoder by default. For most people, this should not be an issue as JSON parsers broadly accept unicode input. If you still need to encode unicode characters inside of JSON strings, set ensure_ascii to True explicitly to restore the old behavior.

Proposed change

Adds pretty_print and sort_keysto to_json in Jinja templates. to_json now uses the faster orjson serializer by default, which requires us to default ensure_ascii to false, though this shouldn't impact many people as parsers following the JSON standard should support unicode in strings anyway. ensure_ascii continues to exist for people to set explicitly if they still need this compatibility option.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

@depoll depoll requested a review from a team as a code owner April 11, 2023 23:15
@home-assistant home-assistant bot added cla-signed core new-feature small-pr PRs with less than 30 lines. labels Apr 11, 2023
Comment on lines 2032 to 2036
def to_json(value, ensure_ascii=True, indent=None, sort_keys=False):
"""Convert an object to a JSON string."""
return json.dumps(value, ensure_ascii=ensure_ascii)
return json.dumps(
value, ensure_ascii=ensure_ascii, indent=indent, sort_keys=sort_keys
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are generally moving away from json because of performance reasons.

We should use homeassistant.helpers.json here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take a look at the json helpers api, but I'm using this here because it's actually how the existing to_json Jinja filter is implemented, and would be worried about introducing quirks by using a different API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it a bit, it looks like the helper version is made for serializing home assistant objects and handles things like custom types, date encodings, etc. I'm not sure that's the right fit for Jinja, but open to input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even using orjson directly would be fine. The performance issues are with the built in json lib

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Will explore making the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way out of this would be to deprecate the existing to_json (leaving it in place but deprecated/not recommended) and replace it with as_json using the new library. The existing conversion APIs like as_datetime, as_local use the as naming scheme instead of the to naming scheme, so it would be more consistent anyway. Deserializing may be possible compatibly already.

Copy link
Member

@bdraco bdraco Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest question would be whether the quirks of orjson are the ones we want to live with in the API -- Hyrum's law and all 😉.

90-95% of the common paths in the code base has already moved to it so I think thats not a concern we need to retread.

It seems at a glance like a fairly inflexible API (sacrificing some usability and flexibility for performance, which is often the right trade off as internal implementation detail but maybe less nice for developer experience) and would take things like uuids and datetimes without a straightforward way to turn those features off -- quirks we'd need to support in perpetuity.

I don't think templates are going to have dataclass or UUID objects, but we already serialize datetime to isoformat almost everywhere else so it lines up nicely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way out of this would be to deprecate the existing to_json (leaving it in place but deprecated/not recommended) and replace it with as_json using the new library. The existing conversion APIs like as_datetime, as_local use the as naming scheme instead of the to naming scheme, so it would be more consistent anyway. Deserializing may be possible compatibly already.

That seems like a reasonable path forward 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually -- from_json is already using orjson. So it's just to_json. I think I can set a default handler and turn on passthroughs to disable all of the nonstandard types (at least at the outset so we wouldn't be stuck with them). I'm going to give that a shot. Will update this PR with the proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK -- gave it a shot. Please take a look. I'm not sure if I did all the things right for deprecation here (and certainly we'd need to document the change).

@depoll depoll changed the title Add options to to_json Migrate to orjson for Jinja JSON filtering Apr 12, 2023
@bdraco bdraco added breaking-change deprecation Indicates a breaking change to happen in the future labels Apr 12, 2023
Co-authored-by: J. Nick Koston <nick@koston.org>
@depoll
Copy link
Contributor Author

depoll commented Apr 12, 2023

Ok -- added a documentation PR and hopefully addressed outstanding comments. Let me know if there's more to do!

@depoll depoll requested a review from bdraco April 12, 2023 03:48
@bdraco bdraco removed docs-missing small-pr PRs with less than 30 lines. labels Apr 12, 2023
@bdraco
Copy link
Member

bdraco commented Apr 12, 2023

LGTM. I need to do manual testing but may run out of time tonight

@depoll
Copy link
Contributor Author

depoll commented Apr 12, 2023

Awesome. Thanks for the review and all the back and forth! Appreciate you working through it with me.

Copy link
Contributor

@epenet epenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment here: I suggest adding type hints to the new functions.

homeassistant/helpers/template.py Outdated Show resolved Hide resolved
homeassistant/helpers/template.py Outdated Show resolved Hide resolved
Co-authored-by: epenet <6771947+epenet@users.noreply.github.com>
Copy link
Member

@frenck frenck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is a breaking change and why we need to make this breaking?

This will make every single template out there (in forums, google, youtube vids, tutorials whatever) that uses it, broken.

If we want a faster JSON serializer, I suggest applying it to to_json instead.

@depoll depoll marked this pull request as ready for review April 12, 2023 15:11
@depoll
Copy link
Contributor Author

depoll commented Apr 12, 2023

One more option just came to mind:

  1. Keep to_json for everything, but have ensure_ascii use the old library (and warn of deprecation of the option + not be compatible with new options, which nobody can be using anyhow), and use the new library for now only if ensure_ascii is false.

I may sketch that out briefly -- see if it makes it more palatable, but @frenck definitely interested in feedback on that approach.

@frenck
Copy link
Member

frenck commented Apr 12, 2023

@depoll IMHO, removing/deprecating/replacing is not an option or consideration.

Templates are used a lot, including blueprints for which end-users aren't provided updates.

@bdraco
Copy link
Member

bdraco commented Apr 12, 2023

One more option just came to mind:

  1. Keep to_json for everything, but have ensure_ascii use the old library (and warn of deprecation of the option + not be compatible with new options, which nobody can be using anyhow), and use the new library for now only if ensure_ascii is false.

5 seems seem like reasonable approach if we don't want to accept a breaking change.

Personally I am not a fan of backwards compatibility forever here considering I had trouble finding an example of someone using this on the forums (not that it means it's not being used) but it isn't especially difficult to maintain it here if that's the goal.

@depoll
Copy link
Contributor Author

depoll commented Apr 12, 2023

Ok, so I gave option 5 a shot. It's still technically breaking because ensure_ascii is deprecated and the default behavior of to_json will no longer encode unicode strings, however the likelihood that folks are really depending on \u encoding instead of generating utf-8 json seems small -- the vast, vast, vast majority are probably only getting that encoding because of the default value of ensure_ascii. If people are explicitly specifying ensure_ascii, we warn that it's going away. For the normal use case it just uses orjson and moves on with it.

If that's still too breaky, there are two paths forward:

  1. orjson is only used for the new use cases and we just stick with the old json library by default
  2. just use the old json library and not try to migrate to orjson, which would definitely not break anything but would still allow pretty printing and key sorting to be added

@bdraco
Copy link
Member

bdraco commented Apr 12, 2023

Ok, so I gave option 5 a shot. It's still technically breaking because ensure_ascii is deprecated and the default behavior of to_json will no longer encode unicode strings, however the likelihood that folks are really depending on \u encoding instead of generating utf-8 json seems small -- the vast, vast, vast majority are probably only getting that encoding because of the default value of ensure_ascii. If people are explicitly specifying ensure_ascii, we warn that it's going away. For the normal use case it just uses orjson and moves on with it.

If that's still too breaky, there are two paths forward:

  1. orjson is only used for the new use cases and we just stick with the old json library by default
  2. just use the old json library and not try to migrate to orjson, which would definitely not break anything but would still allow pretty printing and key sorting to be added

That seems like a reasonable approach to me. I think ensure_ascii is only likely to be needed for endpoints that can't handle utf-8 properly. If it turns out someone really does need it we can leave the fallback code as its easy enough to flip ensure_ascii to True and only those cases would have to pay the performance penalty to use it.

@bdraco
Copy link
Member

bdraco commented Apr 12, 2023

I think it would also be fine to not deprecate ensure_ascii so the only change anyone would have to make would be to set ensure_ascii=True if they are sending data to an endpoint that doesn't support utf-8 and using unicode characters which would limit a breaking change to only really out there corner cases

@bdraco
Copy link
Member

bdraco commented Apr 12, 2023

Another option would be to keep to_json as-is. Add all the new stuff to as_json, mark to_json as deprecated in the docs but note that we have no plans to remove it which means nobody has to do anything unless they want the new features/better performance.

@depoll
Copy link
Contributor Author

depoll commented Apr 12, 2023

I think it would also be fine to not deprecate ensure_ascii so the only change anyone would have to make would be to set ensure_ascii=True if they are sending data to an endpoint that doesn't support utf-8 and using unicode characters which would limit a breaking change to only really out there corner cases

I think this would still technically be breaking (because it's effectively changing ensure_ascii's default value from True to False), but am ok with leaving ensure_ascii in with the performance penalty for people who really need it (I can even make the other flags work with it). The number of people who would be broken in practice is likely vanishingly small given that the json spec is explicit that valid json is a sequence of unicode code points anyway.

@bdraco
Copy link
Member

bdraco commented Apr 12, 2023

I think it would also be fine to not deprecate ensure_ascii so the only change anyone would have to make would be to set ensure_ascii=True if they are sending data to an endpoint that doesn't support utf-8 and using unicode characters which would limit a breaking change to only really out there corner cases

I think this would still technically be breaking (because it's effectively changing ensure_ascii's default value from True to False), but am ok with leaving ensure_ascii in with the performance penalty for people who really need it (I can even make the other flags work with it). The number of people who would be broken in practice is likely vanishingly small given that the json spec is explicit that valid json is a sequence of unicode code points anyway.

That seems like the best compromise as we want it perform well by default and users shouldn't have to figure out how to make it performant.

@frenck
Copy link
Member

frenck commented Apr 12, 2023

I agree on the latter parts, I think that is acceptable. I also agree that we can't keep backward compatibility forever. But that said, looking at the resulting solution, the larger part of use cases won't be affected (which IMHO is a good compromise).

@bdraco
Copy link
Member

bdraco commented Apr 12, 2023

I think we have a good path forward here. I'm happy with the result. Thanks everyone for the input and collaboration

@depoll depoll changed the title Migrate to orjson for Jinja JSON filtering Add pretty printing and key sorting to to_json Apr 12, 2023
@depoll depoll changed the title Add pretty printing and key sorting to to_json Add pretty printing, key sorting, and better performance to to_json in Jinja Apr 12, 2023
@depoll
Copy link
Contributor Author

depoll commented Apr 12, 2023

Tried to update the PR description and text to match where we landed. It definitely resulted in a few things getting bundled together, but I think we landed in a good spot. Also updated the docs PR.

@bdraco
Copy link
Member

bdraco commented Apr 12, 2023

Let's separate the text into proposed changes and breaking changes with the new functionality in the proposed changes section.

The breaking changes section should be limited to what an affected user needs to do to handle the change and under what conditions they would need to do it. It's what will appear in the release notes

@depoll
Copy link
Contributor Author

depoll commented Apr 12, 2023

Ok -- moved the text into the proposed change section and called out the breaking part explicitly in the Breaking Change section.

@depoll depoll requested a review from bdraco April 12, 2023 18:59
@bdraco bdraco removed the deprecation Indicates a breaking change to happen in the future label Apr 12, 2023
Copy link
Member

@bdraco bdraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @depoll

Manual testing looks good. (not that I expected it to work any other way than the test cases but always good to go)

Copy link
Member

@frenck frenck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @depoll 👍

../Frenck

@frenck frenck merged commit ea12d7a into home-assistant:dev Apr 12, 2023
54 checks passed
tkhadimullin pushed a commit to tkhadimullin/ha-core that referenced this pull request Apr 13, 2023
…n Jinja (home-assistant#91253)

Co-authored-by: J. Nick Koston <nick@koston.org>
Co-authored-by: epenet <6771947+epenet@users.noreply.github.com>
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants