Add pretty printing, key sorting, and better performance to to_json in Jinja #91253

depoll · 2023-04-11T23:15:53Z

Breaking change

The ensure_ascii argument for to_json in Jinja templates now defaults to False, allowing us to use a faster JSON encoder by default. For most people, this should not be an issue as JSON parsers broadly accept unicode input. If you still need to encode unicode characters inside of JSON strings, set ensure_ascii to True explicitly to restore the old behavior.

Proposed change

Adds pretty_print and sort_keysto to_json in Jinja templates. to_json now uses the faster orjson serializer by default, which requires us to default ensure_ascii to false, though this shouldn't impact many people as parsers following the JSON standard should support unicode in strings anyway. ensure_ascii continues to exist for people to set explicitly if they still need this compatibility option.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request: Add documentation for to_json options home-assistant.io#26956

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Black (black --fast homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

bdraco · 2023-04-11T23:16:53Z

homeassistant/helpers/template.py

+def to_json(value, ensure_ascii=True, indent=None, sort_keys=False):
    """Convert an object to a JSON string."""
-    return json.dumps(value, ensure_ascii=ensure_ascii)
+    return json.dumps(
+        value, ensure_ascii=ensure_ascii, indent=indent, sort_keys=sort_keys
+    )


We are generally moving away from json because of performance reasons.

We should use homeassistant.helpers.json here

I can take a look at the json helpers api, but I'm using this here because it's actually how the existing to_json Jinja filter is implemented, and would be worried about introducing quirks by using a different API.

Looking at it a bit, it looks like the helper version is made for serializing home assistant objects and handles things like custom types, date encodings, etc. I'm not sure that's the right fit for Jinja, but open to input.

Even using orjson directly would be fine. The performance issues are with the built in json lib

Got it. Will explore making the change.

One way out of this would be to deprecate the existing to_json (leaving it in place but deprecated/not recommended) and replace it with as_json using the new library. The existing conversion APIs like as_datetime, as_local use the as naming scheme instead of the to naming scheme, so it would be more consistent anyway. Deserializing may be possible compatibly already.

My biggest question would be whether the quirks of orjson are the ones we want to live with in the API -- Hyrum's law and all 😉.

90-95% of the common paths in the code base has already moved to it so I think thats not a concern we need to retread.

It seems at a glance like a fairly inflexible API (sacrificing some usability and flexibility for performance, which is often the right trade off as internal implementation detail but maybe less nice for developer experience) and would take things like uuids and datetimes without a straightforward way to turn those features off -- quirks we'd need to support in perpetuity.

I don't think templates are going to have dataclass or UUID objects, but we already serialize datetime to isoformat almost everywhere else so it lines up nicely.

One way out of this would be to deprecate the existing to_json (leaving it in place but deprecated/not recommended) and replace it with as_json using the new library. The existing conversion APIs like as_datetime, as_local use the as naming scheme instead of the to naming scheme, so it would be more consistent anyway. Deserializing may be possible compatibly already.

That seems like a reasonable path forward 👍

actually -- from_json is already using orjson. So it's just to_json. I think I can set a default handler and turn on passthroughs to disable all of the nonstandard types (at least at the outset so we wouldn't be stuck with them). I'm going to give that a shot. Will update this PR with the proposal.

OK -- gave it a shot. Please take a look. I'm not sure if I did all the things right for deprecation here (and certainly we'd need to document the change).

homeassistant/helpers/template.py

tests/helpers/test_template.py

homeassistant/helpers/template.py

Co-authored-by: J. Nick Koston <nick@koston.org>

depoll · 2023-04-12T03:48:10Z

Ok -- added a documentation PR and hopefully addressed outstanding comments. Let me know if there's more to do!

bdraco · 2023-04-12T05:24:08Z

LGTM. I need to do manual testing but may run out of time tonight

depoll · 2023-04-12T05:43:19Z

Awesome. Thanks for the review and all the back and forth! Appreciate you working through it with me.

epenet

Small comment here: I suggest adding type hints to the new functions.

homeassistant/helpers/template.py

Co-authored-by: epenet <6771947+epenet@users.noreply.github.com>

frenck

I don't understand why this is a breaking change and why we need to make this breaking?

This will make every single template out there (in forums, google, youtube vids, tutorials whatever) that uses it, broken.

If we want a faster JSON serializer, I suggest applying it to to_json instead.

depoll · 2023-04-12T15:20:12Z

One more option just came to mind:

Keep to_json for everything, but have ensure_ascii use the old library (and warn of deprecation of the option + not be compatible with new options, which nobody can be using anyhow), and use the new library for now only if ensure_ascii is false.

I may sketch that out briefly -- see if it makes it more palatable, but @frenck definitely interested in feedback on that approach.

frenck · 2023-04-12T15:46:46Z

@depoll IMHO, removing/deprecating/replacing is not an option or consideration.

Templates are used a lot, including blueprints for which end-users aren't provided updates.

bdraco · 2023-04-12T17:04:55Z

One more option just came to mind:

Keep to_json for everything, but have ensure_ascii use the old library (and warn of deprecation of the option + not be compatible with new options, which nobody can be using anyhow), and use the new library for now only if ensure_ascii is false.

5 seems seem like reasonable approach if we don't want to accept a breaking change.

Personally I am not a fan of backwards compatibility forever here considering I had trouble finding an example of someone using this on the forums (not that it means it's not being used) but it isn't especially difficult to maintain it here if that's the goal.

depoll · 2023-04-12T17:42:55Z

Ok, so I gave option 5 a shot. It's still technically breaking because ensure_ascii is deprecated and the default behavior of to_json will no longer encode unicode strings, however the likelihood that folks are really depending on \u encoding instead of generating utf-8 json seems small -- the vast, vast, vast majority are probably only getting that encoding because of the default value of ensure_ascii. If people are explicitly specifying ensure_ascii, we warn that it's going away. For the normal use case it just uses orjson and moves on with it.

If that's still too breaky, there are two paths forward:

orjson is only used for the new use cases and we just stick with the old json library by default
just use the old json library and not try to migrate to orjson, which would definitely not break anything but would still allow pretty printing and key sorting to be added

bdraco · 2023-04-12T17:47:53Z

Ok, so I gave option 5 a shot. It's still technically breaking because ensure_ascii is deprecated and the default behavior of to_json will no longer encode unicode strings, however the likelihood that folks are really depending on \u encoding instead of generating utf-8 json seems small -- the vast, vast, vast majority are probably only getting that encoding because of the default value of ensure_ascii. If people are explicitly specifying ensure_ascii, we warn that it's going away. For the normal use case it just uses orjson and moves on with it.

If that's still too breaky, there are two paths forward:

orjson is only used for the new use cases and we just stick with the old json library by default

just use the old json library and not try to migrate to orjson, which would definitely not break anything but would still allow pretty printing and key sorting to be added

That seems like a reasonable approach to me. I think ensure_ascii is only likely to be needed for endpoints that can't handle utf-8 properly. If it turns out someone really does need it we can leave the fallback code as its easy enough to flip ensure_ascii to True and only those cases would have to pay the performance penalty to use it.

bdraco · 2023-04-12T17:50:31Z

I think it would also be fine to not deprecate ensure_ascii so the only change anyone would have to make would be to set ensure_ascii=True if they are sending data to an endpoint that doesn't support utf-8 and using unicode characters which would limit a breaking change to only really out there corner cases

bdraco · 2023-04-12T17:53:56Z

Another option would be to keep to_json as-is. Add all the new stuff to as_json, mark to_json as deprecated in the docs but note that we have no plans to remove it which means nobody has to do anything unless they want the new features/better performance.

depoll · 2023-04-12T17:55:36Z

I think it would also be fine to not deprecate ensure_ascii so the only change anyone would have to make would be to set ensure_ascii=True if they are sending data to an endpoint that doesn't support utf-8 and using unicode characters which would limit a breaking change to only really out there corner cases

I think this would still technically be breaking (because it's effectively changing ensure_ascii's default value from True to False), but am ok with leaving ensure_ascii in with the performance penalty for people who really need it (I can even make the other flags work with it). The number of people who would be broken in practice is likely vanishingly small given that the json spec is explicit that valid json is a sequence of unicode code points anyway.

bdraco · 2023-04-12T17:57:32Z

I think it would also be fine to not deprecate ensure_ascii so the only change anyone would have to make would be to set ensure_ascii=True if they are sending data to an endpoint that doesn't support utf-8 and using unicode characters which would limit a breaking change to only really out there corner cases

I think this would still technically be breaking (because it's effectively changing ensure_ascii's default value from True to False), but am ok with leaving ensure_ascii in with the performance penalty for people who really need it (I can even make the other flags work with it). The number of people who would be broken in practice is likely vanishingly small given that the json spec is explicit that valid json is a sequence of unicode code points anyway.

That seems like the best compromise as we want it perform well by default and users shouldn't have to figure out how to make it performant.

frenck · 2023-04-12T18:02:01Z

I agree on the latter parts, I think that is acceptable. I also agree that we can't keep backward compatibility forever. But that said, looking at the resulting solution, the larger part of use cases won't be affected (which IMHO is a good compromise).

homeassistant/helpers/template.py

bdraco · 2023-04-12T18:10:56Z

I think we have a good path forward here. I'm happy with the result. Thanks everyone for the input and collaboration

depoll · 2023-04-12T18:25:02Z

Tried to update the PR description and text to match where we landed. It definitely resulted in a few things getting bundled together, but I think we landed in a good spot. Also updated the docs PR.

bdraco · 2023-04-12T18:29:26Z

Let's separate the text into proposed changes and breaking changes with the new functionality in the proposed changes section.

The breaking changes section should be limited to what an affected user needs to do to handle the change and under what conditions they would need to do it. It's what will appear in the release notes

depoll · 2023-04-12T18:58:16Z

Ok -- moved the text into the proposed change section and called out the breaking part explicitly in the Breaking Change section.

bdraco

Thanks @depoll

Manual testing looks good. (not that I expected it to work any other way than the test cases but always good to go)

frenck

Thanks, @depoll 👍

../Frenck

…n Jinja (home-assistant#91253) Co-authored-by: J. Nick Koston <nick@koston.org> Co-authored-by: epenet <6771947+epenet@users.noreply.github.com>

Add options to to_json

e5aa571

depoll requested a review from a team as a code owner April 11, 2023 23:15

home-assistant bot added cla-signed core new-feature small-pr PRs with less than 30 lines. labels Apr 11, 2023

bdraco reviewed Apr 11, 2023

View reviewed changes

Migrate to orjson by adding as_json

7fae8d1

depoll changed the title ~~Add options to to_json~~ Migrate to orjson for Jinja JSON filtering Apr 12, 2023

bdraco added breaking-change deprecation Indicates a breaking change to happen in the future labels Apr 12, 2023

bdraco reviewed Apr 12, 2023

View reviewed changes

homeassistant/helpers/template.py Outdated Show resolved Hide resolved

bdraco reviewed Apr 12, 2023

View reviewed changes

homeassistant/helpers/template.py Outdated Show resolved Hide resolved

bdraco added the docs-missing label Apr 12, 2023

Incorporate PR feedback

1e00441

bdraco reviewed Apr 12, 2023

View reviewed changes

homeassistant/helpers/template.py Outdated Show resolved Hide resolved

bdraco reviewed Apr 12, 2023

View reviewed changes

tests/helpers/test_template.py Outdated Show resolved Hide resolved

bdraco reviewed Apr 12, 2023

View reviewed changes

homeassistant/helpers/template.py Outdated Show resolved Hide resolved

Apply suggestions from code review

478f55e

Co-authored-by: J. Nick Koston <nick@koston.org>

depoll mentioned this pull request Apr 12, 2023

Add documentation for to_json options home-assistant/home-assistant.io#26956

Merged

8 tasks

Fixing lint errors.

8ad3755

depoll requested a review from bdraco April 12, 2023 03:48

bdraco removed docs-missing small-pr PRs with less than 30 lines. labels Apr 12, 2023

epenet reviewed Apr 12, 2023

View reviewed changes

homeassistant/helpers/template.py Outdated Show resolved Hide resolved

homeassistant/helpers/template.py Outdated Show resolved Hide resolved

Apply suggestions from code review

dac3a03

Co-authored-by: epenet <6771947+epenet@users.noreply.github.com>

frenck requested changes Apr 12, 2023

View reviewed changes

depoll marked this pull request as ready for review April 12, 2023 15:11

Fixing formatting issue.

3589c6e

Moves things back to to_json

f6e07a8

Fix broken test

cacf700

Bring back ensure_ascii as supported

27f2760

bdraco reviewed Apr 12, 2023

View reviewed changes

homeassistant/helpers/template.py Outdated Show resolved Hide resolved

ensure_ascii doesn't need to support None

8066791

depoll changed the title ~~Migrate to orjson for Jinja JSON filtering~~ Add pretty printing and key sorting to to_json Apr 12, 2023

depoll changed the title ~~Add pretty printing and key sorting to to_json~~ Add pretty printing, key sorting, and better performance to to_json in Jinja Apr 12, 2023

depoll requested a review from bdraco April 12, 2023 18:59

bdraco removed the deprecation Indicates a breaking change to happen in the future label Apr 12, 2023

bdraco approved these changes Apr 12, 2023

View reviewed changes

frenck approved these changes Apr 12, 2023

View reviewed changes

frenck merged commit ea12d7a into home-assistant:dev Apr 12, 2023
54 checks passed

github-actions bot locked and limited conversation to collaborators Apr 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add pretty printing, key sorting, and better performance to to_json in Jinja #91253

Add pretty printing, key sorting, and better performance to to_json in Jinja #91253

depoll commented Apr 11, 2023 •

edited

bdraco Apr 11, 2023

depoll Apr 11, 2023

depoll Apr 11, 2023

bdraco Apr 12, 2023

depoll Apr 12, 2023

depoll Apr 12, 2023

bdraco Apr 12, 2023 •

edited

bdraco Apr 12, 2023

depoll Apr 12, 2023

depoll Apr 12, 2023

depoll commented Apr 12, 2023

bdraco commented Apr 12, 2023 •

edited

depoll commented Apr 12, 2023

epenet left a comment

frenck left a comment •

edited

depoll commented Apr 12, 2023

frenck commented Apr 12, 2023

bdraco commented Apr 12, 2023 •

edited

depoll commented Apr 12, 2023

bdraco commented Apr 12, 2023 •

edited

bdraco commented Apr 12, 2023

bdraco commented Apr 12, 2023

depoll commented Apr 12, 2023

bdraco commented Apr 12, 2023

frenck commented Apr 12, 2023

bdraco commented Apr 12, 2023

depoll commented Apr 12, 2023

bdraco commented Apr 12, 2023 •

edited

depoll commented Apr 12, 2023

bdraco left a comment

frenck left a comment

Add pretty printing, key sorting, and better performance to to_json in Jinja #91253

Add pretty printing, key sorting, and better performance to to_json in Jinja #91253

Conversation

depoll commented Apr 11, 2023 • edited

Breaking change

Proposed change

Type of change

Additional information

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bdraco Apr 12, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

depoll commented Apr 12, 2023

bdraco commented Apr 12, 2023 • edited

depoll commented Apr 12, 2023

epenet left a comment

Choose a reason for hiding this comment

frenck left a comment • edited

Choose a reason for hiding this comment

depoll commented Apr 12, 2023

frenck commented Apr 12, 2023

bdraco commented Apr 12, 2023 • edited

depoll commented Apr 12, 2023

bdraco commented Apr 12, 2023 • edited

bdraco commented Apr 12, 2023

bdraco commented Apr 12, 2023

depoll commented Apr 12, 2023

bdraco commented Apr 12, 2023

frenck commented Apr 12, 2023

bdraco commented Apr 12, 2023

depoll commented Apr 12, 2023

bdraco commented Apr 12, 2023 • edited

depoll commented Apr 12, 2023

bdraco left a comment

Choose a reason for hiding this comment

frenck left a comment

Choose a reason for hiding this comment

depoll commented Apr 11, 2023 •

edited

bdraco Apr 12, 2023 •

edited

bdraco commented Apr 12, 2023 •

edited

frenck left a comment •

edited

bdraco commented Apr 12, 2023 •

edited

bdraco commented Apr 12, 2023 •

edited

bdraco commented Apr 12, 2023 •

edited