Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Reduce unnecessary whitespace in JSON #7372

Merged
merged 15 commits into from
Aug 7, 2020

Conversation

auscompgeek
Copy link
Contributor

@auscompgeek auscompgeek commented Apr 30, 2020

The json and simplejson modules when serialising default to separators such that the output is compatible with YAML 1.0. As there is no need for JSON responses from Matrix homeservers to be consumed by YAML 1.0/1.1 implementations, the whitespace this generates is redundant.

This PR reduces the amount of redundant whitespace that is stored and sent in responses, hopefully significantly reducing the resources required for Synapse in the long run.

This does not attempt to eliminate the unnecessary whitespace completely, nor does it introduce anything to rewrite data in existing databases to be compact, but rather this tries to find the biggest future wins for the average homeserver.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file.
  • Pull request includes a sign off
  • Code style is correct (run the linters)

Signed-off-by: David Vo <david@vovo.id.au>

@clokep clokep requested a review from a team April 30, 2020 11:09
@richvdh
Copy link
Member

richvdh commented Apr 30, 2020

my main concern here would be if it causes us to fall back to the pure-python implementation of json.dumps instead of the optimised C version.

@auscompgeek
Copy link
Contributor Author

Both json and simplejson look like they both have non-default separators support in their C extension modules.

Taking a look though, it seems dumps internally creates a new JSONEncoder instance if passed non-default options though. Probably worth creating our own JSONEncoder instance and calling encode on it?

@auscompgeek
Copy link
Contributor Author

Anyway, if falling back to the pure-Python implementation was a concern here, I would have some bad news about canonicaljson...

@richvdh
Copy link
Member

richvdh commented May 4, 2020

Probably worth creating our own JSONEncoder instance and calling encode on it?

yes, that sounds sensible.

Anyway, if falling back to the pure-Python implementation was a concern here, I would have some bad news about canonicaljson...

it is a concern, so please share your bad news. canonicaljson goes to some lengths to use the C implementation.

@auscompgeek
Copy link
Contributor Author

Probably worth creating our own JSONEncoder instance and calling encode on it?

yes, that sounds sensible.

What would be a good place for this to go? Seems like it could go in synapse.util but I can't see a good existing module to put it in, and it seems like overkill to create an entire new module for a single object.

canonicaljson goes to some lengths to use the C implementation.

That is an interesting interpretation of that comment, given that both paths are implemented in C.

@richvdh
Copy link
Member

richvdh commented May 11, 2020

What would be a good place for this to go?

I guess it could just go in synapse/util/__init__.py 🤷‍♂️

@richvdh
Copy link
Member

richvdh commented May 11, 2020

canonicaljson goes to some lengths to use the C implementation.

That is an interesting interpretation of that comment,

"that comment" being "if falling back to the pure-Python implementation was a concern here, I would have some bad news about canonicaljson..."?

My interpretation of that was that you believe that canonicaljson is falling back to pure-python. Looking again, it seems that my interpretation of your comment was wrong? Either way, I'd appreciate it if you could be clear rather than making me guess at your meaning.

given that both paths are implemented in C.

again, these links don't help me without spending 15 minutes staring at the context. Please: be clear what I'm supposed to understand here.

@richvdh richvdh added the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label May 13, 2020
@richvdh richvdh removed the request for review from a team May 14, 2020 16:03
@richvdh richvdh removed the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label Jun 5, 2020
@richvdh richvdh self-requested a review June 9, 2020 16:32
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies for sitting on this for so long. I've had another look at this and I agree, changing separators is just fine.

OTOH I do agree that creating a single JSONEncoder with the settings we want to use would be preferable to hardcoding these options at 50 locations throughout the code.

Are you still interested in working on this PR?

@richvdh richvdh added the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label Jun 10, 2020
@auscompgeek
Copy link
Contributor Author

I would, however I won't have an opportunity to work on this until next month. Happy for anyone to take over.

@richvdh
Copy link
Member

richvdh commented Jun 11, 2020

Thanks, we'll wait! :)

@auscompgeek
Copy link
Contributor Author

I suppose 0364266eb734cb5b99be3a8afc4ddcd6f32bf3a1 could be merged separately though. That should give the biggest storage wins.

@clokep
Copy link
Member

clokep commented Jun 16, 2020

I suppose 0364266 could be merged separately though. That should give the biggest storage wins.

@auscompgeek That makes sense to me, I don't think that would be a controversial change either.

@auscompgeek
Copy link
Contributor Author

Okay, I've rewritten this PR to directly use a JSONEncoder.

@richvdh richvdh removed the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label Jul 28, 2020
@richvdh richvdh requested a review from a team July 28, 2020 12:34
@clokep
Copy link
Member

clokep commented Jul 29, 2020

@auscompgeek Would you mind merging develop into this? I think with some of the changes made for #7674 it would be nice to have that in here for review to ensure there's no issues with compatibility.

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm otherwise, thanks!

synapse/util/__init__.py Show resolved Hide resolved
synapse/util/frozenutils.py Outdated Show resolved Hide resolved
@clokep
Copy link
Member

clokep commented Aug 6, 2020

@auscompgeek I left a couple of suggestions to improve the comments. Additionally it seems that this bitrotted. Hopefully git will mostly do the right thing when merging in develop! I think this looks good and should be a great change!

@clokep
Copy link
Member

clokep commented Aug 6, 2020

It looks like this could also replace the custom encoder in synapse/synapse/replication/tcp/commands.py, no reason not to reduce the traffic sent over replication!

auscompgeek and others added 2 commits August 7, 2020 16:12
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
@auscompgeek
Copy link
Contributor Author

auscompgeek commented Aug 7, 2020

Thanks for the comment suggestions :)

Additionally it seems that this bitrotted. Hopefully git will mostly do the right thing when merging in develop!

Yup, this PR seems to have a bit of a habit of bitrotting, thank goodness for git config merge.conflictstyle diff3 though.

replace the custom encoder in synapse/replication/tcp/commands.py

Good catch. It looks like this uses the stdlib json directly though, do we want to change that?


Looking around, I found another JSONEncoder as well in synapse.logging._terse_json:

_encoder = json.JSONEncoder(ensure_ascii=False, separators=(",", ":"))

That one uses both ensure_ascii=False and the stdlib json directly, so I won't touch that one here.

@clokep
Copy link
Member

clokep commented Aug 7, 2020

replace the custom encoder in synapse/replication/tcp/commands.py

Good catch. It looks like this uses the stdlib json directly though, do we want to change that?

I think changing this one will be fine.

I agree that leaving the synapse.logging._terse_json one alone makes sense.

@clokep clokep dismissed richvdh’s stale review August 7, 2020 11:45

richvdh's request for an additional comment was handled.

@clokep clokep merged commit 4dd27e6 into matrix-org:develop Aug 7, 2020
@clokep
Copy link
Member

clokep commented Aug 7, 2020

Thanks for sticking with us on this one! 👍

@auscompgeek auscompgeek deleted the squish-json branch August 7, 2020 12:04
reivilibre added a commit that referenced this pull request Aug 13, 2020
Synapse 1.19.0rc1 (2020-08-13)
==============================

Removal warning
---------------

As outlined in the [previous release](https://github.com/matrix-org/synapse/releases/tag/v1.18.0), we are no longer publishing Docker images with the `-py3` tag suffix. On top of that, we have also removed the `latest-py3` tag. Please see [the announcement in the upgrade notes for 1.18.0](https://github.com/matrix-org/synapse/blob/develop/UPGRADE.rst#upgrading-to-v1180).

Features
--------

- Add option to allow server admins to join rooms which fail complexity checks. Contributed by @lugino-emeritus. ([\#7902](#7902))
- Add an option to purge room or not with delete room admin endpoint (`POST /_synapse/admin/v1/rooms/<room_id>/delete`). Contributed by @dklimpel. ([\#7964](#7964))
- Add rate limiting to users joining rooms. ([\#8008](#8008))
- Add a `/health` endpoint to every configured HTTP listener that can be used as a health check endpoint by load balancers. ([\#8048](#8048))
- Allow login to be blocked based on the values of SAML attributes. ([\#8052](#8052))
- Allow guest access to the `GET /_matrix/client/r0/rooms/{room_id}/members` endpoint, according to MSC2689. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#7314](#7314))

Bugfixes
--------

- Fix a bug introduced in Synapse v1.7.2 which caused inaccurate membership counts in the room directory. ([\#7977](#7977))
- Fix a long standing bug: 'Duplicate key value violates unique constraint "event_relations_id"' when message retention is configured. ([\#7978](#7978))
- Fix "no create event in auth events" when trying to reject invitation after inviter leaves. Bug introduced in Synapse v1.10.0. ([\#7980](#7980))
- Fix various comments and minor discrepencies in server notices code. ([\#7996](#7996))
- Fix a long standing bug where HTTP HEAD requests resulted in a 400 error. ([\#7999](#7999))
- Fix a long-standing bug which caused two copies of some log lines to be written when synctl was used along with a MemoryHandler logger. ([\#8011](#8011), [\#8012](#8012))

Updates to the Docker image
---------------------------

- We no longer publish Docker images with the `-py3` tag suffix, as [announced in the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/UPGRADE.rst#upgrading-to-v1180). ([\#8056](#8056))

Improved Documentation
----------------------

- Document how to set up a client .well-known file and fix several pieces of outdated documentation. ([\#7899](#7899))
- Improve workers docs. ([\#7990](#7990), [\#8000](#8000))
- Fix typo in `docs/workers.md`. ([\#7992](#7992))
- Add documentation for how to undo a room shutdown. ([\#7998](#7998), [\#8010](#8010))

Internal Changes
----------------

- Reduce the amount of whitespace in JSON stored and sent in responses. Contributed by David Vo. ([\#7372](#7372))
- Switch to the JSON implementation from the standard library and bump the minimum version of the canonicaljson library to 1.2.0. ([\#7936](#7936), [\#7979](#7979))
- Convert various parts of the codebase to async/await. ([\#7947](#7947), [\#7948](#7948), [\#7949](#7949), [\#7951](#7951), [\#7963](#7963), [\#7973](#7973), [\#7975](#7975), [\#7976](#7976), [\#7981](#7981), [\#7987](#7987), [\#7989](#7989), [\#8003](#8003), [\#8014](#8014), [\#8016](#8016), [\#8027](#8027), [\#8031](#8031), [\#8032](#8032), [\#8035](#8035), [\#8042](#8042), [\#8044](#8044), [\#8045](#8045), [\#8061](#8061), [\#8062](#8062), [\#8063](#8063), [\#8066](#8066), [\#8069](#8069), [\#8070](#8070))
- Move some database-related log lines from the default logger to the database/transaction loggers. ([\#7952](#7952))
- Add a script to detect source code files using non-unix line terminators. ([\#7965](#7965), [\#7970](#7970))
- Log the SAML session ID during creation. ([\#7971](#7971))
- Implement new experimental push rules for some users. ([\#7997](#7997))
- Remove redundant and unreliable signature check for v1 Identity Service lookup responses. ([\#8001](#8001))
- Improve the performance of the register endpoint. ([\#8009](#8009))
- Reduce less useful output in the newsfragment CI step. Add a link to the changelog section of the contributing guide on error. ([\#8024](#8024))
- Rename storage layer objects to be more sensible. ([\#8033](#8033))
- Change the default log config to reduce disk I/O and storage for new servers. ([\#8040](#8040))
- Add an assertion on `prev_events` in `create_new_client_event`. ([\#8041](#8041))
- Add a comment to `ServerContextFactory` about the use of `SSLv23_METHOD`. ([\#8043](#8043))
- Log `OPTIONS` requests at `DEBUG` rather than `INFO` level to reduce amount logged at `INFO`. ([\#8049](#8049))
- Reduce amount of outbound request logging at `INFO` level. ([\#8050](#8050))
- It is no longer necessary to explicitly define `filters` in the logging configuration. (Continuing to do so is redundant but harmless.) ([\#8051](#8051))
- Add and improve type hints. ([\#8058](#8058), [\#8064](#8064), [\#8060](#8060), [\#8067](#8067))
babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit '4dd27e6d1':
  Reduce unnecessary whitespace in JSON. (#7372)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants