Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve decoder performance on SQL call #281

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
15 changes: 9 additions & 6 deletions app/lib/sqlalchemy_plugin.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,20 +61,23 @@ def _default(val: Any) -> str:

@event.listens_for(engine.sync_engine, "connect")
def _sqla_on_connect(dbapi_connection: Any, _: Any) -> Any:
"""Using orjson for serialization of the json column values means that the
"""Using msgspec for serialization of the json column values means that the
output is binary, not `str` like `json.dumps` would output.

SQLAlchemy expects that the json serializer returns `str` and calls
`.encode()` on the value to turn it to bytes before writing to the
JSONB column. I'd need to either wrap `orjson.dumps` to return a
`str` so that SQLAlchemy could then convert it to binary, or do the
following, which changes the behaviour of the dialect to expect a
binary value from the serializer.
JSONB column. I'd need to do the following, which changes the behaviour
of the dialect to expect a binary value from the serializer.

See Also:
https://github.com/sqlalchemy/sqlalchemy/blob/14bfbadfdf9260a1c40f63b31641b27fe9de12a0/lib/sqlalchemy/dialects/postgresql/asyncpg.py#L934
"""

# Create Decoder once and reuse it for all encoding calls,
# to avoid paying setup cost for every call. More info:
# https://jcristharif.com/msgspec/perf-tips.html#reuse-encoders-decoders
_decoder = msgspec.json.Decoder()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi - msgspec author here.

The main benefit of creating and reusing a decoder is avoiding processing extra config kwargs on every call to decode, especially the type kwarg. Since you're not passing in any keyword arguments here, using the msgspec.json.decode convenience method should be comparable in performance. No harm in the change you've made here, but also likely no measurable benefit. I'll update our docs page to better clarify when you may want to preallocate a decoder.

Copy link
Author

@Olegt0rr Olegt0rr Aug 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying!

The encoder example and the phrase "The same goes for decoding" disoriented me. I understood this in such a way that resources are spent on creating the object itself, so in my projects I created both an encoder and a decoder - then reuse both. And where I know the model in advance, I create an additional decoder.

P.S.: thanks for great product!


def encoder(bin_value: bytes) -> bytes:
# \x01 is the prefix for jsonb used by PostgreSQL.
# asyncpg requires it when format='binary'
Expand All @@ -83,7 +86,7 @@ def encoder(bin_value: bytes) -> bytes:
def decoder(bin_value: bytes) -> Any:
# the byte is the \x01 prefix for jsonb used by PostgreSQL.
# asyncpg returns it when format='binary'
return msgspec.json.decode(bin_value[1:])
return _decoder.decode(bin_value[1:])

dbapi_connection.await_(
dbapi_connection.driver_connection.set_type_codec(
Expand Down
2 changes: 1 addition & 1 deletion app/lib/worker.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
class Queue(saq.Queue):
"""[SAQ Queue](https://github.com/tobymao/saq/blob/master/saq/queue.py)

Configures `orjson` for JSON serialization/deserialization if not otherwise configured.
Configures `msgspec` for JSON serialization/deserialization if not otherwise configured.

Parameters
----------
Expand Down