Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting alternative msgpack implementations #15

Closed
faresbakhit opened this issue Aug 30, 2021 · 11 comments
Closed

Supporting alternative msgpack implementations #15

faresbakhit opened this issue Aug 30, 2021 · 11 comments
Labels
enhancement New feature or request

Comments

@faresbakhit
Copy link
Contributor

ormsgpack is a faster alternative to python-msgpack and since speed is critical/required in Web/ASGI application I think it would be nice to have support for using ormsgpack!

My current idea is to provide ormsgpack as an extra/optional dependency to msgpack-asgi (installed like pip install msgpack-asgi[ormsgpack]?) and provide two more classes to the public API ORMessagePackMiddleware and ORMessagePackResponse if ormsgpack can be imported.

I will happily create a PR and start working on this feature if this is something can be added to the project! 😄

@florimondmanca
Copy link
Owner

florimondmanca commented Sep 19, 2021

Hi @FaresAhmedb,

Thanks for opening this! I'm actually quite curious about measuring performance improvements we'd get from this extra alternative as well. In the past this library was marked as limited by some commentators because it saves bandwidth (thanks to msgpack) by inducing a greater burden on the CPU due to msgpack-JSON serde, which some people found too big in some situations (I'd need to dig up those discussions tho). So a "faster on the CPU" option would definitely be interesting to explore.

I think a pip extra sounds fair, but that and the wording actually depends on a broader choice:

There's a few options on dependency detection strategies we can take, with different trade offs and assumptions. I'll list them below. What do you think is most appropriate given your knowledge of the ormsgpack library?

Options:

  • Implicit: check whether ormsgpack is installed, and if so, use that instead of the pure Python library. Simple and transparent. There's a risk of using the wrong library, but given the stakes that might not actually be a problem? Eg if there's a platform compatibility requirement, then surely if the user managed to install the library, we can go ahead and use it. :)
  • Explicit: require users to toggle an explicit flag somewhere. I think that's the option you suggest with a separate middleware class, although there are other implementation possibilities (library flag, environment variable, etc). Would be appropriate if the alternative is somewhat experimental. But if ormsgpack is stable and well that might be unnecessary?
  • Direct change: switch to using ormsgpack altogether, drop msgpack-python. Only works if there's absolutely no compatibility issue possible compared to the pure Python version. We'd probably bump major versions to mark the change, but the public API wouldn't change at all.

@faresbakhit
Copy link
Contributor Author

I'm actually quite curious about measuring performance improvements we'd get from this extra alternative as well.

The ormsgpack README says that it serializes faster than msgpack-python and deserializes a bit slower (Just tested it. msgpack-python deserializes much faster) so I think a performance boost would only be noticed in responses. Maybe we can make a special middleware which benefits from both worlds?

There's a few options on dependency detection strategies we can take, with different trade offs and assumptions. I'll list them below. What do you think is most appropriate given your knowledge of the ormsgpack library?

I'm in favor of the Explicit option; It seems to not bring any issues a long the way and perfectly follows python's philosophy...

Explicit is better than implicit.
- The Zen of Python

This is the problems I think the other options could bring:

Implicit: ...

Assuming the user wants to use ormsgpack can be the undesired behavior for some users but it will be as easy as doing this:

try:
    import ormsgpack as msgpack
except ImportError:
    import msgpack

Direct change: ...

This would mean dropping support for PyPy (also Python2 but won't be the case since it doesn't support asgi/async anyway).

@florimondmanca
Copy link
Owner

florimondmanca commented Sep 20, 2021

@FaresAhmedb Thanks, that's useful info.

Actually, I assume most users would be in a "I'm fine-tuning performance" mindset, which means writing some custom code would be acceptable — if not appropriate. So I wonder about another option: a public "backend" interface.

Like this:

class Packer(Protocol):
    def packb(self, obj: Any) -> bytes: ...
    def unpackb(self, data: bytes) -> Any: ...

We'd add a MsgPacker for our current behavior, and advanced users could do all sorts of customized things, e.g. what you suggested, ormsgpack for serialization vs msgpack for deserialization:

import ormsgpack
import msgpack

class OptimizedPacker:
    def packb(self, obj: Any) -> bytes:
        option = ormsgpack.OPT_NAIVE_UTC | ormsgpack.OPT_SERIALIZE_NUMPY
        return ormsgpack.packb(obj, option=option)

    def unpackb(self, data: bytes) -> Any:
        return msgpack.unpackb(data)

Usage:

from msgpack_asgi import MessagePackMiddleware

app = MessagePackMiddleware(..., packer=OptimizedPacker())

Or maybe, without classes, using packb=... and unpackb=... options (I think I prefer this style):

import ormsgpack
from typing import Any
from msgpack_asgi import MessagePackMiddleware

def packb(obj: Any) -> bytes:
    option = ormsgpack.OPT_NAIVE_UTC | ormsgpack.OPT_SERIALIZE_NUMPY
    return ormsgpack.packb(obj, option=option)

app = MessagePackMiddleware(..., packb=packb)  # Default `unpackb=partial(msgpack.unpackb, raw=False)`

Thoughts?

@florimondmanca florimondmanca added the enhancement New feature or request label Sep 20, 2021
@florimondmanca florimondmanca changed the title Add support for ormsgpack Alternative msgpack implementations Sep 20, 2021
@florimondmanca florimondmanca changed the title Alternative msgpack implementations Supporting alternative msgpack implementations Sep 20, 2021
@faresbakhit
Copy link
Contributor Author

Very interesting! I like the latter implementation. Supporting ormsgpack directly into the library wouldn't be needed then. There's also a lot of MessagePack flavors out there like u-msgpack-python (A Pure Python implementation) so that would be definitely a win win for everyone.

But I assume this would be a breaking change, Right?

@florimondmanca
Copy link
Owner

@FaresAhmedb Nope, as far as I can tell we can make this change in a purely additive manner (merely two new optional arguments, packb=... + unpackb=...), provided we ensure the existing behavior remains in the default case. Right?

@florimondmanca
Copy link
Owner

@FaresAhmedb I think we're settled on the solution, then. :-) At this point anyone up for it can give it a try!

@faresbakhit
Copy link
Contributor Author

@florimondmanca Sorry for the late reply, I was busy.

I will open a PR now! 😃

@faresbakhit
Copy link
Contributor Author

@florimondmanca Just opened a pull request at #20 😀

@jcrist
Copy link

jcrist commented Sep 24, 2021

Just stumbled upon this package (from the fast-api docs). I have no stake in this issue, but y'all may be interested in trying out msgspec (https://github.com/jcrist/msgspec), a faster msgpack library that I maintain. I've benchmarked it against both msgpack-python and ormsgpack, and it's generally faster than either of these (up to 4x in some cases) for both encode or decode. That said, since you're already decoding and re-encoding (which comes at a cost), this may not really matter in the scheme of things.

The packb kwarg mentioned above makes sense I think, as it allows different encoding libraries to be used transparently, so no real action item here, just thought I'd bring it up.

@florimondmanca
Copy link
Owner

florimondmanca commented Sep 26, 2021

Thanks @jcrist, msgspec looks fab. :-) Thanks to #20 (still needs documenting: #21) which got merged recently, the next version of msgpack-asgi will support using msgspec like this, I think?

import msgpec

_enc = msgpec.Encoder()
_dec = msgpec.Decoder()

app = MessagePackMiddleware(..., packb=_enc.encode, unpackb=_dec.decode)

msgspec supports a wide range of type validations which I assume people would be able to support as well, even supporting schema evolution as necessary, all within packb/unpackb. Sounds correct?

If this is all good, that's an interesting case to validate the callable-based design from #20. :-)

@florimondmanca
Copy link
Owner

florimondmanca commented Sep 26, 2021

I'll close this off since it got resolved in code by #20, we still have #21 for tracking the documentation bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants