S'More speed for Marshmallow

🔥toastedmarshmallow🔥: Makes Marshmallow Toasty Fast

Toasted Marshmallow implements a JIT for marshmallow that speeds up dumping objects 10-25X (depending on your schema). Toasted Marshmallow allows you to have the great API that Marshmallow provides without having to sacrifice performance!

Benchmark Result:
  Original Time: 2682.61 usec/dump
  Optimized Time: 176.38 usec/dump
  Optimized (Cython) Time: 125.77 usec/dump
  Speed up: 15.21x
  Cython Speed up: 21.33x

Even PyPy benefits from toastedmarshmallow!

Benchmark Result:
    Original Time: 189.78 usec/dump
    Optimized Time: 20.03 usec/dump
    Speed up: 9.48x

Installing toastedmarshmallow

pip install toastedmarshmallow

This also installs a slightly forked marshmallow that includes the hooks Toasted Marshmallow needs to run the JIT before falling back to the original marshmallow code. The changes are minimal, which makes it easier to track upstream. You can find the changes here.

This means you should remove marshmallow from your requirements and replace it with toastedmarshmallow. By default there is no difference unless you explicitly enable Toasted Marshmallow.

Enabling Toasted Marshmallow

Enabling Toasted Marshmallow on an existing Schema takes one line of code: set the jit property on any Schema instance to toastedmarshmallow.Jit. For example:

from datetime import date
import toastedmarshmallow
from marshmallow import Schema, fields, pprint

class ArtistSchema(Schema):
    name = fields.Str()

class AlbumSchema(Schema):
    title = fields.Str()
    release_date = fields.Date()
    artist = fields.Nested(ArtistSchema())

schema = AlbumSchema()
# Specify the jit method as toastedmarshmallow's jit
schema.jit = toastedmarshmallow.Jit
# And that's it!  Your dump methods are 15x faster!

It's also possible to use the Meta class on the Marshmallow schema to specify all instances of a given Schema should be optimized:

import toastedmarshmallow
from marshmallow import Schema, fields, pprint

class ArtistSchema(Schema):
    class Meta:
        jit = toastedmarshmallow.Jit
    name = fields.Str()

You can also enable Toasted Marshmallow globally by setting the environment variable MARSHMALLOW_SCHEMA_DEFAULT_JIT to toastedmarshmallow.Jit. Future versions of Toasted Marshmallow may make this the default.
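If you would rather set the variable from Python than from the shell, a minimal sketch looks like the following. It assumes the variable is read when marshmallow sets up a schema, so it sets the value before any marshmallow import; that ordering is an assumption, not something this README states.

```python
import os

# Assumption: the forked marshmallow reads this variable when schemas are
# set up, so assign it before importing marshmallow or building schemas.
os.environ["MARSHMALLOW_SCHEMA_DEFAULT_JIT"] = "toastedmarshmallow.Jit"

# Read it back to confirm what schemas created afterwards would see.
jit_setting = os.environ["MARSHMALLOW_SCHEMA_DEFAULT_JIT"]
```

Exporting the variable in the process environment before starting Python avoids any import-order concerns.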

How it works

Toasted Marshmallow works by generating code at runtime to optimize dumping objects without going through layers and layers of reflection. The generated code optimistically assumes the objects being passed in are schematically valid, falling back to the original marshmallow code on failure.
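The idea can be illustrated with a small standard-library sketch. This is not Toasted Marshmallow's actual generator: build_instance_serializer, slow_generic_dump, and make_dump are hypothetical names, and the sketch only handles plain attribute reads with field names that are valid Python identifiers. It generates a function with one inlined line per field, compiles it with exec, and falls back to a generic path when the optimistic assumption fails.

```python
def build_instance_serializer(field_names):
    """Generate and compile a serializer with one inlined line per field."""
    lines = ["def instance_serializer(obj):", "    res = {}"]
    for name in field_names:
        # Each field becomes a direct attribute read, with no reflection loop.
        lines.append(f"    res[{name!r}] = obj.{name}")
    lines.append("    return res")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["instance_serializer"]


def slow_generic_dump(obj, field_names):
    """Stand-in for the original, reflective (non-JIT) code path."""
    if isinstance(obj, dict):
        return {name: obj[name] for name in field_names if name in obj}
    return {name: getattr(obj, name) for name in field_names if hasattr(obj, name)}


def make_dump(field_names):
    fast = build_instance_serializer(field_names)

    def dump(obj):
        try:
            # Optimistic path: assume obj has every attribute.
            return fast(obj)
        except AttributeError:
            # The assumption failed, so use the slower generic path.
            return slow_generic_dump(obj, field_names)

    return dump


class Album:
    def __init__(self, title, artist):
        self.title = title
        self.artist = artist


dump = make_dump(["title", "artist"])
from_object = dump(Album("Hunky Dory", "David Bowie"))
from_dict = dump({"title": "Hunky Dory"})  # triggers the fallback
```

The generated function is built once per field list and reused for every call, which is where the speedup in this toy version comes from.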

For example, taking AlbumSchema from above, Toastedmarshmallow will generate the following 3 methods:

def InstanceSerializer(obj):
    res = {}
    value = obj.release_date; value = value() if callable(value) else value; res["release_date"] = _field_release_date__serialize(value, "release_date", obj)
    value = obj.artist; value = value() if callable(value) else value; res["artist"] = _field_artist__serialize(value, "artist", obj)
    value = obj.title; value = value() if callable(value) else value; value = str(value) if value is not None else None; res["title"] = value
    return res

def DictSerializer(obj):
    res = {}
    if "release_date" in obj:
        value = obj["release_date"]; value = value() if callable(value) else value; res["release_date"] = _field_release_date__serialize(value, "release_date", obj)
    if "artist" in obj:
        value = obj["artist"]; value = value() if callable(value) else value; res["artist"] = _field_artist__serialize(value, "artist", obj)
    if "title" in obj:
        value = obj["title"]; value = value() if callable(value) else value; value = str(value) if value is not None else None; res["title"] = value
    return res

def HybridSerializer(obj):
    res = {}
    try:
        value = obj["release_date"]
    except (KeyError, AttributeError, IndexError, TypeError):
        value = obj.release_date
    value = value; value = value() if callable(value) else value; res["release_date"] = _field_release_date__serialize(value, "release_date", obj)
    try:
        value = obj["artist"]
    except (KeyError, AttributeError, IndexError, TypeError):
        value = obj.artist
    value = value; value = value() if callable(value) else value; res["artist"] = _field_artist__serialize(value, "artist", obj)
    try:
        value = obj["title"]
    except (KeyError, AttributeError, IndexError, TypeError):
        value = obj.title
    value = value; value = value() if callable(value) else value; value = str(value) if value is not None else None; res["title"] = value
    return res

Toastedmarshmallow will invoke the proper serializer based upon the input.
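One plausible way to choose among the three serializers is to inspect the input type. The sketch below is an assumption for illustration only: pick_serializer is a hypothetical helper, the serializers are trimmed to a single title field, and the README does not describe the library's real selection logic.

```python
def instance_serializer(obj):
    return {"title": obj.title}


def dict_serializer(obj):
    res = {}
    if "title" in obj:
        res["title"] = obj["title"]
    return res


def hybrid_serializer(obj):
    try:
        value = obj["title"]
    except (KeyError, AttributeError, IndexError, TypeError):
        value = obj.title
    return {"title": value}


def pick_serializer(obj):
    """Hypothetical dispatch: choose a serializer from the input's shape."""
    if isinstance(obj, dict):
        return dict_serializer
    if hasattr(obj, "__getitem__"):
        # Supports indexing but is not a dict, so try keys, then attributes.
        return hybrid_serializer
    return instance_serializer


class Album:
    def __init__(self, title):
        self.title = title


class Row:
    """Indexable object that also exposes attributes."""

    def __init__(self, title):
        self.title = title

    def __getitem__(self, key):
        raise KeyError(key)  # forces the attribute fallback


results = [pick_serializer(o)(o) for o in (Album("Low"), {"title": "Low"}, Row("Low"))]
```

All three inputs produce the same output dictionary; only the access strategy differs.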

Since Toastedmarshmallow is generating code at runtime, it's critical you re-use Schema objects. If you're creating a new Schema object every time you serialize/deserialize an object you'll likely have much worse performance.
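To see why re-use matters, here is a toy model that counts compilations. ToySchema is a hypothetical stand-in, not the real Schema class; it assumes generated code is cached per schema instance, which is consistent with the advice above but not confirmed by this README.

```python
class ToySchema:
    """Hypothetical stand-in for a JIT-enabled schema; not the real API."""

    compilations = 0  # counts how often code generation runs

    def __init__(self, field_names):
        self.field_names = field_names
        self._serializer = None

    def dump(self, obj):
        if self._serializer is None:
            # The expensive step: generate and compile code on first use.
            ToySchema.compilations += 1
            body = ", ".join(f"{n!r}: obj[{n!r}]" for n in self.field_names)
            self._serializer = eval("lambda obj: {" + body + "}")
        return self._serializer(obj)


albums = [{"title": "Low"}, {"title": "Heroes"}, {"title": "Lodger"}]

# Anti-pattern: a new schema per object recompiles every time.
for album in albums:
    ToySchema(["title"]).dump(album)
per_call_compilations = ToySchema.compilations

# Preferred: build the schema once and share it.
ToySchema.compilations = 0
ALBUM_SCHEMA = ToySchema(["title"])
results = [ALBUM_SCHEMA.dump(album) for album in albums]
shared_compilations = ToySchema.compilations
```

In this model the per-object pattern compiles once per dump, while the shared schema compiles only on its first dump. Keeping schemas at module level or in a long-lived registry gives the same effect in real code.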

⚡️🔬 Experimental 🔬⚡️

Toastedmarshmallow also has an experimental Cython-based JIT. It takes the generated code above and runs it through Cython first, for roughly another 1.4x speedup in the benchmark above. Generally the generated Python code is fast enough, but this is a useful option when you need to squeeze out every last bit of performance.

To use the Cython jit, replace Jit with CythonJit:

schema.jit = toastedmarshmallow.CythonJit