mashumaro is a fast and well-tested serialization framework on top of dataclasses.
When using dataclasses, you often need to dump and load objects according to the described schema. This framework not only adds the ability to serialize to different formats, but also makes serialization fast.
- Installation
- Supported serialization formats
- Supported field types
- Usage example
- How does it work?
- Benchmark
- API
- Customization
Use pip to install:
$ pip install mashumaro
This framework adds methods for dumping to and loading from the following formats: plain dict, JSON, YAML and MessagePack.
Plain dict can be useful when you need to pass a dict object to a third-party library, such as a client for MongoDB.
There is support for generic types from the standard typing module:
for special primitives from the typing module:
for enumerations based on classes from the standard enum module:
for common built-in types:
for built-in datetime oriented types (see more details):
for pathlike types:
for other less popular built-in types:
uuid.UUID
decimal.Decimal
fractions.Fraction
ipaddress.IPv4Address
ipaddress.IPv6Address
ipaddress.IPv4Network
ipaddress.IPv6Network
ipaddress.IPv4Interface
ipaddress.IPv6Interface
for specific types like:
from enum import Enum
from typing import Set
from dataclasses import dataclass
from mashumaro import DataClassJSONMixin

class PetType(Enum):
    CAT = 'CAT'
    MOUSE = 'MOUSE'

@dataclass(unsafe_hash=True)
class Pet(DataClassJSONMixin):
    name: str
    age: int
    pet_type: PetType

@dataclass
class Person(DataClassJSONMixin):
    first_name: str
    second_name: str
    age: int
    pets: Set[Pet]

tom = Pet(name='Tom', age=5, pet_type=PetType.CAT)
jerry = Pet(name='Jerry', age=3, pet_type=PetType.MOUSE)
john = Person(first_name='John', second_name='Smith', age=18, pets={tom, jerry})

dump = john.to_json()
person = Person.from_json(dump)
# person == john

Pet.from_json('{"name": "Tom", "age": 5, "pet_type": "CAT"}')
# Pet(name='Tom', age=5, pet_type=<PetType.CAT: 'CAT'>)
This framework works by taking the schema of the data and generating a specific parser and builder for exactly that schema. This is much faster than inspection of field types on every call of parsing or building at runtime.
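The idea of generating a schema-specific parser can be illustrated with a small standard-library sketch. This is not mashumaro's actual generator: `compile_from_dict` is a hypothetical helper that inspects the dataclass fields once, builds the source of a specialised loader, and compiles it with `exec`, so no field inspection happens on later calls.

```python
from dataclasses import dataclass, fields
from enum import Enum


class PetType(Enum):
    CAT = "CAT"
    MOUSE = "MOUSE"


@dataclass
class Pet:
    name: str
    age: int
    pet_type: PetType


def compile_from_dict(cls):
    """Generate a loader specialised for ``cls`` (hypothetical helper)."""
    namespace = {"cls": cls}
    args = []
    for f in fields(cls):
        if isinstance(f.type, type) and issubclass(f.type, Enum):
            # Enum fields are rebuilt from their values.
            namespace[f"enum_{f.name}"] = f.type
            args.append(f"{f.name}=enum_{f.name}(d[{f.name!r}])")
        else:
            # Everything else is passed through unchanged in this sketch.
            args.append(f"{f.name}=d[{f.name!r}]")
    source = "def from_dict(d):\n    return cls(" + ", ".join(args) + ")\n"
    exec(source, namespace)  # the field types are inspected only once, here
    return namespace["from_dict"], source


pet_from_dict, generated_source = compile_from_dict(Pet)
print(generated_source)
pet = pet_from_dict({"name": "Tom", "age": 5, "pet_type": "CAT"})
```

The printed source contains one straight-line `return cls(...)` expression, which is the kind of code a per-schema generator can produce instead of looping over field types on each call.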
- macOS 11.5.2 Big Sur
- Apple M1
- 16GB RAM
- Python 3.9.1
Load and dump sample data 1,000 times in 5 runs. The following figures show the best overall time in each case.
Framework | From dict: time | From dict: slowdown factor | To dict: time | To dict: slowdown factor |
---|---|---|---|---|
mashumaro | 0.04096 | 1x | 0.02741 | 1x |
cattrs | 0.07307 | 1.78x | 0.05062 | 1.85x |
pydantic | 0.24847 | 6.07x | 0.12292 | 4.48x |
marshmallow | 0.29205 | 7.13x | 0.09310 | 3.4x |
dataclasses | — | — | 0.22583 | 8.24x |
dacite | 0.91553 | 22.35x | — | — |
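The measurement scheme described above (1,000 calls per run, 5 runs, best run kept) and the slowdown factor can be sketched with the standard `timeit` module. Whether the project's benchmark script uses `timeit` is an assumption; `best_time`, `slowdown` and the `Sample` class are hypothetical names, and `dataclasses.asdict` merely stands in for the function being measured.

```python
import timeit
from dataclasses import asdict, dataclass


@dataclass
class Sample:
    name: str
    age: int


def best_time(func, number=1000, repeat=5):
    """Call ``func`` ``number`` times per run and return the fastest run."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


def slowdown(time, fastest):
    """Slowdown factor relative to the fastest framework, rounded to 2 places."""
    return round(time / fastest, 2)


sample = Sample(name="Tom", age=5)
elapsed = best_time(lambda: asdict(sample))

# Figures taken from the "From dict" column of the table above.
cattrs_factor = slowdown(0.07307, 0.04096)
pydantic_factor = slowdown(0.24847, 0.04096)
print(cattrs_factor, pydantic_factor)  # 1.78 6.07
```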
To run the benchmark in your environment:
git clone git@github.com:Fatal1ty/mashumaro.git
cd mashumaro
python3 -m venv env && source env/bin/activate
pip install -e .
pip install -r requirements-dev.txt
python benchmark/run.py
Mashumaro provides a couple of mixins for each format.
The to_dict method of DataClassDictMixin makes a dictionary from a dataclass object based on the dataclass schema provided. Options include:
use_bytes: False # False - convert bytes/bytearray objects to base64 encoded string, True - keep untouched
use_enum: False # False - convert enum objects to enum values, True - keep untouched
use_datetime: False # False - convert datetime oriented objects to ISO 8601 formatted string, True - keep untouched
The from_dict class method of DataClassDictMixin makes a new object from a dict object based on the dataclass schema provided. Options include:
use_bytes: False # False - load bytes/bytearray objects from base64 encoded string, True - keep untouched
use_enum: False # False - load enum objects from enum values, True - keep untouched
use_datetime: False # False - load datetime oriented objects from ISO 8601 formatted string, True - keep untouched
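What the default (False) settings mean for each kind of value can be shown with the standard library alone. This is only an illustration of the three representations named above (base64 text, enum value, ISO 8601 string); it does not call mashumaro, and the library's exact encoding details may differ.

```python
import base64
from datetime import datetime
from enum import Enum


class PetType(Enum):
    CAT = "CAT"


# bytes <-> base64 encoded string
raw = b"\x00\x01binary"
encoded = base64.b64encode(raw).decode()
decoded = base64.b64decode(encoded)

# enum object <-> enum value
enum_value = PetType.CAT.value
restored_enum = PetType(enum_value)

# datetime <-> ISO 8601 formatted string
moment = datetime(2019, 1, 1, 12)
iso = moment.isoformat()
restored_moment = datetime.fromisoformat(iso)
```

With the corresponding option set to True, the value would instead be left as a bytes, enum or datetime object in the resulting dictionary.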
DataClassJSONMixin.to_json(encoder: Optional[Encoder], dict_params: Optional[Mapping], **encoder_kwargs)
Make a JSON formatted string from a dataclass object based on the dataclass schema provided. Options include:
encoder # function called for json encoding, defaults to json.dumps
dict_params # dictionary of parameter values passed under the hood to `to_dict` function
encoder_kwargs # keyword arguments for encoder function
DataClassJSONMixin.from_json(data: Union[str, bytes, bytearray], decoder: Optional[Decoder], dict_params: Optional[Mapping], **decoder_kwargs)
Make a new object from JSON formatted string based on the dataclass schema provided. Options include:
decoder # function called for json decoding, defaults to json.loads
dict_params # dictionary of parameter values passed under the hood to `from_dict` function
decoder_kwargs # keyword arguments for decoder function
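As a rough mental model of how `encoder`, `dict_params` and `encoder_kwargs` fit together, here is a hand-written stand-in that does not use mashumaro. The `Pet` class, its `omit_none` parameter and both method bodies are assumptions made for illustration; the real mixin generates its methods and may be implemented differently.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Pet:
    name: str
    age: Optional[int] = None

    def to_dict(self, omit_none=False):
        # Stand-in for a generated to_dict with one illustrative option.
        d = asdict(self)
        if omit_none:
            d = {k: v for k, v in d.items() if v is not None}
        return d

    def to_json(self, encoder=json.dumps, dict_params=None, **encoder_kwargs):
        # dict_params go to to_dict, the remaining keywords go to the encoder.
        return encoder(self.to_dict(**dict(dict_params or {})), **encoder_kwargs)


full = Pet("Tom").to_json(sort_keys=True)
compact = Pet("Tom").to_json(dict_params={"omit_none": True}, sort_keys=True)
print(full)     # {"age": null, "name": "Tom"}
print(compact)  # {"name": "Tom"}
```

Keeping the two groups of arguments separate is what lets any callable with a `json.dumps`-like signature be used as the encoder.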
DataClassMessagePackMixin.to_msgpack(encoder: Optional[Encoder], dict_params: Optional[Mapping], **encoder_kwargs)
Make a MessagePack formatted bytes object from a dataclass object based on the dataclass schema provided. Options include:
encoder # function called for MessagePack encoding, defaults to msgpack.packb
dict_params # dictionary of parameter values passed under the hood to `to_dict` function
encoder_kwargs # keyword arguments for encoder function
DataClassMessagePackMixin.from_msgpack(data: Union[str, bytes, bytearray], decoder: Optional[Decoder], dict_params: Optional[Mapping], **decoder_kwargs)
Make a new object from MessagePack formatted data based on the dataclass schema provided. Options include:
decoder # function called for MessagePack decoding, defaults to msgpack.unpackb
dict_params # dictionary of parameter values passed under the hood to `from_dict` function
decoder_kwargs # keyword arguments for decoder function
DataClassYAMLMixin.to_yaml(encoder: Optional[Encoder], dict_params: Optional[Mapping], **encoder_kwargs)
Make a YAML formatted bytes object from a dataclass object based on the dataclass schema provided. Options include:
encoder # function called for YAML encoding, defaults to yaml.dump
dict_params # dictionary of parameter values passed under the hood to `to_dict` function
encoder_kwargs # keyword arguments for encoder function
DataClassYAMLMixin.from_yaml(data: Union[str, bytes], decoder: Optional[Decoder], dict_params: Optional[Mapping], **decoder_kwargs)
Make a new object from YAML formatted data based on the dataclass schema provided. Options include:
decoder # function called for YAML decoding, defaults to yaml.safe_load
dict_params # dictionary of parameter values passed under the hood to `from_dict` function
decoder_kwargs # keyword arguments for decoder function
If you already have a separate custom class, and you want to serialize
instances of it with mashumaro, you can achieve this by implementing the
SerializableType interface:
from typing import Dict
from datetime import datetime
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializableType

class DateTime(datetime, SerializableType):
    def _serialize(self) -> Dict[str, int]:
        return {
            "year": self.year,
            "month": self.month,
            "day": self.day,
            "hour": self.hour,
            "minute": self.minute,
            "second": self.second,
        }

    @classmethod
    def _deserialize(cls, value: Dict[str, int]) -> 'DateTime':
        return DateTime(
            year=value['year'],
            month=value['month'],
            day=value['day'],
            hour=value['hour'],
            minute=value['minute'],
            second=value['second'],
        )

@dataclass
class Holiday(DataClassDictMixin):
    when: DateTime = DateTime.now()

new_year = Holiday(when=DateTime(2019, 1, 1, 12))
dictionary = new_year.to_dict()
# {'when': {'year': 2019, 'month': 1, 'day': 1, 'hour': 12, 'minute': 0, 'second': 0}}
assert Holiday.from_dict(dictionary) == new_year
If you have a custom generic type and are looking for a generic version of this interface, see GenericSerializableType below.
In some cases creating a new class just for one little thing could be
excessive. Moreover, you may need to deal with third-party classes that you are
not allowed to change. You can use the dataclasses.field function as a default
field value to configure some serialization aspects through its metadata
parameter. The next section describes all the options supported in the metadata
mapping.
This option allows you to change the serialization method through
a value of type Callable[[Any], Any] that could be any callable object like
a function, a class method, a class instance method, an instance of a callable
class or even a lambda function.
Example:
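from dataclasses import dataclass, field
from datetime import datetime
from mashumaro import DataClassDictMixin

@dataclass
class A(DataClassDictMixin):
    dt: datetime = field(
        metadata={
            "serialize": lambda v: v.strftime('%Y-%m-%d %H:%M:%S')
        }
    )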
This option allows you to change the deserialization method. When using
this option, the deserialization behaviour depends on the type of the option's
value. It could be either Callable[[Any], Any] or str.
A value of type Callable[[Any], Any] is a generic way to specify any callable
object like a function, a class method, a class instance method, an instance
of a callable class or even a lambda function to be called for deserialization.
A value of type str sets a specific engine for deserialization. Keep in mind
that the available engines depend on the field type that this option is used
with. Currently the following deserialization engines are available:
Applicable field types | Supported engines | Description |
---|---|---|
datetime, date, time | ciso8601, pendulum | How to parse a datetime string. By default the native fromisoformat of the corresponding class is used for datetime, date and time fields. It's the fastest way in most cases, but you can choose an alternative. |
Example:
from datetime import datetime
from dataclasses import dataclass, field
from typing import List
from mashumaro import DataClassDictMixin
import ciso8601
import dateutil.parser  # the parser submodule must be imported explicitly

@dataclass
class A(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": "pendulum"}
    )

@dataclass
class B(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": ciso8601.parse_datetime_as_naive}
    )

@dataclass
class C(DataClassDictMixin):
    dt: List[datetime] = field(
        metadata={
            "deserialize": lambda l: list(map(dateutil.parser.isoparse, l))
        }
    )
This option is useful when you want to change the serialization behaviour
for a class depending on some defined parameters. For this case you can create
a special class implementing the SerializationStrategy interface:
from dataclasses import dataclass, field
from datetime import datetime
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializationStrategy

class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt

    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)

@dataclass
class DateTimeFormats(DataClassDictMixin):
    short: datetime = field(
        metadata={
            "serialization_strategy": FormattedDateTime(
                fmt="%d%m%Y%H%M%S",
            )
        }
    )
    verbose: datetime = field(
        metadata={
            "serialization_strategy": FormattedDateTime(
                fmt="%A %B %d, %Y, %H:%M:%S",
            )
        }
    )

formats = DateTimeFormats(
    short=datetime(2019, 1, 1, 12),
    verbose=datetime(2019, 1, 1, 12),
)
dictionary = formats.to_dict()
# {'short': '01012019120000', 'verbose': 'Tuesday January 01, 2019, 12:00:00'}
assert DateTimeFormats.from_dict(dictionary) == formats
In some cases it's better to have different names for a field in your class and in its serialized view. For example, a third-party legacy API you are working with might use camel case, while you stick to snake case in your code base. Or you may want to load data with keys that are invalid identifiers in Python. This problem is easily solved by using aliases:
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options

@dataclass
class DataClass(DataClassDictMixin):
    a: int = field(metadata=field_options(alias="FieldA"))
    b: int = field(metadata=field_options(alias="#invalid"))

x = DataClass.from_dict({"FieldA": 1, "#invalid": 2})  # DataClass(a=1, b=2)
x.to_dict()  # {"a": 1, "b": 2}  # no aliases on serialization by default
If you want to write all the field aliases in one place, there is the aliases config option.
If you want to serialize all the fields by aliases you have two options: the serialize_by_alias config option, or the by_alias keyword argument of to_dict.
It's hard to imagine when it might be necessary to serialize only specific fields by alias, but such functionality could easily be added to the library. Open an issue if you need it.
If you don't want to remember the names of the options you can use the
field_options helper function:
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options

@dataclass
class A(DataClassDictMixin):
    x: int = field(
        metadata=field_options(
            serialize=str,
            deserialize=int,
            ...
        )
    )
More options are on the way. If you know which option would be useful for many, please don't hesitate to create an issue or pull request.
If inheritance is not an empty word for you, you'll fall in love with the
Config class. You can register serialize and deserialize methods, define
code generation options and other things in just one place, or in different
ways in different classes if you need flexibility. Inheritance always comes
first.
There is a base class BaseConfig that you can inherit from for the sake of
convenience, but it's not mandatory.
In the following example you can see how the debug flag is changed from class
to class: ModelA will have debug mode enabled but ModelB will not.
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

class BaseModel(DataClassDictMixin):
    class Config(BaseConfig):
        debug = True

@dataclass
class ModelA(BaseModel):
    a: int

@dataclass
class ModelB(BaseModel):
    b: int

    class Config(BaseConfig):
        debug = False
The next section describes all the options supported in the config.
If you enable the debug option, the generated code for your data class
will be printed.
Some users may need functionality that cannot be provided without extra cost, such as valuable CPU time to execute additional instructions. Since not everyone needs such instructions, they can be enabled by adding a constant to the code_generation_options list, so the fastest basic behavior of the library always remains the default. The following table provides a brief overview of all the available constants described below.
Constant | Description |
---|---|
TO_DICT_ADD_OMIT_NONE_FLAG | Adds omit_none keyword-only argument to to_dict method. |
TO_DICT_ADD_BY_ALIAS_FLAG | Adds by_alias keyword-only argument to to_dict method. |
You can register custom SerializationStrategy, serialize and deserialize
methods for specific types in just one place. This is configured using
a dictionary with types as keys. The value could be either a
SerializationStrategy instance or a dictionary with serialize and
deserialize values with the same meaning as in the field options.
from dataclasses import dataclass
from datetime import datetime, date
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import SerializationStrategy

class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt

    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)

@dataclass
class DataClass(DataClassDictMixin):
    datetime: datetime
    date: date

    class Config(BaseConfig):
        serialization_strategy = {
            datetime: FormattedDateTime("%Y"),
            date: {
                # you can use specific str values for datetime here as well
                "deserialize": "pendulum",
                "serialize": date.isoformat,
            },
        }

instance = DataClass.from_dict({"datetime": "2021", "date": "2021"})
# DataClass(datetime=datetime.datetime(2021, 1, 1, 0, 0), date=Date(2021, 1, 1))
dictionary = instance.to_dict()
# {'datetime': '2021', 'date': '2021-01-01'}
Sometimes it's better to write the field aliases in one place. You can mix aliases here with aliases in the field options, but the latter will always take precedence.
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    a: int
    b: int

    class Config(BaseConfig):
        aliases = {
            "a": "FieldA",
            "b": "FieldB",
        }

DataClass.from_dict({"FieldA": 1, "FieldB": 2})  # DataClass(a=1, b=2)
All the fields with aliases will be serialized by them when the
serialize_by_alias option is enabled. A more flexible but slower way to do
the same is to use the by_alias keyword argument.
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    field_a: int = field(metadata=field_options(alias="FieldA"))

    class Config(BaseConfig):
        serialize_by_alias = True

DataClass(field_a=1).to_dict()  # {'FieldA': 1}
If you want to have control over whether to skip None values on serialization
you can add the omit_none parameter to the to_dict method using the
code_generation_options list:
from dataclasses import dataclass
from typing import Optional
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, TO_DICT_ADD_OMIT_NONE_FLAG

@dataclass
class Inner(DataClassDictMixin):
    x: Optional[int] = None
    # "x" won't be omitted since there is no TO_DICT_ADD_OMIT_NONE_FLAG here

@dataclass
class Model(DataClassDictMixin):
    x: Inner
    a: Optional[int] = None
    b: Optional[str] = None  # will be omitted

    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_OMIT_NONE_FLAG]

Model(x=Inner(), a=1).to_dict(omit_none=True)  # {'x': {'x': None}, 'a': 1}
If you want to have control over whether to serialize fields by their
aliases you can add the by_alias parameter to the to_dict method
using the code_generation_options list. On the other hand, if serialization
by alias is always needed, the best solution is to use the
serialize_by_alias config option.
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig, TO_DICT_ADD_BY_ALIAS_FLAG

@dataclass
class DataClass(DataClassDictMixin):
    field_a: int = field(metadata=field_options(alias="FieldA"))

    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_BY_ALIAS_FLAG]

DataClass(field_a=1).to_dict()  # {'field_a': 1}
DataClass(field_a=1).to_dict(by_alias=True)  # {'FieldA': 1}
Keep in mind that if you're serializing data to JSON or another format, you
need to pass the by_alias argument in the dict_params dictionary.
There is support for user-defined generic types. You can inherit generic dataclasses along with overwriting types in them, use generic dataclasses as field types, or create your own generic types with serialization under your control.
If you have a generic version of a dataclass and want to serialize and deserialize its instances depending on the concrete types, you can achieve this using inheritance:
from dataclasses import dataclass
from datetime import date
from typing import Generic, Mapping, TypeVar
from mashumaro import DataClassDictMixin

KT = TypeVar("KT")
VT = TypeVar("VT", date, str)

@dataclass
class GenericDataClass(Generic[KT, VT]):
    x: Mapping[KT, VT]

@dataclass
class ConcreteDataClass(GenericDataClass[str, date], DataClassDictMixin):
    pass

ConcreteDataClass.from_dict({"x": {"a": "2021-01-01"}})  # ok
ConcreteDataClass.from_dict({"x": {"a": "not a date but str"}})  # error
You can override a TypeVar field with a concrete type or another TypeVar.
Partial specification of concrete types is also allowed. If a generic dataclass
is inherited without type overriding, the types of its fields remain untouched.
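What partial specification looks like at the typing level can be sketched with the standard library only. The class names `StrKeyed` and `StrToDate` are hypothetical, and the sketch shows just the declarations and how the type arguments are recorded; it does not mix in DataClassDictMixin and does not exercise mashumaro's serialization.

```python
from dataclasses import dataclass
from datetime import date
from typing import Generic, Mapping, TypeVar, get_args

KT = TypeVar("KT")
VT = TypeVar("VT", date, str)


@dataclass
class GenericDataClass(Generic[KT, VT]):
    x: Mapping[KT, VT]


# Partial specification: KT becomes str, VT stays a type variable.
@dataclass
class StrKeyed(GenericDataClass[str, VT]):
    pass


# The remaining type variable is specified in a further subclass.
@dataclass
class StrToDate(StrKeyed[date]):
    pass


partial_args = get_args(StrKeyed.__orig_bases__[0])  # (str, VT)
full_args = get_args(StrToDate.__orig_bases__[0])    # (date,)
remaining = StrKeyed.__parameters__                  # (VT,)
```

A library that resolves field types from such declarations can walk these recorded arguments to find the concrete type for each type variable.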
Another approach is to specify concrete types in the field type hints. This can help to have different versions of the same generic dataclass:
from dataclasses import dataclass
from datetime import date
from typing import Generic, TypeVar
from mashumaro import DataClassDictMixin

T = TypeVar('T')

@dataclass
class GenericDataClass(Generic[T], DataClassDictMixin):
    x: T

@dataclass
class DataClass(DataClassDictMixin):
    date: GenericDataClass[date]
    str: GenericDataClass[str]

instance = DataClass(
    date=GenericDataClass(x=date(2021, 1, 1)),
    str=GenericDataClass(x='2021-01-01'),
)
dictionary = {'date': {'x': '2021-01-01'}, 'str': {'x': '2021-01-01'}}
assert DataClass.from_dict(dictionary) == instance
There is a generic alternative to SerializableType called
GenericSerializableType. It makes it possible to serialize and deserialize
instances of generic types depending on the types provided:
from typing import Dict, TypeVar
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.types import GenericSerializableType

KT = TypeVar("KT", int, str)
VT = TypeVar("VT", int, str)

class GenericDict(Dict[KT, VT], GenericSerializableType):
    def _serialize(self, types) -> Dict[KT, VT]:
        k_type, v_type = types
        if k_type not in (int, str) or v_type not in (int, str):
            raise TypeError
        return {k_type(k): v_type(v) for k, v in self.items()}

    @classmethod
    def _deserialize(cls, value, types) -> 'GenericDict[KT, VT]':
        k_type, v_type = types
        if k_type not in (int, str) or v_type not in (int, str):
            raise TypeError
        return cls({k_type(k): v_type(v) for k, v in value.items()})

@dataclass
class DataClass(DataClassDictMixin):
    x: GenericDict[int, str]
    y: GenericDict[str, int]

instance = DataClass(GenericDict({1: 'a'}), GenericDict({'b': 2}))
dictionary = instance.to_dict()  # {'x': {1: 'a'}, 'y': {'b': 2}}
assert DataClass.from_dict(dictionary) == instance
The difference between SerializableType and GenericSerializableType is that
the methods of GenericSerializableType have a parameter types, to which the
concrete types will be passed. If you don't need this information you can
still use the SerializableType interface even with generic types.
In some cases you need to prepare input / output data or do some extraordinary actions at different stages of the deserialization / serialization lifecycle. You can do this with different types of hooks.
To do something with a dictionary that will be passed to deserialization
you can use the __pre_deserialize__ class method:
from dataclasses import dataclass
from typing import Any, Dict
from mashumaro import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    abc: int

    @classmethod
    def __pre_deserialize__(cls, d: Dict[Any, Any]) -> Dict[Any, Any]:
        return {k.lower(): v for k, v in d.items()}

print(DataClass.from_dict({"ABC": 123}))  # DataClass(abc=123)
print(DataClass.from_json('{"ABC": 123}'))  # DataClass(abc=123)
To do something with a dataclass instance that was created as a result
of deserialization you can use the __post_deserialize__ class method:
from dataclasses import dataclass
from mashumaro import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    abc: int

    @classmethod
    def __post_deserialize__(cls, obj: 'DataClass') -> 'DataClass':
        obj.abc = 456
        return obj

print(DataClass.from_dict({"abc": 123}))  # DataClass(abc=456)
print(DataClass.from_json('{"abc": 123}'))  # DataClass(abc=456)
To do something before serialization you can use the __pre_serialize__
method:
from dataclasses import dataclass
from typing import ClassVar
from mashumaro import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    abc: int
    counter: ClassVar[int] = 0

    def __pre_serialize__(self) -> 'DataClass':
        self.counter += 1
        return self

obj = DataClass(abc=123)
obj.to_dict()
obj.to_json()
print(obj.counter)  # 2
To do something with a dictionary that was created as a result of
serialization you can use the __post_serialize__ method:
from dataclasses import dataclass
from typing import Any, Dict
from mashumaro import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    user: str
    password: str

    def __post_serialize__(self, d: Dict[Any, Any]) -> Dict[Any, Any]:
        d.pop('password')
        return d

obj = DataClass(user="name", password="secret")
print(obj.to_dict())  # {"user": "name"}
print(obj.to_json())  # '{"user": "name"}'
- add optional validation
- write custom useful types such as URL, Email etc