Support encoding and decoding attrs types #323

jcrist · 2023-02-16T05:42:44Z

This adds support for encoding/decoding attrs types. It's mostly built off of our existing dataclasses functionality, and has the same restrictions:

When encoding, any object attribute without a leading underscore is encoded using the attribute name, even if it's not marked as an attribute as part of the class definition. This is for performance reasons, accessing __attrs_attrs__ at encoding time comes at a perf cost.
When decoding, the __init__ method on the class is not called. This is for both efficiency and correctness reasons. As such, classes that define their own __init__ or __attrs_init__ will not have this called during the decoding process. __attrs_pre_init__ and __attrs_post_init__ methods are called at the proper times though.
Currently default factories with takes_self=True are not supported. We could support this, but it was more work than I wanted to do right now.

Note that right now neither attrs' validators or convertors are run on decode. This is fixable, but would require a lot more work since this would diverge from the existing dataclasses implementation.

All in all this was pretty straightforward to get working. It's nice that dataclasses and attrs implementations haven't diverged too much here.

This also fixes a bug in the existing dataclass decoder that prevented decoding dataclasses with frozen=True configured.

Fixes #51.

Related to (but doesn't resolve) #316.

jcrist · 2023-02-16T05:46:05Z

A quick example:

In [1]: import attrs                           

In [2]: import msgspec                         

In [3]: @attrs.define                          
   ...: class User:                            
   ...:     name: str                          
   ...:     groups: list[str] = []
   ...:     email: str | None = None
   ...:                                        

In [4]: alice = User("alice", groups=["admin", "engineering"])

In [5]: msg = msgspec.json.encode(alice)

In [6]: msg                                    
Out[6]: b'{"email":null,"groups":["admin","engineering"],"name":"alice"}'

In [7]: msgspec.json.decode(msg, type=User)
Out[7]: User(name='alice', groups=['admin', 'engineering'], email=None)

In [8]: msgspec.json.decode(b'{"name": 123}', type=User)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In [8], line 1                            
----> 1 msgspec.json.decode(b'{"name": 123}', type=User)

ValidationError: Expected `str`, got `int` - at `$.name`

ltworf · 2023-02-28T20:10:11Z

When decoding, the init method on the class is not called.

Does this still work if the field had defined a converter?

jcrist · 2023-02-28T20:35:58Z

Does this still work if the field had defined a converter?

Converters and validators aren't currently supported with the above model. I'm not an attrs user, so I'm not sure what the expected behavior is here - I was hoping to get some feedback on this from @Tinche on what's expected before moving forward with this.

Tinche · 2023-03-16T01:03:42Z

Looks pretty cool to me! I personally don't use validators, and converters sparingly so this is a good first step.

Why do you skip attributes with a leading underscore though? We use MongoDB a lot (through our own ODM based on attrs ;) and Mongo documents require an _id field. So all my database models have that, and I'd expect it to be in the payload.

jcrist · 2023-03-16T01:43:07Z

Looks pretty cool to me! I personally don't use validators, and converters sparingly so this is a good first step.

Thanks for the review! Just to clarify - do you think this is good to merge as is, and would be useful for you (or others) without immediate further changes?

Why do you skip attributes with a leading underscore though?

For performance reasons, we don't touch __attrs_attrs__/__dataclass_fields__ at all during encode time - all attributes without a leading underscore found in either __dict__ or __slots__ on an object are encoded. FWIW this is the same behavior as orjson has for dataclasses, and seems to work well enough in practice. We drop the leading underscore attributes since sometimes users stash private state on an object with a leading-underscore attribute, but in my experience it's rare to have a true dataclass field have a leading underscore (I don't use attrs, so no experience there).

Mongo documents require an _id field. So all my database models have that, and I'd expect it to be in the payload.

For struct types, we support renaming fields for this use case. The assumption is that user code probably wants to work with python-friendly attribute names (e.g. id), and the name used for encoding is metadata on the field. This isn't exposed at the msgspec.field level yet (msgspec.field itself is pretty new), so currently you'd do this as:

In [1]: import msgspec

In [2]: class Person(msgspec.Struct, rename={"id": "_id"}):
   ...:     id: int
   ...:     name: str
   ...: 

In [3]: alice = Person(1, "alice")

In [4]: alice.id  # attribute is original field name
Out[4]: 1

In [5]: alice
Out[5]: Person(id=1, name='alice')

In [6]: msgspec.json.encode(alice)  # encoding uses renamed name
Out[6]: b'{"_id":1,"name":"alice"}'

In [7]: msgspec.json.decode(_, type=Person)  # as does decoding
Out[7]: Person(id=1, name='alice')

Currently this feature is only exposed for Struct types, since dataclasses/attrs don't have an equivalent existing config option. We could expose this through the arbitrary metadata option if needed, but I'd rather not until someone asks for it. My recommendation for users needing this feature is to switch to msgspec.Struct types. They're more performant, and have more config options.

In summary:

The current implementation of what's encoded on an attrs/dataclass type is done for performance/ease of implementation. It doesn't look at __attrs_attrs__/__dataclass_fields__ at all. We could change this to respect the defined list of fields, but it would complicate things. Happy to do it if asked, but not until a user (or you) asks for it.
msgspec also supports configuring alternative names for encoding on msgspec.Struct types only. We could expose this for dataclasses/attrs if needed, but for now I'd recommend users to switch to using a msgspec.Struct in those cases. It's not clear to me how we'd expose this config option for dataclasses/attrs anyway.

These uses the same code path as `dataclasses`, with the same behaviors and restrictions. In particular any attribute lacking a leading underscore on an `attrs` type is encoded, regardless of if it's declared as an attribute.

These follow the same code path as `dataclasses`, and have the same restrictions: - The generated `__init__` is not called. `msgspec` calls the proper hooks in the proper order. - `__attrs_pre_init__` and `__attrs_post_init__` hooks are both supported and will be called on decode. - `attrs.Factory(..., takes_self=True)` defaults aren't currently supported, but could be if someone asks for it.

This fixes a bug in both the dataclass and attrs decoder implementations that previously prevented decoding a message into a frozen dataclass or attrs type.

jcrist mentioned this pull request Feb 16, 2023

Integrate with the attrs ecosystem #316

Open

jcrist added 5 commits March 17, 2023 20:32

Support encoding attrs types

5c0c312

These uses the same code path as `dataclasses`, with the same behaviors and restrictions. In particular any attribute lacking a leading underscore on an `attrs` type is encoded, regardless of if it's declared as an attribute.

Add attrs to test deps

2b41306

Document attrs support

b3a5cc1

Support decoding frozen attrs and dataclass instances

36d1aa4

This fixes a bug in both the dataclass and attrs decoder implementations that previously prevented decoding a message into a frozen dataclass or attrs type.

jcrist force-pushed the support-attrs branch from bb797d9 to 36d1aa4 Compare March 18, 2023 01:32

jcrist merged commit 1896fb4 into main Mar 18, 2023
7 checks passed

jcrist deleted the support-attrs branch March 18, 2023 02:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support encoding and decoding attrs types #323

Support encoding and decoding attrs types #323

jcrist commented Feb 16, 2023 •

edited

jcrist commented Feb 16, 2023

ltworf commented Feb 28, 2023

jcrist commented Feb 28, 2023

Tinche commented Mar 16, 2023

jcrist commented Mar 16, 2023

Support encoding and decoding attrs types #323

Support encoding and decoding attrs types #323

Conversation

jcrist commented Feb 16, 2023 • edited

jcrist commented Feb 16, 2023

ltworf commented Feb 28, 2023

jcrist commented Feb 28, 2023

Tinche commented Mar 16, 2023

jcrist commented Mar 16, 2023

jcrist commented Feb 16, 2023 •

edited