Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support encoding and decoding attrs types #323

Merged
merged 5 commits into from Mar 18, 2023
Merged

Support encoding and decoding attrs types #323

merged 5 commits into from Mar 18, 2023

Conversation

jcrist
Copy link
Owner

@jcrist jcrist commented Feb 16, 2023

This adds support for encoding/decoding attrs types. It's mostly built off of our existing dataclasses functionality, and has the same restrictions:

  • When encoding, any object attribute without a leading underscore is encoded using the attribute name, even if it's not marked as an attribute as part of the class definition. This is for performance reasons, accessing __attrs_attrs__ at encoding time comes at a perf cost.
  • When decoding, the __init__ method on the class is not called. This is for both efficiency and correctness reasons. As such, classes that define their own __init__ or __attrs_init__ will not have this called during the decoding process. __attrs_pre_init__ and __attrs_post_init__ methods are called at the proper times though.
  • Currently default factories with takes_self=True are not supported. We could support this, but it was more work than I wanted to do right now.

Note that right now neither attrs' validators or convertors are run on decode. This is fixable, but would require a lot more work since this would diverge from the existing dataclasses implementation.

All in all this was pretty straightforward to get working. It's nice that dataclasses and attrs implementations haven't diverged too much here.

This also fixes a bug in the existing dataclass decoder that prevented decoding dataclasses with frozen=True configured.

Fixes #51.

Related to (but doesn't resolve) #316.

@jcrist
Copy link
Owner Author

jcrist commented Feb 16, 2023

A quick example:

In [1]: import attrs                           

In [2]: import msgspec                         

In [3]: @attrs.define                          
   ...: class User:                            
   ...:     name: str                          
   ...:     groups: list[str] = []
   ...:     email: str | None = None
   ...:                                        

In [4]: alice = User("alice", groups=["admin", "engineering"])

In [5]: msg = msgspec.json.encode(alice)

In [6]: msg                                    
Out[6]: b'{"email":null,"groups":["admin","engineering"],"name":"alice"}'

In [7]: msgspec.json.decode(msg, type=User)
Out[7]: User(name='alice', groups=['admin', 'engineering'], email=None)

In [8]: msgspec.json.decode(b'{"name": 123}', type=User)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In [8], line 1                            
----> 1 msgspec.json.decode(b'{"name": 123}', type=User)

ValidationError: Expected `str`, got `int` - at `$.name`

@ltworf
Copy link

ltworf commented Feb 28, 2023

When decoding, the init method on the class is not called.

Does this still work if the field had defined a converter?

@jcrist
Copy link
Owner Author

jcrist commented Feb 28, 2023

Does this still work if the field had defined a converter?

Converters and validators aren't currently supported with the above model. I'm not an attrs user, so I'm not sure what the expected behavior is here - I was hoping to get some feedback on this from @Tinche on what's expected before moving forward with this.

@Tinche
Copy link

Tinche commented Mar 16, 2023

Looks pretty cool to me! I personally don't use validators, and converters sparingly so this is a good first step.

Why do you skip attributes with a leading underscore though? We use MongoDB a lot (through our own ODM based on attrs ;) and Mongo documents require an _id field. So all my database models have that, and I'd expect it to be in the payload.

@jcrist
Copy link
Owner Author

jcrist commented Mar 16, 2023

Looks pretty cool to me! I personally don't use validators, and converters sparingly so this is a good first step.

Thanks for the review! Just to clarify - do you think this is good to merge as is, and would be useful for you (or others) without immediate further changes?

Why do you skip attributes with a leading underscore though?

For performance reasons, we don't touch __attrs_attrs__/__dataclass_fields__ at all during encode time - all attributes without a leading underscore found in either __dict__ or __slots__ on an object are encoded. FWIW this is the same behavior as orjson has for dataclasses, and seems to work well enough in practice. We drop the leading underscore attributes since sometimes users stash private state on an object with a leading-underscore attribute, but in my experience it's rare to have a true dataclass field have a leading underscore (I don't use attrs, so no experience there).

Mongo documents require an _id field. So all my database models have that, and I'd expect it to be in the payload.

For struct types, we support renaming fields for this use case. The assumption is that user code probably wants to work with python-friendly attribute names (e.g. id), and the name used for encoding is metadata on the field. This isn't exposed at the msgspec.field level yet (msgspec.field itself is pretty new), so currently you'd do this as:

In [1]: import msgspec

In [2]: class Person(msgspec.Struct, rename={"id": "_id"}):
   ...:     id: int
   ...:     name: str
   ...: 

In [3]: alice = Person(1, "alice")

In [4]: alice.id  # attribute is original field name
Out[4]: 1

In [5]: alice
Out[5]: Person(id=1, name='alice')

In [6]: msgspec.json.encode(alice)  # encoding uses renamed name
Out[6]: b'{"_id":1,"name":"alice"}'

In [7]: msgspec.json.decode(_, type=Person)  # as does decoding
Out[7]: Person(id=1, name='alice')

Currently this feature is only exposed for Struct types, since dataclasses/attrs don't have an equivalent existing config option. We could expose this through the arbitrary metadata option if needed, but I'd rather not until someone asks for it. My recommendation for users needing this feature is to switch to msgspec.Struct types. They're more performant, and have more config options.


In summary:

  • The current implementation of what's encoded on an attrs/dataclass type is done for performance/ease of implementation. It doesn't look at __attrs_attrs__/__dataclass_fields__ at all. We could change this to respect the defined list of fields, but it would complicate things. Happy to do it if asked, but not until a user (or you) asks for it.
  • msgspec also supports configuring alternative names for encoding on msgspec.Struct types only. We could expose this for dataclasses/attrs if needed, but for now I'd recommend users to switch to using a msgspec.Struct in those cases. It's not clear to me how we'd expose this config option for dataclasses/attrs anyway.

These uses the same code path as `dataclasses`, with the same behaviors
and restrictions. In particular any attribute lacking a leading
underscore on an `attrs` type is encoded, regardless of if it's declared
as an attribute.
These follow the same code path as `dataclasses`, and have the same
restrictions:

- The generated `__init__` is not called. `msgspec` calls the proper
  hooks in the proper order.
- `__attrs_pre_init__` and `__attrs_post_init__` hooks are both
  supported and will be called on decode.
- `attrs.Factory(..., takes_self=True)` defaults aren't currently
  supported, but could be if someone asks for it.
This fixes a bug in both the dataclass and attrs decoder implementations
that previously prevented decoding a message into a frozen dataclass or
attrs type.
@jcrist jcrist merged commit 1896fb4 into main Mar 18, 2023
7 checks passed
@jcrist jcrist deleted the support-attrs branch March 18, 2023 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Attrs Support?
3 participants