Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Explain _how_ dataclass & TypedDict are supported as (body) models #3202

Open
tuukkamustonen opened this issue Mar 14, 2024 · 13 comments
Labels
Documentation 📚 This is related to documentation

Comments

@tuukkamustonen
Copy link
Contributor

tuukkamustonen commented Mar 14, 2024

Summary

https://docs.litestar.dev/2/usage/requests.html#request-body states that:

The type of data an be any supported type, including

  1. It's not obvious how dataclasses (and TypedDicts and Pydantic models) are supported
  2. Link to plugins leads to the page that explain the plugin architecture - where is the list for seeing which plugins exist / are enabled?
  3. Pydantic support is actually implemented as a plugin, but it's in the "standard list" above

More on 1:

By "support for dataclasses/TypedDicts" you'd expect to be able to declare OpenAPI constraints and extra validations somehow, e.g.:

import json
from dataclasses import dataclass, field
from typing import Annotated

from litestar import post
from litestar.app import Litestar
from litestar.params import Parameter, KwargDefinition


@dataclass
class DataBody:
    foo1: Annotated[str, Parameter(description="ipsum", gt=1)] = "dummy"
    foo2: Annotated[str, Parameter(description="ipsum", gt=1)] = field(default="dummy")


@dataclass
class DictBody:
    foo1: Annotated[str, KwargDefinition(description="ipsum", gt=1)]
    foo2: Annotated[str, KwargDefinition(default="hello")]  # marked as required


@post("/1")
async def handler1(data: DataBody) -> None: ...


@post("/2")
async def handler2(data: DictBody) -> None: ...


app = Litestar([handler1, handler2])

print(json.dumps(app.openapi_schema.to_schema(), indent=4))

The generated OpenAPI schema output is missing default in many cases (but that's a separate issue, see #3201).

The thing is, how can you:

  • Define the OpenAPI constraints for TypedDicts and dataclasses?
  • Define custom validations for them?

Pydantic models have Field, Msgspec has Meta. Dataclasses have dataclass.field but you cannot specify OpenAPI constraints through that. TypedDicts don't have that, and cannot even have default values.

It would appear that you can just use Parameter or KwargDefinition in TypedDicts and dataclasses as shown in the above snippet, and maybe it (mostly) works? But this should be instructed in the docs (and decision made if that is intended as supported or not). Also, that would allow to define a default also for TypedDict (via KwargDefinition(default=...) but then that actually doesn't work. Misleading APIs like that should be blocked (although maybe that's a quite special case).


Note

While we are open for sponsoring on GitHub Sponsors and
OpenCollective, we also utilize Polar.sh to engage in pledge-based sponsorship.

Check out all issues funded or available for funding on our Polar.sh dashboard

  • If you would like to see an issue prioritized, make a pledge towards it!
  • We receive the pledge once the issue is completed & verified
  • This, along with engagement in the community, helps us know which features are a priority to our users.
Fund with Polar
@tuukkamustonen tuukkamustonen added the Documentation 📚 This is related to documentation label Mar 14, 2024
@guacs
Copy link
Member

guacs commented Mar 14, 2024

The "how" of the support for the various types is kind of implementation details which is why I assume it hasn't been documented. Basically all the validation etc. comes from msgspec and the other stuff like generation of OpenAPI works by using the official APIs to interact with each of them.

The reason pydantic is kept in that list even though it's not part of the standard library, it's very widely used and Pydantic was a core part of starlite (v1).

NOTE: These are all assumptions I'm making because I wasn't a part of the project while those docs were written.

@guacs
Copy link
Member

guacs commented Mar 14, 2024

@dataclass
class MyBody:
    foo1: Annotated[str, field(default="dummy"), Parameter(description="ipsum", gt=1)]
    foo2: Annotated[str, Parameter(description="ipsum", gt=1)] = "dummy"
    foo3: str = "dummy"
    

Specifying the default like in foo1 will not work because it's not recognized by dataclasses.fields and we use that when we parse the model.

@tuukkamustonen
Copy link
Contributor Author

tuukkamustonen commented Mar 14, 2024

Specifying the default like in foo1 will not work because it's not recognized by dataclasses.fields and we use that when we parse the model.

Hmm, I don't get this. dataclasses.field(default=...) is supported for usual dataclasses, so do you refer to that with dataclasses.fields plural? And what does it mean that it's not supported when "we use that when we parse the model"?

Can you re-phrase the same 😄 ?

@provinzkraut
Copy link
Member

I don't get this. dataclasses.field(default=...) is supported for usual dataclasses,

It is, but not within Annotated. Dataclasses only support declaring fields via assignment:

from typing import Annotated
from dataclasses import dataclass, field, fields

@dataclass
class One:
  some: str = field(default="thing")

@dataclass
class Two:
  some: Annotated[str, field(default="thing")]


print(fields(One)[0].default)  # "something"
print(fields(Two)[0].default)  # <dataclasses._MISSING_TYPE object at 0x7f297be13290>

In general, Litestar doesn't try to enhance the support of the modelling library you chose beyond what it offers out of the box. If you want the Pydantic feature set, you should use Pydantic. If you want the attrs feature set, use attrs, etc. The reason for keeping it this way is that Litestar isn't trying to be a modelling library itself, and even though we already have quite a lot of code adjacent to these topics, I think it's important we stick to this rule so as to not let things get too bloated and complex.


It's not obvious how dataclasses (and TypedDicts and Pydantic models) are supported

Is that important? It might be interesting to some people, from a technical perspective, but in general I would consider this an implementation detail.

Pydantic support is actually implemented as a plugin, but it's in the "standard list" above

True, but that's because it is enabled by default, so the fact that support is implemented via a plugin is also just an implementation detail. Although, now that we've added some configuration options to this plugin, we might want to reconsider this stance.

@tuukkamustonen
Copy link
Contributor Author

tuukkamustonen commented Mar 17, 2024

Thanks for the clarifications. I wasn't aware that Annotated wasn't supported for dataclasses. Anyway, that's a bit of a side track.

Despite those facts, it's not obvious to the reader where the limitations are. At least my expectation was that you could use TypedDicts and dataclasses at least to the basic level (ie. generating OpenAPI constraints). Maybe even add custom validations, etc.

But if you cannot do those, it should be good to highlight that. "Support is TypedDicts and dataclasses is very simple, and you cannot ...".

Also, what good is supporting them in the first place, if you cannot do even the simplest of customization with them (e.g. setting field description)? Do people actually use them...?

In Discord __peter__ (is that @peterschutt?) even mentioned:

IDK - I haven't tried, have you tried declaring the type on the dataclass like field: Annotated[int, KwargDefinition(gt=3)]?

No, I havent' tried directly that. I did try Parameter and it (at least) partially worked, but it's certainly not obvious to the users that you can use KwargDefinition there. And then someone somewhere said that I shouldn't do that, so I don't know.

I believe the validity of this ticket stands - this topic should be not only better instructed, but someone should decide what is actually supported and how (and then briefly document it).

@provinzkraut
Copy link
Member

provinzkraut commented Mar 17, 2024

but someone should also decide what is supported and how (and then briefly document it).

My issue here is that documenting what is not supported is quite tricky and it's easier to just assume only the things that are documented are supported.
E.g. constraints on TypedDicts are not supported because constrained TypedDicts aren't a thing. That would be a feature unique to Litestar, which I have explained above why this isn't feasible.

Also, what good is supporting them in the first place, if you cannot do even the simplest of customization with them

I'd say mostly backwards compatibility and users who already happen to have those types at hand, e.g. from a third party library or other pre-existing code. It would also require an active effort from our part to not support TypedDicts.

At least my expectation was that you could use TypedDicts and dataclasses at least to the basic level (ie. generating OpenAPI constraints). Maybe even add custom validations, etc.

Could you elaborate why that was your expectation? Constraints and custom validations are something neither TypedDict nor dataclasses support, so I'm curious what gave the impression those would work here? IMO that's the point we should address / clarify in the docs if it leads to confusion.

"Support is TypedDicts and dataclasses is very simple, and you cannot ...".

Expanding on this, personally, I would find it confusing if I read docs that said "You cannot use value constraints in TypedDicts" because I would immediately think "I must have missed something, since when do TypedDicts support value constraints?". It seems counterintuitive to me to point out that a feature of a thing isn't supported if that feature isn't actually a feature of that thing in the first place.

@provinzkraut
Copy link
Member

provinzkraut commented Mar 17, 2024

Also

In Discord __peter__ (is that @peterschutt?) even mentioned:

Yes, that's him :)

@provinzkraut
Copy link
Member

No, I havent' tried directly that. I did try Parameter and it (at least) partially worked, but it's certainly not obvious to the users that you can use KwargDefinition there. And then someone somewhere said that I shouldn't do that, so I don't know.

I think that would (should) be considered a hack and is definitely not an intentional public API for this 👀

@tuukkamustonen tuukkamustonen changed the title Docs: Explain _how_ dataclass/TypedDict/Msgspec/Pydantic are supported as (body) models Docs: Explain _how_ dataclass & TypedDict are supported as (body) models Mar 17, 2024
@tuukkamustonen
Copy link
Contributor Author

I'd say mostly backwards compatibility and users who already happen to have those types at hand, e.g. from a third party library or other pre-existing code. It would also require an active effort from our part to not support TypedDicts.

So people use those? Maybe in some super simple internal endpoints, I guess.

Could you elaborate why that was your expectation? Constraints and custom validations are something neither TypedDict nor dataclasses support, so I'm curious what gave the impression those would work here? IMO that's the point we should address / clarify in the docs if it leads to confusion.

It's what you expect of a web framework - that you can validate fields and customize the generated OpenAPI (one of the selling points of Litestar/FastAPI).

It's not a technical judgement, that when I see a TypedDict I expect some fanciness there. It's that I read the web framework docs and it defines that I might as well use TypedDicts to declare my parameters, I assume that "okay, that's pretty cool, they've thought this through and I guess I can use those then... one dependency less" 🤷🏻‍♂️

Expanding on this, personally, I would find it confusing if I read docs that said "You cannot use value constraints in TypedDicts" because I would immediately think "I must have missed something, since when do TypedDicts support value constraints?". It seems counterintuitive to me to point out that a feature of a thing isn't supported if that feature isn't actually a feature of that thing in the first place.

From my perspective, it's a good note. It helps me to realize that "ahh okay so dataclasses/TypedDicts are just for some simpe poccing and not for actual use".

@provinzkraut
Copy link
Member

So people use those? Maybe in some super simple internal endpoints, I guess.

I don't know if it's restricted to that, but even then, it's worth supporting :)

It helps me to realize that "ahh okay so dataclasses/TypedDicts are just for some simpe poccing and not for actual use".

This part I don't understand. They are very much for actual use, it's more like "If I need features that dataclasses/TypedDicts don't have, I need to use something other than dataclasses/TypedDicts". That's why Litestar has adopted the "bring your own modelling library" approach. If you want the Pydantic features, use Pydantic. If you want the msgspec features, use msgspec. If you want the (c)attrs feature, use (c)attrs. If you want to use another library that's not supported out of the box, you can write your own plugin. And if you need more feature than TypedDicts have, than don't use TypedDicts.

If we were going to be strict about what you're asking, we'd have to list every difference in modelling behaviour between all those libraries. I wouldn't expect a Pydantic constraint to work on an attrs class. Why should that be different for dataclasses/TypedDicts?

I'd like to make it obvious from reading the docs that for each library, you get the feature set that it provides.

@tuukkamustonen
Copy link
Contributor Author

I understand your argument, but I think mine makes sense as well. My argument is that you should hint the reader of what to expect, and your view differs. Maybe it's not worth iterating this more than this, and we can let other chime in if they will.

As I see it, this topic has been a bit confusing also for the other Litestar maintainers so I think there's a case here.

But if you think there's nothing to do here, feel free to close it. Even having a ticket about it will help people in the future to find the reasoning and history, if they're wondering about the same.

@peterschutt
Copy link
Contributor

I find this a little confusing:

I think that would (should) be considered a hack and is definitely not an intentional public API for this 👀

It is in contrast to the reality of what we actually do. We pull anything out of the metadata for a type that we parse and add it to a ParameterDefinition instance for the type as long as it has the right attributes:

litestar/litestar/typing.py

Lines 104 to 130 in 6e7e54a

return {
k: v
for k, v in {
"gt": getattr(value, "gt", None),
"ge": getattr(value, "ge", None),
"lt": getattr(value, "lt", None),
"le": getattr(value, "le", None),
"multiple_of": getattr(value, "multiple_of", None),
"min_length": None if is_sequence_container else getattr(value, "min_length", None),
"max_length": None if is_sequence_container else getattr(value, "max_length", None),
"description": getattr(value, "description", None),
"examples": example_list,
"title": getattr(value, "title", None),
"lower_case": getattr(value, "to_lower", None),
"upper_case": getattr(value, "to_upper", None),
"pattern": getattr(value, "regex", getattr(value, "pattern", None)),
"min_items": getattr(value, "min_items", getattr(value, "min_length", None))
if is_sequence_container
else None,
"max_items": getattr(value, "max_items", getattr(value, "max_length", None))
if is_sequence_container
else None,
"const": getattr(value, "const", None) is not None,
**extra,
}.items()
if v is not None
}

In general, Litestar doesn't try to enhance the support of the modelling library you chose beyond what it offers out of the box.

Yet we have a unit test checking that annotated-types is supported on a dataclass definition for schema generation:

def test_annotated_types() -> None:
historical_date = date(year=1980, day=1, month=1)
today = date.today()
@dataclass
class MyDataclass:
constrained_int: Annotated[int, annotated_types.Gt(1), annotated_types.Lt(10)]
constrained_float: Annotated[float, annotated_types.Ge(1), annotated_types.Le(10)]
constrained_date: Annotated[date, annotated_types.Interval(gt=historical_date, lt=today)]
constrained_lower_case: Annotated[str, annotated_types.LowerCase]
constrained_upper_case: Annotated[str, annotated_types.UpperCase]
constrained_is_ascii: Annotated[str, annotated_types.IsAscii]
constrained_is_digit: Annotated[str, annotated_types.IsDigits]
schema = get_schema_for_field_definition(FieldDefinition.from_kwarg(name="MyDataclass", annotation=MyDataclass))
assert schema.properties["constrained_int"].exclusive_minimum == 1 # type: ignore[index, union-attr]
assert schema.properties["constrained_int"].exclusive_maximum == 10 # type: ignore[index, union-attr]
assert schema.properties["constrained_float"].minimum == 1 # type: ignore[index, union-attr]
assert schema.properties["constrained_float"].maximum == 10 # type: ignore[index, union-attr]
assert datetime.utcfromtimestamp(schema.properties["constrained_date"].exclusive_minimum) == datetime.fromordinal( # type: ignore[arg-type, index, union-attr]
historical_date.toordinal()
)
assert datetime.utcfromtimestamp(schema.properties["constrained_date"].exclusive_maximum) == datetime.fromordinal( # type: ignore[arg-type, index, union-attr]
today.toordinal()
)
assert schema.properties["constrained_lower_case"].description == "must be in lower case" # type: ignore[index]
assert schema.properties["constrained_upper_case"].description == "must be in upper case" # type: ignore[index]
assert schema.properties["constrained_is_ascii"].pattern == "[[:ascii:]]" # type: ignore[index, union-attr]
assert schema.properties["constrained_is_digit"].pattern == "[[:digit:]]" # type: ignore[index, union-attr]

So for the following application:

from dataclasses import dataclass

import annotated_types
from typing_extensions import Annotated

from litestar import Litestar, post

@dataclass
class Foo:
    foo: Annotated[str, annotated_types.MaxLen(3)]

@post()
async def post_handler(data: Foo) -> None:
    ...

app = Litestar([post_handler])

We do render the constraint:

image

.. yet it is not enforced:

image

However, add a DTO into the mix:

@post(dto=DataclassDTO[Foo])
async def post_handler(data: Foo) -> None:
    ...

And now those constraints are enforced:

image

This is all a by-product of the way that we indiscriminately parse anything out of type metadata if it smells like a thing that declares schema constraints, immediately upon parsing the type. And once we have the constraint parsed, the DTOs will convert it into msgspec.Meta for the transfer model.

It also opens us up to weird issues like this app failing on schema generation:

from dataclasses import dataclass

from typing_extensions import Annotated

from litestar import Litestar, post

@dataclass
class Foo:
    foo: Annotated[str, "totally unrelated metadata"]

@post()
async def post_handler(data: Foo) -> None:
    ...

app = Litestar([post_handler], debug=True)

image

This occurs because the string "totally unrelated metadata" has a title attribute that has nothing to do with schema constraints, and is a method on the object. This indiscriminate parsing of whatever is in the type metadata goes against the way annotated is supposed to be used:

If a library or tool encounters an annotation Annotated[T, x] and has no special logic for the metadata, it should ignore the metadata and simply treat the annotation as T.

We make no attempt to ignore metadata that doesn't apply to us.

If you want the Pydantic feature set, you should use Pydantic. If you want the attrs feature set, use attrs, etc. The reason for keeping it this way is that Litestar isn't trying to be a modelling library itself, and even though we already have quite a lot of code adjacent to these topics, I think it's important we stick to this rule so as to not let things get too bloated and complex.

This is the way it should be. Metadata should only be parsed by plugins that are aware of the framework they represent and how that framework carries constraint info, if at all.

The current approach of indiscriminate metadata parsing opens us up to a world of hyrums-law hurt. E.g., this works:

from dataclasses import dataclass

from typing_extensions import Annotated

from litestar import Litestar, post

class Constraint:
    max_length = 3

@dataclass
class Foo:
    foo: Annotated[str, Constraint]

@post()
async def post_handler(data: Foo) -> None:
    ...

app = Litestar([post_handler], debug=True)

And via DTOs the constraint would also be enforced.

So does this:

from dataclasses import dataclass

from typing_extensions import Annotated

from litestar import Litestar, get

class Constraint:
    max_length = 3

@get()
async def get_handler(arg: Annotated[str, Constraint]) -> None:
    ...

app = Litestar([get_handler], debug=True)

image

image

For parameters, we should only parse constraints from KwargDefinition sub-types and ignore anything else. For parsing models, we should only parse constraints from the model as per the modelling libraries official support for supplying such constraints. I think that being clear and precise about how and where we draw schema constraints from annotations will be an overall improvement to understanding how litestar works, and will allow us to simplify internals by cordoning off 3rd party framework specific logic into plugins.

Given how permissive we've been parsing metadata so far, I think any work to tighten this up needs to be against the 3.0 branch.

peterschutt added a commit that referenced this issue Apr 3, 2024
This PR aims to change our approach of parsing schema metadata from an indiscriminate ducktype approach to a deliberate approach with situational awareness.

For parameter declarations, we should only parse schema metadata from instances of `KwargDefinition` and sub-types.

For model parsing, including dataclasses, typeddict and 3rd party modelling libraries, we should only parse metadata according to that modelling libraries supported methods of constraint delivery.

For #3202
@tuukkamustonen
Copy link
Contributor Author

tuukkamustonen commented Apr 4, 2024

That's a really great investigation @peterschutt

In short, what I've seen from the codebase, crafting the OpenAPI spec via getattr() and friends make the code really difficult to follow, too. And impossible to lint. You lose all the nice typing capabilities that we have these days.

The parsing should really be much more strict (and typed) as suggested. 💯 for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation 📚 This is related to documentation
Projects
None yet
Development

No branches or pull requests

4 participants