Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protected namespaces #4915

Closed
samuelcolvin opened this issue Jan 6, 2023 · 18 comments · Fixed by #5768
Closed

Protected namespaces #4915

samuelcolvin opened this issue Jan 6, 2023 · 18 comments · Fixed by #5768
Assignees

Comments

@samuelcolvin
Copy link
Member

samuelcolvin commented Jan 6, 2023

See #4617.

We need to:

  • add the concept of protected namespaces on models, by default just model_
  • stop fields being added which conflict with them
  • With config.extra = 'allow' ignore extra items which conflict with these namespaces - maybe not!

I think it's okay for users to add methods which conflict with namespaces, though we should warn in the docs that that could break in future.

@adriangb
Copy link
Member

adriangb commented Jan 6, 2023

I imagine you've considered this before but I want to bring it up here since it's directly related: could we avoiding protected namespaces altogether by making these methods free floating functions like they are for dataclasses?

from pydantic import BaseModel, to_json


class MyModel(BaseModel):
    json: Any

to_json(MyModel(json={}))

I think the main cons would be:

  1. User experience. More imports, possible name conflicts with other imports. But this is very common (pandas/numpy, dataclasses and many others are in similar boats) and solvable by doing import pydantic as pyd or other options.
  2. Caching data / accessing private bits of models. Either BaseModel would have to expose internal bits as public or to_json would have to access private bits of a class it does not own.

@samuelcolvin
Copy link
Member Author

I've thought about this a lot, see #1001. I even pitched a slightly crazy idea at the language summit at pycon last year and got shot down, see here.

I really don't want to loose the great DX of my_model.<tab> -- shown list of methods, even though it causes a bunch of problems like this.

However we will need an equivalent of to_json and other methods to deal with the case where there's no pydantic model, see #4669, so you could perhaps use them with a NamedTuple, TypedDict or dataclass. If none of those types support extra properly, we could perhaps add a BareBaseModel with no methods but that still included __fields_set__ and config.extra. WDYT?

@adriangb
Copy link
Member

adriangb commented Jan 6, 2023

I even pitched a slightly crazy idea at the language summit at pycon last year and got shot down

Oh yeah I remember this discussion 😄

we will need an equivalent of to_json and other methods to deal with the case where there's no pydantic model

I think this is a really strong argument for having them as top level methods!
I just read through #1001 which is really helpful context on this.
My vote would be to have the top level methods be the actual implementation / low level version and provide aliases on models as conveniences with some sort of shadowing scheme (no strong opinion as to error when the class is defined by default, error at runtime by default or allow by default).

I don't think concerns over frameworks like FastAPI accessing the shadowed methods and finding a user field instead hold water: it would be easy to say that frameworks should always use the top level methods. An extra line of code or two in a framework is not a problem, it's the end-user DX where every character typed really matters.

I also don't think autocompletion is a big deal: we're just using BaseModel as a namespace so if users do import pydantic as pyd or something they'll get the same autocompletion when they type pyd.to_j.

Would you be concerned at all with internal implementation bits? Top level methods would have a hard time modifying the model or accessing private things.

@samuelcolvin
Copy link
Member Author

I agreed, also most customisation will be via @validator and @serializer decorators, not custom implementations of model_json() etc.

If we kept the names unchanged:

  • model_dump()
  • model_dump_json()
  • model_json_schema()
  • model_validate_json()
  • etc.

Then it would be even easier for end users.

@adriangb
Copy link
Member

adriangb commented Jan 6, 2023

I agreed

Sorry just to clarify, are you agreeing that top level methods with convenience aliases on the model class are the way to go?

@samuelcolvin
Copy link
Member Author

Yes exactly.

The "what should FastAPI do" question is a bit more complicated, but doesn't need to be decided now.

@adriangb
Copy link
Member

adriangb commented Jan 6, 2023

Awesome. I guess then the discussion is:

  1. What is the default shadowing rule for fields? I would vote for "do nothing special". That is, define the method as a regular method on the class and if users shadow it with a field their type checker will warn them but Pydantic does nothing at runtime. Users can then just ignore this via a # type: ignore.
  2. What is the default shadowing rule for extras? I think this should be config.extra = "ignore" by default (for performance reasons mainly) with config.extra = "allow" resulting in shadowing being allowed. I believe we're in agreement on this one.

So then I would call this a "soft" protection. We'd add something to the docs along the lines of "Pydantic uses the model_ namespace for internal things, we may add more methods in the future, avoid overriding these methods if possible".

@samuelcolvin
Copy link
Member Author

On extra we definitely need a hard block - either validation error or matching item are just ignored.

For fields I would argue for a hard block (error) too, but I guess we could use a UserWarning instead.

@dmontagu
Copy link
Contributor

A suggestion I'd like to offer for consideration: retaining the convenience methods with public names, but instead of using model_<method_name> for the methods on BaseModel, we could use model__<method_name> (two underscores between model and the method name).

I think using the double underscore would significantly reduce the opportunity for conflicts in the course of normal usage (e.g., wanting to have a field named model_json), while retaining the autocompletion benefits of being methods with a "public"-format identifier. And the double underscore also serves as a convenient differentiator between library-provided methods and user-defined methods/fields/etc.

I personally don't think the double underscore would have a significant impact on readability or aesthetic of the code, but I wouldn't be surprised by disagreement 😅.

@samuelcolvin
Copy link
Member Author

We should include BareModel - Better name required! in this.

@dmontagu
Copy link
Contributor

This is waiting on #4617

@astoff
Copy link

astoff commented Sep 13, 2023

We should include BareModel - Better name required! in this.

Do you mean a variant of BaseModel without predefined methods, which would then be available as regular functions from the pydantic module? I would really like to have this, both as a matter of stylistic preference and because working with anything related to "models" (e.g. machine learning models) became annoying in Pydantic 2.x.

@adriangb
Copy link
Member

Dataclasses (pydantic.dataclass) work pretty well as a BareModel now

@seperman
Copy link

Hi, it would have been great if model_config had been renamed to _model_config to signify it is a private variable and has nothing to do with the other fields. model_ is a common prefix for field names and setting the protected namespace to empty anywhere that we have a model_xxx field, seems to be defeating the purpose of having protected namespaces.

@adriangb
Copy link
Member

@seperman could you use dataclasses and TypeAdapter for your use case? If it's FastAPI you won't even need to use TypeAdapter.

@seperman
Copy link

@adriangb Can you please point me to the right resource? Yes I'm using FastApi.
What are the pros and cons of using dataclasses vs. Pydantic for FastApi? Thanks.

@adriangb
Copy link
Member

https://fastapi.tiangolo.com/advanced/dataclasses/

The pro, for your use case, is that there’s no protected namespace. The (general) con is that not all of the same features are supported and that you’d have to use a TypeAdapter (https://docs.pydantic.dev/latest/api/type_adapter/) to get validation outside of FasAPI (that may even be a pro).

A lot of the Pydantic features will still work if you use Annotated, eg x: Annotated[int, Pydantic.Field(gt=0)] will work with a standard library dataclass.

@seperman
Copy link

Cool. Thanks @adriangb !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants