
Differences between BaseModel and @dataclass not expected based on documentation #710

Closed
SethMMorton opened this issue Aug 3, 2019 · 13 comments

Comments

@SethMMorton

Documentation Update Request

For bugs/questions:

  • OS: Linux
  • Python version: 3.7.3 | packaged by conda-forge
  • Pydantic version: 0.31.0

I liked the idea of using a dataclass instead of subclassing from BaseModel, so I tried changing the very first example from the docs to use dataclass instead of BaseModel and it fails.

from datetime import datetime
from typing import List
# from pydantic import BaseModel
from pydantic.dataclasses import dataclass

# class User(BaseModel):
@dataclass
class User:
    id: int
    name = 'John Doe'
    signup_ts: datetime = None
    friends: List[int] = []

external_data = {'id': '123', 'signup_ts': '2017-06-01 12:22', 'friends': [1, '2', b'3']}
user = User(**external_data)
print(user)
# > User id=123 name='John Doe' signup_ts=datetime.datetime(2017, 6, 1, 12, 22) friends=[1, 2, 3]
print(user.id)
# > 123

Result:

Traceback (most recent call last):
  File "my_pydantic_test.py", line 7, in <module>
    @dataclass
  File "pydantic/dataclasses.py", line 128, in pydantic.dataclasses.dataclass
    # +-------+-------+-------+--------+--------+
  File "pydantic/dataclasses.py", line 123, in pydantic.dataclasses.dataclass.wrap
    #    |       |       |
  File "pydantic/dataclasses.py", line 77, in pydantic.dataclasses._process_class
    #    +--- frozen= parameter
  File "/path/to/python/lib/python3.7/dataclasses.py", line 834, in _process_class
    for name, type in cls_annotations.items()]
  File "/path/to/python/lib/python3.7/dataclasses.py", line 834, in <listcomp>
    for name, type in cls_annotations.items()]
  File "/path/to/python/lib/python3.7/dataclasses.py", line 727, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'list'> for field friends is not allowed: use default_factory

I realize that this error is coming from the standard library dataclasses module, not pydantic. However, based on the language in the dataclasses section of the documentation, I had expected that anything I could do with BaseModel I could also do with dataclass.

If you don’t want to use pydantic’s BaseModel you can instead get the same data validation on standard dataclasses (introduced in python 3.7).

You can use all the standard pydantic field types and the resulting dataclass will be identical to the one created by the standard library dataclass decorator.

Can I suggest that there be a note or warning to the user that there are certain restrictions associated with using a dataclass that are not present when using BaseModel (such as not being able to use mutable defaults, as well as #484 and #639)?

@samuelcolvin
Member

Happy to accept a PR to improve the documentation.

@peteboothroyd

Hi @samuelcolvin, first I just want to say thanks for such a nice tool! I believe the reason mutable defaults are not allowed on regular dataclasses is that the default object would be shared between instances (probably unexpectedly for most people), similar to what happens if you do this:

class Y(object):
    def __init__(self, mutable=[]):
        self._mutable = mutable

y1 = Y()              # y1._mutable = []
y2 = Y()              # y2._mutable = []
y1._mutable.append(1) # y1._mutable = [1], but surprise! y2._mutable = [1]

The Pydantic BaseModel does not seem to suffer from this:

from typing import List

import pydantic

class X(pydantic.BaseModel):
    list_: List[int] = []

x1 = X()           # x1.list_ = []
x2 = X()           # x2.list_ = []
x1.list_.append(1) # x1.list_ = [1], x2.list_ = []

Have I understood that correctly? (Sorry if it's in the docs, I looked but couldn't find it specifically mentioned)
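For the plain-class case above, the usual Python fix is a None sentinel, so a new list is built on every call instead of once when the function is defined. A minimal sketch reusing the Y example (this is standard Python, nothing pydantic-specific):

```python
from typing import List, Optional


class Y:
    def __init__(self, mutable: Optional[List[int]] = None):
        # A fresh list is created on each call, so instances never share it.
        self._mutable = [] if mutable is None else mutable


y1 = Y()
y2 = Y()
y1._mutable.append(1)
print(y1._mutable, y2._mutable)  # [1] []
```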

@leiserfg

It's not a pydantic error; see https://docs.python.org/3/library/dataclasses.html#dataclasses.field
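With the standard library decorator, the first example can be made to work by giving the list field a default_factory. A sketch using plain dataclasses.dataclass; since the traceback shows pydantic.dataclasses.dataclass delegating to the stdlib _process_class, the same field() call should also apply there, but treat that as an assumption to check for your pydantic version. Note the stdlib version does no coercion ('123' would stay a string), which is what pydantic adds.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class User:
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime] = None
    # default_factory is called once per instance, so each User gets its own list.
    friends: List[int] = field(default_factory=list)


u1 = User(id=1)
u2 = User(id=2)
u1.friends.append(3)
print(u1.friends, u2.friends)  # [3] []
```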

@SethMMorton
Author

SethMMorton commented Sep 11, 2019

@leiserfg Yes, I tried to make that point very clear in the original issue with the text

I realize that this error is coming from the Std. Lib. dataclasses module, not pydantic.

The point of this issue is that the documentation makes it seem like you get the same behavior from using dataclass as from inheriting from BaseModel, but in corner cases like this that is not true, and it should be documented as such. I do NOT think any pydantic behavior should be changed with respect to this issue.

@leiserfg

I had another issue with dataclasses: they don't support extra fields even with Extra.ignore (the default). That's because the generated __init__ does not accept extra arguments, so I'm using BaseModel again.
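One possible workaround is to drop unknown keys before calling the generated __init__. A minimal sketch with the standard library decorator; init_ignoring_extra is a hypothetical helper, not part of pydantic or dataclasses, and it only filters keys (it does not emulate the rest of Extra.ignore):

```python
import dataclasses
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class User:
    id: int
    name: str = "John Doe"


def init_ignoring_extra(cls: Type[T], data: Dict[str, Any]) -> T:
    """Hypothetical helper: keep only keys that are __init__ fields of `cls`."""
    names = {f.name for f in dataclasses.fields(cls) if f.init}
    return cls(**{k: v for k, v in data.items() if k in names})


payload = {"id": 1, "name": "Jane", "unexpected": True}

try:
    User(**payload)
except TypeError as exc:
    # The generated __init__ rejects the unknown keyword argument.
    print(exc)

print(init_ignoring_extra(User, payload))
```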

@samuelcolvin
Copy link
Member

We should make it clear that pydantic.dataclasses.dataclass is (mostly) a drop in replacement for dataclasses.dataclass with validation, not a replacement for pydantic.BaseModel.

@damonallison

I stumbled upon this issue when trying to understand the functional differences between pydantic.dataclasses.dataclass and pydantic.BaseModel. The documentation on dataclasses starts with:

If you don't want to use pydantic's BaseModel you can instead get the same data validation on standard [dataclasses]

My question is: WHY would I not want to use BaseModel? Performance? Simplicity? I feel like the documentation should include that - so users can determine what they are optimizing for.

From the comments, there are a few key differences:

  • How mutable field defaults are handled. BaseModel does not require default_factory for mutable defaults.
  • BaseModel handles extra fields.

Are there any other differences between the two? Should I put in a PR to update the docs with a comparison?

@samuelcolvin
Member

Thanks for the question, it's probably not worth putting in a PR at this time as we're in the middle of a rebuild for V2.

The main reason to use dataclasses is compatibility with other tools/code and providing a quick way to switch from vanilla dataclasses to pydantic.
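To illustrate the compatibility point: code written against the stdlib dataclasses API (is_dataclass, asdict, replace, fields) works with any dataclass. A sketch using the stdlib decorator; going by the documentation quoted in the first post, a class decorated with pydantic.dataclasses.dataclass should be accepted by the same functions, whereas a BaseModel subclass is not a dataclass. I have not verified this across pydantic versions.

```python
import dataclasses
from typing import List


@dataclasses.dataclass
class Point:
    x: int
    y: int
    tags: List[str] = dataclasses.field(default_factory=list)


p = Point(1, 2)
# Generic stdlib helpers that work on any dataclass instance:
print(dataclasses.is_dataclass(p))               # True
print(dataclasses.asdict(p))                     # {'x': 1, 'y': 2, 'tags': []}
print(dataclasses.replace(p, y=5))               # Point(x=1, y=5, tags=[])
print([f.name for f in dataclasses.fields(p)])   # ['x', 'y', 'tags']
```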

@plannigan

Another difference I found when experimenting with the two:

  • When using BaseModel, initializing the class without one or more of the fields will raise a ValidationError that mentions all of the missing fields.
  • When using dataclass, initializing the class without one or more of the fields will raise a TypeError that mentions only the first missing field.

@x0s

x0s commented Nov 14, 2022

Another difference I found when experimenting with the two:

  • When using BaseModel, initializing the class without one or more of the fields will raise a ValidationError that mentions all of the missing fields.
  • When using dataclass, initializing the class without one or more of the fields will raise a TypeError that mentions only the first missing field.

I cannot reproduce what you describe. In both cases all the missing fields are mentioned (Python 3.10.4 and pydantic 1.10.2):

import dataclasses
import pydantic

@dataclasses.dataclass
class ChairBuiltin:
    width: int
    height: int

@pydantic.dataclasses.dataclass
class ChairPydantic:
    width: int
    height: int

class ChairPydanticBaseModel(pydantic.BaseModel):
    width: int
    height: int

ChairBuiltin()
# TypeError: ChairBuiltin.__init__() missing 2 required positional arguments: 'width' and 'height'

ChairPydantic()
# TypeError: ChairPydantic.__init__() missing 2 required positional arguments: 'width' and 'height'

ChairPydanticBaseModel()
# ValidationError: 2 validation errors for ChairPydanticBaseModel
#width
#  field required (type=value_error.missing)
#height
#  field required (type=value_error.missing)

@matrumz

matrumz commented Sep 8, 2023

Another difference I've discovered: with BaseModel, a __post_init__() method is never called.
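For reference, __post_init__ is a dataclass hook: the __init__ generated by the decorator calls it after assigning the fields. A BaseModel subclass does not get a dataclass-generated __init__, so a method with that name is just an ordinary method nothing calls (in pydantic v2 the closest BaseModel hook is, as far as I know, model_post_init()). A minimal stdlib sketch; Rectangle is an illustrative name:

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: int
    height: int
    # Not an __init__ argument; filled in by __post_init__.
    area: int = field(init=False)

    def __post_init__(self) -> None:
        # Called by the generated __init__ after width and height are assigned.
        self.area = self.width * self.height


r = Rectangle(3, 4)
print(r.area)  # 12
```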

@plusls

plusls commented May 27, 2024

After testing, I found that pydantic.dataclass is faster than BaseModel, at least for attribute access.

I tested it on Python 3.11 (Debian):

from dataclasses import dataclass as std_dataclass
from timeit import timeit
from typing import NamedTuple, TypedDict

from pydantic import BaseModel
from pydantic.dataclasses import dataclass

# Each timeit() call runs the statement 1,000,000 times (the default number).
class ThirdPartType(NamedTuple):
    a: int
print('NamedTuple', timeit('t.a', setup="t=ThirdPartType(a=114)", globals=globals()))

class ThirdPartType(TypedDict):
    a: int
print('TypedDict', timeit('t["a"]', setup="t=ThirdPartType(a=114)", globals=globals()))

class ThirdPartType:
    a: int
print('vanilla', timeit('t.a', setup="t=ThirdPartType()\nt.a=114", globals=globals()))

class ThirdPartType(BaseModel):
    a: int
print('BaseModel', timeit('t.a', setup="t=ThirdPartType(a=114)", globals=globals()))

@dataclass
class ThirdPartType:
    a: int
print('pydantic.dataclass', timeit('t.a', setup="t=ThirdPartType(a=114)", globals=globals()))

@std_dataclass
class ThirdPartType:
    a: int
print('dataclasses.dataclass', timeit('t.a', setup="t=ThirdPartType(a=114)", globals=globals()))

The results (seconds for 1,000,000 attribute accesses):

NamedTuple 0.015396732982480898
TypedDict 0.014935008977772668
vanilla 0.009511373995337635
BaseModel 0.022553119983058423
pydantic.dataclass 0.010486075014341623
dataclasses.dataclass 0.009584977990016341

Ranked by attribute access speed (fastest first):

vanilla ≈ dataclasses.dataclass > pydantic.dataclass > TypedDict > NamedTuple > BaseModel

@davnat

davnat commented Aug 29, 2024

Another difference between BaseModel and @dataclass is: with BaseModel you can have defaulted attributes and required ones (without a default value) in any order you wish, while with @dataclass (both from pydantic and stdlib) you must have all required attributes before all defaulted ones.

from pydantic import BaseModel
from pydantic.dataclasses import dataclass

class Works(BaseModel):
    one: int = 0
    two: int

@dataclass
class Broken:
    one: int = 0
    two: int  # Mypy: Attributes without a default cannot follow attributes with one

# TypeError: non-default argument 'two' follows default argument

This is especially relevant with inheritance:

from pydantic.dataclasses import dataclass

@dataclass
class Base:
    field: int = 0

@dataclass
class Test(Base):
    test: int  # Mypy: Attributes without a default cannot follow attributes with one

# TypeError: non-default argument 'test' follows default argument

from pydantic import BaseModel

class Base(BaseModel):
    field: int = 0

class Test(Base):
    test: int

t = Test(test=2)

# no errors


10 participants