Differences between BaseModel and @dataclass not expected based on documentation #710

SethMMorton · 2019-08-03T00:06:45Z

Documentation Update Request

For bugs/questions:

OS: Linux
Python version: 3.7.3 | packaged by conda-forge
Pydantic version: 0.31.0

I liked the idea of using a dataclass instead of subclassing from BaseModel, so I tried changing the very first example from the docs to use dataclass instead of BaseModel, and it failed.

from datetime import datetime
from typing import List
# from pydantic import BaseModel
from pydantic.dataclasses import dataclass

# class User(BaseModel):
@dataclass
class User:
    id: int
    name = 'John Doe'
    signup_ts: datetime = None
    friends: List[int] = []

external_data = {'id': '123', 'signup_ts': '2017-06-01 12:22', 'friends': [1, '2', b'3']}
user = User(**external_data)
print(user)
# > User id=123 name='John Doe' signup_ts=datetime.datetime(2017, 6, 1, 12, 22) friends=[1, 2, 3]
print(user.id)
# > 123

Result:

Traceback (most recent call last):
  File "my_pydantic_test.py", line 7, in <module>
    @dataclass
  File "pydantic/dataclasses.py", line 128, in pydantic.dataclasses.dataclass
    # +-------+-------+-------+--------+--------+
  File "pydantic/dataclasses.py", line 123, in pydantic.dataclasses.dataclass.wrap
    #    |       |       |
  File "pydantic/dataclasses.py", line 77, in pydantic.dataclasses._process_class
    #    +--- frozen= parameter
  File "/path/to/python/lib/python3.7/dataclasses.py", line 834, in _process_class
    for name, type in cls_annotations.items()]
  File "/path/to/python/lib/python3.7/dataclasses.py", line 834, in <listcomp>
    for name, type in cls_annotations.items()]
  File "/path/to/python/lib/python3.7/dataclasses.py", line 727, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'list'> for field friends is not allowed: use default_factory

I realize that this error is coming from the standard library dataclasses module, not pydantic. However, based on the language in the dataclasses section of the documentation, I had expected that anything I could do with BaseModel I could also do with dataclass:

If you don’t want to use pydantic’s BaseModel you can instead get the same data validation on standard dataclasses (introduced in python 3.7).

You can use all the standard pydantic field types and the resulting dataclass will be identical to the one created by the standard library dataclass decorator.

Can I suggest that there be a note or warning to the user that there are certain restrictions associated with using a dataclass that are not present when using BaseModel (such as not being able to use mutable defaults, as well as #484 and #639)?
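For reference, the fix the stdlib error message suggests is `default_factory`. The minimal sketch below uses the standard library decorator to show both the rejection and the fix; as far as I know, `pydantic.dataclasses.dataclass` accepts the same `dataclasses.field(...)` call. The helper `make_broken_class` is hypothetical and exists only to trigger the error at class-definition time.

```python
import dataclasses
from typing import List


def make_broken_class():
    # The ValueError is raised when the decorator runs, not at instantiation.
    @dataclasses.dataclass
    class User:
        id: int
        friends: List[int] = []  # a list default is rejected by the stdlib

    return User


@dataclasses.dataclass
class User:
    id: int
    name: str = 'John Doe'
    # default_factory is called once per instance, so each User gets its own list
    friends: List[int] = dataclasses.field(default_factory=list)


try:
    make_broken_class()
except ValueError as exc:
    print(exc)

u1 = User(id=1)
u2 = User(id=2)
u1.friends.append(3)
print(u1.friends, u2.friends)  # [3] []
```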


samuelcolvin · 2019-08-05T09:23:16Z

happy to accept a PR to improve the documentation.

peteboothroyd · 2019-08-07T18:32:39Z

hi @samuelcolvin, first I just want to say thanks for such a nice tool! I believe mutable defaults are not allowed on regular dataclasses because they would be shared between instances (probably unexpectedly for most people), similar to what happens if you do this:

class Y(object):
    def __init__(self,mutable=[]):
        self._mutable = mutable

y1 = Y()              # y1._mutable = []
y2 = Y()              # y2._mutable = []
y1._mutable.append(1) # y1._mutable = [1], but surprise! y2._mutable = [1]

The Pydantic BaseModel does not seem to suffer from this:

from typing import List

import pydantic

class X(pydantic.BaseModel):
    list_: List[int] = []

x1 = X()           # x1.list_ = []
x2 = X()           # x2.list_ = []
x1.list_.append(1) # x1.list_ = [1], x2.list_ = []

Have I understood that correctly? (Sorry if it's in the docs, I looked but couldn't find it specifically mentioned)

leiserfg · 2019-09-11T08:13:23Z

It's not a pydantic error, check https://docs.python.org/3/library/dataclasses.html#dataclasses.field

SethMMorton · 2019-09-11T15:56:54Z

@leiserfg Yes, I tried to make that point very clear in the original issue with the text

I realize that this error is coming from the Std. Lib. dataclasses module, not pydantic.

The point of this issue is that the documentation makes it seem like you get the same behavior using dataclass as inheriting from BaseModel, but in corner cases like this that is not possible, and it should be documented as such. I do NOT think any pydantic behavior should be changed WRT this issue.

leiserfg · 2019-09-12T06:23:16Z

I had another issue with dataclasses: they don't support extra fields even with Extra.ignore (the default), because the generated __init__ does not accept extra arguments. So I'm using BaseModel again.
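A minimal sketch of that contrast. It deliberately uses the stdlib decorator, whose generated `__init__` is the source of the `TypeError`, because newer pydantic releases may handle extras on `pydantic.dataclasses.dataclass` differently; check your version. `BaseModel` ignores unknown fields by default. The class names `PointDC` and `PointModel` are hypothetical.

```python
import dataclasses

from pydantic import BaseModel


@dataclasses.dataclass
class PointDC:
    x: int


class PointModel(BaseModel):
    x: int  # no config: unknown fields are ignored by default


try:
    PointDC(x=1, y=2)
except TypeError as exc:
    # the generated __init__ has no parameter named 'y'
    print(exc)

m = PointModel(x=1, y=2)
print(m)  # x=1  ('y' was dropped, not stored)
```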

samuelcolvin · 2019-09-12T09:48:30Z

We should make it clear that pydantic.dataclasses.dataclass is (mostly) a drop in replacement for dataclasses.dataclass with validation, not a replacement for pydantic.BaseModel.
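To illustrate the "drop-in replacement for dataclasses.dataclass" framing, here is a short sketch, assuming a recent pydantic: the decorated class is still a real dataclass for stdlib helpers such as `is_dataclass`, `fields` and `asdict`, but its inputs are validated. The class `Chair` is hypothetical.

```python
import dataclasses

import pydantic
from pydantic.dataclasses import dataclass


@dataclass
class Chair:
    width: int
    height: int = 1


chair = Chair(width='3')  # '3' is coerced to 3 by pydantic's validation

# stdlib dataclass tooling still works on the pydantic-decorated class
print(dataclasses.is_dataclass(Chair))               # True
print([f.name for f in dataclasses.fields(Chair)])   # ['width', 'height']
print(dataclasses.asdict(chair))                     # {'width': 3, 'height': 1}

try:
    Chair(width='wide')
except pydantic.ValidationError as exc:
    # unlike the stdlib decorator, bad input is rejected
    print(type(exc).__name__)
```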

damonallison · 2022-09-09T14:01:37Z

I stumbled upon this issue when trying to understand the functional differences between pydantic.dataclasses.dataclass and pydantic.BaseModel. The documentation on dataclasses starts with:

If you don't want to use pydantic's BaseModel you can instead get the same data validation on standard [dataclasses]

My question is: WHY would I not want to use BaseModel? Performance? Simplicity? I feel like the documentation should include that - so users can determine what they are optimizing for.

From the comments, there are a few key differences:

* How mutable field defaults are handled: BaseModel does not require default_factory for mutable defaults.
* BaseModel handles extra fields.

Are there any other differences between the two? Should I put in a PR to update the docs with a comparison?

samuelcolvin · 2022-09-09T15:05:17Z

Thanks for the question, it's probably not worth putting in a PR at this time as we're in the middle of a rebuild for V2.

The main reason to use dataclasses is compatibility with other tools/code and providing a quick way to switch from vanilla dataclasses to pydantic.

plannigan · 2022-10-31T15:31:23Z

Another difference I found when experimenting with the two:

* When using BaseModel, initializing the class without one or more of the fields will raise a ValidationError that mentions all of the missing fields.
* When using dataclass, initializing the class without one or more of the fields will raise a TypeError that mentions only the first missing field.

x0s · 2022-11-14T14:37:41Z

Another difference I found when experimenting with the two:

* When using `BaseModel`, initializing the class without one or more of the fields will raise a `ValidationError` that mentions all of the missing fields.

* When using `dataclass`, initializing the class without one or more of the fields will raise a `TypeError` that mentions only the first missing field.

I cannot reproduce what you are saying. In both cases all the missing fields are mentioned (Python 3.10.4 and pydantic 1.10.2):

import dataclasses
import pydantic

@dataclasses.dataclass
class ChairBuiltin:
    width: int
    height: int

@pydantic.dataclasses.dataclass
class ChairPydantic:
    width: int
    height: int

class ChairPydanticBaseModel(pydantic.BaseModel):
    width: int
    height: int

ChairBuiltin()
# TypeError: ChairBuiltin.__init__() missing 2 required positional arguments: 'width' and 'height'

ChairPydantic()
# TypeError: ChairPydantic.__init__() missing 2 required positional arguments: 'width' and 'height'

ChairPydanticBaseModel()
# ValidationError: 2 validation errors for ChairPydanticBaseModel
#width
#  field required (type=value_error.missing)
#height
#  field required (type=value_error.missing)

matrumz · 2023-09-08T14:16:24Z

Another difference I've discovered: BaseModel never calls __post_init__(), so a __post_init__() method defined on a model goes unused.
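A sketch of that difference, assuming pydantic v2: a pydantic dataclass runs `__post_init__` after validating its fields, while `BaseModel` ignores a method with that name; the v2 model-side hook is `model_post_init`. The class names and the private `_area` attribute are hypothetical, chosen only to make the effect observable.

```python
from typing import Any

from pydantic import BaseModel, PrivateAttr
from pydantic.dataclasses import dataclass


@dataclass
class RectDC:
    width: int
    height: int

    def __post_init__(self) -> None:
        # pydantic dataclasses call this after the fields are validated
        self.area = self.width * self.height


class RectModelLegacy(BaseModel):
    width: int
    height: int
    _area: int = PrivateAttr(default=0)

    def __post_init__(self) -> None:
        # BaseModel never calls this, so _area keeps its default of 0
        self._area = self.width * self.height


class RectModel(BaseModel):
    width: int
    height: int
    _area: int = PrivateAttr(default=0)

    def model_post_init(self, context: Any, /) -> None:
        # pydantic v2 runs this hook after validation
        self._area = self.width * self.height


print(RectDC(width='2', height=3).area)           # 6
print(RectModelLegacy(width=2, height=3)._area)   # 0
print(RectModel(width=2, height=3)._area)         # 6
```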


plusls · 2024-05-27T03:33:52Z

After testing, I found that pydantic.dataclass is faster than BaseModel for attribute access.

I tested it with Python 3.11 on Debian:

from timeit import timeit
from typing import NamedTuple, TypedDict

from pydantic import BaseModel
from pydantic.dataclasses import dataclass

# each timeit call resolves ThirdPartType to the most recent definition
class ThirdPartType(NamedTuple):
    a: int
print('NamedTuple', timeit('t.a', setup="t=ThirdPartType(a=114)", globals=globals()))

class ThirdPartType(TypedDict):
    a: int
print('TypedDict', timeit('t["a"]', setup="t=ThirdPartType(a=114)", globals=globals()))

class ThirdPartType:
    a: int
print('vanila', timeit('t.a', setup="t=ThirdPartType()\nt.a=114", globals=globals()))

class ThirdPartType(BaseModel):
    a: int
print('BaseModel', timeit('t.a', setup="t=ThirdPartType(a=114)", globals=globals()))

@dataclass  # pydantic.dataclasses.dataclass
class ThirdPartType:
    a: int
print('pydantic.dataclass', timeit('t.a', setup="t=ThirdPartType(a=114)", globals=globals()))

from dataclasses import dataclass  # rebinds the name to the stdlib decorator
@dataclass
class ThirdPartType:
    a: int
print('dataclasses.dataclass', timeit('t.a', setup="t=ThirdPartType(a=114)", globals=globals()))

The results (total seconds for timeit's default of 1,000,000 runs per statement):

NamedTuple 0.015396732982480898
TypedDict 0.014935008977772668
vanila 0.009511373995337635
BaseModel 0.022553119983058423
pydantic.dataclass 0.010486075014341623
dataclasses.dataclass 0.009584977990016341

Ranking by attribute access speed (fastest first):

vanila == dataclasses.dataclass > pydantic.dataclass > TypedDict > NamedTuple > BaseModel

davnat · 2024-08-29T08:32:45Z

Another difference between BaseModel and @dataclass is: with BaseModel you can have defaulted attributes and required ones (without a default value) in any order you wish, while with @dataclass (both from pydantic and stdlib) you must have all required attributes before all defaulted ones.

from pydantic import BaseModel
from pydantic.dataclasses import dataclass

class Works(BaseModel):
    one: int = 0
    two: int

@dataclass
class Broken:
    one: int = 0
    two: int  # Mypy: Attributes without a default cannot follow attributes with one

# TypeError: non-default argument 'two' follows default argument

This is especially relevant with inheritance:

from pydantic.dataclasses import dataclass

@dataclass
class Base:
    field: int = 0

@dataclass
class Test(Base):
    test: int # Mypy: Attributes without a default cannot follow attributes with one

# TypeError: non-default argument 'test' follows default argument

from pydantic import BaseModel

class Base(BaseModel):
    field: int = 0

class Test(Base):
    test: int

t = Test(test=2)

# no errors

samuelcolvin added the documentation label Aug 5, 2019

Maddosaurus mentioned this issue Oct 1, 2019

Extend dataclass doc #848

Merged


samuelcolvin closed this as completed Oct 7, 2019

thomasmatecki mentioned this issue Apr 27, 2020

Is FastAPI possible to use Python dataclasses in the future? fastapi/fastapi#1327

Closed

iudeen mentioned this issue Jul 20, 2022

Not rendering multi-select in API doc while using Pydantic model fastapi/fastapi#5042

Closed


fishi0x01 mentioned this issue Nov 7, 2022

rely on asset field metadata for CNA API and asset class conversion app-sre/qontract-reconcile#2932

Merged

cglacet mentioned this issue Jul 20, 2023

Suggestion: use Pydantic for data parsing (and type validation) duffelhq/duffel-api-python#313

Open

talonchandler mentioned this issue Aug 1, 2023

Add more testing for missing variables for Channel Settings. czbiohub-sf/shrimPy#40

Closed

gsakkis mentioned this issue Dec 14, 2023

Bug: Pydantic validators don't run for HTTP handler parameters litestar-org/litestar#2603

Open


antonsteenvoorden mentioned this issue Dec 21, 2023

Decide whether to support pydantic.BaseModel rogiervandergeer/pydargs#18

Closed

alexdrydew pushed a commit to alexdrydew/pydantic that referenced this issue Dec 23, 2023

Use from __future__ import annotations instead of string type hints…

081cf31

… in `python/pydantic_core/_pydantic_core.pyi` (pydantic#710)

schrockn mentioned this issue Mar 13, 2024

strict_dataclass dagster-io/dagster#20461

Closed

taranu mentioned this issue Apr 11, 2024

DM-42870: Add ModelRebuilder lsst/meas_extensions_multiprofit#9

Merged

Viicos mentioned this issue Jul 25, 2024

default_factory passed in annotation Field silently ignored #9947

Closed

mansenfranzen mentioned this issue Sep 23, 2024

Support pydantic dataclass models mansenfranzen/autodoc_pydantic#121

Open
