
Does pydantic.dataclasses.dataclass support classic mapping in SQLAlchemy? #1089

Closed
HymanZHAN opened this issue Dec 9, 2019 · 17 comments

@HymanZHAN

Question

  • OS: Fedora 31
  • Python version: 3.7.5 (default, Oct 25 2019, 15:51:11)
  • Pydantic version: 1.2

First of all, thank you so much for your awesome work! Pydantic is a very good library, and I really like using it in combination with FastAPI. 🚀

So my question is: does pydantic.dataclasses.dataclass support classical mapping in SQLAlchemy? I am working on a project that I hope to build with a clean architecture, so I would like to use pydantic's dataclass for my domain model and have SQLAlchemy depend on it.

I have a proof-of-concept snippet below, which works if I use the dataclass from Python's standard library instead.

from dataclasses import field
from pydantic.dataclasses import dataclass
from pydantic import constr
from typing import List
from datetime import datetime
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, ARRAY, TIMESTAMP
from sqlalchemy.orm import sessionmaker, mapper

metadata = MetaData()
person_table = \
    Table('people', metadata,
          Column('id', Integer, primary_key=True, autoincrement=True),
          Column('name', String),
          Column('age', Integer),
          Column('hobbies', ARRAY(String)),
          Column('birthday', TIMESTAMP)
          )


@dataclass
class Person:
    name: str = constr(max_length=50)
    age: int = field(default_factory=int)
    hobbies: List[str] = field(default_factory=list)
    birthday: datetime = field(default_factory=datetime.now)


mapper(Person, person_table)

engine = create_engine(
    'postgresql://postgres:defaultpassword@127.0.0.1:5432/test_db', echo=True)
metadata.create_all(engine)

session = sessionmaker(bind=engine)()
person = Person(hobbies=['golf', 'hiking'],
                birthday=datetime(1985, 7, 25))
session.add(person)
session.commit()

Error Output:

> python main.py         
2019-12-08 22:11:07,663 INFO sqlalchemy.engine.base.Engine select version()
2019-12-08 22:11:07,663 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,664 INFO sqlalchemy.engine.base.Engine select current_schema()
2019-12-08 22:11:07,664 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine show standard_conforming_strings
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,666 INFO sqlalchemy.engine.base.Engine select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where pg_catalog.pg_table_is_visible(c.oid) and relname=%(name)s
2019-12-08 22:11:07,666 INFO sqlalchemy.engine.base.Engine {'name': 'people'}
2019-12-08 22:11:07,667 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE people (
        id SERIAL NOT NULL, 
        name VARCHAR, 
        age INTEGER, 
        hobbies VARCHAR[], 
        birthday TIMESTAMP WITHOUT TIME ZONE, 
        PRIMARY KEY (id)
)


2019-12-08 22:11:07,667 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,683 INFO sqlalchemy.engine.base.Engine COMMIT
Traceback (most recent call last):
  File "/home/xzhan/Development/Projects/sqlalchemy_mapping/.venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1955, in add
    state = attributes.instance_state(instance)
AttributeError: 'Person' object has no attribute '_sa_instance_state'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 37, in <module>
    session.add(person)
  File "/home/xzhan/Development/Projects/sqlalchemy_mapping/.venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1957, in add
    raise exc.UnmappedInstanceError(instance)
sqlalchemy.orm.exc.UnmappedInstanceError: Class '__main__.Person' is mapped, but this instance lacks instrumentation.  This occurs when the instance is created before sqlalchemy.orm.mapper(__main__.Person) was called.

Thanks in advance!

@samuelcolvin
Member

Best to use ORM mode.

I have no personal interest in ORMs or supporting them - long term they're a mistake, you'd do much better to just write SQL.

However, if there's some easy way to support mapper that requires a relatively small change to pydantic, I'd be happy to review a PR.

@HymanZHAN
Author

Thank you for the quick response!
Wish I had the SQL skill level to say that lol. 🤦‍♂️ I have worked on a Django project where most database operations were written in raw SQL, and it was not a pleasant experience. I found it to be a great maintenance burden, not only for me but also for newer team members.
I will take a look and see what I can do when I am more settled with my current project. I shall experiment with BaseModel and ORM Mode as well. Thanks again!

@samuelcolvin
Member

samuelcolvin commented Dec 9, 2019

Django's ORM is kind of a special case, since:

  1. it's integrated with the entire framework
  2. it doesn't try to be a shallow layer on top of SQL, but rather starts from the bottom up with its own data approach; see Support for (de-)serializing SQLAlchemy models #491 (comment)

Good luck with from_orm; that's the approach most people take.
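
For anyone landing here later, a minimal sketch of that approach (class names here are illustrative, not from this thread): a declarative SQLAlchemy model owns persistence, and a separate pydantic model with orm_mode owns validation.

from pydantic import BaseModel
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class PersonDB(Base):
    """SQLAlchemy model: owns persistence."""
    __tablename__ = 'people'

    id = Column(Integer, primary_key=True)
    name = Column(String)


class PersonSchema(BaseModel):
    """pydantic model: owns validation."""
    id: int
    name: str

    class Config:
        orm_mode = True  # enables PersonSchema.from_orm(...)


# row = session.query(PersonDB).first()
# person = PersonSchema.from_orm(row)  # a validated copy of the row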

@HymanZHAN
Author

HymanZHAN commented Dec 9, 2019

From my personal experience, the major difference between SQLAlchemy and the Django ORM is really the difference between Active Record and Data Mapper. (Or maybe I am just not experienced enough to spot/appreciate the others lol. 🤦‍♂️)

I really disliked SQLAlchemy when I first started Python web dev with Flask & Flask-SQLAlchemy, and came to like the Django ORM quite a lot when I started using Django at my new job. But using the Django ORM means tight data coupling. With some background in .NET Core and EF Core, and having recently learned more about software architecture, I have slowly come to appreciate many of the design choices SQLAlchemy makes. The syntax is not the best for sure, but many things just make more sense now.

Edit: SQLAlchemy Core also has some very useful tools for building queries, including helpers for using textual SQL.

@sandys

sandys commented Apr 17, 2020

I have closed my other issue (#1403) and would like to add my request here.
It would be great if there were some way to support SQLAlchemy's mapper mode. That would let us avoid defining models twice, once for SQLAlchemy and once for pydantic, and would be a huge help when using FastAPI.

@sandys

sandys commented Apr 17, 2020

A good compromise would be if pydantic could take a standard Python dataclass and convert it into a new pydantic dataclass.
Right now, I'm doing this:

from sqlalchemy import Table, Text, MetaData, Column, Integer, String, ForeignKey, create_engine
from sqlalchemy.orm import mapper, relationship, sessionmaker

from typing import Optional
from pydantic import  EmailStr
from pydantic.dataclasses import dataclass as py_dataclass
from dataclasses import dataclass

metadata = MetaData()

audit_mixin = Table('audit_mixin', metadata,
                    Column('id', Integer, primary_key=True),
                    Column('performed_by', Integer, nullable=True))

user = Table('user', metadata,
            Column('user_id', Integer, primary_key=True),
            Column('name', String(50)),
            Column('email', String(100)),
            Column('fullname', String(50)),
            Column('nickname', String(12))
        )


@dataclass
class AuditMixin:
    id: int
    performed_by: int 

@py_dataclass
class AuditMixinPy(AuditMixin):
    pass

@dataclass
class User(AuditMixin):
    user_id: int
    identity: str
    identity_type: str
    row_status: str
    comments: str

@py_dataclass
class UserPy(User):
    pass


engine = create_engine("sqlite:///", echo=False)
metadata.create_all(engine)

session = sessionmaker(engine)()

mapper(AuditMixin, audit_mixin)
mapper(User, user)

u = User(100, 123, 1, "a", "dfd", "dfdf", "dfdd")
u.id            # inherited from AuditMixin
u.performed_by
session.add(u)
session.commit()

u = User("sss",123,1, "a","dfd","dfdf",1)
u = UserPy("sss",123,1, "a","dfd","dfdf",1)


@denisSurkov

denisSurkov commented Apr 21, 2020

Hi!

This is a really interesting feature, and I would like to give it some thought. I was playing around with the sqlalchemy mapper and pydantic, and the main problem was:

@no_type_check
def __setattr__(self, name, value):
    ...
    if not hasattr(self, '__fields_set__'):
        object.__setattr__(self, '__fields_set__', set())

This will cause some problems with __eq__, because __eq__ checks all fields, but sqlalchemy loads some private attributes. @samuelcolvin, what do you think about this approach? As far as I can see, no tests failed.

Yes, sqlalchemy users would have to rewrite __eq__ of their pydantic BaseModels, but I think it's much easier for developers.


@denisSurkov

Some dirty hacks, but I hope you understand the idea.

In pydantic/main.py:

class BaseModel(...):
    ...

    @no_type_check
    def __setattr__(self, name, value):
        if self.__config__.extra is not Extra.allow and name not in self.__fields__:
            raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')
        elif not self.__config__.allow_mutation:
            raise TypeError(f'"{self.__class__.__name__}" is immutable and does not support item assignment')
        elif self.__config__.validate_assignment:
            known_field = self.__fields__.get(name, None)
            if known_field:
                value, error_ = known_field.validate(value, self.dict(exclude={name}), loc=name, cls=self.__class__)
                if error_:
                    raise ValidationError([error_], self.__class__)
        self.__dict__[name] = value
        # these lines were added
        if not hasattr(self, '__fields_set__'):
            object.__setattr__(self, '__fields_set__', set())
        # end
        self.__fields_set__.add(name)

    ...

And in my application code:

import json
import typing as t

from pydantic import BaseModel, Extra, Field
from sqlalchemy import JSON, Column, Integer, MetaData, Table
from sqlalchemy.orm import mapper

metadata = MetaData()

agents = Table(
    'agents', metadata,
    Column('someid', Integer, primary_key=True, index=True),
    Column('raw_json', JSON)
)


def start_mappers():
    mapper(agent_models.Agent, agents)


class Agent(BaseModel):
    someid: int = Field(...)
    raw_json: t.Optional[t.Dict[str, t.Any]] = Field(None)

    def __eq__(self, other):
        if isinstance(other, Agent):
            return self.someid == other.someid
        else:
            return False

    class Config:
        extra = Extra.allow
        allow_mutation = True
        orm_mode = True


def test_agent_mapper_can_load_lines(session):
    some_dict = {
        "key": "value"
    }

    raw_json = json.dumps(some_dict)

    session.execute(
        'INSERT INTO agents (someid, raw_json) VALUES '
        '(1, NULL ),'
        f'(2, :raw_json)', dict(raw_json=raw_json, )
    )
    session.commit()

    expected = [
        models.Agent(someid=1, raw_json={}),
        models.Agent(someid=2, raw_json=some_dict)
    ]

    assert session.query(models.Agent).all() == expected

@saisasidhar

saisasidhar commented May 12, 2020

It is indeed quite an interesting use case to use @dataclass with classical mapping.
I'm using pydantic for request validation (this dataclass instance is sent around in the application), but I also wanted to persist the values in this pydantic instance. So here's my solution/hack to implement this transparently.

Model definitions:

from dataclasses import dataclass

from pydantic import confloat
from pydantic.dataclasses import dataclass as pydantic_dataclass

@pydantic_dataclass
class PydanticRequest:
    test: confloat(ge=10, le=100)


@dataclass  # This then takes the annotations and builds a new dataclass ;)
@duplicates_pydantic_dataclass(PydanticRequest)  # This prepares __annotations__ dynamically
class Request:
    pass

And the decorator that does the job:

def duplicates_pydantic_dataclass(pydantic_cls):
    def wrapper(cls):
        cls.__annotations__ = {}
        for field_key, field_value in pydantic_cls.__dataclass_fields__.items():
            cls.__annotations__[field_key] = field_value.type
        return cls
    return wrapper

This way, I transparently use pydantic for validation:

pr = PydanticRequest(test=11.0)

while the SQLAlchemy mappers are defined for Request!

For CRUD operations, I have the following:

from dataclasses import asdict
def instantiate_with_data(cls, instance):
    return cls(**asdict(instance))

>>> pr = PydanticRequest(test=11.0)
>>> r = instantiate_with_data(Request, pr)
>>> session.add(r)

It works well enough for simple objects and mappings, though.

@samuelcolvin
Member

Just to add since it isn't mentioned here:

Since v1.7, pydantic has supported automatically creating pydantic dataclasses from stdlib dataclasses; see #1817
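
A minimal sketch of what that looks like (User and its fields are illustrative, not from this thread):

import dataclasses

import pydantic


@dataclasses.dataclass
class User:
    id: int
    name: str = 'John Doe'


# Wrapping an existing stdlib dataclass yields a validating pydantic
# dataclass without redefining any fields:
ValidatedUser = pydantic.dataclasses.dataclass(User)

ValidatedUser(id='1')       # '1' is coerced to the int 1
# ValidatedUser(id='nope')  # raises a ValidationError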

@DomWeldon

I think I've followed most of the discussion so far, but I don't think that supporting pydantic dataclass creation from a standard dataclass would solve my problem. I like pydantic dataclasses and am using them as part of my validation elsewhere, so I'd rather keep pydantic dataclasses with their extra validation, and hopefully stick as close as possible to PEP 557: keeping dataclasses as close as possible to "normal" Python classes and not interfering with the usage of the class otherwise.

A minimal(ish) test case for my issue is here: https://github.com/DomWeldon/pydantic-dataclasses-sqlalchemy

As far as I can tell, there is something going on in the implementation of a pydantic dataclass that alters the way SQLAlchemy sets up either the __new__ or __init__ methods on new instances of a model, which means that a new instance of a model has no _sa_instance_state, whereas one based on a stdlib dataclass does.

This feels fixable, even if just by some kind of monkeypatch via a class decorator for now, in advance of a proper PR to fix the issue in the future. I thought it had to do with the __post_init__ hook being messed up, but a cursory look at the SQLA repo suggests this isn't the case, though I may be wrong on that.

@samuelcolvin I understand your opinions/frustrations on ORMs, and in the past have shared them (and indeed still share some). It's an interesting academic argument about the relative merits of mapper vs. active record, and about whether or not ORMs are worth the complexity they add. My hot take is that they are: I like to reason about my database structures as Python objects to automate most things, and have my team focus on writing the more complex edge cases (i.e., writing SQL that matters). This is actually something that, given our geographies, I'd love to chat about over a beer in London sometime.

That said, SQLAlchemy is one of the two leading ORMs used by Pythonistas around the world; it follows a sensible architecture pattern, is well supported, and is basically essential to the Python ecosystem. Pydantic is also a popular library, and on the up, so easier integration between the two seems like a good idea. It's also blocking a couple of my projects at the moment, so it's something I want to fix, at least for my code, ASAP!

I would be more than happy (read: keen) to put together a draft PR for either SQLAlchemy or pydantic, or at least post a gist on how to fix and work around this issue, but I could do with a pointer to exactly where the problem currently lies.

If anyone can add some detail to this issue before tomorrow afternoon, I would be enormously grateful; otherwise, I'll update once I've had a chance to find out what the exact issue is.

@zzzeek

zzzeek commented Jul 26, 2021

As far as I can tell, there is something going on in the implementation of a pydantic dataclass that alters the way SQLAlchemy sets up either the __new__ or __init__ methods on new instances of a model, which means that a new instance of a model has no _sa_instance_state, whereas one based on a stdlib dataclass does.

I don't have many resources to look into this, but when I did, the main thing I observed was that pydantic seems to want to write into obj.__dict__ directly. That's normally fine; however, SQLAlchemy's whole approach is based on Python descriptors, so that it can intercept setattr() and similar calls before they populate __dict__. That's the main problem: the pydantic base seems to have several areas where it circumvents the descriptor protocol (likely because Python attribute access is slow, indeed). An alternate pydantic base could likely work around most or all of these issues, but it got pretty far into the weeds when I tried to do it a while back.

@DomWeldon

Thank you @zzzeek, this is very helpful and interesting, and I really appreciate your time! If it's easy, do you have a source code or docs link for SQLA's use of the descriptor protocol? I've long introspected SQLA to see how it defines my models and tables, but I'm a bit hazy on exactly how it works in relation to instances, and it would give me a head start to have some docs on the ideas behind it.

The problem you described makes it sound like it could be trickier than planned, but I think the idea behind pydantic dataclasses is that they're very similar to standard dataclasses, just adding a few dunder attributes such as __pydantic_model__. I noticed a few caveat-comments in your implementation of the code to parse dataclasses for 1.4, and I wonder if it's related to these or to something entirely different. Any thoughts or braindumps you can provide are super useful.

Honestly, I'm looking forward to getting stuck into this tomorrow. Any notes / headstarts very much welcome.

@zzzeek

zzzeek commented Jul 27, 2021

SQLAlchemy's descriptor logic is based on the descriptor class InstrumentedAttribute:

https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/orm/attributes.py#L447

that's where all userspace attribute operations happen.

SQLAlchemy's use of descriptors can be customized heavily via some little-used extension points, but there still needs to be some level at which the "set" can be intercepted.
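
To make that concrete, here is a minimal illustration (a throwaway Thing class, not anything from this thread) of what classical mapping installs on a class:

from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.orm import mapper
from sqlalchemy.orm.attributes import InstrumentedAttribute

metadata = MetaData()
things = Table('things', metadata,
               Column('id', Integer, primary_key=True),
               Column('name', String))


class Thing:  # a plain class, no dataclass machinery involved
    pass


mapper(Thing, things)

# After mapping, each column attribute on the class is a descriptor:
assert isinstance(Thing.name, InstrumentedAttribute)

t = Thing()
t.name = 'x'               # intercepted by InstrumentedAttribute.__set__
t.__dict__['name'] = 'y'   # bypasses the descriptor; SQLAlchemy never sees it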

The attempt I made to make some of this work can be seen at:

https://gist.github.com/zzzeek/bcccc70393f14a190e849878ed63e419

@DomWeldon

Thanks Mike, this has been really helpful. Unfortunately I've only been able to look at this late today, so it will be a day or so until I find a solution to my issue.

Based on your code and notes, I think the issue is the line below from pydantic/dataclasses.py:101, but I haven't had a chance to properly investigate what the precise problem is yet and how it manifests.

    object.__setattr__(self, '__dict__', d)

source
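
As a standalone illustration of why that line is destructive (Demo and the state value below are stand-ins, not SQLAlchemy internals): anything previously stored in the instance __dict__, including _sa_instance_state, is discarded when __dict__ is reassigned wholesale.

class Demo:
    pass


d = Demo()
d._sa_instance_state = object()  # stand-in for SQLAlchemy's per-instance state
d.__dict__ = {'name': 'x'}       # wholesale replacement, as in the line above
assert not hasattr(d, '_sa_instance_state')  # the state is gone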

My implementation of building a dataclass from a pydantic schema is slightly different from your gist, in that I've defined my models as (pydantic) dataclasses, thinking that, in the spirit of PEP 557, there would be minimal interference between the two libraries on what should be essentially just a plain old Python class.

However, it seems that since (of course) both libraries need to alter behaviour around __init__(), they are effectively competing/clashing with each other, and the line above, which is executed after __init__() is called, effectively removes all SQLAlchemy behaviour from the instance. (As yet I haven't proven that it removes behaviour which has already been attached to the object; this is more of a hunch at this stage, but it's getting late, so I'll prove or disprove that hunch tomorrow.)

Thinking aloud, and without the chance to properly plan or sketch out a solution, I think a potential longer-term candidate for a fix could be to make pydantic's implementation of dataclasses customizable, such that operations which directly alter (and/or access?) __dict__ on a dataclass can be configured, even if it's just supplying a callable to access or modify __dict__ that could protect a certain namespace, or check that pydantic is operating only in the expected namespace and modify behaviour if it isn't. You might even be able to do this already using the descriptor protocol on __dict__ and/or altering the MRO, but I think that's a sure way to head for a mental stack overflow (if you're reading this in 6 months and wondering if it might be the solution, please don't do this!).

I'll take a closer look again tomorrow and sketch out some thoughts as I try to find a solution to my issue.

As always, any further input/comments much appreciated. Thanks again for your help @zzzeek, and to both Mike and @samuelcolvin for writing and maintaining such amazing open source libraries.

@DomWeldon

The fix (in my example) turns out to be remarkably simple using a monkey patch.

https://github.com/DomWeldon/pydantic-dataclasses-sqlalchemy/blob/8ddab913816702dd5ec6a5a6de1ad9ee9b89164d/pdsqla/dcs.py#L36

I'm not sure whether reassigning __dict__ in the post-init function should be counted as a bug or not, so I will outline some thinking in a new issue with proposals for a fix/change.
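
For anyone who can't follow the link, the general shape of such a patch (a sketch of the idea under the assumptions discussed above, not the linked repo's exact code; preserve_sa_state is a made-up name) is to capture _sa_instance_state before pydantic's __init__ runs and re-attach it afterwards:

import functools


def preserve_sa_state(cls):
    """Sketch: wrap __init__ so _sa_instance_state survives pydantic's
    __dict__ reassignment. Apply to the class before calling mapper()."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self, *args, **kwargs):
        # SQLAlchemy's instrumentation sets the state before delegating here.
        state = self.__dict__.get('_sa_instance_state')
        original_init(self, *args, **kwargs)
        # pydantic's post-init may have replaced __dict__; put the state back.
        if state is not None:
            self.__dict__.setdefault('_sa_instance_state', state)

    cls.__init__ = __init__
    return cls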
