
Classical mapping, dataclasses with initialiser and filter clauses #5027

Closed
sfermigier opened this issue Dec 5, 2019 · 37 comments
Labels: classical mapping, orm, use case

sfermigier commented Dec 5, 2019

Hi,

I have come across some unexpected behaviour when using dataclasses mapped to SQLAlchemy using classical mapping.

Here's a snippet with the essence of the issue:

from __future__ import annotations

from dataclasses import dataclass, field

from sqlalchemy import Column, Table, String, Boolean
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import mapper, sessionmaker

Base = declarative_base()
metadata = Base.metadata

table = Table(
    "table",
    metadata,
    Column("id", String, primary_key=True),
    Column("active", Boolean),
)


@dataclass
class C:
    id: str
    active: bool = True


@dataclass
class C1:
    id: str
    active: bool


@dataclass
class C2:
    id: str
    active: bool = field(default_factory=lambda: True)


mapper(C, table)
mapper(C1, table)
mapper(C2, table)

Session = sessionmaker()
session = Session()

query = session.query(C)
print(query.filter(C.active == True))
print()

# Result (WRONG)
# SELECT "table".id AS table_id
# FROM "table"
# WHERE true

query = session.query(C1)
print(query.filter(C1.active == True))
print()

# Result: OK
# SELECT "table".id AS table_id, "table".active AS table_active
# FROM "table"
# WHERE "table".active = true

query = session.query(C2)
print(query.filter(C2.active == True))
print()

# Result: OK
# SELECT "table".id AS table_id, "table".active AS table_active
# FROM "table"
# WHERE "table".active = true

So, yes, I have found a workaround (using = field(default_factory=lambda: something) as an initializer instead of just = something), but it doesn't look as good.
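The difference between the two spellings comes from how dataclasses store defaults: a plain `= True` default is left on the class as a class attribute, which classical mapping then mistakes for a userland attribute, while `default_factory` removes the class attribute entirely. A minimal pure-Python check (no SQLAlchemy needed):

```python
from dataclasses import dataclass, field

@dataclass
class Plain:
    id: str
    active: bool = True  # default stored as a class attribute

@dataclass
class Factory:
    id: str
    active: bool = field(default_factory=lambda: True)  # no class attribute

# The plain default lives in the class namespace, which is what
# mapper() inspects when deciding whether to instrument a column.
print("active" in Plain.__dict__)    # True
print("active" in Factory.__dict__)  # False

# Both constructors still default active to True.
print(Plain("x").active, Factory("x").active)  # True True
```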

Is there some general advice regarding the interaction of dataclasses and SQLAlchemy ORM? (I have also tried with attrs, and got the same issue).

@zzzeek zzzeek added the "question" label (a usage question rather than something to fix on the SQLAlchemy side) Dec 5, 2019

zzzeek commented Dec 5, 2019

SQLAlchemy is not going to be Python 3-only until version 2.0. Also, a dataclass, AFAIK, is a class that is instrumented by Python in a totally unique way, and I have no idea how that interacts with standard Python descriptors that happen to provide the same names. I would not expect dataclasses to be mappable at all right now, and I am also not clear on the rationale for using them, as SQLAlchemy declarative already has a class-declarative syntax.


sfermigier commented Dec 6, 2019

> SQLAlchemy is not going to be Python 3-only until version 2.0,

Nothing in the approach I have taken should mandate dropping support for Python 2, unless of course there needs to be some specific code to deal with dataclasses / attrs classes.

> I would not expect dataclasses to be mappable at all right now

I wasn't sure it would work or not, so I tried and it worked quite well until I hit this roadblock (but only with initialisers).

> and also I am not clear on the rationale for using them as SQLAlchemy declarative already has a class-declarative syntax.

Well, I was experimenting with a way to have "business domain" classes independent of SQLAlchemy (think "clean" or "hexagonal" architecture). Except that unless a more radical approach is taken, we're still dependent on SQLAlchemy (wrt, for instance, relations), even if no SQLAlchemy classes are imported from the "domain" modules.

So in the end you're probably right: for my approach, SQLAlchemy declarative is probably the only approach with a decent chance to work in the short term without too much headache.

Thanks for your feedback.


zzzeek commented Dec 6, 2019

>> SQLAlchemy is not going to be Python 3-only until version 2.0,
>
> Nothing in the approach I have taken should mandate dropping support for Python 2, unless of course there needs to be some specific code to deal with dataclasses / attrs classes.

Sorry, let me be more specific. For SQLAlchemy to support mapping of dataclasses, we need to have test suites that test this. As dataclasses use a Python 3-only syntax, we can't have that syntax checked into our git tree within the test suite, because the test suite would fail to load under Python 2.7. That is, while we can block tests from running under Python 2, we don't yet have a system by which some tests can avoid being parsed or loaded by the Python interpreter at all; so while we can test Python 3 concepts, we can't easily test things that require Python 3-only syntaxes. I haven't looked into dataclasses, so there may be a way to test them functionally, using Py2-syntactically-valid objects that behave the same as Py3 dataclasses, but that's not something that is prioritized right now.

>> I would not expect dataclasses to be mappable at all right now
>
> I wasn't sure it would work or not, so I tried, and it worked quite well until I hit this roadblock (but only with initialisers).

>> and also I am not clear on the rationale for using them as SQLAlchemy declarative already has a class-declarative syntax.
>
> Well, I was experimenting with a way to have "business domain" classes independent of SQLAlchemy (think "clean" or "hexagonal" architecture). Except that unless a more radical approach is taken, we're still dependent on SQLAlchemy (wrt, for instance, relations), even if no SQLAlchemy classes are imported from the "domain" modules.
>
> So in the end you're probably right: for my approach, SQLAlchemy declarative is probably the only approach with a decent chance to work in the short term without too much headache.

can you give me a clue on what you gain by having the dataclass declare "active: bool" while at the same time having Table metadata that uses Boolean? isn't this needless repetition?

> Thanks for your feedback.

@sfermigier

> can you give me a clue on what you gain by having the dataclass declare "active: bool" while at the same time having Table metadata that uses Boolean? isn't this needless repetition?

The goal is to have a domain independent of the ORM (and of such details as whether a field is limited to a certain size, or is indexed, etc.), and there are several ways to achieve that. All the examples I have seen in the literature imply some level of repetition.

The approach I was implementing when I hit this issue is similar to the one advocated in https://github.com/cosmicpython/book/blob/master/chapter_02_repository.asciidoc#inverting-the-dependency-orm-depends-on-model


zzzeek commented Dec 6, 2019

OK, unfortunately I mostly disagree with what I'm reading in that excerpt. Of course, the original reason SQLAlchemy started with mapper() and a class was exactly that reason: hey, we should have the database and the model be completely separate; that way you can swap in a mapper against Amazon S3 if you want and nothing will change. In practice this is mostly a myth, since when you use mapper(), you are magically assigning a bunch of attributes to the class that the class otherwise has to implicitly assume are there:

from sqlalchemy import Table, Column, MetaData, Numeric
from sqlalchemy.orm import mapper

m = MetaData()

class Order(object):
    def get_order_total_message(self):
        return "Your order is %s dollars" % self.dollar_amount

orders = Table('orders', m, Column("dollar_amount", Numeric))

mapper(Order, orders)

Now what you're trying to do with the dataclasses is to fix this, so that Order does in fact have some generic declaration for "dollar_amount", and the table mapping backs it up. That's not a model that was anticipated with SQLAlchemy. When I started looking at this, in the vast majority of cases people just wanted to put their Column declaration right there. With an ORM like Django, these declarations are fairly generic so it "looks" OK, but then of course all the ugly details of how the Column should behave at a SQL level have to live somewhere too. So with Declarative, we solved the problem in a way that is enormously popular: just put the Column right there.

The mapper() idea came from Hibernate, which was for Java. In Hibernate, back in the early 2000s, we would have had to create full-blown getter and setter methods for every database attribute, then put all the SQL schema stuff in an XML file. That was basically what I was going for with mapper() / Table, except, hey, it's Python, we don't need the getters and setters.

If you look at Hibernate today (OK, well, 8 years ago, the last time I looked), they totally abandoned that style and, surprise, moved to using Java "annotations", so that mapped classes look the same as in SQLAlchemy, with the column declaration in the model.

So the above is an argument for why it's maybe not "important" that we support this, because it would be unpopular. But I think beyond that, there's a lot more assumption of a relational model that comes in as you continue to add more than just one attribute to one class. There's also the entire assumption of the ORM's persistence model; dollar_amount starts as None, not AttributeError, there's a "history" on the attribute that we can see changes on either side of a flush, there's a flush that is in the context of a transaction, there's a transaction. These are things that would almost never map the same way to a non-relational backend, so the SQLAlchemy mapped class in practice is in larger applications designed in a way that very much assumes the SQLAlchemy ORM or something that has worked very hard to act exactly the same way is backing it up.

What about being able to swap out the schema for another one? This works at the level of, at most, the names of columns, and the SQLAlchemy mapping already supports having column names distinct from the attribute names. But in the bigger sense, SQLAlchemy models and queries very intentionally retain relational concepts and geometries; you can write a JOIN from your Order to your Item class, which strongly implies a very specific relational model. Can we really change how Order / Item are linked, e.g. so that they are no longer joined by a foreign key or association table, and not have to change our domain model at all? Maybe at the level of the Order class and instance, "items" is just a collection, sure, but all your queries would need to change, since the relational model is very explicit in the SQLAlchemy ORM.

It's not to say it isn't possible to have the domain model be 100% separate from the relational store, but SQLAlchemy intentionally creates a more explicit linkage between these two concepts, and that's actually the whole secret sauce of the thing: we aren't trying to do the impossible thing that people don't actually need in practice (taking your whole relational model and slapping it on top of a graph database without any changes).

This is all way bigger than, "I just want to map to a dataclass", which is certainly something we could probably make work, but I think the larger ideal that this mapping implies is not really worth trying to achieve.

@sfermigier

Yeah, I agree (from my experience trying to make this approach work) with your analysis.

There are ways to really decouple the domain classes from the ORM, but they imply even more duplication than what we've been discussing here, and are different from the one advocated in the Cosmic Python book.

See for instance this example from a recent tutorial on the "hexagonal architecture": https://github.com/celinegilet/happy-town/blob/hexagonaleArchi/src/main/java/com/happytown/infrastructure/database/HabitantDatabaseAdapter.java (Here, we have JPA classes that are the mirror image of the domain classes, and "to" and "from" methods to convert from one to the other).


zzzeek commented Dec 10, 2019

To sum up: dataclasses don't work right now. I'm not opposed to making them work at least a little bit, but this is not a priority right now unless someone is really motivated to work on it. Closing this for now.

sandys commented Apr 17, 2020

hi @zzzeek,
sorry for putting this on a closed issue, but I would like to add my vote for dataclasses support, as well as a genuine reason for separating out the domain model from database-specific stuff.

I'm looking at the world of pydantic and all the nice validation and typing it brings to a codebase.
Now, it is a long-standing issue that in all these cases you have to create your models twice: once for pydantic and once for SQLAlchemy, because they don't interoperate (and nobody expects them to).

example - tiangolo/fastapi#214
tiangolo/fastapi#721
etc

Starlette and FastAPI have a bunch of questions around this topic, and I admit it is extremely troublesome to keep them in sync.

However, dataclasses are somewhere both can meet. Pydantic supports dataclasses - https://pydantic-docs.helpmanual.io/usage/dataclasses/

If SQLAlchemy added first-class support for dataclasses, that would be a superb way for the rest of the ecosystem to interoperate.

Everyone is now backing that mechanism, since it has been added to the Python core.

I understand your reluctance - I propose a bridge in this way: SQLAlchemy can include a class generator which takes dataclasses and makes them SQLAlchemy-compatible. It seems completely workable based on this library:
https://pypi.org/project/dataclassesdb/


zzzeek commented Apr 17, 2020

My position here is that I would love for people to have this ability; however, I definitely do not want SQLAlchemy having to maintain the fact that "str" would link up to the SQLAlchemy String datatype, and not, for example, the oracle.CLOB datatype or a MySQL TEXT with a particular encoding and collation. That is, I find the declaration of "mydata: str" insufficient if we are to consider that this declaration is also going to define a backend SQL database type.

There are of course many ways to allow that to be configurable, such as some kind of lookup table that you define elsewhere, etc.; however, these all represent taking the very effective decisions of the current Declarative system and replacing them with an all-new way of working and configuring. SQLAlchemy cannot take on this burden. "Dataclasses" support is very simply wide open for a third-party extension to take on the burden of having users who want to use "myfield: str" but then need to set the collation on their string.

To the degree that SQLAlchemy internals need adjustments to support dataclasses, I am fully in support of that. But building out a system that looks just like https://pypi.org/project/dataclassesdb/ should be maintained externally as you will have a lot of user API issues to work out and these are better off maintained under a different set of assumptions.


zzzeek commented Apr 17, 2020

Just noticed that "dataclassesdb" is already built on SQLAlchemy, so one can already use dataclasses with SQLAlchemy. Can you please define what "first class support" means in this case?

sandys commented Apr 17, 2020

@zzzeek thanks so much for answering. I'm conscious that this is a closed issue, and sorry about that.

I want to begin by saying that this is an attempt for SQLAlchemy and systems like pydantic to interoperate. This is the current norm and state of the art - https://github.com/archnaim/fastapi-hive-to-mysql-migration/blob/b90c00e843333d318eff8206dbae92f5a990204b/app/models/job_status.py

Everyone has to declare the exact same structure in SQLAlchemy as well as elsewhere, and this becomes unmaintainable and error-prone. Everyone seems to be supporting dataclasses, so we are praying that it becomes easier.

Right now we are unable to figure out how to do mixins in SQLAlchemy using dataclasses. Here's a crafted example to show how we envision using dataclasses with pydantic and SQLAlchemy together. I'm not sure if things like mixins are something you plan to support, etc.



from sqlalchemy import Table, Text, MetaData, Column, Integer, String, ForeignKey, create_engine
from sqlalchemy.orm import mapper, relationship, sessionmaker
from typing import Optional
from pydantic import EmailStr
from pydantic.dataclasses import dataclass as py_dataclass
from dataclasses import dataclass

metadata = MetaData()
audit_mixin = Table('audit_mixin', metadata,
                    Column('id', Integer, primary_key=True),
                    Column('performed_by', Integer, nullable=True))
user = Table('user', metadata,
             Column('user_id', Integer, primary_key=True),
             Column('name', String(50)),
             Column('email', String(100)),
             Column('fullname', String(50)),
             Column('nickname', String(12)))

@dataclass
class AuditMixin:
    id: int
    performed_by: int

@py_dataclass
class AuditMixinPy(AuditMixin):
    pass

@dataclass
class User(AuditMixin):
    name: str
    email: str
    fullname: str
    nickname: str

@py_dataclass
class UserPy(User):
    pass

engine = create_engine("sqlite:///", echo=False)
metadata.create_all(engine)
session = sessionmaker(engine)()
mapper(AuditMixin, audit_mixin)
mapper(User, user)
u = User(100, 123, "a", "dfd", "dfdf", "dfdd")
session.add(u)
session.commit()
session = sessionmaker(engine)()
a = session.query(User).first()
print(a)


zzzeek commented Apr 17, 2020

> @zzzeek thanks so much for answering. I'm conscious that this is a closed issue, and sorry about that.

> I want to begin by saying that this is an attempt for SQLAlchemy and systems like pydantic to interoperate. This is the current norm and state of the art - https://github.com/archnaim/fastapi-hive-to-mysql-migration/blob/b90c00e843333d318eff8206dbae92f5a990204b/app/models/job_status.py

That seems to be something that could be automated fairly easily using a metaclass.

> Everyone has to declare the exact same structure in SQLAlchemy as well as elsewhere, and this becomes unmaintainable and error-prone. Everyone seems to be supporting dataclasses, so we are praying that it becomes easier.

I certainly hope that it does!

> Right now we are unable to figure out how to do mixins in SQLAlchemy using dataclasses. Here's a crafted example to show how we envision using dataclasses with pydantic and SQLAlchemy together. I'm not sure if things like mixins are something you plan to support, etc.

Declarative supports mixins, yes, see https://docs.sqlalchemy.org/en/13/orm/extensions/declarative/mixins.html

> (code example quoted from the previous comment)

Python class mechanics are extremely customizable, like remarkably so. From class decorators, to metaclasses, to type constructors, to descriptors, there are an unbelievable number of ways to make classes automatically appear based on other declarations.

It's not clear here if you are arguing that SQLAlchemy is missing some capability for the "dataclasses" configuration to be possible, or if the dataclasses community is missing the developer resources to make the above patterns possible? I can only help you with the former part. I am just one person and we are just one project. This is definitely something that many programmers are capable of creating.

sandys commented Apr 17, 2020

@zzzeek - well, I'm not arguing about anything really, just looking for a solution.

I'm just wondering (based on the code sample I sent) how I would write a piece of code that allows for mixins on the SQLAlchemy side and allows for pydantic classes to be formed (without writing them again). I thought that dataclasses would be the way to go, since both SQLAlchemy and pydantic (and indeed the larger ecosystem) seem to support them... however, I cannot get SQLAlchemy to work with mixins unless I abandon dataclasses and move over to SQLAlchemy entirely.

It's a definite struggle on the end-user side, since no project wants to support "a particular library".


zzzeek commented Apr 17, 2020

> @zzzeek - well, I'm not arguing about anything really, just looking for a solution.

> I'm just wondering (based on the code sample I sent) how I would write a piece of code that allows for mixins on the SQLAlchemy side and allows for pydantic classes to be formed (without writing them again). I thought that dataclasses would be the way to go, since both SQLAlchemy and pydantic (and indeed the larger ecosystem) seem to support them... however, I cannot get SQLAlchemy to work with mixins unless I abandon dataclasses and move over to SQLAlchemy entirely.

I'm not looking deeply at what these libraries do, but most likely you would not be reliant on a mixin or on declarative-style configuration at all; you would write a class decorator or a metaclass that scans through the dataclass definition and then calls the mapper function directly to map the class. That's all that "declarative" does, after all. The mapper() function can also be re-implemented on top of declarative itself, which is where things are headed; this would give the mapper() function some additional declarative niceties.

Generally, the idea is, you have a dataclass with "myfield : int", you build up a data structure that has the Column objects you think should be sprung up from those datatypes, then pass the whole thing into a Table() definition and a mapper() function. Here's classical mapping: https://docs.sqlalchemy.org/en/13/orm/mapping_styles.html#classical-mappings
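The scanning step described above can be sketched in plain Python. This is a minimal illustration only: `SQL_TYPES` and `column_specs` are hypothetical names, the type lookup table and the "a field named id is the primary key" convention are assumptions, and a real version would build SQLAlchemy `Column()`/`Table()` objects and pass them to `mapper()` rather than returning tuples.

```python
import dataclasses

# Hypothetical lookup table: Python annotation -> SQL type name.
# A real version would map to SQLAlchemy types (Integer, String, ...).
SQL_TYPES = {int: "INTEGER", str: "VARCHAR", bool: "BOOLEAN"}

def column_specs(cls):
    """Scan a dataclass and derive (name, sql_type, is_primary_key) specs."""
    specs = []
    for f in dataclasses.fields(cls):
        sql_type = SQL_TYPES[f.type]
        # naive convention for the sketch: a field named "id" is the PK
        specs.append((f.name, sql_type, f.name == "id"))
    return specs

@dataclasses.dataclass
class User:
    id: int
    name: str
    active: bool = True

print(column_specs(User))
# [('id', 'INTEGER', True), ('name', 'VARCHAR', False), ('active', 'BOOLEAN', False)]
```

From specs like these, a class decorator could construct the `Table` and call `mapper()`, which is essentially what declarative does internally.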

> It's a definite struggle on the end-user side, since no project wants to support "a particular library".

I'm not sure what this means.


keosak commented Aug 11, 2020

Apologies for replying to a closed ticket after this much time, but I want to express support for the general idea of classical mapping in SQLAlchemy. To bring another perspective to this debate, our company keeps schema definitions in SQL files and we have a library for parsing them into SQLAlchemy table objects.

Given a schema.sql file like this:

CREATE TABLE accounts (
    PRIMARY KEY (account_id),
    account_id uuid NOT NULL,
    widget_count int NOT NULL
);

CREATE TABLE widgets (
    PRIMARY KEY (widget_id),
    widget_id uuid NOT NULL,
    account_id uuid NOT NULL REFERENCES accounts (account_id),
    name text NOT NULL
);

We currently define SQLAlchemy mapping like this:

schema = parse("schema.sql")

class Widget(Base):
    __table__ = schema["widgets"]

class Account(Base):
    __table__ = schema["accounts"]
    widgets = relationship(Widget)

However, I am considering moving to this:

class Widget:
    # Note that we omit account_id, because it is an implementation detail.
    widget_id: UUID
    name: str

class Account:
    account_id: UUID    
    widgets: List[Widget]
    widget_count: int

schema = parse("schema.sql")
mapper(Widget, schema["widgets"])
mapper(Account, schema["accounts"], properties={"widgets": relationship(Widget)})

Either way keeps the schema definition in SQL, in my opinion the best language for the job, so there is no need to automagically map Python types to database types, or integrate foreign key definitions with PEP-484 annotations etc.

The second approach, however, adds type annotations to model classes in the standard form supported and promoted by the Python standard library. Note that this style of mapping doesn't necessarily use dataclasses, although the dataclass decorator is the easiest way to define a constructor, equality, etc. We already group all SQLAlchemy queries and commands in separate repository classes, and this would allow us to move mapper definitions there as well.

I recognize that this is kind of a purist approach, but I really like the separation between pure model modules and pragmatic infrastructure modules. Helps with unit testing as well.


zzzeek commented Aug 11, 2020

So there are two potential problems with the approach above.

The first is, your Account has this nice "account_id", "widgets", "widget_count" dataclass-style declaration on it at the class level. The first thing SQLAlchemy's mapper() does is blow all that away; it places its own Python descriptors, known as InstrumentedAttribute, in their place. So the attributes you have there are not runtime-introspectable: they won't come up in things like Sphinx documentation, they won't be there for unit tests, etc. They are basically something you can see in your source file and then they're gone, like a "# comment". You will still get pep-484 validation, but all the other things those attributes might be nice for will not work. The InstrumentedAttribute which gets put there is introspectable in its own way, but not such that it's been designed to act like a dataclass, because I have not even looked at dataclasses yet. Once 2.0 is out and a giant heap of cruft has been removed, I can start looking at these things.

SQLAlchemy will eventually add real pep-484 capabilities so that your normal declarative mappings will work with mypy; there is already an extension which does this and the direction of SQLAlchemy 2.0 is to be able to support this as a built in feature in a few years.

The second is, there is no mechanism of any kind to sanity-check that what you've put there for your dataclasses matches what the mapping will have. Take Account's widgets: List[Widget]: the mapping could set that up as a column against a String rather than a relationship, and then the thing you've stated you are using pep-484 for will no longer work.

classical mappers aren't going away but I think the pep-484 solution will be that SQLAlchemy mappings will deliver pep-484 information fully.


keosak commented Aug 12, 2020

I want to have type annotations because we already use __table__ = ... to define models, and type checkers fail there. If I'm doing that, why not separate the mapping completely? That was my thought.

Type annotations are stored in the __annotations__ class attribute as a dict, which doesn't interfere with SQLAlchemy as far as I know. Default values and field definitions in dataclasses are stored as class attributes, which SQLAlchemy does overwrite. As long as you don't have those, I think that type annotations and SQLAlchemy definitions can live alongside each other.
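A plain-Python demonstration of this point (no SQLAlchemy involved): overwriting class attributes with descriptors, the way mapper() installs InstrumentedAttribute, leaves `__annotations__` untouched. `FakeDescriptor` here is a stand-in, not SQLAlchemy's actual class.

```python
# A stand-in descriptor simulating what ORM instrumentation does.
class FakeDescriptor:
    def __get__(self, obj, objtype=None):
        return "instrumented"

class Widget:
    widget_id: str
    name: str

# Simulate instrumentation: replace the class attributes.
Widget.widget_id = FakeDescriptor()
Widget.name = FakeDescriptor()

# The annotations dict is a separate class attribute and survives.
print(sorted(Widget.__annotations__))  # ['name', 'widget_id']
print(Widget().widget_id)              # instrumented
```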

Mismatch between type annotations and schema definition could be a problem. But maybe not as much of a problem in practice.

You are right, integrating declarative mapping with type checkers is the best way forward for SQLAlchemy and the majority of its users. But I still want the option of defining the mapping separately, even if it's a little bit more difficult and has some caveats. Glad to hear it's not going away.


zzzeek commented Aug 12, 2020

Right, they are in __annotations__, as I was reminded yesterday, so a hypothetical integration here could be that when SQLAlchemy applies InstrumentedAttribute, it takes the existing __annotations__ and copies them into its new descriptor.

Wouldn't that be helpful here? I gather full "dataclasses" support goes further than that.


keosak commented Aug 12, 2020

I think the main problem is the one that started this issue. Simple default values specified using dataclasses break the mapping, because SQLAlchemy detects the class attribute value and doesn't create a property for the column. The solution could be to ignore these values for dataclass fields and overwrite them with SQLAlchemy properties. These class attributes are not necessary for the function of dataclasses: constructors have default values baked in, and the values are still available through the dataclasses.fields accessor. Class attributes are left on the class for introspection, which will not work the same way after SQLAlchemy mapping, but that's a price I would gladly pay; it just needs to be clearly stated in the documentation.
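The claim that the class-level default is dispensable can be checked in plain Python, simulating the proposed behaviour by deleting the class attribute by hand:

```python
import dataclasses

@dataclasses.dataclass
class C:
    id: str
    active: bool = True

# Simulate the proposed fix: the mapper removes/overwrites the
# class-level default when it installs its own property.
del C.active

# The constructor default is baked into __init__, so it still works:
print(C("MY_ID"))  # C(id='MY_ID', active=True)

# And the default remains introspectable via dataclasses.fields():
fields = {f.name: f for f in dataclasses.fields(C)}
print(fields["active"].default)  # True
```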

Given the following patch to the rel_1_3 branch:

diff --git a/lib/sqlalchemy/orm/mapper.py b/lib/sqlalchemy/orm/mapper.py
index 0169532..3da308b 100644
--- a/lib/sqlalchemy/orm/mapper.py
+++ b/lib/sqlalchemy/orm/mapper.py
@@ -22,6 +22,12 @@ import sys
 import types
 import weakref
 
+# dataclasses were added in Python 3.7
+try:
+    import dataclasses
+except ImportError:
+    dataclasses = None
+
 from . import attributes
 from . import exc as orm_exc
 from . import instrumentation
@@ -2603,6 +2609,16 @@ class Mapper(InspectionAttr):
         else:
             return True
 
+    def _is_dataclass_field(self, assigned_name):
+        if dataclasses is None:
+            return False
+        if not dataclasses.is_dataclass(self.class_):
+            return False
+        for field in dataclasses.fields(self.class_):
+            if field.name == assigned_name:
+                return True
+        return False
+
     def _should_exclude(self, name, assigned_name, local, column):
         """determine whether a particular property should be implicitly
         present on the class.
@@ -2614,16 +2630,21 @@ class Mapper(InspectionAttr):
 
         # check for class-bound attributes and/or descriptors,
         # either local or from an inherited class
+        # ignore dataclass field default values
         if local:
             if self.class_.__dict__.get(
                 assigned_name, None
             ) is not None and self._is_userland_descriptor(
                 self.class_.__dict__[assigned_name]
-            ):
+            ) and not self._is_dataclass_field(assigned_name):
                 return True
         else:
             attr = self.class_manager._get_class_attr_mro(assigned_name, None)
-            if attr is not None and self._is_userland_descriptor(attr):
+            if (
+                attr is not None
+                and self._is_userland_descriptor(attr)
+                and not self._is_dataclass_field(assigned_name)
+            ):
                 return True
 
         if (

The script from the original comment now works as expected:

from __future__ import annotations
from dataclasses import dataclass

from sqlalchemy import Column, Table, String, Boolean
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import mapper, Session

Base = declarative_base()

@dataclass
class C:
    id: str
    active: bool = True

table = Table(
    "table",
    Base.metadata,
    Column("id", String, primary_key=True),
    Column("active", Boolean),
)

mapper(C, table)

Class C construction still works:

print(C('MY_ID'))
C(id='MY_ID', active=True)

Filtering on the active column now works:

query = Session().query(C).filter(C.active)
print(query)
SELECT "table".id AS table_id, "table".active AS table_active 
FROM "table"
WHERE "table".active

Current SQLAlchemy prints this instead; the WHERE clause is wrong and the active column is not selected:

SELECT "table".id AS table_id
FROM "table"
WHERE true
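The root cause is that a dataclass stores a simple field default as a plain class attribute, which `_should_exclude()` then mistakes for a userland descriptor and skips. A minimal sketch, with no SQLAlchemy involved:

```python
from dataclasses import dataclass, fields


@dataclass
class C:
    id: str
    active: bool = True


# The default lands in the class namespace just like a hand-written
# class attribute would...
print(C.__dict__["active"])  # True

# ...but dataclasses.fields() still identifies "active" as a field,
# which is the distinction the patch above relies on.
print([f.name for f in fields(C)])  # ['id', 'active']
```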

@zzzeek
Member

zzzeek commented Aug 12, 2020

@keosak in #5027 (comment) I closed the issue only because we have no contributors to work on it. if you are available to help implement dataclass support in SQLAlchemy we can open this right up and get to work on it for 1.4 . are you interested?

@zzzeek zzzeek added PRs (with tests!) welcome a fix or feature which is appropriate to be implemented by volunteers use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated labels Aug 12, 2020
@zzzeek zzzeek added this to the 1.4 milestone Aug 12, 2020
@zzzeek zzzeek added the orm label Aug 12, 2020
@zzzeek
Member

zzzeek commented Aug 12, 2020

We can now add a Python-3-only test suite, using a pytest filter that will be merged soon and can be seen here: https://gerrit.sqlalchemy.org/plugins/gitiles/sqlalchemy/sqlalchemy/+/refs/changes/71/2071/37/test/conftest.py#15

so steps are:

  1. add above pytest filter if not already present

  2. new test class test/orm/test_dataclasses_py3k.py

  3. tests should include:

    a. mapping data classes

    b. inheritance situations like those illustrated in this issue at the top

    c. dataclass constructor still works

    d. dataclass .fields still works

We also need documentation support; I would suggest a section here: https://github.com/sqlalchemy/sqlalchemy/blob/master/doc/build/orm/mapping_columns.rst. A "Mapping Python Dataclasses" section should include the caveat mentioned above about the attribute being replaced.

can we also copy over __annotations__ as part of this patch?

This patch might be OK for 1.3.x but 1.3.x is pretty far along and it's likely best this remain as a 1.4 thing.

@zzzeek zzzeek reopened this Aug 12, 2020
@zzzeek
Member

zzzeek commented Aug 12, 2020

also note we need to test with the newer proposed version of mapper() which includes all the "nice to have" of Declarative, that's not ticketed yet but the general idea is here: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#the-original-mapper-function-removed-replaced-with-a-declarative-compatibility-function

@keosak
Contributor

keosak commented Aug 12, 2020

Sure, I can try over the weekend. I want to look over the dataclasses documentation and implementation more closely first, to make sure I'm not missing something. But then I'll try to come up with the tests.

@zzzeek
Member

zzzeek commented Aug 12, 2020

thanks! you can work with regular PRs, we have an internal gerrit review system but you don't need to deal with it

@zzzeek
Member

zzzeek commented Aug 13, 2020

Hi, folks on Twitter are also requesting support for attrs: https://www.attrs.org/en/stable/. I haven't looked at this either. We don't need to support this immediately, but it's important we don't make any API mistakes that would preclude its being supported also.

@keosak
Contributor

keosak commented Aug 13, 2020

As far as I can tell, classical mapping for attrs already works.

The attrs library predates dataclasses, and uses a different mechanism to define attributes, but the basic principle behind both is the same. They take a class definition and generate __init__(), __eq__(), __repr__(), etc. methods. For attrs, this is confirmed by the documentation.

All attrs does is:

  1. take your declaration,
  2. write dunder methods based on that information,
  3. and attach them to your class.

It does nothing dynamic at runtime, hence zero runtime overhead. It’s still your class. Do with it as you please.

This means that once the class is processed by dataclass or attrs decorator, it behaves just like any other class you wrote yourself. Such classes can usually be straightforwardly passed to SQLAlchemy mapper, but there are three caveats.

  1. Declarative mapping is in my opinion not possible. The SQLAlchemy declarative base metaclass processes the model class before dataclass or attrs decorators. This means they will see SQLAlchemy instrumentation instead of the class definition they expect.

  2. The dataclass decorator leaves behind class attributes holding the simple default values, which serve as documentation. The patch I posted above just ignores those, because they are not needed. Both libraries also leave behind field definitions (__dataclass_fields__, __dataclass_params__, __attrs_attrs__), but those never come up in the process of mapping.

  3. When a dataclass or attrs class is configured to be frozen, it has __setattr__() and __delattr__() methods that raise exceptions. I don't think these can co-exist with SQLAlchemy instrumentation. Fortunately this condition can be detected and we can refuse to map such classes.
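The frozen caveat is easy to see directly. A small sketch showing both the raising `__setattr__()` and one way such classes could be detected (note that `__dataclass_params__` is an undocumented implementation detail of the dataclasses module):

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Frozen:
    x: int = 0


f = Frozen()
try:
    f.x = 1  # frozen classes generate a __setattr__ that raises
except FrozenInstanceError:
    print("attribute assignment rejected")

# The frozen flag is recorded on the class, so a mapper could
# inspect it and refuse the mapping up front:
print(Frozen.__dataclass_params__.frozen)  # True
```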

A separate issue is mapping dataclass, attrs, and NamedTuple classes with SQLAlchemy composite(). Both dataclass and attrs instances can be converted to tuples with astuple(), and a NamedTuple already is one. We could take advantage of that instead of requiring such classes to define __composite_values__().
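A sketch of that idea: for a dataclass, `dataclasses.astuple()` already produces exactly what a `__composite_values__()` method would return, so the shim is one line (`Point` and the shim below are illustrative, not existing SQLAlchemy API):

```python
from dataclasses import dataclass, astuple


@dataclass
class Point:
    x: int
    y: int

    # What SQLAlchemy's composite() protocol expects today:
    def __composite_values__(self):
        return astuple(self)


p = Point(3, 4)
print(astuple(p))                # (3, 4)
print(p.__composite_values__())  # (3, 4)
```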

@hynek

hynek commented Aug 14, 2020

The attrs library predates dataclasses, and uses a different mechanism to define attributes, but the basic principle behind both is the same.

To be clear: attrs allows it both ways: the old-school way without types via attr.ib()/attrib(), and the new way via attr.dataclass/attr.s(auto_attribs=True).

@keosak
Contributor

keosak commented Aug 14, 2020

To be clear: attrs allows it both ways

Good to know. I just checked out the auto_attribs feature and it doesn't change anything in my analysis above. The class prepared by attrs decorator should be viable for classical mapping either way.

@hynek

hynek commented Aug 14, 2020

Does this mean you don't need anything from us (attrs project)? I was planning to try out the mapper APIs for a long time so I'd love to have it working properly.

@keosak
Contributor

keosak commented Aug 14, 2020

I will need to write tests to verify, but I am convinced that classes defined with attrs can already be mapped by calling mapper(). Note, however, that most SQLAlchemy users do not call mapper() themselves, they let the declarative base metaclass do it. As I wrote above, that use case will not work with attrs or dataclass, because the metaclass code runs before the decorators.
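The ordering problem is easy to demonstrate with a toy metaclass, no SQLAlchemy involved: the metaclass runs as soon as the class statement completes, before any decorator gets a chance to rewrite the class.

```python
order = []


class Meta(type):
    def __new__(mcls, name, bases, namespace):
        # runs at class-creation time, before decorators
        order.append("metaclass")
        return super().__new__(mcls, name, bases, namespace)


def decorator(cls):
    # runs only after the class object already exists
    order.append("decorator")
    return cls


@decorator
class Model(metaclass=Meta):
    pass


print(order)  # ['metaclass', 'decorator']
```

This is why a declarative base (metaclass) sees the raw class body, while @dataclass or attrs only get to work on it afterwards.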

keosak added a commit to keosak/sqlalchemy that referenced this issue Aug 17, 2020


- Ignore simple default value class attributes.
- Add tests for classical mapping of dataclasses.
- Document limitations and show example.
@zzzeek zzzeek removed PRs (with tests!) welcome a fix or feature which is appropriate to be implemented by volunteers question issue where a "fix" on the SQLAlchemy side is unlikely, hence more of a usage question labels Aug 17, 2020
@sqla-tester
Collaborator

Václav Klusák has proposed a fix for this issue in the master branch:

Add support for classical mapping of dataclasses https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2157

@zzzeek
Member

zzzeek commented Sep 1, 2020

OK thanks very much for this! I have now reviewed dataclasses and will be adding new versions of the declarative decorator in a subsequent patch. the metaclass was never a requirement. also it might be time to start thinking about populating __annotations__ automatically from InstrumentedAttribute, which is pretty easy to do.

@michaeloliverx
Contributor

dataclasses.field includes a metadata key:

metadata: This can be a mapping or None. None is treated as an empty dict. This value is wrapped in MappingProxyType() to make it read-only, and exposed on the Field object. It is not used at all by Data Classes, and is provided as a third-party extension mechanism. Multiple third-parties can each have their own key, to use as a namespace in the metadata

I wonder whether SQLAlchemy could support dataclasses using the Declarative Table Configuration approach with something like:

@mapper_registry.mapped
@dataclass
class User:
    __tablename__ = "user"
    id: int = field(
        init=False,
        metadata={"sqlalchemy": Column(Integer, primary_key=True)},
    )
    name: str = field(
        init=False,
        metadata={"sqlalchemy": Column(String(50))},
    )
    fullname: Optional[str] = field(
        default=None,
        metadata={"sqlalchemy": Column(String(50))},
    )
    nickname: Optional[str] = field(
        default=None,
        metadata={"sqlalchemy": Column(String(12))},
    )

instead of using Declarative with Imperative Table approach:

@mapper_registry.mapped
@dataclass
class User:
    __table__ = Table(
        "user",
        mapper_registry.metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
        Column("fullname", String(50)),
        Column("nickname", String(12)),
    )
    id: int = field(init=False)
    name: str = None
    fullname: str = None
    nickname: str = None

@zzzeek
Member

zzzeek commented Dec 3, 2020

sure, build yourself a base class or mixin that does this:

import dataclasses
from dataclasses import dataclass, field

from sqlalchemy import Column, Integer, Table
from sqlalchemy.orm import declared_attr, registry

mapper_registry = registry()


@mapper_registry.mapped
@dataclass
class User:
    @declared_attr
    def __table__(cls):
        # reuse the Table if it was already created for this class
        if cls.dataclass_tablename in mapper_registry.metadata.tables:
            return mapper_registry.metadata.tables[cls.dataclass_tablename]

        cols = []

        for field_ in dataclasses.fields(cls):
            if "sqlalchemy" in field_.metadata:
                column = field_.metadata["sqlalchemy"]
                if column.name is None:
                    column.name = field_.name
                cols.append(column)

        return Table(
            cls.dataclass_tablename,
            mapper_registry.metadata,
            *cols
        )

    dataclass_tablename = "user"

    id: int = field(
        init=False,
        metadata={"sqlalchemy": Column(Integer, primary_key=True)},
    )

@zzzeek
Member

zzzeek commented Dec 3, 2020

on that note, the doc at https://docs.sqlalchemy.org/en/14/orm/mapping_styles.html#declarative-mapping-with-dataclasses-and-attrs is a little off because it does not have a relationship() for user.addresses. this has to be added using __mapper_args__ and can also be integrated using an approach such as the above. adding that to the doc now.

Here is a way we can generalize the above idea. declarative supports an extension to how it gets at the attributes to be part of the mapping. i will try a POC for that now.

sqlalchemy-bot pushed a commit that referenced this issue Dec 3, 2020
Change-Id: Id1e3fb660c2aff107aa4a180ba565ee88fa6f56e
References: #5027
@zzzeek
Member

zzzeek commented Dec 3, 2020

I didn't like this idea at first, but once I did it, this might be a nicer way to go; it sure is tricky to set up these dataclasses though. See #5745 plus the patch for a complete POC. It would be extremely helpful if you could test it out.

@michaeloliverx
Contributor

I was trying to use that dataclass example from the docs but couldn't figure out how to configure relationships so I abandoned it. I will try again using __mapper_args__ thanks. I like the idea of #5745 as it avoids some duplication, I'll be sure to test it out.

@jvanasco jvanasco added the classical mapping related to the "classical mapping" system label Sep 28, 2021