-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Classical mapping, dataclasses with initialiser and filter clauses #5027
Comments
SQLAlchemy is not going to by Python 3 only until version 2.0, but also a dataclass AFAIK is a class that is Python instrumented in a totally unique way, for which I have no idea how this interacts with standard Python descriptors that happen to provide for the same names. I would not expect dataclasses to be mappable at all right now and also I am not clear on the rationale for using them as SQLAlchemy declarative already has a class-declarative syntax. |
Nothing in the approach I have taken should mandate dropping support for Python 2, unless of course there needs to be some specific code to deal with dataclasses / attrs classes.
I wasn't sure it would work or not, so I tried and it worked quite well until I hit this roadblock (but only with initialisers).
Well, I was experimenting with a way to have "business domain" classes independent of SQLAlchemy (think "clean" or "hexagonal" architecture). Except that unless a more radical approach is taken, we're still dependent on SQLAlchemy (wrt, for instance, relations), even if no SQLAlchemy classes are imported from the "domain" modules. So in the end you're probably right: for my approach, SQLAlchemy declarative is probably the only approach with a decent chance to work in the short term without too much headache. Thanks for your feedback. |
Sorry, let me be more specific . For SQLAlchemy to support mapping of dataclasses, we need to have test suites that test this. As dataclasses are using a Python-3 only syntax, we can't have that syntax checked into our git tree within the test suite because the test suite will fail to load under Python 2.7. That is, while we can block tests from running under python 2, we don't as yet have a system by which some tests can not be parsed or loaded by the Python interpreter at all, so while we can test Python 3 concepts, we can't easily test things that require Python 3-only syntaxes. I haven't looked into dataclasses so there may be a way to test them functionally, using Py2-syntactically valid objects that behave the same as Py3 dataclasses, but that's not something that is prioritized right now.
can you give me a clue on what you gain by having the dataclass declare "active: bool" while at the same time having Table metadata that uses Boolean? isn't this needless repetition?
|
The goal is to have a domain independent of the ORM (and of such details as wether a field is limited to a certain size, or is indexed, etc.), and there are several ways to achieve that. All the examples I have seen in the literature imply some level of repetition. The approach I was implementing when I hit this issue is similar to the one advocated in https://github.com/cosmicpython/book/blob/master/chapter_02_repository.asciidoc#inverting-the-dependency-orm-depends-on-model |
OK unfortunately I mostly disagree with what I'm reading in that excerpt. Now of course, the original reason SQLAlchemy started with mapper() and a class was exactly that reason. Hey, we should have the database and the model be completely separate, that way you can swap in a mapper against Amazon S3 if you want and nothing will change. This is in practice mostly a myth, since when you normally use mapper(), you are magically assigning a bunch of attributes to the class that the class has to otherwise implicitly assume are there:
Now what you're trying to do with the dataclasses is to fix this, so that Order does in fact have some declaration for "dollar_amount" that is generic, and then the table mapping backs it up. That's not a model that was anticipated with SQLAlchemy as when I started looking at this, in the vast majority of cases people just want to put their Column declaration right there, and with an ORM like Django, these declarations are fairly generic so it "looks" OK, but then of course all the ugly details of how the Column should behave at a SQL level have to be somewhere also, so with Declarative, we just solved the problem in a way that is enormously popular, just put the Column right there. The mapper() idea came from Hibernate, which was for Java. In Hibernate, back in the early 2000's, we above would have to create full blown getter and setter methods for every database attribute, then put all the SQL schema stuff in an XML file. That was basically what I was going for with mapper() /Table, except hey it's Python we don't need the getters and setters. If you look at Hibernate today (OK well 8 years ago last time I looked), they totally abandoned that style and surprise moved to using Java "annotations" so that mapped classes look the same way as SQLAlchemy, with the column declaration in the model. So the above is an argument for why it's maybe not "important" that we support this, because it would be unpopular. But I think beyond that, there's a lot more assumption of a relational model that comes in as you continue to add more than just one attribute to one class. There's also the entire assumption of the ORM's persistence model; dollar_amount starts as None, not AttributeError, there's a "history" on the attribute that we can see changes on either side of a flush, there's a flush that is in the context of a transaction, there's a transaction. These are things that would almost never map the same way to a non-relational backend, so the SQLAlchemy mapped class in practice is in larger applications designed in a way that very much assumes the SQLAlchemy ORM or something that has worked very hard to act exactly the same way is backing it up. What about being able to swap out the schema with another one? This works at the level of at most, the names of columns, and the SQLAlchemy mapping already supports being able to have column names be distinct from the attribute names. But in the bigger sense, SQLAlchemy models and queries very intentionally retain relational concepts and geometries; you can write a JOIN from your Order to Item class, that strongly implies a very specific relational model. Can we really change how Order / Item are linked, e.g. that they would no longer be joined by a foreign key or association table, and not have to change our domain model at all? Maybe at the level of the Order class and instance that "items" is just a collection, sure, but all your queries would need to change, since the relational model is very explicit in the SQLAlchemy ORM. It's not to say it isn't possible to have the domain model be 100% separate from the relational store but SQLAlchemy intentionally creates a more explicit linkage between these two concepts and that's actually the whole secret sauce of the thing, we aren't trying to do the impossible thing that people don't actually need in practice (that you can take your whole relational model and slap it on top of a graph database without any changes). This is all way bigger than, "I just want to map to a dataclass", which is certainly something we could probably make work, but I think the larger ideal that this mapping implies is not really worth trying to achieve. |
Yeah, I agree (from my experience trying to make this approach work) with your analysis. There are ways to really decouple the domain classes from the ORM, but they imply even more duplication that what we've been discussing here, and are different to the one advocated in the Cosmic Python book. See for instance this example from a recent tutorial on the "hexagonal architecture": https://github.com/celinegilet/happy-town/blob/hexagonaleArchi/src/main/java/com/happytown/infrastructure/database/HabitantDatabaseAdapter.java (Here, we have JPA classes that are the mirror image of the domain classes, and "to" and "from" methods to convert from one to the other). |
to sum up, dataclasses don't work right now, I'm not opposed to making them work at least a little bit, but this is not priority right now unless someone was really motivated to work on it, closing this for now. |
hi @zzzeek I'm looking the world of pydantic and all kinds of nice validations and typing in your codebase. example - tiangolo/fastapi#214 Starlette and Fastapi have a bunch of questions around this topic and i admit it is extremely troublesome to keep them in sync. However dataclasses are somewhere where both can meet. Pydantic supports dataclasses - https://pydantic-docs.helpmanual.io/usage/dataclasses/ if sqlalchemy would add first class support for dataclasses, that would be a superb way in which the rest of the ecosystem can interoperate. Everyone is now backing that mechanism since it has been added as part of python core. I understand your reluctance - i propose a bridge in this way. sqlalchemy can include a class generator which takes dataclasses and makes them sqlalchemy compatible. It seems completely workable based on this library. |
my position here is that I would love for people to have this ability however I defininitely do not want SQLAlchemy having to maintain the fact that "str" would link up to the SQLAlchemy String datatype, and not, for exmaple, the oracle.CLOB datatype or MySQL TEXT with a particular encoding and collation. That is, I find the declaration of "mydata: str" to be insufficient if we are to consider that this declaration is also going to to define a backend SQL database type. There are of course many ways to allow that to be configurable, such as some kind of lookup table that you define else where, etc., however, these all represent taking the very effective decisions of the current Declarative system and replacing them with an all new way of working and configuring. SQLAlchemy cannot take on this burden. The "dataclasses" support is very simply wide open for a third party extension to take on the burden of having users who want to use "myfield: str" but then they need to set the collation on their string. To the degree that SQLAlchemy internals need adjustments to support dataclasses, I am fully in support of that. But building out a system that looks just like https://pypi.org/project/dataclassesdb/ should be maintained externally as you will have a lot of user API issues to work out and these are better off maintained under a different set of assumptions. |
Just noticed that "dataclasses" is already built on SQLAlchemy, so one can already use dataclasses with SQLAlchemy. Can you please define what "first class support" means in this case? |
@zzzeek thanks so much for answering. im conscious that this is a closed issue and sorry about that. want to begin off by saying that this is an attempt for sqlalchemy and systems like pydantic to interoperate. This is the current norm and state of art - https://github.com/archnaim/fastapi-hive-to-mysql-migration/blob/b90c00e843333d318eff8206dbae92f5a990204b/app/models/job_status.py Everyone has to declare the exact same structure in sqlalchemy as well as elsewhere. And this becomes unmaintanable and prone to errors. Everyone seems to be supporting dataclasses, so we praying that it becomes easier. right now we are unable to figure out how to do mixins in sqlalchemy using dataclasses. here's an crafted example for you. to show how we are envisioning to use dataclasses with pydantic and sqlalchemy together. im not sure if stuff like mixins are things that you plan to support, etc
|
That seems to be something that could be automated fairly easily using a metaclass.
I certainly hope that it does!
Declarative supports mixins, yes, see https://docs.sqlalchemy.org/en/13/orm/extensions/declarative/mixins.html
Python class mechanics are extremely customizable, like remarkably so. From class decorators, to metaclasses, to type constructors, to descriptors, there are an unbelievable number of ways to make classes automatically appear based on other declarations. It's not clear here if you are arguing that SQLAlchemy is missing some capability for the "dataclasses" configuration to be possible, or if the dataclasses community is missing the developer resources to make the above patterns possible? I can only help you with the former part. I am just one person and we are just one project. This is definitely something that many programmers are capable of creating. |
@zzzeek - well, im not arguing on anything really. just looking for a solution. I'm just wondering (based on the code sample I sent) - how would i write a piece of code that allows for mixins on the sqlalchemy side and allows for pydantic classes to be formed (without writing it again). I may have thought that dataclasses would be the way to go, since both sqlalchemy + pydantic (and indeed the larger ecosystem) seem to be supporting it....however I cannot get sqlalchemy to work with mixins unless i abandon dataclasses and move over to sqlalchemy entirely. Its a definite struggle on an end-user side since no project wants to support "a particular library" . |
I'm not looking deeply at what these libraries do, but most likely, you would not be reliant on a mixin or on declarative-style configuration at all, you would write a class decorator or a metaclass that scans through the dataclass definition and then calls upon the mapper function directly to map the class . That's all that "declarative" does after all. The "mapper" function can also be re-implemented on top of declarative itself which is where things are headed, this makes it such that the mapper() function gains some additional declarative niceties. Generally, the idea is, you have a dataclass with "myfield : int", you build up a data structure that has the Column objects you think should be sprung up from those datatypes, then pass the whole thing into a Table() definition and a mapper() function. Here's classical mapping: https://docs.sqlalchemy.org/en/13/orm/mapping_styles.html#classical-mappings
I'm not sure what this means. |
Apologies for replying to a closed ticket after this much time, but I want to express support for the general idea of classical mapping in SQLAlchemy. To bring another perspective to this debate, our company keeps schema definitions in SQL files and we have a library for parsing them into SQLAlchemy table objects. Given a CREATE TABLE accounts (
PRIMARY KEY (account_id),
account_id uuid NOT NULL,
widget_count int NOT NULL
);
CREATE TABLE widgets (
PRIMARY KEY (widget_id),
widget_id uuid NOT NULL,
account_id uuid NOT NULL REFERENCES accounts (account_id),
name text NOT NULL
); We currently define SQLAlchemy mapping like this: schema = parse("schema.sql")
class Widget(Base):
__table__ = schema["widgets"]
class Account(Base):
__table__ = schema["accounts"]
widgets = relationship(Widget) However, I am considering moving to this: class Widget:
# Note that we omit account_id, because it is an implementation detail.
widget_id: UUID
name: str
class Account:
account_id: UUID
widgets: List[Widget]
widget_count: int
schema = parse("schema.sql")
mapper(Widget, schema["widgets"])
mapper(Account, schema["accounts"], properties={"widgets": relationship(Widget)}) Either way keeps the schema definition in SQL, in my opinion the best language for the job, so there is no need to automagically map Python types to database types, or integrate foreign key definitions with PEP-484 annotations etc. The second approach, however, adds type annotations to model classes in the standard form supported and promoted by the Python standard library. Note that this style of mapping doesn't necessarily use dataclasses, athough the I recognize that this is kind of a purist approach, but I really like the separation between pure model modules and pragmatic infrastructure modules. Helps with unit testing as well. |
So there are two potential problems with the approach above. The first is, your Account has this nice "account_id", "wdigets", "widget_count" dataclass-style declaration on it at the class level. The first thing SQLAlchemy mapper() does is blow all that away, and it instead places its own Python descriptors known as InstrumentedAttribute in their place. So the attributes you have there are not runtime-introspectable, they won't come up in things like Sphinx documentation, they wont be there for unit tests, etc., they are basically something you can see in your source file and then they're gone, like a "# comment". So you will still get pep-484 validation, but all the other things those attributes might be nice for will not work. The InstruementedAttribute which gets put there is introspectable in its own way but not such that it's been designed to act like a dataclass, because I have not even looked at dataclasses yet. this is once 2.0 is out and a giant heap of cruft has been removed I can start looking at these things. SQLAlchemy will eventually add real pep-484 capabilities so that your normal declarative mappings will work with mypy; there is already an extension which does this and the direction of SQLAlchemy 2.0 is to be able to support this as a built in feature in a few years. Second one is, there is no mechanism of any kind to sanity check that what you've put there for your data classes matches what the mapping will have. Like Account: widgets: List[Widget], the mapping could set that up to a column against a String and not a relationship, and then the thing you've stated you are using pep-484 for will no longer work. classical mappers aren't going away but I think the pep-484 solution will be that SQLAlchemy mappings will deliver pep-484 information fully. |
I want to have type annotations because we already use Type annotations are stored in the Mismatch between type annotations and schema definition could be a problem. But maybe not as much of a problem in practice. You are right, integrating declarative mapping with type checkers is the best way forward for SQLAlchemy and the majority of its users. But I still want the option of defining the mapping separately, even if it's a little bit more difficult and has some caveats. Glad to hear it's not going away. |
right they are in wouldn't that be helpful here? I gather full "dataclasses" go farther than that. |
I think the main problem is the one that started this issue. Simple default values as specified using dataclasses break mapping, because SQLAlchemy detects the class attribute value and doesn't create a property for the column. The solution to that could be to ignore these values for dataclass fields and overwrite them with SQLAlchemy properties. These class attributes are not necessary for the function of dataclasses, constructors have default values baked in. They are also still available through the Given the following patch to the diff --git a/lib/sqlalchemy/orm/mapper.py b/lib/sqlalchemy/orm/mapper.py
index 0169532..3da308b 100644
--- a/lib/sqlalchemy/orm/mapper.py
+++ b/lib/sqlalchemy/orm/mapper.py
@@ -22,6 +22,12 @@ import sys
import types
import weakref
+# dataclasses were added in Python 3.7
+try:
+ import dataclasses
+except ImportError:
+ dataclasses = None
+
from . import attributes
from . import exc as orm_exc
from . import instrumentation
@@ -2603,6 +2609,16 @@ class Mapper(InspectionAttr):
else:
return True
+ def _is_dataclass_field(self, assigned_name):
+ if dataclasses is None:
+ return False
+ if not dataclasses.is_dataclass(self.class_):
+ return False
+ for field in dataclasses.fields(self.class_):
+ if field.name == assigned_name:
+ return True
+ return False
+
def _should_exclude(self, name, assigned_name, local, column):
"""determine whether a particular property should be implicitly
present on the class.
@@ -2614,16 +2630,21 @@ class Mapper(InspectionAttr):
# check for class-bound attributes and/or descriptors,
# either local or from an inherited class
+ # ignore dataclass field default values
if local:
if self.class_.__dict__.get(
assigned_name, None
) is not None and self._is_userland_descriptor(
self.class_.__dict__[assigned_name]
- ):
+ ) and not self._is_dataclass_field(assigned_name):
return True
else:
attr = self.class_manager._get_class_attr_mro(assigned_name, None)
- if attr is not None and self._is_userland_descriptor(attr):
+ if (
+ attr is not None
+ and self._is_userland_descriptor(attr)
+ and not self._is_dataclass_field(assigned_name)
+ ):
return True
if ( The script from the original comment now works as expected: from __future__ import annotations
from dataclasses import dataclass, field
from sqlalchemy import Column, Table, String, Boolean
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import mapper, Session
Base = declarative_base()
@dataclass
class C:
id: str
active: bool = True
table = Table(
"table",
Base.metadata,
Column("id", String, primary_key=True),
Column("active", Boolean),
)
mapper(C, table) Class print(C('MY_ID'))
Filtering on the query = Session().query(C).filter(C.active)
print(query) SELECT "table".id AS table_id, "table".active AS table_active
FROM "table"
WHERE "table".active The same in current SQLAlchemy prints this instead, the SELECT "table".id AS table_id
FROM "table"
WHERE true |
@keosak in #5027 (comment) I closed the issue only because we have no contributors to work on it. if you are available to help implement dataclass support in SQLAlchemy we can open this right up and get to work on it for 1.4 . are you interested? |
we can add a Python 3 -only test suite now using a pytest filter that will be merged soon which can be seen here: https://gerrit.sqlalchemy.org/plugins/gitiles/sqlalchemy/sqlalchemy/+/refs/changes/71/2071/37/test/conftest.py#15 so steps are:
need documentation support, I would suggest a section maybe here: https://github.com/sqlalchemy/sqlalchemy/blob/master/doc/build/orm/mapping_columns.rst "Mapping Python Dataclasses" should include the caveat mentioned above about the attribute being replaced can we also copy over This patch might be OK for 1.3.x but 1.3.x is pretty far along and it's likely best this remain as a 1.4 thing. |
also note we need to test with the newer proposed version of mapper() which includes all the "nice to have" of Declarative, that's not ticketed yet but the general idea is here: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#the-original-mapper-function-removed-replaced-with-a-declarative-compatibility-function |
Sure, I can try over the weekend. I want to look over the |
thanks! you can work with regular PRs, we have an internal gerrit review system but you don't need to deal with it |
hi folks on twitter are also requesting support for attrs: https://www.attrs.org/en/stable/ I haven't looked at this either. we dont need to support this immediately but it's important we don't make any API mistakes that would preclude its being supported also. |
As far as I can tell, classical mapping for attrs already works. The
This means that once the class is processed by
A separate issue is mapping |
To be clear: |
Good to know. I just checked out the |
Does this mean you don't need anything from us (attrs project)? I was planning to try out the mapper APIs for a long time so I'd love to have it working properly. |
I will need to write tests to verify, but I am convinced that classes defined with |
Václav Klusák has proposed a fix for this issue in the master branch: Add support for classical mapping of dataclasses https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2157 |
OK thanks very much for this! I have now reviewed dataclasses and will be adding new versions of the declarative decorator in a subsequent patch. the metaclass was never a requirement. also it might be time to start thinking about populating |
I wonder could sqlalchemy support dataclasses using the Declarative Table Configuration approach with something like: @mapper_registry.mapped
@dataclass
class User:
__tablename__ = "user"
id: int = field(
init=False,
metadata={"sqlalchemy": Column(Integer, primary_key=True)},
)
name: str = field(
init=False,
metadata={"sqlalchemy": Column(String(50))},
)
fullname: Optional[str] = field(
default=None,
metadata={"sqlalchemy": Column(String(50))},
)
nickname: Optional[str] = field(
default=None,
metadata={"sqlalchemy": Column(String(12))},
) instead of using Declarative with Imperative Table approach: @mapper_registry.mapped
@dataclass
class User:
__table__ = Table(
"user",
mapper_registry.metadata,
Column("id", Integer, primary_key=True),
Column("name", String(50)),
Column("fullname", String(50)),
Column("nickname", String(12)),
)
id: int = field(init=False)
name: str = None
fullname: str = None
nickname: str = None |
sure, build yourself a base class or mixin that does this: @mapper_registry.mapped
@dataclass
class User:
@declared_attr
def __table__(cls):
if cls.dataclass_tablename in mapper_registry.metadata.tables:
return mapper_registry.metadata.tables[cls.dataclass_tablename]
cols = []
for field_ in dataclasses.fields(cls):
if "sqlalchemy" in field_.metadata:
column = field_.metadata["sqlalchemy"]
if column.name is None:
column.name = field_.name
cols.append(column)
return Table(
cls.dataclass_tablename,
mapper_registry.metadata,
*cols
)
dataclass_tablename = "user"
id: int = field(
init=False,
metadata={"sqlalchemy": Column(Integer, primary_key=True)},
) |
on that note, the doc at https://docs.sqlalchemy.org/en/14/orm/mapping_styles.html#declarative-mapping-with-dataclasses-and-attrs is a little off because it does not have a relationship() for user.addresses. this has to be added using Here is a way we can generalize the above idea. declarative supports an extension to how it gets at the attributes to be part of the mapping. i will try a POC for that now. |
Change-Id: Id1e3fb660c2aff107aa4a180ba565ee88fa6f56e References: #5027
I didn't like this idea at first but once I did it this might be a nicer way to go, it sure is tricky to set up these dataclasses though. See #5745 + the patch for complete POC it would be extremely helpful if you could test it out. |
I was trying to use that dataclass example from the docs but couldn't figure out how to configure relationships so I abandoned it. I will try again using |
Hi,
I have come across some unexpected behaviour when using dataclasses mapped to SQLAlchemy using classical mapping.
Here's a snippet with the essence of the issue:
So, yes, I have found a workaround (using
= field(default_factory=lambda: something)
as an initializer instead of just= something
), but it doesn't look as good.Is there some general advice regarding the interaction of dataclasses and SQLAlchemy ORM? (I have also tried with attrs, and got the same issue).
The text was updated successfully, but these errors were encountered: