Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for generated columns in postgres and mysql #4894

Closed
CaselIT opened this issue Oct 7, 2019 · 57 comments
Closed

Support for generated columns in postgres and mysql #4894

CaselIT opened this issue Oct 7, 2019 · 57 comments
Labels
high priority mysql postgresql PRs (with tests!) welcome a fix or feature which is appropriate to be implemented by volunteers schema things related to the DDL related objects like Table, Column, CreateIndex, etc. use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated
Milestone

Comments

@CaselIT
Copy link
Member

CaselIT commented Oct 7, 2019

Add support for specifying a column as generated by the database with GENERATED ALWAYS

Currently postgres and mysql support it (the list may not be complete)
https://www.postgresql.org/docs/12/ddl-generated-columns.html
https://dev.mysql.com/doc/refman/8.0/en/create-table-generated-columns.html

I've tried to search in the documentation and it does not seems to be a way to specify a generated column in sqlalchemy.
I've found a stack overflow answer https://stackoverflow.com/questions/57596603/sqlalchemy-mysql-column-ddl-with-generated-always-as but it does not seem to be the cleanest way to deal with it.

maybe a Generated construct used in the column with the code would be appropriate?
Something like

sa.Table('foo', meta,
  sa.Column('quantity', sa.Integer, nullable=False),
  sa.Column('price', sa.Float, nullable=False),
  sa.Column('revenue', sa.Float, Computed('quantity * price', always=True))
)

edit: updated example following #4894 (comment)

Current support and syntax seems to be:

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

there's not and I would need PR contributions for this to happen.

is GENERATED part of the SQL standard? if not, then the keyword arguments need to be mysql_generated, postgresql_generated

@zzzeek zzzeek added use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated mysql postgresql PRs (with tests!) welcome a fix or feature which is appropriate to be implemented by volunteers labels Oct 7, 2019
@zzzeek zzzeek added this to the 1.3.xx milestone Oct 7, 2019
@zzzeek zzzeek added the schema things related to the DDL related objects like Table, Column, CreateIndex, etc. label Oct 7, 2019
@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

if this is more like a SQL standard I might favor a real Generated construct that delivers keywords like ALWAYS and STORED based on flags

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

i.e. it works like CheckConstraint a bit

@CaselIT
Copy link
Member Author

CaselIT commented Oct 7, 2019

From wikipedia (hardly the best source) it seems that is not sql standard
https://en.wikipedia.org/wiki/Virtual_column

it seems to be supported by other db other than postgres and mysql though

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

so mysql and PG are..using the same syntax ? did one get it from the other?

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

OK per this article, oracle and SQL Server have something too.

this is enough for us to make a real construct out of this that's in Core. but, we need to work out all the syntaxes.

what we dont want to do, is implement for MySQL and PG but then not take into account Oracle or SQL Server and then we have to rework the whole thing to accomodate for whatever weird thing they want to do.

in the worst case, one or more of these DBs is doing it so differently that it just can't work. that's the problem we had with for example MERGE and also for a long time when Oracle had its own recursive query thing that was not a CTE.

@CaselIT
Copy link
Member Author

CaselIT commented Oct 7, 2019

I'm not sure regarding the syntax.
PG just added support in version 12, so it may have adopted the mysql / mariadb syntax

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

well here's oracle, this is looking like a standard syntax:


CREATE TABLE employees (
  id          NUMBER,
  first_name  VARCHAR2(10),
  last_name   VARCHAR2(10),
  salary      NUMBER(9,2),
  comm1       NUMBER(3),
  comm2       NUMBER(3),
  salary1     AS (ROUND(salary*(1+comm1/100),2)),
  salary2     NUMBER GENERATED ALWAYS AS (ROUND(salary*(1+comm2/100),2)) VIRTUAL,
  CONSTRAINT employees_pk PRIMARY KEY (id)
);

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

SQL Server, different, but not terrible

CREATE TABLE dbo.Products
   (
      ProductID int IDENTITY (1,1) NOT NULL
      , QtyAvailable smallint
      , UnitPrice money
      , InventoryValue AS QtyAvailable * UnitPrice
    )
;

@CaselIT
Copy link
Member Author

CaselIT commented Oct 7, 2019

if this is more like a SQL standard I might favor a real Generated construct that delivers keywords like ALWAYS and STORED based on flags

Yes that is probably best since there seems to be stored/virtual option (and maybe other)

I'll update the initial issue

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

I'd like a new construct for this..

officially, it would be Virtual(), but that seems a little awkward. Computed() seems a lot better.

Behaviorally, the construct acts a lot like a server_onupdate, except it accommodate for server_default too. this has to be worked out. when this construct is present, both of those have to be implicitly set up, as this impacts the ORM knowing it has to fetch these (e.g. server_onupdate=FetchedValue()).

Syntax would be like a CheckConstraint() to make it somewhat succinct:

class Model(Base):

    id = Column(Integer, primary_key=True)
    quantity = Column(Integer, nullable=False)
    unit_price_taxed = Column(DECIMAL(20, 4), nullable=False)
    line_price_taxed = Column(
        DECIMAL(20, 4),
        Computed("quantity * unit_price_taxed", always=True),
        index=True,
    )

above, it would raise an error to also set server_default or server_onupdate because Computed is present. sqlalchemy.schema.Computed would extend sqlalchemy.schema.FetchedValue and would be automatically assigned to server_default and server_onupdate when used.

tests go into test/sql/test_defaults.py, or alternatively a new suite test/sql/test_computed.py. these should be round trip tests for those databases that support comptued columns. then each backend dialect gets compilation tests.

dialects to support first, Postgresql and MySQL, and very hopefully Oracle and MSSQL. I can accept a PR for just the first two. For Firebird, I don't really care, that's mostly a not-supported dialect right now. This is enough of a feature that I can take over a PR and finish it up if someone wants to just put some effort into getting it started.

I didn't realize that many DBs support this by now so this is actually kind of a high priority. if there's no authors for a PR I might find time to get around to this alone.

@CaselIT
Copy link
Member Author

CaselIT commented Oct 7, 2019

Computed("quantity * unit_price_taxed", always=True),

I'm not sure if always if best to convey if they should be stored or not.
Maybe stored would be cleared (from what I understand a stored computed column is computed on insert/update, while a not stored one is computed in the select, like in a normal view)

As an alternative persistent as used in maria db may be clearer

There are two types of generated columns:
PERSISTENT (a.k.a. STORED): This type's value is actually stored in the table.
VIRTUAL: This type's value is not stored at all. Instead, the value is generated dynamically when the table is queried. This type is the default.

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

Sure, basically we need a matrix of what behaviors are selectable among these four vendors and then come up with flags + sensibele defaults

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

five vendors sigh...mariadb mysql postgresql oracle mssql

@CaselIT
Copy link
Member Author

CaselIT commented Oct 7, 2019

seems to be:

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

oracle: "If the datatype is omitted, it is determined based on the result of the expression. The GENERATED ALWAYS and VIRTUAL keywords are provided for clarity only." e.g. they dont support "persistent" . SQL Server seems to support "persisted" as an option

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

so what is the word "ALWAYS"? that's just boilerplate?

@CaselIT
Copy link
Member Author

CaselIT commented Oct 7, 2019

so what is the word "ALWAYS"? that's just boilerplate?

not sure. bot generated and always are optional in oracle, while seem required in PG

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

oh and PG / Oracle have mutually exclusive storage implementations. So we likely should support that too, for people who don't care if its stored or not based on backend, some kind of "auto" option

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

OK so it's basically like what MySQL says: "AS (expr) indicates that the column is generated and defines the expression used to compute column values. AS may be preceded by GENERATED ALWAYS to make the generated nature of the column more explicit." no biggie

@CaselIT
Copy link
Member Author

CaselIT commented Oct 7, 2019

also in mysql maria db generated always seems to optional

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

weird this is not SQL standard? it is sure acting like it

@CaselIT
Copy link
Member Author

CaselIT commented Oct 8, 2019

regarding PG it seems that it will eventually support virtual columns
https://www.postgresql.org/message-id/57c70b3e-b29a-ad08-cdce-746f6d561e3d%402ndquadrant.com

should sqlalchemy raise an error if persisted=False or suppose that PG will eventually add it and let the db raise the error in case the user uses a version that does not support it?

@zzzeek
Copy link
Member

zzzeek commented Oct 8, 2019

so SQL 2003 includes "GENERATED AS" in the spec specifically for identity columns. so it's in the spec, but the Wikipedia entry refers to "virtual columns" which aren't really stored, so that's the part that's not in the spec.

anyway we are good to go on this thanks for being involved!

<column definition> ::=
<column name> [ <data type or domain name> ]
[ <default clause> | <identity column specification> | <generation clause> ]
[ <column constraint definition>... ]
[ <collate clause> ]
<data type or domain name> ::=
<data type>
| <domain name>
<column constraint definition> ::=
[ <constraint name definition> ] <column constraint> [ <constraint characteristics> ]
<column constraint> ::=
NOT NULL
| <unique specification>
| <references specification>
| <check constraint definition>
<identity column specification> ::=
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
[ <left paren> <common sequence generator options> <right paren> ]
<generation clause> ::= <generation rule> AS <generation expression>
<generation rule> ::= GENERATED ALWAYS
<generation expression> ::= <left paren> <value expression> <right paren>

@CaselIT
Copy link
Member Author

CaselIT commented Oct 8, 2019

It seems that mysql supports not null with a generated column while mariadb does not

should I add checks in the dialect or demand them to the db?

@CaselIT
Copy link
Member Author

CaselIT commented Oct 8, 2019

mssql does not accept the type on a computed colum. does not allow null but allows not null on persisted columns.

@zzzeek
Copy link
Member

zzzeek commented Oct 8, 2019

mssql's type is likely..derived from the expression? I think for MSSQL we just omit the type when we render. for null/not null, again if we can get by without rendering anything for "null" / "not null" by default, that would be best.

I saw you mention you were calling it "Computed". Should we call it "Generated"? since that's what the SQL looks like, it might be clearest.

@CaselIT
Copy link
Member Author

CaselIT commented Oct 8, 2019

already renamed in the pr. I can revert it
for me its equally clear. maybe generated is more like generate of sql?

regarding the not null there the default (ie omitting the nullable should always work)

@zzzeek
Copy link
Member

zzzeek commented Oct 8, 2019

it's a hard call because Computed describes it better but Generated is more what the SQL looks like. just leave it for now whichever way you have it

@CaselIT
Copy link
Member Author

CaselIT commented Oct 8, 2019

mssql's type is likely..derived from the expression? I think for MSSQL we just omit the type when we render

Yes it infers it. But it is strange that it errors if the type is specified. Oracle also infers it but does not error if the type is specified (maybe if it is not compatible?).
I have omitted the type in mssql when the colum is computed

@CaselIT
Copy link
Member Author

CaselIT commented Oct 15, 2019

I've done a fist implementation. I'm not sure if the inspect module needs to be updated or how to update it.

CaselIT added a commit to CaselIT/sqlalchemy that referenced this issue Oct 18, 2019
Add a Generated schema item that can be used in a column definition
Add default compiler implementation

ref sqlalchemy#4894

Fixes: #<issue number>
@sqla-tester
Copy link
Collaborator

CaselIT has proposed a fix for this issue in the master branch:

Support for generated columns https://gerrit.sqlalchemy.org/1539

@CaselIT
Copy link
Member Author

CaselIT commented Nov 5, 2019

I'm not sure if we want to close this after #4928 is merged or if it should stay open since the orm support is still missing

@zzzeek
Copy link
Member

zzzeek commented Nov 5, 2019

I'm adding the ORM test, that just tests that the unit of work recognizes the Computed as a server / on update default, to the current gerrit. The only thing I see beyond that is the flag that gets into "this is a read only attribute" and I think as that is more of a behavioral change that should also impact any expression level column_property(), it should likely be a separate issue and maybe targeted at 1.4. the basic ability for you to use computed columns including that the ORM does the right thing (except for if you write to the column in Python) will be there in 1.3.x with the current PR.

@CaselIT
Copy link
Member Author

CaselIT commented Nov 5, 2019

I started something regarding the read only attributes in #4896 if you want to take a look.

I don't remember if I also added orm tests there.

I agree that it could be done in a separate issue

@zzzeek
Copy link
Member

zzzeek commented Nov 5, 2019

well next bad news, Oracle RETURNING does not work for computed columns. easiest approach right now is add another supports to the tests and eager_defaults with Oracle + Computed just won't work.

@zzzeek
Copy link
Member

zzzeek commented Nov 5, 2019

though need to see what PG does

@zzzeek
Copy link
Member

zzzeek commented Nov 5, 2019

pg 12 works. OK

@CaselIT
Copy link
Member Author

CaselIT commented Nov 5, 2019

I guess changing oracle to the MySQL behavior in case of computed columns would not be simple

@zzzeek
Copy link
Member

zzzeek commented Nov 5, 2019

oracle is really bad. the RETURNING returns the value before the UPDATE happens. so it's not only wrong it's misleading. im having it emit a warning.

@CaselIT
Copy link
Member Author

CaselIT commented Nov 5, 2019

that seems like it was implemented to intentionally cause problems!

@zzzeek
Copy link
Member

zzzeek commented Nov 5, 2019

the latest gerrit has a lot of oracle additions if you want to look.l

@CaselIT
Copy link
Member Author

CaselIT commented Nov 5, 2019

Nice, since the orm test from the #4896 are in this issue, I think we can probably close that and open a new pr with only the read only attributes parts in it

@sqla-tester
Copy link
Collaborator

CaselIT has proposed a fix for this issue in the rel_1_3 branch:

Support for generated columns https://gerrit.sqlalchemy.org/1559

sqlalchemy-bot pushed a commit that referenced this issue Nov 9, 2019
Added DDL support for "computed columns"; these are DDL column
specifications for columns that have a server-computed value, either upon
SELECT (known as "virtual") or at the point of which they are INSERTed or
UPDATEd (known as "stored").  Support is established for Postgresql, MySQL,
Oracle SQL Server and Firebird. Thanks to Federico Caselli for lots of work
on this one.

ORM round trip tests included.  The ORM makes use of existing
FetchedValue support and no additional ORM logic is present for
the basic feature.

It has been observed that Oracle RETURNING does not return the
new value of a computed column upon UPDATE; it returns the
prior value.  As this is very dangerous, a warning is emitted
if a computed column is rendered into the RETURNING clause
of an UPDATE statement.

Fixes: #4894
Closes: #4928
Pull-request: #4928
Pull-request-sha: d39c521

Change-Id: I2610b2999a5b1b127ed927dcdaeee98b769643ce
(cherry picked from commit 602d1e6dfd538980bb8d513867b17dbc2b4b92dd)
@zzzeek zzzeek modified the milestones: 1.3.xx, 1.3.x Dec 18, 2019
sumau pushed a commit to sumau/sqlalchemy that referenced this issue Dec 29, 2019
Added DDL support for "computed columns"; these are DDL column
specifications for columns that have a server-computed value, either upon
SELECT (known as "virtual") or at the point of which they are INSERTed or
UPDATEd (known as "stored").  Support is established for Postgresql, MySQL,
Oracle SQL Server and Firebird. Thanks to Federico Caselli for lots of work
on this one.

ORM round trip tests included.  The ORM makes use of existing
FetchedValue support and no additional ORM logic is present for
the basic feature.

It has been observed that Oracle RETURNING does not return the
new value of a computed column upon UPDATE; it returns the
prior value.  As this is very dangerous, a warning is emitted
if a computed column is rendered into the RETURNING clause
of an UPDATE statement.

Fixes: sqlalchemy#4894
Closes: sqlalchemy#4928
Pull-request: sqlalchemy#4928
Pull-request-sha: d39c521

Change-Id: I2610b2999a5b1b127ed927dcdaeee98b769643ce
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high priority mysql postgresql PRs (with tests!) welcome a fix or feature which is appropriate to be implemented by volunteers schema things related to the DDL related objects like Table, Column, CreateIndex, etc. use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants