
the DELETE before INSERT problem #2501

Open
sqlalchemy-bot opened this issue Jun 9, 2012 · 49 comments
Labels: bug (Something isn't working), low priority, orm, unit of work (the persistence phase within the ORM)

Comments

@sqlalchemy-bot
Collaborator

Migrated issue, originally created by Michael Bayer (@zzzeek)

this would be another semi-rewrite of the UOW. Basically we'd need to remove the hard dependency on "saves" before "deletes". We might even need to break "saves" into INSERT and UPDATE separately, and maybe that's a good idea anyway. We'd then need the UOW to do a better job of treating mapper-level and relationship-level nodes equally - while the "DependencyProcessor" acts as an "edge" between "SaveUpdate"/"Delete", if I screw around with things even trivially, a "DependencyProcessor" can easily find itself not between two nodes, then it gets dropped. So really DependencyProcessor should be treated somehow as a node itself, in addition to being something of an "edge".

The actual DELETE before INSERT would need to be expressed as a dependency between objects that are known to share either a primary key value or a unique constraint value, and the ORM would need some way of being told that objects should be unique along certain attribute combinations (which would need to be multiple).


Attachments: 2501.patch | 2501_proof_of_concept.py | 2501_poc_2018.py

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "0.x.xx" to "0.9.0"

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Here's why the delete-before-INSERT problem is almost unsolvable:

p1 = Parent(id=1)
c1 = Child()
p1.children.append(c1)
s.add(p1)
s.commit()

p2 = Parent(id=1)
p2.children.append(c1)
s.delete(p1)
s.commit()

What SQL do we emit for the second part? We can't INSERT the new parent; we need to DELETE the old one first. But we can't DELETE the old one first either: the Child still refers to it, and first needs an UPDATE to point at the new row, which doesn't exist yet. Only if we could temporarily update Child to have parent_id=NULL can this work, which means two UPDATEs for the same row, like a post_update type of thing.

All these combinations keep looking like we're just doing two separate flushes: one for the delete, including all its dependencies as though there were no replacement row available, then another flush for the INSERT.
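For reference, the "two separate flushes" described here can be done manually in user code. Below is a minimal sketch under stated assumptions: parent_id is nullable, there is no delete-orphan cascade, and the Parent/Child models are illustrative rather than taken from any attached patch.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))  # nullable on purpose

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

s = Session(engine)
p1 = Parent(id=1)
c1 = Child()
p1.children.append(c1)
s.add(p1)
s.commit()

# flush #1: detach the child (parent_id -> NULL) and delete the old row
c1.parent = None
s.delete(p1)
s.flush()

# flush #2: insert the replacement row and point the child at it
p2 = Parent(id=1)
p2.children.append(c1)
s.add(p2)
s.commit()
```

This is exactly the "post_update type of thing" described above, written out by hand: the same child row is UPDATEd twice, once to NULL and once to the replacement parent's key.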

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "0.9.0" to "blue sky"

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

#2765 related.

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

a proof of concept is attached illustrating how to get #2765's test to work. Encouragingly, we can get an external rule to do the whole thing (in this simple case at least) without any code changes.

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Fine, a patch is attached. It would be safe to release since it has no effect unless the new mapper setting "maintain_unique_attributes" is used. As far as what breaks when one uses this attribute, hard to say, though I think mostly the breakage would occur when the mapper has lots of other things dependent on it.

Usage so far requires that one can detect a "dupe" before the UOW assigns foreign keys. So in the example from #2765, it has to check for uniqueness on the relationship-bound attribute where the change occurred. But you can also give it several combinations of attributes to check.

from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String, Float, UniqueConstraint, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, backref
from sqlalchemy.orm.dependency import OneToManyDP, attributes

Base = declarative_base()

class ParentEntity(Base):
    __tablename__ = 'parent'

    id = Column(Integer, primary_key=True)
    name = Column(String)

class ChildEntity(Base):
    __tablename__ = 'child'

    __table_args__ = (UniqueConstraint('parent_id', 'z', name='child_uk'),)

    __mapper_args__ = {
        "maintain_unique_attributes": [("parent", "z")]

        # to detect conflicts for both relationship as well as on the integer
        # attribute
        # "maintain_unique_attributes": [("parent", "z"), ("parent_id", "z")]
    }

    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'), nullable=False)
    z = Column(Float)

    parent = relationship("ParentEntity",
                          backref=backref("children",
                                          cascade="all, delete-orphan"))

engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)

sess = Session()
Base.metadata.create_all(engine)

p = ParentEntity(name='Datum')
p.children = [ChildEntity(z=2.5), ChildEntity(z=3.5), ChildEntity(z=5.5)]
sess.add(p)
sess.commit()

# Remove all existing children and create new, one of which is
# identical to a previously existing child
p.children = [ChildEntity(z=4.0), ChildEntity(z=5.5)]
sess.commit()

c5 = sess.query(ChildEntity).filter_by(z=5.5).one()
sess.delete(c5)
sess.add_all([ChildEntity(z=5.5, parent=p)])
sess.commit()

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • added labels: high priority
  • changed milestone from "blue sky" to "0.9.0"

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

also test an update (new patch coming):

# test with an update
c4 = sess.query(ChildEntity).filter_by(z=4.0).one()
c5 = sess.query(ChildEntity).filter_by(z=5.5).one()

c4.z = 5.5
sess.delete(c5)
sess.commit()

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

so it's ready for tests, and also probably needs some work regarding inheritance.

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

That example earlier with Parent.children, yeah, that kind of thing still might not work. But I don't think that's what folks are going for with this one in most cases.

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

This is very close to ready once it has tests, but 0.9 is way behind schedule at this point, so pushing this to 0.9.xx since it has minimal impact on existing code.

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "0.9.0" to "0.9.xx"

@sqlalchemy-bot
Collaborator Author

Marc Schlaich (@schlamar) wrote:

I'm running into the same issue as #2765, but I have more complicated edge cases.

My main functionality is to update an object from a dict, so e.g. d = {'children': [dict(z=1), dict(id=1, z=2)]} will create a new child for the first entry and will use the existing Child with id 1 from the database for the second.

There are a lot of cases to be considered: replace an existing child with a new one, or the other way around; swap two existing children; and probably more.

This gets interesting as soon as you have an ordered relationship with ordering_list and a delete-orphan and a unique constraint on (position, parent_id) in the child.

Right now I have "solved" all issues: First, I trigger the delete cascade for all existing children and then "revive" the children which are still needed via make_transient. Next, I fix possible ordering-constraint issues by adding only previously existing objects to the relationship (because if child id=1 with position=2 should now be at position=1 and a new child should be at position=2, you would try to insert the new child at position=2 first). Last, I set the full list of objects on the relationship.

Here is my code. get_related_obj is responsible for creating or querying the child object.

old_ids = set([o.id for o in getattr(obj, field)])
value = [get_related_obj(session, obj, v, field) for v in value]
setattr(obj, field, list())
session.flush()  # possibly trigger delete cascade
for v in value:
    state = inspection.inspect(v)
    if state.deleted:
        orm.make_transient(v)  # put it back to life
new_filtered = [o for o in value if o.id and o.id in old_ids]
setattr(obj, field, new_filtered)
session.flush()  # trigger update of position
# now set the correct list
setattr(obj, field, value)

I'm not sure if this is covering all edge cases, maybe you found something I missed?

However, my main purpose of this comment is to give you some cases you should test for this patch. I'm pretty sure that not all of them are covered :)

@sqlalchemy-bot
Collaborator Author

Marc Schlaich (@schlamar) wrote:

Oh no. The solution from above does not work predictably in the following case:

  • Replace the first existing child with a new one

Sometimes the test passes, sometimes not. Error is:

IntegrityError: (IntegrityError) columns parent_id, position are not unique u'INSERT INTO ...

I assume there is no deterministic order in which pending instances are updated? Probably they are stored in a dict?

Any idea how I can solve this issue?

@sqlalchemy-bot
Collaborator Author

Marc Schlaich (@schlamar) wrote:

Looks like resetting all order_by properties of revived objects does the trick:

def get_order_by_properties(model, relation_name):
    relation = getattr(model, relation_name)
    if relation.property.order_by:
        return [c.name for c in relation.property.order_by]
    return list()


value = [get_related_obj(session, obj, v, field) for v in value]
setattr(obj, field, list())
session.flush()  # possibly trigger delete cascade
for v in value:
    state = inspection.inspect(v)
    if state.deleted:
        orm.make_transient(v)  # put it back to life

        # reset order by fields
        for attr in get_order_by_properties(type(obj), field):
            if hasattr(v, attr):
                setattr(v, attr, None)
setattr(obj, field, value)

@sqlalchemy-bot
Collaborator Author

Marc Schlaich (@schlamar) wrote:

Another update: there are some issues with association proxy handling and doing flush when the object is not yet fully created (similar to #2809). So here is a fixed version for the code above:

def fix_order_by_constraint(session, obj, field, value):
    setattr(obj, field, list())
    session.flush()  # possibly trigger delete cascade
    for v in value:
        state = inspection.inspect(v)
        if state.deleted:
            orm.make_transient(v)  # put it back to life

            # reset order by fields
            for attr in get_order_by_properties(type(obj), field):
                if hasattr(v, attr):
                    setattr(v, attr, None)


value = [get_related_obj(session, obj, v, field) for v in value]
if obj.id:
    fix_order_by_constraint(session, obj, field, value)
setattr(obj, field, value)

@sqlalchemy-bot
Collaborator Author

Marc Schlaich (@schlamar) wrote:

Ok, I definitely should have read the warning in http://docs.sqlalchemy.org/en/rel_0_8/orm/extensions/orderinglist.html. I'm disabling the unique constraints right now. The above solution still has some issues.

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Hi, can you please attach an actual code example of what it is you're trying to do? Thanks. Note that this ticket specifically involves models with unique constraints where rows are being replaced.

@sqlalchemy-bot
Collaborator Author

Marc Schlaich (@schlamar) wrote:

Just forget it :) I had unique constraints on my ordering column, which is total nonsense and which I should have realized even before reading the warning in the documentation...

@sqlalchemy-bot
Collaborator Author

Anonymous wrote:

(original author: joshma)

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

here's a test that illustrates the feature working against the primary key, a change to the patch coming will handle this:

from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String, Float, UniqueConstraint, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, relationship, backref
from sqlalchemy.orm.dependency import OneToManyDP, attributes

Base = declarative_base()

class ParentEntity(Base):
    __tablename__ = 'parent'

    id = Column(Integer, primary_key=True)
    name = Column(String)

    __mapper_args__ = {
        "maintain_unique_attributes": [("id",)]
    }

engine = create_engine('sqlite://', echo=True)

Base.metadata.create_all(engine)

sess = Session(engine)

p1 = ParentEntity(id=1, name='p1')
sess.add(p1)
sess.commit()

p2 = ParentEntity(id=1, name='p2')
sess.delete(p1)
sess.add(p2)
sess.commit()

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

updated patch

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • attached file 2501.patch

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • edited description

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "0.9.xx" to "1.0"

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

this so far isn't hitting the same level of urgency as a lot of other things in 1.0 so moving it out for the moment.

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • removed labels: high priority

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "1.0" to "1.1"

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

people really aren't asking for this. still a nice to have only

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "1.1" to "1.2"

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • edited description

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "1.2" to "1.4"

@sqlalchemy-bot
Collaborator Author

alexbecker (@alexbecker) wrote:

I would like to +1 this issue. I currently get around it by calling session.flush() after deleting the related objects but before creating their replacements, but sometimes it forces me to awkwardly rearrange other code to avoid flushing incomplete objects that would violate non-null constraints.
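A minimal sketch of this flush-in-between workaround; Tag is a hypothetical model with a unique name column, not from this thread:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

s = Session(engine)
old = Tag(name="release")
s.add(old)
s.commit()

# Without the intermediate flush, the unit of work would emit the INSERT
# before the DELETE and hit the unique constraint on name.
s.delete(old)
s.flush()                    # DELETE goes out first
s.add(Tag(name="release"))   # the replacement no longer collides
s.commit()
```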

@sqlalchemy-bot
Collaborator Author

Michael Bayer (@zzzeek) wrote:

sorry :) if someone wants to genuinely work on this, e.g. tests, presentation, docs, etc., that would be welcome. I'm attaching a one-line change to the above proof of concept that still works; that's the code to be integrated.

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "1.4" to "1.6"

@sqlalchemy-bot
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • attached file 2501_poc_2018.py

@petr-motejlek

petr-motejlek commented Sep 1, 2020

Would there be a chance to at least mention this in the docs somewhere? We were bitten by this yesterday, and it took us a while to figure out that all one needs to do to make this work is to remove the original value through the session, do a manual flush to fire the SQL commands, and then do the real update. (We were working with a one-to-one/one-to-zero relation and were using uselist=False, so in our case it was a single value, but the process is almost the same for collections in one-to-many.) It's a bit cumbersome, but since the actual update logic should live in a separate shared layer, it should not be a big deal for people to do.

model.field = None
session.flush()
model.field = ...
session.commit()

@zzzeek
Member

zzzeek commented Sep 1, 2020

it can be in the docs in a place like this: https://docs.sqlalchemy.org/en/13/orm/persistence_techniques.html

Do you think you would have found it in there if something had been present, and what should it say to catch your attention?

it can also be in the FAQ: https://docs.sqlalchemy.org/en/13/faq/sessions.html probably should have an entry there.

The change here, IIRC, is a big, scary kind of one, and this is almost never an issue for folks given the really easy workaround, so there's not a big push to work on this (unless someone really wants to work on it).

@petr-motejlek

Hey.

Apologies for not responding sooner. My todo list was getting the better of me :D.

I think the best place would be https://docs.sqlalchemy.org/en/13/orm/basic_relationships.html

That's where we began looking for how to write a relation. As I mentioned, our use case was a special one-to-zero/one (i.e. there can be only one related object or none). I think it's a special one-to-many case (and the flush is required for it too). So perhaps a short section about the need for this flush would be enough, but I think it should be on the relationship page :)

What do you think?

@zzzeek
Member

zzzeek commented Sep 18, 2020

well that page is supposed to be pretty top-level / basic. what we can do is have little "See alsos". the place where we have the more odd cases listed out is https://docs.sqlalchemy.org/en/13/orm/relationship_persistence.html.

so maybe when we illustrate one-to-one at basic_relationships, we include a note that if the one-to-one is on a column that has a unique constraint on it (that's what you have, right? a unique constraint on the ".field" column?), we include a "note: if the one-to-one target has a unique constraint, please see ...". that should work

@petr-motejlek

Yeah. That would work. It's true, that the issue is actually the unique constraint :).

@zzzeek
Member

zzzeek commented Sep 28, 2021

re: trying to make this work with some feature, I really don't see this happening unless someone was really down with providing tests / polish / docs / etc. the workaround is much more transparent than a built-in configuration option would be. going to milestone this out.

that said, we do need docs more urgently (like, at a non-never time :) )

@zzzeek zzzeek modified the milestones: 1.6, 1.x.xx Sep 28, 2021
@jtpavlock

jtpavlock commented Sep 12, 2022

Is there any way to work around this issue while operating inside the before_flush event? The use-case being to try to catch and handle potential violations from unknown sources of edits or additions prior to the items being flushed to the database.

@zzzeek
Member

zzzeek commented Sep 13, 2022

Is there any way to work around this issue while operating inside the before_flush event? The use-case being to try to catch and handle potential violations from unknown sources of edits or additions prior to the items being flushed to the database.

well the heuristics of detecting it are kind of an open question, but assuming you figured out that part: if you wanted to change when the DELETE happens for a particular object within before_flush(), you would need to expunge() it from that Session and put it into a new Session that's local to that before_flush() event. You would make it like new_session = Session(my_existing_session.connection()) so that it joins the current transaction, then delete it from there (new_session.delete(object_to_delete); new_session.flush()).

@zzzeek zzzeek modified the milestones: 1.x.xx, 2.x.x Oct 10, 2022
@kbower
Contributor

kbower commented Nov 6, 2023

if you wanted to change when the DELETE happens for a particular object within before_flush(), you would need to expunge() it from that Session and put it into a new Session that's local to that before_flush() event. You would make it like new_session = Session(my_existing_session.connection()) so that it joins the current transaction, then delete it from there (new_session.delete(object_to_delete); new_session.flush()).

I tried this recipe for a similar problem I ran into and had issues cleanly .close()-ing this additional session under the same transaction (and if I didn't close() it, I had other issues when the connection was recycled).

An alternative recipe is to expunge as needed, expire as needed and then use query(...).delete(synchronize_session=False) (not sure whether .delete() triggers autoflush but in my case, autoflush is disabled at this point).
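A sketch of this alternative (Slot is a hypothetical model, autoflush is disabled as in the comment above); the bulk query delete emits its DELETE immediately instead of participating in the unit-of-work flush plan, so it cannot be reordered after the INSERT:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Slot(Base):
    __tablename__ = "slot"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

s = Session(engine, autoflush=False)
old = Slot(name="primary")
s.add(old)
s.commit()

# Stop tracking the old instance, then DELETE its row right away with a
# bulk query delete instead of session.delete() + flush ordering.
s.expunge(old)
s.query(Slot).filter(Slot.name == "primary").delete(synchronize_session=False)

s.add(Slot(name="primary"))  # replacement inserts into a now-free slot
s.commit()
```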

@CaselIT
Member

CaselIT commented Nov 6, 2023

I tried this recipe for a similar problem I ran into and had issues cleanly .close()-ing this additional session under the same transaction (and if I didn't close() it, I had other issues when the connection was recycled).

you can control the behaviour of a Session given an existing connection with a transaction in progress using the join_transaction_mode parameter: https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.params.join_transaction_mode

@kbower
Contributor

kbower commented Nov 7, 2023

@CaselIT appreciated, thank you!
