Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

col[2], col[1] = col[1], col[2] loses #1103

Closed
sqlalchemy-bot opened this issue Jul 10, 2008 · 9 comments
Closed

col[2], col[1] = col[1], col[2] loses #1103

sqlalchemy-bot opened this issue Jul 10, 2008 · 9 comments
Labels
bug Something isn't working orm quagmire really hard to make the issue work "correctly" without lots of complication, risk
Milestone

Comments

@sqlalchemy-bot
Copy link
Collaborator

sqlalchemy-bot commented Jul 10, 2008

Migrated issue, originally created by jek (@jek)

List collections lose on this:

col[2], col[1] = col[1], col[2]

The collections package makes the assumption that a child will only be represented once in a collection, and as such fires 1 delete event too many here, orphaning one of the children in the ORM's bookkeeping while leaving it in the actual list. It's easy to see it here if the child has a backref to the parent- it'll be None after this assignment.

I think the approach here is to keep a refcount in the CollectionAdapter and quash events while the membership count for a particular child is > 1. It would need to do bookkeeping based on the id() of the child to avoid conflicts with user comparison overrides.

Initially reported by maqr on the channel.

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed milestone from "0.7.xx" to "blue sky"

@sqlalchemy-bot
Copy link
Collaborator Author

sqlalchemy-bot commented Jan 7, 2013

Michael Bayer (@zzzeek) wrote:

a full test case:


from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Text, ForeignKey
from sqlalchemy import inspect

sa_engine = create_engine("sqlite:///:memory:", echo=True)

Session = sessionmaker(bind=sa_engine)

session = Session()

Base = declarative_base()


class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    name = Column(Text)
    children = relationship('Child', backref='parent')
    def __repr__(self):
        return self.name


class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    name = Column(Text)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    def __repr__(self):
        return self.name

Base.metadata.create_all(sa_engine)

p = Parent(name='Thomas')

session.add(p)

c1 = Child(name="Mary")
c2 = Child(name="John")
c3 = Child(name="Kenny")

p.children.append(c1)
p.children.append(c2)
p.children.append(c3)

session.commit()

p = session.query(Parent).get(1)

print(p.children)  # prints [John, Kenny, Mary]

p.children[1], p.children[2] = p.children[2], p.children[1]

print(p.children)  # prints [Kenny, John]

session.commit()

print(p.children)  # prints [Mary, John] Oh my God! They killed Kenny!

workarounds:

p.children[1:2] = [c1, c2]

or assign again:

p.children[1], p.children[2] = p.children[2], p.children[1]
p.children[1] = p.children[1]

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

consider relationship(..., track_dupes=True), as this bookkeeping is simple but expensive.

@sqlalchemy-bot sqlalchemy-bot added bug Something isn't working orm labels Nov 27, 2018
@sqlalchemy-bot sqlalchemy-bot added this to the blue sky milestone Nov 27, 2018
@zzzeek
Copy link
Member

zzzeek commented Dec 22, 2018

here's one approach (more to come), track the state specific to the backref on the state itself, this doesn't work without much more complexity because as the collections are expired etc. that would have to be updated here:

diff --git a/lib/sqlalchemy/orm/attributes.py b/lib/sqlalchemy/orm/attributes.py
index ff730d7459..f12a474510 100644
--- a/lib/sqlalchemy/orm/attributes.py
+++ b/lib/sqlalchemy/orm/attributes.py
@@ -1233,6 +1233,7 @@ def backref_listeners(attribute, key, uselist):
         child_state, child_dict = instance_state(child), \
             instance_dict(child)
         child_impl = child_state.manager[key].impl
+        refcount_collection_membership = not child_impl.collection
 
         if initiator.parent_token is not parent_token and \
                 initiator.parent_token is not child_impl.parent_token:
@@ -1243,14 +1244,24 @@ def backref_listeners(attribute, key, uselist):
         check_bulk_replace_token = child_impl._bulk_replace_token \
             if child_impl.collection else None
 
+        if refcount_collection_membership:
+            if initiator.op is OP_BULK_REPLACE:
+                child_state._collection_memberships.pop(key, None)
+            membership = child_state._collection_memberships[key]
+        else:
+            membership = 0
+
         if initiator is not check_append_token and \
                 initiator is not check_bulk_replace_token:
-            child_impl.append(
-                child_state,
-                child_dict,
-                state.obj(),
-                initiator,
-                passive=PASSIVE_NO_FETCH)
+            if membership == 0:
+                if refcount_collection_membership:
+                    child_state._collection_memberships[key] += 1
+                child_impl.append(
+                    child_state,
+                    child_dict,
+                    state.obj(),
+                    initiator,
+                    passive=PASSIVE_NO_FETCH)
         return child
 
     def emit_backref_from_collection_remove_event(state, child, initiator):
@@ -1260,6 +1271,7 @@ def backref_listeners(attribute, key, uselist):
             child_state, child_dict = instance_state(child),\
                 instance_dict(child)
             child_impl = child_state.manager[key].impl
+            refcount_collection_membership = not child_impl.collection
 
             # tokens to test for a recursive loop.
             if not child_impl.collection and not child_impl.dynamic:
@@ -1270,14 +1282,25 @@ def backref_listeners(attribute, key, uselist):
                 check_replace_token = child_impl._bulk_replace_token \
                     if child_impl.collection else None
 
+            if refcount_collection_membership:
+                if initiator.op is OP_BULK_REPLACE:
+                    child_state._collection_memberships.pop(key, None)
+                membership = child_state._collection_memberships[key]
+            else:
+                membership = 0
+
             if initiator is not check_remove_token and \
                     initiator is not check_replace_token:
-                child_impl.pop(
-                    child_state,
-                    child_dict,
-                    state.obj(),
-                    initiator,
-                    passive=PASSIVE_NO_FETCH)
+
+                if membership == 0:
+                    if refcount_collection_membership:
+                        child_state._collection_memberships[key] -= 1
+                    child_impl.pop(
+                        child_state,
+                        child_dict,
+                        state.obj(),
+                        initiator,
+                        passive=PASSIVE_NO_FETCH)
 
     if uselist:
         event.listen(attribute, "append",
diff --git a/lib/sqlalchemy/orm/state.py b/lib/sqlalchemy/orm/state.py
index 944dc8177f..9c15d87121 100644
--- a/lib/sqlalchemy/orm/state.py
+++ b/lib/sqlalchemy/orm/state.py
@@ -306,6 +306,11 @@ class InstanceState(interfaces.InspectionAttrInfo):
     def _pending_mutations(self):
         return {}
 
+    @util.memoized_property
+    def _collection_memberships(self):
+        import collections
+        return collections.defaultdict(int)
+
     @util.memoized_property
     def mapper(self):
         """Return the :class:`.Mapper` used for this mapepd object."""

so that's one road

@zzzeek
Copy link
Member

zzzeek commented Dec 22, 2018

Here's tracking at the CollectionAdapter side, keeping the refcount on the InstanceState, uses weakrefs and is insanely expensive, also fails lots of tests for various reasons:

diff --git a/lib/sqlalchemy/orm/collections.py b/lib/sqlalchemy/orm/collections.py
index d6c23f5d29..a28bf753be 100644
--- a/lib/sqlalchemy/orm/collections.py
+++ b/lib/sqlalchemy/orm/collections.py
@@ -577,7 +577,8 @@ class CollectionAdapter(object):
     """
 
     __slots__ = (
-        'attr', '_key', '_data', 'owner_state', '_converter', 'invalidated')
+        'attr', '_key', '_data', 'owner_state', '_converter', 'invalidated',
+        '__weakref__')
 
     def __init__(self, attr, owner_state, data):
         self.attr = attr
@@ -1014,6 +1015,9 @@ def __set(collection, item, _sa_initiator=None):
     if _sa_initiator is not False:
         executor = collection._sa_adapter
         if executor:
+            state = base.instance_state(item)
+            state._add_collection_membership(executor)
+
             item = executor.fire_append_event(item, _sa_initiator)
     return item
 
@@ -1023,7 +1027,9 @@ def __del(collection, item, _sa_initiator=None):
     if _sa_initiator is not False:
         executor = collection._sa_adapter
         if executor:
-            executor.fire_remove_event(item, _sa_initiator)
+            state = base.instance_state(item)
+            if state._remove_collection_membership(executor):
+                executor.fire_remove_event(item, _sa_initiator)
 
 
 def __before_delete(collection, _sa_initiator=None):
diff --git a/lib/sqlalchemy/orm/state.py b/lib/sqlalchemy/orm/state.py
index 944dc8177f..b66e116581 100644
--- a/lib/sqlalchemy/orm/state.py
+++ b/lib/sqlalchemy/orm/state.py
@@ -311,6 +311,30 @@ class InstanceState(interfaces.InspectionAttrInfo):
         """Return the :class:`.Mapper` used for this mapepd object."""
         return self.manager.mapper
 
+    @util.memoized_property
+    def _collection_memberships(self):
+        return weakref.WeakKeyDictionary()
+
+    def _add_collection_membership(self, adapter):
+        if adapter in self._collection_memberships:
+            self._collection_memberships[adapter] += 1
+        else:
+            self._collection_memberships[adapter] = 0
+
+    def _remove_collection_membership(self, adapter):
+        existing = self._collection_memberships.get(adapter, 0)
+        if existing > 1:
+            self._collection_memberships[adapter] -= 1
+            return False
+        elif existing == 1:
+            del self._collection_memberships[adapter]
+            return True
+        else:
+            return False
+
+    def _get_collection_membership(self, adapter):
+        return self._collection_memberships.get(adapter, 0)
+
     @property
     def has_identity(self):
         """Return ``True`` if this object has an identity key.

@zzzeek
Copy link
Member

zzzeek commented Dec 22, 2018

next idea. Perhaps the Py3 only counter , because it uses a C function, is cheap enough that we can just do a count, however we probably can't do that since we don't assume mapped objects are hashable, which means we have to loop through them anyway which means we might as well just count the one we care about, also have to deal with expired collections and whatnot, but this is that idea:

diff --git a/lib/sqlalchemy/orm/attributes.py b/lib/sqlalchemy/orm/attributes.py
index ff730d7459..224da42b22 100644
--- a/lib/sqlalchemy/orm/attributes.py
+++ b/lib/sqlalchemy/orm/attributes.py
@@ -18,6 +18,9 @@ import operator
 from .. import util, event, inspection
 from . import interfaces, collections, exc as orm_exc
 
+import collections as py_collections
+from sqlalchemy import inspect
+
 from .base import instance_state, instance_dict, manager_of_class
 
 from .base import PASSIVE_NO_RESULT, ATTR_WAS_SET, ATTR_EMPTY, NO_VALUE,\
@@ -1270,14 +1273,28 @@ def backref_listeners(attribute, key, uselist):
                 check_replace_token = child_impl._bulk_replace_token \
                     if child_impl.collection else None
 
+            attribute, key, uselist
+            parent_impl
+
             if initiator is not check_remove_token and \
                     initiator is not check_replace_token:
-                child_impl.pop(
-                    child_state,
-                    child_dict,
-                    state.obj(),
-                    initiator,
-                    passive=PASSIVE_NO_FETCH)
+
+                if initiator.op is not OP_BULK_REPLACE \
+                        and parent_impl.collection and \
+                        not child_impl.collection:
+                    counter = py_collections.Counter(
+                        state.dict[parent_impl.key])
+                    count = counter[child]
+                else:
+                    count = 1
+
+                if count == 1:
+                    child_impl.pop(
+                        child_state,
+                        child_dict,
+                        state.obj(),
+                        initiator,
+                        passive=PASSIVE_NO_FETCH)
 
     if uselist:
         event.listen(attribute, "append",

I think trying to get some C code in here somehow that does the reference counting, either during or afterwards, might be what we have to do to make this not awful. we do after all iterate the whole list for things like get_history() and all that.

@sqla-tester
Copy link
Collaborator

Mike Bayer has proposed a fix for this issue in the master branch:

Check collection for only one item remaining before firing scalar backref remove https://gerrit.sqlalchemy.org/1054

@zzzeek
Copy link
Member

zzzeek commented Dec 23, 2018

yet another approach. it is the most brute force but might be the best way since it doesn't add complexity, memory issues, or overhead to anything except the single collection item remove use case, and among all of them is the most foolproof (I think). need to see how bad the linear scan might be in practice but when a collection is mutated, we will already be doing linear scans of it.

@zzzeek zzzeek modified the milestones: blue sky, 1.3 Dec 24, 2018
@zzzeek zzzeek added the quagmire really hard to make the issue work "correctly" without lots of complication, risk label Dec 27, 2018
@zzzeek
Copy link
Member

zzzeek commented Dec 27, 2018

pretty sure this one will be good to go but want to historically tag it as "quagmire".

sqlalchemy-bot pushed a commit that referenced this issue Dec 29, 2018
The "remove" event for collections is now called before the item is removed
in the case of the ``collection.remove()`` method, as is consistent with the
behavior for most other forms of collection item removal (such as
``__delitem__``, replacement under ``__setitem__``).  The ``pop()`` methods
are now the only exception as the target item is not available until after
the pop operation proceeds.

This allows ``remove()`` to be consistent in its behavior with all
the other collection operations, allows the "before_delete" hook
to be local to "pop()" operations only, and removes some method overhead.

We are also looking here to gain some more predictability in terms
of the fix for #1103.

Change-Id: I4fdea911517d65cc300fae0e9c351a471e52e4ab
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working orm quagmire really hard to make the issue work "correctly" without lots of complication, risk
Projects
None yet
Development

No branches or pull requests

3 participants