Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in lazy loading strategy for sharded setups #4228

Closed
sqlalchemy-bot opened this issue Apr 3, 2018 · 25 comments
Closed

Regression in lazy loading strategy for sharded setups #4228

sqlalchemy-bot opened this issue Apr 3, 2018 · 25 comments
Labels
bug Something isn't working orm
Milestone

Comments

@sqlalchemy-bot
Copy link
Collaborator

Migrated issue, originally created by Rishi Sharma (@rishirishi)

SQLAlchemy >= 1.2

In sharded setups, the lazy loading strategy no longer fetches instances available in the identity map; instead, emits a query.

The following test passes in 1.1 but fails in 1.2

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.horizontal_shard import ShardedSession
from sqlalchemy.testing import fixtures
from sqlalchemy.testing import eq_
from sqlalchemy import testing

class LazyLoadFromIdentityMapTest(fixtures.DeclarativeMappedTest):
    @classmethod
    def setup_classes(cls):
        Base = cls.DeclarativeBasic

        class Book(Base):
            __tablename__ = 'book'
            id = Column(Integer, primary_key=True)
            pages = relationship('Page', backref='book')

        class Page(Base):
            __tablename__ = 'page'
            id = Column(Integer, primary_key=True)
            book_id = Column(ForeignKey('book.id'))

    def test_lazy_load_from_identity_map(self):
        session = ShardedSession(
            shards={"test": testing.db},
            shard_chooser=lambda *args: 'test',
            id_chooser=lambda *args: None,
            query_chooser=lambda *args: ['test']
        )

        Book, Page = self.classes("Book", "Page")
        book = Book()
        book.pages.append(Page())

        session.add(book)
        session.commit()

        book = session.query(Book).first()
        page = session.query(Page).first()

        def go():
            eq_(page.book, book)

        # doesn't emit SQL
        self.assert_sql_count(
            testing.db,
            go,
            0
        )
@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Rishi Sharma (@rishirishi):

  • edited description

3 similar comments
@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Rishi Sharma (@rishirishi):

  • edited description

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Rishi Sharma (@rishirishi):

  • edited description

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Rishi Sharma (@rishirishi):

  • edited description

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

ok will confirm your patch is the right approach thanks!

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • set milestone to "1.2.x"

@sqlalchemy-bot
Copy link
Collaborator Author

Rishi Sharma (@rishirishi) wrote:

Hi Michael, I imagine you are really busy. Is there something with the patch that needs changing? I can try to help. We are eager to upgrade to 1.2. Thanks! -Rishi

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

I've been mentally linking this to another sharding related issue but this is more of a bug than the other one which is more of a feature request, so let me take a look

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

please see if you can review / test the change in https://gerrit.sqlalchemy.org/#/c/zzzeek/sqlalchemy/+/734. thanks!

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Refactor "get" to allow for pluggable identity token schemes

Fixed regression in 1.2 within sharded query feature where the
new "identity_token" element was not being correctly considered within
the scope of a lazy load operation, when searching the identity map
for a related many-to-one element. The new behavior will allow for
making use of the "id_chooser" in order to determine the best identity
key to retrieve from the identity map. In order to achieve this, some
refactoring of 1.2's "identity_token" approach has made some slight changes
to the implementation of ShardedQuery which should be noted for other
derivations of this class.

Change-Id: I04fa60535deec2d0cdec89f602935dfebeb9eb9d
Fixes: #4228

43f2783

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Refactor "get" to allow for pluggable identity token schemes

Fixed regression in 1.2 within sharded query feature where the
new "identity_token" element was not being correctly considered within
the scope of a lazy load operation, when searching the identity map
for a related many-to-one element. The new behavior will allow for
making use of the "id_chooser" in order to determine the best identity
key to retrieve from the identity map. In order to achieve this, some
refactoring of 1.2's "identity_token" approach has made some slight changes
to the implementation of ShardedQuery which should be noted for other
derivations of this class.

Change-Id: I04fa60535deec2d0cdec89f602935dfebeb9eb9d
Fixes: #4228
(cherry picked from commit 43f2783)

eee88f5

@sqlalchemy-bot
Copy link
Collaborator Author

Changes by Michael Bayer (@zzzeek):

  • changed status to closed

@sqlalchemy-bot
Copy link
Collaborator Author

Rishi Sharma (@rishirishi) wrote:

Tested on 1.2.7 locally and seeing the following error:

AssertionError: SELECT ...(redacted query)... does not have a unique shard to bind to: set([])

The reason being is that we have never implemented "id_chooser" (because we don't use "get" to fetch a single record, instead apply a filter on the primary key, then use "one" or "first").

I suppose the recent change in 1.2.7 now requires "id_chooser" to be implemented?

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

yeah the horizontal shard extension assumes you've implemented all three. in your case, this is easy, the "ident" passed has the existing shard as ident[-1], just return that in a list:

def id_chooser(query, ident):
    return [ident[-1]]


@sqlalchemy-bot
Copy link
Collaborator Author

Rishi Sharma (@rishirishi) wrote:

Using your implementation in my case, the ident is, for example, [1, 1] representing a composite primary key. The shard_id is not present in that list.

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

oh it's just giving you the identity, not the whole key :) OK , are you using your own ShardedQuery subclass? I can pass the parent state privately

@sqlalchemy-bot
Copy link
Collaborator Author

Rishi Sharma (@rishirishi) wrote:

Yeah, we have subclassed ShardedQuery and passed that into sessionmaker as query_cls.

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

or you can search for that identity in the identity map locally to get it

def id_chooser(query, ident):
    target_cls = query.column_descriptions[0].['type']
    for key in query.session.identity_map.keys():
        if key[0] is target_cls and key[1] == ident:
            return key[2]

the id_chooser() function should be enhanced so that it is given context for the identity it's being asked for. or a fourth, optional callable can be added to ShardedSession that is specifically for lazy loads. The latter is possibly more doable in a point release.

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

yeah that id chooser won't really work that great, because if the object is not locally present then you don't know the shard id unless we send you more information.

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

but....it should get you to where you were before? would be fully:

def id_chooser(query, ident):
    target_cls = query.column_descriptions[0].['type']
    for key in query.session.identity_map.keys():
        if key[0] is target_cls and key[1] == ident:
            return [key[2]]
    else:
         return [<all shard ids>]

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

this is more regression from #4137, see #4243 for an API that will add the lazyload context to the query so it can be used by chooser functions

@sqlalchemy-bot
Copy link
Collaborator Author

Rishi Sharma (@rishirishi) wrote:

With a minor modification of your id_chooser we get back to where we were:

def id_chooser(query, ident):
    ident_tuple = tuple(ident)
    target_cls = query.column_descriptions[0]['type']

    for model_cls, primary_key_ident, shard_id in query.session.identity_map.keys():
        if model_cls is target_cls and primary_key_ident == ident_tuple:
            return [shard_id]

    return []

I wonder if there is a small performance hit in scanning the identity_map keys for 100s of lookups, in our case.

Your suggestion of an "optional callable can be added to ShardedSession that is specifically for lazy loads" seems reasonable. Although we are eager, the upgrade is not urgent, so please feel to take the time to do what you think makes sense.

@sqlalchemy-bot
Copy link
Collaborator Author

Rishi Sharma (@rishirishi) wrote:

Ah ok, I will follow #4243. Thanks!

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Add unique_constraint_name to MSSQL FK reflection

Fixed bug in MSSQL reflection where when two same-named tables in different
schemas had same-named primary key constraints, foreign key constraints
referring to one of the tables would have their columns doubled, causing
errors. Pull request courtesy Sean Dunn.

Fixes: #4228
Change-Id: I7dabaaee0944e1030048826ba39fc574b0d63031
Pull-request: zzzeek/sqlalchemy#457

ca94ea8

@sqlalchemy-bot
Copy link
Collaborator Author

Michael Bayer (@zzzeek) wrote:

Add unique_constraint_name to MSSQL FK reflection

Fixed bug in MSSQL reflection where when two same-named tables in different
schemas had same-named primary key constraints, foreign key constraints
referring to one of the tables would have their columns doubled, causing
errors. Pull request courtesy Sean Dunn.

Fixes: #4228
Change-Id: I7dabaaee0944e1030048826ba39fc574b0d63031
Pull-request: zzzeek/sqlalchemy#457
(cherry picked from commit ca94ea8)

ff2a296

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working orm
Projects
None yet
Development

No branches or pull requests

1 participant