
Slow initial query on postgres backend when using enum columns #281

Closed

bananaoomarang opened this issue Jan 24, 2021 · 3 comments

bananaoomarang commented Jan 24, 2021

Python version: 3.8
Databases version: 0.4.1

I haven't quite figured out why this is happening, but I have put together a minimal test case where you can see the problem in action. I'm hoping someone here can help me dig into what is going on.

The problem appears to be related to the Postgres Enum column type, and it only occurs on the first call to fetch_all after calling connect. All subsequent queries run at the expected speed, but that initial query has a ~2 second delay. Calling disconnect and then connect again appears to reset this.

In our actual use case we only have one Enum column, but for the minimal test case it seems I need two Enum columns to trigger the delay.

Minimal test case:

import asyncio
import time
from enum import Enum as PEnum

from databases import Database

from sqlalchemy import Column, Enum, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.sql.sqltypes import BigInteger

DB_URL = "postgresql://user:password@localhost:5432/testdb"

Base = declarative_base()

db = Database(DB_URL)


class MyEnum1(PEnum):
    ONE = 'One'
    TWO = 'Two'


class MyEnum2(PEnum):
    THREE = 'Three'
    FOUR = 'Four'


class MyTable(Base):
    __tablename__ = "mytable"

    id = Column(BigInteger, primary_key=True)
    # Two Enum columns are needed to reproduce the delay.
    my_column_1 = Column(Enum(MyEnum1))
    my_column_2 = Column(Enum(MyEnum2))


my_table = MyTable.__table__


async def read_rows(calltime: str):
    query = my_table.select()

    print()
    print("Using query: ")
    print(query)
    print()

    start = time.time()
    rows = await db.fetch_all(query=query)
    print(f"{calltime} fetch_all took: {time.time() - start} seconds")

    return rows


async def async_main():
    await db.connect()
    await read_rows("first")
    await read_rows("second")
    await db.disconnect()


def main():
    engine = create_engine(DB_URL)
    session = scoped_session(sessionmaker(bind=engine))

    # Recreate the schema (and the Postgres enum types) on each run.
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_main())


if __name__ == "__main__":
    main()

When I run this I see:

scite-api-fastapi-BjBDzBrP-py3.8 > python -m testdb

Using query:
SELECT mytable.id, mytable.my_column_1, mytable.my_column_2
FROM mytable

first fetch_all took: 2.0069031715393066 seconds

Using query:
SELECT mytable.id, mytable.my_column_1, mytable.my_column_2
FROM mytable

second fetch_all took: 0.0011420249938964844 seconds

When I remove one of the Enum columns (delete the line my_column_2 = Column(Enum(MyEnum2))) I see:

Using query:
SELECT mytable.id, mytable.my_column_1
FROM mytable

first fetch_all took: 0.17796087265014648 seconds

Using query:
SELECT mytable.id, mytable.my_column_1
FROM mytable

second fetch_all took: 0.0005922317504882812 seconds

It runs much quicker!

Does anyone have an idea of what might be causing this?

@omBratteng

I don't think it's related to databases, as I'm encountering the same issue when using async SQLAlchemy.
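
For reference, a minimal async SQLAlchemy setup that also goes through asyncpg and shows the same first-query delay (a sketch, assuming SQLAlchemy 1.4+ and the mytable schema from the test case above):

import asyncio
import time

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost:5432/testdb"
)


async def main():
    async with engine.connect() as conn:
        # The first query on a fresh connection pays the same
        # one-off cost as the databases example above.
        start = time.time()
        await conn.execute(text("SELECT * FROM mytable"))
        print(f"first query took {time.time() - start:.3f}s")


asyncio.run(main())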


bluefish6 commented Jan 27, 2022

As I observed exactly the same results, I think you might be another victim of the type-introspection query in asyncpg. This was already a major issue before:
MagicStack/asyncpg#186
After that query was rewritten the problem was somewhat reduced, but it returned when JIT was turned on by default in Postgres 12.0. (Note: JIT wasn't actually working in the postgres:12.0-alpine image, because Postgres was compiled without the components JIT requires, so you may be fooled when running your tests on different Postgres Docker images.)
MagicStack/asyncpg#530
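
A quick way to check whether JIT is enabled on your server (a minimal sketch, reusing the DSN from the test case above):

import asyncio

import asyncpg

DB_URL = "postgresql://user:password@localhost:5432/testdb"


async def check_jit():
    conn = await asyncpg.connect(DB_URL)
    # 'on' means the planner may JIT-compile queries it estimates as
    # expensive, which includes asyncpg's type-introspection query.
    print(await conn.fetchval("SHOW jit"))
    await conn.close()


asyncio.run(check_jit())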

See a great article on the real-life impact of this issue here: https://dev.to/xenatisch/cascade-of-doom-jit-and-how-a-postgres-update-led-to-70-failure-on-a-critical-national-service-3f2a

As far as I understand from discussions around GitHub and Stack Overflow, a lot of people think that enabling JIT by default in Postgres was a good idea, while a lot of others think otherwise (see for example this reddit thread).

To summarize the solutions I've encountered: disable JIT, either globally (jit = off in postgresql.conf) or per connection via asyncpg's server_settings, as in the sketch below.
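
A minimal sketch of the per-connection variant, assuming databases forwards extra keyword options through to asyncpg's connection pool (the DSN is again the one from the test case):

from databases import Database

DB_URL = "postgresql://user:password@localhost:5432/testdb"

# server_settings is applied by asyncpg to every connection it opens,
# so "jit": "off" disables Postgres JIT for these sessions only.
db = Database(DB_URL, server_settings={"jit": "off"})

With JIT disabled for the session, the first fetch_all should no longer pay the ~2 second introspection penalty.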

@bananaoomarang (Author)

Very interesting, thank you for the details! It sounds like this can be closed, as it is not a databases problem.
