
Slow initial query on postgres backend when using enum columns #281

Closed

bananaoomarang opened this issue Jan 24, 2021 · 3 comments

bananaoomarang commented Jan 24, 2021

Python version: 3.8
Databases version: 0.4.1

I haven't quite figured out why this is happening, but I have put together a minimal test case where you can see the problem in action. I'm hoping someone here can help me dig into what is going on.

The problem appears to be related to the Postgres Enum column type, and it only occurs on the first call to fetch_all after calling connect. All subsequent queries run at the expected speed, but that initial query has a ~2 second delay. Calling disconnect and then connect again appears to reset this.

In our actual use case we only have one Enum column, but for the minimal test case it seems I need two Enum columns to trigger the delay.

Minimal test case:

import asyncio
import time
from enum import Enum as PEnum

from databases import Database

from sqlalchemy import Column, Enum, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.sql.sqltypes import BigInteger

DB_URL = "postgresql://user:password@localhost:5432/testdb"

Base = declarative_base()

db = Database(DB_URL)


class MyEnum1(PEnum):
    ONE = 'One'
    TWO = 'Two'


class MyEnum2(PEnum):
    THREE = 'Three'
    FOUR = 'Four'


class MyTable(Base):
    __tablename__ = "mytable"

    id = Column(BigInteger, primary_key=True)
    # Two Enum columns are needed to reproduce the delay.
    my_column_1 = Column(Enum(MyEnum1))
    my_column_2 = Column(Enum(MyEnum2))


my_table = MyTable.__table__


async def read_rows(calltime: str):
    query = my_table.select()

    print()
    print("Using query: ")
    print(query)
    print()

    start = time.time()
    rows = await db.fetch_all(query=query)
    print(f"{calltime} fetch_all took: {time.time() - start} seconds")

    return rows


async def async_main():
    await db.connect()
    await read_rows("first")
    await read_rows("second")
    await db.disconnect()


def main():
    engine = create_engine(DB_URL)
    session = scoped_session(sessionmaker(bind=engine))

    # Recreate the schema (and the Postgres enum types) on each run.
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_main())


if __name__ == "__main__":
    main()

When I run this I see:

scite-api-fastapi-BjBDzBrP-py3.8 > python -m testdb

Using query:
SELECT mytable.id, mytable.my_column_1, mytable.my_column_2
FROM mytable

first fetch_all took: 2.0069031715393066 seconds

Using query:
SELECT mytable.id, mytable.my_column_1, mytable.my_column_2
FROM mytable

second fetch_all took: 0.0011420249938964844 seconds

When I remove one of the Enum columns (delete the line my_column_2 = Column(Enum(MyEnum2))) I see:

Using query:
SELECT mytable.id, mytable.my_column_1
FROM mytable

first fetch_all took: 0.17796087265014648 seconds

Using query:
SELECT mytable.id, mytable.my_column_1
FROM mytable

second fetch_all took: 0.0005922317504882812 seconds

It runs much quicker!

Does anyone have an idea of what might be causing this?

@omBratteng

I don't think it's related to databases, as I'm encountering the same issue when using async SQLAlchemy.
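
For reference, a minimal async SQLAlchemy setup that also goes through asyncpg and shows the same first-query delay (a sketch, assuming SQLAlchemy 1.4+ and the mytable schema from the test case above):

import asyncio
import time

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost:5432/testdb"
)


async def main():
    async with engine.connect() as conn:
        # The first query on a fresh connection pays the same
        # one-off cost as the databases example above.
        start = time.time()
        await conn.execute(text("SELECT * FROM mytable"))
        print(f"first query took {time.time() - start:.3f}s")


asyncio.run(main())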


bluefish6 commented Jan 27, 2022

As I observed exactly the same results, I think you might be another victim of the type-introspection query in asyncpg. This was already a major issue before:
MagicStack/asyncpg#186
After that query was rewritten the problem was somewhat reduced, but it returned when JIT was turned on by default in Postgres 12.0. (Note: JIT wasn't actually working in the postgres:12.0-alpine image, because Postgres was compiled without the components JIT requires, so you may be fooled when running your tests on different Postgres Docker images.)
MagicStack/asyncpg#530
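
A quick way to check whether JIT is enabled on your server (a minimal sketch, reusing the DSN from the test case above):

import asyncio

import asyncpg

DB_URL = "postgresql://user:password@localhost:5432/testdb"


async def check_jit():
    conn = await asyncpg.connect(DB_URL)
    # 'on' means the planner may JIT-compile queries it estimates as
    # expensive, which includes asyncpg's type-introspection query.
    print(await conn.fetchval("SHOW jit"))
    await conn.close()


asyncio.run(check_jit())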

See a great article on the real-life impact of this issue here: https://dev.to/xenatisch/cascade-of-doom-jit-and-how-a-postgres-update-led-to-70-failure-on-a-critical-national-service-3f2a

As far as I understand from discussions around GitHub and Stack Overflow, a lot of people think that enabling JIT by default in Postgres was a good idea, while a lot of others think otherwise (see for example this reddit thread).

To summarize the solutions I've encountered: disable JIT, either globally (jit = off in postgresql.conf) or per connection via asyncpg's server_settings, as in the sketch below.
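
A minimal sketch of the per-connection variant, assuming databases forwards extra keyword options through to asyncpg's connection pool (the DSN is again the one from the test case):

from databases import Database

DB_URL = "postgresql://user:password@localhost:5432/testdb"

# server_settings is applied by asyncpg to every connection it opens,
# so "jit": "off" disables Postgres JIT for these sessions only.
db = Database(DB_URL, server_settings={"jit": "off"})

With JIT disabled for the session, the first fetch_all should no longer pay the ~2 second introspection penalty.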

@bananaoomarang (Author)

Very interesting, thank you for the details! It sounds like this can be closed, as it is not a databases problem.
