Use batched iterators to evaluate large SQLAlchemy queries #7875

@melton-jason

Description

Is your feature request related to a problem? Please describe.
Similar to #7864, SQLAlchemy by default loads all results of a query into a single buffer. This can cause the Python process to allocate a very large chunk of memory that may be held for a long time.

Describe the solution you'd like
SQLAlchemy supports iterating over very large result sets via the yield_per method on queries.
Instead of allocating memory for the entire result set up front, SQLAlchemy batches rows into fixed-size chunks and yields them as they are consumed, keeping peak memory proportional to the chunk size rather than the full result set.

From the SQLAlchemy docs:

Normally, the ORM will construct ORM objects for all rows up front, assembling them into a single buffer, before passing this buffer to the Result object as a source of rows to be returned.

The purpose of the yield_per option is to change this behavior so that the ORM result set is optimized for iteration through very large result sets (> 10K rows) ... When yield_per is used, the ORM will instead batch ORM results into sub-collections and yield rows from each sub-collection individually as the Result object is iterated, so that the Python interpreter doesn’t need to declare very large areas of memory which is both time consuming and leads to excessive memory use.

https://docs.sqlalchemy.org/en/14/orm/queryguide.html#fetching-large-result-sets-with-yield-per
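As an illustration, here is a minimal sketch of the pattern against an in-memory SQLite database (the `User` model and the batch size of 100 are hypothetical, chosen only for the example):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all(User(name=f"user{i}") for i in range(1000))
    session.commit()

    # yield_per(100) streams rows in batches of 100 instead of
    # buffering the entire result set in memory at once.
    total = 0
    for user in session.query(User).yield_per(100):
        total += 1
```

The iteration API is unchanged; only the buffering strategy differs, so adopting `yield_per` for large read-only scans should be a localized change at each query site.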

Labels: 1 - Enhancement, 4 - Performance, python, type:housekeeping