
execute_iter loads all rows in RAM #413

Closed
shereneshka opened this issue Jan 29, 2024 · 2 comments

Comments

@shereneshka

Describe the bug
I'm using execute_iter to load a large amount of data from the database (>20M rows). I expected that, in streaming mode, the chunks would be fetched sequentially, but instead they are all loaded into RAM at once. Because of this, my web app is sometimes killed for excessive memory consumption.

To Reproduce
I checked it with a memory profiler, like this:

from datetime import datetime
from memory_profiler import profile
from clickhouse_driver import Client

client = Client('localhost')  # connection details omitted
start = datetime.now()

@profile
def exec_query():
    print('before execution', (datetime.now() - start).total_seconds())
    for row in client.execute_iter(query, settings={'max_block_size': 1000}):
        print('in progress', (datetime.now() - start).total_seconds())

I found that memory consumption grows before the loop even starts (all of the growth happens inside the execute_iter call). Does that mean the cursor is client-side rather than server-side?
[memory profiler screenshots: memory grows during the execute_iter call, before iteration begins]
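For anyone reproducing this without memory_profiler, the same observation can be quantified with the stdlib tracemalloc (a generic sketch, not from this thread): compare peak memory right after the iterator is created with peak memory during consumption. If an iterator buffers its rows up front, the jump appears at creation time.

```python
import tracemalloc

def measure_peak(make_iter, n=5):
    """Return (peak after creating the iterator, peak after consuming n items).
    A truly streaming iterator stays flat at creation; a buffered one jumps
    as soon as it is created."""
    tracemalloc.start()
    it = make_iter()  # e.g. lambda: client.execute_iter(query) against a real server
    created = tracemalloc.get_traced_memory()[1]
    for _, row in zip(range(n), it):  # consume a few items
        pass
    consumed = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return created, consumed

# Demo with a generator vs a pre-built list standing in for the two behaviors.
created_gen, _ = measure_peak(lambda: (b'x' * 1000 for _ in range(1000)))
created_list, _ = measure_peak(lambda: iter([b'x' * 1000 for _ in range(1000)]))
```

Here `created_list` is far larger than `created_gen`, which is the same shape of evidence the screenshots show for execute_iter.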

Versions

  • Version of package with the problem: 0.2.5
  • ClickHouse server version: 23.8.4.69
  • Python version: 3.9.6
@xzkostyan
Member

Hi.

The cursor is client-side. The protocol currently doesn't offer server-side cursors. The advantage of a client-side cursor is in data type parsing: data is parsed block by block.
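Since the cursor is client-side, one common workaround (my sketch, not something the driver provides) is keyset pagination: issue repeated bounded queries ordered by a key, so the client holds at most one page at a time. `fetch_page` below stands in for a `client.execute` call with a hypothetical query like `SELECT id, ... FROM t WHERE id > %(last_id)s ORDER BY id LIMIT %(limit)s`; the schema and names are illustrative only.

```python
def iter_keyset(fetch_page, page_size=1000):
    """Yield rows via keyset pagination: repeatedly request rows whose
    ordering key is greater than the last one seen, so only one page
    of rows is resident in memory at a time."""
    last_id = None
    while True:
        rows = fetch_page(last_id, page_size)
        if not rows:
            return
        for row in rows:
            yield row
        last_id = rows[-1][0]  # assumes the first column is the ordering key

# Demo with an in-memory "table" instead of a real ClickHouse client.
data = [(i, f'row{i}') for i in range(10)]

def fake_fetch(last_id, limit):
    start = 0 if last_id is None else last_id + 1
    return data[start:start + limit]

rows = list(iter_keyset(fake_fetch, page_size=3))  # → same rows, 3 at a time
```

The trade-off versus execute_iter is extra round trips per page, but peak client memory is bounded by page_size instead of the full result set.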

@shereneshka
Author

@xzkostyan thanks!
