
execute_iter loads all rows in RAM #413

Closed
shereneshka opened this issue Jan 29, 2024 · 2 comments

Comments

@shereneshka

Describe the bug
I'm using execute_iter to load a large amount of data from the database (>20M rows). I expected that, in streaming mode, the chunks would be fetched sequentially, but instead they are all loaded into RAM at once. Because of this, my web app is sometimes killed for excessive memory consumption.

To Reproduce
I checked it with a memory profiler, like this:

from datetime import datetime
from memory_profiler import profile
from clickhouse_driver import Client

client = Client('localhost')  # connection details omitted
start = datetime.now()

@profile
def exec_query():
    print('before execution', (datetime.now() - start).total_seconds())
    for row in client.execute_iter(query, settings={'max_block_size': 1000}):
        print('in progress', (datetime.now() - start).total_seconds())

I found that memory consumption grows before the loop even starts (all of the growth happens inside the execute_iter call). Does that mean the cursor is client-side rather than server-side?
[memory profiler screenshots: memory grows during the execute_iter call, before iteration begins]
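For anyone reproducing this without memory_profiler, the same observation can be quantified with the stdlib tracemalloc (a generic sketch, not from this thread): compare peak memory right after the iterator is created with peak memory during consumption. If an iterator buffers its rows up front, the jump appears at creation time.

```python
import tracemalloc

def measure_peak(make_iter, n=5):
    """Return (peak after creating the iterator, peak after consuming n items).
    A truly streaming iterator stays flat at creation; a buffered one jumps
    as soon as it is created."""
    tracemalloc.start()
    it = make_iter()  # e.g. lambda: client.execute_iter(query) against a real server
    created = tracemalloc.get_traced_memory()[1]
    for _, row in zip(range(n), it):  # consume a few items
        pass
    consumed = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return created, consumed

# Demo with a generator vs a pre-built list standing in for the two behaviors.
created_gen, _ = measure_peak(lambda: (b'x' * 1000 for _ in range(1000)))
created_list, _ = measure_peak(lambda: iter([b'x' * 1000 for _ in range(1000)]))
```

Here `created_list` is far larger than `created_gen`, which is the same shape of evidence the screenshots show for execute_iter.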

Versions

  • Version of package with the problem: 0.2.5
  • ClickHouse server version: 23.8.4.69
  • Python version: 3.9.6
@xzkostyan
Member

Hi.

The cursor is client-side. The protocol currently doesn't offer server-side cursors. The advantage of a client-side cursor is in data type parsing: data is parsed block by block.
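Since the cursor is client-side, one common workaround (my sketch, not something the driver provides) is keyset pagination: issue repeated bounded queries ordered by a key, so the client holds at most one page at a time. `fetch_page` below stands in for a `client.execute` call with a hypothetical query like `SELECT id, ... FROM t WHERE id > %(last_id)s ORDER BY id LIMIT %(limit)s`; the schema and names are illustrative only.

```python
def iter_keyset(fetch_page, page_size=1000):
    """Yield rows via keyset pagination: repeatedly request rows whose
    ordering key is greater than the last one seen, so only one page
    of rows is resident in memory at a time."""
    last_id = None
    while True:
        rows = fetch_page(last_id, page_size)
        if not rows:
            return
        for row in rows:
            yield row
        last_id = rows[-1][0]  # assumes the first column is the ordering key

# Demo with an in-memory "table" instead of a real ClickHouse client.
data = [(i, f'row{i}') for i in range(10)]

def fake_fetch(last_id, limit):
    start = 0 if last_id is None else last_id + 1
    return data[start:start + limit]

rows = list(iter_keyset(fake_fetch, page_size=3))  # → same rows, 3 at a time
```

The trade-off versus execute_iter is extra round trips per page, but peak client memory is bounded by page_size instead of the full result set.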

@shereneshka
Author

@xzkostyan thanks!
