New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Psycopg2 can't work with huge redshift datatables #553

Closed
Rustem opened this Issue May 9, 2017 · 4 comments

Comments

Projects
None yet
2 participants
@Rustem

Rustem commented May 9, 2017

Please refer to the discussion here sqlalchemy-redshift/sqlalchemy-redshift#116.

@dvarrazzo

This comment has been minimized.

Member

dvarrazzo commented May 9, 2017

If you use a client-side cursor psycopg will fetch itersize items. In this example you are fetching 10 records in the sql and 100 in psycopg, which may or may not different. For extra control use a named cursor and executemany(size) instead of for record in cur.

If the query is allocating excessive memory on the server is not psycopg bug. Psycopg is only opening a cursor on the server; what the server does is outside its control.

The rest of the discussion is polluted by sqlalchemy syntax: I don't know what sqlalchemy tells psycopg to do.

If you have a bug with psycopg not doing what it should with a named cursor open a self-contained bug. I don't see a bug here: I see a combination of not knowing what psycopg operation sqlalchemy does and not knowing what happens on the server.

@Rustem

This comment has been minimized.

Rustem commented May 9, 2017

Btw, please be polite with words u are using. I know how cursors should work. @dvarrazzo With postgres it works, but if i use client side cursor with redshift, psycopg start allocating everything in memory instead of providing stream (cursor) object. Could u imagine that 160GB table is allocating in memory. Either my app crashes or database server, depending on the cursor type. I wouldn't ask if the default behaviour would work. I provided snippet that uses JDBC driver for comparison. It works like a charm, but my app is based on Python stack. I tried to use server side cursor just to compare. I tried both raw driver connections and ORM based, nothing helps (only for postgres).

@dvarrazzo

This comment has been minimized.

Member

dvarrazzo commented May 9, 2017

Use cursor.query to know what is that psycopg is telling to the server. If psycopg asks the server 10 rows and it uses 160GB of ram maybe it is a problem. If you say that "either my app or the database crashes" I understand your query is wrong: psycopg cannot crash the server. However I don't want to infer anything about your environment. If you provide information about a bug we can fix (e.g. Python receives 10 rows and explodes), we can fix the bug. As it is this is not a bug we can reproduce.

@Rustem

This comment has been minimized.

Rustem commented May 9, 2017

After my investigation reading redshift documentation and testing few more choices I see that there are few options to go: jdbc driver or SQL UNLOAD. Sure it is not a psycopg bug, but it is a pitty that even simple cursor works really strange in Redshift.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment