Feature request: specify chunksize for read_sql #2908

Closed
davidstackio opened this issue Feb 21, 2013 · 10 comments · Fixed by #8330
Labels
Enhancement · IO Data (IO issues that don't fit into a more specific label) · IO SQL (to_sql, read_sql, read_sql_query)
Milestone
0.15.0

Comments

@davidstackio

It would be helpful to iterate through the rows returned from a SQL query (SQLite specifically) chunk by chunk, just as can be done for read_csv and text files as described here: http://pandas.pydata.org/pandas-docs/stable/io.html#iterating-through-files-chunk-by-chunk

The return value should be an iterable object. This would prevent queries from returning an amount of data so large that it (possibly) exceeds system memory.
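For illustration, a minimal sketch of the kind of usage being requested, mirroring read_csv's chunksize. The chunksize keyword for read_sql shown here is the proposed behaviour rather than something that exists yet, and the database and table names are placeholders:

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("output.db")  # placeholder database

    # With chunksize set, read_sql would return an iterator of DataFrames
    # holding at most `chunksize` rows each, instead of materializing the
    # whole result set in memory at once.
    for chunk in pd.read_sql("SELECT * FROM output", conn, chunksize=10000):
        print(len(chunk))  # process each chunk independently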

@davidstackio
Author

The exact error I got was this on pandas version 0.10.1:

  runData = psql.read_frame("SELECT * FROM output", conn)
  File "C:\Python27\lib\site-packages\pandas\io\sql.py", line 151, in read_frame
    coerce_float=coerce_float)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1014, in from_records
    coerce_float=coerce_float)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5468, in _to_arrays
    if len(data) == 0:
TypeError: object of type 'NoneType' has no len()

The TypeError is a little confusing; it took me a while to figure out that it was happening because I was hitting the memory limit. Maybe a clearer error message would be enough ("max query size reached" or something), perhaps suggesting that the user add a SQL LIMIT clause to prevent this problem (see http://php.about.com/od/mysqlcommands/g/Limit_sql.htm).

@davidstackio
Author

I just ran:

runData = psql.read_frame("SELECT * FROM output LIMIT 10", conn)

with no problem.

@hayd hayd mentioned this issue Jul 8, 2013
@jreback
Contributor

jreback commented Jul 10, 2013

So, to be consistent, this would need iterator and chunksize keywords.
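For context, this is roughly how those keywords already behave for read_csv, which the SQL reader would presumably mirror (the file name is made up):

    import pandas as pd

    # chunksize=N: read_csv returns an iterator of DataFrames with at most
    # N rows each rather than a single DataFrame.
    for chunk in pd.read_csv("output.csv", chunksize=10000):
        print(len(chunk))

    # iterator=True: read_csv returns a reader object from which chunks of
    # an arbitrary size can be pulled on demand.
    reader = pd.read_csv("output.csv", iterator=True)
    first_rows = reader.get_chunk(5)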

@lebedov

lebedov commented Oct 4, 2013

For the time being, here is a simple implementation of the requested functionality: https://gist.github.com/lebedov/6831387
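For anyone wanting a self-contained workaround in the meantime, a rough sketch of one way to do it by paging the query with LIMIT/OFFSET (this is not necessarily what the gist above does; the function name is made up, and read_frame is the pandas 0.10-era API, later renamed read_sql_query):

    import pandas.io.sql as psql

    def read_frame_chunked(query, conn, chunksize=10000):
        """Yield DataFrames of at most `chunksize` rows by paging the query.

        Assumes the backend supports LIMIT/OFFSET (e.g. SQLite) and that the
        query has a stable row order; each page re-executes the query.
        """
        offset = 0
        while True:
            page = psql.read_frame(
                "%s LIMIT %d OFFSET %d" % (query, chunksize, offset), conn
            )
            if len(page) == 0:
                break
            yield page
            offset += chunksize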

@jreback
Contributor

jreback commented Mar 22, 2014

@jorisvandenbossche @hayd does this need to go on the new SQL issues list?

@jorisvandenbossche
Member

Hmm, I'd prefer to keep the list in #6292 to the important todos that should ideally be finished before releasing it. This is a nice feature request, but not a blocker for the basic functionality. Just keep it as a separate issue?

@jreback
Contributor

jreback commented Mar 23, 2014

OK... how about you create another issue (marked for 0.15) that will include items that are not in #6292 but are marked as SQL; that way it's easy to move an issue out of the current release to the next one (and to track all the SQL ones). Make checkboxes and such.

I think #3745, #5008, and #2754 should go on one of these as well (or, if they are already satisfied by another issue, go ahead and close them).

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 25, 2014
@jorisvandenbossche jorisvandenbossche changed the title Feature Request: specify chunksize for sql.read_frame ENH: specify chunksize for read_sql Jun 3, 2014
@jorisvandenbossche jorisvandenbossche changed the title ENH: specify chunksize for read_sql Feature request: specify chunksize for read_sql Jun 3, 2014
@hayd
Contributor

hayd commented Sep 3, 2014

This came up again here: http://stackoverflow.com/q/25633830/1240268

@mariusbutuc

I take full responsibility for asking the question that @hayd just referenced and answered in such good detail on SO, about how to pull large amounts of data from a remote server into a DataFrame. Thank you for that!

I've updated the SO question with more context, but if I can help or contribute in any way here, I'd be more than happy to.

@jorisvandenbossche
Member

@mariusbutuc if you want to try to implement it and send a pull request, that would be very welcome!

I think this could be done inside the read_sql function (https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L870) by using fetchmany instead of fetchall. Would that work?
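A minimal sketch of that fetchmany idea, assuming a DB-API cursor that has already executed the query; the helper name and signature are illustrative, not actual pandas internals:

    import pandas as pd

    def _chunked_frames(cursor, chunksize, columns, coerce_float=True):
        # Pull `chunksize` rows at a time with fetchmany() and build one
        # DataFrame per batch, instead of loading everything with fetchall().
        while True:
            rows = cursor.fetchmany(chunksize)
            if not rows:
                break
            yield pd.DataFrame.from_records(
                rows, columns=columns, coerce_float=coerce_float
            )

read_sql could then return this generator when a chunksize is passed and keep the existing fetchall() path otherwise.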
