
setting with enlargement fails for large DataFrames #10692

Closed
pekaalto opened this issue Jul 28, 2015 · 7 comments · Fixed by #11049
Labels
Bug · Indexing (Related to indexing on series/frames, not to indexes themselves) · Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Milestone
Next Major Release
Comments

@pekaalto

Setting with enlargement seems to fail for DataFrames with 10**6 or more rows.
10**6 seems to be the exact threshold for me: that length and anything bigger fails, anything smaller works.

Example:

import pandas as pd

# works
X = pd.DataFrame(dict(x=range(10**6 - 1)))
X.loc[len(X)] = 42

# doesn't work: raises IndexError: index out of bounds
Y = pd.DataFrame(dict(y=range(10**6)))
Y.loc[len(Y)] = 42

pd.show_versions() returns:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fi_FI

pandas: 0.16.2
nose: 1.3.6
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None
@jreback
Contributor

jreback commented Jul 28, 2015

This is the same issue as in #10645.

The cases for len > 1M take a different code path and something is amiss there.

You know that you are copying the frame on every enlargement, right? This is extremely inefficient.
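A rough illustration of the cost jreback is pointing at (names here are illustrative, not from the thread): each enlargement assignment reallocates and copies the whole frame, so appending n rows one at a time is O(n²) in copied data, while building the frame once copies each row only once.

```python
import pandas as pd

# Row-by-row enlargement: every .loc assignment on a new label
# copies all existing rows before adding the new one.
df = pd.DataFrame({"x": [0]})
for i in range(1, 5):
    df.loc[len(df)] = i  # whole-frame copy each iteration

# Collecting values first and constructing the frame once
# copies each row a single time.
values = list(range(5))
df2 = pd.DataFrame({"x": values})

assert df["x"].tolist() == df2["x"].tolist()
```

Both frames end up identical; only the amount of copying differs, which is why the loop version degrades badly on large frames.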

@jreback added the Bug, Indexing, and Reshaping labels on Jul 28, 2015
@jreback added this to the Next Major Release milestone on Jul 28, 2015
@johne13

johne13 commented Jul 28, 2015

@jreback What is the recommended way to do this? This exact approach is shown in the docs and doesn't seem to be discouraged there:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#setting-with-enlargement

@jreback
Contributor

jreback commented Jul 28, 2015

what are you trying to do exactly?

@johne13

johne13 commented Jul 28, 2015

I'm not trying to do anything! Or maybe you are talking to the OP? I was actually wondering the same thing, as I would generally use append() for this sort of task.

But FWIW, these questions come up on Stack Overflow with some regularity, and anyone who finds "setting with enlargement" in the documentation will see it suggested as the way (or at least one of the ways) to do this. In this case, what the OP did is pretty much identical to the last example in the "setting with enlargement" docs.

@jreback
Contributor

jreback commented Jul 28, 2015

@johne13 sorry, was on my phone.

So enlargement is the equivalent of df.append(Series(..., name=key)), which by definition creates a copy. A doc note/warning would be nice here.
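The equivalence jreback describes can be sketched as follows. Note this sketch uses pd.concat rather than the df.append of that era, since DataFrame.append was removed in pandas 2.0; the frame and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]})

# Setting with enlargement on a new label...
a = df.copy()
a.loc[2] = 42

# ...behaves like concatenating a one-row frame carrying that label
# (the modern spelling of df.append(Series(..., name=key))).
row = pd.Series({"x": 42}, name=2)
b = pd.concat([df, row.to_frame().T])

assert list(a["x"]) == list(b["x"]) == [1, 2, 42]
assert list(a.index) == list(b.index) == [0, 1, 2]
```

In both cases the original data is copied into a newly allocated block, which is the cost being discussed in this thread.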

@pekaalto
Author

Actually, I didn't know that the df is copied on every enlargement.
But yeah, a warning in the docs would probably be nice to avoid misunderstanding.


About "what are you trying to do exactly?":

I just have a huge DataFrame to which I append information as it's returned from functions etc. I'll probably have to do some redesigning. I guess the way to go is either to preallocate the rows in the main DataFrame, or to collect the "stuff to be appended" in a smaller list/DataFrame first and append it all at the end.
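The collect-then-build approach described above can be sketched like this (make_row is a hypothetical stand-in for the functions returning data in the original use case):

```python
import pandas as pd

# Hypothetical producer standing in for "information returned
# from functions etc." in the comment above.
def make_row(i):
    return {"x": i, "y": i * i}

# Accumulate cheap Python dicts in a list inside the loop,
# then build the DataFrame once at the end.
rows = [make_row(i) for i in range(5)]
result = pd.DataFrame(rows)

assert result["y"].tolist() == [0, 1, 4, 9, 16]
```

Appending to a Python list is amortized O(1), so the total cost is linear instead of quadratic in the number of rows.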

@pkch

pkch commented Jun 19, 2016

@jreback commented on Jul 28, 2015

So enlargement is equivalent of df.append(Series(..., name=key)). This creates by definition a copy. A doc note/warning would be nice here.

Jeff, I guess you didn't mean it's a "copy" of the original object in the sense of creating a brand new, unrelated object. If you just meant that a lot of data had to be copied under the hood, then I understand completely.

Still, I'd guess it's quite different from append in that it manages to add a row in place. (I didn't even know that was possible...)

df = pd.DataFrame({'a':[1]})
df1 = df
df1.loc[10] = 100
assert df is df1
assert len(df) == 2

Honestly, given the performance impact, I'm truly at a loss as to why "Setting with Enlargement" was added to the DataFrame API.
