setting with enlargement fails for large DataFrames #10692
Comments
jreback
referenced
this issue
Jul 28, 2015
Closed
MultiIndex __contains__()/in operator throws an IndexError for large multiindices #10645
This is the same issue as #10645: the cases for len > 1M take a different code path and something is amiss there. You know that you are copying the frame on every enlargement, right? This is extremely inefficient.
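One way to see the copy jreback is talking about (a sketch of my own, not from the thread; the `np.shares_memory` check is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2]})
buf_before = df["x"].to_numpy()

# Setting with enlargement: label 2 does not exist yet, so pandas
# reindexes and rebuilds the frame rather than extending it in place.
df.loc[2] = 3

# The column's backing array is a fresh allocation, i.e. the data
# was copied -- which is what makes per-row enlargement so expensive.
assert not np.shares_memory(buf_before, df["x"].to_numpy())
```

Doing this inside a loop therefore copies the entire frame on every iteration, giving O(n^2) total work.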
jreback
added Bug Indexing Reshaping
labels
Jul 28, 2015
jreback
added this to the
Next Major Release
milestone
Jul 28, 2015
jreback
added Difficulty Intermediate Effort Low
labels
Jul 28, 2015
johne13
commented
Jul 28, 2015
@jreback What is the recommended way to do this? This exact pattern is shown in the docs and doesn't seem to be discouraged there: http://pandas.pydata.org/pandas-docs/stable/indexing.html#setting-with-enlargement
what are you trying to do exactly?
johne13
commented
Jul 28, 2015
I'm not trying to do anything! Or maybe you are talking to the OP? I was actually wondering the same thing, as I would generally use … But FWIW, these questions do come up on Stack Overflow with some regularity, and anyone who finds "setting with enlargement" in the documentation will see it suggested as the way (or at least one of the ways) to do this. And in this case what the OP did was pretty much identical to the last example in the "setting with enlargement" docs.
@johne13 sorry, was on my phone. So enlargement is equivalent of …
pekaalto
commented
Jul 29, 2015
Actually, I didn't know that the df is copied on every enlargement anyway. About "what are you trying to do exactly?": I just have a huge DataFrame to which I append some information as it is returned from functions, etc. I probably have to do some redesigning. I guess the way to go is either to preallocate the rows in the main DataFrame, or to collect the "stuff to be appended" in a smaller list/df first and then append it all at the end.
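The collect-then-append approach pekaalto describes can be sketched like this (a minimal sketch; the names `main` and `pending` are illustrative, not from the thread):

```python
import pandas as pd

# Existing "main" DataFrame that would otherwise be enlarged row by row.
main = pd.DataFrame({"id": [0], "value": [0.0]})

# Accumulate incoming rows in a plain list; appending to a list is O(1),
# while each .loc enlargement copies the whole frame.
pending = []
for i in range(1, 4):  # stand-in for values returned from functions
    pending.append({"id": i, "value": 10.0 * i})

# One concat at the end performs a single copy instead of one per row.
main = pd.concat([main, pd.DataFrame(pending)], ignore_index=True)
```

This keeps the total cost linear in the number of rows instead of quadratic.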
kawochen
referenced
this issue
Sep 10, 2015
Merged
BUG: GH10645 and GH10692 where operation on large Index would error #11049
jreback
modified the milestone: 0.17.0, Next Major Release
Sep 10, 2015
jreback
closed this
in #11049
Sep 10, 2015
pkch
commented
Jun 19, 2016
@jreback commented on Jul 28, 2015
Jeff, I guess you didn't mean it's a "copy" of the original object in the sense of creating a brand-new, unrelated object. If you just meant that a lot of data has to be copied under the hood, then I understand completely. Still, I'd guess it's quite different from append in that it manages to add a row in place (I didn't even know that was possible).
Honestly, given the performance impact, I'm truly at a loss as to why "setting with enlargement" was added to the DataFrame API.
pekaalto commented Jul 28, 2015
Setting with enlargement seems to fail for DataFrames longer than 10**6 - 1. 10**6 seems to be the exact threshold for me: that length and anything bigger fails, anything smaller works. Example:
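The original code example did not survive here; a minimal reproduction consistent with the description (on affected pandas versions this raised an error once the frame reached 10**6 rows; it runs cleanly on versions with the #11049 fix) would be something like:

```python
import pandas as pd

n = 10**6  # reported threshold: frames of this length and longer failed
df = pd.DataFrame({"x": range(n)})

# Setting with enlargement: label n is not in the index, so this
# appends a new row. On affected versions this line raised an error
# because indexes past the 1M size cutoff took a different code path.
df.loc[n] = -1
```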
pd.show_versions() returns: