ENH: Pandas `DataFrame.append` and `Series.append` methods should get an `inplace` kwarg #14796

dragonator4 · 2016-12-04T05:50:35Z

Problem description

Currently to append to a DataFrame, the following is the approach:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
df = df.append(pd.DataFrame(np.random.rand(5,3), columns=list('abc')))

append is a DataFrame or Series method, and as such should be able to modify the DataFrame or Series in place. If in-place modification is not required, one may use concat or set the inplace kwarg to False. It will avoid an explicit assignment operation, which is quite slow in Python, as we all know. Further, it will make the expected behavior similar to Python lists, and avoid questions such as these: 1, 2...

Additionally, at present append is a full subset of concat, and as such it need not exist at all. Given the vast number of functions for appending a DataFrame or Series to another in Pandas, it makes sense that each has its merits and demerits. Gaining an inplace kwarg will clearly distinguish append from concat, and simplify code.

I understand that this issue was raised in #2801 a long time ago. However, the conversation there drifted from the simplification offered by the inplace kwarg to performance enhancement. I (and many like me) am looking for ease of use, and not so much for performance. Also, we expect the data to fit in memory (which is a limitation even with the current version of append).

Expected Code

df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
df.append(pd.DataFrame(np.random.rand(5,3), columns=list('abc')), inplace=True)


shoyer · 2016-12-04T09:37:31Z

I am opposed to this for the exact reasons discussed in #2801: it would mislead users who might expect a performance benefit.
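
To illustrate the performance point, here is a minimal sketch (not part of the original comment), assuming a recent pandas where pd.concat is the way to combine frames. NumPy arrays have a fixed size, so adding rows generally means allocating new data and copying; an inplace flag would hide that copy rather than avoid it. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))
extra = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))

# Combining rows builds a brand-new DataFrame; the inputs are left untouched.
combined = pd.concat([df, extra])

print(combined is df)          # is the result the same object as the input?
print(len(df), len(combined))  # row counts before and after

# Check whether the combined column reuses the original column's buffer.
shares = np.shares_memory(df["a"].to_numpy(), combined["a"].to_numpy())
print(shares)
```

Whatever the method is called, the caller ends up holding a new object, which is why the explicit assignment reflects what actually happens.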

jreback · 2016-12-04T16:15:17Z

Virtually all pandas methods return a new object, the exception being the indexing operations. Using inplace is not idiomatic, quite unreadable and not (more) performant at all.

Closing, though if someone thinks that we should add a signature like

(...., inplace=False), and then raise a TypeError if inplace=True to give a nice error message, then we can reopen for that purpose.

In [2]: df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
   ...: df.append(pd.DataFrame(np.random.rand(5,3), columns=list('abc')), inplace=True)
TypeError: append() got an unexpected keyword argument 'inplace'
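
A rough sketch of the suggested behavior, written as a standalone helper rather than a patch to pandas. The function name append_rows and the error wording are hypothetical. It accepts inplace=False and raises a TypeError with an explanatory message when inplace=True is passed, and it uses pd.concat internally as an assumption about how such a helper might be built.

```python
import numpy as np
import pandas as pd


def append_rows(df, other, inplace=False):
    """Hypothetical helper: return a new frame with the rows of other added."""
    if inplace:
        # Explain the situation instead of failing with a generic
        # "unexpected keyword argument" message.
        raise TypeError(
            "appending rows in place is not supported; "
            "assign the result instead: df = append_rows(df, other)"
        )
    return pd.concat([df, other])


df = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))
other = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))

df = append_rows(df, other)  # returns a new frame with 10 rows

try:
    append_rows(df, other, inplace=True)
except TypeError as exc:
    message = str(exc)
    print(message)
```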

remidebette · 2017-08-02T01:40:08Z

In the case of a namedtuple which contains a Series object, the inplace approach would be nice to have as a feature.
This would not be related in any way to performance, but would be a way to expose data to users.

Indeed, namedtuple objects by design provide a way to write a library and expose its data to a user while only allowing them to modify it in place.
Trying to overwrite an attribute of a namedtuple intentionally raises AttributeError: can't set attribute, so that the user does not try to affect your library. But mutable attributes are allowed.

Consider the following dummy code:

from collections import namedtuple
from pandas import Series

# ----- Library part ------
sample_schema = {
    "name": str,
    "some_info": str,
    "content": Series
}

my_data_type = namedtuple("MyDataType", sample_schema.keys())

exposed_data = my_data_type(
    name="Library data",
    some_info="Modify the content as you want",
    content=Series({"a": 0})
)


# ----- User code part ------
series_to_be_appended = Series({"b": 0})

# This is forbidden
exposed_data.content = exposed_data.content.append(series_to_be_appended)

# This would be allowed but is not implemented in Series
exposed_data.content.append(series_to_be_appended, inplace=True)

The name and some_info attributes are strings and therefore immutable. A user would not (easily) be able to affect them. But here the content can be modified as long as it is not set to a new object altogether.

I would think inplace methods are nice to have on any mutable object in general.
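
For comparison, a small sketch of what already works today, assuming a recent pandas and a reduced, hypothetical two-field version of the tuple above. Rebinding the field fails, but assigning to a new label on the existing Series enlarges that same object, so the namedtuple still points at it. pandas may reallocate the underlying data internally; only the object identity is preserved.

```python
from collections import namedtuple

import pandas as pd

# Reduced, hypothetical version of the library tuple from the comment above.
MyDataType = namedtuple("MyDataType", ["name", "content"])
exposed_data = MyDataType(name="Library data", content=pd.Series({"a": 0}))

# Rebinding a field is rejected by the namedtuple itself.
try:
    exposed_data.content = pd.concat(
        [exposed_data.content, pd.Series({"b": 0})]
    )
except AttributeError as exc:
    print("rebinding failed:", exc)

# Mutating the existing Series object is allowed: assigning to a new
# label enlarges the Series without rebinding the namedtuple field.
content = exposed_data.content
content["b"] = 0

print(exposed_data.content.index.tolist())
```

This covers adding individual labels; it is not a general replacement for appending a whole Series in one call.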

rtruxal · 2019-03-13T20:02:45Z

So the consensus among the maintainers is that it would be too confusing to have an append() method which actually appends?

I'd suggest removing the method from DataFrame entirely, or potentially renaming it. Someone familiar with pandas might find it confusing, but the opposite is currently true for those of us without your level of experience.

paulstapor · 2020-10-26T11:57:42Z

Agreeing here.
I never understood why Pandas affords an API with its own logic rather than sharing that of Python itself. One can get used to the fact that most pandas methods return new objects rather than modifying the objects they are called on, although it's counter-intuitive. (Pandas' standard behavior is, imho, counter-intuitive for everyone who uses more Python than Pandas, which should be most of the user base.) And one can get used to the fact that most Pandas methods behave as a user would expect when passing inplace=True as an argument.

I can still live with that. But not adding the possibility to specify inplace for append(), defaulting to False, which effectively keeps the method as it is for all who want that but greatly helps those who need it, is something I cannot follow. Sorry.

aitikgupta · 2020-11-16T05:32:49Z

Adding a use case:

I have a lot of CSV files, with few entries in each, many of which have additional columns.
I want a combined dataframe, which should include the additional columns. (I landed on the pandas.DataFrame.append() docs.)

Columns in other that are not in the caller are added as new columns.

The line above reassured me that I had landed in the right place.

combined_dataframe = pd.DataFrame()
for dataframe in list_of_dataframes_read_from_csvs:
    combined_dataframe.append(dataframe, inplace=True)

This raised an error; I checked the docs, found no inplace for append(), and that led me to this issue.
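
One way to get the combined frame without an inplace flag, sketched under the assumption that the frames are already loaded; the two small frames below are stand-ins for the CSV contents. pd.concat takes the union of the columns and fills the gaps with NaN.

```python
import pandas as pd

# Stand-ins for frames read from CSV files; the second has an extra column.
list_of_dataframes_read_from_csvs = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    pd.DataFrame({"a": [5], "b": [6], "c": [7]}),
]

# Collect the frames first, then concatenate once instead of growing
# a frame inside the loop.
combined_dataframe = pd.concat(
    list_of_dataframes_read_from_csvs, ignore_index=True, sort=False
)

print(combined_dataframe.columns.tolist())
print(len(combined_dataframe))
```

Concatenating once at the end also avoids copying the accumulated rows on every iteration.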

jreback closed this as completed Dec 4, 2016

jreback added the Performance (Memory or execution speed performance) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Dec 4, 2016

jreback added this to the No action milestone Dec 4, 2016

jreback added the API Design label Dec 4, 2016
