DOC: Reindexing behaviour of dataframe column-assignment missing #39845
Comments
@fish-face Are you working on this issue?
@Bhard27 I'm not confident of finding the best place in the documentation to add this, so was hoping to leave my contribution at the example and suggested wording above. But if someone can provide feedback on that, and on the wording, I might be able to produce more...
Thanks for writing this up @fish-face! I agree that an example in https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html would be valuable since there is no current example (and this seems like an issue that might be commonly encountered). Your example seems perfect for this purpose. The same example could be added to the indexing section of the user guide if nothing like it exists. (Not sure about the
Take, I am with a group of student developers from Allegheny College and we are looking to contribute to this issue.
@mmarconi saw a question about this on gitter. A good starting point would be adding an example for
Hello, I would like to work on this issue if it's not entirely finished! I noticed that it's still open.
take
@mroeschke is this still open?
Location of the documentation
pandas.core.indexing.IndexingMixin.loc
pandas.DataFrame.__setitem__
Documentation problem
When assigning a `Series` through `df[...] = ...` or `df.loc[...] = ...`, the `Series`' index is expanded to conform to the `DataFrame`'s, and then values are added according to the index:
(But in contrast:
)
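A minimal sketch of the alignment described here, using made-up labels and values (the frame `df`, the series `s`, and the column names are illustrative assumptions, not the examples originally posted in this issue). The positional contrast shown at the end is one plausible reading of "in contrast", namely assigning a plain array rather than a `Series`:

```python
import pandas as pd

# Illustrative frame with a string index (values are floats so NaN fits the dtype).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=["x", "y", "z"])

# A Series whose labels are in a different order and only partly overlap df's index.
s = pd.Series([10.0, 20.0, 30.0], index=["z", "x", "w"])

# Assigning the Series aligns on labels: "x" and "z" are matched,
# "y" has no value in s and becomes NaN, and "w" is silently dropped.
df["b"] = s

# Assumed contrast: a plain NumPy array carries no labels,
# so values are placed by position and the length must match.
df["c"] = s.to_numpy()

print(df)
```

Here column `b` follows the labels of `s`, while column `c` follows the order of the raw values.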
As far as I can tell, this is not really documented. In the case of `__setitem__` there is no API documentation at all, and one is left only with the "Selecting and Indexing Data" guide's examples. In the case of `.loc` there is mention that if using a `Series` as input, "The index of the key will be aligned before masking," but this is not what we're doing here. Neither set of examples indicates the behaviour when adding a new column or part of a column: the only hints I could find in the guide about setting with enlargement added a series whose index was the same as the existing index. This means that it is not clear what order the data will end up in the dataframe and where `NaN`s will be added.
In the case of `.loc` in general the API documentation, although it does exist, is fairly scant. There is a link to the user guide, but personally I think this is pretty important behaviour to document in the reference.
Suggested fix for documentation
- Document `__setitem__`, and in particular the behaviour of reindexing `Series`.
- Expand the documentation of `.loc` to more completely describe the behaviour obtained when assigning to `.loc[]`, and include at least one example of assigning to a partial column. Alternatively add this to the user guide. Perhaps something like the following, plus an example like those above:
> When assigning a `Series` to a `DataFrame`, either via `.loc` or via the `[]` operator, values of the `Series` will be added to the dataframe according to their index. Values in the `Series` whose label does not appear in the `DataFrame` will not be added, and labels missing from the `Series`' index will be `NaN`. This also means that the order that data appears in the resulting `DataFrame` could be different from the order in the `Series`.
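As a sketch of the kind of partial-column example the suggested wording asks for (the frame, labels, and values below are hypothetical and chosen only for illustration), assigning a `Series` to a subset of rows through `.loc` appears to follow the same label-based rules:

```python
import pandas as pd

# Illustrative frame; float values so that NaN can be stored without a dtype change.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]}, index=["w", "x", "y", "z"])

# The Series lists "y" before "x", lacks "z", and has an extra label "q".
s = pd.Series([200.0, 100.0, 999.0], index=["y", "x", "q"])

# Only rows "x", "y" and "z" of column "a" are targeted.
# Values are matched by label, not by position:
#   "x" and "y" receive their labelled values,
#   "z" is targeted but missing from s, so it becomes NaN,
#   "q" is not a targeted row and is ignored,
#   "w" is not targeted and keeps its original value.
df.loc[["x", "y", "z"], "a"] = s

print(df)
```

The order of the values in `s` has no effect on where they land, which is the point the proposed wording makes about the resulting order.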