Inconsistent retrieval of DataFrame columns as Series. #33675

Closed
geppi opened this issue Apr 20, 2020 · 2 comments · Fixed by #36051
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
geppi commented Apr 20, 2020

  1. Retrieve a DataFrame column by attribute as a Series.
  2. Modify an element of this Series.
  3. As expected, the modification has no effect on the original DataFrame.
  4. However, it does change the Series that is subsequently retrieved from that column by attribute or with 'loc'.
  5. In contrast, the Series retrieved from the column with 'iloc' is still the original.

At the very least, this is inconsistent. Beyond that, I would expect the Series retrieved in step 4 to be unchanged as well.

>>> import pandas as pd
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.0.3
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 40.8.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.2.8
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.11.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : 1.2.8
numba            : None
>>> df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['A', 'B'])
>>> df
   A  B
a  1  2
b  3  4
>>> s = df.A
>>> s
a    1
b    3
Name: A, dtype: int64
>>> s['c'] = 5
>>> s
a    1
b    3
c    5
Name: A, dtype: int64
>>> df
   A  B
a  1  2
b  3  4
>>> df.A
a    1
b    3
c    5
Name: A, dtype: int64
>>> df.loc[:, 'A']
a    1
b    3
c    5
Name: A, dtype: int64
>>> df.iloc[:, 0]
a    1
b    3
Name: A, dtype: int64
>>> s['b'] = 2
>>> s
a    1
b    2
c    5
Name: A, dtype: int64
>>> df
   A  B
a  1  2
b  3  4
>>> df.A
a    1
b    2
c    5
Name: A, dtype: int64
>>> df.loc[:, 'A']
a    1
b    2
c    5
Name: A, dtype: int64
>>> df.iloc[:, 0]
a    1
b    3
Name: A, dtype: int64
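
For anyone who wants to check a given pandas version against this report, the steps above can be condensed into a short script. This is only a sketch: the function name `column_views_after_mutation` is made up for illustration, and whether the three access paths agree is expected to depend on the installed pandas version (the session above is from 1.0.3), so the script prints the result rather than assuming it.

```python
import pandas as pd


def column_views_after_mutation():
    """Repeat the steps from the report and return column 'A' as seen three ways.

    Hypothetical helper for illustration; not part of pandas.
    """
    df = pd.DataFrame([[1, 2], [3, 4]], index=["a", "b"], columns=["A", "B"])
    s = df.A      # retrieve the column by attribute
    s["c"] = 5    # add a new element to the retrieved Series
    s["b"] = 2    # modify an existing element
    return {
        "frame": df,
        "attr": df.A,
        "loc": df.loc[:, "A"],
        "iloc": df.iloc[:, 0],
    }


views = column_views_after_mutation()

# True only if all three access paths return the same Series.
consistent = views["attr"].equals(views["loc"]) and views["attr"].equals(views["iloc"])
print("pandas", pd.__version__, "- access paths consistent:", consistent)
```

On the version reported here, the attribute and 'loc' results differ from the 'iloc' result, so the script would report an inconsistency.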
@dsaxton dsaxton added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Apr 20, 2020
@jbrockmendel
Member

I think the underlying issue here is that we have _getitem_cache for label-based lookups but no analogous _igetitem_cache. So either making _igetitem_cache or getting rid of _getitem_cache should do the trick.
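
To illustrate the kind of mismatch described in this comment, here is a toy model in plain Python. It is not pandas code and does not reflect the actual internals: `ToyFrame`, `get_by_label`, `get_by_position` and `_label_cache` are hypothetical names, and the only assumption is that label-based lookups hand back a cached object while positional lookups build a fresh one each time.

```python
class ToyFrame:
    """Toy stand-in for a frame: label lookups are cached, positional ones are not."""

    def __init__(self, columns):
        self._columns = {name: list(values) for name, values in columns.items()}
        self._label_cache = {}  # hypothetical analogue of a label-based item cache

    def get_by_label(self, name):
        # The first lookup copies the column into the cache; later lookups
        # return the very same cached object.
        if name not in self._label_cache:
            self._label_cache[name] = list(self._columns[name])
        return self._label_cache[name]

    def get_by_position(self, i):
        # No cache here: every call builds a fresh list from the stored data.
        name = list(self._columns)[i]
        return list(self._columns[name])


frame = ToyFrame({"A": [1, 3], "B": [2, 4]})
col = frame.get_by_label("A")
col.append(5)  # mutate the object handed out by the label lookup

print(frame.get_by_label("A"))     # the mutated cached object
print(frame.get_by_position(0))    # a fresh copy of the stored data
```

In this toy, either adding a matching positional cache or dropping the label cache makes the two lookups agree again, which mirrors the two options suggested above.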

geppi commented Apr 22, 2020

That boils down to the question of whether retrieving a DataFrame column or row should return a copy or a reference. I'm relatively new to pandas and might therefore have a rather naive view of this, but I would expect a copy, and my gut feeling tells me that a copy would have fewer side effects.
It's a little strange when retrieving a column delivers something different from the column shown by the DataFrame's __repr__ method.
Also, assigning to an element of a new row in the DataFrame extends the DataFrame and creates NaN values for the other elements of the new row. In the context of my simple example above:

>>> df.loc['c', 'A'] = 7
>>> df
     A    B
a  1.0  2.0
b  3.0  4.0
c  7.0  NaN

If the Series 's' in my example were a reference to the DataFrame column, I would expect the same to happen when I add an element to the Series 's'.
Currently it behaves like a copy of the DataFrame column, and the DataFrame is not extended.
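
The contrast drawn here can be sketched as follows. This is a minimal example rather than a statement of intended pandas behaviour: it takes an explicit copy with `Series.copy()` so that the Series is independent of the frame whatever caching is in place, then enlarges the frame through `df.loc` as in the snippet above. The variable `rows_before` is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["a", "b"], columns=["A", "B"])

# An explicit copy is independent of the frame, so enlarging it
# cannot leak into later column lookups.
s = df["A"].copy()
s["c"] = 5
rows_before = len(df)  # the frame itself still has its original rows

# Assigning through the frame enlarges the frame; the other
# element of the new row is filled with NaN.
df.loc["c", "A"] = 7

print(s)
print(df)
```

With the explicit copy, adding 'c' to the Series leaves the frame at two rows, while the assignment through `df.loc` adds the row to the frame itself.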
