Inconsistent retrieval of DataFrame columns as Series. #33675

geppi · 2020-04-20T14:35:50Z

1. Retrieve a DataFrame column by attribute as a Series.
2. Modify an element of this series.
3. As expected the modification has no effect on the original DataFrame.
4. But it changes the Series that is retrieved from a column by attribute or with the 'loc' method.
5. In contrast the Series retrieved from a column using the 'iloc' method is still the original.

This is at least inconsistent. However, I would also expect to see no change of the retrieved Series under point 4.

import pandas as pd
pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.0.3
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 40.8.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.2.8
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.11.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : 1.2.8
numba            : None

df = pd.DataFrame([[1,2], [3,4]], index= ['a', 'b'], columns=['A', 'B'])
df

	A	B
a	1	2
b	3	4

s = df.A
s

a    1
b    3
Name: A, dtype: int64

s['c'] = 5
s

a    1
b    3
c    5
Name: A, dtype: int64

df

	A	B
a	1	2
b	3	4

df.A

a    1
b    3
c    5
Name: A, dtype: int64

df.loc[:,'A']

a    1
b    3
c    5
Name: A, dtype: int64

df.iloc[:,0]

a    1
b    3
Name: A, dtype: int64

s['b'] = 2
s

a    1
b    2
c    5
Name: A, dtype: int64

df

	A	B
a	1	2
b	3	4

df.A

a    1
b    2
c    5
Name: A, dtype: int64

df.loc[:,'A']

a    1
b    2
c    5
Name: A, dtype: int64

df.iloc[:,0]

a    1
b    3
Name: A, dtype: int64


jbrockmendel · 2020-04-22T00:18:05Z

I think the underlying issue here is that we have _getitem_cache for label-based lookups but no analogous _igetitem_cache. So either making _igetitem_cache or getting rid of _getitem_cache should do the trick.
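To make the caching explanation concrete, here is a toy model in plain Python. It is only a sketch and not pandas internals: `ToyFrame`, `get_by_label`, `get_by_position` and `_cache` are hypothetical names invented for illustration. It assumes label-based lookups hand back a cached object while positional lookups build a fresh one each time, which is enough to reproduce the mismatch reported above.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; NOT pandas code."""

    def __init__(self, columns):
        # The "real" column data owned by the frame.
        self._data = {name: list(values) for name, values in columns.items()}
        self._names = list(columns)
        # Cache used only by label-based lookups (assumption of this sketch).
        self._cache = {}

    def get_by_label(self, name):
        # First lookup copies the column and stores the copy in the cache;
        # later lookups return that same cached object.
        if name not in self._cache:
            self._cache[name] = list(self._data[name])
        return self._cache[name]

    def get_by_position(self, pos):
        # No cache here: always build a fresh copy from the frame's data.
        return list(self._data[self._names[pos]])


frame = ToyFrame({"A": [1, 3], "B": [2, 4]})

s = frame.get_by_label("A")
s.append(5)  # mutate the retrieved object, like s['c'] = 5

by_label = frame.get_by_label("A")      # served from the cache
by_position = frame.get_by_position(0)  # rebuilt from the frame's data

print(by_label)     # [1, 3, 5]
print(by_position)  # [1, 3]
```

In this model the two remedies mentioned above correspond to either giving `get_by_position` the same cache or dropping `_cache` altogether; either way both lookups would agree again.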

geppi · 2020-04-22T06:43:10Z

That boils down to the question of whether retrieving a DataFrame column or row should return a copy or a reference. I'm relatively new to pandas and might therefore have a rather naive view of this, but I would expect a copy, and my gut feeling tells me that a copy would have fewer side effects.
It's a little strange when retrieving a column delivers something different from the column shown by the __repr__ method of the DataFrame.
Also, assigning to an element of a new row in the DataFrame does extend the DataFrame and creates NaN values for the other elements of the new row. In the context of my simple example above:

df.loc['c','A'] = 7
df

	A	B
a	1.0	2.0
b	3.0	4.0
c	7.0	NaN

If the Series 's' in my example were a reference to the DataFrame column, I would expect the same to happen when I add an element to the Series 's'.
Currently it behaves like a copy of the DataFrame column and the DataFrame is not extended.
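For readers who want copy semantics regardless of how a given pandas version handles caching, one option is to ask for the copy explicitly with `Series.copy()`. The snippet below is a minimal sketch reusing the DataFrame from the report; it does not show how the issue was fixed, only a way to sidestep the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['A', 'B'])

# Explicit copy: s is detached from df, so nothing done to s
# can show up in later column lookups on df.
s = df['A'].copy()
s['c'] = 5  # enlarges the copy only

label_index = list(df.loc[:, 'A'].index)
position_index = list(df.iloc[:, 0].index)

print(list(s.index))   # ['a', 'b', 'c']
print(label_index)     # ['a', 'b']
print(position_index)  # ['a', 'b']
```

With the explicit copy, attribute, 'loc' and 'iloc' retrieval all return the two-row column, and `df` keeps its original shape.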

dsaxton added the Bug and Indexing (related to indexing on series/frames, not to indexes themselves) labels Apr 20, 2020

jbrockmendel mentioned this issue Sep 1, 2020 in "BUG: frame._item_cache not cleared when Series is altered" #36051 (merged)

jreback added this to the 1.1.2 milestone Sep 1, 2020

jreback closed this as completed in #36051 Sep 2, 2020
