Partial string indexing returns ndarray rather than Series. #27516

Closed
anetbnd opened this issue Jul 22, 2019 · 10 comments · Fixed by #27712

Comments

@anetbnd
commented Jul 22, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

print(pd.__version__)

df_pass = pd.DataFrame(index=range(1,1000), columns=['A', 'B', 'C'])
df_pass.loc[:, :] = np.random.uniform(-100, 100, size=(len(df_pass.index), len(df_pass.columns)))
print(df_pass.loc[range(1,500), 'A'].sum(skipna=False)) # everything is fine here

df_fail = pd.DataFrame(index=pd.date_range('01-01-2005', '12-01-2006'), columns=['A', 'B', 'C'])
df_fail.loc[:, :] = np.random.uniform(-100, 100, size=(len(df_fail.index), len(df_fail.columns)))
print(df_fail.loc['2005', 'A'].sum(skipna=False)) # Here the type-error appears

Output:

0.25.0
-847.9947710494175
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-26bdb9fa00f7> in <module>()
     10 df_fail = pd.DataFrame(index=pd.date_range('01-01-2005', '12-01-2006'), columns=['A', 'B', 'C'])
     11 df_fail .loc[:, :] = np.random.uniform(-100, 100, size=(len(df_fail .index), len(df_fail .columns)))
---> 12 print(df_fail .loc['2005', 'A'].sum(skipna=False)) # Here the type-error appears
     13

Problem description

Before updating from 0.24.0 to 0.25.0, everything worked fine. I also cannot find any documented API change that would explain this. I would expect the second sum to work without issues.
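
For anyone hitting the same error, here is a minimal sketch of two ways to make the row selection explicit so that a Series comes back. This is an assumption on my part rather than something suggested in the thread, and it has not been verified against 0.25.0 itself. The frame mirrors the report but is built directly from the random data.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the report, built directly from random data.
index = pd.date_range("2005-01-01", "2006-12-01")
df = pd.DataFrame(
    np.random.uniform(-100, 100, size=(len(index), 3)),
    index=index,
    columns=["A", "B", "C"],
)

# Option 1: spell the partial-string selection out as a slice of rows.
by_slice = df.loc["2005":"2005", "A"]

# Option 2: pick the column first, then index the resulting Series.
by_column = df["A"].loc["2005"]

print(type(by_slice).__name__, type(by_column).__name__)
print(by_slice.sum(skipna=False))
```

Both forms avoid passing the ('2005', 'A') pair to .loc in a single step, which is the call that fails in the report.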

Expected Output

Output (something like):

0.25.0
-847.9947710494175
-451.5691327012012

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 0.25.0
numpy            : 1.14.3
pytz             : 2018.4
dateutil         : 2.7.3
pip              : 19.1.1
setuptools       : 39.1.0
Cython           : 0.28.2
pytest           : 3.10.0
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.0.5
lxml.etree       : 4.2.5
html5lib         : 1.0.1
pymysql          : None
psycopg2         : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2           : 2.10
IPython          : 6.4.0
pandas_datareader: None
bs4              : 4.7.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.2.5
matplotlib       : 2.2.2
numexpr          : None
odfpy            : None
openpyxl         : 2.6.0
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : 1.1.0
sqlalchemy       : 1.2.12
tables           : None
xarray           : None
xlrd             : 1.1.0
xlwt             : None
xlsxwriter       : 1.0.5
@TomAugspurger

Contributor

commented Jul 22, 2019

That's an indexing bug. Somehow .loc is returning an ndarray rather than a Series.

In [27]: df = pd.DataFrame({"A": 1}, index=pd.date_range("2000", periods=100))

In [28]: df.loc['2000-01', 'A']
Out[28]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1])
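
The expected behavior for this example can be written as a small check. It is a sketch, not the test that was eventually added to pandas, and it assumes a pandas version where the bug is fixed. The `expected` name is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": 1}, index=pd.date_range("2000", periods=100))

# Partial string indexing on the rows plus a column label should give a Series.
result = df.loc["2000-01", "A"]

# January 2000 covers the first 31 rows of the daily index.
expected = df["A"].iloc[:31]

assert isinstance(result, pd.Series)
assert result.equals(expected)
```

On an affected version, the `isinstance` assertion would be the one that fails, since `result` is an ndarray there.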

TomAugspurger added this to the 0.25.1 milestone Jul 22, 2019

@TomAugspurger

Contributor

commented Jul 22, 2019

#27110 seems like the most likely candidate (cc @jbrockmendel).

IIUC, we can't treat ('2000-01', 'A') as a scalar, since it's really shorthand for the expanded indexing. Do you have time to look into this @jbrockmendel?

This may warrant an expedited 0.25.1. WDYT @jreback?

@jreback

Contributor

commented Jul 22, 2019

Likely more things will show up as people actually use the new release.

Let's just give this a few weeks.

@jbrockmendel

Member

commented Jul 22, 2019

At first glance, I don't see how #27110 would cause this since that should affect DatetimeTZBlock but not DatetimeBlock.

There have been some other recent PRs that have tried to simplify core.indexing; maybe something got lost in there. I'll take a look.

@TomAugspurger

Contributor

commented Jul 22, 2019

Ah, sorry. I was just going through the release notes and stopped at the first one that sounded promising.

@jbrockmendel

Member

commented Jul 22, 2019

Tracking this down a bit, following Tom's example.

df.loc.__getitem__ eventually calls df._get_value('2000-01', 'A'). In 0.24.2, a KeyError is raised by engine.get_value. In 0.25.0, we fall through after that KeyError instead.
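
To illustrate why falling through can produce an ndarray, here is a rough sketch using only public API. It assumes the fallback resolves the labels to positions and then indexes the underlying values, which the thread does not spell out. The `row_loc`, `col_loc`, and `looked_up` names are hypothetical, and this is not the actual body of DataFrame._get_value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": 1}, index=pd.date_range("2000", periods=100))

# A partial date string matches a whole month, so its "location" in a
# sorted DatetimeIndex is a slice rather than a single integer.
row_loc = df.index.get_loc("2000-01")
col_loc = df.columns.get_loc("A")

# Hypothetical stand-in for a positional scalar lookup: indexing the raw
# values with a slice yields an ndarray, not a scalar and not a Series.
values = df.to_numpy()
looked_up = values[row_loc, col_loc]

print(row_loc, type(looked_up).__name__)
```

A code path that expects a scalar key has no reason to wrap the result back into a Series, which would be consistent with the output in Tom's example.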

@jbrockmendel

Member

commented Jul 22, 2019

Looks like the relevant change was #26298

@TomAugspurger

Contributor

commented Jul 22, 2019

Thanks @jbrockmendel. I would not have guessed that based on the name. Do you have a fix in mind?

@jbrockmendel

Member

commented Jul 22, 2019

Do you have a fix in mind?

In DataFrame._get_value, that PR changed the KeyError behavior to only raise for MultiIndex. That will need to raise in more cases. Not yet sure just how tight it will need to be.

TomAugspurger changed the title from "Pandas 0.25.0: TypeError: _sum() got an unexpected keyword argument 'skipna'" to "Partial string indexing returns ndarray rather than Series." Aug 1, 2019

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Aug 1, 2019
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Aug 2, 2019
jreback added a commit that referenced this issue Aug 4, 2019
@anetbnd

Author

commented Aug 5, 2019

Thanks for taking care of this.

quintusdias added a commit to quintusdias/pandas_dev that referenced this issue Aug 16, 2019