Partial string indexing returns ndarray rather than Series. #27516

anetbnd · 2019-07-22T10:50:58Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

print(pd.__version__)

df_pass = pd.DataFrame(index=range(1,1000), columns=['A', 'B', 'C'])
df_pass.loc[:, :] = np.random.uniform(-100, 100, size=(len(df_pass.index), len(df_pass.columns)))
print(df_pass.loc[range(1,500), 'A'].sum(skipna=False)) # everything is fine here

df_fail = pd.DataFrame(index=pd.date_range('01-01-2005', '12-01-2006'), columns=['A', 'B', 'C'])
df_fail.loc[:, :] = np.random.uniform(-100, 100, size=(len(df_fail.index), len(df_fail.columns)))
print(df_fail.loc['2005', 'A'].sum(skipna=False)) # the TypeError appears here

Output:

0.25.0
-847.9947710494175
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-26bdb9fa00f7> in <module>()
     10 df_fail = pd.DataFrame(index=pd.date_range('01-01-2005', '12-01-2006'), columns=['A', 'B', 'C'])
     11 df_fail .loc[:, :] = np.random.uniform(-100, 100, size=(len(df_fail .index), len(df_fail .columns)))
---> 12 print(df_fail .loc['2005', 'A'].sum(skipna=False)) # Here the type-error appears
     13

Problem description

Everything worked fine before I updated from 0.24.0 to 0.25.0, and I cannot find any documented API change that would explain this. I would expect the second sum to work without issues.

Expected Output

Output (something like):

0.25.0
-847.9947710494175
-451.5691327012012

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 0.25.0
numpy            : 1.14.3
pytz             : 2018.4
dateutil         : 2.7.3
pip              : 19.1.1
setuptools       : 39.1.0
Cython           : 0.28.2
pytest           : 3.10.0
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.0.5
lxml.etree       : 4.2.5
html5lib         : 1.0.1
pymysql          : None
psycopg2         : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2           : 2.10
IPython          : 6.4.0
pandas_datareader: None
bs4              : 4.7.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.2.5
matplotlib       : 2.2.2
numexpr          : None
odfpy            : None
openpyxl         : 2.6.0
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : 1.1.0
sqlalchemy       : 1.2.12
tables           : None
xarray           : None
xlrd             : 1.1.0
xlwt             : None
xlsxwriter       : 1.0.5

TomAugspurger · 2019-07-22T13:06:56Z

That's an indexing bug. Somehow .loc is returning an ndarray rather than a Series.

In [27]: df = pd.DataFrame({"A": 1}, index=pd.date_range("2000", periods=100))

In [28]: df.loc['2000-01', 'A']
Out[28]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1])
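The check below is a minimal sketch built on the example above. It reports which type `.loc` returns for a partial-string row key and also tries the explicit slice form of the same key. The assumption that a slice key avoids the scalar lookup path in 0.25.0 is not confirmed anywhere in this thread, so treat the slice form as a possible workaround to verify, not an established one.

```python
import numpy as np
import pandas as pd

# Same frame as in the example above: 100 daily rows starting 2000-01-01.
df = pd.DataFrame({"A": 1}, index=pd.date_range("2000", periods=100))

# Partial-string row key plus a column label. The expected type is Series;
# on an affected version this comes back as a bare ndarray instead.
result = df.loc["2000-01", "A"]
print(type(result).__name__, isinstance(result, np.ndarray))

# Explicit slice form of the same selection (assumed workaround, unverified
# on 0.25.0): a slice key is not a scalar, so it goes through slice indexing.
sliced = df.loc["2000-01":"2000-01", "A"]
print(type(sliced).__name__, len(sliced))

# Series.sum accepts skipna, which is what the original report relied on.
total = sliced.sum(skipna=False)
```

If `result` turns out to be an ndarray, wrapping the call site with the slice form (or selecting the column first and then indexing the resulting Series) may be enough until a fixed release is available.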

TomAugspurger · 2019-07-22T13:44:08Z

#27110 seems like the most likely candidate (cc @jbrockmendel).

IIUC, we can't treat ('2000-01', 'A') as a scalar, since it's really shorthand for the expanded indexing. Do you have time to look into this @jbrockmendel?

This may warrant an expedited 0.25.1. WDYT @jreback?

jreback · 2019-07-22T14:00:23Z

likely more things would show as people actually use the new release

let’s just do a few weeks on this

jbrockmendel · 2019-07-22T17:44:19Z

At first glance, I don't see how #27110 would cause this since that should affect DatetimeTZBlock but not DatetimeBlock.

There have been some other recent PRs that have tried to simplify core.indexing, maybe something got lost in there. I'll take a look.

TomAugspurger · 2019-07-22T18:16:55Z

Ah, sorry. I was just going through the release notes for anything that sounded promising and stopped at that one.

jbrockmendel · 2019-07-22T19:53:50Z

Tracking this down a bit, following Tom's example.

df.loc.__getitem__ eventually calls df._get_value('2000-01', 'A'). In 0.24.2, engine.get_value raises a KeyError. Now we fall through after that KeyError instead.

jbrockmendel · 2019-07-22T21:01:40Z

Looks like the relevant change was #26298

TomAugspurger · 2019-07-22T21:09:12Z

Thanks @jbrockmendel. I would not have guessed that based on the name. Do you have a fix in mind?

jbrockmendel · 2019-07-22T21:38:28Z

> Do you have a fix in mind?

In DataFrame._get_value, that PR changed the KeyError behavior to only raise for MultiIndex. It will need to raise in more cases. I'm not yet sure just how tight the condition will need to be.
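To illustrate why swallowing the KeyError matters, here is a toy model in plain Python. It is not pandas code: `ToyIndex`, `get_value_strict`, `get_value_fallthrough`, and `toy_loc` are all hypothetical names invented for this sketch, and a dict stands in for a labeled Series while a list stands in for a bare ndarray.

```python
class ToyIndex:
    """Hypothetical index of string labels (not a pandas class)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._positions = {label: i for i, label in enumerate(self.labels)}

    def get_loc(self, key):
        # Exact-match lookup; raises KeyError for a partial key like "2000-01".
        return self._positions[key]

    def partial_positions(self, key):
        # Positions of every label that starts with the partial key.
        return [i for i, label in enumerate(self.labels) if label.startswith(key)]


def get_value_strict(index, values, key):
    # Scalar fast path that lets KeyError propagate to the caller.
    return values[index.get_loc(key)]


def get_value_fallthrough(index, values, key):
    # Scalar fast path that swallows KeyError and falls back to positional
    # access, which returns unlabeled values.
    try:
        return values[index.get_loc(key)]
    except KeyError:
        return [values[i] for i in index.partial_positions(key)]


def toy_loc(index, values, key, get_value):
    # Try the scalar fast path first; on KeyError use the general path,
    # which keeps the labels attached to the values.
    try:
        return get_value(index, values, key)
    except KeyError:
        return {index.labels[i]: values[i] for i in index.partial_positions(key)}


index = ToyIndex(["2000-01-30", "2000-01-31", "2000-02-01"])
values = [10, 20, 30]

strict_result = toy_loc(index, values, "2000-01", get_value_strict)
fallthrough_result = toy_loc(index, values, "2000-01", get_value_fallthrough)
print(strict_result)       # labeled result from the general path
print(fallthrough_result)  # unlabeled values from the fast path
```

In this model the strict variant hands the partial key back to the general path, so the labels survive, while the fall-through variant answers from the fast path and loses them. That mirrors the symptom reported here, but the real fix may be shaped differently.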

anetbnd · 2019-08-05T06:40:49Z

Thanks for taking care of this.

TomAugspurger added this to the 0.25.1 milestone Jul 22, 2019

TomAugspurger added the Indexing label (related to indexing on series/frames, not to indexes themselves) Jul 22, 2019

TomAugspurger added the Regression (functionality that used to work in a prior pandas version) and Timeseries labels Jul 22, 2019

TomAugspurger changed the title ~~Pandas 0.25.0: TypeError: _sum() got an unexpected keyword argument 'skipna'~~ Partial string indexing returns ndarray rather than Series. Aug 1, 2019

TomAugspurger mentioned this issue Aug 1, 2019

unexpected numpy array returned when using .loc #27695

Closed

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Aug 1, 2019

BUG: partial string indexing with scalar

f40bf4d

Closes pandas-dev#27516

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Aug 2, 2019

BUG: partial string indexing with scalar

95ecf62

Closes pandas-dev#27516

TomAugspurger mentioned this issue Aug 2, 2019

BUG: partial string indexing with scalar #27712

Merged

jreback closed this as completed in #27712 Aug 4, 2019

jreback pushed a commit that referenced this issue Aug 4, 2019

BUG: partial string indexing with scalar (#27712)

2263982

Closes #27516

quintusdias pushed a commit to quintusdias/pandas_dev that referenced this issue Aug 16, 2019

BUG: partial string indexing with scalar (pandas-dev#27712)

6c06f66

Closes pandas-dev#27516
