xs is filling nan in index with its last item, as if sorted ascending, in the resulting index #6574

Closed
leungwk opened this issue Mar 7, 2014 · 2 comments · Fixed by #6579

leungwk commented Mar 7, 2014

Illustration:

import pandas as pd

acc = [
    ('a','abcde',1),
    ('b','bbcde',2),
    ('y','yzcde',25),
    ('z','xbcde',24),
    ('z',None,26),
    ('z','zbcde',25),
    ('z','ybcde',26),
]
df1 = pd.DataFrame(acc, columns=['a1','a2','cnt']).set_index(['a1','a2'])
In [476]: df1
Out[476]: 
          cnt
a1 a2        
a  abcde    1
b  bbcde    2
y  yzcde   25
z  xbcde   24
   NaN     26
   zbcde   25
   ybcde   26

[7 rows x 1 columns]

In [477]: df1.xs('z',level='a1')
Out[477]: 
       cnt
a2        
xbcde   24
zbcde   26
zbcde   25
ybcde   26

[4 rows x 1 columns]

I was expecting:

       cnt
a2        
xbcde   24
NaN     26
zbcde   25
ybcde   26

because I thought it would preserve the index of df1.

Sorting explicitly doesn't seem to affect the result:

In [478]: df1.sort('cnt',ascending=False)
Out[478]: 
          cnt
a1 a2        
z  ybcde   26
   NaN     26
   zbcde   25
y  yzcde   25
z  xbcde   24
b  bbcde    2
a  abcde    1

[7 rows x 1 columns]

In [479]: df1.sort('cnt',ascending=False).xs('z',level='a1')
Out[479]: 
       cnt
a2        
ybcde   26
zbcde   26
zbcde   25
xbcde   24

[4 rows x 1 columns]

It might be related to forward filling, but then I think it would be:

       cnt
a2        
ybcde   26
ybcde   26
zbcde   25
xbcde   24

which still isn't what I was expecting.

Versions and dependencies:

In [480]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 12.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8

pandas: 0.13.1
Cython: 0.17.2
numpy: 1.6.2
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 1.1.0
sphinx: 1.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2012h
bottleneck: None
tables: 3.0.0
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.8.0
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None

jreback commented Mar 7, 2014

Here are some other ways to get what you want.
(This may be a bug. Having NaN in an index is generally odd, so some code may be
needed to handle that case.)

In [3]: df1.xs('z',level='a1',drop_level=False)
Out[3]: 
          cnt
a1 a2        
z  xbcde   24
   NaN     26
   zbcde   25
   ybcde   26

[4 rows x 1 columns]

In [4]: df1.loc[['z']]
Out[4]: 
          cnt
a1 a2        
z  xbcde   24
   NaN     26
   zbcde   25
   ybcde   26

[4 rows x 1 columns]

In [5]: df1.loc['z']
Out[5]: 
       cnt
a2        
xbcde   24
zbcde   26
zbcde   25
ybcde   26

[4 rows x 1 columns]
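
In the outputs above, the two calls that keep the `a1` level also keep the `NaN` label in `a2`. A minimal sketch that checks this programmatically, assuming a reasonably recent pandas and reusing the data from the report (the names `kept` and `a2_labels` are illustrative only):

```python
import pandas as pd

# Same data as in the original report.
acc = [
    ('a', 'abcde', 1),
    ('b', 'bbcde', 2),
    ('y', 'yzcde', 25),
    ('z', 'xbcde', 24),
    ('z', None, 26),
    ('z', 'zbcde', 25),
    ('z', 'ybcde', 26),
]
df1 = pd.DataFrame(acc, columns=['a1', 'a2', 'cnt']).set_index(['a1', 'a2'])

# Select the 'z' rows but keep the 'a1' level in the result.
kept = df1.xs('z', level='a1', drop_level=False)

# Pull out the 'a2' labels of the selected rows and flag the missing one.
a2_labels = kept.index.get_level_values('a2')
print(a2_labels.isna().tolist())
```

If the missing label is preserved, exactly one entry in the printed list is `True`, at the position of the row with `cnt == 26` that had no `a2` value.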

jreback commented Mar 9, 2014

thanks for the report; fixed in master
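
A check along these lines could confirm the fix on a given install. It is only a sketch: it assumes a pandas version that includes the fix, and the names `result` and `labels` are illustrative, not taken from the pandas test suite.

```python
import pandas as pd

# Same data as in the original report.
acc = [
    ('a', 'abcde', 1),
    ('b', 'bbcde', 2),
    ('y', 'yzcde', 25),
    ('z', 'xbcde', 24),
    ('z', None, 26),
    ('z', 'zbcde', 25),
    ('z', 'ybcde', 26),
]
df1 = pd.DataFrame(acc, columns=['a1', 'a2', 'cnt']).set_index(['a1', 'a2'])

# Default xs drops the 'a1' level; this is the call that misbehaved in the report.
result = df1.xs('z', level='a1')
labels = result.index

# The missing label should stay missing, in its original position,
# instead of being replaced by another label such as 'zbcde'.
assert labels.isna().tolist() == [False, True, False, False]
assert result['cnt'].tolist() == [24, 26, 25, 26]
```

On an affected version such as 0.13.1, the first assertion would be expected to fail, since the report shows `zbcde` where `NaN` should be.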
