BUG: Series.argsort() fails with datetime64[ns] with NaT / dtypes are odd #2967

Closed
jreback opened this issue Mar 5, 2013 · 3 comments

Comments

@jreback
Contributor

commented Mar 5, 2013

Pretty simple fix though: just need to make sure that the result array
is typed float64 (to accommodate the NaNs) or int64 (if there are no NaNs),
rather than the same as the input type.

(Though that could give weird results in some cases, e.g. you wouldn't
want a float array back just because you fed it NaNs....)
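
A rough sketch of that idea in plain NumPy, not the actual pandas patch. The helper name `argsort_skipping_missing` is made up for illustration, and it assumes the input is a datetime64/timedelta64 or numeric array:

```python
import numpy as np

def argsort_skipping_missing(values, kind="quicksort"):
    """Hypothetical helper: argsort the non-missing entries only.

    Returns int64 when nothing is missing, float64 (with NaN in the
    missing slots) otherwise, instead of reusing the input dtype.
    """
    values = np.asarray(values)
    # NaT for datetime64/timedelta64 input, NaN for numeric input
    if values.dtype.kind in "mM":
        mask = np.isnat(values)
    else:
        mask = np.isnan(values)

    if not mask.any():
        return np.argsort(values, kind=kind).astype(np.int64)

    # Allocate a float64 result rather than copying the input array,
    # so the integer positions never get cast to the input dtype.
    result = np.full(values.shape, np.nan, dtype=np.float64)
    result[~mask] = np.argsort(values[~mask], kind=kind)
    return result

dates = np.array(["NaT", "2013-01-02", "2013-01-01"], dtype="datetime64[ns]")
print(argsort_skipping_missing(dates))      # [nan  1.  0.]
print(argsort_skipping_missing(dates[1:]))  # [1 0]
```

Note that the positions are relative to the non-missing values only, which matches what the failing line in the traceback below is trying to do.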

Here's the question that inspired this:
http://stackoverflow.com/questions/15207279/return-sorted-indexes-skipping-nan-values-in-pandas

In [29]: s = pd.Series([pd.Timestamp('201301%02d'% (i+1)) for i in range(5)])

In [30]: s
Out[30]: 
0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
3   2013-01-04 00:00:00
4   2013-01-05 00:00:00
dtype: datetime64[ns]

In [31]: s.argsort()
Out[31]: 
0    0
1    1
2    2
3    3
4    4
dtype: int64

In [32]: s.shift().argsort()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-61a586f08c06> in <module>()
----> 1 s.shift().argsort()

/mnt/home/jreback/pandas/pandas/core/series.pyc in argsort(self, axis, kind, order)
   2119             result = values.copy()
   2120             notmask = -mask
-> 2121             result[notmask] = np.argsort(values[notmask], kind=kind)
   2122             return Series(result, index=self.index, name=self.name)
   2123         else:

TypeError: array cannot be safely cast to required type
@jreback

Contributor Author

commented Mar 5, 2013

Could also put -1 in the places where NaN/NaT are, so it could always return an int64 Series. But is that weird?
We do this (put -1) with idxmin/idxmax, so maybe not so weird....
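
For comparison, a minimal sketch of the -1 variant, again in plain NumPy and not the code that was merged. The name `argsort_with_sentinel` and its `sentinel` parameter are assumptions for illustration:

```python
import numpy as np

def argsort_with_sentinel(values, kind="quicksort", sentinel=-1):
    """Hypothetical helper: always return int64, marking missing slots with -1."""
    values = np.asarray(values)
    # NaT for datetime64/timedelta64 input, NaN for numeric input
    if values.dtype.kind in "mM":
        mask = np.isnat(values)
    else:
        mask = np.isnan(values)

    # Start from an int64 array filled with the sentinel, then fill in
    # the sorted positions of the non-missing values.
    result = np.full(values.shape, sentinel, dtype=np.int64)
    result[~mask] = np.argsort(values[~mask], kind=kind)
    return result

dates = np.array(["NaT", "2013-01-02", "2013-01-01"], dtype="datetime64[ns]")
print(argsort_with_sentinel(dates))                        # [-1  1  0]
print(argsort_with_sentinel(np.array([np.nan, 3.0, 1.0]))) # [-1  1  0]
```

The upside is a stable int64 dtype regardless of input; the downside is that -1 is also a valid (negative) index, so callers would have to check for it before indexing with the result.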

@wesm @changhiskhan

?

@hayd

Contributor

commented Mar 5, 2013

I don't think we got this error in 0.10.1:

In [6]: s.shift().argsort()
Out[6]: 
0    NaN
1      0
2      1
3      2
4      3

In [7]: s.shift().argsort().dtype
Out[7]: dtype('object')
@jreback

Contributor Author

commented Mar 6, 2013

Yep... but in 0.10.1 the dtype of the series is itself wrong (it's object),
so s.shift(1).argsort() 'works'.

Easy fix anyhow...

In [2]: s = pd.Series([pd.Timestamp('201301%02d'% (i+1)) for i in range(5)])

In [3]: s
Out[3]: 
0    2013-01-01 00:00:00
1    2013-01-02 00:00:00
2    2013-01-03 00:00:00
3    2013-01-04 00:00:00
4    2013-01-05 00:00:00

In [4]: s.dtype
Out[4]: dtype('object')

In [5]: s.shift(1).dtype
Out[5]: dtype('object')

In [6]: s.shift(1).argsort()
Out[6]: 
0    NaN
1      0
2      1
3      2
4      3
jreback added a commit to jreback/pandas that referenced this issue Mar 6, 2013
jreback added a commit that referenced this issue Mar 6, 2013
Merge pull request #2977 from jreback/argsort
BUG: Series.argsort failing on datetime64[ns] when NaT present, GH #2967

@jreback jreback closed this Mar 6, 2013
