groupby.first() fails for np.datetime64 columns #1717

Closed
gerigk opened this issue Aug 1, 2012 · 2 comments

gerigk commented Aug 1, 2012

numpy 1.7 dev and pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64

import pandas as pd
import numpy as np
df = pd.DataFrame([(3,np.datetime64('2012-07-03')),(3,np.datetime64('2012-07-04'))], columns = ['a', 'date'])
df.groupby('a').first()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-56fc44df2d9e> in <module>()
      2 import numpy as np
      3 df = pd.DataFrame([(3,np.datetime64('2012-07-03')),(3,np.datetime64('2012-07-04'))], columns = ['a', 'date'])
----> 4 df.groupby('a').first()

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in f(self)
     25             return self._cython_agg_general(alias)
     26         except Exception:
---> 27             return self.aggregate(lambda x: npfunc(x, axis=self.axis))
     28 
     29     f.__doc__ = "Compute %s of group values" % name

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   1501                 return self._python_agg_general(arg, *args, **kwargs)
   1502             else:
-> 1503                 result = self._aggregate_generic(arg, *args, **kwargs)
   1504 
   1505         if not self.as_index:

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _aggregate_generic(self, func, *args, **kwargs)
   1564                     result[name] = data.apply(wrapper, axis=axis)
   1565 
-> 1566         return self._wrap_generic_output(result, obj)
   1567 
   1568     def _wrap_aggregated_output(self, output, names=None):

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _wrap_generic_output(self, result, obj)
   1763             if self.axis == 0:
   1764                 result = DataFrame(result, index=obj.columns,
-> 1765                                    columns=result_index).T
   1766             else:
   1767                 result = DataFrame(result, index=obj.index,

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    371             mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy)
    372         elif isinstance(data, dict):
--> 373             mgr = self._init_dict(data, index, columns, dtype=dtype)
    374         elif isinstance(data, ma.MaskedArray):
    375             mask = ma.getmaskarray(data)

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    459 
    460         # don't force copy because getting jammed in an ndarray anyway
--> 461         homogenized = _homogenize(data, index, columns, dtype)
    462 
    463         # from BlockManager perspective

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _homogenize(data, index, columns, dtype)
   4879 
   4880             v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 4881                                 raise_cast_failure=False)
   4882 
   4883         homogenized[k] = v

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   2724             else:
   2725                 subarr = np.empty(len(index), dtype=dtype)
-> 2726             subarr.fill(value)
   2727         else:
   2728             return subarr.item()

ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas
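
The error message is raised by NumPy rather than pandas. The traceback ends in `subarr.fill(value)` on an array created with `np.empty(len(index), dtype=dtype)`. That suggests the array was allocated with a unit-less datetime64 dtype while the value carried a specific unit, though this is an inference and is not confirmed anywhere in the thread. Below is a minimal sketch of the unit distinction, using only `np.datetime_data` and `np.full`; the variable names are illustrative and it does not reproduce the pandas code path.

```python
import numpy as np

# A datetime64 scalar parsed from a date string carries a specific unit (days).
value = np.datetime64('2012-07-03')
value_unit = np.datetime_data(value.dtype)

# 'M8' with no unit is the "generic" datetime64 dtype named in the error message.
generic_unit = np.datetime_data(np.dtype('M8'))

# 'M8[ns]' is the nanosecond unit that appears later in this thread.
specific_unit = np.datetime_data(np.dtype('M8[ns]'))

# Allocating with a specific unit lets NumPy cast the day-unit value to nanoseconds.
filled = np.full(2, value, dtype='M8[ns]')

print(value_unit, generic_unit, specific_unit)
print(filled)
```

Inspecting the unit with `np.datetime_data` is a quick way to check whether a dtype is generic before filling an array with it.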

gerigk commented Aug 1, 2012

Oddly, this succeeds:

import pandas as pd
import numpy as np
df = pd.DataFrame([(3,np.datetime64('2012-07-03 00:00:00')),(3,np.datetime64('2012-07-04 00:00:00'))], columns = ['a', 'date'])
df.date = df.date.astype('M8[ns]')
print df.dtypes
df.date = pd.to_datetime(df.date).astype(object)
df.groupby('a').first()

Without the .astype(object) call, however, it fails. The values are actually of type pandas.lib.Timestamp, but the Series dtype stays datetime64[ns].
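
For readers checking the behaviour on a current install, here is a rough sketch comparing the native datetime64 column with the object-boxed workaround from the snippet above. It assumes Python 3 and a recent pandas release, and it has not been run against 0.8.2; the names `native` and `boxed` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [(3, np.datetime64('2012-07-03')), (3, np.datetime64('2012-07-04'))],
    columns=['a', 'date'],
)
# Make the datetime dtype explicit rather than relying on inference.
df['date'] = pd.to_datetime(df['date'])

# Native datetime64 column: expected to work on releases that include the fix.
native = df.groupby('a').first()

# The workaround from this comment: box the values as Python objects first.
boxed = df.assign(date=df['date'].astype(object)).groupby('a').first()

print(native)
print(boxed)
```

If both calls return the first date for group 3, the installed version already contains the fix and the `.astype(object)` step is unnecessary.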

wesm closed this as completed in 8cde377 on Aug 9, 2012

wesm commented Aug 9, 2012

Fixed various bugs causing this.
