groupby.first() fails for np.datetime64 columns #1717

gerigk · 2012-08-01T10:52:31Z

Environment: NumPy 1.7 dev and pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.

import pandas as pd
import numpy as np
df = pd.DataFrame([(3,np.datetime64('2012-07-03')),(3,np.datetime64('2012-07-04'))], columns = ['a', 'date'])
df.groupby('a').first()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-56fc44df2d9e> in <module>()
      2 import numpy as np
      3 df = pd.DataFrame([(3,np.datetime64('2012-07-03')),(3,np.datetime64('2012-07-04'))], columns = ['a', 'date'])
----> 4 df.groupby('a').first()

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in f(self)
     25             return self._cython_agg_general(alias)
     26         except Exception:
---> 27             return self.aggregate(lambda x: npfunc(x, axis=self.axis))
     28 
     29     f.__doc__ = "Compute %s of group values" % name

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   1501                 return self._python_agg_general(arg, *args, **kwargs)
   1502             else:
-> 1503                 result = self._aggregate_generic(arg, *args, **kwargs)
   1504 
   1505         if not self.as_index:

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _aggregate_generic(self, func, *args, **kwargs)
   1564                     result[name] = data.apply(wrapper, axis=axis)
   1565 
-> 1566         return self._wrap_generic_output(result, obj)
   1567 
   1568     def _wrap_aggregated_output(self, output, names=None):

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _wrap_generic_output(self, result, obj)
   1763             if self.axis == 0:
   1764                 result = DataFrame(result, index=obj.columns,
-> 1765                                    columns=result_index).T
   1766             else:
   1767                 result = DataFrame(result, index=obj.index,

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    371             mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy)
    372         elif isinstance(data, dict):
--> 373             mgr = self._init_dict(data, index, columns, dtype=dtype)
    374         elif isinstance(data, ma.MaskedArray):
    375             mask = ma.getmaskarray(data)

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    459 
    460         # don't force copy because getting jammed in an ndarray anyway
--> 461         homogenized = _homogenize(data, index, columns, dtype)
    462 
    463         # from BlockManager perspective

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _homogenize(data, index, columns, dtype)
   4879 
   4880             v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 4881                                 raise_cast_failure=False)
   4882 
   4883         homogenized[k] = v

/usr/local/lib/python2.7/dist-packages/pandas-0.8.2.dev_f5a74d4-py2.7-linux-x86_64.egg/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   2724             else:
   2725                 subarr = np.empty(len(index), dtype=dtype)
-> 2726             subarr.fill(value)
   2727         else:
   2728             return subarr.item()

ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas


gerigk · 2012-08-01T11:08:02Z

Oddly, this variant succeeds:

import pandas as pd
import numpy as np
df = pd.DataFrame([(3,np.datetime64('2012-07-03 00:00:00')),(3,np.datetime64('2012-07-04 00:00:00'))], columns = ['a', 'date'])
df.date = df.date.astype('M8[ns]')
print df.dtypes
df.date = pd.to_datetime(df.date).astype(object)
df.groupby('a').first()

Without the .astype(object) it still fails. After pd.to_datetime the individual values are pandas.lib.Timestamp objects, but the Series dtype stays datetime64[ns].
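
For readers on a current pandas release, the object-cast workaround described above can be sketched roughly as follows. This is only an illustration in Python 3 syntax, not the original reproduction: the variable names `boxed` and `first_boxed` are made up here, and on recent versions the cast should no longer be needed at all.

```python
import pandas as pd

# Same shape of data as in the report: one group key, two dates.
df = pd.DataFrame({
    "a": [3, 3],
    "date": pd.to_datetime(["2012-07-03", "2012-07-04"]),
})

# Workaround from the comment above: box the values as Timestamp objects
# by casting the column to object before grouping.
# (`boxed` and `first_boxed` are illustrative names, not from the report.)
boxed = df.assign(date=df["date"].astype(object))
first_boxed = boxed.groupby("a").first()

print(first_boxed)
```

The cast trades the native datetime64 storage for Python objects, so it is best treated as a stopgap for affected versions rather than something to keep in place.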

wesm · 2012-08-09T03:14:33Z

Fixed various bugs causing this.
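
A quick way to check whether an installed pandas version includes the fix is a small regression-style check like the one below. It is a sketch under the assumption of a recent pandas and Python 3; the helper name `first_per_group` is hypothetical and not part of pandas or of the commit referenced in this thread.

```python
import pandas as pd


def first_per_group(frame, key):
    """Return the first row of each group (hypothetical helper for this check)."""
    return frame.groupby(key).first()


df = pd.DataFrame({
    "a": [3, 3, 4],
    "date": pd.to_datetime(["2012-07-03", "2012-07-04", "2012-07-05"]),
})

result = first_per_group(df, "a")

# On a fixed version the call should not raise, and the date column
# should keep a datetime64 dtype instead of falling back to object.
assert result["date"].dtype.kind == "M"
assert result.loc[3, "date"] == pd.Timestamp("2012-07-03")
assert result.loc[4, "date"] == pd.Timestamp("2012-07-05")
```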

wesm closed this as completed in 8cde377 Aug 9, 2012
