# Sum of TimedeltaIndex raises TypeError: len() of unsized object #25282

Closed
H-SG opened this issue Feb 12, 2019 · 6 comments

### H-SG commented Feb 12, 2019

#### Example code

```
import pandas as pd
import numpy as np

dti = pd.date_range('2018-01-01', periods=100, freq='T')
dtd = dti[1:] - dti[:-1]

print(np.sum(dtd))
```

#### Problem description

Summing a TimedeltaIndex with `np.sum` fails when using pandas 0.24.0 or newer with any version of numpy, or pandas 0.23.4 with any version of numpy newer than 1.15.4. `np.sum(dtd)` gives the following output:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-d64fe48d0fec> in <module>
6
7 print(np.mean(dtd)) # works
----> 8 print(np.sum(dtd)) # breaks

~/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial)
2074
2075     return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
-> 2076                           initial=initial)
2077
2078

~/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
84                 return reduction(axis=axis, out=out, **passkwargs)
85
---> 86     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
87
88

~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in __array_wrap__(self, result, context)
658         attrs = self._get_attributes_dict()
659         attrs = self._maybe_update_attributes(attrs)
--> 660         return Index(result, **attrs)
661

~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, fastpath, tupleize_cols, **kwargs)
301                   (dtype is not None and is_timedelta64_dtype(dtype))):
302                 from pandas.core.indexes.timedeltas import TimedeltaIndex
--> 303                 result = TimedeltaIndex(data, copy=copy, name=name, **kwargs)
304                 if dtype is not None and _o_dtype == dtype:
305                     return Index(result.to_pytimedelta(), dtype=_o_dtype)

~/.local/lib/python3.6/site-packages/pandas/core/indexes/timedeltas.py in __new__(cls, data, unit, freq, start, end, periods, closed, dtype, copy, name, verify_integrity)
250
251         # check that we are matching freqs
--> 252         if verify_integrity and len(data) > 0:
253             if freq is not None and not freq_infer:
254                 index = cls._simple_new(data, name=name)

TypeError: len() of unsized object
```

Other methods such as `np.mean()` and `np.median()` continue to function normally.

#### Expected Output

`5940000000000 nanoseconds`
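The expected figure can be cross-checked without pandas. The following is a minimal sketch that uses plain numpy arrays as a stand-in for the `DatetimeIndex` (an assumption made for illustration; the variable names are not from the report). 100 timestamps one minute apart give 99 one-minute differences, so the sum should be 99 × 60 s = 5940 s.

```python
import numpy as np

# 100 timestamps spaced one minute apart, standing in for
# pd.date_range('2018-01-01', periods=100, freq='T') (illustrative assumption).
stamps = np.datetime64("2018-01-01T00:00") + np.arange(100).astype("timedelta64[m]")

# Element-wise differences between neighbours: 99 values of one minute each.
deltas = stamps[1:] - stamps[:-1]

# Summing a plain timedelta64 ndarray returns a numpy.timedelta64 scalar.
total = np.sum(deltas)

# Express the total in nanoseconds to compare with the expected output.
total_ns = int(total.astype("timedelta64[ns]").astype("int64"))
print(total_ns, "nanoseconds")
```

This only checks the arithmetic; it does not exercise the `TimedeltaIndex` code path that raises the error.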

#### Output of `pd.show_versions()`

``````
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: None
pip: 19.0.2
setuptools: 40.8.0
Cython: None
numpy: 1.16.1
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.1
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
gcsfs: None
``````
Member

### WillAyd commented Feb 14, 2019

 Can you post versions from both environments where you had the issue? Seems kind of strange that the version of numpy only matters for one of the two pandas versions you've tested

Author

### H-SG commented Feb 14, 2019

 It's the same environment. I was updating my numpy and pandas libraries when I ran into the issue; all other packages remained the same. Playing around with installing specific versions of each led to the strange combination of working versions.

Member

### jorisvandenbossche commented Feb 14, 2019

 The above code indeed shows a regression in my 0.23.4 vs master environments. I have numpy 1.13 in both of them, so I don't think this is related to numpy. It's the repr that fails, and it seems to be related to the fact that it now returns an "invalid" TimedeltaIndex, instead of a numpy.timedelta64 scalar:

```
In [8]: pd.__version__
Out[8]: '0.23.4'

In [9]: res = np.sum(dtd)

In [10]: type(res)
Out[10]: numpy.timedelta64
```

vs

```
In [15]: pd.__version__
Out[15]: '0.25.0.dev0+113.gb8306f19d.dirty'

In [16]: res = np.sum(dtd)

In [17]: type(res)
Out[17]: pandas.core.indexes.timedeltas.TimedeltaIndex

In [18]: res.values
Out[18]: array(5940000000000, dtype='timedelta64[ns]')

In [19]: res.values.ndim
Out[19]: 0
```
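The distinction between a zero-dimensional array and a scalar can be reproduced with numpy alone. A minimal sketch (plain numpy, no pandas; variable names are illustrative): a zero-dimensional `timedelta64` array holds a single value but has no length, which is the situation `len(data)` runs into in the traceback from the original report.

```python
import numpy as np

# A zero-dimensional array, like the `res.values` shown above.
zero_dim = np.array(5940000000000, dtype="timedelta64[ns]")
print(zero_dim.ndim)   # number of dimensions of the array
print(zero_dim.shape)  # empty shape tuple for a zero-dimensional array

# len() is undefined for zero-dimensional arrays and raises TypeError.
try:
    len(zero_dim)
    length_error = None
except TypeError as exc:
    length_error = str(exc)
print(length_error)

# Indexing with an empty tuple extracts the numpy scalar from the array.
scalar = zero_dim[()]
print(type(scalar))
```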
Member

### jorisvandenbossche commented Feb 14, 2019

 Might be related to the fact that the TimedeltaIndex/TimedeltaArray constructor no longer fails on scalars, but produces an invalid index/array:

```
In [12]: pd.__version__
Out[12]: '0.23.4'

In [13]: idx = pd.TimedeltaIndex(np.array(1, dtype='timedelta64[ns]'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 idx = pd.TimedeltaIndex(np.array(1, dtype='timedelta64[ns]'))

~/miniconda3/lib/python3.5/site-packages/pandas/core/indexes/timedeltas.py in __new__(cls, data, unit, freq, start, end, periods, closed, dtype, copy, name, verify_integrity)
    250
    251         # check that we are matching freqs
--> 252         if verify_integrity and len(data) > 0:
    253             if freq is not None and not freq_infer:
    254                 index = cls._simple_new(data, name=name)

TypeError: len() of unsized object
```

vs

```
In [29]: pd.__version__
Out[29]: '0.25.0.dev0+113.gb8306f19d.dirty'

In [30]: idx = pd.TimedeltaIndex(np.array(1, dtype='timedelta64[ns]'))

In [31]: idx.array._data
Out[31]: array(1, dtype='timedelta64[ns]')
```

cc @TomAugspurger @jbrockmendel seems EA refactor related
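As a rough illustration of what "failing on scalars" could look like, here is a sketch of a dimensionality check. The helper `ensure_1d_timedeltas` is hypothetical and written with plain numpy; it is not the pandas constructor, and it is not taken from any pandas PR.

```python
import numpy as np


def ensure_1d_timedeltas(values):
    """Hypothetical helper: accept only 1-dimensional timedelta64[ns] input."""
    arr = np.asarray(values, dtype="timedelta64[ns]")
    if arr.ndim != 1:
        # Reject zero-dimensional (scalar-like) and multi-dimensional input
        # up front instead of building an invalid container around it.
        raise ValueError(f"expected 1-dimensional input, got ndim={arr.ndim}")
    return arr


# A regular 1-D input passes through unchanged in shape.
ok = ensure_1d_timedeltas([1, 2, 3])
print(ok.shape)

# A zero-dimensional array is rejected with an explicit error.
try:
    ensure_1d_timedeltas(np.array(1, dtype="timedelta64[ns]"))
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Raising early gives a clear message at construction time, rather than a later failure such as the `len()` error when the object is printed.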
Member

### jbrockmendel commented Feb 15, 2019

 @jorisvandenbossche I think you're right that the TimedeltaArray constructor is behaving oddly. Would `__array_ufunc__` be the place to look to figure out why `np.sum(tdi)` behaves differently from `np.sum(tdi._data)`?

### jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Feb 15, 2019

``` Fix+test pandas-dev#25282, pandas-dev#25317 ```
``` 3fa3e81 ```
Member

### jorisvandenbossche commented Feb 15, 2019

 > Would `__array_ufunc__` be the place to look to figure out why `np.sum(tdi)` behaves differently from `np.sum(tdi._data)`?

 Long term, yes. But short term, simply fixing the constructor should also fix the sum bug (due to the way numpy deals with it), as you already did in the PR in the meantime.

### jorisvandenbossche added a commit that referenced this issue Feb 20, 2019

``` REGR: fix TimedeltaIndex sum and datetime subtraction with NaT (#25282, #25317) (#25329) ```
``` def8b96 ```

### meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this issue Feb 20, 2019

``` Backport PR pandas-dev#25329: REGR: fix TimedeltaIndex sum and datetime subtraction with NaT (pandas-dev#25282, pandas-dev#25317) ```
``` 79d7dde ```

### jorisvandenbossche added a commit that referenced this issue Feb 20, 2019

``` Backport PR #25329: REGR: fix TimedeltaIndex sum and datetime subtraction with NaT (#25282, #25317) (#25385) ```
``` 31fa6f8 ```

