BUG/API: Fix operating with timedelta64/pd.offsets on rhs of a datelike series/index #4534

Merged
merged 4 commits into pydata:master

3 participants

@jreback

closes #4532
closes #4134
closes #4135
closes #4521

Timedeltas can be converted to other ‘frequencies’ by dividing by another timedelta.

In [210]: td = Series(date_range('20130101',periods=4))-Series(date_range('20121201',periods=4))

In [211]: td[2] += np.timedelta64(timedelta(minutes=5,seconds=3))

In [212]: td[3] = np.nan

In [213]: td

0   31 days, 00:00:00
1   31 days, 00:00:00
2   31 days, 00:05:03
3                 NaT
dtype: timedelta64[ns]

to days

In [214]: td / np.timedelta64(1,'D')

0    31.000000
1    31.000000
2    31.003507
3          NaN
dtype: float64

to seconds

In [215]: td / np.timedelta64(1,'s')

0    2678400
1    2678400
2    2678703
3        NaN
dtype: float64
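The frequency conversion above can be reproduced with a short self-contained sketch (using the public pandas/numpy API; behavior shown is the same as the transcript above):

```python
import numpy as np
import pandas as pd

# Frequency conversion: dividing a timedelta64[ns] Series by a
# timedelta64 scalar yields a unitless float64 Series.
td = pd.Series(pd.date_range('20130101', periods=4)) - \
     pd.Series(pd.date_range('20121201', periods=4))

days = td / np.timedelta64(1, 'D')   # each gap is 31 days
secs = td / np.timedelta64(1, 's')

print(days.tolist())   # [31.0, 31.0, 31.0, 31.0]
print(secs.dtype)      # float64
```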

Dividing or multiplying a timedelta64[ns] Series by an integer or integer Series yields another timedelta64[ns] dtyped Series.

In [216]: td * -1

0   -31 days, 00:00:00
1   -31 days, 00:00:00
2   -31 days, 00:05:03
3                  NaT
dtype: timedelta64[ns]

In [217]: td * Series([1,2,3,4])

0   31 days, 00:00:00
1   62 days, 00:00:00
2   93 days, 00:15:09
3                 NaT
dtype: timedelta64[ns]
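A minimal sketch of the integer mul/div behavior described above, with current pandas:

```python
import pandas as pd

# Multiplying or dividing a timedelta64[ns] Series by an integer
# (or an integer Series) keeps the timedelta64[ns] dtype.
td = pd.Series(pd.to_timedelta(['31 days', '31 days']))

print((td * -1).dtype)                    # timedelta64[ns]
print((td / 2).dtype)                     # timedelta64[ns]
print((td * pd.Series([1, 2])).iloc[1])   # 62 days 00:00:00
```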
jreback added some commits

@jreback BUG: (GH4532) Fix bug in having a rhs of np.timedelta64 or np.offsets.DateOffset when operating with datetimes (6f8550a)

@jreback BUG: (GH4134) Fix arithmetic with series/datetimeindex and np.timedelta64 not working the same
TST: add in @cpcloud tests related to GH4135 (4d2d571)

@jreback ENH: GH4521 A Series of dtype timedelta64[ns] can now be divided/multipled by an integer series (18bb625)

@jreback DOC/ENH: timedelta conversions & docs
TST: add tests/catch for non-absolute DateOffsets in timedelta operations (b7e80a5)
@jreback

@cpcloud any more thoughts on this?

I was thinking of moving all of the timedelta/date arithmetic out of core/series.py, maybe to core/ops.py (just to make the code cleaner).

cc @jtratner, IIRC you are creating a core/ops.py for something?

@jreback

bombs away

@jreback merged commit 9fc8636 into pydata:master
@cpcloud
Python for Data member

:+1:

@jtratner

@jreback which of these should be allowed with rops? E.g., it's clearly okay to do __radd__ and __rmul__ since those operations are commutative. Should rdiv work? What about rsub?

@jreback

I think the reversals are somewhat handled (e.g. an lhs of int and an rhs of td is ok when doing mul).
radd is always ok I think; rsub is never ok (except td - td, which is ok, but datetime and td are NOT ok).
For div, order IS important (e.g. integer / td is not allowed, but td / integer is).
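The ordering rules above mirror the semantics of the stdlib datetime/timedelta types, which this quick demonstration (plain Python, no pandas) confirms:

```python
from datetime import datetime, timedelta

td = timedelta(days=1)
dt = datetime(2013, 8, 13)

# radd/rmul are fine: the operations are commutative.
assert 2 * td == td * 2

# rsub: td - td is fine, but td - datetime is not.
assert td - timedelta(hours=1) == timedelta(hours=23)
try:
    td - dt
except TypeError:
    print('timedelta - datetime is not allowed')

# div: order matters -- td / int works, int / td does not.
assert td / 2 == timedelta(hours=12)
try:
    1 / td
except TypeError:
    print('int / timedelta is not allowed')
```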

@jtratner

@jreback so the problem is that, with how the arithmetic functions (and r* funcs in general) are set up:

t1 - t2
t2 - t1

get the same call signature for self, other in wrapper, and you only know what's going on because of the name variable that's passed along. So, would it work to just do:

# earlier
if r_op:
    external_op = op
    op = lambda x, y: external_op(y, x)

if r_op:
    rvalues, lvalues = self, other
else:
    lvalues, rvalues = self, other

and then flip the ops when they actually get passed to the operation (since all r ops can't currently be passed well through core.expressions.evaluate). This ends up with the following chain of operations [specific to series, because it's the only one that actually cares about left and right values]:

#r_op passed into arith_method
r_op = lambda x, y: operator.add(y, x)
internal_op = lambda a, b: r_op(b, a)
# inside method
rvalues, lvalues = self, other
# finally at end, gets called
internal_op(lvalues, rvalues) --> internal_op(other, self) --> r_op(self, other) --> operator.add(other, self)

That should make all the checks work for it, right?
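A runnable sketch of the double-flip chain described above (make_arith_method is a hypothetical stand-in for the real _arith_method wrapper):

```python
import operator

def make_arith_method(op, name):
    # flip the op and the operands once each, so the final call
    # sees the arguments in natural (lhs, rhs) order
    r_op = name.startswith('__r')
    if r_op:
        external_op = op
        op = lambda x, y: external_op(y, x)

    def wrapper(self, other):
        if r_op:
            rvalues, lvalues = self, other
        else:
            lvalues, rvalues = self, other
        return op(lvalues, rvalues)

    return wrapper

# the r-op closure that would be passed in for __rsub__
rsub = make_arith_method(lambda x, y: operator.sub(y, x), '__rsub__')

# (10).__rsub__(3) means "3 - 10"
print(rsub(10, 3))  # -7
```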

@jreback

can we add a flag to the wrapper to indicate whether it's a reversed-arg op (e.g. True for radd and False for add)?
then it's easy enough to flip them at the beginning to make the order correct?

@jtratner

@jreback yeah, that's what I outlined. There's already a flag (since it gets the name), it's just:

r_op = name[0] == "r" or name.startswith("__r")

and it happens when the method is bound, not when it's called, so the cost of the check is minimal.

@jreback

@jtratner I see now... getting late!

I think if you see a reversed op, do:

self, other = other, self

because the timedelta/datetime computations are very sensitive to the type of the lhs operand (even before values are calculated), e.g. lhs=td, rhs=integer -> radd -> lhs=integer, rhs=td will blow up as it doesn't know how to deal with this; it's required to have the td on the lhs (I mean it could deal with it, but it's just easier this way)

@jtratner

@jreback can't do that, because the other could be a scalar or something, right?

@jtratner

@jreback figured it out, it's all good

@jreback

great!

@jreback

aren't the r??? methods ONLY called if the regular method doesn't produce a valid result (i.e. returns NotImplemented), e.g.

td_series + integer -> ok we call add
integer + td_series -> call radd?

@jtratner

@jreback yes, but the call signature is the same both times

integer + td_series --> td_series.__radd__(integer) --> wrapper(td_series, integer)
td_series + integer --> td_series.__add__(integer) --> wrapper(td_series, integer)
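The identical call signature can be seen with a tiny stand-in class (Td here is hypothetical, just to show Python's dispatch):

```python
# __add__ and __radd__ both receive (self, other); only the method
# Python dispatches to differs, which is why the wrapper must rely
# on the name to know the operand order.
class Td:
    def __init__(self, ns):
        self.ns = ns
    def __add__(self, other):
        return ('add', self.ns, other)
    def __radd__(self, other):
        return ('radd', self.ns, other)

t = Td(5)
print(t + 3)   # ('add', 5, 3)   -> Td.__add__(t, 3)
print(3 + t)   # ('radd', 5, 3)  -> Td.__radd__(t, 3)
```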

@jreback

ahh...so that's ok (in this case then), but __name__ is radd?

@jtratner
@jreback

gotcha

FYI

minor rebasing of series2 onto master (incorporates json changes) and some doc cleanups

Commits on Aug 12, 2013
  1. @jreback BUG: (GH4532) Fix bug in having a rhs of np.timedelta64 or np.offsets.DateOffset when operating with datetimes
  2. @jreback BUG: (GH4134) Fix arithmetic with series/datetimeindex and np.timedelta64 not working the same
     TST: add in @cpcloud tests related to GH4135
  3. @jreback ENH: GH4521 A Series of dtype timedelta64[ns] can now be divided/multipled by an integer series
Commits on Aug 13, 2013
  1. @jreback DOC/ENH: timedelta conversions & docs
     TST: add tests/catch for non-absolute DateOffsets in timedelta operations
9 doc/source/release.rst
@@ -53,6 +53,11 @@ pandas 0.13
- Add ``rename`` and ``set_names`` methods to ``Index`` as well as
``set_names``, ``set_levels``, ``set_labels`` to ``MultiIndex``.
(:issue:`4039`)
+ - A Series of dtype ``timedelta64[ns]`` can now be divided/multiplied
+ by an integer series (:issue:`4521`)
+ - A Series of dtype ``timedelta64[ns]`` can now be divided by another
+ ``timedelta64[ns]`` object to yield a ``float64`` dtyped Series. This
+ is frequency conversion.
**API Changes**
@@ -166,6 +171,10 @@ pandas 0.13
- Fixed issue where individual ``names``, ``levels`` and ``labels`` could be
set on ``MultiIndex`` without validation (:issue:`3714`, :issue:`4039`)
- Fixed (:issue:`3334`) in pivot_table. Margins did not compute if values is the index.
+ - Fix bug in having a rhs of ``np.timedelta64`` or ``np.offsets.DateOffset`` when operating
+ with datetimes (:issue:`4532`)
+ - Fix arithmetic with series/datetimeindex and ``np.timedelta64`` not working the same (:issue:`4134`)
+ and buggy timedelta in numpy 1.6 (:issue:`4135`)
pandas 0.12
===========
88 doc/source/timeseries.rst
@@ -170,7 +170,7 @@ Take care, ``to_datetime`` may not act as you expect on mixed data:
.. ipython:: python
- pd.to_datetime([1, '1'])
+ to_datetime([1, '1'])
.. _timeseries.daterange:
@@ -297,7 +297,7 @@ the year or year and month as strings:
ts['2011-6']
-This type of slicing will work on a DataFrame with a ``DateTimeIndex`` as well. Since the
+This type of slicing will work on a DataFrame with a ``DateTimeIndex`` as well. Since the
partial string selection is a form of label slicing, the endpoints **will be** included. This
would include matching times on an included date. Here's an example:
@@ -1112,7 +1112,8 @@ Time Deltas
-----------
Timedeltas are differences in times, expressed in difference units, e.g. days,hours,minutes,seconds.
-They can be both positive and negative.
+They can be both positive and negative. :ref:`DateOffsets<timeseries.offsets>` that are absolute in nature
+(``Day, Hour, Minute, Second, Milli, Micro, Nano``) can be used as ``timedeltas``.
.. ipython:: python
@@ -1128,41 +1129,16 @@ They can be both positive and negative.
s - s.max()
s - datetime(2011,1,1,3,5)
s + timedelta(minutes=5)
+ s + Minute(5)
+ s + Minute(5) + Milli(5)
Getting scalar results from a ``timedelta64[ns]`` series
.. ipython:: python
- :suppress:
-
- from distutils.version import LooseVersion
-
-.. ipython:: python
y = s - s[0]
y
-.. code-block:: python
-
- if LooseVersion(np.__version__) <= '1.6.2':
- y.apply(lambda x: x.item().total_seconds())
- y.apply(lambda x: x.item().days)
- else:
- y.apply(lambda x: x / np.timedelta64(1, 's'))
- y.apply(lambda x: x / np.timedelta64(1, 'D'))
-
-.. note::
-
- As you can see from the conditional statement above, these operations are
- different in numpy 1.6.2 and in numpy >= 1.7. The ``timedelta64[ns]`` scalar
- type in 1.6.2 is much like a ``datetime.timedelta``, while in 1.7 it is a
- nanosecond based integer. A future version of pandas will make this
- transparent.
-
-.. note::
-
- In numpy >= 1.7 dividing a ``timedelta64`` array by another ``timedelta64``
- array will yield an array with dtype ``np.float64``.
-
Series of timedeltas with ``NaT`` values are supported
.. ipython:: python
@@ -1218,3 +1194,55 @@ issues). ``idxmin, idxmax`` are supported as well.
df.min().idxmax()
df.min(axis=1).idxmin()
+
+.. _timeseries.timedeltas_convert:
+
+Time Deltas & Conversions
+-------------------------
+
+.. versionadded:: 0.13
+
+Timedeltas can be converted to other 'frequencies' by dividing by another timedelta.
+These operations yield ``float64`` dtyped Series.
+
+.. ipython:: python
+
+ td = Series(date_range('20130101',periods=4))-Series(date_range('20121201',periods=4))
+ td[2] += np.timedelta64(timedelta(minutes=5,seconds=3))
+ td[3] = np.nan
+ td
+
+ # to days
+ td / np.timedelta64(1,'D')
+
+ # to seconds
+ td / np.timedelta64(1,'s')
+
+Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or integer Series
+yields another ``timedelta64[ns]`` dtyped Series.
+
+.. ipython:: python
+
+ td * -1
+ td * Series([1,2,3,4])
+
+Numpy < 1.7 Compatibility
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Numpy < 1.7 has a broken ``timedelta64`` type that does not correctly work
+for arithmetic. Pandas bypasses this, but for frequency conversion as above,
+you need to create the divisor yourself. The ``np.timedelta64`` type only
+has 1 argument, the number of **micro** seconds.
+
+The following are equivalent statements in the two versions of numpy.
+
+.. code-block:: python
+
+ from distutils.version import LooseVersion
+ if LooseVersion(np.__version__) <= '1.6.2':
+ y / np.timedelta64(86400*int(1e6))
+ y / np.timedelta64(int(1e6))
+ else:
+ y / np.timedelta64(1,'D')
+ y / np.timedelta64(1,'s')
+
34 doc/source/v0.13.0.txt
@@ -100,6 +100,40 @@ Enhancements
- Added a more informative error message when plot arguments contain
overlapping color and style arguments (:issue:`4402`)
+ - ``timedelta64[ns]`` operations
+
+ - A Series of dtype ``timedelta64[ns]`` can now be divided by another
+ ``timedelta64[ns]`` object to yield a ``float64`` dtyped Series. This
+ is frequency conversion. See :ref:`here<timeseries.timedeltas_convert>` for the docs.
+
+ .. ipython:: python
+
+ from datetime import timedelta
+ td = Series(date_range('20130101',periods=4))-Series(date_range('20121201',periods=4))
+ td[2] += np.timedelta64(timedelta(minutes=5,seconds=3))
+ td[3] = np.nan
+ td
+
+ # to days
+ td / np.timedelta64(1,'D')
+
+ # to seconds
+ td / np.timedelta64(1,'s')
+
+ - Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or integer Series
+
+ .. ipython:: python
+
+ td * -1
+ td * Series([1,2,3,4])
+
+ - Absolute ``DateOffset`` objects can act equivalently to ``timedeltas``
+
+ .. ipython:: python
+
+ from pandas import offsets
+ td + offsets.Minute(5) + offsets.Milli(5)
+
Bug Fixes
~~~~~~~~~
49 pandas/core/common.py
@@ -11,9 +11,11 @@
import pandas.algos as algos
import pandas.lib as lib
import pandas.tslib as tslib
-
+from distutils.version import LooseVersion
from pandas import compat
from pandas.compat import StringIO, BytesIO, range, long, u, zip, map
+from datetime import timedelta
+
from pandas.core.config import get_option
from pandas.core import array as pa
@@ -33,6 +35,10 @@ class PandasError(Exception):
class AmbiguousIndexError(PandasError, KeyError):
pass
+# versioning
+_np_version = np.version.short_version
+_np_version_under1p6 = LooseVersion(_np_version) < '1.6'
+_np_version_under1p7 = LooseVersion(_np_version) < '1.7'
_POSSIBLY_CAST_DTYPES = set([ np.dtype(t) for t in ['M8[ns]','m8[ns]','O','int8','uint8','int16','uint16','int32','uint32','int64','uint64'] ])
_NS_DTYPE = np.dtype('M8[ns]')
@@ -1144,7 +1150,45 @@ def _possibly_convert_platform(values):
def _possibly_cast_to_timedelta(value, coerce=True):
""" try to cast to timedelta64, if already a timedeltalike, then make
sure that we are [ns] (as numpy 1.6.2 is very buggy in this regards,
- don't force the conversion unless coerce is True """
+ don't force the conversion unless coerce is True
+
+ if coerce='compat' force a compatibility coercion (to timedeltas) if needed
+ """
+
+ # coercion compatability
+ if coerce == 'compat' and _np_version_under1p7:
+
+ def convert(td, type):
+
+ # we have an array with a non-object dtype
+ if hasattr(td,'item'):
+ td = td.astype(np.int64).item()
+ if td == tslib.iNaT:
+ return td
+ if dtype == 'm8[us]':
+ td *= 1000
+ return td
+
+ if td == tslib.compat_NaT:
+ return tslib.iNaT
+
+ # convert td value to a nanosecond value
+ d = td.days
+ s = td.seconds
+ us = td.microseconds
+
+ if dtype == 'object' or dtype == 'm8[ns]':
+ td = 1000*us + (s + d * 24 * 3600) * 10 ** 9
+ else:
+ raise ValueError("invalid conversion of dtype in np < 1.7 [%s]" % dtype)
+
+ return td
+
+ # < 1.7 coercion
+ if not is_list_like(value):
+ value = np.array([ value ])
+ dtype = value.dtype
+ return np.array([ convert(v,dtype) for v in value ], dtype='m8[ns]')
# deal with numpy not being able to handle certain timedelta operations
if isinstance(value,np.ndarray) and value.dtype.kind == 'm':
@@ -1154,6 +1198,7 @@ def _possibly_cast_to_timedelta(value, coerce=True):
# we don't have a timedelta, but we want to try to convert to one (but don't force it)
if coerce:
+
new_value = tslib.array_to_timedelta64(value.astype(object), coerce=False)
if new_value.dtype == 'i8':
value = np.array(new_value,dtype='timedelta64[ns]')
122 pandas/core/series.py
@@ -23,6 +23,7 @@
_ensure_index, _handle_legacy_indexes)
from pandas.core.indexing import (_SeriesIndexer, _check_bool_indexer,
_check_slice_bounds, _maybe_convert_indices)
+from pandas.tseries.offsets import DateOffset
from pandas.tseries.index import DatetimeIndex
from pandas.tseries.period import PeriodIndex, Period
from pandas import compat
@@ -84,81 +85,144 @@ def na_op(x, y):
def wrapper(self, other, name=name):
from pandas.core.frame import DataFrame
dtype = None
+ fill_value = tslib.iNaT
wrap_results = lambda x: x
lvalues, rvalues = self, other
is_timedelta_lhs = com.is_timedelta64_dtype(self)
is_datetime_lhs = com.is_datetime64_dtype(self)
+ is_integer_lhs = lvalues.dtype.kind in ['i','u']
if is_datetime_lhs or is_timedelta_lhs:
+ coerce = 'compat' if _np_version_under1p7 else True
+
# convert the argument to an ndarray
def convert_to_array(values):
if not is_list_like(values):
values = np.array([values])
inferred_type = lib.infer_dtype(values)
if inferred_type in set(['datetime64','datetime','date','time']):
+ # a datelike
if not (isinstance(values, pa.Array) and com.is_datetime64_dtype(values)):
values = tslib.array_to_datetime(values)
- elif inferred_type in set(['timedelta','timedelta64']):
- # need to convert timedelta to ns here
- # safest to convert it to an object arrany to process
- if not (isinstance(values, pa.Array) and com.is_timedelta64_dtype(values)):
- values = com._possibly_cast_to_timedelta(values)
+ elif inferred_type in set(['timedelta']):
+ # have a timedelta, convert to ns here
+ values = com._possibly_cast_to_timedelta(values, coerce=coerce)
+ elif inferred_type in set(['timedelta64']):
+ # have a timedelta64, make sure the dtype is ns
+ values = com._possibly_cast_to_timedelta(values, coerce=coerce)
elif inferred_type in set(['integer']):
+ # py3 compat where dtype is 'm' but is an integer
if values.dtype.kind == 'm':
values = values.astype('timedelta64[ns]')
+ elif name not in ['__truediv__','__div__','__mul__']:
+ raise TypeError("incompatible type for a datetime/timedelta operation [{0}]".format(name))
+ elif isinstance(values[0],DateOffset):
+ # handle DateOffsets
+ os = pa.array([ getattr(v,'delta',None) for v in values ])
+ mask = isnull(os)
+ if mask.any():
+ raise TypeError("cannot use a non-absolute DateOffset in "
+ "datetime/timedelta operations [{0}]".format(','.join([ com.pprint_thing(v) for v in values[mask] ])))
+ values = com._possibly_cast_to_timedelta(os, coerce=coerce)
else:
- values = pa.array(values)
+ raise TypeError("incompatible type [{0}] for a datetime/timedelta operation".format(pa.array(values).dtype))
+
return values
# convert lhs and rhs
lvalues = convert_to_array(lvalues)
rvalues = convert_to_array(rvalues)
- is_timedelta_rhs = com.is_timedelta64_dtype(rvalues)
is_datetime_rhs = com.is_datetime64_dtype(rvalues)
+ is_timedelta_rhs = com.is_timedelta64_dtype(rvalues) or (not is_datetime_rhs and _np_version_under1p7)
+ is_integer_rhs = rvalues.dtype.kind in ['i','u']
+ mask = None
- # 2 datetimes or 2 timedeltas
- if (is_timedelta_lhs and is_timedelta_rhs) or (is_datetime_lhs and
- is_datetime_rhs):
- if is_datetime_lhs and name != '__sub__':
+ # timedelta and integer mul/div
+ if (is_timedelta_lhs and is_integer_rhs) or (is_integer_lhs and is_timedelta_rhs):
+
+ if name not in ['__truediv__','__div__','__mul__']:
+ raise TypeError("can only operate on a timedelta and an integer for "
+ "division, but the operator [%s] was passed" % name)
+ dtype = 'timedelta64[ns]'
+ mask = isnull(lvalues) | isnull(rvalues)
+ lvalues = lvalues.astype(np.int64)
+ rvalues = rvalues.astype(np.int64)
+
+ # 2 datetimes
+ elif is_datetime_lhs and is_datetime_rhs:
+ if name != '__sub__':
raise TypeError("can only operate on a datetimes for subtraction, "
"but the operator [%s] was passed" % name)
- elif is_timedelta_lhs and name not in ['__add__','__sub__']:
- raise TypeError("can only operate on a timedeltas for "
- "addition and subtraction, but the operator [%s] was passed" % name)
dtype = 'timedelta64[ns]'
+ mask = isnull(lvalues) | isnull(rvalues)
+ lvalues = lvalues.view('i8')
+ rvalues = rvalues.view('i8')
- # we may have to convert to object unfortunately here
+ # 2 timedeltas
+ elif is_timedelta_lhs and is_timedelta_rhs:
mask = isnull(lvalues) | isnull(rvalues)
- if mask.any():
- def wrap_results(x):
- x = pa.array(x,dtype='timedelta64[ns]')
- np.putmask(x,mask,tslib.iNaT)
- return x
+
+ # time delta division -> unit less
+ if name in ['__div__','__truediv__']:
+ dtype = 'float64'
+ fill_value = np.nan
+ lvalues = lvalues.astype(np.int64).astype(np.float64)
+ rvalues = rvalues.astype(np.int64).astype(np.float64)
+
+ # another timedelta
+ elif name in ['__add__','__sub__']:
+ dtype = 'timedelta64[ns]'
+ lvalues = lvalues.astype(np.int64)
+ rvalues = rvalues.astype(np.int64)
+
+ else:
+ raise TypeError("can only operate on a timedeltas for "
+ "addition, subtraction, and division, but the operator [%s] was passed" % name)
# datetime and timedelta
- elif (is_timedelta_lhs and is_datetime_rhs) or (is_timedelta_rhs and is_datetime_lhs):
+ elif is_timedelta_rhs and is_datetime_lhs:
if name not in ['__add__','__sub__']:
- raise TypeError("can only operate on a timedelta and a datetime for "
+ raise TypeError("can only operate on a datetime with a rhs of a timedelta for "
"addition and subtraction, but the operator [%s] was passed" % name)
dtype = 'M8[ns]'
+ lvalues = lvalues.view('i8')
+ rvalues = rvalues.view('i8')
+
+ elif is_timedelta_lhs and is_datetime_rhs:
+
+ if name not in ['__add__']:
+ raise TypeError("can only operate on a timedelta and a datetime for "
+ "addition, but the operator [%s] was passed" % name)
+ dtype = 'M8[ns]'
+ lvalues = lvalues.view('i8')
+ rvalues = rvalues.view('i8')
else:
- raise ValueError('cannot operate on a series with out a rhs '
- 'of a series/ndarray of type datetime64[ns] '
- 'or a timedelta')
+ raise TypeError('cannot operate on a series with out a rhs '
+ 'of a series/ndarray of type datetime64[ns] '
+ 'or a timedelta')
- lvalues = lvalues.view('i8')
- rvalues = rvalues.view('i8')
+ # if we need to mask the results
+ if mask is not None:
+ if mask.any():
+ def f(x):
+ x = pa.array(x,dtype=dtype)
+ np.putmask(x,mask,fill_value)
+ return x
+ wrap_results = f
if isinstance(rvalues, Series):
- lvalues = lvalues.values
- rvalues = rvalues.values
+
+ if hasattr(lvalues,'values'):
+ lvalues = lvalues.values
+ if hasattr(rvalues,'values'):
+ rvalues = rvalues.values
if self.index.equals(other.index):
name = _maybe_match_name(self, other)
636 pandas/tests/test_series.py
@@ -1,10 +1,10 @@
# pylint: disable-msg=E1101,W0612
-from datetime import datetime, timedelta, date
-import os
+from datetime import datetime, timedelta
import operator
import unittest
import string
+from itertools import product, starmap
import nose
@@ -21,6 +21,7 @@
import pandas.core.series as smod
import pandas.lib as lib
+import pandas.core.common as com
import pandas.core.datetools as datetools
import pandas.core.nanops as nanops
@@ -38,6 +39,7 @@ def _skip_if_no_scipy():
except ImportError:
raise nose.SkipTest
+
def _skip_if_no_pytz():
try:
import pytz
@@ -63,8 +65,8 @@ def test_copy_name(self):
self.assertEquals(result.name, self.ts.name)
# def test_copy_index_name_checking(self):
- # # don't want to be able to modify the index stored elsewhere after
- # # making a copy
+ # don't want to be able to modify the index stored elsewhere after
+ # making a copy
# self.ts.index.name = None
# cp = self.ts.copy()
@@ -313,7 +315,7 @@ def test_constructor(self):
self.assertEqual(rs, xp)
# raise on MultiIndex GH4187
- m = MultiIndex.from_arrays([[1, 2], [3,4]])
+ m = MultiIndex.from_arrays([[1, 2], [3, 4]])
self.assertRaises(NotImplementedError, Series, m)
def test_constructor_empty(self):
@@ -445,7 +447,7 @@ def test_constructor_cast(self):
self.assertRaises(ValueError, Series, ['a', 'b', 'c'], dtype=float)
def test_constructor_dtype_nocast(self):
- # #1572
+ # 1572
s = Series([1, 2, 3])
s2 = Series(s, dtype=np.int64)
@@ -459,8 +461,8 @@ def test_constructor_dtype_datetime64(self):
s = Series(tslib.iNaT, dtype='M8[ns]', index=lrange(5))
self.assert_(isnull(s).all() == True)
- #### in theory this should be all nulls, but since
- #### we are not specifying a dtype is ambiguous
+ # in theory this should be all nulls, but since
+ # we are not specifying a dtype is ambiguous
s = Series(tslib.iNaT, index=lrange(5))
self.assert_(isnull(s).all() == False)
@@ -489,12 +491,14 @@ def test_constructor_dtype_datetime64(self):
self.assert_(s.dtype == 'M8[ns]')
# invalid astypes
- for t in ['s','D','us','ms']:
+ for t in ['s', 'D', 'us', 'ms']:
self.assertRaises(TypeError, s.astype, 'M8[%s]' % t)
# GH3414 related
- self.assertRaises(TypeError, lambda x: Series(Series(dates).astype('int')/1000000,dtype='M8[ms]'))
- self.assertRaises(TypeError, lambda x: Series(dates, dtype='datetime64'))
+ self.assertRaises(TypeError, lambda x: Series(
+ Series(dates).astype('int') / 1000000, dtype='M8[ms]'))
+ self.assertRaises(
+ TypeError, lambda x: Series(dates, dtype='datetime64'))
def test_constructor_dict(self):
d = {'a': 0., 'b': 1., 'c': 2.}
@@ -518,14 +522,17 @@ def test_constructor_subclass_dict(self):
def test_orderedDict_ctor(self):
# GH3283
- import pandas, random
+ import pandas
+ import random
data = OrderedDict([('col%s' % i, random.random()) for i in range(12)])
s = pandas.Series(data)
self.assertTrue(all(s.values == list(data.values())))
def test_orderedDict_subclass_ctor(self):
# GH3283
- import pandas, random
+ import pandas
+ import random
+
class A(OrderedDict):
pass
data = A([('col%s' % i, random.random()) for i in range(12)])
@@ -631,7 +638,8 @@ def test_getitem_get(self):
self.assertEqual(self.series[idx1], self.series[5])
self.assertEqual(self.objSeries[idx2], self.objSeries[5])
- self.assertEqual(self.series.get(-1), self.series.get(self.series.index[-1]))
+ self.assertEqual(
+ self.series.get(-1), self.series.get(self.series.index[-1]))
self.assertEqual(self.series[5], self.series.get(self.series.index[5]))
# missing
@@ -792,10 +800,10 @@ def test_getitem_dups_with_missing(self):
# breaks reindex, so need to use .ix internally
# GH 4246
- s = Series([1,2,3,4],['foo','bar','foo','bah'])
- expected = s.ix[['foo','bar','bah','bam']]
- result = s[['foo','bar','bah','bam']]
- assert_series_equal(result,expected)
+ s = Series([1, 2, 3, 4], ['foo', 'bar', 'foo', 'bah'])
+ expected = s.ix[['foo', 'bar', 'bah', 'bam']]
+ result = s[['foo', 'bar', 'bah', 'bam']]
+ assert_series_equal(result, expected)
def test_setitem_ambiguous_keyerror(self):
s = Series(lrange(10), index=lrange(0, 20, 2))
@@ -944,8 +952,8 @@ def test_reshape_non_2d(self):
self.assertRaises(TypeError, x.reshape, (len(x),))
# GH 2719
- a = Series([1,2,3,4])
- self.assertRaises(TypeError,a.reshape, 2, 2)
+ a = Series([1, 2, 3, 4])
+ self.assertRaises(TypeError, a.reshape, 2, 2)
def test_reshape_2d_return_array(self):
x = Series(np.random.random(201), name='x')
@@ -1098,76 +1106,83 @@ def test_where(self):
self.assertRaises(ValueError, s.where, cond[:3].values, -s)
# GH 2745
- s = Series([1,2])
- s[[True, False]] = [0,1]
- expected = Series([0,2])
- assert_series_equal(s,expected)
+ s = Series([1, 2])
+ s[[True, False]] = [0, 1]
+ expected = Series([0, 2])
+ assert_series_equal(s, expected)
# failures
- self.assertRaises(ValueError, s.__setitem__, tuple([[[True, False]]]), [0,2,3])
- self.assertRaises(ValueError, s.__setitem__, tuple([[[True, False]]]), [])
+ self.assertRaises(
+ ValueError, s.__setitem__, tuple([[[True, False]]]), [0, 2, 3])
+ self.assertRaises(
+ ValueError, s.__setitem__, tuple([[[True, False]]]), [])
# unsafe dtype changes
- for dtype in [ np.int8, np.int16, np.int32, np.int64, np.float16, np.float32, np.float64 ]:
+ for dtype in [np.int8, np.int16, np.int32, np.int64, np.float16, np.float32, np.float64]:
s = Series(np.arange(10), dtype=dtype)
mask = s < 5
- s[mask] = lrange(2,7)
- expected = Series(lrange(2,7) + lrange(5,10), dtype=dtype)
+ s[mask] = lrange(2, 7)
+ expected = Series(lrange(2, 7) + lrange(5, 10), dtype=dtype)
assert_series_equal(s, expected)
self.assertEquals(s.dtype, expected.dtype)
# these are allowed operations, but are upcasted
- for dtype in [ np.int64, np.float64 ]:
+ for dtype in [np.int64, np.float64]:
s = Series(np.arange(10), dtype=dtype)
mask = s < 5
- values = [2.5,3.5,4.5,5.5,6.5]
+ values = [2.5, 3.5, 4.5, 5.5, 6.5]
s[mask] = values
- expected = Series(values + lrange(5,10), dtype='float64')
+ expected = Series(values + lrange(5, 10), dtype='float64')
assert_series_equal(s, expected)
self.assertEquals(s.dtype, expected.dtype)
- # can't do these as we are forced to change the itemsize of the input to something we cannot
- for dtype in [ np.int8, np.int16, np.int32, np.float16, np.float32 ]:
+ # can't do these as we are forced to change the itemsize of the input
+ # to something we cannot
+ for dtype in [np.int8, np.int16, np.int32, np.float16, np.float32]:
s = Series(np.arange(10), dtype=dtype)
mask = s < 5
- values = [2.5,3.5,4.5,5.5,6.5]
+ values = [2.5, 3.5, 4.5, 5.5, 6.5]
self.assertRaises(Exception, s.__setitem__, tuple(mask), values)
# GH3235
- s = Series(np.arange(10),dtype='int64')
+ s = Series(np.arange(10), dtype='int64')
mask = s < 5
- s[mask] = lrange(2,7)
- expected = Series(lrange(2,7) + lrange(5,10),dtype='int64')
+ s[mask] = lrange(2, 7)
+ expected = Series(lrange(2, 7) + lrange(5, 10), dtype='int64')
assert_series_equal(s, expected)
self.assertEquals(s.dtype, expected.dtype)
- s = Series(np.arange(10),dtype='int64')
+ s = Series(np.arange(10), dtype='int64')
mask = s > 5
- s[mask] = [0]*4
- expected = Series([0,1,2,3,4,5] + [0]*4,dtype='int64')
- assert_series_equal(s,expected)
+ s[mask] = [0] * 4
+ expected = Series([0, 1, 2, 3, 4, 5] + [0] * 4, dtype='int64')
+ assert_series_equal(s, expected)
s = Series(np.arange(10))
mask = s > 5
- self.assertRaises(ValueError, s.__setitem__, mask, ([0]*5,))
+ self.assertRaises(ValueError, s.__setitem__, mask, ([0] * 5,))
def test_where_broadcast(self):
# Test a variety of differently sized series
for size in range(2, 6):
# Test a variety of boolean indices
- for selection in [np.resize([True, False, False, False, False], size), # First element should be set
- np.resize([True, False], size), # Set alternating elements]
- np.resize([False], size)]: # No element should be set
+ for selection in [np.resize([True, False, False, False, False], size), # First element should be set
+ # Set alternating elements]
+ np.resize([True, False], size),
+ np.resize([False], size)]: # No element should be set
# Test a variety of different numbers as content
for item in [2.0, np.nan, np.finfo(np.float).max, np.finfo(np.float).min]:
- # Test numpy arrays, lists and tuples as the input to be broadcast
+ # Test numpy arrays, lists and tuples as the input to be
+ # broadcast
for arr in [np.array([item]), [item], (item,)]:
data = np.arange(size, dtype=float)
s = Series(data)
s[selection] = arr
- # Construct the expected series by taking the source data or item based on the selection
- expected = Series([item if use_item else data[i] for i, use_item in enumerate(selection)])
- assert_series_equal(s,expected)
+ # Construct the expected series by taking the source
+ # data or item based on the selection
+ expected = Series([item if use_item else data[i]
+ for i, use_item in enumerate(selection)])
+ assert_series_equal(s, expected)
def test_where_inplace(self):
s = Series(np.random.randn(5))
@@ -1221,14 +1236,14 @@ def test_setitem_boolean(self):
# similiar indexed series
result = self.series.copy()
- result[mask] = self.series*2
- expected = self.series*2
+ result[mask] = self.series * 2
+ expected = self.series * 2
assert_series_equal(result[mask], expected[mask])
# needs alignment
result = self.series.copy()
- result[mask] = (self.series*2)[0:5]
- expected = (self.series*2)[0:5].reindex_like(self.series)
+ result[mask] = (self.series * 2)[0:5]
+ expected = (self.series * 2)[0:5].reindex_like(self.series)
expected[-mask] = self.series[mask]
assert_series_equal(result[mask], expected[mask])
@@ -1391,8 +1406,7 @@ def test_timeseries_periodindex(self):
prng = period_range('1/1/2011', '1/1/2012', freq='M')
ts = Series(np.random.randn(len(prng)), prng)
new_ts = pickle.loads(pickle.dumps(ts))
- self.assertEqual(new_ts.index.freq,'M')
-
+ self.assertEqual(new_ts.index.freq, 'M')
def test_iter(self):
for i, val in enumerate(self.series):
@@ -1501,19 +1515,19 @@ def test_argsort(self):
self.assert_(issubclass(argsorted.dtype.type, np.integer))
# GH 2967 (introduced bug in 0.11-dev I think)
- s = Series([Timestamp('201301%02d'% (i+1)) for i in range(5)])
+ s = Series([Timestamp('201301%02d' % (i + 1)) for i in range(5)])
self.assert_(s.dtype == 'datetime64[ns]')
shifted = s.shift(-1)
self.assert_(shifted.dtype == 'datetime64[ns]')
self.assert_(isnull(shifted[4]) == True)
result = s.argsort()
- expected = Series(lrange(5),dtype='int64')
- assert_series_equal(result,expected)
+ expected = Series(lrange(5), dtype='int64')
+ assert_series_equal(result, expected)
result = shifted.argsort()
- expected = Series(lrange(4) + [-1],dtype='int64')
- assert_series_equal(result,expected)
+ expected = Series(lrange(4) + [-1], dtype='int64')
+ assert_series_equal(result, expected)
def test_argsort_stable(self):
s = Series(np.random.randint(0, 100, size=10000))
@@ -1567,7 +1581,6 @@ def testit():
# add some NaNs
self.series[5:15] = np.NaN
-
# idxmax, idxmin, min, and max are valid for dates
if not ('max' in name or 'min' in name):
ds = Series(date_range('1/1/2001', periods=10))
@@ -1591,7 +1604,7 @@ def testit():
# 2888
l = [0]
- l.extend(lrange(2**40,2**40+1000))
+ l.extend(lrange(2 ** 40, 2 ** 40 + 1000))
s = Series(l, dtype='int64')
assert_almost_equal(float(f(s)), float(alternate(s.values)))
@@ -1748,50 +1761,52 @@ def test_invert(self):
def test_modulo(self):
# GH3590, modulo as ints
- p = DataFrame({ 'first' : [3,4,5,8], 'second' : [0,0,0,3] })
+ p = DataFrame({'first': [3, 4, 5, 8], 'second': [0, 0, 0, 3]})
result = p['first'] % p['second']
- expected = Series(p['first'].values % p['second'].values,dtype='float64')
+ expected = Series(p['first'].values %
+ p['second'].values, dtype='float64')
expected.iloc[0:3] = np.nan
- assert_series_equal(result,expected)
+ assert_series_equal(result, expected)
result = p['first'] % 0
- expected = Series(np.nan,index=p.index)
- assert_series_equal(result,expected)
+ expected = Series(np.nan, index=p.index)
+ assert_series_equal(result, expected)
p = p.astype('float64')
result = p['first'] % p['second']
expected = Series(p['first'].values % p['second'].values)
- assert_series_equal(result,expected)
+ assert_series_equal(result, expected)
p = p.astype('float64')
result = p['first'] % p['second']
result2 = p['second'] % p['first']
- self.assertFalse(np.array_equal(result,result2))
+ self.assertFalse(np.array_equal(result, result2))
def test_div(self):
# integer div, but deal with the 0's
- p = DataFrame({ 'first' : [3,4,5,8], 'second' : [0,0,0,3] })
+ p = DataFrame({'first': [3, 4, 5, 8], 'second': [0, 0, 0, 3]})
result = p['first'] / p['second']
- expected = Series(p['first'].values / p['second'].values,dtype='float64')
+ expected = Series(
+ p['first'].values / p['second'].values, dtype='float64')
expected.iloc[0:3] = np.inf
- assert_series_equal(result,expected)
+ assert_series_equal(result, expected)
result = p['first'] / 0
- expected = Series(np.inf,index=p.index)
- assert_series_equal(result,expected)
+ expected = Series(np.inf, index=p.index)
+ assert_series_equal(result, expected)
p = p.astype('float64')
result = p['first'] / p['second']
expected = Series(p['first'].values / p['second'].values)
- assert_series_equal(result,expected)
+ assert_series_equal(result, expected)
- p = DataFrame({ 'first' : [3,4,5,8], 'second' : [1,1,1,1] })
+ p = DataFrame({'first': [3, 4, 5, 8], 'second': [1, 1, 1, 1]})
result = p['first'] / p['second']
if compat.PY3:
- assert_series_equal(result,p['first'].astype('float64'))
+ assert_series_equal(result, p['first'].astype('float64'))
else:
- assert_series_equal(result,p['first'])
+ assert_series_equal(result, p['first'])
self.assertFalse(np.array_equal(result, p['second'] / p['first']))
def test_operators(self):
@@ -1845,146 +1860,325 @@ def test_operators_empty_int_corner(self):
def test_constructor_dtype_timedelta64(self):
- td = Series([ timedelta(days=i) for i in range(3) ])
- self.assert_(td.dtype=='timedelta64[ns]')
+ td = Series([timedelta(days=i) for i in range(3)])
+ self.assert_(td.dtype == 'timedelta64[ns]')
# mixed with NaT
from pandas import tslib
- td = Series([ timedelta(days=i) for i in range(3) ] + [ tslib.NaT ], dtype='m8[ns]' )
- self.assert_(td.dtype=='timedelta64[ns]')
+ td = Series([timedelta(days=i)
+ for i in range(3)] + [tslib.NaT], dtype='m8[ns]')
+ self.assert_(td.dtype == 'timedelta64[ns]')
- td = Series([ timedelta(days=i) for i in range(3) ] + [ tslib.iNaT ], dtype='m8[ns]' )
- self.assert_(td.dtype=='timedelta64[ns]')
+ td = Series([timedelta(days=i)
+ for i in range(3)] + [tslib.iNaT], dtype='m8[ns]')
+ self.assert_(td.dtype == 'timedelta64[ns]')
- td = Series([ timedelta(days=i) for i in range(3) ] + [ np.nan ], dtype='m8[ns]' )
- self.assert_(td.dtype=='timedelta64[ns]')
+ td = Series([timedelta(days=i)
+ for i in range(3)] + [np.nan], dtype='m8[ns]')
+ self.assert_(td.dtype == 'timedelta64[ns]')
# invalid astypes
- for t in ['s','D','us','ms']:
+ for t in ['s', 'D', 'us', 'ms']:
self.assertRaises(TypeError, td.astype, 'm8[%s]' % t)
# valid astype
td.astype('int64')
# this is an invalid casting
- self.assertRaises(Exception, Series, [ timedelta(days=i) for i in range(3) ] + [ 'foo' ], dtype='m8[ns]' )
+ self.assertRaises(Exception, Series, [timedelta(days=i)
+ for i in range(3)] + ['foo'], dtype='m8[ns]')
self.assertRaises(TypeError, td.astype, 'int32')
# leave as object here
- td = Series([ timedelta(days=i) for i in range(3) ] + [ 'foo' ])
- self.assert_(td.dtype=='object')
+ td = Series([timedelta(days=i) for i in range(3)] + ['foo'])
+ self.assert_(td.dtype == 'object')
def test_operators_timedelta64(self):
# invalid ops
self.assertRaises(Exception, self.objSeries.__add__, 1)
- self.assertRaises(Exception, self.objSeries.__add__, np.array(1,dtype=np.int64))
+ self.assertRaises(
+ Exception, self.objSeries.__add__, np.array(1, dtype=np.int64))
self.assertRaises(Exception, self.objSeries.__sub__, 1)
- self.assertRaises(Exception, self.objSeries.__sub__, np.array(1,dtype=np.int64))
+ self.assertRaises(
+ Exception, self.objSeries.__sub__, np.array(1, dtype=np.int64))
# series ops
v1 = date_range('2012-1-1', periods=3, freq='D')
v2 = date_range('2012-1-2', periods=3, freq='D')
rs = Series(v2) - Series(v1)
- xp = Series(1e9 * 3600 * 24, rs.index).astype('int64').astype('timedelta64[ns]')
+ xp = Series(1e9 * 3600 * 24, rs.index).astype(
+ 'int64').astype('timedelta64[ns]')
assert_series_equal(rs, xp)
- self.assert_(rs.dtype=='timedelta64[ns]')
+ self.assert_(rs.dtype == 'timedelta64[ns]')
- df = DataFrame(dict(A = v1))
- td = Series([ timedelta(days=i) for i in range(3) ])
- self.assert_(td.dtype=='timedelta64[ns]')
+ df = DataFrame(dict(A=v1))
+ td = Series([timedelta(days=i) for i in range(3)])
+ self.assert_(td.dtype == 'timedelta64[ns]')
# series on the rhs
result = df['A'] - df['A'].shift()
- self.assert_(result.dtype=='timedelta64[ns]')
+ self.assert_(result.dtype == 'timedelta64[ns]')
result = df['A'] + td
- self.assert_(result.dtype=='M8[ns]')
+ self.assert_(result.dtype == 'M8[ns]')
# scalar Timestamp on rhs
maxa = df['A'].max()
- tm.assert_isinstance(maxa,Timestamp)
+ tm.assert_isinstance(maxa, Timestamp)
- resultb = df['A']- df['A'].max()
- self.assert_(resultb.dtype=='timedelta64[ns]')
+ resultb = df['A'] - df['A'].max()
+ self.assert_(resultb.dtype == 'timedelta64[ns]')
# timestamp on lhs
result = resultb + df['A']
- expected = Series([Timestamp('20111230'),Timestamp('20120101'),Timestamp('20120103')])
- assert_series_equal(result,expected)
+ expected = Series(
+ [Timestamp('20111230'), Timestamp('20120101'), Timestamp('20120103')])
+ assert_series_equal(result, expected)
# datetimes on rhs
- result = df['A'] - datetime(2001,1,1)
- expected = Series([timedelta(days=4017+i) for i in range(3)])
- assert_series_equal(result,expected)
- self.assert_(result.dtype=='m8[ns]')
+ result = df['A'] - datetime(2001, 1, 1)
+ expected = Series([timedelta(days=4017 + i) for i in range(3)])
+ assert_series_equal(result, expected)
+ self.assert_(result.dtype == 'm8[ns]')
- d = datetime(2001,1,1,3,4)
+ d = datetime(2001, 1, 1, 3, 4)
resulta = df['A'] - d
- self.assert_(resulta.dtype=='m8[ns]')
+ self.assert_(resulta.dtype == 'm8[ns]')
# roundtrip
resultb = resulta + d
- assert_series_equal(df['A'],resultb)
+ assert_series_equal(df['A'], resultb)
# timedeltas on rhs
td = timedelta(days=1)
resulta = df['A'] + td
resultb = resulta - td
- assert_series_equal(resultb,df['A'])
- self.assert_(resultb.dtype=='M8[ns]')
+ assert_series_equal(resultb, df['A'])
+ self.assert_(resultb.dtype == 'M8[ns]')
# roundtrip
- td = timedelta(minutes=5,seconds=3)
+ td = timedelta(minutes=5, seconds=3)
resulta = df['A'] + td
resultb = resulta - td
- assert_series_equal(df['A'],resultb)
- self.assert_(resultb.dtype=='M8[ns]')
+ assert_series_equal(df['A'], resultb)
+ self.assert_(resultb.dtype == 'M8[ns]')
+
+ # inplace
+ value = rs[2] + np.timedelta64(timedelta(minutes=5, seconds=1))
+ rs[2] += np.timedelta64(timedelta(minutes=5, seconds=1))
+ self.assert_(rs[2] == value)
+
+ def test_timedeltas_with_DateOffset(self):
+
+ # GH 4532
+ # operate with pd.offsets
+ s = Series([Timestamp('20130101 9:01'), Timestamp('20130101 9:02')])
+
+ result = s + pd.offsets.Second(5)
+ expected = Series(
+ [Timestamp('20130101 9:01:05'), Timestamp('20130101 9:02:05')])
+ assert_series_equal(result, expected)
+
+ result = s + pd.offsets.Milli(5)
+ expected = Series(
+ [Timestamp('20130101 9:01:00.005'), Timestamp('20130101 9:02:00.005')])
+ assert_series_equal(result, expected)
+
+ result = s + pd.offsets.Minute(5) + pd.offsets.Milli(5)
+ expected = Series(
+ [Timestamp('20130101 9:06:00.005'), Timestamp('20130101 9:07:00.005')])
+ assert_series_equal(result, expected)
+
+ if not com._np_version_under1p7:
+
+ # operate with np.timedelta64 correctly
+ result = s + np.timedelta64(1, 's')
+ expected = Series(
+ [Timestamp('20130101 9:01:01'), Timestamp('20130101 9:02:01')])
+ assert_series_equal(result, expected)
+
+ result = s + np.timedelta64(5, 'ms')
+ expected = Series(
+ [Timestamp('20130101 9:01:00.005'), Timestamp('20130101 9:02:00.005')])
+ assert_series_equal(result, expected)
+
+ # valid DateOffsets
+ # valid DateOffsets
+ for do in ['Hour', 'Minute', 'Second', 'Day', 'Micro',
+ 'Milli', 'Nano']:
+ op = getattr(pd.offsets, do)
+ s + op(5)
+
+ # invalid DateOffsets
+ for do in ['Week', 'BDay', 'BQuarterEnd', 'BMonthEnd', 'BYearEnd',
+ 'BYearBegin', 'BQuarterBegin', 'BMonthBegin',
+ 'MonthEnd', 'YearBegin', 'YearEnd',
+ 'MonthBegin', 'QuarterBegin']:
+ op = getattr(pd.offsets, do)
+ self.assertRaises(TypeError, s.__add__, op(5))
+
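The `test_timedeltas_with_DateOffset` test above pins down which offsets are legal on a datelike Series: only fixed-span offsets (`Hour` through `Nano`) can be added, while calendar-relative ones (`Week`, `MonthEnd`, ...) raise `TypeError`. A minimal sketch of the intended behaviour, run against a pandas where this patch has landed:

```python
import pandas as pd

s = pd.Series([pd.Timestamp("2013-01-01 09:01"),
               pd.Timestamp("2013-01-01 09:02")])

# fixed-span offsets translate every element by an exact duration
shifted = s + pd.offsets.Second(5)
assert shifted[0] == pd.Timestamp("2013-01-01 09:01:05")
assert shifted[1] == pd.Timestamp("2013-01-01 09:02:05")
```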
+ def test_timedelta64_operations_with_timedeltas(self):
# td operate with td
- td1 = Series([timedelta(minutes=5,seconds=3)]*3)
- td2 = timedelta(minutes=5,seconds=4)
- result = td1-td2
- expected = Series([timedelta(seconds=0)]*3)-Series([timedelta(seconds=1)]*3)
- self.assert_(result.dtype=='m8[ns]')
+ td1 = Series([timedelta(minutes=5, seconds=3)] * 3)
+ td2 = timedelta(minutes=5, seconds=4)
+ result = td1 - td2
+ expected = Series([timedelta(seconds=0)] * 3) - Series(
+ [timedelta(seconds=1)] * 3)
+ self.assert_(result.dtype == 'm8[ns]')
+ assert_series_equal(result, expected)
+
+ # roundtrip
+ assert_series_equal(result + td2, td1)
+
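The roundtrip asserted in `test_timedelta64_operations_with_timedeltas` — subtracting a `timedelta` and adding it back — can be sketched standalone (assuming a pandas with this patch):

```python
import datetime
import pandas as pd

td1 = pd.Series([datetime.timedelta(minutes=5, seconds=3)] * 3)
td2 = datetime.timedelta(minutes=5, seconds=4)

# td - td stays timedelta64[ns]
diff = td1 - td2
assert str(diff.dtype) == "timedelta64[ns]"
# the subtraction round-trips exactly
assert (diff + td2).equals(td1)
```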
+ def test_timedelta64_operations_with_integers(self):
+
+ # GH 4521
+ # divide/multiply by integers
+ startdate = Series(date_range('2013-01-01', '2013-01-03'))
+ enddate = Series(date_range('2013-03-01', '2013-03-03'))
+
+ s1 = enddate - startdate
+ s1[2] = np.nan
+ s2 = Series([2, 3, 4])
+ expected = Series(s1.values.astype(np.int64) / s2, dtype='m8[ns]')
+ expected[2] = np.nan
+ result = s1 / s2
+ assert_series_equal(result, expected)
+
+ s2 = Series([20, 30, 40])
+ expected = Series(s1.values.astype(np.int64) / s2, dtype='m8[ns]')
+ expected[2] = np.nan
+ result = s1 / s2
+ assert_series_equal(result, expected)
+
+ result = s1 / 2
+ expected = Series(s1.values.astype(np.int64) / 2, dtype='m8[ns]')
+ expected[2] = np.nan
+ assert_series_equal(result, expected)
+
+ s2 = Series([20, 30, 40])
+ expected = Series(s1.values.astype(np.int64) * s2, dtype='m8[ns]')
+ expected[2] = np.nan
+ result = s1 * s2
+ assert_series_equal(result, expected)
+
+ for dtype in ['int32', 'int16', 'uint32', 'uint64', 'uint16', 'uint8']:
+ s2 = Series([20, 30, 40], dtype=dtype)
+ expected = Series(s1.values.astype(np.int64) * s2.astype(np.int64),
+ dtype='m8[ns]')
+ expected[2] = np.nan
+ result = s1 * s2
+ assert_series_equal(result, expected)
+
+ result = s1 * 2
+ expected = Series(s1.values.astype(np.int64) * 2, dtype='m8[ns]')
+ expected[2] = np.nan
+ assert_series_equal(result, expected)
+
+ result = s1 * -1
+ expected = Series(s1.values.astype(np.int64) * -1, dtype='m8[ns]')
+ expected[2] = np.nan
+ assert_series_equal(result, expected)
+
+ # invalid ops
+ for op in ['__truediv__', '__div__', '__mul__']:
+ sop = getattr(s1, op, None)
+ if sop is not None:
+ self.assertRaises(TypeError, sop, s2.astype(float))
+ self.assertRaises(TypeError, sop, 2.)
+
+ for op in ['__add__', '__sub__']:
+ sop = getattr(s1, op, None)
+ if sop is not None:
+ self.assertRaises(TypeError, sop, 1)
+ self.assertRaises(TypeError, sop, s2.values)
+
+ def test_timedelta64_conversions(self):
+ if com._np_version_under1p7:
+ raise nose.SkipTest("cannot use 2 argument form of "
+ "timedelta64 conversions with numpy < 1.7")
+
+ startdate = Series(date_range('2013-01-01', '2013-01-03'))
+ enddate = Series(date_range('2013-03-01', '2013-03-03'))
+
+ s1 = enddate - startdate
+ s1[2] = np.nan
+
+ for m in [1, 3, 10]:
+ for unit in ['D', 'h', 'm', 's', 'ms', 'us', 'ns']:
+ expected = s1.apply(lambda x: x / np.timedelta64(m, unit))
+ result = s1 / np.timedelta64(m, unit)
+ assert_series_equal(result, expected)
+
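The conversion idiom exercised by `test_timedelta64_conversions` — dividing a `timedelta64[ns]` Series by an `np.timedelta64` of the target unit — is the same one shown in the PR description. A small sketch (assuming numpy >= 1.7 and a pandas with this patch):

```python
import numpy as np
import pandas as pd

td = (pd.Series(pd.date_range("2013-03-01", periods=3))
      - pd.Series(pd.date_range("2013-01-01", periods=3)))

# dividing by a timedelta64 of the target unit yields float64
days = td / np.timedelta64(1, "D")
assert days.dtype == np.float64
assert days[0] == 59.0  # Jan (31 days) + Feb 2013 (28 days)
```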
+ def test_timedelta64_equal_timedelta_supported_ops(self):
+ ser = Series([Timestamp('20130301'), Timestamp('20130228 23:00:00'),
+ Timestamp('20130228 22:00:00'),
+ Timestamp('20130228 21:00:00')])
+
+ intervals = 'D', 'h', 'm', 's', 'us'
+ npy16_mappings = {'D': 24 * 60 * 60 * 1000000, 'h': 60 * 60 * 1000000,
+ 'm': 60 * 1000000, 's': 1000000, 'us': 1}
+
+ def timedelta64(*args):
+ if com._np_version_under1p7:
+ coeffs = np.array(args)
+ terms = np.array([npy16_mappings[interval]
+ for interval in intervals])
+ return np.timedelta64(coeffs.dot(terms))
+ return sum(starmap(np.timedelta64, zip(args, intervals)))
+
+ for op, d, h, m, s, us in product([operator.add, operator.sub],
+ *([range(2)] * 5)):
+ nptd = timedelta64(d, h, m, s, us)
+ pytd = timedelta(days=d, hours=h, minutes=m, seconds=s,
+ microseconds=us)
+ lhs = op(ser, nptd)
+ rhs = op(ser, pytd)
+
+ try:
+ assert_series_equal(lhs, rhs)
+ except:
+ raise AssertionError(
+ "invalid comparsion [op->{0},d->{1},h->{2},m->{3},s->{4},us->{5}]\n{6}\n{7}\n".format(op, d, h, m, s, us, lhs, rhs))
+
def test_operators_datetimelike(self):
- ### timedelta64 ###
- td1 = Series([timedelta(minutes=5,seconds=3)]*3)
- td2 = timedelta(minutes=5,seconds=4)
- for op in ['__mul__','__floordiv__','__truediv__','__div__','__pow__']:
- op = getattr(td1,op,None)
+ # timedelta64
+ td1 = Series([timedelta(minutes=5, seconds=3)] * 3)
+ td2 = timedelta(minutes=5, seconds=4)
+ for op in ['__mul__', '__floordiv__', '__pow__']:
+ op = getattr(td1, op, None)
if op is not None:
self.assertRaises(TypeError, op, td2)
td1 + td2
td1 - td2
-
- ### datetime64 ###
- dt1 = Series([Timestamp('20111230'),Timestamp('20120101'),Timestamp('20120103')])
- dt2 = Series([Timestamp('20111231'),Timestamp('20120102'),Timestamp('20120104')])
- for op in ['__add__','__mul__','__floordiv__','__truediv__','__div__','__pow__']:
- op = getattr(dt1,op,None)
- if op is not None:
- self.assertRaises(TypeError, op, dt2)
+ td1 / td2
+
+ # datetime64
+ dt1 = Series(
+ [Timestamp('20111230'), Timestamp('20120101'), Timestamp('20120103')])
+ dt2 = Series(
+ [Timestamp('20111231'), Timestamp('20120102'), Timestamp('20120104')])
+ for op in ['__add__', '__mul__', '__floordiv__', '__truediv__', '__div__', '__pow__']:
+ sop = getattr(dt1, op, None)
+ if sop is not None:
+ self.assertRaises(TypeError, sop, dt2)
dt1 - dt2
- ### datetime64 with timetimedelta ###
- for op in ['__mul__','__floordiv__','__truediv__','__div__','__pow__']:
- op = getattr(dt1,op,None)
- if op is not None:
- self.assertRaises(TypeError, op, td1)
+ # datetime64 with timedelta
+ for op in ['__mul__', '__floordiv__', '__truediv__', '__div__', '__pow__']:
+ sop = getattr(dt1, op, None)
+ if sop is not None:
+ self.assertRaises(TypeError, sop, td1)
dt1 + td1
dt1 - td1
- ### timetimedelta with datetime64 ###
- for op in ['__mul__','__floordiv__','__truediv__','__div__','__pow__']:
- op = getattr(td1,op,None)
- if op is not None:
- self.assertRaises(TypeError, op, dt1)
+ # timedelta with datetime64
+ for op in ['__sub__', '__mul__', '__floordiv__', '__truediv__', '__div__', '__pow__']:
+ sop = getattr(td1, op, None)
+ if sop is not None:
+ self.assertRaises(TypeError, sop, dt1)
+
+ # timedelta + datetime ok
td1 + dt1
- td1 - dt1
def test_timedelta64_functions(self):
@@ -1992,7 +2186,8 @@ def test_timedelta64_functions(self):
from pandas import date_range
# index min/max
- td = Series(date_range('2012-1-1', periods=3, freq='D'))-Timestamp('20120101')
+ td = Series(date_range('2012-1-1', periods=3, freq='D')) - \
+ Timestamp('20120101')
result = td.idxmin()
self.assert_(result == 0)
@@ -2011,31 +2206,30 @@ def test_timedelta64_functions(self):
self.assert_(result == 2)
# abs
- s1 = Series(date_range('20120101',periods=3))
- s2 = Series(date_range('20120102',periods=3))
- expected = Series(s2-s1)
+ s1 = Series(date_range('20120101', periods=3))
+ s2 = Series(date_range('20120102', periods=3))
+ expected = Series(s2 - s1)
# this fails as numpy returns timedelta64[us]
#result = np.abs(s1-s2)
- #assert_frame_equal(result,expected)
+ # assert_frame_equal(result,expected)
- result = (s1-s2).abs()
- assert_series_equal(result,expected)
+ result = (s1 - s2).abs()
+ assert_series_equal(result, expected)
# max/min
result = td.max()
- expected = Series([timedelta(2)],dtype='timedelta64[ns]')
- assert_series_equal(result,expected)
+ expected = Series([timedelta(2)], dtype='timedelta64[ns]')
+ assert_series_equal(result, expected)
result = td.min()
- expected = Series([timedelta(1)],dtype='timedelta64[ns]')
- assert_series_equal(result,expected)
-
+ expected = Series([timedelta(1)], dtype='timedelta64[ns]')
+ assert_series_equal(result, expected)
def test_sub_of_datetime_from_TimeSeries(self):
from pandas.core import common as com
from datetime import datetime
- a = Timestamp(datetime(1993,0o1,0o7,13,30,00))
+ a = Timestamp(datetime(1993, 1, 7, 13, 30, 0))
b = datetime(1993, 6, 22, 13, 30)
a = Series([a])
result = com._possibly_cast_to_timedelta(np.abs(a - b))
@@ -2044,7 +2238,7 @@ def test_sub_of_datetime_from_TimeSeries(self):
def test_timedelta64_nan(self):
from pandas import tslib
- td = Series([ timedelta(days=i) for i in range(10) ])
+ td = Series([timedelta(days=i) for i in range(10)])
# nan ops on timedeltas
td1 = td.copy()
@@ -2066,8 +2260,8 @@ def test_timedelta64_nan(self):
td1[2] = td[2]
self.assert_(isnull(td1[2]) == False)
- #### boolean setting
- #### this doesn't work, not sure numpy even supports it
+ # boolean setting
+ # this doesn't work, not sure numpy even supports it
#result = td[(td>np.timedelta64(timedelta(days=3))) & (td<np.timedelta64(timedelta(days=7)))] = np.nan
#self.assert_(isnull(result).sum() == 7)
@@ -2262,7 +2456,7 @@ def test_idxmin(self):
# datetime64[ns]
from pandas import date_range
- s = Series(date_range('20130102',periods=6))
+ s = Series(date_range('20130102', periods=6))
result = s.idxmin()
self.assert_(result == 0)
@@ -2292,7 +2486,7 @@ def test_idxmax(self):
self.assert_(isnull(allna.idxmax()))
from pandas import date_range
- s = Series(date_range('20130102',periods=6))
+ s = Series(date_range('20130102', periods=6))
result = s.idxmax()
self.assert_(result == 5)
@@ -2464,7 +2658,7 @@ def test_update(self):
df['c'] = np.nan
# this will fail as long as series is a sub-class of ndarray
- ##### df['c'].update(Series(['foo'],index=[0])) #####
+ # df['c'].update(Series(['foo'],index=[0])) #####
def test_corr(self):
_skip_if_no_scipy()
@@ -2579,7 +2773,7 @@ def test_dot(self):
index=['1', '2', '3'])
assert_series_equal(result, expected)
- #Check index alignment
+ # Check index alignment
b2 = b.reindex(index=reversed(b.index))
result = a.dot(b)
assert_series_equal(result, expected)
@@ -2589,7 +2783,7 @@ def test_dot(self):
self.assertTrue(np.all(result == expected.values))
assert_almost_equal(a.dot(b['2'].values), expected['2'])
- #Check series argument
+ # Check series argument
assert_almost_equal(a.dot(b['1']), expected['1'])
assert_almost_equal(a.dot(b2['1']), expected['1'])
@@ -2622,20 +2816,22 @@ def test_value_counts_nunique(self):
# GH 3002, datetime64[ns]
import pandas as pd
- f = StringIO("xxyyzz20100101PIE\nxxyyzz20100101GUM\nxxyyww20090101EGG\nfoofoo20080909PIE")
- df = pd.read_fwf(f, widths=[6,8,3], names=["person_id", "dt", "food"], parse_dates=["dt"])
+ f = StringIO(
+ "xxyyzz20100101PIE\nxxyyzz20100101GUM\nxxyyww20090101EGG\nfoofoo20080909PIE")
+ df = pd.read_fwf(f, widths=[6, 8, 3], names=[
+ "person_id", "dt", "food"], parse_dates=["dt"])
s = df.dt.copy()
result = s.value_counts()
self.assert_(result.index.dtype == 'datetime64[ns]')
# with NaT
- s = s.append(Series({ 4 : pd.NaT }))
+ s = s.append(Series({4: pd.NaT}))
result = s.value_counts()
self.assert_(result.index.dtype == 'datetime64[ns]')
# timedelta64[ns]
from datetime import timedelta
- td = df.dt-df.dt+timedelta(1)
+ td = df.dt - df.dt + timedelta(1)
result = td.value_counts()
#self.assert_(result.index.dtype == 'timedelta64[ns]')
self.assert_(result.index.dtype == 'int64')
@@ -3210,7 +3406,7 @@ def test_getitem_setitem_datetime_tz(self):
assert_series_equal(result, ts)
def test_getitem_setitem_periodindex(self):
- from pandas import period_range, Period
+ from pandas import period_range
N = 50
rng = period_range('1/1/1990', periods=N, freq='H')
ts = Series(np.random.randn(N), index=rng)
@@ -3319,9 +3515,9 @@ def test_cast_on_putmask(self):
# GH 2746
# need to upcast
- s = Series([1,2],index=[1,2],dtype='int64')
- s[[True, False]] = Series([0],index=[1],dtype='int64')
- expected = Series([0,2],index=[1,2],dtype='int64')
+ s = Series([1, 2], index=[1, 2], dtype='int64')
+ s[[True, False]] = Series([0], index=[1], dtype='int64')
+ expected = Series([0, 2], index=[1, 2], dtype='int64')
assert_series_equal(s, expected)
@@ -3441,7 +3637,7 @@ def test_apply(self):
tm.assert_series_equal(s, rs)
# index but no data
- s = Series(index=[1,2,3])
+ s = Series(index=[1, 2, 3])
rs = s.apply(lambda x: x)
tm.assert_series_equal(s, rs)
@@ -3467,66 +3663,78 @@ def test_apply_dont_convert_dtype(self):
def test_convert_objects(self):
- s = Series([1., 2, 3],index=['a','b','c'])
- result = s.convert_objects(convert_dates=False,convert_numeric=True)
+ s = Series([1., 2, 3], index=['a', 'b', 'c'])
+ result = s.convert_objects(convert_dates=False, convert_numeric=True)
assert_series_equal(result, s)
# force numeric conversion
r = s.copy().astype('O')
r['a'] = '1'
- result = r.convert_objects(convert_dates=False,convert_numeric=True)
+ result = r.convert_objects(convert_dates=False, convert_numeric=True)
assert_series_equal(result, s)
r = s.copy().astype('O')
r['a'] = '1.'
- result = r.convert_objects(convert_dates=False,convert_numeric=True)
+ result = r.convert_objects(convert_dates=False, convert_numeric=True)
assert_series_equal(result, s)
r = s.copy().astype('O')
r['a'] = 'garbled'
expected = s.copy()
expected['a'] = np.nan
- result = r.convert_objects(convert_dates=False,convert_numeric=True)
+ result = r.convert_objects(convert_dates=False, convert_numeric=True)
assert_series_equal(result, expected)
# GH 4119, not converting a mixed type (e.g.floats and object)
- s = Series([1, 'na', 3 ,4])
+ s = Series([1, 'na', 3, 4])
result = s.convert_objects(convert_numeric=True)
- expected = Series([1,np.nan,3,4])
+ expected = Series([1, np.nan, 3, 4])
assert_series_equal(result, expected)
- s = Series([1, '', 3 ,4])
+ s = Series([1, '', 3, 4])
result = s.convert_objects(convert_numeric=True)
- expected = Series([1,np.nan,3,4])
+ expected = Series([1, np.nan, 3, 4])
assert_series_equal(result, expected)
# dates
- s = Series([datetime(2001,1,1,0,0), datetime(2001,1,2,0,0), datetime(2001,1,3,0,0) ])
- s2 = Series([datetime(2001,1,1,0,0), datetime(2001,1,2,0,0), datetime(2001,1,3,0,0), 'foo', 1.0, 1, Timestamp('20010104'), '20010105'],dtype='O')
-
- result = s.convert_objects(convert_dates=True,convert_numeric=False)
- expected = Series([Timestamp('20010101'),Timestamp('20010102'),Timestamp('20010103')],dtype='M8[ns]')
+ s = Series(
+ [datetime(2001, 1, 1, 0, 0), datetime(2001, 1, 2, 0, 0), datetime(2001, 1, 3, 0, 0)])
+ s2 = Series([datetime(2001, 1, 1, 0, 0), datetime(2001, 1, 2, 0, 0), datetime(
+ 2001, 1, 3, 0, 0), 'foo', 1.0, 1, Timestamp('20010104'), '20010105'], dtype='O')
+
+ result = s.convert_objects(convert_dates=True, convert_numeric=False)
+ expected = Series(
+ [Timestamp('20010101'), Timestamp('20010102'), Timestamp('20010103')], dtype='M8[ns]')
assert_series_equal(result, expected)
- result = s.convert_objects(convert_dates='coerce',convert_numeric=False)
- result = s.convert_objects(convert_dates='coerce',convert_numeric=True)
+ result = s.convert_objects(
+ convert_dates='coerce', convert_numeric=False)
+ result = s.convert_objects(
+ convert_dates='coerce', convert_numeric=True)
assert_series_equal(result, expected)
- expected = Series([Timestamp('20010101'),Timestamp('20010102'),Timestamp('20010103'),lib.NaT,lib.NaT,lib.NaT,Timestamp('20010104'),Timestamp('20010105')],dtype='M8[ns]')
- result = s2.convert_objects(convert_dates='coerce',convert_numeric=False)
+ expected = Series(
+ [Timestamp(
+ '20010101'), Timestamp('20010102'), Timestamp('20010103'),
+ lib.NaT, lib.NaT, lib.NaT, Timestamp('20010104'), Timestamp('20010105')], dtype='M8[ns]')
+ result = s2.convert_objects(
+ convert_dates='coerce', convert_numeric=False)
assert_series_equal(result, expected)
- result = s2.convert_objects(convert_dates='coerce',convert_numeric=True)
+ result = s2.convert_objects(
+ convert_dates='coerce', convert_numeric=True)
assert_series_equal(result, expected)
# preserver all-nans (if convert_dates='coerce')
- s = Series(['foo','bar',1,1.0],dtype='O')
- result = s.convert_objects(convert_dates='coerce',convert_numeric=False)
- assert_series_equal(result,s)
+ s = Series(['foo', 'bar', 1, 1.0], dtype='O')
+ result = s.convert_objects(
+ convert_dates='coerce', convert_numeric=False)
+ assert_series_equal(result, s)
# preserver if non-object
- s = Series([1],dtype='float32')
- result = s.convert_objects(convert_dates='coerce',convert_numeric=False)
- assert_series_equal(result,s)
+ s = Series([1], dtype='float32')
+ result = s.convert_objects(
+ convert_dates='coerce', convert_numeric=False)
+ assert_series_equal(result, s)
#r = s.copy()
#r[0] = np.nan
@@ -3535,12 +3743,12 @@ def test_convert_objects(self):
# dateutil parses some single letters into today's value as a date
for x in 'abcdefghijklmnopqrstuvwxyz':
- s = Series([x])
- result = s.convert_objects(convert_dates='coerce')
- assert_series_equal(result,s)
- s = Series([x.upper()])
- result = s.convert_objects(convert_dates='coerce')
- assert_series_equal(result,s)
+ s = Series([x])
+ result = s.convert_objects(convert_dates='coerce')
+ assert_series_equal(result, s)
+ s = Series([x.upper()])
+ result = s.convert_objects(convert_dates='coerce')
+ assert_series_equal(result, s)
def test_apply_args(self):
s = Series(['foo,bar'])
@@ -3824,7 +4032,8 @@ def test_rename(self):
self.assert_(np.array_equal(renamed.index, ['a', 'foo', 'c', 'bar']))
# index with name
- renamer = Series(np.arange(4), index=Index(['a', 'b', 'c', 'd'], name='name'))
+ renamer = Series(
+ np.arange(4), index=Index(['a', 'b', 'c', 'd'], name='name'))
renamed = renamer.rename({})
self.assertEqual(renamed.index.name, renamer.index.name)
@@ -3971,7 +4180,6 @@ def cummin(x):
def cummax(x):
return np.maximum.accumulate(x)
- from itertools import product
a = pd.Series([False, False, False, True, True, False, False])
b = ~a
c = pd.Series([False] * len(b))
@@ -3996,7 +4204,6 @@ def cummax(x):
res = getattr(e, method)()
assert_series_equal(res, expecteds[method])
-
def test_replace(self):
N = 100
ser = Series(np.random.randn(N))
@@ -4136,7 +4343,8 @@ def test_interpolate_index_values(self):
expected = s.copy()
bad = isnull(expected.values)
good = -bad
- expected = Series(np.interp(vals[bad], vals[good], s.values[good]), index=s.index[bad])
+ expected = Series(
+ np.interp(vals[bad], vals[good], s.values[good]), index=s.index[bad])
assert_series_equal(result[bad], expected)
@@ -4167,13 +4375,13 @@ def test_diff(self):
assert_series_equal(rs, xp)
# datetime diff (GH3100)
- s = Series(date_range('20130102',periods=5))
- rs = s-s.shift(1)
+ s = Series(date_range('20130102', periods=5))
+ rs = s - s.shift(1)
xp = s.diff()
assert_series_equal(rs, xp)
# timedelta diff
- nrs = rs-rs.shift(1)
+ nrs = rs - rs.shift(1)
nxp = xp.diff()
assert_series_equal(nrs, nxp)
pandas/tseries/index.py
@@ -6,7 +6,8 @@
import numpy as np
-from pandas.core.common import isnull, _NS_DTYPE, _INT64_DTYPE
+from pandas.core.common import (isnull, _NS_DTYPE, _INT64_DTYPE,
+ is_list_like, _possibly_cast_to_timedelta)
from pandas.core.index import Index, Int64Index
import pandas.compat as compat
from pandas.compat import u
@@ -541,7 +542,7 @@ def __add__(self, other):
elif isinstance(other, (DateOffset, timedelta)):
return self._add_delta(other)
elif isinstance(other, np.timedelta64):
- raise NotImplementedError
+ return self._add_delta(other)
elif com.is_integer(other):
return self.shift(other)
else: # pragma: no cover
@@ -553,7 +554,7 @@ def __sub__(self, other):
elif isinstance(other, (DateOffset, timedelta)):
return self._add_delta(-other)
elif isinstance(other, np.timedelta64):
- raise NotImplementedError
+ return self._add_delta(-other)
elif com.is_integer(other):
return self.shift(-other)
else: # pragma: no cover
@@ -568,6 +569,9 @@ def _add_delta(self, delta):
utc = _utc()
if self.tz is not None and self.tz is not utc:
result = result.tz_convert(self.tz)
+ elif isinstance(delta, np.timedelta64):
+ new_values = self.to_series() + delta
+ result = DatetimeIndex(new_values, tz=self.tz, freq='infer')
else:
new_values = self.astype('O') + delta
result = DatetimeIndex(new_values, tz=self.tz, freq='infer')
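With the `__add__`/`__sub__`/`_add_delta` changes above, a `np.timedelta64` on the rhs of a `DatetimeIndex` now goes through `_add_delta` instead of raising `NotImplementedError`, matching what a `DateOffset` produces. A minimal sketch of the resulting behaviour (assuming a pandas with this patch):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=3, freq="D")

# np.timedelta64 on the rhs shifts the whole index
result = idx - np.timedelta64(1, "h")
assert result[0] == pd.Timestamp("2012-12-31 23:00")

# equivalent to the corresponding fixed-span DateOffset
assert result.equals(idx - pd.offsets.Hour(1))
```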
pandas/tseries/tests/test_timeseries.py
@@ -2296,6 +2296,16 @@ def test_timedelta(self):
expected = index + timedelta(-1)
self.assert_(result.equals(expected))
+ # GH4134, buggy with timedeltas
+ rng = date_range('2013', '2014')
+ s = Series(rng)
+ result1 = rng - pd.offsets.Hour(1)
+ result2 = DatetimeIndex(s - np.timedelta64(100000000))
+ result3 = rng - np.timedelta64(100000000)
+ result4 = DatetimeIndex(s - pd.offsets.Hour(1))
+ self.assert_(result1.equals(result4))
+ self.assert_(result2.equals(result3))
+
def test_shift(self):
ts = Series(np.random.randn(5),
index=date_range('1/1/2000', periods=5, freq='H'))
pandas/tslib.pyx
@@ -45,6 +45,8 @@ PyDateTime_IMPORT
cdef int64_t NPY_NAT = util.get_nat()
+# < numpy 1.7 compat for NaT
+compat_NaT = np.array([NPY_NAT]).astype('m8[ns]').item()
try:
basestring