Vectorised addition of MonthOffset(n=0) returns different values to item-by-item addition #11370

Closed
rekcahpassyla opened this Issue Oct 19, 2015 · 4 comments

Contributor

rekcahpassyla commented Oct 19, 2015

This code returns different values in 0.17.0 and 0.15.2:

import pandas as pd
from pandas.util.testing import assert_index_equal

pd.show_versions()

offsets = [
    pd.offsets.Day, pd.offsets.MonthBegin,
    pd.offsets.QuarterBegin, pd.offsets.YearBegin,
]

dates = pd.date_range('2011-01-01', '2011-01-05', freq='D')

for offset in offsets:
    # adding each item individually or vectorised should give same answer
    expected_vec = dates + offset(n=0)
    expected = pd.DatetimeIndex([d + offset(n=0) for d in dates])

    msg = "offset: {}, vectorised: {}, individual: {}".format(
        offset, expected_vec, expected
    )
    try:
        if pd.__version__ == '0.17.0':
            assert_index_equal(expected_vec, expected, check_names=False)
        else:
            assert_index_equal(expected_vec, expected)
    except AssertionError as er:
        raise Exception(msg + str(er))

0.17.0

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.0
setuptools: 18.0.1
Cython: 0.22
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
Traceback (most recent call last):
  File "c:\dev\code\sandbox\pandas_17_vs_15_dateoffsets.py", line 24, in <module>
    raise Exception(msg + str(er))
Exception: offset: <class 'pandas.tseries.offsets.MonthBegin'>, vectorised: DatetimeIndex(['2010-12-01', '2011-01-01', '2011-01-01', '2011-01-01',
               '2011-01-01'],
              dtype='datetime64[ns]', freq=None), individual: DatetimeIndex(['2011-01-01', '2011-02-01', '2011-02-01', '2011-02-01',
               '2011-02-01'],
              dtype='datetime64[ns]', freq=None)Index are different

Index values are different (100.0 %)
[left]:  DatetimeIndex(['2010-12-01', '2011-01-01', '2011-01-01', '2011-01-01',
               '2011-01-01'],
              dtype='datetime64[ns]', freq=None)
[right]: DatetimeIndex(['2011-01-01', '2011-02-01', '2011-02-01', '2011-02-01',
               '2011-02-01'],
              dtype='datetime64[ns]', freq=None)

0.15.2

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.15.2
nose: 1.3.7
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None

rekcahpassyla changed the title from Vectorised addition of `MonthOffset` returns different values to item-by-item addition to Vectorised addition of `MonthOffset(n=0)` returns different values to item-by-item addition Oct 19, 2015

Contributor

chris-b1 commented Oct 19, 2015

This is from #10744, I didn't have the n=0 semantics right (and apparently didn't test!). It'll be a couple days, but I'll submit a fix.
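
For reference, the n=0 behaviour that the item-by-item results in this report follow can be modelled without pandas. The sketch below is only an illustration based on the outputs posted above, not pandas's implementation. The helpers month_begin_n0 and month_end_n0 are hypothetical names introduced here. A date already on the anchor stays put; any other date rolls forward to the next anchor.

```python
import calendar
from datetime import date, timedelta


def month_begin_n0(d):
    """Hypothetical model of `d + MonthBegin(n=0)` as the item-by-item output shows it."""
    if d.day == 1:
        # Already on a month start: n=0 leaves the date unchanged.
        return d
    # Otherwise roll forward to the first day of the following month.
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, 1)


def month_end_n0(d):
    """Hypothetical model of `d + MonthEnd(n=0)`: roll forward to the end of d's month."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=last_day)


# Same five dates as the report: 2011-01-01 through 2011-01-05.
dates = [date(2011, 1, 1) + timedelta(days=i) for i in range(5)]
begin = [month_begin_n0(d) for d in dates]
end = [month_end_n0(d) for d in dates]
```

Here begin matches the "individual" MonthBegin values in the traceback above (2011-01-01, then 2011-02-01 four times). The vectorised result in 0.17.0 instead lands one month earlier for every date.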

jreback added this to the 0.17.1 milestone Oct 19, 2015

Contributor

rekcahpassyla commented Oct 19, 2015

Many thanks for the quick response!

Contributor

rekcahpassyla commented Oct 19, 2015

MonthEnd is also not working:

Test script

import pandas as pd
from pandas.util.testing import assert_index_equal

pd.show_versions()

offsets = [
    pd.offsets.MonthEnd,
    pd.offsets.QuarterEnd, pd.offsets.YearEnd,
]


dates = pd.date_range('2011-01-01', '2011-01-05', freq='D')

for offset in offsets:
    # adding each item individually or vectorised should give same answer
    expected_vec = dates + offset(n=0)
    expected = pd.DatetimeIndex([d + offset(n=0) for d in dates])

    msg = "offset: {}, vectorised: {}, individual: {}".format(
        offset, expected_vec, expected
    )
    try:
        if pd.__version__ == '0.17.0':
            assert_index_equal(expected_vec, expected, check_names=False)
        else:
            assert_index_equal(expected_vec, expected)
    except AssertionError as er:
        raise Exception(msg + str(er))

0.17.0

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.0
setuptools: 18.0.1
Cython: 0.22
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
Traceback (most recent call last):
  File "c:\dev\code\sandbox\pandas_17_vs_15_dateoffsets.py", line 34, in <module>
    raise Exception(msg + str(er))
Exception: offset: <class 'pandas.tseries.offsets.MonthEnd'>, vectorised: DatetimeIndex(['2010-12-31', '2010-12-31', '2010-12-31', '2010-12-31',
               '2010-12-31'],
              dtype='datetime64[ns]', freq=None), individual: DatetimeIndex(['2011-01-31', '2011-01-31', '2011-01-31', '2011-01-31',
               '2011-01-31'],
              dtype='datetime64[ns]', freq=None)Index are different

Index values are different (100.0 %)
[left]:  DatetimeIndex(['2010-12-31', '2010-12-31', '2010-12-31', '2010-12-31',
               '2010-12-31'],
              dtype='datetime64[ns]', freq=None)
[right]: DatetimeIndex(['2011-01-31', '2011-01-31', '2011-01-31', '2011-01-31',
               '2011-01-31'],
              dtype='datetime64[ns]', freq=None)

0.15.2

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.15.2
nose: 1.3.7
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
Contributor

chris-b1 commented Oct 19, 2015

Probably wrong for YearEnd and QuarterEnd too, as the counting logic is shared IIRC.

jreback changed the title from Vectorised addition of `MonthOffset(n=0)` returns different values to item-by-item addition to Vectorised addition of MonthOffset(n=0) returns different values to item-by-item addition Oct 19, 2015

jreback modified the milestone: Next Major Release, 0.17.1 Nov 13, 2015

jreback modified the milestone: 0.18.0, Next Major Release Dec 1, 2015

jreback closed this in #11427 Dec 13, 2015
