DatetimeIndex: 'freq' inconsistencies #19172

Closed
luca-s opened this issue Jan 10, 2018 · 4 comments

@luca-s
Contributor

luca-s commented Jan 10, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
from pandas.tseries.offsets import BDay

rng = pd.date_range('2012-01-05', '2012-01-10', freq=BDay())
index = pd.DatetimeIndex(rng)
print(index)

index2 = index + pd.Timedelta('1D')
print(index2)

Problem description

The line index + pd.Timedelta('1D') adds 1 day to a DatetimeIndex that has freq='B' but the returned index contains non-business days:

DatetimeIndex(['2012-01-06', '2012-01-07', '2012-01-10', '2012-01-11'], dtype='datetime64[ns]', freq='B')

Please note that the freq property is B.

This behavior is very inconsistent:

  • Why isn't freq considered when performing computation (+/- Timedelta) on the DatetimeIndex?
  • Why doesn't freq reflect the actual data contained in the DatetimeIndex? (It says 'B' even though the index contains non-business days.)

Expected Output

DatetimeIndex(['2012-01-05', '2012-01-06', '2012-01-09', '2012-01-10'], dtype='datetime64[ns]', freq='B')
DatetimeIndex(['2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11'], dtype='datetime64[ns]', freq='B')

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.0.6
pip: 9.0.1
setuptools: 38.2.4
Cython: 0.23.4
numpy: 1.14.0
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0b10
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

@chris-b1
Contributor

The freq attribute is meant to be purely descriptive, so it doesn't and shouldn't impact calculations. Potentially docs could be clearer.

The second point would be a bug, but when I run your code (Win 64, py36, pandas 0.22) I get a repr without a freq.

In [183]: index2 = index + pd.Timedelta('1D')

In [184]: index2
Out[184]: DatetimeIndex(['2012-01-06', '2012-01-07', '2012-01-10', '2012-01-11'], dtype='datetime64[ns]', freq=None)

In [185]: index2.freq
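
A minimal sketch of the check described above, assuming a recent pandas release in which adding a Timedelta to an index with a non-tick freq such as 'B' resets freq to None, as in the output shown here. The variable names follow the original report; `has_weekend` is introduced only for illustration.

```python
import pandas as pd

index = pd.date_range('2012-01-05', '2012-01-10', freq='B')

# A Timedelta shifts every timestamp by a fixed duration; the index's
# freq plays no part in the arithmetic.
index2 = index + pd.Timedelta('1D')

# dayofweek is 0 for Monday through 6 for Sunday, so >= 5 means a weekend.
# 2012-01-07 is a Saturday, so the shifted index is no longer business-day only.
has_weekend = bool((index2.dayofweek >= 5).any())

print(index2)
print(index2.freq, has_weekend)
```

Checking `dayofweek` directly is one way to confirm what the data contains without relying on the freq attribute.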

@chris-b1
Contributor

And you may already know this, but if you want date math that conforms to a frequency, you add the actual offsets themselves:

In [187]: index + pd.offsets.BDay()
Out[187]: DatetimeIndex(['2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11'], dtype='datetime64[ns]', freq='B')
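
The same offset arithmetic also works with other step sizes and with subtraction. The sketch below is illustrative only: `forward_two` and `back_one` are names chosen here, and only the resulting dates are compared, because the freq attached to the result may differ between pandas versions.

```python
import pandas as pd

index = pd.date_range('2012-01-05', '2012-01-10', freq='B')

# Move every date forward by two business days; weekends are skipped.
forward_two = index + pd.offsets.BDay(2)

# Move every date back by one business day.
back_one = index - pd.offsets.BDay(1)

print(list(forward_two.strftime('%Y-%m-%d')))
print(list(back_one.strftime('%Y-%m-%d')))

# Both results contain only Monday-Friday dates (dayofweek 0-4).
only_business_days = bool((forward_two.dayofweek < 5).all() and (back_one.dayofweek < 5).all())
```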

@luca-s
Contributor Author

luca-s commented Jan 10, 2018

@chris-b1 thank you for your help. I guess the only issue is the documentation then. It's hard to grasp what freq is for and what the scope of its usage is.

Thank you also for the suggested solution, but unfortunately it doesn't solve the general problem of adding a Timedelta to an index. What if you want to add Timedelta('1D4h5m') to a DatetimeIndex that has a specific frequency? I don't believe there is a solution for that, but it is off topic here.

Again, thank you very much for your reply.

@jreback
Contributor

jreback commented Jan 11, 2018

@luca-s you simply want to use an offset; it does not have to be the same freq as the index. A Timedelta is a fixed-offset object; in other words, it simply adds to or subtracts from the actual datetimes, so it is the equivalent of a 'D' freq.

In [7]: index = pd.date_range('2012-01-05', '2012-01-10', freq='B')

In [8]: index
Out[8]: DatetimeIndex(['2012-01-05', '2012-01-06', '2012-01-09', '2012-01-10'], dtype='datetime64[ns]', freq='B')

In [9]: index + pd.offsets.BDay(1)
Out[9]: DatetimeIndex(['2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11'], dtype='datetime64[ns]', freq='B')
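
One possible reading of the '1D4h5m' question, offered as an assumption rather than as the maintainers' recommendation: treat the day part as a business-day offset and the intraday part as a fixed Timedelta. The split into `day_part` and `intraday_part` is hypothetical and depends on what "one day" should mean for the data.

```python
import pandas as pd

index = pd.date_range('2012-01-05', '2012-01-10', freq='B')

# Hypothetical split of "1 day, 4 hours, 5 minutes":
day_part = pd.offsets.BDay(1)                     # calendar-aware, skips weekends
intraday_part = pd.Timedelta(hours=4, minutes=5)  # fixed duration

# Apply the business-day step first, then the fixed intraday shift.
result = index + day_part + intraday_part
print(result)
```

Because the two steps are applied in sequence, each one can be swapped independently, for example using a different offset for the day part while keeping the same Timedelta.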

@jreback jreback closed this as completed Jan 11, 2018
@jreback jreback added this to the No action milestone Jan 11, 2018