
Timestamp.to_period() is off-by-one for weekly frequencies #20818

Open
masnick opened this issue Apr 25, 2018 · 4 comments
Labels
Bug Period Period data type

Comments

@masnick

masnick commented Apr 25, 2018

Code Sample

import datetime as dt
import pandas as pd

pd.to_datetime(dt.datetime(2018,1,2)).to_period('W-MON')
# Period('2018-01-02/2018-01-08', 'W-MON')

dt.datetime(2018,1,2).strftime('%a')
# 'Tue'

Problem description

I expect W-MON to produce a Period that starts on Monday. But in my example, the beginning of the Period is 2018-01-02, which is a Tuesday.

Note that pandas.date_range() does produce the expected output:

pd.date_range(start=dt.datetime(2017,12,1), end=dt.datetime(2018,2,1), freq='W-MON')
# DatetimeIndex(['2017-12-04', '2017-12-11', '2017-12-18', '2017-12-25',
#                '2018-01-01', '2018-01-08', '2018-01-15', '2018-01-22',
#                '2018-01-29'],
#               dtype='datetime64[ns]', freq='W-MON')
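The two results can be reconciled by checking where each `date_range` Monday falls inside its own `W-MON` period. A minimal sketch, assuming Python 3 and a recent pandas release, showing that each Monday produced by `date_range` is the *last* day of the matching `W-MON` period, not the first:

```python
import pandas as pd

# date_range with freq="W-MON" yields the anchor Mondays.
stamps = pd.date_range(start="2017-12-01", end="2018-02-01", freq="W-MON")

for ts in stamps:
    period = ts.to_period("W-MON")
    # The Monday from date_range is the final day of its W-MON period...
    assert period.end_time.normalize() == ts
    # ...so that period began on the preceding Tuesday, six days earlier.
    assert period.start_time == ts - pd.Timedelta(days=6)

print(all(ts.day_name() == "Monday" for ts in stamps))  # True
```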

Expected Output

Period('2018-01-01/2018-01-07', 'W-MON')

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.26
numpy: 1.14.2
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.2.0
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd
Member

WillAyd commented Apr 25, 2018

Hmm, interesting. I think the issue is somewhere in the Period constructor (which to_period essentially uses):

In [28]: from pandas import Period

In [29]: Period(pd.to_datetime('1/2/2018'), freq='W-MON')
Out[29]: Period('2018-01-02/2018-01-08', 'W-MON')

In [30]: Period(pd.to_datetime('1/2/2018'), freq='W-SUN')
Out[30]: Period('2018-01-01/2018-01-07', 'W-SUN')

I didn't see a ton of test coverage for this frequency so it's possible something slipped through. cc @jorisvandenbossche for any input
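To make the comparison above explicit, a short sketch (assuming Python 3 and a recent pandas) that records the weekday each period starts and ends on for the same Tuesday timestamp:

```python
import pandas as pd

ts = pd.Timestamp("2018-01-02")  # a Tuesday

# For each weekly alias, record the weekdays on which the period starts and ends.
bounds = {}
for freq in ["W-MON", "W-SUN"]:
    p = pd.Period(ts, freq=freq)
    bounds[freq] = (p.start_time.day_name(), p.end_time.day_name())

print(bounds)
# {'W-MON': ('Tuesday', 'Monday'), 'W-SUN': ('Monday', 'Sunday')}
```

In both cases the day named in the alias is where the period ends.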

@masnick
Author

masnick commented Apr 26, 2018

In case it helps, here's another example of W-MON not working properly:

In [1]: df = pd.DataFrame({'rows': 1}, index=pd.date_range(start='2018-01-01', end='2018-01-07', freq='D'))

In [2]: df
Out[2]: 
            rows
2018-01-01     1
2018-01-02     1
2018-01-03     1
2018-01-04     1
2018-01-05     1
2018-01-06     1
2018-01-07     1

In [3]: df.resample('W-MON').sum()
Out[3]: 
            rows
2018-01-01     1
2018-01-08     6

Edit:

This will make the above behave as expected:

In [47]: df.resample('W-MON', closed='left').sum()
Out[47]: 
            rows
2018-01-08     7

I figured this out by looking at the source for resample().
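A sketch of the same frame with the bin edges spelled out, assuming a recent pandas where weekly rules default to `closed="right"` and `label="right"`. Adding `label="left"` on top of `closed="left"` names each bin after the Monday it starts on rather than the Monday that follows it:

```python
import pandas as pd

df = pd.DataFrame({"rows": 1}, index=pd.date_range("2018-01-01", "2018-01-07", freq="D"))

# Default for weekly rules: bins are (Monday, next Monday], labeled by the closing Monday.
default = df.resample("W-MON").sum()

# closed="left" gives bins [Monday, next Monday); label="left" labels each bin
# with its starting Monday instead of the following one.
monday_start = df.resample("W-MON", closed="left", label="left").sum()

print(default["rows"].tolist())       # [1, 6]
print(monday_start["rows"].tolist())  # [7]
print(monday_start.index[0].date())   # 2018-01-01
```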

@jstritar

I believe we hit this same bug:

df = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-12-31', '2018-01-01', '2018-01-02', '2018-01-09']),
    'signal': [1, 2, 3, 4]
})
df.resample('W-MON', on='date', label='left').sum()

Result:

            signal
date
2017-12-25       3
2018-01-01       3
2018-01-08       4

Expected:

            signal
date
2017-12-25       1
2018-01-01       5
2018-01-08       4
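One way to get the expected table without tuning `resample` is to group by Monday-to-Sunday periods directly. A minimal sketch, assuming a recent pandas; it relies on `"W-SUN"` periods running Monday through Sunday and labels each week by its first day:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-01", "2018-01-02", "2018-01-09"]),
    "signal": [1, 2, 3, 4],
})

# "W-SUN" periods end on Sunday, so each one covers a Monday-starting week.
# start_time maps every row to the Monday that opens its week.
week_start = df["date"].dt.to_period("W-SUN").dt.start_time
weekly = df.groupby(week_start)["signal"].sum()

print(weekly)
```

Passing `closed="left", label="left"` to `resample('W-MON', ...)` should produce the same buckets; the grouping above just makes the week boundaries explicit.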

@jbrockmendel jbrockmendel added the Period Period data type label Jul 30, 2018
@mroeschke mroeschke added the Bug label May 11, 2020
@miccoli
Contributor

miccoli commented Apr 14, 2022

Yesterday I was a happy man because I learned about to_period("W-MON"), and today I'm unhappy because I realized that it doesn't work as expected.

BUT: it seems that this is indeed expected behaviour. See this snippet from the pandas source:

"W-SUN": PeriodDtypeCode.W_SUN, # Weekly - Sunday end of week
"W-MON": PeriodDtypeCode.W_MON, # Weekly - Monday end of week
"W-TUE": PeriodDtypeCode.W_TUE, # Weekly - Tuesday end of week
"W-WED": PeriodDtypeCode.W_WED, # Weekly - Wednesday end of week
"W-THU": PeriodDtypeCode.W_THU, # Weekly - Thursday end of week
"W-FRI": PeriodDtypeCode.W_FRI, # Weekly - Friday end of week
"W-SAT": PeriodDtypeCode.W_SAT, # Weekly - Saturday end of week

W-MON is "Weekly - Monday end of week", so this is not a bug but a documentation problem.
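The comments in that mapping can be checked for all seven aliases. A short sketch, assuming Python 3 and a recent pandas, confirming that the suffix always names the weekday on which the period ends, so every period starts on the following weekday:

```python
import pandas as pd

ts = pd.Timestamp("2018-01-02")
for suffix in ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]:
    p = pd.Period(ts, freq=f"W-{suffix}")
    # The suffix is the weekday on which the period ENDS...
    assert p.end_time.day_name()[:3].upper() == suffix
    # ...so the day before the period starts is also that weekday.
    assert (p.start_time - pd.Timedelta(days=1)).day_name()[:3].upper() == suffix
```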

To get weeks that start on Monday, specify "W-SUN" (weekly, with Sunday as the end of the week), which is also what plain "W" means:

>>> pd.to_datetime("2018-01-01").to_period("W")
Period('2018-01-01/2018-01-07', 'W-SUN')
>>> pd.to_datetime("2018-01-01").to_period("W-SUN")
Period('2018-01-01/2018-01-07', 'W-SUN')
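Since the alias names the last day of the week, picking an alias for a desired *start* day means taking the weekday just before it. A sketch with a hypothetical helper, `weekly_alias_starting_on` (not part of pandas), assuming Python 3 and a recent pandas:

```python
import pandas as pd

DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def weekly_alias_starting_on(start_day: str) -> str:
    """Hypothetical helper: return the weekly alias whose periods begin on start_day.

    Weekly aliases name the day a week ENDS, so use the weekday just before start_day.
    """
    i = DAYS.index(start_day.upper())
    return f"W-{DAYS[(i - 1) % 7]}"


print(weekly_alias_starting_on("MON"))  # W-SUN
p = pd.Timestamp("2018-01-02").to_period(weekly_alias_starting_on("MON"))
print(p.start_time.date())  # 2018-01-01
```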

Development

No branches or pull requests

6 participants