BUG: date_range(freq='Y') is missing a date #56134

Open
3 tasks done
epizut opened this issue Nov 23, 2023 · 4 comments
Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

epizut commented Nov 23, 2023

### Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

### Reproducible Example

```python
import pandas as pd
pd.date_range(start=pd.to_datetime('2021-12-31 00:00:01'), end=pd.to_datetime('2023-11-01 00:00:00'), freq='Y')
# Output:
# DatetimeIndex(['2021-12-31 00:00:01'], dtype='datetime64[ns]', freq='A-DEC')
```



### Issue Description

This year-end (`freq='Y'`) date range skips one date: '2022-12-31 00:00:01' is missing.
With `start='2021-12-31'` the behavior is as expected; the bug appears only when the start has a non-midnight time component, such as '2021-12-31 00:00:01'.
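You can check the midnight-start case directly. The sketch below passes `pd.offsets.YearEnd()` as the frequency instead of the string 'Y'. This is an assumption, made to avoid alias renames between pandas versions. The sketch also shows one possible workaround: shift a midnight-aligned range by the start's time of day. That workaround is not something proposed in this issue.

```python
import pandas as pd

# Assumption: pd.offsets.YearEnd() (December year-end) stands in for freq='Y',
# avoiding version-dependent aliases ('A-DEC', 'Y-DEC', 'YE-DEC').
year_end = pd.offsets.YearEnd()

# Midnight start: both year-ends inside the range are generated.
midnight = pd.date_range(start='2021-12-31', end='2023-11-01', freq=year_end)
print(midnight)

# Possible workaround (not from this thread): generate the midnight-aligned
# range, then add the start's time of day back.
shifted = midnight + pd.Timedelta(seconds=1)
print(shifted)
```

The workaround only fits cases where every generated value should share the start's time of day.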

### Expected Behavior

`DatetimeIndex(['2021-12-31 00:00:01', '2022-12-31 00:00:01'], dtype='datetime64[ns]', freq='A-DEC')`

### Installed Versions

<details>
INSTALLED VERSIONS
------------------
commit              : 2a953cf80b77e4348bf50ed724f8abc0d814d9dd
python              : 3.9.12.final.0
python-bits         : 64
OS                  : Windows
OS-release          : 10
Version             : 10.0.19045
machine             : AMD64
processor           : Intel64 Family 6 Model 106 Stepping 6, GenuineIntel
byteorder           : little
LC_ALL              : None
LANG                : None
LOCALE              : English_United States.1252

pandas              : 2.1.3
numpy               : 1.26.2
pytz                : 2023.3.post1
dateutil            : 2.8.2
setuptools          : 68.0.0
pip                 : 23.2.1
Cython              : None
pytest              : None
hypothesis          : None
sphinx              : None
blosc               : None
feather             : None
xlsxwriter          : None
lxml.etree          : None
html5lib            : None
pymysql             : None
psycopg2            : None
jinja2              : 3.1.2
IPython             : 8.17.2
pandas_datareader   : None
bs4                 : 4.12.2
bottleneck          : None
dataframe-api-compat: None
fastparquet         : None
fsspec              : None
gcsfs               : None
matplotlib          : None
numba               : None
numexpr             : None
odfpy               : None
openpyxl            : None
pandas_gbq          : None
pyarrow             : None
pyreadstat          : None
pyxlsb              : None
s3fs                : None
scipy               : None
sqlalchemy          : None
tables              : None
tabulate            : None
xarray              : None
xlrd                : None
zstandard           : None
tzdata              : 2023.3
qtpy                : None
pyqt5               : None
</details>
epizut added the Bug and Needs Triage labels Nov 23, 2023
jrmylow (Contributor) commented Nov 24, 2023

Saving this write-up for now; I hope to revisit with a fix soon.

The issue arises in `_generate_range` in `pandas/core/arrays/datetimes.py`. In summary:

  • The start timestamp is left unchanged, as expected, because it is already on the offset
  • The end timestamp is rolled back to Timestamp('2022-12-31 00:00:00')
  • The start timestamp is yielded, creating the first value 2021-12-31 00:00:01
  • When start is incremented by the offset (A-DEC or Y-DEC, depending on the pandas version), it moves to 2022-12-31 00:00:01
  • Because this is later than end, it is not yielded
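You can reproduce the steps above with public offset methods: `is_on_offset`, `rollback`, and offset addition. The sketch assumes `pd.offsets.YearEnd()` matches the anchor that `freq='Y'` resolves to (December year-end).

```python
import pandas as pd

# Values from the report; YearEnd() is assumed equivalent to freq='Y' (A-DEC / Y-DEC).
offset = pd.offsets.YearEnd()
start = pd.Timestamp('2021-12-31 00:00:01')
end = pd.Timestamp('2023-11-01 00:00:00')

# The start is already on the offset: only month and day are checked, not the time.
print(offset.is_on_offset(start))   # True

# The end is not on the offset, so it is rolled back to the previous
# year-end, keeping its own time of day (midnight).
rolled_end = offset.rollback(end)
print(rolled_end)                   # 2022-12-31 00:00:00

# The next candidate keeps the start's time of day...
next_value = start + offset
print(next_value)                   # 2022-12-31 00:00:01

# ...which is one second past the rolled-back end, so it is dropped.
print(next_value > rolled_end)      # True
```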

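For specific code locations:
The following check leaves the start at Timestamp('2021-12-31 00:00:01'). `offset.is_on_offset(start)` is True because, for this offset, only the month and day are compared, not the time of day:

```python
if start and not offset.is_on_offset(start):
```

However, `offset.is_on_offset(end)` is False, so the following `elif` branch runs and the end is changed to Timestamp('2022-12-31 00:00:00'):

```python
elif end and not offset.is_on_offset(end):
```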
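The following simplified re-implementation shows the effect of the `if`/`elif` pair in isolation. It is a hypothetical sketch: `simplified_generate_range` is not a pandas function. It keeps only the two alignment branches and a forward loop. The real `_generate_range` does considerably more (for example, `periods`, negative offsets, and time zones).

```python
import pandas as pd


def simplified_generate_range(start, end, offset):
    """Hypothetical, simplified sketch of the alignment logic quoted above."""
    # Only one of the two bounds is ever adjusted, because of the if/elif.
    if start and not offset.is_on_offset(start):
        start = offset.rollforward(start)
    elif end and not offset.is_on_offset(end):
        end = offset.rollback(end)

    values = []
    cur = start
    while cur <= end:  # step forward until past the (possibly rolled-back) end
        values.append(cur)
        cur = cur + offset
    return values


result = simplified_generate_range(
    pd.Timestamp('2021-12-31 00:00:01'),
    pd.Timestamp('2023-11-01 00:00:00'),
    pd.offsets.YearEnd(),
)
print(result)  # [Timestamp('2021-12-31 00:00:01')]
```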

jrmylow (Contributor) commented Nov 24, 2023

A potential fix: align only the internal `start` variable with the offset, and leave `end` as given.
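The following sketch illustrates the suggested direction outside pandas. It is not necessarily what the linked commits or #56831 implement. The function name `align_start_only_range` is hypothetical, and the sketch ignores `periods`, negative offsets, and time zones.

```python
import pandas as pd


def align_start_only_range(start, end, offset):
    """Hypothetical sketch of the proposed fix: roll only `start` onto the
    offset and compare candidates against the caller's original `end`."""
    if not offset.is_on_offset(start):
        start = offset.rollforward(start)
    values = []
    cur = start
    while cur <= end:  # `end` is never rolled back here
        values.append(cur)
        cur = cur + offset
    return values


offset = pd.offsets.YearEnd()
end = pd.Timestamp('2023-11-01 00:00:00')

# Start from the report: both expected year-ends are produced.
fixed = align_start_only_range(pd.Timestamp('2021-12-31 00:00:01'), end, offset)
print(fixed)  # [Timestamp('2021-12-31 00:00:01'), Timestamp('2022-12-31 00:00:01')]

# Midnight start: same two dates as the current behavior.
midnight = align_start_only_range(pd.Timestamp('2021-12-31'), end, offset)
print(midnight)
```

Because `end` is only an upper bound for the loop, it does not need to lie on the offset.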

jrmylow (Contributor) commented Nov 24, 2023

take

jrmylow added a commit to jrmylow/pandas that referenced this issue Nov 24, 2023
jrmylow added a commit to jrmylow/pandas that referenced this issue Nov 24, 2023
jrmylow added a commit to jrmylow/pandas that referenced this issue Jan 11, 2024
tqa236 (Contributor) commented Mar 9, 2024

Hello, should this issue be closed by #56831 already?
