Pandas date_range error when the date falls in a DST starting day. #30378

Open
dfonnegra opened this issue Dec 20, 2019 · 6 comments

@dfonnegra

dfonnegra commented Dec 20, 2019

Code Sample

import pandas as pd
pd.date_range(start=pd.Timestamp('2016-10-05 00:00:00'), end=pd.Timestamp('2016-10-30 00:00:00'), freq='1D', tz='America/Sao_Paulo')

Problem description

When I try to generate a range of daily data in the America/Sao_Paulo time zone, it fails with NonExistentTimeError: 2016-10-16 00:00:00. My pandas version is 0.25.3.
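The error names local midnight on 2016-10-16. The sketch below (assuming pandas 0.24 or later, where `Timestamp.tz_localize` accepts `nonexistent`) shows that this wall-clock time was skipped by the DST switch in America/Sao_Paulo, while the neighbouring midnights exist with offsets of -03:00 and -02:00:

```python
import pandas as pd

tz = "America/Sao_Paulo"

# On 2016-10-16 the clocks in this zone jumped from 00:00 straight to 01:00,
# so local midnight that day never existed. nonexistent="NaT" returns NaT
# instead of raising NonExistentTimeError.
skipped = pd.Timestamp("2016-10-16 00:00").tz_localize(tz, nonexistent="NaT")

# The surrounding midnights exist; their UTC offsets show the DST switch.
before = pd.Timestamp("2016-10-15 00:00").tz_localize(tz)  # UTC-03:00
after = pd.Timestamp("2016-10-17 00:00").tz_localize(tz)   # UTC-02:00

print(skipped)
print(before.utcoffset(), after.utcoffset())
```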

@dfonnegra
Author

I iterated through all time zones and found that the following ones fail as well:

America/Asuncion
America/Campo_Grande
America/Cuiaba
America/Havana
America/Punta_Arenas
America/Santiago
America/Sao_Paulo
America/Scoresbysund
Antarctica/Casey
Antarctica/Palmer
Asia/Amman
Asia/Beirut
Asia/Damascus
Asia/Gaza
Asia/Hebron
Asia/Tehran
Atlantic/Azores
Brazil/East
Chile/Continental
Cuba
Iran
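A scan like the one described above can be sketched as follows. `find_failing_timezones` is a hypothetical helper, and taking the candidate names from the standard-library `zoneinfo` module is an assumption (the original scan does not say where its list came from). The resulting list depends on the installed pandas version and tz database, so no expected output is given:

```python
from zoneinfo import available_timezones  # Python 3.9+

import pandas as pd


def find_failing_timezones(tz_names, start="2016-10-05", end="2016-10-30"):
    """Hypothetical helper: return the zones for which a daily
    date_range(start, end, tz=zone) raises."""
    failing = []
    for name in sorted(tz_names):
        try:
            # Skip names the time zone backend used by pandas cannot resolve.
            pd.Timestamp("2000-01-01 12:00", tz=name)
        except Exception:
            continue
        try:
            pd.date_range(start=start, end=end, freq="1D", tz=name)
        except Exception:
            # NonExistentTimeError / AmbiguousTimeError; the exact exception
            # class depends on the pandas version.
            failing.append(name)
    return failing


if __name__ == "__main__":
    print(find_failing_timezones(available_timezones()))
```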

@TomAugspurger
Contributor

What's the expected behavior here? If the user is trying to create an invalid datetime, shouldn't we raise?

@dfonnegra
Author

The expected behavior is the one adopted when this exact bug affected the resample method in 0.23.x. See this entry in the 0.24.0 changelog:

  • Bug in DataFrame.resample() and Series.resample() where an AmbiguousTimeError or NonExistentTimeError would raise if a timezone aware timeseries ended on a DST transition (GH19375, GH10117)

@TomAugspurger
Contributor

What was the resolution there? Can you post the actual expected output in the original post?

@mroeschke
Member

In the resample case, an assumption was made on the behavior of ambiguous and nonexistent timezones since the resulting index is a transformation from the original index (i.e., the binning).

In the date_range case, I agree with Tom's assessment #30378 (comment). The idiomatic alternative is:

pd.date_range(start=pd.Timestamp('2016-10-05 00:00:00'), end=pd.Timestamp('2016-10-30 00:00:00'), freq='1D').tz_localize('America/Sao_Paulo', nonexistent=...)
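The `nonexistent=...` value above is left to the caller. As an illustration only (not a setting chosen in this thread), two of the documented options of `DatetimeIndex.tz_localize` treat the skipped midnight differently:

```python
import pandas as pd

# Generate the daily points on naive wall-clock times first ...
naive = pd.date_range(start="2016-10-05", end="2016-10-30", freq="1D")

# ... then attach the zone, choosing explicitly how to treat the skipped
# local midnight of 2016-10-16.
shifted = naive.tz_localize("America/Sao_Paulo", nonexistent="shift_forward")
with_nat = naive.tz_localize("America/Sao_Paulo", nonexistent="NaT")

print(shifted[11])   # the 2016-10-16 entry, moved to 01:00-02:00
print(with_nat[11])  # the same entry, marked as missing
```

`shift_forward` moves the 2016-10-16 entry to the first valid instant of that day, while `NaT` keeps every other entry at midnight and leaves one missing value.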

I think ultimately the tz parameter should be deprecated in date_range.

@kdebrab
Contributor

kdebrab commented Apr 26, 2024

> In the resample case, an assumption was made on the behavior of ambiguous and nonexistent timezones since the resulting index is a transformation from the original index (i.e., the binning).
>
> In the date_range case, I agree with Tom's assessment #30378 (comment). The idiomatic alternative is:
>
> pd.date_range(start=pd.Timestamp('2016-10-05 00:00:00'), end=pd.Timestamp('2016-10-30 00:00:00'), freq='1D').tz_localize('America/Sao_Paulo', nonexistent=...)
>
> I think ultimately the tz parameter should be deprecated in date_range.

I don't think deprecating the tz parameter is the right solution: exactly the same error occurs without the tz parameter when start and end are themselves timezone-aware:

start = pd.Timestamp("2016-10-05").tz_localize("America/Sao_Paulo")
end = pd.Timestamp("2016-10-30").tz_localize("America/Sao_Paulo")
pd.date_range(start=start, end=end, freq="D")
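Making the same assumption as the resample path (move skipped times forward), timezone-aware endpoints can be handled by stripping the zone, generating the calendar days, and localizing again. `daily_range_dst_safe` is a hypothetical helper, not a pandas API, and it assumes both endpoints share one time zone:

```python
import pandas as pd


def daily_range_dst_safe(start, end, nonexistent="shift_forward"):
    """Hypothetical helper: daily range between two timezone-aware endpoints
    that does not raise on a skipped local midnight.

    Assumes start and end share the same time zone."""
    if start.tz is None or end.tz is None:
        raise ValueError("start and end must be timezone-aware")
    # Drop the zone but keep the wall-clock times, build the calendar-day range ...
    naive = pd.date_range(start.tz_localize(None), end.tz_localize(None), freq="D")
    # ... and re-attach the zone, resolving nonexistent times explicitly.
    return naive.tz_localize(start.tz, nonexistent=nonexistent)


start = pd.Timestamp("2016-10-05").tz_localize("America/Sao_Paulo")
end = pd.Timestamp("2016-10-30").tz_localize("America/Sao_Paulo")
idx = daily_range_dst_safe(start, end)
print(idx[10:13])
```

Ambiguous local times (the DST-end case in other zones) would still raise under the default `ambiguous="raise"`; handling them is left out of this sketch.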

> What's the expected behavior here? If the user is trying to create an invalid datetime, shouldn't we raise?

I think date_range should make the same assumptions that were made in resample, so that the above returns the same result as:

pd.Series(1, index=pd.date_range(start, end, freq="h")).resample("D").sum().index

which returns:

DatetimeIndex(['2016-10-05 00:00:00-03:00', '2016-10-06 00:00:00-03:00',
               '2016-10-07 00:00:00-03:00', '2016-10-08 00:00:00-03:00',
               '2016-10-09 00:00:00-03:00', '2016-10-10 00:00:00-03:00',
               '2016-10-11 00:00:00-03:00', '2016-10-12 00:00:00-03:00',
               '2016-10-13 00:00:00-03:00', '2016-10-14 00:00:00-03:00',
               '2016-10-15 00:00:00-03:00', '2016-10-16 01:00:00-02:00',
               '2016-10-17 00:00:00-02:00', '2016-10-18 00:00:00-02:00',
               '2016-10-19 00:00:00-02:00', '2016-10-20 00:00:00-02:00',
               '2016-10-21 00:00:00-02:00', '2016-10-22 00:00:00-02:00',
               '2016-10-23 00:00:00-02:00', '2016-10-24 00:00:00-02:00',
               '2016-10-25 00:00:00-02:00', '2016-10-26 00:00:00-02:00',
               '2016-10-27 00:00:00-02:00', '2016-10-28 00:00:00-02:00',
               '2016-10-29 00:00:00-02:00', '2016-10-30 00:00:00-02:00'],
              dtype='datetime64[ns, America/Sao_Paulo]', freq='D')

Notice that the index for 2016-10-16 is shifted by 1 hour.
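Because the 2016-10-16 entry sits at 01:00-02:00, the interval before it is still 24 hours of elapsed time, while the interval after it is only 23 hours. A quick check with plain timezone-aware Timestamps (no resampling involved):

```python
import pandas as pd

tz = "America/Sao_Paulo"
oct15 = pd.Timestamp("2016-10-15 00:00", tz=tz)  # 00:00-03:00
oct16 = pd.Timestamp("2016-10-16 01:00", tz=tz)  # 01:00-02:00, first valid time that day
oct17 = pd.Timestamp("2016-10-17 00:00", tz=tz)  # 00:00-02:00

# Subtraction uses absolute (UTC) instants, not wall-clock times.
print(oct16 - oct15)  # 1 days 00:00:00
print(oct17 - oct16)  # 0 days 23:00:00
```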
