Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: to_period behaves different between Timestamp and DatetimeIndex with timezones #22905

Closed
miccoli opened this issue Sep 30, 2018 · 7 comments

Comments

Projects
None yet
4 participants
@miccoli
Copy link
Contributor

commented Sep 30, 2018

After 935c5d1 pandas/tests/indexes/datetimes/test_astype.py fails depending on the timezone:
in fact

TZ=Etc/GMT+1 pytest pandas/tests/indexes/datetimes/test_astype.py -k TestToPeriod

is OK while

TZ=Etc/GMT-1 pytest pandas/tests/indexes/datetimes/test_astype.py -k TestToPeriod

fails.

This can be nailed down to the following code sample.

Code Sample

>>> ts = pd.date_range('1/1/2000', '2/1/2000', tz='Etc/GMT-1')
>>> ts.to_period()[0] == ts[0].to_period()
False
>>> ts.to_period()[0], ts[0].to_period()
(Period('1999-12-31', 'D'), Period('2000-01-01', 'D'))
>>> ts = pd.date_range('1/1/2000', '2/1/2000', tz='Etc/GMT+1')
>>> ts.to_period()[0] == ts[0].to_period()
True
>>> ts.to_period()[0], ts[0].to_period()
(Period('2000-01-01', 'D'), Period('2000-01-01', 'D'))

Problem description

It seems that ts.to_period()[0] == ts[0].to_period() depends on tzinfo, true if west of GMT, false if east of GMT. Before 935c5d1 ts.to_period()[0] == ts[0].to_period() is always True irrespective of tzinfo.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 935c5d1453938a284805cb7d3c0c3255390997a7
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+555.g935c5d145
pytest: 3.7.4
pip: 18.0
setuptools: 39.0.1
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: 0.10.9
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: 1.5.1
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.0
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.1
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.6
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.1.2
@brute4s99

This comment has been minimized.

Copy link
Contributor

commented Sep 30, 2018

looking into this

@brute4s99

This comment has been minimized.

Copy link
Contributor

commented Sep 30, 2018

@miccoli , please read more about pandas.date_range() here
As you can see,
since you did not specify any 'clock time' , the default value will be taken as 00:00.
From documentation ,

"tz : str or tzinfo, optional
Time zone name for returning localized DatetimeIndex, ..."

putting tz=GMT-1 will cause backwards shift by 1 hour.
To understand more clearly, see this :-
>>> ts = pd.date_range('1/1/2000 1:00', '2/1/2000 1:00', tz='Etc/GMT-1')
^ Added "clock time" as 1:00 AM, since this will be used to compensate for GMT-1, because outputs will show LOCALIZED time
>>> ts.to_period()[0] == ts[0].to_period()
True
>>> ts.to_period()[0], ts[0].to_period()
(Period('2000-01-01', 'D'), Period('2000-01-01', 'D'))

@miccoli

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2018

@brute4s99

@pytest.mark.parametrize('tz', [
'US/Eastern', pytz.utc, tzlocal(), 'dateutil/US/Eastern',
dateutil.tz.tzutc()])
def test_to_period_tz(self, tz):
ts = date_range('1/1/2000', '2/1/2000', tz=tz)
with tm.assert_produces_warning(UserWarning):
# GH#21333 warning that timezone info will be lost
result = ts.to_period()[0]
expected = ts[0].to_period()
assert result == expected
expected = date_range('1/1/2000', '2/1/2000').to_period()
with tm.assert_produces_warning(UserWarning):
# GH#21333 warning that timezone info will be lost
result = ts.to_period()
tm.assert_index_equal(result, expected)

I just reproduced the situation in the test above, which fails east of greenwich.

Let me explain this better:

>>> ts = pd.date_range('1/1/2000', '2/1/2000', tz='Etc/GMT-1')
>>>
>>> ts[0]
Timestamp('2000-01-01 00:00:00+0100', tz='Etc/GMT-1', freq='D')
>>> ts[0].to_period()
Period('2000-01-01', 'D')
>>> 
>>> ts[:1]
DatetimeIndex(['2000-01-01 00:00:00+01:00'], dtype='datetime64[ns, Etc/GMT-1]', freq='D')
>>> ts[:1].to_period()
PeriodIndex(['1999-12-31'], dtype='period[D]', freq='D')
  • ts[0] is a Timestamp and its to_period method drops tzinfo without taking into account its value.

  • ts[:1] is a length 1 DatetimeIndex and its to_period method first converts to UTC and then drops tzinfo.

I'm not sure if it's a bug or a feature, but I feel more natural to assume that ts.to_period()[0] == ts[0].to_period() is an invariant that should hold regardless of tzinfo. A different question is to decide which one, Timestamp.to_period or DatetimeIndex.to_period is correct.

Finally if positively Timestamp.to_period and DatetimeIndex.to_period should have different semantics, then the tests are wrong.

@mroeschke

This comment has been minimized.

Copy link
Member

commented Sep 30, 2018

Nice catch @miccoli! Agreed to_period should behave similarly between Timestamp and DatetimeIndex.

I don't work with Periods much so I welcome others' opinions, but it would think the DatetimeIndex.to_period behavior is incorrect; I don't think I would want the 1999 result.

The UTC conversion happens here in self.values:

return PeriodIndex(self.values, name=self.name, freq=freq)

@mroeschke mroeschke changed the title pandas/tests/indexes/datetimes/test_astype.py fails depending on timezone BUG: to_period behaves different between Timestamp and DatetimeIndex with timezones Sep 30, 2018

@brute4s99

This comment has been minimized.

Copy link
Contributor

commented Oct 1, 2018

Can I work this bug, please ?

@mroeschke

This comment has been minimized.

Copy link
Member

commented Oct 1, 2018

Go for it @brute4s99. Open up the pull request when you are ready for a review.

@mroeschke

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

This looks to be fixed on master now. Could use a test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.