Datetime.index.date returns incorrect date post upgrade to version 0.23 #21230

Closed
MarekOzana opened this Issue May 28, 2018 · 5 comments

5 participants

MarekOzana commented May 28, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

df1 = pd.DataFrame(data=[24, 25],
                   index=pd.DatetimeIndex(['2013-01-24 15:01:00+01:00',
                                           '2013-01-25 15:01:00+01:00'],
                                          dtype='datetime64[ns, CET]',
                                          name='Date', freq=None))
print(df1.index.date)

the above code prints:
[datetime.date(2013, 1, 23) datetime.date(2013, 1, 24)]

Problem description

DatetimeIndex.date returns incorrect dates. The behaviour was correct up to version 0.22 and appears to be incorrect after upgrading to version 0.23.

Expected Output

[datetime.date(2013, 1, 24) datetime.date(2013, 1, 25)]

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.12.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Member

mroeschke commented May 29, 2018

Looks like a timezone issue, since this works correctly without a timezone dtype. Investigation and PRs welcome!

In [7]: d = pd.DatetimeIndex(['2013-01-24 15:01:00+01:00','2013-01-25 15:01:00+01:00'])

In [8]: d.date
Out[8]:
array([datetime.date(2013, 1, 24), datetime.date(2013, 1, 25)],
      dtype=object)

In [9]: d.tz_localize('CET').date
Out[9]:
array([datetime.date(2013, 1, 23), datetime.date(2013, 1, 24)],
      dtype=object)

Contributor

ssikdar1 commented May 31, 2018

Hmm, is this possibly on purpose?

import pandas as pd

df1 = pd.DataFrame(data=[24, 25],
                   index=pd.DatetimeIndex(['2013-01-24 15:01:00+01:00',
                                           '2013-01-25 15:01:00+01:00'],
                                          dtype='datetime64[ns, CET]',
                                          name='Date', freq=None))
print(type(df1.index))

Prints

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

Looking at the date property of DatetimeIndex:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/datetimes.py#L2037

    @property
    def date(self):
        """
        Returns numpy array of python datetime.date objects (namely, the date
        part of Timestamps without timezone information).
        """
        return libts.ints_to_pydatetime(self.normalize().asi8, box="date")

So here the datetime.date objects are intentionally returned without timezone information.

Member

jorisvandenbossche commented May 31, 2018

I suppose this is due to #18163, which improved the performance of the .date accessor, but probably forgot to deal with timezones.

Before that change, it relied on the date() method of the individual timestamp objects, which is still correct:

In [19]: df1.index[0]
Out[19]: Timestamp('2013-01-24 14:01:00+0100', tz='CET')

In [20]: df1.index[0].date()
Out[20]: datetime.date(2013, 1, 24)
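
The element-wise behaviour described above can be mimicked with the standard library alone. This is a minimal sketch, not the pandas implementation: the fixed +01:00 offset is an assumed stand-in for CET in January, and `local_dates` is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for CET in January 2013.
CET_WINTER = timezone(timedelta(hours=1), "CET")


def local_dates(stamps):
    """Return the wall-clock date of each aware datetime (hypothetical helper)."""
    # datetime.date() drops tzinfo and keeps the local calendar date.
    return [stamp.date() for stamp in stamps]


stamps = [
    datetime(2013, 1, 24, 15, 1, tzinfo=CET_WINTER),
    datetime(2013, 1, 25, 15, 1, tzinfo=CET_WINTER),
]
print(local_dates(stamps))
```

Calling date() per element is slower than a vectorised conversion, but it never leaves local wall-clock time, so the calendar date cannot shift.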

Member

jorisvandenbossche commented May 31, 2018

cc @tmnhat2001, you're welcome to take a look if you have time.

Contributor

tmnhat2001 commented Jun 1, 2018

Just did some more digging into why some test cases fail. It seems that when DatetimeIndex.normalize() is used, the dates are converted properly when the original timezone is behind UTC, but when the timezone is ahead of UTC, the returned value is incorrect.

In [41]: index = pd.DatetimeIndex(['2013-01-24 15:01:00'], dtype='datetime64[ns, EST]', freq=None)
In [42]: index.date
Out[42]: array([datetime.date(2013, 1, 24)], dtype=object)

In [43]: index = pd.DatetimeIndex(['2013-01-24 15:01:00'], dtype='datetime64[ns, CET]', freq=None)

In [44]: index.date
Out[44]: array([datetime.date(2013, 1, 23)], dtype=object)
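
One plausible reading of this observation, offered as an assumption rather than something confirmed in this thread: normalizing gives local midnight, and if the stored value for that instant is then read back as a UTC calendar date, zones ahead of UTC fall on the previous day while zones behind UTC do not. The sketch below reproduces that arithmetic with the standard library only. The fixed offsets standing in for EST and CET, and the helper name `date_via_utc_epoch`, are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the EST and CET zones used above.
EST = timezone(timedelta(hours=-5), "EST")
CET = timezone(timedelta(hours=1), "CET")


def date_via_utc_epoch(local_dt):
    """Truncate to local midnight, then read that instant's UTC calendar date."""
    local_midnight = local_dt.replace(hour=0, minute=0, second=0, microsecond=0)
    # Discarding the zone here is the step that can shift the date.
    return local_midnight.astimezone(timezone.utc).date()


behind = datetime(2013, 1, 24, 15, 1, tzinfo=EST)
ahead = datetime(2013, 1, 24, 15, 1, tzinfo=CET)

print(date_via_utc_epoch(behind))  # 2013-01-24: midnight EST is 05:00 UTC, same day
print(date_via_utc_epoch(ahead))   # 2013-01-23: midnight CET is 23:00 UTC, previous day
```

Under this reading, the date would be correct only if the conversion used local wall-clock values instead of the UTC-based ones.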

jorisvandenbossche added a commit that referenced this issue Jun 7, 2018

BUG: Using DatetimeIndex.date with timezone returns incorrect date (#21281)

* BUG: Using DatetimeIndex.date with timezone returns incorrect date #21230
* Fix bug where DTI.time returns a tz-aware Time instead of tz-naive #21267

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jun 12, 2018

BUG: Using DatetimeIndex.date with timezone returns incorrect date (pandas-dev#21281)

* BUG: Using DatetimeIndex.date with timezone returns incorrect date pandas-dev#21230
* Fix bug where DTI.time returns a tz-aware Time instead of tz-naive pandas-dev#21267

(cherry picked from commit a363e1a)

TomAugspurger added a commit that referenced this issue Jun 12, 2018

BUG: Using DatetimeIndex.date with timezone returns incorrect date (#21281)

* BUG: Using DatetimeIndex.date with timezone returns incorrect date #21230
* Fix bug where DTI.time returns a tz-aware Time instead of tz-naive #21267

(cherry picked from commit a363e1a)

david-liu-brattle-1 added a commit to david-liu-brattle-1/pandas that referenced this issue Jun 18, 2018

BUG: Using DatetimeIndex.date with timezone returns incorrect date (pandas-dev#21281)

* BUG: Using DatetimeIndex.date with timezone returns incorrect date pandas-dev#21230
* Fix bug where DTI.time returns a tz-aware Time instead of tz-naive pandas-dev#21267