DataFrame.asof() : Timezone Awareness / Naivety comparison TypeError (incorrect) #21194

Closed
emmet02 opened this issue May 24, 2018 · 6 comments

5 participants
@emmet02

commented May 24, 2018

import pandas as pd

# Create some random Timestamps
timestamp1 = pd.Timestamp('2018-01-01 21:00:05.001+00:00')
timestamp2 = pd.Timestamp('2018-01-01 22:35:10.550+00:00')

# Get a timestamp between the two, so asof should return the earlier row
timestamp_internal = timestamp1 + ((timestamp2 - timestamp1) / 2)

# Create a DataFrame
df = pd.DataFrame(data=[1, 2], index=[timestamp1, timestamp2])

# Display it
df
                                  0
2018-01-01 21:00:05.001000+00:00  1
2018-01-01 22:35:10.550000+00:00  2

# Now call asof() and show the issue
df.asof(timestamp_internal)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py", line 6144, in asof
    locs = self.index.asof_locs(where, ~(nulls.values))
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\base.py", line 2489, in asof_locs
    result[(locs == 0) & (where < self.values[first])] = -1
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\datetimes.py", line 136, in wrapper
    self._assert_tzawareness_compat(other)
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\datetimes.py", line 672, in _assert_tzawareness_compat
    raise TypeError('Cannot compare tz-naive and tz-aware '
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects

# Demonstrate related behavior: try using a tz-naive Timestamp to look up the frame
df.asof(timestamp_internal.tz_localize(None))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py", line 6108, in asof
    if where < start:
  File "pandas\_libs\tslibs\timestamps.pyx", line 164, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
  File "pandas\_libs\tslibs\timestamps.pyx", line 224, in pandas._libs.tslibs.timestamps._Timestamp._assert_tzawareness_compat
TypeError: Cannot compare tz-naive and tz-aware timestamps

Problem description

Somehow the DataFrame index seems to lose the timezone awareness of the original Timestamps.
I only noticed this after upgrading to the newest version of pandas, but I cannot say for sure whether a recent change in pandas caused it.

I am fairly confident (though not certain) that this usage worked previously, but I am not in a position to test it right now.

EDIT: I have tested this with pandas 0.22.0, and there it produces the expected output.

Hopefully the bug is reproducible.

Expected Output

0 1
Name: 2018-01-01 21:47:37.775500+0000, dtype: int64
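Until a fixed release is available, one possible workaround is to strip the timezone from both the index and the lookup key, so asof only ever compares naive values. This is a sketch that assumes UTC semantics are acceptable for your lookups: tz_convert(None) converts to UTC and then drops the timezone, for both a DataFrame index and a Timestamp. The names midpoint, naive_df and row are hypothetical.

```python
import pandas as pd

timestamp1 = pd.Timestamp("2018-01-01 21:00:05.001+00:00")
timestamp2 = pd.Timestamp("2018-01-01 22:35:10.550+00:00")
# A lookup key strictly between the two index entries
midpoint = timestamp1 + (timestamp2 - timestamp1) / 2

df = pd.DataFrame(data=[1, 2], index=[timestamp1, timestamp2])

# Convert the tz-aware index to naive UTC, and do the same for the key,
# so every comparison inside asof is naive-vs-naive.
naive_df = df.tz_convert(None)
row = naive_df.asof(midpoint.tz_convert(None))

print(row)
```

The returned row is labelled with the naive key; call .tz_localize("UTC") on row.name if the aware timestamp is needed again.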

Output of pd.show_versions()


pd.show_versions()
Matplotlib support failed
INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.13.3
scipy: 1.1.0
pyarrow: 0.7.0
xarray: 0.10.4
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger

Contributor

commented May 24, 2018

Thanks for the report. It is indeed reproducible.

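The bug is in this line of Index.asof_locs:

result[(locs == 0) & (where < self.values[first])] = -1

For a tz-aware DatetimeIndex, self.values is a tz-naive datetime64[ns] ndarray (holding the UTC times), so comparing it with the tz-aware where raises.

The simplest fix may be to make that comparison where.values < self.values[first].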
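A minimal sketch of the mismatch and of the suggested comparison. It is written against a recent pandas release rather than 0.23.0, assuming (as current versions do) that ordering a tz-aware DatetimeIndex against a naive datetime64 still raises TypeError. The names idx, where and naive_first are illustrative only; they stand in for self, where and self.values[first] in asof_locs.

```python
import pandas as pd

# tz-aware index, as in the report
idx = pd.DatetimeIndex([
    pd.Timestamp("2018-01-01 21:00:05.001+00:00"),
    pd.Timestamp("2018-01-01 22:35:10.550+00:00"),
])
# tz-aware lookup key wrapped in an Index, like `where` in asof_locs
where = pd.DatetimeIndex([pd.Timestamp("2018-01-01 21:47:37.7755+00:00")])

# .values drops the timezone: a naive datetime64 ndarray of UTC times
naive_first = idx.values[0]

# Comparing the aware `where` with the naive value is what fails
try:
    _ = where < naive_first
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Suggested fix: take .values on both sides, so naive UTC meets naive UTC
fixed = where.values < naive_first

print(mixed_ok, fixed)
```

Taking .values on both sides works because both operands are then naive datetime64 values in UTC, so the ordering is consistent.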

@msmarchena

Contributor

commented May 29, 2018

@TomAugspurger I have checked your solution and it does solve the problem. If you need help, I can make the correction and add a test.

@TomAugspurger

Contributor

commented May 29, 2018

@jreback modified the milestones: 0.23.1, 0.23.2 Jun 7, 2018

@jreback modified the milestones: 0.23.2, 0.23.3 Jun 26, 2018

@jreback modified the milestones: 0.23.4, 0.24.0, 0.23.5 Aug 2, 2018

@danielwang5


commented Aug 3, 2018

Any progress on this? 0.23 still doesn't work for me.

@TomAugspurger

Contributor

commented Aug 3, 2018

@msmarchena

Contributor

commented Aug 4, 2018

Sorry, I have been travelling and have not had time to work on this. Since I made a mess of PR #21284, I have opened a cleaner one: "Last try!"

@jreback modified the milestones: 0.23.5, 0.24.0 Aug 6, 2018
