DataFrame.asof() : Timezone Awareness / Naivety comparison TypeError (incorrect) #21194

Closed
emmet02 opened this issue May 24, 2018 · 6 comments · Fixed by #22198
Labels
Datetime (Datetime data dtype), Dtype Conversions (Unexpected or buggy dtype conversions), Indexing (Related to indexing on series/frames, not to indexes themselves)


emmet02 commented May 24, 2018

import pandas as pd

# Create some random Timestamps
timestamp1 = pd.Timestamp('2018-01-01 21:00:05.001+00:00')
timestamp2 = pd.Timestamp('2018-01-01 22:35:10.550+00:00')

# Get an internal timestamp, so asof should give us the lesser value
timestamp_internal = timestamp2 + ((timestamp2 - timestamp1) / 2)

# Create a DataFrame
df = pd.DataFrame(data=[1,2], index=[timestamp1, timestamp2])

# display it
df
                                  0
2018-01-01 21:00:05.001000+00:00  1
2018-01-01 22:35:10.550000+00:00  2

# Now call asof() and show the issue
df.asof(timestamp_internal)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py", line 6144, in asof
    locs = self.index.asof_locs(where, ~(nulls.values))
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\base.py", line 2489, in asof_locs
    result[(locs == 0) & (where < self.values[first])] = -1
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\datetimes.py", line 136, in wrapper
    self._assert_tzawareness_compat(other)
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\datetimes.py", line 672, in _assert_tzawareness_compat
    raise TypeError('Cannot compare tz-naive and tz-aware '
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects

# Demonstrate other errant behavior - Try using a tz-naive Timestamp to lookup the frame
df.asof(timestamp_internal.tz_localize(None))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py", line 6108, in asof
    if where < start:
  File "pandas\_libs\tslibs\timestamps.pyx", line 164, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
  File "pandas\_libs\tslibs\timestamps.pyx", line 224, in pandas._libs.tslibs.timestamps._Timestamp._assert_tzawareness_compat
TypeError: Cannot compare tz-naive and tz-aware timestamps
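
The second traceback comes from an ordering comparison between a tz-naive and a tz-aware Timestamp. A minimal sketch of that rule, outside of asof(), is below; the variable names are illustrative only, and the exact error message may differ between pandas versions.

```python
import pandas as pd

aware = pd.Timestamp("2018-01-01 21:00:05.001+00:00")
naive = aware.tz_localize(None)  # same wall-clock time, timezone dropped

# Ordering a tz-naive Timestamp against a tz-aware one raises TypeError.
try:
    naive < aware
    raised = False
except TypeError:
    raised = True

# Re-attaching the timezone makes the two comparable again.
relocalized = naive.tz_localize("UTC")
same_instant = relocalized == aware
```

Here raised ends up True and same_instant ends up True, because localizing the naive value back to UTC recovers the original instant.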

Problem description

Somehow the DataFrame index is losing the 'awareness' of the timezone of the original Timestamps.
I only noticed this after upgrading to the newest version of pandas, but I cannot say for certain whether a recent change to pandas caused it.

I am fairly confident (though not 100% certain) that this usage worked previously, but I am not in a position to test it right now.

EDIT: I have since tested this with pandas 0.22.0, where the expected output is produced.

Hopefully the bug is reproducible.

Expected Output

0 1
Name: 2018-01-01 23:22:43.324500+0000, dtype: int64
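
For reference, the lookup that asof() is expected to perform (the last row at or before a given timestamp) can be sketched with searchsorted on the tz-aware index. This is only an illustration of the expected semantics under stated assumptions, not how pandas implements asof(): last_row_at_or_before is a hypothetical helper, it does not handle missing values, and it has not been checked against pandas 0.23.0.

```python
import pandas as pd

t1 = pd.Timestamp("2018-01-01 21:00:05.001+00:00")
t2 = pd.Timestamp("2018-01-01 22:35:10.550+00:00")
df = pd.DataFrame(data=[1, 2], index=[t1, t2])


def last_row_at_or_before(frame, where):
    """Hypothetical helper: last row whose index label is <= where, else None."""
    # side="right" returns the insertion point after any label equal to `where`,
    # so subtracting one gives the position of the last label <= where.
    pos = frame.index.searchsorted(where, side="right") - 1
    if pos < 0:
        return None  # `where` precedes every label in the index
    return frame.iloc[pos]


# A timestamp strictly between the two index labels selects the first row.
midpoint = t1 + (t2 - t1) / 2
row = last_row_at_or_before(df, midpoint)
```

Unlike asof(), the returned row keeps the matched index label as its name rather than the timestamp that was looked up.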

Output of pd.show_versions()

pd.show_versions()
Matplotlib support failed
INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.13.3
scipy: 1.1.0
pyarrow: 0.7.0
xarray: 0.10.4
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Thanks for the report. It is indeed reproducible.

The bug is in

result[(locs == 0) & (where < self.values[first])] = -1

For a tz-aware DatetimeIndex, self.values is a tz-naive ndarray.

The simplest fix may be to change that comparison to where.values < self.values[first], so that both sides are plain ndarrays.
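
A small sketch of the point about .values, written against the public DatetimeIndex API rather than the internals of asof_locs. The names idx and where are illustrative; whether the mixed comparison raises on a given pandas version is recorded but not assumed.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(
    ["2018-01-01 21:00:05.001", "2018-01-01 22:35:10.550"], tz="UTC"
)
where = pd.DatetimeIndex(["2018-01-01 21:47:37.775"], tz="UTC")

# The index is tz-aware, but .values is a plain numpy datetime64 array
# holding UTC instants with no timezone attached.
index_is_aware = idx.tz is not None
values_are_ndarray = isinstance(idx.values, np.ndarray)
values_kind = idx.values.dtype.kind  # "M" is numpy's datetime64 kind

# Mixing a tz-aware index with a tz-naive numpy scalar is the comparison
# that raised in the traceback on pandas 0.23.0; other versions may differ.
try:
    where < idx.values[0]
    mixed_comparison_raised = False
except TypeError:
    mixed_comparison_raised = True

# Comparing ndarray against ndarray avoids the awareness check entirely.
before_first = where.values < idx.values[0]
```

Because both operands in the last line are numpy values, the comparison is done on the underlying UTC instants and yields an ordinary boolean array.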

TomAugspurger added the Datetime and Dtype Conversions labels May 24, 2018
TomAugspurger added this to the 0.23.1 milestone May 24, 2018
TomAugspurger added the Difficulty Intermediate and Indexing labels May 24, 2018
@msmarchena
Contributor

@TomAugspurger I have checked your solution and it does solve the problem. If you need some help, I can make the correction and implement the test.

@TomAugspurger
Contributor

TomAugspurger commented May 29, 2018 via email

@danielwang5

Any progress on this? 0.23 still doesn't work for me.

@TomAugspurger
Contributor

TomAugspurger commented Aug 3, 2018 via email

@msmarchena
Contributor

Sorry, I have been travelling and have not had time to work on this. Since I made a mess of PR #21284, I have opened a cleaner one: "Last try!"

jreback modified the milestones: 0.23.5, 0.24.0 Aug 6, 2018
5 participants