Slicing DatetimeIndex should be timezone aware #24076

Closed
ismell opened this issue Dec 3, 2018 · 6 comments · Fixed by #25263
Labels
Datetime, Indexing, Timezones
Milestone
0.25.0

Comments


ismell commented Dec 3, 2018

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Minute-frequency index with a fixed UTC-07:00 offset
idx = pd.date_range(start='2018-12-02 14:50:00-07:00', end='2018-12-03 03:11:00-07:00', freq='1min')

df = pd.DataFrame(np.random.randn(len(idx), 1), index=idx, columns=['A'])

df['2018-12-02 21:53:34+00:00':'2018-12-02 22:53:54+00:00'].head()
#                                  A
# 2018-12-02 21:54:00-07:00 -0.647961
# 2018-12-02 21:55:00-07:00  0.112683
# 2018-12-02 21:56:00-07:00 -1.221521
# 2018-12-02 21:57:00-07:00  0.052644
# 2018-12-02 21:58:00-07:00  1.799572

Problem description

When slicing a DatetimeIndex using a timestamp string that carries a timezone offset, the offset is not used as part of the comparison.

Expected Output

df[(df.index >= '2018-12-02 21:53:34+00:00') & (df.index < '2018-12-02 22:53:54+00:00')].head()
#                                  A
# 2018-12-02 14:54:00-07:00  0.578494
# 2018-12-02 14:55:00-07:00  0.170471
# 2018-12-02 14:56:00-07:00  1.078600
# 2018-12-02 14:57:00-07:00  0.376983
# 2018-12-02 14:58:00-07:00 -1.295243
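
One way to sidestep the string parsing entirely is to build the bounds as tz-aware `pd.Timestamp` objects and compare instants directly. The following is a minimal sketch, assuming a pandas version that provides `pd.date_range` and `Timestamp.tz_convert`; the variable names `start`, `end`, and `subset` are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

# Same shape as the report: minute frequency, fixed UTC-07:00 offset.
idx = pd.date_range(start="2018-12-02 14:50:00-07:00",
                    end="2018-12-03 03:11:00-07:00", freq="1min")
df = pd.DataFrame(np.random.randn(len(idx), 1), index=idx, columns=["A"])

# Parse the bounds as tz-aware Timestamps, then express them in the index's
# zone so the comparison is between instants, not wall-clock digits.
start = pd.Timestamp("2018-12-02 21:53:34+00:00").tz_convert(idx.tz)
end = pd.Timestamp("2018-12-02 22:53:54+00:00").tz_convert(idx.tz)

# Boolean mask equivalent to the expression in "Expected Output".
subset = df[(df.index >= start) & (df.index < end)]
print(subset.index[0], subset.index[-1])
```

Because the bounds are already `Timestamp` objects, no offset information can be lost while parsing a string, which is the step this issue is about.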

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.17.0-3rodete2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.6.2
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jbrockmendel (Member) commented

@jorisvandenbossche a while back there were a couple of threads about slicing conventions with strings and tz-awareness. I think this is orthogonal, but it merits another pair of eyes.

gfyoung added the Datetime, Algos, and Timezones labels Dec 6, 2018
jorisvandenbossche (Member) commented

@jbrockmendel do you remember where this discussion was? From a search, I found possibly related discussion in #18435.

I would certainly consider this a bug (but I also have the feeling we have had issues about this before).


1kastner commented Dec 6, 2018

See #16785 and related issues: the timezone information is simply dropped during the parsing process; see the related pull request.
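
To make the effect concrete, here is a small standard-library sketch of the difference between converting an offset-bearing bound into the index's zone and dropping the offset. It only illustrates the symptom; it is not pandas' actual parsing code, and the names `index_tz`, `aware`, and `dropped` are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

index_tz = timezone(timedelta(hours=-7))  # the index's fixed UTC-07:00 offset
bound = datetime.fromisoformat("2018-12-02 21:53:34+00:00")

# Offset-aware handling: convert the instant into the index's zone.
aware = bound.astimezone(index_tz)

# Offset-dropping handling: keep the wall-clock digits and relabel them.
dropped = bound.replace(tzinfo=index_tz)

print(aware.isoformat())    # 2018-12-02T14:53:34-07:00
print(dropped.isoformat())  # 2018-12-02T21:53:34-07:00
print(dropped - aware)      # 7:00:00
```

The seven-hour gap matches the report: the slice starts at 21:54-07:00 instead of the expected 14:54-07:00.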

mroeschke (Member) commented

This looks fixed in master; could use a test.

In [3]: pd.__version__
Out[3]: '0.24.0.dev0+1342.g3e0358d86'

In [4]: df[(df.index >= '2018-12-02 21:53:34+00:00') & (df.index < '2018-12-02 22:53:54+00:00')].head()
Out[4]:
                                  A
2018-12-02 14:54:00-07:00 -0.962576
2018-12-02 14:55:00-07:00  0.324211
2018-12-02 14:56:00-07:00 -0.412128
2018-12-02 14:57:00-07:00 -0.631707
2018-12-02 14:58:00-07:00 -0.594750

mroeschke added the good first issue and Needs Tests labels and removed the Algos label Dec 23, 2018

1kastner commented Jan 1, 2019

The changes have not affected #16785 - I just hoped we could have been lucky ;-)

mroeschke (Member) commented

Oops, I think I was looking at the incorrect example; this still looks broken:

In [7]: df['2018-12-02 21:53:34+00:00':'2018-12-02 22:53:54+00:00'].head()
   ...:
Out[7]:
                                  A
2018-12-02 21:54:00-07:00 -0.659843
2018-12-02 21:55:00-07:00 -0.802792
2018-12-02 21:56:00-07:00 -1.045828
2018-12-02 21:57:00-07:00  0.059644
2018-12-02 21:58:00-07:00 -2.181424
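
A regression check for the slice form (as opposed to the boolean-mask form) could be sketched as follows. This is an illustrative check, not the test added to pandas; the helper name `check_offset_string_slice` is hypothetical, and the assertion is expected to hold only on a pandas release that includes the fix (this issue is marked as fixed by #25263 on the 0.25.0 milestone).

```python
import numpy as np
import pandas as pd


def check_offset_string_slice():  # hypothetical helper name
    idx = pd.date_range(start="2018-12-02 14:50:00-07:00",
                        end="2018-12-03 03:11:00-07:00", freq="1min")
    # Deterministic values so the two selections can be compared exactly.
    df = pd.DataFrame(np.arange(len(idx)), index=idx, columns=["A"])

    # Slice form from the report: string bounds carrying a UTC offset.
    sliced = df.loc["2018-12-02 21:53:34+00:00":"2018-12-02 22:53:54+00:00"]

    # Reference selection built from tz-aware Timestamps (inclusive bounds,
    # mirroring label-based slicing).
    lo = pd.Timestamp("2018-12-02 21:53:34+00:00")
    hi = pd.Timestamp("2018-12-02 22:53:54+00:00")
    expected = df[(df.index >= lo) & (df.index <= hi)]

    # Compare labels and values; Index.equals does not compare freq.
    assert sliced.index.equals(expected.index)
    assert (sliced["A"].to_numpy() == expected["A"].to_numpy()).all()
    return sliced


result = check_offset_string_slice()
print(result.index[0], result.index[-1])
```

On a version that still ignores the offset, the first assertion fails because the slice starts at 21:54-07:00 rather than 14:54-07:00.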

mroeschke added the Indexing label and removed the Needs Tests and good first issue labels Jan 5, 2019
jreback added this to the 0.25.0 milestone Feb 11, 2019