Erratic behavior using query() and timedelta #23420

Open
jsevo opened this issue Oct 30, 2018 · 7 comments
Labels
Bug expressions pd.eval, query Timedelta Timedelta data type

Comments

@jsevo

jsevo commented Oct 30, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
d = pd.DataFrame([pd.Timedelta(5,'D')], columns=['dt'])
d.query('dt =="5 days"', engine='numexpr')
# works: 
#      dt
# 0 5 days

d.query('dt =="4 days"', engine='numexpr')
# works:
#Empty DataFrame
#Columns: [dt]
#Index: []

d.query('dt <"5 days"', engine='numexpr')
# Does not work
# ValueError: unknown type timedelta64[ns]

Problem description

Using query() on timedelta values behaves erratically, although it was working before. I can't see why less-than and greater-than comparisons in query() suddenly fail while == still works.

Expected Output

import pandas as pd
d = pd.DataFrame([pd.Timedelta(5,'D')], columns=['dt'])


d.query('dt <"5 days"')
#EXPECTED (5 days is not less than 5 days, so no rows match):
#Empty DataFrame
#Columns: [dt]
#Index: []

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 39.2.0
Cython: 0.28.3
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.7.0

@gfyoung gfyoung added Timeseries Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Oct 31, 2018
@gfyoung
Member

gfyoung commented Oct 31, 2018

@jsevo : Thanks for reporting this! You said this was working before - which version was this?

cc @jreback @mroeschke

@jsevo
Author

jsevo commented Nov 1, 2018

@gfyoung I had been running the same analysis over the last few days in a single long-lived Python session. Then some updates were due on my Mac and I had to reboot. I re-ran the same scripts on the same data and observed this behavior. I then checked my pandas version; I am fairly certain I already had the latest installed, but ran pip install --upgrade pandas anyway. I wish I had retained the version info from before to be 100% certain, sorry.

Perhaps what I describe above is not supposed to work at all? Then it's still puzzling that == does.

@gfyoung
Member

gfyoung commented Nov 1, 2018

No worries. It sounds pretty recent, so perhaps you can try installing progressively older versions of pandas via pip and see at which one your scripts start working again?

@jsevo
Author

jsevo commented Nov 1, 2018

Oh right - no worries. I refactored my code like so:

#before, which breaks
d.query('dt <"5 days"')
#now 
d.loc[d['dt'] < pd.Timedelta(5, 'D')]

I think query looks nicer, but not if it is not meant to work with Timedelta. Thanks for your help. Or would you like me to try the older versions? I am busy this week, but could look into it on Sunday.
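
The boolean-mask form above does not go through pd.eval, so it should not depend on which query engine is installed. A minimal sketch on a small made-up frame (the three-row data and the names frame, limit, and below are illustrative assumptions, not part of the original report):

```python
import pandas as pd

# Illustrative data (an assumption, not from the original report).
frame = pd.DataFrame({"dt": pd.to_timedelta([3, 5, 7], unit="D")})
limit = pd.Timedelta(5, "D")

# Plain boolean indexing: no query string is parsed or evaluated.
below = frame.loc[frame["dt"] < limit]
print(below)
```

Only the 3-day row is strictly below the limit here, so below holds a single row.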

@gfyoung
Member

gfyoung commented Nov 1, 2018

Funny thing, I can't reproduce the failure. I tried it with 0.23.4 and on master.

@jreback @mroeschke : Are you able to reproduce by any chance?

@gfyoung
Member

gfyoung commented Nov 1, 2018

So I'm able to reproduce, but you need to have numexpr installed. Just not sure which pandas version you had installed though...

@jsevo : To work around this, pass in engine='python'.
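
A rough sketch of that workaround, reusing the frame from the report. When engine is left unset, pandas picks numexpr if it is importable and falls back to the Python engine otherwise, which would explain why the failure only appears with numexpr installed. The variable names has_numexpr, strictly_less, and less_or_equal are illustrative:

```python
import importlib.util

import pandas as pd

d = pd.DataFrame([pd.Timedelta(5, "D")], columns=["dt"])

# The failure in this thread only shows up when numexpr is importable,
# because query() then uses it as the default engine.
has_numexpr = importlib.util.find_spec("numexpr") is not None
print("numexpr installed:", has_numexpr)

# Forcing the Python engine sidesteps numexpr for these comparisons.
strictly_less = d.query('dt < "5 days"', engine="python")
less_or_equal = d.query('dt <= "5 days"', engine="python")
print(strictly_less)
print(less_or_equal)
```

With the single 5-day row, the strict comparison should match nothing and the inclusive one should return the row.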

@gfyoung gfyoung added the Compat pandas objects compatability with Numpy or Python functions label Nov 1, 2018
@gfyoung
Member

gfyoung commented Nov 1, 2018

I went back a couple of years' worth of releases, but I can't seem to find a combination of pandas and numexpr that gets the example to work... very strange...

@jsevo : if you're able to retrieve your old environment information, that would be great!
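
For anyone wanting to keep a record of an environment before upgrading, one possible sketch is below. It uses only the standard library; the helper name installed_versions and the default package list are assumptions for illustration. pd.show_versions() produces the fuller report format quoted at the top of this issue.

```python
from importlib import metadata


def installed_versions(packages=("pandas", "numexpr", "numpy")):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


print(installed_versions())
```

Saving this output next to a script's results makes it possible to rebuild the same pandas and numexpr combination later.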

@jbrockmendel jbrockmendel added the expressions pd.eval, query label Oct 22, 2019
@mroeschke mroeschke added Bug and removed Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Apr 1, 2020
@mroeschke mroeschke added Timedelta Timedelta data type and removed Compat pandas objects compatability with Numpy or Python functions Timeseries labels Apr 10, 2020