Read from HDF with empty `where` throws an error #26610

Closed
BeforeFlight opened this issue Jun 1, 2019 · 10 comments

Comments

@BeforeFlight
Contributor

commented Jun 1, 2019

Code Sample

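import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4))

where = ''  # empty condition, e.g. nothing was added while building it
with pd.HDFStore('test.h5') as store:
    store.put('df', df, format='table')  # 't' is shorthand for the table format
    store.select('df', where=where)  # raises SyntaxError on pandas 0.24.2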

Problem description

I want to build the `where` condition by hand and save it in a variable for later use. Sometimes the constructed `where` ends up empty, and then the code raises an error:

Traceback (most recent call last):

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-101-48181c3b59fb>", line 6, in <module>
    store.select('df', where = where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 740, in select
    return it.get_result()

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 1518, in get_result
    results = self.func(self.start, self.stop, where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 733, in func
    columns=columns)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4254, in read
    if not self.read_axes(where=where, **kwargs):

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 3443, in read_axes
    self.selection = Selection(self, where=where, **kwargs)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4815, in __init__
    self.terms = self.generate(where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4828, in generate
    return Expr(where, queryables=q, encoding=self.table.encoding)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/pytables.py", line 548, in __init__
    self.terms = self.parse()

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 766, in parse
    return self._visitor.visit(self.expr)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 331, in visit
    return visitor(node, **kwargs)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 335, in visit_Module
    raise SyntaxError('only a single expression is allowed')

  File "<string>", line unknown
SyntaxError: only a single expression is allowed

Expected Output

When an empty string is passed to `where`, the whole DataFrame should simply be selected. This is easy to achieve by changing the last statement to store.select('df', where = where if where else None). But it would be better to add this check inside pandas, so that users do not have to worry about it every time they select from HDF with `where`.
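
The workaround can be wrapped in a small helper so that each call site does not have to repeat the check. This is a minimal sketch, not pandas API: `normalize_where` is a hypothetical name, and it assumes that any empty string, list, or tuple should mean "no filtering".

```python
def normalize_where(where):
    """Return None for an empty `where`, otherwise return it unchanged.

    Hypothetical helper (not part of pandas): it only encodes the
    `where if where else None` workaround from this report.
    """
    # Empty strings, lists and tuples are all falsy in Python.
    if isinstance(where, (str, list, tuple)) and not where:
        return None
    return where


print(normalize_where(""))           # None
print(normalize_where([]))           # None
print(normalize_where("index > 1"))  # index > 1
```

With such a helper, the call from the code sample would become `store.select('df', where=normalize_where(where))`.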

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 5.0.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: 0.12.1
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@BeforeFlight changed the title from "Read from HDF with empty where throws error" to "Read from HDF with empty where throws an error" Jun 1, 2019

@BeforeFlight changed the title from "Read from HDF with empty where throws an error" to "Read from HDF with empty `where` throws an error" Jun 1, 2019

@WillAyd

Member

commented Jun 2, 2019

`where` is documented as accepting a list, so if you use an empty list instead of the string you should be able to manage this the way you want.

@WillAyd closed this Jun 2, 2019

@BeforeFlight

Contributor Author

commented Jun 2, 2019

Yes, the API reference states that it accepts a list. But all the examples in the user guide pass `where` as a string. The user guide also states: "If a list/tuple of expressions is passed they will be combined via &". The latter can become a problem if one creates an empty where = [] and starts populating it with conditions: all of them are forced to be combined via '&' (not '|', as may be wished). So in that case one would end up amending a single condition inside a where = [condition] list.
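
One way to keep collecting conditions in a list while still choosing the operator is to join them into a single string before passing it on. A minimal sketch, assuming each condition is already a valid `where` expression string and that `|` and parentheses are accepted in such expressions, as the user guide describes; `combine_conditions` is a hypothetical helper, not pandas API:

```python
def combine_conditions(conditions, op="|"):
    """Join condition strings into one expression, or return None if empty.

    Hypothetical helper, not part of pandas. Each condition is wrapped in
    parentheses so that operator precedence inside it is preserved.
    """
    if op not in ("&", "|"):
        raise ValueError("op must be '&' or '|'")
    if not conditions:
        return None  # nothing to filter on
    return f" {op} ".join(f"({c})" for c in conditions)


print(combine_conditions(["index < 1", "index > 2"]))
# (index < 1) | (index > 2)
print(combine_conditions([]))
# None
```

The result is a single string (or None), so the list form and its implicit '&' are never involved.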

But even then the problem is the same. If `where` ends up as an empty list after all processing:

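import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4))

where = []  # no conditions were collected
with pd.HDFStore('test.h5') as store:
    store.put('df', df, format='table')  # 't' is shorthand for the table format
    store.select('df', where=where)  # raises SyntaxError on pandas 0.24.2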

The same error is raised:

Traceback (most recent call last):

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-90-507edb4b117e>", line 6, in <module>
    store.select('df', where = where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 740, in select
    return it.get_result()

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 1518, in get_result
    results = self.func(self.start, self.stop, where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 733, in func
    columns=columns)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4254, in read
    if not self.read_axes(where=where, **kwargs):

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 3443, in read_axes
    self.selection = Selection(self, where=where, **kwargs)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4815, in __init__
    self.terms = self.generate(where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4828, in generate
    return Expr(where, queryables=q, encoding=self.table.encoding)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/pytables.py", line 548, in __init__
    self.terms = self.parse()

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 766, in parse
    return self._visitor.visit(self.expr)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 331, in visit
    return visitor(node, **kwargs)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 335, in visit_Module
    raise SyntaxError('only a single expression is allowed')

  File "<string>", line unknown
SyntaxError: only a single expression is allowed

@WillAyd reopened this Jun 2, 2019

@WillAyd added the IO HDF5 label and removed the Usage Question label Jun 2, 2019

@WillAyd

Member

commented Jun 2, 2019

Thanks for the additional references. If you'd like to take a look and clean up the implementation / documentation, PRs would certainly be welcome!

@TomAugspurger

Contributor

commented Jun 3, 2019

To make sure I understand, the proposed fix is for where=[] to be treated the same as where=None, i.e. no filtering?

@BeforeFlight

Contributor Author

commented Jun 3, 2019

@TomAugspurger, if you are asking me, yes, I think it should be that way. Empty where=[] -> no filtering -> the whole df is returned.

@TomAugspurger

Contributor

commented Jun 3, 2019

@BeforeFlight

Contributor Author

commented Jun 3, 2019

@TomAugspurger as I explained in the neighbouring thread about the groupby issue, I just don't know how to do it correctly.

@WillAyd

Member

commented Jun 3, 2019

@BeforeFlight we have a contributing guide which could be helpful:

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#testing-with-continuous-integration

If you would like to try but run into specific issues, we are of course here to help. You can also use Gitter for development questions.

@WillAyd added this to the Contributions Welcome milestone Jun 3, 2019

@WillAyd added the API Design label Jun 3, 2019

@BeforeFlight

Contributor Author

commented Jun 3, 2019

@WillAyd well, I would like to, but I will only start tomorrow (it is 2pm here now). If this API design proposal can serve as a way for me to get to know the GitHub workflow for a while, I would certainly like to try.

@BeforeFlight

Contributor Author

commented Jun 7, 2019

I'm not sure how I should test it. After putting my test into the pandas test context, I have the following for now:

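import pandas as pd
import pandas.util.testing as tm  # moved to pandas._testing in later releases


def test_empty_where_lst():
    with tm.ensure_clean() as path:
        df = pd.DataFrame([[1, 2, 3], [1, 2, 3]])
        with pd.HDFStore(path) as store:
            store.put("df", df, "t")
            # Currently raises SyntaxError: only a single expression is allowed
            store.select("df", where=[])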

But this code raises a very specific exception, SyntaxError. Should I decorate the function with @pytest.mark.xfail(raises=SyntaxError), to be more explicit about which exception is expected?

I'm asking because checking for exceptions is discouraged.
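
One alternative to marking the test with xfail is to assert the behaviour requested in this issue, so that the test fails while the problem exists and passes once an empty `where` is treated as no filter. A hedged sketch under that assumption: `check_empty_where_selects_everything` is a hypothetical name, the function takes a file path rather than using pandas' internal test helpers, and it needs PyTables installed to run.

```python
import pandas as pd


def check_empty_where_selects_everything(path, where):
    """Write a small frame to HDF5 and read it back with an empty `where`.

    Hypothetical check, not an existing pandas test. It asserts the
    behaviour proposed in this issue, so it raises SyntaxError on pandas
    versions that still have the reported problem.
    """
    df = pd.DataFrame([[1, 2, 3], [1, 2, 3]])
    with pd.HDFStore(path) as store:
        store.put("df", df, format="table")
        result = store.select("df", where=where)
    # An empty `where` should behave like where=None: no rows filtered out.
    pd.testing.assert_frame_equal(result, df)
```

Under pytest this could be driven by `@pytest.mark.parametrize("where", ["", []])` together with the `tmp_path` fixture, which covers both the empty string and the empty list from this thread.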
