Read from HDF with empty where throws an error #26610

Closed
BeforeFlight opened this issue Jun 1, 2019 · 10 comments · Fixed by #26746
Labels
API Design IO HDF5 read_hdf, HDFStore
Milestone

Comments

@BeforeFlight
Contributor

BeforeFlight commented Jun 1, 2019

Code Sample

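import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4))

where = ''  # an empty condition, e.g. left over after building it programmatically
with pd.HDFStore('test.h5') as store:
    store.put('df', df, 't')  # 't' stores the frame in table format
    store.select('df', where=where)  # raises the SyntaxError shown below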

Problem description

I wanted to be able to construct the where condition "by hand" and save it for later, so I declared it as a variable. But sometimes the constructed where ends up empty, and the code throws an error.

Traceback (most recent call last):

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-101-48181c3b59fb>", line 6, in <module>
    store.select('df', where = where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 740, in select
    return it.get_result()

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 1518, in get_result
    results = self.func(self.start, self.stop, where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 733, in func
    columns=columns)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4254, in read
    if not self.read_axes(where=where, **kwargs):

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 3443, in read_axes
    self.selection = Selection(self, where=where, **kwargs)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4815, in __init__
    self.terms = self.generate(where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4828, in generate
    return Expr(where, queryables=q, encoding=self.table.encoding)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/pytables.py", line 548, in __init__
    self.terms = self.parse()

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 766, in parse
    return self._visitor.visit(self.expr)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 331, in visit
    return visitor(node, **kwargs)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 335, in visit_Module
    raise SyntaxError('only a single expression is allowed')

  File "<string>", line unknown
SyntaxError: only a single expression is allowed

Expected Output

When an empty string is passed to where, just select the whole DataFrame. This is easily achieved by changing the last statement to store.select('df', where = where if where else None). But it would be better to add this check inside pandas, so the user does not have to worry about it every time they select from HDF with where.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 5.0.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: 0.12.1
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@BeforeFlight BeforeFlight changed the title Read from HDF with empty where throws error Read from HDF with empty where throws an error Jun 1, 2019
@BeforeFlight BeforeFlight changed the title Read from HDF with empty where throws an error Read from HDF with empty where throws an error Jun 1, 2019
@WillAyd
Member

WillAyd commented Jun 2, 2019

where is documented as accepting a list, so if you use an empty list instead of the string you should be able to manage this the way you want.

@BeforeFlight
Contributor Author

BeforeFlight commented Jun 2, 2019

Yes, the API reference states that it accepts a list. But in the user guide all examples use where as a string. The user guide also states: "If a list/tuple of expressions is passed they will be combined via &". The latter may become a problem if one creates an empty where = [] and starts to populate it with conditions: all of them will be forced to be combined via '&' (not '|', as may be wished). So in that case one would end up putting a single combined condition inside the list, where = [condition].

But anyway, even here the problem is the same if where ends up as an empty list after all processing:

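import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4))

where = []  # an empty list of conditions
with pd.HDFStore('test.h5') as store:
    store.put('df', df, 't')  # 't' stores the frame in table format
    store.select('df', where=where)  # raises the same SyntaxError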

The same error is raised:

Traceback (most recent call last):

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-90-507edb4b117e>", line 6, in <module>
    store.select('df', where = where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 740, in select
    return it.get_result()

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 1518, in get_result
    results = self.func(self.start, self.stop, where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 733, in func
    columns=columns)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4254, in read
    if not self.read_axes(where=where, **kwargs):

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 3443, in read_axes
    self.selection = Selection(self, where=where, **kwargs)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4815, in __init__
    self.terms = self.generate(where)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/io/pytables.py", line 4828, in generate
    return Expr(where, queryables=q, encoding=self.table.encoding)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/pytables.py", line 548, in __init__
    self.terms = self.parse()

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 766, in parse
    return self._visitor.visit(self.expr)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 331, in visit
    return visitor(node, **kwargs)

  File "/home/beforeflight/Coding/Python/_venvs_/main/lib/python3.7/site-packages/pandas/core/computation/expr.py", line 335, in visit_Module
    raise SyntaxError('only a single expression is allowed')

  File "<string>", line unknown
SyntaxError: only a single expression is allowed
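
To illustrate the '&' versus '|' concern above, here is a minimal sketch of building the condition as a single string instead of a list. combine_conditions is a hypothetical helper, not part of pandas, and it assumes that where=None means "no filtering", so an empty set of conditions never reaches the expression parser.

```python
def combine_conditions(conditions, op="|"):
    """Join condition strings into one where expression, or None if empty.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if op not in ("&", "|"):
        raise ValueError("op must be '&' or '|'")
    # Drop empty or missing pieces so they never reach the parser.
    parts = [c.strip() for c in conditions if c and c.strip()]
    if not parts:
        return None  # assumption: None is treated as "no filtering"
    # Parenthesise each piece so operator precedence cannot change its meaning.
    return f" {op} ".join(f"({p})" for p in parts)


where = combine_conditions(["index < 1", "index > 2"])
print(where)  # (index < 1) | (index > 2)
```

The result could then be passed as store.select('df', where=where), which covers both the empty case and the "combine via '|'" case at the call site.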

@WillAyd WillAyd reopened this Jun 2, 2019
@WillAyd WillAyd added IO HDF5 read_hdf, HDFStore and removed Usage Question labels Jun 2, 2019
@WillAyd
Member

WillAyd commented Jun 2, 2019

Thanks for the additional references. If you'd like to take a look and clean up implementation / documentation PRs would certainly be welcome!

@TomAugspurger
Contributor

To make sure I understand, the proposed fix is for where=[] to be treated the same as where=None, i.e. no filtering?

@BeforeFlight
Contributor Author

BeforeFlight commented Jun 3, 2019

@TomAugspurger, if you are asking me, yes, I think it should be that way. Empty where=[] -> no filtering -> the whole df is returned.
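
The proposed semantics can be sketched at the call site as follows. normalize_where is a hypothetical name used only for illustration; this is not the pandas implementation, just one way to express "an empty where behaves like None". Treating None and '' entries inside a list as empty is an extra assumption of this sketch.

```python
def normalize_where(where):
    """Map an "empty" where to None; pass anything else through.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    if where is None:
        return None
    if isinstance(where, str):
        # '' or whitespace-only strings carry no condition.
        return where if where.strip() else None
    if isinstance(where, (list, tuple)):
        # Assumption: None and '' entries are placeholders to be dropped.
        kept = [w for w in where if w is not None and w != ""]
        return kept if kept else None
    # Any other object (e.g. an already-built term) is left untouched.
    return where


print(normalize_where(""))             # None
print(normalize_where([]))             # None
print(normalize_where(["index > 1"]))  # ['index > 1']
```

With this, store.select('df', where=normalize_where(where)) would select the whole frame whenever the constructed condition turns out to be empty. Note that a tuple input comes back as a list in this sketch.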

@TomAugspurger
Contributor

TomAugspurger commented Jun 3, 2019 via email

@BeforeFlight
Contributor Author

BeforeFlight commented Jun 3, 2019

@TomAugspurger as I explained in the neighbouring thread on the groupby issue, I just don't know how to do it correctly.

@WillAyd
Member

WillAyd commented Jun 3, 2019

@BeforeFlight we have a contributing guide which could be helpful:

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#testing-with-continuous-integration

If you would like to try but run into specific issues we are of course here to help. You can also use Gitter for development questions

@WillAyd WillAyd added this to the Contributions Welcome milestone Jun 3, 2019
@BeforeFlight
Contributor Author

@WillAyd well, I would like to. But I will start only tomorrow (it is 2pm here now). If this API design proposal can serve for my 'GitHub environment understanding' for some time, I would like to try for sure.

@BeforeFlight
Contributor Author

BeforeFlight commented Jun 7, 2019

I'm not sure how I should test it. After putting my test into the pandas context, I have the following for now:

def test_empty_where_lst():
    with tm.ensure_clean() as path:
        df = pd.DataFrame([[1, 2, 3], [1, 2, 3]])
        with pd.HDFStore(path) as store:
            store.put("df", df, "t")
            store.select("df", where=[])

But this code raises a very specific exception, SyntaxError. So should I decorate the function with @pytest.mark.xfail(raises=SyntaxError), to be more explicit about which exception is expected?

The reason I'm asking is that checking for exceptions is discouraged.

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Jun 10, 2019
4 participants