Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix HDFStore empty keys on native HDF5 file by adding keyword include #32723

Merged

Conversation

roberthdevries
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented Mar 15, 2020

Hello @roberthdevries! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-14 21:02:12 UTC

@roberthdevries roberthdevries changed the title Fix 29916 by adding keyword include Fix HDFStore empty keys on native HDF5 file by adding keyword include Mar 15, 2020
@jreback jreback added IO HDF5 read_hdf, HDFStore API - Consistency Internal Consistency of API/Behavior labels Mar 16, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine. does this solve the OP? meaning folks simply want to use .keys() to get all the tables, then use .get() to read them?

if include == "pandas":
return [n._v_pathname for n in self.groups()]

if include == "native":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use elif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really needed in this case as the previous if returns in all cases. Why would that help us?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change this (its more idiomatic)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

pandas/tests/io/pytables/test_store.py Show resolved Hide resolved
pandas/io/pytables.py Show resolved Hide resolved
@@ -580,16 +580,37 @@ def __enter__(self):
def __exit__(self, exc_type, exc_value, traceback):
self.close()

def keys(self) -> List[str]:
def keys(self, include: Optional[str] = "pandas") -> List[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not Optional, just str (alternatively, we could actually do Optiona[str] = None and then allow passing include='native')

but find either way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the Optional annotation

Parameters
----------

.. versionadded:: 1.1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on other PR - put below include and indent one level (I think will fix CI)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@roberthdevries roberthdevries force-pushed the fix-29916-by-adding-keyword-include branch from 3e114be to 020fc5b Compare March 18, 2020 19:51
@roberthdevries roberthdevries changed the title Fix HDFStore empty keys on native HDF5 file by adding keyword include BUG: Fix HDFStore empty keys on native HDF5 file by adding keyword include Mar 18, 2020
@roberthdevries
Copy link
Contributor Author

looks fine. does this solve the OP? meaning folks simply want to use .keys() to get all the tables, then use .get() to read them?

We could also merge the two sets of tables. I think pandas does not use native HDF5 tables to store DataFrames. In that case the two sets are disjoint and we should be fine returning the joined set.

@roberthdevries
Copy link
Contributor Author

I tried the variant with returning the combination of keys of pandas tables and HDF5 native tables, and the unit test pandas/tests/io/pytables/test_store.py::TestHDFStore::test_copy fails. I think the solution we have now helps the dask developers at least to have a work-around solution, that can work on both older and newer pandas versions by checking for the pandas version.

@roberthdevries roberthdevries force-pushed the fix-29916-by-adding-keyword-include branch 3 times, most recently from 4ee5d7e to ba1a8ca Compare March 24, 2020 20:42
@roberthdevries roberthdevries force-pushed the fix-29916-by-adding-keyword-include branch 2 times, most recently from 83bce97 to 151aa4f Compare April 2, 2020 12:24
@roberthdevries roberthdevries force-pushed the fix-29916-by-adding-keyword-include branch from 151aa4f to 8ce3475 Compare April 19, 2020 12:31
@st-bender
Copy link

I tried the variant with returning the combination of keys of pandas tables and HDF5 native tables, and the unit test pandas/tests/io/pytables/test_store.py::TestHDFStore::test_copy fails.

Hi there, this is interesting, how did this test pass before the keys() method changed, i.e. it used to return all the keys in pandas < 0.24?

I think the solution we have now helps the dask developers at least to have a work-around solution, that can work on both older and newer pandas versions by checking for the pandas version.

Hopefully, thanks for working on it.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, pls merge master and ping on green

if include == "pandas":
return [n._v_pathname for n in self.groups()]

if include == "native":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change this (its more idiomatic)

An additional kind parameter has been added that defaults to pandas
original behavior, but with 'tables' value gives you the list of
non-pandas tables in the file
- improve type annotation of the HDFStore.keys method
- minor improvement in ValueError string
- minor improvement in doc-string
@roberthdevries roberthdevries force-pushed the fix-29916-by-adding-keyword-include branch from 8ce3475 to 51d632d Compare June 14, 2020 20:21
@jreback jreback added this to the 1.1 milestone Jun 14, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you need this also in .groups()?

include : str, default 'pandas'
When kind equals 'pandas' return pandas objects
When kind equals 'native' return native HDF5 Table objects
Otherwise fail with a ValueError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is not needed (the otherwise fail)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@roberthdevries
Copy link
Contributor Author

don't you need this also in .groups()?

Could be added there as well, although that use case is less typical. Most users are interested in the tables in the file.
However, if this question pops up, I'll be happy to create a PR for that.

@roberthdevries
Copy link
Contributor Author

@jreback tests should have run fine, except that one of the tests took too long I think. So this is my ping on sort of green

@jreback jreback merged commit 4f625a2 into pandas-dev:master Jun 14, 2020
@jreback
Copy link
Contributor

jreback commented Jun 14, 2020

thanks @roberthdevries

@jamesbooker
Copy link

thanks @roberthdevries and @jreback for resolving this issue for keys, it will help my project a lot.

@roberthdevries the source files coming into my system use groups (not created by pandas) so I would need the same fix applying to .groups() as well as .keys() if at all possible.

@roberthdevries roberthdevries deleted the fix-29916-by-adding-keyword-include branch June 16, 2020 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior IO HDF5 read_hdf, HDFStore
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HDF5: empty groups and keys
6 participants