ENH: #10143 Function to walk the group hierarchy of a PyTables HDF5 file #10932

Closed
wants to merge 3 commits into from

Conversation

stephenpascoe

closes #10143

This implementation is inspired by os.walk and follows the interface as much as possible.

Contributor

jreback commented Aug 30, 2015

any reason not to call this just walk?

jreback added the API Design and IO HDF5 (read_hdf, HDFStore) labels Aug 30, 2015
jreback added this to the 0.17.0 milestone Aug 30, 2015
Contributor

jreback commented Aug 30, 2015

pls add a note in whatsnew/0.17.0 in enhancements

    if (getattr(child._v_attrs, 'CLASS', None) == 'GROUP'
            and pandas_type is None):
        groups.append(child._v_name)
    elif pandas_type == 'frame':
Contributor


Yield ANY pandas_type objects, so you don't need this complicated if.

@stephenpascoe
Author

I've renamed the function to walk() and included Series objects. However, I think you have a different API in mind. I am deliberately not yielding each pandas object individually. Instead, I yield each PyTables group path together with the names of its subgroups and of the pandas objects it contains. This follows the os.walk API, i.e. each yield is

(group_path, [subgroup_name, ...], [subobj_name, ...])

I think there are several advantages:

  1. The consumer can see the difference between groups and pandas objects.
  2. A future extension could allow pruning of the search space by mutating the yielded lists, as is possible with os.walk.
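For reference, this is how pruning works with os.walk itself, which is the behaviour point 2 refers to. It is a minimal sketch using only the standard library; the helper name `walk_pruned` and the directory names are made up for illustration and are not part of this PR.

```python
import os
import tempfile


def walk_pruned(root, skip):
    """Walk root top-down without descending into directories named in skip."""
    seen = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # The in-place slice assignment is what prunes: os.walk consults this
        # same list object when deciding which subdirectories to visit next.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        seen.append(os.path.relpath(dirpath, root))
    return seen


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "deep"))
    os.makedirs(os.path.join(root, "b"))
    visited = walk_pruned(root, skip={"a"})

print(visited)
```

Rebinding the name (`dirnames = [...]`) would not prune anything, because os.walk keeps its own reference to the original list. A walk() that wanted the same feature would have to read its yielded `groups` list back after each yield in the same way.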

Note also:

  • Some testing of node type is necessary during walk because a Pandas object is also a group to PyTables.
  • All non-pandas leaves are ignored. walk() will only yield groups and Pandas objects.
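To make the yield shape and the two notes above concrete, here is a rough sketch of the traversal. It does not use PyTables: `Node` is a hypothetical stand-in for a PyTables node, and its `pandas_type` and `is_group` fields are assumptions that only mimic the attribute checks discussed in the review. It is not the code in this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a PyTables node (not the PyTables API)."""
    pandas_type: Optional[str] = None  # e.g. 'frame' or 'series'; None otherwise
    is_group: bool = True              # False models a non-pandas leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def walk(node: Node, path: str = "/") -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (group_path, subgroup_names, pandas_object_names), os.walk style."""
    groups: List[str] = []
    leaves: List[str] = []
    for name, child in node.children.items():
        if child.pandas_type is not None:
            # A pandas object is stored as a group, so test for it first.
            leaves.append(name)
        elif child.is_group:
            groups.append(name)
        # Anything else is a non-pandas leaf and is ignored.
    yield path, groups, leaves
    for name in groups:
        # rstrip avoids a redundant '/' when joining below the root.
        yield from walk(node.children[name], path.rstrip("/") + "/" + name)


root = Node(children={
    "df": Node(pandas_type="frame"),
    "raw": Node(is_group=False),
    "sub": Node(children={"s": Node(pandas_type="series")}),
})
result = list(walk(root))
for group_path, subgroups, objects in result:
    print(group_path, subgroups, objects)
```

In this toy tree the non-pandas leaf `raw` never appears in the output, and `df` is reported as an object rather than as a subgroup.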

Please let me know what you think before I write something in whatsnew/0.17.0


Returns
-------
A generator yielding tuples (`path`, `groups`, `frames`) where:
Contributor


frames -> leaves

Contributor

jreback commented Aug 31, 2015

OK, looks reasonable. Pls add a whatsnew note, and add this to the docs in io.rst as a separate sub-section in the HDF5 section.

jreback modified the milestones: Next Major Release, 0.17.0 Sep 2, 2015
Stephen Pascoe added 2 commits September 2, 2015 14:00
…les HDF5 file.

This implementation is inspired by os.walk and follows the interface as much as possible.
Including small fix to remove redundant '/' from group names.
Contributor

jreback commented Oct 18, 2015

closing, pls reopen if you can fix according to comments

jreback closed this Oct 18, 2015
jorisvandenbossche modified the milestones: No action, Next Major Release Jul 21, 2016
Development

Successfully merging this pull request may close these issues.

ENH: Iterate over HDF store hierarchically
3 participants