ENH: #10143 Function to walk the group hierarchy of a PyTables HDF5 file #10932

Closed
wants to merge 3 commits into from

3 participants
@stephenpascoe
commented Aug 30, 2015

closes #10143

This implementation is inspired by os.walk and follows its interface as closely as possible.

@jreback

Contributor

commented Aug 30, 2015

any reason not to call this just walk?

@jreback jreback added this to the 0.17.0 milestone Aug 30, 2015

@jreback jreback added the Enhancement label Aug 30, 2015

@jreback

Contributor

commented Aug 30, 2015

pls add a note in whatsnew/0.17.0 in enhancements

@jreback

pandas/io/pytables.py Outdated
if (getattr(child._v_attrs, 'CLASS', None) == 'GROUP'
        and pandas_type is None):
    groups.append(child._v_name)
elif pandas_type == 'frame':

@jreback

jreback Aug 30, 2015

Contributor

yield ANY pandas_type objects, so don't need this complicated if

@jreback

pandas/io/pytables.py Outdated
frames = []
for child in g._v_children.values():
    pandas_type = getattr(child._v_attrs, 'pandas_type', None)
    if (getattr(child._v_attrs, 'CLASS', None) == 'GROUP'

@jreback

jreback Aug 30, 2015

Contributor

yield as you walk, not at the end

@jreback

pandas/io/tests/test_pytables.py Outdated
'df1': pd.DataFrame([1,2,3]),
'df2': pd.DataFrame([4,5,6]),
'df3': pd.DataFrame([6,7,8]),
'df4': pd.DataFrame([9,10,11]),

@jreback

jreback Aug 30, 2015

Contributor

add a Series and a PyTables object here (e.g. non-pandas) as well

@stephenpascoe

Author

commented Aug 31, 2015

I've renamed the function to walk() and included Series objects. However, I think you have a different API in mind. I am deliberately not yielding each Pandas object individually. Instead, each yield is a PyTables group path together with the names of its subgroups and the names of the Pandas objects it contains. This follows the os.walk API, i.e. each yield is

(group_path, [subgroup_name, ...], [subobj_name, ...])

I think there are several advantages:

  1. The consumer can see the difference between groups and Pandas objects
  2. Future extension could allow pruning of the search space by mutating the yielded lists, as is possible with os.walk.
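For reference, the os.walk pruning mentioned in point 2 works by mutating the yielded directory list in place. Below is a minimal standard-library sketch; the `walk_pruned` helper and the directory names are made up for illustration and are not part of this PR.

```python
import os
import tempfile


def walk_pruned(root, skip):
    """Return directory paths under root, never descending into names in skip."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Slice assignment mutates the very list os.walk will consult next,
        # so removed names are never visited. Sorting makes the order stable.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        visited.append(os.path.relpath(dirpath, root))
    return visited


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one branch to keep, one branch to prune.
    os.makedirs(os.path.join(root, "keep", "inner"))
    os.makedirs(os.path.join(root, "skip_me", "hidden"))
    visited = walk_pruned(root, skip={"skip_me"})

print(visited)
```

Pruning only works with `topdown=True`, because the parent must be yielded before its children are visited. A walk() that wanted to support the same trick would have to yield each group before descending into its subgroups.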

Note also:

  • Some testing of node type is necessary during walk because a Pandas object is also a group to PyTables.
  • All non-pandas leaves are ignored. walk() will only yield groups and Pandas objects.
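The shape of the proposed API can be sketched without PyTables. The `Node` class below is a hypothetical stand-in for an HDF5 node (it is not a PyTables or pandas type), and this `walk` is an illustration of the behaviour described above, not the code in this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an HDF5 node; not a PyTables class."""
    name: str
    is_group: bool = True               # pandas objects are stored as groups too
    pandas_type: Optional[str] = None   # set only on pandas wrapper groups
    children: Dict[str, "Node"] = field(default_factory=dict)


def group(name: str, *children: Node) -> Node:
    """Build a plain group containing the given children."""
    return Node(name, children={c.name: c for c in children})


def walk(node: Node, path: str = "") -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (group_path, subgroup_names, pandas_object_names), os.walk style."""
    groups: List[str] = []
    leaves: List[str] = []
    for child in node.children.values():
        if child.pandas_type is not None:
            leaves.append(child.name)    # wrapper group: report it, do not descend
        elif child.is_group:
            groups.append(child.name)    # plain group: descend later
        # anything else is a non-pandas leaf and is ignored
    yield (path or "/", groups, leaves)  # yield before descending
    for name in groups:
        yield from walk(node.children[name], f"{path}/{name}")


root = group(
    "",
    Node("df1", pandas_type="frame"),
    Node("s1", pandas_type="series"),
    Node("a1", is_group=False),          # plain array, not a pandas object
    group(
        "group1",
        Node("df2", pandas_type="frame"),
        group("subgroup", Node("df3", pandas_type="frame")),
    ),
)

result = list(walk(root))
for entry in result:
    print(entry)
```

The array `a1` never appears in the output, while `df1` and `s1` are listed as contents of `/` even though each is itself stored as a group.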

Please let me know what you think before I write something in whatsnew/0.17.0

@jreback

pandas/io/pytables.py Outdated

Returns
-------
A generator yielding tuples (`path`, `groups`, `frames`) where:

@jreback

jreback Aug 31, 2015

Contributor

frames -> leaves

_tables()
self._check_if_open()
for g in self._handle.walk_groups():
    if getattr(g._v_attrs, 'pandas_type', None) is not None:

@jreback

jreback Aug 31, 2015

Contributor

shouldn't this be is None?

@stephenpascoe

stephenpascoe Aug 31, 2015

Author

No. An HDF5 group that has a 'pandas_type' attribute is a group wrapper around a DataFrame/Series object. Every Pandas object is wrapped in an HDF5 group.
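The attribute test being discussed can be illustrated in isolation. This is only a sketch: the `classify` helper is hypothetical, the nodes are `SimpleNamespace` stand-ins rather than real PyTables objects, and the attribute values are illustrative.

```python
from types import SimpleNamespace


def classify(node):
    """Classify a node by its attributes, checking pandas_type first."""
    attrs = node._v_attrs
    pandas_type = getattr(attrs, "pandas_type", None)
    if pandas_type is not None:
        # A group carrying pandas_type wraps a pandas object.
        return "pandas:" + pandas_type
    if getattr(attrs, "CLASS", None) == "GROUP":
        return "group"
    return "other"


def fake_node(**attrs):
    """Stand-in exposing only the _v_attrs attribute used above."""
    return SimpleNamespace(_v_attrs=SimpleNamespace(**attrs))


plain_group = fake_node(CLASS="GROUP")
frame_wrapper = fake_node(CLASS="GROUP", pandas_type="frame")
raw_array = fake_node(CLASS="ARRAY")

print(classify(plain_group), classify(frame_wrapper), classify(raw_array))
```

Checking `pandas_type` before `CLASS` matters here, because the wrapper around a pandas object also reports `CLASS == 'GROUP'`.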

@jreback

jreback Aug 31, 2015

Contributor

ahh ok

@jreback

pandas/io/tests/test_pytables.py Outdated
'a1': np.array([[1,2,3], [4,5,6]])
}

with tm.ensure_clean('walk_groups.hdf') as filename:

@jreback

jreback Aug 31, 2015

Contributor

you can just use ensure_clean_store

@jreback

Contributor

commented Aug 31, 2015

ok, looks reasonable. Pls add a whatsnew note, and add this to the docs in io.rst as a separate sub-section of the HDF5 section.

@jreback jreback modified the milestones: Next Major Release, 0.17.0 Sep 2, 2015

Stephen Pascoe added some commits Aug 30, 2015

Stephen Pascoe
ENH: #10143 Function to walk the group hierarchy of a PyTables HDF5 file.

This implementation is inspired by os.walk and follows the interface as much as possible.

@stephenpascoe stephenpascoe force-pushed the stephenpascoe:issue-10143 branch Sep 2, 2015

Stephen Pascoe
Documentation and whats-new.
Including small fix to remove redundant '/' from group names.

@stephenpascoe stephenpascoe force-pushed the stephenpascoe:issue-10143 branch to b0ae071 Sep 3, 2015

@jreback

Contributor

commented Oct 18, 2015

closing, pls reopen if you can fix according to comments

@jreback jreback closed this Oct 18, 2015

@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Next Major Release Jul 21, 2016

@CharlesB2 CharlesB2 referenced this pull request Jun 6, 2018

Closed

HDFStore.walk() to iterate on groups #21339

4 of 4 tasks complete