Introduce source_shapes for H5PYDataset #392

Open
wants to merge 1 commit into
base: master

@rubenvereecken rubenvereecken commented May 8, 2017

This pull request is meant to initiate discussion and is by no means finished.

I needed to get the dimensions of my data before reading any data from my HDF5 files. There is the num_examples attribute, but of course that covers only one dimension. I could not find any straightforward way to get all dimensions, except that H5PYDataset.source_shapes seemed to represent what I wanted, but it wasn't really implemented. It may well be that I missed a way of getting my sources' dimensions, but in the meantime I've implemented the source_shapes attribute to accomplish what I need, albeit not in all possible scenarios.

If you agree that this source_shapes attribute is useful, I could look into how to complete the feature, because currently it works only if the user has provided no custom slices (which suits my use case just fine for now).

A small edit to explain why I want these dimensions: I want to specify an input layer's shape in a neural network by looking at the spec that is already present in the HDF5 datasets.
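To make the use case concrete, here is a minimal sketch of deriving an input layer's shape from HDF5 metadata alone. It uses plain h5py rather than the proposed source_shapes attribute; the helper `read_source_shapes`, the file name, and the source names `features` and `targets` are assumptions made up for illustration, not part of this pull request.

```python
import h5py


def read_source_shapes(h5file, sources):
    """Hypothetical helper: map each source name to its full shape.

    Only the dataset metadata is consulted; no array data is read.
    """
    return {name: h5file[name].shape for name in sources}


# Build a small in-memory file so the sketch is self-contained.
# The file name and dataset names are illustrative assumptions.
with h5py.File("example.hdf5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("features", shape=(100, 3, 32, 32), dtype="float32")
    f.create_dataset("targets", shape=(100, 1), dtype="uint8")

    shapes = read_source_shapes(f, ["features", "targets"])

    # Drop the leading (example) axis and leave the batch size open,
    # which is the form many network libraries expect for an input layer.
    input_shape = (None,) + shapes["features"][1:]

print(shapes)
print(input_shape)
```

The resulting `input_shape` tuple could then be passed to whatever input layer constructor the network library provides.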

@dmitriy-serdyuk
Contributor

This is a useful feature. Do you plan to get one batch from the dataset to compute its shape?

@rubenvereecken
Author

No, as far as I know h5py datasets have the shape attribute, which should be just fine. I've never used dimension scales, though, nor do I know about variable-length datasets. Either way, I think variable length only applies to the first dimension? And if you got a batch from the dataset, you wouldn't learn anything about the first dimension anyway.
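A small sketch of the difference between the two approaches, assuming a fixed-shape dataset (variable-length data and dimension scales are not covered). The file and dataset names are illustrative only.

```python
import h5py
import numpy as np

# In-memory file so nothing is written to disk; names are assumptions.
with h5py.File("example.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "features", data=np.zeros((50, 28, 28), dtype="float32")
    )

    # The shape attribute comes from the dataset's metadata,
    # so no examples have to be read to obtain it.
    full_shape = dset.shape

    # Fetching a batch does read data, and its first dimension is
    # the batch size rather than the number of examples.
    batch_shape = dset[:10].shape

print(full_shape)   # (50, 28, 28)
print(batch_shape)  # (10, 28, 28)
```

Both approaches agree on the per-example dimensions, but only the shape attribute also reports the first dimension without touching the data.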


2 participants