
Queuing from multiple datasets? #13

Closed

aamini opened this issue Apr 20, 2018 · 2 comments


aamini commented Apr 20, 2018

Awesome package!!

Is it possible to load/dequeue data samples from multiple datasets (which may be inside the same HDF5 file)? For example, let's say we have filename=/path/to/h5_file.h5, which contains two tables: /path/to/table/1 and /path/to/table/2. Both tables contain data and labels columns, as in the main README example.

I can make a loader for any individual table, as suggested in the README:

loader_dataset1 = tftables.load_dataset(filename='path/to/h5_file.h5',
                                        dataset_path='/path/to/table/1',
                                        input_transform=input_transform, ...)

But would I have to create an entirely different loader to handle the second table? Like this:

loader_dataset2 = tftables.load_dataset(filename='path/to/h5_file.h5',
                                        dataset_path='/path/to/table/2',
                                        input_transform=input_transform, ...)

Then I would have to load the batches from each table separately and alternate which one to use at each training iteration:

truth_batch1, data_batch1 = loader_dataset1.dequeue()
truth_batch2, data_batch2 = loader_dataset2.dequeue()

Is there a better way of doing this? I could imagine concatenating both tables into a single table (and thus using a single loader). For clarity it would make sense to keep the tables separate, but if merging is the only solution, it is certainly possible. Do you have any suggestions?
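The alternation described above can be sketched in plain Python as a round-robin over the two batch streams. This is a hypothetical illustration, not tftables API: the two lists stand in for batches dequeued from `loader_dataset1` and `loader_dataset2`.

```python
def round_robin(*iterables):
    """Yield items from each iterable in turn: A, B, A, B, ...,
    dropping an iterable once it is exhausted."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        for it in list(iterators):
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)

# Stand-ins for (truth_batch, data_batch) pairs from each table.
batches_table1 = [("t1a", "d1a"), ("t1b", "d1b")]
batches_table2 = [("t2a", "d2a"), ("t2b", "d2b")]

for truth_batch, data_batch in round_robin(batches_table1, batches_table2):
    pass  # one training step per batch, alternating between the tables
```

With real loaders, each iteration would instead choose which loader's dequeue op to run in the session.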

ghcollin (Owner) commented
What's your use case?

If your data is just spread over two different tables, you could concatenate the batches together after loading them into TensorFlow:

truth_batch = tf.concat([truth_batch1, truth_batch2], axis=0)
data_batch = tf.concat([data_batch1, data_batch2], axis=0)
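The effect of the axis-0 concatenation can be shown with NumPy arrays as stand-ins for the dequeued tensors (the batch size of 32 and feature width of 10 are assumptions for illustration only):

```python
import numpy as np

# Stand-ins for one dequeued data batch from each table.
data_batch1 = np.zeros((32, 10))  # 32 samples from table 1
data_batch2 = np.ones((32, 10))   # 32 samples from table 2

# Stacking along axis 0 produces one larger batch drawn from both
# tables, which is what tf.concat([...], axis=0) does to the tensors.
data_batch = np.concatenate([data_batch1, data_batch2], axis=0)
print(data_batch.shape)  # → (64, 10)
```

Both loaders keep their own read pipelines; only the already-dequeued batches are merged, so every training step sees samples from both tables.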


aamini commented Apr 21, 2018

Thanks, that would work!

aamini closed this as completed Apr 21, 2018