
Queuing from multiple datasets? #13

Closed

aamini opened this issue Apr 20, 2018 · 2 comments


aamini commented Apr 20, 2018

Awesome package!!

Is it possible to load/dequeue data samples from multiple datasets (which may be inside the same HDF5 file)? For example, let's say we have filename=/path/to/h5_file.h5, which contains two tables: /path/to/table/1 and /path/to/table/2. Both tables contain data and labels columns, as in the main README example.

I can make a loader for any individual table, as suggested in the README:

loader_dataset1 = tftables.load_dataset(filename='path/to/h5_file.h5',
                                        dataset_path='/path/to/table/1',
                                        input_transform=input_transform, ...)

But would I have to create an entirely different loader to handle the second table? Like this:

loader_dataset2 = tftables.load_dataset(filename='path/to/h5_file.h5',
                                        dataset_path='/path/to/table/2',
                                        input_transform=input_transform, ...)

Then I would have to load the batches from each table separately and alternate which one to use at each training iteration:

truth_batch1, data_batch1 = loader_dataset1.dequeue()
truth_batch2, data_batch2 = loader_dataset2.dequeue()

Is there a better way of doing this? I could imagine concatenating both tables into a single table (and thus using a single loader). For clarity it would make sense to keep the tables separate, but if merging is the only solution, it is certainly possible. Do you have any suggestions?
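The alternation described above can be sketched in plain Python as a round-robin over the two batch streams. This is a hypothetical illustration, not tftables API: the two lists stand in for batches dequeued from `loader_dataset1` and `loader_dataset2`.

```python
def round_robin(*iterables):
    """Yield items from each iterable in turn: A, B, A, B, ...,
    dropping an iterable once it is exhausted."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        for it in list(iterators):
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)

# Stand-ins for (truth_batch, data_batch) pairs from each table.
batches_table1 = [("t1a", "d1a"), ("t1b", "d1b")]
batches_table2 = [("t2a", "d2a"), ("t2b", "d2b")]

for truth_batch, data_batch in round_robin(batches_table1, batches_table2):
    pass  # one training step per batch, alternating between the tables
```

With real loaders, each iteration would instead choose which loader's dequeue op to run in the session.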

ghcollin (Owner) commented
What's your use case?

If your data is just spread over two different tables, you could concatenate the batches together after loading them into TensorFlow:

truth_batch = tf.concat([truth_batch1, truth_batch2], axis=0)
data_batch = tf.concat([data_batch1, data_batch2], axis=0)
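The effect of the axis-0 concatenation can be shown with NumPy arrays as stand-ins for the dequeued tensors (the batch size of 32 and feature width of 10 are assumptions for illustration only):

```python
import numpy as np

# Stand-ins for one dequeued data batch from each table.
data_batch1 = np.zeros((32, 10))  # 32 samples from table 1
data_batch2 = np.ones((32, 10))   # 32 samples from table 2

# Stacking along axis 0 produces one larger batch drawn from both
# tables, which is what tf.concat([...], axis=0) does to the tensors.
data_batch = np.concatenate([data_batch1, data_batch2], axis=0)
print(data_batch.shape)  # → (64, 10)
```

Both loaders keep their own read pipelines; only the already-dequeued batches are merged, so every training step sees samples from both tables.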


aamini commented Apr 21, 2018

Thanks, that would work!

aamini closed this as completed Apr 21, 2018