ProgressBar ETA with IterableDataset where __len__ undefined #1518

Description

@g-karthik

❓ Questions/Help/Support

In the past, I've successfully used ignite with regular Dataset/TensorDataset classes. These have a fixed length and are tied to a DataLoader with a DistributedSampler. Keeping all other training hyper-parameters equal, I've always observed that increasing the number of nodes/GPUs decreases the ETA displayed by the ProgressBar.
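
For context, here is a minimal sketch of that first setup (the data shapes and hyper-parameters are illustrative, not my actual ones):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

# Map-style dataset with a fixed, known length.
dataset = TensorDataset(torch.randn(100000, 32), torch.randint(0, 2, (100000,)))

# DistributedSampler splits the dataset across ranks, so each rank
# processes len(dataset) / world_size samples per epoch. More GPUs
# means fewer iterations per rank, hence the shrinking ETA.
# Assumes torch.distributed has already been initialized.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)
```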

Then I switched to an IterableDataset whose length was computable in advance, so __len__ was defined. There is no DistributedSampler in this case because the dataset is iterable: the data files are grouped into distinct subsets in advance and assigned to different ranks. In this scenario too, keeping all else equal, the ETA displayed by the ProgressBar decreases as the number of nodes/GPUs increases. There is some earlier discussion of this in #1263.
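
A rough sketch of this second setup, with a hypothetical read_samples() helper standing in for my actual file reader:

```python
import torch.distributed as dist
from torch.utils.data import IterableDataset

class ShardedIterableDataset(IterableDataset):
    """Each rank iterates over its own pre-assigned subset of files."""

    def __init__(self, all_files, samples_per_file):
        # Assign distinct file subsets to ranks up front; no sampler needed.
        # Assumes torch.distributed has already been initialized.
        rank, world_size = dist.get_rank(), dist.get_world_size()
        self.files = all_files[rank::world_size]
        self.samples_per_file = samples_per_file

    def __len__(self):
        # The length is computable in advance here, so the progress bar
        # can derive a total iteration count per rank.
        return len(self.files) * self.samples_per_file

    def __iter__(self):
        for f in self.files:
            yield from read_samples(f)  # hypothetical file reader
```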

Finally, I encountered a setting with a massive dataset whose length (i.e., the number of data points) was not computable in advance. So I removed the __len__ definition, making the IterableDataset more generic.
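
Concretely, that just means dropping __len__ from the sketch above:

```python
import torch.distributed as dist
from torch.utils.data import IterableDataset

class GenericIterableDataset(IterableDataset):
    def __init__(self, all_files):
        rank, world_size = dist.get_rank(), dist.get_world_size()
        self.files = all_files[rank::world_size]

    # No __len__: the total number of samples is unknown in advance,
    # so neither the DataLoader nor ignite can infer an epoch length.
    def __iter__(self):
        for f in self.files:
            yield from read_samples(f)  # hypothetical file reader
```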

Unfortunately, in this final setting, the ETA displayed by the ProgressBar does not decrease when the number of nodes/GPUs increases. I tried training for a fixed 50000 iterations, i.e., an epoch_length of 50000. If I train on 1 GPU, the ETA is much shorter than if I train on more than 1 GPU, and the overall time taken per iteration is also much lower with 1 GPU.
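
For reference, this is roughly how I run it (the training step body is elided and the names are illustrative):

```python
from ignite.engine import Engine
from ignite.contrib.handlers import ProgressBar

def train_step(engine, batch):
    # ... forward/backward/optimizer step elided ...
    return {"loss": 0.0}

trainer = Engine(train_step)
ProgressBar().attach(trainer)

# With no __len__ on the dataset, the epoch length cannot be inferred
# from the dataloader, so it is passed explicitly.
trainer.run(loader, max_epochs=1, epoch_length=50000)
```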

I'm confused by this behavior; I don't think I'm doing anything incorrect. Could you please explain what may be happening?
