Pull request #231 significantly reduced the memory footprint of the training script by using memmaps. However, this adds a lot of overhead during data loading: data loading is now the bottleneck when training on GPU and causes a significant drop in training speed (around 5x). There is also this issue in numpy which highlights that memmap is incredibly slow on Linux. Even when using 8 workers to load data with the PyTorch DataLoader, the problem persists. From initial experiments and profiling, this function seems to be the major bottleneck.
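For context, here is a minimal, hypothetical sketch of the access pattern that causes the slowdown (the file name `train_features.npy`, shapes, and batch size are placeholders, not the actual training code): every `__getitem__` touches the memmapped `.npy` file, so each batch spends most of its time waiting on disk reads, and a small timing loop makes that wait visible.

```python
# Timing sketch (hypothetical paths/shapes) to see how long each training step
# waits on the DataLoader compared to the GPU forward/backward pass.
import time

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class MemmapDataset(Dataset):
    """Reads individual samples out of a memmapped .npy file."""

    def __init__(self, path):
        # mmap_mode="r" keeps the array on disk; every __getitem__ hits the file.
        # On Linux, DataLoader workers are forked and inherit this memmap handle.
        self.data = np.load(path, mmap_mode="r")

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # np.array() forces the actual read from the memmap for this sample.
        return torch.from_numpy(np.array(self.data[idx]))


loader = DataLoader(MemmapDataset("train_features.npy"),
                    batch_size=64, num_workers=8)

t0 = time.perf_counter()
for i, batch in enumerate(loader):
    t_load = time.perf_counter() - t0
    # ... forward/backward pass would go here ...
    print(f"batch {i}: waited {t_load:.4f}s for data")
    t0 = time.perf_counter()
```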
Possible solutions:
- Maybe PyTorch DataLoader's prefetching can be improved?
- Remove the heavy pre-processing from training by doing it once up front, dumping the results as binary files, and using dask with the PyTorch DataLoader.
- Use HDF5 instead of npy files. (Initial experiments show memmapped npy files are much faster to load than HDF5 files, though.)
- Convert the training to TensorFlow 2.0 and use the tf.data pipeline. (This is not the most elegant solution.)
- Use DALI. I guess it's more optimized for images, though.
@svlandeg and I are leaning more towards solution 2.
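To make solution 2 a bit more concrete, here is a rough sketch under the assumption that the expensive per-sample work can be run once offline; `preprocess()`, the shard layout, and the paths below are placeholders, not the actual training code, and dask could replace the simple shard reader if lazy, chunked access is needed.

```python
# Rough sketch of solution 2 (hypothetical preprocess() and file layout):
# run the heavy preprocessing once, dump ready-to-use arrays as .npy shards,
# and let the training-time Dataset read those arrays directly.
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


def preprocess(sample):
    # Placeholder for the expensive per-sample processing done in the current
    # training script; replace with the real feature extraction.
    return np.asarray(sample, dtype=np.float32)


def dump_shards(raw_samples, out_dir, shard_size=10_000):
    """One-off preprocessing pass: process samples and write them in shards."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard, shard_idx = [], 0
    for sample in raw_samples:
        shard.append(preprocess(sample))
        if len(shard) == shard_size:
            np.save(out_dir / f"shard_{shard_idx:05d}.npy", np.stack(shard))
            shard, shard_idx = [], shard_idx + 1
    if shard:
        np.save(out_dir / f"shard_{shard_idx:05d}.npy", np.stack(shard))


class ShardDataset(Dataset):
    """Training-time dataset: loads one preprocessed shard fully into memory."""

    def __init__(self, shard_path):
        self.data = np.load(shard_path)  # no per-item processing left to do

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return torch.from_numpy(self.data[idx])


# At training time, concatenate the shards and iterate as usual.
shards = sorted(Path("preprocessed/").glob("shard_*.npy"))
loader = DataLoader(ConcatDataset([ShardDataset(p) for p in shards]),
                    batch_size=64, shuffle=True, num_workers=4)
```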
Suggestions are welcome!