Potential memory leak when num_workers > 0 and gc_handler enabled

I am running a task with high I/O demand, so I modified the ParallelAwareDataloader to allow num_workers > 0. As training progressed, I observed steadily increasing memory usage, and the task eventually crashed after some dataloader workers were killed due to OOM.

I tried removing the gc_handler and letting Python's gc run with its default behavior. The memory leak disappeared, but training slowed down. From #148
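
One possible explanation, not verified here, is that the gc_handler disables automatic garbage collection in the main process and the dataloader workers inherit that disabled state when they are forked, so reference cycles in the workers are never collected. Under that assumption, a minimal sketch of a workaround is to re-enable gc inside each worker only, while keeping the gc_handler in the trainer. The function name `reenable_gc_in_worker` is hypothetical and not part of the codebase.

```python
import gc


def reenable_gc_in_worker(worker_id: int) -> None:
    """Hypothetical worker_init_fn that restores automatic GC in a worker.

    Assumption: the worker process inherited a disabled collector from the
    main process. If the collector is already enabled, only the one-off
    collection below has any effect.
    """
    if not gc.isenabled():
        gc.enable()
    # Collect once so cycles created before this point are freed.
    gc.collect()
```

This could be passed as `worker_init_fn=reenable_gc_in_worker` when constructing the `torch.utils.data.DataLoader`, which runs it once in each worker process at startup. The main process would keep its manually scheduled collections, so training speed there should be unaffected if the assumption holds.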