Performance Regression of Dataloader #23642

Closed
alpha0422 opened this issue Aug 1, 2019 · 1 comment
Labels
module: dataloader Related to torch.utils.data.DataLoader and Sampler module: performance Issues related to performance, either of kernel code or framework glue triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@alpha0422

🐛 Bug

The latest change to DataLoader (#19228) causes a severe performance regression of up to 30% in large-scale training. We traced the root cause to these changes: https://github.com/pytorch/pytorch/blob/master/torch/utils/data/dataloader.py#L889-L891. They add about 5 seconds to the exit of each epoch.

To Reproduce

Steps to reproduce the behavior:

# regression.py
import torch
import time

from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.randn(10240, 2))
loader = DataLoader(dataset, batch_size=128, num_workers=2, pin_memory=True, drop_last=False)

for epoch in range(10):
    for idx, data in enumerate(loader):
        data = data[0].cuda()
        if idx == 10240/128-1:
            ts = time.time()
    print("Exit epoch {} elapsed {:.2f}s".format(epoch, time.time()-ts))

Expected behavior

Exiting an epoch is essentially free in PyTorch 1.1, but it takes about 5 seconds in PyTorch 1.2.

# 1.2.0a0
$ python regression.py
Exit epoch 0 elapsed 5.01s       
Exit epoch 1 elapsed 5.05s       
Exit epoch 2 elapsed 5.05s       
Exit epoch 3 elapsed 5.05s       
Exit epoch 4 elapsed 5.05s       
Exit epoch 5 elapsed 5.05s       
Exit epoch 6 elapsed 5.05s       
Exit epoch 7 elapsed 5.05s       
Exit epoch 8 elapsed 5.05s       
Exit epoch 9 elapsed 5.04s

# 1.1.0a0
$ python regression.py
Exit epoch 0 elapsed 0.01s       
Exit epoch 1 elapsed 0.02s       
Exit epoch 2 elapsed 0.03s       
Exit epoch 3 elapsed 0.02s       
Exit epoch 4 elapsed 0.03s       
Exit epoch 5 elapsed 0.03s       
Exit epoch 6 elapsed 0.03s       
Exit epoch 7 elapsed 0.03s       
Exit epoch 8 elapsed 0.02s       
Exit epoch 9 elapsed 0.02s  

Environment

PyTorch version: 1.2.0a0+5b0484d                                          
Is debug build: No                                                        
CUDA used to build PyTorch: 10.1.233                                      
                                                                          
OS: Ubuntu 18.04.2 LTS                                                    
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0                        
CMake version: version 3.14.0                                             
                                                                          
Python version: 3.6                                                       
Is CUDA available: Yes                                                    
CUDA runtime version: 10.1.241                                            
GPU models and configuration:                                             
GPU 0: Tesla V100-SXM2-16GB                                               
GPU 1: Tesla V100-SXM2-16GB                                               
GPU 2: Tesla V100-SXM2-16GB                                               
GPU 3: Tesla V100-SXM2-16GB                                               
GPU 4: Tesla V100-SXM2-16GB                                               
GPU 5: Tesla V100-SXM2-16GB                                               
GPU 6: Tesla V100-SXM2-16GB                                               
GPU 7: Tesla V100-SXM2-16GB                                               
                                                                          
Nvidia driver version: 418.40.04                                          
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.3                
                                                                          
Versions of relevant libraries:                                           
[pip] msgpack-numpy==0.4.3.2                                              
[pip] numpy==1.16.4                                                       
[pip] torch==1.2.0a0+5b0484d                                              
[pip] torchtext==0.4.0                                                    
[pip] torchvision==0.3.0a0                                                
[conda] magma-cuda100             2.1.0                         5    local
[conda] mkl                       2019.1                      144         
[conda] mkl-include               2019.1                      144         
[conda] nomkl                     3.0                           0         
[conda] torch                     1.2.0a0+5b0484d          pypi_0    pypi 
[conda] torchtext                 0.4.0                    pypi_0    pypi 
[conda] torchvision               0.3.0a0                  pypi_0    pypi 

Additional context

The suggested fix is to restore the previous lines around https://github.com/pytorch/pytorch/blob/master/torch/utils/data/dataloader.py#L889-L891. For example, the following code fixes the problem:

self.worker_result_queue.cancel_join_thread()
self.worker_result_queue.put((0, None))      
self.pin_memory_thread.join()                
self.worker_result_queue.close()   
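
One plausible reading of why the `put((0, None))` line matters: if the pin-memory thread blocks on a queue `get` with a timeout, then joining it without first putting a wake-up item into the queue waits for that timeout to expire. The sketch below illustrates this pattern with the standard library only. It is not PyTorch's actual code; `consumer`, `shutdown_time`, and `POLL_INTERVAL` are hypothetical names, and the 0.5 s interval merely stands in for the roughly 5 s delay reported above.

```python
import queue
import threading
import time

# Hypothetical polling interval; a short stand-in for the ~5 s delay in the report.
POLL_INTERVAL = 0.5


def consumer(in_queue, done_event):
    """Poll in_queue with a timeout until done_event is set."""
    while not done_event.is_set():
        try:
            in_queue.get(timeout=POLL_INTERVAL)
        except queue.Empty:
            continue


def shutdown_time(send_sentinel):
    """Return how long it takes to join the consumer thread at shutdown."""
    in_queue = queue.Queue()
    done_event = threading.Event()
    thread = threading.Thread(target=consumer, args=(in_queue, done_event))
    thread.start()
    time.sleep(0.1)  # give the consumer time to block inside get()

    start = time.perf_counter()
    done_event.set()
    if send_sentinel:
        in_queue.put(None)  # wakes the blocked get() right away
    thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("without sentinel: {:.2f}s".format(shutdown_time(False)))
    print("with sentinel:    {:.2f}s".format(shutdown_time(True)))
```

Without the sentinel, the join has to wait for the remainder of the polling interval; with it, the thread sees the event almost immediately and exits.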
@vishwakftw vishwakftw added module: dataloader Related to torch.utils.data.DataLoader and Sampler module: performance Issues related to performance, either of kernel code or framework glue labels Aug 1, 2019
@soumith
Member

soumith commented Aug 1, 2019

cc: @ssnl

@mrshenli mrshenli added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Aug 1, 2019