
Suggestions for improving training speed (especially when input data is large) #48

Closed
Robotwithsoul opened this issue Aug 2, 2019 · 2 comments


@Robotwithsoul

Hi, I'd suggest a small change to your code in memory.py, which should help improve training speed (especially when the input image data is 3D):

  • your original code
state = torch.stack([trans.state for trans in transition[:self.history]]).to(dtype=torch.float32, device=self.device).div_(255)
next_state = torch.stack([trans.state for trans in transition[self.n:self.n + self.history]]).to(dtype=torch.float32, device=self.device).div_(255)
  • suggested code
state = torch.stack([trans.state for trans in transition[:self.history]]).to(device=self.device).to(dtype=torch.float32).div_(255)
next_state = torch.stack([trans.state for trans in transition[self.n:self.n + self.history]]).to(device=self.device).to(dtype=torch.float32).div_(255)
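One plausible reason for the speedup (an assumption on my part, not verified in this thread): Rainbow's replay memory stores observations in a small integer dtype, so copying the raw bytes to the GPU before casting to float32 moves far less data over the PCIe bus than casting on the CPU first. A minimal sketch, assuming uint8 storage:

```python
import torch

# Stacked batch of four 100x100x100 frames; Rainbow's replay memory
# stores observations as uint8, which is assumed here.
frames = torch.zeros((4, 100, 100, 100), dtype=torch.uint8)

# Original order: cast to float32 while still on the CPU, so the
# host-to-device copy would move 4 bytes per element.
as_float = frames.to(dtype=torch.float32)
print(as_float.element_size() * as_float.nelement())  # 16000000 bytes

# Suggested order: copy the raw uint8 bytes first (1 byte per element),
# then do the cast and the division on the GPU.
print(frames.element_size() * frames.nelement())  # 4000000 bytes
```

With float32 at 4 bytes per element versus 1 byte for uint8, the suggested ordering transfers a quarter of the data.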

Here is the code for testing:

import timeit
import numpy as np
import torch

T, T1 = [], []

device = torch.device('cuda')
for i in range(4):
    A = np.zeros((100, 100, 100), dtype=np.int64)  # np.int was removed in NumPy 1.24
    B = torch.tensor(A, dtype=torch.int)
    T.append(B)
for i in range(4):
    A = np.zeros((100, 100, 100), dtype=np.int64)
    B = torch.tensor(A, dtype=torch.int)
    T1.append(B)

# This line is used for initialization (warms up the CUDA context)
M = torch.stack(T).to(dtype=torch.float32, device=device).div_(255)

# Comparison -- CUDA kernels launch asynchronously, so synchronize
# before reading each timer to measure the full operation
torch.cuda.synchronize()
timea = timeit.default_timer()
M = torch.stack(T).to(dtype=torch.float32, device=device).div_(255)
torch.cuda.synchronize()
timeb = timeit.default_timer()
N = torch.stack(T1).to(device=device).to(dtype=torch.float32).div_(255)
torch.cuda.synchronize()
timec = timeit.default_timer()

print("time1 is: {}\ntime2 is: {}".format(timeb - timea, timec - timeb))
@Kaixhin
Owner

Kaixhin commented Aug 2, 2019

How odd - do you have any idea why this is the case? Thanks a lot for the script for quick testing - can confirm that I am indeed seeing a speedup (not so significant for smaller tensors, but sure). Feel free to submit a PR for this change!

@Robotwithsoul
Author

I'm not sure, but my guess is that the original code performs the data type conversion on the CPU:

.to(dtype=torch.float32, device=device)

while the suggested code performs the data type conversion on the GPU:

.to(device=device).to(dtype=torch.float32)
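If the CPU-side conversion is indeed the bottleneck, a related and commonly used refinement is to pin the host memory and make the copy non-blocking, so the transfer can overlap with other work. A hedged sketch of the device-first ordering with that extra step; `to_device_then_float` is a hypothetical helper name, and the pinned-memory/`non_blocking` addition goes beyond what was tested in this thread:

```python
import torch

def to_device_then_float(frames: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy the raw bytes to the target device first, then convert there.

    Mirrors the suggested ordering from this issue. pin_memory() and
    non_blocking=True are an assumed extra optimization, not benchmarked here.
    """
    if device.type == 'cuda':
        frames = frames.pin_memory()
        return frames.to(device=device, non_blocking=True).to(dtype=torch.float32).div_(255)
    # CPU fallback: no transfer involved, just convert
    return frames.to(dtype=torch.float32).div_(255)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch = torch.zeros((4, 100, 100, 100), dtype=torch.uint8)
out = to_device_then_float(batch, device)
print(out.dtype)  # torch.float32
```

The guard on `device.type` keeps the sketch runnable on machines without a GPU.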

@Kaixhin Kaixhin closed this as completed in 631ce4c Aug 3, 2019
BerenMillidge pushed a commit to BerenMillidge/Rainbow that referenced this issue Dec 20, 2019