
Suggestions for improving training speed (especially when input data is large) #48

Closed
Robotwithsoul opened this issue Aug 2, 2019 · 2 comments


@Robotwithsoul

Hi, I'd suggest a small change to your code in memory.py, which should help improve training speed (especially when the input image data is 3D):

  • your original code
state = torch.stack([trans.state for trans in transition[:self.history]]).to(dtype=torch.float32, device=self.device).div_(255)
next_state = torch.stack([trans.state for trans in transition[self.n:self.n + self.history]]).to(dtype=torch.float32, device=self.device).div_(255)
  • suggested code
state = torch.stack([trans.state for trans in transition[:self.history]]).to(device=self.device).to(dtype=torch.float32).div_(255)
next_state = torch.stack([trans.state for trans in transition[self.n:self.n + self.history]]).to(device=self.device).to(dtype=torch.float32).div_(255)
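One plausible reason for the speedup (an assumption on my part, not verified in this thread): Rainbow's replay memory stores observations in a small integer dtype, so copying the raw bytes to the GPU before casting to float32 moves far less data over the PCIe bus than casting on the CPU first. A minimal sketch, assuming uint8 storage:

```python
import torch

# Stacked batch of four 100x100x100 frames; Rainbow's replay memory
# stores observations as uint8, which is assumed here.
frames = torch.zeros((4, 100, 100, 100), dtype=torch.uint8)

# Original order: cast to float32 while still on the CPU, so the
# host-to-device copy would move 4 bytes per element.
as_float = frames.to(dtype=torch.float32)
print(as_float.element_size() * as_float.nelement())  # 16000000 bytes

# Suggested order: copy the raw uint8 bytes first (1 byte per element),
# then do the cast and the division on the GPU.
print(frames.element_size() * frames.nelement())  # 4000000 bytes
```

With float32 at 4 bytes per element versus 1 byte for uint8, the suggested ordering transfers a quarter of the data.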

Here is the code for testing:

import timeit
import numpy as np
import torch

T, T1 = [], []

device = torch.device('cuda')
for i in range(4):
    A = np.zeros((100, 100, 100), dtype=np.int64)  # np.int was removed in NumPy 1.24
    B = torch.tensor(A, dtype=torch.int)
    T.append(B)
for i in range(4):
    A = np.zeros((100, 100, 100), dtype=np.int64)
    B = torch.tensor(A, dtype=torch.int)
    T1.append(B)

# This line is used for initialization (warms up the CUDA context)
M = torch.stack(T).to(dtype=torch.float32, device=device).div_(255)

# Comparison -- CUDA kernels launch asynchronously, so synchronize
# before reading each timer to measure the full operation
torch.cuda.synchronize()
timea = timeit.default_timer()
M = torch.stack(T).to(dtype=torch.float32, device=device).div_(255)
torch.cuda.synchronize()
timeb = timeit.default_timer()
N = torch.stack(T1).to(device=device).to(dtype=torch.float32).div_(255)
torch.cuda.synchronize()
timec = timeit.default_timer()

print("time1 is: {}\ntime2 is: {}".format(timeb - timea, timec - timeb))
@Kaixhin
Owner

Kaixhin commented Aug 2, 2019

How odd - do you have any idea why this is the case? Thanks a lot for the script for quick testing - can confirm that I am indeed seeing a speedup (not so significant for smaller tensors, but sure). Feel free to submit a PR for this change!

@Robotwithsoul
Author

I'm not sure, but my guess is that the original code performs the data type conversion on the CPU:

.to(dtype=torch.float32, device=device)

while the suggested code performs the data type conversion on the GPU:

.to(device=device).to(dtype=torch.float32)
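If the CPU-side conversion is indeed the bottleneck, a related and commonly used refinement is to pin the host memory and make the copy non-blocking, so the transfer can overlap with other work. A hedged sketch of the device-first ordering with that extra step; `to_device_then_float` is a hypothetical helper name, and the pinned-memory/`non_blocking` addition goes beyond what was tested in this thread:

```python
import torch

def to_device_then_float(frames: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy the raw bytes to the target device first, then convert there.

    Mirrors the suggested ordering from this issue. pin_memory() and
    non_blocking=True are an assumed extra optimization, not benchmarked here.
    """
    if device.type == 'cuda':
        frames = frames.pin_memory()
        return frames.to(device=device, non_blocking=True).to(dtype=torch.float32).div_(255)
    # CPU fallback: no transfer involved, just convert
    return frames.to(dtype=torch.float32).div_(255)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch = torch.zeros((4, 100, 100, 100), dtype=torch.uint8)
out = to_device_then_float(batch, device)
print(out.dtype)  # torch.float32
```

The guard on `device.type` keeps the sketch runnable on machines without a GPU.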

@Kaixhin Kaixhin closed this as completed in 631ce4c Aug 3, 2019
BerenMillidge pushed a commit to BerenMillidge/Rainbow that referenced this issue Dec 20, 2019