Which backprop method is correct for RNN? #635

Closed
csarofeen opened this issue Jan 29, 2017 · 2 comments

Comments

@csarofeen
Contributor

Accumulating the loss incrementally per timestep, as in the tutorial, and sending all timesteps to the RNN in a single call seem to produce the same output/hidden/loss, but loss.backward() computes different parameter gradients. Is one method correct and the other incorrect? Which is right?

import torch
from torch import nn
from torch.autograd import Variable

torch.backends.cudnn.enabled=False

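# two single-layer LSTMs; model2's weights are copied from model below so both start identical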
model  = nn.LSTM(5, 5, 1).cuda()
model2 = nn.LSTM(5, 5, 1).cuda()

for i in range(len(model2.all_weights)):
    for j in range(len(model2.all_weights[i])):
        model2.all_weights[i][j].data.copy_(model.all_weights[i][j].data)

crit = nn.MSELoss().cuda()
crit2 = nn.MSELoss().cuda()

input = Variable(torch.randn(2,1,5).cuda())
target = Variable(torch.ones(2,1,5).cuda(), requires_grad=False)
hidden = [ Variable(torch.randn(1,1,5).cuda().fill_(0.0)),
            Variable(torch.randn(1,1,5).cuda().fill_(0.0))]

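# method 1: run the whole 2-timestep sequence through the LSTM in one call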
output, hidden = model(input, hidden)
loss = crit(output, target)
loss.backward(retain_variables=True)

hidden2 = [ Variable(torch.randn(1,1,5).cuda().fill_(0.0)),
            Variable(torch.randn(1,1,5).cuda().fill_(0.0))]

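# method 2: run one timestep at a time, accumulate the loss, then average over timesteps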
loss2 = 0
for i in range(input.size(0)):
    output2, hidden2 = model(input[i].view(1,1,-1), hidden2)
    loss2 += crit2(output2[0], target[i])

loss2 = loss2/2
loss2.backward(retain_variables=True)

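# compare the parameter gradients from the two runs and record the largest gradient magnitude seen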
diff = 0
max_w = 0
for i in range(len(model2.all_weights)):
    for j in range(len(model2.all_weights[i])):
        diff = max(diff, (model2.all_weights[i][j].grad - model.all_weights[i][j].grad).abs().max().data[0])
        
        max_w = max(model2.all_weights[i][j].grad.max().data[0], max_w)
        max_w = max(model.all_weights[i][j].grad.max().data[0], max_w)

dh = (hidden[0]-hidden2[0]).abs().max().data[0]
dc = (hidden[1]-hidden2[1]).abs().max().data[0]
do = (output[1]-output2).abs().max().data[0]
dl = (loss-loss2).abs().max().data[0]

print("Diff in output : " + str(do))
print("Diff in hidden states : "+str(dh) +", "+str(dc))
print("Diff in loss : " + str(dl))

print("Max weight grad found : " +str(max_w))
print("Diff in weight grad : " + str(diff))
@apaszke
Contributor

apaszke commented Jan 29, 2017

You've used the same model twice. You probably meant to use model2 in the for loop. Once you change it, the diff in weight grad is 0.
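For reference, a minimal sketch of the corrected per-timestep loop, assuming the rest of the script above is unchanged; the only edit is calling model2 instead of model:

loss2 = 0
for i in range(input.size(0)):
    # run model2 here so its .grad fields are populated for the comparison below
    output2, hidden2 = model2(input[i].view(1, 1, -1), hidden2)
    loss2 += crit2(output2[0], target[i])

loss2 = loss2 / input.size(0)  # average over timesteps to match MSELoss over the full sequence
loss2.backward(retain_variables=True)

With this change, the weight-gradient diff reported at the end is 0.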

@apaszke apaszke closed this as completed Jan 29, 2017
@csarofeen
Contributor Author

Sorry about that. Thanks!
