Which backprop method is correct for RNN? #635

Closed
csarofeen opened this issue Jan 29, 2017 · 2 comments

Comments

@csarofeen
Contributor

Accumulating the loss incrementally per timestep, as in the tutorial, and sending all timesteps to the RNN in a single call seem to produce the same output/hidden/loss, but loss.backward() computes different parameter gradients. Is one method correct and the other incorrect? Which is right?

import torch
from torch import nn
from torch.autograd import Variable

torch.backends.cudnn.enabled=False

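# two single-layer LSTMs; model2's weights are copied from model below so both start identical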
model  = nn.LSTM(5, 5, 1).cuda()
model2 = nn.LSTM(5, 5, 1).cuda()

for i in range(len(model2.all_weights)):
    for j in range(len(model2.all_weights[i])):
        model2.all_weights[i][j].data.copy_(model.all_weights[i][j].data)

crit = nn.MSELoss().cuda()
crit2 = nn.MSELoss().cuda()

input = Variable(torch.randn(2,1,5).cuda())
target = Variable(torch.ones(2,1,5).cuda(), requires_grad=False)
hidden = [ Variable(torch.randn(1,1,5).cuda().fill_(0.0)),
            Variable(torch.randn(1,1,5).cuda().fill_(0.0))]

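# method 1: run the whole 2-timestep sequence through the LSTM in one call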
output, hidden = model(input, hidden)
loss = crit(output, target)
loss.backward(retain_variables=True)

hidden2 = [ Variable(torch.randn(1,1,5).cuda().fill_(0.0)),
            Variable(torch.randn(1,1,5).cuda().fill_(0.0))]

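# method 2: run one timestep at a time, accumulate the loss, then average over timesteps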
loss2 = 0
for i in range(input.size(0)):
    output2, hidden2 = model(input[i].view(1,1,-1), hidden2)
    loss2 += crit2(output2[0], target[i])

loss2 = loss2/2
loss2.backward(retain_variables=True)

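# compare the parameter gradients from the two runs and record the largest gradient magnitude seen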
diff = 0
max_w = 0
for i in range(len(model2.all_weights)):
    for j in range(len(model2.all_weights[i])):
        diff = max(diff, (model2.all_weights[i][j].grad - model.all_weights[i][j].grad).abs().max().data[0])
        
        max_w = max(model2.all_weights[i][j].grad.max().data[0], max_w)
        max_w = max(model.all_weights[i][j].grad.max().data[0], max_w)

dh = (hidden[0]-hidden2[0]).abs().max().data[0]
dc = (hidden[1]-hidden2[1]).abs().max().data[0]
do = (output[1]-output2).abs().max().data[0]
dl = (loss-loss2).abs().max().data[0]

print("Diff in output : " + str(do))
print("Diff in hidden states : "+str(dh) +", "+str(dc))
print("Diff in loss : " + str(dl))

print("Max weight grad found : " +str(max_w))
print("Diff in weight grad : " + str(diff))
@apaszke
Contributor

apaszke commented Jan 29, 2017

You've used the same model twice. You probably meant to use model2 in the for loop. Once you change it, the diff in weight grad is 0.
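For reference, a minimal sketch of the corrected per-timestep loop, assuming the rest of the script above is unchanged; the only edit is calling model2 instead of model:

loss2 = 0
for i in range(input.size(0)):
    # run model2 here so its .grad fields are populated for the comparison below
    output2, hidden2 = model2(input[i].view(1, 1, -1), hidden2)
    loss2 += crit2(output2[0], target[i])

loss2 = loss2 / input.size(0)  # average over timesteps to match MSELoss over the full sequence
loss2.backward(retain_variables=True)

With this change, the weight-gradient diff reported at the end is 0.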

@apaszke apaszke closed this as completed Jan 29, 2017
@csarofeen
Contributor Author

Sorry about that. Thanks!
