
torch.nn.DataParallel error #22

Closed
gentlebreeze1 opened this issue Jun 30, 2021 · 5 comments
gentlebreeze1 commented Jun 30, 2021
Training error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
How can I solve this problem? I used .to(device), but it does not work. @szq0214

[screenshot of the error]


szq0214 commented Jun 30, 2021

@gentlebreeze1 I have not seen this error before. Are you running with our default code? My guess is that the GPU used to load the model and the GPU used to run it are different, which is why you hit this error.
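A minimal sketch of the usual remedy for this situation (my own example, not code from this repo): load the checkpoint with `map_location` so every restored tensor lands on one known device, then move the whole model to the device it will run on.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Toy stand-in for the real network; the actual architecture is irrelevant here.
net = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.gettempdir(), 'toy_ckpt.pth')
torch.save(net.state_dict(), ckpt_path)

# map_location forces every saved tensor onto one device, so the restored
# weights cannot end up on a different GPU than the one the model uses.
state = torch.load(ckpt_path, map_location='cpu')
net.load_state_dict(state)

# Move the model once, after loading, to the device it will run on.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net.to(device)
assert all(p.device.type == device.type for p in net.parameters())
```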


gentlebreeze1 commented Jul 1, 2021

> @gentlebreeze1 I have not seen this error before. Are you running with our default code? My guess is that the GPU used to load the model and the GPU used to run it are different, which is why you hit this error.

I trained my own model. I save with:

```python
ckpt = {'epoch': epoch + 1,
        'state_dict': net.module.state_dict(),
        'optimizer': optimizer.param_groups[0]['lr']}
torch.save(ckpt, weights_path)
```

and load with:

```python
model = _create_model(model_name, teacher=False, pretrain=True)
checkpoint = torch.load(state_file)
if 'state_dict' in checkpoint:
    model.load_state_dict(checkpoint['state_dict'])
else:
    model.load_state_dict(checkpoint)
```

I don't think it's a problem with model loading; the error occurred in Discriminator.conv1.

@gentlebreeze1

I changed it as follows and it works. Will it affect the final result? @szq0214
```python
class discriminatorLoss(nn.Module):
    def __init__(self, outputs_size, K=2, loss=nn.BCEWithLogitsLoss()):
        super(discriminatorLoss, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=outputs_size, out_channels=outputs_size // K,
                               kernel_size=1, stride=1, bias=True)
        outputs_size = outputs_size // K
        self.conv2 = nn.Conv2d(in_channels=outputs_size, out_channels=outputs_size // K,
                               kernel_size=1, stride=1, bias=True)
        outputs_size = outputs_size // K
        self.conv3 = nn.Conv2d(in_channels=outputs_size, out_channels=2,
                               kernel_size=1, stride=1, bias=True)
        self.loss = loss

    def forward(self, outputs, targets):
        inputs = [torch.cat((i, j), 0) for i, j in zip(outputs, targets)]
        inputs = torch.cat(inputs, 1)
        batch_size = inputs.size(0)
        target = torch.FloatTensor([[1, 0] for _ in range(batch_size // 2)]
                                   + [[0, 1] for _ in range(batch_size // 2)])
        target = target.to(inputs[0].device)
        inputs = inputs[:, :, None, None]
        out = F.relu(self.conv1(inputs))
        out = F.relu(self.conv2(out))
        out = F.relu(self.conv3(out))
        output = out.view(out.size(0), -1)
        res = self.loss(output, target)
        return res
```
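For context on why the device mismatch can happen here at all, a stripped-down illustration (my own hypothetical `ParamLoss` module, not from this repo): a loss that owns learnable parameters is an `nn.Module` like any other, so its conv weights stay on whatever device the module was created on, and they are not replicated when only the main network is wrapped in `nn.DataParallel`.

```python
import torch
import torch.nn as nn

# Hypothetical miniature of discriminatorLoss: a "loss" that owns
# learnable parameters (a 1x1 conv), just like the class above.
class ParamLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 2, kernel_size=1)

    def forward(self, x):
        # On a single device this is fine; under DataParallel, x may
        # arrive on cuda:1 while self.conv sits on cuda:0, which raises
        # "Expected all tensors to be on the same device".
        return self.conv(x).mean()

loss_fn = ParamLoss()
out = loss_fn(torch.randn(2, 4, 1, 1))
assert out.dim() == 0  # scalar loss value
```

Moving the conv layers to the input's device on every call (or wrapping the loss module in DataParallel together with the network) avoids the crash.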

@gentlebreeze1

My model before distillation is 59.7 MB. Why is the saved model 109 MB? Does the model grow bigger after distillation? @szq0214


szq0214 commented Jun 20, 2022

Hi @gentlebreeze1, the model size will not grow after distillation. I guess you saved other parameters in your checkpoint.
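As a sanity check of that guess (my own sketch, not the repo's code): a checkpoint that bundles optimizer state is roughly twice the size of the weights alone, because SGD momentum (or Adam moment) buffers mirror every parameter tensor.

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Linear(1000, 1000)  # ~4 MB of float32 weights
opt = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

# One backward/step so SGD allocates its momentum buffers.
net(torch.randn(8, 1000)).sum().backward()
opt.step()

tmp = tempfile.gettempdir()
torch.save(net.state_dict(), os.path.join(tmp, 'weights_only.pth'))
torch.save({'state_dict': net.state_dict(),
            'optimizer': opt.state_dict()},
           os.path.join(tmp, 'full_ckpt.pth'))

# The bundled checkpoint is noticeably larger than the weights alone.
assert (os.path.getsize(os.path.join(tmp, 'full_ckpt.pth'))
        > 1.5 * os.path.getsize(os.path.join(tmp, 'weights_only.pth')))
```

Saving only `net.state_dict()` (or stripping the optimizer entry before saving) keeps the file at the size of the weights.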

szq0214 closed this as completed Jun 20, 2022