
torch.nn.DataParallel error #22

Closed
gentlebreeze1 opened this issue Jun 30, 2021 · 5 comments
gentlebreeze1 commented Jun 30, 2021
Training error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
How can I solve this problem? I used .to(device), but it does not work. @szq0214

[screenshot of the error]


szq0214 commented Jun 30, 2021

@gentlebreeze1 I have not seen this error before. Are you running with our default code? My guess is that the GPU used to load the model and the GPU used to run it are different, which is why you hit this error.
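A minimal sketch of the usual remedy for this situation (my own example, not code from this repo): load the checkpoint with `map_location` so every restored tensor lands on one known device, then move the whole model to the device it will run on.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Toy stand-in for the real network; the actual architecture is irrelevant here.
net = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.gettempdir(), 'toy_ckpt.pth')
torch.save(net.state_dict(), ckpt_path)

# map_location forces every saved tensor onto one device, so the restored
# weights cannot end up on a different GPU than the one the model uses.
state = torch.load(ckpt_path, map_location='cpu')
net.load_state_dict(state)

# Move the model once, after loading, to the device it will run on.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net.to(device)
assert all(p.device.type == device.type for p in net.parameters())
```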


gentlebreeze1 commented Jul 1, 2021

> @gentlebreeze1 I have not seen this error before. Are you running with our default code? My guess is that the GPU used to load the model and the GPU used to run it are different, which is why you hit this error.

I trained my own model. I save with:

```python
ckpt = {'epoch': epoch + 1,
        'state_dict': net.module.state_dict(),
        'optimizer': optimizer.param_groups[0]['lr']}
torch.save(ckpt, weights_path)
```

and load with:

```python
model = _create_model(model_name, teacher=False, pretrain=True)
checkpoint = torch.load(state_file)
if 'state_dict' in checkpoint:
    model.load_state_dict(checkpoint['state_dict'])
else:
    model.load_state_dict(checkpoint)
```

I don't think it's a problem with model loading; the error occurred in Discriminator.conv1.

@gentlebreeze1

I changed it as follows and it works. Will it affect the final result? @szq0214
```python
class discriminatorLoss(nn.Module):
    def __init__(self, outputs_size, K=2, loss=nn.BCEWithLogitsLoss()):
        super(discriminatorLoss, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=outputs_size, out_channels=outputs_size // K,
                               kernel_size=1, stride=1, bias=True)
        outputs_size = outputs_size // K
        self.conv2 = nn.Conv2d(in_channels=outputs_size, out_channels=outputs_size // K,
                               kernel_size=1, stride=1, bias=True)
        outputs_size = outputs_size // K
        self.conv3 = nn.Conv2d(in_channels=outputs_size, out_channels=2,
                               kernel_size=1, stride=1, bias=True)
        self.loss = loss

    def forward(self, outputs, targets):
        inputs = [torch.cat((i, j), 0) for i, j in zip(outputs, targets)]
        inputs = torch.cat(inputs, 1)
        batch_size = inputs.size(0)
        target = torch.FloatTensor([[1, 0] for _ in range(batch_size // 2)]
                                   + [[0, 1] for _ in range(batch_size // 2)])
        target = target.to(inputs[0].device)
        inputs = inputs[:, :, None, None]
        out = F.relu(self.conv1(inputs))
        out = F.relu(self.conv2(out))
        out = F.relu(self.conv3(out))
        output = out.view(out.size(0), -1)
        res = self.loss(output, target)
        return res
```
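For context on why the device mismatch can happen here at all, a stripped-down illustration (my own hypothetical `ParamLoss` module, not from this repo): a loss that owns learnable parameters is an `nn.Module` like any other, so its conv weights stay on whatever device the module was created on, and they are not replicated when only the main network is wrapped in `nn.DataParallel`.

```python
import torch
import torch.nn as nn

# Hypothetical miniature of discriminatorLoss: a "loss" that owns
# learnable parameters (a 1x1 conv), just like the class above.
class ParamLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 2, kernel_size=1)

    def forward(self, x):
        # On a single device this is fine; under DataParallel, x may
        # arrive on cuda:1 while self.conv sits on cuda:0, which raises
        # "Expected all tensors to be on the same device".
        return self.conv(x).mean()

loss_fn = ParamLoss()
out = loss_fn(torch.randn(2, 4, 1, 1))
assert out.dim() == 0  # scalar loss value
```

Moving the conv layers to the input's device on every call (or wrapping the loss module in DataParallel together with the network) avoids the crash.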

@gentlebreeze1

My model before distillation is 59.7 MB. Why is the saved model 109 MB? Does the model grow bigger after distillation? @szq0214


szq0214 commented Jun 20, 2022

Hi @gentlebreeze1, the model size will not grow after distillation. I guess you saved other parameters in your checkpoint.
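As a sanity check of that guess (my own sketch, not the repo's code): a checkpoint that bundles optimizer state is roughly twice the size of the weights alone, because SGD momentum (or Adam moment) buffers mirror every parameter tensor.

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Linear(1000, 1000)  # ~4 MB of float32 weights
opt = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

# One backward/step so SGD allocates its momentum buffers.
net(torch.randn(8, 1000)).sum().backward()
opt.step()

tmp = tempfile.gettempdir()
torch.save(net.state_dict(), os.path.join(tmp, 'weights_only.pth'))
torch.save({'state_dict': net.state_dict(),
            'optimizer': opt.state_dict()},
           os.path.join(tmp, 'full_ckpt.pth'))

# The bundled checkpoint is noticeably larger than the weights alone.
assert (os.path.getsize(os.path.join(tmp, 'full_ckpt.pth'))
        > 1.5 * os.path.getsize(os.path.join(tmp, 'weights_only.pth')))
```

Saving only `net.state_dict()` (or stripping the optimizer entry before saving) keeps the file at the size of the weights.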

szq0214 closed this as completed Jun 20, 2022