Data Parallel for PWC-Net #24
Did exactly this quite recently. ... inside of the ... Mind the preprocessing. |
Thank you for the instructions @v-iashin! Would you mind sharing the error message, @ZacharyGong? |
Hi @sniklaus, I tried step No. 1 in v-iashin's reply. In my case, the code works; I checked the flow generated in the multi-GPU environment, and it seems correct. If you could check whether this might cause any mistakes, I would be very grateful. For future review, I also wrote a small code snippet to reproduce the problem that I met; see the sketch below.
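(The original snippet did not survive in this copy of the thread; what follows is only a rough sketch of such a reproduction, assuming the Network class from run.py of this repository is importable, its pretrained weights are available, and at least two GPUs are visible.)

import torch

from run import Network  # assumption: the Network class defined in run.py can be imported like this

# two dummy image batches; height and width are multiples of 64, so no resizing is needed
tensorFirst = torch.rand(4, 3, 256, 448).cuda()
tensorSecond = torch.rand(4, 3, 256, 448).cuda()

moduleNetwork = torch.nn.DataParallel(Network().cuda())  # replicate across all visible GPUs

# with the unmodified module-level caches this fails on every replica except the one that
# populated them first, because grid_sample receives a grid that lives on a different GPU
tensorFlow = moduleNetwork(tensorFirst, tensorSecond)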
|
Hopefully, it can save some time for you. The problem is that the caching dictionaries are defined at module level, so every GPU shares the same ones. Now imagine the following scenario. The first GPU adds a tensor to the dictionary under a key that only encodes the tensor size; the second GPU then finds that key already present and reuses a tensor that lives on the first GPU. To verify this, try to print the content of these dicts inside of the Backward function; see the sketch below. |
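A stripped-down illustration of that scenario (this is not code from the repository, just the size-only cache key reduced to a toy get_grid helper, assuming two visible GPUs):

import torch

Backward_tensorGrid = {}  # module-level dict, shared by every DataParallel replica

def get_grid(tensorFlow):
    # the key encodes only the size, not the device the caller is running on
    if str(tensorFlow.size()) not in Backward_tensorGrid:
        Backward_tensorGrid[str(tensorFlow.size())] = torch.zeros_like(tensorFlow)
    return Backward_tensorGrid[str(tensorFlow.size())]

print(get_grid(torch.rand(1, 2, 4, 4, device='cuda:0')).device)  # cuda:0 - this call fills the cache
print(get_grid(torch.rand(1, 2, 4, 4, device='cuda:1')).device)  # cuda:0 again - the second GPU gets a tensor on the wrong device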
Thank you for the code @ZacharyGong and thank you for sharing your thoughts with us @v-iashin! The easiest workaround is probably not to use the dictionary, as suggested by @v-iashin in his most recent post. That may have a negative impact on performance, though, which I am unable to predict. Another solution would be to have the tensors in the dict / cache located on the CPU instead of the GPU, again with a negative effect on performance. The backwards-warping would then be as follows.
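(The snippet posted in that comment is not preserved in this copy of the thread; the following is only a sketch of the CPU-cache idea, reusing the grid-building lines of the Backward helper from run.py and omitting the partial-mask handling for brevity.)

import torch

Backward_tensorGrid = {}  # grids are built and cached on the CPU, so all replicas can share them safely

def Backward(tensorInput, tensorFlow):
    if str(tensorFlow.size()) not in Backward_tensorGrid:
        tensorHorizontal = torch.linspace(-1.0, 1.0, tensorFlow.size(3)).view(1, 1, 1, tensorFlow.size(3)).expand(tensorFlow.size(0), -1, tensorFlow.size(2), -1)
        tensorVertical = torch.linspace(-1.0, 1.0, tensorFlow.size(2)).view(1, 1, tensorFlow.size(2), 1).expand(tensorFlow.size(0), -1, -1, tensorFlow.size(3))
        Backward_tensorGrid[str(tensorFlow.size())] = torch.cat([ tensorHorizontal, tensorVertical ], 1)  # note: no .cuda() here
    # end

    # copy the cached grid onto whichever GPU this replica is running on, at every call
    tensorGrid = Backward_tensorGrid[str(tensorFlow.size())].to(tensorFlow.device)

    # normalize the flow to the [-1, 1] grid coordinates expected by grid_sample
    tensorFlow = torch.cat([ tensorFlow[:, 0:1, :, :] / ((tensorInput.size(3) - 1.0) / 2.0), tensorFlow[:, 1:2, :, :] / ((tensorInput.size(2) - 1.0) / 2.0) ], 1)

    return torch.nn.functional.grid_sample(input=tensorInput, grid=(tensorGrid + tensorFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros')
# end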
|
Thank you guys, both of you gave good solutions! |
Another potentially efficient solution is to add the device id to the dictionary keys, like this:

import torch

Backward_tensorGrid = {}
Backward_tensorPartial = {}

def Backward(tensorInput, tensorFlow):
    # the caches are keyed by size *and* device, so each DataParallel replica reuses its own grid on its own GPU
    if str(tensorFlow.size()) + str(tensorFlow.device) not in Backward_tensorGrid:
        tensorHorizontal = torch.linspace(-1.0, 1.0, tensorFlow.size(3)).view(1, 1, 1, tensorFlow.size(3)).expand(tensorFlow.size(0), -1, tensorFlow.size(2), -1)
        tensorVertical = torch.linspace(-1.0, 1.0, tensorFlow.size(2)).view(1, 1, tensorFlow.size(2), 1).expand(tensorFlow.size(0), -1, -1, tensorFlow.size(3))
        Backward_tensorGrid[str(tensorFlow.size()) + str(tensorFlow.device)] = torch.cat([ tensorHorizontal, tensorVertical ], 1).cuda()
    # end

    if str(tensorFlow.size()) + str(tensorFlow.device) not in Backward_tensorPartial:
        Backward_tensorPartial[str(tensorFlow.size()) + str(tensorFlow.device)] = tensorFlow.new_ones([ tensorFlow.size(0), 1, tensorFlow.size(2), tensorFlow.size(3) ])
    # end

    # normalize the flow to the [-1, 1] grid coordinates expected by grid_sample
    tensorFlow = torch.cat([ tensorFlow[:, 0:1, :, :] / ((tensorInput.size(3) - 1.0) / 2.0), tensorFlow[:, 1:2, :, :] / ((tensorInput.size(2) - 1.0) / 2.0) ], 1)
    # append a channel of ones so out-of-bounds samples can be masked afterwards
    tensorInput = torch.cat([ tensorInput, Backward_tensorPartial[str(tensorFlow.size()) + str(tensorFlow.device)] ], 1)

    tensorOutput = torch.nn.functional.grid_sample(input=tensorInput, grid=(Backward_tensorGrid[str(tensorFlow.size()) + str(tensorFlow.device)] + tensorFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros')

    tensorMask = tensorOutput[:, -1:, :, :]
    tensorMask[tensorMask > 0.999] = 1.0
    tensorMask[tensorMask < 1.0] = 0.0

    return tensorOutput[:, :-1, :, :] * tensorMask
# end |
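A note on the trade-off of this variant: because the cache is keyed by device as well as size, every replica keeps its own grid on its own GPU, so nothing has to be copied across devices at call time; compared with the CPU-cache workaround above, the cost is a small amount of duplicated memory on each GPU instead of a host-to-device copy on every forward pass.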
Interesting approach, thank you for sharing! |
Hi,
Thank you for your implementation, it is really helpful, since the original version uses an older Python version and needs some extra functions written in C.
However, I am currently trying to use your model in my work, and I found that this network cannot be wrapped in the data parallel module, so it cannot benefit from multiple GPUs. I think the error is caused by 'torch.nn.functional.grid_sample' in the 'Backward' function. I'm wondering if you have any clue about this problem? Thank you in advance!