This issue was moved to a discussion.

MAML does not appear to support multi-GPU training #10

Closed
ypy516478793 opened this issue Oct 8, 2021 · 4 comments

Comments

@ypy516478793

MAML trains fine on a single GPU, but training in parallel on multiple GPUs raises an error. The traceback is as follows:

  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cougarnet.uh.edu/pyuan2/Projects/LibFewShot/core/model/backbone/conv_four.py", line 69, in forward
    out1 = self.layer1(x)
  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cougarnet.uh.edu/pyuan2/Projects/LibFewShot/core/model/backbone/utils/maml_module.py", line 63, in forward
    if self.weight.fast is not None and self.bias.fast is not None:
AttributeError: 'Tensor' object has no attribute 'fast'
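
The last frame of the traceback reads `self.weight.fast` on an object that is a plain `Tensor` with no `fast` attribute. The following is a minimal sketch of that failure mode, using dependency-free stand-in classes rather than PyTorch. `Param`, `replicate_without_extras`, and `fast_or_none` are hypothetical names for illustration, not LibFewShot or PyTorch APIs. The assumption that each per-GPU replica is a fresh object lacking attributes attached after construction is inferred from the error message and is not confirmed in this thread.

```python
class Param:
    """Stand-in for a parameter tensor (hypothetical, not torch.nn.Parameter)."""

    def __init__(self, values):
        self.values = list(values)


def replicate_without_extras(param):
    """Hypothetical replica step: copies the data but not attributes added later."""
    return Param(param.values)


def fast_or_none(param):
    """Read the MAML-style `fast` attribute, tolerating objects that lack it."""
    return getattr(param, "fast", None)


weight = Param([1.0, 2.0])
weight.fast = None  # attached after construction, as a MAML-style layer might do

replica = replicate_without_extras(weight)

# Direct access mirrors the failing line in the traceback and raises on the replica.
try:
    replica.fast
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# Guarded access avoids the crash, but the replica still carries no fast weights.
guarded_value = fast_or_none(replica)
```

A guard like this only suppresses the `AttributeError`. A replica would still fall back to its ordinary weights, so the inner-loop adaptation would not reach it, and the guard alone should not be read as a fix for multi-GPU MAML.
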
@wZuck
Member

wZuck commented Oct 10, 2021

Hello, and thank you for the feedback. We are working on this issue and will reply as soon as possible.

@yangcedrus
Contributor

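Hello. Regarding the multi-GPU problem with MAML that you reported, we have confirmed that it does exist. Supporting multiple GPUs would require fairly large changes to the code. We plan to fix these larger issues in a future update.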

@ypy516478793
Author

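> Hello. Regarding the multi-GPU problem with MAML that you reported, we have confirmed that it does exist. Supporting multiple GPUs would require fairly large changes to the code. We plan to fix these larger issues in a future update.

OK, thank you!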

@yangcedrus
Contributor

MAML can now be trained on multiple GPUs.

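One unresolved problem is that MAML cannot be used together with SyncBatchNorm under DistributedDataParallel. We will analyze how the missing synchronization affects the final results and look for a solution.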

@RL-VIG RL-VIG locked and limited conversation to collaborators Oct 13, 2022
@wZuck wZuck converted this issue into discussion #59 Oct 13, 2022
