
AssertionError: hidden layer never emitted an output with multi-gpu training #9

Closed
reactivetype opened this issue Jun 27, 2020 · 7 comments

Comments

@reactivetype

I tried your library with a WideResnet40-2 model and used layer_index=-2.

The Lightning example works fine on a single GPU, but I get the error with multiple GPUs.
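For context, the assertion comes from a forward-hook pattern that captures an intermediate layer's output. A minimal sketch of that pattern (the class and attribute names below are illustrative, not the library's actual internals) also shows why 'dp' can break it:

```python
import torch
from torch import nn

class HiddenLayerWrapper(nn.Module):
    # Illustrative sketch: capture an intermediate layer's output via a
    # forward hook, as hidden-layer extraction is commonly done.
    # Not the library's actual code.
    def __init__(self, net, layer_index=-2):
        super().__init__()
        self.net = net
        self.hidden = None
        layer = list(net.children())[layer_index]
        layer.register_forward_hook(self._capture)

    def _capture(self, module, inputs, output):
        # nn.DataParallel runs forward on per-device replicas, and
        # PyTorch documents that attribute updates made inside a
        # replica's forward are lost -- so this rebinding may never be
        # seen by the object that forward() reads from.
        self.hidden = output

    def forward(self, x):
        _ = self.net(x)
        hidden, self.hidden = self.hidden, None
        assert hidden is not None, 'hidden layer never emitted an output'
        return hidden
```

On a single GPU the hook and the read happen on the same object, so the assertion never trips; under 'dp' the captured output can be dropped.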

@lucidrains
Owner

@reactivetype Hi Rindra! For some reason it works on my two GPUs, but I introduced a change in fa1e338 to take a stab at the problem without being able to reproduce it. Could you let me know if that helped?

@lucidrains
Owner

@reactivetype Is this multiple GPUs on one machine, or distributed across machines?

@reactivetype
Author

@lucidrains On one machine. I used 'dp' mode in Lightning. I upgraded to 0.1.4 but still get the issue.

@lucidrains
Owner

@reactivetype That's weird. Are you just running the script I supplied in the examples?

@lucidrains
Owner

[Screenshot attached: "Screen Shot 2020-06-27 at 8.27.53 PM"]

Can you try using ddp?
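For anyone following along: in the Lightning versions current at the time (0.8.x), switching the backend looked roughly like this (flag names per that era's API; check your version's docs):

```python
from pytorch_lightning import Trainer

# 'ddp' launches one process per GPU and avoids the per-forward module
# replication that 'dp' performs, which is what can drop the hook's output.
trainer = Trainer(gpus=2, distributed_backend='ddp')  # instead of 'dp'
trainer.fit(model)
```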

@QinbinLi

Hi @lucidrains ,

I got the same error when running with 2 GPUs using nn.DataParallel(learner).
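One possible workaround sketch for the DataParallel case, assuming the hook-based capture outlined above: mutate a dict keyed by device instead of rebinding an attribute. nn.DataParallel's replicas are shallow copies running in threads of one process, so in-place writes to a shared dict stay visible to the original module. (Names are illustrative, and this is untested against the library itself.)

```python
from torch import nn

class HiddenLayerWrapper(nn.Module):
    def __init__(self, net, layer_index=-2):
        super().__init__()
        self.net = net
        self.hidden = {}  # shared (by reference) with DataParallel replicas
        layer = list(net.children())[layer_index]
        layer.register_forward_hook(self._capture)

    def _capture(self, module, inputs, output):
        # keying by device lets each replica write to its own slot
        self.hidden[inputs[0].device] = output

    def forward(self, x):
        _ = self.net(x)
        hidden = self.hidden.pop(x.device, None)
        assert hidden is not None, 'hidden layer never emitted an output'
        return hidden
```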

@ranliu98

Hi @lucidrains ,
Thanks for the really nice implementation. I got the same issue using 2 GPUs with "dp".
