RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: #5
Comments
Hi, thanks for your kind question! Could you share your environment variables with us? We haven't encountered such problems using this code base. We'd really appreciate a reply. Best,
Package Version thank you very much
Same problem here.
Hi, we have carefully checked the source code and environments; this bug comes from `torch.distributed`. We thought apex was not required before. However, due to the implementation of DDP in torch, we could not train the supernet with the SPOS mechanism. To work around this bug, it is necessary to use the apex package, so you should install apex before supernet training. We will fix the installation steps in README.md. Thanks.
The same error occurs when using apex.
Adding `for name, param in model.named_parameters(recurse=True): param.grad = None` at the beginning of `update_student_weights_only` solved my problem. The error is caused by `optimizer.step()`, which modifies the parameters of the meta network in place.
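The workaround above can be sketched in isolation. A toy `nn.Linear` stands in for the student/meta networks here; `update_student_weights_only` is the repo's own function and is not reproduced:

```python
import torch
import torch.nn as nn

# Toy stand-in for the student/meta networks in the repo.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One ordinary training step that leaves .grad buffers populated.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# The fix from this thread: drop stale gradients before the next
# student-only update, so autograd never walks over buffers that
# optimizer.step() just modified in place.
for name, param in model.named_parameters(recurse=True):
    param.grad = None

print(all(p.grad is None for p in model.parameters()))  # True
```

Setting `param.grad = None` (rather than zeroing it) detaches the old gradient tensors entirely, which is why it breaks the link to the in-place-modified state.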
In our experience, if the installation strictly follows the README, this issue should not occur. |
Hi, could you share your environment variables with us? We have tested the code; when using apex (installed following the README), this should not occur. Best,
Hello, I want to ask where you added that code? I ran into the same problem after installing apex via pip.
Hi, you should install apex with the cpp extension and cuda extension, as indicated in this URL.
Alternatively, you could add the code above as SPOS does: set the grad to None in each training iteration. Best,
I encountered a runtime error when I tried to search for an architecture based on your code.
I tried to locate the source of the error, and I found that the error above appears whenever the code updates the meta network or adds the kd_loss to the final loss.
How can I fix this problem?
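For reference, the error in the issue title is easy to reproduce outside this repo: autograd raises it whenever a tensor it saved for the backward pass is modified in place before `backward()` runs. A toy example, unrelated to the meta network:

```python
import torch

a = torch.randn(3, requires_grad=True)
b = torch.exp(a)      # exp's backward pass needs its output b
loss = b.sum()
b.add_(1.0)           # in-place edit bumps b's version counter

try:
    loss.backward()   # autograd detects b changed since it was saved
except RuntimeError as e:
    print("inplace" in str(e))  # True
```

In this issue the in-place modification comes from `optimizer.step()` rewriting meta-network parameters that are still referenced by a live autograd graph, which is the same failure mode.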