About mghead loss compute question #19
Comments
Is the result correct?
The test result is correct, and other people have hit the same problem.
I also encounter this problem occasionally, but it's hard to reproduce, so I didn't pay much attention to it.
I am comparing the loss computation in your repo and in the second.pytorch repo. In the original repo I have never encountered this kind of problem, even though the loss computation is almost the same when training PointPillars.
There is a problem in the data generation: invalid NaN values in the gt_boxes velocity cause this. Although you modified that part, the velocity computation may still produce illegal output. The crudest way to avoid this is to sanitize the velocities directly, as in the sketch below.
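A minimal sketch of that kind of workaround, assuming a nuScenes-style gt_boxes array that keeps its velocity components in two adjacent columns (the column indices and the helper name are placeholders, not code from the repo):

```python
import numpy as np

def sanitize_gt_velocity(gt_boxes: np.ndarray, vel_start: int = 6, vel_end: int = 8) -> np.ndarray:
    """Replace non-finite velocity entries in gt_boxes with zeros.

    Hypothetical helper: gt_boxes is assumed to be an (N, D) array whose
    velocity components live in columns [vel_start, vel_end). Boxes whose
    velocity cannot be computed (e.g. no previous frame for the object)
    may otherwise carry NaN and poison the regression targets.
    """
    gt_boxes = gt_boxes.copy()
    vel = gt_boxes[:, vel_start:vel_end]
    gt_boxes[:, vel_start:vel_end] = np.nan_to_num(vel, nan=0.0, posinf=0.0, neginf=0.0)
    return gt_boxes
```

Calling something like this on every sample before target assignment keeps NaN velocities out of the loss, at the cost of treating those objects as static.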
Besides CBGS, I am trying to train the original PointPillars on nuScenes with this repo, and I find that the loss computation problem leads to a gradient explosion.
Here are the Head1 conv_box weights in the first epoch:
Here is the loss output (only computing the head1 loss):
In the second epoch, the head1 conv_box weights have changed and contain some NaN values:
Once the last layer's weights contain NaN, backpropagation makes the values in every other layer NaN as well. The gradient clip is set to:
As another experiment, I set the loss to a fixed value (300), which left no NaN in any layer's weights, and the loss stayed at a normal value (which means the problem is the loss computation rather than the network layers).
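To confirm where the NaNs first appear, a small debugging sketch like the one below (not code from the repo; `model`, `batch`, and `compute_loss` are placeholders) can be dropped into the training loop. It reports any parameter that already holds non-finite values and skips the optimizer step whenever the loss itself is non-finite, so a single bad batch cannot wipe out the conv_box weights.

```python
import torch

def guarded_step(model, batch, compute_loss, optimizer, max_norm=10.0):
    """Debugging sketch: report non-finite weights and skip bad updates."""
    # Report parameters that already contain NaN/Inf before this step.
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            print(f"non-finite weights in {name}")

    optimizer.zero_grad()
    loss = compute_loss(model, batch)  # assumed to return a scalar tensor
    if not torch.isfinite(loss):
        # Skip the update entirely; backpropagating a NaN loss would
        # spread NaN into every layer's weights.
        print(f"non-finite loss {loss.item()}, skipping this step")
        return None

    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```

If the skip message fires while all weights are still finite, that points at the loss computation itself, which is consistent with the fixed-loss experiment above.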
@poodarchu