❓ [QUESTION] Train models in single precision, but evaluate them in double precision #92
Comments
Depends on what your goal is: if you just want float64 to come out instead of float32, definitely. Whether you will gain any actual improvement from this, I'm not so sure, since all the weights will be noise past single precision, though I guess you could avoid rounding errors in the intermediate states... I feel like that should only matter for pathological inputs/weights, but maybe these are.

Are you using a deployed model? In Python or C++? This will affect how to do it.

Side note: a deployed NequIP model is still a full-fledged PyTorch compute graph with arbitrary autograd support. Depending on which finite differences you are trying to compute, using the autograd engine may be a much better approach.
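To illustrate the side note, here is a minimal sketch of taking gradients through a PyTorch compute graph with autograd rather than finite differences. `ToyEnergy` is a hypothetical stand-in for a deployed model, not NequIP's actual interface:

```python
import torch

class ToyEnergy(torch.nn.Module):
    """Stand-in for a deployed model: harmonic 'energy' of atomic positions."""
    def forward(self, pos):
        return (pos ** 2).sum()

model = ToyEnergy()
pos = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)
energy = model(pos)
# Forces are the negative gradient of the energy w.r.t. positions;
# autograd returns them exactly, with no finite-difference step size to tune.
forces = -torch.autograd.grad(energy, pos)[0]
```

Because the gradient is exact (to floating-point rounding), there is no step-size/cancellation trade-off at all, which is why autograd can sidestep the precision question entirely for first derivatives.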
I'm trying to perform geometry optimizations using ASE. Initially it works well, but after a while it gets stuck (the energy no longer changes while fmax ~ 0.01 eV/A) and returns an error: "Gradient and/or function calls were not changing. May indicate that precision was lost, i.e., the routine did not converge." I was using the CG optimizer from scipy, but quasi-Newton methods like BFGS also seem to get stuck. However, even though the optimization did not complete successfully, the Hessian matrix already turned out to be positive definite, so I'm not exactly sure whether there is in fact a problem. I was just curious about what would happen when the model is float64. Would something like

PS: You're right that it's noise, but if the precision is high, then the noise is also precisely constant and will get cancelled when taking finite differences. When the precision is lower, that is no longer the case. I've experienced a similar issue with classical force fields evaluated on a GPU; numerical estimates of e.g. the stress tensor were, in my experience, only possible when requiring double precision during the energy evaluation.
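The cancellation effect described in the PS can be reproduced in a tiny self-contained experiment (plain NumPy, unrelated to NequIP itself): the same central difference evaluated in float32 and float64.

```python
import numpy as np

def central_diff(f, x, h):
    # central finite difference: (f(x+h) - f(x-h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x * x          # exact derivative at x=1 is 2
h = 1e-5

# float32: rounding noise in f dominates after dividing by the tiny step
err32 = abs(float(central_diff(f, np.float32(1.0), np.float32(h))) - 2.0)
# float64: the same noise is ~1e9 times smaller, so it cancels cleanly
err64 = abs(float(central_diff(f, np.float64(1.0), np.float64(h))) - 2.0)
```

With a step of 1e-5, the float32 estimate is typically wrong in the third decimal place or worse, while the float64 estimate is accurate to ~1e-10: the subtraction cancels the leading digits, so whatever precision remains in the mantissa gets amplified by 1/(2h).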
So unfortunately it's a little more involved than that: currently NequIP "freezes" the compiled model as part of deployment, which allows PyTorch to inline all weights, buffers, etc. and do any optimizations that enables. A side effect of freezing is that nothing is registered as a parameter/buffer anymore; everything is an inline constant in the TorchScript graph, which includes its

For now, the workaround is to not freeze models at deployment in your install. This can be achieved on

I plan to add some way to flag to NequIP to optionally not freeze the model during deployment, so this should have a more lasting fix, but I'm not sure when that will land.

Also cc'ing @simonbatzner who may be interested in the geometry minimization part.
But what if you create a double precision model first? Something like
Yes, that modification would also work; it's why I'm considering changing the way NequIP does this to freeze the loaded model right before using it, rather than before saving, to maintain this kind of flexibility... just trying to get some clarity from the PyTorch people first about backward compatibility, etc.
I can confirm that this works! Simply adding |
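The exact one-line change was truncated out of this thread, but the general pattern being discussed would look something like the following sketch. It uses a plain `torch.nn.Linear` as a hypothetical stand-in for the built NequIP model, and where this line goes in nequip's own deploy code is not shown:

```python
import torch

model = torch.nn.Linear(3, 1)        # stand-in for the built model (not NequIP's API)
model = model.double()               # promote all weights/buffers to float64
scripted = torch.jit.script(model)   # then compile/deploy as usual

x = torch.randn(2, 3, dtype=torch.float64)
out = scripted(x)                    # outputs now come back as float64
```

Casting with `.double()` before compilation (rather than after freezing) matters here precisely because freezing inlines the weights as constants, after which their dtype can no longer be changed through the usual module API.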
Interesting! Glad to hear this solved your problem. Since it appears that this may be something that is useful in general for arbitrary potentials, I will try to see if there is a way to incorporate flexibility about dtype for deployment directly into nequip in the future. |
FYI, this is now fixed: models are now frozen right before they are used, and the
Is it possible to train models in single precision, but deploy them in double precision? I'm trying to use my single-precision-trained models to compute some finite differences, and this is typically only possible when the output of the models uses double precision.