Automatically assigned tmin may become too small #180
As a result of a low-quality step, the trust radius is decreased (the step is not necessarily rejected, as that only happens for very low-quality steps). The reason for lowering the trust radius is that the first-order Taylor expansion becomes exact in the limit of small step sizes, so the step quality should eventually become good. However, when the gradient and energy are inconsistent, there is a chance the step quality never becomes good, and the result is a long (perhaps endless) sequence of low-quality steps.

The reason for the current setting of tmin is this: if the convergence criteria are satisfied for the RMS and max gradient, and an energy/gradient inconsistency causes an endless loop before the other three criteria (RMS and max step, and energy change) can be met, the trust radius can be reduced until it is well below Convergence_drms, causing the optimization to converge for the RMS / max displacement (and presumably the energy change will have fallen below the criterion in the meantime). This was designed to prevent exactly the behavior you are describing. However, if the RMS / max gradient criteria have not been met yet, for example if the energy and gradient have a large inconsistency on the order of 1e-3 or greater, then one could still end up with an endless sequence of low-quality steps. In other words, your description makes me suspect that the source of the problem is the electronic structure method and not the geometry optimization. It's true that if tmin were set larger, one would have a better chance of "jumping" out of the bad region and ending up in a region of the PES where the energy and gradient are consistent again; in my experience this never happens.

I have two suggestions for the next step: (1) If you could share the output file and ORCA input file with me, I could help to diagnose further. (2) You can try checking the energy/gradient consistency using finite differences. If the energy and gradient are consistent, then you should see agreement of 1e-5 or lower between the analytic and numerical gradient. If the inconsistency is larger than 1e-3, then that is probably the root cause of your failed optimization.

PS: It is possible that we could somehow detect poor convergence behavior such as these endless loops and switch the algorithm to a pure energy-based one (such as the Powell direction-set method or Nelder-Mead simplex method). That would require writing more code than just the algorithm. Since these algorithms are designed for when gradients are unavailable, the gradient convergence criteria should no longer be used. We would also need to implement the capability in the engine to compute just the energy (and not the gradient) for efficiency reasons. This would be a major contribution to the code and would require quite a bit of testing to ensure it works, though.
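To make the mechanism concrete, here is a minimal sketch of quality-based trust-radius control; the thresholds mirror those visible in the diffs later in this thread, but this is not verbatim geomeTRIC code:

```python
import numpy as np

def update_trust(trust, quality, tmin, tmax):
    """Illustrative quality-based trust-radius update.

    quality = (actual energy change) / (energy change predicted by the
    quadratic model); values near 1 mean the model describes the PES well.
    """
    if quality > 0.75:            # Good step: model agrees, expand
        return min(tmax, np.sqrt(2) * trust)
    elif quality > 0.25:          # Okay step: keep the radius
        return trust
    else:                         # Poor/rejected step: contract toward tmin
        return max(tmin, trust / 2)
```

If energy/gradient noise keeps the quality low no matter how small the step, the last branch fires every iteration and the trust radius decays geometrically until it hits tmin.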
Thanks for your help. I'm pretty sure ORCA has some numerical noise in the energy and gradients because of the RIJCOSX approximation, especially the COSX part and the way it is implemented (a different grid is used for the final SCF iteration and the gradient). However, I'm not sure we should really dig into gradient quality now, because all possible remedies would slow down the calculations significantly. Even if the gradient quality is not stellar, with a larger tmin the optimization does converge.

What do you think about my code? It sets

```python
self.tmin = kwargs.get('tmin', min(max(1.0e-4, self.Convergence_drms * 0.1), self.Convergence_drms))
```
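For concreteness, this is what the proposed default evaluates to at a few drms values (an illustrative check, including the reporter's drms = 6e-4):

```python
def proposed_tmin(drms):
    # Proposed default: clamp drms*0.1 to a 1.0e-4 floor,
    # but never let tmin exceed drms itself.
    return min(max(1.0e-4, drms * 0.1), drms)

for drms in (6.0e-5, 3.0e-4, 6.0e-4, 1.2e-3):
    print(f"drms = {drms:.1e}  ->  tmin = {proposed_tmin(drms):.1e}")
# drms = 6.0e-05  ->  tmin = 6.0e-05   (drms < 1e-4: limited by drms)
# drms = 3.0e-04  ->  tmin = 1.0e-04   (the 1e-4 floor applies)
# drms = 6.0e-04  ->  tmin = 1.0e-04   (the reporter's case)
# drms = 1.2e-03  ->  tmin = 1.2e-04   (drms*0.1 exceeds the floor)
```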
It's possible your code could be a good solution, but I can't decide that without understanding the behavior that causes the endless loop in the first place.
To answer your other question:
I think it's important for tmin to have the ability to become smaller than drms in order to cause convergence in the situation described in my first comment (the gradient criteria are met, but an energy/gradient inconsistency prevents the remaining criteria from being satisfied).

PS: If tmin is set to 1e-4, that is less than twice as large as the default setting of 6e-5 obtained with your settings. If I could see the log file, that would help me understand why one setting of tmin helps convergence but the other does not.
Here are the logs. However, I'm working with a modified version of geomeTRIC; all my modifications can be seen at annulen/geomeTRIC@master...smart_reset, and I believe they have no influence on convergence.

This is the failed optimization: PMDA_chair_dimer2_3.log

UPD: just to clarify, my change is only responsible for the "smart reset" messages in the log.
Out of curiosity: why is it so important to decrease the trust radius? By default ORCA uses RFO steps with a static trust radius of 0.3, and the steps nevertheless decrease and eventually converge (of course it's still far from perfect, that's why I'm here :) ). If it were possible, I'd prefer to keep the trust radius above 0.005 a.u. to make sure that gradient quality is not an issue.
I think I understand it now. Most quantum chemistry programs implement algorithms with built-in step limiting. The most popular are rational function optimization (RF or RFO) and quasi-Newton (QN); some programs also implement GDIIS and Newton-Raphson (NR). For example, by default ORCA uses RF and switches to QN when the RF step is too small, while Firefly uses GDIIS and later switches to NR. However, geomeTRIC instead takes a full step of trust-radius size, so step limiting depends entirely on its ability to reduce the radius. If this is the case, I guess I'll have to implement at least RF and QN before I can start using geomeTRIC in production.
The algorithm geomeTRIC uses for energy minimization has always been a quasi-Newton (QN) method. For transition state optimization it uses RFO. In both cases the trust radius is used to limit the step size. It is possible for the step to be smaller than the trust radius; in fact that happens very often, though it may not have happened in your calculations because the trust radius became very small. You can read more about the theory here: https://geometric.readthedocs.io/en/latest/how-it-works.html

I looked at your output files. Starting at around step ~20, the step quality is no longer good and the trust radius is reduced to the tmin floor of 1e-4. One potential way to resolve this issue is to change how the trust radius is adjusted: we can consider the step to be "good" even if the step quality is bad, provided the energy convergence criterion has already been met (which is true in your case), even though the energy change is not as expected. The following may work - add a line implementing this immediately above here:

https://github.com/leeping/geomeTRIC/blob/master/geometric/optimize.py#L560
This might be better: instead of adding a new line, modify the existing "Okay" condition to read `elif Converged_energy or (Quality > (0.5 if params.transition else 0.25)): step_state = StepState.Okay`.
To answer your previous question:
As I said before, the reason for decreasing the trust radius is to ensure the second-order Taylor expansion of the energy is a good approximation. Otherwise the optimizer could take bad steps (for example, Step 5 in your log file).
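Concretely, the step quality being discussed is the ratio of the actual to the model-predicted energy change (a sketch; geomeTRIC's exact bookkeeping may differ):

```python
import numpy as np

def step_quality(E_new, E_old, grad, hess, step):
    """Ratio of the actual energy change to the change predicted by the
    second-order Taylor model around the old point.

    grad: gradient at the old point; hess: (approximate) Hessian;
    step: the displacement taken. Quality near 1 means the quadratic
    model describes the surface well.
    """
    dE_pred = np.dot(grad, step) + 0.5 * np.dot(step, hess @ step)
    return (E_new - E_old) / dE_pred
```

If the energy carries noise of order 1e-3, then once the predicted change shrinks below that noise the quality becomes erratic no matter how small the step, which is the pathology described above.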
Just for your information, I can understand diffs (and find them easier to use than instructions like "add X before line Y"). If I understood correctly, your last change would look like this:

```diff
diff --git a/geometric/optimize.py b/geometric/optimize.py
index 0521f1c..6ea6d84 100644
--- a/geometric/optimize.py
+++ b/geometric/optimize.py
@@ -565,7 +565,8 @@ class Optimizer(object):
         # 2020-03-10: Step quality thresholds are hard-coded here.
         colors = {}
         if Quality > 0.75: step_state = StepState.Good
-        elif Quality > (0.5 if params.transition else 0.25): step_state = StepState.Okay
+        elif Converged_energy or (Quality > (0.5 if params.transition else 0.25)): step_state = StepState.Okay
+        #elif Quality > (0.5 if params.transition else 0.25): step_state = StepState.Okay
         elif Quality > 0.0: step_state = StepState.Poor
         else:
             colors['energy'] = "\x1b[92m" if Converged_energy else "\x1b[91m"
```
That's correct.
Thanks for the explanation. I was thinking of the trust radius just as a safety net for the optimizer in case something goes wrong, and was really surprised to see it drop so low.
Yes, I think one insight I gained from this is that the trust radius only works properly as a safety net if one assumes the energy and gradient are consistent. Otherwise it does not work as intended, and it can actually slow things down; there are situations like yours in which convergence could be sped up if the trust radius had not decreased as much as it did. I'd be interested to know if the fix above solves the problem.
Ok, then I'll reproduce the same state of the code that I had when producing PMDA_chair_dimer2_3.log, plus your patch on top, so that we can make a better comparison.
While that's running, another question out of curiosity: why are you using QN instead of RFO for energy minimization? I thought RFO was more reliable, especially with a bad input geometry. It's used by default at least in ORCA and Molpro.
For energy minimizations, RFO and QN are very similar, and with a reasonable default parameter choice for RFO they are equivalent (see the attached paper). The transition state optimization uses partitioned RFO to maximize along one mode and minimize along the others. WIREs Comput Mol Sci - 2011 - Schlegel - Geometry optimization.pdf
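For reference, the two step equations (standard textbook forms, consistent with the attached review) are:

$$\delta_{\text{QN}} = -\mathbf{H}^{-1}\mathbf{g}, \qquad (\mathbf{H} - \lambda\mathbf{I})\,\delta_{\text{RFO}} = -\mathbf{g},$$

where $\mathbf{g}$ is the gradient, $\mathbf{H}$ the (approximate) Hessian, and $\lambda$ the lowest eigenvalue of the RFO augmented Hessian $\begin{pmatrix}\mathbf{H} & \mathbf{g}\\ \mathbf{g}^T & 0\end{pmatrix}$. Near a minimum with positive-definite $\mathbf{H}$, $\lambda \to 0$ and the RFO step reduces to the QN step, which is the sense in which the two are equivalent.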
Thanks!
For the record, ORCA also uses PRFO for TS by default.
Thinking more about your patch, I'm starting to doubt it would be helpful for my bigger system. When I optimized a 9-molecule cluster with ORCA, I reached a situation where the energy changes were minimal but the gradients didn't decrease, and I had to reduce the trust radius manually to get any reasonable steps. Your change completely stops updating the trust radius after the energy has converged, so I'm afraid geomeTRIC could get stuck the same way on the 9-molecule cluster.
To clarify, I haven't tried the 9-molecule cluster with geomeTRIC yet, only with ORCA's built-in optimizer.
You could make it more aggressive in the following way; this will increase the trust radius as long as the energy convergence criterion is being met. We would need to perform more testing to ensure this doesn't break something else, though.

```diff
diff --git a/geometric/optimize.py b/geometric/optimize.py
index 0521f1c..6ea6d84 100644
--- a/geometric/optimize.py
+++ b/geometric/optimize.py
@@ -565,7 +565,8 @@ class Optimizer(object):
         # 2020-03-10: Step quality thresholds are hard-coded here.
         colors = {}
-        if Quality > 0.75: step_state = StepState.Good
+        if Converged_energy or Quality > 0.75: step_state = StepState.Good
+        # if Quality > 0.75: step_state = StepState.Good
         elif Quality > (0.5 if params.transition else 0.25): step_state = StepState.Okay
         elif Quality > 0.0: step_state = StepState.Poor
         else:
             colors['energy'] = "\x1b[92m" if Converged_energy else "\x1b[91m"
```
On the other hand, I've never set trust radius values lower than 1e-3 in ORCA, so maybe it won't be an issue.
What do you think: could it be a good idea to use different convergence criteria depending on whether the energy goes up or down? When I was dealing with ORCA's optimizer, it always seemed weird to me to get a "converged" message after a series of positive energy changes (with decreasing gradient). IMO, if the energy keeps climbing, it's by no means converged.
My first thought is that I don't think it's a good idea to apply different criteria based on the direction of the energy change. That has not been done in the computational literature as far as I know, and if we start to do that, the community might think it's being done to "declare victory" prematurely. I haven't seen many cases where the energy goes up monotonically as you are observing; in a different calculation you might see the energy fluctuate up and down.

I don't think ORCA is applying different criteria based on the direction of the energy change, but it would be good to know precisely what criteria it is using. If the energy goes up but the absolute value of the energy change in each step is below the convergence criterion, it is possible for the optimization to converge even if the energy goes up by a nontrivial amount (say, 1e-5) over a number of steps (say, 20 steps). It is valid for you to say that the calculation is by no means converged - after all, it had a lower-energy structure 20 steps ago - but there is simply no way to fix the situation other than by addressing energy/gradient consistency.

In my opinion, energy/gradient consistency is not addressed carefully in the published literature, and there is always an incentive to cut corners in accuracy in favor of speed; as a result, the default settings of many codes are in a "gray area" where things are not exactly correct but results are still mostly reproducible.
My idea was actually the opposite: don't declare victory if we are going up, unless the change is really small - e.g., 1e-6 when going down and 1e-7 when going up. But if nobody has done it yet, then it's probably a bad idea.
The convergence criteria I'm using now exactly replicate the criteria I previously used in ORCA, where all criteria are mandatory. The default behavior of ORCA is different: it has a few "profiles" and interprets the criteria more liberally (e.g., if energy, drms, and dmax have converged, and grms is way below the requirement, then it can ignore gmax being a couple of times higher than needed). I hated that "liberal" thing and turned it off using an undocumented option found in the binary file, which makes ORCA behave like geomeTRIC in that regard.
Thanks - I think I understand your motivation for wanting a tighter energy criterion when the energy goes up: the optimization converges even though the energy is increasing by a small amount each step, and we know it's not really converged. However, there are some complicating factors for this idea. One is that very similar criteria are also used in constrained geometry optimization and transition state optimization; in those calculations, some amount of energy increase is expected if it is required to satisfy the constraint or to maximize the energy along the reaction coordinate. Therefore, one would have to turn this on only for unconstrained energy minimizations. The other is that I think the "energy steadily increasing" behavior only happens when the energy and gradient are inconsistent close to convergence, and one cannot expect rigorous optimization results if such an inconsistency exists. Probably the best we can do is find an approximate solution and tweak the step-size control and convergence criteria so that the optimization doesn't take forever in a frustrating way.
However, if you think I've made things too tight, that's not the case: the grms and gmax values I actually reached in ORCA with the 9-molecule cluster were higher; I never saw either of them shown as converged. Oh, and ORCA uses the gradient in internal coordinates to evaluate the grms and gmax criteria, so the behavior differs depending on the chosen coordinate system.
The inconsistency you are seeing might be grid related. Using a larger grid (such as (99, 590)) might improve your success rate with your thresholds, which are relatively tight. But even without going to a larger grid, I think implementing one of the "diffs" above should result in convergence, even if the steady energy increase is unsettling.
Actually, it's not only about frustration; I haven't told you the complete story yet. As you might guess, the 9-molecule cluster has lots of H-bonds in it, and the dihedral angles that include them have very shallow potential energy surfaces; changing them roughly corresponds to spatial rotation of the individual molecules of the cluster. ORCA by default uses redundant internal coordinates, which I thought was the main source of trouble. On the other hand, optimization in Cartesian coordinates behaved even worse, which I attributed to a poor implementation of ORCA's Cartesian optimizer (it also lacks some features, so that was not a completely unreasonable assumption).
So my hope was that TRIC coordinates would allow a quicker path to convergence, and that tight gradient criteria would ensure the molecules end up oriented in the optimal way.
In case you are interested, here is an example output from ORCA with the default grids. Note that there is a DFT grid and three COSX grids.
Here is a log with your patch from #180 (comment). As you can see, it starts exactly the same as the previous run.
And indeed, the trust radius stops decreasing once the energy criterion is met.
Thanks! I still think that checking the energy/gradient consistency using finite differences is a good way to find the true cause of the issue. Within ORCA there is the NUMGRAD option; I believe you can use it to compute a single-point numerical gradient. I bet that if the max gradient error could be made less than 1e-5, you would no longer observe this strange behavior. In the meantime, I think one of the two "diffs" in this thread, where low-quality steps are tolerated once the energy has converged, should help.
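A generic version of such a check looks like this (a sketch; `energy_fn` and `grad_fn` are placeholders for whatever interface your engine provides):

```python
import numpy as np

def fd_gradient_check(energy_fn, grad_fn, coords, h=1.0e-4):
    """Compare an analytic gradient to central finite differences.

    energy_fn(coords) -> float energy; grad_fn(coords) -> flat gradient.
    Returns the maximum absolute deviation between the two gradients.
    """
    coords = np.asarray(coords, dtype=float).ravel()
    g_analytic = np.asarray(grad_fn(coords)).ravel()
    g_numeric = np.zeros_like(coords)
    for i in range(coords.size):
        dx = np.zeros_like(coords)
        dx[i] = h
        g_numeric[i] = (energy_fn(coords + dx) - energy_fn(coords - dx)) / (2 * h)
    return np.max(np.abs(g_analytic - g_numeric))
```

Agreement of 1e-5 or better suggests the gradient is consistent with the energy; deviations approaching 1e-3 would point to the grids as the root cause.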
Will try it next. BTW, here is a trajectory file from the latest optimization. It seems like some steps contain a rotation of the system as a whole. Grids are not rotationally invariant, so that may cause additional jitter. I guess it could be a good idea to always keep the translational coordinates of one of the fragments frozen, so that whole-system rotations won't happen.
Thanks, I can see the slight whole-system rotation at the end. This has been observed in several other software packages as well. geomeTRIC deals with this in the following way: when it detects repeated poor-quality steps that are dominated by overall translation/rotation, it starts projecting the net force and torque out of the gradient (this is the "subfrctor" behavior visible in the code below).

It isn't a perfect solution, but I think it works well enough. Using your input file I was able to create an equivalent input for Q-Chem. I have tightened the grid in that calculation and will let you know what the result looks like.
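For the curious, the "project out net force and torque" operation mentioned above amounts to something like the following (a simplified sketch with unit masses and a nonlinear system; not geomeTRIC's actual routine):

```python
import numpy as np

def project_net_force_torque(xyz, grad):
    """Remove the net force and net torque from a Cartesian gradient.

    xyz, grad: (N, 3) arrays of coordinates and gradient components.
    Returns the projected gradient.
    """
    g = grad - grad.mean(axis=0)        # zero out the net force
    dx = xyz - xyz.mean(axis=0)         # coordinates about the centroid
    # Solve inertia @ omega = total torque; inertia = tr(R) I - R
    # with R = sum_i r_i r_i^T (singular for linear systems).
    torque = np.sum(np.cross(dx, g), axis=0)
    R = np.einsum('ni,nj->ij', dx, dx)
    inertia = np.trace(R) * np.eye(3) - R
    omega = np.linalg.solve(inertia, torque)
    return g - np.cross(omega, dx)      # subtract the rigid-rotation part
```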
However, it often kicks in late in the optimization. Pinning down the center of mass of one of the fragments, on the other hand, looks like a reasonable constraint that could be used for any system right from the start, by default (e.g., pin fragment 1, and allow the user to choose a different pinned fragment as an option separate from other constraints). Yep, it decreases the degrees of freedom from 3N to 3N-3, but I don't see how it could harm the optimization.
Note the devil in the details: ORCA uses RIJCOSX by default. RIJCOSX is RI (resolution of identity, aka density fitting) for the Coulomb integrals and COSX (chain of spheres, actually a numerical integration scheme) for the exchange integrals. I don't know whether Q-Chem supports those, especially COSX, and without them stability may be much better. However, RIJCOSX allows DFT to scale better, and larger clusters get bigger time savings.
With one center of mass pinned down, the system can still undergo a net rotation, although the axis of rotation will change location. Even if you pinned two centers of mass, there is still the possibility of rotation about the axis connecting the centers of mass.
Thanks - I did note ORCA's use of the RIJCOSX approximation. I am thinking that if the optimization also fails in Q-Chem, it points to a problem with the optimization algorithm; otherwise, it points to the root cause being an energy/gradient inconsistency, possibly caused by one of the grids. I consider it part of the process of elimination to find the root issue.
Here are the results with the more aggressive patch, and they are rather... curious. When the trust radius started growing after step 26, the Quality quickly went down to large negative values; nevertheless, the run converged at step 34 with a step of Quality 24.7. Of course, the final energy is questionable.

I've also tried out my idea of pinning the center of the first molecule via a constraints file.
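The attached constraints file isn't reproduced above; for readers, a geomeTRIC constraints file of this kind might look like the following (a sketch, assuming the first molecule spans atoms 1-10; `trans-xyz` freezes the overall translation of the listed atoms):

```
$freeze
trans-xyz 1-10
```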
Seems like it did what I intended, and it reduced the whole-system rotations; however, overall convergence got worse.
Here's the Q-Chem result. I think it's better behaved than what you've been seeing so far; toward the end of the optimization, the energy changes are negative and the step size drops below the convergence criteria by itself. There is still some net rotation that gets projected out at step 27.
Thanks for sharing your result. Here's what I see happening: when the energy increases but the size of the increase is below the energy convergence criterion, the step is counted as "Good" and the trust radius increases (steps 25-27). But then the step becomes large enough for the energy increase to exceed the convergence criterion, causing the step to be rejected (28). This indicates to me that the more "moderate" approach, i.e. treating such steps as "Okay" (keeping the trust radius constant) rather than "Good" (increasing it), is the better choice.
I guess it should actually depend on the trust radius value: if it's considered "too low", grow it; otherwise keep it constant. Or I should simply use a higher tmin.

As for the Q-Chem results: from the energy values I assume it uses an exact DFT calculation, so it's no surprise that those gradients are more precise. I can try running ORCA without those approximations and compare.

In ORCA I also tried GDIIS, which I supposed to be even more stable; however, it didn't make any substantial improvement there. Though I guess it could be a nice-to-have addition to geomeTRIC, even as a "checkbox" feature :)
Also, I'd like to note that small systems like this one usually converge relatively well in ORCA with the same settings. I just picked a particular system that exposed pathologically bad convergence, which looked suspiciously similar to my experience with the 9-molecule cluster.
I think for your calculation, which has the behavior of steadily increasing energy, setting a larger tmin seems reasonable. I also think running the numerical gradient check is worthwhile.

I don't think one could play with tmin or anything else to guarantee good convergence if the energy and gradient are inconsistent. There are many ways the inconsistency could manifest; in this case the gradient decreased as the energy slowly increased, but in the next case it could be totally different.

I do know about GDIIS, but I don't understand the mathematics of how it works. I have some understanding of DIIS for electronic structure and of QN/RFO for geometry optimization, but unless there is a convincing case where GDIIS performs better, there are some higher-priority tasks to work on. :) For example, getting v1.1 released...
I think I've found an issue with your "Okay" patch: the optimization now may never hit the Poor or Reject states, so the projection of net translation/rotation never gets activated. I think it could make sense to enable the translation/rotation subtraction automatically once the energy has converged, without waiting for a sufficiently bad step.
```diff
diff --git a/geometric/optimize.py b/geometric/optimize.py
index 9b0e1a3..6ee08ba 100644
--- a/geometric/optimize.py
+++ b/geometric/optimize.py
@@ -640,7 +642,7 @@ class Optimizer(object):
         # if step_state in (StepState.Okay, StepState.Poor, StepState.Reject) and params.transition:
         #     logger.info("LPW: Recalculating Hessian\n")
         #     self.recalcHess = True
-        if step_state in (StepState.Poor, StepState.Reject):
+        if step_state in (StepState.Okay, StepState.Poor, StepState.Reject):
             new_trust = max(params.tmin, min(self.trust, self.cnorm)/2)
             # if (Converged_grms or Converged_gmax) or (params.molcnv and Converged_molpro_gmax):
             #     new_trust = max(new_trust, self.params.Convergence_dmax if self.params.usedmax else self.params.Convergence_drms)
```
Trying this now:

```diff
diff --git a/geometric/optimize.py b/geometric/optimize.py
index 9b0e1a3..c09f42c 100644
--- a/geometric/optimize.py
+++ b/geometric/optimize.py
@@ -565,7 +565,8 @@ class Optimizer(object):
         # 2020-03-10: Step quality thresholds are hard-coded here.
         colors = {}
         if Quality > 0.75: step_state = StepState.Good
-        elif Quality > (0.5 if params.transition else 0.25): step_state = StepState.Okay
+        elif Converged_energy or (Quality > (0.5 if params.transition else 0.25)): step_state = StepState.Okay
+        #elif Quality > (0.5 if params.transition else 0.25): step_state = StepState.Okay
         elif Quality > 0.0: step_state = StepState.Poor
         else:
             colors['energy'] = "\x1b[92m" if Converged_energy else "\x1b[91m"
@@ -646,13 +648,6 @@ class Optimizer(object):
             #     new_trust = max(new_trust, self.params.Convergence_dmax if self.params.usedmax else self.params.Convergence_drms)
             self.trustprint = "\x1b[91m-\x1b[0m" if new_trust < self.trust else "="
             self.trust = new_trust
-            # A poor quality step that is dominated by overall translation/rotation
-            # is a sign that projecting out the net force and torque may be needed
-            if self.params.subfrctor == 1 and ((rms_displacement_noalign / rms_displacement) > self.lowq_tr_thre):
-                self.lowq_tr_count += 1
-                if self.lowq_tr_count == self.lowq_tr_limit :
-                    logger.info("Poor-quality step dominated by net translation/rotation detected; ")
-                    logger.info("will project out net forces and torques past this point.\n")
         elif step_state == StepState.Good:
             new_trust = min(params.tmax, np.sqrt(2)*self.trust)
             self.trustprint = "\x1b[92m+\x1b[0m" if new_trust > self.trust else "="
@@ -660,6 +655,15 @@ class Optimizer(object):
         elif step_state == StepState.Okay:
             self.trustprint = "="

+        if step_state != StepState.Good:
+            # A poor quality step that is dominated by overall translation/rotation
+            # is a sign that projecting out the net force and torque may be needed
+            if self.params.subfrctor == 1 and ((rms_displacement_noalign / rms_displacement) > self.lowq_tr_thre):
+                self.lowq_tr_count += 1
+                if self.lowq_tr_count == self.lowq_tr_limit :
+                    logger.info("Poor-quality step dominated by net translation/rotation detected; ")
+                    logger.info("will project out net forces and torques past this point.\n")
+
         if step_state == StepState.Reject:
             if hasattr(self, 'X_rj') and np.allclose(self.X_rj, self.X, atol=1e-6):
                 logger.info("\x1b[93mA previously rejected step was repeated; accepting to avoid infinite loop\x1b[0m\n")
```
I've done an optimization with the patch above and using larger grids. Sample ORCA output with the grids is attached. Convergence runs much more smoothly; however, a lot of steps were taken to refine the structure to satisfy all the criteria.
Looking at your output, I think the behavior is better. There are still some problems: around step 20 the step quality starts to get low, and between steps 31-58 there are some fluctuations up and down. However, the converged structure does have the lowest energy, within a margin of error of 1e-6.

I see that your patch essentially makes the "subfrctor" more aggressive, so that it can be applied when the step state is "Okay". There is currently a command-line option that turns the behavior on from the beginning. The reason why subfrctor is not always active is that there are finite-field calculations where one expects overall reorientation of the molecule - I have corresponded with another group that uses geomeTRIC for this purpose. Therefore I don't think it should be turned on by default for "Okay" steps. However, I think it should be turned on for "Okay" steps when the step quality is low (i.e., those steps that used to be Poor/Reject but are now being set to Okay because of the first patch). I think this patch implements the behavior I described:

```diff
diff --git a/geometric/optimize.py b/geometric/optimize.py
index 78d8454..6a8b999 100644
--- a/geometric/optimize.py
+++ b/geometric/optimize.py
@@ -565,7 +565,8 @@ class Optimizer(object):
         # 2020-03-10: Step quality thresholds are hard-coded here.
         colors = {}
         if Quality > 0.75: step_state = StepState.Good
-        elif Quality > (0.5 if params.transition else 0.25): step_state = StepState.Okay
+        elif Converged_energy or (Quality > (0.5 if params.transition else 0.25)):
+            step_state = StepState.Okay
         elif Quality > 0.0: step_state = StepState.Poor
         else:
             colors['energy'] = "\x1b[92m" if Converged_energy else "\x1b[91m"
@@ -634,6 +635,13 @@ class Optimizer(object):
         # if step_state in (StepState.Okay, StepState.Poor, StepState.Reject) and params.transition:
         #     logger.info("LPW: Recalculating Hessian\n")
         #     self.recalcHess = True
+        def activate_subfrctor():
+            if self.params.subfrctor == 1 and ((rms_displacement_noalign / rms_displacement) > self.lowq_tr_thre):
+                self.lowq_tr_count += 1
+                if self.lowq_tr_count == self.lowq_tr_limit :
+                    logger.info("Poor-quality step dominated by net translation/rotation detected; ")
+                    logger.info("will project out net forces and torques past this point.\n")
+
         if step_state in (StepState.Poor, StepState.Reject):
             new_trust = max(params.tmin, min(self.trust, self.cnorm)/2)
             # if (Converged_grms or Converged_gmax) or (params.molcnv and Converged_molpro_gmax):
@@ -642,17 +650,14 @@ class Optimizer(object):
             self.trustprint = "\x1b[91m-\x1b[0m" if new_trust < self.trust else "="
             self.trust = new_trust
             # A poor quality step that is dominated by overall translation/rotation
             # is a sign that projecting out the net force and torque may be needed
-            if self.params.subfrctor == 1 and ((rms_displacement_noalign / rms_displacement) > self.lowq_tr_thre):
-                self.lowq_tr_count += 1
-                if self.lowq_tr_count == self.lowq_tr_limit :
-                    logger.info("Poor-quality step dominated by net translation/rotation detected; ")
-                    logger.info("will project out net forces and torques past this point.\n")
+            activate_subfrctor()
         elif step_state == StepState.Good:
             new_trust = min(params.tmax, np.sqrt(2)*self.trust)
             self.trustprint = "\x1b[92m+\x1b[0m" if new_trust > self.trust else "="
             self.trust = new_trust
         elif step_state == StepState.Okay:
             self.trustprint = "="
+            if (Quality < (0.5 if params.transition else 0.25)): activate_subfrctor()
         if step_state == StepState.Reject:
             if hasattr(self, 'X_rj') and np.allclose(self.X_rj, self.X, atol=1e-6):
```
I didn't actually check that command-line option. Actually, I haven't explored all the options yet, as I just started using geomeTRIC this Saturday.
I see. However, I think this feature should be made more prominent, as most people optimize geometries in rotationally invariant conditions.
I've finished the NumGrad calculation. Results are here: as you can see, there are a lot of differences larger than 1e-5, though not that much larger. I expect that with the default grid settings the results will be worse - I will report that result later.
Well, earlier you said that RFO and QN are very similar. However, it seems that in ORCA's implementation there is a difference between them, judging from the manual: when the step is larger than the trust radius, the QN step is simply scaled down, while the RFO step is optimized within the radius to get the lowest predicted energy.
In QN it is also possible to minimize the energy on the hypersphere of radius Trust; that is what geomeTRIC does. I am not sure why the QN optimizer in ORCA decides to scale the step instead.
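Schematically, the two flavors of step limiting differ as follows (a sketch assuming a positive-definite Hessian; the hypersphere version bisects on the standard Levenberg-type shift):

```python
import numpy as np

def qn_step_scaled(H, g, trust):
    """Take the full quasi-Newton step and scale it down to the trust radius."""
    step = -np.linalg.solve(H, g)
    norm = np.linalg.norm(step)
    return step if norm <= trust else step * (trust / norm)

def qn_step_hypersphere(H, g, trust, tol=1e-10):
    """Minimize the quadratic model on |step| = trust by bisecting on
    the shift lam in (H + lam*I) step = -g (step norm decreases with lam)."""
    n = len(g)
    step = -np.linalg.solve(H, g)
    if np.linalg.norm(step) <= trust:
        return step                     # unconstrained minimum is inside
    lo, hi = 0.0, 1.0
    # Grow hi until the shifted step falls inside the sphere
    while np.linalg.norm(np.linalg.solve(H + hi * np.eye(n), -g)) > trust:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        s = np.linalg.solve(H + mid * np.eye(n), -g)
        lo, hi = (mid, hi) if np.linalg.norm(s) > trust else (lo, mid)
    return np.linalg.solve(H + hi * np.eye(n), -g)
```

The scaled step keeps the Newton direction; the hypersphere solution rotates the step toward the steepest-descent direction as the radius shrinks, which generally gives a lower predicted energy on the constraint surface.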
Hey folks, is there an accepted solution to these situations? Reading through this, I'm not 100% sure what to do (I'm in a similar situation here).
I'm using `drms 6e-4` and `dmax 1e-3` criteria for optimization (which were taken from ORCA's `TightOpt`), and I've got into a situation where the optimization got stuck in an endless loop with the energy slowly climbing up. Quality was always < 0, but no steps were rejected, with the reasoning `Not rejecting step - trust below tmin = 6.000e-05`.

The unreasonably low `tmin` caught my attention (btw, I'm using the tip of the master branch). The corresponding code in `params.py` says that:

1. `tmin` should actually be not smaller than `1.0e-4` because of gradient errors.
2. `tmin` should be smaller than `Convergence_drms` to avoid rejection of valid steps.

Both criteria look sensible to me. However, the code that computes `tmin` explicitly contradicts criterion (1), as `tmin` can easily drop below `1e-4`.

I think the proper default value of `tmin` should instead be chosen as `self.tmin = kwargs.get('tmin', min(max(1.0e-4, self.Convergence_drms * 0.1), self.Convergence_drms))`.

With this logic, `tmin` may only become smaller than `1.0e-4` when `Convergence_drms < 1.0e-4`, but it is still limited from above by `Convergence_drms * 0.1` for large values of `Convergence_drms`. For my case, it gives `tmin = 1.0e-4`.

I can make a PR if you think this is the right way.