Shape Optimization #733
This is a very important topic, @jayantmukho. The optimization functionality is very difficult for folks to get working on their own. There are too many options, and it is not user friendly. We need to improve, and I am open to streamlining the interface if you have some ideas. In general, the bevy of options that you nicely describe above have been added over the years in order to coax the scipy SLSQP optimizer into converging, especially with constraints. It has mostly been a trial-and-error process during that time. Most of the tricks involve getting the scaling set such that the SLSQP optimizer does not take any massively errant steps during its line search that cause divergence. I think it would be great to see a standard normalization of the problem (say, make everything on the order of 1 going in/out of the optimizer) and interfaces to new optimizers (this exists already in part in feature_pyopt). In practice, I use ONLY the Scale value in OPT_OBJECTIVE and OPT_CONSTRAINT (set to a value that results in a first optimizer step roughly 10% of the characteristic length of my geometry), and ignore the options OPT_GRADIENT_FACTOR, OPT_RELAX_FACTOR, and OPT_LINE_SEARCH_BOUND. I sometimes use the OPT_BOUND_UPPER and OPT_BOUND_LOWER options. But even this approach still requires some manual tuning.
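To make the "first step roughly 10% of the characteristic length" rule concrete, here is a small sketch. The helper name and the step-size assumption are mine, not SU2's: assuming the optimizer's first trial step is on the order of the scaled gradient magnitude, the objective Scale can be backed out from the raw gradient norm and the target step.

```python
import numpy as np

def suggest_objective_scale(raw_gradient, char_length, step_fraction=0.1):
    """Hypothetical helper (not part of SU2): pick an OPT_OBJECTIVE Scale
    so that a gradient-sized first step moves the design by roughly
    step_fraction * char_length."""
    grad_norm = np.linalg.norm(raw_gradient)
    target_step = step_fraction * char_length
    # If the first trial step is ~ |Scale * gradient|, then
    # Scale = target_step / |gradient| hits the target step size.
    return target_step / grad_norm

# Toy usage: drag gradient w.r.t. 10 FFD points, chord length 1.0
g = np.full(10, 2.5e-4)
scale = suggest_objective_scale(g, char_length=1.0)
```

In practice the first step also depends on the optimizer's internals, so this only gives a starting guess to refine by hand.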
So I've got a more general question: why do we want the gradient norm to be ~1E-6? Isn't this gradient norm problem dependent? It most definitely scales with the square root of the number of design variables, and I would suspect that the relative scales of the design variables are also a factor. Is 1E-6 a good rule? Or is it just a decent starting point, and values very different from 1E-6 are used in practice? Also, if this is "a good rule," then shouldn't we just automatically rescale the problem after the first design iteration?
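The square-root point can be seen directly: if every gradient component has comparable magnitude g0, the 2-norm is g0 * sqrt(N), so any fixed target norm implicitly depends on the number of design variables. A minimal illustration:

```python
import numpy as np

# If each gradient component has magnitude g0, then ||g||_2 = g0 * sqrt(N):
# the same per-component sensitivity gives a 4x larger norm when the
# number of design variables grows 16x.
g0 = 1e-6
norms = {n: np.linalg.norm(np.full(n, g0)) for n in (16, 64, 256)}
```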
These are all fantastic questions. I have exclusively used SLSQP for any optimizations that I have run. So, I can only speak to that optimization algorithm.
Not sure why this is. I agree with what @economon said. These tuning parameters are there to coax out an ideal step size for the optimizer. As you can imagine, the ideal step size varies wildly depending on the optimization objectives/constraints, the scale of the geometry, and the types/numbers of design variables that are being used. You have to play this game of scaling the objectives and constraints in different ways to coax out the ideal first step. In my limited testing, the value of 1E-6 for the gradient norm seems to work well for 3D RANS aerodynamic shape optimizations of an aircraft wing when using FFD control points to change the shape of the wing. This is a specific use case that was the subject of a lot of the underlying research that resulted in the shape optimization framework, which is likely why I have had good results using this rule of thumb.
Scaling in an optimization problem can be pretty frustrating and time-consuming. Anecdotally, I have been using @economon's suggestion of leaving everything else as default (value of 1) and only playing with the objective and constraint scalings to get a good first step size. As mentioned before, this step size takes different values for different problems, which is why it is difficult to come up with universal scalings that would work for most problems. But I am hoping to address some of these scaling issues in #923. I haven't really found much good literature on this problem, but I might be looking in the wrong places. Recommendations are welcome. A big boon is having a robust solver: if it can handle flow simulations with odd geometries, you need to do less parameter tweaking. The reason you need good scaling is so that the optimizer doesn't explore difficult parts of the design space; if the simulation diverges, the optimization fails. Some intelligent handling of simulation divergences would also help the optimization framework.
I've run into some headaches getting the optimization to run efficiently on my end, which is why I ask. Playing with a toy problem, SLSQP actually does a great job on its own (with all tuning parameters set to 1.0) if the following conditions are met:
If those conditions are met, then playing with any of the tuning parameters makes SLSQP converge more slowly, sometimes taking 10x the iterations. So it's not clear to me when the tuning parameters are necessary, or how those tuning parameters affect convergence in those cases. I'm not arguing that the tuning parameters aren't necessary, just that their effects aren't clear. And I agree, the proper way to nondimensionalize and regularize these problems is not clear from a brief search of the literature.
Lately I've been playing with Ipopt, which has auto-scaling; the documentation (https://coin-or.github.io/Ipopt/OPTIONS.html#OPT_NLP_Scaling) and implementation papers go into decent detail about the strategies they use.
@pcarruscag When you say that you rescaled all the variables, what do you mean? Using the built-in tuning parameters? Or in the python scripts? And for the deformations (which must have physical values when applied by SU2_DEF), how did you rescale those? |
Ah, I forgot to mention that I do not use our shape optimization framework (I should have started with that). The optimizer "sees" y = s_f * f(x'), while SU2_** takes as input x (say, FFD points) and computes f (say, drag), with x' = s_x * x; then dy/dx' = s_f / s_x * df/dx. There is perhaps some equivalence with the tuning parameters we have.
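One consistent reading of that external scaling (names here are illustrative, not SU2 API): the optimizer works with x, the solver evaluates the physical point x' = s_x * x, the optimizer is fed y = s_f * f(x'), and by the chain rule dy/dx = s_f * s_x * df/dx'. A sketch:

```python
import numpy as np

def scaled_problem(f, grad_f, s_x, s_f):
    """Wrap a physical objective f(x') and its gradient df/dx' so the
    optimizer only ever sees scaled quantities (illustrative helper)."""
    def y(x):
        return s_f * f(s_x * np.asarray(x, dtype=float))
    def dy_dx(x):
        # Chain rule: d/dx [s_f * f(s_x * x)] = s_f * s_x * f'(s_x * x)
        return s_f * s_x * grad_f(s_x * np.asarray(x, dtype=float))
    return y, dy_dx

# Toy check: f(x') = sum(x'^2), so grad f = 2 x'
f = lambda xp: float(np.sum(xp ** 2))
g = lambda xp: 2.0 * xp
y, dy = scaled_problem(f, g, s_x=100.0, s_f=0.01)
```

With s_x = 100 and s_f = 0.01, an optimizer variable of 0.01 corresponds to a physical input of 1.0, keeping both sides of the interface near order 1.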
WIP and I think there will be some updates on tomorrow's developers meeting. |
Hey everyone,
I have been taking a deep dive into the shape optimization features in SU2 and have resurfaced with some observations, suggestions, and questions.
First, the bug:
Even though the config template and option structure have changed the numbering of the DEFINITION_DV types (for example, FFD_CONTROL_POINT went from being 7 to 11), this has not been reflected in SU2_PY/SU2/io/tools.py or in any of the tutorials. Therefore, there is a mismatch between what the C++ code uses and what the Python code uses. This is an easy problem to fix and is being fixed in the branch fix_set_ffd_script.
Next, I want to have a discussion about the shape optimization options and scalings. Currently there are the following options in the config template to control scaling:
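For reference, the scaling-related entries look roughly like this in a config file (all values here are purely illustrative, not recommendations, and the DEFINITION_DV parameter list is schematic):

```
OPT_OBJECTIVE= DRAG * 10.0
OPT_CONSTRAINT= ( LIFT > 0.30 ) * 1.0
OPT_GRADIENT_FACTOR= 1E-6
OPT_RELAX_FACTOR= 1E3
OPT_LINE_SEARCH_BOUND= 1E6
OPT_BOUND_UPPER= 1E10
OPT_BOUND_LOWER= -1E10
DEFINITION_DV= ( 11, 1.0 | WING | BOX, 0, 0, 0, 0.0, 0.0, 1.0 )
```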
Now this provides a lot of flexibility in scaling the inputs to the optimizer, which is great because these optimizers aren't very robust to poor scaling. But the way they behave is sometimes a little odd. I am going to go through each option and talk about how it affects things (to the best of my knowledge). I am hoping someone can shed some light on the choices that were made and correct me if I am wrong in any of my assessments.
OPT_OBJECTIVE Scaling: It multiplies the objective function by the given factor. In the case above, the DRAG is being scaled by 10. This scaling is also applied to the gradient of the objective function.
OPT_CONSTRAINT Scaling: Same as the OPT_OBJECTIVE scaling, but for the constraints.
OPT_GRADIENT_FACTOR: This is a misleading name, and I propose changing it. Even though the name has gradient in it, this scaling is applied both to the objectives/constraints and to their gradients. I would like to change the name to OPT_GLOBAL_FACTOR. The reason this is global, and different from the objective/constraint scaling, is that it is applied uniformly to all objectives and constraints.
OPT_RELAX_FACTOR: This is a scaling factor that purely multiplies the DV_VALUES from a config file before applying the deformation to the mesh. For example, if you are performing a 2D optimization using an FFD and the optimizer spits out a suggested DV_VALUE of 0.001, the mesh deformation routine will move the FFD control point by 1 (according to the scaling given above).
OPT_LINE_SEARCH_BOUND: This is an interesting one and the one I am least sure of. This option limits the maximum final movement of the FFD control points in the Cartesian coordinate system. So, DV_VALUE * OPT_RELAX_FACTOR results in a movement of the FFD_CONTROL_POINT. If the maximum movement of any of the control points is greater than OPT_LINE_SEARCH_BOUND, then all the control point movements are scaled such that the maximum movement equals OPT_LINE_SEARCH_BOUND.
OPT_UPPER/LOWER_BOUND: This value is divided by the OPT_RELAX_FACTOR to give the optimizer the maximum/minimum values for the design variables.
DEFINITION_DV Scaling: This one I am really confused about. The only place that I can find this being used is to scale the gradients of the objectives/constraints. It doesn't seem to actually scale the DVs anywhere in the Python code, except in the initialization, where the DVs are zeroed out anyway. Am I missing something here? Is it correct to think that scaling the gradients is one way to ensure that the optimizer outputs scaled DVs?
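The OPT_RELAX_FACTOR, OPT_LINE_SEARCH_BOUND, and bound descriptions above can be sketched as follows. This is a paraphrase of my reading of the Python scripts, not SU2's actual implementation, and the function names are mine:

```python
import numpy as np

def physical_deformation(dv_values, relax_factor, line_search_bound):
    # OPT_RELAX_FACTOR multiplies the optimizer's DV_VALUES before the
    # mesh deformation is applied.
    move = relax_factor * np.asarray(dv_values, dtype=float)
    # OPT_LINE_SEARCH_BOUND caps the largest control-point movement,
    # rescaling all movements uniformly so their direction is preserved.
    max_move = np.max(np.abs(move))
    if max_move > line_search_bound:
        move *= line_search_bound / max_move
    return move

def optimizer_bound(physical_bound, relax_factor):
    # OPT_BOUND_UPPER/LOWER are divided by OPT_RELAX_FACTOR so the bounds
    # handed to the optimizer live in its (scaled) variable space.
    return physical_bound / relax_factor

# Toy usage: DV_VALUES of 0.001 and 0.002 with a relax factor of 1000
# become movements of 1.0 and 2.0, then get capped at 1.5.
moves = physical_deformation([0.001, 0.002], relax_factor=1000.0,
                             line_search_bound=1.5)
```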
In general, I think it'd be useful to have more information in the config template so that the scalings don't seem like a dark art. The template does have some suggestions, like that the gradient norm should be on the order of 10^-6. I have found this suggestion to be super useful, and it does work well. But I am not sure why the gradients need to be so small for the optimizer to work well. Any insight into this value?
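If the 10^-6 rule does generalize, the automatic rescaling idea raised earlier in the thread would be simple to sketch (hypothetical helper, not an existing SU2 function): after the first gradient evaluation, pick a global factor that maps the observed norm onto the target.

```python
import numpy as np

def rescale_to_target_norm(gradient, target_norm=1e-6):
    """Hypothetical: choose a global scale factor so the scaled gradient
    norm matches the rule-of-thumb target after the first design iteration."""
    factor = target_norm / np.linalg.norm(gradient)
    return factor, factor * np.asarray(gradient, dtype=float)

factor, scaled = rescale_to_target_norm(np.array([3e-3, 4e-3]))
```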
In the end, I am hoping to demystify some of the scalings and how best to go about them. Ideally, at the end of this discussion I can document some best practices (at least for the case of 3D shape optimization with FFD_CONTROL_POINTS). Any comments, suggestions, corrections, and/or insight would be super helpful.