Added a fix to have expected skipafterfailedadjoint behavior #45
I'm a little confused about the root cause of the problem. Could you help me understand? Please see the in-code comments.
I think this looks good for now. @joanibal, I think what was happening is this: If the current adjoint solution cannot reach the tolerance in the specified number of iterations, the solution is reset to 0 regardless of what is set for ADflow currently does not give you any warning or raise fail flags when adjoints fail to converge. This is the worst possible outcome in my opinion, since it just keeps going with completely wrong gradients, which can be difficult to debug if you are not expecting this. Like I said, this PR as it stands is okay for me. However, I want to add a few warning outputs and update the adjoint printouts a bit so that it becomes easier to diagnose what is happening. I can do these changes in a separate PR, or push to @yqliaohk's branch and update this PR. What do you all prefer? New PR, or update this PR?
Perhaps I misunderstood, but based on the comments in the code I understood the option See the comment before the option is used I agree that it would be useful to have an option to allow the use of partially converged adjoints, but I think we should just add another option to do that. Perhaps something like then the option could be checked when checking for a fail. like this ... |
I think this option name is good as is. It tells you only what it will do with the remaining adjoints. I think the behavior after this PR will be the expected behavior. Then we can discuss adding a new option that picks whether or not to use the partially converged adjoint. I cannot think of many scenarios where a partially converged adjoint would not be useful, and I think most people would want to keep going with the partially converged answer. We can add an option that switches between using the partially converged adjoints and raising an analysis fail flag. I am not sure if we have the capability to tell SNOPT that the gradient evaluation failed. If we can, then that may be useful for some cases. But again, I think this PR fixes the issue with this option and would give the expected behavior after it is merged.
Sorry, but I guess I'm still confused about how the current behavior for |
Before this PR, if |
Right, I get that. But, based on my understanding, the option was never intended to address that issue. It looks like it should only be used to skip other adjoints, not to specify what happens to partially converged solutions.
Right, it does not specify what should be done with the partially converged solutions. By default, we want to use partially converged solutions, hence the PR. If you can think of a case where a partially converged solution is not useful, then we can create another PR with that option. This partially converged solution issue comes up very frequently with meshes where we have zippers. After some convergence, the solution bottoms out at the same place, so it is common to rely on the partially converged solution. The stall happens after 6-7 orders of magnitude of relative convergence (depending on where you start with the initial residual), so it's not too bad.
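To make the behavior under discussion concrete, here is a minimal sketch of one reading of this thread. It is not ADflow's implementation: `solve_adjoints`, `solve_one`, and `skip_after_failed` are hypothetical names, and the handling of the failed adjoint itself when skipping is enabled (zeroing it) is an assumption.

```python
def solve_adjoints(funcs, solve_one, skip_after_failed=True):
    """Toy model of the skip logic; every name here is hypothetical.

    solve_one(func) must return a (solution, converged) pair.
    """
    solutions = {}
    failed = False
    for func in funcs:
        if failed and skip_after_failed:
            # An earlier adjoint failed: do not attempt this one.
            solutions[func] = 0.0
            continue
        solution, converged = solve_one(func)
        if not converged:
            failed = True
            if skip_after_failed:
                # Assumption: the failed solution is discarded when skipping.
                solution = 0.0
        # With skip_after_failed=False the partially converged
        # solution is kept instead of being reset.
        solutions[func] = solution
    return solutions, failed
```

In this sketch the `failed` flag is always returned, so the caller can still warn or raise a fail flag even when the partially converged solutions are kept.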
"by default, we want to use partially converged solutions. hence the PR. If you can think of a case where a partially converged solution is not useful, then we can create another PR with that option." Ok then why not just change the behavior after an adjoint fails from Why involve the |
I'm going to request more reviewers for additional perspectives.
Ok, I see how it can be confusing. From the option name, we expect to skip after any failed adjoint if we set it to |
We may want to zero-out the adjoint solution if the adjoint solution failed in some cases. For example, if you have So currently, the main thing I think we need is a fail flag when you have a partially converged adjoint and you have I think I agree with you on that I suggest we merge this PR as is, and then I can create another PR where I add the changes I propose. |
"I think I agree with you on that skipafterfailedadjoint option is kinda useless" The idea of merging incomplete features to the master makes me nervous. I'd prefer you submitted your changes as a PR on @yqliaohk's fork, which when accepted will automatically appear here. @nwu63 or @eirikurj how do you think we should handle this? |
@anilyil and I found the function To summarize, this is what I think the behaviour should be:
We need to check and figure out what is passed to In the future, I would like to see a secondary tolerance, both for primal and adjoint, similar to the semi-converged option for aerostructural problems. That way, we aim for the full convergence, but we will set As for the name, I agree the current name is not perfect, and As for the default option, I would want to keep it as True. IMO, as a user without extensive knowledge of ADflow, I think I would expect failure to be presented as failure, regardless of how close it was to converging. It is up to the user to set an appropriate tolerance for the particular problem. Issues with zipper meshes and so on are for advanced users who should then use options such as this to fix those issues. To summarize the summary:
Thanks @nwu63. I interpreted the I no longer oppose pulling in this PR after checking |
After further discussions with @anilyil, this is what we decided:
I have pushed a few changes (and possibly messed up while merging the changes from master but we can fix that). I added the warnings when the skipafterfailedadjoint has an effect on the adjoint solution, regardless of what the option is set to. I also realized once the Finally, I added the option to the options list. I just added it to the bottom, but we can reorder it. I may have messed up the whitespace due to my editor settings. @nwu63 to address the items you listed:
I seem to be a little late to the discussion here. For what it's worth, though, I do agree that a partial solution may be useful in some cases, and the previous discussion here is good.
However, I would have liked @anilyil's changes to be reviewed and approved by reviewers before being merged in.
# a fail flag remaining from a previous call. Either way, we
# want to at least try solving the adjoints here, regardless
# of what happened in the previous iterations.
self.curAP.adjointFailed = False
For resetting between design iterations this location is probably the best.
However, this explicitly creates and sets the attribute, which renders the following block in solveAdjoint useless, since the attribute always exists.
# Initialize the fail flag in this AP if it doesn't exist
if not hasattr(self.curAP, 'adjointFailed'):
    self.curAP.adjointFailed = False
Yeah, I know that, but my idea was to keep it for the cases where solveAdjoint is called from some other routine.
Maybe we should remove that altogether and have it throw an AttributeError, since you should not reach that part of the code without the attribute set properly anyway.
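The two alternatives in this review thread can be compared with a small stand-alone sketch. The classes and method names below are hypothetical stand-ins, not ADflow code; only the `curAP.adjointFailed` attribute and the `hasattr` guard come from the diff above.

```python
class AeroProblemStub:
    """Hypothetical stand-in for the aero problem object (curAP)."""


class SolverStub:
    """Hypothetical stand-in showing where the fail flag is set and read."""

    def __init__(self, ap):
        self.curAP = ap

    def start_new_evaluation(self):
        # Explicit reset, as in the diff: always (re)creates the attribute.
        self.curAP.adjointFailed = False

    def solve_adjoint_guarded(self):
        # Defensive variant: create the flag if a caller skipped the reset.
        if not hasattr(self.curAP, "adjointFailed"):
            self.curAP.adjointFailed = False
        return self.curAP.adjointFailed

    def solve_adjoint_strict(self):
        # Strict variant: raises AttributeError if the flag was never set.
        return self.curAP.adjointFailed
```

The guarded variant tolerates callers that reach the adjoint solve by another route, while the strict variant surfaces such callers immediately as an AttributeError.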
Yeah, I apologize. I tried to update the PR with master, but GitHub complained it wasn't possible, and I accidentally clicked the next button before I could read the message. Next thing I knew, the PR had been merged. Sorry about that. Please continue discussions here, and if there are any changes to be made, let's make a new PR.
Purpose
Added a minor fix to get the expected behavior when skipafterfailedadjoint is set to False. Previously, with the option set to False, the adjoints after a failed one were solved but still reset later. With this change, they are no longer reset if we don't want to skip.
Type of change
What type of change is it?
Testing
Explain the steps needed to test the new code to verify that it does indeed address the issue and produce the expected behavior.
Checklist
Put an x in the boxes that apply.