
BC returnFail bug #148

Closed

anilyil opened this issue Jun 27, 2021 · 2 comments
Labels: bug (Something isn't working)

Comments

@anilyil (Contributor) commented Jun 27, 2021

Description

Some of the BC routines, such as the subsonic inflow and outflow, have checks to make sure that the flow direction is consistent with the prescribed BC itself. If this is not satisfied, these routines raise errors. There are two main issues here:

  1. Most of these routines, such as the subsonic outflow, use a terminate call for this (see: https://github.com/mdolab/adflow/blob/master/src/bcdata/BCData.F90#L969). This stops the execution completely. However, we want these routines to use returnFail so that we catch the failure and register it as a failure in the flow solver. When this happens during an optimization, the fail flag is raised and the optimizer can move on. With terminate, the job just exits, which is not the desired behavior.
  2. This was patched only for the subsonic inflow (see: https://github.com/mdolab/adflow/blob/master/src/bcdata/BCData.F90#L919-L922), where we use the preferred returnFail call. However, when running in parallel, the processors that do not have any partition of this BC face do not execute this code; as a result, some processors raise the return flag while others run past these BC routines and hang elsewhere at a global communication operation. These BC routines have several layers of calls, and the ideal solution is to aggregate all of the failure flags while applying the BC routines and communicate them across all procs to make sure everyone is on the same page (see the sketch after this list). Currently, if this BC routine fails and some procs don't have any partition of this face, ADflow will just hang.
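To make the aggregation idea in item 2 concrete, here is a minimal sketch using mpi4py. `apply_bc_routines` is a hypothetical stand-in for ADflow's Fortran-level BC application, not an actual ADflow function:

```python
# Sketch: every proc computes a local fail flag (False if it owns no
# partition of the BC face), then all procs join a collective logical-OR
# reduction, so no proc can run ahead and hang at a later collective.
from mpi4py import MPI

comm = MPI.COMM_WORLD

def apply_bc_routines():
    """Hypothetical stand-in for applying the BC routines on this proc.
    Returns True if a local BC consistency check failed; procs with no
    partition of the face simply return False."""
    return comm.rank == 0  # pretend only rank 0 sees a bad flow direction

local_fail = apply_bc_routines()

# Logical-OR allreduce: the failure becomes global if any proc saw it,
# and every proc reaches this call, so nobody hangs.
global_fail = comm.allreduce(local_fail, op=MPI.LOR)

if global_fail:
    # All procs agree on the failure; raise the fail flag gracefully
    # instead of terminating the whole job.
    print(f"rank {comm.rank}: BC failure detected, raising fail flag")
```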

Current behavior

When some procs do not own a partition of the BC face whose check fails, the code hangs for the subsonic inflow BC. For other BCs that fail, the execution is terminated, which stops the entire run.

Expected behavior

The BC failures should be caught properly and communicated across processors so that we can fail gracefully and raise a fail flag to the optimizer.

I have a quick patch for this myself: I just commented out these returnFail calls, because when the BCs fail like this in my cases, the mesh warping also fails, so I don't need to rely on ADflow alone for the fail flag. Furthermore, because I fixed my optimizations, I don't get any of these failures in my runs anymore, so this is only an issue when the BC routines actually fail. Ultimately, this should be fixed properly (most likely by me after my defense).

@anilyil anilyil self-assigned this Jun 27, 2021
@anilyil anilyil added the bug Something isn't working label Jun 27, 2021
@lamkina lamkina self-assigned this Aug 6, 2022
@lamkina (Contributor) commented Aug 6, 2022

I think my draft PR #224 should fix this issue. The solution is to remove the MPI barrier in the totalSubsonicInlet routine and add an allreduce call in the Python layer when we set the data from the aero problem. We can't rely on the allreduce that already exists for error catching in the __call__ method, because some procs that don't have BC data would attempt to update the geometry with a failed mesh before reaching it. Hence, we add an additional allreduce in the _setAeroProblemData function after the BCs are updated in the Fortran layer.
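For illustration, a hedged sketch of where that extra reduction sits; the class skeleton, the `_updateBCData` helper, and the `fatalFail` attribute are assumptions for the sketch, not the actual PR #224 code:

```python
# Sketch of the extra allreduce in the Python layer, placed right after
# the BC data is pushed to the Fortran layer. Names here are illustrative.
from mpi4py import MPI

class SolverSketch:
    def __init__(self, comm):
        self.comm = comm          # an mpi4py communicator
        self.fatalFail = False    # stand-in for the solver's fail flag

    def _updateBCData(self, aeroProblem):
        """Hypothetical: push BC data to the Fortran layer and return a
        local fail flag (False on procs with no partition of the face)."""
        return False

    def _setAeroProblemData(self, aeroProblem):
        localFail = self._updateBCData(aeroProblem)

        # Aggregate fail flags across all procs immediately, before any
        # proc can go on to update the geometry with a failed mesh in
        # __call__.
        self.fatalFail = self.comm.allreduce(localFail, op=MPI.LOR)
```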

@lamkina (Contributor) commented Nov 21, 2022

This is resolved with PR #224.

@lamkina lamkina closed this as completed Nov 21, 2022