Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix first participant method in compositional cplscheme #1307

Merged
merged 22 commits into from Jul 1, 2022

Conversation

uekerman
Copy link
Member

@uekerman uekerman commented May 27, 2022

Main changes of this PR

I tried to reproduce the problem a user ran into when

  • coupling three solvers A, B, C
  • with two serial-explicit coupling schemes A<-->B and B<-->C
  • and prescribing the time window size for B and C by A.

Reported here: https://precice.discourse.group/t/variable-time-steps-without-sub-cycling-in-serial-explicit-coupling/1049/5

I indeed ran into a problem when CompositionalCouplingScheme tried to merge different time stepping methods and fixed it.

Afterwards, I ran into another problem, however. "first-participant" has not great integration test coverage yet :/
I reproduced the problem with the integration test ReadWriteScalarDataFirstParticipant.

The following is problematic for the first participant of a serial-explicit coupling scheme with dt by "first-participant". Then, the coupling scheme does not define a time window size.

double timeStepStart = _couplingScheme->getTimeWindowSize() - _couplingScheme->getThisTimeWindowRemainder();

We need a similar treatment as here:

if (_couplingScheme->hasTimeWindowSize()) {
timeWindowSize = _couplingScheme->getTimeWindowSize();
} else {
timeWindowSize = computedTimestepLength;
}

and then probably store the time window site at SolverInterfaceImpl.

@BenjaminRodenberg Could I hand over to you here?

Author's checklist

  • I checked that this actually solves the user's problem.
  • I added a changelog file with make changelog if there are user-observable changes since the last release.
  • I ran make format to ensure everything is formatted correctly.
  • I sticked to C++14 features.
  • I sticked to CMake version 3.16.3.
  • I squashed / am about to squash all commits that should be seen as one.

Reviewers' checklist

  • Does the changelog entry make sense? Is it formatted correctly?
  • Do you understand the code changes?

@uekerman uekerman added the bug preCICE does not behave the way we want and we should look into it (and fix it if possible) label May 27, 2022
@uekerman uekerman added this to the Version 2.x.x milestone May 27, 2022
@precice-bot
Copy link

This pull request has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/variable-time-steps-without-sub-cycling-in-serial-explicit-coupling/1049/6

Comment on lines 283 to 287
bool BaseCouplingScheme::solverSetsTimeWindowSize() const
{
PRECICE_ASSERT(hasTimeWindowSize());
return false;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this function a copy of hastTimeWindowSize?
I agree that the name solverSetsTimeWindowSize() is much clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this function a copy of hastTimeWindowSize?

From the perspective of functionality: Yes. But I think it clearly improves readability to have two functions here. It's from the outside perspective not obvious that hasTimeWindowSize() == !solverSetsTimeWindowSize().

If we remove hasTimeWindowSize, we would end up in some situations with something like this:

double BaseCouplingScheme::getThisTimeWindowRemainder() const
{
  PRECICE_TRACE();
  double remainder = 0.0;
  if (!solverSetsTimeWindowSize()) {
    remainder = getNextTimestepMaxLength();
  }
  PRECICE_DEBUG("return {}", remainder);
  return remainder;
}

instead of

double BaseCouplingScheme::getThisTimeWindowRemainder() const
{
  PRECICE_TRACE();
  double remainder = 0.0;
  if (hasTimeWindowSize()) {
    remainder = getNextTimestepMaxLength();
  }
  PRECICE_DEBUG("return {}", remainder);
  return remainder;
}

So I would like to keep both functions for the sake of readability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmh, I actually find the first version easier to read. It gives additional information on when what happens.
In then end, hasTimeWindowSize is hard to understand, what does "has a time window size" mean?

Copy link
Member

@BenjaminRodenberg BenjaminRodenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to remove the method solverSetsTimeWindowSize. I'm still not happy with the way how hasTimeWindowSize encodes two different pieces information (1) is there a time window size available? 2) Does this participant set the time window size?), but I don't think that the solution with solverSetsTimeWindowSize really helps with respect to compositional coupling schemes. I would keep it as it is. If you think there is still a need for refactoring this part then we can put it into an issue, but the original bug should be fixed now and with respect to refactoring I'm running out of ideas and time here.

@precice-bot
Copy link

This pull request has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/variable-time-steps-without-sub-cycling-in-serial-explicit-coupling/1049/10

Copy link
Member Author

@uekerman uekerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@uekerman
Copy link
Member Author

Waiting to merge till we resolved the issue on Discourse, but I am confident that we don't need to change anything here.

@uekerman
Copy link
Member Author

We have another bug in preCICE, I just added an integration test to reproduce.

void SerialCouplingScheme::initializeImplementation()
{
// determine whether initial data needs to be communicated
determineInitialSend(getSendData());
determineInitialReceive(getReceiveData());
// If the second participant initializes data, the first receive for the
// second participant is done in initializeData() instead of initialize().
if (not doesFirstStep() && not sendsInitializedData() && isCouplingOngoing()) {
PRECICE_DEBUG("Receiving data");
receiveAndSetTimeWindowSize();
receiveData(getM2N(), getReceiveData());
checkDataHasBeenReceived();
}
}

If dt-method = "first participant": The second participant always needs to receive the time window size in initialize already. This cannot wait till initializeData as initialize needs to return this value to the user already.

@BenjaminRodenberg handing back to you 😁


BOOST_AUTO_TEST_SUITE(Integration)
BOOST_AUTO_TEST_SUITE(Serial)
BOOST_AUTO_TEST_SUITE(Time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this test (maybe also the other new tests) fits very well into Integration/Serial/Time. Maybe Integration/Serial/InitializeData ? Or should we create a dedicated test suite for the first-participant configuration?

Copy link
Member Author

@uekerman uekerman Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a way, that's an inherent problem / feature of integration tests; they always test multiple components simultaneously. Here it is "first-participant" + "data initialization". For me, "first-participant" is the dominating feature here. That's why I added it to time. It somehow fits nicely as it only applies to serial coupling.

@BenjaminRodenberg
Copy link
Member

We have another bug in preCICE, I just added an integration test to reproduce.

void SerialCouplingScheme::initializeImplementation()
{
// determine whether initial data needs to be communicated
determineInitialSend(getSendData());
determineInitialReceive(getReceiveData());
// If the second participant initializes data, the first receive for the
// second participant is done in initializeData() instead of initialize().
if (not doesFirstStep() && not sendsInitializedData() && isCouplingOngoing()) {
PRECICE_DEBUG("Receiving data");
receiveAndSetTimeWindowSize();
receiveData(getM2N(), getReceiveData());
checkDataHasBeenReceived();
}
}

If dt-method = "first participant": The second participant always needs to receive the time window size in initialize already. This cannot wait till initializeData as initialize needs to return this value to the user already.

@BenjaminRodenberg handing back to you grin

I tried to fix this, but we might have a deadlock here:

The first participant sends the timestep size at the end of advance (it does not know about the timestep size provided by the user before the user defines it via advance). But to reach advance, the first participant must call initializeData and receive initial data from the second participant here. Here comes the deadlock: The second participant does not reach initializeData to send initial data, because it is waiting for the timestep size in initialize.

#1196 might help us to resolve this deadlock:

  1. if we merge initialize and initializeData, we have more possibilities to move around communication.
  2. if initializeData is mandatory initializeData could be used to return dt, not initialize.

But with our current API design and order of API calls I do not really see a solution. To me it looks like initializeData + first participant has to be forbidden.

@uekerman
Copy link
Member Author

You're right 🙈

* Fixes implementation
* But results in a deadlock
* Modify test correspondingly
@BenjaminRodenberg
Copy link
Member

I just pushed 62e3417. This is from my current point of view as close as we can get to a solution. The deadlock-problem still exists, therefore I had to modify the test.

Depending on how we resolve the deadlock-situation, can can then keep 4431577 and 62e3417 here or cherry-pick it into a feature branch.

@uekerman uekerman force-pushed the fix-three-solver-first-participant branch from 1a42930 to a9ce3ec Compare June 30, 2022 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug preCICE does not behave the way we want and we should look into it (and fix it if possible)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants