Clarification of handling FMUs with constant step size and final step when rounding errors occur #575

Closed
ghorwin opened this issue May 24, 2019 · 21 comments
Labels: task Ready for implementation and pull request

@ghorwin

ghorwin commented May 24, 2019

I have a question about the correct handling of CoSim-FMUs that cannot use variable communication step sizes.

Example: StepSize = 0.01 s, StartTime = 0, StopTime = 1

Due to rounding errors, the last communication interval would run from 0.990000001 to 1.000000001. This, however, gives an exception in an FMU about "time out of range" (obviously when evaluating a parameter time series at t=1.000000001 where the data ends exactly at 1).

What would be the correct/expected behavior (in the sense of cross-checking rules) for a co-simulation master?

  1. violate the "constant step size" condition and adjust the final interval to be 0.99...1 (=StopTime)
  2. violate the EndTime condition (and risk an exception/error from within the FMU)

Thanks for clarification on the matter,
Andreas

PS: is there anywhere a clause that prohibits a master to call doStep() outside the StartTime/StopTime interval?

@ghorwin
Author

ghorwin commented May 24, 2019

(reply from: Jean-Philippe Tavella)

Dear Andreas,

In my experience, it is possible to violate the EndTime condition, especially with FMUs exported from Dymola, even when the Master calls fmi2SetupExperiment() with the parameter stopTimeDefined set to fmi2True.

I don't think there is a clause anywhere in the standard that prohibits a master from calling doStep() outside the StartTime/StopTime interval.

Jean-Philippe Tavella

@ghorwin
Author

ghorwin commented May 24, 2019

(reply from: Martin Sjölund)
Do you calculate next time as:

time = startTime + i * stepSize

Or:

time = time + stepSize

I usually calculate it as the former to avoid accumulating numerical errors. The following Python code shows why:

>>> t = 0.0
>>> for _ in range(100): t = t + 0.01   # accumulate the step size 100 times
...
>>> t
1.0000000000000007
>>> 0 + 100.0 * 0.01   # startTime + i * stepSize
1.0

@ghorwin
Author

ghorwin commented May 24, 2019

@martin: in my code I use the variant that adds stepSize to time (same code for constant and variably spaced intervals).

In any case, the solution of using time = startTime + i * stepSize appears to fail when (StopTime - StartTime) is not divisible by StepSize. Also, the rounding issue will remain when stepsize is a truncated fractional value (timeStep = 1/3) with StopTime=3.

So, I guess FMUs must indeed handle these situations nicely without complaining in the last step.

@jbernalr
Contributor

Shouldn't the master make sure, before starting the computation, that stepSize and StopTime are set for all FMUs so as not to violate any of the above-mentioned constraints?
If the user, for instance, sets a simulation time from 0 to 10.753 seconds with a 10 ms StepSize, the integrator should issue a warning asking whether the user wants to round off or change any of the relevant parameters. I think Simulink + FMI Kit work like this.
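
A minimal sketch of such a pre-flight check, assuming a master written in Python. The function name check_step_grid and the rel_tol default are illustrative assumptions and not part of any FMI tool:

```python
import math
import warnings


def check_step_grid(start_time, stop_time, step_size, rel_tol=1e-9):
    """Return (n_steps, reachable_stop) for a fixed-step run.

    Warns when (stop_time - start_time) is not an integer multiple of
    step_size within rel_tol. All names are hypothetical.
    """
    ratio = (stop_time - start_time) / step_size
    nearest = round(ratio)
    if math.isclose(ratio, nearest, rel_tol=rel_tol):
        n_steps = nearest
    else:
        # Fall back to the last grid point that does not exceed stop_time.
        n_steps = math.floor(ratio)
        warnings.warn(
            f"stop time {stop_time} is not a multiple of step size {step_size}; "
            f"last reachable communication point is {start_time + n_steps * step_size}"
        )
    return n_steps, start_time + n_steps * step_size


# The case from the comment above: 0 to 10.753 s with a 10 ms step.
n, reachable = check_step_grid(0.0, 10.753, 0.01)
```

Here the check warns and proposes 1075 steps, ending near 10.75 s. Whether to round down, round up, or ask the user is a policy decision this sketch does not settle.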

@chrbertsch chrbertsch added task Ready for implementation and pull request discussion and removed discussion task Ready for implementation and pull request labels May 24, 2019
@chrbertsch
Collaborator

Regular Design Webmeeting:

Karl: In the FMI 2 Standard it is defined that the FMU has to synchronize its time to the time defined by the master.
Klaus: I am not so sure. That is the reason why we give the current communication timestep.
Karl: the FMU has to synchronize to time + communication timesteps
Klaus: there are FMUs out there that do not do this. The ticket talks only about the final time. Is there not a problem also for other timesteps?
Karl: in FMI 2.0 it was clarified for the doStep. This issue is not a problem but a quality of implementation. FMUs should deal with small rounding errors. Regarding the endTime: it is not a real violation but dealing with rounding error. In FMI 1 there were many problems with "adding up" time.
Klaus: We should try to clarify this issue in the specification. Could be non-normative text (e.g., a good implementation would try to handle rounding errors).
Klaus: I could not answer the question above. It depends on the FMU.
Klaus: if the FMU needs a constant timestep. But what happens if due to rounding error after some time, the communication timestep changes a little?
Karl: It is enough to say as in FMI 2 that the FMU has to synchronize.
Klaus: We had some discussion some time ago.
Klaus: the current specification does not tell if the communication timestep might change a little bit.

Klaus: I will check the FMI 2 text if we need something to clarify

@ghorwin
Author

ghorwin commented Jun 3, 2019

Not sure if this is related, but the FMU

fmus_1.0_cs_linux64_AMESim_14_MIS_cs

from the test suite appears to have a problem related to step sizes. At least, the following output is generated by the FMU:

[slave1:debug] MIS_cs_fmiDoStep called at 0.015000 with step size 0.001000

with simulation time limits in the opt file:

StartTime,0.0
StopTime,0.016

With StepSize = 0.001 the error occurs, with 0.0001 the simulation runs fine to the end. Strange... (seems like an internal error in the FMU, maybe related to a check like t+dt > t_end --> error).

-Andreas

@clagms
Collaborator

clagms commented Feb 23, 2020

My suggestion is that the FMU implementation checks against the stopTime within some tolerance (maybe the tolerance given in setupExperiment, normalized to the time scale).

@ghorwin
Author

ghorwin commented Feb 25, 2020

Would be one option - or simply using stopTime for evaluating time-dependent quantities (time series data etc.) for any t > stopTime.
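
As a rough illustration of that second option, here is a sketch of a time-series lookup that holds the last sample for any t past the end of the data. The function evaluate_series and the linear interpolation are assumptions made for the example; an actual FMU would implement this in its own language and with its own data layout:

```python
from bisect import bisect_right


def evaluate_series(times, values, t):
    """Linearly interpolate a time series, holding the end values outside its range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        # t at or slightly past the end of the data: use the last sample
        return values[-1]
    k = bisect_right(times, t)  # times[k-1] <= t < times[k]
    t0, t1 = times[k - 1], times[k]
    w = (t - t0) / (t1 - t0)
    return values[k - 1] + w * (values[k] - values[k - 1])


times = [0.0, 0.5, 1.0]
values = [0.0, 10.0, 20.0]
at_overshoot = evaluate_series(times, values, 1.000000001)
```

With this clamping, a call at t=1.000000001 returns the value stored for t=1 instead of raising a "time out of range" error.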

We should, however, make this clear in the standard.

@jean-philippe: as you said, there is no clause that prevents the Master from exceeding the end time (a little) in a doStep() call. Couldn't we then just add this to the compliance checker and let it purposefully overstep the end time a little, just to check if the FMU handles this gracefully?

@clagms
Collaborator

clagms commented Mar 7, 2020

@ghorwin

PS: is there anywhere a clause that prohibits a master to call doStep() outside the StartTime/StopTime interval?

I think there is.
In the standard, it says:

If the environment tries to compute past stopTime the FMU has to return fmi3Status = fmi3Error.

@ghorwin
Author

ghorwin commented Mar 7, 2020

IMHO: that's a bad idea (see the discussion of rounding errors above).

Example: say the built-in stop time is some value x, and when writing the modelDescription file the number gets rounded (due to limited number precision) to be slightly larger than x. Now, the Master has no other information and can only run to this slightly larger value, and hence runs past the built-in stop time = booom! So, I'm very much in favor to drop such a clause from the upcoming standard and instead require FMUs to work well (even though some of the tool vendors need to do some work on their end to create well-behaving FMUs).

@pmai
Collaborator

pmai commented Mar 7, 2020

@ghorwin

PS: is there anywhere a clause that prohibits a master to call doStep() outside the StartTime/StopTime interval?

I think there is.
In the standard, it says:

If the environment tries to compute past stopTime the FMU has to return fmi3Status = fmi3Error.

Which seems like a mistake, since it suddenly requires error checking by the FMU, i.e. the callee, which we usually don't. And for an FMU that does not care about the stopTime (i.e. most FMUs) this means it must actively do something with the stopTime, which is again stupid.

I think this sentence must go, and be replaced by a sentence that states that it is an error if the environment exceeds stopTime.

I also think it should be the burden of the environment to ensure that the values of currentCommunicationPoint+stepSize do not exceed the stopTime: It is in full control of both, so it can ensure that the stopTime is reached exactly (by choosing e.g. an appropriate stopTime, or just not setting one in the first place). I don't see the need for the FMU to do any kind of fudging... And we already make people aware of the fact that there can be numerical deviations between the internal times of the FMU and what is passed in as currentCommunicationPoint...

@pmai
Collaborator

pmai commented Mar 7, 2020

Example: say the built-in stop time is some value x, and when writing the modelDescription file the number gets rounded (due to limited number precision) to be slightly larger than x.

Then the FMU generator is broken and should be fixed. I’m sorry, but in 2020 there is no reason to have faulty floating-point printer/reader implementations. Both IEEE754 and correct algorithms have been available for >30 years...

@clagms
Collaborator

clagms commented Mar 7, 2020

So, I'm very much in favor to drop such a clause from the upcoming standard and instead require FMUs to work well (even though some of the tool vendors need to do some work on their end to create well-behaving FMUs).

I agree with this as well. However, in the same paragraph, there's the following:

The arguments startTime and stopTime can be used to check whether the model is valid within the given boundaries or to allocate memory which is necessary for storing results.

This text would have to be changed as well, perhaps to non-normative text.

Furthermore, I would expect that if the FMU supports fixed step size, both master and FMU will accumulate the time in a way that prevents the StopTime from being exceeded. This means that the FMU has to admit the possibility of numerical errors, and use tolerant comparisons when checking if the current time has exceeded the stopTime. For instance

t exceeds StopTime iff t !≈ StopTime /\ t > StopTime 

where ! means the logical "not" and ≈ is the float equality expression within some tolerance (see, e.g., isclose).
The FMU has access to how many steps the co-simulation will take, and should therefore be able to configure this tolerance value appropriately.
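
A minimal sketch of that predicate using Python's math.isclose. The function name and the tolerance defaults are assumptions; how an FMU should actually choose the tolerance is left open in this thread:

```python
import math


def exceeds_stop_time(t, stop_time, rel_tol=1e-9, abs_tol=0.0):
    """True only if t is past stop_time and not approximately equal to it."""
    return t > stop_time and not math.isclose(
        t, stop_time, rel_tol=rel_tol, abs_tol=abs_tol
    )


# The accumulated end time from the Python session earlier in this thread.
rounding_overshoot = exceeds_stop_time(1.0000000000000007, 1.0)
real_overshoot = exceeds_stop_time(1.01, 1.0)
```

The first call treats the rounding overshoot as "not exceeded", while the second reports a genuine step past StopTime.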

@t-sommer I see that you wrote this part, are there actual usages of this feature?
Could we remove both the text regarding how to use stopTime and relax the demand that the FMU must throw an error when the StopTime is exceeded?

@pmai
Collaborator

pmai commented Mar 7, 2020

I don't think @t-sommer wrote this, this is straight from the 2.0 standard text, hence probably the fault of someone else ;).

I still don't see why the FMU should be doing anything special for stopTime: The master has complete control over its choice of stopTime and its time calculations, and hence can ensure it does not exceed stopTime. Why should the FMU care?

And the FMU already has to deal with discrepancies of the master time calculation, vs. its internal one, which is why the master communicates its idea of the current time in each doStep call (and there is text in the standard that addresses this).

What am I missing here?

@ghorwin
Author

ghorwin commented Mar 7, 2020

Then the FMU generator is broken and should be fixed. I’m sorry, but in 2020 there is no reason to have faulty floating-point printer/reader implementations. Both IEEE754 and correct algorithms have been available for >30 years...

Pierre, you are of course right, but I'm having engineering users and engineering tool developers in mind, as quite a relevant target group for the FMI standard. From my experience, on average we cannot expect the same level of expertise that some of the FMI members have, so my preference is a robust and as much as possible human-error-tolerant standard.

An example for stopTime problems: in building energy simulation, duration of simulation cases easily exceeds several years, expressed in seconds this gives large numbers. Most FMI generators (including modelica-based tools) write stop time in seconds using general/scientific notation, with 5/6 digits precision. So much for the IEEE754 specs applied in the field :-) Maybe within automotive use cases this is not an issue (because durations are rather short) - with building energy and HVAC simulation models, well, we got problems...
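
To make the precision loss concrete, here is a small sketch comparing a six-significant-digit general format with Python's round-trip repr. The stop time of five years plus one hour is an arbitrary value chosen for illustration, not one taken from a real FMU:

```python
# Hypothetical stop time: five years plus one hour, in seconds.
stop_time = 5 * 365 * 86400 + 3600.0

six_digits = "%g" % stop_time   # general format, 6 significant digits
round_trip = repr(stop_time)    # shortest string that parses back to the same double

error_six_digits = float(six_digits) - stop_time
round_trip_exact = float(round_trip) == stop_time

print(six_digits, error_six_digits)
print(round_trip, round_trip_exact)
```

In this case the six-digit string parses back to a stop time 400 s later than the original, while the repr string reproduces the value exactly.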

From my point of view, the most robust solution would be:

  • FMU implementors write their doStep() functions in a robust way, i.e. if they get tCommEnd > stopTime, they gracefully integrate to stopTime (if it is close to tCommEnd) or, as I've written before, simply use the latest possible time (i.e. stopTime) for any time-related function evaluations past stopTime. A warning issued through the log mechanism would be nice.

  • FMI CoSimulation Masters try to stick to stopTime, i.e. when rounding error accumulation exceeds the stopTime, they fix the last interval's tCommEnd to be exactly the stopTime. Again, a small warning message would be nice.

This is IMHO a robust solution which helps to avoid any critical error aborts at the end of the simulation. These can be nasty whenever such an abort causes data from the simulation run to be lost, i.e. when master or FMUs only write their results after the simulation finishes successfully.
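
The master-side half of this proposal could be sketched as follows. The generator communication_intervals and its rel_tol default are hypothetical; a real master would call doStep() for each interval and log a warning when it clamps:

```python
import math


def communication_intervals(start_time, stop_time, step_size, rel_tol=1e-9):
    """Yield (t_begin, t_end) pairs by accumulating step_size, ending exactly at stop_time."""
    t = start_time
    while t < stop_time and not math.isclose(t, stop_time, rel_tol=rel_tol):
        t_end = t + step_size
        # Clamp when the accumulated end time lands on or beyond stop_time.
        if t_end > stop_time or math.isclose(t_end, stop_time, rel_tol=rel_tol):
            t_end = stop_time
        yield t, t_end
        t = t_end


intervals = list(communication_intervals(0.0, 1.0, 0.01))
```

For the example from the opening post this yields 100 intervals, and the last one ends exactly at 1.0. The price is that the final interval is not exactly stepSize long, which is the "constant step size" violation discussed above.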

Generally, the last integration interval should not be overrated, since from my experience, the interesting stuff of the simulation model should really happen somewhere in between start and stop time.

-Andreas

@pmai
Collaborator

pmai commented Mar 7, 2020

Pierre, you are of course right, but I'm having engineering users and engineering tool developers in mind, as quite a relevant target group for the FMI standard.

I think we can expect tool developers, i.e. developers of tools implementing numeric algorithms to have a basic understanding of IEEE754. We expect them to implement or at least use solvers responsibly, which requires far more understanding of numeric algorithms and their pitfalls than just using a suitably correct printer/reader library.

Or in other words, how would a developer who does not care about reading and printing accuracies then suddenly implement useful "fault-tolerant" behavior for unknown to him rounding issues of other implementations? The worst that one sees in numeric algorithms is random epsilons being applied to gloss over fundamental problems, so that stuff suddenly "works".

An example for stopTime problems: in building energy simulation, duration of simulation cases easily exceeds several years, expressed in seconds this gives large numbers. Most FMI generators (including modelica-based tools) write stop time in seconds using general/scientific notation, with 5/6 digits precision.

Then we (the users) should file bugs against those implementations, if they do not meet our expectations. And if an FMU exporter outputs a default stop time of 3.185136e9, but does not accept stepping to the IEEE754 double precision floating-point number that this corresponds to under IEEE754 conversion rules (using round-to-even), then again this is a bug that needs fixing.

What slightly confuses me is that you seem to be treating the stopTime in the modelDescription.xml element as anything other than a useful default value: Nothing at all prevents the master from choosing a stopTime of its own design that it is sure it can hit, using whatever algorithm it uses to calculate its communication points. And if it ensures that this is less than or equal to the default time, it would seem probable that the FMU will support this.

FMU implementors write their doStep() functions in a robust way, i.e. if they get tCommEnd > stopTime, they gracefully integrate to stopTime (if it is close to tCommEnd) or, as I've written before, simply use the latest possible time (i.e. stopTime) for any time-related function evaluations past stopTime. A warning issued through the log mechanism would be nice.

How is this robust? Robust against what? How does a co-simulation master rely on this behavior, and why should it?

FMI CoSimulation Masters try to stick to stopTime, i.e. when rounding error accumulation exceeds the stopTime, they fix the last interval's tCommEnd to be exactly the stopTime. Again, a small warning message would be nice.

Why not fix the stopTime instead? This surely seems to be the better behavior? And why should they "try" to stick to it, when they very easily can? I.e. it seems to me I'm missing something here.

This is IMHO a robust solution which helps to avoid any critical error aborts at the end of the simulation. These can be nasty whenever such an abort causes data from the simulation run to be lost, i.e. when master or FMUs only write their results after the simulation finishes successfully.

It seems to me that as a quality of implementation issue I'd file a bug against all implementations that intentionally throw away data just because an error occurred. But I think this is unrelated to this issue.

Generally, the last integration interval should not be overrated, since from my experience, the interesting stuff of the simulation model should really happen somewhere in between start and stop time.

Then why insist on having a stopTime that does not match your communication points?

Again, FMUs and masters must solve the much hairier issue of communication points being calculated in different ways, resulting in potentially different times (hence the master gives its idea of the current time redundantly in doStep), while the stopTime is fully under control of the master, so I don't see any new problems being created by this that cannot be solved by the master itself.

@clagms
Collaborator

clagms commented Mar 8, 2020

I don't think @t-sommer wrote this, this is straight from the 2.0 standard text, hence probably the fault of someone else ;).

@pmai Do you have a better idea on how to find the rationale behind these statements in the FMI standard?

@pmai
Collaborator

pmai commented Mar 8, 2020

@pmai Do you have a better idea on how to find the rationale behind these statements in the FMI standard?

Besides hoping that someone involved remembers the rationale at the time for this and chimes in on here? Probably not. You might want to look through the meeting minutes in the years prior to the 2.0 release, but whether they will contain something edifying?

That said, I don't think there is much mystery as to a suitable rationale:

The stopTime argument to the fmi2SetupExperiment call is an optional mechanism, that allows the FMU to a) perform various optimizations and b) check beforehand whether the model is valid for the planned simulation time.

IF a master chooses to set a stopTime, it can be expected to then adhere to its self-chosen stopTime. Stepping beyond that time is considered an error, and in 2.0 the FMU is required to report it as such. The master has full control over it, so why should it bungle this? Either set no stopTime, or set a stopTime that you can hit or stay below. If for unfathomable reasons you need to fudge here, just set the stopTime to a greater value that you will not exceed.

I don't agree with the mandatory checking by the FMU, since we generally don't require that kind of mandatory checking from callees. And since most FMUs will not care about the stopTime at all, making them check seems excessive.

But other than that I don't see a specific problem with this mechanism.

@ghorwin
Author

ghorwin commented Mar 8, 2020

I think we can expect tool developers, i.e. developers of tools implementing numeric algorithms to have a basic understanding of IEEE754. We expect them to implement or at least use solvers responsibly, which requires far more understanding of numeric algorithms and their pitfalls than just using a suitably correct printer/reader library.

Agreed. I'll think about this and will put up a proposal on how to add such a check to the fmi compliance checker. Though, we do not yet have a mechanism to forcefully cause an FMU to abort and require this abort as "correct behavior". Since there is currently no unique return code/info mechanism formulated, to identify the exact reason for failure, this may be tricky... (but unrelated to this topic).

IF a master chooses to set a stopTime, it can be expected to then adhere to its self-chosen stopTime. Stepping beyond that time is considered an error, and in 2.0 the FMU is required to report it as such. The master has full control over it, so why should it bungle this? Either set no stopTime, or set a stopTime that you can hit or stay below. If for unfathomable reasons you need to fudge here, just set the stopTime to a greater value that you will not exceed.

@pmai Indeed, it looks like I've missed something here.

What I was aiming at was the following scenario:

  • Master reads modelDescription.xml from fixed-step FMU and takes the default stop time
  • Master calls fmi2SetupExperiment() with this stop time - FMU does not complain
  • Master integrates, thereby accumulating time by adding up constant step sizes. Final communication end time slightly exceeds previously communicated stop time due to rounding errors.
  • FMU bails out in last doStep because it complains tCommEnd > stopTime

If I understand you right, this would be considered a fault on the Master's side, correct?

There are two options to fix that in the master:

  • Master would violate the "constant step" rule in the last step and adjust the last interval's step size so that stopTime would be hit exactly.
  • Master adds up constant time steps during initialization as it would do in the actual calculation, computes the stopTime it would reach including rounding errors and communicates this stopTime to the FMU, thereby risking that the FMU says "sorry, can only compute to my stopTime, not your stopTime + rounderr".
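
The second option can be sketched in a few lines. The helper accumulated_stop_time is a hypothetical name, and the sketch assumes the master advances time by repeated addition, as described above:

```python
def accumulated_stop_time(start_time, step_size, n_steps):
    """Replay the master's own time accumulation to find the end time it will actually reach."""
    t = start_time
    for _ in range(n_steps):
        t = t + step_size  # same operation the master performs during the run
    return t


# 100 steps of 0.01 starting at 0, as in the opening example.
reached = accumulated_stop_time(0.0, 0.01, 100)
```

The value of reached would then be passed as the stop time during setup instead of the nominal 1.0; whether a given FMU accepts a stop time slightly beyond its default is exactly the risk noted in the bullet.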

Are we on the same page, now?

-Andreas

@pmai
Collaborator

pmai commented Mar 8, 2020

Master reads modelDescription.xml from fixed-step FMU and takes the default stop time
Master calls fmi2SetupExperiment() with this stop time - FMU does not complain
Master integrates, thereby accumulating time by adding up constant step sizes. Final communication end time slightly exceeds previously communicated stop time due to rounding errors.
FMU bails out in last doStep because it complains tCommEnd > stopTime

If I understand you right, this would be considered a fault on the Master's side, correct?

Yes, I would consider this to be a fault of the master algorithm.

There are two options to fix that in the master:

Master would violate the "constant step" rule in the last step and adjust the last interval's step size so that stopTime would be hit exactly.
Master adds up constant time steps during initialization as it would do in the actual calculation, computes the stopTime it would reach including rounding errors and communicates this stopTime to the FMU, thereby risking that the FMU says "sorry, can only compute to my stopTime, not your stopTime + rounderr".

Four things I'd add to that:

  • The default experiment information in the FMU modelDescription does not carry the information that these are the intended limits of the FMU, it just says that this would be a good starting point, if the master has no other information. Therefore the likelihood that exceeding those times leads to an error is slight, IMHO.

  • If one worried about this, I'd let the master choose a stop time that is strictly less than the default experiment stop time.

  • Furthermore nothing forces the master to take the last step, i.e. you can terminate the simulation at any time without reaching the stop time; so this is another alternative that can be taken. Or the master can handle the last step failing gracefully.

  • And finally: The FMU must be able to deal with differences in step times due to rounding issues at all time steps, so fixed step size does not mean that all communication times will be at strictly that raster, due to the limited resolution of floating point numbers.
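
The last point can be checked with a short experiment. This sketch computes index-based communication points for a step size of 0.01 and collects the distinct interval lengths that result; the variable names are illustrative only:

```python
step_size = 0.01

# Index-based communication points: t_i = i * step_size
points = [i * step_size for i in range(101)]

# Distinct interval lengths an FMU would see between consecutive points
diffs = {points[i + 1] - points[i] for i in range(100)}

print(len(diffs), min(diffs), max(diffs))
```

More than one distinct interval length shows up even though the nominal step size is constant, because 0.01 and most of its multiples are not exactly representable as doubles. The deviations stay at the level of the floating-point resolution.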

@andreas-junghanns
Contributor

We have introduced a few things since this ticket started to address this point:

  • fmi3CoSimulation has now an attribute "fixedInternalStepSize" to communicate the internal step size constraint to the master
  • we clearly define now that the FMU may not reach exactly Tci+h and the FMU can now report its internal time with lastSuccessfulTime when returning from doStep
  • we explain (better?) now what the relationship between master and slave is w.r.t. time in "4.2.2. Computation in Co-Simulation interface types"

We could add a sentence about how not to add up h with multiple additions and use n*h instead to avoid numeric dirt. Or add a link to Goldberg: https://dl.acm.org/doi/abs/10.1145/103162.103163.
But there are too many ways to do it wrong to list them all.
But I think this is something for the FMI implementers guide.

Closing this ticket, please reopen and add PR if you think there is something missing in the standard document.

8 participants