Provide adjoint derivatives #664

Open
chrbertsch opened this issue Oct 29, 2019 · 13 comments
@chrbertsch chrbertsch commented Oct 29, 2019

Currently FMI 2.0 provides an interface to obtain partial derivatives in the form of directional derivatives (e.g., Jacobian J times direction vector v, Jv).

For several use cases, it would be beneficial to get vector-Jacobian products vᵀJ, i.e. adjoint derivatives, from the FMU:
- using FMUs in the context of AI frameworks (there often called "VJP", vector-Jacobian product). There, adjoint derivatives are used in the backpropagation process to do gradient-based optimization of parameters using Automatic Differentiation (AD). Neural differential equations (https://github.com/JuliaDiffEq/DiffEqFlux.jl) or hybrid forms of AI and equation/physics-based models could also be supported together with FMUs. This feature could widen the scope of FMI.

fmi2Status fmi2GetAdjointDerivative(
    fmi2Component c,
    const fmi2ValueReference vUnknown_ref[],
    size_t nUnknown,
    const fmi2ValueReference vKnown_ref[],
    size_t nKnown,
    const fmi2Real dvUnknown[],
    fmi2Real dvKnown[])
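For intuition, the two functions compute different products with the same Jacobian J: the existing directional derivative returns Jv (size nUnknown), while the proposed adjoint derivative returns vᵀJ (size nKnown). A minimal pure-Python sketch with a made-up 2×3 Jacobian (names and matrix are illustrative only, not FMI API):

```python
# Hypothetical 2x3 Jacobian J (rows = unknowns, columns = knowns), illustration only.
J = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def directional_derivative(J, dv_known):
    """Analogue of fmi2GetDirectionalDerivative: returns J * dv_known,
    a vector of size nUnknown (here: 2)."""
    return [sum(J_ik * dv_k for J_ik, dv_k in zip(row, dv_known)) for row in J]

def adjoint_derivative(J, dv_unknown):
    """Analogue of the proposed fmi2GetAdjointDerivative: returns dv_unknown^T * J,
    a vector of size nKnown (here: 3)."""
    return [sum(dv_unknown[i] * J[i][k] for i in range(len(J)))
            for k in range(len(J[0]))]

print(directional_derivative(J, [1.0, 0.0, 0.0]))  # first column of J: [1.0, 4.0]
print(adjoint_derivative(J, [1.0, 0.0]))           # first row of J: [1.0, 2.0, 3.0]
```

Seeding with a basis vector extracts one column (forward) or one row (reverse) of J per call, which is what makes the two variants complementary.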

I consider this proposal mature and intend to create a PR.

@HansOlsson, @ChrisRackauckas, @jph-tavella, @masoud-najafi, @CSchulzeTLK, @rfranke, and others: your comments would be very much appreciated.

@chrbertsch chrbertsch self-assigned this Oct 29, 2019
@HansOlsson HansOlsson commented Oct 30, 2019

In terms of implementation effort this is straightforward, but time-consuming (much more than other variants).
The problems I see are four-fold:

  • It is too easy to "cheat" and compute it inefficiently from multiple directional derivatives; so we have to make sure that it is actually provided efficiently by the FMU.
  • It is rather memory consuming (especially for co-simulation and model-code with loops).

The memory consumption is due to reverse-mode AD (at least traditionally) storing all operations on reals and then running them in reverse order. One can make a trade-off and use less memory for this - and instead re-run operations; but that is slower. That is something we might need to consider as part of the design.

  • This leads to the next point - in particular for co-simulation FMUs. In many scenarios we want the adjoint for the entire simulation interval (e.g. optimal control); can the FMU store that?
  • The last point is that derivatives assume continuity around the point, and in many optimization scenarios the optimum is at the limit of triggering events. Having efficient adjoint derivatives does not help with that.
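The memory/time trade-off mentioned in the second point is commonly called checkpointing: instead of storing every intermediate value on the tape, store only every k-th one and recompute the segments in between during the reverse sweep. A toy pure-Python sketch of the idea (illustrative only, not FMI API; `step` stands in for one primitive operation):

```python
def step(x):
    # One primitive operation; its derivative needs the input value x.
    return x * x

def d_step(x):
    return 2.0 * x

def forward_with_tape(x0, n):
    """Full tape: store all n+1 intermediate values (high memory)."""
    tape = [x0]
    for _ in range(n):
        tape.append(step(tape[-1]))
    return tape

def reverse_from_tape(tape, seed=1.0):
    grad = seed
    for x in reversed(tape[:-1]):
        grad *= d_step(x)
    return grad

def reverse_with_checkpoints(x0, n, every):
    """Checkpointing: store only every `every`-th value (~n/every values),
    recompute the rest during the reverse sweep (less memory, more compute)."""
    checkpoints = {0: x0}
    x = x0
    for i in range(1, n + 1):
        x = step(x)
        if i % every == 0:
            checkpoints[i] = x
    grad = 1.0
    for i in reversed(range(n)):
        base = (i // every) * every
        x = checkpoints[base]
        for _ in range(i - base):
            x = step(x)  # recompute intermediates up to step i
        grad *= d_step(x)
    return grad

x0, n = 1.1, 6
full = reverse_from_tape(forward_with_tape(x0, n))  # stores 7 values
cheap = reverse_with_checkpoints(x0, n, every=3)    # stores 3 values
print(abs(full - cheap) < 1e-9)  # True: same gradient, different memory/compute
```

Both sweeps multiply the same chain-rule factors in the same order, so the results agree; only the storage strategy differs.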
@ChrisRackauckas ChrisRackauckas commented Oct 30, 2019

The last point is that derivatives assume continuity around the point, and in many optimization scenarios the optimum is at the limit of triggering events. Having efficient adjoint derivatives does not help with that.

The adjoint implementation handles that, but it still needs the VJP to do so.

In terms of implementation effort this is straightforward, but time-consuming (much more than other variants).

I would not commit to a single implementation like ADOL-C (which is slow...), and would instead allow this to be a function that provides a VJP. There are many ways to implement the pullback; fully non-allocating pullbacks derived in symbolic form are possible, but not in all cases, so I wouldn't commit to one way of doing it.

@t-sommer t-sommer commented Oct 30, 2019

I wouldn't commit to one way of doing it

Does the proposed API impose any constraints on the implementation?

@ChrisRackauckas ChrisRackauckas commented Oct 30, 2019

I was just saying I hope that's not done, and instead the API just has this function, which anyone can write. I am not sure what the proposal implies on that point, but the paper said they extended FMI to build adjoints with ADOL-C.


@chrbertsch chrbertsch commented Oct 30, 2019

Replying to @HansOlsson:

In terms of implementation effort this is straightforward, but time-consuming (much more than other variants).
The problems I see are four-fold:

* It is too easy to "cheat" and compute it inefficiently from multiple directional derivatives; so we have to make sure that it is actually provided efficiently by the FMU.

This is an issue of the tool vendor implementing the API function, as with any other implementation detail of FMU export. It is up to the user to judge good or bad implementations. Perhaps for Modelica models as a source of FMUs we could set up a benchmark regarding partial derivative implementations.

Regarding efficiency: with a good implementation of the adjoint derivatives, in cases when one needs only the vector-Jacobian-products motivated above, one can expect significant speedups as described in the paper above.

* It is rather memory consuming (especially for co-simulation and model-code with loops).

The memory consumption is due to reverse-mode AD (at least traditionally) storing all operations on reals and then running them in reverse order. One can make a trade-off and use less memory for this - and instead re-run operations; but that is slower. That is something we might need to consider as part of the design.

This is also up to the implementer, not the interface.

* This adds the next point - in particular for co-simulation FMUs. In many scenarios we want the adjoint for the entire simulation interval (e.g. optimal control); can the FMU store that?

Do you mean the values at the end of different macro steps? (These could be stored outside the FMU.)
Or the intermediate values within one macro step of a co-simulation FMU? Then this would be a feature of "intermediate variable access" in FMI 3.0.

* The last point is that derivatives assume continuity around the point, and in many optimization scenarios the optimum is at the limit of triggering events. Having efficient adjoint derivatives does not help with that.

But this does not worsen the situation compared to directional derivatives, right?
As mentioned above, adjoint derivatives are also heavily used in contexts beyond physics-based models (e.g. neural networks).

Replying to @ChrisRackauckas:

I was just saying I hope that's not done, and instead the API just has this function which anyone can write.

This is the basic idea of the FMI standard: only the interface is defined; the implementation of the interface functions is up to the exporting tools.

I am not sure what the proposal is implying on that, but the paper said they extended FMI to build adjoints with ADOL-C.

One should see this only as an example.

Replying to @t-sommer:

Does the proposed API impose any constraints on the implementation?

No

@jph-tavella jph-tavella commented Oct 31, 2019

I perfectly agree with @chrbertsch.
The proposed API does not impose any constraints on the implementation by tool vendors.
Performance, memory consumption, accuracy of calculations, etc. are issues for the tool vendors; from the user's point of view, it is his/her responsibility to judge good or bad implementations and then to prefer FMUs from one tool over another.

@chrbertsch chrbertsch added this to the v3.0 milestone Nov 5, 2019

@chrbertsch chrbertsch commented Nov 8, 2019

Regular FMI Design Meeting:

We should have at least as good a description as for directional derivatives.
It should not depend on a specific implementation.
We should have a concrete proposal in a PR. Christian: I will work on this.
We should ask the AI/ML community if this is the only change they would need.


@chrbertsch chrbertsch commented Nov 9, 2019

I have started working on this issue on a branch: https://github.com/chrbertsch/fmi-standard/tree/adjoint-derivatives
Feel free to comment and contribute.


@masoud-najafi masoud-najafi commented Nov 25, 2019

Please correct me if I am wrong. This API allows retrieving the Jacobian matrix with a single call to fmi2GetAdjointDerivative by setting nUnknown = NX and nKnown = NX (where NX is the number of states). Also, dvUnknown is almost useless, because the importer can do the multiplication outside of the function.
The APIs fmi2GetDirectionalDerivative and the proposed fmi2GetAdjointDerivative do almost the same thing. In other words, each API can be obtained from the other by an appropriate wrapper. If we want fmi2GetAdjointDerivative, fmi2GetDirectionalDerivative is no longer needed.
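The wrapper observation can be made concrete: an adjoint derivative can indeed be emulated on top of directional derivatives, but only at the cost of one directional call per known variable, which is exactly the inefficient route warned about earlier in this thread. A hypothetical pure-Python sketch (the matrix and the `get_directional` callback are made up for illustration, not FMI API):

```python
def adjoint_from_directional(get_directional, v_unknown, n_known):
    """Emulate v^T * J with one directional-derivative call per known:
    (v^T J)_k = v . (J e_k). This is the inefficient 'wrapper' route."""
    result = []
    for k in range(n_known):
        e_k = [1.0 if j == k else 0.0 for j in range(n_known)]
        column_k = get_directional(e_k)  # J * e_k, i.e. the k-th column of J
        result.append(sum(v_i * c_i for v_i, c_i in zip(v_unknown, column_k)))
    return result

# Illustrative 3x2 Jacobian hidden behind a directional-derivative callback:
J = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
get_dir = lambda v: [sum(row[k] * v[k] for k in range(len(v))) for row in J]

print(adjoint_from_directional(get_dir, [1.0, 1.0, 1.0], 2))  # [9.0, 12.0]
```

A native reverse-mode implementation inside the FMU would produce the same vᵀJ in a single sweep instead of n_known forward sweeps, which is the efficiency argument for a dedicated API.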


@chrbertsch chrbertsch commented Nov 25, 2019

@masoud-najafi: You cannot get the full Jacobian matrix with one call of fmi2GetDirectionalDerivative or fmi2GetAdjointDerivative.

fmi2GetDirectionalDerivative returns a column vector of size nUnknowns that equals the Jacobian matrix times the seed vector (please note the seed vector Δv_known has the same size as the vector v_known; I think this is not yet stated explicitly, but otherwise the formula [shown as an image in the original comment] does not make sense).

fmi2GetAdjointDerivative returns a row vector of size nKnowns that equals the vector v (transposed) times the Jacobian (please note the vector Δv_unknown has the same size as the vector v_unknown).

The importer cannot do the multiplication outside the function, as the return value of the function is already the result of the multiplication.

Thus one needs multiple calls of fmi2GetAdjointDerivative or fmi2GetDirectionalDerivative to construct the Jacobian matrix.

If the FMU can calculate either directional or adjoint derivatives efficiently (e.g., by means of AD), in the case of sparse Jacobians it is not efficient to calculate directional derivatives from adjoint derivatives and vice versa; the efficient implementation depends on whether the FMU supports forward or backward AD, or both.


@masoud-najafi masoud-najafi commented Nov 29, 2019

Then I do not understand why and how it is indicated in the above paper that with fmi2GetAdjointDerivative one "row" of the Jacobian matrix can be retrieved with a single call. Can anyone clarify this by specifying the arguments of fmi2GetAdjointDerivative?

@ChrisRackauckas ChrisRackauckas commented Nov 29, 2019

Just use the e_i basis vector for the i-th row; it follows directly from the definition of the VJP. For more information, I'd link to my lecture notes, which build up differentiable programming from a vjp/jvp standpoint:

https://mitmath.github.io/18337/lecture9/autodiff_dimensions
https://mitmath.github.io/18337/lecture10/estimation_identification
https://mitmath.github.io/18337/lecture11/adjoints
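The recipe in code: seeding the adjoint call with the i-th standard basis vector e_i yields e_iᵀJ, i.e. the i-th row of the Jacobian, in a single call. A plain-Python sketch with an illustrative matrix (not FMI API):

```python
def vjp(J, v):
    """v^T * J: what a single adjoint-derivative call returns."""
    return [sum(v[i] * J[i][k] for i in range(len(J))) for k in range(len(J[0]))]

J = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# Seeding with the basis vector e_1 = [0, 1] extracts row index 1 in one call:
print(vjp(J, [0.0, 1.0]))  # [4.0, 5.0, 6.0]
```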


@t-sommer t-sommer commented Feb 13, 2020

I've started with the implementation of fmi3GetAdjointDerivatives() for the Reference FMUs on https://github.com/t-sommer/Reference-FMUs/tree/adjoint-derivatives.
