Extracting feedback-gain matrices from d_ocp_qp_ipm_ws #76

markusgft · 2019-10-04T11:29:37Z

in closed-loop single shooting SQP algorithms, like iLQR, we usually use the time-varying LQR feedback from the riccati recursion to forward simulate the system dynamics (generate a roll-out) in the next SQP iteration.

Here is an example how I reconstructed the LQR-feedback gains from the ocp_qp_ipm workspace after successful OCP solution (code), based on the ct-v2 tag of hpipm:

For all stages except for the first one, i.e. i=2,...,N

    Eigen::Matrix<double, control_dim, control_dim> Lr;    
    Eigen::Matrix<double, state_dim, control_dim> Ls;
    for (int i = 1; i < N; i++)
    {
        ::d_cvt_strmat2mat(Lr.rows(), Lr.cols(), &workspace_.L[i], 0, 0, Lr.data(), Lr.rows());
        ::d_cvt_strmat2mat(Ls.rows(), Ls.cols(), &workspace_.L[i], Lr.rows(), 0, Ls.data(), Ls.rows());
        TVLQR_K_ [i] = (-Ls * Lr.partialPivLu().inverse()).transpose();
    }

For the first time-step, i=0, it was not so straightforward, in fact we had to reconstruct the feedback gain for the first stage methodically:

    // step 1: reconstruct Hessian for first time step H
    Eigen::Matrix<double, control_dim, control_dim> Lr;
    ::blasfeo_unpack_dmat(Lr.rows(), Lr.cols(), &workspace_.L[0], 0, 0, Lr.data(), Lr.rows());
    Eigen::Matrix<double, control_dim, control_dim> H;
    H = Lr.template triangularView<Eigen::Lower>() * Lr.transpose();  // Lr is cholesky of H

    // step2: reconstruct riccati matrix S[1] for second time-step
    Eigen::Matrix<double, state_dim, state_dim> Lq;
    ::blasfeo_unpack_dmat(Lq.rows(), Lq.cols(), &workspace_.L[1], control_dim, control_dim, Lq.data(), Lq.rows());
    Eigen::Matrix<double, state_dim, state_dim> S;
    S = Lq.template triangularView<Eigen::Lower>() * Lq.transpose();  // Lq is cholesky of S

    // step3 : compute tvlqr-feedback for first stage
    TVLQR_K_ [0] = (-H.inverse() *  (P_[0] + B_[0].transpose() * S * A_[0]));

In above code, A_ and B_ refer to LTV-system matrices at respective stages, and P_ is obviously the second-order cost cross-term from an LQ OCP problem. TVLQR_K_ is the desired feedback matrix, and the ocp workspace is denoted "workspace_".

Now to the question: the issue is, that with the newest HPIPM release, I can still successfully reconstruct the TVLQR-feedback for stages 2...N, but it seems that the content of workspace_.L[0] has changed ? I fail in reconstructing appropriate TVLQR-feedback for the first time step.

Sorry for the lengthy introduction -- but would you mind commenting on as to what extent the meaning of the first entry of L, i.e. L[0], has changed in the workspace?

Thanks a lot in advance and all the best!

The text was updated successfully, but these errors were encountered:

markusgft · 2019-10-05T14:49:30Z

Update: I noticed that the above extraction of TVLQR-feedback works also for the first-stage, if I set the settings flag ric_alg=1 via d_ocp_qp_ipm_arg_set_ric_alg(&ric_alg, &arg);
So when the square-root ricatti algorithm is active, it seems to work.
Just for my understanding -- what actually happens to the workspace_.L[0] variable when ric_alg=0?

giaf · 2019-10-05T21:38:24Z

Hi, that's a good hint!

When ric_alg=0 the Riccati is implemented using a different algorithm (the 'classical' Riccati recursion, without the cholesky factorization of the recursion matrix).

I'll look into adding a proper getter to get the feedback matrix, such that this works regardless of the algorithm used (and if the algorithm is changed again in the future), following what I wrote in the other issue. Get back on that soon!

markusgft · 2019-10-08T11:25:40Z

Hi @giaf,
thanks for clarifying this! I would welcome a dedicated getter-routine for the Feedback-matrices very much, and it may also help to get further users in the DDP/iLQR communities!
Speaking of this, I noticed that a getter method for another quantity would be incredibly helpfull, too.
To simplify the discussion, let's take a look at equations, maybe this paper here. The relevant equation is (6):
In here the control input update gets computed as

and the "feedback matrix" in question is It would be absolutely fantastic if you could also add a getter to the second quantity in Eq. (6), that is , which is basically a feedforward term.
Thanks a lot in advance!!!

giaf · 2019-10-09T18:45:28Z

Ok, these days here we had a group retreat and now I'm going on holidays for a few days, so it will need a bit more time.

Btw, what is this peace of code doing? I'm not familiar with eigen

TVLQR_K_ [i] = (-Ls * Lr.partialPivLu().inverse()).transpose();

Is Lr.partialPivLu() taking the LU factorization of Lr? In theory Lr should already contain the lower triangular Cholesky factorization, so no extra factorization should be needed.

Anyway, I will study the references you sent me, to be sure to implement exactly what is needed.

markusgft · 2019-10-10T06:23:05Z

Thanks a lot @giaf, there is no urgency here.
You're right about the LU factorization, this is redundant.
Let me know if you have questions, I can also send you different references in case the above mentioned one is too general.

giaf · 2019-10-13T15:13:31Z

Hi Markus,

I added the getters, an example of the usage is here
https://github.com/giaf/hpipm/blob/master/examples/c/example_d_ocp_qp_unconstr.c#L375

There are getters for the matrices Lr, Ls and P. The feedback matrix K can be easily computed as in your example, as

K= - (Ls * inv(Lr))'

(where the inverse is only for mathematical notation and it should be replaced by a solution of a triangular system).
If the initial state is eliminated from the optimization variables before calling HPIPM, then at the first stage the matrix K has to be computed externally, as you are already doing. However, now the getter for P returns the correct value for the Riccati recursion matrix for all algorithms in HPIPM.

There are getters also for the vectors ls and p, but these are only valid in case of unconstrained MPC problems. In case of constrained problems, they may return different values depending on the chosen algorithm (i.e. absolute vs relative formulation, enabling of predictor-corrector and so on).
The feedback vector k is computed as

k = - inv(Lr) * ls

(where the inverse is only for mathematical notation and it should be replaced by a solution of a triangular system).

markusgft · 2019-10-14T06:38:12Z

Thanks for the quick implementation @giaf !
I will try to find a free time-slot for testing the new getters as soon as possible and come back to you soon!

markusgft · 2019-10-18T22:05:05Z

@giaf I was able to test the new getters and am glad to confirm that they match exactly what I needed! A quick clarification:
There is only a getter d_ocp_qp_ipm_get_ric_lr (not ls) and hence the feedback vector k should be computed as

k = - inv(Lr).transpose() * lr

Is this what you had in mind? In any case, it matches the quantity that I had cited above!

giaf · 2019-10-22T07:24:10Z

Yes sorry, this is what I meant!

markusgft · 2019-10-22T09:35:46Z

Great! Do you think it would make sense to fast-forward hpipm release 0.1.1 to include these changes?

giaf · 2019-10-22T11:27:19Z

It doesn't make sense now, since I'm also in the middle of some work on QCQP. That stuff has rather high priority, so I would expect to finalize it in the next few weeks. At that point, I would advance the version to 0.1.2.

For now the best option for you would be to refer to the current commit 806c845

markusgft · 2019-10-22T15:57:15Z

Okay, that sounds good to me! I wil get notified about the new release automatically anyway. All the best, and thanks again!

giaf · 2020-04-04T09:39:03Z

@markusgft I recently did some more work on this, now there are also routines to directly get the feedback gains K and k. All these routines now also work for the constrained case, and their API is slightly changed now, in case you want to upgrade.

markusgft · 2020-04-04T19:53:51Z

Hi @giaf, thanks for the heads up!
I upgraded to your new interface and did some tests (comparison to LQR/iLQR backward pass), with the following findings:

the vector that you call k, i.e. the feedforward term, matches the iLQR feedforward for all stages
the feedback matrix K is computed indentically to the LQR bakward pass for all stages but for the first.
I believe the reason for that behaviour is that for stage k=0, HPIPM does not have meaningful entries for the matrix that you call Ls. What I did so far, is that I manually reconstructed the LQR feedback matrix for stage 0, using the following trick:

  // get Lr for stage 0 and compute its inverse, Lr_inv_ (omitted)
  // ...
  // retrieve Riccati matrix for stage 1 (we call it S, others call it P)
    Eigen::Matrix<double, state_dim, state_dim> S1;
    ::d_ocp_qp_ipm_get_ric_P(&qp_, &arg_, &workspace_, 1, S1.data());

    // step2: compute G[0]
    Eigen::Matrix<double, control_dim, state_dim> G;
    G = p.U_[0]; // cross-terms derivative 
    G.noalias() += p.B_[0].transpose() * S1 * p.A_[0];

    // step3: compute K[0] = H.inverse() * G
    K_[0] = (-Lr_inv_[0].transpose() * Lr_inv_[0] * G);   // <== this is the correct LQR feedback gain matrix for stage 0

Do you think it makes sense for you to adapt the computation of K[0] on hpipm side? Thanks in advance!

giaf · 2020-04-05T21:48:00Z

Hi @markusgft thanks for the feedback!
Could you give me some more details on the setup? Like, constrained or unconstrained problem? Value of nx[0]? (i.e. did you remove x0, or is it used for equal upper and lower bounds?)

markusgft · 2020-04-07T05:37:29Z

Hi @giaf, sure!
The minimal setup that I was testing was an unconstrained problem, and

nu_[N_] = 0;                      // last input is not a decision variable
nx_[0] = 0;                       // initial state is not a decision variable but given

I hope that helps!

giaf · 2020-04-07T08:43:58Z

Ok then this is the reason for K[0]: in HPIPM this is a matrix of size nu[0] x nx[0], so 0. This is the intended behavior, since e.g. A[0] is not even part of the QP formulation, so there is no way I can compute this internally anyway.

If you keep x[0] as an optimization variable and set the relative upper and lower bounds to x0, then you would also get a meaningful K[0]. But this is much slower (and likely less accurate) in your case, since instead of solving an unconstrained MPC problem with a single Riccati swap, you would solve a constrained MPC problem using IPM and one Riccati swap per IPM iteration.

markusgft · 2020-04-09T16:37:50Z

Thank you @giaf for the clarification.
I will stick to the manual computation of the feedback gains for the first stage ... your new feature is very useful, since it allowed me to eliminate many redundant computations.
I saw that you released a tag for hpipm 0.1.2.... are you going to create a matching tag for blasfeo as well?

markusgft mentioned this issue Oct 10, 2019

Get TVLQR-feedback matrices K and feedforward lv directly from HPIPM, without custom reconstruction step ethz-adrl/control-toolbox#56

Closed

markusgft closed this as completed Oct 22, 2019

This was referenced Apr 4, 2020

v3.0.2 - adapting latest changes in hpipm for evaluating feedback matrices ethz-adrl/control-toolbox#111

Merged

v3.0.3devel - adapting latest changes in hpipm for evaluating feedback matrices ethz-adrl/control-toolbox#112

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extracting feedback-gain matrices from d_ocp_qp_ipm_ws #76

Extracting feedback-gain matrices from d_ocp_qp_ipm_ws #76

markusgft commented Oct 4, 2019

markusgft commented Oct 5, 2019

giaf commented Oct 5, 2019

markusgft commented Oct 8, 2019

giaf commented Oct 9, 2019

markusgft commented Oct 10, 2019

giaf commented Oct 13, 2019

markusgft commented Oct 14, 2019

markusgft commented Oct 18, 2019

giaf commented Oct 22, 2019

markusgft commented Oct 22, 2019

giaf commented Oct 22, 2019

markusgft commented Oct 22, 2019

giaf commented Apr 4, 2020

markusgft commented Apr 4, 2020 •

edited

Loading

giaf commented Apr 5, 2020

markusgft commented Apr 7, 2020

giaf commented Apr 7, 2020

markusgft commented Apr 9, 2020

Extracting feedback-gain matrices from d_ocp_qp_ipm_ws #76

Extracting feedback-gain matrices from d_ocp_qp_ipm_ws #76

Comments

markusgft commented Oct 4, 2019

markusgft commented Oct 5, 2019

giaf commented Oct 5, 2019

markusgft commented Oct 8, 2019

giaf commented Oct 9, 2019

markusgft commented Oct 10, 2019

giaf commented Oct 13, 2019

markusgft commented Oct 14, 2019

markusgft commented Oct 18, 2019

giaf commented Oct 22, 2019

markusgft commented Oct 22, 2019

giaf commented Oct 22, 2019

markusgft commented Oct 22, 2019

giaf commented Apr 4, 2020

markusgft commented Apr 4, 2020 • edited Loading

giaf commented Apr 5, 2020

markusgft commented Apr 7, 2020

giaf commented Apr 7, 2020

markusgft commented Apr 9, 2020

markusgft commented Apr 4, 2020 •

edited

Loading