
Extracting feedback-gain matrices from d_ocp_qp_ipm_ws #76

Closed

markusgft opened this issue Oct 4, 2019 · 18 comments
@markusgft

Dear @giaf ,

in closed-loop single-shooting SQP algorithms like iLQR, we usually use the time-varying LQR feedback from the Riccati recursion to forward-simulate the system dynamics (generate a roll-out) in the next SQP iteration.
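For context, such a roll-out looks roughly like the following sketch (the names dynamics, x0_, x_ref_, u_ref_ and the feedforward terms k_ are placeholders for illustration, not from our actual code):

    // forward-simulate the nonlinear dynamics under the time-varying LQR policy
    Eigen::VectorXd x = x0_;
    for (int i = 0; i < N; i++)
    {
        // control = reference input + feedforward + feedback on the state deviation
        Eigen::VectorXd u = u_ref_[i] + k_[i] + TVLQR_K_[i] * (x - x_ref_[i]);
        u_rollout_[i] = u;
        x = dynamics(x, u);  // integrate one time step
        x_rollout_[i + 1] = x;
    }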

Here is an example of how I reconstructed the LQR feedback gains from the ocp_qp_ipm workspace after a successful OCP solution (code), based on the ct-v2 tag of hpipm:

For all stages except the first one, i.e. i = 1, ..., N-1 (stages 2, ..., N in one-based counting):

    Eigen::Matrix<double, control_dim, control_dim> Lr;
    Eigen::Matrix<double, state_dim, control_dim> Ls;
    for (int i = 1; i < N; i++)
    {
        // Lr: upper-left control block of the Riccati factor L[i] (Cholesky factor of H)
        ::d_cvt_strmat2mat(Lr.rows(), Lr.cols(), &workspace_.L[i], 0, 0, Lr.data(), Lr.rows());
        // Ls: state-control coupling block located below Lr in L[i]
        ::d_cvt_strmat2mat(Ls.rows(), Ls.cols(), &workspace_.L[i], Lr.rows(), 0, Ls.data(), Ls.rows());
        // K[i] = -(Ls * Lr^{-1})'
        TVLQR_K_ [i] = (-Ls * Lr.partialPivLu().inverse()).transpose();
    }

For the first time step, i=0, it was not so straightforward; in fact, we had to reconstruct the feedback gain for the first stage step by step:

    // step 1: reconstruct the Hessian H for the first time step
    Eigen::Matrix<double, control_dim, control_dim> Lr;
    ::blasfeo_unpack_dmat(Lr.rows(), Lr.cols(), &workspace_.L[0], 0, 0, Lr.data(), Lr.rows());
    Eigen::Matrix<double, control_dim, control_dim> H;
    H = Lr.template triangularView<Eigen::Lower>() * Lr.transpose();  // Lr is the Cholesky factor of H

    // step 2: reconstruct the Riccati matrix S[1] for the second time step
    Eigen::Matrix<double, state_dim, state_dim> Lq;
    ::blasfeo_unpack_dmat(Lq.rows(), Lq.cols(), &workspace_.L[1], control_dim, control_dim, Lq.data(), Lq.rows());
    Eigen::Matrix<double, state_dim, state_dim> S;
    S = Lq.template triangularView<Eigen::Lower>() * Lq.transpose();  // Lq is the Cholesky factor of S

    // step 3: compute the TVLQR feedback for the first stage
    TVLQR_K_[0] = -H.inverse() * (P_[0] + B_[0].transpose() * S * A_[0]);

In the above code, A_ and B_ refer to the LTV system matrices at the respective stages, and P_ is the second-order cost cross-term of the LQ OCP. TVLQR_K_ is the desired feedback matrix, and the OCP workspace is denoted workspace_.

Now to the question: with the newest HPIPM release, I can still successfully reconstruct the TVLQR feedback for stages 2, ..., N, but it seems that the content of workspace_.L[0] has changed? I fail to reconstruct the appropriate TVLQR feedback for the first time step.

Sorry for the lengthy introduction -- but would you mind commenting on how the meaning of the first entry of L, i.e. L[0], has changed in the workspace?

Thanks a lot in advance and all the best!

@markusgft
Author

Update: I noticed that the above extraction of the TVLQR feedback also works for the first stage if I set the settings flag ric_alg=1 via d_ocp_qp_ipm_arg_set_ric_alg(&ric_alg, &arg);
So when the square-root Riccati algorithm is active, it seems to work.
Just for my understanding -- what actually happens to the workspace_.L[0] variable when ric_alg=0?

@giaf
Owner

giaf commented Oct 5, 2019

Hi, that's a good hint!

When ric_alg=0, the Riccati recursion is implemented using a different algorithm (the 'classical' Riccati recursion, without the Cholesky factorization of the recursion matrix).
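Roughly, in standard LQR notation (just a sketch, not HPIPM's internal symbols): the classical recursion propagates the recursion matrix P directly,

P[k] = Q[k] + A[k]' * P[k+1] * A[k] - (S[k] + B[k]' * P[k+1] * A[k])' * inv(R[k] + B[k]' * P[k+1] * B[k]) * (S[k] + B[k]' * P[k+1] * A[k])

while the square-root variant propagates a Cholesky factor of P instead, which is what ends up stored in L.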

I'll look into adding a proper getter for the feedback matrix, such that this works regardless of the algorithm used (and even if the algorithm is changed again in the future), following what I wrote in the other issue. I'll get back on that soon!

@markusgft
Author

Hi @giaf,
thanks for clarifying this! I would very much welcome a dedicated getter routine for the feedback matrices, and it may also help attract further users from the DDP/iLQR communities!
Speaking of this, I noticed that a getter method for another quantity would be incredibly helpful, too.
To simplify the discussion, let's take a look at the equations, maybe in this paper here. The relevant equation is (6), where the control input update is computed as

du[n] = K[n] * dx[n] + k[n]

and the "feedback matrix" in question is K[n]. It would be absolutely fantastic if you could also add a getter for the second quantity in Eq. (6), that is k[n], which is basically a feedforward term.
Thanks a lot in advance!!!

@giaf
Owner

giaf commented Oct 9, 2019

Ok, these days we had a group retreat here, and now I'm going on holidays for a few days, so it will need a bit more time.

Btw, what is this piece of code doing? I'm not familiar with Eigen:

TVLQR_K_ [i] = (-Ls * Lr.partialPivLu().inverse()).transpose();

Is Lr.partialPivLu() computing an LU factorization of Lr? In theory, Lr should already contain the lower triangular Cholesky factor, so no extra factorization should be needed.

Anyway, I will study the references you sent me, to be sure to implement exactly what is needed.

@markusgft
Author

Thanks a lot @giaf, there is no urgency here.
You're right about the LU factorization, it is indeed redundant.
Let me know if you have questions; I can also send you different references in case the above-mentioned one is too general.

@giaf
Owner

giaf commented Oct 13, 2019

Hi Markus,

I added the getters, an example of the usage is here
https://github.com/giaf/hpipm/blob/master/examples/c/example_d_ocp_qp_unconstr.c#L375

There are getters for the matrices Lr, Ls and P. The feedback matrix K can then be computed as in your example, as

K = -(Ls * inv(Lr))'

(where the inverse is only mathematical notation and should be replaced by the solution of a triangular system).
If the initial state is eliminated from the optimization variables before calling HPIPM, then at the first stage the matrix K has to be computed externally, as you are already doing. However, the getter for P now returns the correct value of the Riccati recursion matrix for all algorithms in HPIPM.
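In Eigen this could look like the following sketch (dimensions fixed at compile time as in your code; the triangular solve replaces the explicit inverse):

    Eigen::Matrix<double, control_dim, control_dim> Lr;
    Eigen::Matrix<double, state_dim, control_dim> Ls;
    ::d_ocp_qp_ipm_get_ric_Lr(&qp_, &arg_, &workspace_, i, Lr.data());
    ::d_ocp_qp_ipm_get_ric_Ls(&qp_, &arg_, &workspace_, i, Ls.data());
    // K = -(Ls * inv(Lr))' = -Lr^{-T} * Ls': one triangular solve, no inversion
    TVLQR_K_[i] = -Lr.triangularView<Eigen::Lower>().transpose().solve(Ls.transpose());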

There are also getters for the vectors ls and p, but these are only valid in the case of unconstrained MPC problems. In the case of constrained problems, they may return different values depending on the chosen algorithm (i.e. absolute vs. relative formulation, enabling of the predictor-corrector, and so on).
The feedback vector k is computed as

k = -inv(Lr) * ls

(again, the inverse is only mathematical notation and should be replaced by the solution of a triangular system).
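As a sketch in Eigen (note the follow-up comments below regarding the actual getter name and a missing transpose):

    Eigen::Matrix<double, control_dim, 1> ls_vec, k;
    // ls_vec assumed to hold the Riccati vector for stage i (see the getter discussion below)
    // k = -inv(Lr) * ls: again a triangular solve instead of an explicit inverse
    k = -Lr.triangularView<Eigen::Lower>().solve(ls_vec);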

@markusgft
Author

Thanks for the quick implementation @giaf !
I will try to find a free time slot to test the new getters as soon as possible and will get back to you soon!

@markusgft
Author

@giaf I was able to test the new getters and am glad to confirm that they match exactly what I needed! A quick clarification:
there is only a getter d_ocp_qp_ipm_get_ric_lr (not ls), and hence the feedback vector k should be computed as

k = - inv(Lr).transpose() * lr

Is this what you had in mind? In any case, it matches the quantity I cited above!
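In Eigen terms this would be, e.g. (a sketch, reusing the Lr matrix from the getter above):

    Eigen::Matrix<double, control_dim, 1> lr_vec, k;
    ::d_ocp_qp_ipm_get_ric_lr(&qp_, &arg_, &workspace_, i, lr_vec.data());
    // k = -inv(Lr)' * lr: triangular solve with the transposed Cholesky factor
    k = -Lr.triangularView<Eigen::Lower>().transpose().solve(lr_vec);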

@giaf
Owner

giaf commented Oct 22, 2019

Yes sorry, this is what I meant!

@markusgft
Author

Great! Do you think it would make sense to fast-forward the hpipm 0.1.1 release to include these changes?

@giaf
Owner

giaf commented Oct 22, 2019

It doesn't make sense right now, since I'm in the middle of some work on QCQPs. That has rather high priority, so I expect to finalize it in the next few weeks. At that point, I will advance the version to 0.1.2.

For now the best option for you would be to refer to the current commit 806c845

@markusgft
Author

Okay, that sounds good to me! I will get notified about the new release automatically anyway. All the best, and thanks again!

@giaf
Owner

giaf commented Apr 4, 2020

@markusgft I recently did some more work on this: there are now also routines to directly get the feedback gains K and k. All these routines now also work for the constrained case, and their API has changed slightly, in case you want to upgrade.
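Usage is along these lines (just a sketch; see the examples folder for the exact signatures):

    Eigen::Matrix<double, control_dim, state_dim> K;
    Eigen::Matrix<double, control_dim, 1> k;
    ::d_ocp_qp_ipm_get_ric_K(&qp_, &arg_, &workspace_, i, K.data());
    ::d_ocp_qp_ipm_get_ric_k(&qp_, &arg_, &workspace_, i, k.data());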

@markusgft
Author

markusgft commented Apr 4, 2020

Hi @giaf, thanks for the heads up!
I upgraded to your new interface and did some tests (comparison to LQR/iLQR backward pass), with the following findings:

  • the vector that you call k, i.e. the feedforward term, matches the iLQR feedforward for all stages
  • the feedback matrix K is computed identically to the LQR backward pass for all stages except the first.
    I believe the reason for this behaviour is that for stage k=0, HPIPM does not have meaningful entries for the matrix that you call Ls. What I did so far is manually reconstruct the LQR feedback matrix for stage 0, using the following trick:
    // step 1: get Lr for stage 0 and compute its inverse, Lr_inv_ (omitted)
    // ...
    // retrieve the Riccati matrix for stage 1 (we call it S, others call it P)
    Eigen::Matrix<double, state_dim, state_dim> S1;
    ::d_ocp_qp_ipm_get_ric_P(&qp_, &arg_, &workspace_, 1, S1.data());

    // step 2: compute G[0] = U[0] + B[0]' * S1 * A[0]
    Eigen::Matrix<double, control_dim, state_dim> G;
    G = p.U_[0];  // second-order cost cross-term
    G.noalias() += p.B_[0].transpose() * S1 * p.A_[0];

    // step 3: compute K[0] = -H.inverse() * G, with H.inverse() = Lr_inv' * Lr_inv
    K_[0] = -Lr_inv_[0].transpose() * Lr_inv_[0] * G;  // <== the correct LQR feedback gain matrix for stage 0

Do you think it makes sense to adapt the computation of K[0] on the HPIPM side? Thanks in advance!

@giaf
Owner

giaf commented Apr 5, 2020

Hi @markusgft thanks for the feedback!
Could you give me some more details on the setup? E.g., constrained or unconstrained problem? What is the value of nx[0]? (i.e., did you remove x0, or is it kept with equal upper and lower bounds?)

@markusgft
Author

Hi @giaf, sure!
The minimal setup that I was testing was an unconstrained problem, and

nu_[N_] = 0;                      // last input is not a decision variable
nx_[0] = 0;                       // initial state is not a decision variable but given

I hope that helps!

@giaf
Owner

giaf commented Apr 7, 2020

Ok, then this is the reason for K[0]: in HPIPM this is a matrix of size nu[0] x nx[0], which is empty since nx[0] = 0. This is the intended behavior: e.g., A[0] is not even part of the QP formulation, so there is no way I could compute K[0] internally anyway.

If you keep x[0] as an optimization variable and set its upper and lower bounds equal to x0, then you would also get a meaningful K[0]. But this is much slower (and likely less accurate) in your case, since instead of solving an unconstrained MPC problem with a single Riccati sweep, you would solve a constrained MPC problem using an IPM, with one Riccati sweep per IPM iteration.
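As a sketch, that setup would look like this with the d_ocp_qp setters (state_dim and x0 as in your code):

    // keep x[0] as a decision variable ...
    nx_[0] = state_dim;
    // ... and pin it down with equal lower and upper box bounds at stage 0
    int idxbx0[state_dim];
    for (int j = 0; j < state_dim; j++) idxbx0[j] = j;
    ::d_ocp_qp_set_idxbx(0, idxbx0, &qp_);
    ::d_ocp_qp_set_lbx(0, x0, &qp_);
    ::d_ocp_qp_set_ubx(0, x0, &qp_);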

@markusgft
Author

Thank you @giaf for the clarification.
I will stick to the manual computation of the feedback gains for the first stage. Your new feature is very useful in any case, since it allowed me to eliminate many redundant computations.
I saw that you released a tag for hpipm 0.1.2 -- are you going to create a matching tag for blasfeo as well?
