Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BGV]: add lazy relinearization #599

Closed
j2kun opened this issue Apr 5, 2024 · 21 comments · Fixed by #1016
Closed

[BGV]: add lazy relinearization #599

j2kun opened this issue Apr 5, 2024 · 21 comments · Fixed by #1016
Assignees
Labels
dialect: bgv Issues concerning the bgv dialect newcomer project Project ideas for new contributors

Comments

@j2kun
Copy link
Collaborator

j2kun commented Apr 5, 2024

Ideally this is done in a way that the optimization pass can be shared between all relinearizing ops across all schemes. In particular, we should implement a RelinearizeLike trait/interface, and implement the optimization pass against the interface.

@j2kun j2kun added good first issue Good for newcomers dialect: bgv Issues concerning the bgv dialect labels Apr 5, 2024
@ahmedshakill
Copy link
Contributor

Hi @j2kun I would like to work on this. Could you shade some light on how may I approach this. I have gathered some basic knowledge about bgv, openfhe and their conversion dialects.
About lazy-ness: we want to delay the relinearization computation ( and also to find the scope to do so) ? How to introduce laziness to a computation in a pass?
There's an idea of lazy loading in mlir::BytecodeReader. But I got lost and couldn't extract anything helpful from there.

@j2kun
Copy link
Collaborator Author

j2kun commented Jun 9, 2024

Sure! The basic idea is that an arith.muli lowers to the ops bgv.mul and bgv.relin. The relin is necessary to return the key basis from (1, s, s^2) to (1, s) after the bgv.mul, but sometimes the relin op can be delayed if the output of the mul is combined with other values in that same key basis (1, s, s^2).

For example, if you multiply two pairs of values and then add their results, you can do it with one relin op after the addition, rather than two relin ops, one after each mul.

The implementation of this feature would, I believe, require some relatively straightforward tablegen patterns. Look for opportunities to push a relin op further back, by matching on ops whose return values have a ciphertext type with the same key basis attribute. If you'd like, we could work through some examples in detail in this issue thread.

@AlexanderViand-Intel
Copy link
Collaborator

While looking at bgv.mul, I noticed it (incorrectly) wouldn't allow having sequential multiplications without relineraization in-between. With the fix in (currently PR #733), I'd suggest the simplest way to realize this is to proceed as follows

  • make the secret->bgv pass lowering of multiplication not emit any relinearization.
  • have a new "lazy relin" pass with two simple patterns that rewrite mul(mul1, _) and mul(_, mul1) to mul(relin(mul1),_ and mul(_, relin(mul1)), respectively. As Jeremy suggested, these should be pretty straightforward DDR patterns.

@j2kun
Copy link
Collaborator Author

j2kun commented Jun 10, 2024

I didn't realize that relinearizing after repeated multiplications is a thing people do. I assume that:

  • The key basis for that would be (1, s, s^2, s^3, s^4)
  • We'd have to implement more generalized lowerings of relinearize for all these new possible key bases (and, e.g., maybe some backends don't support it? Does OpenFHE allow this?)

@AlexanderViand-Intel
Copy link
Collaborator

AlexanderViand-Intel commented Jun 10, 2024

I didn't realize that relinearizing after repeated multiplications is a thing people do.

It's not really something people do outside of very specific use cases. I think quite a few of the common libraries don't even implement "general" relinearization, because it'd blow up the evaluation key size so much.
EDIT: I do think we should be able to model it in the IR, though, and not just because it makes lazy relin easier.

Coming back to the pass: My suggestion isn't necessarily to emit final programs with repeated multiplications (i.e., the openFHE pipline would always contain --lazy-relin ). It's just that, with MLIR, it seems a lot easier to take a "not yet relin-managed" program and introduce only the needed operations and harder to remove the redundant ones.

@j2kun
Copy link
Collaborator Author

j2kun commented Jun 10, 2024

I do worry a little bit that the key basis in the attributes would explode for some mul-heavy IRs, but I'm down to try it.

Another approach would be to take any IR, using relins or not, and re-relinearize-ify it. That would be the most general approach, but couldn't be done with a simple set of patterns. I bet we could formulate it as an ILP if it gets to that.

@ahmedshakill
Copy link
Contributor

If you'd like, we could work through some examples in detail in this issue thread.

Yes, working through some example would be helpful for me.

I was thinking about this pattern

  • mul(mul1, _) and mul(_, mul1) to mul(relin(mul1),_ and mul(_, relin(mul1))

do we have a possibility to reach at something like this mul3(_, relin(mul2(_, relin(mul1))))

@AlexanderViand-Intel
Copy link
Collaborator

do we have a possibility to reach at something like this mul3(_, relin(mul2(_, relin(mul1))))

I think so: if we had mul3(_, mul2(_, mul1(_,_))) to begin with, the RHS pattern would match on mul2 to give us mul3(_, mul2(_,relin(mul1(_,_))) and also on mul3 to then give us mul3(_,relin(mul2(_, relin(mul1(_,_))). Which is (I think?) what we want, right? This also works if the pattern is first applied on mul3 and then mul2, and it should also work if both operands are multiplications.

off-topic question: is it better to add a third pattern mul(mul1, mul2) for this case, or to just rely on the sequential application of the LHS/RHS patterns?

@ahmedshakill
Copy link
Contributor

ahmedshakill commented Jun 11, 2024

mul3(_ ,relin( mul2( relin(lmul1(x,y)), relin(rmul1(y,x)))))

x and y are not operations.

Here I am considering both operands of mul2 are multiplications. Inside mul2 the left and right operands would have same key basis I guess. So at this stage could we optimize the above to
mul3(_ ,relin(mul2( lmul1(x,y), rmul1(y,x))))
such that we relin only after mul2 ?

@AlexanderViand-Intel
Copy link
Collaborator

So at this stage could we optimize the above to mul3(_ ,relin(mul2( lmul1(x,y), rmul1(y,x)))) such that we relin only after mul2 ?

I don't think that'd be valid (assuming we never want to exceed ctxt-degree three) as the output of mul2 would be of degree six already.

@j2kun
Copy link
Collaborator Author

j2kun commented Jun 11, 2024

I think perhaps the simplest initial test case would be

%x = bgv.mul %0, %1
%x_relin = bgv.relinearize %x
%y = bgv.mul %3, %4
%y_relin = bgv.relinearize %y
%z = bgv.add %x_relin, %y_relin

Which should map to

%x = bgv.mul %0, %1
%y = bgv.mul %3, %4
%z = bgv.add %x, %y
%z_relin = bgv.relinearize %z

So long as %x_relin and %y_relin have no other uses. This example could be extended to more muls and a reducing add.

@j2kun
Copy link
Collaborator Author

j2kun commented Jun 11, 2024

Looking back on https://www.jeremykun.com/2023/11/15/mlir-a-global-optimization-and-dataflow-analysis/#an-ilp-optimization-pass, I'm not seeing any real obstacle to adapting that ILP to this problem. Basically, after each BGV op in the IR, you add a "InsertRelin" variable (instead of "InsertReduceNoise"), and a "KeyBasisDegree" variable instead of a "NoiseAt" variable. The constraints are simpler because KeyBasisDegree is additive in each mul and unchanged by other ops, and we can constrain the KeyBasisDegree of each starting SSA value and any decrypted/returned SSA values to be 1. There would be no upper bound on the maximum allowable key basis degree, though in practice we may want to make this configurable. The objective would be to minimize the sum of InsertRelin variable values.

@ahmedshakill what do you think, are you up for implementing an ILP? The existing implementation is here

@ahmedshakill
Copy link
Contributor

Yeah, I am up for it. The task is interesting and the resources (blogs and discussions) are really good. I would hate to lose this opportunity.
As a newbie, I would need a bit more time though than standard.

@AlexanderViand-Intel
Copy link
Collaborator

Looking back on https://www.jeremykun.com/2023/11/15/mlir-a-global-optimization-and-dataflow-analysis/#an-ilp-optimization-pass, I'm not seeing any real obstacle to adapting that ILP to this problem.

Mh, wouldn't you need to keep all the noise-related stuff in the problem, still? Otherwise, you could end-up with a trivial solution that lets the degree explode throughout the computation and only relinearizes at the very end, which would be both very slow and absolutely catastrophic for noise growth.

Maybe enforcing that no relin is "higher degree" than 3->2 is sufficient to make it work? I vaguely recall papers looking into things more optimal than the basic layz relin, but I'm not sure if they showed sufficient performance gains to make it worthwhile.

@j2kun
Copy link
Collaborator Author

j2kun commented Jun 12, 2024

Otherwise, you could end-up with a trivial solution that lets the degree explode throughout the computation and only relinearizes at the very end, which would be both very slow and absolutely catastrophic for noise growth.

That is true. You could easily force it to have max degree 3, or add a cost to each relin based on the input degree.

I didn't think relin affected noise growth though. Can't you modulus switch in a higher key basis?

@j2kun
Copy link
Collaborator Author

j2kun commented Jun 24, 2024

@ahmedshakill Any progress or help needed on this topic? Feel free to post a draft PR and I can take a look and offer some suggestions.

@ahmedshakill
Copy link
Contributor

I was taking a bit of time. I'll update you on my progress and post a pr asap.

@j2kun j2kun added newcomer project Project ideas for new contributors and removed good first issue Good for newcomers labels Jun 30, 2024
@ahmedshakill
Copy link
Contributor

Hi @j2kun it took me a while to respond. I have created a PR. In the current state it holds a initial template for the pass. Please have a look and let me know about your thoughts.

@AlexanderViand-Intel
Copy link
Collaborator

AlexanderViand-Intel commented Jul 3, 2024

Btw, I realized that my "simple DDR Pattern"-based approach probably wouldn't work, as it wouldn't correctly handle something like:

// Assume arg0, arg1, arg2, arg3 are "appropriately typed" function parameters
%0 = bgv.mul(%arg0, %arg1)
%1 = bgv.add(%0, %arg2)
%2 = bgv.mul(%1, %arg3)

I.e., if there are any kind of non-mul operations between the multiplications, the patterns wouldn't catch them.
However, @Maokami's new Mult-Depth Analysis should be the answer to these cases: #760

Even if you're going for a more ILP/optimization based approach, I'd expect the mult-depth analysis to come in handy :)

@ahmedshakill
Copy link
Contributor

ahmedshakill commented Jul 11, 2024

what do you think, are you up for implementing an ILP?

I think, now I understand the weight of the above line. Here is the current progress status

  • The lazy relin pass is visible in heir-opt.
  • Looked into google or-tools tutorial on LP and MIP.
  • Need an in-depth look into Jeremy's ILP blog.
  • Translate the minimization problem into an ILP one.
  • Introduce trait/interface

@j2kun
Copy link
Collaborator Author

j2kun commented Jul 11, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dialect: bgv Issues concerning the bgv dialect newcomer project Project ideas for new contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants