Multi-core compiler evaluation #8
Hi! -DUSE_GALOIS=ON is indeed the right way to enable the multi-core support. I was able to reproduce the behavior on my end (also did not see multiple cores used in a program with ample concurrency), so it seems we have a regression. I'll investigate, thanks for filing the issue!
Turns out Galois was defaulting to 1 thread and we were missing a call to [...]. This issue should be fixed in [...].
Thank you very much for the support @olsaarik! I cloned EVA just now and installed it with -DUSE_GALOIS=ON, but when I run my code I don't see any parallelism in place when looking at htop. My code is this:

```python
from eva import EvaProgram, Input, Output, evaluate
from eva import evaluate
from eva.ckks import CKKSCompiler
from eva.seal import generate_keys
from eva.metric import valuation_mse
from common import *
from random import uniform
import numpy as np
import unittest
import math

batch = 2
dim = 32
dim1 = 32

def linear(x, y):
    return np.matmul(x, y)  # + bias

def attention(x):
    temp = np.array([i for i in range(dim*dim1)]).reshape(dim, dim1) / 1000
    query = linear(x, temp)
    key = linear(x, temp)
    value = linear(x, temp)
    return query, key, value

def self_attention(query, key, value):
    temp = linear(query, key)
    temp = linear(temp, np.arange(dim*dim).reshape(dim, dim))
    temp = linear(temp, value)
    return temp

def norm(x):
    return linear(x, np.arange(dim*dim).reshape(dim, dim))

def ff(x):
    return linear(linear(x, np.arange(dim*dim).reshape(dim, dim)/1000), np.arange(dim*dim).reshape(dim, dim)/1000)

def classifier(x):
    v = np.arange(dim*2).reshape(dim, 2)/1000
    w = np.arange(2*2).reshape(2, 2)/1000
    return np.dot(np.dot(x, v), w)

transformer = EvaProgram('transformer', vec_size=dim*dim1)
with transformer:
    x = np.array([Input(f'x{i}') for i in range(batch*dim*dim1)]).reshape(batch, dim, dim1)
    print(x.shape)
    query, key, value = attention(x)
    attn = self_attention(query, key, value)
    out = norm(attn)
    out = ff(out)
    out = out ** 2
    out = norm(out)
    out = out ** 2
    out = classifier(out)
    print(out.shape)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                Output(f'y{i+j}', out[i][j][k])

transformer.set_input_scales(32)
transformer.set_output_ranges(32)

if __name__ == "__main__":
    compiler = CKKSCompiler(config={'security_level': '128', 'warn_vec_size': 'false'})
    compiled, params, signature = compiler.compile(transformer)
    print("poly degree: ", params.poly_modulus_degree)
    print("prime bits: ", params.prime_bits)
    print("rotations: ", params.rotations)
```

I also tried to encapsulate the logic written inside the EvaProgram in a loop, as suggested in #6, but only one core was being used. Is there anything wrong? Do you have any working example on multiple cores? It would be very good to have an example of multi-core usage for EVA when trying to obtain the required CKKS parameters.
Thanks for clarifying your use case. Multi-core evaluation is currently only used for encrypting inputs and the actual evaluation of the program, i.e., [...].

I took a look at your program (nice re-use of NumPy by the way, I didn't know you could do that!) and it is larger than the programs we've used with EVA before. I exported it into DOT by writing out [...]. I tried playing with the dimensions, and on my own laptop with 8 GB of RAM [...].

I'll dig into this more to figure out what is slowing down the compilation for this program. Thanks for the very interesting example!
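For readers following along, here is a minimal sketch of where that parallelism would apply, based on the usage pattern in EVA's README (compile, generate_keys, encrypt, execute, decrypt). The toy polynomial program, its name, vector size, and scales are placeholders, not part of this thread:

```python
from eva import EvaProgram, Input, Output
from eva.ckks import CKKSCompiler
from eva.seal import generate_keys

# Toy program; name, vec_size, and scales are arbitrary for this sketch.
poly = EvaProgram('poly', vec_size=1024)
with poly:
    x = Input('x')
    Output('y', 3*x**2 + 5*x - 2)

poly.set_input_scales(30)
poly.set_output_ranges(30)

# Compilation and parameter selection: per the comment above, this step
# does not use multiple cores.
compiler = CKKSCompiler(config={'security_level': '128'})
compiled, params, signature = compiler.compile(poly)

public_ctx, secret_ctx = generate_keys(params)

inputs = {'x': [i / 1024 for i in range(compiled.vec_size)]}

# Encrypting inputs and executing the program are the steps that can use
# multiple cores when EVA is built with -DUSE_GALOIS=ON.
enc_inputs = public_ctx.encrypt(inputs, signature)
enc_outputs = public_ctx.execute(compiled, enc_inputs)
outputs = secret_ctx.decrypt(enc_outputs, signature)
```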
Good morning/evening @olsaarik , thank you so much for the detailed explanation! Yes indeed I tried to run the code with lower sizes like 4 or 8 exactly and it was fast, I got the parameters but then of course they are not good for way larger sizes like 32x32 or more. Unfortunately I don't have much experience / didn't find much documentation with how CKKS parameters are chosen, or better: compared to the other HE scheme provided in SEAL, BFV, I find it not very straightforward on how should I search/apply some kind of criteria for the right prime coefficients' bit sizes (and the number of them) given a certain polynomial modulus degree. So, that's the reason I was leveraging on EVA, which is indeed really so great! Ok that's great, I thank you very much! |
I am also hitting performance limits at the compilation step. With a multiplicative depth of only 1, compile time seems to go up quadratically with the number of operations in the program. Since the compilation is just a constant number of traversals through the program, that is unexpected, right?

EDIT: To clarify, the multiplications are all plaintext. There is also a similar number of ciphertext additions; additive depth is 2.
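For reference, a rough sketch of how one might measure that scaling, reusing the CKKSCompiler API already shown in this thread. The benchmark program shape (a chain of plaintext multiplications and ciphertext additions at multiplicative depth 1), the sizes, and the names are invented for illustration, and timings will of course vary by machine:

```python
import time
from eva import EvaProgram, Input, Output
from eva.ckks import CKKSCompiler

def build_program(n_ops, vec_size=4096):
    # Chain of plaintext multiplications and ciphertext additions,
    # keeping multiplicative depth at 1 as described above.
    prog = EvaProgram(f'bench_{n_ops}', vec_size=vec_size)
    with prog:
        xs = [Input(f'x{i}') for i in range(n_ops)]
        acc = 2 * xs[0]                      # plaintext multiply (constant * ciphertext)
        for i, x in enumerate(xs[1:], start=3):
            acc = acc + i * x                # plaintext multiply, then ciphertext add
        Output('y', acc)
    prog.set_input_scales(30)
    prog.set_output_ranges(30)
    return prog

compiler = CKKSCompiler(config={'security_level': '128', 'warn_vec_size': 'false'})
for n_ops in (100, 200, 400, 800):
    prog = build_program(n_ops)
    start = time.perf_counter()
    compiler.compile(prog)
    print(n_ops, 'ops:', round(time.perf_counter() - start, 2), 's')
```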
It seems I am able to work around this issue by converting all Python-land constants into EVA Inputs with is_encrypted=False. For example, if x is an encrypted Input, instead of doing 2*x, I do p*x with p = Input("p", is_encrypted=False) and then set p to 2 at evaluation time.
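A minimal sketch of that workaround, reusing the same API as the examples above. The program name, vector size, scales, and input values are placeholders, and it assumes (following EVA's README flow) that unencrypted inputs are also passed through public_ctx.encrypt, which encodes them according to the signature:

```python
from eva import EvaProgram, Input, Output
from eva.ckks import CKKSCompiler
from eva.seal import generate_keys

prog = EvaProgram('scale_workaround', vec_size=8)
with prog:
    x = Input('x')                          # encrypted input (the default)
    p = Input('p', is_encrypted=False)      # plaintext input bound at evaluation time
    Output('y', p * x)                      # instead of the literal 2 * x

prog.set_input_scales(30)
prog.set_output_ranges(30)

compiler = CKKSCompiler(config={'security_level': '128', 'warn_vec_size': 'false'})
compiled, params, signature = compiler.compile(prog)
public_ctx, secret_ctx = generate_keys(params)

# The "constant" 2 is supplied here instead of being baked into the program.
inputs = {'x': [float(i) for i in range(8)], 'p': [2.0] * 8}
enc_inputs = public_ctx.encrypt(inputs, signature)
enc_outputs = public_ctx.execute(compiled, enc_inputs)
outputs = secret_ctx.decrypt(enc_outputs, signature)
```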
Hello!
Really great work with EVA!
I was trying to use it but I was unable to scale it efficiently in time. I checked via htop on Ubuntu 20 that only one core was being used.
Now, I am on a VM, so if that is the problem it's okay; but otherwise, would it be possible to let the code being compiled have multi-core support?
I tried to build EVA with the
cmake -DUSE_GALOIS=ON .
command, but nothing changed. Thank you for your time!