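TenSEAL Benchmark
=================
Consider the use case of a 4-layer neural network.  
The input is 300-dimensional and the output is 16-dimensional.  
The rough architecture reduces the dimensions as (300, 128, 64, 32, 16).  
The batch size is 128, and the dataset is assumed to cover at most 30 years of data.  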
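A single feedforward pass therefore requires the following operations:
- a 128x300x128 tensor product (a 128x300 matrix times a 300x128 matrix) and a 128x128 matrix addition
- a 128x128x64 tensor product and a 128x64 matrix addition
- a 128x64x32 tensor product and a 128x32 matrix addition
- a 128x32x16 tensor product and a 128x16 matrix addition

These operations are repeated for every batch in the dataset, so the benchmarks below time the basic additions and products with and without encryption.  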
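As a plaintext reference for the shapes listed above, the following NumPy sketch runs one feedforward pass and counts the multiply-accumulate operations per layer. The weights and biases are random placeholders, activation functions are left out because the list only mentions products and sums, and the names `layer_dims`, `batch_size`, and `total_macs` are illustrative rather than taken from the notebook.

```python
import numpy as np

# Assumed from the description: layer widths and batch size.
layer_dims = (300, 128, 64, 32, 16)
batch_size = 128

rng = np.random.default_rng(0)
x = rng.random((batch_size, layer_dims[0]))  # one batch of inputs

total_macs = 0
for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:]):
    w = rng.random((d_in, d_out))  # placeholder weights
    b = rng.random(d_out)          # placeholder bias, broadcast over the batch
    x = x @ w + b                  # matrix product followed by matrix addition
    macs = batch_size * d_in * d_out
    total_macs += macs
    print(f"{batch_size}x{d_in}x{d_out}: {macs} multiply-accumulates")

print(f"output shape: {x.shape}, total multiply-accumulates: {total_macs}")
```

Under these assumptions a single pass comes to 6,291,456 multiply-accumulates, dominated by the first layer.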


In [1]:
!nvidia-smi

Wed Mar  6 23:43:59 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.24       Driver Version: 528.24       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ... WDDM  | 00000000:05:00.0  On |                  N/A |
|  0%   45C    P8    16W / 220W |    788MiB /  8192MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
%pip install tenseal

In [2]:
import time
import numpy as np
import matplotlib.pyplot as plt
import tenseal as ts
from contextlib import contextmanager

In [3]:
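@contextmanager
def timer(name):
    """Print the wall-clock time spent inside the `with` block."""
    t0 = time.perf_counter()  # monotonic and higher resolution than time.time()
    try:
        yield
    finally:
        print(f'[{name}] done in {time.perf_counter() - t0:.2f} s')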

In [4]:
# Create the TenSEAL security context
def create_ctx():
    """Helper for creating the CKKS context.
    CKKS params:
        - Polynomial degree: 8192.
        - Coefficient modulus size: [40, 21, 21, 21, 21, 21, 21, 40].
        - Scale: 2 ** 21.
        - The setup requires the Galois keys for the rotations used by encrypted dot products.
    """
    poly_mod_degree = 8192
    coeff_mod_bit_sizes = [40, 21, 21, 21, 21, 21, 21, 40]
    ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_mod_degree, -1, coeff_mod_bit_sizes)
    ctx.global_scale = 2 ** 21
    ctx.generate_galois_keys()
    ctx.generate_relin_keys()
    return ctx
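A rough sanity check on these parameters can be done with plain arithmetic. The sketch below assumes the commonly cited limit of 218 total coefficient-modulus bits for polynomial degree 8192 at 128-bit security (the default bound in Microsoft SEAL; it is not stated in this notebook). The helper name `summarize_ckks_params` is hypothetical.

```python
def summarize_ckks_params(poly_mod_degree, coeff_mod_bit_sizes, max_total_bits):
    """Return basic quantities derived from a CKKS parameter set."""
    total_bits = sum(coeff_mod_bit_sizes)
    return {
        # CKKS packs up to N/2 values into one ciphertext.
        "slots": poly_mod_degree // 2,
        "total_bits": total_bits,
        # Assumption: max_total_bits is the security bound for this degree.
        "within_limit": total_bits <= max_total_bits,
        # Intermediate primes; roughly the number of rescalings available.
        "rescale_levels": len(coeff_mod_bit_sizes) - 2,
    }

summary = summarize_ckks_params(
    8192, [40, 21, 21, 21, 21, 21, 21, 40], max_total_bits=218
)
print(summary)
```

With six intermediate 21-bit primes, the parameter set leaves room for roughly six consecutive rescalings, which is the budget any chain of encrypted multiplications has to fit into.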

In [5]:
max_log_scale = 3
iter_num = 10
ctx = create_ctx()
vecs = list()
mats = list()
tens = list()

In [6]:
for log_scale in range(max_log_scale):
    n = 10 ** log_scale  # side length: 1, 10, 100
    vec = np.random.rand(n)
    vecs.append(vec)
    mat = np.random.rand(n, n)
    mats.append(mat)
    # ten = np.random.rand(n, n, n)
    # tens.append(ten)

In [7]:
for i in range(max_log_scale):
    with timer(f"raw-{10**i}-size vec + vec {iter_num} times"):
        for _ in range(iter_num):
            vecs[i] + vecs[i]
    with timer(f"raw-{10**i}-size mat + mat {iter_num} times"):
        for _ in range(iter_num):
            mats[i] + mats[i]
    # with timer(f"raw-{10**i}-size ten + ten {iter_num} times"):
    #     for _ in range(iter_num):
    #         tens[i] + tens[i]
    with timer(f"raw-{10**i}-size vec * vec {iter_num} times"):
        for _ in range(iter_num):
            vecs[i] @ vecs[i]
    with timer(f"raw-{10**i}-size mat * mat {iter_num} times"):
        for _ in range(iter_num):
            mats[i] @ mats[i]
    # with timer(f"raw-{10**i}-size ten * ten {iter_num} times"):
    #     for _ in range(iter_num):
    #         tens[i] @ tens[i]

[raw-1-size vec + vec 10 times] done in 0.00 s
[raw-1-size mat + mat 10 times] done in 0.00 s
[raw-1-size vec * vec 10 times] done in 0.00 s
[raw-1-size mat * mat 10 times] done in 0.00 s
[raw-10-size vec + vec 10 times] done in 0.00 s
[raw-10-size mat + mat 10 times] done in 0.00 s
[raw-10-size vec * vec 10 times] done in 0.00 s
[raw-10-size mat * mat 10 times] done in 0.00 s
[raw-100-size vec + vec 10 times] done in 0.00 s
[raw-100-size mat + mat 10 times] done in 0.00 s
[raw-100-size vec * vec 10 times] done in 0.00 s
[raw-100-size mat * mat 10 times] done in 0.00 s
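
Every plaintext timing above rounds to 0.00 s at two decimal places, so these numbers cannot be compared against the encrypted runs. One possible way to get usable baselines is `timeit` from the standard library, which repeats each call many times. The helper `time_per_call_us` and the repetition counts are assumptions for illustration and are not part of the original benchmark.

```python
import timeit
import numpy as np

def time_per_call_us(fn, number=100, repeat=5):
    """Best-of-`repeat` average time per call, in microseconds."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e6

mat = np.random.rand(100, 100)
add_us = time_per_call_us(lambda: mat + mat)
matmul_us = time_per_call_us(lambda: mat @ mat)
print(f"100x100 mat + mat: {add_us:.1f} us per call")
print(f"100x100 mat @ mat: {matmul_us:.1f} us per call")
```

Taking the minimum over several repeats reduces the influence of background load, which matters when a single call takes only microseconds.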


In [8]:
vecs = [ts.ckks_vector(ctx, vec) for vec in vecs]
mats = [ts.ckks_tensor(ctx, mat) for mat in mats]
# tens = [ts.ckks_tensor(ctx, ten) for ten in tens]

In [9]:
for i in range(max_log_scale):
    with timer(f"tenseal-{10**i}-size vec + vec {iter_num} times"):
        for _ in range(iter_num):
            vecs[i] + vecs[i]
    with timer(f"tenseal-{10**i}-size mat + mat {iter_num} times"):
        for _ in range(iter_num):
            mats[i] + mats[i]
    # with timer(f"tenseal-{10**i}-size ten + ten {iter_num} times"):
    #     for _ in range(iter_num):
    #         tens[i] + tens[i]
    with timer(f"tenseal-{10**i}-size vec * vec {iter_num} times"):
        for _ in range(iter_num):
            vecs[i].dot(vecs[i])
    with timer(f"tenseal-{10**i}-size mat * mat {iter_num} times"):
        for _ in range(iter_num):
            mats[i].dot(mats[i])
    # with timer(f"tenseal-{10**i}-size ten * ten {iter_num} times"):
    #     for _ in range(iter_num):
    #         tens[i].dot(tens[i])

[tenseal-1-size vec + vec 10 times] done in 0.01 s
[tenseal-1-size mat + mat 10 times] done in 0.00 s
[tenseal-1-size vec * vec 10 times] done in 0.12 s
[tenseal-1-size mat * mat 10 times] done in 0.12 s
[tenseal-10-size vec + vec 10 times] done in 0.00 s
[tenseal-10-size mat + mat 10 times] done in 0.41 s
[tenseal-10-size vec * vec 10 times] done in 0.49 s
[tenseal-10-size mat * mat 10 times] done in 17.27 s
[tenseal-100-size vec + vec 10 times] done in 0.00 s
[tenseal-100-size mat + mat 10 times] done in 47.75 s
[tenseal-100-size vec * vec 10 times] done in 1.04 s
[tenseal-100-size mat * mat 10 times] done in 16453.19 s
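
The encrypted matrix timings above can be checked against simple scaling laws. The sketch below takes the recorded totals for sizes 10 and 100, estimates the scaling exponent, and extrapolates to the feedforward pass described at the top. The extrapolation assumes that cost stays proportional to the number of scalar multiply-accumulates, that every layer uses ciphertext-by-ciphertext products, and that the same machine is used. None of this was measured here, so the result is an order-of-magnitude guess at best.

```python
import math

iter_num = 10
# Recorded totals (seconds for iter_num repetitions) from the output above.
mat_add = {10: 0.41, 100: 47.75}
mat_mul = {10: 17.27, 100: 16453.19}

def scaling_exponent(times):
    """Exponent p in time ~ n**p, fitted from the n=10 and n=100 points."""
    return math.log(times[100] / times[10]) / math.log(100 / 10)

add_exp = scaling_exponent(mat_add)
mul_exp = scaling_exponent(mat_mul)

# Seconds per scalar multiply-accumulate in one 100x100 encrypted product.
sec_per_mac = mat_mul[100] / iter_num / 100**3

# Multiply-accumulates in one pass: batch 128 through (300, 128, 64, 32, 16).
dims = (300, 128, 64, 32, 16)
macs = sum(128 * d_in * d_out for d_in, d_out in zip(dims[:-1], dims[1:]))
est_hours = macs * sec_per_mac / 3600

print(f"mat + mat scaling exponent: {add_exp:.2f}")
print(f"mat * mat scaling exponent: {mul_exp:.2f}")
print(f"estimated time for one encrypted pass: {est_hours:.1f} h")
```

The fitted exponents come out close to 2 for addition and close to 3 for multiplication, which is what element-wise processing of a naive matrix product would give. Under the stated assumptions, one encrypted feedforward pass would take on the order of a few hours.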
