NPU utilization for maths problems #51

Open
KhDenys opened this issue Jun 11, 2024 · 9 comments
Labels: good first issue (Good for newcomers)

Comments

@KhDenys commented Jun 11, 2024

Hi, I'm interested in using NPU processors / AI accelerators for pure maths problems, and I wonder whether I can use this library in such scenarios. Are there plans to develop the API further (add more functions that can be performed on the NPU, build pipelines, etc.)?

Thanks

alessandropalla added the good first issue label on Jun 11, 2024
@alessandropalla (Contributor) commented:

I was thinking of accelerating some n-body problems but never got the time. What kind of problems are you most interested in? Do you have any API in mind that could facilitate your work? I'm very interested in this use case.

@KhDenys (Author) commented Jun 13, 2024

@alessandropalla I've worked with mathematical models of quantum dots, where in theory infinite-dimensional matrices are used to describe a particle/quantum system. In practice we use 16x16 matrices, since the computations already take anywhere from a few minutes to a few hours. AFAIK computational number theory also relies on huge matrices and long (arbitrary-precision) arithmetic.

So it would be great to have something like the numpy API, plus the ability to create computation pipelines that run on the NPU without context switching.

Thanks

@alessandropalla (Contributor) commented:

So if it is matrix-matrix multiplication that you want to accelerate, we can help you. The NPU can give you a massive speedup over numpy for such operations. Here, for example, is the code to compare the NPU vs numpy on a [1024 x 1024] x [1024 x 1024] matrix multiplication:

import intel_npu_acceleration_library as npu_lib
import numpy as np
import tqdm
import time

def npu_vs_numpy(inC, outC, batch, n_iters=500):
    npu_times_ms = []
    np_times_ms = []

    # Random fp16 activations X and weights W
    X = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
    W = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

    # Pre-built NPU matmul kernel for this shape
    mm = npu_lib.backend.MatMul(inC, outC, batch)

    # Run the actual computation
    print(f"Running {n_iters} iterations of matmul with shape ({batch}, {inC}) x ({outC}, {inC})")
    print("Running NPU...")
    for _ in tqdm.tqdm(range(n_iters)):
        npu_start = time.perf_counter()
        mm.run(X, W)
        npu_stop = time.perf_counter()
        npu_times_ms.append((npu_stop - npu_start) * 1000)

    # numpy needs the weights transposed to compute the equivalent product
    W_T = W.T

    print("Running Numpy...")
    for _ in tqdm.tqdm(range(n_iters)):
        np_start = time.perf_counter()
        np.matmul(X, W_T)
        np_stop = time.perf_counter()
        np_times_ms.append((np_stop - np_start) * 1000)

    print(f"NPU runtime: {np.mean(npu_times_ms):.2f} ms ± {2 * np.std(npu_times_ms):.2f} ms")
    print(f"Numpy runtime: {np.mean(np_times_ms):.2f} ms ± {2 * np.std(np_times_ms):.2f} ms")

npu_vs_numpy(1024, 1024, 1024, n_iters=50)

NPU runtime: 1.66 ms ± 0.00 ms
Numpy runtime: 2395.62 ms ± 199.42 ms

As you can see, the speedup can be very significant.

If you need to accelerate other operations, you can use the NNFactory class to build your pipeline; a rough sketch follows below. Let me know if I can help you.
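As an illustration of what such a pipeline could look like, here is a hypothetical sketch. The NNFactory method names used below (parameter, matmul, compile, run) are assumptions patterned on the MatMul kernel above, not confirmed API; please consult the library's backend documentation for the real signatures.

import intel_npu_acceleration_library as npu_lib
import numpy as np

# Hypothetical sketch only: the NNFactory method names below (parameter,
# matmul, compile, run) are assumptions and may differ from the actual API.
def build_matmul_pipeline(batch, inC, outC):
    factory = npu_lib.backend.NNFactory()

    # Declare the graph inputs that will be fed at run time
    x = factory.parameter((batch, inC), np.float16)
    w = factory.parameter((outC, inC), np.float16)

    # Chain operations here so intermediate results never leave the NPU
    factory.matmul(x, w)

    # Compile the whole graph into a single NPU executable
    factory.compile()
    return factory

pipeline = build_matmul_pipeline(1024, 1024, 1024)
X = np.random.uniform(-1, 1, (1024, 1024)).astype(np.float16)
W = np.random.uniform(-1, 1, (1024, 1024)).astype(np.float16)
result = pipeline.run(X, W)

The idea is the same as in the MatMul benchmark: build and compile the graph once, then call run repeatedly with host arrays.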

@KhDenys (Author) commented Jun 14, 2024

@alessandropalla Thanks for your example, I think it is very useful. Unfortunately, however, I'm only planning to buy a laptop or PC with a next-gen Intel CPU (with 48 TOPS), so I can't test the performance difference yet.

Also, I have no clue how to implement all the other matrix operations (add, subtract, inverse, eigenvalues) or how I can build them with the API. Could you point me to a resource where I can find more information?

@alessandropalla (Contributor) commented:

Can you give an example of the numpy code that you'd like to get accelerated?

@KhDenys (Author) commented Jun 17, 2024

@alessandropalla Basically, I want to implement long arithmetic for integers (the Schönhage-Strassen algorithm); it's for pure math problems. The quantum problems require essentially all of numpy's linear algebra module.

Again, I can implement everything I need myself once it is clear to me how to use the NPU for matrix operations; currently only matmul has been described. Would it be possible to have such an API?

I really appreciate your help, thank you!

@alessandropalla (Contributor) commented:

I think there is a very simple and elegant solution to this. Numpy allows you to write a custom array dispatcher (https://numpy.org/doc/stable/user/basics.dispatch.html#) that would allow the NPU to be used from pure numpy. I think it will be very handy :) We have already done something similar for torch, which has the same mechanism. I'll keep you updated.
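For illustration, here is a minimal sketch of that dispatch mechanism, following the numpy documentation linked above. The NpuArray wrapper and npu_inv handler are hypothetical names used only to show the shape of the approach; a real integration would call into intel_npu_acceleration_library kernels instead of plain numpy.

import numpy as np

HANDLED_FUNCTIONS = {}

def implements(np_function):
    # Register an NPU-backed implementation for a numpy API function
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator

class NpuArray:
    # Array-like wrapper that redirects selected numpy calls to custom kernels
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # unhandled functions are rejected by numpy
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.linalg.inv)
def npu_inv(a):
    # A real implementation would run on the NPU; numpy is a stand-in here
    return NpuArray(np.linalg.inv(a.data))

x = NpuArray(np.random.uniform(-1, 1, (16, 16)))
y = np.linalg.inv(x)  # dispatches to npu_inv via __array_function__
print(type(y))        # <class '__main__.NpuArray'>

One caveat: ufuncs such as np.add or np.matmul dispatch through the separate __array_ufunc__ protocol, so both protocols would need to be implemented to cover the full numpy surface.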

@KhDenys (Author) commented Jul 11, 2024

@alessandropalla Sounds very interesting, thanks for the update. Looking forward to being able to use the NPU natively in numpy!

@KhDenys (Author) commented Nov 21, 2024

@alessandropalla Hey, just a kind reminder about this topic 😊
