Benchmark OpenBLAS and consider it for numpy #3763

Open
rth opened this issue Apr 12, 2023 · 4 comments

rth commented Apr 12, 2023

It would be interesting to benchmark OpenBLAS (for instance, for matrix multiplication). Depending on the results, we could consider using it with numpy, though it would be a bit of a size/performance trade-off.
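
For reference, a minimal sketch of the kind of timing this involves (matrix size and trial count here are purely illustrative; the actual benchmark scripts are posted further down in this thread):

import timeit
import numpy as np

n = 1000
A = np.random.rand(n, n).astype(np.float32)
B = np.random.rand(n, n).astype(np.float32)

# Average time per matrix product, in milliseconds
print(1e3 * timeit.timeit(lambda: A @ B, number=10) / 10)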

rth commented Apr 27, 2023

BTW, while searching for such benchmarks online, I rediscovered some earlier benchmarks written for Pyodide 0.17 and non-WASM BLIS: https://gist.github.com/rth/c71fe792eb56fb271317e35e08576c7a

rth commented Apr 27, 2023

All the benchmarks below were run with the following versions.

Host Python

conda create -c conda-forge -n pyodide-bench python=3.11 numpy=1.24 scipy=1.9.3
conda activate pyodide-bench
OMP_NUM_THREADS=1 python benchmark.py
  • Python version: 3.11.3
  • Numpy version: 1.24.2
  • Scipy version: 1.9.3
  • OpenBLAS version: 0.3.21
  • CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
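
(As an aside, the BLAS/LAPACK configuration that NumPy and SciPy were built against can be printed from Python with the standard show_config() helpers, e.g.:

import numpy as np
import scipy

# Print build/linkage information, including the BLAS/LAPACK backend
np.show_config()
scipy.show_config()

)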

Results are for a square matrix dot product; time is in ms.

| Matrix Size | float32 (ms) | float64 (ms) |
| ----------- | ------------ | ------------ |
| 100         | 0.03         | 0.04         |
| 500         | 2.44         | 5.48         |
| 1000        | 19.43        | 39.72        |
| 2000        | 141.12       | 318.13       |

Pyodide 0.23.1 / WASM (reference BLAS)

(same versions as above)

| Matrix Size | float32 (ms) | float64 (ms) |
| ----------- | ------------ | ------------ |
| 100         | 0.99         | 0.96         |
| 500         | 73.89        | 75.15        |
| 1000        | 584.75       | 598.80       |
| 2000        | 5149.48      | 6785.50      |

So it is roughly 30-35x slower for float32 and 15-20x slower for float64. SIMD (available on the host but not in the WASM build) likely accounts for a large part of the difference, and even more so for float32.

Pyodide 0.24.0dev / WASM (OpenBLAS)

  • Python version: 3.11.3
  • Numpy version: 1.24.3
  • Scipy version: 1.10.1
  • OpenBLAS version: 0.3.23

| Matrix Size | float32 (ms) | float64 (ms) |
| ----------- | ------------ | ------------ |
| 100         | 1.96         | 0.63         |
| 500         | 33.72        | 34.33        |
| 1000        | 266.86       | 271.83       |
| 2000        | 2167.66      | 2236.59      |

So OpenBLAS is indeed 2-3x faster than reference BLAS on WASM for the matrix dot product, even without SIMD enabled.

OpenBLAS is 1.75 MB compressed, though, so we need to see whether enabling OpenBLAS for numpy (currently 2.82 MB compressed) would be worth the trade-off in size.

Edit: for future reference, the benchmark scripts are below.
benchmark.py

import numpy as np
import scipy
from scipy.linalg import blas
import timeit
import sys

def benchmark_blas_func(blas_func, n, dtype, num_trials=10):
    # Create random matrices
    A = np.random.rand(n, n).astype(dtype)
    B = np.random.rand(n, n).astype(dtype)

    # Measure the time taken for matrix multiplication
    time_taken = timeit.timeit(lambda: blas_func(alpha=1.0, a=A, b=B),
                               number=num_trials)
    return 1e3*time_taken / num_trials

matrix_sizes = [100, 500, 1000, 2000]

print("Matrix multiplication benchmark (average time in ms):")
print('Python version: ', sys.version)
print("Numpy version: ", np.__version__)
print("Scipy version: ", scipy.__version__)


print("Matrix Size |  float32 |  float64")
for n in matrix_sizes:
    time_sgemm = benchmark_blas_func(blas.sgemm, n, np.float32)
    time_dgemm = benchmark_blas_func(blas.dgemm, n, np.float64)
    print(f"{n:11d} | {time_sgemm:8.2f} | {time_dgemm:8.2f}")

benchmark.js

const { loadPyodide } = require("pyodide");

async function main() {
  // Load Pyodide from the dev CDN build, then install numpy and scipy
  let pyodide = await loadPyodide({
    indexURL: "https://cdn.jsdelivr.net/pyodide/dev/full/",
  });
  await pyodide.loadPackage(["numpy", "scipy"]);
  return pyodide.runPythonAsync(`
<copy-paste-the-above-python-code>
  `);
}

main();
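
This can then be run with Node after installing the pyodide npm package (e.g. npm install pyodide, then node benchmark.js).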

lesteve commented Apr 28, 2023

Would it be interesting to also compare with numpy.dot in Pyodide (which currently does not use OpenBLAS, so it falls back to lapack-lite or whatever it is called), given OpenMathLib/OpenBLAS#4023 (comment)?
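
(For reference, a minimal way to add that comparison to the benchmark script above might look like the following sketch; benchmark_numpy_dot is a hypothetical helper, not part of the existing script:

import timeit
import numpy as np

def benchmark_numpy_dot(n, dtype, num_trials=10):
    # Same setup as benchmark_blas_func above, but timing np.dot,
    # which in the current Pyodide build does not go through OpenBLAS
    A = np.random.rand(n, n).astype(dtype)
    B = np.random.rand(n, n).astype(dtype)
    time_taken = timeit.timeit(lambda: np.dot(A, B), number=num_trials)
    return 1e3 * time_taken / num_trials

)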

rgommers commented

@rth @lesteve we're considering dropping lapack-lite from NumPy, which Pyodide still uses. Pyodide could switch over to the proper BLAS/LAPACK that is currently used for SciPy, or to OpenBLAS (this issue). If you have concerns about that, it'd be great if you could weigh in at numpy/numpy#24200 (comment). We obviously don't want to cause you problems here.
