Benchmark OpenBLAS and consider it for numpy #3763
BTW, searching for such benchmarks online, I rediscovered some earlier benchmarks written for Pyodide 0.17 and non-WASM BLIS: https://gist.github.com/rth/c71fe792eb56fb271317e35e08576c7a
All the benchmarks are run with the following versions.

Host Python
Results are for a square matrix dot product; time is in ms.
Pyodide 0.23.1 / WASM (reference BLAS) (same versions as above)
So WASM with reference BLAS is 30-35x slower for float32 and 15-20x slower for float64. SIMD likely accounts for a large part of the difference (and even more so for float32).

Pyodide 0.24.0dev / WASM (OpenBLAS)

Python version: 3.11.3
So it's indeed 2-3x faster than reference BLAS on WASM, even without SIMD enabled, for the matrix dot product. OpenBLAS is 1.75MB compressed, though, so we need to see whether enabling OpenBLAS for numpy (currently 2.82MB compressed) would be worth the tradeoff in size.

Edit: for future reference, the benchmark scripts are below.

benchmark.py

import numpy as np
import scipy
from scipy.linalg import blas
import timeit
import sys
def benchmark_blas_func(blas_func, n, dtype, num_trials=10):
    # Create random matrices
    A = np.random.rand(n, n).astype(dtype)
    B = np.random.rand(n, n).astype(dtype)
    # Measure the average time taken for matrix multiplication
    time_taken = timeit.timeit(lambda: blas_func(alpha=1.0, a=A, b=B),
                               number=num_trials)
    return 1e3 * time_taken / num_trials
matrix_sizes = [100, 500, 1000, 2000]
print("Matrix multiplication benchmark (average time in ms):")
print('Python version: ', sys.version)
print("Numpy version: ", np.__version__)
print("Scipy version: ", scipy.__version__)
print("Matrix Size | float32 | float64")
for n in matrix_sizes:
    time_sgemm = benchmark_blas_func(blas.sgemm, n, np.float32)
    time_dgemm = benchmark_blas_func(blas.dgemm, n, np.float64)
    print(f"{n:11d} | {time_sgemm:8.2f} | {time_dgemm:8.2f}")

benchmark.js

const { loadPyodide } = require("pyodide");
async function main() {
  let pyodide = await loadPyodide({
    indexURL: "https://cdn.jsdelivr.net/pyodide/dev/full/",
  });
  await pyodide.loadPackage(["numpy", "scipy"]);
  return pyodide.runPythonAsync(`
<copy-paste-the-above-python-code>
  `);
}
main();
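When reproducing these numbers, it helps to confirm which BLAS/LAPACK implementation a given NumPy build is actually linked against. A minimal check (the exact output layout varies across NumPy versions):

```python
import numpy as np

# Print the BLAS/LAPACK build configuration of this NumPy install;
# an OpenBLAS-linked build will mention "openblas" in the output,
# a reference-BLAS build will not.
np.show_config()

print("NumPy:", np.__version__)
```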
Would it be interesting to also compare to
@rth @lesteve we're considering dropping
It would be interesting to benchmark OpenBLAS (for instance for matrix multiplication). Depending on the result we could consider using it with numpy. Though it's a bit of a size/performance compromise.
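To make the size/performance compromise concrete, a back-of-the-envelope check using the compressed sizes quoted in this thread (1.75 MB for OpenBLAS vs 2.82 MB for the current numpy package; figures assumed from the comments above, not re-measured):

```python
# Rough size tradeoff, using the compressed sizes mentioned in this
# thread (assumed figures, not measured here).
numpy_mb = 2.82      # current numpy package, compressed
openblas_mb = 1.75   # OpenBLAS library, compressed
relative_increase = openblas_mb / numpy_mb
print(f"Adding OpenBLAS grows the download by ~{relative_increase:.0%}")
# prints: Adding OpenBLAS grows the download by ~62%
```

So the question is whether a 2-3x matrix-multiplication speedup justifies roughly a 60% larger compressed download for numpy.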