Feat: Adding Linear Algebra Dot operation support #116

SwayamInSync · 2025-07-11T08:18:20Z

This PR contributes the following:

- Ships a `dot` method within the package that supports the following operations:
  - vector-vector dot product
  - matrix-vector dot product
  - matrix-matrix multiplication
- Optimized linear algebra ops backed by QBLAS on x86-64 Linux and ARM machines. On Windows it falls back to a naive implementation because QBLAS is incompatible with MSVC.
- A test suite to validate dot products between inputs.
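As a rough reference for what the three cases compute, here is a naive pure-Python sketch of the kind of fallback used when QBLAS is unavailable. It assumes vectors are plain lists and matrices are lists of rows. `naive_dot` is a hypothetical name, not the package's API, and the sketch ignores quad precision, vectorization, and threading entirely.

```python
def naive_dot(a, b):
    """Hypothetical naive dot: dispatches on whether a and b are 1-D or 2-D lists."""
    a_is_mat = bool(a) and isinstance(a[0], list)
    b_is_mat = bool(b) and isinstance(b[0], list)

    if not a_is_mat and not b_is_mat:
        # vector-vector: sum of elementwise products -> scalar
        if len(a) != len(b):
            raise ValueError("vector lengths differ")
        return sum(x * y for x, y in zip(a, b))

    if a_is_mat and not b_is_mat:
        # matrix-vector: one dot product per row of a -> vector
        return [naive_dot(row, b) for row in a]

    if a_is_mat and b_is_mat:
        # matrix-matrix: entry (i, j) is row i of a dotted with column j of b
        cols = [list(col) for col in zip(*b)]
        return [[naive_dot(row, col) for col in cols] for row in a]

    raise ValueError("vector-matrix is not one of the listed cases")
```

The matrix-matrix case reuses the vector-vector kernel for every (row, column) pair, i.e. the plain O(n·k·m) triple loop that an optimized BLAS typically replaces with blocked, multithreaded kernels.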

The images below show the performance comparison.

[Figure: performance comparison on x86-64 Linux with 96 cores]

[Figure: performance comparison on macOS Apple Silicon (ARM) with 8 cores]

To compile without QBLAS, define `DISABLE_QUADBLAS` in both `CFLAGS` and `CXXFLAGS` (i.e. pass `-DDISABLE_QUADBLAS`).

SwayamInSync · 2025-07-11T08:24:31Z

Ahh, I forgot to update the general CI. Or should we remove quaddtype from there, given that `build_wheels.yml` runs the same checks, more strictly and on all platforms?

juntyr · 2025-07-11T15:22:26Z

Would the new functionality only be accessible through the `dot` function, or is there a way to call it automatically when using NumPy's `dot`, `matmul`, etc.?

ngoldbaum · 2025-07-11T15:28:40Z

Unfortunately no, not easily. We'd need to add a new dtype hook to the DType API in NumPy. Worth doing though! See numpy/numpy#28516, which adds a hook for sorting.

SwayamInSync · 2025-07-19T23:21:10Z

As per @seberg's suggestion, this now implements the `numpy.matmul` ufunc.
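NumPy's `matmul` is a generalized ufunc with signature `(n?,k),(k,m?)->(n?,m?)`, where `?` marks a core dimension that may be missing (a 1-D operand). The sketch below resolves the output core shape for 1-D and 2-D operands only; batch-dimension broadcasting is deliberately left out, and `matmul_core_shape` is a hypothetical helper, not part of numpy_quaddtype.

```python
def matmul_core_shape(a_shape, b_shape):
    """Resolve the output core shape for the signature (n?,k),(k,m?)->(n?,m?)."""
    if len(a_shape) not in (1, 2) or len(b_shape) not in (1, 2):
        raise ValueError("only 1-D and 2-D operands are handled in this sketch")

    # A 1-D first operand has no n dimension; a 1-D second operand has no m.
    n = a_shape[0] if len(a_shape) == 2 else None
    k_a = a_shape[-1]
    k_b = b_shape[0]
    m = b_shape[1] if len(b_shape) == 2 else None

    if k_a != k_b:
        raise ValueError(f"core dimension mismatch: {k_a} != {k_b}")

    # Missing (optional) dimensions are dropped from the output shape.
    return tuple(d for d in (n, m) if d is not None)
```

With these rules, two 1-D operands give a 0-d (scalar) result, which matches what `np.matmul` returns for vector-vector inputs, while matrix-vector and matrix-matrix inputs give 1-D and 2-D results respectively.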

Some extra changes:

- Refactor of `umath`
- Added a `release_tracker.md` to track progress (it lists all available NumPy ufuncs and which ones we currently support)

@ngoldbaum please take a look and let me know if anything needs a fix

quaddtype/release_tracker.md

ngoldbaum · 2025-07-21T14:36:47Z

This is a lot of code! I'll try to give this a once-over but I'm probably not going to have the bandwidth to go over this with a fine-toothed comb. @juntyr I'd appreciate it if you could also do a round of code review. Don't be afraid to ask questions if anything is confusing to you.

SwayamInSync · 2025-07-21T14:41:18Z

Yeah, the `umath` refactor is what made it big (but that code has already been reviewed). Here is what's new:

- `umath/matmul.cpp` and `.h` files
- `quadblas_interface.cpp` and `.h` files
- `tests/test_dot.py`
- GitHub CI files

ngoldbaum · 2025-07-21T14:44:16Z

quaddtype/README.md

@@ -53,11 +54,17 @@ source temp/bin/activate
 # Install the package
 pip install meson-python numpy pytest

-export LDFLAGS="-Wl,-rpath,$SLEEF_DIR/lib"
+export LDFLAGS="-Wl,-rpath,$SLEEF_DIR/lib -fopenmp -latomic -lpthread"


Does this work on Windows? (If not, maybe add a comment with a different suggestion.)

No, I can also add a section on building for Windows to the README.

ngoldbaum · 2025-07-21T14:44:46Z

quaddtype/numpy_quaddtype/__init__.py

 )

+import multiprocessing


Why add this?

This was used earlier, when `dot` was exposed in the Python package; I'll remove it from here.

ngoldbaum · 2025-07-21T14:45:49Z

quaddtype/numpy_quaddtype/src/quadblas_interface.cpp

+        return NULL;
+    }
+
+    QuadBLAS::set_num_threads(num_threads);


You could also maybe check for a setting from threadpoolctl: https://github.com/joblib/threadpoolctl

Good point. By default (when the package is imported) it uses all available cores, but I also expose a Python function that lets users change this to whatever they want at any time.

I can also add functionality to pick up the setting from threadpoolctl, similar to NumPy.
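A minimal sketch of the default-then-override policy described here, assuming the thread count is validated on the Python side before being handed to `QuadBLAS::set_num_threads`. `resolve_num_threads` is a hypothetical helper; it uses only `os.cpu_count()` for the default and does not query threadpoolctl.

```python
import os


def resolve_num_threads(requested=None):
    """Return the thread count to use: all cores by default, else the user's choice."""
    if requested is None:
        # Default at import time: every available core.
        # os.cpu_count() may return None, so fall back to a single thread.
        return os.cpu_count() or 1
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(requested, bool) or not isinstance(requested, int):
        raise TypeError("num_threads must be an int")
    if requested < 1:
        # Reject non-positive counts instead of passing them on.
        raise ValueError("num_threads must be >= 1")
    return requested
```

Keeping the validation in one place means both the import-time default and any later user override go through the same checks.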

ngoldbaum · 2025-07-21T14:52:27Z

quaddtype/numpy_quaddtype/src/umath/matmul.cpp

+    if (descr_in1->backend != BACKEND_SLEEF || descr_in2->backend != BACKEND_SLEEF) {
+        PyErr_SetString(PyExc_NotImplementedError,
+                        "QBLAS-accelerated matmul only supports SLEEF backend. "
+                        "Other backends are not supported with QBLAS.");


You could maybe add some text telling people to open an issue if they want this.

This is down to QBLAS's intrinsics: for now I implemented them using SLEEF-based vectorized operations.
For `longdouble`, I think OpenBLAS and MPLAPACK (https://github.com/nakatamaho/mplapack), which I found recently, are better alternatives.
That said, if there is demand, we can move in that direction.

I'll make the error message point people to open an issue on QBLAS for this rather than here.

ngoldbaum · 2025-07-21T14:56:19Z

quaddtype/numpy_quaddtype/src/umath/matmul.cpp

+        QuadPrecDTypeObject *descr_out = (QuadPrecDTypeObject *)given_descrs[2];
+        if (descr_out->backend != target_backend) {
+            PyErr_SetString(PyExc_NotImplementedError,
+                            "QBLAS-accelerated matmul only supports SLEEF backend for output.");


Make this use exactly the same message as the other error above.
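One way to follow this suggestion is to define the message once and reuse it for every descriptor check. The sketch below is a Python analogue of the C++ backend checks shown above; the `Backend` enum, its `LONGDOUBLE` member, and the `check_backends` name are illustrative assumptions, not the package's actual API. The message text is copied from the first check.

```python
from enum import Enum


class Backend(Enum):
    # Hypothetical stand-ins for BACKEND_SLEEF and a long double backend.
    SLEEF = "sleef"
    LONGDOUBLE = "longdouble"


# Defined once so the input and output checks raise identical text.
QBLAS_BACKEND_ERROR = (
    "QBLAS-accelerated matmul only supports SLEEF backend. "
    "Other backends are not supported with QBLAS."
)


def check_backends(in1, in2, out=None):
    """Raise NotImplementedError unless every given backend is SLEEF."""
    if in1 is not Backend.SLEEF or in2 is not Backend.SLEEF:
        raise NotImplementedError(QBLAS_BACKEND_ERROR)
    # The output is optional here, e.g. when no out= array is given.
    if out is not None and out is not Backend.SLEEF:
        raise NotImplementedError(QBLAS_BACKEND_ERROR)
```

Because both branches raise the same constant, changing the wording later (for example, to point users at the QBLAS issue tracker) only has to happen in one place.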

ngoldbaum · 2025-07-21T14:57:27Z

quaddtype/numpy_quaddtype/src/umath/matmul.cpp

+    }
+
+    return 0;
+}


I didn't carefully review the strided loop implementations; it would be nice if someone else could do that!

ngoldbaum · 2025-07-21T15:00:14Z

quaddtype/tests/test_dot.py

+
+# ================================================================================
+# UTILITIES
+# ================================================================================


These helpers probably belong in, e.g., a `test_utils.py` file. There are also almost certainly a few spots in the other test file where you could use them, which would justify moving them out of this file.

ngoldbaum · 2025-07-21T15:51:42Z

quaddtype/tests/test_dot.py

+
+
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])


This bit at the end isn't necessary.

Sorry I didn't review the tests carefully - it's too much code for my poor brain...

SwayamInSync · 2025-07-21T17:42:09Z

Thanks @ngoldbaum, I'll apply all the changes in the next commit :) and sorry this one got so hectic.

SwayamInSync added 25 commits July 9, 2025 13:01

- interfacing with qblas (bc333df)
- adding test cases (babaa96)
- test-1: ci (c3aaa05)
- fixing ci (ca7dd6d)
- fixing ci (04314e3)
- fixing ci (037021a)
- fixing linux CI (d6fc9c6)
- fixing linux CI (f99f565)
- fixing linux CI (fb3579c)
- fixing linux CI (03e9acd)
- fixing linux CI (1ed7bab)
- fixing linux CI (63a355e)
- fixing linux CI (88a98d1)
- updating qblas: (764fc72)
- fixing macos CI (1669e5f)
- bumping macos deployment target (b35bac3)
- bumping macos deployment target (042b25a)
- dynamic macos deployment target (f78dd90)
- explicit init of res array in dot-mat-mat (cd88de0)
- fixing windows CI (abf0224)
- disabling qblas for windows; MSVC incompatibility (c5198d1)
- updating CI triggering paths (c0d93f8)
- updating CI triggering paths (838adee)
- reverting branch to main (433aa90)
- bumping qblas (5836505)

SwayamInSync requested a review from ngoldbaum July 11, 2025 08:19

SwayamInSync self-assigned this Jul 11, 2025

SwayamInSync added 8 commits July 16, 2025 12:16

- switching to apt (8516544)
- submodule fix (335f425)
- submodule fix (85e7840)
- submodule fix (e467f4b)
- initial matmul ufunc setup (e201b90)
- mid-way test (09918a3)
- shifting to matmul ufunc (70ca644)
- will figure out later (f89c2e6)

SwayamInSync mentioned this pull request Jul 17, 2025

- Extending packaged dot to support matmul ufunc SwayamInSync/numpy-user-dtypes#2 (Merged)

SwayamInSync added 11 commits July 19, 2025 13:46

- matmul registered with naive (894a84d)
- adding initial qblas support to matmul ufunc, something is breaking, nan (6800a90)
- matmul ufunc completed, naive plugged, qblas experimental (742ce64)
- adding release tracker to keep record for tasks, v1.0.0 (d993bc9)
- it should be failing but passes on x86-64 (c518a29)
- ahh stupid me :), fallback to naive for MSVC (bbce2ac)
- switching to internal function use only (5e5fa65)
- this should fix them all (cec5ace)
- wrapping up (1fe6c81)
- updated branch to main (8f16b99)
- Merge pull request #2 from SwayamInSync/matmul-ufunc (238ef89)

juntyr reviewed Jul 20, 2025

quaddtype/release_tracker.md (outdated)

SwayamInSync added 2 commits July 20, 2025 14:21

- added test coverage in release_tracker.md (ed47e33)
- more edge tests (573eb76)

SwayamInSync mentioned this pull request Jul 20, 2025

- Implement sign, sigbit, and copysign ufuncs and extend unary op tests #118 (Open)

SwayamInSync added the numpy_quaddtype label Jul 20, 2025

ngoldbaum reviewed Jul 21, 2025
