-
-
Notifications
You must be signed in to change notification settings - Fork 9
Feat: Adding Linear Algebra Dot operation support #116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Ahh forgot to update the general CI, or should we remove the |
Would the new functionality only be accessed through the dot function or is there a way to call that automatically when using numpy dot, matmul, etc? |
Unfortunately no, not easily. We'd need to add a new dtype hook to the DType API in NumPy. Worth doing though! See numpy/numpy#28516 which adds a hook for sorting. |
As per @seberg 's suggestion, this now implements Some extra stuff includes:
@ngoldbaum please take a look and let me know if anything needs a fix |
This is a lot of code! I'll try to give this a once-over but I'm probably not going to have the bandwidth to go over this with a fine-toothed comb. @juntyr I'd appreciate it if you could also do a round of code review. Don't be afraid to ask questions if anything is confusing to you. |
Yeah the refactor of umath made it big (but those codes are already reviewed), here is what's new
|
@@ -53,11 +54,17 @@ source temp/bin/activate | |||
# Install the package | |||
pip install meson-python numpy pytest | |||
|
|||
export LDFLAGS="-Wl,-rpath,$SLEEF_DIR/lib" | |||
export LDFLAGS="-Wl,-rpath,$SLEEF_DIR/lib -fopenmp -latomic -lpthread" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work on Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I can add a section for windows build too in README
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(if not, maybe add a comment with a different suggestion)
) | ||
|
||
import multiprocessing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was used earlier when dot exposed as python package, I'll remove this from here
return NULL; | ||
} | ||
|
||
QuadBLAS::set_num_threads(num_threads); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also maybe check for a setting from threadpoolctl
: https://github.com/joblib/threadpoolctl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, yes so by default (during the importing of package) it uses all the available cores but I also expose a python function which directly allow users to change this to whatever they want at any time.
I can add the functionality to just pick similar to numpy using threadpoolctl
if (descr_in1->backend != BACKEND_SLEEF || descr_in2->backend != BACKEND_SLEEF) { | ||
PyErr_SetString(PyExc_NotImplementedError, | ||
"QBLAS-accelerated matmul only supports SLEEF backend. " | ||
"Other backends are not supported with QBLAS."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could maybe add some text that people should open an issue if they want this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is QBLAS intrinsics, for now I implemented it using SLEEF based vectorized operations.
For longdouble I think OpenBLAS and this one I recently found https://github.com/nakatamaho/mplapack are better alternatives.
Although if demanded then we can move towards that direction
Will add the error message to open issue on QBLAS for this rather than here
QuadPrecDTypeObject *descr_out = (QuadPrecDTypeObject *)given_descrs[2]; | ||
if (descr_out->backend != target_backend) { | ||
PyErr_SetString(PyExc_NotImplementedError, | ||
"QBLAS-accelerated matmul only supports SLEEF backend for output."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this use exactly the same message as the other error above.
} | ||
|
||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't carefully review the strided loop implementations, if someone else wants to do that it would be nice!
|
||
# ================================================================================ | ||
# UTILITIES | ||
# ================================================================================ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These helpers probably want to live in e.g. a test_utils.py file. There's also almost certainly a few spots you can use them in the other test file to justify moving them out of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
|
||
|
||
if __name__ == "__main__": | ||
pytest.main([__file__, "-v"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this bit at the end isn't necessary.
Sorry I didn't review the tests carefully - it's too much code for my poor brain...
Thanks @ngoldbaum will apply all the changes in next commit :) and so sorry this one got hectic |
This PR contributes as follows:
dot
method within package that supports following operationsImages below are the performance comparison
To compile without QBLAS set
DISABLE_QUADBLAS
asCFLAGS
andCXXFLAGS