Simplify MULTIPLY-COMPLEX-MATRIX. Cons less, compute more. #60
Conversation
Is this the case on all implementations of BLAS?
@notmgsk I have tested it on OpenBLAS, but I expect the trend to be the same with MKL and Apple's Accelerate framework (i.e., the new version is faster than the old one). I'll do more checks and report the results.
I had it in my head that zgemm could destroy some of the input data, as well as writing to c; but, looking at the documentation again, I don't see any mention of that. If it's true that a and b stay untouched, then this is great!
In a future PR, I'd like to see the transpose options exposed (especially if one of them acts by complex conjugation; but, again, looking at the docs, it's not clear that T and C act any differently??). In a PR after that, it'd be nice to upgrade CL-QUIL's m* to allow individual matrices in the sequence to be tagged with transpose flags.
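For what it's worth, the Fortran ZGEMM documentation does distinguish the two flags: TRANSA = 'T' requests op(A) = A**T, while TRANSA = 'C' requests the conjugate transpose op(A) = conjg(A**T), so they differ exactly when A has nonzero imaginary parts. A minimal sketch of that distinction in plain Common Lisp (no FFI; these helper functions are illustrative, not part of MAGICL's API):

```lisp
;; Illustrative only: "T" swaps indices, while "C" additionally
;; conjugates each entry, so for complex matrices they disagree.
(defun transpose (a)
  (let* ((m (array-dimension a 0))
         (n (array-dimension a 1))
         (at (make-array (list n m))))
    (dotimes (i m at)
      (dotimes (j n)
        (setf (aref at j i) (aref a i j))))))

(defun conjugate-transpose (a)
  (let* ((m (array-dimension a 0))
         (n (array-dimension a 1))
         (ah (make-array (list n m))))
    (dotimes (i m ah)
      (dotimes (j n)
        (setf (aref ah j i) (conjugate (aref a i j)))))))

;; For A = #2A((#C(0 1))), TRANSPOSE returns #C(0 1) unchanged,
;; whereas CONJUGATE-TRANSPOSE returns #C(0 -1).
```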
Both matrices are indeed unchanged; only the output argument is written to. The code for the reference implementation of ZGEMM [2] and the MKL documentation [3] bear this out.

[2] http://www.netlib.org/lapack/lapack-3.1.1/html/zgemm.f.html
[3] https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm
This change looks perfectly reasonable to my uninformed eyes.
Are there any tests covering this function? If not, any chance of adding a few? That would ease the nerves regarding @ecpeterson's concern about mutating inputs, and maybe verify that the behavior hasn't changed w.r.t. the corner cases (row & col vectors) that were explicitly handled previously.
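A hedged sketch of what such a mutation test could look like. The MAGICL names (random-gaussian-matrix, multiply-complex-matrices, matrix-data) are taken from the benchmark snippet in this thread; everything else, including whether matrix-data is exported, is an assumption:

```lisp
;; Sketch: multiply, then verify the inputs' backing storage is
;; identical to copies taken beforehand, i.e. A and B were not mutated.
(let* ((a (magicl:random-gaussian-matrix 8 8))
       (b (magicl:random-gaussian-matrix 8 1))
       (a-copy (copy-seq (magicl::matrix-data a)))
       (b-copy (copy-seq (magicl::matrix-data b))))
  (magicl:multiply-complex-matrices a b)
  (assert (equalp a-copy (magicl::matrix-data a)))
  (assert (equalp b-copy (magicl::matrix-data b))))
```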
I'm sold.
The results are consistent when running the snippet of code below using Accelerate, OpenBLAS, and MKL.

```lisp
(ql:quickload :magicl)

(in-package #:magicl)

(defparameter *n* 2048)
(defparameter *matrix* (magicl:random-gaussian-matrix *n* *n*))
(defparameter *vector* (magicl:random-matrix *n* 1))

(progn
  (sb-ext:gc :full t)
  (dotimes (i 3)
    (time (magicl:multiply-complex-matrices *matrix* *vector*)))
  (defparameter *results-0* (magicl:multiply-complex-matrices *matrix* *vector*)))

(format t "===================================~%")

(progn
  (sb-ext:gc :full t)
  (dotimes (i 3)
    (time (magicl::multiply-complex-matrices* *matrix* *vector*)))
  (defparameter *results-1* (magicl::multiply-complex-matrices* *matrix* *vector*)))

(format t "Error: ~A~%" (loop :for a :across (matrix-data *results-0*)
                              :for b :across (matrix-data *results-1*)
                              :maximizing (abs (- a b))))

(sb-ext:quit)
```
@appleby The current test suite exercises this function extensively (albeit indirectly).
As promised in #54, here is a patch that makes multiply-complex-matrices two orders of magnitude faster and less memory-hungry (see results below).