NumPy tries to be too smart when dispatching to BLAS (gemm vs gemv), which prevents using MKL's strict CNR in full #13732
@oleksandr-pavlyk maybe you would know? (it's also a bit baffling that the mm functions support strict CNR while the mv ones don't, rather than the other way around)
@aldanor - we do want code to be fast, so the special-casing makes sense. The work-around would be to ensure the arrays always have at least 2 dimensions (being careful to add the …)
@mhvk That's totally understandable (I haven't tested, but I'm guessing gemv would be slightly faster than gemm in general on the same inputs). I was just wondering what the current workaround is to force it to use gemm (maybe other folks can benefit from it as well, if anyone cares about strict CNR). I actually have one idea: for matrix-vector dot products, instead of doing … I'll try to check it on Monday and report back (whether it would dispatch to gemm).
I looked at using matrix-matrix multiply instead of matrix-vector at some point and, IIRC, there was no difference in speed. No doubt that might depend on the library, but I think a good …
@mhvk I think it has to be at least 3-dimensional - if you check out the snippet I posted above, when it's 2-D and one of the dimensions is of size 1, numpy starts being smart and treats it as a vector. Yeah, explicit matmat, vecmat and matvec would be awesome to have: …
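(For concreteness, here's a minimal sketch of the 3-D trick being discussed - this is my reading of the thread, not something verified on every build; whether it really ends up in gemm depends on the NumPy version and BLAS library:)

```python
import numpy as np

a = np.random.rand(1, 5)   # (1 x n) row "matrix"
b = np.random.rand(5, 3)   # (n x m)

# 2-D dot: the size-1 dimension makes numpy treat `a` as a
# vector and dispatch to gemv.
c_mv = a @ b

# Lifting both operands to 3-D sidesteps that 2-D special case;
# stacked matmul should then go through the matrix-matrix path.
c_mm = (a[None, ...] @ b[None, ...])[0]

assert np.allclose(c_mv, c_mm)
```

Setting `MKL_VERBOSE=1` in the environment before running makes MKL print every BLAS call it receives, which is the easiest way to confirm which routine actually gets dispatched.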
Doesn't scipy provide thinner wrappers around much of LAPACK/BLAS nowadays (https://docs.scipy.org/doc/scipy/reference/linalg.blas.html)? That might fit better if you really want to know which function gets called (even if it is likely less convenient).
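For what it's worth, going through SciPy to request the routine by name could look like this (a sketch using the documented `scipy.linalg.blas.dgemm` wrapper; note it may copy inputs to Fortran order behind the scenes):

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.random.rand(1, 5)   # (1 x n)
b = np.random.rand(5, 3)   # (n x m)

# Explicitly request dgemm, bypassing numpy's gemv special-casing.
c = dgemm(alpha=1.0, a=a, b=b)

assert np.allclose(c, a @ b)
```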
I have asked around, and there are no immediate plans to make …
@charris I can try running some benchmarks on MKL and OpenBLAS next week. My thoughts exactly - I'd expect a mature gemm implementation to be smart enough to figure out things like size-1 dimensions automatically, just as it presumably treats the alpha=1 case specially (which is what numpy is calling). If there's no visible performance difference, numpy could just use gemm (this is still orthogonal to having explicit public matvec etc. functions).
@aldanor - that would be very useful. I agree that we should punt these kinds of decisions to the library, keeping our code simpler, but it would be good to be sure that we don't get hit with a big performance penalty. p.s. yes, you're right about the snippet - I misread.
If you can, do add some benchmarks to …
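Until proper benchmarks land, a quick-and-dirty timing of the two dispatch paths might look like this (a sketch; the absolute numbers will depend entirely on the BLAS library, shapes and thread count):

```python
import timeit
import numpy as np

n, m = 1000, 1000
a = np.random.rand(1, n)             # 2-D with a size-1 dim: gemv path
b = np.random.rand(n, m)
a3, b3 = a[None, ...], b[None, ...]  # 3-D: should hit the gemm path

t_mv = timeit.timeit(lambda: a @ b, number=1000)
t_mm = timeit.timeit(lambda: a3 @ b3, number=1000)
print(f"gemv path: {t_mv:.3f}s   gemm path: {t_mm:.3f}s")
```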
The performance issue is a bit of a red herring in my opinion. We all want things to be performant; I'm not sure tying users' hands with magic is the sensible approach, though. As @aldanor points out, you actually lose performance in tight loops because of these checks. +1 for explicit matmat, vecmat and matvec functions. At the very least it would make code more transparent.
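To make that concrete, the explicit functions could be as thin as the sketch below (the names and the idea of always routing through the matrix-matrix path are taken from this thread; nothing like this exists in numpy's public API at the time of writing):

```python
import numpy as np

def matmat(a, b):
    """Matrix @ matrix, lifted to 3-D so the size-1 2-D special
    case in the dot dispatcher is never hit."""
    return np.matmul(a[None, ...], b[None, ...])[0]

def matvec(a, v):
    """Matrix @ vector, routed through the matrix-matrix path."""
    return matmat(a, v.reshape(-1, 1)).ravel()

def vecmat(v, a):
    """Vector @ matrix, routed through the matrix-matrix path."""
    return matmat(v.reshape(1, -1), a).ravel()
```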
It does seem there is some consensus on a path forward: …
Hi folks, I'm back with some benchmark results - I've yet to read through them myself, though (there are also strict CNR mode tests, which are not directly related to this thread but may be interesting nevertheless): https://gist.github.com/aldanor/5bb9f1ff3577a4c6d35c267db75e47bd
Here's the actual use case: Intel has recently released a "Strict CNR mode" for MKL (in the 2019.3 release), see: https://software.intel.com/en-us/mkl-linux-developer-guide-reproducibility-conditions
Basically, for a small subset of BLAS functions, it guarantees strict bit-wise reproducibility regardless of the number of threads used.
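(If I understand the linked guide correctly, strict CNR is requested through the `MKL_CBWR` setting, e.g. `MKL_CBWR=AVX2,STRICT`. A sketch of doing that from Python, assuming the variable is read when MKL is first loaded:)

```python
import os

# Must be set before numpy (and hence MKL) is first imported.
os.environ["MKL_CBWR"] = "AVX2,STRICT"

import numpy as np  # noqa: E402

a = np.random.rand(100, 100)
b = np.random.rand(100, 100)
c = a @ b  # dispatches to gemm, which the strict CNR list covers
```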
For whatever reason, the `*gemv` functions are not listed, but the `*gemm` ones are - so all your matrix-by-matrix dot products can now be bit-wise reproducible, but not matrix-vector ones... Is there a way to force numpy to use e.g. dgemm/sgemm and not dgemv/sgemv when multiplying (1 x n) by (n x m)?
I've stumbled upon this piece of code which, IIUC, tries to be "smart" and treats single-row/single-column matrices as vectors when dispatching to BLAS routines, so you can't force it to use the 'mm' versions of the BLAS functions instead of 'mv' even if you want to:
numpy/numpy/core/src/common/cblasfuncs.c, lines 156 to 178 at commit 5000356
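In rough Python pseudocode, my reading of that snippet's special-casing is something like the following (a loose paraphrase of the described behavior, not the actual C code):

```python
def pick_blas_routine(a, b):
    """Loose paraphrase of the dispatch logic in cblasfuncs.c:
    a 2-D operand with a size-1 dimension is demoted to a vector,
    so (1 x n) @ (n x m) ends up in gemv rather than gemm."""
    a_is_vec = a.ndim == 1 or (a.ndim == 2 and 1 in a.shape)
    b_is_vec = b.ndim == 1 or (b.ndim == 2 and 1 in b.shape)
    if a_is_vec or b_is_vec:
        return "gemv"   # matrix-vector (or vector-vector) routine
    return "gemm"       # true matrix-matrix product
```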