Performance Comparison to Intel MKL (MATLAB) #124
Comments
@giaf, any thoughts on this?
Hi @RoyiAvital. On top of that, MKL is not a sitting duck: they have a large team and, according to the MKL release notes, they have been improving their small-scale performance too. As for a comparison with multi-threaded MKL, it is not an apples-to-apples comparison, since MKL's performance would scale with the number of available cores (and therefore with the CPU model), while BLASFEO's would not. But as of now, BLASFEO is designed with the aim of providing fast single-threaded routines for matrices that fit in cache.
@giaf, by the way, this is the Pay attention: it is not fully working, and there is no validation of the input, so that pure performance can be compared (it works for square matrices).
Seeing the benchmarks of BLASFEO and how it beats Intel MKL on small matrices made me want to create a MATLAB MEX wrapper for it, to speed up small-matrix calculations.
The logic was: since BLASFEO beats Intel MKL in tests with no overhead, with MEX I'd beat MATLAB by a lot, since MATLAB only adds overhead on top of MKL and doesn't use `MKL_DIRECT_CALL`. All reasons to be optimistic.
I implemented a MEX wrapper around `blasfeo_dgemm()` and validated it against MATLAB (the error is negligible). Then I did a run-time analysis.
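For readers who want to reproduce the validation step, here is a minimal sketch of that kind of check: compute the product with a naive reference and look at the largest absolute difference. It is plain Python rather than MATLAB or MEX, and `gemm_reference` and `max_abs_error` are hypothetical helper names, not part of BLASFEO or MATLAB. Matrices are stored as flat column-major lists, which is the layout MATLAB uses.

```python
def gemm_reference(m, n, k, a, b):
    """Naive C = A * B on flat column-major lists (A is m x k, B is k x n)."""
    c = [0.0] * (m * n)
    for j in range(n):
        for p in range(k):
            b_pj = b[p + j * k]  # element B(p, j)
            for i in range(m):
                c[i + j * m] += a[i + p * m] * b_pj  # C(i, j) += A(i, p) * B(p, j)
    return c


def max_abs_error(x, y):
    """Largest element-wise absolute difference between two flat lists."""
    if len(x) != len(y):
        raise ValueError("size mismatch")
    return max((abs(u - v) for u, v in zip(x, y)), default=0.0)


if __name__ == "__main__":
    # A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], both column-major.
    a = [1.0, 3.0, 2.0, 4.0]
    b = [5.0, 7.0, 6.0, 8.0]
    expected = [19.0, 43.0, 22.0, 50.0]  # A * B, column-major
    print(max_abs_error(gemm_reference(2, 2, 2, a, b), expected))
```

In a real check, the second argument of `max_abs_error` would be the output of the wrapper under test, and the acceptable error would be a small multiple of machine epsilon scaled by the matrix size.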
Now, the BLASFEO MEX works in place (namely, it receives a pre-allocated matrix to write the result into), while MATLAB has to use its regular API (it allocates the output and adds overhead on the input).
Yet MATLAB is still much faster than BLASFEO compiled with the `AVX2` code path.
MATLAB does use multi-threading (I don't know the threshold, but it does, as I can see on the CPU utilization graph). But even for very small matrices (sizes `2:10`) MATLAB beats BLASFEO.
I think that in order to validate the results we need to use the multi-threaded version of MKL in the benchmarks.
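To make the in-place versus allocating distinction concrete, the following is a rough sketch of how such a sweep over sizes 2 to 10 could be structured. It is an assumption-laden illustration, not the benchmark from the attached file: the kernel is a pure-Python stand-in (neither BLASFEO nor MKL), and `gemm_into`, `gemm_alloc`, `best_time`, and `run_sweep` are hypothetical names. The absolute timings are meaningless; only the shape of the measurement matters. For very small matrices a single call is too short to time reliably, so each measurement runs a batch of calls and the best batch is kept.

```python
import time


def gemm_into(n, a, b, c):
    """In-place variant: overwrite the caller's pre-allocated n x n buffer c."""
    for j in range(n):
        for i in range(n):
            acc = 0.0
            for p in range(n):
                acc += a[i + p * n] * b[p + j * n]
            c[i + j * n] = acc


def gemm_alloc(n, a, b):
    """Allocating variant: create a fresh output buffer on every call."""
    c = [0.0] * (n * n)
    gemm_into(n, a, b, c)
    return c


def best_time(fn, repeats=5, inner=200):
    """Best per-call time in seconds over `repeats` batches of `inner` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(inner):
            fn()
        best = min(best, (time.perf_counter() - start) / inner)
    return best


def run_sweep(sizes=range(2, 11)):
    """Return {n: (t_in_place, t_allocating)} for each square size n."""
    results = {}
    for n in sizes:
        a = [float(i % 7) for i in range(n * n)]
        b = [float(i % 5) for i in range(n * n)]
        c = [0.0] * (n * n)  # allocated once, outside the timed region
        t_in = best_time(lambda: gemm_into(n, a, b, c))
        t_al = best_time(lambda: gemm_alloc(n, a, b))
        results[n] = (t_in, t_al)
    return results


if __name__ == "__main__":
    for n, (t_in, t_al) in run_sweep().items():
        print(f"n={n:2d}  in-place {t_in * 1e6:8.2f} us  allocating {t_al * 1e6:8.2f} us")
```

Timing both variants with the same kernel isolates the allocation cost from the arithmetic cost, which helps when judging how much of a gap between two libraries comes from call overhead rather than from the kernel itself.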
This is the MATLAB analysis file: RunTimeAnalysis.zip.