Performance Comparison to Intel MKL (MATLAB) #124

RoyiAvital · 2020-05-05T19:49:27Z

Seeing the BLASFEO benchmarks and how it beats Intel MKL on small matrices made me want to create a MATLAB MEX wrapper for it to speed up small-matrix calculations.

The logic was: since BLASFEO beats Intel MKL in tests with no overhead, a MEX wrapper should beat MATLAB by a lot, since MATLAB only adds overhead on top of MKL and doesn't use MKL_DIRECT_CALL.
All reasons to be optimistic.

I implemented a MEX wrapper around blasfeo_dgemm() and validated it against MATLAB (the error is negligible).
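For readers who want to repeat this kind of validation outside MATLAB, here is a minimal sketch of a reference check in plain C. It is not the code from the attached wrapper: `ref_dgemm` and `max_abs_diff` are hypothetical helper names, and the sketch assumes column-major storage with tight leading dimensions (the layout MATLAB uses for full matrices). The output of any dgemm under test can be compared against the naive result.

```c
#include <assert.h>
#include <stddef.h>

/* Naive column-major reference (hypothetical helper, not part of BLASFEO):
 * D = alpha * A * B + beta * C,
 * with A of size m x k, B of size k x n, and C, D of size m x n. */
void ref_dgemm(int m, int n, int k, double alpha, const double *A,
               const double *B, double beta, const double *C, double *D)
{
    assert(m > 0 && n > 0 && k > 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * m] * B[p + (size_t)j * k];
            D[i + (size_t)j * m] = alpha * acc + beta * C[i + (size_t)j * m];
        }
    }
}

/* Largest absolute element-wise difference between two arrays of length len. */
double max_abs_diff(const double *X, const double *Y, size_t len)
{
    double worst = 0.0;
    for (size_t i = 0; i < len; ++i) {
        double d = X[i] - Y[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A result would then be accepted when `max_abs_diff` stays within a tolerance appropriate for the matrix size and data; the right threshold is a judgment call and is not specified in this thread.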

Then I did a run time analysis:

Now, the BLASFEO MEX works in place (namely, it receives a preallocated matrix to write the result into), while MATLAB has to use its regular API (it allocates the output and has overhead on the input).
Yet MATLAB is still much faster than BLASFEO compiled with the AVX2 code path.

MATLAB does use multithreading (I don't know the threshold, but I can see it on the CPU utilization graph). But even for very small matrices (sizes 2:10) MATLAB beats BLASFEO.
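For matrices this small, a single call is often shorter than the timer resolution, so one common approach is to time many repetitions and divide. The following is a minimal sketch of such a harness in C, under stated assumptions: `seconds_per_call`, `tiny_ctx`, and `tiny_matmul_kernel` are hypothetical names, and the 4x4 kernel is only a stand-in for the routine actually being measured. It uses the standard `clock()`, which reports processor time on POSIX systems, so it is only meaningful for single-threaded kernels; a multi-threaded library would need a wall-clock timer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*kernel_fn)(void *ctx);

/* Average time per call of kernel(ctx) over reps timed repetitions. */
double seconds_per_call(kernel_fn kernel, void *ctx, long reps)
{
    assert(kernel != NULL && reps > 0);
    kernel(ctx); /* warm-up call, not timed */
    clock_t start = clock();
    for (long r = 0; r < reps; ++r)
        kernel(ctx);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / (double)reps;
}

/* Stand-in workload: one 4x4 column-major product C = A * B. */
typedef struct {
    double A[16], B[16], C[16];
    long calls; /* number of times the kernel has run */
} tiny_ctx;

void tiny_matmul_kernel(void *ctx)
{
    tiny_ctx *t = (tiny_ctx *)ctx;
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i) {
            double acc = 0.0;
            for (int p = 0; p < 4; ++p)
                acc += t->A[i + 4 * p] * t->B[p + 4 * j];
            t->C[i + 4 * j] = acc;
        }
    t->calls += 1;
}
```

Timing the same kernel through a wrapper layer and directly, with the same harness, would give a rough estimate of the per-call overhead the wrapper adds.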

I think that in order to validate the results we need to use the multi-threaded version of MKL in the benchmarks.

This is the MATLAB analysis file: RunTimeAnalysis.zip.


RoyiAvital · 2020-05-07T15:13:22Z

@giaf, any thoughts on this?
Could you run your benchmarks on Linux with the multi-threaded version of MKL?

giaf · 2020-05-13T07:44:39Z

Hi @RoyiAvital
It's difficult for me to judge without playing around with it myself. There are a lot of factors affecting performance, and MEX also introduces some overhead; it doesn't come for free.
Could you also upload the source code for the MEX wrapper?

On top of that, MKL is not a sitting duck: they have a large team and, according to the MKL release notes, they have been improving their small-scale performance too.
Meanwhile, manpower here is very limited and divided among multiple projects :p
I think the benchmarks on blasfeo.syscop.de have not been updated since the BLASFEO BLAS API paper was initially submitted, so I'll try to find some time to repeat them.

As for the comparison with multi-threaded MKL, it's not an apples-to-apples comparison, as MKL's performance would scale with the number of available cores (and therefore the CPU model), while BLASFEO's would not.
I eventually want to add multi-threading capabilities to BLASFEO too; at that point such a comparison would make sense, in my opinion.

But as of now, BLASFEO is designed with the aim of providing fast single-threaded routines for matrices fitting in cache.
There are multiple applications where this is needed, e.g. the PLASMA project, and especially (for applications in our group) as the linear algebra framework for implementing embedded optimization algorithms in optimal control software such as HPIPM and acados.
This is the main aim in development.

RoyiAvital · 2020-05-13T08:47:09Z

@giaf,
I know the idea is to have single-threaded performance.
Yet I think comparing to multi-threaded MKL will give a break point showing where one should use BLASFEO and where MKL.
My suggestion to add those results isn't meant to show BLASFEO in a negative light. On the contrary, it will probably show how close you get even with a single thread.
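To make the break point idea concrete, here is a minimal sketch in C of how it could be located from two timing series measured at the same matrix sizes. The function name `first_crossover` and the choice of "first size at which the multi-threaded library is strictly faster" as the definition of the break point are assumptions for illustration, not something taken from the BLASFEO benchmarks.

```c
#include <assert.h>
#include <stddef.h>

/* Given per-size timings for a single-threaded and a multi-threaded library
 * (same sizes, same order), return the index of the first size at which the
 * multi-threaded timing is strictly smaller. Returns count if that never
 * happens in the measured range. */
size_t first_crossover(const double *t_single, const double *t_multi,
                       size_t count)
{
    assert(t_single != NULL && t_multi != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (t_multi[i] < t_single[i])
            return i;
    }
    return count;
}
```

With noisy measurements the curves can cross more than once, so a more careful version might require the multi-threaded library to stay faster over several consecutive sizes before declaring a break point.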

By the way, MEX does indeed have an overhead. But remember that MATLAB also calls MKL through its general-purpose DLL, which has at least the same overhead (probably more). Also, MATLAB uses MKL's CNR (Conditional Numerical Reproducibility) mode, which has lower performance but gives reproducible results.

This is the MEX file (change the extension to .c):

BLASFEODGEMM.txt

Note that it is not fully working, and there is no input validation, so that pure performance can be compared (it works for square matrices).
