SGEMM benchmark config for Vega FE? #122

eqy · 2017-10-16T19:26:21Z

Hi,

I'm experimenting with SGEMM performance tuning on Vega FE and get around 5 TFLOP/s max with the 5760 benchmark config. Is there a pointer to a current best config for Vega/Vega FE, closer to peak performance, that I could use as a starting point?

Thanks,

Eddie

guacamoleo · 2017-10-16T20:09:42Z

If you mean you were using sgemm_5760.yaml, that produces HIP kernels, and 5 TFLOP/s does sound like the best that our compiler can do at this time. If you use sgemm_asm.yaml, then Tensile will produce assembly kernels and you should see over 90% efficiency.

David E. Tanner
MTS Software Engineer | Radeon Technologies Group – Open Compute
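For a rough sense of what these numbers mean, here is a minimal sketch of the usual SGEMM throughput arithmetic. It is not taken from Tensile. The peak figure assumes Vega FE's 4096 stream processors, 2 FLOPs per FMA, and a 1.6 GHz peak clock, and it assumes "5760" in the config name refers to square M = N = K = 5760 matrices; both are assumptions, not something stated in this thread.

```python
# Rough SGEMM throughput arithmetic (illustrative sketch, not Tensile code).

# Assumed Vega FE FP32 peak: 4096 stream processors * 2 FLOPs (FMA) * 1.6 GHz.
ASSUMED_PEAK_FLOPS = 4096 * 2 * 1.6e9  # about 13.1 TFLOP/s


def sgemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations in one C = A * B product (multiply + add)."""
    return 2 * m * n * k


def achieved_flops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved FLOP/s for one GEMM that took `seconds` to run."""
    return sgemm_flops(m, n, k) / seconds


def efficiency(measured_flops: float, peak_flops: float = ASSUMED_PEAK_FLOPS) -> float:
    """Fraction of the assumed peak that was actually reached."""
    return measured_flops / peak_flops


if __name__ == "__main__":
    # 5 TFLOP/s (the HIP-kernel figure above) against the assumed peak.
    print(f"HIP kernels: {efficiency(5e12):.0%} of assumed peak")
```

Under those assumptions, 5 TFLOP/s is well under half of peak, which is consistent with the assembly kernels having a lot of headroom.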

eqy · 2017-10-16T20:32:24Z

Is it possible to run an apples-to-apples comparison (in terms of input size) between the sgemm_5760 config and the sgemm_asm.yaml config?
I'm getting "terminate called after throwing an instance of 'std::bad_alloc'", so I'm not sure whether that's due to an input size that is too large.

guacamoleo · 2017-10-16T21:16:03Z

sgemm_5760.yaml is an example of non-batched gemm while sgemm_asm.yaml is an example of batched gemm. In the sgemm_asm.yaml file, find all instances of “Batched: True” and change them to False. This should make the two much more similar.

David E. Tanner
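If you would rather not edit the file by hand, the find-and-replace described above could be scripted along these lines. This is only a sketch: `unbatch` is a hypothetical helper name, it works on the file contents as plain text rather than parsing the YAML, and it assumes the setting is spelled exactly "Batched: True" as quoted above.

```python
import re


def unbatch(yaml_text: str) -> tuple[str, int]:
    """Replace every 'Batched: True' with 'Batched: False'.

    Returns the rewritten text and the number of replacements made.
    Plain-text substitution only; the YAML is not parsed.
    """
    return re.subn(r"(\bBatched:\s*)True\b", r"\1False", yaml_text)


if __name__ == "__main__":
    # Hypothetical snippet standing in for the contents of sgemm_asm.yaml.
    sample = "ProblemType:\n  OperationType: GEMM\n  Batched: True\n"
    new_text, count = unbatch(sample)
    print(f"replaced {count} occurrence(s)")
    print(new_text)
```

To apply it to a real config, read sgemm_asm.yaml into a string, pass it through the function, and write the result to a copy of the file so the original stays untouched.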

eqy · 2017-10-16T21:16:42Z

Great, thanks!

eqy closed this as completed Oct 16, 2017
