pyMIC dgemm performance? #6

hhuuggoo · 2015-08-12T20:03:04Z

I'm getting around 100 GFLOPS using the dgemm.py example for 4096x4096 matrices vs ~300 GFLOPS reported in http://www.dlr.de/sc/Portaldata/15/Resources/dokumente/pyhpc2014/submissions/pyhpc2014_submission_8.pdf.

Any ideas why?

Thanks
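For context on the numbers being compared: DGEMM rates are conventionally computed from an operation count of about 2mnk (one multiply and one add per inner-product term) divided by the measured wall time. Below is a minimal Python sketch, assuming both dgemm.py and the paper use this standard count (not confirmed in this thread); the helper names are hypothetical. For square 4096x4096 matrices, 100 GFLOPS corresponds to about 1.37 s per multiplication and 300 GFLOPS to about 0.46 s.

```python
def dgemm_flops(m: int, n: int, k: int) -> int:
    """Conventional operation count for C = alpha*A@B + beta*C, A is m x k, B is k x n.

    Each of the m*n output elements needs k multiplies and k adds, so about
    2*m*n*k floating-point operations (lower-order alpha/beta terms ignored).
    """
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved rate in GFLOPS for one dgemm call that took `seconds`."""
    return dgemm_flops(m, n, k) / seconds / 1e9


def seconds_at(m: int, n: int, k: int, rate_gflops: float) -> float:
    """Wall time one dgemm call would take at a sustained rate of `rate_gflops`."""
    return dgemm_flops(m, n, k) / (rate_gflops * 1e9)


if __name__ == "__main__":
    n = 4096
    for rate in (100.0, 300.0):
        print(f"{rate:.0f} GFLOPS -> {seconds_at(n, n, n, rate):.3f} s per {n}x{n} dgemm")
```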

mjklemm · 2015-08-13T06:00:27Z

Hi,

May I ask what settings you used and what coprocessor is in the system?

Can you please try with the following settings:

PYMIC_KMP_AFFINITY=granularity=fine,balanced,verbose ./dgemm.py

That enables core pinning of the OpenMP threads of MKL's dgemm on the device. If it works, you will see some additional output from the OpenMP runtime indicating which core each OpenMP thread of the dgemm runs on.
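The shell one-liner sets the variable for that single run only. The same launch can be done from Python; this is a sketch that assumes dgemm.py is executable in the current directory, and env_with_pymic_affinity and run_dgemm_example are hypothetical helper names, not part of pyMIC. It relies only on the PYMIC_ prefix routing the affinity setting to the device, as described above.

```python
import os
import subprocess

DEFAULT_AFFINITY = "granularity=fine,balanced,verbose"


def env_with_pymic_affinity(base, affinity=DEFAULT_AFFINITY):
    """Return a copy of `base` with PYMIC_KMP_AFFINITY set; `base` itself is not modified."""
    env = dict(base)
    env["PYMIC_KMP_AFFINITY"] = affinity
    return env


def run_dgemm_example(script="./dgemm.py", affinity=DEFAULT_AFFINITY):
    """Equivalent to `PYMIC_KMP_AFFINITY=<affinity> ./dgemm.py` in the shell.

    check=True raises CalledProcessError if the script exits with a non-zero status.
    """
    env = env_with_pymic_affinity(os.environ, affinity)
    return subprocess.run([script], env=env, check=True)
```

Calling run_dgemm_example() behaves like the shell command; because of the verbose modifier, the OpenMP runtime's thread-binding messages should appear in the script's output if the setting reached the device.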

Let me know whether this improves the performance.

Cheers,
-michael

hhuuggoo · 2015-08-13T15:06:11Z

I'm using a Xeon Phi 31S1P. I haven't configured any settings other than the env var you suggested. Do you know what settings were used in that paper, and which env vars I should set?

Thanks

luisfeli · 2015-08-13T17:12:05Z

Hi,

An alternative way to measure the performance of Xeon Phi cards is the micperf package, which is included in MPSS. As described in this thread https://software.intel.com/en-us/forums/topic/498633, install micperf by running the following from the directory where you untarred the MPSS tarball:

$ sudo yum install perf/micperf*.rpm
$ export PYTHONPATH=/usr/src/micperf/micp:${PYTHONPATH}
$ export PATH=/usr/src/micperf/micp/micp/scripts:${PATH}

Then make sure compilervars.sh has been sourced (for Composer XE 2013, you can get the redistributable version from https://software.intel.com/en-us/articles/redistributable-libraries-for-the-intel-c-and-fortran-composer-xe-2013-sp1-for-linux) and run DGEMM:

$ micprun -k dgemm -c optimal # -c optimal tells micperf to use the optimal parameters for DGEMM

If you are only interested in the environment variables micperf sets before executing DGEMM, you can look at the micperf dgemm.py source code (/usr/src/micperf/micp-/micp/kernels/dgemm.py):

            'MIC_BUFFERSIZE':'256M',
            'MKL_MIC_ENABLE':'1',
            'MKL_MIC_DISABLE_HOST_FALLBACK':'1',
            'LD_LIBRARY_PATH': <PATH>,
            'MIC_LD_LIBRARY_PATH':self.mic_ld_library_path(),
            'MIC_ENV_PREFIX':'MIC',
            'MIC_OMP_NUM_THREADS':str(numThreads),
            'KMP_AFFINITY':'compact,1,0',
            'MIC_KMP_AFFINITY':'explicit,granularity=fine,proclist=[1-' + str(numThreads) + ':1]',
            'MIC_USE_2MB_BUFFERS':'16K',
            'MKL_MIC_MAX_MEMORY':maxMemory + 'G'
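
The listing can be read as a function of the thread count. The Python sketch below rebuilds it; the thread count, memory cap, and library paths are parameters because micperf computes them at run time, and all function names are hypothetical. With MIC_ENV_PREFIX=MIC, MIC_-prefixed variables are forwarded to the coprocessor with the prefix stripped. The pymic_device_env helper is an assumption, not something confirmed in this thread: it carries only the device-side thread count and pinning over to a pyMIC run by swapping MIC_ for PYMIC_, by analogy with PYMIC_KMP_AFFINITY.

```python
def device_openmp_settings(num_threads):
    """Device-side OpenMP settings from micperf's DGEMM listing, without any prefix."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return {
        "OMP_NUM_THREADS": str(num_threads),
        # Explicit pinning: OpenMP thread i is bound to logical proc i+1,
        # so the threads cover procs 1..num_threads with stride 1.
        "KMP_AFFINITY": f"explicit,granularity=fine,proclist=[1-{num_threads}:1]",
    }


def micperf_dgemm_env(num_threads, max_memory_gb, ld_library_path, mic_ld_library_path):
    """Environment micperf sets before running its DGEMM kernel (values from the listing)."""
    env = {
        "MIC_BUFFERSIZE": "256M",
        "MKL_MIC_ENABLE": "1",
        "MKL_MIC_DISABLE_HOST_FALLBACK": "1",
        "LD_LIBRARY_PATH": ld_library_path,
        "MIC_LD_LIBRARY_PATH": mic_ld_library_path,
        "MIC_ENV_PREFIX": "MIC",
        "KMP_AFFINITY": "compact,1,0",  # host-side pinning, no prefix
        "MIC_USE_2MB_BUFFERS": "16K",
        "MKL_MIC_MAX_MEMORY": f"{max_memory_gb}G",
    }
    # MIC_-prefixed copies of the device settings (prefix stripped on the coprocessor).
    env.update({"MIC_" + k: v for k, v in device_openmp_settings(num_threads).items()})
    return env


def pymic_device_env(num_threads):
    # ASSUMPTION: pyMIC forwards PYMIC_-prefixed variables to the device,
    # as it does for PYMIC_KMP_AFFINITY.
    return {"PYMIC_" + k: v for k, v in device_openmp_settings(num_threads).items()}
```

For example, pymic_device_env(224) yields the two variables one might export before running dgemm.py; which thread count works best on a 31S1P is not established in this thread.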

Hope this helps,

Luis
Thanks

mjklemm closed this as completed Jun 1, 2016
