This repository has been archived by the owner on Aug 5, 2022. It is now read-only.

pyMIC dgemm performance? #6

Closed
hhuuggoo opened this issue Aug 12, 2015 · 3 comments

Comments

@hhuuggoo

I'm getting around 100 GFLOPS using the dgemm.py example for 4096x4096 matrices vs ~300 GFLOPS reported in http://www.dlr.de/sc/Portaldata/15/Resources/dokumente/pyhpc2014/submissions/pyhpc2014_submission_8.pdf.

Any ideas why?

Thanks
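
For reference when comparing the two figures: DGEMM rates are commonly derived by counting 2*n^3 floating-point operations for an n x n multiply. The sketch below assumes that convention; whether dgemm.py counts exactly this way is not confirmed in this thread, and the function name is hypothetical, not part of pyMIC.

```python
def dgemm_gflops(n, seconds):
    """GFLOPS for an n x n by n x n multiply, assuming 2 * n**3 flops."""
    flops = 2.0 * n ** 3          # one multiply and one add per inner-loop step
    return flops / seconds / 1e9  # convert flop/s to GFLOPS


if __name__ == "__main__":
    n = 4096
    # Wall time per multiply implied by each of the two rates in question.
    for rate in (100.0, 300.0):
        seconds = 2.0 * n ** 3 / (rate * 1e9)
        print("%5.0f GFLOPS -> %.3f s per %dx%d dgemm" % (rate, seconds, n, n))
```

Under this count a 4096x4096 multiply is about 137.4 GFLOP of work, so timing a single call by hand is a quick way to cross-check the number the example prints.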

@mjklemm
Contributor

mjklemm commented Aug 13, 2015

Hi,

May I ask what setting you used and what coprocessor is in the system?

Can you please try with the following settings:

PYMIC_KMP_AFFINITY=granularity=fine,balanced,verbose ./dgemm.py

That enables core pinning of the OpenMP threads of MKL's dgemm on the device. If it works, you will see some additional output from the OpenMP runtime, indicating which OpenMP thread of the dgemm runs on which core.
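
If the benchmark is launched from another Python script rather than from the shell, the same setting can be passed through the child process environment. This is only a sketch: the helper names are hypothetical, and it assumes dgemm.py sits in the current directory as in the command above.

```python
import os
import subprocess
import sys


def pymic_env(affinity="granularity=fine,balanced,verbose", base=None):
    """Return a copy of the environment with PYMIC_KMP_AFFINITY set."""
    env = dict(os.environ if base is None else base)
    env["PYMIC_KMP_AFFINITY"] = affinity
    return env


def run_dgemm(script="./dgemm.py"):
    """Run the example with the affinity setting; returns its exit code."""
    # script path is an assumption; adjust to where the pyMIC examples live
    return subprocess.call([sys.executable, script], env=pymic_env())
```

Calling run_dgemm() is then equivalent to the one-line shell command above; dropping ",verbose" from the affinity string silences the extra OpenMP runtime output once the pinning has been checked.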

Let me know if this helps to improve the performance.

Cheers,
-michael

Dr.-Ing. Michael Klemm

@hhuuggoo
Author

I'm using a Phi 31S1P. I haven't configured any settings other than the env vars you prescribed - do you know what settings were used in that paper, and which env vars to set?

thanks

@luisfeli

Hi,

An alternative way to measure the performance of Xeon Phi cards is to use the micperf package, which is included in MPSS. As described in this thread, https://software.intel.com/en-us/forums/topic/498633, you can install micperf by running the following from the directory where you untarred the MPSS tarball:

$ sudo yum install perf/micperf*.rpm
$ export PYTHONPATH=/usr/src/micperf/micp:${PYTHONPATH}
$ export PATH=/usr/src/micperf/micp/micp/scripts:${PATH}

Then make sure compilervars.sh has been sourced (for Composer 2013, you can get the redistributable version from https://software.intel.com/en-us/articles/redistributable-libraries-for-the-intel-c-and-fortran-composer-xe-2013-sp1-for-linux) and run DGEMM:

$ micprun -k dgemm -c optimal # -c optimal tells micperf to use the optimal parameters for DGEMM

If you are only interested in the environment variables micperf sets before executing DGEMM, you can look at the micperf dgemm.py source code (/usr/src/micperf/micp-/micp/kernels/dgemm.py):

            'MIC_BUFFERSIZE':'256M',
            'MKL_MIC_ENABLE':'1',
            'MKL_MIC_DISABLE_HOST_FALLBACK':'1',
            'LD_LIBRARY_PATH': <PATH >,
            'MIC_LD_LIBRARY_PATH':self.mic_ld_library_path(),
            'MIC_ENV_PREFIX':'MIC',
            'MIC_OMP_NUM_THREADS':str(numThreads),
            'KMP_AFFINITY':'compact,1,0',
            'MIC_KMP_AFFINITY':'explicit,granularity=fine,proclist=[1-' + str(numThreads) + ':1]',
            'MIC_USE_2MB_BUFFERS':'16K',
            'MKL_MIC_MAX_MEMORY':maxMemory + 'G'
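
To try these values by hand, the excerpt can be turned into shell export lines. The sketch below is an assumption-laden rewrite, not micperf code: the function names are hypothetical, num_threads and max_memory_gb must be chosen for your card, and the two library-path entries are left out because their values are elided above. Whether these settings affect pyMIC's dgemm.py is not established in this thread.

```python
import shlex


def mic_dgemm_env(num_threads, max_memory_gb):
    """Rebuild the excerpt's settings for the given (assumed) parameters."""
    proclist = "[1-%d:1]" % num_threads
    return {
        "MIC_BUFFERSIZE": "256M",
        "MKL_MIC_ENABLE": "1",
        "MKL_MIC_DISABLE_HOST_FALLBACK": "1",
        "MIC_ENV_PREFIX": "MIC",
        "MIC_OMP_NUM_THREADS": str(num_threads),
        "KMP_AFFINITY": "compact,1,0",
        "MIC_KMP_AFFINITY": "explicit,granularity=fine,proclist=" + proclist,
        "MIC_USE_2MB_BUFFERS": "16K",
        "MKL_MIC_MAX_MEMORY": "%dG" % max_memory_gb,
    }


def as_exports(env):
    """Format the settings as shell 'export' lines, quoting where needed."""
    return "\n".join(
        "export %s=%s" % (key, shlex.quote(value))
        for key, value in sorted(env.items())
    )


if __name__ == "__main__":
    # 224 threads and 4 GB are placeholder values, not a recommendation.
    print(as_exports(mic_dgemm_env(num_threads=224, max_memory_gb=4)))
```

The printed lines can be pasted into a shell or saved to a file and sourced before running the benchmark.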

Hope this helps,

Luis
Thanks

@mjklemm mjklemm closed this as completed Jun 1, 2016