QUDA with MILC #313

Closed
Drink7 opened this issue Jul 10, 2015 · 22 comments

Comments

@Drink7

Drink7 commented Jul 10, 2015

I have a problem using QUDA with MILC.
I'm using the MILC application ks_imp_rhmc, and my Makefile settings are below.
# Compiler
CC = mpicc
# Linker
LD = mpicxx
# QUDA options
WANTQUDA = true
WANT_GF_GPU = true

The other QUDA options are commented out.
An executable is generated, but it doesn't seem to run on the GPU.
I also don't know what some of the warning messages mean. I see these two warnings in the log:

WARNING: Failed to determine NUMA affinity for device 0 (possibly not applicable)
WARNING: Cache file not found. All kernels will be re-tuned (if tuning is enabled).

Can someone help me? Thanks.
(My QUDA version is 0.7.1.)

@AlexVaq
Member

AlexVaq commented Jul 10, 2015

Well, these two warning messages look QUDA-generated. What makes you think you're not running on the GPU? Did you try logging in to the node you were running on and calling nvidia-smi to see the GPU workload?

Also take into account that, if there is no cache file, the first run might take a while...
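
A single nvidia-smi snapshot can miss short bursts of GPU activity, so it may help to sample the utilization repeatedly while the job runs. Below is a minimal Python sketch, not part of QUDA or MILC. The helper names (`parse_utilization`, `sample_utilization`, `peak_per_gpu`) are made up for illustration, and the sketch assumes nvidia-smi is on the PATH of the compute node.

```python
import subprocess
import time


def parse_utilization(csv_text):
    """Parse the output of
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`
    into one value per GPU (an int percentage, or None if not numeric)."""
    values = []
    for line in csv_text.splitlines():
        field = line.strip()
        if not field:
            continue
        # Some GPUs report a non-numeric placeholder; record it as None.
        values.append(int(field) if field.isdigit() else None)
    return values


def sample_utilization(samples=30, interval_s=1.0):
    """Poll nvidia-smi `samples` times, `interval_s` seconds apart.
    Returns a list with one per-GPU utilization list per sample."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=utilization.gpu",
        "--format=csv,noheader,nounits",
    ]
    history = []
    for _ in range(samples):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        history.append(parse_utilization(result.stdout))
        time.sleep(interval_s)
    return history


def peak_per_gpu(history):
    """Highest utilization seen for each GPU across all samples."""
    peaks = []
    for column in zip(*history):
        seen = [v for v in column if v is not None]
        peaks.append(max(seen) if seen else None)
    return peaks
```

Calling `peak_per_gpu(sample_utilization())` on the node during the run gives the highest utilization seen per GPU. A peak that stays at 0 over a long window would suggest that nothing is being offloaded.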

@Drink7
Author

Drink7 commented Jul 10, 2015

The GPU I use is a Tesla K40. I called nvidia-smi to check the workload and got this output:
[image: nvidia-smi output]

Because GPU-Util shows 0%, I think the executable is not actually running on the GPU.

@AlexVaq
Member

AlexVaq commented Jul 10, 2015

Which part of QUDA are you trying to use exactly? Did you enable VERBOSE output?

@Drink7
Author

Drink7 commented Jul 10, 2015

The ks_imp_rhmc application seems to use the HISQ fermion force and the gauge tools. I just used the configure file in QUDA's directory named configure.milc.titan to build QUDA.
That configure file doesn't enable VERBOSE, so I'll add it and build again.

@AlexVaq
Member

AlexVaq commented Jul 10, 2015

It might give you some info, but I don’t guarantee anything. To tell you the truth, I’m not familiar at all with the MILC code. However, I contributed to the gauge tools... Which of the gauge tools are you using exactly?

@Drink7
Author

Drink7 commented Jul 10, 2015

In the MILC code, the README for the ks_imp_rhmc application says that the measurements include the plaquette..., so I added the flag to the configure file. But I'm not sure whether it will work with the executables that ks_imp_rhmc generates.

@mathiaswagner
Member

Which versions of MILC and QUDA do you use?

Can you share your MILC Makefile and QUDA make.inc?

@mathiaswagner
Member

The NUMA affinity message is from QUDA. It is a known issue that might affect performance on some systems.

@Drink7
Author

Drink7 commented Jul 10, 2015

I use the latest version of MILC, 7.7.11, and version 0.7.1 of QUDA.
Here is my MILC Makefile:
http://codepad.org/2EaY2rJp
And my QUDA make.inc:
http://codepad.org/9cG6fxmQ

@mathiaswagner
Member

Thanks. I will try to have a look later.

Did you try to run one of the test input files in the ks_imp_rhmc/test directory? Which binary exactly did you use in the ks_imp_rhmc directory? su3_rhmc_hisq?

@mathiaswagner
Member

From my first look in your Makefile:

You only offload the gauge force to the GPU. Everything else is kept on the CPU. That should explain why your GPU is idle most of the time. I assume the code is running on the GPU, but only for the gauge force.

Things you can check to verify it is running on the GPU:

  • at the end QUDA should print some timing information:
computeGaugeForceQuda Total time = 6.55167 secs
download     = 3.397669 secs (  51.9%), with       12 calls at 2.831391e+05 us per call
upload     = 1.160581 secs (  17.7%), with        6 calls at 1.934302e+05 us per call
init     = 0.042459 secs ( 0.648%), with       12 calls at 3.538250e+03 us per call
compute     = 1.926511 secs (  29.4%), with        6 calls at 3.210852e+05 us per call
free     = 0.020827 secs ( 0.318%), with        6 calls at 3.471167e+03 us per call
constant     = 0.003388 secs (0.0517%), with       12 calls at 2.823333e+02 us per call
total accounted       = 6.551435 secs (   100%)
total missing         = 0.000236 secs (0.0036%)
  • for the calls of the gauge force you should see lines similar to
QUDA_MILC_INTERFACE: qudaGaugeForce (called)
QUDA_MILC_INTERFACE: qudaGaugeForce (return)
GFTIME:   time = 1.507170e+00 (Symanzik1_QUDA) mflops = 1.064487e+05

You might also want to put the inversions on the GPU by setting
WANT_FN_CG_GPU = true in the MILC Makefile.

If you still have trouble, feel free to share your output file. To reduce its length you can change line 216 in the MILC Makefile to

CGPU += -DSET_QUDA_VERBOSE # -DSET_QUDA_SUMMARIZE
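
The two checks listed above can be automated on a saved log. The following Python sketch is only an illustration, not a QUDA tool: the function names are hypothetical, and the regular expression assumes the profile lines have exactly the layout shown in the sample output above, which may differ between QUDA versions.

```python
import re

# Matches profile lines such as
# "compute     = 1.926511 secs (  29.4%), with        6 calls at ..."
PHASE_RE = re.compile(
    r"^\s*(\w+)\s*=\s*([0-9.eE+-]+)\s+secs\s*\(\s*([0-9.eE+-]+)%\),"
    r"\s*with\s+(\d+)\s+calls"
)


def parse_gauge_force_profile(log_text):
    """Collect the per-phase timings printed after the
    'computeGaugeForceQuda Total time' line, up to 'total accounted'."""
    phases = {}
    in_block = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("computeGaugeForceQuda Total time"):
            in_block = True
            continue
        if not in_block:
            continue
        match = PHASE_RE.match(line)
        if match:
            name, secs, percent, calls = match.groups()
            phases[name] = {
                "secs": float(secs),
                "percent": float(percent),
                "calls": int(calls),
            }
        elif stripped.startswith("total accounted"):
            break
    return phases


def count_gauge_force_calls(log_text):
    """Count the interface lines that mark a gauge-force call."""
    return log_text.count("QUDA_MILC_INTERFACE: qudaGaugeForce (called)")


def gauge_force_was_offloaded(log_text):
    """Heuristic: a 'compute' phase with at least one call suggests
    the gauge force ran through QUDA."""
    compute = parse_gauge_force_profile(log_text).get("compute")
    return compute is not None and compute["calls"] > 0
```

Reading the job output with `open(path).read()` and passing it to `gauge_force_was_offloaded` gives a quick yes/no, while `parse_gauge_force_profile` shows how the time splits between transfers (download, upload) and compute.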

@Drink7
Author

Drink7 commented Jul 10, 2015

Yes, I've run the test input file in the ks_imp_rhmc/test directory before, using the executable su3_rhmd_hisq in double precision. I then called nvidia-smi, and it showed the GPU information above.

So should I enable all the QUDA options, or just change this one?
WANT_FN_CG_GPU = true

@mathiaswagner
Member

Well, your nvidia-smi output shows that the GPU is used. But with only the gauge force on the GPU the utilization is probably pretty low. That is what you see.
How long does the execution take and what does QUDA print for computeGaugeForceQuda at the end of the run?

If you want to put the inversion on the GPU, WANT_FN_CG_GPU = true is sufficient, but you may also set everything to true. Just give it a try.
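
To see at a glance which offload switches are active in a MILC Makefile, something like the following Python sketch could be used. It is an illustration only: `quda_flags` is a hypothetical helper, it checks just the three macro names mentioned in this thread, and it assumes each is set with a plain `NAME = value` line and disabled by commenting the line out with `#`.

```python
import re

# Macro names taken from this thread; a real MILC Makefile has more.
THREAD_FLAGS = ("WANTQUDA", "WANT_GF_GPU", "WANT_FN_CG_GPU")

ASSIGN_RE = re.compile(r"^\s*(\w+)\s*[:?]?=\s*(\S+)")


def quda_flags(makefile_text, names=THREAD_FLAGS):
    """Return {name: bool} telling whether each macro is set to 'true'
    on an uncommented line. Macros that never appear count as False."""
    flags = {name: False for name in names}
    for line in makefile_text.splitlines():
        code = line.split("#", 1)[0]  # drop Makefile comments
        match = ASSIGN_RE.match(code)
        if match and match.group(1) in flags:
            flags[match.group(1)] = match.group(2).lower() == "true"
    return flags
```

With the settings from the first post, `quda_flags` would report `WANTQUDA` and `WANT_GF_GPU` as enabled and `WANT_FN_CG_GPU` as disabled, which matches the observation that only the gauge force is offloaded.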

@detar
Contributor

detar commented Jul 10, 2015

For the ks_imp_rhmc applications, you will need the full suite of HISQ
evolution modules.

Perhaps the following example Makefile for ks_imp_rhmc would help

http://www.physics.utah.edu/~detar/milc/Makefile-Drink7

This is for a somewhat later version of the MILC code than 7.7.11, but
the QUDA macros should still be OK.

Carleton DeTar
Department of Physics and Astronomy
University of Utah

@Drink7
Author

Drink7 commented Jul 11, 2015

OK. I'll try setting those options to true and see how they change GPU performance.
I'll also try to relink QUDA with MILC later.
I forgot to save the execution output to a log file, so I'll run the test input again and check the information you mentioned.

Thank you very much for your help!

@stevengottlieb
Member

Are you part of the student cluster competition? If so, you should use the MILC tarball prepared for the competition.

@Drink7
Author

Drink7 commented Jul 11, 2015

Yes. I ran into some problems building QUDA with MILC and was trying to ask for help.
Do you mean all the applications in MILC 7.7.11 (like ks_imp_dyn, pure_gauge and others), or just ks_imp_rhmc?

@mathiaswagner
Member

If this is part of the student cluster competition I would prefer to take the further support away from the QUDA bug tracker. I think QUDA performs as expected.

@stevengottlieb, @detar: Do you provide the support for the student cluster competition?

@detar
Contributor

detar commented Jul 12, 2015

Could you please introduce yourself?

Are you part of the student cluster competition?

@stevengottlieb
Member

Please use the google group set up for the student cluster competition,
not the github developers list. I agree with Mathias Wagner that this
discussion belongs elsewhere.

Read the instructions on the competition webpage for MILC that were
recently updated. There is a specific tarball for the competition that
has a restricted set of code. There are also more test cases.

I will no longer respond to github posts on this issue and will
encourage others to do the same.

@Drink7
Author

Drink7 commented Jul 13, 2015

I should use the Google group to ask for help, not GitHub.
I'm sorry, and I'll close this issue.

Drink7 closed this as completed Jul 13, 2015

@mathiaswagner
Member

Thanks for moving that to the right place.

@stevengottlieb, @detar: If anything QUDA-related comes up during the cluster competition, please feed it back here. Also, if you would like some of the QUDA developers to occasionally look into issues popping up in the student competition, let us know.
