Can't compile with CUDA and MEMORY_INSTALLED_PER_CORE_IN_GB=6.d0 #186

Closed
luet opened this issue Jul 22, 2014 · 8 comments


luet commented Jul 22, 2014

Problem description

I get a compile error when building the CUDA version of specfem3d_globe with MEMORY_INSTALLED_PER_CORE_IN_GB > 4.

When setting MEMORY_INSTALLED_PER_CORE_IN_GB=6.d0, the error I get is:

 size of static arrays per slice =    4504.6434200000003       MB
                                 =    4295.9627342224121       MiB
                                 =    4.5046434199999998       GB
                                 =    4.1952761076390743       GiB

    (should be below 80% or 90% of the memory installed per core)
    (if significantly more, the job will not run by lack of memory)
    (note that if significantly less, you waste a significant amount
     of memory per processor core)
    (but that can be perfectly acceptable if you can afford it and
     want faster results by using more cores)

 size of static arrays for all slices =    108.11144208000000       GB
                                      =    100.68662658333778       GiB
                                      =   0.10811144207999999       TB
                                      =   9.83267837727908045E-002  TiB

 *******************************************************************************
 Estimating optimal disk dumping interval for UNDO_ATTENUATION:
 *******************************************************************************

STOP you are using more memory than what you told us is installed!!! there is an error

make: *** [OUTPUT_FILES/values_from_mesher.h] Error 1
make: *** Waiting for unfinished jobs....

I need more than 4 GB per core because NEX_XI=256. If I use NEX_XI=128, I can compile and run without problems.
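
For reference, this limit is a compile-time constant rather than a Par_file setting, so changing it forces a rebuild. Assuming the constant lives in setup/constants.h (the path is an assumption), it can be located with:

    # find where the memory limit is declared (path assumed)
    grep -n "MEMORY_INSTALLED_PER_CORE_IN_GB" setup/constants.h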

I have posted two Par_files on an external web site (http://geoweb3.princeton.edu/~luet/), since I don't think we can upload files on GitHub. Those files are:

  1. Par_file_GPU_movie (NEX_XI=128)
  2. Par_file_GPU_syn (NEX_XI=256)

If I set MEMORY_INSTALLED_PER_CORE_IN_GB=10.d0, the compilation goes further but fails at link time: see link_error.txt.

Configure and compile

I configure with:

   configure FC=gfortran CC=gcc MPIFC=mpif90 MPICC=mpicc --with-cuda=cuda5

The problem occurs with both the GNU and Intel compilers.

I use CUDA version 5.5.22.

@luet luet added the bug label Jul 22, 2014
komatits commented Jul 22, 2014

Hi David,

MEMORY_INSTALLED_PER_CORE_IN_GB is unrelated to CUDA; it is used by UNDO_ATTENUATION. Just set UNDO_ATTENUATION = .false. in DATA/Par_file (and update to the latest version of "devel", in which UNDO_ATTENUATION was improved last week).
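
For example, as a one-line edit (a sketch; match the whitespace used in your Par_file):

    # switch UNDO_ATTENUATION off in the run parameters
    sed -i 's/^UNDO_ATTENUATION *=.*/UNDO_ATTENUATION                = .false./' DATA/Par_file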

Thanks,
Dimitri.

Dimitri Komatitsch
CNRS Research Director (DR CNRS), Laboratory of Mechanics and Acoustics,
UPR 7051, Marseille, France http://komatitsch.free.fr


QuLogic commented Jul 22, 2014

For the link error, since you have large static arrays, you will need to add -mcmodel=medium to both FCFLAGS and CFLAGS. Also -shared-intel if using Intel compilers. Alternatively, use more processors or a lower resolution.
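
For example, a sketch based on the configure line above (drop -shared-intel when using the GNU compilers):

    ./configure FC=ifort CC=icc MPIFC=mpif90 MPICC=mpicc \
        FCFLAGS="-mcmodel=medium -shared-intel" \
        CFLAGS="-mcmodel=medium -shared-intel" \
        --with-cuda=cuda5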


luet commented Jul 22, 2014

@komatits you are right. I was not using the latest devel version. I got past this error.


luet commented Jul 22, 2014

@QuLogic this is the fix indeed. But I think there might be a problem with the configure script.
If I do

./configure FC=ifort CC=icc MPIFC=mpif90 MPICC=mpicc CFLAGS="-mcmodel=large -shared-intel"  FCFLAGS="-mcmodel=large -shared-intel"  --with-opencl

the make step fails. The problem is that ifort is never passed the -mcmodel=large and -shared-intel options.

But if I try to trick it by doing:

./configure FC="ifort -mcmodel=large -shared-intel"  CC=icc MPIFC=mpif90 MPICC=mpicc CFLAGS="-mcmodel=large -shared-intel"  --with-opencl

it works.
Am I doing something wrong?
Thanks,
David


QuLogic commented Jul 22, 2014

You are correct; for some reason FCFLAGS is commented out. I do not know why; maybe @komatits knows?
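
One way to confirm this (a sketch, assuming configure writes a top-level Makefile) is to check whether the FCFLAGS you passed to configure actually reach the generated Makefile:

    # if FCFLAGS shows up commented out (or not at all), configure dropped it
    grep -n "FCFLAGS" Makefile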

PS: you should add -O2 (or similar) to CFLAGS; optimization flags are not specified automatically. Maybe we should change that.
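
For instance (a sketch; note that until FCFLAGS is uncommented in the configure script, the Fortran flags may need to be folded into FC as in the workaround earlier in this thread):

    ./configure FC=ifort CC=icc MPIFC=mpif90 MPICC=mpicc \
        CFLAGS="-O2 -mcmodel=large -shared-intel" \
        FCFLAGS="-O2 -mcmodel=large -shared-intel" \
        --with-cuda=cuda5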


luet commented Jul 22, 2014

Good point about -O2. I would agree that this could be set by default.

komatits commented

Not a bug apparently (?).

komatits commented

Fixed by David (@luet) by uncommenting FCFLAGS.
