
Load rocm for setonix CPUs #571

Closed
wants to merge 4 commits into from


psharda
Contributor

@psharda psharda commented Mar 18, 2024

Description

We need to load ROCm when compiling on Setonix CPUs.

Related issues

Fixes #563

Checklist

Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an x inside the square brackets [ ] in the Markdown source below:

  • I have added a description (see above).
  • I have added a link to any related issues (see above).
  • I have read the Contributing Guide.
  • I have added tests for any new physics that this PR adds to the code.
  • I have tested this PR on my local computer and all tests pass.
  • I have manually triggered the GPU tests with the magic comment /azp run.
  • I have requested a reviewer for this PR.

@psharda
Contributor Author

psharda commented Mar 18, 2024

/azp run


Azure Pipelines successfully started running 5 pipeline(s).

@BenWibking
Collaborator

For production runs on CPUs, it might be better to avoid the Cray compiler altogether and use gcc instead.


# GPU-aware MPI
export MPICH_GPU_SUPPORT_ENABLED=0

# compiler environment hints
export CC=$(which cc)
# suggested change (replaces: export CXX=$(which CC))
export CXX="$(which CC) -fno-cray"
@psharda
Contributor Author


@BenWibking will this fix #566 for CPUs? Will it work even if the compiler is not Cray?

@BenWibking
Collaborator


It's probably a better idea to detect the Cray compiler based on the value of CMAKE_CXX_COMPILER_ID, e.g.

if(CMAKE_CUDA_COMPILER_ID STREQUAL "Clang")

except the appropriate compiler ID is "CrayClang" (https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER_ID.html).
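To see which ID CMake actually reports for a given compiler before writing that `if()` check, one option is to configure a throwaway project and print `CMAKE_CXX_COMPILER_ID`. A minimal sketch, assuming `cmake` 3.13 or newer (for `-S`/`-B`) is on `PATH`; the function name `print_cxx_compiler_id` is hypothetical:

```sh
# Hypothetical helper: print the CMAKE_CXX_COMPILER_ID that CMake detects
# for the current $CXX (or its default C++ compiler) by configuring a
# throwaway project in a temporary directory.
print_cxx_compiler_id() {
  probe_dir=$(mktemp -d) || return 1
  cat > "$probe_dir/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.13)
project(compiler_id_probe LANGUAGES CXX)
message(STATUS "CXX_ID=${CMAKE_CXX_COMPILER_ID}")
EOF
  # STATUS messages go to stdout prefixed with "-- "; keep only our line.
  cmake -S "$probe_dir" -B "$probe_dir/build" 2>/dev/null |
    sed -n 's/^-- CXX_ID=//p'
  rm -rf "$probe_dir"
}
```

For example, after `export CXX=$(which CC)` on a Setonix login node, running `print_cxx_compiler_id` should show whether CMake sees `CrayClang` (the ID named in the CMake documentation above) or something else, which is the value the `if()` check would need to compare against.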

@psharda
Contributor Author


oops, put it in the wrong place

BenWibking previously approved these changes Mar 18, 2024
@psharda
Contributor Author

psharda commented Mar 18, 2024

Ugh, using -fno-cray seems to give a linker error when compiling on Setonix CPUs.

@psharda psharda requested a review from BenWibking March 18, 2024 20:46
@psharda psharda dismissed BenWibking’s stale review March 18, 2024 20:46

fails to build on setonix

@BenWibking
Collaborator

@chongchonghe got it to work on Setonix -- maybe @chongchonghe can help?

@chongchonghe
Contributor

On top of the setonix-gpu.profile, I had to add -ffp=0 to both CFLAGS= and CXXFLAGS=; with that, the code compiles and produces correct results for some pure hydro simulations. The radiation runs failed.
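Appending the flag, rather than overwriting `CFLAGS`/`CXXFLAGS`, keeps whatever the profile already sets. A minimal POSIX `sh` sketch; `add_flag` is a hypothetical helper. Note that CMake only reads `CFLAGS`/`CXXFLAGS` from the environment on the first configure of a fresh build directory, after which the values live in the cache:

```sh
# Hypothetical helper: print flag string $1 with flag $2 appended,
# unless $2 is already one of the space-separated words in $1.
add_flag() {
  case " $1 " in
    *" $2 "*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "${1:+$1 }$2" ;;
  esac
}

# Add -ffp=0 on top of whatever the profile already put in the flags.
CFLAGS=$(add_flag "${CFLAGS-}" -ffp=0)
CXXFLAGS=$(add_flag "${CXXFLAGS-}" -ffp=0)
export CFLAGS CXXFLAGS
```

Because `add_flag` skips flags that are already present, sourcing the profile twice does not produce `-ffp=0 -ffp=0`.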

@BenWibking
Collaborator

But it works if you add -fno-cray, right?

@chongchonghe
Contributor

If I add -fno-cray, the job fails with a segfault when running with MPI.

@psharda
Contributor Author

psharda commented Mar 19, 2024

So then -fno-cray is not going to work for all the test problems? @BenWibking

@BenWibking
Collaborator

It should be possible to avoid the Cray compiler entirely on Setonix. Then no special options should be necessary.

This should be possible by loading PrgEnv-gnu and then setting the compiler to hipcc:

module load PrgEnv-gnu
export CC=$(which hipcc)
export CXX=$(which hipcc)
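One pitfall with the lines above: if `hipcc` is not on `PATH` yet (for example, before a ROCm module is loaded), `$(which hipcc)` expands to an empty string and `CC`/`CXX` are silently exported as empty. A minimal sketch that fails loudly instead; `use_hipcc` is a hypothetical helper, and the `rocm` module name is an assumption to check against `module avail` on Setonix:

```sh
# Load the GNU environment plus ROCm when the module command exists.
# The "rocm" module name is an assumption; check `module avail rocm`.
if command -v module >/dev/null 2>&1; then
  module load PrgEnv-gnu
  module load rocm
fi

# Hypothetical helper: export CC/CXX as the absolute path to hipcc,
# or print an error and return 1 if hipcc is not on PATH.
use_hipcc() {
  hipcc_path=$(command -v hipcc) || {
    echo "use_hipcc: hipcc not found on PATH (is a ROCm module loaded?)" >&2
    return 1
  }
  CC=$hipcc_path
  CXX=$hipcc_path
  export CC CXX
}
```

In a build script, calling `use_hipcc || exit 1` after the module loads stops before CMake is ever configured with an empty compiler.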

@psharda
Contributor Author

psharda commented Mar 19, 2024

This by itself does not seem to be sufficient. We still need HDF5 and ROCm, for example. When I load the HDF5 and ROCm modules and build a test problem, I get these warnings:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = (unset)
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
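These are perl locale warnings rather than compiler errors: `LC_CTYPE = "UTF-8"` is not a locale name that `setlocale()` accepts on a typical Linux system, and a common source of that value is an SSH session from a macOS terminal forwarding its `LC_*` settings. A minimal sketch of a guard for a shell startup file (the helper name `fix_bare_utf8_ctype` is hypothetical); unsetting the variable lets programs fall back to the default locale without printing the warning:

```sh
# Hypothetical guard: a bare "UTF-8" (no language_territory prefix) in
# LC_CTYPE makes setlocale() fail, which triggers the perl warnings.
fix_bare_utf8_ctype() {
  case "${LC_CTYPE-}" in
    UTF-8|utf-8|UTF8|utf8) unset LC_CTYPE ;;
  esac
}
fix_bare_utf8_ctype
```

A full locale name such as `en_US.UTF-8` is left untouched, so the guard only removes the invalid value.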

@psharda
Contributor Author

psharda commented Mar 19, 2024

I still have some Cray modules loaded. Is this an issue?

psharda@setonix-06:~> module list

Currently Loaded Modules:
  1) craype-x86-milan     4) perftools-base/23.03.0                  7) gcc/12.2.0        10) cray-mpich/8.1.25      13) pawsey         16) python/3.10.10
  2) libfabric/1.15.2.0   5) xpmem/2.5.2-2.4_3.47__gd0f7936.shasta   8) craype/2.7.20     11) cray-libsci/23.02.1.1  14) pawseytools
  3) craype-network-ofi   6) pawseyenv/2023.08                       9) cray-dsmml/0.2.2  12) PrgEnv-gnu/8.3.3       15) slurm/22.05.2
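In the HPE Cray programming environment, modules such as `craype`, `cray-mpich`, and `cray-libsci` are normally loaded alongside whichever `PrgEnv-*` module is active, and with `PrgEnv-gnu` the `cc`/`CC` wrappers drive GCC; the Cray compiler itself comes in via `PrgEnv-cray` or a `cce` module. A minimal sketch for checking this from a build script, assuming Lmod's colon-separated `LOADEDMODULES` variable (`module_loaded` is a hypothetical helper):

```sh
# Hypothetical helper: succeed if any entry in LOADEDMODULES (the
# colon-separated name/version list Lmod maintains) starts with $1.
# Pass a trailing "/" (e.g. "cce/") to match a module name exactly.
module_loaded() {
  case ":${LOADEDMODULES-}" in
    *":$1"*) return 0 ;;
    *) return 1 ;;
  esac
}

if module_loaded PrgEnv-cray/ || module_loaded cce/; then
  echo "warning: Cray compiler environment loaded; cc/CC will invoke CCE" >&2
fi
```

With the module list above, neither check matches, so the warning would not be printed.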

@BenWibking
Collaborator

I think this is a better solution: #566.

@BenWibking
Collaborator

Since there has been no activity on this for the past 2 months, I am going to close this PR for now. If it turns out to be needed, we can re-open it.

@BenWibking BenWibking closed this May 25, 2024
Development

Successfully merging this pull request may close these issues.

setonix-cpu profile needs rocm
3 participants