Question: how to set up environment to run batch script #289

Open
jhux2 opened this issue May 25, 2022 · 14 comments

@jhux2

jhux2 commented May 25, 2022

Once I've built naluX on Summit, it's unclear to me how to make sure a batch job has the correct environment to run that executable.

Normally, I'd load the same modules as were used for the build.

I did try quick-activate $SPACK_MANAGER/environments/jhubuild and then submitting a job. The job failed with an error indicating it could not find CUDA.

@jhux2
Author

jhux2 commented May 25, 2022

@psakievich
Collaborator

@psakievich psakievich self-assigned this May 25, 2022
@jrood-nrel
Collaborator

jrood-nrel commented May 25, 2022

For example, this is what I do on Summit for the exawind-driver:

export CUDA_LAUNCH_BLOCKING=1
export SPACK_MANAGER=${PROJWORK}/cfd116/jrood/spack-manager-summit
source ${SPACK_MANAGER}/start.sh && spack-start
spack env activate -d ${SPACK_MANAGER}/environments/exawind-summit
spack load exawind
which exawind

@psakievich
Collaborator

We should be getting CUDA_LAUNCH_BLOCKING in the environment when we do spack load exawind. Is that not the case @jrood-nrel ?
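One way to check this from inside a job is to print the variable right after `spack load exawind`. The helper below is only a sketch: `report_var` is a hypothetical name, not part of spack or spack-manager, and it relies on `printenv`, so it reports only variables that are exported to child processes.

```shell
# report_var (hypothetical helper): print NAME=value if NAME is exported in
# the current environment; otherwise say so and return a non-zero status.
report_var() {
    name="$1"
    if value=$(printenv "$name"); then
        printf '%s=%s\n' "$name" "$value"
    else
        printf '%s is not set in the environment\n' "$name"
        return 1
    fi
}

# Intended use in the batch script (commented out so the block has no side effects):
#   spack load exawind
#   report_var CUDA_LAUNCH_BLOCKING
```

If the variable turns out to be unset after the load, the explicit `export CUDA_LAUNCH_BLOCKING=1` in the script above is still needed.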

@jhux2
Author

jhux2 commented May 25, 2022

So if I understand, I should do

quick-activate $SPACK_MANAGER/environments/jhubuild

where jhubuild is the "environment" that I built naluX under.

But

spack load naluX

returns

==> Error: Spec 'naluX' matches no installed packages.

I feel that I'm missing something fundamental here.

[EDIT]

Btw, spack load exawind works, but the SHAs of exawind and naluX are different.

@jrood-nrel
Collaborator

spack load nalu-wind @jhux2

@jhux2
Author

jhux2 commented May 25, 2022

@jrood-nrel Thanks. I've launched a couple test jobs to see what effect spack load nalu-wind has.

@jhux2
Author

jhux2 commented May 25, 2022

My jobs failed with the same error as before:

FATAL ERROR: dlopen libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
FATAL ERROR: dlopen libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
[h26n01:464770] Error: common_pami.c:1056 - ompi_common_pami_init() Unable to create PAMI client (rc=1)
[h26n01:464771] Error: common_pami.c:1056 - ompi_common_pami_init() Unable to create PAMI client (rc=1)

After issuing spack load nalu-wind, should there be any change in what modules are loaded? Or is that handled by spack setting all the right paths, etc.?
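One way to narrow down the `dlopen libcudart.so` failure is to check, inside the batch job itself, whether any directory on `LD_LIBRARY_PATH` actually contains the library. The helper below is only a sketch: `find_lib_dir` is a hypothetical name, not part of spack or spack-manager, and it assumes the CUDA runtime is expected to be found through `LD_LIBRARY_PATH` rather than through an RPATH or the system linker cache.

```shell
# find_lib_dir (hypothetical helper): print the first directory listed in
# LD_LIBRARY_PATH that contains the named library; return 1 if none does.
find_lib_dir() {
    lib="$1"
    old_ifs="$IFS"
    IFS=':'
    # Unquoted on purpose: with IFS=':' the path list splits into directories.
    for dir in ${LD_LIBRARY_PATH:-}; do
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            IFS="$old_ifs"
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}

# Intended use right before the jsrun line (commented out so the block has no side effects):
#   find_lib_dir libcudart.so || echo "libcudart.so is not on LD_LIBRARY_PATH" >&2
```

If the helper finds nothing after `spack load nalu-wind`, the library path is not reaching the job environment. That would be consistent with the error above, though it does not by itself say why.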

@psakievich
Collaborator

@jhux2 spack should be handling all the right paths.
So, to confirm: does your script look something like this?

# source $SPACK_MANAGER/start.sh has already occurred in bashrc
quick-activate $SPACK_MANAGER/environments/jhubuild
spack load nalu-wind
srun [args] naluX -i [args] 

@jhux2
Author

jhux2 commented May 25, 2022

@psakievich Here's what I have in my batch script:

  export SPACK_MANAGER=~/exawind/sources/spack-manager
  source $SPACK_MANAGER/start.sh
  quick-activate $SPACK_MANAGER/environments/jhubuild
  spack load nalu-wind

  jsrun ....

This is a script that I've used for a long time. (I did move the naluX executable to another location, but I assume that should be safe to do.)
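A batch script like the one above can be made to fail early, with a clear message, when the environment is not what it expects. The sketch below assumes only what the script already uses: a `SPACK_MANAGER` directory containing `start.sh`, and an executable named `naluX`. The helper names `check_spack_manager` and `require_exe` are hypothetical and are not part of spack-manager.

```shell
# check_spack_manager (hypothetical helper): confirm that SPACK_MANAGER points
# at a directory containing a readable start.sh before trying to source it.
check_spack_manager() {
    if [ -z "${SPACK_MANAGER:-}" ]; then
        echo "error: SPACK_MANAGER is not set" >&2
        return 1
    fi
    if [ ! -r "$SPACK_MANAGER/start.sh" ]; then
        echo "error: cannot read $SPACK_MANAGER/start.sh" >&2
        return 1
    fi
    return 0
}

# require_exe (hypothetical helper): confirm that a command resolves on PATH.
require_exe() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found on PATH" >&2
        return 1
    fi
    return 0
}

# Intended use in the batch script (commented out so the block has no side effects):
#   check_spack_manager || exit 1
#   source "$SPACK_MANAGER/start.sh"
#   quick-activate "$SPACK_MANAGER/environments/jhubuild"
#   spack load nalu-wind
#   require_exe naluX || exit 1
```

These checks confirm only that the files and commands resolve. They do not verify that a relocated `naluX` still finds the shared libraries it was built against.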

Where in the spack-manager tree can I find configure/build logs for Trilinos? I'd like to look over those logs to see if anything jumps out.

@psakievich
Collaborator

spack cd -b trilinos will take you there, and the spack-* files will show you logs for everything that happened.
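Once in that directory, the log files can be listed with a small helper. This is a sketch: `list_spack_logs` is a hypothetical name, and it assumes the logs are regular files whose names begin with `spack-` and that sit directly in the directory you pass in.

```shell
# list_spack_logs (hypothetical helper): print regular files named spack-*
# directly under the given directory (default: current directory), sorted.
list_spack_logs() {
    find "${1:-.}" -maxdepth 1 -type f -name 'spack-*' | sort
}

# Intended use (commented out so the block has no side effects):
#   spack cd -b trilinos
#   list_spack_logs .
#   grep -il cuda spack-*    # which of those files mention CUDA at all
```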

@psakievich
Collaborator

@jhux2 Where are you on this? Do you still need help?

@jhux2
Author

jhux2 commented Jun 1, 2022

@psakievich Thanks for checking in. I haven't returned to this yet. The motivation was to see if building with spack-manager would help work around a Nalu-Wind runtime failure. It turns out there's a bug that affects both solver paths in the NGP code, so how nalu-wind gets built is moot.
