Description
On leconte, SVD has bad backward error with 8 ranks / 8 GPUs, for both target host and device.
Expect backward error ~ 1e-15; observed is ~ 1.8e-03 to 1.9e-03. Runs with 1, 2, or 4 ranks worked.
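For reference, the checks behind the `Backward`, `U orth.`, and `V orth.` columns can be sketched with NumPy as below. This is only an illustration of what a backward error near 1e-15 means for an SVD. The helper name `svd_checks` and the exact scaling (1-norm, division by the matrix dimension) are assumptions, not necessarily what the SLATE tester computes.

```python
import numpy as np

def svd_checks(A, U, S, Vt):
    """Return (backward error, U orthogonality, V orthogonality).

    Hypothetical helper; the scaling is an assumption and may differ
    from the SLATE tester's.
    """
    m, n = A.shape
    k = S.size
    # Backward error: how well U * diag(S) * V^T reproduces A.
    backward = (np.linalg.norm(A - (U * S) @ Vt, 1)
                / (np.linalg.norm(A, 1) * max(m, n)))
    # Orthogonality: columns of U and rows of Vt should be orthonormal.
    u_orth = np.linalg.norm(U.T @ U - np.eye(k), 1) / m
    v_orth = np.linalg.norm(Vt @ Vt.T - np.eye(k), 1) / n
    return backward, u_orth, v_orth

# Small single-process example with jobu = jobvt = "some" (economy-size SVD).
rng = np.random.default_rng(0)
A = rng.standard_normal((300, 300))
U, S, Vt = np.linalg.svd(A, full_matrices=False)
backward, u_orth, v_orth = svd_checks(A, U, S, Vt)
print(f"backward={backward:.2e}  U orth={u_orth:.2e}  V orth={v_orth:.2e}")
```

In the failing runs, the orthogonality values are at machine precision while the backward error is not, so U and V are each orthonormal but U * diag(S) * V^T does not reproduce A.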
Steps To Reproduce
mpirun -np 8 ./tester --origin h --target h --jobu s --jobvt s --dim 1234 --dim 1k,2k,4k,8k,16k --ref n --nb 128,192,256,320 svd
% SLATE version 2023.08.25, id 57ea922b
% 2023-09-01 11:41:53, 8 MPI ranks, CPU-only MPI, 7 OpenMP threads per MPI rank
type  origin  target  A  jobu  jobvt     m     n   nb  ib  p  q  la  pt  S - Sref  Backward   U orth.   V orth.  time (s)  ref time (s)  status
d     host    host    1  some  some   1234  1234  128  32  2  4   1   3        NA  1.91e-03  1.80e-16  1.90e-16     1.659            NA  FAILED
d     host    host    1  some  some   1234  1234  192  32  2  4   1   3        NA  1.81e-03  1.85e-16  1.94e-16     1.974            NA  FAILED
...
mpirun -np 8 ./bind_gpus.sh ./tester --origin d --target d --jobu s --jobvt s --dim 1234 --dim 1k,2k,4k,8k,16k --ref n --nb 128,192,256,320 svd
% SLATE version 2023.08.25, id 57ea922b
% 2023-09-01 07:30:57, 8 MPI ranks, CPU-only MPI, 7 OpenMP threads, 1 GPU devices per MPI rank
type  origin  target  A  jobu  jobvt     m     n   nb  ib  p  q  la  pt  S - Sref  Backward   U orth.   V orth.  time (s)  ref time (s)  status
d     dev     dev     1  some  some   1234  1234  128  32  2  4   1   3        NA  1.87e-03  1.89e-16  1.98e-16     1.854            NA  FAILED
d     dev     dev     1  some  some   1234  1234  192  32  2  4   1   3        NA  1.82e-03  1.79e-16  1.94e-16     1.992            NA  FAILED
...
Environment
The more information that you can provide about your environment, the simpler it is for us to understand and reproduce the issue.
SLATE version / commit ID (e.g., git log --oneline -n 1):
57ea922 (HEAD -> release, tag: v2023.08.25, github/master) Version 2023.08.25
How installed:
git clone
release tar file
Spack
module
How compiled:
makefile (include your make.inc)
leconte test> cat ../make.inc
CXXFLAGS = -Werror -Dslate_omp_default_none='default(none)'
CXX = mpicxx
CC = mpicc
FC = mpif90
blas = mkl
mkl_blacs = intelmpi
blas_threaded = 1