COSMA miniapp on Summit #55

Closed
ajaypanyala opened this issue May 27, 2020 · 1 comment

@ajaypanyala

Hello! I am trying out COSMA on Summit. I built COSMA with -DCOSMA_BLAS=CUDA -DCOSMA_WITH_PROFILING=ON, using GCC 8.1, CUDA 10.1 and IBM Spectrum MPI 10.3.

I am testing the miniapp as follows on 3 nodes, with 6 MPI ranks and 6 GPUs per node (18 ranks in total):

cosma/build/miniapp/cosma_miniapp -m 1000 -n 1000 -k 1000 -P 18

The result I get is

Strategy = Matrix dimensions (m, n, k) = (1000, 1000, 1000)
Number of processors: 1
Overlap of communication and computation: OFF.
Divisions strategy: 
Required memory per rank (in #elements): 166668
Available memory per rank (in #elements): 9223372036854775807

_p_ REGION                     CALLS      THREAD        WALL       %
_p_ total                          -       0.005       0.005   100.0
_p_   multiply                     -       0.004       0.004    86.8
_p_     computation                1       0.004       0.004    86.8
_p_     other                      2       0.000       0.000     0.0
_p_   preprocessing                -       0.001       0.001    13.2
_p_     communicators              1       0.001       0.001    12.8
_p_     matrices                   -       0.000       0.000     0.4
_p_       mapper                   -       0.000       0.000     0.3
_p_         coordinates            3       0.000       0.000     0.2
_p_         sizes                  3       0.000       0.000     0.1
_p_       layout                   3       0.000       0.000     0.0
_p_       buffer                   3       0.000       0.000     0.0
_p_     allocation                 2       0.000       0.000     0.0
COSMA TIMES [ms] = 5 

I am not sure why the number of processors is reported as 1 instead of 18.
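
Two of the figures in the output above can be sanity-checked with quick arithmetic. The sketch below rests on an assumption inferred from the numbers, not confirmed against the COSMA source: that "Required memory per rank" counts the elements of A, B and C, each split evenly (rounded up) across the 18 requested ranks. The "Available memory" figure is simply the largest signed 64-bit integer, which suggests that no memory limit was set.

```python
# Sanity check of the reported memory figures.
# ASSUMPTION: each of the three matrices is split evenly over the requested ranks.
m, n, k = 1000, 1000, 1000
requested_ranks = 18  # value passed with -P


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


required = (
    ceil_div(m * k, requested_ranks)    # share of A (m x k)
    + ceil_div(k * n, requested_ranks)  # share of B (k x n)
    + ceil_div(m * n, requested_ranks)  # share of C (m x n)
)
unlimited = 2**63 - 1  # largest signed 64-bit integer

print(required)   # 166668
print(unlimited)  # 9223372036854775807
```

Both values match the report, so the memory estimate appears to be based on the 18 requested ranks even though the multiplication itself ran on one.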

@ajaypanyala
Author

Sorry, I did not realize that the number of processors and the strategy are adjusted dynamically based on the problem size. Closing this issue!
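
To illustrate the general idea of shrinking the rank count for small problems, here is a toy rule. It is not COSMA's actual algorithm: the function `usable_ranks` and the threshold `min_work_per_rank` are hypothetical, and the threshold value is picked purely for illustration.

```python
def usable_ranks(m, n, k, requested, min_work_per_rank=10**9):
    """Toy rule (NOT COSMA's): cap the rank count so every rank gets
    at least `min_work_per_rank` multiply-adds, and never go below one rank."""
    work = m * n * k  # multiply-add count of the full product
    return max(1, min(requested, work // min_work_per_rank))


# A small problem collapses to a single rank under this toy rule,
# while a larger one keeps all requested ranks.
print(usable_ranks(1000, 1000, 1000, 18))     # 1
print(usable_ranks(10000, 10000, 10000, 18))  # 18
```

Under a rule of this kind, a 1000 x 1000 x 1000 product is too small to be worth distributing, so requesting 18 ranks would still yield a report of 1 processor.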
