
New View : Performance Test #136

Closed
crtrott opened this issue Nov 23, 2015 · 2 comments


crtrott commented Nov 23, 2015

Need to test:

  • MiniApps: MiniMD, MiniFE, Lulesh, MiniAero
  • LAMMPS: a couple of test cases
  • Trilinos: FENL, Tpetra CG-PerformanceTest
  • NALU: crtrott's version
@crtrott crtrott added this to the New Views MUST WORK NOW milestone Nov 23, 2015

crtrott commented Dec 10, 2015

LAMMPS data:

Run with `OMP_NUM_THREADS=24`. For GCC, set `OMP_PROC_BIND=TRUE`; for Intel, `KMP_AFFINITY=compact`.

```
mpirun -np 2 -bind-to socket -map-by socket ../src/lmp_VARIANT -i in.lj -k on t 24 -sf kk -pk kokkos neigh full comm device -var x 3 -var y 3
```

Lennard-Jones (lammps/bench/in.lj), 1000 steps; for Cuda, set `binsize 2.8`.
Times are given in seconds, OLD_VALUE/NEW_VALUE.

| Compiler | 864k/force | 864k/neigh | 32k/force | 32k/neigh | 32k/other |
|---|---|---|---|---|---|
| GCC/4.8.4 | 34/34 | 6.9/6.3 | 1.18/1.17 | 0.26/0.24 | |
| GCC/4.9.2 | 33.7/32.6 | 6.4/6.2 | 1.18/1.16 | 0.24/0.24 | |
| Intel/15.0.2 | 32.7/32.7 | 6.7/6.7 | 1.13/1.14 | 0.24/0.24 | |
| Cuda/7.5.18 | 10.5/10.5 | 2.6/2.6 | 0.48/0.47 | 0.16/0.16 | |
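Putting the run settings above together, a launch wrapper might look like the sketch below. It is a dry run by default so it can be inspected without an MPI install; the binary name `lmp_VARIANT`, the input `in.lj`, and all flags are copied from the command above, while the GCC/Intel toggle is left as a comment since the issue lists both options.

```shell
#!/bin/sh
# Sketch of the LAMMPS Lennard-Jones benchmark launch described above.
# Pass --run to actually execute; otherwise the command is only printed.
export OMP_NUM_THREADS=24
export OMP_PROC_BIND=TRUE   # GCC builds; Intel builds use KMP_AFFINITY=compact instead

CMD="mpirun -np 2 -bind-to socket -map-by socket \
../src/lmp_VARIANT -i in.lj -k on t 24 -sf kk \
-pk kokkos neigh full comm device -var x 3 -var y 3"

if [ "$1" = "--run" ]; then
  eval "$CMD"
else
  echo "$CMD"
fi
```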


crtrott commented Dec 15, 2015

NALU data:

Trilinos: 872a11a5c30f31c41ea1da86ad035239b1788ce8
Nalu-crtrott: 9d30a9f9a448919c9c1a4cad393bf5da64aac056

Run command:

```
mpirun --bind-to socket -map-by socket -n $1 ${NALU_PATH}/naluX -i nalu_conduction.i
```
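The run command takes the MPI rank count as its first argument (`$1`). As a hedged sketch, it could be wrapped like this; the `/path/to/nalu/build` default for `NALU_PATH` is a placeholder, and the dry-run default is an addition so the script can be inspected without MPI or a Nalu build.

```shell
#!/bin/sh
# Sketch of the NALU conduction benchmark launch described above.
# Usage: ./run_nalu.sh <ranks> [--run]; prints the command unless --run is given.
NALU_PATH=${NALU_PATH:-/path/to/nalu/build}   # placeholder default
ranks=${1:-2}

CMD="mpirun --bind-to socket -map-by socket -n $ranks \
${NALU_PATH}/naluX -i nalu_conduction.i"

if [ "$2" = "--run" ]; then
  eval "$CMD"
else
  echo "$CMD"
fi
```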

| Compiler | Assembly | Solve |
|---|---|---|
| GCC/4.7.2 | 5.2/5.5* | 2.61/2.57 |
| GCC/4.9.2 | 5.1/4.5 | 2.64/2.60 |
| GCC/5.1.0 | 4.9/5.9* | 2.62/2.54 |
| Intel/15.0.2 | 4.8/4.6 | 2.65/2.61 |

*Some of the assembly data (i.e. the exp-view runs with GCC 5.1.0 and GCC 4.7.2) showed significant spread, while all the other data was quite consistent. Closer investigation shows it is due to a single kernel, AssembleNodeSolver; the other kernels are fine, and generally a bit faster with the new View implementation than with the old.
