Empty commits - latest logs used in the paper by valassi · Pull Request #218 · madgraph5/madgraph4gpu

valassi · 2021-06-28T16:12:38Z

No description provided.

On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 5.021815e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 7.504267e+08 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 0.747806 sec
2,558,737,342 cycles # 2.649 GHz
3,500,196,731 instructions # 1.37 insn per cycle
1.050331328 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 136
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.199446e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.595453 sec
20,332,815,475 cycles # 2.675 GHz
52,130,992,146 instructions # 2.56 insn per cycle
7.606473393 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 600) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.826392e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.941491 sec
9,885,547,963 cycles # 2.504 GHz
18,304,396,256 instructions # 1.85 insn per cycle
3.952184812 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2525) (512y: 95) (512z: 0)
=========================================================================

On CUDA, the faster ixx/oxx give a speedup of about 1.8 (roughly a factor 2) in MECalcOnly.
On C++, the faster ixx/oxx give a factor 1.3 speedup in MECalcOnly for 512y.

On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 7.230201e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 1.352934e+09 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 0.733924 sec
2,541,933,264 cycles # 2.642 GHz
3,482,878,075 instructions # 1.37 insn per cycle
1.027491825 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.306635e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.166390 sec
19,172,782,268 cycles # 2.673 GHz
48,581,048,150 instructions # 2.53 insn per cycle
7.177041383 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.898666e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.583812 sec
9,072,290,427 cycles # 2.527 GHz
16,497,789,079 instructions # 1.82 insn per cycle
3.594563440 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2572) (512y: 95) (512z: 0)
=========================================================================
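The quoted speedup factors can be cross-checked directly from the EvtsPerSec[MECalcOnly] lines. The snippet below is only a minimal sketch: it assumes that the first log in this PR is the baseline before the faster ixx/oxx and that the second log is the result after it. The dictionary and the `speedup` helper are hypothetical names used for illustration, not part of the madgraph4gpu scripts.

```python
# Throughputs (events per second) copied from the EvtsPerSec[MECalcOnly]
# lines of the two logs above, as (before, after) pairs.
# Assumption: the first log is the baseline, the second has the faster ixx/oxx.
MECALC_ONLY = {
    "CUDA double": (7.504267e08, 1.352934e09),
    "C++ double scalar": (1.199446e06, 1.306635e06),
    "C++ double 512y": (3.826392e06, 4.898666e06),
}


def speedup(before, after):
    """Return the throughput ratio after/before."""
    return after / before


for label, (before, after) in MECALC_ONLY.items():
    print(f"{label}: x{speedup(before, after):.2f}")
```

With these inputs the CUDA ratio comes out close to 1.8 and the 512y ratio close to 1.3, while the scalar C++ build gains less than 10%.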

This is the status of the current upstream/master.

NB! The EvtsPerSec[MatrixElems] values in CUDA are varying wildly. In double precision I see them go from 5.9E8 to 6.7E8 today, and 7.2E8 yesterday with the same code. EvtsPerSec[MECalcOnly], however, is much more stable, around 1.37E9 in all those cases. I will therefore keep the numbers in the paper from previous measurements and vCHEP for EvtsPerSec[MatrixElems] (namely 7.25E8 for double and 1.59E9 for float) and add the recent EvtsPerSec[MECalcOnly] values, 1.37E9 for double and 3.28E9 for float.

On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.743408e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 1.367534e+09 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 0.743706 sec
2,588,597,917 cycles # 2.648 GHz
3,526,137,396 instructions # 1.36 insn per cycle
1.048524808 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = FLOAT (NaN/abnormal=2, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 1.516160e+09 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 3.280236e+09 ) sec^-1
MeanMatrixElemValue = ( 1.371686e-02 +- 3.270219e-06 ) GeV^0
TOTAL : 0.690254 sec
2,368,009,568 cycles # 2.649 GHz
3,367,981,007 instructions # 1.42 insn per cycle
0.978942760 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 48
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.301297e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.192022 sec
19,241,362,073 cycles # 2.673 GHz
48,583,081,581 instructions # 2.52 insn per cycle
7.203110555 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[2] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 2.532683e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.831134 sec
12,924,690,078 cycles # 2.671 GHz
29,940,147,028 instructions # 2.32 insn per cycle
4.842102414 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3274) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.593254e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.662869 sec
9,247,838,169 cycles # 2.520 GHz
16,560,392,033 instructions # 1.79 insn per cycle
3.673581024 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2746) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.892414e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.603951 sec
9,129,054,132 cycles # 2.528 GHz
16,497,282,072 instructions # 1.81 insn per cycle
3.614598246 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2572) (512y: 95) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.738379e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.020666 sec
8,891,618,256 cycles # 2.208 GHz
13,361,398,930 instructions # 1.50 insn per cycle
4.035327755 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1127) (512y: 205) (512z: 2045)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.211023e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371707e-02 +- 3.270376e-06 ) GeV^0
TOTAL : 7.119093 sec
19,060,830,463 cycles # 2.675 GHz
47,728,069,981 instructions # 2.50 insn per cycle
7.129763876 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 578) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = VECTOR[4] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.507245e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270375e-06 ) GeV^0
TOTAL : 3.351029 sec
8,971,452,327 cycles # 2.672 GHz
19,719,600,560 instructions # 2.20 insn per cycle
3.362076705 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3719) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.160704e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.695110 sec
6,906,959,097 cycles # 2.556 GHz
12,504,366,929 instructions # 1.81 insn per cycle
2.706037106 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 3077) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.869343e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.621328 sec
6,734,710,370 cycles # 2.562 GHz
12,522,685,250 instructions # 1.86 insn per cycle
2.631936677 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2917) (512y: 81) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[16] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 7.434054e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270340e-06 ) GeV^0
TOTAL : 2.750946 sec
6,437,749,555 cycles # 2.334 GHz
10,930,642,414 instructions # 1.70 insn per cycle
2.761224316 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1559) (512y: 179) (512z: 2157)
=========================================================================

On lxplus770.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla T4]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 3.891814e+07 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 4.009506e+07 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 1.144038 sec
672,779,391 cycles:u # 0.552 GHz
1,362,529,910 instructions:u # 2.03 insn per cycle
1.285736848 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = FLOAT (NaN/abnormal=2, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.377864e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 8.217691e+08 ) sec^-1
MeanMatrixElemValue = ( 1.371686e-02 +- 3.270219e-06 ) GeV^0
TOTAL : 0.825550 sec
386,914,589 cycles:u # 0.437 GHz
828,824,446 instructions:u # 2.14 insn per cycle
0.963103649 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 64
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.275975e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.515231 sec
19,209,549,764 cycles:u # 2.561 GHz
48,627,460,774 instructions:u # 2.53 insn per cycle
7.553956776 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[2] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 2.542904e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.980303 sec
12,899,873,466 cycles:u # 2.584 GHz
29,995,111,387 instructions:u # 2.33 insn per cycle
5.033503232 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3274) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.508129e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.899591 sec
9,295,935,243 cycles:u # 2.391 GHz
16,619,353,171 instructions:u # 1.79 insn per cycle
3.969700585 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2746) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.775861e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.870736 sec
9,103,619,309 cycles:u # 2.376 GHz
16,556,440,384 instructions:u # 1.82 insn per cycle
3.911828159 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2572) (512y: 95) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.381439e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.330337 sec
9,151,794,863 cycles:u # 2.106 GHz
13,418,541,159 instructions:u # 1.47 insn per cycle
4.463700520 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1127) (512y: 205) (512z: 2045)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.169022e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371707e-02 +- 3.270376e-06 ) GeV^0
TOTAL : 7.431409 sec
19,242,935,493 cycles:u # 2.591 GHz
47,777,821,032 instructions:u # 2.48 insn per cycle
7.485035275 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 578) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = VECTOR[4] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.425426e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270375e-06 ) GeV^0
TOTAL : 3.526400 sec
8,942,606,781 cycles:u # 2.588 GHz
19,782,130,775 instructions:u # 2.21 insn per cycle
3.631748518 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3719) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.068817e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.782846 sec
6,900,906,289 cycles:u # 2.473 GHz
12,568,910,963 instructions:u # 1.82 insn per cycle
2.833799464 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 3077) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.514326e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.790172 sec
6,805,090,790 cycles:u # 2.475 GHz
12,587,787,277 instructions:u # 1.85 insn per cycle
2.943286999 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2917) (512y: 81) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[16] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 6.798061e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270340e-06 ) GeV^0
TOTAL : 2.938570 sec
6,641,119,578 cycles:u # 2.249 GHz
10,993,670,890 instructions:u # 1.66 insn per cycle
2.979389891 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1559) (512y: 179) (512z: 2157)
=========================================================================
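To compare the SIMD builds against the scalar one without reading the numbers by eye, the C++ records in a log like the one above can be parsed with a regular expression. This is a rough sketch, not the project's own tooling: `parse_cpp_throughputs`, `speedups_vs_scalar` and the `SAMPLE` text are hypothetical names introduced here, and the sample holds only abbreviated fields from three of the itscrd70 records.

```python
import re

# One C++ record: FP precision, SIMD tag in quotes, MECalcOnly throughput.
RECORD = re.compile(
    r"FP precision\s*=\s*(?P<fp>DOUBLE|FLOAT).*?"
    r"fptype_sv\s*=\s*\S+\s*\('(?P<simd>[^']+)'.*?"
    r"EvtsPerSec\[MECalcOnly\] \(3a\)\s*=\s*\(\s*(?P<tput>[0-9.eE+-]+)\s*\)",
    re.DOTALL,
)


def parse_cpp_throughputs(log_text):
    """Map (precision, simd tag) to MECalcOnly throughput for C++ records."""
    results = {}
    # Split on the 'Process =' field; 'CPPProcess.o=' does not match.
    for chunk in re.split(r"\bProcess\s*=", log_text)[1:]:
        match = RECORD.search(chunk)
        if match:  # CUDA records have no fptype_sv field and are skipped
            results[(match["fp"], match["simd"])] = float(match["tput"])
    return results


def speedups_vs_scalar(results):
    """Divide each throughput by the scalar ('none') one of the same precision."""
    return {
        (fp, simd): tput / results[(fp, "none")]
        for (fp, simd), tput in results.items()
        if (fp, "none") in results
    }


# Abbreviated records copied from the itscrd70 log above (other fields omitted).
SAMPLE = """
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MECalcOnly] (3a) = ( 1.367534e+09 ) sec^-1
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
EvtsPerSec[MECalcOnly] (3a) = ( 1.301297e+06 ) sec^-1
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
EvtsPerSec[MECalcOnly] (3a) = ( 4.892414e+06 ) sec^-1
"""

for key, ratio in speedups_vs_scalar(parse_cpp_throughputs(SAMPLE)).items():
    print(key, f"x{ratio:.2f}")
```

Because the fields are matched with `re.DOTALL` inside a single record, the same function works on both the flattened and the line-per-field form of the log.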

valassi · 2021-06-28T16:18:48Z

self merging - empty commits

valassi added 4 commits June 11, 2021 16:14

Merge remote-tracking branch 'upstream/master' into tput

795db2a

valassi merged commit 6cbe58a into madgraph5:master Jun 28, 2021

valassi merged 4 commits into madgraph5:master from valassi:tput
