Empty commits - latest logs used in the paper #218
Merged
valassi merged 4 commits into madgraph5:master on Jun 28, 2021
On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 5.021815e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 7.504267e+08 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 0.747806 sec
2,558,737,342 cycles # 2.649 GHz
3,500,196,731 instructions # 1.37 insn per cycle
1.050331328 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 136
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.199446e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.595453 sec
20,332,815,475 cycles # 2.675 GHz
52,130,992,146 instructions # 2.56 insn per cycle
7.606473393 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 600) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.826392e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.941491 sec
9,885,547,963 cycles # 2.504 GHz
18,304,396,256 instructions # 1.85 insn per cycle
3.952184812 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2525) (512y: 95) (512z: 0)
=========================================================================
On CUDA, the faster ixx/oxx give a factor 2 speedup in MECalcOnly
On C++, the faster ixx/oxx give a factor 1.3 speedup in MECalcOnly for 512y
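The quoted factors can be cross-checked against the logs. The sketch below is only an illustration: it assumes that the itscrd70 log preceding these notes is the baseline and that the log following them was taken with the faster ixx/oxx. The dictionary keys are hypothetical labels; the numbers are the EvtsPerSec[MECalcOnly] values copied from those two logs.

```python
# EvtsPerSec[MECalcOnly] (3a) in events/sec, copied from the two itscrd70 logs.
# Assumption: "before" is the log above these notes, "after" is the log below them.
before = {"cuda": 7.504267e+08, "cpp_none": 1.199446e+06, "cpp_512y": 3.826392e+06}
after = {"cuda": 1.352934e+09, "cpp_none": 1.306635e+06, "cpp_512y": 4.898666e+06}


def speedup(old, new):
    """Throughput ratio new/old (values above 1 mean the new build is faster)."""
    return new / old


speedups = {name: speedup(before[name], after[name]) for name in before}
for name, factor in speedups.items():
    print(f"{name}: x{factor:.2f}")
```

With these inputs the CUDA ratio comes out at about 1.8 and the 512y ratio at about 1.28, in line with the rounded "factor 2" and "factor 1.3" above.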
On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 7.230201e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 1.352934e+09 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 0.733924 sec
2,541,933,264 cycles # 2.642 GHz
3,482,878,075 instructions # 1.37 insn per cycle
1.027491825 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.306635e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.166390 sec
19,172,782,268 cycles # 2.673 GHz
48,581,048,150 instructions # 2.53 insn per cycle
7.177041383 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.898666e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.583812 sec
9,072,290,427 cycles # 2.527 GHz
16,497,789,079 instructions # 1.82 insn per cycle
3.594563440 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2572) (512y: 95) (512z: 0)
=========================================================================
This is the status of the current upstream/master.
NB! The EvtsPerSec[MatrixElems] in CUDA is varying wildly.
In double I see it go from 5.9E8 to 6.7E8 today, and 7.2E8 yesterday, with the same code.
The EvtsPerSec[MECalcOnly], however, is much more stable, around 1.37E9 in all those cases.
I will therefore keep the numbers in the paper from previous measurements and vCHEP
for EvtsPerSec[MatrixElems] (namely 7.25E8 /double and 1.59E9 /float)
and add the recent EvtsPerSec[MECalcOnly] as 1.37E9 /double and 3.28E9 /float.
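To put a number on the variability described in this note, here is a minimal sketch. It uses the three rounded MatrixElems values quoted in the note and the two double-precision CUDA MECalcOnly values logged on itscrd70 (1.352934e+09 and 1.367534e+09). The `relative_spread` helper is a hypothetical name, and peak-to-peak over mean is just one possible choice of metric.

```python
from statistics import mean

# Values in events/sec, double precision, CUDA on itscrd70.
matrix_elems = [5.9e8, 6.7e8, 7.2e8]       # rounded values quoted in the note
mecalc_only = [1.352934e+09, 1.367534e+09]  # from the two itscrd70 CUDA logs


def relative_spread(values):
    """Peak-to-peak spread divided by the mean of the values."""
    return (max(values) - min(values)) / mean(values)


spread_me = relative_spread(matrix_elems)
spread_calc = relative_spread(mecalc_only)
print(f"MatrixElems spread: {spread_me:.1%}")
print(f"MECalcOnly spread:  {spread_calc:.1%}")
```

On these few samples the MatrixElems spread is close to 20% while the MECalcOnly spread is about 1%, which is consistent with quoting only MECalcOnly from the recent runs.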
On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.743408e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 1.367534e+09 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 0.743706 sec
2,588,597,917 cycles # 2.648 GHz
3,526,137,396 instructions # 1.36 insn per cycle
1.048524808 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = FLOAT (NaN/abnormal=2, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 1.516160e+09 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 3.280236e+09 ) sec^-1
MeanMatrixElemValue = ( 1.371686e-02 +- 3.270219e-06 ) GeV^0
TOTAL : 0.690254 sec
2,368,009,568 cycles # 2.649 GHz
3,367,981,007 instructions # 1.42 insn per cycle
0.978942760 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 48
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.301297e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.192022 sec
19,241,362,073 cycles # 2.673 GHz
48,583,081,581 instructions # 2.52 insn per cycle
7.203110555 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[2] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 2.532683e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.831134 sec
12,924,690,078 cycles # 2.671 GHz
29,940,147,028 instructions # 2.32 insn per cycle
4.842102414 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3274) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.593254e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.662869 sec
9,247,838,169 cycles # 2.520 GHz
16,560,392,033 instructions # 1.79 insn per cycle
3.673581024 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2746) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.892414e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.603951 sec
9,129,054,132 cycles # 2.528 GHz
16,497,282,072 instructions # 1.81 insn per cycle
3.614598246 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2572) (512y: 95) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.738379e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.020666 sec
8,891,618,256 cycles # 2.208 GHz
13,361,398,930 instructions # 1.50 insn per cycle
4.035327755 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1127) (512y: 205) (512z: 2045)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.211023e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371707e-02 +- 3.270376e-06 ) GeV^0
TOTAL : 7.119093 sec
19,060,830,463 cycles # 2.675 GHz
47,728,069,981 instructions # 2.50 insn per cycle
7.129763876 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 578) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = VECTOR[4] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.507245e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270375e-06 ) GeV^0
TOTAL : 3.351029 sec
8,971,452,327 cycles # 2.672 GHz
19,719,600,560 instructions # 2.20 insn per cycle
3.362076705 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3719) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.160704e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.695110 sec
6,906,959,097 cycles # 2.556 GHz
12,504,366,929 instructions # 1.81 insn per cycle
2.706037106 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 3077) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.869343e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.621328 sec
6,734,710,370 cycles # 2.562 GHz
12,522,685,250 instructions # 1.86 insn per cycle
2.631936677 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2917) (512y: 81) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[16] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 7.434054e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270340e-06 ) GeV^0
TOTAL : 2.750946 sec
6,437,749,555 cycles # 2.334 GHz
10,930,642,414 instructions # 1.70 insn per cycle
2.761224316 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1559) (512y: 179) (512z: 2157)
=========================================================================
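As a reading aid for the itscrd70 C++ results above, the following sketch computes the MECalcOnly speedup of each SIMD build relative to the scalar ('none') build. The throughputs are copied from the log; the `throughput` dictionary layout and the helper name are assumptions made for this illustration only.

```python
# EvtsPerSec[MECalcOnly] (3a) for the C++ builds in the itscrd70 log, events/sec.
throughput = {
    "double": {"none": 1.301297e+06, "sse4": 2.532683e+06, "avx2": 4.593254e+06,
               "512y": 4.892414e+06, "512z": 3.738379e+06},
    "float": {"none": 1.211023e+06, "sse4": 4.507245e+06, "avx2": 8.160704e+06,
              "512y": 8.869343e+06, "512z": 7.434054e+06},
}


def simd_speedups(by_build):
    """Speedup of every build relative to the scalar 'none' build."""
    scalar = by_build["none"]
    return {build: value / scalar for build, value in by_build.items()}


speedups = {prec: simd_speedups(builds) for prec, builds in throughput.items()}
for prec, builds in speedups.items():
    best = max(builds, key=builds.get)
    print(prec, {b: round(s, 2) for b, s in builds.items()}, "best:", best)
```

In both precisions the '512y' build is the fastest in this log, and '512z' falls behind it. The log also reports a lower clock frequency for the '512z' runs, which may be part of the explanation.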
On lxplus770.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla T4]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 3.891814e+07 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 4.009506e+07 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 1.144038 sec
672,779,391 cycles:u # 0.552 GHz
1,362,529,910 instructions:u # 2.03 insn per cycle
1.285736848 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = FLOAT (NaN/abnormal=2, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.377864e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 8.217691e+08 ) sec^-1
MeanMatrixElemValue = ( 1.371686e-02 +- 3.270219e-06 ) GeV^0
TOTAL : 0.825550 sec
386,914,589 cycles:u # 0.437 GHz
828,824,446 instructions:u # 2.14 insn per cycle
0.963103649 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 64
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.275975e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.515231 sec
19,209,549,764 cycles:u # 2.561 GHz
48,627,460,774 instructions:u # 2.53 insn per cycle
7.553956776 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[2] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 2.542904e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.980303 sec
12,899,873,466 cycles:u # 2.584 GHz
29,995,111,387 instructions:u # 2.33 insn per cycle
5.033503232 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3274) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.508129e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.899591 sec
9,295,935,243 cycles:u # 2.391 GHz
16,619,353,171 instructions:u # 1.79 insn per cycle
3.969700585 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2746) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.775861e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.870736 sec
9,103,619,309 cycles:u # 2.376 GHz
16,556,440,384 instructions:u # 1.82 insn per cycle
3.911828159 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2572) (512y: 95) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.381439e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.330337 sec
9,151,794,863 cycles:u # 2.106 GHz
13,418,541,159 instructions:u # 1.47 insn per cycle
4.463700520 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1127) (512y: 205) (512z: 2045)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.169022e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371707e-02 +- 3.270376e-06 ) GeV^0
TOTAL : 7.431409 sec
19,242,935,493 cycles:u # 2.591 GHz
47,777,821,032 instructions:u # 2.48 insn per cycle
7.485035275 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 578) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = VECTOR[4] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.425426e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270375e-06 ) GeV^0
TOTAL : 3.526400 sec
8,942,606,781 cycles:u # 2.588 GHz
19,782,130,775 instructions:u # 2.21 insn per cycle
3.631748518 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3719) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.068817e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.782846 sec
6,900,906,289 cycles:u # 2.473 GHz
12,568,910,963 instructions:u # 1.82 insn per cycle
2.833799464 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 3077) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.514326e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.790172 sec
6,805,090,790 cycles:u # 2.475 GHz
12,587,787,277 instructions:u # 1.85 insn per cycle
2.943286999 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2917) (512y: 81) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[16] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 6.798061e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270340e-06 ) GeV^0
TOTAL : 2.938570 sec
6,641,119,578 cycles:u # 2.249 GHz
10,993,670,890 instructions:u # 1.66 insn per cycle
2.979389891 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1559) (512y: 179) (512z: 2157)
=========================================================================
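For anyone collecting these numbers into a table, a log of this shape can be parsed with a few regular expressions. The sketch below is not part of the repository: `parse_log`, `PATTERNS` and the record keys are hypothetical names, and the sample text is two entries copied from the itscrd70 log.

```python
import re

# Two entries copied from the itscrd70 log (only the lines the parser uses).
SAMPLE = """\
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
EvtsPerSec[MECalcOnly] (3a) = ( 4.892414e+06 ) sec^-1
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
EvtsPerSec[MECalcOnly] (3a) = ( 8.869343e+06 ) sec^-1
"""

PATTERNS = {
    "process": re.compile(r"^Process\s*=\s*(\S+)"),
    "precision": re.compile(r"^FP precision\s*=\s*(\w+)"),
    "simd": re.compile(r"^Internal loops fptype_sv\s*=.*\('(\w+)'"),
    "mecalc": re.compile(r"^EvtsPerSec\[MECalcOnly\]\s*\(3a\)\s*=\s*\(\s*([0-9.eE+-]+)\s*\)"),
}


def parse_log(text):
    """Return one dict per 'Process = ...' entry found in the log text."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        # A new 'Process' line starts a new record.
        if line.startswith("Process") and current:
            records.append(current)
            current = {}
        for key, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                current[key] = match.group(1)
    if current:
        records.append(current)
    # Convert the throughput string to a number where present.
    for record in records:
        if "mecalc" in record:
            record["mecalc"] = float(record["mecalc"])
    return records


records = parse_log(SAMPLE)
for record in records:
    print(record)
```

CUDA entries have no "Internal loops" line, so their records would simply lack the `simd` key.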
self merging - empty commits