Skip to content

Empty commits - latest logs used in the paper#218

Merged
valassi merged 4 commits intomadgraph5:masterfrom
valassi:tput
Jun 28, 2021
Merged

Empty commits - latest logs used in the paper#218
valassi merged 4 commits intomadgraph5:masterfrom
valassi:tput

Conversation

@valassi
Copy link
Copy Markdown
Member

@valassi valassi commented Jun 28, 2021

No description provided.

valassi added 4 commits June 11, 2021 16:14
On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process                     = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 5.021815e+08                 )  sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 7.504267e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     0.747806 sec
     2,558,737,342      cycles                    #    2.649 GHz
     3,500,196,731      instructions              #    1.37  insn per cycle
       1.050331328 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 136
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.199446e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     7.595453 sec
    20,332,815,475      cycles                    #    2.675 GHz
    52,130,992,146      instructions              #    2.56  insn per cycle
       7.606473393 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:  600) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.826392e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     3.941491 sec
     9,885,547,963      cycles                    #    2.504 GHz
    18,304,396,256      instructions              #    1.85  insn per cycle
       3.952184812 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 2525) (512y:   95) (512z:    0)
=========================================================================
On CUDA, the faster ixx/oxx give a factor 2 speedup in MECalconly
On C++, the faster ixx/oxx give a factor 1.3 speedup in MECalconly for 512y

On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process                     = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 7.230201e+08                 )  sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 1.352934e+09                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     0.733924 sec
     2,541,933,264      cycles                    #    2.642 GHz
     3,482,878,075      instructions              #    1.37  insn per cycle
       1.027491825 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.306635e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     7.166390 sec
    19,172,782,268      cycles                    #    2.673 GHz
    48,581,048,150      instructions              #    2.53  insn per cycle
       7.177041383 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:  614) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.898666e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     3.583812 sec
     9,072,290,427      cycles                    #    2.527 GHz
    16,497,789,079      instructions              #    1.82  insn per cycle
       3.594563440 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 2572) (512y:   95) (512z:    0)
=========================================================================
This is the status of the current upstream/master

NB! The EvtsPerSec[MatrixElems] in CUDA are varying wildly.
In double I see this go from 5.9E8 to 6.7E8 today, yesterday 7.2E8 with the same code.
The EvtsPerSec[MatrixElems] however is much more stable around 1.37E9 in all those cases.
I will therefore keep the numbers in the paper from previous measurements and vCHEP
for EvtsPerSec[MatrixElems] (namely 7.25E8 /double and 1.59E9 /float)
and add any recent EvtsPerSec[MECalcOnly] as 1.37E9 /double and 3.28E9 /float

On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process                     = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.743408e+08                 )  sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 1.367534e+09                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     0.743706 sec
     2,588,597,917      cycles                    #    2.648 GHz
     3,526,137,396      instructions              #    1.36  insn per cycle
       1.048524808 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision                = FLOAT (NaN/abnormal=2, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 1.516160e+09                 )  sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 3.280236e+09                 )  sec^-1
MeanMatrixElemValue         = ( 1.371686e-02 +- 3.270219e-06 )  GeV^0
TOTAL       :     0.690254 sec
     2,368,009,568      cycles                    #    2.649 GHz
     3,367,981,007      instructions              #    1.42  insn per cycle
       0.978942760 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 48
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.301297e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     7.192022 sec
    19,241,362,073      cycles                    #    2.673 GHz
    48,583,081,581      instructions              #    2.52  insn per cycle
       7.203110555 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:  614) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[2] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 2.532683e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     4.831134 sec
    12,924,690,078      cycles                    #    2.671 GHz
    29,940,147,028      instructions              #    2.32  insn per cycle
       4.842102414 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3274) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[4] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.593254e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     3.662869 sec
     9,247,838,169      cycles                    #    2.520 GHz
    16,560,392,033      instructions              #    1.79  insn per cycle
       3.673581024 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 2746) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.892414e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     3.603951 sec
     9,129,054,132      cycles                    #    2.528 GHz
    16,497,282,072      instructions              #    1.81  insn per cycle
       3.614598246 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 2572) (512y:   95) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[8] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.738379e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     4.020666 sec
     8,891,618,256      cycles                    #    2.208 GHz
    13,361,398,930      instructions              #    1.50  insn per cycle
       4.035327755 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 1127) (512y:  205) (512z: 2045)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv    = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.211023e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371707e-02 +- 3.270376e-06 )  GeV^0
TOTAL       :     7.119093 sec
    19,060,830,463      cycles                    #    2.675 GHz
    47,728,069,981      instructions              #    2.50  insn per cycle
       7.129763876 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:  578) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv    = VECTOR[4] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.507245e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270375e-06 )  GeV^0
TOTAL       :     3.351029 sec
     8,971,452,327      cycles                    #    2.672 GHz
    19,719,600,560      instructions              #    2.20  insn per cycle
       3.362076705 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3719) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv    = VECTOR[8] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.160704e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371705e-02 +- 3.270339e-06 )  GeV^0
TOTAL       :     2.695110 sec
     6,906,959,097      cycles                    #    2.556 GHz
    12,504,366,929      instructions              #    1.81  insn per cycle
       2.706037106 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 3077) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv    = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.869343e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371705e-02 +- 3.270339e-06 )  GeV^0
TOTAL       :     2.621328 sec
     6,734,710,370      cycles                    #    2.562 GHz
    12,522,685,250      instructions              #    1.86  insn per cycle
       2.631936677 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 2917) (512y:   81) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv    = VECTOR[16] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 7.434054e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371705e-02 +- 3.270340e-06 )  GeV^0
TOTAL       :     2.750946 sec
     6,437,749,555      cycles                    #    2.334 GHz
    10,930,642,414      instructions              #    1.70  insn per cycle
       2.761224316 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 1559) (512y:  179) (512z: 2157)
=========================================================================

On lxplus770.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla T4]:
=========================================================================
Process                     = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 3.891814e+07                 )  sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 4.009506e+07                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     1.144038 sec
       672,779,391      cycles:u                  #    0.552 GHz
     1,362,529,910      instructions:u            #    2.03  insn per cycle
       1.285736848 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision                = FLOAT (NaN/abnormal=2, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.377864e+08                 )  sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 8.217691e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.371686e-02 +- 3.270219e-06 )  GeV^0
TOTAL       :     0.825550 sec
       386,914,589      cycles:u                  #    0.437 GHz
       828,824,446      instructions:u            #    2.14  insn per cycle
       0.963103649 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 64
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.275975e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     7.515231 sec
    19,209,549,764      cycles:u                  #    2.561 GHz
    48,627,460,774      instructions:u            #    2.53  insn per cycle
       7.553956776 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:  614) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[2] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 2.542904e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     4.980303 sec
    12,899,873,466      cycles:u                  #    2.584 GHz
    29,995,111,387      instructions:u            #    2.33  insn per cycle
       5.033503232 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3274) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[4] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.508129e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     3.899591 sec
     9,295,935,243      cycles:u                  #    2.391 GHz
    16,619,353,171      instructions:u            #    1.79  insn per cycle
       3.969700585 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 2746) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.775861e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     3.870736 sec
     9,103,619,309      cycles:u                  #    2.376 GHz
    16,556,440,384      instructions:u            #    1.82  insn per cycle
       3.911828159 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 2572) (512y:   95) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv    = VECTOR[8] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.381439e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     4.330337 sec
     9,151,794,863      cycles:u                  #    2.106 GHz
    13,418,541,159      instructions:u            #    1.47  insn per cycle
       4.463700520 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 1127) (512y:  205) (512z: 2045)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv    = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.169022e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371707e-02 +- 3.270376e-06 )  GeV^0
TOTAL       :     7.431409 sec
    19,242,935,493      cycles:u                  #    2.591 GHz
    47,777,821,032      instructions:u            #    2.48  insn per cycle
       7.485035275 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:  578) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv    = VECTOR[4] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.425426e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270375e-06 )  GeV^0
TOTAL       :     3.526400 sec
     8,942,606,781      cycles:u                  #    2.588 GHz
    19,782,130,775      instructions:u            #    2.21  insn per cycle
       3.631748518 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3719) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv    = VECTOR[8] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.068817e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371705e-02 +- 3.270339e-06 )  GeV^0
TOTAL       :     2.782846 sec
     6,900,906,289      cycles:u                  #    2.473 GHz
    12,568,910,963      instructions:u            #    1.82  insn per cycle
       2.833799464 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 3077) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv    = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.514326e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371705e-02 +- 3.270339e-06 )  GeV^0
TOTAL       :     2.790172 sec
     6,805,090,790      cycles:u                  #    2.475 GHz
    12,587,787,277      instructions:u            #    1.85  insn per cycle
       2.943286999 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 2917) (512y:   81) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision                = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv    = VECTOR[16] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 6.798061e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.371705e-02 +- 3.270340e-06 )  GeV^0
TOTAL       :     2.938570 sec
     6,641,119,578      cycles:u                  #    2.249 GHz
    10,993,670,890      instructions:u            #    1.66  insn per cycle
       2.979389891 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 1559) (512y:  179) (512z: 2157)
=========================================================================
@valassi
Copy link
Copy Markdown
Member Author

valassi commented Jun 28, 2021

self merging - empty commits

@valassi valassi merged commit 6cbe58a into madgraph5:master Jun 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant