
processes for the paper #344

oliviermattelaer opened this issue Jan 24, 2022 · 14 comments

@oliviermattelaer

I would suggest the following processes for the paper:

  1. g g > t t~ g
  2. g g > t t~ g g
  3. g g > t t~ g g g

In terms of processes to use to check that the code can handle most of the cases:

  1. import model heft; generate g g > h
  2. generate u u~ > d d~

valassi commented Jan 24, 2022

Hi Olivier, thanks. I would suggest also adding eemumu: first because it is quite different (much lighter computationally) and may have some interesting numbers, and second because this is what we had published in the CHEP proceedings, so it could be interesting to compare to those numbers. What do you think?

@jtchilders

looks good to me.

valassi commented Jan 25, 2022

I am documenting a few specifics of ggttggg in #346. Please feel free to add more observations!

valassi commented Jan 25, 2022

I have just merged PR #345. This contains a couple of useful things for the paper, following Olivier's suggestions:

  • I have added code generation and logfiles for ggttg and ggttggg
  • I have added yet another script to parse a selection of logfiles from various revisions

The summary of all results for the five processes I look at (eemumu, ggtt+0,1,2,3g) is here:
https://github.com/madgraph5/madgraph4gpu/blob/master/epochX/cudacpp/tput/summaryTable.txt

I tried several combinations:

  • both double and float
  • my baseline cuda116/gcc102, but also cuda116/icx2021 (the latter is related to cuda116/clang13)
  • my baseline "no aggressive inlining" version versus a "more aggressive inlining" version

There are quite a few differences between the two compilers and the two inlining options, still to be understood/tweaked, but I would use the baseline for our comparison.

Note that I give one CUDA number, and several SIMD numbers for C++. The nice thing is that a factor 4 between no SIMD and 512y SIMD for double (and a factor 8 for float) seems to always be there, also for ggttggg. The baseline of the baseline is the single CUDA result and the single 512y/C++ result.
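
Those factors are exactly what one expects from 256-bit wide vectors. A trivial illustration using GCC vector extensions (just a sketch, not the actual cudacpp vector types):

// Illustration only (not the actual cudacpp vector types): a 256-bit register
// holds 4 doubles or 8 floats, hence the ~x4 (double) and ~x8 (float) ratios
// between the no-SIMD and 512y builds.
#include <cstdio>
typedef double double_v __attribute__( ( vector_size( 32 ) ) ); // 256 bits
typedef float float_v __attribute__( ( vector_size( 32 ) ) );   // 256 bits
int main()
{
  printf( "doubles per 256-bit vector: %zu\n", sizeof( double_v ) / sizeof( double ) ); // 4
  printf( "floats per 256-bit vector:  %zu\n", sizeof( float_v ) / sizeof( float ) );   // 8
  return 0;
}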

As the complexity increases and the tests take longer, I reduce the number of events (or gpublocks/threads) for ggttggg. For C++, even a few events would be enough to reach plateau performance, but I always try to run a reasonable number for CUDA too. In CUDA I always run the same number of events as in C++, to compare the ME value; but for ggttgg and ggttggg I do a second CUDA run with more gpublocks/threads, to reach the plateau. Typically for a V100 this is 64 blocks and 256 threads as a bare minimum (below that, the performance always drops by factors). The detailed configs are here:

if [ "${exe%%/gg_ttggg*}" != "${exe}" ]; then

(look at exeArgs2 if it exists, else at exeArgs, for the CUDA blocks/threads).
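
To make the 64 blocks x 256 threads numbers concrete: one GPU thread computes the ME for one event, so the number of events per iteration is simply gpublocks times gputhreads. A schematic sketch (not the actual driver code; the kernel arguments are omitted):

// Schematic sketch (not the actual driver): one GPU thread computes the ME of
// one event, so the number of events per kernel launch is gpublocks*gputhreads.
// Below 64 blocks x 256 threads a V100 is not filled and the throughput drops.
#include <cstdio>
int main()
{
  const int gpublocks = 64;                // blocks per grid (V100 bare minimum for the plateau)
  const int gputhreads = 256;              // threads per block
  const int nevt = gpublocks * gputhreads; // events per kernel launch
  printf( "events per launch: %d\n", nevt ); // 16384
  // In the CUDA build the ME kernel is then launched roughly as
  //   sigmaKin<<<gpublocks, gputhreads>>>( ... );
  return 0;
}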

Voilà, those are my full performance numbers as of today. They will still evolve (especially with split kernels etc.).

I will also look at the two processes that Olivier suggested as a proof of concept of the code generation.

(One final word of caution: I think I have some small functional bugs in the calculations, which I will look at. Relatedly, or maybe independently, the different compilers start giving quite different results for ggttggg... maybe it is just the order of adding the 1000 diagrams...)
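
Just to illustrate the last point with a trivial standalone example (nothing to do with the actual ME code): floating-point addition is not associative, so changing the order in which the diagrams are summed can legitimately change the last digits of the result.

// Trivial illustration of floating-point non-associativity: the same three
// numbers summed in a different order give a different result.
#include <cstdio>
int main()
{
  const double a = 1e16, b = -1e16, c = 1.0;
  printf( "(a+b)+c = %.1f\n", ( a + b ) + c ); // 1.0
  printf( "a+(b+c) = %.1f\n", a + ( b + c ) ); // 0.0
  return 0;
}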

valassi commented Jan 25, 2022

Just to put it in full, as of today:

*** FPTYPE=d ******************************************************************

Revision c2e67b4 [nvcc 11.6.55 (gcc 10.2.0)] 
HELINL=0
            eemumu      ggtt        ggttg       ggttgg      ggttggg     
CUD/none    1.35e+09    1.41e+08    1.45e+07    5.20e+05    1.18e+04    
CPP/none    1.67e+06    2.01e+05    2.48e+04    1.81e+03    7.22e+01    
CPP/sse4    3.13e+06    3.17e+05    4.54e+04    3.34e+03    1.32e+02    
CPP/avx2    5.54e+06    5.64e+05    8.86e+04    6.83e+03    2.61e+02    
CPP/512y    5.82e+06    6.15e+05    9.83e+04    7.49e+03    2.88e+02    
CPP/512z    4.65e+06    3.75e+05    7.19e+04    6.52e+03    2.94e+02    

Revision c2e67b4 [nvcc 11.6.55 (gcc 10.2.0)] 
HELINL=1
            eemumu      ggtt        ggttg       ggttgg      ggttggg     
CUD/none    1.38e+09    1.42e+08                3.85e+05                
CPP/none    4.97e+06    2.38e+05                3.91e+02                
CPP/sse4    8.95e+06    2.78e+05                2.94e+03                
CPP/avx2    1.20e+07    4.44e+05                6.09e+03                
CPP/512y    1.23e+07    4.54e+05                7.52e+03                
CPP/512z    8.28e+06    3.43e+05                6.66e+03                

Revision 4f3229d [nvcc 11.6.55 (icx 20210400, clang 13.0.0, gcc 10.2.0)] 
HELINL=0
            eemumu      ggtt        ggttg       ggttgg      ggttggg     
CUD/none    1.33e+09    1.42e+08    1.45e+07    5.14e+05    1.19e+04    
CPP/none    7.60e+06    2.15e+05    2.43e+04    1.50e+03    7.21e+01    
CPP/sse4    7.89e+06    4.45e+05    4.57e+04    2.82e+03    1.05e+02    
CPP/avx2    1.19e+07    6.93e+05    1.04e+05    7.61e+03    2.42e+02    
CPP/512y    1.19e+07    7.50e+05    1.09e+05    8.45e+03    2.74e+02    
CPP/512z    9.37e+06    5.09e+05    7.85e+04    5.85e+03    2.66e+02    

Revision 4f3229d [nvcc 11.6.55 (icx 20210400, clang 13.0.0, gcc 10.2.0)] 
HELINL=1
            eemumu      ggtt        ggttg       ggttgg      ggttggg     
CUD/none    1.33e+09    1.41e+08                3.85e+05                
CPP/none    7.71e+06    2.65e+05                1.92e+03                
CPP/sse4    7.93e+06    4.44e+05                3.77e+03                
CPP/avx2    1.19e+07    8.16e+05                1.00e+04                
CPP/512y    1.18e+07    8.64e+05                1.15e+04                
CPP/512z    9.19e+06    5.83e+05                1.09e+04                

*** FPTYPE=f ******************************************************************

Revision c2e67b4 [nvcc 11.6.55 (gcc 10.2.0)] 
HELINL=0
            eemumu      ggtt        ggttg       ggttgg      ggttggg     
CUD/none    3.26e+09    3.79e+08    4.75e+07    9.71e+05    2.66e+04    
CPP/none    1.72e+06    2.06e+05    2.50e+04    1.87e+03    7.67e+01    
CPP/sse4    6.14e+06    4.80e+05    8.33e+04    6.97e+03    2.87e+02    
CPP/avx2    1.15e+07    1.04e+06    1.75e+05    1.36e+04    5.21e+02    
CPP/512y    1.21e+07    1.10e+06    1.85e+05    1.48e+04    5.66e+02    
CPP/512z    9.32e+06    7.64e+05    1.47e+05    1.30e+04    5.81e+02    

Revision c2e67b4 [nvcc 11.6.55 (gcc 10.2.0)] 
HELINL=1
            eemumu      ggtt        ggttg       ggttgg      ggttggg     
CUD/none    3.23e+09    3.80e+08                7.48e+05                
CPP/none    1.22e+07    2.48e+05                4.74e+02                
CPP/sse4    1.80e+07    5.40e+05                6.75e+03                
CPP/avx2    2.53e+07    7.02e+05                1.19e+04                
CPP/512y    2.61e+07    7.15e+05                1.47e+04                
CPP/512z    1.74e+07    5.63e+05                1.30e+04                

Revision 4f3229d [nvcc 11.6.55 (icx 20210400, clang 13.0.0, gcc 10.2.0)] 
HELINL=0
            eemumu      ggtt        ggttg       ggttgg      ggttggg     
CUD/none    3.24e+09    3.79e+08    4.71e+07    9.66e+05    2.67e+04    
CPP/none    3.52e+06    2.07e+05    2.52e+04    1.84e+03    7.21e+01    
CPP/sse4    1.35e+07    6.88e+05    9.48e+04    6.12e+03    2.49e+02    
CPP/avx2    2.55e+07    1.12e+06    1.46e+05    1.36e+04    5.08e+02    
CPP/512y    2.57e+07    1.38e+06    2.10e+05    1.69e+04    5.22e+02    
CPP/512z    2.33e+07    5.70e+05    1.21e+05    1.22e+04    5.30e+02    

Revision 4f3229d [nvcc 11.6.55 (icx 20210400, clang 13.0.0, gcc 10.2.0)] 
HELINL=1
            eemumu      ggtt        ggttg       ggttgg      ggttggg     
CUD/none    3.25e+09    3.81e+08                7.49e+05                
CPP/none    3.58e+06    2.57e+05                2.47e+03                
CPP/sse4    1.40e+07    8.31e+05                8.59e+03                
CPP/avx2    2.57e+07    1.41e+06                1.96e+04                
CPP/512y    2.67e+07    1.55e+06                2.26e+04                
CPP/512z    2.33e+07    6.07e+05                2.01e+04                

valassi commented Jan 25, 2022

About uudd generation, I opened #349. I just completed PR #350, which I am about to merge.

The heft generation is a bit trickier; I will create a separate PR.

valassi commented Jan 25, 2022

I have opened issue #351 about the heft code generation, and a WIP PR #352. There are a few fundamental issues to discuss with Olivier there first (should the base plugin put the Higgs mass into cIPD, so that it ends up in constant memory?).

valassi commented Jan 26, 2022

Hi @oliviermattelaer, about the EFT Higgs in #351: in the end I have a physics question! I had two build problems:

  • One, about how to propagate the mass of the Higgs to the sxxxx function. After looking at the code and the relationship between Parameters, CPPProcess and MatrixElementKernels (also for other reasons, see #356, "Clarify roles of Parameters_sm, CPPProcess and MatrixElementKernels"), I saw that it was perfectly trivial to get it as Parameters_heft::getInstance()->mdl_MH. So this issue is fixed.
  • The second problem however is a mismatch of the sxxxx arguments, because the helicity is not passed. I looked back at the way I cleaned up sxxxx, also by comparing to Fortran, and I noticed that in Fortran NEITHER the mass NOR the helicity (for SM cases) is passed to sxxxx at all. Indeed, in the end the three-component wavefunction for a scalar is computed in a way which does not depend on the mass (could it in principle??) nor on the helicity (always zero by definition). Even further, I saw in the HEFT example that VVS3_0 only uses the component S3[2] of this three-component scalar wavefunction, and this is the one which does not even depend on the momentum: it does not depend on anything! (See the schematic sketch after this list.)
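
To make the last observation concrete, here is a minimal schematic sketch of a scalar wavefunction a la sxxxx (NOT the actual HelAmps/aloha code; the component conventions are simplified and the flow sign is omitted): neither the mass nor the helicity appears anywhere, and the component used by VVS3_0 is a constant.

// Schematic sketch only (not the generated HelAmps code, conventions simplified):
// a scalar wavefunction needs neither the mass nor the helicity as input.
#include <array>
#include <complex>
typedef std::complex<double> cxtype;
std::array<cxtype, 3> scalarWavefunction( const double p[4] ) // no mass, no helicity argument
{
  std::array<cxtype, 3> sc;
  sc[0] = cxtype( p[0], p[3] ); // momentum bookkeeping only
  sc[1] = cxtype( p[1], p[2] ); // momentum bookkeeping only
  sc[2] = cxtype( 1., 0. );     // constant: the only component that VVS3_0 actually uses
  return sc;
}
int main()
{
  const double p[4] = { 125., 0., 0., 0. }; // e.g. a Higgs at rest (illustrative numbers)
  return scalarWavefunction( p )[2] == cxtype( 1., 0. ) ? 0 : 1; // sc[2] never depends on p
}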

So my questions:

  • can you confirm that I can remove the helicity argument from the sxxxx function? It is there but unused in C++, and it is absent in Fortran
  • can you confirm that also the mass has nothing to do there and can be removed as an argument from the sxxxx function? Again, it is there but unused in C++, and it is absent in Fortran... if I can remove this, my initial problem also trivially disappears, because I no longer need to pass Parameters_heft::getInstance()->mdl_MH at all!
  • just to be sure, can you confirm that the rest looks right, i.e. that this calculation of EFT gg>h should not depend on the momentum of the Higgs? (It seems to make sense: the momentum is always zero in the centre of mass of the Higgs?)

For the moment I will assume that I can simply remove both the helicity and mass arguments, modify sxxxx and the rest accordingly, and then submit a PR for review. But let me know please!
Thanks
Andrea

@oliviermattelaer

* can you confirm that I can remove the helicity argument from the sxxxx function? It is there but unused in C++, and it is absent in Fortran

Yes, technically this can be removed. Keeping it might be easier for the code generation, but it is just an if statement.

* can you confirm that also the mass has nothing to do there and can be removed as an argument from the sxxxx function? Again, it is there but unused in C++, and it is absent in Fortran... if I can remove this, my initial problem also trivially disappears, because I no longer need to pass  `Parameters_heft::getInstance()->mdl_MH` at all!

Yes, this is correct.
(Note that MH might be needed for the coupling/propagator, but indeed not for the initial state.)
However, this type of need exists for other processes: for g g > t t~, you also depend on the top mass, and it enters the ixxxx and oxxxx routines. So you need to be able to access the mass for this type of routine; how do you handle those?

* just to be sure, can you confirm that the rest looks right, i.e. that this calculation of EFT gg>h should not depend on the momentum of the Higgs? (It seems to make sense: the momentum is always zero in the centre of mass of the Higgs?)

Yes this is correct for this computation.

valassi commented Jan 26, 2022

Hi Olivier, thanks! OK, so I will remove those two arguments.
About the mass in ggtt, it works out of the box. Everything that is used there gets translated to cIPC/cIPD and ends up in constant CUDA memory (or static C++ memory). I probably changed something over time, but it was definitely your code originally (I am not even sure why they are called cIPC and cIPD!).
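
Schematically, the pattern is something like the following (just a sketch of the idea with illustrative names, not the exact generated CPPProcess code):

// Schematic sketch of the cIPD pattern (illustrative names, not the generated
// CPPProcess code): the independent parameters needed in the ME, e.g. the top
// mass and width in ggtt, are copied once into CUDA constant memory in the GPU
// build, or kept in static memory in the C++ build.
#include <cstring>
#ifdef __CUDACC__
#include <cuda_runtime.h>
__device__ __constant__ double cIPD[2]; // e.g. { mdl_MT, mdl_WT }
#else
static double cIPD[2];                  // static host memory in the C++ build
#endif
void setIndependentParameters( const double mt, const double wt ) // hypothetical helper
{
  const double tIPD[2] = { mt, wt };
#ifdef __CUDACC__
  cudaMemcpyToSymbol( cIPD, tIPD, 2 * sizeof( double ) ); // copy once at initialisation
#else
  std::memcpy( cIPD, tIPD, 2 * sizeof( double ) );
#endif
}
int main()
{
  setIndependentParameters( 173., 1.5 ); // illustrative values for the top mass and width
  return 0;
}
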
Andrea

valassi commented Jan 26, 2022

Hi @oliviermattelaer again, next physics question! In #358.
I get a build warning from rambo, which makes me think that maybe a 2->1 process like gg>h is not a good example for this exercise (do we need phase-space sampling at all?). Are we not always repeating the same ME calculation with the same momenta, independently of the random numbers?
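
(Just to make the point explicit, a schematic illustration, nothing to do with the actual rambo code: in a 2->1 process at fixed partonic energy, the final-state momentum is completely fixed by momentum conservation, so there are no phase-space degrees of freedom left to sample.)

// Schematic illustration (not rambo): for gg>h at fixed partonic sqrt(s) = mH,
// the Higgs momentum is fully determined, so every "event" has identical kinematics.
#include <cstdio>
int main()
{
  const double mH = 125.;                            // illustrative Higgs mass in GeV
  const double pg1[4] = { mH / 2, 0., 0., mH / 2 };  // incoming gluon 1 (E, px, py, pz)
  const double pg2[4] = { mH / 2, 0., 0., -mH / 2 }; // incoming gluon 2
  double pH[4];
  for ( int i = 0; i < 4; i++ ) pH[i] = pg1[i] + pg2[i]; // always ( mH, 0, 0, 0 )
  printf( "pH = ( %g, %g, %g, %g )\n", pH[0], pH[1], pH[2], pH[3] );
  return 0;
}
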
For the moment I would just ignore the warning anyway... let me know if you have other suggestions. (In any case, this was very useful for finding other issues in the code!)
Thanks
Andrea

oliviermattelaer commented Jan 26, 2022 via email

valassi commented Jan 26, 2022

Hi Olivier, ok very good, then I will just keep the warning in the code and check that the ME generation works (indeed it does now). Thanks!

valassi commented Apr 7, 2023

I am not sure this issue is the best place for this, but since it is open I will add these comments here. I just want to give an overview of the processes we already have and the ones we should be adding, and why.

Currently we have these 7 SA and 6 MAD processes for cudacpp

What I would like to add includes

Much lower priority, but eventually relevant for performance tests (runtime AND build speed!):

Comments welcome...

cc @oliviermattelaer @roiser @zeniheisser @hageboeck @whhopkins @jtchilders @nscottnichols
