(DO NOT MERGE: for reference only) Further tests of Olivier's "fix_826" branch PR 852 #870
…t channelId for susy_gg_t1t1 (fix issue madgraph5#826)
…ot channelId (and note that iconfig=1 is ok)
…g_t1t1 (will give zero cross section madgraph5#826)
… test a different iconfig
In particular: the following triggers a SIGFPE reported in madgraph5#855 (crash in rotxxx that can be fixed adding volatile?)
  ./tmad/madX.sh -ggttgg -iconfig 104 -makeclean
This also triggers a similar SIGFPE (initially reported in madgraph5#826)
  ./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
…g AS-IS Olivier's patches from the latest fix_826 branch for PR madgraph5#852
The gg_ttgg test still crashes (rotxxx madgraph5#855?)
  ./tmad/madX.sh -ggttgg -iconfig 104 -makeclean
  *** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
  0 0x7fce5ec23860 in ???
  1 0x7fce5ec22a05 in ???
  2 0x7fce5e854def in ???
  3 0x44b5ff in ???
  4 0x4087df in ???
  5 0x409848 in ???
  6 0x40bb83 in ???
  7 0x40d1a9 in ???
  8 0x45c804 in ???
  9 0x434269 in ???
  10 0x40371e in ???
  11 0x7fce5e83feaf in ???
  12 0x7fce5e83ff5f in ???
  13 0x403844 in ???
  14 0xffffffffffffffff in ???
  ./tmad/madX.sh: line 387: 3913008 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
The susy_gg_t1t1 test also still crashes (see madgraph5#826?), this looks like the same crash as ggttgg above
  ./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
  *** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
  0 0x7f9f03423860 in ???
  1 0x7f9f03422a05 in ???
  2 0x7f9f03054def in ???
  3 0x43809f in ???
  4 0x40581f in ???
  5 0x4067b1 in ???
  6 0x408c71 in ???
  7 0x40a0a9 in ???
  8 0x444fdf in ???
  9 0x42bb38 in ???
  10 0x40371e in ???
  11 0x7f9f0303feaf in ???
  12 0x7f9f0303ff5f in ???
  13 0x403844 in ???
  14 0xffffffffffffffff in ???
  ./tmad/madX.sh: line 387: 3907179 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
The gqttq test also still crashes intermittently, i.e. only on the second execution (madgraph5#845?)
  ./tmad/teeMadX.sh -gqttq +10x -fltonly -makeclean
  ./tmad/teeMadX.sh -gqttq +10x -fltonly
  Executing ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp'
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
  0 0x7fbafa623860 in ???
  1 0x7fbafa622a05 in ???
  2 0x7fbafa254def in ???
  3 0x7fbafad24034 in ???
  4 0x7fbafa9a1575 in ???
  5 0x7fbafad20c89 in ???
  6 0x7fbafad2abfd in ???
  7 0x7fbafad30491 in ???
  8 0x43008b in ???
  9 0x431c10 in ???
  10 0x432d47 in ???
  11 0x433b1e in ???
  12 0x44a921 in ???
  13 0x42ebbf in ???
  14 0x40371e in ???
  15 0x7fbafa23feaf in ???
  16 0x7fbafa23ff5f in ???
  17 0x403844 in ???
  18 0xffffffffffffffff in ???
  ./madX.sh: line 387: 3922797 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
  ERROR! ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp' failed
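The "Floating point exception(core dumped)" lines in the dumps above are printed by the shell when the child process is killed by SIGFPE. As a rough sketch of how a wrapper script could tell such a crash apart from an ordinary failure, the snippet below checks the exit status. The helper name `run_and_classify` is hypothetical and not part of madX.sh, and the check assumes Linux signal numbering (SIGFPE is 8, so the shell reports 128 + 8 = 136).

```shell
#!/bin/sh
# Hypothetical helper (not part of madX.sh): run a command and report
# whether it succeeded, failed normally, or was killed by SIGFPE.
# Assumption: Linux signal numbering, where SIGFPE is signal 8, so a
# POSIX shell reports exit status 128 + 8 = 136 for the child.
run_and_classify() {
  status=0
  "$@" || status=$?   # capture the status without aborting under 'set -e'
  if [ "$status" -eq 136 ]; then
    echo "SIGFPE"
  elif [ "$status" -eq 0 ]; then
    echo "OK"
  else
    echo "FAILED($status)"
  fi
}

# Usage: a child shell that sends SIGFPE to itself stands in for a
# crashing madevent_cpp executable.
run_and_classify sh -c 'kill -s FPE $$'
```

In a real test script the stand-in command would be replaced by the actual executable invocation, with its input and output redirections kept as they are.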
…nd cudacpp.mk to improve the crash dumps
The susyggt1t1 test clearly crashes in rotxxx (madgraph5#855):
  ./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
  *** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
  0 0x7fb7e1223860 in ???
  1 0x7fb7e1222a05 in ???
  2 0x7fb7e0e54def in ???
  3 0x43809f in rotxxx_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/Source/DHELAS/aloha_functions.f:1247
  4 0x40581f in gentcms_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:1480
  5 0x4067b1 in one_tree_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:1167
  6 0x408c71 in gen_mom_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:68
  7 0x40a0a9 in x_to_f_arg_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:60
  8 0x444fdf in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/Source/dsample.f:172
  9 0x42bb38 in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/driver.f:256
  10 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/driver.f:301
  ./tmad/madX.sh: line 387: 3928626 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
  ERROR! ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_susyggt1t1_x1_cudacpp > /tmp/avalassi/output_susyggt1t1_x1_cudacpp' failed
The ggttgg test also clearly crashes in rotxxx (madgraph5#855):
  ./tmad/madX.sh -ggttgg -iconfig 104 -makeclean^C
  *** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
  0 0x7fb141c23860 in ???
  1 0x7fb141c22a05 in ???
  2 0x7fb141854def in ???
  3 0x44b5ff in rotxxx_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f:1247
  4 0x4087df in gentcms_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1480
  5 0x409848 in one_tree_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1167
  6 0x40bb83 in gen_mom_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:68
  7 0x40d1a9 in x_to_f_arg_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:60
  8 0x45c804 in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/dsample.f:172
  9 0x434269 in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:256
  10 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:301
  ./tmad/madX.sh: line 387: 3933302 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
  ERROR! ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttgg_x1_cudacpp > /tmp/avalassi/output_ggttgg_x1_cudacpp' failed
The gqttq test instead clearly crashes in sigmaKin (madgraph5#845):
  ./tmad/teeMadX.sh -gqttq +10x -fltonly -makeclean
  ./tmad/teeMadX.sh -gqttq +10x -fltonly
  Executing ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x10_cudacpp > /tmp/avalassi/output_gqttq_x10_cudacpp'
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
  0 0x7f607ee23860 in ???
  1 0x7f607ee22a05 in ???
  2 0x7f607ea54def in ???
  3 0x7f607f607008 in _ZN9mg5amcCpu8sigmaKinEPKfS1_S1_S1_PfjS2_S2_PiS3_i._omp_fn.0 at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/CPPProcess.cc:1190
  4 0x7f607f4ab575 in ???
  5 0x7f607f603c89 in _ZN9mg5amcCpu8sigmaKinEPKfS1_S1_S1_PfjS2_S2_PiS3_i at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/CPPProcess.cc:1093
  6 0x7f607f60dbfd in _ZN9mg5amcCpu23MatrixElementKernelHost21computeMatrixElementsEj at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/MatrixElementKernels.cc:115
  7 0x7f607f613491 in _ZN9mg5amcCpu6BridgeIdE12cpu_sequenceEPKdS3_S3_S3_jPdPiS5_b at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/Bridge.h:390
  8 0x7f607f613491 in fbridgesequence_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/fbridge.cc:106
  9 0x43008b in smatrix1_multi_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig1.f:618
  10 0x431c10 in dsig1_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig1.f:445
  11 0x432d47 in dsigproc_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig.f:1034
  12 0x433b1e in dsig_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig.f:327
  13 0x44a921 in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/Source/dsample.f:208
  14 0x42ebbf in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/driver.f:256
  15 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/driver.f:301
  ./madX.sh: line 387: 3941122 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
  ERROR! ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x10_cudacpp > /tmp/avalassi/output_gqttq_x10_cudacpp' failed
Conclusion: I would not merge 852 as it does not fix issues yet. Instead I would merge 857 to fix the rotxxx crash 855 using volatile, and reassess from there...
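In each of the symbolized dumps above, the crash site (rotxxx or sigmaKin) is the first frame whose symbol is resolved, after the unresolved signal-handler frames. A minimal sketch of extracting that frame automatically is given below. The helper name `first_resolved_frame` is hypothetical, and it assumes one frame per line in the form `N 0xADDR in SYMBOL at FILE:LINE` (with or without a leading `#`), with `???` marking unresolved symbols.

```shell
#!/bin/sh
# Hypothetical helper: read a backtrace on stdin and print the symbol and
# source location of the first frame that is not "???".
# Assumption: one frame per line, "N 0xADDR in SYMBOL at FILE:LINE".
first_resolved_frame() {
  awk '$3 == "in" && $4 != "???" { print $4, $6; exit }'
}

# Usage: frames 2 to 4 of the susyggt1t1 dump quoted above.
first_resolved_frame <<'EOF'
2 0x7fb7e0e54def in ???
3 0x43809f in rotxxx_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/Source/DHELAS/aloha_functions.f:1247
4 0x40581f in gentcms_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:1480
EOF
```

For this input the helper reports `rotxxx_` together with the `aloha_functions.f:1247` location, matching the reading of the dump given in the commit message. It is only useful on builds with debug symbols; on the earlier unsymbolized dumps every frame is `???` and nothing is printed.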
…ing my two recent changes in gpucpp
Fix conflicts in MG5aMC/mg5amcnlo (keep the latest gpucpp_826 version including the recent gpucpp changes)
I merged the latest upstream/master into this, including the CI tmad tests. This now has 24 failures, which should be (I did not check each one of them):
…ing the 'volatile' fix for rotxxx crashes
…syggt1t1 to test madgraph5#855 fix while still exposing madgraph5#826 and madgraph5#856
…test upstream/master (after cherry-picking the madX.sh changes too)
  GITMB=$(git merge-base --fork-point upstream/master HEAD)
  echo $GITMB
  a87e640
  git checkout $GITMB $(git ls-tree --name-only $GITMB */CODEGEN*txt)
Fix conflicts in MG5aMC/mg5amcnlo (keep the latest gpucpp_826 version including the recent gpucpp changes)
FWIW I upgraded this to the latest upstream/master, but again this will not be merged. Note: on upstream/master (before adding these extra changes in PR 870) there were 9 errors in the CI, one per fptype (three) for each of the following three issues; see #857 (comment)
With these extra patches in 870, there are 22 CI failures in https://github.com/madgraph5/madgraph4gpu/actions/runs/9704073463. These seem to be the following
So maybe these patches changed something in the color mapping, but actually broke things further in pp_tt012j? I am not sure. Anyway, I will investigate more.
I am closing this. This is superseded by MR #873, providing several fixes in this area.
This contains further tests of Olivier's "fix_826" branch PR #852. It does not look good. I will discuss the results in PR #852 directly.