master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) #898

valassi · 2024-07-09T14:00:20Z

Another issue introduced in#830 and being reviewed in #882

This is related to (almost a sub-component of) the SIMD issue #894 but I prefer to address it separately.

As clarified during the meeting today and #894 (comment) (thanks @roiser), the cudacpp implementation should assume that all channelids are the same in a warp. To avoid all unexpected behaviour due to hidden assumption, a sanity check (an asser essentially is needed). I think we had discussed this in the past, maybe I will dig that out. Anyway, this is a prerequisite to simplifying the SIMD implementation.

I file this as a separate issue also because it may be appropriate to expose these assumptions in the fortan-cudacpp bridge, so that they have the same assumptions. (Example: warp_size can be modified in the runcard, there must be a sanity check on allowed values).

valassi · 2024-07-09T14:07:51Z

PS On second thought, what I will do is add a sanity check that channelids are all the same inside a SIMD vector. For GPUs, this is not strictly necessary: the implementation can be made to be functional anyway, but it will be slower.

Example. Take hardware SIMD vectors of 4 elements and hardware GPU warps of 32 threads. If every channelid is different for those elements, then:

the GPU implementation will give correct results anyway (the channelid of each thread is only accesses in each thread), but there will be a (small) thread divergence
the SIMD implementation in master_june24: wrong/bugged and unnecessary SIMD handling of numerators and denominators #894, conversely, must know this in advance: if it can assume 4 equa channelids, SIMD vectors can be used, otherwise explicit loops must be used... but this must be known at the time of writing the code...

valassi · 2024-07-09T15:37:12Z

Small complication/question: for mixed fptype the check must ensure that TWO double vectors have the same channelId? Or probably not, I must check.

valassi · 2024-07-09T15:57:50Z

Small complication/question: for mixed fptype the check must ensure that TWO double vectors have the same channelId? Or probably not, I must check.

As discussed in #892 (comment), I am moving to an implementation where calculate_wavefunction and sigmaKin have two differenbt interfaces, in line with what they do

calaculate_wavefunctions manages a single SIMD vector or single event, which all have a same calar channelId
sigmaKin does the loops over SIMD vectors and decodes channelids, and implements the check described here

With respect to FPTYPE=m, the point to notice is the following, eg take avx2 with 8 floats and 4 doubles

sigmakin does a loop over float vectors, so it checks that 8 events have the same channelid
calculatewavefunction then loops (iParity loop) over two double vectors (first 4 events, then another 4 events)
the single scalar channelid that sigmakin passes to calculate_wavefunctions is valid for 8 events, ie for both double vectors... and a single channelid argument is enough, again

valassi · 2024-07-09T16:02:00Z

NB: the complication here is that uint_v has 4 ints, while there is the need to check 8 ints, so probably this needs to be done with two separate vectors. I will check...

oliviermattelaer · 2024-07-16T09:43:24Z

Concerning:

Small complication/question: for mixed fptype the check must ensure that TWO double vectors have the same channelId? Or probably not, I must check.

I do not see any problem if the two vector will have different channelID in the "m" type. On the contrary it is better to have different one. The point is that the color computation (only part done in float) does not depend of the channelId.
So they are no penalty to have different ChannelID here, and since the more flipping of value we have in channelID, the better it is to avoid potential bias ...

Obviously in that case, you can not have "nb_warp" set to 1 but this is the only limitation.
So I would say that we do not need dedicated check for the "m" case and on the contrary allow for two different value here.

…use channels 1,2,3,1,2,3... for different events (previously it was 1 for all events) NB1: the cuda test now fails, the reference file must be recreated NB2: I expect the SIMD tests to fail using the CUDA reference, due to the different bugs in the current channelId implementation NB3: eventually madgraph5#898 the implementation should enforce that all events in a warp use the same channelid

…channel test madgraph5#896 to use channels 1,2,3,1,2,3... for different events (previously it was 1 for all events) NB1: the cuda test now fails, the reference file must be recreated NB2: I expect the SIMD tests to fail using the CUDA reference, due to the different bugs in the current channelId implementation NB3: eventually madgraph5#898 the implementation should enforce that all events in a warp use the same channelid

…ph5#894 is SIMD handling of numertaors and denominators Now the tests madgraph5#898 succeed when using channels123123123... for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done | egrep '(BACKEND| ME |r.ME> BACKEND=none BACKEND=sse4 BACKEND=avx2 BACKEND=512y BACKEND=512z BACKEND=cuda Note: the debug printout gives DEBUG: MEKB processed 512 events across 3 channels { 1 : 172, 2 : 170, 3 : 170 }

…cks madgraph5#898 that channelids are the same inside each warp As expected, the tests madgraph5#898 using 123123123 etc fail the assertions for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done |& egrep '(BACKEND|Assert)' BACKEND=none BACKEND=sse4 runTest_cpp.exe: CPPProcess.cc:335: void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, const unsigned int*, mgOnGpu::fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int): Assertion `channelId == channelIds_sv[i]' failed. BACKEND=avx2 runTest_cpp.exe: CPPProcess.cc:335: void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, const unsigned int*, mgOnGpu::fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int): Assertion `channelId == channelIds_sv[i]' failed. BACKEND=512y runTest_cpp.exe: CPPProcess.cc:335: void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, const unsigned int*, mgOnGpu::fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int): Assertion `channelId == channelIds_sv[i]' failed. BACKEND=512z runTest_cpp.exe: CPPProcess.cc:335: void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, const unsigned int*, mgOnGpu::fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int): Assertion `channelId == channelIds_sv[i]' failed. BACKEND=cuda

…ph5#898, check all is ok

…the fact madgraph5#898 that channelids are the same inside each warp This also fixes all pending CID_ACCESS::ieventAccessRecordConst calls madgraph5#899 and madgraph5#911 Note: CPPProcess.cc is now again very close to upstream/master (most of master_june24 changes from madgraph5#830 have been undone)

madgraph5#898, all ok

…h5#898 for all processes (using the CODEGEN/recreateRefs.sh script) Note: some processes fail (some of the fixes in upstream/master may be needed...) [ RUN ] SIGMA_SM_GG_TTXGG_GPU_MULTICHANNEL.compareMomAndME DEBUG: MatrixElementKernelDevice::computeMatrixElements 0x1420650 T 256 runTest_cuda.exe: GpuRuntime.h:26: void assertGpu(cudaError_t, const char*, int, bool): Assertion `code == gpuSuccess' failed. [ RUN ] SIGMA_SM_GG_TTXGGG_GPU_MULTICHANNEL.compareMomAndME DEBUG: MatrixElementKernelDevice::computeMatrixElements 0x3a1fe00 T 256 runTest_cuda.exe: GpuRuntime.h:26: void assertGpu(cudaError_t, const char*, int, bool): Assertion `code == gpuSuccess' failed. [----------] 1 test from SIGMA_SM_GG_TTXGG_GPU_MULTICHANNEL [ RUN ] SIGMA_SM_GG_TTXGG_GPU_MULTICHANNEL.compareMomAndME DEBUG: MatrixElementKernelDevice::computeMatrixElements 0x15e77f0 T 256 runTest_cuda.exe: GpuRuntime.h:26: void assertGpu(cudaError_t, const char*, int, bool): Assertion `code == gpuSuccess' failed.

valassi · 2024-07-19T11:27:44Z

The sanity check on th emain/first channelId SIMD vector is now implemented, essentially here (with many other things) ea2146b

Note: in #924 I added the sanity check that TWO SIMD vectors have the same channelId in mixed mode, essentially here valassi@9e77b2b

Concerning:

Small complication/question: for mixed fptype the check must ensure that TWO double vectors have the same channelId? Or probably not, I must check.

I do not see any problem if the two vector will have different channelID in the "m" type. On the contrary it is better to have different one. The point is that the color computation (only part done in float) does not depend of the channelId. So they are no penalty to have different ChannelID here, and since the more flipping of value we have in channelID, the better it is to avoid potential bias ...

Obviously in that case, you can not have "nb_warp" set to 1 but this is the only limitation. So I would say that we do not need dedicated check for the "m" case and on the contrary allow for two different value here.

Thanks Olivier.

See the extensive discussion I added in #911 (comment)

I was going to make many comments here about the mixed fptype, but in the end I made them here #926 (comment)

The sanity checks are done (in PR #882), both for one vector and (in mixed mode) for two vectors. This can be closed.

valassi self-assigned this Jul 9, 2024

valassi changed the title ~~master_june24: missing sanity checks that all channelids are the same in a "warp"~~ master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) Jul 9, 2024

valassi mentioned this issue Jul 9, 2024

master_june24: wrong/bugged and unnecessary SIMD handling of numerators and denominators #894

Closed

valassi mentioned this issue Jul 9, 2024

master_june24: nomultichannel should be an array nullptr, not an array full of 0s #892

Closed

valassi mentioned this issue Jul 15, 2024

master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) #899

Closed

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024

[june24] regenerate gg_tt.mad after implementing sanity checks madgra…

8ef06cf

…ph5#898, check all is ok

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024

[june24] regenerate gg_tt.mad after simplifying CPPProcess.cc thanks to

9199ad9

madgraph5#898, all ok

oliviermattelaer added this to the warp milestone Jul 18, 2024

valassi mentioned this issue Jul 19, 2024

(after master_june24) consider releasing the assumption that two adjacent SIMD vectors have the same channelid in mixed mode #926

Open

valassi closed this as completed Jul 19, 2024

valassi linked a pull request Jul 19, 2024 that will close this issue

Merge of master into master_june24 and channelid fixes/reimplementation #882

Merged

valassi mentioned this issue Jul 19, 2024

Merge of master into master_june24 and channelid fixes/reimplementation #882

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) #898

master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) #898

valassi commented Jul 9, 2024

valassi commented Jul 9, 2024

valassi commented Jul 9, 2024

valassi commented Jul 9, 2024

valassi commented Jul 9, 2024

oliviermattelaer commented Jul 16, 2024

valassi commented Jul 19, 2024

master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) #898

master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) #898

Comments

valassi commented Jul 9, 2024

valassi commented Jul 9, 2024

valassi commented Jul 9, 2024

valassi commented Jul 9, 2024

valassi commented Jul 9, 2024

oliviermattelaer commented Jul 16, 2024

valassi commented Jul 19, 2024