# Calibrate Electrons on a GPU With CUDA

The second exercise demonstrates some good practices for offloading complex
data structures to a GPU efficiently.

## Build the Project

Building the project works exactly the same as in the first exercise. If you
already went through that, your [scripts/build_env.sh](scripts/build_env.sh)
should already be set up correctly, so you should be able to run:
  - On Perlmutter:

In [None]:
!./scripts/run_on_perlmutter.sh cmake -DCMAKE_BUILD_TYPE=Debug -S . -B build
!./scripts/run_on_perlmutter.sh cmake --build build

  - On SWAN:

In [None]:
!./scripts/run_on_swan.sh cmake -DCMAKE_BUILD_TYPE=Debug -S . -B build
!./scripts/run_on_swan.sh cmake --build build

## Code Structure

The code is set up slightly more realistically in this second example.
  - The algorithm's C\+\+ code is not exposed to the CUDA compiler, but rather
    kept in pure C\+\+ source files
    ([ElectronCalibCUDAAlg.h](CUDAExamples/src/02_xAODCalib/ElectronCalibCUDAAlg.h),
    [ElectronCalibCUDAAlg.cxx](CUDAExamples/src/02_xAODCalib/ElectronCalibCUDAAlg.cxx)).
    While at the time of writing we use the C\+\+20 standard for building both
    C\+\+ and CUDA code, the two may not always use the same standard. So it is
    good practice to limit the amount of Athena code that the `nvcc` compiler
    is exposed to.
  - The device code is exposed through a "standalone" C\+\+ function
    ([calibrateElectrons.h](CUDAExamples/src/02_xAODCalib/calibrateElectrons.h),
    [calibrateElectrons.cu](CUDAExamples/src/02_xAODCalib/calibrateElectrons.cu)).
    It is set up to make use of `"GaudiKernel/StatusCode.h"`, but it could
    communicate errors back to the algorithm in a simpler way as well.
  - To represent the data coming from
    [xAOD::Electron](https://atlas-sw-doxygen.web.cern.ch/atlas-sw-doxygen/atlas_main--Doxygen/docs/html/d3/da7/classxAOD_1_1Electron__v1.html)
    objects, the example makes use of the
    [SoA helper code](https://acts-project.github.io/vecmem/namespacevecmem_1_1edm.html)
    of [VecMem](https://acts-project.github.io/vecmem/)
    (see [ElectronDeviceContainer.h](CUDAExamples/src/02_xAODCalib/ElectronDeviceContainer.h)),
    along with some of the memory management features provided by that library.
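
As a rough illustration of the structure-of-arrays (SoA) idea, the sketch below
uses plain `std::vector` columns rather than VecMem's own types. The names
`ElectronSoA`, `CalibStatus` and `scalePt` are hypothetical and do not appear in
the example code. The enum merely stands in for a simpler error channel than
`StatusCode`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays layout: one contiguous column per
// electron variable, instead of one object per electron.
struct ElectronSoA {
  std::vector<float> eta;
  std::vector<float> phi;
  std::vector<float> pt;

  // Number of electrons, taken from the first column.
  std::size_t size() const { return eta.size(); }
};

// Hypothetical plain error code, standing in for a framework status type.
enum class CalibStatus { Success, SizeMismatch };

// Scale every pt value by a constant factor. Each loop iteration does the
// work that a single GPU thread would do for one electron.
inline CalibStatus scalePt(ElectronSoA& electrons, float scale) {
  const std::size_t n = electrons.size();
  // All columns must describe the same number of electrons.
  if (electrons.phi.size() != n || electrons.pt.size() != n) {
    return CalibStatus::SizeMismatch;
  }
  for (std::size_t i = 0; i < n; ++i) {
    electrons.pt[i] *= scale;
  }
  return CalibStatus::Success;
}
```

Because each variable sits in its own contiguous column, neighbouring GPU
threads that read the same variable for neighbouring electrons touch
neighbouring memory addresses, which is the usual motivation for SoA layouts on
GPUs.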

## Example Job

Like the first exercise, this one comes with a CA configuration that you can
try with:
  - On Perlmutter:

In [None]:
!./scripts/run_on_perlmutter.sh ./build/CMakeFiles/atlas_build_run.sh athena.py --CA CUDAExamples/02_xAODCalibConfig.py

  - On SWAN:

In [None]:
!./scripts/run_on_swan.sh ./build/CMakeFiles/atlas_build_run.sh athena.py --CA CUDAExamples/02_xAODCalibConfig.py

Much like the first example, this one is broken out of the box.

## Tasks

### 1. Make The Job Work

The example job should stop a couple of events into execution, with:

```text
...
AthenaEventLoopMgr                                   INFO   ===>>>  done processing event #2201575275, run #431906 5 events processed so far  <<<===
AthenaEventLoopMgr                                   INFO   ===>>>  start processing event #2201577975, run #431906 5 events processed so far  <<<===
GPUTutorial::ElectronCalibCUDAAlg                   FATAL Standard std::exception is caught in sysExecute
GPUTutorial::ElectronCalibCUDAAlg                   ERROR SG::ExcBadAuxVar: Attempt to retrieve nonexistent aux data item `::eta' (212).
GPUTutorial::ElectronCalibCUDAAlg                   ERROR Maximum number of errors ( 'ErrorMax':1) reached.
AthAlgSeq                                            INFO execute of [GPUTutorial::ElectronCalibCUDAAlg] did NOT succeed
AthAlgSeq                                           ERROR Maximum number of errors ( 'ErrorMax':1) reached.
...
```

This failure is caused by a very typical mistake in GPU code, even if the error
does not manifest itself quite the way it usually would.

Run the job in a debugger to find the reason for the surprising exception. In
particular, keep an eye on the container sizes event by event to see what is
different about the problematic one. As a hint, first set a breakpoint on
`GPUTutorial::ElectronCalibCUDAAlg::execute`, and only when the program first
stops there, set a catchpoint on all thrown exceptions (`catch throw` in GDB).
A number of exceptions are thrown during the initialization of the job, and
this ordering skips them.

### 2. Make The Job Thread Safe

Once you have worked around the issue causing the previous failure, the code is
still technically not thread safe, though this may be hard to demonstrate
during the tutorial. Still, try to run with many parallel CPU threads, and see
if you can make the code crash. (You may not be able to.)
  - On Perlmutter:

In [None]:
!./scripts/run_on_perlmutter.sh ./build/CMakeFiles/atlas_build_run.sh athena.py --threads=10 --CA CUDAExamples/02_xAODCalibConfig.py

  - On SWAN:

In [None]:
!./scripts/run_on_swan.sh ./build/CMakeFiles/atlas_build_run.sh athena.py --threads=10 --CA CUDAExamples/02_xAODCalibConfig.py

Since the problem may or may not show up, check which memory resources are used
in the example algorithm, and remove any thread-unsafe objects that you find.
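
One generic way to make a non-thread-safe memory resource usable from several
threads is to serialise access to it with a mutex. The sketch below shows this
with the standard `std::pmr` interface, assuming the memory resources in
question implement `std::pmr::memory_resource` (something to verify against the
VecMem headers in your build). `LockedResource` and `makeColumn` are
hypothetical names, and this is not necessarily the fix the exercise intends.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <mutex>
#include <vector>

// Hypothetical wrapper: forwards every request to an upstream resource
// while holding a mutex, so concurrent callers cannot interleave.
class LockedResource final : public std::pmr::memory_resource {
public:
  explicit LockedResource(std::pmr::memory_resource& upstream)
      : m_upstream(&upstream) {}

private:
  void* do_allocate(std::size_t bytes, std::size_t alignment) override {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_upstream->allocate(bytes, alignment);
  }

  void do_deallocate(void* ptr, std::size_t bytes,
                     std::size_t alignment) override {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_upstream->deallocate(ptr, bytes, alignment);
  }

  bool do_is_equal(
      const std::pmr::memory_resource& other) const noexcept override {
    // Memory may only be returned through this very wrapper.
    return this == &other;
  }

  std::pmr::memory_resource* m_upstream;  // not owned
  std::mutex m_mutex;
};

// Example use: a column of floats whose storage comes from the given resource.
inline std::pmr::vector<float> makeColumn(std::pmr::memory_resource& mr,
                                          std::size_t n) {
  return std::pmr::vector<float>(n, 0.0f, &mr);
}
```

The lock adds contention, so where a resource that is already thread safe is
available, using it directly is usually the simpler choice.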

### 3. Make The Job Do Something Useful

The meat of the exercise is to try to do something useful inside of
`GPUTutorial::Kernels::calibrateElectrons`. Update the code to:
  - Send additional electron variables to the GPU besides "eta" and "phi".
  - Have the kernel perform some modification on the electron momentum, using
    the properties of the electron, to mimic a sort of calibration.
  - Try to write a helper function that would be used by the kernel for this
    "calibration". See how you need to define/implement that helper function to
    make it usable from both host and device code.
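
One common way to make a helper callable from both host and device code is to
mark it `__host__ __device__` when the file is compiled by `nvcc`, and to leave
it unannotated otherwise. The sketch below shows one such arrangement. The
macro `GPUTUTORIAL_HOST_DEVICE`, the function `calibratedPt` and the
eta-dependent scale factor are all invented for illustration and do not
represent a real calibration.

```cpp
#include <cassert>

// nvcc defines __CUDACC__ when compiling CUDA sources. Plain C++ compilers
// do not know the CUDA qualifiers, so the macro expands to nothing there.
#if defined(__CUDACC__)
#define GPUTUTORIAL_HOST_DEVICE __host__ __device__
#else
#define GPUTUTORIAL_HOST_DEVICE
#endif

// Hypothetical "calibration": scale pt by a factor that grows with |eta|.
// Defined inline in a header so that both the host compiler and nvcc see
// the full definition.
GPUTUTORIAL_HOST_DEVICE
inline float calibratedPt(float pt, float eta) {
  // Avoid library calls here, to keep the helper trivially portable.
  const float absEta = (eta < 0.0f) ? -eta : eta;
  return pt * (1.0f + 0.01f * absEta);
}
```

A kernel could then call `calibratedPt` once per electron, while host-side unit
tests exercise exactly the same function without needing a GPU.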