# TODO

A list of some TODOs for the paper...

## Python

### Accelerated Basic Algorithm

1. [x] Properly benchmark the current version (`e8b51ef01e034f600a0e5b91174abe65a0ce5127`)
    - `numba1` results
2. [x] Adapt to recycle slots in the numpy jet arrays
    1. Add a `jet_index` array that maps each numpy slot to its entry in the Python `PseudoJet` structures, so that the indexes no longer have to be kept in sync
    2. Reduce the size of the numpy arrays, as they will no longer need spare slots at the end for merged jets
        - could even experiment with repacking to ensure the numpy arrays are always densely packed
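A minimal sketch of the slot-recycling idea in items 2.1 and 2.2. All names (`JetArrays`, `free_slots`, `insert`, `merge`) are hypothetical, and only three kinematic arrays are kept; the real array layout may differ. Because a pairwise merge frees two slots and consumes one, arrays sized to the number of input particles never run out of room.

```python
import numpy as np


class JetArrays:
    """Hypothetical sketch: fixed-size numpy arrays whose slots are recycled.

    jet_index[slot] holds the index of the corresponding PseudoJet on the
    Python side (or -1 for a free slot), so numpy slot numbers and PseudoJet
    indexes no longer need to be kept in sync.
    """

    def __init__(self, size):
        self.pt2 = np.zeros(size)
        self.rap = np.zeros(size)
        self.phi = np.zeros(size)
        self.jet_index = np.full(size, -1, dtype=np.int64)
        # Stack of free slots; reversed so that pop() hands out slot 0 first
        self.free_slots = list(range(size - 1, -1, -1))

    def insert(self, pt2, rap, phi, pseudojet_index):
        slot = self.free_slots.pop()
        self.pt2[slot] = pt2
        self.rap[slot] = rap
        self.phi[slot] = phi
        self.jet_index[slot] = pseudojet_index
        return slot

    def remove(self, slot):
        # Mark the slot as free; its stale kinematics are simply ignored
        self.jet_index[slot] = -1
        self.free_slots.append(slot)

    def merge(self, slot_a, slot_b, pt2, rap, phi, new_pseudojet_index):
        # Free both parents first so the merged jet can reuse one of their slots
        self.remove(slot_a)
        self.remove(slot_b)
        return self.insert(pt2, rap, phi, new_pseudojet_index)

    def active_slots(self):
        return np.flatnonzero(self.jet_index >= 0)
```

Repacking (item 2.2's sub-bullet) would amount to compacting `active_slots()` to the front of each array and rebuilding `free_slots`.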

This turned out to have far less impact than I was expecting: only a 10-15% reduction in runtime
- `numba2` results

Profiling picked up that the `deepcopy` of the initial particles was having a very large impact (around 30% of the runtime!). I adapted the code to avoid doing this in the timed loop
- `numba3` results
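One way to keep the copy out of the measurement, sketched under the assumption that the clustering function (here a hypothetical `cluster` callable) mutates its input, so each sample still needs a fresh copy, made before the timer starts:

```python
import copy
import time


def time_clustering(cluster, initial_particles, nsamples):
    """Time `cluster` over `nsamples` runs, copying the input outside the timed region.

    `cluster` is a hypothetical stand-in for the clustering entry point.
    Returns a list of wall-clock durations in seconds.
    """
    timings = []
    for _ in range(nsamples):
        particles = copy.deepcopy(initial_particles)  # setup: not timed
        start = time.perf_counter()
        cluster(particles)
        timings.append(time.perf_counter() - start)
    return timings
```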

Realised that the `BasicJetInfo` class was not needed, as all of this state is now tracked via the numpy arrays. Removed it and got another speed-up (~15%)
- `numba4` results

Aside: at this point I also commented out the `@njit` decorators, just to understand what difference that would make. It looks like `numba` JIT compilation gains around 40-50%.
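Rather than commenting the decorators out by hand, a small wrapper can toggle them. This is a sketch with hypothetical names (`USE_JIT`, `maybe_njit`, `sum_of_squares`); it also falls back to plain Python if numba is not installed. Numba additionally documents a `NUMBA_DISABLE_JIT` environment variable that disables compilation globally without any code changes.

```python
try:
    from numba import njit
except ImportError:  # numba not installed: run everything as plain Python
    njit = None

USE_JIT = True  # hypothetical switch: set False to measure the pure-Python cost


def maybe_njit(func):
    """Apply numba's @njit when available and enabled, otherwise return func unchanged."""
    if USE_JIT and njit is not None:
        return njit(func)
    return func


@maybe_njit
def sum_of_squares(n):
    # Toy kernel standing in for a jitted distance or update loop
    total = 0
    for i in range(n):
        total += i * i
    return total
```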

Commenting out the `add_step_to_history` code gives another speed-up of about 15%, so the remaining Python code is now a significant drag on overall performance! But this code can't simply be dropped, as it actually stores the results, so it needs to be reimplemented in a faster way.
- Done (branch `history-numba`)
- In fact, the speed-up is very small (~2%)!
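The contents of the `history-numba` branch are not recorded here. As a hypothetical illustration only, history can be kept jit-friendly by writing into preallocated numpy arrays (which can be passed straight into `@njit` functions) instead of appending Python objects. Since each clustering step removes exactly one active jet, N input particles need at most N steps.

```python
import numpy as np


class ClusterHistory:
    """Hypothetical sketch: clustering history stored in preallocated numpy arrays.

    Each step records its two parent indexes (parent2 = -1 for a merge with
    the beam) and the distance at which the step happened.
    """

    def __init__(self, n_particles):
        # One step per removed active jet, so n_particles steps is enough
        self.parent1 = np.full(n_particles, -1, dtype=np.int64)
        self.parent2 = np.full(n_particles, -1, dtype=np.int64)
        self.dij = np.zeros(n_particles)
        self.nsteps = 0

    def add_step(self, parent1, parent2, dij):
        # Raises IndexError if more steps are recorded than were preallocated
        i = self.nsteps
        self.parent1[i] = parent1
        self.parent2[i] = parent2
        self.dij[i] = dij
        self.nsteps += 1
```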

Discovered that removing all setter/getter code from `PseudoJet` is a bit faster, so did that too
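An illustration of the kind of change involved, using hypothetical classes (the original `PseudoJet` accessors are not shown here). Each getter costs an extra Python method call, while a plain attribute read does not. `__slots__` is an optional extra that removes the per-instance `__dict__`; the notes don't say it was used.

```python
import timeit


class AccessorJet:
    """Illustrative getter style (hypothetical, not the original code)."""

    def __init__(self, px, py):
        self._px = px
        self._py = py

    def get_px(self):
        return self._px

    def get_py(self):
        return self._py


class PlainJet:
    """Plain attributes, read directly."""

    __slots__ = ("px", "py")

    def __init__(self, px, py):
        self.px = px
        self.py = py


def pt2_accessor(jet):
    return jet.get_px() ** 2 + jet.get_py() ** 2


def pt2_plain(jet):
    return jet.px ** 2 + jet.py ** 2


if __name__ == "__main__":
    a, p = AccessorJet(3.0, 4.0), PlainJet(3.0, 4.0)
    print("accessors:", timeit.timeit(lambda: pt2_accessor(a), number=200_000))
    print("plain:    ", timeit.timeit(lambda: pt2_plain(p), number=200_000))
```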


### Accelerated Tiled Algorithm

Need to think about how to do this, but it probably needs a numpy array per tile. This will require quite a lot more book-keeping, both to store the correct tiled structures and to keep track of the global state.
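A possible starting point for the per-tile layout, loosely following FastJet-style tiling in (rapidity, phi) with tiles at least R wide. With that width, any neighbour with ΔR < R lies in the same or an adjacent tile, and pairs with ΔR > R never need considering, because such a d_ij exceeds the smaller of the two beam distances. The rapidity range, the clamping of particles outside it into the edge tiles, and all names here are assumptions:

```python
import math

import numpy as np


def tile_indices(rap, phi, R, rap_min=-2.5, rap_max=2.5):
    """Map each particle to a flat tile index; returns (tile_of, n_tiles).

    Tiles are at least R wide in rapidity (when the range allows it) and in
    phi; phi always gets at least 3 tiles so that adjacent tiles are distinct.
    """
    n_rap = max(1, int(math.floor((rap_max - rap_min) / R)))
    n_phi = max(3, int(math.floor(2 * math.pi / R)))
    rap_width = (rap_max - rap_min) / n_rap
    phi_width = 2 * math.pi / n_phi
    # Particles outside [rap_min, rap_max] are clamped into the edge tiles
    irap = np.clip(np.floor((np.asarray(rap) - rap_min) / rap_width).astype(np.int64), 0, n_rap - 1)
    # phi wraps around, so reduce it into [0, 2pi) first
    iphi = np.floor(np.mod(np.asarray(phi), 2 * math.pi) / phi_width).astype(np.int64) % n_phi
    return irap * n_phi + iphi, n_rap * n_phi


def build_tiles(rap, phi, R, rap_min=-2.5, rap_max=2.5):
    """One array of global particle indexes per tile, plus the global -> tile map."""
    tile_of, n_tiles = tile_indices(rap, phi, R, rap_min, rap_max)
    tiles = [np.flatnonzero(tile_of == t) for t in range(n_tiles)]
    return tiles, tile_of
```

The global book-keeping would then mean keeping `tile_of` and the per-tile arrays updated as jets merge and move between tiles.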

## Julia

### Philippe's Implementation

Track down the bug that is causing differences with FastJet for a few of the sample events
- event 32
- event 53

Interestingly, seeing minor differences in the numerical values of the antikt distance metric, e.g., for event 32:

```
py: Iteration 13: 0.0015722253198696117 for jet 122 and jet 181
jl: Iteration 13: 0.0015722253198696043 for jet 122 and jet 181
```
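A quick check of how large this discrepancy is in floating-point terms (`math.ulp` needs Python 3.9 or later). The relative difference is of order 1e-15, which plausibly comes from a different order of floating-point operations rather than a different formula, so on its own it does not look like a bug:

```python
import math

py_value = 0.0015722253198696117
jl_value = 0.0015722253198696043

abs_diff = abs(py_value - jl_value)
rel_diff = abs_diff / abs(py_value)
ulps = abs_diff / math.ulp(py_value)  # spacing between adjacent doubles near py_value
print(f"relative difference: {rel_diff:.2e}, about {ulps:.0f} ULPs")
```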

Then there is a major difference here:

```
py: Iteration 88: 0.0131507280901848 for jet 322 and jet 323
jl: Iteration 88: 0.012617123337897836 for jet 683 and jet -1
```

This makes me suspect there is a bug in the Julia metric calculation!
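A plain reference implementation of the anti-kt distances can be used to cross-check individual values from the logs. It uses the standard definitions d_ij = min(pt_i^-2, pt_j^-2) ΔR_ij^2 / R^2 with ΔR_ij^2 = Δy^2 + Δφ^2, and d_iB = pt_i^-2. That `jet -1` in the Julia log denotes the beam (so that value would be a d_iB) is an assumption about the logging code. The phi wrap-around is one easy place for a bug to hide, so it is handled explicitly:

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into [0, pi]."""
    dphi = abs(phi1 - phi2) % (2 * math.pi)
    return 2 * math.pi - dphi if dphi > math.pi else dphi


def antikt_dij(pt_i, rap_i, phi_i, pt_j, rap_j, phi_j, R):
    """Anti-kt pairwise distance: min(pt_i^-2, pt_j^-2) * dR^2 / R^2."""
    dr2 = (rap_i - rap_j) ** 2 + delta_phi(phi_i, phi_j) ** 2
    return min(pt_i ** -2, pt_j ** -2) * dr2 / R ** 2


def antikt_diB(pt_i):
    """Anti-kt beam distance: pt_i^-2."""
    return pt_i ** -2
```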

### Atell's Implementation

Get this working with our HepMC input file, so that we have performance numbers for a basic Julia version

Use the branch `graeme-chep`, running a new `main()` from `chep.jl`

- Reads events from HepMC3
    - Using Philippe's hepmc3jl wrapper package
        - N.B. to get this to work you have to start the Julia package manager and do `(JetReconstruction) pkg> develop ./hepmc3jl`
- Events are read into `PseudoJet` objects, then converted to the `Vector{Vector{Float64}}` used by Atell's code
- Final results are recorded as FinalJet objects (rap, phi, pt)
- Added timing code that can wrap multiple runs and time the code properly
- Added dump option to dump final jets as JSON
    - This was used to check the results: Atell's code agrees with FastJet and all Python implementations, confirming that there is a small bug somewhere in Philippe's code
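A sketch of how two such JSON dumps could be compared. The file layout (a list of events, each a list of jets with `rap`, `phi` and `pt` keys) and the tolerances are assumptions, not the format actually written by `chep.jl`:

```python
import json
import math


def jets_match(jets_a, jets_b, rel_tol=1e-6, abs_tol=1e-9, phi_tol=1e-6):
    """True if two jet lists agree within tolerance after sorting by descending pt."""
    if len(jets_a) != len(jets_b):
        return False
    for a, b in zip(sorted(jets_a, key=lambda j: -j["pt"]), sorted(jets_b, key=lambda j: -j["pt"])):
        for field in ("rap", "pt"):
            if not math.isclose(a[field], b[field], rel_tol=rel_tol, abs_tol=abs_tol):
                return False
        # phi is periodic, so compare the wrapped difference
        dphi = abs(a["phi"] - b["phi"]) % (2 * math.pi)
        if min(dphi, 2 * math.pi - dphi) > phi_tol:
            return False
    return True


def mismatched_events(path_a, path_b):
    """Indexes of events whose final jets differ between two JSON dumps."""
    with open(path_a) as fa, open(path_b) as fb:
        events_a, events_b = json.load(fa), json.load(fb)
    if len(events_a) != len(events_b):
        raise ValueError(f"event counts differ: {len(events_a)} vs {len(events_b)}")
    return [i for i, (ea, eb) in enumerate(zip(events_a, events_b)) if not jets_match(ea, eb)]
```

Sorting by pt assumes no two jets in an event have near-identical pt; if they do, matching by nearest (rap, phi) would be more robust.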

Results:

`julia --project=. ./chep.jl --maxevents=100 --nsamples=100 --gcoff ./test/data/events.hepmc3`

- Windows: Time per event 840.6 ± 101.1 μs
- WSL Ubuntu: Time per event 949.3 ± 166.9 μs

N.B. This is the only code that actually runs faster on Windows; the jitter on WSL is higher, which still makes the GC a suspect?
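For comparison on the Python side, a sketch of a per-event timing harness that reports mean ± standard deviation in μs. Disabling the garbage collector is presumably similar in spirit to `--gcoff`, although Python's `gc.disable()` only stops the cyclic collector (reference counting still frees objects), so it is not equivalent to turning off Julia's GC. `run_all_events` is a hypothetical callable:

```python
import gc
import statistics
import time


def time_per_event(run_all_events, n_events, nsamples, gc_off=True):
    """Return (mean, stdev) of the time per event in microseconds.

    Runs `run_all_events` (hypothetical: processes all n_events once)
    nsamples times; nsamples must be at least 2 for the standard deviation.
    """
    per_event_us = []
    gc_was_enabled = gc.isenabled()
    if gc_off:
        gc.disable()
    try:
        for _ in range(nsamples):
            start = time.perf_counter()
            run_all_events()
            per_event_us.append((time.perf_counter() - start) / n_events * 1e6)
    finally:
        # Restore the collector to its previous state even if a run fails
        if gc_was_enabled:
            gc.enable()
    return statistics.mean(per_event_us), statistics.stdev(per_event_us)
```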
