# Parameterizing the LHCb RICH system using pidgan's `GAN` algorithm

**Author:** [mbarbetti](https://github.com/mbarbetti)

**Date created:** 12/10/2023

**Last modified:** 12/10/2023

**Description:** This tutorial demonstrates how to parameterize the high-level response of the LHCb RICH system using a Generative Adversarial Network (GAN) [[1](https://arxiv.org/abs/1406.2661)]. The code is written using the [pidgan](https://github.com/mbarbetti/pidgan) package that relies on TensorFlow and Keras as backends.

## Introduction

### What is LHCb?

The [**LHCb experiment**](https://lhcb-outreach.web.cern.ch) was originally designed to study rare decays of particles containing $b$ and $c$ quarks produced at the Large Hadron Collider (LHC). The LHCb detector, shown in the background of the following photo, is a single-arm forward spectrometer covering the pseudorapidity range $2 < \eta < 5$. The detector includes:

- **Tracking system** - used for high-precision measurements of the momentum of charged particles and the position of the primary vertices
- **Particle Identification (PID) system** - used to distinguish different species of traversing particles (e.g., muons, pions, kaons, protons)

The LHCb PID system includes two ring-imaging Cherenkov (RICH) detectors, whose response makes it possible to separate different types of charged hadrons (e.g., pions, kaons, protons) using the [Cherenkov radiation](https://en.wikipedia.org/wiki/Cherenkov_radiation) emitted by the traversing particles.

<div align="center">
  <img src="https://raw.githubusercontent.com/mbarbetti/pidgan-notebooks/main/.github/images/lhcb.jpeg" width="800"/>
</div>

### What are GANs?

Generative Adversarial Networks [[1](https://arxiv.org/abs/1406.2661)] are a powerful class of _generative models_ based on the simultaneous training of two neural networks:

* **Discriminator network** ($D$) - trained on a classification task to separate the generator output from the reference dataset
* **Generator network** ($G$) - trained on a simulation task to reproduce the reference dataset, trying to fool the discriminator

The goal is for $D$ to discriminate optimally between the two samples according to their origin, while $G$ is simultaneously trained to maximize the _probability_ of $D$ making a mistake. This framework corresponds to a **minimax two-player game** [[1](https://arxiv.org/abs/1406.2661)].

<div align="center">
  <img src="https://raw.githubusercontent.com/mbarbetti/pidgan-notebooks/main/.github/images/gan-scheme.png" width="800"/>
</div>

#### Mathematical details

The generator $G(z)$, fed with elements $z$ sampled from a known distribution $p_z$ (typically Gaussian), maps the **latent space** $\mathcal{Z}$ to the space $\mathcal{X}$ of the reference dataset, inducing a distribution $p_{\rm{gen}}$ that is trained to match the target distribution $p_{\rm{ref}}$. The discriminator $D(x)$ outputs a single scalar, which can be read as the **probability** that $x$ comes from the reference dataset rather than from $G$. Hence, the optimization problem corresponds to training $D$ to maximize the probability of correct labelling, while simultaneously training $G$ to minimize $\log(1 - D(G(z)))$.

Defining the **loss function** $\mathcal{L}_{\rm{GAN}}$ as follows

<center>$\mathcal{L}_{\rm{GAN}} (\theta_d, \theta_g) = \mathbb{E}_{x \sim p_{\rm{ref}}} \left[ \log{D_{\theta_d}(x)} \right] + \mathbb{E}_{z \sim p_z} \left[ \log(1 - D_{\theta_d}(G_{\theta_g}(z))) \right]$</center>

the _minimax game_ can be written in this form:

<center>$\displaystyle{\min_G \, \max_D \, \mathcal{L}_{\rm{GAN}} (\theta_d, \theta_g)}$</center>

A unique solution exists, with $G$ recovering the reference distribution $p_{\rm{ref}}$ and $D$ equal to 1/2 everywhere [[1](https://arxiv.org/abs/1406.2661)].
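As a quick numerical check of the statement above, the following minimal sketch estimates $\mathcal{L}_{\rm{GAN}}$ from lists of discriminator outputs. The function `gan_loss` and the toy output values are hypothetical and chosen only for illustration; they are not part of pidgan. When $D$ returns 1/2 for every sample, the loss evaluates to $-\log 4$.

```python
import math
from statistics import fmean


def gan_loss(d_ref, d_gen):
    """Monte Carlo estimate of L_GAN from discriminator outputs.

    d_ref: values of D(x) for samples x drawn from the reference dataset
    d_gen: values of D(G(z)) for samples produced by the generator
    """
    ref_term = fmean(math.log(d) for d in d_ref)
    gen_term = fmean(math.log(1.0 - d) for d in d_gen)
    return ref_term + gen_term


# At the equilibrium, D outputs 1/2 for every sample, whatever its origin
loss_at_equilibrium = gan_loss([0.5] * 4, [0.5] * 4)

# A discriminator that separates the two samples well pushes the loss towards 0,
# which is the direction D maximizes and G tries to counteract
loss_good_discriminator = gan_loss([0.99] * 4, [0.01] * 4)

print(loss_at_equilibrium, -math.log(4))
print(loss_good_discriminator)
```

The two expectations in the loss are replaced here by plain averages over the toy samples, which is how they would typically be estimated on a mini-batch.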

Traditional GAN systems suffer from many issues, particularly during the training phase:

* the generator _may collapse_ producing only a single sample or a small family of very similar samples (**mode collapse**)
* the two players _may oscillate_ during training rather than converging to the **Nash equilibrium**
* if _imbalance_ between the two players occurs, then the system is incapable of learning at all

All these drawbacks are related to the **vanishing gradient** problem, namely the lack of information for updating the parameters of $G$. This happens when $D$ saturates: it becomes so good at distinguishing the origin of the two samples that no useful error signal remains for $G$ to improve the generated space. To mitigate the problem, one can add _continuous noise_ to the inputs of $D$, namely to both the reference and the generated samples. This trick keeps the gradient non-zero and allows the training to proceed [[8](https://arxiv.org/abs/1701.04862)].
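The saturation effect can be illustrated with a few lines of Python. The sketch below assumes that the discriminator output is a sigmoid applied to a logit $a$, so that $D(G(z)) = \sigma(a)$; this parameterization is an assumption made for the example, not something prescribed above. Under this assumption, the derivative of the generator objective $\log(1 - \sigma(a))$ with respect to $a$ is $-\sigma(a)$, which goes to zero as $D$ becomes confident that the sample is generated.

```python
import math


def sigmoid(a):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-a))


def generator_loss_grad(logit):
    """Derivative of log(1 - sigmoid(logit)) with respect to the logit."""
    return -sigmoid(logit)


# The more negative the logit, the more confident D is that the sample is fake
for logit in (0.0, -2.0, -5.0, -10.0):
    d_out = sigmoid(logit)
    grad = generator_loss_grad(logit)
    print(f"D(G(z)) = {d_out:.5f}  ->  gradient = {grad:.5f}")
```

The magnitude of the gradient shrinks together with $D(G(z))$: a discriminator that rejects every generated sample with high confidence leaves the generator with almost no signal to learn from.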

#### Using input conditions

Feeding the generator with additional information besides the latent space sample makes it possible to **condition** its output. In particular, it is sufficient to concatenate the conditional features $x$ to the random noise $z$ passed as input to $G$ for the generator to take this information into account: $y_{\rm{gen}}(x) = G(x, z)$. To preserve the capability to learn through the minimax game, the discriminator must also be fed with the additional features, simply concatenating them to the elements of either the reference sample or the generated space: $D(x, y)$ with $y \in \{y_{\rm{ref}}, y_{\rm{gen}}\}$.
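The concatenation described above can be sketched with NumPy arrays. Everything in this example is hypothetical: the array sizes, the random stand-in for the reference sample, and the `toy_generator`, which is a single random linear map used only to show the shapes involved. It is not how pidgan builds its networks.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Illustrative sizes, not taken from the RICH dataset
batch_size, n_conditions, latent_dim, n_outputs = 8, 3, 16, 4

x = rng.normal(size=(batch_size, n_conditions))   # conditional features
z = rng.normal(size=(batch_size, latent_dim))     # latent space sample
y_ref = rng.normal(size=(batch_size, n_outputs))  # stand-in for the reference sample

# Weights of a toy generator made of one random linear layer
w_gen = rng.normal(size=(n_conditions + latent_dim, n_outputs))


def toy_generator(x, z):
    """Compute G(x, z): concatenate conditions and noise, then map to outputs."""
    g_input = np.concatenate([x, z], axis=-1)
    return g_input @ w_gen


def discriminator_input(x, y):
    """Build the input of D(x, y) by concatenating conditions and outputs."""
    return np.concatenate([x, y], axis=-1)


y_gen = toy_generator(x, z)
d_in_ref = discriminator_input(x, y_ref)
d_in_gen = discriminator_input(x, y_gen)

print(y_gen.shape, d_in_ref.shape, d_in_gen.shape)
```

Both discriminator inputs share the same conditional features $x$, so $D$ is asked to compare $y_{\rm{ref}}$ and $y_{\rm{gen}}$ under identical conditions.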

## Tutorial

### Installation and initial setup

The first step is to install the [pidgan](https://github.com/mbarbetti/pidgan) package together with other packages typically needed for machine learning applications in High Energy Physics (namely uproot, scipy, scikit-learn, matplotlib).

In [None]:
%%capture
!pip install pidgan[lamarr]

Now, let's verify that pidgan is correctly installed by printing its version:

In [None]:
import pidgan

pidgan.__version__

In case you're running this Notebook on a machine equipped with a GPU, let's also verify the correct installation of TensorFlow:

In [None]:
import tensorflow as tf

tf.config.list_physical_devices("GPU")  # returns a non-empty list if a GPU is available

### Data loading

### Data preprocessing

### Model definition

### Training procedure

### Validation plots

## References

1. I.J. Goodfellow _et al._, "Generative Adversarial Networks", [arXiv:1406.2661](https://arxiv.org/abs/1406.2661)
2. A. Radford, L. Metz, S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", [arXiv:1511.06434](https://arxiv.org/abs/1511.06434)
3. X. Mao _et al._, "Least Squares Generative Adversarial Networks", [arXiv:1611.04076](https://arxiv.org/abs/1611.04076)
4. M. Arjovsky, S. Chintala, L. Bottou, "Wasserstein GAN", [arXiv:1701.07875](https://arxiv.org/abs/1701.07875)
5. I. Gulrajani _et al._, "Improved Training of Wasserstein GANs", [arXiv:1704.00028](https://arxiv.org/abs/1704.00028)
6. M.G. Bellemare _et al._, "The Cramer Distance as a Solution to Biased Wasserstein Gradients", [arXiv:1705.10743](https://arxiv.org/abs/1705.10743)
7. D. Terjék, "Adversarial Lipschitz Regularization", [arXiv:1907.05681](https://arxiv.org/abs/1907.05681)
8. M. Arjovsky, L. Bottou, "Towards Principled Methods for Training Generative Adversarial Networks", [arXiv:1701.04862](https://arxiv.org/abs/1701.04862)
9. T. Salimans _et al._, "Improved Techniques for Training GANs", [arXiv:1606.03498](https://arxiv.org/abs/1606.03498)
10. M. Mirza, S. Osindero, "Conditional Generative Adversarial Nets", [arXiv:1411.1784](https://arxiv.org/abs/1411.1784)
11. A. Rogachev, F. Ratnikov, "GAN with an Auxiliary Regressor for the Fast Simulation of the Electromagnetic Calorimeter Response", [arXiv:2207.06329](https://arxiv.org/abs/2207.06329)

## Credits
Most of the GAN algorithms are an evolution of what is provided by the [mbarbetti/tf-gen-models](https://github.com/mbarbetti/tf-gen-models) repository. The `BceGAN` model is loosely inspired by the TensorFlow tutorial [Deep Convolutional Generative Adversarial Network](https://www.tensorflow.org/tutorials/generative/dcgan) and the Keras tutorial [Conditional GAN](https://keras.io/examples/generative/conditional_gan). The `WGAN_ALP` model is an adaptation of what is provided by the [dterjek/adversarial_lipschitz_regularization](https://github.com/dterjek/adversarial_lipschitz_regularization) repository.