# Studio 14: Finding Cosmic Ray Signals with RNNs

Adapted from https://deeplearningforphysicsresearchbook.github.io/deep-learning-physics/ by Javier Duarte (2023)

Further modifications by Julieta Gruszko (2025)

Large arrays of radio antennas can be used to measure cosmic rays by recording the electromagnetic radiation generated in the atmosphere.
These radio signals are strongly contaminated by galactic noise as well as signals of human origin. Since cosmic-ray signals look similar to this background, discovering cosmic-ray events can be challenging.

The data file for this studio is a bit too large to host on GitHub. It's available on Canvas instead, so go ahead and download it from there.

## Identification of signals
In this exercise, we design an RNN to classify whether the recorded radio signals contain a cosmic-ray event or only noise.

The signal-to-noise ratio (SNR) of a measured trace $S(t)$ is defined as follows:

$\mathrm{SNR}=\frac{|S(t)_{max}|}{\mathrm{RMS}[S(t)]}$

where $|S(t)_{max}|$ denotes the maximum amplitude of the (true) signal.

Typical cosmic-ray observatories enable a precise reconstruction at an SNR of roughly 3.

We choose a challenging setup in this task and try to identify cosmic-ray events in signal traces with an SNR of 2.   
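
As a rough illustration of the definition above, the sketch below computes the SNR of a synthetic trace. It assumes the peak is taken over the true signal and the RMS over the full measured trace; the function name `snr`, the toy pulse, and the Gaussian noise are illustrative stand-ins and are not taken from the dataset.

```python
import numpy as np

def snr(signal, trace):
    """Peak amplitude of the true signal divided by the RMS of the measured trace."""
    peak = np.max(np.abs(signal))
    rms = np.sqrt(np.mean(np.abs(trace) ** 2))
    return peak / rms

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=500)     # unit-variance Gaussian noise (toy stand-in)
signal = np.zeros(500)
signal[240:260] = 2.0 * np.hanning(20)     # short pulse with a peak amplitude close to 2
trace = signal + noise

print(f"SNR of toy trace: {snr(signal, trace):.2f}")
```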



In [None]:
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.utils import shuffle
from sklearn.preprocessing import StandardScaler

import matplotlib
import matplotlib.pyplot as plt

import torch
import torch.nn as nn



In [None]:
import os
os.environ["KERAS_BACKEND"] = "torch"

Training RNNs can be computationally demanding, so we recommend using a GPU for this task if possible.
### Depending on your computer, you should run just one of the two options below to switch to GPU, if available.

### For Macs:
If you have an Apple Silicon chip in your laptop (e.g. M1 or above), you can use the integrated GPU by running the code below. 

In [None]:
!xcode-select --install

In [None]:
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using Apple GPU with Metal backend.")
else:
    device = torch.device("cpu")
    print("Using CPU.")


### For Others:
If you have a non-Mac computer, you can check for an available GPU by running this code:

In [None]:
if torch.cuda.is_available():
    print("GPU is available for PyTorch.")
    print(f"Number of GPUs available: {torch.cuda.device_count()}")
    print(f"Current GPU device name: {torch.cuda.get_device_name(0)}") # Get name of the first GPU
else:
    print("GPU is NOT available for PyTorch. Using CPU.")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


In [None]:
import keras

from keras import Input

from keras.models import Sequential #the model is built adding layers one after the other

from keras.layers import Dense #fully connected layers: every output talks to every input

from keras.layers import Activation

from keras.layers import SimpleRNN, LSTM, Bidirectional, Flatten, Dropout

### Load and prepare dataset
In this task, we use a simulation of cosmic-ray-induced air showers that are measured by radio antennas.  
For more information, see https://arxiv.org/abs/1901.04079.  
The task is to design an RNN that can identify whether the measured signal traces (shortened to 500 time steps) contain a signal or not.

In [None]:
f = np.load("../Data/cosmic-radio-showers_with_signals.npz")

The file contains measured traces, labels, and, for cosmic-ray events, the true signal. For non-cosmic-ray events, the signal object is set to 0. The traces and signals are each 500 samples long.

In [None]:
f.files

## Plot example signal traces
Left: signal trace containing a cosmic-ray event. The underlying cosmic-ray signal is shown in red; the measured trace (background plus signal) is shown in blue.
Right: background noise only; the underlying signal trace is 0.

In [None]:
from matplotlib import pyplot as plt
fs = 180e6  # Sampling frequency of antenna setup 180 MHz
t = np.arange(500) / fs * 1e6
idx_sig = 54 #pick the n'th signal trace in the file
idx_noise = 17 #pick the n'th noise trace in the file

plt.figure(1, (12, 4))
plt.subplot(1, 2, 1)
plt.plot(t, np.real(f["traces"][np.where(f["labels"] == 1)][idx_sig]), linewidth = 1, color="b", label="Measured trace")
plt.plot(t, np.real(f["signals"][np.where(f["labels"] == 1)][idx_sig]), linewidth = 1, color="r", label="CR signal")
plt.ylabel('Amplitude / mV')
plt.xlabel(r'Time / $\mu$s')  # raw string avoids the invalid "\m" escape sequence
plt.legend()
plt.title("Cosmic-ray event")
plt.subplot(1, 2, 2)

plt.plot(t, np.real(f["traces"][np.where(f["labels"] == 0)][idx_noise]), linewidth = 1, color="b", label="Measured trace")
plt.plot(t, np.real(f["signals"][np.where(f["labels"] == 0)][idx_noise]), linewidth = 1, color="r", label="CR signal")

plt.ylabel('Amplitude / mV')
plt.xlabel(r'Time / $\mu$s')  # raw string avoids the invalid "\m" escape sequence
plt.legend()
plt.title("Noise event")

plt.grid(True)
plt.tight_layout()

In [None]:
np.shape(f['traces'])

In [None]:
print(np.shape(f['labels'][f['labels']==1])) # cosmic ray events
print(np.shape(f['labels'][f['labels']==0])) # noise events

### Questions:
- How many instances are in our learning set?
- Is the data set balanced or unbalanced?
- In this notebook, we'll try to identify whether a signal is present in each trace. What type of RNN structure do we need to use? E.g. sequence-to-sequence, vector-to-sequence, or sequence-to-vector?
- What type of task are we performing (e.g. classification, regression, generation)? What type of activation function should our final layer use? What type of loss should we optimize?

As we did last week, we will use scikit-learn's $\texttt{train\_test\_split}$ to split the data into random subsets with the appropriate fractions. This time, instead of making a validation set ourselves, we'll use Keras's built-in validation split option, so for now we only split off the test data.

In [None]:
# split the samples into training and test data sets:
train_frac = 0.8
test_frac = 0.2

x_train, x_test, y_train, y_test = train_test_split(
    f["traces"], f['labels'], test_size=test_frac, random_state=42
)
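
Keras's `validation_split` option, which is passed to `fit` later on, holds out the last fraction of the arrays it is given rather than drawing a random subset. The NumPy sketch below mimics that slicing on small stand-in arrays; the names `x_demo` and `y_demo` and the sizes are illustrative only.

```python
import numpy as np

x_demo = np.arange(20).reshape(10, 2)   # 10 stand-in samples, 2 time steps each
y_demo = np.arange(10) % 2              # alternating stand-in labels

validation_split = 0.1
split_at = int(np.floor(len(x_demo) * (1.0 - validation_split)))

# Everything before split_at is used for fitting; the tail is held out for validation.
x_fit, x_val = x_demo[:split_at], x_demo[split_at:]
y_fit, y_val = y_demo[:split_at], y_demo[split_at:]

print(x_fit.shape, x_val.shape)
```

Because `train_test_split` shuffles by default, the tail of `x_train` is already a random subset of the data.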

### Scaling
Let's scale the data so that the training traces have a standard deviation of 1. We won't bother shifting the mean, since the data traces are already centered around 0. We could use scikit-learn's $\texttt{StandardScaler}$ object to do this, but it's also easy enough to do it by hand.

In [None]:
sigma = x_train.std()
x_train /= sigma
x_test /= sigma

In [None]:
x_train.std()
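
For comparison, the sketch below applies the same by-hand global scaling to synthetic data and contrasts it with `StandardScaler(with_mean=False)`, which computes one scale factor per column (here, per time step) rather than a single factor for the whole array. The array names and sizes are illustrative stand-ins for the real traces.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
train = rng.normal(0.0, 3.0, size=(200, 50))   # stand-in for training traces
test = rng.normal(0.0, 3.0, size=(40, 50))     # stand-in for test traces

# Global scaling by hand: one sigma, estimated on the training set only.
sigma = train.std()
train_global = train / sigma
test_global = test / sigma

# StandardScaler without mean shifting: one sigma per time step (column).
scaler = StandardScaler(with_mean=False).fit(train)
train_per_step = scaler.transform(train)

print(train_global.std())      # close to 1 by construction
print(scaler.scale_.shape)     # one scale factor per column
```

In both cases the scale is estimated from the training set and then reused for the test set, so no information from the test data leaks into the preprocessing.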

### Define RNN model
In the following, we design models to identify cosmic-ray events using RNN-based classifiers.

We'll start with a Simple RNN, then try a bi-directional RNN, then finally a one-direction LSTM. 

By default, the output of an RNN layer contains a single vector per sample. This vector is the RNN cell output corresponding to the last timestep, containing information about the entire input sequence. The shape of this output is $\texttt{(batch\_size, units)}$ where $\texttt{units}$ corresponds to the units argument passed to the layer's constructor.

An RNN layer can also return the entire sequence of outputs for each sample (one vector per timestep per sample), if you set $\texttt{return\_sequences=True}$. The shape of this output is $\texttt{(batch\_size, timesteps, units)}$.

$\texttt{return\_sequences=True}$ is used when you want to return a sequence (e.g. one-to-many and many-to-many RNNs). When you're stacking RNN layers, as we do below, each RNN layer expects a sequence as its input, so you'll want to keep it set to $\texttt{True}$ for every layer that feeds another RNN layer. In the final RNN layer of a many-to-one RNN, we have a choice: we can either set it to $\texttt{False}$, so that the last RNN layer returns a single vector per sample, or we can output the full sequence from the last RNN layer and send it to a fully connected layer with a single output.
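
To make the two output shapes concrete, here is a minimal NumPy sketch of a simple RNN forward pass, assuming the standard update $h_t = \tanh(x_t W + h_{t-1} U + b)$. The function `simple_rnn_forward` and its random weights are hypothetical illustrations, not Keras's implementation.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences=False):
    """x has shape (batch, timesteps, features). Returns the last state or all states."""
    batch, timesteps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))
    outputs = []
    for t in range(timesteps):              # time steps are processed in order
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)    # (batch, timesteps, units)
    return h                                # (batch, units)

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 500, 1))            # 4 traces, 500 time steps, 1 feature
W = rng.normal(size=(1, 8)) * 0.1           # input weights for 8 units
U = rng.normal(size=(8, 8)) * 0.1           # recurrent weights
b = np.zeros(8)

last = simple_rnn_forward(x, W, U, b)
full = simple_rnn_forward(x, W, U, b, return_sequences=True)
print(last.shape, full.shape)
```

The last slice of the full sequence, `full[:, -1, :]`, is the same vector that is returned when `return_sequences=False`.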


## RNN 1: Simple RNN

This model is probably a bit too shallow to perform really well, but we'll keep it simple to reduce the training time to something that can be done in class. The commented-out layer can be reintroduced if you want to try something deeper later on.

In [None]:
model_rnn = Sequential()
model_rnn.add(keras.Input(shape=(500, 1)))
model_rnn.add(SimpleRNN(32, return_sequences=True, recurrent_initializer="glorot_uniform"))
#model_rnn.add(SimpleRNN(64, return_sequences=True, recurrent_initializer="glorot_uniform"))
model_rnn.add(SimpleRNN(10, return_sequences=True, recurrent_initializer="glorot_uniform"))
model_rnn.add(Flatten())
model_rnn.add(Dropout(0.3))
model_rnn.add(Dense(1, activation="sigmoid"))

model_rnn.summary()

### Questions:
- How many hidden layers does this model have?
- How many free parameters?

### Compile the model

We'll use AdamW, our old standby optimizer. This time we'll set the $\texttt{weight\_decay}$ option to a smaller value than the default; this parameter controls how strong the regularization is. 
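
As a rough illustration of what $\texttt{weight\_decay}$ does, the sketch below isolates the decay part of an AdamW-style update, assuming the decoupled form in which every step shrinks each weight by a factor of (1 - learning rate × weight decay) on top of the gradient-based update. The gradient term is left out entirely, so this is not a full optimizer, and the step count is an arbitrary illustrative choice.

```python
def decay_only(w0, learning_rate, weight_decay, steps):
    """Apply only the decoupled weight-decay shrinkage for a number of steps."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * weight_decay * w
    return w

steps = 10_000  # illustrative number of optimizer steps
w_small = decay_only(1.0, 1e-3, 0.00008, steps)   # value used in this notebook
w_large = decay_only(1.0, 1e-3, 0.004, steps)     # a 50x larger decay, for contrast

print(f"weight_decay=0.00008: {w_small:.4f}")
print(f"weight_decay=0.004:   {w_large:.4f}")
```

With the small value, a weight that receives no gradient barely shrinks over this many steps; the larger value pulls it toward zero noticeably faster.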

In [None]:
model_rnn.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.AdamW(1e-3, weight_decay=0.00008),
    metrics=['accuracy'])

### Fitting the model on GPU

To actually run the training on GPU, we'll need to explicitly move the data over to the GPU. Once the data is on GPU, Keras automatically handles sending the model to GPU for training. To move the data over, we first need to switch it from a numpy array to a torch tensor. 

If you ran the right cells above, $\texttt{device}$ should be set correctly for your computer (CPU if you don't have a GPU, or the appropriate GPU options for Macs or non-Macs). 

In [None]:
x_train = torch.from_numpy(x_train)
x_train = x_train.to(device)

y_train = torch.from_numpy(y_train)
y_train = y_train.to(device)

x_test = torch.from_numpy(x_test)
x_test = x_test.to(device)

y_test = torch.from_numpy(y_test)
y_test = y_test.to(device)
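
One thing to watch for: NumPy arrays are often `float64`, and PyTorch's MPS backend does not support 64-bit floats, so the move to an Apple GPU can fail with a dtype error. The hypothetical helper below, which is not part of the original notebook, casts `float64` arrays to `float32` before moving them. Whether you need it depends on the dtype stored in the data file, which is not stated here.

```python
import numpy as np
import torch

def to_device_tensor(array, device):
    """Convert a NumPy array to a torch tensor on `device`, using float32 instead of float64."""
    tensor = torch.from_numpy(np.ascontiguousarray(array))
    if tensor.dtype == torch.float64:
        tensor = tensor.to(torch.float32)   # MPS has no float64 support
    return tensor.to(device)

demo = np.random.default_rng(3).normal(size=(8, 500))   # float64 stand-in for traces
demo_t = to_device_tensor(demo, torch.device("cpu"))    # CPU here so the sketch runs anywhere
print(demo_t.dtype, demo_t.shape)
```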

For the fit, we're going to use a couple of new options that should help the optimization process avoid some of the pitfalls we saw last week:
- Reduce the learning rate when the loss stops improving, so your optimizer takes smaller steps when it's getting close to the minimum.
- Apply early stopping, so training stops if the loss doesn't improve after some number of epochs.
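
The two ideas above can be sketched as a replay of a made-up validation-loss history. This is a simplified illustration, not Keras's exact implementation (for example, it ignores cooldown periods and weight restoring); the function name, the toy losses, and the short patience values are all assumptions chosen to keep the example small.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.5, lr_patience=2,
                       lr_min_delta=0.0005, min_lr=1e-5,
                       stop_patience=4, stop_min_delta=0.0001):
    """Replay a loss history; return the learning-rate reductions and the stop epoch."""
    best_lr, wait_lr = float("inf"), 0
    best_stop, wait_stop = float("inf"), 0
    reductions = []
    for epoch, loss in enumerate(val_losses):
        # Learning-rate schedule: count epochs without a large enough improvement.
        if loss < best_lr - lr_min_delta:
            best_lr, wait_lr = loss, 0
        else:
            wait_lr += 1
            if wait_lr >= lr_patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                reductions.append((epoch, lr))
                wait_lr = 0
        # Early stopping: same idea, with its own patience and threshold.
        if loss < best_stop - stop_min_delta:
            best_stop, wait_stop = loss, 0
        else:
            wait_stop += 1
            if wait_stop >= stop_patience:
                return reductions, epoch
    return reductions, None

history = [0.70, 0.60, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55]   # made-up losses
reductions, stopped_at = simulate_callbacks(history)
print(reductions, stopped_at)   # epochs are counted from 0
```

Trying different histories and patience values is a quick way to build intuition for how the two thresholds interact.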

Even using a GPU, training RNNs is very slow: each trace has to be processed one time step at a time, so the work can't be parallelized across the sequence. On my laptop (using an M2 GPU) this model took about 26 seconds per epoch to train; about 13 minutes in total.

30 epochs is probably a bit short, but it should be close enough while keeping to our time constraints.

In [None]:
results_rnn = model_rnn.fit(x_train[...,np.newaxis], y_train,
                    batch_size=1024,
                    epochs=30,
                    verbose=1,
                    validation_split=0.1,
                    callbacks = [keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5, min_delta=.0005, verbose=1, min_lr=1e-5),
                                 keras.callbacks.EarlyStopping(patience=10, min_delta=.0001, verbose=1)]
                    )

### Questions:
- Describe the conditions that will lead the learning rate to be reduced, given our settings above.
- Did the automated learning rate reduction we applied kick in during training? If yes, how many times was the learning rate reduced?
- Describe the conditions that will lead to early stopping, given our settings above.
- Did early stopping kick in during training?

Now that the model is trained, let's see the results on the test data. This will also take a little time, since evaluation in RNNs also happens sequentially. On my laptop, it took 2 minutes.

In [None]:
model_rnn.evaluate(x_test[...,np.newaxis], y_test)

### Plot loss and accuracy

In [None]:
plt.figure(1, (12, 4))
plt.subplot(1, 2, 1)
plt.plot(results_rnn.history['loss'])
plt.plot(results_rnn.history['val_loss'])
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')

plt.subplot(1, 2, 2)
plt.plot(results_rnn.history['accuracy'])
plt.plot(results_rnn.history['val_accuracy'])
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.tight_layout()

### Questions: 
- Did the model converge?
- Does the model have high bias, high variance, both, or neither?
- How well does our model perform on the test data?
- What would you suggest we try to improve the model performance?

# These two networks take longer to train. I suggest each group member train one of them in studio, and then you can compare your results. 
If you have extra time, you can go back and train the one you skipped.


## RNN 2: Bidirectional RNN

In [None]:
model_birnn = Sequential()
model_birnn.add(keras.Input(shape=(500, 1)))
model_birnn.add(Bidirectional(SimpleRNN(32, return_sequences=True, recurrent_initializer="glorot_uniform")))
#model_birnn.add(Bidirectional(SimpleRNN(64, return_sequences=True, recurrent_initializer="glorot_uniform")))
model_birnn.add(Bidirectional(SimpleRNN(10, return_sequences=True, recurrent_initializer="glorot_uniform")))
model_birnn.add(Flatten())
model_birnn.add(Dropout(0.3))
model_birnn.add(Dense(1, activation="sigmoid"))

model_birnn.summary()

### Questions:
- How many hidden layers does this model have?
- How many free parameters?
- Why does this model have more parameters than our first RNN model?

### Compile and fit the model

With more parameters, this model takes longer to train! We're training 4 RNNs here, not 2: for each RNN layer, we train one that moves forward through the sequence and one that moves backward through the sequence.
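
As a cross-check on `model.summary()`, the parameter counts of recurrent layers can be estimated by hand. The sketch below assumes the standard counts: `units * (input_dim + units + 1)` for a simple RNN layer, four times that for an LSTM layer (three gates plus the candidate cell state, each with its own weights), and twice the wrapped layer's count for a bidirectional wrapper. The helper names are illustrative.

```python
def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + biases."""
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    """Four weight sets, each the size of a simple RNN layer's."""
    return 4 * simple_rnn_params(input_dim, units)

def bidirectional_params(layer_params):
    """One forward copy and one backward copy of the wrapped layer."""
    return 2 * layer_params

print(simple_rnn_params(1, 32))                          # first SimpleRNN layer
print(bidirectional_params(simple_rnn_params(1, 32)))    # same layer, made bidirectional
print(lstm_params(1, 32))                                # first LSTM layer
```

Note that a layer following a bidirectional layer sees twice as many input features, since the forward and backward outputs are concatenated by default.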

For me, this network took ~1m per epoch to train, so about 30 mins in total.

Feel free to have a project group check-in discussion while your networks train.


In [None]:
model_birnn.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.AdamW(1e-3, weight_decay=0.00008),
    metrics=['accuracy'])

In [None]:
results_birnn = model_birnn.fit(x_train[...,np.newaxis], y_train,
                    batch_size=1024,
                    epochs=30,
                    verbose=1,
                    validation_split=0.1,
                    callbacks = [keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5, min_delta=.0005, verbose=1, min_lr=1e-5),
                                 keras.callbacks.EarlyStopping(patience=10, min_delta=.0001, verbose=1)]
                    )

### Questions:
- Did the automated learning rate reduction we applied kick in during training? If yes, how many times was the learning rate reduced?
- Did early stopping kick in during training?

Now that the model is trained, let's see the results on the test data. This will also take a little time, since evaluation in RNNs also happens sequentially. On my laptop, it took 2 minutes.

In [None]:
model_birnn.evaluate(x_test[...,np.newaxis], y_test)

### Plot loss and accuracy

In [None]:
plt.figure(1, (12, 4))
plt.subplot(1, 2, 1)
plt.plot(results_birnn.history['loss'])
plt.plot(results_birnn.history['val_loss'])
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')

plt.subplot(1, 2, 2)
plt.plot(results_birnn.history['accuracy'])
plt.plot(results_birnn.history['val_accuracy'])
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.tight_layout()

### Questions: 
- Did the model converge?
- Does the model have high bias, high variance, both, or neither?
- How well does our model perform on the test data?
- What would you suggest we try to improve the model performance?


## RNN 3: LSTM

This time, we'll try a relatively shallow uni-directional LSTM for comparison.

In [None]:
model_lstm = Sequential()
model_lstm.add(keras.Input(shape=(500, 1)))
model_lstm.add(LSTM(32, return_sequences=True, recurrent_initializer="glorot_uniform"))
#model_lstm.add(LSTM(64, return_sequences=True, recurrent_initializer="glorot_uniform"))
model_lstm.add(LSTM(10, return_sequences=True, recurrent_initializer="glorot_uniform"))
model_lstm.add(Flatten())
model_lstm.add(Dropout(0.3))
model_lstm.add(Dense(1, activation="sigmoid"))

model_lstm.summary()

### Questions:
- How many hidden layers does this model have?
- How many free parameters?

#### LSTM Training

In [None]:
model_lstm.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.AdamW(1e-3, weight_decay=0.00008),
    metrics=['accuracy'])

More parameters mean longer training times: this one took ~1m per epoch on my laptop, about 28 minutes in total.

Feel free to have a project group check-in discussion while your networks train.

In [None]:
results_lstm = model_lstm.fit(x_train[...,np.newaxis], y_train,
                    batch_size=1024,
                    epochs=30,
                    verbose=1,
                    validation_split=0.1,
                    callbacks = [keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5, min_delta=.0005, verbose=1, min_lr=1e-5),
                                 keras.callbacks.EarlyStopping(patience=10, min_delta=.0001, verbose=1)]
                    )

### Questions:
- Did the automated learning rate reduction we applied kick in during training? If yes, how many times was the learning rate reduced?
- Did early stopping kick in during training?

In [None]:
model_lstm.evaluate(x_test[...,np.newaxis], y_test)

### Plot loss and accuracy

In [None]:
plt.figure(1, (12, 4))
plt.subplot(1, 2, 1)
plt.plot(results_lstm.history['loss'])
plt.plot(results_lstm.history['val_loss'])
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')

plt.subplot(1, 2, 2)
plt.plot(results_lstm.history['accuracy'])
plt.plot(results_lstm.history['val_accuracy'])
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.tight_layout()

### Questions: 
- Did the model converge?
- Does the model have high bias, high variance, both, or neither?
- How well does our model perform on the test data?
- What would you suggest we try to improve the model performance?

### Conclusion Questions:
- Based on the 3 networks, does adding more information about long-term correlations in the signal improve performance? Which networks should you compare to evaluate this? Connect your discussion to the cosmic ray signal shape you observed during the initial data exploration.
- Based on the 3 networks, does adding more information about backwards-running correlations in the signal improve performance? Which networks should you compare to evaluate this? Connect your discussion to the cosmic ray signal shape you observed during the initial data exploration.
- Can we conclude that for this task, LSTMs and bi-directionality won't ever help? Or are there other changes we should make before re-evaluating those questions? If so, describe the changes you'd suggest.

### Acknowledgement Statement:

After that, you're done with studio for today! Upload your notebook to Gradescope, making sure to merge the written responses for all 3 RNNs into a single notebook, even if all 3 weren't trained in that notebook. 