# PyRIID Primer

Presented by **Tyler Morrow**

November 2022

## Introduction

### What is PyRIID?

![pyriid](./images/pyriid_logo.png)

PyRIID (pronounced: PIE-rid) stands for Python-based Radioisotope IDentification (RIID).

PyRIID is a Python package containing utilities that make radioisotope identification and related problems easier to study, with an emphasis on machine learning approaches.

PyRIID is Open Source and freely available to anyone.

### What can I do with PyRIID?

1. Synthesize gamma spectra
2. Fit models
3. Visualize results
4. Save and load pretty much everything

### Why "machine learning" (ML)?

We use ML to fit models that represent relationships between some specific space of gamma spectra and *some output*, typically isotopes.
Inference with the resulting models is *fast*, and the data generation processes necessary to train the model are also convenient for exploring performance.

## The `SampleSet`

We start our tour of PyRIID by answering the question, "What is a SampleSet?"

The `SampleSet` is the primary data structure in which we store spectral data and related information.

So much of PyRIID is built around the idea of passing around `SampleSet` objects.

- Spectra and related info are stored in `SampleSet`s together.
- Normalizing spectra?  There are multiple `SampleSet` methods for this.
- Mixing spectra? A `SampleSet` goes into the `SeedMixer` and you get a `SampleSet` out.
- Want to perform inference with a model? Pass its `predict()` method a `SampleSet`.
- Where are my model's predictions?  They are in the `prediction_probas` property of the `SampleSet` you just gave it.
- Visualization functions?  Almost all take one or more `SampleSet`s as input.

`SampleSet`s can be saved as either HDF or PCF files, and read in from either (some restrictions apply in the case of PCF files).

**Terminology Disclaimer:**

Whenever you see the term "foreground" (often shortened to `fg`), you should think of a spectrum containing only source information (sometimes you will see the term "source" used interchangeably with "foreground").
Likewise, "background" (often shortened to `bg`) is a spectrum containing counts exclusively from background sources.
In this way, the "foreground" source is the novel, anomalous presence in our detector's view of its environment.

    gross = fg + bg
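
As a rough illustration of this terminology, the sketch below builds a toy gross spectrum from a foreground and a background. It uses plain Python lists rather than a `SampleSet`, and the channel counts are made-up values chosen only for illustration.

```python
# Toy 8-channel spectra (hypothetical counts, not real detector data).
fg = [0, 2, 40, 5, 1, 0, 0, 0]      # counts from the source only
bg = [12, 10, 9, 8, 6, 5, 4, 3]     # counts from the environment only

# The gross spectrum is what a detector would actually record.
gross = [f + b for f, b in zip(fg, bg)]

# Subtracting the background recovers the foreground exactly in this
# noise-free toy case; with real measurements it is only an estimate.
recovered_fg = [g - b for g, b in zip(gross, bg)]

print(gross)
print(sum(fg), sum(bg), sum(gross))
```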

## Seed Synthesis

**Seed** Synthesis is where we obtain the pure spectra (think templates) we will use to *seed* other synthesizers.

Here's the full process:

![full-process](./images/full_process.png)

### Basic seed synthesis

The API for using PyRIID's Seed Synthesis essentially asks you to write a specification for performing one or more injects in terms of the various parameters made available via the GADRAS API.

Your seed specification is your model's first set of assumptions.

In [None]:
"""Synthesizing seeds"""
from riid.data.synthetic.seed import SeedSynthesizer

seed_syn = SeedSynthesizer()
fg_seeds_ss, bg_seeds_ss = seed_syn.generate("./spec_nai_basic.yaml")
# The YAML file defining the seed synthesis specification is ultimately parsed into a dictionary.
# You can also load it yourself and pass in the dictionary instead - this is useful for varying detector parameters!

In [None]:
"""Inspecting SampleSets"""
fg_seeds_ss
# fg_seeds_ss.spectra
# fg_seeds_ss.sources
# fg_seeds_ss.info
# fg_seeds_ss.prediction_probas

In [None]:
"""Plotting spectra"""
from riid.visualize import plot_spectra
import matplotlib.pyplot as plt


# Plot foreground(s)
am241_only_ss = fg_seeds_ss[fg_seeds_ss.get_labels() == "Am241"]
fig, ax = plot_spectra(
    am241_only_ss, 
    in_energy=True, 
    figsize=(9.6, 4.8),
    ylim=(1e-10, None),
    xlim=(0, 1000),
    target_level="Seed", 
    show=False
)
ax.axvline(59, color="green", label="59 keV")
ax.legend()
plt.show()

# Plot background(s)
_ = plot_spectra(bg_seeds_ss, in_energy=True, ylim=(1e-5, None))
bg_seeds_ss.sources

In [None]:
"""Combining SampleSets"""
from riid.data import SampleSet

all_seeds_ss = SampleSet()
all_seeds_ss.concat([fg_seeds_ss, bg_seeds_ss])
all_seeds_ss.sources

In [None]:
"""Saving and loading SampleSets"""
fg_seeds_ss.to_hdf("./two_fgs.h5")
bg_seeds_ss.to_hdf("./one_bg.h5")

fg_seeds_ss.to_pcf("./two_fgs.pcf")
bg_seeds_ss.to_pcf("./one_bg.pcf")

### Aside: Seed Mixing

In some cases, it makes sense to "identify" a spectrum in terms of a dominant or high priority radioisotope - multiclass classification.
In other cases, perhaps we would like to identify a spectrum as merely containing one or more radioisotopes - multilabel classification.
And going one step beyond, perhaps there is a desire to actually look at and utilize the radioisotope proportions predicted by a model - label proportion estimation.

In either of the latter two cases, PyRIID's `SeedMixer` is useful as a brute-force approach to randomly combine the seeds you give it, taking care not to combine seeds belonging to the same isotope.

In [None]:
"""Seed mixing"""
from riid.data.synthetic.seed import SeedSynthesizer, SeedMixer

fg_seeds_ss, _ = SeedSynthesizer().generate("./spec_nai_basic.yaml")

seed_mixer = SeedMixer(
    mixture_size=2,
    min_source_contribution=0.1
)

mixed_fg_seeds_ss = seed_mixer.generate(fg_seeds_ss, n_samples=100)
mixed_fg_seeds_ss.sources
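
To make the idea concrete, here is a minimal sketch of what mixing two seeds might look like. It is not `SeedMixer`'s actual algorithm: the helper `mix_two_seeds`, the toy seed spectra, and the way the proportions are drawn are all assumptions for illustration. It loosely mirrors `mixture_size=2` and `min_source_contribution=0.1` from the cell above.

```python
import random

def mix_two_seeds(seed_a, seed_b, min_contribution=0.1, rng=random):
    """Blend two normalized seed spectra with random proportions.

    Hypothetical helper: each proportion is at least `min_contribution`
    and the two proportions sum to 1.
    """
    ratio_a = rng.uniform(min_contribution, 1.0 - min_contribution)
    ratio_b = 1.0 - ratio_a
    mixed = [ratio_a * a + ratio_b * b for a, b in zip(seed_a, seed_b)]
    return mixed, (ratio_a, ratio_b)

# Toy seeds that each sum to 1 (made-up shapes, not real isotope templates).
seeds = {
    "Am241": [0.7, 0.2, 0.1, 0.0],
    "Cs137": [0.1, 0.1, 0.2, 0.6],
}

rng = random.Random(42)
mixed, ratios = mix_two_seeds(seeds["Am241"], seeds["Cs137"], 0.1, rng)

# The ratios play the role of the per-isotope proportions that a label
# proportion estimation model would be trained to predict.
print(dict(zip(seeds, ratios)))
print(mixed, sum(mixed))
```

Because both seeds sum to 1 and the proportions sum to 1, the mixed spectrum also sums to 1, so it can be used as a seed in its own right.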

## Static Synthesis

**Static** Synthesis is where we take our seeds and generate noisy spectra which vary across SNR and live time.

The configuration used for Static Synthesis represents yet another set of assumptions one must make.

In [None]:
"""Seed synthesis"""
from riid.data.synthetic.seed import SeedSynthesizer
from riid.data.synthetic.static import StaticSynthesizer

seed_syn = SeedSynthesizer()
fg_seeds_ss, bg_seeds_ss = seed_syn.generate("./spec_nai_many_fgs_one_bg.yaml")

In [None]:
"""Static Synthesis"""
static_syn = StaticSynthesizer(
    samples_per_seed=250,
    background_cps=300,
    live_time_function="uniform",
    live_time_function_args=(0.125, 10),
    snr_function="log10",
    snr_function_args=(0.1, 100),
    apply_poisson_noise=True,
    balance_level="Seed"
)
fg_ss, bg_ss, gross_ss = static_syn.generate(fg_seeds_ss, bg_seeds_ss)
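
The following is a simplified sketch of how a single gross spectrum could be produced from one foreground seed and one background seed under a configuration like the one above. It is not `StaticSynthesizer`'s implementation. It assumes that SNR means foreground counts divided by the square root of background counts, and that `"log10"` means SNR is sampled uniformly in log10 space. The helpers `poisson` and `synthesize_one` are hypothetical, and the Poisson sampler is hand-rolled so the example runs with the standard library only.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson sample (Knuth's method, chunked for large rates)."""
    total = 0
    while lam > 0:
        # A sum of independent Poisson draws is Poisson with the summed rate,
        # so large rates are split into chunks to keep exp(-chunk) above zero.
        chunk = min(lam, 500.0)
        limit, k, p = math.exp(-chunk), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        lam -= chunk
    return total

def synthesize_one(fg_seed, bg_seed, bg_cps, rng):
    """Build one noisy (fg, bg, gross) triple from unit-sum seeds."""
    live_time = rng.uniform(0.125, 10)                       # seconds
    snr = 10 ** rng.uniform(math.log10(0.1), math.log10(100))
    bg_counts = bg_cps * live_time
    fg_counts = snr * math.sqrt(bg_counts)                   # assumed SNR definition
    # Scale the seeds to expected counts, then apply Poisson noise per channel.
    fg = [poisson(fg_counts * x, rng) for x in fg_seed]
    bg = [poisson(bg_counts * x, rng) for x in bg_seed]
    gross = [f + b for f, b in zip(fg, bg)]
    return fg, bg, gross, live_time, snr

rng = random.Random(0)
fg_seed = [0.7, 0.2, 0.1, 0.0]       # toy seeds that each sum to 1
bg_seed = [0.4, 0.3, 0.2, 0.1]
fg, bg, gross, live_time, snr = synthesize_one(fg_seed, bg_seed, 300, rng)
print(f"live_time={live_time:.2f}s snr={snr:.2f} gross={gross}")
```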

In [None]:
"""Normalization"""
gross_ss.normalize()
bg_ss.normalize()
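
For intuition, the sketch below shows L1 normalization on a plain list: each channel is divided by the spectrum's total counts so that the result sums to 1. Whether `normalize()` uses exactly this scheme by default is an assumption here, and `l1_normalize` is a hypothetical helper, not part of PyRIID.

```python
def l1_normalize(spectrum):
    """Scale a spectrum so its channels sum to 1.

    An all-zero spectrum is returned unchanged to avoid dividing by zero.
    """
    total = sum(spectrum)
    if total == 0:
        return list(spectrum)
    return [count / total for count in spectrum]

# Two toy gross spectra with the same shape but different total counts,
# as you might get from a short and a long measurement of the same scene.
short_measurement = [2, 6, 10, 2]
long_measurement = [20, 60, 100, 20]

print(l1_normalize(short_measurement))
print(l1_normalize(long_measurement))
```

Both toy spectra normalize to the same shape, which is the point: the model sees spectral shape rather than total counts.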

## Model Training

Now that we have data, we can start making models.

![gross_bg_model](./images/gross%2Bbg.png)

In [None]:
"""Model fitting"""
from riid.models.neural_nets import MLPClassifier
from riid.models.metrics import single_f1

model = MLPClassifier(
    hidden_layers=(512,),
    learning_rate=3e-4,
    metrics=[single_f1]
)

history = model.fit(gross_ss, bg_ss, target_level="Isotope", verbose=True)

In [None]:
"""Model learning curve"""
from riid.visualize import plot_learning_curve

_ = plot_learning_curve(history.history["loss"], history.history["val_loss"])

In [None]:
"""Generate some in-distribution data the model has not seen."""
static_syn.samples_per_seed = 50
_, test_bg_ss, test_gross_ss = static_syn.generate(fg_seeds_ss, bg_seeds_ss)
test_bg_ss.normalize()
test_gross_ss.normalize()

In [None]:
"""Use the model!"""
model.predict(test_gross_ss, test_bg_ss)  # Predictions are stored in the SampleSet of gross spectra (test_gross_ss)

In [None]:
"""Calculate performance metric"""
from sklearn.metrics import f1_score

labels = test_gross_ss.get_labels()
predictions = test_gross_ss.get_predictions()
f1_score(labels, predictions, average="micro")
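
For single-label, multiclass predictions like these, micro-averaged F1 reduces to plain accuracy, because every misclassification counts as exactly one false positive (for the predicted class) and one false negative (for the true class). The sketch below checks this on made-up labels using only the standard library; `micro_f1` is a hypothetical helper and the isotope names are placeholders.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes first."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn)

# Made-up labels and predictions (placeholders, not model output).
y_true = ["Am241", "Am241", "Cs137", "Co60", "Co60", "Cs137"]
y_pred = ["Am241", "Cs137", "Cs137", "Co60", "Am241", "Cs137"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro_f1(y_true, y_pred), accuracy)
```

If per-class behavior matters more than overall accuracy, a macro average or a per-class report may be more informative than the micro average.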

In [None]:
"""Confusion Matrix"""
from riid.visualize import confusion_matrix

_ = confusion_matrix(test_gross_ss)

In [None]:
"""SNR vs. Model Score"""
from riid.visualize import plot_snr_vs_score

_ = plot_snr_vs_score(test_gross_ss)

In [None]:
"""Save model"""
model.save("./nai_many_fgs_one_bg_model.h5")

### Aside: Moving detector

The models and data we've seen today are intended for scenarios where the detector is static.

To adapt them to scenarios where the detector is moving, one could swap anomaly detection and identification, which leads to a need for a new type of anomaly detection.
This approach also necessitates, at minimum, an expansion of the training regimen to cover various background environments.

![edge-detection](./images/edge_detection.png)

## Conclusions

Many challenges and pitfalls exist:

- Deciding on configurations for seed synthesis, seed mixing, and static synthesis, i.e., defining your problem
- Making the usual normalization and hyperparameter choices, although these generally do not seem to require much investigation
- Deciding how to compare models trained on different problems (hint: don't)
- With synthetic data, performance can be whatever the creator wants it to be.
- Understanding the difference between your training problem space and deployment problem space
- Being able to describe your problem space(s) properly
    - "We don't care about all possible worlds, only the one we live in. If we know something about the world and incorporate it into our learner, it now has an advantage over random guessing." - The Master Algorithm, 2018
    - "This is of considerable theoretical interest but, I think, of limited practical value, because the space of all possible problems likely includes many extremely unusual and pathological problems which are rarely if ever seen in practice." - Essentials of Metaheuristics, 2011
- Varying DRF params and mixing seeds *quickly* blows up your problem, and subsequently trying to model `spectra -> radioisotope` may be problematic without additional knowledge - no free lunch (Wolpert and Macready, 1997).
    - "In the meantime, the practical consequence of the 'no free lunch' theorem is that there's no such thing as learning without knowledge. Data alone is not enough." - The Master Algorithm, 2018
- **Build trust in this technology, like you do any other technology, by spending time with it and evaluating it.**

### Topics We Didn't Get To

- Demonstration of large-scale DRF parameter variation - it takes a long time :(
- Models that take other models as input
- Multi-isotope
- Alternative normalization methods
- Alternative visualization methods

### Upcoming PyRIID Changes

1. Version 2.0.0 published to PyPI
1. Expanded seed synthesis
1. Our first multi-isotope model, developed by Alan Van Omen in collaboration with the University of Michigan for his thesis.

and more...

![multi-isotope approach](./images/lpe_approach.png)