<!-- HTML file automatically generated from DocOnce source (https://github.com/doconce/doconce/)
doconce format html MSUFeb10.do.txt --no_mako -->
<!-- dom:TITLE: Artificial Intelligence and Machine Learning in Nuclear Physics -->

# Artificial Intelligence and Machine Learning in Nuclear Physics
**Morten Hjorth-Jensen, Dean Lee and Witek Nazarewicz**, Department of Physics and Astronomy and FRIB/NSCL Laboratory, Michigan State University

Date: **Research Discussion, FRIB, Michigan State University, February 10, 2022**

## What is this talk about?
The main emphasis is to give you a short and pedestrian introduction to why and how we can use Machine Learning methods
to solve quantum mechanical many-body problems, how we can use such techniques in the analysis of experiments, and why this could (or should) be of interest.

These slides are at <https://mhjensenseminars.github.io/MachineLearningTalk/doc/pub/>.

**See also Artificial Intelligence and Machine Learning in Nuclear Physics**, Amber Boehnlein et al., [arXiv:2112.02309](https://arxiv.org/abs/2112.02309), submitted to Reviews of Modern Physics. MSU co-authors are Dean Lee, Witek Nazarewicz, Peter Ostroumov, and MHJ.

## A simple perspective on the interface between ML and Physics

<!-- dom:FIGURE: [figures/mlimage.png, width=700 frac=0.9] -->
<!-- begin figure -->

<img src="figures/mlimage.png" width="700"><p style="font-size: 0.9em"><i>Figure 1: </i></p>
<!-- end figure -->

## ML in Nuclear  Physics

<!-- dom:FIGURE: [figures/ML-NP.png, width=700 frac=0.9] -->
<!-- begin figure -->

<img src="figures/ML-NP.png" width="700"><p style="font-size: 0.9em"><i>Figure 1: </i></p>
<!-- end figure -->

## ML in Materials Science and nanotechnologies, predicting candidates for quantum technologies

<!-- dom:FIGURE: [figures/fig2.png, width=700 frac=0.9] -->
<!-- begin figure -->

<img src="figures/fig2.png" width="700"><p style="font-size: 0.9em"><i>Figure 1: </i></p>
<!-- end figure -->

## AI/ML and some statements you may hear more and more

1. Fei-Fei Li on ImageNet: **map out the entire world of objects** ([The data that transformed AI research](https://cacm.acm.org/news/219702-the-data-that-transformed-ai-research-and-possibly-the-world/fulltext))

2. Russell and Norvig in their popular textbook: **relevant to any intellectual task; it is truly a universal field** ([Artificial Intelligence, A modern approach](http://aima.cs.berkeley.edu/))

3. Woody Bledsoe puts it more bluntly: **in the long run, AI is the only science** (quoted in McCorduck, [Machines who think](https://www.pamelamccorduck.com/machines-who-think))

If you wish to have a critical read on AI/ML from a societal point of view, see [Kate Crawford's recent text Atlas of AI](https://www.katecrawford.net/)

**Here: by AI/ML we mean a collection of machine learning methods with an emphasis on statistical learning and data analysis**

## Scientific Machine Learning

An important and emerging field is what has been dubbed scientific ML; see the article by Deiana et al., [Applications and Techniques for Fast Machine Learning in Science, arXiv:2110.13041](https://arxiv.org/abs/2110.13041).

The authors discuss applications and techniques for fast machine
learning (ML) in science, that is, the concept of integrating powerful ML
methods into the real-time experimental data processing loop to
accelerate scientific discovery. The report covers three main areas:

1. applications for fast ML across a number of scientific domains;

2. techniques for training and implementing performant and resource-efficient ML algorithms;

3. and computing architectures, platforms, and technologies for deploying these algorithms.

## Machine learning & low-energy nuclear theory: Why?

1. ML tools can help us to speed up the scientific process cycle and hence facilitate discoveries

2. Enabling fast emulation for big simulations

3. Revealing the information content of measured observables w.r.t. theory

4. Identifying crucial experimental data for better constraining theory

5. Providing meaningful input to applications and planned measurements

6. ML tools can help us to reveal the structure of our models

7. Parameter estimation with heterogeneous/multi-scale datasets

8. Model reduction

9. ML tools can help us to provide predictive capability

10. Theoretical results often involve ultraviolet  and infrared extrapolations due to Hilbert-space truncations 

11. Uncertainty quantification essential

12. Theoretical models are often applied to entirely new nuclear systems and conditions that are not accessible to experiment

## The plethora  of machine learning algorithms/methods

1. Deep learning: Neural Networks (NN), Convolutional NN, Recurrent NN, Boltzmann machines, autoencoders, variational autoencoders and generative adversarial networks

2. Bayesian statistics and Bayesian Machine Learning, Bayesian experimental design, Bayesian Regression models, Bayesian neural networks, Gaussian processes and much more

3. Dimensionality reduction (Principal component analysis), Clustering Methods and more

4. Ensemble Methods, Random forests, bagging and voting methods, gradient boosting approaches 

5. Linear and logistic regression, Kernel methods, support vector machines and more

6. Reinforcement Learning

## More on Machine Learning methods and applications in nuclear physics

* **Machine learning for data mining:** Oftentimes, it is necessary to be able to accurately calculate observables that have not been measured, to supplement the existing databases.

* **Nuclear density functional theory:** Energy density functional calibration involving Bayesian optimization and neural networks. A promising avenue for ML applications is the emulation of DFT results.

* **Nuclear properties with ML:** Improving the predictive power of nuclear models by emulating model residuals.

* **Effective field theory and A-body systems:** Truncation errors and low-energy coupling constant calibration, nucleon-nucleon scattering calculations, variational calculations with ANN for light nuclei, NN extrapolation of nuclear structure observables.

* **Nuclear shell model UQ:** ML methods have been used to provide UQ of configuration interaction calculations.

* **Low-energy nuclear reactions UQ:** Bayesian optimization studies of the nucleon-nucleus optical potential, R-matrix analyses, and statistical spatial networks to study patterns in nuclear reaction networks.

* **Neutron star properties and nuclear matter equation of state:** Constraining the equation of state by properties of neutron stars and selected properties of finite nuclei.

* **Experimental design:** Bayesian ML provides a framework to maximize the success of an experiment based on the best information available on existing data, experimental conditions, and theoretical models.

## Machine Learning and Physics
Machine learning is an extremely rich field, in spite of its young age. The
increases in computational capabilities we have seen during the last three
decades have been followed by developments of methods and
techniques for analyzing and handling large data sets, relying heavily
on statistics, computer science and mathematics. The field is
developing rapidly.

Popular software packages written in Python for ML are

* [Scikit-learn](http://scikit-learn.org/stable/), 

* [Tensorflow](https://www.tensorflow.org/),

* [PyTorch](http://pytorch.org/)

* [Keras](https://keras.io/),

and more. These are all freely available at their respective GitHub sites. They 
encompass communities of developers in the thousands or more. And the number
of code developers and contributors keeps increasing.

## Lots of room for creativity
Not all
algorithms and methods can be given a rigorous mathematical
justification. This leaves room for experimentation
and trial and error, and thereby for exciting new developments.

A solid command of linear algebra, multivariate theory,
probability theory, statistical data analysis, optimization algorithms,
error analysis and Monte Carlo methods is important in order to understand many of the
various algorithms and methods.

**Job market, a personal statement**: [A familiarity with ML is almost becoming a prerequisite for many of the most exciting employment opportunities](https://www.analyticsindiamag.com/top-countries-hiring-most-number-of-artificial-intelligence-machine-learning-experts/). And add quantum computing and there you are!

## Types of Machine Learning

The approaches to machine learning are many, but they are often split into two main categories.
In *supervised learning* we know the answer to a problem
and let the computer deduce the logic behind it. *Unsupervised learning*, on the other hand,
is a method for finding patterns and relationships in data sets without any prior knowledge of the system.
Some authors also operate with a third category, namely *reinforcement learning*. This is a paradigm
of learning inspired by behavioural psychology, where learning is achieved by trial and error,
solely from rewards and punishment.

Another way to categorize machine learning tasks is to consider the desired output of a system.
Some of the most common tasks are:

  * Classification: Outputs are divided into two or more classes. The goal is to   produce a model that assigns inputs into one of these classes. An example is to identify  digits based on pictures of hand-written ones. Classification is typically supervised learning.

  * Regression: Finding a functional relationship between an input data set and a reference data set.   The goal is to construct a function that maps input data to continuous output values.

  * Clustering: Data are divided into groups with certain common traits, without knowing the different groups beforehand.  It is thus a form of unsupervised learning.
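To make the clustering task above concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in pure Python. It is only an illustration: the synthetic two-blob dataset, the function name `kmeans` and all parameter values are assumptions made for this example and are not taken from the analyses discussed in these slides.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Synthetic data (assumption): two well-separated Gaussian blobs.
# No labels are passed to the algorithm, so this is unsupervised learning.
rng = random.Random(1)
blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
blob_b = [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(100)]
data = blob_a + blob_b

centroids, labels = kmeans(data, k=2)
print(centroids)
```

The algorithm never sees which blob a point came from; the groups emerge from the distances alone. In practice one would typically use a library implementation such as the one in Scikit-learn.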

## ML in Nuclear Physics, Examples

A large number of degrees of freedom is a feature of both theory and experiment in nuclear physics. With increasingly complicated experiments that produce large amounts of data, automated classification of events becomes increasingly important. Here, deep learning methods offer a plethora of interesting research avenues.

* Reconstruction of particle trajectories or classification of events are typical examples where ML methods are being used. However, since these data can often be extremely noisy, the precision necessary for discovery in physics requires algorithmic improvements. Research along such directions, interfacing nuclear physics with AI/ML is expected to play a significant role in physics discoveries related to new facilities.  The treatment of corrupted data in imaging and image processing is also a relevant topic. 

* Design of detectors represents an important area of applications for ML/AI methods in nuclear physics.

* Many of the above classification problems also have direct applications in theoretical nuclear physics (including Lattice QCD calculations).

## More examples

* An important application of AI/ML methods is to improve the estimation of bias or uncertainty due to the introduction of or lack of physical constraints in various theoretical models.

* In theory, we expect to use AI/ML algorithms and methods to improve our knowledge about correlations of physical model parameters in data for quantum many-body systems. Deep learning methods like Boltzmann machines and various types of recurrent neural networks show great promise in circumventing the exploding dimensionalities encountered in quantum mechanical many-body studies.

* Merging a frequentist approach (the standard path in ML theory) with a Bayesian approach has the potential to infer better probability distributions and error estimates. As an example, methods for fast Monte-Carlo-based Bayesian computation of nuclear density functionals show great promise in providing a better understanding.

* Machine Learning and Quantum Computing is a very interesting avenue to explore. See, for example, the talk by [Sofia Vallecorsa](https://www.youtube.com/watch?v=7WPKv1Q57os&list=PLUPPQ1TVXK7uHwCTccWMBud-zLyvAf8A2&index=5&ab_channel=ECTstar).

## Selected References
* [Mehta et al.](https://arxiv.org/abs/1803.08823) and [Physics Reports (2019)](https://www.sciencedirect.com/science/article/pii/S0370157319300766?via%3Dihub).

* [Machine Learning and the Physical Sciences by Carleo et al](https://link.aps.org/doi/10.1103/RevModPhys.91.045002)

* [Ab initio solution of the many-electron Schrödinger equation with deep neural networks by Pfau et al.](https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.2.033429).

* [Machine Learning and the Deuteron by Keeble and Rios](https://www.sciencedirect.com/science/article/pii/S0370269320305463?via%3Dihub)

* [Variational Monte Carlo calculations of $A\le 4$ nuclei with an artificial neural-network correlator ansatz by Adams et al.](https://arxiv.org/abs/2007.14282)

* [Unsupervised Learning for Identifying Events in Active Target Experiments by Solli et al.](https://arxiv.org/abs/2008.02757)

* [Report from the A.I. For Nuclear Physics  Workshop by Bedaque et al.](https://arxiv.org/abs/2006.05422)

* [Applications and Techniques for Fast Machine Learning in Science](https://arxiv.org/abs/2110.13041)

* [Particle Data Group summary on ML methods](https://pdg.lbl.gov/2021/reviews/rpp2021-rev-machine-learning.pdf)

## What are the basic ingredients?
Almost every problem in ML and data science starts with the same ingredients:
* The dataset $\mathbf{x}$ (could be some observable quantity of the system we are studying)

* A model which is a function of a set of parameters $\mathbf{\alpha}$ that relates to the dataset, say a likelihood  function $p(\mathbf{x}\vert \mathbf{\alpha})$ or just a simple model $f(\mathbf{\alpha})$

* A so-called **loss/cost/risk** function $\mathcal{C} (\mathbf{x}, f(\mathbf{\alpha}))$ which allows us to decide how well our model represents the dataset. 

We seek the parameter values $\mathbf{\alpha}$ which minimize the function $\mathcal{C} (\mathbf{x}, f(\mathbf{\alpha}))$. This leads to various minimization algorithms. It may surprise many, but at the heart of all machine learning algorithms there is an optimization problem.
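The three ingredients can be put together in a few lines. The sketch below is an illustration under stated assumptions: the dataset is synthetic, the model is a straight line with parameters `alpha = [intercept, slope]`, the cost function is the mean squared error, and the minimizer is plain gradient descent with a fixed learning rate. None of these specific choices come from the talk; they are simply the smallest example of the pattern.

```python
import random

def cost(alpha, xs, ys):
    """Mean squared error for the linear model f = alpha[0] + alpha[1] * x."""
    return sum((alpha[0] + alpha[1] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(alpha, xs, ys):
    """Analytical gradient of the mean squared error with respect to alpha."""
    n = len(xs)
    residuals = [alpha[0] + alpha[1] * x - y for x, y in zip(xs, ys)]
    return [
        2.0 * sum(residuals) / n,
        2.0 * sum(r * x for r, x in zip(residuals, xs)) / n,
    ]

def gradient_descent(xs, ys, learning_rate=0.1, n_steps=2000):
    """Minimize the cost by repeatedly stepping against the gradient."""
    alpha = [0.0, 0.0]
    for _ in range(n_steps):
        g = gradient(alpha, xs, ys)
        alpha = [a - learning_rate * gi for a, gi in zip(alpha, g)]
    return alpha

# Synthetic dataset (assumption): y = 1 + 2x plus small Gaussian noise
rng = random.Random(0)
xs = [i / 50.0 for i in range(51)]  # 51 equally spaced points in [0, 1]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.05) for x in xs]

alpha = gradient_descent(xs, ys)
print(alpha, cost(alpha, xs, ys))
```

The fitted parameters should come out close to the values used to generate the data. More elaborate models change the function `f` and the form of the gradient, but the structure (data, model, cost, minimizer) stays the same.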

## Neural network types
An artificial neural network (NN) is a computational model that consists of layers of connected neurons, or *nodes*.
It is supposed to mimic a biological nervous system by letting each neuron interact with other neurons
by sending signals in the form of mathematical functions between layers.
A wide variety of different NNs have
been developed, but most of them consist of an input layer, an output layer and possibly additional layers in between, called
*hidden layers*. Each layer can contain an arbitrary number of nodes, and each connection between two nodes
is associated with a weight variable.

<!-- dom:FIGURE: [figures/dnn.png, width=500 frac=0.6] -->
<!-- begin figure -->

<img src="figures/dnn.png" width="500"><p style="font-size: 0.9em"><i>Figure 1: </i></p>
<!-- end figure -->
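The layer structure described above can be sketched as a forward pass through a small fully connected network. This is a minimal illustration, not the architecture of any network used in the talk: the layer sizes, the random initialization, the bias terms and the sigmoid activation function are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    """Activation function (an assumed choice) mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per connection between consecutive layers."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # weights[j][i] is the weight of the connection from node i to node j
        weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network

def forward(network, inputs):
    """Propagate the inputs layer by layer: a = sigmoid(W a_prev + b)."""
    activations = list(inputs)
    for weights, biases in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# 3 input nodes, two hidden layers with 4 nodes each, 2 output nodes
network = init_network([3, 4, 4, 2])
output = forward(network, [0.5, -1.0, 2.0])
print(output)
```

Training the network amounts to adjusting the weights and biases so that a cost function of the outputs is minimized, typically with gradients obtained by backpropagation. Libraries such as PyTorch or TensorFlow automate that step.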

## [Nuclear Physics Experiments Argon-46, Solli et al.](https://www.sciencedirect.com/science/article/abs/pii/S0168900221004460?via%3Dihub)

Two- and three-dimensional representations of two events from the
Argon-46 experiment. Each row is one event in two projections,
where the color intensity of each point indicates the charge
recorded by the detector, with higher intensity corresponding to higher charge values. The bottom row illustrates a carbon event with
a large fraction of noise, while the top row shows a proton event
almost free of noise. See [Unsupervised Learning for Identifying Events in Active Target Experiments by Solli et al.](https://arxiv.org/abs/2008.02757) for more details.

<!-- dom:FIGURE: [figures/examples_raw.png, width=500 frac=0.6] -->
<!-- begin figure -->

<img src="figures/examples_raw.png" width="500"><p style="font-size: 0.9em"><i>Figure 1: </i></p>
<!-- end figure -->

## Why Machine Learning?

The traditional Monte Carlo event selection process does not have a
well-defined method to quantify the effectiveness of the event
selection.

In addition, the selection task normally produces  a binary result only, either
a **good** or **bad** fit to the event of interest. A **bad**
fit is then assumed to be a different event type, and is removed from
the analysis. 

In a broader perspective, an
unsupervised classification algorithm would offer the possibility to
*discover* rare events which may not be expected or are
overlooked. These events would likely be filtered out using the
traditional methods. From a practical point of view, compared to
supervised learning, it also avoids the necessary labeling task of the
learning set events, which is error prone and time consuming.

## Why Machine Learning for Experimental Analysis?

The $\chi^2$ approach used in the traditional analysis performed on
the Argon-46 data is extremely expensive from a computational standpoint
because it involves the simulation of thousands of tracks for each
recorded event.

These events are in turn simulated for each iteration of the Monte
Carlo fitting sequence.  Even though the reaction of interest in the
above experiment had the largest cross section (elastic scattering),
the time spent on Monte Carlo fitting of *all* of the events
produced in the experiment was the largest computational bottleneck in
the analysis. In the case of an experiment where the reaction of
interest would represent less than a few percent of the total cross
section, this procedure would become highly inefficient and
prohibitive. Adding to this the large amount of data produced in this
experiment (with even larger data sets expected in future
experiments), the analysis simply begs for more efficient analysis
tools.

## More arguments

Without a prior separation of the events, the computationally expensive fitting procedure
would be applied to every event, instead of only the few percent of the
events that are of interest for the analysis. An unsupervised ML
algorithm able to separate the data without *a priori* knowledge
of the different types of events increases the efficiency of the
analysis tremendously, and allows the downstream analysis to
concentrate the fitting effort on the events of interest. In
addition, the clustering allows for more exploration of the data,
potentially enabling the discovery of unexpected reaction types.

## Quantum Monte Carlo Motivation
Given a Hamiltonian $H$ and a trial wave function $\Psi_T$, the variational principle states that the expectation value of $H$, defined through

$$
\langle E \rangle =
   \frac{\int d\boldsymbol{R}\Psi^{\ast}_T(\boldsymbol{R})H(\boldsymbol{R})\Psi_T(\boldsymbol{R})}
        {\int d\boldsymbol{R}\Psi^{\ast}_T(\boldsymbol{R})\Psi_T(\boldsymbol{R})},
$$

is an upper bound to the ground state energy $E_0$ of the Hamiltonian $H$, that is

$$
E_0 \le \langle E \rangle.
$$

In general, the integrals involved in the calculation of various expectation values are multi-dimensional. Traditional integration methods such as Gauss-Legendre quadrature are not adequate for, say, the computation of the energy of a many-body system.
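A toy example shows how Monte Carlo sampling replaces the explicit integration. The sketch below is an assumption-laden illustration and not one of the codes used in the talk: it treats a single particle in a one-dimensional harmonic oscillator (units with $\hbar = m = \omega = 1$) with the trial wave function $\Psi_T(x) = e^{-\alpha x^2/2}$. For this choice the local energy $E_L = H\Psi_T/\Psi_T$ is $\alpha/2 + x^2(1-\alpha^2)/2$, and the exact expectation value is $\alpha/4 + 1/(4\alpha)$, which has its minimum $E_0 = 1/2$ at $\alpha = 1$. The function names, step size and sample counts are illustrative.

```python
import math
import random

def local_energy(x, alpha):
    """E_L = (H psi)/psi for psi = exp(-alpha x^2 / 2), H = -1/2 d^2/dx^2 + 1/2 x^2."""
    return 0.5 * alpha + 0.5 * x * x * (1.0 - alpha * alpha)

def vmc_energy(alpha, n_samples=50000, step=1.0, seed=0):
    """Estimate <E> by Metropolis sampling of |psi|^2 and averaging the local energy."""
    rng = random.Random(seed)
    x = 0.0
    energy_sum = 0.0
    for _ in range(n_samples):
        # Propose a uniform random move in [-step, step]
        x_new = x + step * (2.0 * rng.random() - 1.0)
        # Acceptance ratio |psi(x_new)|^2 / |psi(x)|^2
        ratio = math.exp(-alpha * (x_new * x_new - x * x))
        if rng.random() < ratio:
            x = x_new
        energy_sum += local_energy(x, alpha)
    return energy_sum / n_samples

# Scan the variational parameter; every estimate should lie at or above E_0 = 0.5
# up to statistical noise, with the minimum at alpha = 1.
for alpha in (0.6, 0.8, 1.0, 1.2):
    print(alpha, vmc_energy(alpha))
```

At $\alpha = 1$ the trial function is the exact ground state, so the local energy is constant and the estimate has zero variance. The same sampling strategy carries over to many-body systems, where the dimensionality of the integral makes quadrature rules impractical.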

## Running the codes
You can find the codes, in C++, for the simple two-electron case at the GitHub repository <https://github.com/mhjensenseminars/MachineLearningTalk/tree/master/doc/Programs/MLcpp/src> or in Python at <http://compphysics.github.io/ComputationalPhysics2/doc/LectureNotes/_build/html/boltzmannmachines.html>

The trial wave functions are based on the product of a Slater determinant with either only Hermite polynomials or Gaussian orbitals, with and without a Padé-Jastrow factor (PJ).

## Deep Learning Neural Networks

[Machine Learning and the Deuteron by Keeble and Rios](https://www.sciencedirect.com/science/article/pii/S0370269320305463?via%3Dihub) and
[Variational Monte Carlo calculations of $A\le 4$ nuclei with an artificial neural-network correlator ansatz by Adams et al.](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.127.022502)

**Adams et al**:

<!-- Equation labels as ordinary links -->
<div id="_auto1"></div>

$$
\begin{equation}
\begin{aligned}
H_{LO} ={}& -\sum_i \frac{\vec{\nabla}_i^2}{2m_N}
+\sum_{i<j} \left(C_1  + C_2\, \vec{\sigma_i}\cdot\vec{\sigma_j}\right)
e^{-r_{ij}^2\Lambda^2 / 4 } \\
& +D_0 \sum_{i<j<k} \sum_{\text{cyc}}
e^{-\left(r_{ik}^2+r_{ij}^2\right)\Lambda^2/4}\,,
\end{aligned}
\label{_auto1} \tag{1}
\end{equation}
$$

where $m_N$ is the mass of the nucleon, $\vec{\sigma_i}$ is the Pauli
matrix acting on nucleon $i$, and $\sum_{\text{cyc}}$ stands for the
cyclic permutation of $i$, $j$, and $k$. The low-energy constants
$C_1$ and $C_2$ are fit to the deuteron binding energy and to the
neutron-neutron scattering length.

## Replacing the Jastrow factor with Neural Networks

An appealing feature of the ANN ansatz is that it is more general than the more conventional product of two-
and three-body spin-independent Jastrow functions

<!-- Equation labels as ordinary links -->
<div id="_auto2"></div>

$$
\begin{equation}
|\Psi_V^J \rangle = \prod_{i<j<k} \Big( 1-\sum_{\text{cyc}} u(r_{ij}) u(r_{jk})\Big) \prod_{i<j} f(r_{ij}) | \Phi\rangle\,,
\label{_auto2} \tag{2}
\end{equation}
$$

which is commonly used for nuclear Hamiltonians that do not contain tensor and spin-orbit terms.
The above function is replaced by a four-layer Neural Network. 

<!-- dom:FIGURE: [figures/energyconvergence.png, width=700 frac=0.9] -->
<!-- begin figure -->

<img src="figures/energyconvergence.png" width="700"><p style="font-size: 0.9em"><i>Figure 1: </i></p>
<!-- end figure -->

## Conclusions and where do we stand
* Lots of experimental analysis coming, see for example [Unsupervised Learning for Identifying Events in Active Target Experiments by Solli et al.](https://arxiv.org/abs/2008.02757) as well as references and examples in [Report from the A.I. For Nuclear Physics  Workshop by Bedaque et al.](https://arxiv.org/abs/2006.05422).

* Extension of the work of [G. Carleo and M. Troyer, Science **355**, Issue 6325, pp. 602-606 (2017)](http://science.sciencemag.org/content/355/6325/602) gives excellent results for two-electron systems as well as good agreement with standard VMC calculations for many  electrons.

* Promising results with neural networks as well. The next step is to use the trial wave function in final Green's function Monte Carlo calculations.

* Minimization problem can be tricky.

* Antisymmetry is dealt with by multiplying the trial wave function with either a simple or an optimized Slater determinant.

* Extend to more fermions. How do we deal with the antisymmetry of the multi-fermion wave function?

a. Here we also used standard Hartree-Fock theory to define an optimal Slater determinant. This takes care of the antisymmetry. What about constructing an antisymmetrized network function?

b. Thereafter, use ML to determine the correlated part of the wave function (including a standard Jastrow factor).

* Can we use ML to find out which correlations are relevant and thereby diminish the dimensionality problem in standard many-body  theories? 

* And many more exciting research avenues

There is a need for Machine Learning and Uncertainty Quantification in nuclear physics theory.
Good progress has been made, but much remains to be done.

To solve many complex problems in the field and facilitate discoveries, multidisciplinary efforts are required involving scientists in nuclear physics, statistics, computational science, and applied math.
The community needs to invest in relevant educational efforts.