diff --git a/content/exercise-linear-tikhonov-inversion.md b/content/exercise-linear-tikhonov-inversion.md
index 41bc2b5..8252221 100644
--- a/content/exercise-linear-tikhonov-inversion.md
+++ b/content/exercise-linear-tikhonov-inversion.md
@@ -63,7 +63,7 @@ The numerical details regarding how we obtain a solution to the inverse problem
 - _Do you get the same model with different ranges of $\beta$ values?_
 - _What happens if the range of $\beta$ values is too small? - try several small ranges_
 - _What happens if there are only a few values of $\beta$ over a large range?_
- - _Does the optimal value $\beta^_$ stay the same or similar when any of the changes in this investigation are made?\*
+ - _Does the optimal value {math}`\beta^*` stay the same or similar when any of the changes in this investigation are made?_
 - _In Explore mode view the $\phi_d~\text{vs}~\phi_m$ plot and adjust $\beta_i$ - similar to the examples in _[_2.7. Objective Function for the Inverse Problem_](oxa:VNMrkxzChhdveZyf6lmb/46OlD42gDBzA8SkHBwSK '2.7. Objective Function for the Inverse Problem')_-{numref}`Figure %s ` and {numref}`Figure %s `_
 - _How does the model change when a $\beta$ that is too large is chosen? (overfitting the data)_
 - _How does the model change when a $\beta$ that is too small is chosen? (underfitting the data)_
diff --git a/content/forward-problem.md b/content/forward-problem.md
index f7de0ce..dd88c85 100644
--- a/content/forward-problem.md
+++ b/content/forward-problem.md
@@ -2,187 +2,117 @@
 title: Forward Problem
 description: ''
 date: '2021-07-27T18:32:54.975Z'
-name: forward-problem
 venue: Linear Tikhonov Inversion
-oxa: oxa:VNMrkxzChhdveZyf6lmb/xqM34l8moONMZ4iDWIQf
-tags: []
-keywords: []
---

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/DMWKEs4uFE0tm2fBjdUr.7","tags":[]}
-
For a linear system, the forward problem ({eq}`3ffde9a8`) can often be represented in the form of an integral equation (technically a Fredholm equation of the first kind) as shown below.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/kcG1fkQEwzcMhorgrH32.12","tags":[]}
-
```{math}
:label: f40ae85f
d_j=\int^b_ag_j(x)m(x)dx
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/9lMYejPm485dUq7eA2Wc.1","tags":[]}
-
The model $m(x)$ is a function defined on the closed region $[a,b]$, and $g_j(x)$ is the kernel function which encapsulates the physics of the problem. The datum, $d_j$, is the inner product of the kernel and the model. It is sometimes helpful to think of the kernel as the “window” through which a model is viewed. In this section we will step through the essential details for carrying out the forward modelling. We first design a model, introduce a “mesh” to discretize it, discretize the kernels and form sensitivities, generate the data through a matrix-vector multiplication, and then add noise. These data will then be inverted.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/KNLhGSpYYfAko6951cZ1.5","tags":[]}
-
For our synthetic problem, we start by creating the function that we will later want to retrieve with the inversion. The model can be any function but here we combine a background, box car, and a Gaussian; the domain will be \[0,1\], shown in [Figure A](https://curvenote.com/oxa:VNMrkxzChhdveZyf6lmb/8WfrIYP5O9W6zakEQaa3.7).

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/8WfrIYP5O9W6zakEQaa3.7","tags":[]}
-
```{mdast} forward-problem.mdast.json#QZMVNZHNwK
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/vKzG4OucE2WYlaSjDkHV.4","tags":[]}
-
**Figure A:** Default model from the corresponding inversion application. The model combines a background value with box car and Gaussian functions on the domain \[0,1\]. [LinearTikhonovInversion_Notebook.ipynb](oxa:VNMrkxzChhdveZyf6lmb/lb7CgEnVPzfs79VcKpB1 'LinearTikhonovInversion_Notebook.ipynb')
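A minimal sketch of how such a model might be built on a discretized domain (numpy assumed; the background value, box car location, and Gaussian parameters below are illustrative choices, not the app's defaults):

```python
import numpy as np

M = 100                                # number of cells on the domain [0, 1]
x = (np.arange(M) + 0.5) / M           # cell-center locations

m = np.full(M, 0.1)                            # background value (assumed)
m[(0.2 < x) & (x < 0.4)] += 0.3                # box car (assumed position/amplitude)
m += 0.25 * np.exp(-((x - 0.7) / 0.05) ** 2)   # Gaussian (assumed center/width)
```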
-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/2L7K4fl4gBMQacD284lu.4","tags":[]}
-
### Mesh

In our next step we design a mesh on which our model is defined and on which all of the numerical computations are carried out. We discretize our domain into $M$ cells of uniform thickness. If we think about the “x-direction” as being depth, then this discretization would be like having a layered earth.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/TWKjAN0EuRr4mN1jslTl.2","tags":[]}
-
```{figure} images/VNMrkxzChhdveZyf6lmb-TWKjAN0EuRr4mN1jslTl-v2.png
:name: TWKjAN0EuRr4mN1jslTl
:align: center
:width: 70%
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/T6sWWsMvnDL5JuJbHPsm.4","tags":[]}
-
Our “model” is now an $M$-length vector $\mathbf m = (m_1, m_2, …, m_M)$. In fact, the function plotted in [Figure A](https://curvenote.com/oxa:VNMrkxzChhdveZyf6lmb/8WfrIYP5O9W6zakEQaa3.7) has already been discretized.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/qvGJ12IANr3L9Jk4UzMn.9","tags":[]}
-
### Kernels and Data

Our goal is to carry out an experiment that produces data that are sensitive to the model shown in [Figure A](https://curvenote.com/oxa:VNMrkxzChhdveZyf6lmb/8WfrIYP5O9W6zakEQaa3.7). For our linear system ({eq}`f40ae85f`) this means choosing the kernel functions. In reality, these kernel functions are controlled by the governing physical equations and the specifics of the sources and receivers for the experiment. For our investigation we select oscillatory functions which decay with depth. These are chosen because they are mathematically easy to manipulate and they also have a connection with many geophysical surveys; for example, in a frequency domain electromagnetic survey a sinusoidal wave propagates into the earth and continually decays as energy is dissipated. The kernel $g_j(x)$ corresponding to $d_j$ is given by

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/xQEIP2FNnxfk6ELkP28o.8","tags":[]}
-
```{math}
:label: 4f6a44b9
g_j(x)= e^{-p_jx}\cos(2\pi q_jx)
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/Y47bmurJfiCyDyYRpma2.2","tags":[]}
-
Thus $p_j$ controls the rate of decay of the kernel and $q_j$ controls the frequency; the kernel will undergo $q_j$ complete cycles in the domain \[0,1\]. In our example, each of the ranges $[p_{min}, p_{max}]$ and $[q_{min}, q_{max}]$ is divided into $N$ intervals but this is only for convenience. In principle these numbers can be arbitrarily specified. As an example the image below displays three kernels produced with $q = [1, 2, 3]$ and $p = [0, 1, 2]$. Note the successive decrease in amplitude at $x=1.0$.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/2btzRAPgnmItoV2GkvSV.10","tags":[]}
-
```{mdast} forward-problem.mdast.json#vGPcPxYoBq
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/RZ8O7dw1PjELPxImDYX7.5","tags":[]}
-
**Figure B:** Example of three kernels ({eq}`4f6a44b9`) for the app where q=\[1,2,3\] and p=\[0,1,2\]. [LinearTikhonovInversion_Notebook.ipynb](oxa:VNMrkxzChhdveZyf6lmb/lb7CgEnVPzfs79VcKpB1 'LinearTikhonovInversion_Notebook.ipynb')
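A short sketch of these kernels (numpy assumed; it simply evaluates {eq}`4f6a44b9` for the three $p, q$ pairs of Figure B and confirms the decreasing amplitude at $x=1$):

```python
import numpy as np

def kernel(x, p, q):
    # evaluates g_j(x) = exp(-p x) * cos(2 pi q x): p sets the decay, q the frequency
    return np.exp(-p * x) * np.cos(2 * np.pi * q * x)

x = np.linspace(0.0, 1.0, 101)
for p, q in zip([0, 1, 2], [1, 2, 3]):        # the three kernels of Figure B
    g = kernel(x, p, q)
    print(f"p={p}, q={q}: g(1.0) = {g[-1]:+.3f}")   # +1.000, +0.368, +0.135
```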
-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/NbyT8tGzv13fNGkHWrYF.6","tags":[]}
-
To simulate the data we need to evaluate {eq}`f40ae85f`. The model has been discretized with the 1D mesh. The expression for the data becomes

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/Gm7gHsFXic5G1yoZSCkn.10","tags":[]}
-
```{math}
:label: be6adec9
\begin{aligned}
d_j&=\int_0^{x_1}g_j(x)m_1dx +\int_{x_1}^{x_2}g_j(x)m_2dx+\dots \\
&=\sum^M_{k=1}\left(\int_{x_{k-1}}^{x_k}g_j\left(x\right)dx\right)m_k\\
&= \mathbf g_j \mathbf m
\end{aligned}
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/wjpBYcrqF20qxcYzGhmG.2","tags":[]}
-
where $\mathbf g_j$ is now referred to as a sensitivity kernel. When the discretization is uniform, the only difference between the kernel $g_j(x)$ and the sensitivity $\mathbf g_j$ is a scaling factor that is equal to the discretization width. However, for nonuniform meshes these quantities can look quite different, and confusing kernels and sensitivities can lead to unintended consequences in an inversion. We shall make this distinction clear at the outset.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/QAUzP83WiVsFwT45K4v2.1","tags":[]}
-
To expand the above to deal with $N$ data, we define a sensitivity matrix $\mathbf{G}$. The $j^{th}$ row of $\mathbf{G}$ is formed by $\mathbf g_j$, so $\mathbf{d}$ looks like

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/UY5AXhKxS1h3WyGYrl2H.13","tags":[]}
-
```{math}
:label: d031bcf8
\begin{aligned} \mathbf{d} = \mathbf{G}\mathbf{m} = \begin{bmatrix} d_1\\ \vdots\\ d_{N} \end{bmatrix}\end{aligned}
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/Ry2YW3DdFtdKzELaESoU.1","tags":[]}
-
where the individual elements of $\mathbf{G}$ are

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/2phjcwor9mrVgYufGT39.7","tags":[]}
-
```{math}
:label: 2664f2ef
G_{jk} = \int_{x_{k-1}}^{x_k} g_j(x) dx
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/UO99dTvoas4VsZCvxLKG.7","tags":[]}
-
$\mathbf{G}$ is an $N \times M$ matrix ($N$ data and $M$ model elements). Using the model in [Figure A](https://curvenote.com/oxa:VNMrkxzChhdveZyf6lmb/8WfrIYP5O9W6zakEQaa3.7) and our sensitivity matrix $\mathbf{G}$ we forward model the data. The model, rows of the sensitivity matrix, and corresponding data are shown in [Figure C](https://curvenote.com/oxa:VNMrkxzChhdveZyf6lmb/gqkXJ3NUlhOxm8lbUq2p.14). The data are considered “clean” or “true” because they do not contain noise.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/gqkXJ3NUlhOxm8lbUq2p.14","tags":[]}
-
```{mdast} forward-problem.mdast.json#tx5YkrqMmm
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/RFRP8LmlqdS43tHPhh8p.5","tags":[]}
-
**Figure C:** Default display from the app of the model, rows (sensitivity kernels) of the matrix $\mathbf{G}$, and clean data. [LinearTikhonovInversion_Notebook.ipynb](oxa:VNMrkxzChhdveZyf6lmb/lb7CgEnVPzfs79VcKpB1 'LinearTikhonovInversion_Notebook.ipynb')
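As a sketch of the forward modelling, the cell integrals of {eq}`2664f2ef` can be approximated with a midpoint rule (numpy assumed; the $p$ and $q$ ranges below are assumptions, not the app's defaults):

```python
import numpy as np

M, N = 100, 20
x = np.linspace(0.0, 1.0, M + 1)              # cell boundaries
dx = x[1] - x[0]
xc = 0.5 * (x[:-1] + x[1:])                   # cell centers

p = np.linspace(0.0, 2.0, N)                  # decay rates (assumed range)
q = np.linspace(1.0, 3.0, N)                  # frequencies (assumed range)

# G_jk ~ g_j(xc_k) * dx, a midpoint approximation of the integral over cell k
G = np.exp(-np.outer(p, xc)) * np.cos(2.0 * np.pi * np.outer(q, xc)) * dx

m = np.full(M, 0.1)                           # any discretized model (Figure A's would go here)
d_clean = G @ m                               # the clean data d = G m
```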
-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/BkSnNlITIGCWKaHVZM7F.3","tags":[]}
-
### Adding Noise

Until now, we have only calculated the data $\mathbf{d}$, but observed data $\mathbf{d}^{obs}$ contain additive noise,

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/GtIrygcvqO2irqYxf3Ek.8","tags":[]}
-
```{math}
:label: adc69a91
\mathbf{d}^{obs}=\mathbf{d}+\mathbf{n}
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/wjec6ZKKu9MOlOxJmh9v.2","tags":[]}
-
Throughout our work, the noise for a datum $d_j$ is assumed to be a realization of a Gaussian random variable with zero mean and standard deviation

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/BGPdowi0KZw5sVvtq3iu.9","tags":[]}
-
```{math}
:label: 9e813186
\epsilon_j = \%|d_j| + \nu_j
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/qMpnnaEGMyfNg4YfvMrb.4","tags":[]}
-
that is, a percentage of the datum plus a floor. The reason for this choice is as follows. In every experiment there is a base-level of noise due to instrument precision and other factors such as wind noise or ground vibrations. This can be represented as a Gaussian random variable with zero mean and standard deviation $\nu_j$, and a single value might be applicable for all of the data in the survey. This is often the case when the data do not have a large dynamic range, such as might be found in gravity or magnetic data. In other cases the data can have a large dynamic range, such as in DC resistivity surveys or time domain EM data. To capture uncertainties in such data, a percentage value is more appropriate. If data range from $1.0$ to $10^{-4}$ then a standard deviation of $10\%$ of the smallest datum is likely an under-estimate for the datum that has unit amplitude.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/wMAfxMpTBwoswZfrOVbY.3","tags":[]}
-
These are important concepts and we’ll revisit them in more detail later. For now it suffices that “noise” can be added to the data according to {eq}`adc69a91`. Here we choose $\epsilon_j = 0.03$. An example of the clean (true) data, a realization of the noise, and the noisy data are shown in [Figure D](https://curvenote.com/oxa:VNMrkxzChhdveZyf6lmb/f4ZOfRjwPn87rmRQmgT3.9). The error bars are superposed on the noisy data.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/f4ZOfRjwPn87rmRQmgT3.9","tags":[]}
-
```{mdast} forward-problem.mdast.json#NfDRb6qOjD
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/qtFqSCengsdJ79xXoIN7.4","tags":[]}
-
**Figure D:** Display of the clean data from Figure C with the added noise to create the noisy data. [LinearTikhonovInversion_Notebook.ipynb](oxa:VNMrkxzChhdveZyf6lmb/lb7CgEnVPzfs79VcKpB1 'LinearTikhonovInversion_Notebook.ipynb')
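A sketch of this noise model applied to clean data (numpy assumed; the clean data here are a placeholder for $\mathbf{d} = \mathbf{G}\mathbf{m}$ from the previous sketch):

```python
import numpy as np

rng = np.random.default_rng(42)               # seeded so the realization is repeatable

d_clean = 0.5 * np.cos(np.linspace(0, 2 * np.pi, 20))   # placeholder clean data
percent, floor = 0.0, 0.03                    # here eps_j is a pure floor of 0.03
eps = percent * np.abs(d_clean) + floor       # per-datum standard deviation, as in the equation above
d_obs = d_clean + rng.normal(0.0, eps)        # observed data with additive Gaussian noise
```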
-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/jLmU23G920AaVhBYnOTC.10","tags":[]}
-
The construction of the forward problem for the 1D synthetic example provided many of the elements needed for the inverse problem. We have observed data, an estimate of the uncertainty within the data, the ability to forward model, and we have discretized our problem. Our goal is to find the model that gave rise to the data. Within the context of our flow chart in [2.2. Defining the Inverse Problem](oxa:VNMrkxzChhdveZyf6lmb/iPh2GcyPHcbKFJrzGLc4 '2.2. Defining the Inverse Problem')-{numref}`Figure %s ` the next two items to address are the misfit criterion and model norm. We first address the issue of data misfit.
diff --git a/content/inverse-problem-fundamental-challenges.md b/content/inverse-problem-fundamental-challenges.md
index fd08461..f5cca1b 100644
--- a/content/inverse-problem-fundamental-challenges.md
+++ b/content/inverse-problem-fundamental-challenges.md
@@ -132,7 +132,7 @@ Whereas in @fig:inverse-mapping the application of $\mathcal{F}^{-1}$ maps to a
 :width: 40%
 ```

-The nonuniqueness is exacerbated when we have a finite number of data contaminated with noise. As shown in {numref}`Figure %s ` below there are now more ways to interpolate the data and each interpolation will produce a different $v_{int}(t)$.
+The nonuniqueness is exacerbated when we have a finite number of data contaminated with noise. As shown in @a1e83824 below there are now more ways to interpolate the data and each interpolation will produce a different $v_{int}(t)$.

 ```{figure} images/VNMrkxzChhdveZyf6lmb-OaY83EoVsmYutnckX6Db-v1.png
 :name: a1e83824
diff --git a/content/model-norm.md b/content/model-norm.md
index c3abb10..177b418 100644
--- a/content/model-norm.md
+++ b/content/model-norm.md
@@ -2,97 +2,61 @@
 title: Model Norm
 description: ''
 date: '2021-07-27T19:02:24.879Z'
-name: model-norm
 venue: Linear Tikhonov Inversion
-oxa: oxa:VNMrkxzChhdveZyf6lmb/QI5J8WYB64kekRbVUIeg
-tags: []
-keywords: []
---

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/PRu8rIAtGUN3m6Qc2nM0.4","tags":[]}
-
Although all of our computations for solving the inverse problem are done using vectors and matrices, for most of our inverse problems we are attempting to find a function in 1D, 2D, or 3D that could have given rise to the data. When we address nonuniqueness it is helpful to remember that the theoretical underpinnings of our problem reside in a function space. As such we first introduce some norms for our function space and then discretize them with our mesh to get the final quantity to be minimized.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/s6G5GBmzkACaVMesCwQW.4","tags":[]}
-
### Smallest Model Norm

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/hptZlzW1HfA6qBTD3N54.4","tags":[]}
-
```{math}
:label: 5bb81c63
\phi_m=\int (m-m^{ref})^2 dx
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/NMv89WeOxYo3VtCHKolI.1","tags":[]}
-
This norm penalizes the amplitude difference between the sought model $m$ and the reference model $m^{ref}$. Minimizing $\phi_m$ produces a solution that is close to $m^{ref}$ everywhere on the domain. Incorporating $m^{ref}$ is a powerful way to include additional information into the inversion. It is common for inversionists to omit the reference model term and then say that they have not included a priori information. That is not correct; omitting it is the same as setting $m^{ref}=0$. The inversion will then produce a solution that is as close to zero as possible, and the details and character of that result might be quite incorrect.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/bnA45YXqCAIa26ekWFAJ.2","tags":[]}
-
### Flattest Model Norm

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/V0RZ0YR4ZH7hHSIFgXMR.4","tags":[]}
-
```{math}
:label: dd987b9d
\phi_m=\int \left(\frac{d(m-m^{ref})}{dx}\right)^2 dx
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/bkDiC02rXvbmZDW10emW.5","tags":[]}
-
This norm penalizes derivatives and reduces the amount of structure in the final solution. The reference model $m^{ref}$ can be kept in this term or omitted. In the latter case the final solution is smooth throughout the domain. In working with this norm we often talk about the “smoothest” model but any verbal confusion is clarified by the equations. The choice of including a nonzero reference depends upon what you know about the underlying model and your goals for the inversion. We will address these subtleties later.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/gY2E8FqgRyZQipsdTSsF.5","tags":[]}
-
### Combined Norms

In most cases we desire a model that is close to a prescribed reference model and is also smooth. Mathematically this can be accomplished by combining the above two norms in a single objective function

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/MWve8oSE6siZ6gn9obmF.6","tags":[]}
-
```{math}
:label: 05257042
\phi_m=\alpha_s\int (m-m^{ref})^2 dx+\alpha_x\int \left(\frac{d(m-m^{ref})}{dx}\right)^2 dx
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/mBesKSYLrgPUPGP3dlhD.2","tags":[]}
-
The quantities $\alpha_s$ and $\alpha_x$ are nonnegative constants that adjust the relative importance of the two terms.
If $\alpha_x=0$ then the result will be a smallest model (often with a substantial amount of structure), while setting $\alpha_s=0$ will generate a smooth model with fewer oscillations.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/BQdtW332fMdBy0N9dDto.6","tags":[]}
-
Intuitively, it might seem that if $\alpha_s = \alpha_x$ then the components of the model norm are equally important in controlling the final model. However, this is not the case. We note that the two integrals in {eq}`05257042` have different dimensions. The flattest model norm is scaled by $1/{dx^2}$ compared to the smallest model norm. If the cell size is 0.01, as it is in the app, then that amplifies the size of the norm by a factor of $10^4$. For the two terms $\alpha_s \phi_s$ and $\alpha_x \phi_x$ to have similar magnitudes, we need $\alpha_s \simeq 10^4\alpha_x$. This is a somewhat subtle point, but it is very important. The decision about what values of $\alpha$ are small, or large, depends upon the functions inside the integral and the discretization used in the problem. We shall revisit this aspect in later chapters.

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/vNc91OgEX3SLI9AnhFxC.1","tags":[]}
-
The function norms presented above have their counterparts in vector space. The smallest model norm can be written as

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/ZemZcVOzmmmfCXWHLIiW.4","tags":[]}
-
```{math}
:label: 64af53ab
\phi_m=\|\mathbf{m - m^{ref}}\|^2=\sum_{i=1}^M(m_i-m^{ref}_i)^2
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/zVs6tYheYq5ctDesqZ44.6","tags":[]}
-
For our toy problem, if $m^{ref}=0$, then the solution that fits the data is $\mathbf{m} = (0.392, 0.784)$ and $\phi_m = 0.768$. It is the black dot in [2.5. Nonuniqueness](oxa:VNMrkxzChhdveZyf6lmb/GnzA9JWkZwIOCRGpfoEU '2.5. Nonuniqueness')-{numref}`Figure %s `. If the reference model is $\mathbf{m^{ref}}=(1, 1)$ the solution is $\mathbf{m} = (0.8, 0.6)$ and $\phi_m = 0.2$. It is shown as the blue dot. The smoothness or flattest model term, written as

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/yETuK7zWoVYTBA0gnIC7.4","tags":[]}
-
```{math}
:label: 8242954b
\phi_m=\left\|\frac{d\mathbf{m}}{dx}\right\|^2=\sum_{i=1}^{M-1}(m_{i+1}-m_i)^2
```

-+++ {"oxa":"oxa:VNMrkxzChhdveZyf6lmb/UTHOKZGsHWf56LOVo2fi.2","tags":[]}
-
uses discrete differences and therefore has one fewer element in the summation. For our toy problem $\phi_m=(m_2-m_1)^2$; the solution is $\mathbf{m} = (0.667, 0.667)$ and $\phi_m = 0$, shown as the red dot in [2.5. Nonuniqueness](oxa:VNMrkxzChhdveZyf6lmb/GnzA9JWkZwIOCRGpfoEU '2.5. Nonuniqueness')-{numref}`Figure %s `.
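These discrete norms are simple to evaluate; a quick check of the toy-problem values quoted above, as a minimal sketch assuming numpy:

```python
import numpy as np

def phi_smallest(m, m_ref=0.0):
    # sum_i (m_i - m_ref_i)^2, the discrete smallest model norm
    return np.sum((m - m_ref) ** 2)

def phi_flattest(m):
    # sum_i (m_{i+1} - m_i)^2, with one fewer element in the sum
    return np.sum(np.diff(m) ** 2)

print(phi_flattest(np.array([2/3, 2/3])))                        # 0.0, the red dot
print(phi_smallest(np.array([0.8, 0.6]), np.array([1.0, 1.0])))  # 0.2, the blue dot
```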
diff --git a/content/nonuniqueness.md b/content/nonuniqueness.md
index 62f8206..a73af9d 100644
--- a/content/nonuniqueness.md
+++ b/content/nonuniqueness.md
@@ -7,7 +7,7 @@ venue: Linear Tikhonov Inversion

 An inversion attempts to solve a linear system of equations where we have fewer equations $(N)$ than we have unknowns $(M)$. In the app the default number of parameters was $100$ while the number of data was $20$. The system of equations is underdetermined and there is an infinite number of models that can fit the data.

-This nonuniqueness, and insight about how to deal with it, is exemplified by a toy example consisting of two model parameters $\mathbf m = (m_1, m_2)$ and one datum $m_1+2m_2=2$. What is the solution to this problem? In fact any point along the straight line in {numref}`Figure %s ` below is a valid solution. Possibilities include $\mathbf m = (1, 0.5)$ or $\mathbf m = (2, 0)$.
+This nonuniqueness, and insight about how to deal with it, is exemplified by a toy example consisting of two model parameters $\mathbf m = (m_1, m_2)$ and one datum $m_1+2m_2=2$. What is the solution to this problem? In fact any point along the straight line in @QnSQFBXrMEDwAXuinSO3 below is a valid solution. Possibilities include $\mathbf m = (1, 0.5)$ or $\mathbf m = (2, 0)$.

 ```{figure} images/VNMrkxzChhdveZyf6lmb-QnSQFBXrMEDwAXuinSO3-v2.png
 :name: QnSQFBXrMEDwAXuinSO3

@@ -17,7 +17,7 @@ This nonuniqueness, and insight about how to deal with t
 Nonuniqueness
 ```

-If our goal is to find a single “best” answer to our inverse problem then we clearly need to incorporate additional information as a constraint. A metaphor that is sometimes helpful is the problem of selecting a specific person in a school classroom as portrayed in the image below {cite:p}`{numref}`Figure %s <8Fvod2SfmrYQX8isU0AC>``. To select a single individual via an optimization framework, we need to define a ruler by which to measure each candidate. A ruler, when applied to any member of the set, generates a single number, and this allows us to find the biggest or smallest member as evaluated with that ruler. The potential rulers are unlimited in number and they could be associated with: height, age, length of fingers, amount of hair, number of wrinkles, etc. In general, choosing a different ruler yields a different solution; although it is possible that the youngest person is also the shortest.
+If our goal is to find a single “best” answer to our inverse problem then we clearly need to incorporate additional information as a constraint. A metaphor that is sometimes helpful is the problem of selecting a specific person in a school classroom as portrayed in the image below [@8Fvod2SfmrYQX8isU0AC]. To select a single individual via an optimization framework, we need to define a ruler by which to measure each candidate. A ruler, when applied to any member of the set, generates a single number, and this allows us to find the biggest or smallest member as evaluated with that ruler. The potential rulers are unlimited in number and they could be associated with: height, age, length of fingers, amount of hair, number of wrinkles, etc. In general, choosing a different ruler yields a different solution; although it is possible that the youngest person is also the shortest.

 ```{figure} images/VNMrkxzChhdveZyf6lmb-8Fvod2SfmrYQX8isU0AC-v2.png
 :name: 8Fvod2SfmrYQX8isU0AC

@@ -27,6 +27,6 @@ If our goal is to find a single “best” answer to our inverse problem then we
 Selecting a specific person in a school classroom.
 ```

-The analogy with vectors is straight forward. Rulers for measuring length of vectors are quantified via a norm. For instance in the toy example above, the smallest Euclidean length vector that lies along the constraint line is shown by the red dot {cite:p}`{numref}`Figure %s ``. Of all of the points it is the one that is closest to the origin, that is, the one for which $(m_1^2 + m_2^2)$ is smallest.
+The analogy with vectors is straightforward. Rulers for measuring the length of vectors are quantified via a norm. For instance in the toy example above, the smallest Euclidean length vector that lies along the constraint line is shown by the red dot [@QnSQFBXrMEDwAXuinSO3]. Of all of the points it is the one that is closest to the origin, that is, the one for which $(m_1^2 + m_2^2)$ is smallest.

A methodology by which we can get a solution to the inverse problem is now clear. We define a “ruler” that can measure the size of each element in our solution space. We choose the one that has the smallest length and that still adequately fits the data. The name for our “ruler” varies but any of the following descriptors are valid: model norm, model objective function, regularization function, or regularizer. We’ll use the term “model norm” since we are explicitly concerned with ranking the elements by their “size”.
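For the underdetermined toy problem this smallest (minimum Euclidean norm) model can be computed in closed form as $\mathbf m = A^T(AA^T)^{-1}\mathbf b$, a standard least-norm result; a minimal sketch, assuming numpy:

```python
import numpy as np

# one datum, two unknowns: A m = b with A = [1, 2] and b = 2
A = np.array([[1.0, 2.0]])
b = np.array([2.0])

# minimum-norm solution m = A^T (A A^T)^{-1} b, i.e. the red dot on the line
m = A.T @ np.linalg.solve(A @ A.T, b)
print(m)    # [0.4 0.8]: the point on the constraint line closest to the origin
```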
diff --git a/content/objective-function-for-the-inverse-problem.md b/content/objective-function-for-the-inverse-problem.md
index 40bec21..4f7c165 100644
--- a/content/objective-function-for-the-inverse-problem.md
+++ b/content/objective-function-for-the-inverse-problem.md
@@ -23,7 +23,7 @@ As a metaphorical example to understand the role of $\beta$ , we consider the op
 \phi=T+\beta F
 ```

-When $\beta → 0$ we minimize the time irrespective of the fuel consumption. The gas peddle is on the floor. When $\beta → \infin$ the driver wants to use the absolute minimum amount of fuel so the gas peddle is barely engaged. This is displayed in {numref}`Figure %s ` where both T and F are plotted as a function of $\beta$. It is customary have the $\beta$ axis extend from a high value $\beta_H$ to a low value $\beta_L$ and this is indicated in the first two plots. A plot of $T~\text{vs}~F$ is shown in the third plot of {numref}`Figure %s `. This is a monotonic curve and each point on the curve corresponds to a single $\beta$.
+When $\beta → 0$ we minimize the time irrespective of the fuel consumption. The gas pedal is on the floor. When $\beta → \infty$ the driver wants to use the absolute minimum amount of fuel so the gas pedal is barely engaged. This is displayed in {numref}`Figure %s ` where both $T$ and $F$ are plotted as a function of $\beta$. It is customary to have the $\beta$ axis extend from a high value $\beta_H$ to a low value $\beta_L$ and this is indicated in the first two plots. A plot of $T~\text{vs}~F$ is shown in the third plot of {numref}`Figure %s `. This is a monotonic curve and each point on the curve corresponds to a single $\beta$.

 ```{figure} images/VNMrkxzChhdveZyf6lmb-d31AdH08dO30KTmaxlYO-v2.png
 :name: d31AdH08dO30KTmaxlYO