
# Inference of Mass Eruption Rate from uncertain observations of Plume Height

In `Example1 <sphx_glr_examples_example1.py>`, we showed how merph can be used to generate plausible estimates of the Mass Eruption Rate (MER) from a single observation of the plume height (H) using Mastin's data set.

Here we explore a more likely situation, where our observation of the plume height is uncertain, and illustrate how sampling from the posterior predictive distribution provides a useful method of calculating a distribution of plausible values for the mass eruption rate.

In this example we will again use the Mastin dataset.


<div class="alert alert-info"><h4>Note</h4><p>This example can be viewed as a Jupyter notebook, launched from the command line using `merph --example 2` or from an interactive Python session using the commands below.</p></div>

```python
import merph
merph.launch_jupyter_example(2)
```


## Imports and data loading

Import the merph package as well as:
- numpy for numerical data
- pandas for data frames
- matplotlib for plotting
- some scipy.stats functions



In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import truncnorm, uniform

import merph

Load the Mastin dataset from `merph`



In [None]:
Mastin = merph.load_Mastin()

We will set the variables, with plume height $H$ as the independent (explanatory) variable and mass eruption rate $Q$ as the dependent response variable:



In [None]:
Mastin.set_vars(xvar='H', yvar='Q')

## Multiple plausible observations

In `Example1 <sphx_glr_examples_example1.py>`, we performed posterior prediction using a single value of the plume height ($H = 10$ km).

First, we will compare the posterior predictive distributions when different values of the plume height are used.

Suppose we have two methods of observing the plume (e.g. ground-based observer and satellite remote sensing), with the first providing a plume height of $H = 10$ km while the second gives a plume height of $H = 15$ km.  We can produce two posterior predictive distributions for the mass eruption rate using these observations:



In [None]:
pred = Mastin.posterior_predictive([10, 15])

To explore the differences in the MER estimates, we will simulate 1000 samples from the posterior predictive distribution for each of these observations, returning the results as a pandas dataframe and plotting histograms of the two posterior predictive distributions.



In [None]:
mer_predictions = pred.simulate(1000, plot=True)
mer_predictions

We see that while there is some overlap in the posterior predictive distributions for the MER, the median values are clearly distinct:



In [None]:
mer_median = pred.median()
print('Median value of the posterior predictive distributions for MER:')
print(f'  MER | H = 10 km: {mer_median[0]:.3g} kg/s')
print(f'  MER | H = 15 km: {mer_median[1]:.3g} kg/s')

We can also evaluate the posterior predictive distribution at specified values of the MER, using the methods in `posterior_probability`.

For example, we can find the probability density that the MER is 5 million kg/s for each of the plume heights.



In [None]:
Q = 5e6
mer_pdf = pred.pdf(Q)
print(f"Probability density for MER = {Q:.3g} kg/s with:")
print(f"   H = 10 km: {mer_pdf[0][0]: .3e}")
print(f"   H = 15 km: {mer_pdf[1][0]: .3e}")

<div class="alert alert-info"><h4>Note</h4><p>The posterior probability densities integrate to one, so the values of the pdf are small, since the MER varies over several orders of magnitude.</p></div>
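To see why the density values are so small, it can help to compare a density in $Q$ with the corresponding density in $\log_{10} Q$. The sketch below assumes, purely for illustration, that $\log_{10} Q$ is Normal with a hypothetical mean of 6 and standard deviation of 0.5 (these are not values fitted by merph), and uses only the Python standard library.

```python
import math
from statistics import NormalDist

# Hypothetical predictive distribution for log10(Q); NOT fitted by merph.
log_q_dist = NormalDist(mu=6.0, sigma=0.5)


def pdf_q(q):
    """Density of Q (per kg/s) obtained from the density of log10(Q)
    by the change of variables dlog10(Q)/dQ = 1 / (Q ln 10)."""
    return log_q_dist.pdf(math.log10(q)) / (q * math.log(10))


q = 5e6
density_log = log_q_dist.pdf(math.log10(q))  # per unit of log10(Q)
density_lin = pdf_q(q)                       # per kg/s

print(f"density in log10(Q): {density_log:.3e}")
print(f"density in Q:        {density_lin:.3e}")
```

The two numbers describe the same distribution; the density per kg/s is smaller by the factor $Q \ln 10$, which is of order $10^7$ here.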

We could also find the probability that the MER is less than a specified value, say 10 million kg/s, for each observation,



In [None]:
Q = 1e7
mer_prob = pred.cdf(Q)
print(f"Probability that MER <= {Q:.3g} kg/s with:")
print(f"   H = 10 km: {mer_prob[0][0]: .3f}")
print(f"   H = 15 km: {mer_prob[1][0]: .3f}")

or we could find the probability that the MER lies in some specified interval, say 4 -- 6 million kg/s, for each observation:



In [None]:
Q0, Q1 = 4e6, 6e6
mer_prob_interval = pred.cdf(Q1) - pred.cdf(Q0)
print(f"Probability that {Q0: .3g} <= MER <= {Q1:.3g} kg/s with:")
print(f"   H = 10 km: {mer_prob_interval[0][0]:0.3f}")
print(f"   H = 15 km: {mer_prob_interval[1][0]:0.3f}")
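The interval probability is a difference of two cdf values, which can be checked against the fraction of random samples that fall in the interval. The sketch below does this for a hypothetical Normal distribution of $\log_{10} Q$ (mean 6, standard deviation 0.5; illustrative values, not merph output) using only the standard library.

```python
import math
import random
from statistics import NormalDist

# Hypothetical predictive distribution for log10(Q); NOT fitted by merph.
MU, SIGMA = 6.0, 0.5
log_q_dist = NormalDist(mu=MU, sigma=SIGMA)

q0, q1 = 4e6, 6e6

# Exact interval probability: F(Q1) - F(Q0), evaluated in log10 space.
exact = log_q_dist.cdf(math.log10(q1)) - log_q_dist.cdf(math.log10(q0))

# Monte Carlo estimate: fraction of simulated MER values inside [Q0, Q1].
rng = random.Random(0)
n = 100_000
hits = sum(q0 <= 10 ** rng.gauss(MU, SIGMA) <= q1 for _ in range(n))
estimate = hits / n

print(f"cdf difference:       {exact:.4f}")
print(f"Monte Carlo estimate: {estimate:.4f}")
```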

Another useful method is `ppf`, which computes the percent point function (the inverse of the cdf).  This allows us to find the MER $Q$ such that $P(\mathrm{MER} \le Q) = p$ for a specified $0<p<1$.

For example, we can find the 90th-percentile MER (the value exceeded with probability 0.1) for each observation:



In [None]:
mer_upper10 = pred.ppf(0.9)
print("90th-percentile value of the posterior predictive distribution for MER:")
print(f"   H = 10 km: {mer_upper10[0][0]: .3g} kg/s")
print(f"   H = 15 km: {mer_upper10[1][0]: .3g} kg/s")
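As a quick illustration of the inverse relationship between the cdf and the percent point function, the sketch below uses a hypothetical Normal distribution for $\log_{10} Q$ (mean 6, standard deviation 0.5; not values from merph). It also uses the fact that percentiles are preserved by monotonic transformations, so the 90th percentile of $Q$ is 10 raised to the 90th percentile of $\log_{10} Q$.

```python
from statistics import NormalDist

# Hypothetical predictive distribution for log10(Q); NOT fitted by merph.
log_q_dist = NormalDist(mu=6.0, sigma=0.5)

p = 0.9
log_q90 = log_q_dist.inv_cdf(p)      # percent point function in log10 space
q90 = 10 ** log_q90                  # same percentile, expressed in kg/s

round_trip = log_q_dist.cdf(log_q90)  # cdf(ppf(p)) recovers p
exceedance = 1.0 - round_trip         # probability that the MER exceeds q90

print(f"90th-percentile MER: {q90:.3g} kg/s")
print(f"cdf at that value:   {round_trip:.3f}")
print(f"exceedance:          {exceedance:.3f}")
```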

## Plausible range of observations

In other situations, we may have an instrument that provides a plume height observation within a range (e.g. a radar observation with a known error bar).

Here we can simulate the posterior distribution of the MER with plume heights sampled from the specified range.

The `joint_predictive` subclass allows us to input a probability distribution to represent the observational uncertainty and to draw samples from the joint posterior predictive distribution, using the relationship

\begin{align}p(Q,H) = p(Q|H)p(H)\end{align}
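This factorisation suggests a two-step sampling scheme: draw a plume height from $p(H)$, then draw an MER from $p(Q|H)$ at that height. The sketch below is a minimal standard-library illustration of the idea, not merph's implementation. It assumes a hypothetical log-linear relation $\log_{10} Q = A + B \log_{10} H$ with Normal scatter; the constants `A`, `B` and `S` are invented for the example.

```python
import math
import random

# Hypothetical regression constants; NOT the values fitted by merph.
A, B, S = 2.0, 4.0, 0.5

rng = random.Random(1)


def sample_joint(n, h_low=10.0, h_high=15.0):
    """Draw n (H, Q) pairs: H from a uniform p(H), then Q from p(Q | H)."""
    pairs = []
    for _ in range(n):
        h = rng.uniform(h_low, h_high)                # step 1: H ~ p(H)
        log_q = rng.gauss(A + B * math.log10(h), S)   # step 2: log10(Q) | H
        pairs.append((h, 10 ** log_q))
    return pairs


pairs = sample_joint(5)
for h, q in pairs:
    print(f"H = {h:5.2f} km, Q = {q:.3g} kg/s")
```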


### Uniform distribution

Suppose the plume height is measured in the range 10 -- 15 km, where any value in the range is equally likely.  We can construct a uniform distribution on this range (using the `uniform` function from `scipy.stats`):



In [None]:
fH = uniform(loc=10, scale=5)
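Note that `scipy.stats.uniform` is parameterised by `loc` and `scale`, with support on the interval from `loc` to `loc + scale`, so `scale` is the width of the range rather than its upper end. The short check below confirms the end points and the constant density for the distribution defined above.

```python
from scipy.stats import uniform

fH = uniform(loc=10, scale=5)  # uniform on [loc, loc + scale] = [10, 15] km

lower, upper = fH.ppf([0.0, 1.0])  # end points of the support
print(f"support: [{lower:.1f}, {upper:.1f}] km")
print(f"mean:    {fH.mean():.2f} km")
print(f"pdf(12): {fH.pdf(12.0):.2f} per km")  # 1 / scale inside the support
print(f"pdf(16): {fH.pdf(16.0):.2f} per km")  # zero outside the support
```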

We then use this distribution in the `joint_predictive` subclass of Mastin's data:



In [None]:
joint_uniform = Mastin.joint_predictive(fH)

and draw MER values from the posterior predictive distribution.



In [None]:
HQ = joint_uniform.rvs(10)
print(f'H samples: {HQ[0,:]}')
print(f'Q samples: {HQ[1,:]}')

The `simulate` method returns random samples from the joint distribution as a DataFrame.  We can plot these as a scatter plot, with marginal histograms for the MER and plume height.



In [None]:
joint_uniform_sample = joint_uniform.simulate(1000, plot=True)
joint_uniform_sample

The scatter plot shows the posterior predictive samples of the MER as a function of the uncertain observed plume height.  We see that, for any given plume height, there is a distribution of plausible MER values (as expected, given the uncertainty in the data set from which the regression is derived).  With the uncertainty in the observation included, the distribution of MER lies between those with precise observations at the extreme values, as shown below:



In [None]:
post_10 = Mastin.posterior_predictive(np.log10(10), logscale=True)
post_15 = Mastin.posterior_predictive(np.log10(15), logscale=True)

logq = np.linspace(4,9,100)

fig, ax = plt.subplots()
ax.hist(joint_uniform_sample['log Q'], bins='auto', density=True, histtype='step', label=r'10 < H < 15')
ax.plot(logq, post_10.pdf(logq).squeeze(), label=r'H = 10')
ax.plot(logq, post_15.pdf(logq).squeeze(), label=r'H = 15')
ax.legend()
ax.set_xlabel('log Q')
_ = ax.set_ylabel('probability density')
plt.show()
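The same behaviour can be reproduced with a toy model. The sketch below assumes a hypothetical log-linear relation with Normal scatter (the constants `A`, `B` and `S` are invented and are not merph's fitted values) and shows that, with $H$ uniform on 10 -- 15 km, the mean of $\log_{10} Q$ falls between the two fixed-height means, while the spread is wider than the conditional scatter alone.

```python
import math
import random
from statistics import fmean, stdev

# Hypothetical regression constants; NOT the values fitted by merph.
A, B, S = 2.0, 4.0, 0.5

rng = random.Random(7)
n = 20_000

# Sample log10(Q) with the plume height drawn uniformly from 10 to 15 km.
log_q_joint = []
for _ in range(n):
    h = rng.uniform(10.0, 15.0)
    log_q_joint.append(rng.gauss(A + B * math.log10(h), S))

mean_10 = A + B * math.log10(10.0)  # conditional mean at H = 10 km
mean_15 = A + B * math.log10(15.0)  # conditional mean at H = 15 km
mean_joint = fmean(log_q_joint)
sd_joint = stdev(log_q_joint)

print(f"mean log10(Q): H = 10: {mean_10:.3f}, uncertain H: {mean_joint:.3f}, H = 15: {mean_15:.3f}")
print(f"std  log10(Q): fixed H: {S:.3f}, uncertain H: {sd_joint:.3f}")
```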

If we are able to better constrain the uncertainty in the plume height observation, we can slice into the pandas DataFrame of the posterior predictive samples to obtain a new distribution of MER estimates.

For example, if the plume height can be constrained to the range 12 -- 14 km, we obtain MER estimates with:



In [None]:
joint_uniform_constrained = joint_uniform_sample.loc[(joint_uniform_sample.H >= 12.) & (joint_uniform_sample.H <= 14.)]
joint_uniform_constrained

<div class="alert alert-info"><h4>Note</h4><p>Slicing into the dataframe gives a smaller number of samples, so that the posterior predictive distribution for the MER is less well resolved.</p></div>
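The number of samples that survive the slice can be estimated in advance: for a uniform distribution the expected fraction is the ratio of the interval widths, here $(14 - 12)/(15 - 10) = 0.4$. A minimal standard-library check, with an arbitrary seed:

```python
import random

rng = random.Random(42)
n = 1000

# Plume heights drawn uniformly from the original 10 -- 15 km range.
h_samples = [rng.uniform(10.0, 15.0) for _ in range(n)]

# Keep only those inside the tighter 12 -- 14 km range.
kept = [h for h in h_samples if 12.0 <= h <= 14.0]
fraction = len(kept) / n

print(f"kept {len(kept)} of {n} samples (fraction {fraction:.2f}, expected about 0.40)")
```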

An alternative, about as quick to implement, is to redefine the distribution of the observations and resample the posterior predictive distribution, which lets us specify the number of samples.



In [None]:
fH = uniform(loc=12, scale=2)
joint_uniform = Mastin.joint_predictive(fH)
joint_uniform_sample = joint_uniform.simulate(1000, plot=True)
joint_uniform_sample

### Truncated Normal distribution and general PDF

It may not be appropriate to characterize the uncertainty in the plume height observation using a uniform distribution.  There may be reasons to prefer some values in an interval.

If we can specify a probability distribution for the plume height observation then, similarly to above, we can draw samples from this distribution and then sample the posterior predictive distribution of the MER for each plume height sample.

As an example, suppose we can characterize the plume height observation as a truncated Normal distribution on the interval 10 -- 15 km, where the underlying (untruncated) Normal distribution has a mean of 11 km and a standard deviation of 1 km.



In [None]:
H_lower = 10. # Lower bound of the plume height observation interval
H_upper = 15. # Upper bound of the plume height observation interval
H_mean = 11. # Mean of the plume height observation distribution
H_std = 1. # Standard deviation of the plume height observation distribution

# Find the endpoints for the truncnorm class from scipy
a = (H_lower - H_mean)/H_std
b = (H_upper - H_mean)/H_std

H = truncnorm(a, b, loc=H_mean, scale=H_std) # Initialize and freeze the distribution
fig, ax = plt.subplots()
ax.hist(H.rvs(1000), density=True) # Draw 1000 samples from the truncated normal distribution
hh = np.linspace(H_lower, H_upper, 100)
ax.plot(hh,H.pdf(hh)) # Plot the pdf
ax.set_xlabel('H (km)')
_ = ax.set_ylabel('Probability density')
plt.show()
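In `truncnorm`, the bounds `a` and `b` are expressed in standard deviations from `loc`, and `loc` and `scale` describe the untruncated Normal. Truncation therefore shifts the mean: with more mass removed from the lower tail than from the upper tail, the mean of the truncated distribution lies above 11 km. The sketch below evaluates this with the standard formula for the truncated Normal mean, and also shows inverse-transform sampling, using only the standard library.

```python
from statistics import NormalDist

H_lower, H_upper = 10.0, 15.0   # truncation interval (km)
H_mean, H_std = 11.0, 1.0       # parameters of the untruncated Normal

std_normal = NormalDist()
a = (H_lower - H_mean) / H_std  # standardised lower bound
b = (H_upper - H_mean) / H_std  # standardised upper bound

# Probability mass of the untruncated Normal that lies inside [a, b].
mass = std_normal.cdf(b) - std_normal.cdf(a)

# Mean of the truncated Normal (closed-form expression).
trunc_mean = H_mean + H_std * (std_normal.pdf(a) - std_normal.pdf(b)) / mass


def trunc_ppf(p):
    """Inverse cdf of the truncated Normal: rescale p into [Phi(a), Phi(b)]."""
    u = std_normal.cdf(a) + p * mass
    return H_mean + H_std * std_normal.inv_cdf(u)


print(f"mean of truncated distribution:   {trunc_mean:.3f} km")
print(f"median of truncated distribution: {trunc_ppf(0.5):.3f} km")
```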

We can construct a joint distribution using the truncated normal distribution for the observations of `H`.



In [None]:
joint_truncnorm = Mastin.joint_predictive(H)

This approach works for general PDFs for the uncertain observation.  Any of the continuous distributions in `scipy.stats` can be used, or a custom distribution can be defined by subclassing its `rv_continuous` class.
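As a sketch of the second option, the block below defines a custom distribution with `rv_continuous`. The class name `DecreasingHeight` and its linearly decreasing density on 10 -- 15 km are invented for illustration. Whether `joint_predictive` expects the instance itself or a frozen distribution (obtained by calling the instance, as in `dist()`) is not confirmed here.

```python
from scipy.stats import rv_continuous


class DecreasingHeight(rv_continuous):
    """Hypothetical plume height distribution on [10, 15] km whose
    density decreases linearly from 0.4 per km at 10 km to zero at 15 km."""

    def _pdf(self, x):
        return 2.0 * (15.0 - x) / 25.0

    def _cdf(self, x):
        # Closed-form cdf, so scipy need not integrate _pdf numerically.
        return 1.0 - (15.0 - x) ** 2 / 25.0


# a and b set the lower and upper bounds of the support.
dist = DecreasingHeight(a=10.0, b=15.0, name="decreasing_height")

median = dist.ppf(0.5)                     # solved numerically from _cdf
samples = dist.rvs(size=5, random_state=0)

print(f"median plume height: {median:.3f} km")
print(f"samples: {samples}")
```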

### Selecting plume height -- MER pairs

It is important to remember that the MER estimates are conditional on the plume height observation (i.e. we are drawing samples from the distribution of MER | H).

We may wish to construct a set of pairs of plume height, $H$, and mass eruption rate, $Q$, that can be used e.g. to forecast ash dispersion.

To do this, we can draw $H$ and $Q$ from each of their distributions at specified probability levels.  The weight attached to each pair follows from the product rule for conditional probabilities, $p(Q,H) = p(Q|H)p(H)$.  Because $H$ is continuous, the quantity printed as `P(H = ...)` in the example below is a probability density rather than a probability.  The example uses the truncated Normal distribution for $H$ as above:



In [None]:
H_dist = truncnorm(a, b, loc=H_mean, scale=H_std) # Specify the truncated Normal as a frozen distribution
H_90 = H_dist.ppf(0.9) # Get the height that corresponds to the 90th percentile of the distribution for the plume height
print(f"90th percentile of plume height: {H_90:.2f} km")

Q_H90 = Mastin.posterior_predictive(H_90) # Set this as the observation

Q90_H90 = Q_H90.ppf(0.9)[0][0] # Determine the 90th percentile value of the posterior predictive distribution at this plume height.

print('High plume with large MER:')
print(f'P(Q < {Q90_H90: .3g} | H = {H_90: .3f}) = 0.9')
print(f'P(H = {H_90: .3f}) = {H_dist.pdf(H_90): .3f}')
print(f'P(Q < {Q90_H90: .3g} & H = {H_90: .3f}) = P(Q < {Q90_H90: .3g} | H = {H_90: .3f})*P(H = {H_90: .3f})' 
    + f' = 0.9 *{H_dist.pdf(H_90): .3f} = {0.9*H_dist.pdf(H_90): .3f}')

Q10_H90 = Q_H90.ppf(0.1)[0][0] # Determine the 10th percentile value of the posterior predictive distribution at this plume height.

print('High plume with small MER:')
print(f'P(Q < {Q10_H90: .3g} | H = {H_90: .3f}) = 0.1')
print(f'P(H = {H_90: .3f}) = {H_dist.pdf(H_90): .3f}')
print(f'P(Q < {Q10_H90: .3g} & H = {H_90: .3f}) = P(Q < {Q10_H90: .3g} | H = {H_90: .3f})*P(H = {H_90: .3f})' 
    + f' = 0.1 *{H_dist.pdf(H_90): .3f} = {0.1*H_dist.pdf(H_90): .3f}')

H_10 = H_dist.ppf(0.1) # Get the height that corresponds to the 10th percentile of the distribution for the plume height

Q_H10 = Mastin.posterior_predictive(H_10) # Set this as the observation

Q90_H10 = Q_H10.ppf(0.9)[0][0] # Determine the 90th percentile value of the posterior predictive distribution at this plume height.

print('Low plume with large MER:')
print(f'P(Q < {Q90_H10: .3g} | H = {H_10: .3f}) = 0.9')
print(f'P(H = {H_10: .3f}) = {H_dist.pdf(H_10): .3f}')
print(f'P(Q < {Q90_H10: .3g} & H = {H_10: .3f}) = P(Q < {Q90_H10: .3g} | H = {H_10: .3f})*P(H = {H_10: .3f})'
    + f' = 0.9 *{H_dist.pdf(H_10): .3f} = {0.9*H_dist.pdf(H_10): .3f}')

Q10_H10 = Q_H10.ppf(0.1)[0][0] # Determine the 10th percentile value of the posterior predictive distribution at this plume height.

print('Low plume with small MER:')
print(f'P(Q < {Q10_H10: .3g} | H = {H_10: .3f}) = 0.1')
print(f'P(H = {H_10: .3f}) = {H_dist.pdf(H_10): .3f}')
print(f'P(Q < {Q10_H10: .3g} & H = {H_10: .3f}) = P(Q < {Q10_H10: .3g} | H = {H_10: .3f})*P(H = {H_10: .3f})'
    + f' = 0.1 *{H_dist.pdf(H_10): .3f} = {0.1*H_dist.pdf(H_10): .3f}')
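The four cases above follow the same pattern, so the scenario set can also be built in a loop over probability levels. The sketch below shows that structure with a toy model in place of merph: a hypothetical log-linear relation with Normal scatter (the constants `A`, `B` and `S` are invented) and the truncated Normal for $H$ evaluated with the standard library.

```python
import math
from statistics import NormalDist

# Hypothetical regression constants; NOT the values fitted by merph.
A, B, S = 2.0, 4.0, 0.5

H_lower, H_upper, H_mean, H_std = 10.0, 15.0, 11.0, 1.0
std_normal = NormalDist()
a = (H_lower - H_mean) / H_std
b = (H_upper - H_mean) / H_std
mass = std_normal.cdf(b) - std_normal.cdf(a)


def h_ppf(p):
    """Percent point function of the truncated Normal plume height."""
    return H_mean + H_std * std_normal.inv_cdf(std_normal.cdf(a) + p * mass)


def q_ppf(p, h):
    """Percent point function of Q given H under the toy model."""
    return 10 ** NormalDist(A + B * math.log10(h), S).inv_cdf(p)


# One (H, Q) pair for each combination of probability levels.
scenarios = []
for p_h in (0.1, 0.9):
    h = h_ppf(p_h)
    for p_q in (0.1, 0.9):
        scenarios.append((p_h, p_q, h, q_ppf(p_q, h)))

for p_h, p_q, h, q in scenarios:
    print(f"H at {p_h:.0%} level: {h:.2f} km, Q at {p_q:.0%} level: {q:.3g} kg/s")
```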