# Bootstrap estimates

In [None]:
from magentropy import MagentroData

magdata = MagentroData('magdata.dat')
magdata.process_data()

In [None]:
magdata.processed_df

Estimating the uncertainty of statistical model parameters from a single data set is commonly
approached using bootstrap procedures. Given data of length {math}`N`, bootstrap resampling involves
repeatedly drawing {math}`N` points from the data *with replacement* to form {math}`N_\mathrm{B}`
resampled data sets, fitting a model to each of them, and computing the parameter of interest from
the {math}`N_\mathrm{B}` fitted models.
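
As a rough illustration of the general procedure (not of the package's internals), the sketch below
bootstraps the slope of a straight-line fit with NumPy. The function name `bootstrap_slope_std`, the
synthetic data, and the choice of a linear model are all assumptions made for this example.

```python
import numpy as np

def bootstrap_slope_std(x, y, n_bootstrap=200, random_seed=0):
    """Bootstrap estimate of the standard error of a fitted slope (illustrative only)."""
    rng = np.random.default_rng(random_seed)
    n = len(x)
    slopes = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Draw N indices with replacement: some points repeat, others are left out.
        idx = rng.integers(0, n, size=n)
        # Fit the model to the resampled data and keep the parameter of interest.
        slope, _intercept = np.polyfit(x[idx], y[idx], deg=1)
        slopes[b] = slope
    # The spread of the parameter across the fitted models is the bootstrap error.
    return slopes.std(ddof=1)

# Synthetic data, assumed for this example: a noisy straight line.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

slope_err = bootstrap_slope_std(x, y)
```

Fixing the seed makes the resampling, and therefore the estimate, reproducible from run to run.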

In our case, we want to estimate the error at each output point of the smoothed magnetic moment.
To do this, the standard deviation of each smoothed magnetic moment point is computed from the
values of the {math}`N_\mathrm{B}` fitted models at that point. Every model is fitted to a resample
of the original data (again, drawn with replacement), though the smoothed moment is evaluated at
the same linearly-spaced points every time. (The output points are specified in {{ presets }} as
part of {doc}`data processing <processing_data>`.)
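
The idea of evaluating every fitted model on one fixed grid can be sketched as follows. This is a
minimal example under stated assumptions: a cubic polynomial fit stands in for the package's
regularized smoothing, and the function name `bootstrap_pointwise_std`, the synthetic data, and the
use of the sample standard deviation (`ddof=1`) are choices made here, not details taken from the
package.

```python
import numpy as np

def bootstrap_pointwise_std(T, M, T_out, n_bootstrap=100, random_seed=0, deg=3):
    """Pointwise bootstrap standard deviation of a smoothed curve (illustrative only)."""
    rng = np.random.default_rng(random_seed)
    n = len(T)
    curves = np.empty((n_bootstrap, len(T_out)))
    for b in range(n_bootstrap):
        # Resample the measured points with replacement.
        idx = rng.integers(0, n, size=n)
        # Stand-in smoother (an assumption): a low-degree polynomial fit.
        coeffs = np.polyfit(T[idx], M[idx], deg=deg)
        # Every model is evaluated at the same output points.
        curves[b] = np.polyval(coeffs, T_out)
    # One standard deviation per output point, taken across the fitted models.
    return curves.std(axis=0, ddof=1)

# Synthetic "moment versus temperature" data, assumed for this example.
rng = np.random.default_rng(2)
T = np.sort(rng.uniform(2.0, 50.0, size=200))
M = 1.0 / (1.0 + 0.05 * T) + rng.normal(scale=0.01, size=T.size)

T_out = np.linspace(T.min(), T.max(), 100)  # fixed, linearly spaced output grid
M_err = bootstrap_pointwise_std(T, M, T_out)
```

Because the output grid is shared by all resamples, the result is a single error value per output
point, which is the shape an error column alongside the smoothed moment would need.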

There are a few significant caveats associated with this approach. Each caveat gets its own
admonition below. Please read them!

```{attention}
The bootstrap method presented here is purely experimental and is not detailed in either of the
sources listed on the {doc}`homepage <../index>`.
```

```{caution}
{math}`N_\mathrm{B}` regularization problems must be solved for every temperature sweep taken at a
particular field strength. As such, this method is computationally expensive and can take upwards
of ten minutes to run on typical magnetization data, depending on the size of the data and how many
models are fitted at each field.
```

```{important}
Bootstrap estimates in the context of regularization are dependent on the chosen regularization
parameter {math}`\lambda`. These error estimates should not be viewed as "true" estimates but
rather as the estimates for a *given* {math}`\lambda`. This method should only be used once the
user is confident that their values of {math}`\lambda` are appropriate.
```
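
To make the dependence on {math}`\lambda` concrete, the sketch below bootstraps a simple
second-difference (Whittaker-type) smoother for two values of {math}`\lambda`. The smoother, the
function names, and the synthetic data are assumptions chosen for illustration; they are not the
regularization scheme used by the package. Resampling with replacement is expressed through integer
weights that count how often each point was drawn.

```python
import numpy as np

def whittaker_smooth(y, weights, lam):
    """Minimize sum(w * (y - z)**2) + lam * sum(second differences of z)**2."""
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # (n - 2, n) second-difference operator
    A = np.diag(weights) + lam * D.T @ D
    return np.linalg.solve(A, weights * y)

def bootstrap_std_for_lambda(y, lam, n_bootstrap=100, random_seed=0):
    """Pointwise bootstrap standard deviation of the smoothed curve for one lambda."""
    rng = np.random.default_rng(random_seed)
    n = y.size
    fits = np.empty((n_bootstrap, n))
    for b in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)
        # Number of times each point appears in this resample (zero if left out).
        counts = np.bincount(idx, minlength=n).astype(float)
        fits[b] = whittaker_smooth(y, counts, lam)
    return fits.std(axis=0, ddof=1)

# Synthetic noisy signal, assumed for this example.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

err_small = bootstrap_std_for_lambda(y, lam=1e-2)  # weak smoothing
err_large = bootstrap_std_for_lambda(y, lam=1e2)   # strong smoothing
```

The same data and the same resamples produce different error curves for the two values of
{math}`\lambda`, which is why the estimates are only meaningful relative to the chosen
regularization.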

Caveats aside, the method is simple, if time-consuming. Two arguments are supported: `n_bootstrap`
(the number of models to fit at each field) and `random_seed` (for reproducibility).

In [None]:
magdata.bootstrap(n_bootstrap=100, random_seed=0)

The error columns in {{ processed_df }} are now filled:

In [None]:
magdata.processed_df

We can make a simple plot of the errors using the usual {mod}`matplotlib` tools.
Built-in plotting of errors may be added in the future.

In [None]:
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 180

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))

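# Plot the rows of processed_df in consecutive blocks of 1000, one series per block
# (presumably one field strength each; adjust if the data are laid out differently).
for i in range(0, 4001, 1000):
    block = magdata.processed_df.iloc[i:i+1000, :]
    ax.plot('T', 'M_per_mass_err', '.', data=block, markersize=0.5)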

ax.set(
    xlabel='$T$ (K)',
    ylabel='Moment per mass error (emu/g)',
    title='Bootstrap error estimates'
);