# Fitting `lmfit.Model`s to `xarray` objects

`xarray-lmfit` adds accessor methods to xarray objects that allow you to fit data with lmfit models: {meth}`xarray.DataArray.xlm.modelfit` for fitting a single DataArray, and {meth}`xarray.Dataset.xlm.modelfit` for fitting multiple DataArrays in a Dataset.

## The fit result Dataset

Both accessors return an {class}`xarray.Dataset` containing the best-fit parameters and the
fit statistics.

:::{hint}

The syntax of the accessors is similar to that of the native xarray methods {meth}`xarray.DataArray.curvefit` and {meth}`xarray.Dataset.curvefit`.

:::
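For comparison, here is a sketch of a similar fit with the native {meth}`xarray.DataArray.curvefit`, which requires scipy. It takes a plain Python function and a dictionary of initial guesses `p0` instead of an lmfit model and parameters. The function `gaussian_linear` is a hypothetical helper written only for this comparison. Its `amplitude` is the peak height, not the peak area used by {class}`lmfit.models.GaussianModel`.

```python
import numpy as np
import xarray as xr

# Toy data: a Gaussian peak on a linear background (same recipe as below)
x = np.linspace(0, 10, 50)
y = -0.1 * x + 2 + 3 * np.exp(-((x - 5) ** 2) / (2 * 1**2))
rng = np.random.default_rng(5)
y_arr = xr.DataArray(rng.normal(y, 0.3), dims=("x",), coords={"x": x})


# Hypothetical model function: the first argument is the independent variable,
# and the remaining arguments are the fit parameters
def gaussian_linear(x, amplitude, center, sigma, slope, intercept):
    peak = amplitude * np.exp(-((x - center) ** 2) / (2 * sigma**2))
    return peak + slope * x + intercept


# Native xarray fit; parameter names are inferred from the function signature
curvefit_ds = y_arr.curvefit(
    "x",
    gaussian_linear,
    p0={"amplitude": 2.0, "center": 5.0, "sigma": 1.0, "slope": -0.1, "intercept": 2.0},
)
print(curvefit_ds.curvefit_coefficients.sel(param="center").item())
```

The result is also a Dataset, with the coefficients stored in `curvefit_coefficients` along a `param` dimension.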

First, let us generate a Gaussian peak on a linear background.

In [None]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import lmfit

# Generate toy data
x = np.linspace(0, 10, 50)
y = -0.1 * x + 2 + 3 * np.exp(-((x - 5) ** 2) / (2 * 1**2))

# Add some noise with fixed seed for reproducibility
rng = np.random.default_rng(5)
yerr = np.full_like(x, 0.3)
y = rng.normal(y, yerr)

y_arr = xr.DataArray(y, dims=("x",), coords={"x": x})

y_arr

Here, `y_arr` is a DataArray that contains both the values we want to fit and the independent variable `x` as a coordinate.

After importing `xarray_lmfit`, we can use {meth}`xarray.DataArray.xlm.modelfit` to fit the data to a model.

In [None]:
import xarray_lmfit

model = lmfit.models.GaussianModel() + lmfit.models.LinearModel()

y_arr.xlm.modelfit(
    "x",
    model=model,
    params=model.make_params(slope=-0.1, center=5.0, sigma={"value": 0.1, "min": 0}),
)

Let's take a closer look at the data variables in the resulting Dataset.

- `modelfit_results` contains the underlying {class}`lmfit.model.ModelResult` object from the fit.
- `modelfit_coefficients` and `modelfit_stderr` contain the best-fit coefficients and their errors, respectively.
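- `modelfit_stats` contains the [goodness-of-fit statistics](https://lmfit.github.io/lmfit-py/fitting.html#goodness-of-fit-statistics).
- `modelfit_data` and `modelfit_best_fit` contain the data that was fitted and the best-fit model evaluated on the same coordinates, which is convenient for plotting.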

When called on a Dataset instead of a DataArray, these variables will be prefixed with the name of the data variable they correspond to.

On its own, this may not seem very useful. The real strength of the accessor is that it builds on xarray's broadcasting, as described in the next section.

## Fitting across multiple dimensions

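Suppose you need to fit the same model to many curves stacked along one or more additional dimensions. The accessor handles this automatically: dimensions that are not part of the fit coordinates are broadcast over, and a separate fit is performed for each of their values.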

:::{admonition} Work in Progress
:class: warning

This part of the user guide is still under construction.

:::


## Fitting multidimensional models


Fitting is not limited to 1D models. The following example demonstrates how to fit a 2D Gaussian peak to a 2D DataArray.

In [None]:
# Generate synthetic 2D data
x = np.linspace(-10, 10, 50)
y = np.linspace(-10, 10, 50)
x_arr = xr.DataArray(x, dims=("x",), coords={"x": x})
y_arr = xr.DataArray(y, dims=("y",), coords={"y": y})
z_arr = lmfit.lineshapes.gaussian2d(
    x_arr,
    y_arr,
    amplitude=4.0,
    centerx=0.0,
    centery=0.0,
    sigmax=1.0,
    sigmay=2.0,
).rename("z")

# Add some noise with fixed seed for reproducibility
rng = np.random.default_rng(5)
z_arr = z_arr.copy(data=rng.normal(z_arr, 0.01))

Fitting a 2D model is as simple as providing multiple coordinate names for different independent variables:

In [None]:
model2d = lmfit.models.Gaussian2dModel()

# Pass one coordinate name per independent variable of the 2D model
result_ds = z_arr.xlm.modelfit(
    ("x", "y"),
    model=model2d,
    params=model2d.make_params(
        amplitude=2.0, centerx=0.0, centery=0.0, sigmax=1.0, sigmay=2.0
    ),
)
result_ds

Let's take a look at the best fit and residuals:

In [None]:
fig, axs = plt.subplots(1, 3, figsize=(12, 3), layout="compressed")

z_arr.plot(ax=axs[0], center=False)
axs[0].set_title("Data")

result_ds.modelfit_best_fit.plot(ax=axs[1])
axs[1].set_title("Fit")

(z_arr - result_ds.modelfit_best_fit).plot(ax=axs[2])
axs[2].set_title("Data $-$ Fit")

for ax in axs:
    ax.set_aspect("equal")

## Providing initial guesses

Thanks to xarray's broadcasting, initial guesses and bounds for the fit parameters can be
given as DataArrays. This is useful when you want to fit many curves with the same model
but need a different initial guess for each curve.

To demonstrate, let's create some data containing multiple Gaussian peaks, each with a
different center.

In [None]:
# Define coordinates
x = np.linspace(-5.0, 5.0, 100)
y = np.linspace(-1.0, 1.0, 3)

# Center of the peaks along y
center = np.array([-2.0, 0.0, 2.0])[:, np.newaxis]

# Gaussian peak on a linear background
z = -0.1 * x + 2 + 3 * np.exp(-((x - center) ** 2) / (2 * 1**2))

# Add some noise with fixed seed for reproducibility
rng = np.random.default_rng(5)
zerr = np.full_like(z, 0.1)
z = rng.normal(z, zerr)

# Construct DataArray
darr = xr.DataArray(z, dims=["y", "x"], coords={"y": y, "x": x})
darr.plot()

We can provide a different initial guess for the peak position at each `y` by passing a DataArray along `y` as the value for `center` in the `params` dictionary. Scalar values, such as the `slope` guess below, apply to every fit.

In [None]:
result_ds = darr.xlm.modelfit(
    coords="x",
    model=lmfit.models.GaussianModel() + lmfit.models.LinearModel(),
    params={
        "center": xr.DataArray([-2, 0, 2], coords=[darr.y]),
        "slope": -0.1,
    },
)
result_ds

Let's overlay the fitted peak positions on the data.

In [None]:
result_ds.modelfit_data.plot()
result_center = result_ds.sel(param="center")

plt.plot(result_center.modelfit_coefficients, result_center.y, "o-")

The same works for every parameter attribute accepted by {func}`lmfit.create_params`, such as `vary`, `min`, and `max`. Each attribute can be either a scalar or a DataArray. For example:

In [None]:
result_ds = darr.xlm.modelfit(
    coords="x",
    model=lmfit.models.GaussianModel() + lmfit.models.LinearModel(),
    params={
        "center": {
            "value": xr.DataArray([-2, 0, 2], coords=[darr.y]),
            "min": -5.0,
            "max": xr.DataArray([0, 2, 5], coords=[darr.y]),
        },
        "slope": -0.1,
    },
)
result_ds

## Parallelization

:::{warning}

Parallelization is still a work in progress, and the API may change in the future.

:::

The accessors are tightly integrated with `xarray`, so fitting a DataArray or Dataset backed
by dask arrays will parallelize the fitting process. See [Parallel Computing with Dask](https://docs.xarray.dev/en/stable/user-guide/dask.html) for more information.

For Datasets that are not backed by dask, basic `joblib`-based parallelization across
multiple data variables can be achieved with the `parallel` argument to
{meth}`xarray.Dataset.xlm.modelfit`.

## Saving and loading fits

Since the fit results are stored in an xarray Dataset, they can be saved as netCDF files
once the lmfit objects they contain are serialized to JSON. {func}`xarray_lmfit.save_fit` takes care of this:

```python
import xarray_lmfit as xlm

xlm.save_fit(result_ds, "fit_results.nc")
```

The saved Dataset can be loaded back with {func}`xarray_lmfit.load_fit`.

```python
result_ds = xlm.load_fit("fit_results.nc")
```

:::{warning}

Saving full model results that include the model functions can be difficult. Instead of relying on saved fit results alone, it is recommended to save the code that can reproduce the fit. See [the relevant lmfit documentation](https://lmfit.github.io/lmfit-py/model.html#saving-and-loading-modelresults) for more information.

:::