Skip to content

Commit

Permalink
Spline model (#804)
Browse files Browse the repository at this point in the history
* add SplineModel: variadic cubic spline

* add test for SplineModel

* doc updates

* more doc updates

* more doc updates

* more doc updates

* add Spline example, with data

* some test cleanups

* some test cleanups

* fix section level

* cleanups

* tell codespell to ignore all of lmfit/models.py because of 1 mis-spelling

* cleanups

* fix codespell pre-commit hook

Co-authored-by: reneeotten <reneeotten@users.noreply.github.com>
  • Loading branch information
newville and reneeotten committed Sep 16, 2022
1 parent 97ea5f1 commit 3bef74f
Show file tree
Hide file tree
Showing 7 changed files with 989 additions and 22 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ repos:
exclude: 'doc/doc_examples_to_gallery.py'
# escaped characters currently do not work correctly
# so \nnumber is considered a spelling error....
args: [-L nnumber]
args: ["-L nnumber", "-L mone"]

- repo: https://github.com/asottile/yesqa
rev: v1.4.0
Expand Down
280 changes: 271 additions & 9 deletions doc/builtin_models.rst
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,9 @@ Linear and Polynomial Models
These models correspond to polynomials of some degree. Of course, lmfit is
a very inefficient way to do linear regression (see :numpydoc:`polyfit`
or :scipydoc:`stats.linregress`), but these models may be useful as one
of many components of a composite model.
of many components of a composite model. The SplineModel below corresponds
to a cubic spline.


:class:`ConstantModel`
~~~~~~~~~~~~~~~~~~~~~~
Expand All @@ -186,6 +188,12 @@ of many components of a composite model.

.. autoclass:: PolynomialModel

:class:`SplinelModel`
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: SplineModel



Periodic Models
---------------
Expand Down Expand Up @@ -317,17 +325,17 @@ could to define this in a script:

script = """
def mycurve(x, amp, cen, sig):
loren = lorentzian(x, amplitude=amp, center=cen, sigma=sig)
gauss = gaussian(x, amplitude=amp, center=cen, sigma=sig)
return log(loren) * gradient(gauss) / gradient(x)
loren = lorentzian(x, amplitude=amp, center=cen, sigma=sig)
gauss = gaussian(x, amplitude=amp, center=cen, sigma=sig)
return log(loren) * gradient(gauss) / gradient(x)
"""

and then use this with :class:`ExpressionModel` as:

.. jupyter-execute::

mod = ExpressionModel('mycurve(x, height, mid, wid)', init_script=script,
independent_vars=['x'])
independent_vars=['x'])

As above, this will interpret the parameter names to be ``height``, ``mid``,
and ``wid``, and build a model that can be used to fit data.
Expand Down Expand Up @@ -604,10 +612,10 @@ this, and by defining an :func:`index_of` function to limit the data range.
That is, with::

def index_of(arrval, value):
"""Return index of array *at or below* value."""
if value < min(arrval):
return 0
return max(np.where(arrval <= value)[0])
"""Return index of array *at or below* value."""
if value < min(arrval):
return 0
return max(np.where(arrval <= value)[0])

ix1 = index_of(x, 75)
ix2 = index_of(x, 135)
Expand Down Expand Up @@ -645,3 +653,257 @@ and without any bounds on parameters at all:

This script is in the file ``doc_builtinmodels_nistgauss2.py`` in the examples folder,
and the figure above shows an improved initial estimate of the data.


Example 4: Using a Spline Model
--------------------------------

In the example above, the two peaks might represent the interesting part of
the data, and the exponential decay could be viewed a "background" which
might be due to other physical effects or part of some response of the
instrumentation used to make the measurement. That is, the background
might be well-understood to be modeled as an exponential decay, as in the
example above and so easily included in the full analysis. As the results
above show, there is some -- but not huge -- correlation of the parameters
between the peak amplitudes and the decay of the exponential function.
That means that it is helpful to include all of those components in a
single fit, as the uncertainties in the peak amplitudes (which would be
interpreted as "line strength" or "area") will reflect some of the
uncertainty in how well we modeled the background.

Sometimes a background is more complex or at least has a less obvious
functional form. In these cases, it can be useful to use a *spline* to
model part of the curve. Just for completeness, a spline is a piecewise
continuous polynomial function (typically made of cubic polynomials) that
has a series of ``x`` values known as "knots" at which the highest order
derivative is allowed to be discontinuous. By adding more knots, the
spline function has more flexibility to follow a particular function.

As an example (see the example file "doc_builtinmodels_splinemodel.py"), we
start with data with a single peak and a background that is hard to
characterize clearly as a simple decay, oscillatory structure.

.. jupyter-execute::
:hide-output:

import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import SplineModel, GaussianModel

data = np.loadtxt('test_splinepeak.dat')
x = data[:, 0]
y = data[:, 1]

plt.plot(x, y, label='data')
plt.legend()
plt.show()

which shows (figure below):

.. jupyter-execute::
:hide-code:

plt.plot(x, y, label='data')
plt.legend()
plt.show()


There is definitely a peak there, so we could start with building a model
for a Gaussian peak, say with:

.. jupyter-execute::
:hide-output:

model = GaussianModel(prefix='peak_')
params = model.make_params(amplitude=8, center=16, sigma=1)


To account for that changing background, we'll use a spline, but need to
know where to put the "knots". Picking points away from the peak makes
sense -- we don't want to fit the peak -- but we want it to have some
flexibility near the peak. Let's try spacing knot points at ``x=1, 3, ...,
13``, then skip over the peak at around ``x=16`` and then pick up knots points
at ``x=19, 21, 23, 25``.

.. jupyter-execute::
:hide-output:

knot_xvals = np.array([1, 3, 5, 7, 9, 11, 13, 19, 21, 23, 25])

bkg = SplineModel(prefix='bkg_', xknots=knot_xvals)
params.update(bkg.guess(y, x))


Note that we used ``bkg.guess()`` to guess the initial values of the spline
parameters and then update the ``params`` Parameters object with these 11
parameters to account for the spline. These will be very close to the ``y``
values at the knot ``x`` values. The precise definition of the spline knot
parameters is not "the y-values through which the resulting spline curve
goes", but these values are pretty good estimates for the resulting spline
values. You'll see below that these initial values are close.

With a spline background defined, we can create a composite model, and run
a fit.

.. jupyter-execute::
:hide-output:

model = model + bkg

params['peak_amplitude'].min = 0
params['peak_center'].min = 10
params['peak_center'].max = 20

out = model.fit(y, params, x=x)
print(out.fit_report(min_correl=0.3))

You'll see that we first set some "sanity bounds" on the peak parameters to
prevent the peak from going completely wrong. This really is not necessary
in this case, but it is often a reasonable thing to do - the general advice
for this is to be generous in the bounds, not overly restrictive.

This fit will print out a report of

.. jupyter-execute::
:hide-code:

print(out.fit_report(min_correl=0.3))


from this we can make a few observations. First, the correlation between
the "spline" parameters" and the "peak parameters" is noticeable, but not
extremely high -- that's good, and the estimated uncertainties do account
for this correlation. The spline components are correlated with each other
(especially with the N-1 and N+1 spline parameter). Second, we can see
that the initial values for the background spline parameters are pretty
good.

We can plot the results and fit components with

.. jupyter-execute::
:hide-output:

comps = out.eval_components()
plt.plot(x, out.best_fit, label='best fit')
plt.plot(x, comps['bkg_'], label='background')
plt.plot(x, comps['peak_'], label='peak')
plt.legend()

which will generate the plot shown below:

.. jupyter-execute::
:hide-code:

plt.plot(x, y, label='data')
plt.plot(x, out.best_fit, label='best fit')
plt.plot(x, comps['bkg_'], label='background')
plt.plot(x, comps['peak_'], label='peak')
plt.legend()
plt.show()


If we're interested in seeing the locations of the knots, you might do

.. jupyter-execute::
:hide-output:

knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
plt.plot(knot_xvals, knot_yvals, 'o', color='black', label='spline knots values')

which will generate be shown as

.. jupyter-execute::
:hide-code:

plt.plot(x, y, label='data')
plt.plot(x, out.best_fit, label='best fit')
plt.plot(x, comps['bkg_'], label='background')
plt.plot(x, comps['peak_'], label='peak')
knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
plt.plot(knot_xvals, knot_yvals, 'o', color='black', label='spline knots values')

plt.legend()
plt.show()


You might be interested in trying to assess what impact the select of the
knots has on the resulting peak intensity. For example, you might try some
of the following set of knot values:

.. jupyter-execute::
:hide-output:

knot_xvals1 = np.array([1, 3, 5, 7, 9, 11, 13, 19, 21, 23, 25])
knot_xvals2 = np.array([1, 3, 5, 7, 9, 11, 13, 16, 19, 21, 23, 25])
knot_xvals3 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25])


and re-run the fit with these different sets of knot points. The results
are shown in the table below.


.. _models_spline_results-table:

Table of Peak amplitudes with varying spline points

+-------------------+------+----------------------------------------+
| spline x points | N | Peak amplitude value and uncertainty |
+===================+======+========================================+
| knot_xvals1 | 11 | 12.223 (0.295) |
+-------------------+------+----------------------------------------+
| knot_xvals2 | 12 | 11.746 (0.594) |
+-------------------+------+----------------------------------------+
| knot_xvals3 | 13 | 12.052 (0.872) |
+-------------------+------+----------------------------------------+

Adding more spline points, especially near the peak center around ``x=16.4``,
can impact the measurement of the amplitude but the uncertainty increases
dramatically enough to mostly cover the same range of values. This is a
interesting case of adding more parameters to a fit and having the
uncertainties in the fitted parameters getting worse. The interested
reader is encouraged to explore the fit reports and plot these different case.


Finally, the basic case above used 11 spline points to fit the baseline.
In fact, it would be reasonable to ask whether that is enough parameters
to fit the full spectra. By imposing that there is also a Gaussian
peak nearby makes the spline fit only the background, but without the
Gaussian, the spline could fit the full curve. By way of example, we'll
just try increasing the number of spline points to fit this data

.. jupyter-execute::
:hide-output:

plt.plot(x, y, 'o', label='data')
for nknots in (10, 15, 20, 25):
model = SplineModel(prefix='bkg_', xknots=np.linspace(0, 25, nknots))
params = model.guess(y, x)
out = model.fit(y, params, x=x)
plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')

plt.legend()
plt.show()



which will show the fit below:

.. jupyter-execute::
:hide-code:

plt.plot(x, y, 'o', label='data')
for nknots in (10, 15, 20, 25):
model = SplineModel(prefix='bkg_', xknots=np.linspace(0, 25, nknots))
params = model.guess(y, x)
out = model.fit(y, params, x=x)
plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')

plt.legend()
plt.show()


By itself, 10 knots does not give a very good fit, but 25 knots or more
does give a very good fit to the peak. This should give some confidence
that the fit with 11 parameters for the background spline is acceptable,
but also give some reason to be careful in selecting the number of spline
points to use.
63 changes: 63 additions & 0 deletions examples/doc_builtinmodels_splinemodel.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# <examples/doc_builtinmodels_splinemodel.py>
import matplotlib.pyplot as plt
import numpy as np

from lmfit.models import GaussianModel, SplineModel

data = np.loadtxt('test_splinepeak.dat')
x = data[:, 0]
y = data[:, 1]

plt.plot(x, y, label='data')

model = GaussianModel(prefix='peak_')
params = model.make_params(amplitude=8, center=16, sigma=1)

# make a background spline with knots evenly spaced over the background,
# but sort of skipping over where the peak is
knot_xvals3 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25])
knot_xvals2 = np.array([1, 3, 5, 7, 9, 11, 13, 16, 19, 21, 23, 25]) # noqa: E241
knot_xvals1 = np.array([1, 3, 5, 7, 9, 11, 13, 19, 21, 23, 25]) # noqa: E241

bkg = SplineModel(prefix='bkg_', xknots=knot_xvals1)
params.update(bkg.guess(y, x))

model = model + bkg

params['peak_center'].min = 10
params['peak_center'].max = 20
params['peak_amplitude'].min = 0

plt.plot(x, model.eval(params, x=x), label='initial')

out = model.fit(y, params, x=x)
print(out.fit_report(min_correl=0.3))
comps = out.eval_components()

plt.plot(x, out.best_fit, label='best fit')
plt.plot(x, comps['bkg_'], label='background')
plt.plot(x, comps['peak_'], label='peak')

knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
plt.plot(knot_xvals1, knot_yvals, 'o', color='black', label='spline knots values')
plt.legend()
plt.show()


# knot positions | peak amplitude
# 11, 13, 19, 21 | 12.223 0.295
# 11, 13, 16, 19, 21 | 11.746 0.594
# 11, 13, 15, 17, 19, 21 | 12.052 0.872


plt.plot(x, y, 'o', label='data')

for nknots in (10, 15, 20, 25, 30):
model = SplineModel(prefix='bkg_', xknots=np.linspace(0, 25, nknots))
params = model.guess(y, x)
out = model.fit(y, params, x=x)
plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')
plt.legend()
plt.show()

# <end examples/doc_builtinmodels_splinemodel.py>
Loading

0 comments on commit 3bef74f

Please sign in to comment.