Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spline model #804

Merged
merged 14 commits into from
Sep 16, 2022
282 changes: 273 additions & 9 deletions doc/builtin_models.rst
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,9 @@ Linear and Polynomial Models
These models correspond to polynomials of some degree. Of course, lmfit is
a very inefficient way to do linear regression (see :numpydoc:`polyfit`
or :scipydoc:`stats.linregress`), but these models may be useful as one
of many components of a composite model.
of many components of a composite model. The SplineModel below corresponds
to a cubic spline.


:class:`ConstantModel`
~~~~~~~~~~~~~~~~~~~~~~
Expand All @@ -181,6 +183,12 @@ of many components of a composite model.

.. autoclass:: PolynomialModel

:class:`SplinelModel`
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: SplineModel



Periodic Models
---------------
Expand Down Expand Up @@ -312,17 +320,17 @@ could to define this in a script:

script = """
def mycurve(x, amp, cen, sig):
loren = lorentzian(x, amplitude=amp, center=cen, sigma=sig)
gauss = gaussian(x, amplitude=amp, center=cen, sigma=sig)
return log(loren) * gradient(gauss) / gradient(x)
loren = lorentzian(x, amplitude=amp, center=cen, sigma=sig)
gauss = gaussian(x, amplitude=amp, center=cen, sigma=sig)
return log(loren) * gradient(gauss) / gradient(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure this will render correctly indented Python code on the documentation website (here and in other places where you made similar whitespace changes) so that, for example, people can copy-and-paste code from the docs to their Python shell?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what is happening there... My editor seems to be putting tabs into rst files for indents. They do seem to render to HTML correctly

"""

and then use this with :class:`ExpressionModel` as:

.. jupyter-execute::

mod = ExpressionModel('mycurve(x, height, mid, wid)', init_script=script,
independent_vars=['x'])
independent_vars=['x'])

As above, this will interpret the parameter names to be ``height``, ``mid``,
and ``wid``, and build a model that can be used to fit data.
Expand Down Expand Up @@ -599,10 +607,10 @@ this, and by defining an :func:`index_of` function to limit the data range.
That is, with::

def index_of(arrval, value):
"""Return index of array *at or below* value."""
if value < min(arrval):
return 0
return max(np.where(arrval <= value)[0])
"""Return index of array *at or below* value."""
if value < min(arrval):
return 0
return max(np.where(arrval <= value)[0])

ix1 = index_of(x, 75)
ix2 = index_of(x, 135)
Expand Down Expand Up @@ -640,3 +648,259 @@ and without any bounds on parameters at all:

This script is in the file ``doc_builtinmodels_nistgauss2.py`` in the examples folder,
and the figure above shows an improved initial estimate of the data.



Example 4: Using a Spline Model
===============================
reneeotten marked this conversation as resolved.
Show resolved Hide resolved


In the example above, the two peaks might represent the interesting part of
the data, and the exponential decay could be viewed a "background" which
might be due to other physical effects or part of some response of the
instrumentation used to make the measurement. That is, the background
might be well-understood to be modeled as an exponential decay, as in the
example above and so easily included in the full analysis. As the results
above show, there is some -- but not huge -- correlation of the parameters
between the peak amplitudes and the decay of the exponential function.
That means that it is helpful to include all of those components in a
single fit, as the uncertainties in the peak amplitudes (which would be
interpreted as "line strength" or "area") will reflect some of the
uncertainty in how well we modeled the background.

Sometimes a background is more complex or at least has a less obvious
functional form. In these cases, it can be useful to use a *spline* to
model part of the curve. Just for completeness, a spline is a piecewise
continuous polynomial function (typically made of cubic polynomials) that
has a series of ``x`` values known as "knots" at which the highest order
derivative is allowed to be discontinuous. By adding more knots, the
spline function has more flexibility to follow a particular function.

As an example (see the example file "doc_builtinmodels_splinemodel.py"), we
start with data with a single peak and a background that is hard to
characterize clearly as a simple decay, oscillatory structure.

.. jupyter-execute::
:hide-output:

import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import SplineModel, GaussianModel

data = np.loadtxt('test_splinepeak.dat')
x = data[:, 0]
y = data[:, 1]

plt.plot(x, y, label='data')
plt.legend()
plt.show()

which shows (figure below):

.. jupyter-execute::
:hide-code:

plt.plot(x, y, label='data')
plt.legend()
plt.show()


There is definitely a peak there, so we could start with building a model
for a Gaussian peak, say with:

.. jupyter-execute::
:hide-output:

model = GaussianModel(prefix='peak_')
params = model.make_params(amplitude=8, center=16, sigma=1)


To account for that changing background, we'll use a spline, but need to
know where to put the "knots". Picking points away from the peak makes
sense -- we don't want to fit the peak -- but we want it to have some
flexibility near the peak. Let's try spacing knot points at ``x=1, 3, ...,
13``, then skip over the peak at around ``x=16`` and then pick up knots points
at ``x=19, 21, 23, 25``.

.. jupyter-execute::
:hide-output:

knot_xvals = np.array([1, 3, 5, 7, 9, 11, 13, 19, 21, 23, 25])

bkg = SplineModel(prefix='bkg_', xknots=knot_xvals)
params.update(bkg.guess(y, x))


Note that we used ``bkg.guess()`` to guess the initial values of the spline
parameters and then update the ``params`` Parameters object with these 11
parameters to account for the spline. These will be very close to the ``y``
values at the knot ``x`` values. The precise definition of the spline knot
parameters is not "the y-values through which the resulting spline curve
goes", but these values are pretty good estimates for the resulting spline
values. You'll see below that these initial values are close.

With a spline background defined, we can create a composite model, and run
a fit.

.. jupyter-execute::
:hide-output:

model = model + bkg

params['peak_amplitude'].min = 0
params['peak_center'].min = 10
params['peak_center'].max = 20

out = model.fit(y, params, x=x)
print(out.fit_report(min_correl=0.3))

You'll see that we first set some "sanity bounds" on the peak parameters to
prevent the peak from going completely wrong. This really is not necessary
in this case, but it is often a reasonable thing to do - the general advice
for this is to be generous in the bounds, not overly restrictive.

This fit will print out a report of

.. jupyter-execute::
:hide-code:

print(out.fit_report(min_correl=0.3))


from this we can make a few observations. First, the correlation between
the "spline" parameters" and the "peak parameters" is noticeable, but not
extremely high -- that's good, and the estimated uncertainties do account
for this correlation. The spline components are correlated with each other
(especially with the N-1 and N+1 spline parameter). Second, we can see
that the initial values for the background spline parameters are pretty
good.

We can plot the results and fit components with

.. jupyter-execute::
:hide-output:

comps = out.eval_components()
plt.plot(x, out.best_fit, label='best fit')
plt.plot(x, comps['bkg_'], label='background')
plt.plot(x, comps['peak_'], label='peak')
plt.legend()

which will generate the plot shown below:

.. jupyter-execute::
:hide-code:

plt.plot(x, y, label='data')
plt.plot(x, out.best_fit, label='best fit')
plt.plot(x, comps['bkg_'], label='background')
plt.plot(x, comps['peak_'], label='peak')
plt.legend()
plt.show()


If we're interested in seeing the locations of the knots, you might do

.. jupyter-execute::
:hide-output:

knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
plt.plot(knot_xvals, knot_yvals, 'o', color='black', label='spline knots values')

which will generate be shown as

.. jupyter-execute::
:hide-code:

plt.plot(x, y, label='data')
plt.plot(x, out.best_fit, label='best fit')
plt.plot(x, comps['bkg_'], label='background')
plt.plot(x, comps['peak_'], label='peak')
knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
plt.plot(knot_xvals, knot_yvals, 'o', color='black', label='spline knots values')

plt.legend()
plt.show()


You might be interested in trying to assess what impact the select of the
knots has on the resulting peak intensity. For example, you might try some
of the following set of knot values:

.. jupyter-execute::
:hide-output:

knot_xvals1 = np.array([1, 3, 5, 7, 9, 11, 13, 19, 21, 23, 25])
knot_xvals2 = np.array([1, 3, 5, 7, 9, 11, 13, 16, 19, 21, 23, 25])
knot_xvals3 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25])


and re-run the fit with these different sets of knot points. The results
are shown in the table below.


.. _models_spline_results-table:

Table of Peak amplitudes with varying spline points

+-------------------+------+----------------------------------------+
| spline x points | N | Peak amplitude value and uncertainty |
+===================+======+========================================+
| knot_xvals1 | 11 | 12.223 (0.295) |
+-------------------+------+----------------------------------------+
| knot_xvals2 | 12 | 11.746 (0.594) |
+-------------------+------+----------------------------------------+
| knot_xvals3 | 13 | 12.052 (0.872) |
+-------------------+------+----------------------------------------+

Adding more spline points, especially near the peak center around ``x=16.4``,
can impact the measurement of the amplitude but the uncertainty increases
dramatically enough to mostly cover the same range of values. This is a
interesting case of adding more parameters to a fit and having the
uncertainties in the fitted parameters getting worse. The interested
reader is encouraged to explore the fit reports and plot these different case.


Finally, the basic case above used 11 spline points to fit the baseline.
In fact, it would be reasonable to ask whether that is enough parameters
to fit the full spectra. By imposing that there is also a Gaussian
peak nearby makes the spline fit only the background, but without the
Gaussian, the spline could fit the full curve. By way of example, we'll
just try increasing the number of spline points to fit this data

.. jupyter-execute::
:hide-output:

plt.plot(x, y, 'o', label='data')
for nknots in (10, 15, 20, 25):
model = SplineModel(prefix='bkg_', xknots=np.linspace(0, 25, nknots))
params = model.guess(y, x)
out = model.fit(y, params, x=x)
plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')

plt.legend()
plt.show()



which will show the fit below:

.. jupyter-execute::
:hide-code:

plt.plot(x, y, 'o', label='data')
for nknots in (10, 15, 20, 25):
model = SplineModel(prefix='bkg_', xknots=np.linspace(0, 25, nknots))
params = model.guess(y, x)
out = model.fit(y, params, x=x)
plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')

plt.legend()
plt.show()


By itself, 10 knots does not give a very good fit, but 25 knots or more
does give a very good fit to the peak. This should give some confidence
that the fit with 11 parameters for the background spline is acceptable,
but also give some reason to be careful in selecting the number of spline
points to use.
63 changes: 63 additions & 0 deletions examples/doc_builtinmodels_splinemodel.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# <examples/doc_builtinmodels_splinemodel.py>
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import SplineModel, GaussianModel

data = np.loadtxt('test_splinepeak.dat')
x = data[:, 0]
y = data[:, 1]

plt.plot(x, y, label='data')

model = GaussianModel(prefix='peak_')
params = model.make_params(amplitude=8, center=16, sigma=1)

# make a background spline with knots evenly spaced over the background,
# but sort of skipping over where the peak is
knot_xvals3 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25])
knot_xvals2 = np.array([1, 3, 5, 7, 9, 11, 13, 16, 19, 21, 23, 25])
knot_xvals = np.array([1, 3, 5, 7, 9, 11, 13, 19, 21, 23, 25])


bkg = SplineModel(prefix='bkg_', xknots=knot_xvals)
params.update(bkg.guess(y, x))

model = model + bkg

params['peak_center'].min = 10
params['peak_center'].max = 20
params['peak_amplitude'].min = 0

plt.plot(x, model.eval(params, x=x), label='initial')

out = model.fit(y, params, x=x)
print(out.fit_report(min_correl=0.3))
comps = out.eval_components()

plt.plot(x, out.best_fit, label='best fit')
plt.plot(x, comps['bkg_'], label='background')
plt.plot(x, comps['peak_'], label='peak')

knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
plt.plot(knot_xvals, knot_yvals, 'o', color='black', label='spline knots values')
plt.legend()
plt.show()


# knot positions | peak amplitude
# 11, 13, 19, 21 | 12.223 0.295
# 11, 13, 16, 19, 21 | 11.746 0.594
# 11, 13, 15, 17, 19, 21 | 12.052 0.872


plt.plot(x, y, 'o', label='data')

for nknots in (10, 15, 20, 25, 30):
model = SplineModel(prefix='bkg_', xknots=np.linspace(0, 25, nknots))
params = model.guess(y, x)
out = model.fit(y, params, x=x)
plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')
plt.legend()
plt.show()

# <end examples/doc_builtinmodels_splinemodel.py>
Loading