Spline model (#804)

* add SplineModel: variadic cubic spline * add test for SplineModel * doc updates * more doc updates * more doc updates * more doc updates * add Spline example, with data * some test cleanups * some test cleanups * fix section level * cleanups * tell codespell to ignore all of lmfit/models.py because of 1 mis-spelling * cleanups * fix codespell pre-commit hook Co-authored-by: reneeotten <reneeotten@users.noreply.github.com>
lmfit · Sep 16, 2022 · 3bef74f · 3bef74f
1 parent 97ea5f1
commit 3bef74f
Show file tree

Hide file tree

Showing 7 changed files with 989 additions and 22 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -58,7 +58,7 @@ repos:
         exclude: 'doc/doc_examples_to_gallery.py'
         # escaped characters currently do not work correctly
         # so \nnumber is considered a spelling error....
-        args: [-L nnumber]
+        args: ["-L nnumber", "-L mone"]
 
 -   repo: https://github.com/asottile/yesqa
     rev: v1.4.0

diff --git a/doc/builtin_models.rst b/doc/builtin_models.rst
@@ -164,7 +164,9 @@ Linear and Polynomial Models
 These models correspond to polynomials of some degree. Of course, lmfit is
 a very inefficient way to do linear regression (see :numpydoc:`polyfit`
 or :scipydoc:`stats.linregress`), but these models may be useful as one
-of many components of a composite model.
+of many components of a composite model.  The SplineModel below corresponds
+to a cubic spline.
+
 
 :class:`ConstantModel`
 ~~~~~~~~~~~~~~~~~~~~~~
@@ -186,6 +188,12 @@ of many components of a composite model.
 
 .. autoclass:: PolynomialModel
 
+:class:`SplinelModel`
+~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: SplineModel
+
+
 
 Periodic Models
 ---------------
@@ -317,17 +325,17 @@ could to define this in a script:
 
     script = """
     def mycurve(x, amp, cen, sig):
-        loren = lorentzian(x, amplitude=amp, center=cen, sigma=sig)
-        gauss = gaussian(x, amplitude=amp, center=cen, sigma=sig)
-        return log(loren) * gradient(gauss) / gradient(x)
+	loren = lorentzian(x, amplitude=amp, center=cen, sigma=sig)
+	gauss = gaussian(x, amplitude=amp, center=cen, sigma=sig)
+	return log(loren) * gradient(gauss) / gradient(x)
     """
 
 and then use this with :class:`ExpressionModel` as:
 
 .. jupyter-execute::
 
     mod = ExpressionModel('mycurve(x, height, mid, wid)', init_script=script,
-                          independent_vars=['x'])
+			  independent_vars=['x'])
 
 As above, this will interpret the parameter names to be ``height``, ``mid``,
 and ``wid``, and build a model that can be used to fit data.
@@ -604,10 +612,10 @@ this, and by defining an :func:`index_of` function to limit the data range.
 That is, with::
 
     def index_of(arrval, value):
-        """Return index of array *at or below* value."""
-        if value < min(arrval):
-            return 0
-        return max(np.where(arrval <= value)[0])
+	"""Return index of array *at or below* value."""
+	if value < min(arrval):
+	    return 0
+	return max(np.where(arrval <= value)[0])
 
     ix1 = index_of(x, 75)
     ix2 = index_of(x, 135)
@@ -645,3 +653,257 @@ and without any bounds on parameters at all:
 
 This script is in the file ``doc_builtinmodels_nistgauss2.py`` in the examples folder,
 and the figure above shows an improved initial estimate of the data.
+
+
+Example 4: Using a Spline Model
+--------------------------------
+
+In the example above, the two peaks might represent the interesting part of
+the data, and the exponential decay could be viewed a "background" which
+might be due to other physical effects or part of some response of the
+instrumentation used to make the measurement.  That is, the background
+might be well-understood to be modeled as an exponential decay, as in the
+example above and so easily included in the full analysis. As the results
+above show, there is some -- but not huge -- correlation of the parameters
+between the peak amplitudes and the decay of the exponential function.
+That means that it is helpful to include all of those components in a
+single fit, as the uncertainties in the peak amplitudes (which would be
+interpreted as "line strength" or "area") will reflect some of the
+uncertainty in how well we modeled the background.
+
+Sometimes a background is more complex or at least has a less obvious
+functional form.  In these cases, it can be useful to use a *spline* to
+model part of the curve.  Just for completeness, a spline is a piecewise
+continuous polynomial function (typically made of cubic polynomials) that
+has a series of ``x`` values known as "knots" at which the highest order
+derivative is allowed to be discontinuous.  By adding more knots, the
+spline function has more flexibility to follow a particular function.
+
+As an example (see the example file "doc_builtinmodels_splinemodel.py"), we
+start with data with a single peak and a background that is hard to
+characterize clearly as a simple decay, oscillatory structure.
+
+.. jupyter-execute::
+    :hide-output:
+
+    import numpy as np
+    import matplotlib.pyplot as plt
+    from lmfit.models import SplineModel, GaussianModel
+
+    data = np.loadtxt('test_splinepeak.dat')
+    x = data[:, 0]
+    y = data[:, 1]
+
+    plt.plot(x, y, label='data')
+    plt.legend()
+    plt.show()
+
+which shows (figure below):
+
+.. jupyter-execute::
+    :hide-code:
+
+    plt.plot(x, y, label='data')
+    plt.legend()
+    plt.show()
+
+
+There is definitely a peak there, so we could start with building a model
+for a Gaussian peak, say with:
+
+.. jupyter-execute::
+    :hide-output:
+
+    model = GaussianModel(prefix='peak_')
+    params = model.make_params(amplitude=8, center=16, sigma=1)
+
+
+To account for that changing background, we'll use a spline, but need to
+know where to put the "knots".  Picking points away from the peak makes
+sense -- we don't want to fit the peak -- but we want it to have some
+flexibility near the peak.  Let's try spacing knot points at ``x=1, 3, ...,
+13``, then skip over the peak at around ``x=16`` and then pick up knots points
+at ``x=19, 21, 23, 25``.
+
+.. jupyter-execute::
+    :hide-output:
+
+    knot_xvals = np.array([1, 3, 5, 7, 9, 11, 13, 19, 21, 23, 25])
+
+    bkg = SplineModel(prefix='bkg_', xknots=knot_xvals)
+    params.update(bkg.guess(y, x))
+
+
+Note that we used ``bkg.guess()`` to guess the initial values of the spline
+parameters and then update the ``params`` Parameters object with these 11
+parameters to account for the spline.  These will be very close to the ``y``
+values at the knot ``x`` values. The precise definition of the spline knot
+parameters is not "the y-values through which the resulting spline curve
+goes", but these values are pretty good estimates for the resulting spline
+values.  You'll see below that these initial values are close.
+
+With a spline background defined, we can create a composite model, and run
+a fit.
+
+.. jupyter-execute::
+    :hide-output:
+
+    model = model + bkg
+
+    params['peak_amplitude'].min = 0
+    params['peak_center'].min = 10
+    params['peak_center'].max = 20
+
+    out = model.fit(y, params, x=x)
+    print(out.fit_report(min_correl=0.3))
+
+You'll see that we first set some "sanity bounds" on the peak parameters to
+prevent the peak from going completely wrong.  This really is not necessary
+in this case, but it is often a reasonable thing to do - the general advice
+for this is to be generous in the bounds, not overly restrictive.
+
+This fit will print out a report of
+
+.. jupyter-execute::
+    :hide-code:
+
+    print(out.fit_report(min_correl=0.3))
+
+
+from this we can make a few observations.  First, the correlation between
+the "spline" parameters" and the "peak parameters" is noticeable, but not
+extremely high -- that's good, and the estimated uncertainties do account
+for this correlation.  The spline components are correlated with each other
+(especially with the N-1 and N+1 spline parameter).  Second, we can see
+that the initial values for the background spline parameters are pretty
+good.
+
+We can plot the results and fit components with
+
+.. jupyter-execute::
+   :hide-output:
+
+    comps = out.eval_components()
+    plt.plot(x, out.best_fit, label='best fit')
+    plt.plot(x, comps['bkg_'], label='background')
+    plt.plot(x, comps['peak_'], label='peak')
+    plt.legend()
+
+which will generate the plot shown below:
+
+.. jupyter-execute::
+   :hide-code:
+
+    plt.plot(x, y, label='data')
+    plt.plot(x, out.best_fit, label='best fit')
+    plt.plot(x, comps['bkg_'], label='background')
+    plt.plot(x, comps['peak_'], label='peak')
+    plt.legend()
+    plt.show()
+
+
+If we're interested in seeing the locations of the knots, you might do
+
+.. jupyter-execute::
+   :hide-output:
+
+    knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
+    plt.plot(knot_xvals, knot_yvals, 'o', color='black', label='spline knots values')
+
+which will generate be shown as
+
+.. jupyter-execute::
+   :hide-code:
+
+    plt.plot(x, y, label='data')
+    plt.plot(x, out.best_fit, label='best fit')
+    plt.plot(x, comps['bkg_'], label='background')
+    plt.plot(x, comps['peak_'], label='peak')
+    knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
+    plt.plot(knot_xvals, knot_yvals, 'o', color='black', label='spline knots values')
+
+    plt.legend()
+    plt.show()
+
+
+You might be interested in trying to assess what impact the select of the
+knots has on the resulting peak intensity.  For example, you might try some
+of the following set of knot values:
+
+.. jupyter-execute::
+   :hide-output:
+
+    knot_xvals1 = np.array([1, 3, 5, 7, 9, 11, 13,         19, 21, 23, 25])
+    knot_xvals2 = np.array([1, 3, 5, 7, 9, 11, 13,   16,   19, 21, 23, 25])
+    knot_xvals3 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25])
+
+
+and re-run the fit with these different sets of knot points.  The results
+are shown in the table below.
+
+
+.. _models_spline_results-table:
+
+ Table of Peak amplitudes with varying spline points
+
+ +-------------------+------+----------------------------------------+
+ | spline x points   |  N   | Peak amplitude value and uncertainty   |
+ +===================+======+========================================+
+ | knot_xvals1       | 11   |  12.223 (0.295)                        |
+ +-------------------+------+----------------------------------------+
+ | knot_xvals2       | 12   |  11.746 (0.594)                        |
+ +-------------------+------+----------------------------------------+
+ | knot_xvals3       | 13   |  12.052 (0.872)                        |
+ +-------------------+------+----------------------------------------+
+
+Adding more spline points, especially near the peak center around ``x=16.4``,
+can impact the measurement of the amplitude but the uncertainty increases
+dramatically enough to mostly cover the same range of values.  This is a
+interesting case of adding more parameters to a fit and having the
+uncertainties in the fitted parameters getting worse.  The interested
+reader is encouraged to explore the fit reports and plot these different case.
+
+
+Finally, the basic case above used 11 spline points to fit the baseline.
+In fact, it would be reasonable to ask whether that is enough parameters
+to fit the full spectra.  By imposing that there is also a Gaussian
+peak nearby makes the spline fit only the background, but without the
+Gaussian, the spline could fit the full curve.  By way of example, we'll
+just try increasing the number of spline points to fit this data
+
+.. jupyter-execute::
+   :hide-output:
+
+    plt.plot(x, y, 'o', label='data')
+    for nknots in (10, 15, 20, 25):
+	model = SplineModel(prefix='bkg_',   xknots=np.linspace(0, 25, nknots))
+	params = model.guess(y, x)
+	out = model.fit(y, params, x=x)
+	plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')
+
+    plt.legend()
+    plt.show()
+
+
+
+which will show the fit below:
+
+.. jupyter-execute::
+   :hide-code:
+
+    plt.plot(x, y, 'o', label='data')
+    for nknots in (10, 15, 20, 25):
+	model = SplineModel(prefix='bkg_',   xknots=np.linspace(0, 25, nknots))
+	params = model.guess(y, x)
+	out = model.fit(y, params, x=x)
+	plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')
+
+    plt.legend()
+    plt.show()
+
+
+By itself, 10 knots does not give a very good fit, but 25 knots or more
+does give a very good fit to the peak.  This should give some confidence
+that the fit with 11 parameters for the background spline is acceptable,
+but also give some reason to be careful in selecting the number of spline
+points to use.
diff --git a/examples/doc_builtinmodels_splinemodel.py b/examples/doc_builtinmodels_splinemodel.py
@@ -0,0 +1,63 @@
+# <examples/doc_builtinmodels_splinemodel.py>
+import matplotlib.pyplot as plt
+import numpy as np
+
+from lmfit.models import GaussianModel, SplineModel
+
+data = np.loadtxt('test_splinepeak.dat')
+x = data[:, 0]
+y = data[:, 1]
+
+plt.plot(x, y, label='data')
+
+model = GaussianModel(prefix='peak_')
+params = model.make_params(amplitude=8, center=16, sigma=1)
+
+# make a background spline with knots evenly spaced over the background,
+# but sort of skipping over where the peak is
+knot_xvals3 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25])
+knot_xvals2 = np.array([1, 3, 5, 7, 9, 11, 13,   16,   19, 21, 23, 25])  # noqa: E241
+knot_xvals1 = np.array([1, 3, 5, 7, 9, 11, 13,         19, 21, 23, 25])  # noqa: E241
+
+bkg = SplineModel(prefix='bkg_', xknots=knot_xvals1)
+params.update(bkg.guess(y, x))
+
+model = model + bkg
+
+params['peak_center'].min = 10
+params['peak_center'].max = 20
+params['peak_amplitude'].min = 0
+
+plt.plot(x, model.eval(params, x=x), label='initial')
+
+out = model.fit(y, params, x=x)
+print(out.fit_report(min_correl=0.3))
+comps = out.eval_components()
+
+plt.plot(x, out.best_fit, label='best fit')
+plt.plot(x, comps['bkg_'], label='background')
+plt.plot(x, comps['peak_'], label='peak')
+
+knot_yvals = np.array([o.value for o in out.params.values() if o.name.startswith('bkg')])
+plt.plot(knot_xvals1, knot_yvals, 'o', color='black', label='spline knots values')
+plt.legend()
+plt.show()
+
+
+#   knot positions         | peak amplitude
+#  11, 13, 19, 21          |  12.223  0.295
+#  11, 13, 16, 19, 21      |  11.746  0.594
+#  11, 13, 15, 17, 19, 21  |  12.052  0.872
+
+
+plt.plot(x, y, 'o', label='data')
+
+for nknots in (10, 15, 20, 25, 30):
+    model = SplineModel(prefix='bkg_', xknots=np.linspace(0, 25, nknots))
+    params = model.guess(y, x)
+    out = model.fit(y, params, x=x)
+    plt.plot(x, out.best_fit, label=f'best-fit ({nknots} knots)')
+plt.legend()
+plt.show()
+
+# <end examples/doc_builtinmodels_splinemodel.py>