nschloe/smoothfit

Smooth data fitting in N dimensions.
Python Makefile
nschloe Merge pull request #7 from nschloe/update-images
update images with properly scaled axes
Latest commit d27dec5 Aug 15, 2019
Type Name Latest commit message Commit time
Failed to load latest commit information. .circleci Aug 14, 2019 examples experimental Aug 14, 2019 smoothfit Aug 15, 2019 test Aug 15, 2019 .flake8 Sep 28, 2018 .gitignore Apr 3, 2018 LICENSE.txt Aug 13, 2019 Makefile Aug 7, 2019 README.md codecov.yml Apr 3, 2018 setup.py Aug 13, 2019 test_requirements.txt Apr 16, 2018

Given experimental data, it is often desirable to produce a function whose values match the data to some degree. This package implements a robust approach to data fitting based on the minimization problem (A similar idea is used in for data smoothing in signal processing; see, e.g., section 8.3 in this document.)

Unlike polynomial regression or Gauss-Newton, smoothfit makes no assumptions about the function other than that it is smooth.

The generality of the approach makes it suitable for function whose domain is multidimensional, too.

Pics or it didn't happen

Runge's example import matplotlib.pyplot as plt
import numpy
import smoothfit

a = -1.5
b = +1.5

# plot original function
x = numpy.linspace(a, b, 201)
plt.plot(x, 1 / (1 + 25 * x ** 2), "-", color="0.8", label="1 / (1 + 25 * x**2)")

# 21 sample points
x0 = numpy.linspace(-1.0, 1.0, 21)
y0 = 1 / (1 + 25 * x0 ** 2)
plt.plot(x0, y0, "xk")

u = smoothfit.fit1d(x0, y0, a, b, 1000, degree=1, lmbda=1.0e-6)
x = numpy.linspace(a, b, 201)
vals = [u(xx) for xx in x]
plt.plot(x, vals, "-", label="smooth fit")

plt.ylim(-0.1)
plt.grid()
plt.show()

Runge's example function is a tough nut for classical polynomial regression.

If there is no noise in the input data, the parameter lmbda can be chosen quite small such that all data points are approximated well. Note that there are no oscillations in the output function u.

Runge's example with noise

import matplotlib.pyplot as plt
import numpy
import smoothfit

a = -1.5
b = +1.5

# plot original function
x = numpy.linspace(a, b, 201)
plt.plot(x, 1 / (1 + 25 * x ** 2), "-", color="0.8", label="1 / (1 + 25 * x**2)")

# 21 sample points
numpy.random.seed(0)
n = 51
x0 = numpy.linspace(-1.0, 1.0, n)
y0 = 1 / (1 + 25 * x0 ** 2)
y0 += 1.0e-1 * (2 * numpy.random.rand(n) - 1)
plt.plot(x0, y0, "xk")

lmbda = 5.0e-2
u = smoothfit.fit1d(x0, y0, a, b, 1000, degree=1, lmbda=lmbda)
x = numpy.linspace(a, b, 201)
vals = [u(xx) for xx in x]
plt.plot(x, vals, "-", label="smooth fit")

plt.grid()
plt.show()

If the data is noisy, lmbda needs to be chosen more carefully. If too small, the approximation tries to resolve all data points, resulting in many small oscillations. If it's chosen too large, no details are resolved, not even those of the underlying data.

Few samples import numpy
import smoothfit

x0 = numpy.array([0.038, 0.194, 0.425, 0.626, 1.253, 2.500, 3.740])
y0 = numpy.array([0.050, 0.127, 0.094, 0.2122, 0.2729, 0.2665, 0.3317])
u = smoothfit.fit1d(x0, y0, 0, 4, 1000, degree=1, lmbda=1.0)

Some noisy example data taken from Wikipedia.

A two-dimensional example import meshzoo
import numpy
import smoothfit

n = 200
numpy.random.seed(123)
x0 = numpy.random.rand(n, 2) - 0.5
y0 = numpy.cos(numpy.pi * numpy.sqrt(x0.T ** 2 + x0.T ** 2))

# create a triangle mesh for the square
points, cells = meshzoo.rectangle(-1.0, 1.0, -1.0, 1.0, 32, 32)

u = smoothfit.fit2d(x0, y0, points, cells, lmbda=1.0e-4, solver="dense-direct")

# Write the function to a file
from dolfin import XDMFFile
xdmf = XDMFFile("temp.xdmf")
xdmf.write(u)

This example approximates a function from R2 to R (without noise in the samples). Note that the absence of noise the data allows us to pick a rather small lmbda such that all sample points are approximated well.

Comparison with other approaches

Polynomial fitting/regression The classical approach to data fitting is polynomial regression. Polynomials are chosen because they are very simple, can be evaluated quickly, and can be made to fit any function very closely.

There are, however, some fundamental problems:

This above plot highlights the problem with oscillations.

Fourier smoothing One approach to data fitting with smoothing is to create a function with all data points, and simply cut off the high frequencies after Fourier transformation.

This approach is fast, but only works for evenly spaced samples.

import matplotlib.pyplot as plt
import numpy

numpy.random.seed(0)

# original function
x0 = numpy.linspace(-1.0, 1.0, 1000)
y0 = 1 / (1 + 25 * x0 ** 2)
plt.plot(x0, y0, color="k", alpha=0.2)

# create sample points
n = 51
x1 = numpy.linspace(-1.0, 1.0, n)  # only works if samples are evenly spaced
y1 = 1 / (1 + 25 * x1 ** 2) + 1.0e-1 * (2 * numpy.random.rand(x1.shape) - 1)
plt.plot(x1, y1, "xk")

# Cut off the high frequencies in the transformed space and transform back
X = numpy.fft.rfft(y1)
X[5:] = 0.0
y2 = numpy.fft.irfft(X, n)
#
plt.plot(x1, y2, "-", label="5 lowest frequencies")

plt.grid()
plt.show()

Testing

To run the smoothfit unit tests, check out this repository and type

pytest