

REGRESSIONS et EQUATIONS INTEGRALES


Jean Jacquelin


Sample applications to various functions:

  • Gaussian <reei1-sec3>
  • Power, Exponential, Logarithmic, Weibull <reei2>
  • Sinusoidal <reei3>
  • Logistic
  • Generalization of Sinusoidal Regression
  • Damped Sinusoidal Regression
  • Sum of Two Exponentials and Sum of Two Powers
  • Multivariate Regression <reei10>


[ First Edition : 14 January 2009 - Updated : 3 January 2014 ]


Translator's Note

I came across this paper while searching for efficient, preferably non-iterative, routines for fitting exponentials. The techniques presented in this paper have served my purpose well, and this translation (and to some extent the entire scikit) were the result. I hope that my endeavors do justice to Jean Jacquelin's work, and prove as useful to someone else as they did to me.

The paper translated here is a compilation of related original papers by the author, gathered into a single multi-chapter unit. Some of the original material is in French and some in English. I have attempted to translate the French as faithfully as I could. I have also attempted to conform the English portions to what I consider to be modern American usage.

There are certain to be some imperfections and probably outright errors in the translation. The translation is not meant to be a monolith. It is presented on GitHub so that readers so inclined can easily submit corrections and improvements.

A small number of technical corrections were made to the content of this paper throughout the translation. In all cases, the errata are clearly marked with footnotes with detailed descriptions of the fix in question. Errata are to be submitted to Jean Jacquelin before publication of this translation.

I have re-generated the plots and tables in the paper as faithfully as I could, rather than copying them out of the original. This has provided a measure of confidence in the results. There may be some slight disagreement in the randomly generated datasets, but the discrepancies are both expected and minuscule.

-- Joseph Fox-Rabinovitz

23rd September 2018


Regressions and Integral Equations


Jean Jacquelin

Abstract

The primary aim of this publication is to draw attention to a rarely used method for solving certain types of non-linear regression problems.

The key to the method presented in this paper is the principle of linearization through differential and integral equations, which enables the conversion of a complex non-linear problem into simple linear regression.

The computation shown here differs fundamentally from traditional solutions to similar problems in that it is non-recursive, and therefore does not require the usual iterative approach.

In order to demonstrate the theory through concrete application, detailed numerical examples have been worked out. Regressions of power, exponential, and logarithmic functions are presented, along with the Gaussian and Weibull distributions commonly found in statistical applications.

Regressions and Integral Equations


Jean Jacquelin


The first revision of the paper Regressions and Integral Equations was dated 01/14/2009.
The current version was updated on 04/27/2009.

1. Introduction

The study presented here falls into the general framework of regression problems. For example, given the coordinates of a sequence of n points: (x1, y1), (x2, y2), ..., (xk, yk), ..., (xn, yn), we wish to find the function y = F(a, b, c, ...; x) which lies as close as possible to the data by optimizing the parameters a, b, c, ...

The commonly known solution to linear regression merits only a brief discussion, which is to be found in Appendix 1 <reei1-appendix1>. Some problems can be solved through linear regression even though they appear non-linear at first glance. The Gaussian cumulative distribution function is a specific example which is discussed in Appendix 2 <reei1-appendix2>.

Excepting such simple cases, we are confronted with the daunting problem of non-linear regression. There is extensive literature on the subject. Even the most cursory review would derail us from the purpose of this paper. It is also unnecessary because our goal is to reduce non-linear problems to linear regression through non-iterative and non-recursive procedures (otherwise, how would our proposed method be innovative with respect to existing methods?).

In the next section, we will proceed to the heart of the matter: rendering non-linear problems to linear form by means of suitable differential and/or integral equations. The preliminary discussion will show that in the context of such problems, integral equations tend to be more numerically stable than differential equations, with very few exceptions.

The principle of using integral equations will be explained and demonstrated in practice using the Gaussian probability distribution function as a concrete example. Other examples of regression using integral equations will be described in a detailed manner in the two following papers:

  • reei2
  • reei3

2. Principle of Linearization Through Differential and/or Integral Equations

We begin with a summary of numerical methods for approximating derivatives and integrals. Given n points (xk, yk) located near the curve of a function y(x), and given another function g(x), we can calculate approximations for the following derivatives and integrals with gk = g(xk):

$$D_k = \frac{g_{k+1} y_{k+1} - g_{k-1} y_{k-1}}{x_{k+1} - x_{k-1}} \simeq \left( \frac{d}{dx} g(x) y(x) \right)_{\left( x = x_k \right)}$$

$$DD_k = \frac{D_{k+1} - D_{k-1}}{x_{k+1} - x_{k-1}} \simeq \left( \frac{d^2}{dx^2} g(x)y(x) \right)_{\left( x = x_k \right)}$$

And so on, for subsequent derivatives, as necessary.

$$\begin{aligned} S_k \simeq \int_{x_1}^x g(u)y(u)du \quad \begin{cases} S_1 = 0 \hfill \text{and for $k = 2 \rightarrow n$:} & \\\ S_k = S_{k-1} + \frac{1}{2} (g_ky_k + g_{k-1}y_{k-1})(x_k - x_{k-1}) & \end{cases} \end{aligned}$$

$$\begin{aligned} SS_k \simeq \int_{x_1}^x \left( \int_{x_1}^v g(u)y(u)du \right)dv \quad \begin{cases} SS_1 = 0 \hfill \text{and for $k = 2 \rightarrow n$:} & \\\ SS_k = SS_{k-1} + \frac{1}{2} (S_k + S_{k-1})(x_k - x_{k-1}) & \end{cases} \end{aligned}$$

And so on, for subsequent integrals, as necessary.

It goes without saying that the points must first be ranked in order of ascending xk.

It is possible to use more sophisticated approximations for numerical differentiation and integration. Nothing prevents us from selecting a lower limit (or lower limits) of integration other than x1, and using different limits for the successive integrations. However, that would complicate the formulas and explanations unnecessarily. For the sake of simplicity, let us agree to use these formulas, at least for this stage of the presentation.
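As a concrete illustration of these accumulation formulas, here is a minimal Python/NumPy sketch (the function names are mine, not the paper's), assuming the data are already sorted by ascending xk:

import numpy as np

def cumulative_integral(x, w):
    """S_k ~ integral from x_1 to x_k of w(u) du, by the trapezoid rule."""
    S = np.zeros_like(x, dtype=float)
    S[1:] = np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))
    return S

def central_difference(x, w):
    """D_k ~ dw/dx at x_k, using the two-sided formula above (endpoints left as NaN)."""
    D = np.full_like(x, np.nan, dtype=float)
    D[1:-1] = (w[2:] - w[:-2]) / (x[2:] - x[:-2])
    return D

# Usage sketch, with g(x) = 1:
# S  = cumulative_integral(x, g * y)   # first integral, S_k
# SS = cumulative_integral(x, S)       # second integral, SS_k
# D  = central_difference(x, g * y)    # first derivative, D_k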

Returning to our initial formulation of the problem, we wish to optimize the parameters a, b, c, ... of a function y(a, b, c, ...; x) so that its curve approaches the n points (xk, yk) as closely as possible. Evidently, the exact expressions of the derivatives and anti-derivatives of the function depend on the parameters a, b, c, .... However, the approximate values calculated using the formulas shown above, i.e. the numerical values of Dk, DDk, ..., Sk, SSk, ..., are computed solely from the data points xk, yk, without requiring prior knowledge of a, b, c, .... This observation is the crux of the method that is to be shown.

Let us suppose that the function y(a, b, c, ...; x) is the solution to a differential and/or integral equation of the form:

$$y(x) = A \; \Phi(x) + B\int G(x)y(x)dx + C\int\int H(x)y(x)dx^2 + ... + \alpha\frac{d}{dx}g(x)y(x) + \beta\frac{d^2}{dx^2}h(x)y(x) + ...$$

with Φ(x), G(x), H(x), ..., g(x), h(x), ... predetermined functions independent of a, b, c, ..., and A, B, C, ..., α, β, ... dependent on a, b, c, .... The approximate values are then respectively (with Φk = Φ(xk); Gk = G(xk); Hk = H(xk); ...)[errata-reei-1]:

$$D_k = \frac{g_{k+1}y_{k+1} - g_{k-1}y_{k-1}}{x_{k+1} - x_{k-1}}$$

$$DD_k = \frac{\Delta_{k+1} - \Delta_{k-1}}{x_{k+1} - x_{k-1}} \quad \text{with} \quad \Delta_k = \frac{h_{k+1}y_{k+1} - h_{k-1}y_{k-1}}{x_{k+1} - x_{k-1}}$$

$$S_1 = 0 \quad ; \quad S_k = S_{k-1} + \frac{1}{2}\left(G_ky_k + G_{k-1}y_{k-1}\right)\left(x_k - x_{k-1}\right)$$

$$\begin{aligned} \begin{cases} SS_1 = 0 \quad ; \quad SS_k = SS_{k-1} + \frac{1}{2}\left(\Xi_k + \Xi_{k-1}\right)\left(x_k - x_{k-1}\right) \\\ \text{with: } \Xi_1 = 0 \quad ; \quad \Xi_k = \Xi_{k-1} + \frac{1}{2}\left(H_ky_k + H_{k-1}y_{k-1}\right)\left(x_k - x_{k-1}\right) \end{cases} \end{aligned}$$

If we replace the exact derivatives and/or anti-derivatives with their numerical approximations, the equation will no longer hold true. We can therefore work with the sum of the squares of the residuals:

$$\sum_{k=1}^n \varepsilon_k^2 = \sum_{k=1}^n \left( -y_k + A \; \Phi_k + B \; S_k + C \; SS_k + ... + \alpha \; D_k + \beta \; DD_k + ... \right)^2$$

The relationship is linear with respect to A, B, C, ..., α, β, .... We have therefore returned to classical linear regression, which allows us to calculate the optimal values of A0, B0, C0, ..., α0, β0, .... Finally, since A, B, C, ..., α, β, ... are known functions of a, b, c, ..., we must solve the system of equations A(a, b, c, ...) = A0 ; B(a, b, c, ...) = B0 ; ... ; α(a, b, c, ...) = α0 ; β(a, b, c, ...) = β0 to obtain the optimal values of the parameters a, b, c, ....
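In practice this step is a single ordinary least-squares solve on the columns Φk, Sk, SSk, ..., Dk, DDk, .... A brief sketch (mine, with illustrative names), assuming those columns have already been computed as NumPy arrays:

import numpy as np

def linearized_fit(y, *columns):
    """Least-squares estimates of A0, B0, C0, ..., alpha0, beta0, ...

    Each element of `columns` is one of the precomputed vectors
    (Phi_k, S_k, SS_k, ..., D_k, DD_k, ...), in the same order as the
    coefficients to be returned.
    """
    M = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(M, y, rcond=None)
    return coeffs

# e.g. for an equation y = A*Phi(x) + B*int(G*y) + alpha*d(g*y)/dx:
# A0, B0, alpha0 = linearized_fit(y, Phi, S, D)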

There are some additional considerations that must be taken into account when choosing the differential and/or integral equation. Other than the requirement for linearity in its coefficients (but not necessarily in the functions, since we have the choice of G(x), H(x), ..., g(x), h(x), ...), the equation should preferably have as many coefficients A0, B0, ..., α0, β0, ... as there are initial parameters a, b, c, ... to optimize. If there are fewer coefficients, an additional regression (or regressions) will be necessary to calculate the coefficients that do not figure explicitly in the equation.

Moreover, to avoid an overburdened explanation, we have been considering a simplified form of differential and/or integral equation. In fact, the equation could have any number of different functions Φ(x), several different derivatives (corresponding to various choices of g(x)), several different integrals (corresponding to various choices of G(x)), and so on for subsequent multiple derivatives and integrals.

Clearly, there is a multitude of choices for the differential and/or integral equation that we bring to bear on the problem. However, practical considerations limit our choices. One of the main stumbling blocks is the difficulty inherent in numerical approximation of derivatives. In fact, in cases where the points have an irregular distribution, and are too sparse, and if, to make matters worse, the values of yk are not sufficiently precise, the computed derivatives will fluctuate so much and be so dispersed as to render the regression ineffective. On the other hand, numerical approximations of integrals retain their stability even in these difficult cases (this does not mean that the inevitable deviations are insignificant, but that they remain damped, which is essential for the robustness of the overall process). Except in special cases, the preference is therefore to seek an integral equation rather than a differential one.

The generality of the presentation that has just been made may give the impression that the method is complicated and difficult to implement in practice. The fact of the matter is quite the opposite, as we will see once we shift focus from the abstract discussion of many possibilities to solving a single concrete example.

One of the most spectacular examples is that of sinusoidal regression (which we only mention in passing, without going into depth here, but which will be treated in detail in the attached paper reei3). It concerns the optimization of the parameters a, b, c and ω in the equation:


y(x) = a + b sin(ωx) + c cos(ωx)

This function is the solution to the differential equation:

$$y(x) = A + B \frac{d^2y}{dx^2} \quad \text{with} \quad A = a \quad \text{and} \quad B = -\frac{1}{\omega^2}$$

This is a linear equation with respect to A and B, which are themselves (very simple) functions of a and ω. Moreover, the parameters b and c no longer figure in the differential equation directly. This case is therefore a typical application of the method, and among the easiest to implement, except that it contains a second derivative, which makes it virtually useless. Fortunately, there is no a-priori reason not to use an integral equation whose solution is a sinusoid instead. The integral method is hardly any more complicated and gives largely satisfactory results (which are studied in detail in the attached paper: reei3).

For our first demonstration of the method, let us look for a simpler example. In the following section, the method of regression through an integral equation will be applied to the Gaussian probability density function.

3. Example: Illustration of the Gaussian Probability Density Function

We consider a probability density function of two parameters, σ and μ, defined by

$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \; \text{exp} \left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)$$

The general notation y(x) of the previous sections is replaced with f(x) here due to the specificity of this case.

The integration gauss-pdf-int leads to the integral equation gauss-pdf-eq, of which f(x) is the solution[errata-reei-2]:


$$\int_{x_1}^x (t - \mu) f(t) dt = -\sigma^2 \left( f(x) - f(x_1) \right)$$

$$\begin{aligned} \begin{cases} f(x) - f(x_1) = A \int_{x_1}^x f(t)dt + B \int_{x_1}^x t \; f(t)dt \\\ \text{with:} \quad A = \frac{\mu}{\sigma^2} \quad \text{and} \quad B = -\frac{1}{\sigma^2} \end{cases} \end{aligned}$$

This is a linear integral equation, consisting of two simple integrals, which places it among the applications mentioned in the previous section. We compute the respective approximations, the first denoted S with G(x) = 1, and the second denoted T with G(x) = x:

$$\begin{aligned} \begin{cases} S_1 = 0 \\\ S_k = S_{k-1} + \frac{1}{2}\left(f_k + f_{k-1}\right) \left(x_k - x_{k-1}\right) \quad k = 2 \rightarrow n \end{cases} \end{aligned}$$

$$\begin{aligned} \begin{cases} T_1 = 0 \\\ T_k = T_{k-1} + \frac{1}{2}\left(x_k f_k + x_{k-1} f_{k-1}\right) \left(x_k - x_{k-1}\right) \quad k = 2 \rightarrow n \end{cases} \end{aligned}$$

When we replace f(xk) with fk, f(x1) with f1 and the integrals with Sk and Tk, respectively, equation gauss-pdf-eq no longer holds true. We seek to minimize the sum of the squares of the residuals:

$$\sum_{k=1}^n \varepsilon_k^2 = \sum_{k=1}^n \left( -\left( f_k - f_1 \right) + A \; S_k + B \; T_k \right)^2$$

Notice that had we chosen a lower limit of integration different from x1, it would have changed not only the value of f1, but also the numerical values of Sk and Tk, in a way that would cancel out the differences without changing the final result.

The relationship gauss-pdf-resid is none other than the equation of a linear regression, which we know how to optimize for the parameters A1 and B1[errata-reei-3]:

$$\begin{aligned} \begin{bmatrix}A_1 \\ B_1\end{bmatrix} = \begin{bmatrix} \sum \left(S_k\right)^2 & \sum S_k T_k \\\ \sum S_k T_k & \sum \left(T_k\right)^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum \left(f_k - f_1\right) S_k \\\ \sum \left(f_k - f_1\right) T_k \end{bmatrix} \end{aligned}$$

By convention, $\sum \equiv \sum_{k=1}^n$. We then deduce σ1 and μ1 according to gauss-pdf-eq[errata-reei-4]:

$$\sigma_1 = \sqrt{-\frac{1}{B_1}} \quad ; \quad \mu_1 = -\frac{A_1}{B_1}$$

Here is a summary of the numerical computation:

listing
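As an illustration only (a sketch of my own, not the author's listing), the whole computation fits in a few lines of Python/NumPy, with x and f the data arrays sorted by ascending x:

import numpy as np

def gauss_pdf_regression(x, f):
    """Non-iterative estimate of (sigma1, mu1) for the Gaussian density.

    S and T are the cumulative trapezoid approximations of the two
    integrals; a 2x2 linear solve then gives A1 and B1.
    """
    dx = np.diff(x)
    S = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dx)))
    T = np.concatenate(([0.0], np.cumsum(0.5 * (x[1:] * f[1:] + x[:-1] * f[:-1]) * dx)))

    M = np.array([[np.sum(S**2),  np.sum(S * T)],
                  [np.sum(S * T), np.sum(T**2)]])
    v = np.array([np.sum((f - f[0]) * S),
                  np.sum((f - f[0]) * T)])
    A1, B1 = np.linalg.solve(M, v)

    sigma1 = np.sqrt(-1.0 / B1)   # B1 is expected to be negative
    mu1 = -A1 / B1
    return sigma1, mu1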

To illustrate the calculation (reei-gauss-pdf-plot), numerical data (reei-gauss-pdf-data) was generated in the following manner: xk values were chosen at random from the domain under consideration. From the "exact" values σe and μe (defining the "exact" function f(x), whose representative curve is plotted as a dashed line in reei-gauss-pdf-plot), we computed the exact values of f(xk) given by equation gauss-pdf-fx[errata-reei-7]. We then added perturbations whose amplitude was drawn randomly from a range of -10% to +10% of f(xk), which, after rounding, gave us the numerical values of fk in reei-gauss-pdf-data.

The outrageous error model is motivated by the need for legibility in the figure, so that the so-called "experimental" points, represented by crosses, lie far enough from the dashed curve to be clearly distinguishable. In the same vein, only a handful of points were chosen, so that the deviations between the "exact" dashed curves and the calculated solid curves are highlighted for both the intermediate and the final calculation. The fact that the points are not uniformly distributed across the domain is also a significant complication.

A sample regression of the Gaussian probability density function.

In reei-gauss-pdf-plot, the shape of the curves of the "exact" integrals and the points (xk, Sk) and (xk, Tk) makes the primary reason for the deviations in this method of calculation clearly apparent: numerical integration, while preferable to differentiation, is not by any means perfect, and causes the deviations in (σ1, μ1).

To form an objective opinion about the qualities and defects of the method exposed here, it would be necessary to perform a systematic experimental study of a very large number of cases and examples. In the current state of progress of this study, this remains yet to be done.

It is certain that the deviations, caused by the defects inherent in numerical integration, would be considerably reduced if the points were sufficiently numerous and their abscissae were partitioned at sufficiently regular intervals.

4. Discussion

It would be unreasonable to imagine that the method presented here could replace what is currently in use in commercial software, which has the benefits of long term study, experimentation and improvements. We must ask, then: what is the motivation behind this work?

Of course, recursive methods generally require a first approximation within the same order of magnitude as the target value. This is not generally a handicap since users are not completely in the dark. One might be tempted to consider this method of integral equations to satisfy the need for initial approximation. However, the need is quite marginal, and should not be seen as a serious motivation.

A simple method, easy to program, like the one shown here, might certainly seduce some potential users in particular situations where we seek full mastery over the calculations that are performed. Users of commercial software are usually satisfied with the results they provide, but may occasionally deplore not knowing precisely what the sophisticated software is doing. Nevertheless, it would be poor motivation indeed for this study to attempt to provide a less powerful tool than what already exists, just to alleviate some of the frustration caused by using tools whose exact mechanisms are unknown.

In fact, we must see in this paper not a specific utilitarian motivation in the case of the Gaussian distribution, but the intention to understand and draw attention to a more general idea: the numerous possibilities offered by integral equations to transform a problem of non-linear regression into a linear regression, computed through a non-iterative process.

It is out of the question to compete against what already exists, and what is more important, works well. On the other hand, it would be a pity to forget a method that might potentially help resolve future problems: the method that has been the subject of this paper, whose essence is presented in Section 2 <reei1-sec2>.


Appendix 1: Review of Linear Regression

When the function that we seek to optimize, y = F(a, b, c, ...; x), can be written in the form y = af(x) + bg(x) + ch(x) + ..., according to some number of parameters a, b, c, ... and with known functions f(x), g(x), h(x), ..., the algorithm is linear with respect to the optimization parameters.

Even more generally, if the function y = F(a, b, c, ...; x) can be transformed into F(x, y) = Af(x, y) + Bg(x, y) + Ch(x, y) with known functions F(x, y), f(x, y), g(x, y), h(x, y), ..., A(a, b, c, ...), B(a, b, c, ...), C(a, b, c, ...), ... the algorithm is again linear with respect to the coefficients A, B and C, even if it is not linear with respect to a, b, c, .... But it always reverts to a linear regression. The "least squares" method effectively consists of finding the minimum of:

$$\begin{aligned} \begin{cases} \varepsilon^2_{\left(A, B, C, ...\right)} = \sum_{k=1}^n \left( F_k - \left( A \; f_k + B \; g_k + C \; h_k + ... \right) \right)^2 \\\ F_k \equiv F(x_k, y_k) \; ; \; f_k \equiv f(x_k, y_k) \; ; \; g_k \equiv g(x_k, y_k) \; ; \; h_k \equiv h(x_k, y_k) \end{cases} \end{aligned}$$

The partial derivatives with respect to A, B, C, ... determine a system of equations for which solutions, A0, B0, C0, ... are optimal:

$$\begin{aligned} \begin{cases} \left( \frac{\partial \left(\varepsilon^2 \right)}{\partial A} \right)_{A_0, B_0, C_0, ...} = -\sum_{k=1}^n \left( F_k - \left( A_0 \; f_k + B_0 \; g_k + C_0 \; h_k + ... \right) \right) f_k = 0 \\\ \left( \frac{\partial \left( \varepsilon^2 \right)}{\partial B} \right)_{A_0, B_0, C_0, ...} = -\sum_{k=1}^n \left( F_k - \left( A_0 \; f_k + B_0 \; g_k + C_0 \; h_k + ... \right) \right) g_k = 0 \\\ \left( \frac{\partial \left( \varepsilon^2 \right)}{\partial C} \right)_{A_0, B_0, C_0, ...} = -\sum_{k=1}^n \left( F_k - \left( A_0 \; f_k + B_0 \; g_k + C_0 \; h_k + ... \right) \right) h_k = 0 \\\ ... \end{cases} \end{aligned}$$

The solution to this system, conventionally written with $\sum \equiv \sum_{k=1}^n$ leads to:

$$\begin{aligned} \begin{bmatrix} A_0 \\ B_0 \\ C_0 \\ ... \end{bmatrix} = \begin{bmatrix} \sum f_k^2 & \sum f_k g_k & \sum f_k h_k & ... \\\ \sum f_k g_k & \sum g_k^2 & \sum g_k h_k & ... \\\ \sum f_k h_k & \sum g_k h_k & \sum h_k^2 & ... \\\ ... & ... & ... & ... \end{bmatrix}^{-1} \begin{bmatrix} \sum F_k f_k \\ \sum F_k g_k \\ \sum F_k h_k \\ ... \end{bmatrix} \end{aligned}$$
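For readers following along in code, the system above is just a Gram-matrix solve; a minimal sketch (names are illustrative, not from the paper):

import numpy as np

def general_linear_regression(F, columns):
    """Solve the normal equations above for A0, B0, C0, ...

    `F` holds the values F(x_k, y_k); `columns` is a sequence of the
    vectors f(x_k, y_k), g(x_k, y_k), h(x_k, y_k), ...
    """
    M = np.column_stack(columns)
    gram = M.T @ M        # entries are the sums such as sum(f_k * g_k)
    rhs = M.T @ F         # entries are the sums such as sum(F_k * f_k)
    return np.linalg.solve(gram, rhs)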

Then we obtain the optimized values of a, b, c, ... corresponding to the following system, where the unknowns are a0, b0, c0, ...:

$$\begin{aligned} \begin{cases} A(a_0, b_0, c_0, ...) = A_0 \\\ B(a_0, b_0, c_0, ...) = B_0 \\\ C(a_0, b_0, c_0, ...) = C_0 \\\ ... \end{cases} \end{aligned}$$

which is a system that is non-linear in the same measure that the functions A(a, b, c, ...), B(a, b, c, ...), C(a, b, c, ...), ... are non-linear. But this does not prevent the regression that was performed from being linear, so even this case has its rightful place in this section.

Of course, this can be further extended by considering more variables, for example x, y, z, t, ..., instead of just x, y, thus working in 3D, or 4D, ..., instead of 2D. Everything mentioned here appears in the literature in a more detailed and more structured manner, with presentations adapted to the exposition of general theory. The purpose here was only to present a brief review, with a specific notation to be used consistently throughout the remainder of the work.


Appendix 2: Linear Regression of the Gaussian Cumulative Distribution Function

We consider the cumulative Gaussian distribution function of two parameters, σ and μ, defined by[errata-reei-8]:

$$F(x) = \frac{1}{\sqrt{2 \pi} \; \sigma} \int_{-\infty}^x \text{exp} \left( -\frac{1}{2} \left( \frac{t - \mu}{\sigma}\right)^2 \right) dt$$

An example is shown in reei-gauss-cdf-plot (the dashed curve).

Sample regression of the Gaussian cumulative distribution function.

The data are the points that we call "experimental": (x1, F1), (x2, F2), ..., (xk, Fk), ..., (xn, Fn), which, in reei-gauss-cdf-plot[errata-reei-10] have a particular dispersion relative to their respective nominal positions (xk,F(xk)) on the dashed curve representing F(x).

We can write F(x) equivalently in terms of the Erf function, pronounced "error function" and defined by:

$$\text{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z \text{exp} \left( -\tau^2 \right) d\tau$$

The change of variable $t = \mu + \sqrt{2} \; \sigma \; \tau$ in gauss-cdf-fx gives the relationship:

$$F(x) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\frac{x - \mu}{\sqrt{2} \sigma}} \text{exp} \left( -\tau^2 \right) d\tau = \frac{1}{2} + \frac{1}{2} \text{erf} \left( \frac{x - \mu}{\sqrt{2} \sigma} \right)$$

The inverse function to Erf is designated Erf(-1), or Erfinv or argErf. We will use the last notation.

Thus, the inverse relationship to gauss-cdf-fx2 can be written:

$$\frac{x - \mu}{\sqrt{2} \sigma} = \text{argErf}(2F(x) - 1)$$

This in turn leads to a linear relationship in A and B, defined by:

$$\begin{aligned} y(x) = \text{argErf}(2F(x) - 1) = A \; x + B \quad \begin{cases} A = \frac{1}{\sqrt{2} \sigma} \\\ B = -\frac{\mu}{\sqrt{2} \sigma} \end{cases} \end{aligned}$$

This reduces to the most elementary form of linear regression, relative to the points (xk, yk) with yk previously calculated from:


yk = argErf(2Fk − 1)

The optimal values of A1 and B1 are the solutions to the following system:

$$\begin{aligned} \begin{bmatrix} A_1 \\ B_1 \end{bmatrix} = \begin{bmatrix} \sum \left(x_k\right)^2 & \sum x_k \\\ \sum x_k & n \end{bmatrix}^{-1} \begin{bmatrix} \sum y_k x_k \\\ \sum y_k \end{bmatrix} \end{aligned}$$

with the convention that $\sum \equiv \sum_{k=1}^n$. We then deduce σ1 and μ1 from gauss-cdf-ab:

$$\sigma_1 = \frac{1}{\sqrt{2} A_1} \quad ; \quad \mu_1 = -\frac{B_1}{A_1}$$

For the example shown, the numerical values of σ1 and μ1 that we obtain are shown in reei-gauss-cdf-data[errata-reei-9]. The curve of the corresponding function is marked with a solid line in reei-gauss-cdf-plot [errata-reei-11]. It neighbors the "theoretical" curve, which is dashed.

In fact, the example was intentionally chosen to have a very small number of widely dispersed points, so as to make the two curves clearly distinct from each other. This understates, rather than represents, the quality of fit that can usually be obtained.

Here is a summary of the very simple numerical computation:

listing
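As an illustration only (a sketch of my own, not the author's listing), assuming SciPy's erfinv is available to play the role of argErf:

import numpy as np
from scipy.special import erfinv   # plays the role of argErf

def gauss_cdf_regression(x, F):
    """Estimate (sigma1, mu1) by linear regression on y_k = argErf(2*F_k - 1)."""
    y = erfinv(2.0 * F - 1.0)
    M = np.array([[np.sum(x**2), np.sum(x)],
                  [np.sum(x),    len(x)]])
    v = np.array([np.sum(y * x), np.sum(y)])
    A1, B1 = np.linalg.solve(M, v)
    sigma1 = 1.0 / (np.sqrt(2.0) * A1)
    mu1 = -B1 / A1
    return sigma1, mu1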

If a software package implementing argErf is not readily available, a sample listing for the functions Erf and argErf is shown in the next section.


Note: The material in Appendices 1 and 2 is well known. All the same, it was useful to highlight the fundamental differences between the regression problems reviewed in Appendix 1 <reei1-appendix1> and those in Section 2 <reei1-sec2> of the main paper. It is equally useful to provide examples that showcase the noteworthy differences between the application of regression to the Gaussian distribution, on one hand to the cumulative function (Appendix 2 <reei1-appendix2>) and on the other to the more difficult case of the density function (Section 3 <reei1-sec3> of the main text).

Listings for the functions Erf and argErf

The approximate values of Erf(x) are obtained with a minimum of eight significant digits of precision. We use the following series expansion:

$$\begin{aligned} \text{Erf}(x) \simeq \frac{2x}{\sqrt{\pi}} \sum_{k=0}^{30} \frac{(-1)^k x^{2k}}{k!(2k + 1)} \begin{cases} \left| x \right| < 2.7 \\\ \left| \text{Erf}(x) \right| < 0.999866 \end{cases} \end{aligned}$$

completed by the asymptotic expansion:

$$\begin{aligned} \text{Erf}(x) \simeq \pm 1 - \frac{e^{-x^2}}{x \sqrt{\pi}} \sum_{k=0}^5 \frac{(-1)^k (2k + 1)!!}{x^{2k}} \begin{cases} + \enspace \text{if} \enspace x > 2.7 \enspace ; \enspace - \enspace \text{if} \enspace x < -2.7 \\\ (2k + 1)!! = 1 * 3 * ... * (2k + 1) \\\ 0.999865 < \left| \text{Erf}(x) \right| < 1 \end{cases} \end{aligned}$$

The inverse function argErf(y) is calculated using Newton-Raphson's method. The result is obtained to at least eight significant digits of precision if |y| < 0.999999999998 → |argErf(y)| < 5. Outside that domain, the result is not meaningful.

The following listing is written in Pascal, and uses only the most elementary constructs and syntax. It should not be difficult to translate into any other language of choice.

Function Erf(x:extended):extended;
{ Series expansion for |x| < 2.7; asymptotic expansion otherwise }
var
   y,p:extended;
   k:integer;
begin
     y:=0;
     p:=1;
     if ((x>-2.7)and(x<2.7)) then
     begin
          { p holds (-x^2)^k / k! ; each term added is p/(2k+1) }
          for k:=0 to 30 do
          begin
               y:=y+p/(2*k+1);
               p:=-p*x*x/(k+1);
          end;
          y:=y*2*x/sqrt(pi);
     end else
     begin
          { asymptotic expansion, six terms }
          for k:= 0 to 5 do
          begin
               y:=y+p;
               p:=-p*(2*k+1)/(2*x*x);
          end;
          y:=y*exp(-x*x)/(x*sqrt(pi));
          if x>0 then y:=1-y else y:=-1-y;
     end;
     Erf:=y;
end;

Function argErf(y:extended):extended;
{ Newton-Raphson: x <- x + (y - Erf(x)) / Erf'(x), since Erf'(x) = 2*exp(-x*x)/sqrt(pi) }
var
   x:extended;
   k:integer;
begin
     x:=0;
     for k:=1 to 30 do
     begin
          x:=x+exp(x*x)*sqrt(pi)*(y-Erf(x))/2;
     end;
     argErf:=x;
end;
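Where SciPy is available, scipy.special.erf and scipy.special.erfinv serve the same purpose as the Pascal routines above and provide a convenient cross-check, for example:

from scipy.special import erf, erfinv

# Spot-check that erfinv inverts erf (round-trip error should be far below 1e-8).
for z in (0.1, 0.5, 1.0, 2.0, 3.0):
    assert abs(erfinv(erf(z)) - z) < 1e-8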

We can test the computation by comparing against the values in the table below (as well as the negatives of the same values):


Non-Linear Regression of the Types: Power, Exponential, Logarithmic, Weibull


Jean Jacquelin


The first revision of this paper was dated 01/18/2009.
The current version was updated on 04/23/2009.

Abstract

The parameters of power, exponential, logarithmic and Weibull functions are optimized by a non-iterative regression method based on an appropriate integral equation.

Non-Linear Regression of the Types: Power, Exponential, Logarithmic, Weibull


Jean Jacquelin

1. Introduction

The following two examples of regression will be treated simultaneously:


y = a + b X^c  (X > 0)


y = a + b exp(cx)

Indeed, if we take formula pow-fx, with data points (X1, y1), ..., (Xk, yk), ..., (Xn, yn), we can pre-calculate


xk = ln(Xk)

which turns it into exp-fx with the data points (x1, y1), ..., (xk, yk), ..., (xn, yn).

Various other equations can be transformed into these two formulae:

  • The equation y = a + b′ exp(c (x − μ)) is identical to exp-fx with the substitution b = b′ exp( − cμ).
  • The equation y = α + β ln(x − γ) turns into exp-fx when we interchange the x and y values. This motivates the regression of logarithmic functions of three parameters.


  • And so on. In particular, the case of the Weibull function of three parameters will be treated in Section 3 <reei2-sec3>.

The method to be used was described in Section 2 <reei1-sec2> of the article reei1-paper.

2. Regression of Functions of the Form y(x) = a + b exp(cx)

Integrating the function y(x) results in:

$$\int_{x1}^x y(u)du = a \; (x - x_1) + \frac{b}{c} \; \text{exp}(c x) - \frac{b}{c} \; \text{exp}(c x_1)$$

Substituting for exp(cx) from exp-fx:

$$\int_{x1}^x y(u)du = a \; (x - x_1) + \frac{1}{c} \; (y - a) - \frac{b}{c} \; \text{exp}(c \; x_1)$$

From which we have the integral equation to be used:


$$y - \left( a + b \; \text{exp}(c \; x_1) \right) = -a \; c \; (x - x_1) + c \int_{x_1}^x y(u) du$$

The numerical approximations of the integral for x = xk are calculated with:

$$\begin{aligned} \begin{cases} S_1 = 0 \quad \text{and for} \enspace k = 2 \rightarrow n \enspace \text{:} \\\ S_k = S_{k-1} + \frac{1}{2}(y_k + y_{k-1})(x_k - x_{k-1}) \end{cases} \end{aligned}$$

When we replace the exact unknowns in exp-eq with their respective approximations, the equation no longer holds true:


$$y_k - y_1 \simeq -a \; c \; (x_k - x_1) + c \; S_k$$

Minimizing the sum of the squared residuals:

$$\sum_{k=1}^n \varepsilon_k^2 = \sum_{k=1}^n \left( A \; (x_k - x_1) + B \; S_k - (y_k - y_1) \right)^2$$

with


A =  − ac ; B = c

results in a linear regression with respect to the coefficients A and B, whose optimal values A1 and B1 can be obtained in the usual manner (with the convention that $\sum \equiv \sum_{k=1}^n$):

$$\begin{aligned} \begin{bmatrix}A_1 \\ B_1\end{bmatrix} = \begin{bmatrix} \sum \left((x_k - x_1)^2\right) & \sum (x_k - x_1) \; S_k \\\ \sum (x_k - x_1) \; S_k & \sum \left(S_k^2\right) \end{bmatrix}^{-1} \begin{bmatrix} \sum (y_k - y_1)(x_k - x_1) \\\ \sum (y_k - y_1) \; S_k \end{bmatrix} \end{aligned}$$

We then get the optimal values a1 and c1 according to exp-param1:

$$a_1 = -\frac{A_1}{B_1} \quad ; \quad c_1 = B_1$$

Only two parameters are resolvable from the form of integral equation that we chose. The third parameter appears in the numerical calculations, but not directly in the equation, so another regression is necessary to obtain it. In fact, considering exp-fx, the result would be further improved by computing the regression on both a and b:

$$\sum_{k=1}^n \varepsilon_k^2 = \sum_{k=1}^n \left( a + b \; \text{exp}(c_1 \; x_k) - y_k \right)^2$$

Setting:


c2 = c1 ; θk = exp(c2xk)

the resulting regression is:

$$\begin{aligned} \begin{bmatrix}a_2 \\ b_2\end{bmatrix} = \begin{bmatrix} n & \sum \theta_k \\\ \sum \theta_k & \sum \theta_k^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum y_k \\\ \sum y_k \theta_k \end{bmatrix} \end{aligned}$$

Here is a summary of the computation:

listing
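As an illustration only (a sketch of my own, not the author's listing), the two regressions can be chained as follows, assuming x is sorted ascending:

import numpy as np

def exp_regression(x, y):
    """Non-iterative fit of y = a + b*exp(c*x).

    The first regression (on S_k) yields c; a second ordinary regression
    then yields a and b with c held fixed.
    """
    S = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))
    dx1 = x - x[0]
    dy1 = y - y[0]

    M1 = np.array([[np.sum(dx1**2),  np.sum(dx1 * S)],
                   [np.sum(dx1 * S), np.sum(S**2)]])
    v1 = np.array([np.sum(dy1 * dx1), np.sum(dy1 * S)])
    A1, B1 = np.linalg.solve(M1, v1)
    c = B1                              # c1 = B1 (a1 = -A1/B1 is not needed below)

    theta = np.exp(c * x)
    M2 = np.array([[len(x),        np.sum(theta)],
                   [np.sum(theta), np.sum(theta**2)]])
    v2 = np.array([np.sum(y), np.sum(y * theta)])
    a, b = np.linalg.solve(M2, v2)
    return a, b, c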

To illustrate this calculation (reei-exp-plot), numerical data were generated in the following manner: xk were drawn at random from the domain under consideration. Initially, "exact" values ae, be and ce defined the function y(x), which we call the "true" form of equation exp-fx, and whose curve is traced by a dotted line in the figure. The exact values of y(xk) were then assigned perturbations of an amplitude drawn at random from a range between -10% and +10% of y(xk), which, after rounding, produced the numerical values of yk in reei-exp-data, represented by crosses in the figure.

Finally, the result (a2, b2, c2) was substituted into equation exp-fx, to obtain the fitted curve, plotted as a solid line.

A sample regression for the function y = a + b exp(cx)

To form an objective opinion of the qualities and defects of the method presented here, it would be necessary to perform a systematic study of a very large number of cases and examples. It is certain even now that the deviations caused by errors in numerical integration would be considerably reduced as the number of points increases and their spacing becomes more uniform.

3. Regression of the Three-Parameter Weibull Cumulative Distribution Function

The Weibull cumulative distribution function of three parameters (α, β, and μ) is defined by:

$$F(t) = 1 - \text{exp} \left( -\left( \frac{t - \mu}{\beta} \right)^\alpha \right)$$

For data given as (t1, F1), ..., (tk, Fk), ..., (tn, Fn), we seek to optimize μ, β, and α in such a way that relationship weibull-cdf-fx is approximately and optimally satisfied across the n data points.

The inverse function of weibull-cdf-fx is:

$$t = \mu + \beta \left( -\text{ln}(1 - F) \right)^{\frac{1}{\alpha}}$$

Setting:


x = ln(−ln(1−F))

and:

$$y = t \quad ; \quad a = \mu \quad ; \quad b = \beta \quad ; \quad c = \frac{1}{\alpha}$$

We can see immediately that we have transformed the problem into exp-fx, shown above: y = a + b exp(cx).

The algorithm is deduced immediately as we transpose the notation:

listing
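A sketch of that transposition (mine, reusing the hypothetical exp_regression helper sketched in the previous section):

import numpy as np

def weibull_cdf_regression(t, F):
    """Fit F(t) = 1 - exp(-((t - mu)/beta)**alpha) by the substitution above."""
    x = np.log(-np.log(1.0 - F))        # x = ln(-ln(1 - F))
    y = np.asarray(t, dtype=float)      # y = t
    order = np.argsort(x)               # the exponential fit expects ascending x
    a, b, c = exp_regression(x[order], y[order])
    mu, beta, alpha = a, b, 1.0 / c
    return alpha, beta, mu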

In graphical representations, it is customary to display ln(tk) as the abscissa and ln( − ln(1 − Fk)) as the ordinate. This is a legacy of the graphical linearization method used for the case where μ = 0. In order to respect this tradition, we will interchange the axes and display ln(tk) as the abscissa and xk as the ordinate.

Weibull's law applies generally to the failure of materials and objects. Since the variable t is time, the tk and Fk are effectively sorted in ascending order.

A sample regression is shown in reei-weibull-cdf-plot. To simulate an experiment, numerical data (tk, Fk) were generated from "exact" values αe, βe, μe, defining the "true" F(t) according to weibull-cdf-fx, whose curve is shown as a dashed line in the figure. A value of tk calculated from weibull-cdf-inv corresponds to each Fk. These values of t are subjected to random deviations, simulating experimental uncertainty, which gives us the values of tk in reei-weibull-cdf-data. Finally, the result (αc, βc, and μc) substituted into weibull-cdf-fx gives the "fitted" function Fc(t) (whose curve is plotted as a solid line):

$$F_c(t) = 1 - \text{exp} \left( - \left( \frac{t - \mu_c}{\beta_c} \right)^{\alpha_c} \right)$$

A sample regression of a Weibull cumulative distribution function of three parameters.

The representation in the traditional coordinate system shows that the presence of a non-zero μc parameter prevents the usual graphical linearization, which is naturally to be expected. Carrying out the regression through an integral equation allows us to linearize and obtain a curve which may be substituted for the traditional method, appreciably improving the fit.

4. Discussion

The preceding examples (Section 2 <reei2-sec2> and Section 3 <reei2-sec3>) show how a non-linear regression can be reduced to a linear one given the appropriate integral equation. In this manner, the usual iterative procedures can be replaced by a simple linear calculation.


Regression of Sinusoids


Jean Jacquelin


The first revision of the paper Regressions of Sinusoids was dated 01/09/2009.
The current version was published on 02/15/2009.

1. Introduction

Under the seemingly innocuous title "Regression of Sinusoids" ought to appear the subtitle "An Optimization Nightmare", to add a touch of realism. Indeed, we must have already been concerned with the problem to fully understand the relevance of the word. But what is it, really?

As with any number of similar problems, the data is comprised of n experimental points (x1, y1), (x2, y2), ..., (xk, yk), ..., (xn, yn). We seek to adjust the parameters of a function y = f(x) in such a way that its curve passes "as closely as possible" to the data points. In this particular case, we will deal with the following sinusoidal function of the four parameters a, b, c, and ω:


f(x) = a + b sin(ωx) + c cos(ωx)

This function is equivalent to:

$$\begin{aligned} \begin{cases} f(x) = a + \rho \; \text{sin}(\omega \; x + \varphi) \\\ \rho = \sqrt{b^2 + c^2} \quad ; \quad b = \rho \; \text{cos}(\varphi) \quad ; \quad c = \rho \; \text{sin}(\varphi) \end{cases} \end{aligned}$$

The expression "as closely as possible" implies an optimization criterion. Specifically, we consider the sum of the squared residuals:

$$\varepsilon^2_{a, b, c, \omega} = \sum_{k=1}^n \left( y_k - f(x_k) \right )^2 = \sum_{k=1}^n \left( y_k - \left( a + b \; \text{sin}(\omega \; x_k) + c \; \text{cos}(\omega \; x_k) \right) \right)^2$$

Since that is the sum that we wish to minimize, we have the generic name "least squares method".

When ω is known a-priori, the solution is trivial. Equation sin-fx effectively becomes linear in the optimization parameters (a, b and c). This well defined case merits only the briefest mention, which will be made in the next section.

In all other cases, the regression (or optimization) is non-linear, due to the fact that the sum of squares has a non-linear dependency on ω.

An almost equally favorable situation arises when we have a "good enough" approximation for ω, suitable to initialize an arbitrary non-linear optimization method. Literature is full of descriptions of such methods and many are implemented in software. To discuss these methods further would exceed the scope of the framework of this paper.

But the dreaded nightmare is not far. It is crucial that the initial estimate for ω is "good enough"... And this is where sinusoidal functions differ from more accommodating non-linear functions: The more periods that the xk span, the more randomly they are distributed, or the more noise is present in the yk values, the more the condition "good enough" must be replaced with "very good", tending to "with great precision". In other words, we must know the value of ω we are looking for in advance!

The original method proposed in Section 3 <reei3-sec3> provides a starting point for addressing this challenge. Admittedly, it would be wrong to pretend that the method is particularly robust: Section 4 <reei3-sec4> will discuss some of its deficiencies. Nevertheless, thanks to the first result, we will see in Section 5 <reei3-sec5> that an original regression method (for a sawtooth function) allows for a much more accurate approximation of ω through an improved linearization. Finally, Section 6 <reei3-sec6> presents a summary of performance in the course of systematic experiments. A synopsis of the entire method, which does not involve any iterative calculations, is presented in the Appendices <reei3-appendix1>.

Before getting into the heart of the matter, a warning must be given regarding some of the figures presented here (reei-sin-exact-plot, reei-sin-nomega-plot, reei-sin-int-plot, reei-sin-saw-plot, reei-sin-final-plot). They serve merely as illustrations of the procedures being described. To create them, we were obligated to fix on a particular numerical data set, which is not necessarily representative of the multitudes of possible cases. Given these figures alone, it would be absurd to form any opinion of the method in question, favorable or otherwise. This is especially true as the example has been selected with data that exaggerate all the defects that are explained in the text, to allow for easy identification and unambiguous discussion of the important features.

We see in reei-sin-exact-data that our "experimental" points are sparse, non-uniformly distributed, and widely dispersed. One might suspect that this is not real experimental data, but rather a simulation obtained in the following way: An "exact" function is given, along with the coefficients ae, be, ce, and ωe, indicated in reei-sin-exact-data. The curve is plotted in reei-sin-exact-plot with a dashed line. The xk were selected at random from the domain, and rounded to the values shown in reei-sin-exact-data. The corresponding exact y values are then computed from equation sin-fx. The yk were then randomly perturbed with a distribution whose root mean square σe was approximately 10% of the amplitude ρe of the sinusoid. This represents a much wider dispersion compared to what is usually seen in practice.

"Exact" sinusoid along with the numerical data of our example.

"Exact" sinusoid along with the numerical data of our example.

The root mean square is defined by:

$$\sigma = \sqrt{\frac{1}{n} \sum_{k=1}^n \left( f(x_k) - y_k \right)^2}$$

The values of the function f(x) are to be computed using the parameters a, b, c and ω from the example under consideration.

The values of yk, indicated in reei-sin-exact-data and plotted in reei-sin-exact-plot, are the data for the numerical examples in the following sections. It should be noted that the exact values ae, be, ce and ωe, shown in reei-sin-exact-data are not used further on (except ωe in the "trivial" case in Section 2 <reei3-sec2>). They should be forgotten for the remainder of the algorithm, which uses only (xk, yk) as data. While the exact sinusoid will be shown on the various figures as a reminder, it is not an indication of the fact that the function sin-fx with the parameters (ae, be, ce, ωe) is ever used directly.

The tables serve another role, which will be appreciated by readers interested in implementing these techniques in a computer program. If so desired, all of the sample calculations shown here can be reproduced exactly with the data and results reported below. This provides a means to validate and correct a regression program implementing the algorithms described here.

2. Case Where ω is Known A-Priori

When the value ω = ωe is fixed, the optimization only needs to be performed for the parameters a, b and c of sin-fx. Taking the partial derivatives of sin-resid with respect to the optimization parameters results in a system of three equations:

$$\begin{aligned} \begin{cases} \left( \frac{\partial \varepsilon^2}{\partial a} \right)_ {(a_0, b_0, c_0)} = -2 \sum_{k=1}^n \left( y_k - \left( a_0 + b_0 \; \text{sin}(\omega_e \; x_k) + c_0 \; \text{cos}(\omega_e \; x_k)\right) \right) = 0 \\\ \left( \frac{\partial \varepsilon^2}{\partial b} \right)_ {(a_0, b_0, c_0)} = -2 \sum_{k=1}^n \left( y_k - \left( a_0 + b_0 \; \text{sin}(\omega_e \; x_k) + c_0 \; \text{cos}(\omega_e \; x_k)\right) \right) \; \text{sin}(\omega_e \; x_k) = 0 \\\ \left( \frac{\partial \varepsilon^2}{\partial c} \right)_ {(a_0, b_0, c_0)} = -2 \sum_{k=1}^n \left( y_k - \left( a_0 + b_0 \; \text{sin}(\omega_e \; x_k) + c_0 \; \text{cos}(\omega_e \; x_k)\right) \right) \; \text{cos}(\omega_e \; x_k) = 0 \\\ \end{cases} \end{aligned}$$

The solution is given in the following system sin-nomega-lsq, with the convention that $\sum \equiv \sum_{k=1}^n$:

$$\begin{aligned} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} = \begin{bmatrix} n & \sum \text{sin}(\omega_e x_k) & \sum \text{cos}(\omega_e x_k) \\\ \sum \text{sin}(\omega_e x_k) & \sum \text{sin}^2(\omega_e x_k) & \sum \text{sin}(\omega_e x_k) \text{cos}(\omega_e x_k) \\\ \sum \text{cos}(\omega_e x_k) & \sum \text{sin}(\omega_e x_k) \text{cos}(\omega_e x_k) & \sum \text{cos}^2(\omega_e x_k) \end{bmatrix}^{-1} \begin{bmatrix} \sum y_k \\\ \sum y_k \; \text{sin}(\omega_e x_k) \\\ \sum y_k \; \text{cos}(\omega_e x_k) \end{bmatrix} \end{aligned}$$
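Solving this system is equivalent to an ordinary least-squares fit on the columns 1, sin(ωe xk) and cos(ωe xk); a minimal Python sketch (mine):

import numpy as np

def sinusoid_known_omega(x, y, omega):
    """Least-squares fit of a, b, c in y ~ a + b*sin(omega*x) + c*cos(omega*x)."""
    M = np.column_stack((np.ones_like(x), np.sin(omega * x), np.cos(omega * x)))
    (a, b, c), *_ = np.linalg.lstsq(M, y, rcond=None)
    return a, b, c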

The result is presented in figure reei-sin-nomega-plot. We note that the root mean square σ0 is practically identical to σe in reei-sin-exact-data. This indicates that the regression did not increase the residuals of the fitted sinusoid relative to the "exact" one. Obviously, this is too good to always be true. So, after this brief review, we must tackle the crux of the problem: to compute a sufficiently accurate approximation for ω, since it will not be known a-priori as it was in the preceding example.

Case where ω is known exactly.

3. Linearization Through an Integral Equation

The goal being to reduce the problem to a linear regression, it is sometimes tempting to use a linear differential equation whose solution is our function of interest as the intermediary. An example is the following equation, for which the sinusoid sin-fx is a solution[errata-reei-12]:

$$f(x) = a - \frac{1}{\omega^2} \frac{d^2 f(x)}{dx^2}$$

Taking the appropriate partial derivatives of sin-diff-resid leads to a system of linear equations of two unknowns a and $\beta = \frac{1}{\omega^2}$.

$$\varepsilon^2_{(a, b, c, \omega)} = \sum_{k=1}^n \left( y_k - f(x_k) \right)^2 = \sum_{k=1}^n \left( y_k - a - \beta y''(x_k) \right)^2$$

Unfortunately, this is not a viable solution in practice (unless we have at our disposal a very large number of especially well distributed samples). The stumbling block here is the calculation of y″(xk) from the n data points (xk, yk): as illustrated in reei-sin-int-plot, the fluctuations are generally too unstable for meaningful approximation.

On the other hand, the computation of numerical integrals is clearly less problematic than that of derivatives. It is therefore no surprise that we will seek an integral equation having for its solution our function of interest. For example, in the present case, we have equation sin-int-eq, for which the sinusoid sin-fx is a solution:


$$f(x) = -\omega^2 \int_{x_i}^x \int_{x_j}^v f(u) du \; dv + P(x)$$

P(x) is a second-degree polynomial whose coefficients depend on a, b, c, ω and the limits of integration xi and xj.

We could, of course, go into a great amount of detail regarding P(x), but that would derail the presentation without much immediate benefit. We can simply use the fact that a complete analysis shows that the limits of integration have no bearing on the regression that follows, at least with regard to the quantity of interest, the optimized value of ω (noted as ω1). Therefore, to simplify, we will set xi = xj = x1, which results in a function of the following form:


$$f(x) = A \; SS(x) + B \; x^2 + C \; x + D$$

$$\begin{aligned} \text{with:} \; \begin{cases} SS(x) = \int_{x_1}^x \int_{x_1}^v f(u) du \; dv \quad ; \quad A = -\omega^2 \quad ; \quad B = \frac{1}{2}a \; \omega^2 \\\ C = -a \; \omega^2 \; x_1 + b \; \omega \; \text{cos}(\omega \; x_1) - c \; \omega \; \text{sin}(\omega \; x_1) \\\ D = a + \frac{1}{2} \; a \; \omega^2 \; x_1^2 + (b + c \; \omega \; x_1) \; \text{sin}(\omega \; x_1) + (c - b \; \omega \; x_1) \; \text{cos}(\omega \; x_1) \end{cases} \end{aligned}$$

The coefficients A, B, C, D are unknown. However, they can be approximated through linear regression based on the precomputed values of SSk. To accomplish this, we perform two numeric integrations:

$$\begin{aligned} \begin{cases} S(x_1) = S_1 = 0 \\\ S(x_k) \simeq S_k = S_{k-1} + \frac{1}{2}(y_k + y_{k-1}) (x_k - x_{k-1}) \quad k = 2 \; \text{to} \; n \end{cases} \end{aligned}$$

$$\begin{aligned} \begin{cases} SS(x_1) = SS_1 = 0 \\\ SS(x_k) \simeq SS_k = SS_{k-1} + \frac{1}{2}(S_k + S_{k-1}) (x_k - x_{k-1}) \quad k = 2 \; \text{to} \; n \end{cases} \end{aligned}$$

It goes without saying that the points must first be ranked in order of ascending xk. It follows that the sum of squared residuals to minimize is the following[errata-reei-13]:

$$\varepsilon^2_{a, b, c, \omega} = \sum_{k=1}^n \left( y_k - \left( A \; SS_k + B \; x_k^2 + C \; x_k + D \right) \right)^2$$

There is no need to reiterate the partial derivatives with respect to A, B, C and D, used to obtain the optimized A1, B1, C1 and D1, followed by a1, b1, c1 and ω1 (according to sin-int-vars):

$$\begin{aligned} \begin{bmatrix} A_1 \\ B_1 \\ C_1 \\ D_1 \end{bmatrix} = \begin{bmatrix} \sum \left( SS_k \right)^2 & \sum x_k^2 \; SS_k & \sum x_k \; SS_k & \sum SS_k \\\ \sum x_k^2 \; SS_k & \sum x_k^4 & \sum x_k^3 & \sum x_k^2 \\\ \sum x_k \; SS_k & \sum x_k^3 & \sum x_k^2 & \sum x_k \\\ \sum SS_k & \sum x_k^2 & \sum x_k & n \end{bmatrix}^{-1} \begin{bmatrix} \sum y_k \; SS_k \\ \sum y_k x_k^2 \\ \sum y_k x_k \\ \sum y_k \end{bmatrix} \end{aligned}$$

$$\begin{aligned} \begin{cases} \omega_1 = \sqrt{-A_1} \quad ; \quad a_1 = \frac{2 B_1}{\omega_1^2} \\\ b_1 = \left( B_1 \; x_1^2 + C_1 \; x_1 + D_1 - a_1 \right) \text{sin}(\omega \; x_1) + \frac{1}{\omega_1} \left( C_1 + 2 B_1 x_1 \right) \text{cos}(\omega \; x_1) \\\ c_1 = \left( B_1 \; x_1^2 + C_1 \; x_1 + D_1 - a_1 \right) \text{cos}(\omega \; x_1) - \frac{1}{\omega_1} \left( C_1 + 2 B_1 x_1 \right) \text{sin}(\omega \; x_1) \end{cases} \end{aligned}$$
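A compact Python sketch of this first step (my own, not the author's listing), assuming x is sorted ascending:

import numpy as np

def sinusoid_integral_step(x, y):
    """First estimates (omega1, a1, b1, c1) from the double-integral linearization."""
    dx = np.diff(x)
    S = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dx)))
    SS = np.concatenate(([0.0], np.cumsum(0.5 * (S[1:] + S[:-1]) * dx)))

    M = np.column_stack((SS, x**2, x, np.ones_like(x)))
    (A1, B1, C1, D1), *_ = np.linalg.lstsq(M, y, rcond=None)

    omega1 = np.sqrt(-A1)                 # A1 is expected to be negative
    a1 = 2.0 * B1 / omega1**2
    wx1 = omega1 * x[0]
    p = B1 * x[0]**2 + C1 * x[0] + D1 - a1
    q = (C1 + 2.0 * B1 * x[0]) / omega1
    b1 = p * np.sin(wx1) + q * np.cos(wx1)
    c1 = p * np.cos(wx1) - q * np.sin(wx1)
    return omega1, a1, b1, c1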

The results for our sample dataset are displayed further along in reei-sin-final-plot. The numerical values of ω1, a1, b1 and c1 are shown in reei-sin-final-data. To compare the graphical and numerical representations with respect to the data points (xk, yk), refer to the curve and table column labeled (1).

Incidentally, it is interesting to compare the results of the numerical integrations sin-int-int1 and sin-int-int2 with those of numerical differentiation (reei-sin-int-plot). Observing the significant oscillations of the latter (a sample point is noted fk on the plot), and how they would be exacerbated by further differentiation (not shown due to the excessive fluctuation amplitude), it becomes patently obvious that a feasible solution must be based on integral equations rather than differential ones.

Numerical integrations and comparison with differentiations.[errata-reei-14].

While noting that more numerous and more uniformly distributed xk, coupled with less scattered yk, would result in more acceptable fluctuations in the derivatives, we must also remember that a good solution must apply not only to the trivial cases, but to the difficult ones as well.

4. A Brief Analysis of Performance

The most important parameter to optimize is ω. This comes with the understanding that once the optimization has succeeded, we can always fall back to the conventional regression method in Section 2 <reei3-sec2> to obtain a, b and c. We will therefore concentrate on an investigation restricted to results with respect to ω. The three most influential factors are:

  • The number of points, np, in each period of the sinusoid.
  • The distribution of samples in the domain:
    • Either uniform: xk + 1 − xk = constant
    • Or random: xk is drawn at random from the available domain.
  • The scatter of the ordinates yk, characterized by (σ1/ρe), the ratio of the root mean squared error sin-rms to the amplitude of the sinusoid.

4.1 Uniform Distribution of Abscissae With Non-Dispersed Ordinates

We find that the ratio ω1/ωe, which is expected to be equal to 1, is affected by a deviation inversely proportional to np (reei-sin-eq-nd-plot). It is conceivably possible to construct an empirical function to correct the deviation. But such a function would not be very interesting, as the correction would not be satisfactory for the randomly distributed points we will investigate further on. A more general approach, described in Section 5 <reei3-sec5>, seems more appropriate.

Effect of the number of points per period, with a uniform distribution.


4.2 Random Distribution of Abscissae With Non-Dispersed Ordinates

Since the xk are randomly distributed, the dispersion in the calculated values of ω1 increases as np becomes small. For a fixed value of np, the cumulative distribution (computed from 10000 simulations) is represented in reei-sin-rand-nd-plot. It can be seen that the median ω1m/ωe, which is expected to be equal to 1, is affected even more than under the conditions described in Section 4.1 <reei3-sec4-1>.

Cumulative distributions of ω1 (random distribution of xk, non-dispersed yk).


4.3 Random Distribution of Abscissae With Dispersed Ordinates

As expected, the dispersion of ω1 is greater than in the previous case. reei-sin-rand-d-plot shows the case where (σ1/ρe) = 10%, compared to (σ1/ρe) = 0 as represented in reei-sin-rand-nd-plot. Nevertheless, the median values are barely affected (compare reei-sin-rand-d-data to the previous data in reei-sin-rand-nd-data).

Cumulative distributions of ω1 with randomly distributed ordinates such that (σ1/ρe = 0.1)


5. Further Optimizations Based on Estimates of a and ρ

We now shift our interest to a function y = f(x), expressed in the form sin-fx2. The inverse function is an arcsine, but can also be expressed as an arctangent:

$$\begin{aligned} \begin{cases} \Phi(x) = \text{arctan} \left( \frac{f(x) - a} {\sqrt{\rho^2 - \left( f(x) - a \right)^2}} \right) \\\ \omega \; x + \varphi = \pm \Phi(x) + \pi K_{(x)} \end{cases} \end{aligned}$$

arctan denotes the principal values (between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$) of the multivalued function. The sign of Φ and the integer K(x) depend on the half-period of the sinusoid in which the point (x, y) is found, making their dependence on x discontinuous. Additionally, we can show that the sign is + when K(x) is even and - when odd.

If we consider Φ(x) by itself, it is a sawtooth function, as shown in reei-sin-saw-plot by the dotted curve. The points (xk, Φk), with Φk = Φ(xk), are represented by crosses. Rewritten in this manner, the problem becomes one of regression of a sawtooth function, which is largely as "nightmarish" as regression of the sinusoid. Since we have no information other than the points (xk, Φk), determining K(x) is difficult and mostly empirical, and therefore not guaranteed to always be feasible in the general case. The situation is different in our case because we have already found the orders of magnitude of the parameters: a1, b1, c1 and ω1, and consequently ρ1 and φ1 through sin-fx2. In the current instance, which will be represented by the subscript 2, we posit that:

$$\begin{aligned} a_2 = a_1 \quad ; \quad \rho_2 = \rho_1 = \sqrt{b_1^2 + c_1^2} \quad ; \quad \rho_1 \; \text{cos}(\varphi_1) = b_1 \quad ; \quad \rho_1 \; \text{sin}(\varphi_1) = c_1 \\\ \text{If} \; b_1 > 0 \quad \rightarrow \quad \varphi_1 = \text{arctan} \left( \frac{c_1}{b_1} \right) \quad ; \quad \text{if} \; b_1 < 0 \quad \rightarrow \quad \varphi_1 = \text{arctan} \left( \frac{c_1}{b_1}\right) + \pi \end{aligned}$$

Now it is possible to calculate K1, K2, ..., Kk, ..., Kn in a number of different ways. One possibility is as follows, where round rounds its real argument to the nearest integer:

$$K_k = \text{round} \left( \frac{\omega_1 \; x_k + \varphi_1}{\pi} \right)$$

Another way to write sin-inv-fx, applied to x = xk, better demonstrates the nearly linear relationship θk ≈ ω2xk + φ2 between xk and θk, defined by:

$$\begin{aligned} \begin{cases} \theta_k = (-1)^{K_k} \; \text{arctan} \left( \frac{y_k - a_2} {\sqrt{\rho_2^2 - (y_k - a_2)^2}} \right) + \pi K_k \\\ \text{if} \; \rho_2^2 \leq (y_k - a_2)^2 \quad \rightarrow \quad \text{arctan} = +\frac{\pi}{2} \; \text{if} \; y_k > a_2 \; \text{or} \; = -\frac{\pi}{2} \; \text{if} \; y_k < a_2 \end{cases} \end{aligned}$$
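
As a concrete illustration of the two formulas above, here is a minimal sketch in Python with NumPy (the function and variable names are chosen for this sketch and are not part of the original paper); it takes the first-stage estimates a1, b1, c1 and ω1 together with the data arrays and returns Kk and θk:

.. code-block:: python

    import numpy as np

    def sawtooth_to_line(x, y, a1, b1, c1, omega1):
        """Sketch: compute K_k and theta_k from the first-stage estimates."""
        a2 = a1
        rho2 = np.sqrt(b1**2 + c1**2)                        # rho_2 = rho_1
        phi1 = np.arctan(c1 / b1) + (np.pi if b1 < 0 else 0.0)

        # K_k = round((omega_1 x_k + phi_1) / pi); the sign is + for even K_k, - for odd
        K = np.rint((omega1 * x + phi1) / np.pi).astype(int)
        sign = np.where(K % 2 == 0, 1.0, -1.0)

        radicand = rho2**2 - (y - a2)**2
        with np.errstate(invalid="ignore", divide="ignore"):
            angle = np.arctan((y - a2) / np.sqrt(radicand))
        # Special case rho_2^2 <= (y_k - a_2)^2: the angle saturates at +/- pi/2
        angle = np.where(radicand <= 0.0, np.sign(y - a2) * np.pi / 2, angle)

        theta = sign * angle + np.pi * K
        return K, theta

Note that np.arctan2(c1, b1) could replace the two-case formula for φ1; the two differ by at most a multiple of 2π, which is harmless here.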

We see in reei-sin-saw-plot that the series of points (xk, Φk), represented by crosses, is transformed into a series of points (xk, θk), represented by squares, some of which coincide.

Transformation of a sawtooth function in preparation for linear regression[errata-reei-16]

It is clear that the points (xk, θk) tend towards a straight line, which makes it possible to perform a linear regression to obtain ω2 from the slope and φ2 from the intercept. Once θk has been computed from sin-theta, the result is obtained in the usual manner:

$$\begin{aligned} \begin{bmatrix} \omega_2 \\ \varphi_2 \end{bmatrix} = \begin{bmatrix} \sum (x_k)^2 & \sum x_k \\ \sum x_k & n \end{bmatrix}^{-1} \begin{bmatrix} \sum \theta_k x_k \\ \sum \theta_k \end{bmatrix} \end{aligned}$$

Completed by:


b2 = ρ2 cos(φ2) ; c2 = ρ2 sin(φ2)
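
Continuing the sketch, the regression of θk on xk and the subsequent computation of b2 and c2 can be written as follows (again a sketch only; the helper name is illustrative):

.. code-block:: python

    import numpy as np

    def slope_intercept_stage(x, theta, rho2):
        """Sketch: omega_2 and phi_2 by linear regression of theta_k on x_k."""
        n = len(x)
        # Normal equations corresponding to the 2x2 matrix formula above
        M = np.array([[np.sum(x * x), np.sum(x)],
                      [np.sum(x),     float(n)]])
        v = np.array([np.sum(theta * x), np.sum(theta)])
        omega2, phi2 = np.linalg.solve(M, v)

        b2 = rho2 * np.cos(phi2)                  # b_2 = rho_2 cos(phi_2)
        c2 = rho2 * np.sin(phi2)                  # c_2 = rho_2 sin(phi_2)
        return omega2, phi2, b2, c2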

The numerical results are summarized in reei-sin-final-data (in the table column labeled (2)), and the curve is shown in reei-sin-final-plot.

The performance measured in terms of the optimization of ω is shown in reei-sin-rand-nd2-plot and reei-sin-rand-d2-plot. Note the improvements in the optimized median value, ωm, shown in reei-sin-rand-nd2-data and reei-sin-rand-d2-data, which was the objective of this phase of the algorithm.

Cumulative distributions of ω2 (random distribution of xk, non-dispersed yk).

Cumulative distributions of ω2 with randomly distributed ordinates such that (σ2/ρe = 0.1)

6. Final Steps and Results

As soon as we have a good approximation (ω2) of ω, the classical method shown in Section 2 <reei3-sec2> can be applied:

$$\begin{aligned} \begin{bmatrix} a_3 \\ b_3 \\ c_3 \end{bmatrix} = \begin{bmatrix} n & \sum \text{sin}(\omega_2 \; x_k) & \sum \text{cos}(\omega_2 \; x_k) \\\ \sum \text{sin}(\omega_2 \; x_k) & \sum \text{sin}^2(\omega_2 \; x_k) & \sum \text{sin}(\omega_2 \; x_k) \; \text{cos}(\omega_2 \; x_k) \\\ \sum \text{cos}(\omega_2 \; x_k) & \sum \text{sin}(\omega_2 \; x_k) \; \text{cos}(\omega_2 \; x_k) & \sum \text{cos}^2(\omega_2 \; x_k) \end{bmatrix}^{-1} \begin{bmatrix} \sum y_k \\\ \sum y_k \; \text{sin}(\omega_2 \; x_k) \\\ \sum y_k \; \text{cos}(\omega_2 \; x_k) \end{bmatrix} \end{aligned}$$
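
In code, this final step is an ordinary linear least-squares fit on the basis {1, sin(ω2 x), cos(ω2 x)}. A minimal sketch, equivalent to the explicit 3×3 inversion above (names illustrative):

.. code-block:: python

    import numpy as np

    def final_linear_stage(x, y, omega2):
        """Sketch: a_3, b_3, c_3 by least squares with omega_2 held fixed."""
        # Design matrix with columns 1, sin(omega_2 x), cos(omega_2 x);
        # lstsq solves the same normal equations as the 3x3 system above.
        F = np.column_stack([np.ones_like(x, dtype=float),
                             np.sin(omega2 * x),
                             np.cos(omega2 * x)])
        (a3, b3, c3), *_ = np.linalg.lstsq(F, y, rcond=None)
        return a3, b3, c3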

This allows us to perform a final optimization of the dimensional parameters of the sinusoid, which was not possible in the previous step because a2 and ρ2 were fixed. The final result is presented in reei-sin-final-plot. The numerical results are summarized in reei-sin-final-data (in the table column labeled (3)). The intermediate results (sinusoids (1) and (2), along with their respective parameters) are reported in the same figure and table.

Regression results for a sample sinusoid.

This figure might give the impression that the procedure converges the solid curves toward the dotted one through successive approximations. However, attempting to use this approach iteratively would have no effect: repeating the calculations would not modify the final result, because it is based on the initial numerical integrations, whose inherent noise does not decrease with further iterations.

A large number of simulations under varied conditions would be required to form an objective opinion of the properties of this method. We can refer to reei-sin-rand-nd2-data and reei-sin-rand-d2-data for a summary of the results for the optimization of ω, which is the essential part of the algorithm. Naturally, the final results are unchanged from what is shown in reei-sin-rand-nd2-plot and reei-sin-rand-d2-plot since ω3 = ω2.

For each simulated set of values (ω3, a3, b3, c3), the root mean square error is computed from:

$$\sigma_3 = \sqrt{\frac{1}{n} \sum_{k=1}^n \left( y_k - \left( a_3 + b_3 \; \text{sin}(\omega_3 \; x_k) + c_3 \; \text{cos}(\omega_3 \; x_k) \right) \right)^2}$$
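
For completeness, a trivial sketch of this computation (names illustrative):

.. code-block:: python

    import numpy as np

    def rms_residual(x, y, a3, b3, c3, omega3):
        """Sketch: sigma_3, the RMS residual of the fitted sinusoid."""
        fit = a3 + b3 * np.sin(omega3 * x) + c3 * np.cos(omega3 * x)
        return np.sqrt(np.mean((y - fit)**2))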

The distribution of (σ3/ρe) for each fixed value of np is plotted from 10000 simulations, each with a different set of (xk, yk); the ordinates are perturbed with a characteristic root mean square of σe about the nominal curve (defined in Section 1 <reei3-sec1>). For both of the plots below, the dispersed ordinates are generated so that the ratio σe/ρe is held constant, in this case at 0.1. The abscissae, on the other hand, are generated differently:

  • reei-sin-rms-a-plot: The abscissae of successive points are distributed uniformly.
  • reei-sin-rms-b-plot: The abscissae are distributed randomly (with np points per full period).

It might seem surprising that the root mean squared error σ3 is smaller than the initial σe. The explanation is that the "optimized" sinusoid often passes closer to the data points than the "exact" one, especially when the points are more dispersed. The differences stem from the optimization of ω2, which can differ significantly from ωe, as we saw in reei-sin-rand-nd2-plot and reei-sin-rand-d2-plot. Consequently, for dispersed ordinates, the error does not increase as much as we might have feared when moving away from uniformly distributed abscissae.

Cumulative distribution functions for the root mean squared error, with uniformly distributed abscissae.

Cumulative distribution functions for the root mean squared error, with non-uniformly distributed abscissae.

The biggest difference between the uniform and non-uniform cases is the failure rate of the process. This must be mentioned, since it is the common lot of all methods when the points are irregularly distributed. Inevitably (not frequently, but nevertheless with a non-zero probability), we run into data sets in which all the points are tightly clustered. The sinusoid for such data is not defined, and therefore can be neither characterized nor approximated. The failure can manifest in different portions of the computation, for example through indeterminacy (division by a number too close to zero), non-invertibility of one of the matrices, the square root of a negative number, etc.

The calculation of ω1 in sin-int-params is where the majority of failures occur in this method, though fortunately quite rarely. Over the hundreds of thousands of simulations performed with uniformly distributed abscissae, no failure was ever encountered. This is not surprising, since the points can never be clumped in that situation. The case with randomly selected abscissae, on the other hand, showed a very low failure rate, dependent on both the dispersion and the quantity of data points under consideration, as shown in reei-sin-fail-plot (which required nearly five hundred thousand simulations for a coherent picture to emerge).

Failure rates (orders of magnitude).

7. Discussion

center

Appendix 1: Summary of Sinusoidal Regression Algorithm

center

Appendix 2: Detailed Procedure for Sinusoidal Regression

center

Application to the Logistic Distribution (Three Parameters)

center

Application to the Logistic Distribution (Four Parameters)

center

Mixed Linear and Sinusoidal Regression

center

Generalized Sinusoidal Regression

center

Damped Sinusoidal Regression

center

Double Exponential Regression & Double Power Regression

center

Multivariate Regression

Rather than a single variable, x, the objective function depends on multiple variables, x, t, ....

Review of the Linear Case

In the simplest case, the function y(x, t, ...) is linear with respect to the optimization parameters λ1, λ2, ..., λj, ..., λm:


y(x, t, ...) = λ1f1(x, t, ...) + ... + λjfj(x, t, ...) + ... + λmfm(x, t, ...)

The given functions fj(x, t, ...) do not themselves depend on the optimization parameters.

Given the n data points (x1, t1, ...; y1), (x2, t2, ...; y2), ..., (xk, tk, ...; yk), ..., (xn, tn, ...; yn)[errata-reei-17], the fitting parameters can be computed using the least-squares method:

With: fj,k = fj(xk, tk, ...)

$$\begin{aligned} \begin{bmatrix} \lambda_1 \\ ... \\ \lambda_j \\ ... \\ \lambda_m \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^n \left(f_{1,k}\right)^2 & ... & \sum_{k=1}^n f_{1,k} f_{j,k} & ... & \sum_{k=1}^n f_{1,k} f_{m,k} \\\ ... & ... & ... & ... & ... \\\ \sum_{k=1}^n f_{1,k} f_{j,k} & ... & \sum_{k=1}^n \left(f_{j,k}\right)^2 & ... & \sum_{k=1}^n f_{j,k} f_{m,k} \\\ ... & ... & ... & ... & ... \\\ \sum_{k=1}^n f_{1,k} f_{m,k} & ... & \sum_{k=1}^n f_{j,k} f_{m,k} & ... & \sum_{k=1}^n \left(f_{m,k}\right)^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{k=1}^n y_k f_{1,k} \\ ... \\\ \sum_{k=1}^n y_k f_{j,k} \\ ... \\ \sum_{k=1}^n y_k f_{m,k} \end{bmatrix} \end{aligned}$$
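
In practice the matrix need not be inverted explicitly: building the design matrix whose entry (k, j) is fj,k and passing it to a linear least-squares solver yields the same λj. A minimal sketch in Python with NumPy, assuming the basis functions are supplied as callables (all names here are illustrative, not from the paper):

.. code-block:: python

    import numpy as np

    def fit_linear_multivariate(basis_funcs, y, *coords):
        """Sketch: lambda_j for y ~= sum_j lambda_j f_j(x, t, ...).

        basis_funcs -- callables f_j(x, t, ...), independent of the lambdas
        y           -- array of observed values y_k
        coords      -- arrays x_k, t_k, ... of the independent variables
        """
        # F[k, j] = f_j(x_k, t_k, ...)
        F = np.column_stack([f(*coords) for f in basis_funcs])
        lambdas, *_ = np.linalg.lstsq(F, y, rcond=None)
        return lambdas

For example, fit_linear_multivariate([lambda x, t: (x - t)**2, lambda x, t: x * t], y, x, t) would fit y ≈ λ1(x − t)² + λ2xt.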

Non-Linear Case

If one or more of the functions depend on the fitting parameter(s), the corresponding y(x, t, ...) will no longer be linear with respect to the optimization parameters:


y(x, t, ...) = b1φ1(p1, p2, ...; x, t, ...) + b2φ2(p1, p2, ...; x, t, ...) + ... + λ1f1(x, t, ...) + ... + λjfj(x, t, ...) + ... + λmfm(x, t, ...)

The functions φ1, φ2, ... depend on the fitting parameters p1, p2, .... The complete set of parameters to optimize is therefore p1, p2, ..., b1, b2, ..., λ1, ..., λj, ..., λm. The least-squares method cannot be applied directly in this case. A multitude of different methods exist, generally based on forming an initial estimate of the non-linear parameters (p1, p2, ...), followed by recursive approximations to progressively refine that initial guess.

The method studied here follows from a very different principle. We perform one or more integrations of y(x, t, ...), for example ∫y(x, t, ...)dx, or ∫y(x, t, ...)dt, or ∫∫y(x, t, ...)dxdt, or other multiple integrals.

We can also introduce new functions g(x, t, ...) to simplify the integrals: ∫g(x, t, ...)y(x, t, ...)dx, or ∫g(x, t, ...)y(x, t, ...)dt, or ∫∫g(x, t, ...)y(x, t, ...)dxdt, or other multiple integrations.

We can see that the possibilities are endless. It is sometimes possible to form linear combinations of such equations that cancel out the non-linear terms depending on the parameters p1, p2, ....

For example, consider the following function:


y(x, t) = b exp(pxt) + λ(x − t)²

$$\begin{aligned} \begin{cases} b_1 = b \quad ; \quad b_2 = b_3 = ... = 0 \\ p_1 = p \quad ; \quad p_2 = p_3 = ... = 0 \\ \varphi(p_1, p_2, ...; x, t, ...) = \text{exp}(p x t) \\ \lambda_1 = \lambda \quad ; \quad \lambda_2 = \lambda_3 = ... = 0 \\ f_1(x, t, ...) = (x - t)^2 \end{cases} \\ \end{aligned}$$

$$\begin{aligned} \begin{cases} \text{with :} \\ g(x, t) = t \end{cases} \begin{cases} \int g(x, t) y(x, t) dx = \int t b \text{exp}(p x t) dx + \int t \lambda(x - t)^2 dx \\ = \frac{b}{p} \text{exp}(p x t) + \frac{\lambda}{3} t (x-t)^3 + ... \end{cases} \end{aligned}$$

We can see that a linear combination between this equation and the definition of y(x, t) can be used to cancel out the term exp(pxt):

$$\begin{aligned} y(x, t) = p \int t y(x, t) dx - \frac{1}{3} \lambda p t (x - t)^3 + \lambda (x - t)^2 + ... \\\ y(x, t) = p S(x, t) + \lambda_1 f_1(x, t) + \lambda_2 f_2(x, t) + ... \end{aligned}$$$$\begin{aligned} \begin{cases} S(x, t) = \int t y(x, t) dx \\\ f_1(x, t) = t(x - t)^3 \\\ f_2(x, t) = (x - t)^2 \end{cases} \end{aligned}$$
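
Matching terms between the two forms above gives λ1 = −λp/3 and λ2 = λ. The algebra can be checked symbolically; the short SymPy script below (an added verification, not part of the original paper) differentiates the combined relation with respect to x, which removes the indefinite integral and leaves an identity:

.. code-block:: python

    import sympy as sp

    x, t, b, p, lam = sp.symbols('x t b p lambda', real=True)

    y = b * sp.exp(p * x * t) + lam * (x - t)**2     # y(x, t)
    # y = p*S + lambda_1*t*(x - t)**3 + lambda_2*(x - t)**2, with S = Integral(t*y, x),
    # lambda_1 = -lambda*p/3 and lambda_2 = lambda.  Differentiate the right-hand side:
    rhs_dx = p * t * y + sp.diff(-lam * p / 3 * t * (x - t)**3 + lam * (x - t)**2, x)

    # d/dx of p*S is p*t*y, so both derivatives must agree identically.
    assert sp.simplify(sp.diff(y, x) - rhs_dx) == 0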

S(x, t) will be approximated numerically from the data.

This relationship is linear in p, λ1, and λ2. Least-squares regression gives us the desired approximation of p. This reduces the equation for y(x, t) to a relationship that is linear with respect to b and λ.

Nevertheless, the equations above are not quite correct: they were deliberately left incomplete so as to simplify the explanation. The integrals cannot be left in their indefinite form, especially when it comes to numerical integration. It is necessary to define a lower limit of integration. This requirement does not pose any fundamental difficulty for a function y(x) of one variable. It does, however, become a sensitive problem for functions y(x, t, ...) of multiple variables. The potential problems with numerical integration must be examined carefully in the presence of multiple variables (x, t, ...), even when the integration is only carried out over one of them.
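
To make this caveat concrete, here is one possible numerical treatment of the example y(x, t) above. It is only a sketch: the assumptions that the data form complete slices of constant t, that each slice is sorted by x, that S(x, t) is approximated by a cumulative trapezoidal rule starting at the smallest x of each slice, and that one constant regression column per slice absorbs the resulting lower-limit terms are all choices made for this sketch, not prescriptions of the paper.

.. code-block:: python

    import numpy as np

    def estimate_p_example(x, t, y):
        """Sketch: estimate p in y(x, t) = b exp(p x t) + lambda (x - t)**2."""
        S = np.zeros_like(y, dtype=float)
        slice_cols = []
        for tv in np.unique(t):
            m = (t == tv)
            xs, ys = x[m], y[m]                      # assumed sorted by x
            integrand = tv * ys                      # g(x, t) y(x, t) with g = t
            # Cumulative trapezoidal integral from the first abscissa of the slice
            S[m] = np.concatenate(([0.0], np.cumsum(
                0.5 * (integrand[1:] + integrand[:-1]) * np.diff(xs))))
            slice_cols.append(m.astype(float))       # absorbs the lower-limit term

        # y ~= p*S + lambda_1*t*(x - t)**3 + lambda_2*(x - t)**2 + per-slice constants
        F = np.column_stack([S, t * (x - t)**3, (x - t)**2] + slice_cols)
        coeffs, *_ = np.linalg.lstsq(F, y, rcond=None)
        return coeffs[0]                             # the estimate of p

Once p has been estimated in this way, b and λ follow from an ordinary linear fit of y against exp(pxt) and (x − t)², as described above.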