Ordinary Least Squares: problem, guide, and solver
If you want the best-fit line or plane through a set of scattered observations, you want to solve a least squares problem. A simple example is linear regression: finding the straight line that best fits a set of (x,y) observations, as illustrated below. The unknown line is defined by two parameters, slope and intercept, and we seek the pair of parameters that minimizes the sum of squared errors between the observations (y) and the line's predictions at the inputs (x).
Practical problems typically involve a large number of samples and more than one dimension of inputs (x), or "explanatory variables." Ordinary least squares finds the curve, plane, or hyperplane that best fits the observations.
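In matrix form, this is the standard OLS formulation: with an $n \times m$ regression matrix $X$ and a vector of $n$ observations $y$, we seek the coefficient vector $\beta$ that minimizes the sum of squared residuals. Setting the gradient of that objective to zero yields the "normal equations" mentioned below:

$$
\hat{\beta} = \arg\min_{\beta} \lVert y - X \beta \rVert_2^2
\qquad \Longrightarrow \qquad
X^{\mathsf{T}} X \, \hat{\beta} = X^{\mathsf{T}} y .
$$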
Excel's linest function will solve single and multiple linear regression problems; see here.
MATLAB's fit and lsqcurvefit commands do this; also see their write-up on least squares fitting.
A number of tutorials demonstrate ordinary least squares in R, such as here, here, and here.
For very large problems, an iterative method will likely be faster. Consider NVIDIA's cuSOLVER, or implementations of the Levenberg–Marquardt algorithm like this and this.
For moderate-sized problems, you can use this software from the command line or copy it into your own C or C++ code.
On a RHEL/Fedora/CentOS workstation, you can install all prerequisites using:
sudo dnf install make gcc-c++ eigen3-devel
or on Debian/Ubuntu with:
sudo apt install build-essential libeigen3-dev
When that's done, try the following:
git clone https://github.com/markstock/estOLS.git
cd estOLS
make
./estOLS -test 2 10
If that works, then prepare your regression matrix and vector of observations in two comma-separated value (CSV) files (or use the sample files provided; a sketch of one possible layout follows the commands below) and run
./estOLS -x xmat.csv -y obsv.csv > out.csv
or
./estOLS -x xmat.csv -y obsv.csv -o out.csv
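The layout below is a guess at the expected convention (one row per observation, columns separated by commas); check the provided sample files for the authoritative format. For a hypothetical problem with m = 2 variables and n = 4 observations, xmat.csv would hold the 4x2 regression matrix:

```
0.5,1.2
-0.3,0.8
0.9,-1.1
0.1,0.4
```

and obsv.csv the 4 observations:

```
1.7
0.6
-0.2
0.5
```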
The solution procedure uses the normal equations by default; if those fail to produce an answer or are not precise enough, add the -qr option to force the solver to use the QR decomposition instead.
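For readers who would rather embed the computation in their own code, here is a minimal sketch of the same two approaches using Eigen3 directly. This is illustrative, not estOLS's actual internals; the sizes and random data are placeholders.

```cpp
// Solve an OLS problem two ways with Eigen3: normal equations vs. QR.
#include <Eigen/Dense>
#include <iostream>

int main() {
    const int n = 10;   // number of observations
    const int m = 2;    // number of explanatory variables
    Eigen::MatrixXd X = Eigen::MatrixXd::Random(n, m);   // regression matrix
    Eigen::VectorXd y = Eigen::VectorXd::Random(n);      // observations

    // Normal equations: fastest, but squares the condition number of X
    Eigen::VectorXd beta_ne = (X.transpose() * X).ldlt().solve(X.transpose() * y);

    // Column-pivoted Householder QR: slower, but more numerically robust
    Eigen::VectorXd beta_qr = X.colPivHouseholderQr().solve(y);

    std::cout << "normal equations: " << beta_ne.transpose() << "\n"
              << "QR decomposition: " << beta_qr.transpose() << "\n";
    return 0;
}
```

Compile with, e.g., `g++ -O2 -I/usr/include/eigen3 ols_sketch.cpp -o ols_sketch`; the include path matches the eigen3-devel / libeigen3-dev packages installed above.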
We ran this code on a set of random inputs (both the regression matrix and the observations drawn from [-1, 1]) in double precision, using the normal-equations solution method, and timed the calculation on an AMD Ryzen 9 3950X. One set of problems set the number of observations n to twice the number of independent variables m; a second set used a factor of 10. The performance for a variety of problem sizes appears below. Total run time appears to scale as $O(m^3 \, n^{0.75})$.
m | n = 2m | time (sec) | n = 10m | time (sec) |
---|---|---|---|---|
100 | 200 | 0.000379 | 1000 | 0.001167 |
200 | 400 | 0.002203 | 2000 | 0.003802 |
500 | 1000 | 0.013250 | 5000 | 0.049815 |
1000 | 2000 | 0.098714 | 10000 | 0.397258 |
2000 | 4000 | 0.802492 | 20000 | 3.195111 |
5000 | 10000 | 18.00390 | 50000 | 55.97039 |
10000 | 20000 | 141.7969 | 100000 | 437.0685 |
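For context on that empirical fit: with the normal-equations method, forming $X^{\mathsf{T}}X$ takes $O(n m^2)$ operations and factoring it takes $O(m^3)$, so a strong dependence on $m$ is expected:

$$
\text{cost} \approx O(n m^2) + O(m^3).
$$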
If you run into problems, go to Issues (just below the title bar on this page) and create an issue.
Thanks to the writers of Eigen3 for their excellent matrix mathematics package. The linear regression image is from Hutcheson, G. D. (2011), "Ordinary Least-Squares Regression," in L. Moutinho and G. D. Hutcheson (Eds.), The SAGE Dictionary of Quantitative Management Research, pp. 224-228 [link].