# Longley's Economic Regression Data

To demonstrate multiple linear regression, we will use the [`longley`](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/longley.html) dataset from the R [`datasets`](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html) package. It is a small macroeconomic dataset and a well-known example of highly collinear regression data. For convenience, a copy of this dataset is provided at http://cobweb.cs.uga.edu/~mec/longley.csv. First, let's load the data into a [`Relation`](http://cobweb.cs.uga.edu/~jam/scalation_1.3/scalation_mathstat/target/scala-2.12/api/scalation/relalgebra/Relation$.html) to see what's available:

In [2]:
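import scalation.columnar_db._
val url = "http://cobweb.cs.uga.edu/~mec/longley.csv"
// "SDDDDDDD" gives one domain character per column: S = String (id), D = Double (the rest);
// 0 is the key column and "," is the element separator
val rel = Relation(url, "longley", "SDDDDDDD", 0, ",")
rel.show()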

import scalation.columnar_db._
url: String = http://cobweb.cs.uga.edu/~mec/longley.csv
0
rel: scalation.columnar_db.Relation =
Relation(longley, 0,
WrappedArray(id, GNP.deflator, GNP, Unemployed, Armed.Forces, Population, Year, Employed),
VectorS(1947,	1948,	1949,	1950,	1951,	1952,	1953,	1954,	1955,	1956,	1957,	1958,	1959,	1960,	1961,	1962)
VectorD(83.0000,	88.5000,	88.2000,	89.5000,	96.2000,	98.1000,	99.0000,	100.000,	101.200,	104.600,	108.400,	110.800,	112.600,	114.200,	115.700,	116.900)
VectorD(234.289,	259.426,	258.054,	284.599,	328.975,	346.999,	365.385,	363.112,	397.469,	419.180,	442.769,	444.546,	482.704,	502.601,	518.173,	554.894)
VectorD(235.600,	232.500,	368.200,	335.100,	209.900,	193.200,	187.000,	357.800,	290.400,	282.200,	293.600,	468.100,	381.300,	393.100,	480.600,	400.700)
VectorD(159.000,	145.600,	161.600,	165.000,	309.900,	359.400,	354.700,	335.000,	304.800...
|-------------------------------------------------------------------------------------------------------------

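Suppose we want to model `Employed` using the other variables in a multiple linear regression. We first need to create the design matrix `x` and response vector `y` from the `Relation`: columns 1 through 6 (`GNP.deflator` through `Year`) become the predictors, and column 7 (`Employed`) becomes the response. Column 0 (`id`) is left out. Then we create and train a `Regression` model.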

In [3]:
import scalation.analytics.Regression
val (x, y) = rel.toMatriDD((1 to 6).toSeq, 7)
val rg = new Regression(x, y)
rg.train()
rg.summary

import scalation.analytics.Regression
x: scalation.linalgebra.MatriD =

MatrixD(83.0000,	234.289,	235.600,	159.000,	107.608,	1947.00,
	88.5000,	259.426,	232.500,	145.600,	108.632,	1948.00,
	88.2000,	258.054,	368.200,	161.600,	109.773,	1949.00,
	89.5000,	284.599,	335.100,	165.000,	110.929,	1950.00,
	96.2000,	328.975,	209.900,	309.900,	112.075,	1951.00,
	98.1000,	346.999,	193.200,	359.400,	113.270,	1952.00,
	99.0000,	365.385,	187.000,	354.700,	115.094,	1953.00,
	100.000,	363.112,	357.800,	335.000,	116.219,	1954.00,
	101.200,	397.469,	290.400,	304.800,	117.388,	1955.00,
	104.600,	419.180,	282.200,	285.700,	118.734,	1956.00,
	108.400,	442.769,	293.600,	279.800,	120.445,	1957.00,
	110.800,	444.546,	468.100,	263.700,	121.950,	1958.00,
	112.600,	482.704,	381.300,	255.200,	123.366,	1959.00,
	114.200,	502.601,	393.100,	251.400,	125....
rg: scalation.analytics.Regression = scalation.analytics.Regression@c683cf5
res2: scalation.analytics.Regression = scalation.analytics.Regression@c683cf5
res3: S
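
For reference, the fitted model can be written out explicitly. This is a sketch under the assumption that `Regression` performs ordinary least squares on `x` and `y` exactly as given (the text above does not state the solver, and no column of ones is added to `x` in the cell above, so no intercept term appears here). With $X$ the $16 \times 6$ design matrix, $y$ the response vector, $b$ the coefficient vector, and $\epsilon$ the residuals:

$$
y = X b + \epsilon, \qquad
\hat{b} = \arg\min_{b} \lVert y - X b \rVert_2^2 = (X^\top X)^{-1} X^\top y
$$

The closed form on the right requires $X^\top X$ to be invertible. When the columns of $X$ are nearly linearly dependent, $X^\top X$ is close to singular and the estimate $\hat{b}$ becomes very sensitive to small changes in the data.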

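The predictors in this dataset are known to be highly collinear. Collinearity inflates the standard errors of the coefficient estimates, which is consistent with the large p-values in the summary table, although large p-values alone do not prove collinearity.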
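
One quick way to see the collinearity directly is to compute the correlation between two of the predictors. The sketch below is plain Scala and does not use ScalaTion; the object `CollinearityCheck` and the helper `pearson` are hypothetical names introduced only for this illustration. The two columns are copied from the `GNP.deflator` and `GNP` vectors printed by `rel.show()` above.

```scala
object CollinearityCheck {

  // Pearson correlation coefficient between two equal-length samples.
  def pearson(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.nonEmpty && a.length == b.length, "samples must be non-empty and of equal length")
    val n     = a.length
    val meanA = a.sum / n
    val meanB = b.sum / n
    // Sum of cross-deviations and sums of squared deviations.
    val cov  = a.zip(b).map { case (u, v) => (u - meanA) * (v - meanB) }.sum
    val varA = a.map(u => (u - meanA) * (u - meanA)).sum
    val varB = b.map(v => (v - meanB) * (v - meanB)).sum
    cov / math.sqrt(varA * varB)
  }

  // GNP.deflator column, copied from the Relation output above.
  val gnpDeflator: Seq[Double] = Seq(
    83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0,
    101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9)

  // GNP column, copied from the Relation output above.
  val gnp: Seq[Double] = Seq(
    234.289, 259.426, 258.054, 284.599, 328.975, 346.999, 365.385, 363.112,
    397.469, 419.180, 442.769, 444.546, 482.704, 502.601, 518.173, 554.894)

  def main(args: Array[String]): Unit =
    println(f"corr(GNP.deflator, GNP) = ${pearson(gnpDeflator, gnp)}%.4f")
}
```

A correlation close to 1 between two predictors means they carry nearly the same information, so the regression has difficulty attributing the effect to either one. Pairwise correlation is only a rough check; it can miss collinearity that involves three or more predictors at once.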

## References

* Longley, J. W. (1967) An appraisal of least-squares programs from the point of view of the user. *Journal of the American Statistical Association* 62, 819–841.
* Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) *The New S Language.* Wadsworth & Brooks/Cole.