omphalos/smr

Streaming implementation of multiple regression in JavaScript
Latest commit 2402cd6 Nov 26, 2017

smr

This is an implementation of multiple regression in JavaScript. It is mostly incremental: you can add observations one at a time, and the coefficient calculation will still be quick for lower-dimensional problems. This is particularly useful if you want to run multiple regression in real time or over very large datasets that won't fit into memory all at once.

Quick Start

From Node.js:

``````
npm install smr
node

var smr = require('smr')
``````

In the browser, use browserify.

Example

``````
var regression = new smr.Regression({ numX: 2, numY: 1 })

regression.push({ x: [10, 11], y: [100] })
regression.push({ x: [9, 12], y: [99] })

regression.calculateCoefficients() // Returns [[4.29], [5.29]]

regression.push({ x: [8, 15], y: [80] })
regression.calculateCoefficients() // Returns [[-0.16], [10.55]]
regression.hypothesize({ x: [1, 2] }) // Returns [20.93]
``````

Formula

To calculate the regression coefficients, we use the following formula:

``````
(X' * X) ^ -1 * X' * Y
``````

Where X is the matrix of independent variables (one row per observation), X' is its transpose, Y is the matrix of dependent variables (one row per observation), and ^ -1 indicates taking the pseudoinverse.
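
As a rough illustration of the formula, the sketch below works it out by hand for the special case of two independent variables and one dependent variable. The function name `leastSquares2` is hypothetical and is not part of smr. It also assumes that X' * X is invertible, so an ordinary 2x2 inverse stands in for the pseudoinverse.

```javascript
// Hypothetical helper, not part of smr: least squares for exactly two
// independent variables and one dependent variable.
// Assumption: X' * X is invertible, so a plain 2x2 inverse replaces the pseudoinverse.
function leastSquares2(rows) {
  // Entries of the symmetric matrix X' * X = [[a, b], [b, d]]
  // and of the column vector X' * Y = [u, v].
  var a = 0, b = 0, d = 0, u = 0, v = 0
  rows.forEach(function (row) {
    var x1 = row.x[0], x2 = row.x[1], y = row.y
    a += x1 * x1
    b += x1 * x2
    d += x2 * x2
    u += x1 * y
    v += x2 * y
  })
  var det = a * d - b * b
  if (det === 0) throw new Error("X' * X is singular; a pseudoinverse would be needed")
  // (X' * X)^-1 * X' * Y, written out for the 2x2 case.
  return [(d * u - b * v) / det, (a * v - b * u) / det]
}

// Data generated from y = 2 * x1 + 3 * x2, so the fit should recover 2 and 3.
var coefficients = leastSquares2([
  { x: [1, 0], y: 2 },
  { x: [0, 1], y: 3 },
  { x: [1, 1], y: 5 }
])
console.log(coefficients) // [ 2, 3 ]
```

The data here is made up for the illustration and lies exactly on a plane through the origin, which is why the coefficients come out as whole numbers.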

Mechanics

Internally, we incrementally calculate the two matrix products, X' * X and X' * Y, as new observations are added. Whenever you request the coefficients, either through calculateCoefficients() or indirectly through hypothesize(), the library will find the pseudoinverse of the readily-available X' * X and multiply this by the readily-available X' * Y.
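
The accumulation step can be sketched roughly as follows. This is an illustration of the idea, not the library's actual source: the names `createSums` and `pushObservation` are hypothetical. Each observation adds the outer product of its x with itself to the running X' * X, and the outer product of its x with its y to the running X' * Y.

```javascript
// Hypothetical accumulator, not smr's internals: holds the running
// products X' * X (numX by numX) and X' * Y (numX by numY).
function createSums(numX, numY) {
  function zeros(rowCount, columnCount) {
    var matrix = []
    for (var r = 0; r < rowCount; r++) {
      var row = []
      for (var c = 0; c < columnCount; c++) row.push(0)
      matrix.push(row)
    }
    return matrix
  }
  return { xtx: zeros(numX, numX), xty: zeros(numX, numY) }
}

// Fold one observation into the running products.
function pushObservation(sums, observation) {
  var x = observation.x, y = observation.y
  for (var i = 0; i < x.length; i++) {
    for (var j = 0; j < x.length; j++) sums.xtx[i][j] += x[i] * x[j]
    for (var k = 0; k < y.length; k++) sums.xty[i][k] += x[i] * y[k]
  }
}

var sums = createSums(2, 1)
pushObservation(sums, { x: [10, 11], y: [100] })
pushObservation(sums, { x: [9, 12], y: [99] })
console.log(sums.xtx) // [ [ 181, 218 ], [ 218, 265 ] ]
console.log(sums.xty) // [ [ 1891 ], [ 2288 ] ]
```

In this sketch the stored state is numX * numX + numX * numY numbers no matter how many observations have been pushed, so the observations themselves never need to be held in memory.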

Tests

``````
git clone https://github.com/omphalos/smr
cd smr
npm install
``````

Then you can run unit tests with:

``````
npm test
``````

You can run a simple performance test with:

``````
node ./performance.js 500
``````

This will show the performance on a harder (500-dimensional) problem. The bottleneck with higher-dimensional problems is the pseudoinverse calculation, which scales roughly as N^3 in the number of dimensions N. As an example, on a test machine, a 500-dimensional problem takes over 11 seconds, whereas a 200-dimensional problem takes ~100 milliseconds.