maxgfr/regressio

regressio

Zero-dependency TypeScript regression, classification & statistics library with full statistical outputs, diagnostics, and preprocessing. Ships with an optional Rust/WASM engine for accelerated linear algebra.

Install

bun add regressio
# or
npm install regressio
# or
pnpm add regressio

Quick Start

import { LinearRegression } from 'regressio';

const model = new LinearRegression();
model.fit([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]);

console.log(model.coefficients);  // [1.99]
console.log(model.intercept);     // 0.05
console.log(model.predict([6]));  // [11.99]
console.log(model.summary());     // R-style formatted summary table

Models

Regression

| Model | Class | What it does |
| --- | --- | --- |
| OLS | LinearRegression | Fits a linear relationship between features and target using Ordinary Least Squares solved via QR decomposition. The foundational regression method. |
| Polynomial | PolynomialRegression | Fits non-linear curves by expanding a single feature into polynomial terms (x, x², x³, ...), then applying OLS. |
| Ridge (L2) | RidgeRegression | Adds an L2 penalty (sum of squared coefficients) to OLS to handle multicollinearity and prevent overfitting. Shrinks coefficients toward zero but never exactly to zero. |
| Lasso (L1) | LassoRegression | Adds an L1 penalty (sum of absolute coefficients), fitted via coordinate descent. Forces some coefficients to exactly zero, performing automatic feature selection. |
| Elastic Net | ElasticNet | Combines L1 and L2 penalties, balancing Lasso's feature selection with Ridge's stability on correlated features. |
| WLS | WeightedRegression | Weighted Least Squares: assigns a different importance to each observation. Useful when some data points are more reliable than others. |
| Robust | RobustRegression | Resistant to outliers. Uses Iteratively Reweighted Least Squares (IRLS) with Huber or Tukey bisquare M-estimators to downweight extreme values. |
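
For intuition on the QR approach mentioned above, here is a minimal standalone sketch of least squares via classical Gram-Schmidt QR plus back substitution, assuming full-rank columns. The qrSolve helper is hypothetical and illustrative only, not regressio's internal solver.

```typescript
// Solves min ||Xb - y||² by factoring X = QR (classical Gram-Schmidt),
// then back-solving the upper-triangular system R b = Qᵀy.
function qrSolve(X: number[][], y: number[]): number[] {
  const n = X.length, p = X[0].length;
  const Q: number[][] = Array.from({ length: n }, () => new Array(p).fill(0));
  const R: number[][] = Array.from({ length: p }, () => new Array(p).fill(0));
  for (let j = 0; j < p; j++) {
    // Orthogonalize column j against the already-built Q columns
    const v = X.map((row) => row[j]);
    for (let k = 0; k < j; k++) {
      R[k][j] = Q.reduce((s, row, i) => s + row[k] * X[i][j], 0);
      for (let i = 0; i < n; i++) v[i] -= R[k][j] * Q[i][k];
    }
    R[j][j] = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
    for (let i = 0; i < n; i++) Q[i][j] = v[i] / R[j][j];
  }
  // qty = Qᵀ y
  const qty = new Array(p).fill(0);
  for (let k = 0; k < p; k++)
    for (let i = 0; i < n; i++) qty[k] += Q[i][k] * y[i];
  // Back substitution on R b = qty
  const b = new Array(p).fill(0);
  for (let j = p - 1; j >= 0; j--) {
    let s = qty[j];
    for (let k = j + 1; k < p; k++) s -= R[j][k] * b[k];
    b[j] = s / R[j][j];
  }
  return b;
}

// Prepend an intercept column, as a linear model does internally
const xs = [1, 2, 3, 4, 5];
const ys = [2.1, 3.9, 6.2, 7.8, 10.1];
const design = xs.map((x) => [1, x]);
const [intercept, slope] = qrSolve(design, ys); // ≈ 0.05 and 1.99
```

On the Quick Start data this recovers slope 1.99 and intercept 0.05, the exact OLS solution.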

Classification

| Model | Class | What it does |
| --- | --- | --- |
| Logistic | LogisticRegression | Binary classification (0/1). Models the probability of class membership with a sigmoid function, fitted via Newton-Raphson/IRLS. |
| Multiclass Logistic | MulticlassLogisticRegression | Extends logistic regression to K classes using softmax, fitted via gradient descent on the cross-entropy loss. |
| K-Nearest Neighbors | KNearestNeighbors | Non-parametric method. Predicts by majority vote (classification) or mean (regression) of the k closest training points. Supports Euclidean and Manhattan distances. |
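
The majority-vote mechanic behind KNN can be sketched as follows. knnPredict is a hypothetical standalone function with Euclidean distance, not the library's KNearestNeighbors internals.

```typescript
// Classify a query point by majority vote among its k nearest neighbors.
function knnPredict(
  X: number[][], y: number[], query: number[], k: number
): number {
  const dist = (a: number[], b: number[]) =>
    Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));
  // Sort training points by distance to the query, keep the k closest
  const neighbors = X.map((row, i) => ({ d: dist(row, query), label: y[i] }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);
  // Count votes per label and return the most frequent one
  const votes = new Map<number, number>();
  for (const { label } of neighbors)
    votes.set(label, (votes.get(label) ?? 0) + 1);
  let best = neighbors[0].label, bestCount = -1;
  for (const [label, count] of votes)
    if (count > bestCount) { best = label; bestCount = count; }
  return best;
}
```

For regression mode, the vote would be replaced by the mean of the k neighbor targets.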

Neural Network

| Model | Class | What it does |
| --- | --- | --- |
| Feedforward NN | NeuralNetwork | Multi-layer perceptron with backpropagation. Configurable hidden layers, activations (relu, sigmoid, tanh, softmax), and learning rate. Supports both regression and classification tasks. |
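
The building block an MLP stacks is one forward pass through a dense layer with an activation. A minimal sketch with a ReLU layer; the weights here are made up for illustration, and training via backpropagation is omitted.

```typescript
// One dense layer: output_j = relu(b_j + Σ_i w_ji * input_i)
type Layer = { weights: number[][]; biases: number[] }; // weights[out][in]

function relu(x: number): number {
  return Math.max(0, x);
}

function dense(layer: Layer, input: number[]): number[] {
  return layer.weights.map((row, j) =>
    relu(row.reduce((s, w, i) => s + w * input[i], layer.biases[j]))
  );
}

// Two stacked layers: 2 inputs -> 2 hidden units -> 1 output
const hidden: Layer = { weights: [[1, -1], [0.5, 0.5]], biases: [0, 0] };
const output: Layer = { weights: [[1, 1]], biases: [0] };
const yHat = dense(output, dense(hidden, [2, 1])); // [2.5]
```

Chaining dense layers like this is the forward pass; backpropagation then adjusts each layer's weights against the loss gradient.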

Usage

import {
  LinearRegression,
  PolynomialRegression,
  RidgeRegression,
  LassoRegression,
  ElasticNet,
  WeightedRegression,
  RobustRegression,
  LogisticRegression,
  MulticlassLogisticRegression,
  KNearestNeighbors,
  NeuralNetwork,
} from 'regressio';

// --- Regression ---

// OLS: multiple regression
const ols = new LinearRegression();
ols.fit([[1, 2], [3, 4], [5, 6]], [10, 22, 34]);

// Polynomial: fit a cubic curve
const poly = new PolynomialRegression({ degree: 3 });
poly.fit([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]);

// Ridge: regularized regression for correlated features
const ridge = new RidgeRegression({ alpha: 0.5 });
ridge.fit(X, y);

// Lasso: automatic feature selection
const lasso = new LassoRegression({ alpha: 0.1 });
lasso.fit(X, y);
// Some coefficients will be exactly 0

// Elastic Net: mix of L1 and L2
const enet = new ElasticNet({ alpha: 0.1, l1Ratio: 0.5 });
enet.fit(X, y);

// Weighted Least Squares: different reliability per observation
const wls = new WeightedRegression();
wls.fit(X, y, weights);

// Robust: resistant to outliers
const robust = new RobustRegression({ method: 'huber' });
robust.fit(X, y);

// --- Classification ---

// Binary logistic regression
const logit = new LogisticRegression();
logit.fit(X, y); // y must be 0/1
logit.predictProbability(Xnew); // [0.12, 0.87, ...]

// Multiclass logistic regression (softmax)
const multi = new MulticlassLogisticRegression({ learningRate: 0.05 });
multi.fit(X, y); // y = 0, 1, 2, ...
multi.predictProbability(Xnew); // [[0.7, 0.2, 0.1], ...]

// K-Nearest Neighbors (classification or regression)
const knn = new KNearestNeighbors({ k: 5, mode: 'classification' });
knn.fit(X, y);
knn.predict(Xnew);

// --- Neural Network ---

// Regression with a neural network
const nn = new NeuralNetwork({
  layers: [
    { units: 16, activation: 'relu' },
    { units: 8, activation: 'relu' },
  ],
  learningRate: 0.01,
  epochs: 200,
  task: 'regression',
});
nn.fit(X, y);
nn.predict(Xnew);

// Classification with a neural network
const clf = new NeuralNetwork({
  layers: [{ units: 10, activation: 'sigmoid' }],
  learningRate: 0.1,
  epochs: 100,
  task: 'classification',
});
clf.fit(X, y); // y = 0, 1, 2, ...
clf.predict(Xnew);

Statistical Outputs

Every linear model (OLS, Ridge, Lasso, Elastic Net, WLS, Robust, Polynomial) provides statistics() and summary():

const stats = model.statistics();
// {
//   rSquared,              -- proportion of variance explained (0 to 1)
//   adjustedRSquared,      -- R² penalized for number of predictors
//   standardErrors,        -- uncertainty of each coefficient estimate
//   tStatistics,           -- coefficient / standard error for each predictor
//   pValues,               -- probability of observing the t-stat under H0 (no effect)
//   confidenceIntervals,   -- 95% confidence range for each coefficient
//   fStatistic,            -- overall model significance test
//   fPValue,               -- p-value for the F-test
//   residualStandardError, -- estimated standard deviation of residuals
//   aic,                   -- Akaike Information Criterion (lower = better fit/complexity trade-off)
//   bic,                   -- Bayesian Information Criterion (stronger complexity penalty than AIC)
//   degreesOfFreedom,      -- n - k (observations minus parameters)
//   nObservations,         -- number of data points
// }

console.log(model.summary());
// Coefficients:
//                 Estimate    Std. Error  t value   Pr(>|t|)
// (Intercept)     0.0500      0.1981      0.25      0.8169
// x1              1.9900      0.0597      33.32     0.0001 ***
// ---
// Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
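
The trailing stars follow R's conventional p-value thresholds, exactly as listed in the signif. codes line. A sketch of the mapping (signifCode is a hypothetical helper, not part of the library API):

```typescript
// Map a p-value to its R-style significance code.
function signifCode(p: number): string {
  if (p < 0.001) return "***";
  if (p < 0.01) return "**";
  if (p < 0.05) return "*";
  if (p < 0.1) return ".";
  return " ";
}
```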

Binary logistic regression provides classification metrics:

const stats = logit.statistics();
// { accuracy, precision, recall, f1Score, confusionMatrix,
//   pseudoRSquared, logLikelihood, aic, bic }

Multiclass logistic regression provides per-class metrics:

const stats = multi.statistics();
// { accuracy, precision (per class), recall (per class),
//   nClasses, logLikelihood }

Diagnostics

Functions to validate model assumptions and detect problems.

| Function | What it does |
| --- | --- |
| residualDiagnostics(X, y, yHat) | Returns raw residuals, studentized residuals, Cook's distance, and leverage for each observation. |
| studentizedResiduals(X, y, yHat) | Residuals scaled by their estimated standard deviation. Absolute values above 2–3 suggest outliers. |
| cooksDistance(X, y, yHat) | Measures how much each observation influences the fitted model. Values above 4/n flag influential points. |
| leverage(X) | Hat matrix diagonal. Measures how far each observation's features are from the center. High leverage = unusual feature values. |
| durbinWatson(residuals) | Tests for autocorrelation in residuals. Returns a statistic in [0, 4]: ~2 = no autocorrelation, <2 = positive, >2 = negative. Critical for time series. |
| breuschPagan(X, residuals) | Tests for heteroscedasticity (non-constant variance). A low p-value means the variance depends on X, so standard errors are unreliable. |
| shapiroWilk(data) | Tests whether data follows a normal distribution. A low p-value means non-normal. Important because p-values and CIs assume normal residuals. |
| vif(X) | Variance Inflation Factor for each feature. VIF > 10 signals multicollinearity (features are too correlated). |
| correlationMatrix(X) | Pairwise Pearson correlation matrix. Highly correlated pairs point to redundant features. |
| conditionNumber(X) | Ratio of the largest to smallest singular value of X. Values above 30 signal numerical instability from multicollinearity. |

import {
  residualDiagnostics, leverage, cooksDistance, studentizedResiduals,
  durbinWatson, breuschPagan, shapiroWilk,
  vif, correlationMatrix, conditionNumber,
} from 'regressio';

const diag = residualDiagnostics(X, y, yHat);
const dw = durbinWatson(model.residuals());
const bp = breuschPagan(X, model.residuals());
const sw = shapiroWilk(model.residuals());
const vifs = vif(X);
const corr = correlationMatrix(X);
const kappa = conditionNumber(X);
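
For intuition, the Durbin-Watson statistic is DW = Σₜ(eₜ − eₜ₋₁)² / Σₜ eₜ². A self-contained sketch of that formula (the function name is hypothetical; use the library's durbinWatson in practice):

```typescript
// DW near 2: no first-order autocorrelation; near 0: positive
// autocorrelation; near 4: negative (alternating) autocorrelation.
function durbinWatsonStat(residuals: number[]): number {
  let num = 0, den = 0;
  for (let t = 0; t < residuals.length; t++) {
    den += residuals[t] ** 2;
    if (t > 0) num += (residuals[t] - residuals[t - 1]) ** 2;
  }
  return num / den;
}
```

Constant-sign residuals push DW toward 0, while strictly alternating residuals push it toward 4.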

Preprocessing

Functions to prepare data before fitting models.

| Function | What it does |
| --- | --- |
| standardize(X) | Z-score normalization: transforms each feature to mean = 0, std = 1. Essential for Lasso/Ridge/Elastic Net and neural networks. |
| unstandardize(X, params) | Reverses standardization back to the original scale. |
| normalize(X) | Min-max scaling: transforms each feature to the [0, 1] range. |
| unnormalize(X, params) | Reverses normalization back to the original scale. |
| oneHotEncode(column, categories?, dropFirst?) | Converts categorical values to binary columns. Use dropFirst=true to avoid the multicollinearity trap. |
| polynomialFeatures(X, degree) | Generates polynomial terms (x, x², x³, ...) for each feature. Use with LinearRegression for polynomial fitting with multiple features. |
| interactionFeatures(X, pairs?) | Generates interaction terms (xi * xj) for all or specified feature pairs. |
| dropMissing(X, y?) | Removes rows containing NaN or null values. |
| imputeMean(X) | Replaces NaN values with the column mean. |
| imputeMedian(X) | Replaces NaN values with the column median. More robust to outliers than mean imputation. |

import {
  standardize, unstandardize, normalize, unnormalize,
  oneHotEncode, polynomialFeatures, interactionFeatures,
  dropMissing, imputeMean, imputeMedian,
} from 'regressio';

const { transformed, means, stds } = standardize(X);
const original = unstandardize(transformed, { means, stds });
const { transformed: normed, mins, maxs } = normalize(X);
const dummies = oneHotEncode(['cat', 'dog', 'cat'], undefined, true);
const polyX = polynomialFeatures(X, 3);
const interX = interactionFeatures(X);
const clean = dropMissing(X, y);
const imputed = imputeMean(X);
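
On the dropFirst flag: with all K dummy columns, the columns always sum to 1, which is perfectly collinear with the intercept (the "dummy variable trap"). A hypothetical standalone oneHot sketch showing the effect; its signature differs from the library's oneHotEncode.

```typescript
// Encode a string column as 0/1 dummy columns, optionally dropping
// the first category so the remaining columns are not collinear with
// an intercept term.
function oneHot(column: string[], dropFirst = false): number[][] {
  const categories = [...new Set(column)].sort();
  const kept = dropFirst ? categories.slice(1) : categories;
  return column.map((v) => kept.map((c) => (v === c ? 1 : 0)));
}

const full = oneHot(["cat", "dog", "cat"]);          // [[1,0],[0,1],[1,0]]
const dropped = oneHot(["cat", "dog", "cat"], true); // [[0],[1],[0]]
```

In the full encoding each row sums to 1; dropping the first category makes it the implicit baseline absorbed by the intercept.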

Prediction Intervals

Functions to quantify prediction uncertainty.

| Function | What it does |
| --- | --- |
| confidenceInterval(X, y, yHat, newX, newYHat) | Confidence interval on the mean prediction. Answers: "where is the true regression line?" Narrower near the center of the training data. |
| predictionInterval(X, y, yHat, newX, newYHat) | Prediction interval for a new individual observation. Always wider than the confidence interval because it includes observation noise. |
| bootstrapCoefficients(X, y, nBootstrap?) | Non-parametric bootstrap: resamples the data with replacement, refits the model many times, and returns empirical confidence intervals on the coefficients. No distributional assumptions. |

import { confidenceInterval, predictionInterval, bootstrapCoefficients } from 'regressio';

const ci = confidenceInterval(X, y, yHat, newX, newYHat);
// [{ predicted, lower, upper }, ...]

const pi = predictionInterval(X, y, yHat, newX, newYHat);
// Always wider than ci

const boot = bootstrapCoefficients(X, y, 1000);
// { coefficients, confidenceIntervals, standardErrors }
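
The idea behind bootstrapCoefficients, shown on the simplest possible statistic (the sample mean): resample with replacement, recompute, and take empirical quantiles. Everything here is an illustrative sketch, not the library's API; a small LCG stands in for Math.random so the example is reproducible.

```typescript
// Percentile bootstrap for a 95% CI on the mean of a small sample.
function bootstrapMeanCI(
  data: number[], nBoot: number, seed = 42
): { lower: number; upper: number } {
  let state = seed >>> 0;
  const rand = () => {
    // Numerical Recipes LCG: deterministic pseudo-random in [0, 1)
    state = (1664525 * state + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
  const means: number[] = [];
  for (let b = 0; b < nBoot; b++) {
    // One bootstrap resample: draw n points with replacement
    let sum = 0;
    for (let i = 0; i < data.length; i++)
      sum += data[Math.floor(rand() * data.length)];
    means.push(sum / data.length);
  }
  means.sort((a, b) => a - b);
  // 2.5% and 97.5% empirical quantiles
  return {
    lower: means[Math.floor(0.025 * nBoot)],
    upper: means[Math.floor(0.975 * nBoot)],
  };
}

const ci95 = bootstrapMeanCI([2.1, 3.9, 6.2, 7.8, 10.1], 1000);
```

bootstrapCoefficients applies the same resample-and-requantile loop, but refits the regression on each resample instead of taking a mean.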

Advanced: Matrix Class

Low-level matrix operations for advanced users. Backed by Float64Array in row-major order.

import { Matrix } from 'regressio';

const A = Matrix.fromArray([[1, 2], [3, 4]]);
const B = Matrix.identity(2);
const C = A.multiply(B);
console.log(C.determinant());  // -2
console.log(C.trace());        // 5
console.log(C.transpose().toArray());
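
The row-major Float64Array layout mentioned above means element (i, j) of an r×c matrix lives at flat index i*c + j. A standalone sketch of that addressing, not the Matrix class internals:

```typescript
// [[1, 2], [3, 4]] flattened row by row into one typed array
const rows = 2, cols = 2;
const data = Float64Array.from([1, 2, 3, 4]);

// Element (i, j) sits at offset i*cols + j
function at(i: number, j: number): number {
  return data[i * cols + j];
}

// Walking the diagonal recovers the trace
let trace = 0;
for (let i = 0; i < rows; i++) trace += at(i, i); // 1 + 4 = 5
```

Row-major storage keeps each row contiguous in memory, which is cache-friendly for the row-wise dot products that dominate matrix multiplication.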

WASM Acceleration

regressio ships with a pre-compiled Rust/WASM engine that activates automatically — no configuration needed. When the WASM binary is available, heavy computations are dispatched to compiled Rust code for significantly faster execution.

Accelerated operations:

  • Matrix: multiply, transpose, add, subtract, scale, dot product, norm, determinant
  • Decompositions: QR, Cholesky, SVD, eigenvalues (tridiagonal QL)
  • Solvers: forward/back substitution
  • Models: Lasso/Elastic Net coordinate descent, logistic regression IRLS, softmax, KNN distance matrices
  • Diagnostics: correlation matrix, VIF (via correlation matrix inverse)
  • Predictions: bootstrap OLS (1000+ resamples in a single WASM call)

If WASM is unavailable (e.g. unsupported runtime), all operations fall back silently to pure TypeScript.

import { isWasmActive } from 'regressio';

console.log(isWasmActive()); // true if WASM loaded

// Everything just works — WASM is used transparently
const model = new LinearRegression();
model.fit(X, y); // QR decomposition runs in Rust

Rebuilding WASM

The pre-built WASM binary is included in the package. To rebuild it from the Rust source (requires Rust with the wasm32-unknown-unknown target):

bun run build:wasm

License

MIT
