Mixed Linear Regression (MLR)

MLR is a simple generalization of linear regression. In MLR, each sample belongs to one of $K$ unknown linear models, but it is not known to which one. MLR can thus be viewed as a combination of linear regression and clustering. A textbook reference is Bishop (2006), Pattern Recognition and Machine Learning, Chapter 14.
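
In the standard formulation, each sample $(x_i, y_i)$ satisfies

$$ y_i = x_i^\top \beta_{c_i} + \varepsilon_i, $$

where $\beta_1, \ldots, \beta_K$ are the $K$ unknown regression vectors, $\varepsilon_i$ is noise, and the label $c_i \in \{1, \ldots, K\}$ of each sample is unobserved.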

Mix-IRLS

This repository contains MATLAB and Python implementations of the Mix-IRLS algorithm, described in the arXiv preprint cited below (see Citation). The key idea that distinguishes Mix-IRLS from most other methods is sequential recovery: rather than recovering all $K$ models simultaneously, Mix-IRLS recovers them one by one, using tools from robust regression. Specifically, it uses iteratively reweighted least squares (IRLS), starting from a random initialization. Importantly, in its intermediate steps, Mix-IRLS allows an "I don't know" assignment for low-confidence samples, namely samples whose model identity is uncertain. These samples are set aside and used only in a later refinement phase. See the illustration of Mix-IRLS in the figure below.
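
To make the sequential-recovery idea concrete, here is a minimal Python sketch. It is not the Mix-IRLS implementation (that is in the repository's MATLAB and Python sources): the robust weight function, the thresholds, and the sample-removal rule are simplified placeholders, and the refinement phase is omitted.

```python
import numpy as np

def fit_one_component_irls(X, y, w_th=0.01, n_iters=50):
    """Sketch: robustly fit a single linear model with IRLS.
    Samples with large residuals get small weights; weights below w_th are
    zeroed out, i.e. the sample is marked "I don't know" and ignored."""
    rng = np.random.default_rng(0)
    beta = rng.standard_normal(X.shape[1])        # random initialization
    w = np.ones(len(y))
    for _ in range(n_iters):
        r = y - X @ beta                          # residuals under the current fit
        s = np.median(np.abs(r)) + 1e-12          # robust scale estimate
        w = 1.0 / (1.0 + (r / s) ** 2)            # downweight poorly-fit samples
        w[w < w_th] = 0.0                         # low-confidence samples are ignored
        Xw = X * w[:, None]
        beta = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)[0]  # weighted LS step
    return beta, w

def sequential_recovery(X, y, K, w_th=0.01):
    """Sketch of sequential recovery: fit one component, set aside the
    samples it explains, and repeat on the remainder."""
    betas, idx = [], np.arange(len(y))
    for _ in range(K):
        beta, w = fit_one_component_irls(X[idx], y[idx], w_th)
        betas.append(beta)
        idx = idx[w < 0.5]                        # keep samples this component does NOT explain
    return np.array(betas)
```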

Mix-IRLS deals well with balanced mixtures, where the proportions of the $K$ components are approximately equal; however, it is particularly effective for imbalanced mixtures. In addition, Mix-IRLS can handle outliers in the data, as well as an overestimated or unknown number of components $K$.

[Figure: Mix-IRLS illustration]

Usage

Simple demos demonstrating the usage of the algorithms are available; see MLR_demo.m and MLR_demo.py. The simulation depends on the following configuration parameters (a sketch of such a configuration appears after the list):

  • d: problem dimension
  • n_values: list of sample sizes
  • K: true number of components
  • distrib: mixture proportions
  • noise_level: Gaussian noise standard deviation
  • overparam: overparameterization, such that the algorithm gets K+overparam as input
  • corrupt_frac: fraction of outliers

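For orientation, here is a hypothetical configuration that mirrors the parameters above. The dictionary form and the example values are assumptions for illustration only; the actual variable names and settings used by the demos are defined in MLR_demo.m and MLR_demo.py.

```python
# Hypothetical simulation configuration; names mirror the list above,
# values are illustrative only.
config = {
    "d": 10,                         # problem dimension
    "n_values": [200, 500, 1000],    # list of sample sizes
    "K": 3,                          # true number of components
    "distrib": [0.7, 0.2, 0.1],      # mixture proportions (an imbalanced example)
    "noise_level": 0.1,              # Gaussian noise standard deviation
    "overparam": 0,                  # the algorithm receives K + overparam as input
    "corrupt_frac": 0.0,             # fraction of outliers
}
```
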
Mix-IRLS depends on the following parameters, described in the manuscript:

  • rho $\geq 1$: oversampling ratio. Default value: $1$ for synthetic data, $2$ for real data
  • $0 <$ w_th_init $< 1$: weight threshold initialization. Default value: $0.01$
  • nu $> 0$: tuning parameter. Default value: $0.5$ for synthetic data, $1$ for real data
  • $0 <$ corrupt_frac $< 1$: outlier fraction estimate. Default value: $0$
  • unknownK: true if $K$ is unknown, false otherwise. Default value: false
  • tol: stopping criterion tolerance. Default value: $\min(1, \max(0.01 \cdot \text{noise-level}, 2\epsilon_\text{machine-precision}))$

In addition, Mix-IRLS takes as input an initialization of the regression vectors; by default, this initialization is random. When unknownK is false (i.e., $K$ is treated as known), Mix-IRLS sets $K$ to the number of initialization vectors provided. A hypothetical call is sketched below.
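
As a usage sketch, a call might look as follows. The data generation follows the MLR model above; the function name mix_irls and its keyword arguments are assumptions named after the parameters listed here, not the repository's actual interface (see MLR_demo.py for that).

```python
import numpy as np

# Generate a small synthetic MLR instance (imbalanced, K = 3).
rng = np.random.default_rng(0)
n, d, K = 1000, 10, 3
betas_true = rng.standard_normal((K, d))               # true regression vectors
labels = rng.choice(K, size=n, p=[0.7, 0.2, 0.1])      # hidden component of each sample
X = rng.standard_normal((n, d))
y = np.einsum("ij,ij->i", X, betas_true[labels]) + 0.1 * rng.standard_normal(n)

# Hypothetical call; `mix_irls` and its signature are placeholders for illustration.
beta_init = rng.standard_normal((K, d))   # random initialization; K is taken from its size
betas_hat = mix_irls(
    X, y,
    beta_init=beta_init,
    rho=1,               # oversampling ratio (>= 1)
    w_th_init=0.01,      # weight threshold initialization, in (0, 1)
    nu=0.5,              # tuning parameter (> 0)
    corrupt_frac=0.0,    # outlier fraction estimate
    unknownK=False,      # K is treated as known
)
```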

Citation

If you refer to the Mix-IRLS method or the manuscript, please cite them as:

@article{zilber2023imbalanced,
  title={Imbalanced Mixed Linear Regression},
  author={Zilber, Pini and Nadler, Boaz},
  journal={arXiv preprint arXiv:2301.12559},
  year={2023}
}
