Adaptive Capped Least Squares (Python package)
This package implements the randomized gradient descent (RGD) method applied to minimize the adaptive capped least squares loss.
Suppose we observe data vectors (x_i, y_i) that follow the linear model y_i = x_i^T β + ε_i, i = 1, ..., n, where y_i is a univariate response, x_i is a d-dimensional predictor, β denotes the vector of regression coefficients, and ε_i is a random error. We propose the adaptive capped least squares loss ℓ(x) = x^2/2 if |x| ≤ τ and ℓ(x) = τ^2/2 if |x| > τ, where τ = τ(n) > 0 is referred to as the adaptive capped least squares parameter. The proposed method finds the β that minimizes L(β) = n^{-1} ∑_i ℓ(y_i − x_i^T β).
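For reference, here is a minimal NumPy sketch of this empirical loss (the function name acls_loss is ours, not part of the package API):

```python
import numpy as np

def acls_loss(beta, X, Y, tau):
    """Empirical adaptive capped least squares loss L(beta) = n^{-1} sum_i ell(y_i - x_i^T beta)."""
    r = Y - X @ beta                                        # residuals y_i - x_i^T beta
    ell = np.where(np.abs(r) <= tau, r**2 / 2, tau**2 / 2)  # quadratic below tau, capped above
    return ell.mean()
```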
Install ACLS_python from GitHub:
pip install git+https://github.com/rruimao/ACLS_python.git
- RGD: Randomized gradient descent method.
We present two examples: randomly generated data with y-outliers, and randomly generated data with both x-outliers and y-outliers.
Example 1 (y-outliers): We generate contaminated random errors ε_i from the normal mixture 0.9 N(0,1) + 0.1 N(10,1), and the x_i's are independently and identically distributed (i.i.d.) from N(0, Σ) with Σ_{jk} = 0.5^{|j−k|}. We set β* = (0, 3, 4, 1, 2, 0)^T to generate the y_i. We provide one example of this type, "ex_1.csv", which can be downloaded from the example files.
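The data in "ex_1.csv" follow this recipe. Purely as an illustration (the sample size, random seed, and intercept-column layout below are our assumptions, not taken from the package), the generation might look like:

```python
import numpy as np

rng = np.random.default_rng(0)                 # seed chosen for illustration
n, d = 500, 6                                  # assumed sample size; d counts the intercept
beta_star = np.array([0, 3, 4, 1, 2, 0])       # true regression coefficients

# intercept column plus d-1 covariates from N(0, Sigma) with Sigma_{jk} = 0.5^{|j-k|}
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(d - 1), np.arange(d - 1)))
covariates = rng.multivariate_normal(np.zeros(d - 1), Sigma, size=n)
X = np.column_stack([np.ones(n), covariates])

# contaminated errors from the mixture 0.9 N(0,1) + 0.1 N(10,1)
outlier = rng.random(n) < 0.1
eps = np.where(outlier, rng.normal(10, 1, n), rng.normal(0, 1, n))

Y = X @ beta_star + eps
```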
We randomly generate 10 initial values β^(0) ~ Unif(B_2(τ)), where Unif(B_2(τ)) is the uniform distribution on the ℓ_2-ball B_2(τ) = {x : ||x||_2 ≤ τ}. The method then keeps the initial value that yields the smallest adaptive capped least squares loss.
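The RGD binding performs this initialization internally. Purely as a conceptual sketch (the helper names are ours, and it reuses the acls_loss function sketched above), the selection step could be written as:

```python
import numpy as np

def sample_l2_ball(d, tau, rng):
    """Draw one point uniformly from the l2-ball B_2(tau) in R^d."""
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)                   # uniform direction on the unit sphere
    r = tau * rng.random() ** (1.0 / d)      # radius density proportional to r^{d-1}
    return r * v

def best_initial(X, Y, tau, n_init=10, rng=None):
    """Pick the candidate initial value with the smallest adaptive capped least squares loss."""
    rng = rng or np.random.default_rng()
    candidates = [sample_l2_ball(X.shape[1], tau, rng) for _ in range(n_init)]
    # acls_loss is the loss sketch given earlier in this README
    return min(candidates, key=lambda b: acls_loss(b, X, Y, tau))
```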
from ACLS.RGD_bindings import RGD
import numpy as np
import pandas as pd

# load the example data with y-outliers
df = pd.read_csv('ex_1.csv')
Y = df['Y'].to_numpy()
X = df[['Intercept', 'X1', 'X2', 'X3', 'X4', 'X5']].to_numpy()

# tuning parameters for RGD; n is used only to set tau below
n = 50
iter = 10
eta_0 = 1e-3
alpha = 2
tau = np.sqrt(n) / np.log(np.log(n))    # adaptive capped least squares parameter tau = tau(n)

beta_1 = RGD(X, Y, tau, iter, eta_0, alpha)
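A quick sanity check, assuming RGD returns the estimated coefficient vector, is to compare it with the coefficients used to generate "ex_1.csv":

```python
beta_star = np.array([0, 3, 4, 1, 2, 0])    # coefficients used to generate ex_1.csv
beta_hat = np.asarray(beta_1)               # assumes RGD returns the coefficient vector
print("estimate:", beta_hat)
print("l2 estimation error:", np.linalg.norm(beta_hat - beta_star))
```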
Example 2 (x-outliers and y-outliers): We generate contaminated random errors ε_i from the normal mixture 0.9 N(0,1) + 0.1 N(10,1), and the x_i's are i.i.d. from N(0, Σ) with Σ_{jk} = 0.5^{|j−k|}. We then add a random perturbation vector z_i ~ N(10 · 1_{d−1}, I_{d−1}) to each covariate vector x_i in the contaminated samples. We again set β* = (0, 3, 4, 1, 2, 0)^T and use the uncontaminated x_i to generate the y_i. We provide one example of this type, "ex_2.csv", which can be downloaded from the example files.
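Again purely as an illustration of the recipe (not the exact script behind "ex_2.csv"; the sample size, seed, and intercept-column layout are our assumptions), the x-outliers could be generated like this:

```python
import numpy as np

rng = np.random.default_rng(1)                   # seed chosen for illustration
n, d = 500, 6                                    # assumed sample size; d counts the intercept
beta_star = np.array([0, 3, 4, 1, 2, 0])

# clean design: intercept column plus d-1 covariates from N(0, Sigma), Sigma_{jk} = 0.5^{|j-k|}
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(d - 1), np.arange(d - 1)))
X_clean = np.column_stack([np.ones(n), rng.multivariate_normal(np.zeros(d - 1), Sigma, size=n)])

outlier = rng.random(n) < 0.1                    # contaminated samples
eps = np.where(outlier, rng.normal(10, 1, n), rng.normal(0, 1, n))
Y = X_clean @ beta_star + eps                    # responses use the uncontaminated covariates

# observed design: add z_i ~ N(10 * 1_{d-1}, I_{d-1}) to the covariates of contaminated samples
X = X_clean.copy()
X[outlier, 1:] += rng.normal(10, 1, size=(int(outlier.sum()), d - 1))
```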
# load the example data with both x-outliers and y-outliers
df = pd.read_csv('ex_2.csv')
Y = df['Y'].to_numpy()
X = df[['Intercept', 'X1', 'X2', 'X3', 'X4', 'X5']].to_numpy()

# same tuning parameters as in the first example
n = 50
iter = 10
eta_0 = 1e-3
alpha = 2
tau = np.sqrt(n) / np.log(np.log(n))    # adaptive capped least squares parameter tau = tau(n)

beta_2 = RGD(X, Y, tau, iter, eta_0, alpha)
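To see why capping the loss helps under this kind of contamination, one can compare the RGD estimate with an ordinary least squares baseline on the same data (the baseline is our addition, not part of the package):

```python
beta_star = np.array([0, 3, 4, 1, 2, 0])             # coefficients used to generate ex_2.csv
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)     # non-robust least squares baseline
print("RGD error:", np.linalg.norm(np.asarray(beta_2) - beta_star))
print("OLS error:", np.linalg.norm(beta_ols - beta_star))
```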
Reference: Sun, Q., Mao, R. and Zhou, W.-X. Adaptive capped least squares.