Adaptive Capped Least Squares (Python package)
This package implements the randomized gradient descent (RGD) method applied to minimize the adaptive capped least squares loss.
Suppose we observe data vectors (x_i, y_i) that follow the linear model y_i = x_i^T β + ε_i, i = 1, ..., n, where y_i is a univariate response, x_i is a d-dimensional predictor, β denotes the vector of regression coefficients, and ε_i is a random error. We propose the adaptive capped least squares loss ℓ(x) = x^2/2 if |x| ≤ τ and ℓ(x) = τ^2/2 if |x| > τ, where τ = τ(n) > 0 is referred to as the adaptive capped least squares parameter. The proposed method finds the β that minimizes L(β) = n^{-1} ∑_i ℓ(y_i − x_i^T β).
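For reference, here is a minimal NumPy sketch of this empirical loss (the function name acls_loss is ours, not part of the package API):

```python
import numpy as np

def acls_loss(beta, X, Y, tau):
    """Empirical adaptive capped least squares loss L(beta) = n^{-1} sum_i ell(y_i - x_i^T beta)."""
    r = Y - X @ beta                                        # residuals y_i - x_i^T beta
    ell = np.where(np.abs(r) <= tau, r**2 / 2, tau**2 / 2)  # quadratic below tau, capped above
    return ell.mean()
```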
Install ACLS_python from GitHub:
pip install git+https://github.com/rruimao/ACLS_python.git
- RGD: Randomized gradient descent method.
We present two examples: randomly generated data with y-outliers, and randomly generated data with both x-outliers and y-outliers.
Example 1 (y-outliers): We generate contaminated random errors ε_i from the normal mixture 0.9 N(0,1) + 0.1 N(10,1), and the x_i's are independently and identically distributed (i.i.d.) from N(0, Σ) with Σ_{jk} = 0.5^{|j−k|}. We set β* = (0, 3, 4, 1, 2, 0)^T to generate the y_i. We provide one example of this type, "ex_1.csv", which can be downloaded from the example files.
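The data in "ex_1.csv" follow this recipe. Purely as an illustration (the sample size, random seed, and intercept-column layout below are our assumptions, not taken from the package), the generation might look like:

```python
import numpy as np

rng = np.random.default_rng(0)                 # seed chosen for illustration
n, d = 500, 6                                  # assumed sample size; d counts the intercept
beta_star = np.array([0, 3, 4, 1, 2, 0])       # true regression coefficients

# intercept column plus d-1 covariates from N(0, Sigma) with Sigma_{jk} = 0.5^{|j-k|}
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(d - 1), np.arange(d - 1)))
covariates = rng.multivariate_normal(np.zeros(d - 1), Sigma, size=n)
X = np.column_stack([np.ones(n), covariates])

# contaminated errors from the mixture 0.9 N(0,1) + 0.1 N(10,1)
outlier = rng.random(n) < 0.1
eps = np.where(outlier, rng.normal(10, 1, n), rng.normal(0, 1, n))

Y = X @ beta_star + eps
```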
We randomly generate 10 initial values β^(0) ~ Unif(B_2(τ)), where Unif(B_2(τ)) is the uniform distribution on the ℓ_2-ball B_2(τ) = {x : ||x||_2 ≤ τ}. The method then keeps the initial value that yields the smallest adaptive capped least squares loss.
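The RGD binding performs this initialization internally. Purely as a conceptual sketch (the helper names are ours, and it reuses the acls_loss function sketched above), the selection step could be written as:

```python
import numpy as np

def sample_l2_ball(d, tau, rng):
    """Draw one point uniformly from the l2-ball B_2(tau) in R^d."""
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)                   # uniform direction on the unit sphere
    r = tau * rng.random() ** (1.0 / d)      # radius density proportional to r^{d-1}
    return r * v

def best_initial(X, Y, tau, n_init=10, rng=None):
    """Pick the candidate initial value with the smallest adaptive capped least squares loss."""
    rng = rng or np.random.default_rng()
    candidates = [sample_l2_ball(X.shape[1], tau, rng) for _ in range(n_init)]
    # acls_loss is the loss sketch given earlier in this README
    return min(candidates, key=lambda b: acls_loss(b, X, Y, tau))
```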
from ACLS.RGD_bindings import RGD
import numpy as np
import pandas as pd

# load the example data with y-outliers
df = pd.read_csv('ex_1.csv')
Y = df['Y'].to_numpy()
X = df[['Intercept', 'X1', 'X2', 'X3', 'X4', 'X5']].to_numpy()

# tuning parameters for RGD; n is used only to set tau below
n = 50
iter = 10
eta_0 = 1e-3
alpha = 2
tau = np.sqrt(n) / np.log(np.log(n))    # adaptive capped least squares parameter tau = tau(n)

beta_1 = RGD(X, Y, tau, iter, eta_0, alpha)
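A quick sanity check, assuming RGD returns the estimated coefficient vector, is to compare it with the coefficients used to generate "ex_1.csv":

```python
beta_star = np.array([0, 3, 4, 1, 2, 0])    # coefficients used to generate ex_1.csv
beta_hat = np.asarray(beta_1)               # assumes RGD returns the coefficient vector
print("estimate:", beta_hat)
print("l2 estimation error:", np.linalg.norm(beta_hat - beta_star))
```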
Example 2 (x-outliers and y-outliers): We generate contaminated random errors ε_i from the normal mixture 0.9 N(0,1) + 0.1 N(10,1), and the x_i's are i.i.d. from N(0, Σ) with Σ_{jk} = 0.5^{|j−k|}. We then add a random perturbation vector z_i ~ N(10 · 1_{d−1}, I_{d−1}) to each covariate vector x_i in the contaminated samples. We again set β* = (0, 3, 4, 1, 2, 0)^T and use the uncontaminated x_i to generate the y_i. We provide one example of this type, "ex_2.csv", which can be downloaded from the example files.
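Again purely as an illustration of the recipe (not the exact script behind "ex_2.csv"; the sample size, seed, and intercept-column layout are our assumptions), the x-outliers could be generated like this:

```python
import numpy as np

rng = np.random.default_rng(1)                   # seed chosen for illustration
n, d = 500, 6                                    # assumed sample size; d counts the intercept
beta_star = np.array([0, 3, 4, 1, 2, 0])

# clean design: intercept column plus d-1 covariates from N(0, Sigma), Sigma_{jk} = 0.5^{|j-k|}
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(d - 1), np.arange(d - 1)))
X_clean = np.column_stack([np.ones(n), rng.multivariate_normal(np.zeros(d - 1), Sigma, size=n)])

outlier = rng.random(n) < 0.1                    # contaminated samples
eps = np.where(outlier, rng.normal(10, 1, n), rng.normal(0, 1, n))
Y = X_clean @ beta_star + eps                    # responses use the uncontaminated covariates

# observed design: add z_i ~ N(10 * 1_{d-1}, I_{d-1}) to the covariates of contaminated samples
X = X_clean.copy()
X[outlier, 1:] += rng.normal(10, 1, size=(int(outlier.sum()), d - 1))
```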
# load the example data with both x-outliers and y-outliers
df = pd.read_csv('ex_2.csv')
Y = df['Y'].to_numpy()
X = df[['Intercept', 'X1', 'X2', 'X3', 'X4', 'X5']].to_numpy()

# same tuning parameters as in the first example
n = 50
iter = 10
eta_0 = 1e-3
alpha = 2
tau = np.sqrt(n) / np.log(np.log(n))    # adaptive capped least squares parameter tau = tau(n)

beta_2 = RGD(X, Y, tau, iter, eta_0, alpha)
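To see why capping the loss helps under this kind of contamination, one can compare the RGD estimate with an ordinary least squares baseline on the same data (the baseline is our addition, not part of the package):

```python
beta_star = np.array([0, 3, 4, 1, 2, 0])             # coefficients used to generate ex_2.csv
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)     # non-robust least squares baseline
print("RGD error:", np.linalg.norm(np.asarray(beta_2) - beta_star))
print("OLS error:", np.linalg.norm(beta_ols - beta_star))
```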
Reference: Sun, Q., Mao, R. and Zhou, W.-X. Adaptive capped least squares.