# RidgeSketch: Tutorial for solving ridge regression problems

    RidgeSketch package
    Authors : Nidham Gazagnadou, Robert Gower, Mark Ibrahim
   
The aim of this project is to provide an open-source Python package for solving large-scale ridge regression problems using the sketch-and-project technique.

This tutorial gives an overview of:
- loading data
- setting up a problem
- selecting a sketching method
- solving the ridge regression problem

### Table of contents

[0. Prerequisites](#prerequisites)<br>
[1. Load data](#load_data)<br>
[2. Build and fit model](#model)<br>

<a id='prerequisites'></a>
## 0. Prerequisites

In [10]:
import sys
sys.path.append("../")

In [11]:
# Make sure the `ridgesketch-env` environment is activated
# and that all requirements are installed with
# $ pip install -r requirements.txt

In [12]:
import numpy as np

from datasets.data_loaders import BostonDataset, CaliforniaHousingDataset, Rcv1Dataset
from ridge_sketch import RidgeSketch
from kernel_ridge_sketch import KernelRidgeSketch

<a id='load_data'></a>
## 1. Load data

In [13]:
np.random.seed(0)

# Generating a dataset
n_samples, n_features = 1000, 500
X = np.random.rand(n_samples, n_features)
y = np.random.rand(n_samples, 1) # Warning: y should be of size (n_samples, 1) not (n_samples, )
print(f"X shape {(n_samples, n_features)}, y shape {y.shape}")

X shape (1000, 500), y shape (1000, 1)
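
The shape warning above matters in plain NumPy too: subtracting a flat `(n_samples,)` array from an `(n_samples, 1)` column broadcasts to an `(n_samples, n_samples)` matrix without raising an error. Whether this is the exact reason RidgeSketch requires a column vector is an assumption; the sketch below, using made-up toy values, only shows how to reshape a flat target and what the broadcasting pitfall looks like.

```python
import numpy as np

# A flat target vector, as many loaders return it
y_flat = np.arange(5, dtype=float)     # shape (5,)

# Reshape into a column vector of shape (n_samples, 1)
y_col = y_flat.reshape(-1, 1)          # shape (5, 1)

# Stand-in for model predictions X @ w, which have shape (n_samples, 1)
predictions = np.ones((5, 1))

residual_ok = predictions - y_col      # elementwise, shape (5, 1)
residual_bad = predictions - y_flat    # broadcasts silently to shape (5, 5)

print(y_flat.shape, y_col.shape)
print(residual_ok.shape, residual_bad.shape)
```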


In [14]:
# Regression data can also be loaded with the `.load_X_y()` method of the desired dataset
# dataset = CaliforniaHousingDataset()
# X, y = dataset.load_X_y()
# n_samples, n_features = X.shape

<a id='model'></a>
## 2. Build and fit model

You can select the settings of the ridge regression problem and of the solver you want to use.

In [15]:
# Regularization parameter of the ridge problem
alpha = 1e-1

# Choose a method through the `algo_mode` variable
# - Vanilla Ridge Sketch: algo_mode = "auto"
# - Ridge Sketch with momentum: algo_mode = "mom"
#   (then `eta_mom` or `beta_mom` and `step_size` parameters can be set)
# - Accelerated Ridge Sketch: algo_mode = "accel"
#   (then `accel_mu` and `accel_nu` parameters can be set)
algo_mode = "mom"

# Choose a solver from the following list
# SKETCH_SOLVERS = {"subsample", "coordinate descent", "gaussian", "count", "subcount", "hadamard", "direct", "cg",}
solver = "subsample"
# Warning: `direct` and `cg` solvers are not available with momentum and acceleration

# Size of the sketched matrix SA
sketch_size = 10 # should be smaller than min(n_samples, n_features)

# Whether or not to build the ridge matrix A
operator_mode = False # Warning: not all solvers are available in operator mode

model = RidgeSketch(
    alpha=alpha,
    algo_mode=algo_mode,
    solver=solver,
    sketch_size=sketch_size,
    operator_mode=operator_mode,
    verbose=1,
)

# Solve the ridge regression problem and fit the model on the data
model.fit(X, y)


iter:0 | res : 5.63e+03
iter:50 | res : 3.81e+01
iter:100 | res : 2.45e+01
iter:150 | res : 1.89e+01
iter:200 | res : 1.41e+01
iter:250 | res : 1.11e+01
iter:300 | res : 8.92e+00
iter:350 | res : 6.60e+00
Tolerance (1.00e-03) in 391 iterations, residual norm = 9.98e-04
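
For intuition about what the iterations logged above are doing, here is a minimal NumPy sketch of a sketch-and-project iteration with a subsampling sketch. It is not the package's implementation. It assumes the ridge matrix is `A = X.T @ X + alpha * I` with right-hand side `b = X.T @ y`, and that each step picks `sketch_size` random coordinates and solves the corresponding small block of the system exactly. The function name `subsample_sketch_and_project`, its arguments, the relative stopping rule, and the demo data are all hypothetical.

```python
import numpy as np

def subsample_sketch_and_project(X, y, alpha, sketch_size,
                                 tol=1e-8, max_iter=10000, seed=0):
    """Illustrative solver for (X^T X + alpha I) w = X^T y (assumed form)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)   # assumed ridge matrix
    b = X.T @ y
    w = np.zeros_like(b)
    r = A @ w - b                              # residual of the linear system
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            break
        # Subsampling sketch: S selects `sketch_size` coordinates at random
        idx = rng.choice(n_features, size=sketch_size, replace=False)
        # Solve the small sketched system (S^T A S) delta = S^T r
        delta = np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
        w[idx] -= delta                        # update only the sampled block
        r -= A[:, idx] @ delta                 # keep the residual in sync
    return w

# Small demo on synthetic data (standard normal, chosen for good conditioning)
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.standard_normal((200, 20))
y_demo = rng_demo.standard_normal((200, 1))
alpha_demo = 1e-1

w_sketch = subsample_sketch_and_project(X_demo, y_demo, alpha_demo, sketch_size=5)

# Reference solution from a direct solve of the same system
A_demo = X_demo.T @ X_demo + alpha_demo * np.eye(20)
w_exact = np.linalg.solve(A_demo, X_demo.T @ y_demo)
print(np.linalg.norm(w_sketch - w_exact))
```

Each iteration only solves a `sketch_size` by `sketch_size` system, which is the appeal when `A` is too large to factorize directly.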


In [16]:
# To solve a kernel ridge regression problem instead, run the following

# Select the kernel from {"RBF", "Matern"}
kernel = "RBF" # Radial Basis Function (RBF) kernel

model_kernel = KernelRidgeSketch(
    solver=solver,
    alpha=alpha,
    sketch_size=sketch_size,
    kernel=kernel,
    verbose=1,
)

# Solve the kernel ridge regression problem and fit the model on the data
model_kernel.fit(X, y)
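
As a reference for what kernel ridge regression computes, the following NumPy sketch builds an RBF kernel matrix and solves the regularized system directly. It is independent of `KernelRidgeSketch`: the `gamma` bandwidth parameter, the helper name `rbf_kernel`, the demo data, and the convention `(K + alpha * I) c = y` are assumptions chosen for illustration, and the package may parameterize or scale the kernel differently.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) between rows of X1 and X2."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clamp tiny negative values caused by floating-point rounding
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

rng_demo = np.random.default_rng(0)
X_demo = rng_demo.random((50, 3))
y_demo = rng_demo.random((50, 1))
alpha_demo, gamma_demo = 1e-1, 1.0   # hypothetical values

# Fit: solve (K + alpha I) c = y for the dual coefficients c
K = rbf_kernel(X_demo, X_demo, gamma_demo)
coef = np.linalg.solve(K + alpha_demo * np.eye(50), y_demo)

# Predict on new points with the cross-kernel between new and training data
X_new = rng_demo.random((5, 3))
y_pred = rbf_kernel(X_new, X_demo, gamma_demo) @ coef
print(y_pred.shape)
```

The kernel matrix has one row and column per sample, so its size grows with `n_samples` rather than `n_features`; this is the regime where solving the system through small sketches instead of a direct factorization becomes attractive.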