# Dual-sPLS

Dual-sPLS implements a modified version of sPLS that offers a more intuitive way to decide how much information to keep. Instead of tuning $\lambda$ directly as in sPLS, one sets a shrinking ratio (the proportion of coefficients to set to zero), from which the penalty is derived.

## Theory

#### Dual Norm: Definition

According to the paper:

Definition 3.1: Dual Norm

Let $\Omega (.)$ be a norm on $\mathbb{R}^p$. For any $z \in \mathbb{R}^p$, the associated dual norm, denoted $\Omega^*(.)$, is defined as

$$
\Omega^*(z) = \max_w \; z^Tw \quad \text{s.t.} \quad \Omega(w) = 1 \qquad (21)
$$
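A small numerical check of the definition for two classical pairs, using only NumPy. The $\ell_2$ norm is its own dual (the maximizer is $w = z/\|z\|_2$), and the dual of the $\ell_1$ norm is the $\ell_\infty$ norm, because a linear function on the $\ell_1$ unit ball is maximized at one of its vertices $\pm e_i$. The helper names are illustrative only.

```python
import numpy as np

def l1_dual_bruteforce(z):
    """Max of z^T w over the l1 unit sphere, evaluated at its vertices +/- e_i.

    A linear function on the l1 unit ball reaches its maximum at a vertex,
    so checking the 2p vertices is enough.
    """
    p = z.shape[0]
    vertices = np.vstack([np.eye(p), -np.eye(p)])  # all +/- e_i
    return np.max(vertices @ z)

def l2_dual_closed_form(z):
    """Max of z^T w over the l2 unit sphere, attained at w = z / ||z||_2."""
    w = z / np.linalg.norm(z)
    return z @ w

rng = np.random.default_rng(0)
z = rng.normal(size=6)

# Dual of l1 is l_inf; dual of l2 is l2 itself
print(l1_dual_bruteforce(z), np.max(np.abs(z)))
print(l2_dual_closed_form(z), np.linalg.norm(z))

# Sanity check: no random l2-unit vector beats the closed-form maximizer
W = rng.normal(size=(10_000, 6))
W /= np.linalg.norm(W, axis=1, keepdims=True)
assert np.max(W @ z) <= np.linalg.norm(z) + 1e-12
```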

#### Generalizing sPLS to other regularizations

Start from the optimization problem that defines the PLS weights:

$$
\max_w \; y^TXw \quad \text{s.t.} \quad \|w\|_2 = 1
$$

We can generalize it to any norm $\Omega$:

$$
\max_w \; y^TXw \quad \text{s.t.} \quad \Omega(w) = 1
$$

Writing $z = X^Ty$, so that $y^TXw = z^Tw$, the solution $\hat{w}$ can be written as a minimization:

$$
\hat{w} = \arg\min_w \, (-z^Tw) \quad \text{s.t.} \quad \Omega(w) = 1
$$
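For $\Omega = \|\cdot\|_2$ the problem has the closed-form solution $\hat{w} = z/\|z\|_2$, the usual first PLS weight vector, and the optimal value is $\|z\|_2$, the dual norm of $z$. A sketch on synthetic centered data (sizes, seed and true coefficients are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)

# Center as in PLS
Xc = X - X.mean(axis=0)
yc = y - y.mean()

z = Xc.T @ yc                      # z = X^T y
w_hat = z / np.linalg.norm(z)      # maximizer of z^T w on the l2 unit sphere

# The objective y^T X w equals z^T w, and its maximum is ||z||_2
obj_hat = yc @ Xc @ w_hat
print(obj_hat, np.linalg.norm(z))

# No random unit direction does better
W = rng.normal(size=(5000, 5))
W /= np.linalg.norm(W, axis=1, keepdims=True)
assert np.all(W @ z <= obj_hat + 1e-9)
```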

This generalization is powerful because any norm can be used for $\Omega$. A lasso penalization recovers the sPLS result, but combinations of norms are also possible. The first proposal of the paper is the pseudo-lasso:

$$
\Omega(w) = \lambda ||w||_1 + ||w||_2
$$

where $\lambda \geq 0$; this norm is used to illustrate the method in this notebook.
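A direct translation of the pseudo-lasso norm, with a check of two norm properties (absolute homogeneity and the triangle inequality) on random vectors. `pseudo_lasso_norm` is a hypothetical helper name, not part of the Implementation section.

```python
import numpy as np

def pseudo_lasso_norm(w, lam):
    """Omega(w) = lam * ||w||_1 + ||w||_2, for lam >= 0."""
    return lam * np.sum(np.abs(w)) + np.linalg.norm(w)

w = np.array([3.0, -4.0])
print(pseudo_lasso_norm(w, 0.5))   # 0.5 * 7 + 5 = 8.5

# Norm properties: absolute homogeneity and triangle inequality
rng = np.random.default_rng(2)
a, b = rng.normal(size=4), rng.normal(size=4)
lam = 0.3
assert np.isclose(pseudo_lasso_norm(-2.5 * a, lam), 2.5 * pseudo_lasso_norm(a, lam))
assert pseudo_lasso_norm(a + b, lam) <= pseudo_lasso_norm(a, lam) + pseudo_lasso_norm(b, lam) + 1e-12
```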

We then write the Lagrangian:

$$
\mathcal{L}(w) = -z^Tw + \mu(\Omega(w) - 1), \qquad \mu > 0 \; (\ast)
$$

(\*) The multiplier of an equality constraint is usually unrestricted in sign; here we require $\mu > 0$ because we want the constraint to be active.

With reasoning very similar to sPLS (see sPLS.ipynb), setting the gradient of $\mathcal{L}$ with respect to $w$ to zero gives:

$$
\nabla \Omega(w) = \frac{z}{\mu}
$$

Since $\|w\|_1$ is not differentiable at zero, the derivative of $|w_i|$ is replaced by a subgradient $u_i$:

$u_i = +1$ if $w_i > 0$; $u_i = -1$ if $w_i < 0$; $u_i \in [-1, +1]$ if $w_i = 0$.

This yields the same soft-thresholding as in sPLS.
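Soft-thresholding is the proximal operator of $\nu\|\cdot\|_1$: it minimizes $\frac{1}{2}\|w - z\|_2^2 + \nu\|w\|_1$, whose optimality condition is $z - w = \nu u$ with $u$ a subgradient of $\|w\|_1$. This sketch recovers $u$ from the output and checks the three subgradient cases. The standalone `soft_threshold` helper is an assumption, equivalent to `soft_thresholding` in the Implementation section.

```python
import numpy as np

def soft_threshold(z, nu):
    """sign(z) * max(|z| - nu, 0), applied coordinate-wise."""
    return np.sign(z) * np.maximum(np.abs(z) - nu, 0.0)

z = np.array([3.0, -0.5, 1.2, -2.0])
nu = 1.0
w = soft_threshold(z, nu)
print(w)  # approximately [2, 0, 0.2, -1]

# Optimality of 0.5*||w - z||^2 + nu*||w||_1: z - w = nu * u, u a subgradient of ||w||_1
u = (z - w) / nu
nonzero = w != 0
assert np.allclose(u[nonzero], np.sign(w[nonzero]))   # u_i = sign(w_i) where w_i != 0
assert np.all(np.abs(u[~nonzero]) <= 1.0)             # u_i in [-1, 1] where w_i == 0
```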

___

Let us now derive this expression for the pseudo-lasso, since the next steps rely on it:

$$
\nabla \Omega(w) = \lambda \delta + \frac{w}{||w||_2}
$$

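where $\delta = \operatorname{sign}(w) = \operatorname{sign}(z)$ (see sPLS), with $\delta_p$ taken as a subgradient value in $[-1, +1]$ when $w_p = 0$.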
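Away from zero coordinates, $\Omega$ is differentiable, so the formula can be checked against central finite differences. A sketch with an arbitrary $w$ that has no zero entries and an arbitrary $\lambda$:

```python
import numpy as np

def pseudo_lasso_norm(w, lam):
    return lam * np.sum(np.abs(w)) + np.linalg.norm(w)

def pseudo_lasso_grad(w, lam):
    """lam * sign(w) + w / ||w||_2, valid when no coordinate of w is zero."""
    return lam * np.sign(w) + w / np.linalg.norm(w)

w = np.array([0.7, -1.3, 2.1, -0.4])   # no zero entries: Omega is differentiable here
lam = 0.25
eps = 1e-6

# Central finite differences, one coordinate at a time
fd = np.array([
    (pseudo_lasso_norm(w + eps * e, lam) - pseudo_lasso_norm(w - eps * e, lam)) / (2 * eps)
    for e in np.eye(w.size)
])
print(np.max(np.abs(fd - pseudo_lasso_grad(w, lam))))  # tiny: finite-difference error only
```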

$$
\nabla \Omega(w) = \frac{z}{\mu} = \lambda \delta + \frac{w}{||w||_2} \iff \frac{w}{||w||_2} = \frac{z}{\mu} - \lambda \delta \Rightarrow \frac{w_p}{||w||_2} = \frac{1}{\mu}\delta_p(|z_p| - \nu)_+
$$

where $\nu = \lambda \mu$

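Instead of choosing $\lambda$ directly, we choose the proportion $\xi$ of coordinates to set to zero and take $\nu$ as the $\xi$-quantile of $|z|$. Every coordinate with $|z_p| \leq \nu$ is zeroed, so only the share $1 - \xi$ of coordinates with the largest $|z_p|$ is kept.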
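A worked example of the quantile rule with $|z| = 1, \dots, 10$ and $\xi = 0.8$. With NumPy's default linear interpolation, $\nu = 8.2$, so the eight smallest coordinates are zeroed. For general $z$ (few variables, interpolation, ties) the zeroed fraction is only approximately $\xi$.

```python
import numpy as np

z = np.arange(1.0, 11.0) * np.array([1, -1] * 5)   # |z| = 1, ..., 10 with alternating signs
xi = 0.8                                            # proportion of coordinates to set to zero

nu = np.quantile(np.abs(z), xi)                     # linear interpolation between 8 and 9
z_nu = np.sign(z) * np.maximum(np.abs(z) - nu, 0.0)

print(nu)                    # approximately 8.2
print(np.mean(z_nu == 0))    # 0.8: only |z| = 9 and 10 survive
```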

<div style="text-align: center;">
    <img src="assets/dualsplsfig1.png" alt="Description" width="400" />
</div>


However, the soft-thresholded vector cannot be used as $w$ directly: it must be rescaled so that $\Omega(w) = 1$. As the paper puts it: "To guarantee the unit norm property for $w$, we set $\mu = \|z_\nu\|_2$ where $z_\nu$ is the vector of coordinates $\delta_p(|z_p| - \nu)_+$ for $p\in \{ 1, ..., P\}$. Consequently,

$$
w = \frac{\mu}{\nu||z_\nu||_1 + ||z_\nu||_2^2}z_\nu
$$

The rationale behind constraining the direction $w$ instead of the regression coefficients $\hat{\beta}$ is their collinearity. Indeed, the estimator writes $\hat{\beta} = W(T^TT)^{-1}T^Ty$. Being collinear, soft-thresholding $w$ performs a variable selection at the same location in $\hat{\beta}$ coordinates."
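A sketch on synthetic data that checks three claims of this derivation: the closed-form $w$ satisfies $\Omega(w) = 1$, it satisfies the stationarity condition $\lambda\delta + w/\|w\|_2 = z/\mu$, and with a single component $\hat{\beta} = w\,(t^Tt)^{-1}t^Ty$ is a scalar multiple of $w$, so both share the same support. The data, seed and $\xi = 0.75$ are arbitrary choices for illustration. With several components, $\hat{\beta}$ lies in the span of the columns of $W$, so its support is contained in the union of their supports.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 12))
y = X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=40)
Xc, yc = X - X.mean(axis=0), y - y.mean()

# Closed-form pseudo-lasso direction
z = Xc.T @ yc
xi = 0.75                                   # proportion of coordinates set to zero
nu = np.quantile(np.abs(z), xi)
z_nu = np.sign(z) * np.maximum(np.abs(z) - nu, 0.0)
mu = np.linalg.norm(z_nu)                   # mu = ||z_nu||_2
lam = nu / mu                               # from nu = lambda * mu
w = mu / (nu * np.sum(np.abs(z_nu)) + mu**2) * z_nu

# 1) The constraint is active: Omega(w) = 1
omega = lam * np.sum(np.abs(w)) + np.linalg.norm(w)
assert np.isclose(omega, 1.0)

# 2) Stationarity on the support: lam * sign(w_p) + w_p / ||w||_2 = z_p / mu
s = w != 0
assert np.allclose(lam * np.sign(w[s]) + w[s] / np.linalg.norm(w), z[s] / mu)
# Off the support, the implied subgradient u_p = z_p / nu lies in [-1, 1]
assert np.all(np.abs(z[~s]) <= nu)

# 3) One component: t = X w, beta = w (t^T t)^{-1} t^T y is a scalar multiple of w
t = Xc @ w
beta = w * (t @ yc) / (t @ t)
assert np.array_equal(np.flatnonzero(beta), np.flatnonzero(w))
print(np.flatnonzero(w))   # selected variables, identical for w and beta
```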

## Implementation

```python
import numpy as np

### Utils:

def soft_thresholding(z, nu):
    """
    Soft-thresholding with threshold nu: sign(z) * max(|z| - nu, 0), coordinate-wise.
    """
    sign_z = np.sign(z)
    abs_z_shifted = np.maximum(np.abs(z) - nu, 0)

    # Put the sign back and return
    z_nu = sign_z * abs_z_shifted
    return z_nu

def center_matrix(M):
    """Center a matrix by subtracting its column means.

    Args:
        M (np.array): 2D-matrix (a 1D vector is also accepted)

    Returns:
        tuple: (centered M, column means)
    """
    means = np.mean(M, axis=0)
    return M - means, means
```

```python
def base_dual_spls_lasso(E, F, ppnu=0.8):
    """
    One Dual-sPLS component with the pseudo-lasso norm.
    Modified version of the previous base_sPLS (and base_PLS) functions.

    E: centered (and deflated) X, shape (N, p)
    F: centered y
    ppnu: xi in the explanations, the proportion of coordinates of z set to zero, in [0, 1)
    """
    F = F.reshape(-1, 1)

    z = np.transpose(E) @ F  # shape (p, 1)

    #### Modification 1: compute the adaptive nu as the xi-quantile of |z|
    nu = np.quantile(np.abs(z), ppnu)

    #### Modification 2: soft-thresholding with nu
    z_nu = soft_thresholding(z, nu)

    #### Modification 3: find parameters (vector norms, hence ravel)
    z_nu_1 = np.linalg.norm(z_nu.ravel(), 1)
    z_nu_2 = np.linalg.norm(z_nu.ravel(), 2)

    mu = z_nu_2
    if mu == 0:
        raise ValueError("All coordinates of z were set to zero; use a smaller ppnu.")
    _lambda = nu / mu

    #### Compute w such that lambda * ||w||_1 + ||w||_2 = 1
    scaling_factor = mu / (nu * z_nu_1 + mu**2) # Scaling factor, see theory
    w = scaling_factor * z_nu

    #### Compute t, same as sPLS
    t = E @ w

    # Modification 4: normalize t instead of w and c
    norm_t = np.linalg.norm(t)
    if norm_t > 1e-10:
        t = t / norm_t
    else:
        t = np.zeros_like(t)

    # c is not used by the R package

    return w, t, _lambda
```

```python
def dual_spls_lasso(X, y, n_components=3, ppnu=0.8):
    #### Center data
    E, F = X.copy(), y.copy()
    E, E_mean = center_matrix(E)
    F, F_mean = center_matrix(F)

    if F.ndim == 1:
        F = F.reshape(-1, 1)

    #### Initializations
    N, p = X.shape[0], X.shape[1] # nbr of observations, nbr of variables
    WW = np.zeros((p, n_components)) # W: X weights
    TT = np.zeros((N, n_components)) # T: X scores
    listeLambda = np.zeros((n_components))
    Bhat = np.zeros((p, n_components)) # Matrix to store Beta for each n_components step
    intercept = np.zeros(n_components)
    RES = np.zeros((N, n_components)) # Residuals
    zerovar = np.zeros(n_components, dtype=int)
    YY_pred = np.zeros((N, n_components)) # Fitted values
    ind_diff0 = {} # Indices of non-zero coefficients

    Ec = E.copy() # Centered X, kept undeflated for the coefficients

    for k in range(n_components):
        # Step 1: base dual-spls:
        w, t, _lambda = base_dual_spls_lasso(E, F, ppnu=ppnu)

        # Store results
        WW[:, k], TT[:, k] = w.reshape(-1), t.reshape(-1)
        listeLambda[k] = _lambda

        # Deflate E: remove the part explained by the unit-norm score t
        E = E - t @ (t.T @ E)

        # Coefficients: Bhat = W (T^T Ec W)^{-1} T^T F
        W_k = WW[:, :k+1]
        T_k = TT[:, :k+1]
        L = np.transpose(T_k) @ Ec @ W_k
        L = np.triu(L) # "R[row>col]=0"

        try:
            L_inv = np.linalg.inv(L) # the R code uses backsolve() here
        except np.linalg.LinAlgError:
            L_inv = np.linalg.pinv(L)

        bk = W_k @ L_inv @ T_k.T @ F
        bk_flat = bk.flatten()
        Bhat[:, k] = bk_flat

        intercept[k] = (F_mean - E_mean @ bk).item()

        # Zero variables: count almost-zero coefficients
        is_zero = np.isclose(bk_flat, 0)
        zerovar[k] = np.sum(is_zero)

        # Non-zero indices
        indices_non_zero = np.where(~is_zero)[0]
        ind_diff0[f"in.diff0_{k+1}"] = indices_non_zero.tolist()

        # Predictions (fitted values)
        # Y_hat = X * beta + intercept
        pred_k = (X @ bk_flat) + intercept[k]
        YY_pred[:, k] = pred_k

        # Residuals
        RES[:, k] = y.flatten() - pred_k

    return {
        "Xmean": E_mean,
        "scores": TT,
        "loadings": WW,
        "Bhat": Bhat,
        "intercept": intercept,
        "fitted_values": YY_pred,
        "residuals": RES,
        "lambda": listeLambda,
        "zerovar": zerovar,
        "ind_diff0": ind_diff0,
        "type": "lasso"
    }
```
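The deflation step `E = E - t @ (t.T @ E)` in `dual_spls_lasso` relies on `t` having unit norm: it projects `E` onto the orthogonal complement of `t`, so every later score is orthogonal to the earlier ones and, when no score collapses to zero, the columns of `TT` are orthonormal. A minimal standalone check with random data (`E`, `w1`, `w2` are arbitrary stand-ins, not outputs of the implementation):

```python
import numpy as np

rng = np.random.default_rng(5)
E = rng.normal(size=(25, 6))
E -= E.mean(axis=0)

# First component: unit-norm score t1 = E w1 / ||E w1||
w1 = rng.normal(size=(6, 1))
t1 = E @ w1
t1 /= np.linalg.norm(t1)

# Deflation as in dual_spls_lasso: E <- E - t (t^T E)
E_defl = E - t1 @ (t1.T @ E)
print(np.abs(t1.T @ E_defl).max())    # ~0: deflated E is orthogonal to t1

# Any later score t2 = E_defl w2 is therefore orthogonal to t1
w2 = rng.normal(size=(6, 1))
t2 = E_defl @ w2
print(abs((t1.T @ t2).item()))        # ~0
```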

## Quick test

*author: @gemini*

```python
import numpy as np

def test_dual_spls_routine():
    # 1. Simulation Setup
    n_obs = 50
    n_vars = 100
    n_comp = 3

    print(f"Test Initialization: n={n_obs}, p={n_vars}, n_components={n_comp}")

    # 2. Random Data Generation
    np.random.seed(42)
    X_sim = np.random.normal(0, 1, size=(n_obs, n_vars))

    # Target depends only on the first 5 variables
    beta_true = np.zeros(n_vars)
    beta_true[:5] = [2.5, -1.5, 3.0, 0.5, -2.0]
    y_sim = X_sim @ beta_true + np.random.normal(0, 0.1, size=n_obs)

    # 3. Algorithm Execution
    try:
        # ppnu=0.9: 90% of the coordinates of z are set to zero at each component
        res = dual_spls_lasso(X_sim, y_sim, n_components=n_comp, ppnu=0.9)
    except Exception as e:
        print(f"Critical execution error: {e}")
        return

    # 4. Dimensionality Checks (Assertions)
    try:
        # Verify return type
        assert isinstance(res, dict), "Return format must be a dictionary."

        # Verify Beta coefficients (p, n_comp)
        assert res['Bhat'].shape == (n_vars, n_comp), \
            f"Incorrect dimension for Bhat. Expected {(n_vars, n_comp)}, received {res['Bhat'].shape}"

        # Verify Scores (n, n_comp)
        assert res['scores'].shape == (n_obs, n_comp), \
            f"Incorrect dimension for scores. Expected {(n_obs, n_comp)}, received {res['scores'].shape}"

        # Verify Fitted Values (n, n_comp)
        assert res['fitted_values'].shape == (n_obs, n_comp), \
            f"Incorrect dimension for fitted_values. Expected {(n_obs, n_comp)}, received {res['fitted_values'].shape}"

        print(">> Dimension validation: SUCCESS")

        # 5. Functional Check (Sparsity)
        nb_zeros = res['zerovar']
        print(f">> Number of zero coefficients per component: {nb_zeros}")

        if np.any(nb_zeros > 0):
            print(">> Functional validation: Lasso successfully performed variable selection.")
        else:
            print(">> Warning: No coefficient set to zero (check ppnu or normalization).")

    except AssertionError as error:
        print(f"Validation test failure: {error}")

# Run the routine
test_dual_spls_routine()
```


```text
Test Initialization: n=50, p=100, n_components=3
>> Dimension validation: SUCCESS
>> Number of zero coefficients per component: [90 82 76]
>> Functional validation: Lasso successfully performed variable selection.
```
