<a href="https://colab.research.google.com/github/helonayala/sysid/blob/main/orthogonal_least_squares.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Orthogonal least squares

This notebook uses orthogonal least squares (OLS) to compute the error reduction ratio (ERR) of each regressor.

For didactic purposes, we apply OLS to a small matrix taken from an example in the book by Billings (2013): the data from Table 3.1 (Example 3.3).

The matrix is given below:

| x_1 | x_2 | x_3 | x_4   | y    |
|-----|-----|-----|-------|------|
| 9   | -5  | 5   | -1.53 | 9.08 |
| 1   | -1  | 8   | -0.39 | 7.87 |
| 2   | -5  | 6   | -3.26 | 3.01 |
| 8   | -2  | 0   | 0.36  | 5.98 |
| 0   | 0   | 9   | 0.13  | 9.05 |

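The last column is the output. x_4 is approximately a linear combination of x_1 and x_2 (not exactly: in the last row x_1 = x_2 = 0 while x_4 = 0.13), so it carries almost no new information. The ideal model would therefore use only three columns instead of four. We show below how OLS can be used to detect this redundant regressor.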
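
As a quick check of the near-dependence among the regressors, the sketch below regresses x_4 on x_1 and x_2 with `np.linalg.lstsq` and reports the relative residual. The variable names (`X12`, `x4`, `coef`, `rel_residual`) are illustrative and not part of the notebook.

```python
import numpy as np

# Columns x_1, x_2 and x_4 of Table 3.1, copied from the table above.
X12 = np.array([[9, -5], [1, -1], [2, -5], [8, -2], [0, 0]], dtype=float)
x4 = np.array([-1.53, -0.39, -3.26, 0.36, 0.13])

# Least-squares fit of x_4 on x_1 and x_2 (no intercept).
coef, _, rank, _ = np.linalg.lstsq(X12, x4, rcond=None)

# Part of x_4 that x_1 and x_2 cannot reproduce, relative to the size of x_4.
rel_residual = np.linalg.norm(x4 - X12 @ coef) / np.linalg.norm(x4)

print("coefficients:", coef)
print("relative residual:", rel_residual)
```

A relative residual that is small but not zero indicates that x_4 is nearly, rather than exactly, redundant given x_1 and x_2.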

## Functions and imports

In [11]:
import numpy as np
from numpy.linalg import pinv, solve

def MGS(P):
    """
    Performs Modified Gram-Schmidt orthogonalization on matrix P,
    as defined in Aguirre (2015).

    Args:
        P (np.ndarray): The input matrix where columns are vectors to be orthogonalized.

    Returns:
        tuple: A tuple (Q, A) containing:
            Q (np.ndarray): The orthogonalized matrix, where columns are mutually
                            orthogonal (not normalized).
            A (np.ndarray): The unit upper triangular matrix of coefficients,
                            such that P = Q @ A.
    """
    n_rows, n_cols = P.shape

    A = np.eye(n_cols, dtype=float) # Initialize A as an identity matrix (unit upper triangular)
    P_curr = P.astype(float)       # Working copy of P, converted to float
    Q = np.zeros_like(P_curr, dtype=float)

    # Iterate through columns to be orthogonalized
    for i in range(n_cols):
        Q[:, i] = P_curr[:, i] # The i-th orthogonal vector Q[:,i] is the current P_curr[:,i]

        # Orthogonalize subsequent columns (P_curr[:,j]) against the current orthogonal vector Q[:,i]
        # This loop applies the modification step for each subsequent column
        for j in range(i + 1, n_cols):
            # Check for zero norm to prevent division by zero for orthogonal vector Q[:,i]
            # If Q[:,i] is a zero vector, its projection onto other vectors is zero,
            # so A[i,j] remains 0 and P_curr[:,j] doesn't change from this step.
            q_i_norm_sq = Q[:, i].T @ Q[:, i]
            if q_i_norm_sq > 1e-18: # Use a small epsilon to check for non-zero norm
                # Compute coefficient A[i,j] (projection of P_curr[:,j] onto Q[:,i])
                A[i, j] = (Q[:, i].T @ P_curr[:, j]) / q_i_norm_sq
                # Subtract the projection from P_curr[:,j]
                P_curr[:, j] = P_curr[:, j] - A[i, j] * Q[:, i]
            # If q_i_norm_sq is zero, A[i,j] is already 0 (from identity init) and P_curr[:,j] remains unchanged

    return Q, A

def ols(P, Y_target):
    """
    Calculates OLS parameters, Error Reduction Ratio (ERR), and Error Sum Ratio (ESR)
    for the regressor matrix P, processing its columns in the order given.

    Args:
        P (np.ndarray): Regressor matrix, already restricted to the columns of interest
                        and ordered as they should be orthogonalized
                        (e.g., P_original[:, [2, 0]]).
        Y_target (np.ndarray): The target column vector (e.g., Y).

    Returns:
        tuple: A tuple containing:
            th_OLS (np.ndarray): OLS estimated parameters, in the column order of P.
            ERR (np.ndarray): Error Reduction Ratio for each orthogonal regressor.
            ESR (float): Error Sum Ratio, 1 - sum(ERR), i.e. the fraction of
                         Y_target.T @ Y_target left unexplained.
    """

    niter = P.shape[1] # Number of regressors (columns in the sliced P matrix)

    # Perform Modified Gram-Schmidt orthogonalization on the sliced P matrix.
    W, A = MGS(P)

    # Calculate 'g' coefficients in the orthogonal basis.
    g = np.zeros(niter)
    for i in range(niter):
        g[i] = (Y_target.T @ W[:, i]).item() / (W[:, i].T @ W[:, i]).item()
    g = g.reshape(-1, 1) # Reshape 'g' to a column vector

    # Calculate ERR (Error Reduction Ratio) for each regressor.
    ERR = np.zeros(niter)
    for i in range(niter):
        ERR[i] = ((Y_target.T @ W[:, i]).item()**2) / ((Y_target.T @ Y_target).item() * (W[:, i].T @ W[:, i]).item())

    # Calculate ESR (Error Sum Ratio).
    ESR = 1 - np.sum(ERR)

    # Calculate the final OLS parameters (th_OLS) from the orthogonal basis.
    th_OLS = solve(A, g)

    return th_OLS, ERR, ESR
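
A quick sanity check on the orthogonalization is that the columns of Q are mutually orthogonal and that Q @ A reproduces P. The sketch below verifies both properties on a compact, vectorized variant of modified Gram-Schmidt. `mgs_sketch` and `P_demo` are illustrative names; the variant is written to be self-contained and is not meant to replace `MGS` above.

```python
import numpy as np

def mgs_sketch(P):
    """Compact modified Gram-Schmidt (illustrative): returns (Q, A) with P = Q @ A."""
    P_work = np.asarray(P, dtype=float).copy()
    n_cols = P_work.shape[1]
    A = np.eye(n_cols)
    Q = np.zeros_like(P_work)
    for i in range(n_cols):
        Q[:, i] = P_work[:, i]
        norm_sq = Q[:, i] @ Q[:, i]
        if norm_sq <= 1e-18:
            continue  # zero column: nothing to project out
        # Projection coefficients of all later columns onto Q[:, i]
        A[i, i + 1:] = Q[:, i] @ P_work[:, i + 1:] / norm_sq
        # Remove the Q[:, i] component from all later columns at once
        P_work[:, i + 1:] -= np.outer(Q[:, i], A[i, i + 1:])
    return Q, A

# Regressor columns of Table 3.1
P_demo = np.array([
    [9, -5, 5, -1.53],
    [1, -1, 8, -0.39],
    [2, -5, 6, -3.26],
    [8, -2, 0, 0.36],
    [0, 0, 9, 0.13],
], dtype=float)

Q_demo, A_demo = mgs_sketch(P_demo)

# Off-diagonal entries of Q^T Q should vanish up to rounding error
gram = Q_demo.T @ Q_demo
max_off_diag = np.max(np.abs(gram - np.diag(np.diag(gram))))

print("largest off-diagonal entry of Q^T Q:", max_off_diag)
print("Q @ A reproduces P:", np.allclose(Q_demo @ A_demo, P_demo))
```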


## Read data

In [12]:

# --- Data from Table 3.1 (Billings 2013 book, Example 3.3) ---
mat_data = np.array([
    [9, -5, 5, -1.53, 9.08],
    [1, -1, 8, -0.39, 7.87],
    [2, -5, 6, -3.26, 3.01],
    [8, -2, 0, 0.36, 5.98],
    [0, 0, 9, 0.13, 9.05]
])

# Separate predictors (P) and output (Y)
P_original = mat_data[:, :4]
Y = mat_data[:, 4].reshape(-1, 1)

## Least Squares Solution

Calculate the LS solution using the pseudo-inverse (generalized inverse). This is given just after Eq. (3.22) in the book.

In [13]:
# The formula is: theta_hat = (P^T P)^-1 P^T Y, which is equivalent to pinv(P) @ Y
# when P has full column rank.
th_ls = pinv(P_original) @ Y
print('LS estimated parameters (th_ls):')
print(th_ls)

LS estimated parameters (th_ls):
[[0.85685332]
 [0.5363392 ]
 [0.98734167]
 [0.59768453]]
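
As a cross-check on the pseudo-inverse, the same estimate can be obtained with `np.linalg.lstsq` or by solving the normal equations directly. The sketch below is self-contained and re-enters the data from Table 3.1; the names `data`, `P_demo`, `y_demo`, `th_pinv`, `th_lstsq`, and `th_normal` are illustrative.

```python
import numpy as np

# Data from Table 3.1: four regressor columns followed by the output column
data = np.array([
    [9, -5, 5, -1.53, 9.08],
    [1, -1, 8, -0.39, 7.87],
    [2, -5, 6, -3.26, 3.01],
    [8, -2, 0, 0.36, 5.98],
    [0, 0, 9, 0.13, 9.05],
])
P_demo, y_demo = data[:, :4], data[:, 4]

# Three routes to the same least-squares estimate
th_pinv = np.linalg.pinv(P_demo) @ y_demo
th_lstsq, *_ = np.linalg.lstsq(P_demo, y_demo, rcond=None)
th_normal = np.linalg.solve(P_demo.T @ P_demo, P_demo.T @ y_demo)

print("pinv  :", th_pinv)
print("lstsq :", th_lstsq)
print("normal:", th_normal)
```

Forming P^T P squares the condition number of P, so the normal-equations route is the least robust of the three when regressors are nearly collinear, as x_4 is here.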


## Orthogonal Least Squares (OLS) Solution

We now compare the results produced by this code with the values in Table 3.2 of the book, fitting one model per table row.
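
The models in this section pass the regressors in a fixed order, starting with x_3 and then x_1. A common way to choose such an order is greedy forward selection: at each step, pick the candidate with the largest ERR after orthogonalizing it against the terms already chosen. The sketch below illustrates that idea under the assumption that this is the intended selection rule; the notebook itself does not state how its order was chosen, and `forward_select` is a hypothetical helper.

```python
import numpy as np

def forward_select(P, y):
    """Greedy ERR-based ordering of the columns of P (illustrative sketch)."""
    y = np.asarray(y, dtype=float).ravel()
    yy = y @ y
    # Candidates, progressively orthogonalized against the chosen terms
    cand = np.asarray(P, dtype=float).copy()
    remaining = list(range(cand.shape[1]))
    order, errs = [], []
    while remaining:
        # ERR each remaining candidate would contribute if chosen next
        cand_err = {}
        for j in remaining:
            w = cand[:, j]
            ww = w @ w
            cand_err[j] = (y @ w) ** 2 / (yy * ww) if ww > 1e-18 else 0.0
        best = max(cand_err, key=cand_err.get)
        order.append(best)
        errs.append(cand_err[best])
        remaining.remove(best)
        # Remove the chosen direction from the remaining candidates
        w = cand[:, best]
        ww = w @ w
        if ww > 1e-18:
            for j in remaining:
                cand[:, j] -= (w @ cand[:, j]) / ww * w
    return order, np.array(errs)

# Data from Table 3.1: four regressor columns followed by the output column
data = np.array([
    [9, -5, 5, -1.53, 9.08],
    [1, -1, 8, -0.39, 7.87],
    [2, -5, 6, -3.26, 3.01],
    [8, -2, 0, 0.36, 5.98],
    [0, 0, 9, 0.13, 9.05],
])
order, errs = forward_select(data[:, :4], data[:, 4])
print("selection order (0-based columns):", order)
print("ERR at each step:", errs)
```

On this data the sketch picks x_3 and then x_1 (indices 2 and 0) first, which matches the start of the order used in the models that follow.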

Two-term model (1st row in Table 3.2)


In [18]:

th_OLS, ERR, ESR = ols(P_original[:,[2, 0]], Y)

print('OLS estimated parameters (th_OLS from OLS method using MGS):')
print(th_OLS)
print('\nERR (Error Reduction Ratio) for each orthogonal regressor:')
print(ERR)
print('\nESR (Error Sum Ratio):')
print(ESR)
print("-" * 40 + "\n")



OLS estimated parameters (th_OLS from OLS method using MGS):
[[0.81935333]
 [0.60128022]]

ERR (Error Reduction Ratio) for each orthogonal regressor:
[0.77370749 0.17268374]

ESR (Error Sum Ratio):
0.053608771292348534
----------------------------------------



Three-term model (2nd row in Table 3.2)

In [17]:

th_OLS, ERR, ESR = ols(P_original[:,[2, 0, 1]], Y)

print('OLS estimated parameters (th_OLS from OLS method using MGS):')
print(th_OLS)
print('\nERR (Error Reduction Ratio) for each orthogonal regressor:')
print(ERR)
print('\nESR (Error Sum Ratio):')
print(ESR)
print("-" * 40 + "\n")



OLS estimated parameters (th_OLS from OLS method using MGS):
[[0.99669235]
 [1.00046032]
 [0.99172293]]

ERR (Error Reduction Ratio) for each orthogonal regressor:
[0.77370749 0.17268374 0.05352266]

ESR (Error Sum Ratio):
8.611116850232303e-05
----------------------------------------



Four-term model (3rd row in Table 3.2; includes a redundant term with low ERR)

In [16]:
th_OLS, ERR, ESR = ols(P_original[:,[2, 0, 1, 3]], Y)

print('OLS estimated parameters (th_OLS from OLS method using MGS):')
print(th_OLS)
print('\nERR (Error Reduction Ratio) for each orthogonal regressor:')
print(ERR)
print('\nESR (Error Sum Ratio):')
print(ESR)
print("-" * 40 + "\n")


OLS estimated parameters (th_OLS from OLS method using MGS):
[[0.98734167]
 [0.85685332]
 [0.5363392 ]
 [0.59768453]]

ERR (Error Reduction Ratio) for each orthogonal regressor:
[7.73707491e-01 1.72683737e-01 5.35226601e-02 4.95380005e-06]

ESR (Error Sum Ratio):
8.115736845359933e-05
----------------------------------------



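These results should be directly comparable to Table 3.2 in Billings' book.

Adding the fourth term did not significantly change the ESR (from about 8.61e-05 to 8.12e-05), and its ERR of about 4.95e-06 is orders of magnitude smaller than the others, which marks x_4 as redundant.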
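
The ERR values can also be cross-checked without hand-written Gram-Schmidt code. The columns of the thin QR factor from `np.linalg.qr` are the orthogonal regressors normalized to unit length (up to sign), so ERR_i reduces to (q_i^T y)^2 / (y^T y). The sketch below is a minimal illustration with the column order [2, 0, 1, 3]; all variable names are illustrative, and the `1e-4` threshold is an arbitrary assumption chosen only to show how a low-ERR term could be flagged.

```python
import numpy as np

# Data from Table 3.1: four regressor columns followed by the output column
data = np.array([
    [9, -5, 5, -1.53, 9.08],
    [1, -1, 8, -0.39, 7.87],
    [2, -5, 6, -3.26, 3.01],
    [8, -2, 0, 0.36, 5.98],
    [0, 0, 9, 0.13, 9.05],
])
order = [2, 0, 1, 3]          # same column order as the four-term model
P_ord = data[:, order]
y_demo = data[:, 4]

# Thin QR: columns of Q_fac are orthonormal, so w_i^T w_i = 1 in the ERR formula
Q_fac, R_fac = np.linalg.qr(P_ord)
err_qr = (Q_fac.T @ y_demo) ** 2 / (y_demo @ y_demo)
esr_qr = 1.0 - err_qr.sum()

# Flag terms whose ERR falls below an (assumed) threshold
threshold = 1e-4
low_err_terms = [order[i] for i, e in enumerate(err_qr) if e < threshold]

print("ERR via QR:", err_qr)
print("ESR via QR:", esr_qr)
print("low-ERR columns (0-based):", low_err_terms)
```

With this threshold only column 3 (x_4) is flagged, consistent with the ERR values printed for the four-term model.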