In [None]:
'''
 * Copyright (c) 2018 Radhamadhab Dalai
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
'''

## Semi-Supervised and Graph Regression Representer Theorem

### Representer Theorem for Semi-Supervised Learning

The representer theorem for supervised learning (Theorem 8.4) can be extended to semi-supervised learning and graph signals.

**Theorem 8.5 (Representer Theorem for Semi-Supervised Learning)**

Given a set of $ l $ labeled examples $\{(x_i, y_i)\}_{i=1}^l$, a set of $ u $ unlabeled examples $\{x_j\}_{j=l+1}^{l+u}$, and the graph Laplacian $ L $, the minimizer of the semi-supervised/graph optimization problem

$$
f^* = \arg \min_f \left\{ \frac{1}{l} \sum_{i=1}^l V(x_i, y_i, f) + \gamma_A \|f\|_H^2 + \frac{\gamma_l}{(u+l)^2} f^T L f \right\}
$$

admits an expansion

$$
f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x, x_i),
$$

where $K(x, x_i)$ denotes the kernel function, $\alpha_i$ are the coefficients to be determined, and, in the Laplacian term, $f = [f(x_1), \dots, f(x_{l+u})]^T$ collects the values of $f$ at all labeled and unlabeled points.

Note that when the graph Laplacian $L$ is the identity matrix $I$, Theorem 8.5 reduces to the representer theorem for semi-supervised learning. Further, if $u = 0$, then Theorem 8.5 reduces to Theorem 8.4 for supervised learning.

### Loss Representation

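For the squared loss $V(x_i, y_i, f) = (y_i - f(x_i))^2$, substituting the expansion $f = K \alpha$ at the sample points expresses the loss function in terms of $\alpha$:

$$
L(\alpha) = \frac{1}{l} (y - J K \alpha)^T (y - J K \alpha) + \gamma_A \alpha^T K \alpha + \frac{\gamma_l}{(u+l)^2} \alpha^T K L K \alpha,
$$

where:

- $K$ is the $(l + u) \times (l + u)$ Gram matrix with entries $K_{ij} = K(x_i, x_j)$,
- $y = [y_1, \dots, y_l, 0, \dots, 0]^T$ is the $(l + u)$-dimensional label vector,
- $J = \mathrm{Diag}(1, \dots, 1, 0, \dots, 0)$ is the $(l + u) \times (l + u)$ diagonal matrix whose first $l$ diagonal entries are 1, so that $J K \alpha$ keeps only the predictions at the labeled examples.

Here $L(\alpha)$ on the left-hand side denotes the loss, while $L$ on the right-hand side is the graph Laplacian.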

This formulation shows that the semi-supervised learning problem can be solved by finding the optimal solution for $ \alpha $ that minimizes the given loss function.
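As a rough illustration of this loss, the sketch below builds $K$, $J$, $y$ and a graph Laplacian for a toy data set and evaluates the loss at a given $\alpha$. It assumes the squared loss scaled by $1/l$ and the Laplacian term $\alpha^T K L K \alpha$ scaled by $\gamma_l/(u+l)^2$. The helper names `rbf_gram` and `laprls_loss`, the RBF kernel, and the choice of graph weights are illustrative assumptions, not part of the original text.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) (assumed kernel)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def laprls_loss(alpha, K, L, y, l, gamma_A, gamma_l):
    """Evaluate the LapRLS loss for coefficients alpha (hypothetical helper)."""
    n = K.shape[0]                                   # n = l + u
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])  # selects labeled points
    residual = y - J @ K @ alpha                     # zero on unlabeled entries
    f = K @ alpha                                    # f at all l + u points
    data_term = (residual @ residual) / l
    norm_term = gamma_A * (alpha @ K @ alpha)
    graph_term = (gamma_l / n ** 2) * (f @ L @ f)
    return data_term + norm_term + graph_term

# Toy data: the first l = 2 points are labeled, the last u = 2 are not.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
l = 2
K = rbf_gram(X)
W = K.copy()
np.fill_diagonal(W, 0.0)           # assumed graph weights: RBF affinities
L = np.diag(W.sum(axis=1)) - W     # unnormalized graph Laplacian D - W
y = np.array([1.0, -1.0, 0.0, 0.0])

loss_at_zero = laprls_loss(np.zeros(4), K, L, y, l, gamma_A=0.1, gamma_l=0.1)
loss_at_ones = laprls_loss(np.ones(4), K, L, y, l, gamma_A=0.1, gamma_l=0.1)
print(loss_at_zero, loss_at_ones)
```

At $\alpha = 0$ only the data term survives, so the loss equals $y^T y / l$.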




## Laplacian Regularized Least Squares (LapRLS)

### Selection Matrix \( J = \text{Diag}(1, \dots, 1, 0, \dots, 0) \)
Let \( J = \text{Diag}(1, \dots, 1, 0, \dots, 0) \) be the \( (l + u) \times (l + u) \) diagonal matrix whose first \( l \) diagonal entries are 1 and whose remaining \( u \) diagonal entries are 0.

### First-Order Optimization Condition

The first-order optimality condition is obtained by setting the gradient of the loss with respect to \( \alpha \) to zero. For the squared loss weighted by \( 1/l \), and after multiplying through by \( l/2 \), this gives:

$$
\frac{\partial L(\alpha)}{\partial \alpha} = 0 \Rightarrow - (J K)^T (y - J K \alpha) + \gamma_A l K \alpha + \frac{\gamma_l l}{(u + l)^2} K L K \alpha = 0.
$$

Since $ K $ and $ J $ are symmetric, $ (J K)^T = K J $; moreover $ J J = J $, and $ J y = y $ because the last $ u $ entries of $ y $ are zero. The condition therefore factors as $ K \left[ \left( J K + \gamma_A l I + \frac{\gamma_l l}{(u + l)^2} L K \right) \alpha - y \right] = 0 $, which is satisfied by:

$$
\alpha^* = \left( J K + \gamma_A l I + \frac{\gamma_l l}{(u + l)^2} L K \right)^{-1} y.
$$

This solution is known as the Laplacian Regularized Least Squares (LapRLS) solution.
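A quick numerical check of this closed form, offered as a sketch: it assumes the scaling $\gamma_l l/(u+l)^2$ on the Laplacian term, an RBF kernel, and RBF affinities as graph weights. The function name `laprls_alpha` is hypothetical. The code solves the linear system and confirms that the stationarity condition holds up to rounding error.

```python
import numpy as np

def laprls_alpha(K, L, y, l, gamma_A, gamma_l):
    """Solve (J K + gamma_A l I + gamma_l l/(u+l)^2 L K) alpha = y."""
    n = K.shape[0]                                   # n = l + u
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_l * l / n ** 2) * (L @ K)
    return np.linalg.solve(A, y)                     # avoids an explicit inverse

rng = np.random.default_rng(0)
n, l = 6, 3                                          # 3 labeled, 3 unlabeled
X = rng.normal(size=(n, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                                # RBF Gram matrix, sigma = 1
L = np.diag(K.sum(axis=1)) - K                       # Laplacian from RBF affinities
y = np.r_[rng.normal(size=l), np.zeros(n - l)]       # labels padded with zeros

gamma_A, gamma_l = 0.1, 0.5
alpha = laprls_alpha(K, L, y, l, gamma_A, gamma_l)

# Stationarity residual of the (rescaled) gradient at alpha.
J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])
grad = (-(J @ K).T @ (y - J @ K @ alpha)
        + gamma_A * l * (K @ alpha)
        + (gamma_l * l / n ** 2) * (K @ L @ K @ alpha))
grad_norm = np.linalg.norm(grad)
print(grad_norm)
```

Using `np.linalg.solve` rather than forming the inverse is a numerical-stability choice; the coefficients it returns are those given by the formula above.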

### LapRLS for Graph Supervised Regression and Classification

LapRLS can also be applied to graph supervised regression and classification. In this scenario, the number \( u \) of unlabeled samples is zero, which simplifies the matrix \( J \) to the identity matrix \( I \) and the factor \( (u + l)^2 \) to \( l^2 \). Consequently, the optimization solution simplifies to:

$$
\alpha^* = \left( K + \gamma_A l I + \frac{\gamma_l}{l} L K \right)^{-1} y.
$$

This is the LapRLS solution for supervised regression and classification.
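To see what the Laplacian term does in the fully labeled case, the following sketch solves the supervised system on a small path graph twice, once with the graph penalty and once without. It assumes the supervised weight $\gamma_l / l$ on $L K$ (the $u = 0$ case). The graph, kernel, labels, and the function name `graph_supervised_alpha` are illustrative assumptions.

```python
import numpy as np

def graph_supervised_alpha(K, L, y, gamma_A, gamma_l):
    """Solve (K + gamma_A l I + (gamma_l / l) L K) alpha = y for u = 0."""
    l = K.shape[0]
    A = K + gamma_A * l * np.eye(l) + (gamma_l / l) * (L @ K)
    return np.linalg.solve(A, y)

# Path graph on 4 nodes: 0 - 1 - 2 - 3 (assumed adjacency).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

X = np.arange(4.0)[:, None]                # one scalar feature per node
K = np.exp(-(X - X.T) ** 2 / 2.0)          # RBF Gram matrix, sigma = 1
y = np.array([0.0, 1.0, 0.0, 1.0])         # labels that oscillate along the path

f_plain = K @ graph_supervised_alpha(K, L, y, gamma_A=0.01, gamma_l=0.0)
f_smooth = K @ graph_supervised_alpha(K, L, y, gamma_A=0.01, gamma_l=10.0)

# f^T L f measures how much f varies across graph edges.
rough_plain = f_plain @ L @ f_plain
rough_smooth = f_smooth @ L @ f_smooth
print(rough_plain, rough_smooth)
```

With a positive `gamma_l`, the fitted values vary no more across edges than the unpenalized fit, which is the intended effect of the graph regularizer.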

### Connection to Regularized Least Squares (RLS)

LapRLS contains the regularized least squares (RLS) as a special case. The RLS algorithm for non-graph signals is a fully supervised method where the optimization problem is:

$$
f^* = \arg \min_f \left\{ \frac{1}{l} \sum_{i=1}^l (y_i - f(x_i))^2 + \gamma_A \|f\|_K^2 \right\}.
$$

By the classical representer theorem, the solution is given by:

$$
f^*(x) = \sum_{i=1}^l \alpha_i^* K(x, x_i).
$$

Substituting this into the optimization problem, with \( y = [y_1, \dots, y_l]^T \) and \( K \) the \( l \times l \) Gram matrix, yields:

$$
\alpha^* = \arg \min_\alpha \left\{ \frac{1}{l} (y - K \alpha)^T (y - K \alpha) + \gamma_A \alpha^T K \alpha \right\}.
$$

Setting the gradient of this objective with respect to \( \alpha \) to zero gives \( K \left[ (K + \gamma_A l I) \alpha - y \right] = 0 \), from which we obtain the solution:

$$
\alpha^* = \left( K + \gamma_A l I \right)^{-1} y.
$$
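The RLS closed form can be checked numerically in a few lines. This is a sketch under the assumption that the objective is $\frac{1}{l}\|y - K\alpha\|^2 + \gamma_A \alpha^T K \alpha$; the data, the kernel width, and the names `rls_alpha` and `f_star` are made up for illustration.

```python
import numpy as np

def rls_alpha(K, y, gamma_A):
    """Solve (K + gamma_A l I) alpha = y."""
    l = K.shape[0]
    return np.linalg.solve(K + gamma_A * l * np.eye(l), y)

sigma = 0.3                                        # assumed RBF width
x_train = np.linspace(0.0, 1.0, 5)
K = np.exp(-(x_train[:, None] - x_train[None, :]) ** 2 / (2 * sigma ** 2))
y = np.sin(2 * np.pi * x_train)

gamma_A = 0.05
alpha = rls_alpha(K, y, gamma_A)
l = len(y)

# Gradient of (1/l)||y - K alpha||^2 + gamma_A alpha^T K alpha at the solution.
grad = -(2.0 / l) * K @ (y - K @ alpha) + 2.0 * gamma_A * (K @ alpha)
grad_norm = np.linalg.norm(grad)

def f_star(x_new):
    """Evaluate f*(x) = sum_i alpha_i K(x, x_i) at a scalar input."""
    k = np.exp(-(x_new - x_train) ** 2 / (2 * sigma ** 2))
    return k @ alpha

print(grad_norm, f_star(0.25))
```

The same coefficients result from the supervised LapRLS system when the Laplacian weight $\gamma_l$ is set to zero, which is the sense in which RLS is a special case.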


In [2]:
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import solve

class LapRLS:
    def __init__(self, gamma_A=1.0, gamma_l=1.0, kernel='rbf', sigma=1.0):
        """
        Initialize the LapRLS model.
        :param gamma_A: Regularization parameter for the function norm.
        :param gamma_l: Regularization parameter for the Laplacian term.
        :param kernel: Kernel type ('linear', 'poly', 'rbf').
        :param sigma: Parameter for the RBF kernel.
        """
        self.gamma_A = gamma_A
        self.gamma_l = gamma_l
        self.kernel_type = kernel
        self.sigma = sigma
        self.alpha = None
        self.X = None  # all training points (labeled, then unlabeled); set by fit()

    def _kernel(self, X, Y=None):
        """
        Compute the kernel matrix between the rows of X and the rows of Y.
        :param X: Input data, shape (n, d).
        :param Y: Second input data, shape (m, d) (optional; defaults to X).
        :return: Kernel matrix of shape (n, m).
        """
        if Y is None:
            Y = X

        if self.kernel_type == 'linear':
            return np.dot(X, Y.T)
        elif self.kernel_type == 'poly':
            return (np.dot(X, Y.T) + 1) ** 3
        elif self.kernel_type == 'rbf':
            # cdist handles X != Y, which predict() needs for test-vs-train kernels
            pairwise_sq_dists = cdist(X, Y, 'sqeuclidean')
            return np.exp(-pairwise_sq_dists / (2 * self.sigma ** 2))
        else:
            raise ValueError("Unsupported kernel type.")

    def _laplacian_matrix(self, X):
        """
        Compute the unnormalized graph Laplacian L = D - W from RBF affinities.
        :param X: Input data.
        :return: Laplacian matrix.
        """
        pairwise_sq_dists = cdist(X, X, 'sqeuclidean')
        W = np.exp(-pairwise_sq_dists / (2 * self.sigma ** 2))
        D = np.diag(W.sum(axis=1))
        return D - W

    def fit(self, X_labeled, y_labeled, X_unlabeled):
        """
        Fit the LapRLS model using labeled and unlabeled data.
        :param X_labeled: Labeled input data, shape (l, d).
        :param y_labeled: Labels corresponding to the labeled data, shape (l,).
        :param X_unlabeled: Unlabeled input data, shape (u, d).
        """
        X = np.vstack((X_labeled, X_unlabeled))
        l, u = X_labeled.shape[0], X_unlabeled.shape[0]
        n = l + u

        # Compute the kernel matrix and the graph Laplacian on all l + u points
        K = self._kernel(X)
        L = self._laplacian_matrix(X)

        # Diagonal selection matrix J = Diag(1, ..., 1, 0, ..., 0)
        J = np.diag(np.r_[np.ones(l), np.zeros(u)])

        # Pad the labels with zeros so that y has length l + u
        y = np.concatenate((y_labeled, np.zeros(u)))

        # Solve (J K + gamma_A l I + gamma_l l / (u + l)^2 L K) alpha = y
        A = J @ K + self.gamma_A * l * np.eye(n) + (self.gamma_l * l / n ** 2) * (L @ K)
        self.alpha = solve(A, y)
        self.X = X  # kept so predict() can build the test-vs-train kernel
        return self

    def predict(self, X_test, X_train=None):
        """
        Make predictions using the fitted model.
        :param X_test: Test input data.
        :param X_train: All l + u training points in the order used by fit()
                        (optional; defaults to the points stored by fit()).
        :return: Predicted values f(x) = sum_i alpha_i K(x, x_i).
        """
        if X_train is None:
            X_train = self.X

        K_test = self._kernel(X_test, X_train)
        return K_test @ self.alpha


In [3]:
# Example data (labeled and unlabeled)
X_labeled = np.array([[1, 2], [2, 3], [3, 4]])
y_labeled = np.array([1, -1, 1])
X_unlabeled = np.array([[4, 5], [5, 6]])

# Create and fit the model
laprls = LapRLS(gamma_A=0.1, gamma_l=0.1, kernel='rbf', sigma=1.0)
laprls.fit(X_labeled, y_labeled, X_unlabeled)

# Predict on new data; the expansion runs over all l + u training points
X_test = np.array([[2, 3], [3, 4], [6, 7]])
predictions = laprls.predict(X_test)
print(predictions)