# Examples
This notebook runs every doctest snippet that appears in the package docstrings, so you can verify the NaN-aware helpers interactively.

## Table of Contents
- Principal Component Analysis (PCA)
- Multiple Factor Analysis (MFA)
- Partial Least Squares (PLS)
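- RV coefficient
- RV2 coefficient
- RV with list of arrays
- RV2 with list of arrays
- Normalize helper
- Standardize helper
- PCA in scikit-learn style
- PLSRegressor in scikit-learn style
- Normalizer in scikit-learn style
- StandardScaler in scikit-learn style
- scikit-learn pipeline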

In [1]:
import numpy as np
from missing_methods import pca, pls, rv, rv2, rv_list, rv2_list, mfa
from missing_methods.sk import PCA, PLSRegressor, Normalizer, StandardScaler

## Principal Component Analysis (PCA)
This cell runs `missing_methods.pca`, a NIPALS-based decomposition that scales inner products by observed proportions so NaNs are ignored during the fit.

In [2]:
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
X[1, 0] = np.nan
result = pca(X, ncomp=2)
result["scores"].shape

(3, 2)
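
To make the idea concrete, here is a minimal NumPy sketch of the classical NIPALS iteration with missing values. It is an illustration under stated assumptions, not the code behind `missing_methods.pca`: the function name `nipals_pca_sketch` is hypothetical, and each regression step simply divides by the sum over observed entries, which may differ from the package's exact scaling.

```python
import numpy as np

def nipals_pca_sketch(X, ncomp=2, max_iter=500, tol=1e-10):
    """Illustrative NIPALS PCA that skips NaNs; not the package's implementation."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                                  # True where a value is observed
    W = mask.astype(float)                               # 1.0 observed, 0.0 missing
    R = np.where(mask, X - np.nanmean(X, axis=0), 0.0)   # centered data, NaN -> 0
    n, p = R.shape
    scores = np.zeros((n, ncomp))
    loadings = np.zeros((p, ncomp))
    for k in range(ncomp):
        # Start from the column with the largest observed sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        pk = np.zeros(p)
        for _ in range(max_iter):
            # Loadings: regress each column on t over that column's observed rows.
            den = W.T @ (t ** 2)
            pk = (R.T @ t) / np.where(den > 0, den, 1.0)
            norm = np.linalg.norm(pk)
            if norm == 0.0:
                break                                    # nothing left to extract
            pk = pk / norm
            # Scores: regress each row on pk over that row's observed columns.
            den = W @ (pk ** 2)
            t_new = (R @ pk) / np.where(den > 0, den, 1.0)
            done = np.linalg.norm(t_new - t) <= tol * max(np.linalg.norm(t_new), 1.0)
            t = t_new
            if done:
                break
        scores[:, k], loadings[:, k] = t, pk
        R = np.where(mask, R - np.outer(t, pk), 0.0)     # deflate observed cells only
    return {"scores": scores, "loadings": loadings}

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])
X[1, 0] = np.nan
sketch = nipals_pca_sketch(X, ncomp=2)
print(sketch["scores"].shape)  # (3, 2)
```

On complete data the sketch reduces to ordinary NIPALS, so two components reconstruct a centered two-column matrix exactly.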

## Multiple Factor Analysis (MFA)
`missing_methods.mfa` scales each block by its leading NIPALS eigenvalue before concatenating the blocks, so every step relies on the same MCAR-scaled PCA to handle NaNs.

In [3]:
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4))
Y = rng.standard_normal((10, 4)) + 0.3 * X
Z = rng.standard_normal((10, 4)) + 0.2 * X
X[[1, 3, 7], 2] = np.nan
Y[[0, 4, 9], 1] = np.nan
Z[[5, 6], 0] = np.nan
blocks = [X, Y, Z]
result = mfa(blocks, ncomp=3)
result["scores"].shape

(10, 3)
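
The block-weighting step can be sketched on complete data with plain NumPy. This is an assumption-laden illustration: `mfa_weight_blocks` is a hypothetical name, the SVD stands in for the NIPALS fit, and the weight used here is the first singular value (the usual MFA convention), which may not be the exact quantity `missing_methods.mfa` divides by.

```python
import numpy as np

def mfa_weight_blocks(blocks):
    """Center each block, divide it by its first singular value, then concatenate."""
    weighted = []
    for B in blocks:
        Bc = B - B.mean(axis=0)
        s1 = np.linalg.svd(Bc, compute_uv=False)[0]   # leading singular value
        weighted.append(Bc / s1)                      # block now has leading value 1
    return np.hstack(weighted)

rng = np.random.default_rng(0)
# Three complete blocks on very different scales.
blocks = [rng.standard_normal((10, 4)) * scale for scale in (1.0, 10.0, 100.0)]
Z = mfa_weight_blocks(blocks)

# A global PCA of the weighted table gives the compromise scores.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
global_scores = U[:, :3] * S[:3]
print(global_scores.shape)  # (10, 3)
```

After weighting, no block can dominate the global PCA merely because it was measured on a larger scale.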

## Partial Least Squares (PLS)
`missing_methods.pls` alternates NIPALS updates between X and Y, scaling cross-products by the observed entries so NaNs do not distort the latent components.

In [4]:
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
Y = np.array([[2.4],
              [0.6],
              [2.1]])
X[1, 0] = np.nan
Y[2, 0] = np.nan
result = pls(X, Y, ncomp=2)
result["scores"].shape

(3, 2)
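
As a rough picture of what "scaling cross-products by the observed entries" can mean, the sketch below computes only the first PLS weight vector and score vector for a single response. The name `first_pls_component` is hypothetical and the averaging rule is an assumption; the package's actual update may differ.

```python
import numpy as np

def first_pls_component(X, y):
    """First PLS weights and scores from observed entries only (illustrative).

    Assumes at least one column has a nonzero cross-product with y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    Xc = X - np.nanmean(X, axis=0)
    yc = y - np.nanmean(y)
    x_obs = ~np.isnan(Xc)
    joint = x_obs & ~np.isnan(yc)[:, None]      # x_ij and y_i both observed
    # Average cross-product per column over the jointly observed rows.
    cross = np.where(joint, Xc * yc[:, None], 0.0).sum(axis=0)
    w = cross / np.maximum(joint.sum(axis=0), 1)
    w = w / np.linalg.norm(w)
    # Scores: project each row on w, rescaled by the observed share of w.
    den = x_obs.astype(float) @ (w ** 2)
    t = (np.where(x_obs, Xc, 0.0) @ w) / np.where(den > 0, den, 1.0)
    return w, t

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])
y = np.array([2.4, 0.6, 2.1])
X[1, 0] = np.nan
y[2] = np.nan
w, t = first_pls_component(X, y)
print(t.shape)  # (3,)
```

With no missing values the weights reduce to the normalized `Xc.T @ yc` of ordinary PLS.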

## RV coefficient
`missing_methods.rv` uses NIPALS PCA outputs to compute RV coefficients with MCAR-scaled inner products so NaNs are excluded from the similarity sums.

In [5]:
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
Y = np.array([[2.4, 2.9],
              [0.6, 0.5],
              [2.1, 2.2]])
X[1, 0] = np.nan
Y[2, 0] = np.nan
rv_value = float(rv(X, Y))
rv_value

0.7327404890152707
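
For reference, the textbook RV coefficient on complete data is the normalized inner product between the two row-by-row Gram matrices. The sketch below shows only that definition; `rv_complete` is a hypothetical name and it does not reproduce the NaN handling of `missing_methods.rv`.

```python
import numpy as np

def rv_complete(X, Y):
    """Textbook RV coefficient for complete, column-centered data."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx, Sy = Xc @ Xc.T, Yc @ Yc.T        # n x n Gram matrices
    return float(np.sum(Sx * Sy) / np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy)))

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix

# RV is invariant to rotation and uniform scaling of either table.
print(np.isclose(rv_complete(X, 2.0 * X @ Q), 1.0))  # True
```

Because both Gram matrices are positive semidefinite, the complete-data RV always lies between 0 and 1.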

## RV2 coefficient
`missing_methods.rv2` zeroes each Gram diagonal before computing the cosine-like similarity, and its NIPALS-based scores ignore NaNs thanks to scaled inner products.

In [6]:
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
Y = np.array([[2.4, 2.9],
              [0.6, 0.5],
              [2.1, 2.2]])
X[1, 0] = np.nan
Y[2, 0] = np.nan
rv2_value = float(rv2(X, Y))
rv2_value

0.6730600493101697
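
The diagonal-zeroing step is easy to see on complete data. The following sketch, with the hypothetical name `rv2_complete`, applies it to plain Gram matrices; it illustrates the definition only and leaves out the NIPALS scores and NaN scaling that `missing_methods.rv2` is described as using.

```python
import numpy as np

def rv2_complete(X, Y):
    """Modified RV: same ratio as RV, but with Gram diagonals set to zero."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx, Sy = Xc @ Xc.T, Yc @ Yc.T
    np.fill_diagonal(Sx, 0.0)            # drop each row's self-similarity
    np.fill_diagonal(Sy, 0.0)
    return float(np.sum(Sx * Sy) / np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy)))

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])
Y = np.array([[2.4, 2.9], [0.6, 0.5], [2.1, 2.2]])
value = rv2_complete(X, Y)
```

Once the diagonals are removed the matrices are no longer positive semidefinite, so this variant can take values anywhere in [-1, 1] rather than only in [0, 1].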

## RV with list of arrays
`missing_methods.rv_list` applies the NIPALS-driven `rv` helper to every pair of arrays and collects the results in a symmetric matrix, so each entry keeps the same NaN-aware scaling.

In [7]:
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
Y = np.array([[2.4, 2.9],
              [0.6, 0.5],
              [2.1, 2.2]]) + 0.3 * X
Z = np.array([[1.2, 0.9],
              [0.3, 0.4],
              [1.1, 1.3]]) + 0.2 * X
X[1, 0] = np.nan
Y[2, 0] = np.nan
Z[0, 1] = np.nan
arrays = [X, Y, Z]
rv_list(arrays).shape

(3, 3)
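
The pairwise construction can be sketched independently of the similarity itself. In the example below, `pairwise_similarity` and `rv_complete` are hypothetical names, the data are complete, and the diagonal is filled with ones on the assumption that every array has similarity 1 with itself.

```python
import numpy as np
from itertools import combinations

def rv_complete(A, B):
    """Textbook RV coefficient for complete data (stand-in similarity)."""
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    Sa, Sb = Ac @ Ac.T, Bc @ Bc.T
    return float(np.sum(Sa * Sb) / np.sqrt(np.sum(Sa * Sa) * np.sum(Sb * Sb)))

def pairwise_similarity(arrays, similarity):
    """Symmetric matrix of similarity(arrays[i], arrays[j]) for all pairs."""
    k = len(arrays)
    M = np.eye(k)                        # assumes similarity(A, A) == 1
    for i, j in combinations(range(k), 2):
        M[i, j] = M[j, i] = similarity(arrays[i], arrays[j])
    return M

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])
Y = np.array([[2.4, 2.9], [0.6, 0.5], [2.1, 2.2]]) + 0.3 * X
Z = np.array([[1.2, 0.9], [0.3, 0.4], [1.1, 1.3]]) + 0.2 * X
M = pairwise_similarity([X, Y, Z], rv_complete)
print(M.shape)  # (3, 3)
```

Evaluating only the upper triangle and mirroring it halves the number of similarity calls and guarantees exact symmetry.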

## RV2 with list of arrays
`missing_methods.rv2_list` calls `rv2` for each pair of arrays, so every entry of the resulting matrix uses the same NIPALS-based, MCAR-scaled computation with the Gram diagonals removed, and NaNs are ignored throughout.

In [8]:
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
Y = np.array([[2.4, 2.9],
              [0.6, 0.5],
              [2.1, 2.2]]) + 0.3 * X
Z = np.array([[1.2, 0.9],
              [0.3, 0.4],
              [1.1, 1.3]]) + 0.2 * X
X[1, 0] = np.nan
Y[2, 0] = np.nan
Z[0, 1] = np.nan
arrays = [X, Y, Z]
rv2_list(arrays).shape

(3, 3)

## Normalize helper
`missing_methods.normalize` rescales each column by an MCAR-aware norm computed from the observed entries only, so NaNs never inflate a column's length.

In [9]:
import numpy as np
from missing_methods import normalize
X = np.array([[3.0, np.nan], [0.0, 4.0]])
normalized = normalize(X)
# Largest observed value after rescaling; NaN entries are skipped.
float(np.nanmax(normalized))

1.0
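
A plain NumPy version of column normalization over observed entries looks like the sketch below. The name `nan_normalize_columns` is hypothetical, and the code assumes the norm is the square root of the sum of observed squares, which is consistent with the output above but is not taken from the package source.

```python
import numpy as np

def nan_normalize_columns(X):
    """Divide each column by its l2 norm over observed entries; NaNs stay NaN."""
    X = np.asarray(X, dtype=float)
    norms = np.sqrt(np.nansum(X ** 2, axis=0))   # NaNs contribute nothing
    norms[norms == 0] = 1.0                      # leave all-zero or all-NaN columns alone
    return X / norms

X = np.array([[3.0, np.nan], [0.0, 4.0]])
out = nan_normalize_columns(X)
print(float(np.nanmax(out)))  # 1.0
```

Here the column norms are 3 and 4, so the observed entries become 1, 0, and 1 while the missing cell remains NaN.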

## Standardize helper
`missing_methods.standardize` centers each column via MCAR-aware means and scales by NaN-safe sums of squares, so inverse transforms stay consistent.

In [10]:
import numpy as np
from missing_methods import standardize
X = np.array([[1.0, 2.0], [np.nan, 4.0]])
standardized = standardize(X)
np.nanmean(standardized, axis=0)

array([0., 0.])
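
The same centering and scaling can be approximated in a few lines of NumPy. This is a sketch under assumptions: `nan_standardize` is a hypothetical name, and dividing the sum of squares by the number of observed entries is one plausible convention, not necessarily the one `missing_methods.standardize` uses.

```python
import numpy as np

def nan_standardize(X):
    """Center by observed means and scale by observed root mean squares."""
    X = np.asarray(X, dtype=float)
    n_obs = np.sum(~np.isnan(X), axis=0)              # observed count per column
    mean = np.nansum(X, axis=0) / np.maximum(n_obs, 1)
    centered = X - mean                               # NaNs stay NaN
    ss = np.nansum(centered ** 2, axis=0)             # NaN-safe sum of squares
    scale = np.sqrt(ss / np.maximum(n_obs, 1))
    scale[scale == 0] = 1.0                           # constant or single-value columns
    return centered / scale

X = np.array([[1.0, 2.0], [np.nan, 4.0]])
Z = nan_standardize(X)
print(np.nanmean(Z, axis=0))  # [0. 0.]
```

The first column has a single observed value, so its scale would be zero; the guard keeps that entry at 0 instead of producing a division error.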

## PCA in scikit-learn style
The scikit-learn style `PCA` estimator wraps the NIPALS-based `pca` helper behind `fit` and `transform`, so inner products are still scaled only over observed entries and NaNs stay excluded.

In [11]:
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
X[1, 0] = np.nan
estimator = PCA(ncomp=2)
estimator.fit(X)
estimator.transform(X).shape

(3, 2)

## PLSRegressor in scikit-learn style
The sklearn PLSRegressor layers on the NIPALS-based helper, alternating X/Y updates and scaling cross-products only where data is observed so NaNs stay excluded.

In [12]:
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
Y = np.array([[2.4],
              [0.6],
              [2.1]])
X[2, 1] = np.nan
Y[0, 0] = np.nan
estimator = PLSRegressor(ncomp=2)
estimator.fit(X, Y)
estimator.predict(X).shape

(3, 1)

## Normalizer in scikit-learn style
The sklearn Normalizer wraps `_normalize`, a NIPALS-based l2 rescaler that divides only by norms computed over observed entries so NaNs never contribute.

In [13]:
X = np.array([[3.0, np.nan],
              [0.0, 4.0]])
normalizer = Normalizer()
normalized = normalizer.fit_transform(X)
normalized.shape

(2, 2)

## StandardScaler in scikit-learn style
`StandardScaler` relies on MCAR-aware variance estimates, so `transform` and `inverse_transform` scale only by NaN-safe sums of squares. The cell below checks that the round trip recovers `X`, with NaNs left in place.

In [14]:
X = np.array([[1.0, 2.0],
              [np.nan, 4.0]])
scaler = StandardScaler()
transformed = scaler.fit_transform(X)
reconstructed = scaler.inverse_transform(transformed)
np.allclose(reconstructed, X, equal_nan=True)

True
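
The round trip works because both directions reuse the statistics stored at fit time. A minimal stand-in class makes that explicit; `NanStandardScalerSketch` is a hypothetical name, it does not inherit from any scikit-learn base class, and its variance convention is an assumption rather than the package's.

```python
import numpy as np

class NanStandardScalerSketch:
    """Toy scaler: fit stores NaN-safe statistics, transform and inverse reuse them."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = np.nanmean(X, axis=0)
        scale = np.sqrt(np.nanmean((X - self.mean_) ** 2, axis=0))
        self.scale_ = np.where(scale > 0, scale, 1.0)   # avoid dividing by zero
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

    def inverse_transform(self, Z):
        return np.asarray(Z, dtype=float) * self.scale_ + self.mean_

X = np.array([[1.0, 2.0], [np.nan, 4.0]])
sketch_scaler = NanStandardScalerSketch().fit(X)
Z = sketch_scaler.transform(X)
back = sketch_scaler.inverse_transform(Z)
print(np.allclose(back, X, equal_nan=True))  # True
```

Missing cells pass through both steps as NaN, which is why the comparison needs `equal_nan=True`.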

## scikit-learn pipeline
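This cell chains the NaN-safe `StandardScaler` and `PLSRegressor` with `make_pipeline`, then fits and predicts on data that contains NaNs in both `X` and `Y`.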

In [15]:
from sklearn.pipeline import make_pipeline
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
Y = np.array([[2.4],
              [0.6],
              [2.1]])
X[2, 1] = np.nan
Y[0, 0] = np.nan
scaler = StandardScaler()
estimator = PLSRegressor(ncomp=2)
pipeline = make_pipeline(scaler, estimator)
pipeline.fit(X, Y)
pipeline.predict(X).shape

(3, 1)