**Introduction**

Subspace learning involves finding a shared feature space (or subspace) where the source and target domains can be aligned. By projecting data from both domains into this common subspace, the model can reduce the domain shift (differences in data distributions) and improve its ability to generalize to the target domain.

**Key Concepts:**

**Domain Shift:**

The source and target domains often have different distributions, making it hard for a model trained on the source domain to perform well on the target domain.

Subspace learning mitigates this by aligning the distributions in a shared subspace.

**Feature Transformation:**

Data from both domains is transformed into a lower-dimensional subspace where the structural relationships between the domains are preserved.

**Optimization:**

The subspace is learned by minimizing the difference between the source and target distributions while preserving the discriminative structure of the source domain.

**Principal Component Analysis (PCA):**

Reduces dimensionality and finds a subspace that captures the most variance in the data.

**Imports**




In [1]:
# torch
import torch
import torch.nn as nn
import torch.optim as optim

# torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# sklearn
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# matplotlib
import matplotlib.pyplot as plt


**Data Processing**

In [2]:
# Load CIFAR-10 dataset
# Define transformations (grayscale for simplicity)
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1))  # Flatten images
])

# CIFAR-10 (Source Domain)
cifar10_data = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
cifar10_loader = torch.utils.data.DataLoader(cifar10_data, batch_size=1000, shuffle=True)
cifar10_features, _ = next(iter(cifar10_loader))
cifar10_features = cifar10_features.numpy()

# SVHN (Target Domain)
svhn_data = datasets.SVHN(root='./data', split='train', download=True, transform=transform)
svhn_loader = torch.utils.data.DataLoader(svhn_data, batch_size=1000, shuffle=True)
svhn_features, _ = next(iter(svhn_loader))
svhn_features = svhn_features.numpy()


100%|██████████| 170M/170M [00:05<00:00, 31.1MB/s]
100%|██████████| 182M/182M [00:01<00:00, 97.0MB/s]


**Normalize Each Domain Separately**

In [3]:
# Standardize CIFAR-10 (Source Domain)
cifar10_scaler = StandardScaler()
cifar10_features = cifar10_scaler.fit_transform(cifar10_features)

# Standardize SVHN (Target Domain)
svhn_scaler = StandardScaler()
svhn_features = svhn_scaler.fit_transform(svhn_features)


**Apply PCA to Reduce Dimensionality**

In [4]:
# Fit PCA on the Source Domain (CIFAR-10)
pca_source = PCA(n_components=50)
cifar10_pca = pca_source.fit_transform(cifar10_features)

# Transform Target Domain (SVHN) to the Source's PCA Subspace
svhn_pca = pca_source.transform(svhn_features)

print("Source Domain (CIFAR-10) Shape:", cifar10_pca.shape)
print("Target Domain (SVHN) Shape:", svhn_pca.shape)


Source Domain (CIFAR-10) Shape: (1000, 50)
Target Domain (SVHN) Shape: (1000, 50)


**Align Subspaces**

Now that both domains share a reduced subspace, you can apply a subspace alignment method to better adapt the source to the target domain. For instance:



1.   Align the principal components of the source and target domains.
2.   Train a classifier on the source domain using the PCA-transformed data.
3.   Use the same classifier to predict on the target domain.







In [5]:


# Mock labels for simplicity (replace with real labels)
source_labels = cifar10_data.targets[:1000]  # Use first 1000 labels
target_labels = svhn_data.labels[:1000]  # Use first 1000 labels

clf = LogisticRegression(max_iter=1000)
clf.fit(cifar10_pca, source_labels)

# Predict on Target Domain
predictions = clf.predict(svhn_pca)
print("Target Domain Accuracy:", accuracy_score(target_labels, predictions))

Target Domain Accuracy: 0.083


**Predict on Target Domain**

In [9]:
# SVHN (Target Domain)
test_svhn_data = datasets.SVHN(root='./data', split='test', download=True, transform=transform)
test_svhn_loader = torch.utils.data.DataLoader(test_svhn_data, batch_size=1000, shuffle=True)
test_svhn_features, _ = next(iter(test_svhn_loader))
test_svhn_features = test_svhn_features.numpy()

In [15]:
test_svhn_pca = pca_source.transform(test_svhn_features)


test_pred = clf.predict(test_svhn_pca)

accuracy_score(test_svhn_data.labels[:1000], test_pred)

0.148