Clustered Singular Value Decomposition (cSVD) is a matrix decomposition that factorizes a dataset using its Singular Value Decomposition (SVD) and reconstructs the features with projection matrices computed from clustered subsets of the data. k-means clustering, applied in the low-dimensional space, assigns every instance to one of these subsets.
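The idea described above can be sketched in a few lines of NumPy. This is only one plausible reading of the description, not the package's implementation: the function names (`truncated_basis`, `kmeans_labels`, `clustered_svd_sketch`), the toy data, and the array layout (rows are instances, bases are N x r) are assumptions made for illustration, and a plain Lloyd iteration stands in for a library k-means.

```python
import numpy as np


def truncated_basis(X, r):
    """Leading r right singular vectors of X, returned as an (N, r) matrix."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:r].T


def kmeans_labels(Z, k, n_iter=50, seed=0):
    """Plain Lloyd iteration (hypothetical helper, stand-in for a library k-means)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its points.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels


def clustered_svd_sketch(X, r, k, seed=0):
    """Global SVD, k-means on the reduced data, then one local SVD per cluster."""
    X = np.asarray(X, dtype=float)
    V = truncated_basis(X, r)            # global basis, shape (N, r)
    rho = X @ V                          # reduced data, shape (B, r)
    labels = kmeans_labels(rho, k, seed=seed)
    rec_svd = rho @ V.T                  # reconstruction from the global basis
    rec_csvd = np.zeros_like(X)
    wlist = {}
    for c in np.unique(labels):          # only clusters that received points
        mask = labels == c
        W = truncated_basis(X[mask], r)  # local basis for this cluster
        wlist[int(c)] = W
        rec_csvd[mask] = X[mask] @ W @ W.T
    return V, rho, labels, wlist, rec_svd, rec_csvd


# Toy data: three shifted Gaussian blobs (assumed for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(100, 8)) for m in (-3.0, 0.0, 3.0)])
V, rho, labels, wlist, rec_svd, rec_csvd = clustered_svd_sketch(X, r=2, k=3)
err_svd = np.linalg.norm(X - rec_svd)
err_csvd = np.linalg.norm(X - rec_csvd)
print(err_svd, err_csvd)
```

In this sketch each local basis is the best rank-r basis for its own cluster in the Frobenius norm (Eckart-Young), so the cluster-wise reconstruction error cannot exceed the global one.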
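```python
import csvd
...
cs = csvd.ClusteredSVD(data, n_svd_components=n_svd_components, num_clusters=num_clusters)
V = cs.V        # projection matrix from the SVD of the whole data
W = cs.wlist    # list of per-cluster projection matrices
rec_svd = (V.T @ cs.rho).squeeze(-1)  # reconstruction from the global projection
rec_csvd = cs.rec                     # reconstruction from the cluster-wise projections
...
```
where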
- data: B x N dataset matrix (B: number of instances, N: number of features)
- n_svd_components: dimension of the reduced representation
- num_clusters: number of clusters
- V: projection matrix obtained from the SVD of the whole dataset
- rho: reduced-dimensional data
- labels: cluster labels assigned by k-means
- wlist: list of k projection matrices W, each obtained from the SVD of one cluster's data (k = num_clusters)
Datasets, listed as (#instances, #features), n_components, num_clusters:
- D1. wine dataset: (178, 13), n_components=5, num_clusters=4
- D2. breast cancer dataset: (569, 30), n_components=5, num_clusters=3
- D3. mnist dataset (8x8 digit images): (1797, 64), n_components=20, num_clusters=10
- D4. covertype dataset: (581012, 54), n_components=5, num_clusters=10
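The shapes listed for D1 to D3 match the copies of these datasets bundled with scikit-learn, so one way to obtain comparable inputs is sketched below. That the original experiments used these loaders is an assumption; the source does not say where the data came from, and the `loaders` and `shapes` names are illustrative.

```python
from sklearn.datasets import load_breast_cancer, load_digits, load_wine

# Assumed mapping from the dataset labels above to scikit-learn loaders.
loaders = {"D1": load_wine, "D2": load_breast_cancer, "D3": load_digits}

shapes = {}
for name, loader in loaders.items():
    X, _ = loader(return_X_y=True)  # X is a (#instances, #features) array
    shapes[name] = X.shape
    print(name, X.shape)

# D4 is left out here: sklearn.datasets.fetch_covtype downloads the
# covertype data on first use, which needs network access.
```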
| Method | D1 | D2 | D3 | D4 |
|---|---|---|---|---|
| SVD | 0.1745 | 0.2760 | 1.9888 | 78.5639 |
| cSVD | 0.1705 | 0.1863 | 1.6580 | 73.0463 |
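To compare the two rows more directly, the relative change from SVD to cSVD can be computed from the tabulated values. The snippet below only does that arithmetic; it assumes the numbers are a quantity where lower is better (for example a reconstruction error), which the table itself does not state.

```python
# Values copied from the table above.
svd = {"D1": 0.1745, "D2": 0.2760, "D3": 1.9888, "D4": 78.5639}
csvd_vals = {"D1": 0.1705, "D2": 0.1863, "D3": 1.6580, "D4": 73.0463}

# Relative reduction in percent: 100 * (SVD - cSVD) / SVD.
reduction = {d: 100.0 * (svd[d] - csvd_vals[d]) / svd[d] for d in svd}

for d, pct in reduction.items():
    print(f"{d}: {pct:.1f}% lower with cSVD")
```

On these numbers the reduction is about 2.3% for D1, 32.5% for D2, 16.6% for D3, and 7.0% for D4.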
Yongho Kim and Jan Heiland (2023), Convolutional Autoencoders, Clustering, and POD for Low-dimensional Parametrization of Navier-Stokes Equations