# Truncated Singular Value Decomposition (TSVD) Multi-Node Multi-GPU (MNMG) Demo

The TSVD algorithm is a linear dimensionality reduction algorithm that works well for datasets in which samples are correlated in large groups. Unlike PCA, TSVD does not center the data before computation.
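
To make the centering difference concrete, here is a minimal CPU-only sketch using NumPy rather than cuML. The synthetic data, the `truncated_svd` helper, and the variable names are assumptions made for illustration and are not part of the cuML API. It suggests that on data with a large mean the first uncentered component mostly tracks the mean direction, while the same factorization applied to centered data recovers the PCA components.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with a large non-zero mean, so centering matters.
X = rng.normal(loc=5.0, scale=1.0, size=(200, 6))


def truncated_svd(A, k):
    """Hypothetical helper: top-k singular values and right singular vectors of A."""
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    return s[:k], vt[:k]


k = 2
s_raw, v_raw = truncated_svd(X, k)                              # TSVD: no centering
s_centered, v_centered = truncated_svd(X - X.mean(axis=0), k)   # PCA-style: centered

# Reference PCA components from the eigen-decomposition of the covariance matrix.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1][:k]
pca_components = eigvecs[:, order].T

# Cosine between the first uncentered component and the direction of the data mean.
mean_dir = X.mean(axis=0) / np.linalg.norm(X.mean(axis=0))
alignment_raw = abs(v_raw[0] @ mean_dir)

print("first TSVD component vs. mean direction:", alignment_raw)
print("centered SVD matches PCA (up to sign):",
      np.allclose(np.abs(v_centered), np.abs(pca_components), atol=1e-6))
```

If the data is already centered, the two methods coincide up to the sign of each component.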

Unlike the single-GPU implementation, the MNMG TSVD API currently requires a Dask cuDF DataFrame as input, and `transform()` also returns a Dask cuDF DataFrame. The Dask cuDF DataFrame API is very similar to the Dask DataFrame API, but the underlying DataFrames are cuDF rather than Pandas.

For information on converting your dataset to Dask cuDF format: https://rapidsai.github.io/projects/cudf/en/latest/dask-cudf.html

For more information about cuML's TSVD implementation: https://rapidsai.github.io/projects/cuml/en/0.6.0/api.html#truncated-svd

In [30]:
import os
import numpy as np

import pandas as pd
import cudf as gd

from cuml.dask.common import to_dask_df
from cuml.dask.datasets import make_blobs

from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster

from dask_ml.decomposition import TruncatedSVD as skTSVD
from cuml.dask.decomposition import TruncatedSVD as cumlTSVD

## Start Dask Cluster

We can use `LocalCUDACluster` to start a Dask cluster on a single machine with one worker mapped to each GPU. This is called one-process-per-GPU (OPG).

In [80]:
cluster = LocalCUDACluster(threads_per_worker=1)
client = Client(cluster)

Port 8787 is already in use. 
Perhaps you already have a cluster running?
Hosting the diagnostics dashboard on a random port instead.


## Define Parameters

In [81]:
n_samples = 2**15
n_features = 128

n_components = 4
random_state = 42

## Generate Data

### GPU

In [82]:
%%time
X_cudf, _ = make_blobs(n_samples, 
                       n_features, 
                       centers=1, 
                       cluster_std=1.0,
                       random_state=random_state)

CPU times: user 640 ms, sys: 69 ms, total: 709 ms
Wall time: 2.72 s


### Host

In [83]:
wait(X_cudf)

X_host = to_dask_df(X_cudf).to_dask_array(lengths=True)

## Scikit-learn Model (Dask-ML)

In [84]:
%%time
tsvd_sk = skTSVD(n_components=n_components,
                 algorithm="tsqr", 
                 random_state=random_state)

result_sk = tsvd_sk.fit_transform(X_host)

CPU times: user 134 ms, sys: 43.3 ms, total: 177 ms
Wall time: 1.03 s


## cuML Model

In [85]:
%%time
tsvd_cuml = cumlTSVD(n_components=n_components,
                     algorithm="full", 
                     n_iter=5000,
                     tol=0.00001,
                     random_state=random_state)

result_cuml = tsvd_cuml.fit_transform(X_cudf)

CPU times: user 360 ms, sys: 46.5 ms, total: 406 ms
Wall time: 1.67 s


## Evaluate Results

### Singular Values

In [86]:
passed = np.allclose(tsvd_sk.singular_values_, 
                     tsvd_cuml.singular_values_.to_array(), 
                     atol=1e-1)
print('compare tsvd: cuml vs sklearn singular_values_ {}'.format('equal' if passed else 'NOT equal'))

compare tsvd: cuml vs sklearn singular_values_ equal
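
The check above relies on `np.allclose`, which accepts a pair of values when `|a - b| <= atol + rtol * |b|` (with `rtol` defaulting to `1e-5` and `atol` to `1e-8`). The following sketch uses made-up numbers, not values from this notebook, to show how much slack `atol=1e-1` allows and how a relative error could be reported instead.

```python
import numpy as np

# Made-up "reference" and "candidate" values, for illustration only.
reference = np.array([100.0, 10.0, 1.0])
candidate = reference + 0.05

# Absolute tolerance of 0.1: every element differs by 0.05, so this passes.
loose = np.allclose(candidate, reference, atol=1e-1)

# Default tolerances (rtol=1e-5, atol=1e-8) are far stricter and reject it.
strict = np.allclose(candidate, reference)

# Purely relative tolerance: 0.05 is within 0.1% of 100 but not of 10 or 1.
relative = np.allclose(candidate, reference, rtol=1e-3, atol=0.0)

# Largest relative error across elements, a scale-independent summary.
max_rel_error = np.max(np.abs(candidate - reference) / np.abs(reference))

print(loose, strict, relative, max_rel_error)
```

An absolute tolerance is forgiving for small singular values and strict for large ones, so reporting the relative error alongside the pass/fail flag can be more informative.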


### Components

In [87]:
sk_components = np.abs(tsvd_sk.components_)
cuml_components = np.abs(np.asarray(tsvd_cuml.components_.as_gpu_matrix()))

passed = np.allclose(sk_components, cuml_components, atol=1e-1)
print('compare tsvd: cuml vs sklearn components_ {}'.format('equal' if passed else 'NOT equal'))

compare tsvd: cuml vs sklearn components_ equal
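
The components are compared through `np.abs` because singular vectors are only defined up to sign: negating a left/right singular-vector pair leaves the factorization unchanged, so two correct solvers can return components with opposite signs. The NumPy sketch below illustrates this on random data. The `align_signs` helper is a hypothetical alternative to taking absolute values and is not part of cuML or Dask-ML.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(50, 8))

u, s, vt = np.linalg.svd(A, full_matrices=False)

# Flip the sign of the first left/right singular-vector pair.
u_flipped = u.copy()
vt_flipped = vt.copy()
u_flipped[:, 0] *= -1
vt_flipped[0] *= -1

# Both factorizations reconstruct the same matrix.
same_reconstruction = np.allclose((u * s) @ vt, (u_flipped * s) @ vt_flipped)

# A raw comparison fails, while a comparison of absolute values succeeds.
raw_match = np.allclose(vt, vt_flipped)
abs_match = np.allclose(np.abs(vt), np.abs(vt_flipped))


def align_signs(components):
    """Hypothetical helper: make the largest-magnitude entry of each row positive."""
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[np.arange(components.shape[0]), idx])
    return components * signs[:, None]


aligned_match = np.allclose(align_signs(vt), align_signs(vt_flipped))

print(same_reconstruction, raw_match, abs_match, aligned_match)
```

Aligning signs keeps more information than `np.abs`, since it still distinguishes entries of opposite sign within a single component.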


### Transform

In [88]:
# compare the reduced matrix
# larger error margin due to different algorithms: tsqr vs full
passed = np.allclose(result_sk.compute(), np.asarray(result_cuml.compute().as_gpu_matrix()), atol=1)
print('compare tsvd: cuml vs sklearn transformed results %s' % ('equal' if passed else 'NOT equal'))

compare tsvd: cuml vs sklearn transformed results equal
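
For intuition about what the transformed output contains, the NumPy sketch below (random data, not this notebook's blobs) projects `X` onto the top-`k` right singular vectors and checks that the result equals the left singular vectors scaled by the singular values. It also shows that a sign flip in one component negates the matching output column, which is one reason a comparison between two solvers may need a loose tolerance or a sign-insensitive check. Variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 10))
k = 3

u, s, vt = np.linalg.svd(X, full_matrices=False)
components = vt[:k]

# Transforming is a projection onto the retained components ...
projected = X @ components.T
# ... which equals the first k left singular vectors scaled by the singular values.
scaled_left = u[:, :k] * s[:k]
identity_holds = np.allclose(projected, scaled_left)

# Negating one component negates the corresponding column of the output.
flipped = components.copy()
flipped[1] *= -1
projected_flipped = X @ flipped.T
column_negated = np.allclose(projected_flipped[:, 1], -projected[:, 1])
abs_equal = np.allclose(np.abs(projected_flipped), np.abs(projected))

print(identity_holds, column_negated, abs_equal)
```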
