<a id="introduction"></a>
## Introduction to Dimensionality Reduction
#### By Paul Hendricks
-------

In this notebook, we will show how to perform GPU-accelerated dimensionality reduction with RAPIDS.

**Table of Contents**

* [Introduction to Dimensionality Reduction](#introduction)
* [Setup](#setup)
* [Principal Component Analysis](#pca)
* [Truncated SVD](#tvsd)
* [UMAP](#umap)
* [Conclusion](#conclusion)

Before going any further, let's make sure we have access to `matplotlib`, a popular Python library for visualizing data.

In [1]:
import os

try:
    import matplotlib
except ModuleNotFoundError:
    os.system('conda install -y matplotlib')

<a id="setup"></a>
## Setup

This notebook was tested using the following Docker container:

* `rapidsai/rapidsai-nightly:0.8-cuda10.0-devel-ubuntu18.04-gcc7-py3.7` from [DockerHub - rapidsai/rapidsai-nightly](https://hub.docker.com/r/rapidsai/rapidsai-nightly)

This notebook was run on a system with NVIDIA Tesla V100 GPUs. Your system may differ, so you may need to modify the code or install additional packages to run the examples below.

If you think you have found a bug or an error, please file an issue here: https://github.com/rapidsai/notebooks/issues

Before we begin, let's check out our hardware setup by running the `nvidia-smi` command.

In [2]:
!nvidia-smi

Mon Jun 10 02:57:51 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   38C    P0    57W / 300W |    660MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   38C    P0    45W / 300W |     11MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   

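Next, let's check which version of the CUDA toolkit is installed. `nvidia-smi` reports the highest CUDA version the installed driver supports (10.1 above), while `nvcc` reports the version of the toolkit itself: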

In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


Next, let's load some helper functions from `matplotlib` and configure the Jupyter Notebook for visualization.

In [4]:
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt


%matplotlib inline

<a id="pca"></a>
## Principal Component Analysis

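Principal Component Analysis (PCA) is a linear technique that projects data onto the orthogonal directions of greatest variance, called principal components. The example below fits cuML's `PCA` to a small three-column `cudf.DataFrame`, keeps the first two components, prints the fitted attributes, and then projects the data with `transform` and maps it back to the original feature space with `inverse_transform`.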

In [5]:
# Both import paths are supported
from cuml import PCA
from cuml.decomposition import PCA

import cudf
import numpy as np

# Build a small 3x3 float32 DataFrame on the GPU
gdf_float = cudf.DataFrame()
gdf_float['0'] = np.asarray([1.0, 2.0, 5.0], dtype=np.float32)
gdf_float['1'] = np.asarray([4.0, 2.0, 1.0], dtype=np.float32)
gdf_float['2'] = np.asarray([4.0, 2.0, 1.0], dtype=np.float32)

# Fit PCA, keeping the first two principal components
pca_float = PCA(n_components=2)
pca_float.fit(gdf_float)

print(f'components: {pca_float.components_}')
print(f'explained variance: {pca_float.explained_variance_}')
print(f'explained variance ratio: {pca_float.explained_variance_ratio_}')

print(f'singular values: {pca_float.singular_values_}')
print(f'mean: {pca_float.mean_}')
print(f'noise variance: {pca_float.noise_variance_}')

# Project the data onto the principal components
trans_gdf_float = pca_float.transform(gdf_float)
print(f'Transformed: {trans_gdf_float}')

# Map the projection back to the original feature space
input_gdf_float = pca_float.inverse_transform(trans_gdf_float)
print(f'Input: {input_gdf_float}')

components:              0            1           2
0   0.69225764   -0.5102837 -0.51028395
1  -0.72165036  -0.48949987  -0.4895003
explained variance: 0      8.510402
1    0.48959687
dtype: float32
explained variance ratio: 0      0.9456003
1    0.054399658
dtype: float32
singular values: 0    4.1256275
1    0.9895422
dtype: float32
mean: 0    2.6666667
1    2.3333333
2    2.3333333
dtype: float32
noise variance: 0    0.0
dtype: float32
Transformed:               0           1
0    -2.8547091 -0.42891636
1  -0.121316016  0.80743366
2     2.9760244 -0.37851727
Input:            0          1          2
0  1.0000001  3.9999993        4.0
1        2.0  2.0000002  1.9999999
2  4.9999995  1.0000006  1.0000001
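
As a cross-check on the output above, the fitted attributes can be reproduced on the CPU with plain NumPy. The following is a minimal sketch, not cuML's implementation: it assumes PCA is computed by centering each column and taking an SVD of the centered matrix, with explained variance defined as the squared singular values divided by `n_samples - 1`. All variable names are illustrative. The signs of individual components may differ from the cuML output, because singular vectors are only defined up to sign.

```python
import numpy as np

# Same 3x3 toy matrix as the cuML example (rows are samples, columns are features).
X = np.array([[1.0, 4.0, 4.0],
              [2.0, 2.0, 2.0],
              [5.0, 1.0, 1.0]])

n_components = 2

# PCA centers each feature before decomposing.
mean = X.mean(axis=0)
X_centered = X - mean

# Thin SVD of the centered data: X_centered = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt[:n_components]          # principal axes, one per row
singular_values = S[:n_components]

# Assumed definition: variance along each axis uses the n - 1 denominator.
explained_variance = singular_values ** 2 / (X.shape[0] - 1)
total_variance = X_centered.var(axis=0, ddof=1).sum()
explained_variance_ratio = explained_variance / total_variance

# Project onto the principal axes, then map back to feature space.
X_transformed = X_centered @ components.T
X_reconstructed = X_transformed @ components + mean

print("singular values:", singular_values)
print("explained variance:", explained_variance)
print("explained variance ratio:", explained_variance_ratio)
print("reconstructed input:\n", X_reconstructed)
```

Under these assumptions, the singular values, explained variance, and explained variance ratio should agree with the cuML output to roughly float32 precision. The reconstruction recovers the input almost exactly because two of the three columns are identical, so the centered matrix has rank 2 and two components capture all of its variance.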


<a id="tvsd"></a>
## Truncated SVD

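Truncated SVD computes only the largest singular values of a matrix and their singular vectors. Unlike PCA, it does not center the data first. The example below fits cuML's `TruncatedSVD` to the same small `cudf.DataFrame` used for PCA, keeps two components, prints the fitted attributes, and then applies `transform` and `inverse_transform`.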

In [6]:
# Both import paths are supported
from cuml import TruncatedSVD
from cuml.decomposition import TruncatedSVD

import cudf
import numpy as np

# Build a small 3x3 float32 DataFrame on the GPU
gdf_float = cudf.DataFrame()
gdf_float['0'] = np.asarray([1.0, 2.0, 5.0], dtype=np.float32)
gdf_float['1'] = np.asarray([4.0, 2.0, 1.0], dtype=np.float32)
gdf_float['2'] = np.asarray([4.0, 2.0, 1.0], dtype=np.float32)

# Fit a two-component truncated SVD with the Jacobi solver
tsvd_float = TruncatedSVD(n_components=2, algorithm="jacobi", n_iter=20, tol=1e-9)
tsvd_float.fit(gdf_float)

print(f'components: {tsvd_float.components_}')
print(f'explained variance: {tsvd_float.explained_variance_}')
print(f'explained variance ratio: {tsvd_float.explained_variance_ratio_}')
print(f'singular values: {tsvd_float.singular_values_}')

# Project the data onto the retained singular vectors
trans_gdf_float = tsvd_float.transform(gdf_float)
print(f'Transformed matrix: {trans_gdf_float}')

# Map the projection back to the original feature space
input_gdf_float = tsvd_float.inverse_transform(trans_gdf_float)
print(f'Input matrix: {input_gdf_float}')

components:            0            1           2
0  0.5872595   0.57233125   0.5723313
1  0.8093987  -0.41525516  -0.4152552
explained variance: 0    0.49499735
1     5.5050035
dtype: float32
explained variance ratio: 0    0.082499556
1      0.9175005
dtype: float32
singular values: 0    7.4390235
1    4.0817785
dtype: float32
Transformed matrix:           0             1
0   5.16591    -2.5126426
1  3.463844  -0.042223275
2   4.08096     3.2164836
Input matrix:            0          1           2
0  0.9999999  3.9999995   3.9999998
1  1.9999996  1.9999996   1.9999999
2        5.0  0.9999995  0.99999964
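
The truncated SVD output above can be cross-checked the same way with NumPy. This is a minimal sketch rather than cuML's Jacobi solver: it assumes the components are the leading right singular vectors of the raw, uncentered matrix, and that explained variance is the population variance (`ddof=0`) of each transformed column. Variable names are illustrative, and component signs may differ from the cuML output.

```python
import numpy as np

# Same 3x3 toy matrix as the cuML example (rows are samples, columns are features).
X = np.array([[1.0, 4.0, 4.0],
              [2.0, 2.0, 2.0],
              [5.0, 1.0, 1.0]])

n_components = 2

# Truncated SVD decomposes the raw matrix: no mean subtraction.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

components = Vt[:n_components]          # leading right singular vectors
singular_values = S[:n_components]      # sorted in descending order

# Project onto the retained singular vectors.
X_transformed = X @ components.T

# Assumed definition: population variance of each transformed column,
# relative to the total population variance of the input features.
explained_variance = X_transformed.var(axis=0)
explained_variance_ratio = explained_variance / X.var(axis=0).sum()

# Map back to feature space (no mean to add, since nothing was centered).
X_reconstructed = X_transformed @ components

print("singular values:", singular_values)
print("explained variance:", explained_variance)
print("explained variance ratio:", explained_variance_ratio)
print("reconstructed input:\n", X_reconstructed)
```

Components are ordered by singular value, not by explained variance. That is why the first component explains less variance than the second in both this sketch and the cuML output: with uncentered data, the leading singular vector largely captures the mean of the data rather than its spread.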


<a id="umap"></a>
## UMAP

To be edited.

<a id="conclusion"></a>
## Conclusion

In this notebook, we showed how to perform GPU-accelerated dimensionality reduction with RAPIDS.

To learn more about RAPIDS, be sure to check out: 

* [Open Source Website](http://rapids.ai)
* [GitHub](https://github.com/rapidsai/)
* [Press Release](https://nvidianews.nvidia.com/news/nvidia-introduces-rapids-open-source-gpu-acceleration-platform-for-large-scale-data-analytics-and-machine-learning)
* [NVIDIA Blog](https://blogs.nvidia.com/blog/2018/10/10/rapids-data-science-open-source-community/)
* [Developer Blog](https://devblogs.nvidia.com/gpu-accelerated-analytics-rapids/)
* [NVIDIA Data Science Webpage](https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/)
