# Rapids UMAP on Colab

# <img src="http://reconstrue.com/assets/images/reconstrue_logo_brandmark.svg" width="42px" align="top" /> **Reconstrue**

## Legal

This code is licensed by Reconstrue under the Apache 2.0 License.

Reconstrue's work on this started from notebooks in the Rapids repo [notebooks-contrib](https://github.com/rapidsai/notebooks-contrib), which is licensed under the [Apache License 2.0](https://github.com/rapidsai/notebooks-contrib/blob/branch-0.11/LICENSE). The following two files were used:
- [umap_demo.ipynb](https://github.com/rapidsai/notebooks-contrib/blob/branch-0.11/colab_notebooks/cuml/umap_demo.ipynb)
- [09_Introduction_to_Dimensionality_Reduction.ipynb](https://github.com/rapidsai/notebooks-contrib/blob/branch-0.11/getting_started_notebooks/intro_tutorials/09_Introduction_to_Dimensionality_Reduction.ipynb)

## Introduction

This notebook exercises Rapids' GPU-accelerated UMAP implementation (cuML) on Colab. It is only a proof-of-concept test drive: nothing here is specific to single-cell data, and no large datasets are used.


## Set up

The main tests can be a bit slow, so here are some switches to control which tests run.

In [0]:
#@title Pre-run config switches

# Stock UMAP on the MNIST dataset costs about 10 minutes
run_stock_umap = False #@param {type:"boolean"}


run_rapids_umap = True #@param {type:"boolean"}


In [0]:
def is_gpu_rapids_friendly():
  import sys, os 

  sys.path.append('/usr/local/lib/python3.6/site-packages/')
  os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
  os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

  import pynvml

  pynvml.nvmlInit()
  handle = pynvml.nvmlDeviceGetHandleByIndex(0)
  device_name = pynvml.nvmlDeviceGetName(handle)

  answer = False
  if (device_name == b'Tesla T4') or (device_name == b'Tesla P100-PCIE-16GB'):
    answer = True
  return answer


In [0]:
import os
import requests
import IPython 

def get_audio_file(filename, url):
  bells_dest_dir = "/content/tmp/"
  dest_filename = os.path.join(bells_dest_dir, filename)
  if not os.path.exists(bells_dest_dir):
    os.mkdir(bells_dest_dir)
  response = requests.get(url)
  with open(dest_filename, 'wb') as f:
    f.write(response.content)
  return dest_filename

jingle_bell_filename = get_audio_file("jingle_bell.mp3", "https://www.soundjay.com/misc/bell-ringing-05.mp3")

## Pre-installed umap-learn

The stock UMAP implementation, package umap-learn, comes preinstalled on Colab.

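This version is already fast because its core routines are JIT-compiled by Numba. Numba compiles them for the CPU (umap-learn does not offload work to a GPU or TPU), so the differences between runtime types in the timings below most likely reflect the host VM that each runtime type comes with rather than the accelerator itself. The TPU runtime was no faster than the GPU runtime.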

| Wall-clock time (h:mm:ss) | Colab runtime type | Date |
|--|--|--|
| 0:14:20 | CPU | 2019-12-06 |
| 0:06:19 | K80 | 2019-12-06 |
| 0:10:52 | K80 | 2019-12-08 |
| **0:07:11** | TPU | 2019-12-06 |

These tests were repeated a few times. Times were all within about 15 seconds of each other.
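
Repeated timings like these can be collected with a small helper. The following is only a minimal sketch: `time_runs` and the `workload` stand-in are hypothetical names that are not part of this notebook, and in practice the callable would be something like `exercise_stock_umap`.

```python
import datetime
import time

def time_runs(func, repeats=3):
    """Call func() `repeats` times; return each wall-clock duration as a timedelta."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        durations.append(datetime.timedelta(seconds=elapsed))
    return durations

def workload():
    # Hypothetical stand-in for a slow call such as exercise_stock_umap()
    sum(i * i for i in range(100_000))

durations = time_runs(workload, repeats=3)
spread = max(durations) - min(durations)
print("Runs:", [str(d) for d in durations])
print("Spread between fastest and slowest run:", spread)
```

Reporting the spread alongside the individual runs makes a claim such as "within about 15 seconds of each other" easy to check.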

In [0]:
## Default Colab install of UMAP
!pip show umap-learn

### MNIST by umap-learn

Via [UMAP on the MNIST Digits dataset](https://umap-learn.readthedocs.io/en/latest/auto_examples/plot_mnist_example.html#sphx-glr-auto-examples-plot-mnist-example-py)

MNIST data loaded by [fetch_openml](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html).

In [0]:
import umap
from sklearn.datasets import fetch_openml
import matplotlib.pyplot as plt
import seaborn as sns
import time
import datetime

sns.set(context="notebook", style="dark")

def exercise_stock_umap():
  start_time = time.time()
  print("Start time:  %s\n" % datetime.datetime.now())

  mnist = fetch_openml('mnist_784', version=1)

  reducer = umap.UMAP(random_state=42, verbose=True)
  embedding = reducer.fit_transform(mnist.data)

  fig, ax = plt.subplots(figsize=(12, 10))
  color = mnist.target.astype(int)
  plt.scatter(
    embedding[:, 0], embedding[:, 1], c=color, cmap="Spectral", s=0.1
  )
  plt.setp(ax, xticks=[], yticks=[])
  plt.title("MNIST by stock UMAP", fontsize=18)

  plt.show()

  print("Run time: %s" % datetime.timedelta(seconds=round(time.time()-start_time)))

if run_stock_umap:
  exercise_stock_umap()

## Rapids UMAP

### Detect GPU

On Colab, you have to explicitly request a GPU. As of late 2019, four GPU models have been seen on Colab.

These work with Rapids:
- Tesla P100
- Tesla T4

These do not:
- Tesla P4
- Tesla K80

Check to make sure a compatible GPU has been allocated (after, of course, being requested).
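
The compatibility lists above can be turned into a simple name check. This is a hedged sketch: `RAPIDS_FRIENDLY` and `is_rapids_friendly_name` are hypothetical names, and matching by substring is an assumption made because the driver reports longer names such as `Tesla P100-PCIE-16GB`.

```python
# Model names come from the lists above. Substring matching is an assumption,
# since reported device names can carry suffixes (e.g. "Tesla P100-PCIE-16GB").
RAPIDS_FRIENDLY = ("Tesla T4", "Tesla P100")

def is_rapids_friendly_name(device_name):
    """Return True if the reported GPU name matches a known-compatible model."""
    if isinstance(device_name, bytes):  # pynvml may hand back bytes
        device_name = device_name.decode("utf-8")
    return any(model in device_name for model in RAPIDS_FRIENDLY)

for name in (b"Tesla T4", "Tesla P100-PCIE-16GB", "Tesla K80", "Tesla P4"):
    print(name, "->", is_rapids_friendly_name(name))
```

Keeping the check separate from the pynvml calls means it can be exercised without a GPU attached.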

In [0]:
!nvidia-smi

### Detect TPU

I doubt Rapids works with TPU. TPUs are Google hardware; Rapids is Nvidia software.

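Stock UMAP uses Numba to JIT-compile, but Numba has no TPU target, so any speed-up seen on a TPU runtime most likely comes from the host VM rather than the TPU. [Test results show the TPU runtime is faster than the CPU runtime but roughly the same speed as the GPU runtime. More testing needed.]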

Note that [Colab can deploy Tensorflow 1 or 2](https://colab.research.google.com/notebooks/tensorflow_version.ipynb#scrollTo=NeWVBhf1VxlH) and we want 2.

**TODO:** This would be a good snippet

In [0]:
# Colab has two flavors of TensorFlow: TF 1.x & 2.x. Be explicit to suppress a warning. 
try:
  # %tensorflow_version is a Colab-only thing 
  %tensorflow_version 2.x
except Exception:
  print("TensorFlow 2.x does not seem to be available")

from tensorflow.python.client import device_lib

# Alternatively, similar in JSON
#device_lib.list_local_devices()

In [0]:
import os
import pprint
import tensorflow as tf

if 'COLAB_TPU_ADDR' not in os.environ:
  print('ERROR: Not connected to a TPU runtime; Request a TPU runtime in the Runtime menu')
else:
  tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print ('TPU address is', tpu_address)

  # tf.Session is not in the top-level namespace in TF 2.x; use the compat.v1 alias
  with tf.compat.v1.Session(tpu_address) as session:
    devices = session.list_devices()
    
  print('TPU devices:')
  pprint.pprint(devices)

If `nvidia-smi` complains that it failed to communicate with the NVIDIA driver, no GPU is attached. A GPU can be requested in the `Runtime` menu via 'Change runtime type'.
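
The same check can be made from Python so that later cells can branch on it. This is a minimal sketch, assuming that a missing or failing `nvidia-smi` means no usable GPU; `gpu_visible` is a hypothetical helper name.

```python
import shutil
import subprocess

def gpu_visible():
    """Return True if nvidia-smi is on the PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA tooling installed at all
    try:
        result = subprocess.run(
            [exe, "--list-gpus"],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A non-zero exit code covers the "failed to communicate with driver" case
    return result.returncode == 0

print("GPU visible:", gpu_visible())
```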

### cuML's UMAP docs

UMAP is a dimensionality reduction algorithm which performs non-linear dimension reduction. 
- It can also be used for visualization of the dataset. 

The UMAP model implemented in cuml allows the user to set the following parameter values:
1.	`n_neighbors`: number of neighboring samples used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved (default = 15)
2.	`n_components`: the dimension of the space to embed into (default = 2)
3.	`n_epochs`: number of training epochs to be used in optimizing the low dimensional embedding (default = None)
4.	`learning_rate`: initial learning rate for the embedding optimization (default = 1.0)
5.	`init`: how to initialize the low dimensional embedding: 'spectral' uses a spectral embedding of the fuzzy 1-skeleton; 'random' assigns initial embedding positions at random (default = 'spectral')
6.	`min_dist`: the minimum distance that should be present between embedded points (default = 0.1)
7.	`spread`: the effective scale of the embedded points; together with `min_dist`, it determines how clustered the embedded points will be (default = 1.0)
8.	`set_op_mix_ratio`: the ratio of pure fuzzy union to intersection. A value of 1.0 gives a pure fuzzy union and a value of 0.0 gives a pure fuzzy intersection (default = 1.0)
9.	`local_connectivity`: number of nearest neighbors that should be assumed to be connected at a local level. It should be no more than the local intrinsic dimension of the manifold (default = 1)
10.	`repulsion_strength`: weighting applied to negative samples in low dimensional embedding optimization. Values greater than 1 give more weight to negative samples (default = 1.0)
11.	`negative_sample_rate`: the rate at which the negative samples should be selected per positive sample during the optimization process (default = 5)
12.	`transform_queue_size`: when embedding new points using a trained model, this controls how aggressively to search for nearest neighbors (default = 4.0)
13.	`verbose`: bool (default False)

The cuml implementation of the UMAP model has the following methods that one can run:
1.	`fit`: it fits the dataset into an embedded space
2.	`fit_transform`: it fits the dataset into an embedded space and returns the transformed output
3.	`transform`: it transforms the dataset into an existing embedded space and returns the low dimensional output

The model accepts only numpy arrays or cudf dataframes as the input. 
- In order to convert your dataset to cudf format please read the cudf [documentation](https://rapidsai.github.io/projects/cudf/en/latest/) 
- For additional information on the UMAP model please refer to the cuml [UMAP documentation](https://rapidsai.github.io/projects/cuml/en/0.6.0/api.html#cuml.UMAP) 
- This setup may take a few minutes
- Long output (output display removed)
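
The documented defaults above can be kept in one place and overridden per experiment. The following is a hedged sketch: `UMAP_DEFAULTS` and `umap_params` are hypothetical names, and the values are simply copied from the parameter list above rather than read from cuml itself.

```python
# Defaults copied from the parameter list above (not queried from cuml).
UMAP_DEFAULTS = {
    "n_neighbors": 15,
    "n_components": 2,
    "n_epochs": None,
    "learning_rate": 1.0,
    "init": "spectral",
    "min_dist": 0.1,
    "spread": 1.0,
    "set_op_mix_ratio": 1.0,
    "local_connectivity": 1,
    "repulsion_strength": 1.0,
    "negative_sample_rate": 5,
    "transform_queue_size": 4.0,
    "verbose": False,
}

def umap_params(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(UMAP_DEFAULTS)
    if unknown:
        raise ValueError("Unknown UMAP parameter(s): %s" % sorted(unknown))
    params = dict(UMAP_DEFAULTS)
    params.update(overrides)
    return params

params = umap_params(n_neighbors=10, min_dist=0.01, init="random")
print(params)
```

With RAPIDS installed, the resulting dictionary could presumably be passed on as `UMAP(**params)`. Rejecting unknown names catches typos such as `n_neighbours` before a long run starts.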

### Install take 1

Rapids says to download [rapids-colab.sh](https://raw.githubusercontent.com/rapidsai/notebooks-contrib/master/utils/rapids-colab.sh) and run it through bash. That script asks the user what version they want. We want `0.11`. So shortcircuit the asking and just say `0.11`.



This installs Miniconda (the minimal installer for conda), then uses conda to install Dask and the RAPIDS packages, so it is a heavy install.

**TODO:** There's got to be a way to check if Rapids is already installed, and bypass the install if so (to avoid reinstalls after inactivity disconnects).

Dask comes pre-installed on Colab, but rapids-colab.sh uninstalls it and then reinstalls it (presumably for version compatibility).
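
One possible way to address the TODO above is to ask Python whether the packages can already be located. This is only a sketch under assumptions: `rapids_already_installed` is a hypothetical helper, and it only works if the conda `site-packages` directory is already on `sys.path`.

```python
import importlib.util

def rapids_already_installed(packages=("cudf", "cuml")):
    """Return True if every named top-level package can be located for import."""
    return all(importlib.util.find_spec(name) is not None for name in packages)

if rapids_already_installed():
    print("RAPIDS packages found; the install cells can be skipped")
else:
    print("RAPIDS packages not found; run the install cells")
```

`find_spec` only locates a package without importing it, so the check is cheap even for heavy libraries.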


In [0]:
# This is env-check.py, which fails from a %%bash cell so it is just dropped into a cell here:
import sys, os 

sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
device_name = pynvml.nvmlDeviceGetName(handle)

if (device_name == b'Tesla T4') or (device_name == b'Tesla P100-PCIE-16GB'):
  print('***********************************************************************')
  print('Woo! Your instance has the right kind of GPU, a '+ str(device_name)[1:]+'!')
  print('***********************************************************************')
  print()
else:
  raise Exception("""
    Unfortunately Colab didn't give you a T4 or P100 GPU.
    
    Make sure you've configured Colab to request a GPU instance type.
    
    If you get a K80 GPU, try Runtime -> Reset all runtimes...
  """)

In [0]:
%%bash
# Tell the script we want 0.11
RAPIDS_VERSION=0.11

if [ ! -f Miniconda3-4.5.4-Linux-x86_64.sh ]; then
    echo "Removing conflicting packages, will replace with RAPIDS compatible versions"
    # remove existing xgboost and dask installs
    pip uninstall -y xgboost dask distributed

    # install miniconda
    echo "Installing conda"
    wget https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
    chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
    bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local
    
    if [ $RAPIDS_VERSION == 0.11 ] ;then
        echo "Installing RAPIDS $RAPIDS_VERSION packages from the nightly release channel"
        echo "Please standby, this will take a few minutes..."
        # install RAPIDS packages
        conda install -y --prefix /usr/local \
                -c rapidsai-nightly/label/xgboost -c rapidsai-nightly -c nvidia -c conda-forge \
                python=3.6 cudatoolkit=10.1 \
                cudf=$RAPIDS_VERSION cuml cugraph gcsfs pynvml cuspatial \
                dask-cudf \
                xgboost
        # check to make sure that pyarrow is running the right version (0.15) for v0.11 or later
        wget -nc https://github.com/rapidsai/notebooks-contrib/raw/master/utils/update_pyarrow.py

    else
        echo "Installing RAPIDS $RAPIDS_VERSION packages from the stable release channel"
        echo "Please standby, this will take a few minutes..."
        # install RAPIDS packages
        conda install -y --prefix /usr/local \
            -c rapidsai/label/xgboost -c rapidsai -c nvidia -c conda-forge \
            python=3.6 cudatoolkit=10.1 \
            cudf=$RAPIDS_VERSION cuml cugraph cuspatial gcsfs pynvml \
            dask-cudf \
            xgboost
    fi
      
    echo "Copying shared object files to /usr/lib"
    # copy .so files to /usr/lib, where Colab's Python looks for libs
    cp /usr/local/lib/libcudf.so /usr/lib/libcudf.so
    cp /usr/local/lib/librmm.so /usr/lib/librmm.so
    cp /usr/local/lib/libnccl.so /usr/lib/libnccl.so
fi

echo ""
echo "*********************************************"
echo "Your Google Colab instance is RAPIDS ready!"
echo "*********************************************"

In [0]:
# TODO: Kill
# This was before Rapids 0.11
#!wget -nc https://github.com/rapidsai/notebooks-extended/raw/master/utils/rapids-colab.sh
#!bash rapids-colab.sh
#
#import sys, os
#
#sys.path.append('/usr/local/lib/python3.6/site-packages/')
#os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
#os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

In [0]:
#!RAPID_VERSION=0.11
#!wget -nc https://raw.githubusercontent.com/rapidsai/notebooks-contrib/master/utils/rapids-colab.sh
#!bash rapids-colab.sh
#
#import sys, os
#dist_package_index = sys.path.index('/usr/local/lib/python3.6/dist-packages')
#sys.path = sys.path[:dist_package_index] + ['/usr/local/lib/python3.6/site-packages'] + sys.path[dist_package_index:]
#sys.path
#if os.path.exists('update_pyarrow.py'): ## Only exists if RAPIDS version is 0.11 or higher
#  exec(open('update_pyarrow.py').read(), globals())

In [0]:
!echo RAPID_VERSION=$RAPIDS_VERSION

### Install Take 2


In [0]:
# Install RAPIDS
!wget -nc https://raw.githubusercontent.com/rapidsai/notebooks-contrib/890b04ed8687da6e3a100c81f449ff6f7b559956/utils/rapids-colab.sh
!bash rapids-colab.sh

import sys, os

dist_package_index = sys.path.index("/usr/local/lib/python3.6/dist-packages")
sys.path = sys.path[:dist_package_index] + ["/usr/local/lib/python3.6/site-packages"] + sys.path[dist_package_index:]
sys.path
if os.path.exists('update_pyarrow.py'): ## This file only exists if you're using RAPIDS version 0.11 or higher
  exec(open("update_pyarrow.py").read(), globals())

### Install Take 3

In [0]:
!nvidia-smi --list-gpus
print('Is current GPU Rapids friendly? %r' % is_gpu_rapids_friendly())

In [0]:
!ls

In [0]:
!rm env-check.py
!rm Miniconda3-4.5.4-Linux-x86_64.sh

The install process calls for env-check.py to be run. The following does that:

In [0]:
# This is env-check.py, which needs to run in a Python cell, not a %%bash cell, or else pynvml is not found
import sys, os 

sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
device_name = pynvml.nvmlDeviceGetName(handle)

print('Detected GPU: %s' % device_name.decode("utf-8"))

if (device_name == b'Tesla T4') or (device_name == b'Tesla P100-PCIE-16GB'):
  print('This Colab VM has a Rapids compatible GPU: ' + device_name.decode('utf-8'))
else:
  print('This Colab VM does NOT have a Rapids compatible GPU (need a Tesla T4 or P100).')


In [0]:
%%bash
#!/bin/bash

set -eu

# fails to find pynvml:
#wget -nc --show-progress https://github.com/rapidsai/notebooks-contrib/raw/master/utils/env-check.py
#echo "Checking for GPU type:"
#python env-check.py

# TODO: staying with 0.10 b/c 0.11 gets complicated, seemingly. Wait until it's final then 0.11
RAPIDS_VERSION=0.10
if [ ! -f Miniconda3-4.5.4-Linux-x86_64.sh ]; then
    # Note: this echo will not show up in cell output for a while. Flushable?
    echo "Removing conflicting packages, will replace with RAPIDS compatible versions"
    # remove existing xgboost and dask installs
    pip uninstall -y xgboost dask distributed

    # install miniconda
    echo "Installing conda"
    wget -nc --show-progress https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
    chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
    bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local
    
    if [ $RAPIDS_VERSION == 0.11 ] ;then
        echo "Installing RAPIDS $RAPIDS_VERSION packages from the nightly release channel"
        echo "Please standby, this will take a few minutes..."
        # install RAPIDS packages
        conda install -y --prefix /usr/local \
                -c rapidsai-nightly/label/xgboost -c rapidsai-nightly -c nvidia -c conda-forge \
                python=3.6 cudatoolkit=10.1 \
                cudf=$RAPIDS_VERSION cuml cugraph gcsfs pynvml cuspatial \
                dask-cudf \
                xgboost
        # check to make sure that pyarrow is running the right version (0.15) for v0.11 or later
        wget -nc https://github.com/rapidsai/notebooks-contrib/raw/master/utils/update_pyarrow.py

    else
        echo "Installing RAPIDS $RAPIDS_VERSION packages from the stable release channel"
        echo "Please standby, this will take a few minutes..."
        # install RAPIDS packages
        conda install -y --prefix /usr/local \
            -c rapidsai/label/xgboost -c rapidsai -c nvidia -c conda-forge \
            python=3.6 cudatoolkit=10.1 \
            cudf=$RAPIDS_VERSION cuml cugraph cuspatial gcsfs pynvml \
            dask-cudf \
            xgboost
    fi
      
    echo "Copying shared object files to /usr/lib"
    # copy .so files to /usr/lib, where Colab's Python looks for libs
    cp /usr/local/lib/libcudf.so /usr/lib/libcudf.so
    cp /usr/local/lib/librmm.so /usr/lib/librmm.so
    cp /usr/local/lib/libnccl.so /usr/lib/libnccl.so
fi

echo ""
echo "This Google Colab instance is RAPIDS ready."


### Install Take 4

This moves [rapids-colab.sh](https://raw.githubusercontent.com/rapidsai/notebooks-contrib/890b04ed8687da6e3a100c81f449ff6f7b559956/utils/rapids-colab.sh) from a bash cell to a Python cell, which shows output progressively as each command runs.

These conda installs are annoyingly [slow](https://github.com/reconstrue/single_cell_on_colab/issues/59).
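
For comparison, progressive output can also be had from plain Python by reading a subprocess line by line. This is a minimal sketch, not what the cell below does (it uses `!` shell escapes); `run_streaming` is a hypothetical helper and the command shown is a stand-in for a real install step.

```python
import subprocess
import sys

def run_streaming(cmd):
    """Run cmd, echoing each output line as it arrives; return (exit code, lines)."""
    lines = []
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True, bufsize=1,  # text mode, line buffered
    )
    for line in proc.stdout:
        line = line.rstrip("\n")
        print(line, flush=True)  # show progress immediately
        lines.append(line)
    proc.stdout.close()
    return proc.wait(), lines

# Stand-in command; a real install step would pass e.g. a conda command list.
code, lines = run_streaming(
    [sys.executable, "-c", "print('step 1'); print('step 2')"]
)
```

Merging stderr into stdout keeps warnings in the same ordered stream as normal progress messages.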

In [0]:
import os

def install_take_4():
  RAPIDS_VERSION='0.10'
   
  # Colab's working directory is /content, which is where wget saves the installer
  if not os.path.exists('/content/Miniconda3-4.5.4-Linux-x86_64.sh'):
    print('Removing conflicting packages, will replace with RAPIDS compatible versions')

    # remove existing xgboost and dask installs
    !pip uninstall -y xgboost dask distributed

    # install miniconda
    print('Installing conda')
    !wget -nc --show-progress https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
    !chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
    !bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local
    
    if RAPIDS_VERSION == '0.11':
      print('Installing RAPIDS %s packages from the nightly release channel' % RAPIDS_VERSION)
      print('Please standby, this will take a few minutes...')
      # install RAPIDS packages
      !conda install -y --prefix /usr/local -c rapidsai-nightly/label/xgboost -c rapidsai-nightly -c nvidia -c conda-forge python=3.6 cudatoolkit=10.1 cudf=$RAPIDS_VERSION cuml cugraph gcsfs pynvml cuspatial dask-cudf xgboost
      # check to make sure that pyarrow is running the right version (0.15) for v0.11 or later
      !wget -nc https://github.com/rapidsai/notebooks-contrib/raw/master/utils/update_pyarrow.py
    else:
      print('Installing RAPIDS %s packages from the stable release channel' % RAPIDS_VERSION)
      print('Please standby, this will take a few minutes...')
      # install RAPIDS packages
      !conda install -y --prefix /usr/local -c rapidsai/label/xgboost -c rapidsai -c nvidia -c conda-forge python=3.6 cudatoolkit=10.1 cudf=$RAPIDS_VERSION cuml cugraph cuspatial gcsfs pynvml dask-cudf xgboost
      
    print('Copying shared object files to /usr/lib')
    # copy .so files to /usr/lib, where Colab's Python looks for libs
    !cp /usr/local/lib/libcudf.so /usr/lib/libcudf.so
    !cp /usr/local/lib/librmm.so /usr/lib/librmm.so
    !cp /usr/local/lib/libnccl.so /usr/lib/libnccl.so


    import sys, os
    dist_package_index = sys.path.index('/usr/local/lib/python3.6/dist-packages')
    sys.path = sys.path[:dist_package_index] + ['/usr/local/lib/python3.6/site-packages'] + sys.path[dist_package_index:]

install_take_4()

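# Audio only renders automatically as the last expression in a cell, so display it explicitly
IPython.display.display(IPython.display.Audio(jingle_bell_filename, autoplay=True))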
print('This Google Colab instance is ready for RAPIDS')


In [0]:
!ls -lh /usr/lib/libcudf.so

In [0]:
# without this cudf not findable
import sys, os
dist_package_index = sys.path.index('/usr/local/lib/python3.6/dist-packages')
sys.path = sys.path[:dist_package_index] + ['/usr/local/lib/python3.6/site-packages'] + sys.path[dist_package_index:]
sys.path
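
The `sys.path` splice above appears in several cells, and `sys.path.index` raises `ValueError` if the `dist-packages` entry is missing. A more defensive version could look like the sketch below; `insert_before` is a hypothetical helper, shown here on a demo list rather than on the real `sys.path`.

```python
def insert_before(paths, anchor, new_entry):
    """Return a copy of paths with new_entry placed just before anchor.

    If new_entry is already present the list is returned unchanged; if anchor
    is missing, new_entry is appended instead of raising ValueError.
    """
    paths = list(paths)
    if new_entry in paths:
        return paths
    try:
        index = paths.index(anchor)
    except ValueError:
        index = len(paths)
    paths.insert(index, new_entry)
    return paths

dist = "/usr/local/lib/python3.6/dist-packages"
site = "/usr/local/lib/python3.6/site-packages"
demo = ["/usr/lib/python3.6", dist]

once = insert_before(demo, dist, site)
twice = insert_before(once, dist, site)  # second call is a no-op
print(once)
```

On Colab the call would presumably be `sys.path = insert_before(sys.path, dist, site)`, which is safe to re-run after a cell is executed twice.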

### Imports

In [0]:
import numpy as np
import pandas as pd

import os

from sklearn import datasets
from sklearn.metrics import adjusted_rand_score
from sklearn.cluster import KMeans
# trustworthiness is exposed from the public sklearn.manifold namespace
from sklearn.manifold import trustworthiness

import cudf
from cuml.manifold.umap import UMAP

### Running cuml's UMAP model on blobs dataset

In [0]:
# create a blobs dataset with 500 samples and 10 features each
data, labels = datasets.make_blobs(
    n_samples=500, n_features=10, centers=5)



In [0]:
# use the cuml UMAP algorithm to reduce the dimensionality of the dataset and store the result in `embedding`
embedding = UMAP().fit_transform(data)


In [0]:

# calculate the score of the results obtained using cuml's algorithm and sklearn kmeans
score = adjusted_rand_score(labels,
            KMeans(5).fit_predict(embedding))
print(score) # should be 1.0 (or very close) when the five blobs are cleanly separated
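
A score of 1.0 is attainable even though KMeans numbers its clusters arbitrarily, because the adjusted Rand index compares which pairs of samples are grouped together, not the label values themselves. The pure-Python sketch below follows the standard pair-counting formula for illustration only; the notebook itself relies on sklearn's `adjusted_rand_score`, and the helper names here are hypothetical.

```python
from collections import Counter

def pairs(count):
    """Number of unordered pairs that can be drawn from `count` items."""
    return count * (count - 1) // 2

def adjusted_rand_index(labels_true, labels_pred):
    total = pairs(len(labels_true))
    if total == 0:
        return 1.0
    # Pairs grouped together in both labelings, in the truth, and in the prediction
    same_both = sum(pairs(c) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(pairs(c) for c in Counter(labels_true).values())
    same_pred = sum(pairs(c) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / total  # agreement expected by chance
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0
    return (same_both - expected) / (maximum - expected)

truth = [0, 0, 1, 1, 2, 2]
relabeled = [2, 2, 0, 0, 1, 1]  # same grouping, different cluster numbers
print(adjusted_rand_index(truth, relabeled))
```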

**Ring a bell at end of Set up**

The above install can take a while, and has to be redone each time the Colab VM is recycled, roughly every 12 hours.

Here's a page with a list of free bell sounds: [Bell Sound Effects](https://www.soundjay.com/bell-sound-effect.html). One of those will do as an alert that the boring prelude is over.

In [0]:
import requests
import IPython

# bell_src_10_sec = "https://www.soundjay.com/misc/bell-ring-01.mp3"
bell_src = "https://www.soundjay.com/misc/bell-ringing-05.mp3"
bell_dest = "/content/bell.mp3"

response = requests.get(bell_src)
response.raise_for_status()  # fail loudly if the download did not succeed
with open(bell_dest, 'wb') as f:
  f.write(response.content)

IPython.display.Audio(bell_dest, autoplay=True)


## Exercise cuml's UMAP

In [0]:
# load the iris dataset from sklearn and extract the required information
iris = datasets.load_iris()
data = iris.data
print(iris.DESCR)

In [0]:
# define the cuml UMAP model and use fit_transform function to obtain the low dimensional output of the input dataset
embedding = UMAP(
    n_neighbors=10, min_dist=0.01,  init="random"
).fit_transform(data)

In [0]:
# calculate the trustworthiness of the results obtained from the cuml UMAP
trust = trustworthiness(iris.data, embedding, 10)
print(trust)

In [0]:
# create a boolean selection mask of size 150 in which each entry is True with probability 0.10 and False with probability 0.90
iris_selection = np.random.choice(
    [True, False], 150, replace=True, p=[0.10, 0.90])
# create an iris dataset using the selection variable
data = iris.data[iris_selection]
print(data)

In [0]:
# create a cuml UMAP model 
fitter = UMAP(n_neighbors=10, min_dist=0.01, verbose=False)
# fit the cuml UMAP model (fitter) on the data picked out by the selection mask
fitter.fit(data)
# build a held-out dataset by inverting the selection mask (i.e. the roughly 90% of rows not used for fitting)
new_data = iris.data[~iris_selection]
# transform the new data using the previously created embedded space
embedding = fitter.transform(new_data)

In [0]:
# calculate the trustworthiness score for the new data created (new_data)
trust = trustworthiness(new_data, embedding, 10)
print(trust)

And that's where the first notebook ended, without any viz :(

In [0]:
print(iris)

In [0]:
print(len(embedding))
#print(embedding)

### Continuing to viz

In [0]:
# TODO: not working yet:

# embedding is the final thing produced in umap_demo.ipynb
import matplotlib.pyplot as plt

colors = ['blue', 'orange', 'green', 'red', 'purple', 
          'brown', 'pink', 'gray', 'olive', 'cyan']
colors = ['tab:' + color for color in colors]

# create figure
figure = plt.figure()
axis = figure.add_subplot(111)

for i in range(3):  # iris has three classes (0, 1, 2)
    # WANT: 
    mask = iris.target[~iris_selection] == i
    #mask = y == i
    print(len(mask))
    axis.scatter(embedding[mask, 0], embedding[mask, 1], 
                 c=colors[i], label=str(i))
    #axis.scatter(embedding[:, 0], embedding[:, 1], 
    #             c=colors[i], label=str(i))
axis.set_title('UMAP by Rapids')

plt.legend()
plt.tight_layout()
plt.show()

In [0]:
print(len(iris.target))

## Stuff from Intro to DR notebook

Repo `rapidsai/notebooks-contrib` has [09_Introduction_to_Dimensionality_Reduction.ipynb](https://github.com/rapidsai/notebooks-contrib/blob/branch-0.11/getting_started_notebooks/intro_tutorials/09_Introduction_to_Dimensionality_Reduction.ipynb), which does have a viz or two.

First notice the distinct imports. In umap_demo.ipynb:
```
from cuml.manifold.umap import UMAP
```
In 09_Introduction_to_Dimensionality_Reduction.ipynb:
```
from cuml import UMAP as UMAP_GPU
```
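
Whether the two import paths resolve to the same class can be checked directly. This is a hedged sketch that checks rather than assumes the answer, and it falls back gracefully when cuml is not importable.

```python
try:
    from cuml import UMAP as UMAP_GPU
    from cuml.manifold.umap import UMAP
except ImportError:
    # RAPIDS is not installed (or not on sys.path) in this environment
    UMAP_GPU = UMAP = None

if UMAP is None:
    print("cuml is not importable here")
else:
    print("Both imports give the same class:", UMAP_GPU is UMAP)
```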

In [0]:
import numpy as np; print('NumPy Version:', np.__version__)
import sklearn; print('Scikit-Learn Version:', sklearn.__version__)
from sklearn.datasets import load_digits


digits = load_digits()
X, y = digits['data'], digits['target']
X, y = X.astype(np.float32), y.astype(np.float32)
print('X: ', X.shape, X.dtype, 'y: ', y.shape, y.dtype)

In [0]:
# create a 4x4 grid of axes (plt.subplots creates the figure itself)
import matplotlib.pyplot as plt
figure, axes = plt.subplots(4, 4, figsize=(10, 10))

i = 0
for row in axes:
    for axis in row:
        axis.imshow(X[i].reshape(8, 8), cmap='gray')
        axis.set_title('Class: ' + str(int(y[i])))
        i += 1
    
plt.tight_layout()
plt.show()


In [0]:
X_df = pd.DataFrame(X)
X_df.columns = ['feature_' + str(i) for i in range(X_df.shape[1])]
X_cudf = cudf.DataFrame.from_pandas(X_df)

In [0]:
from cuml import UMAP as UMAP_GPU
umap_gpu = UMAP_GPU(n_neighbors=10, n_components=2)

In [0]:
components_gpu = umap_gpu.fit_transform(X_cudf).to_pandas().values
components_gpu

In [0]:
import matplotlib.pyplot as plt

colors = ['blue', 'orange', 'green', 'red', 'purple', 
          'brown', 'pink', 'gray', 'olive', 'cyan']
colors = ['tab:' + color for color in colors]

# create figure
figure = plt.figure()
axis = figure.add_subplot(111)

for i in range(10):
    mask = y == i
    axis.scatter(components_gpu[mask, 0], components_gpu[mask, 1], 
                 c=colors[i], label=str(i))
axis.set_title('UMAP (gpu)')

plt.legend()
plt.tight_layout()
plt.show()