In this notebook we'll do dimensionality reduction and visualization of the FFT features that were first used in this competition in [Giba's notebook](https://www.kaggle.com/titericz/0-309-baseline-logisticregression-using-fft). I've created a stand-alone notebook that extracts those features; it can be found [here](https://www.kaggle.com/tunguz/giba-s-fft-features-only).

We will build this visualization notebook with the RAPIDS library. [RAPIDS](https://rapids.ai) is an open-source, GPU-accelerated data science and machine learning library, developed and maintained by [Nvidia](https://www.nvidia.com). It is designed to be compatible with many existing CPU tools, such as Pandas, scikit-learn, and NumPy. It enables **massive** acceleration of many data science and machine learning tasks, oftentimes by a factor of 100X or even more.

RAPIDS is still under development, and only recently has it become possible to use it natively in the Kaggle Docker environment. If you are interested in installing and running RAPIDS locally on your own machine, [refer to the following instructions](https://rapids.ai/start.html).

In [None]:
import cupy as cp
import cudf, cuml
import pandas as pd
import numpy as np
from cuml.manifold import TSNE, UMAP
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
train = cp.load("../input/giba-s-fft-features-only/TRAIN.npy")
test = cp.load("../input/giba-s-fft-features-only/TEST.npy")
TRAIN_TAB = cudf.read_csv("../input/giba-s-fft-features-only/TRAIN_TAB.csv")

In [None]:
TRAIN_TAB.head()

In [None]:
TARGET_VALUES = TRAIN_TAB.iloc[:,2:].values
TARGET_VALUES = cp.asnumpy(TARGET_VALUES)

In [None]:
TARGET_VALUES[:,0]

First, we are going to combine the train and test sets to visualize the overall shape of the reduced data.

In [None]:
train_test = cp.vstack([train, test])

In [None]:
%%time
tsne = TSNE(n_components=2)
train_test_2D = tsne.fit_transform(train_test)

Well, that only took a few seconds!

In order to visualize the reduced dataset, we'll need to convert it into a NumPy array, as matplotlib cannot plot arrays that live in GPU memory.
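
As a minimal sketch of that conversion step, the helper below returns a host (NumPy) array whether it is given a NumPy or a CuPy array. The name `to_host` is hypothetical and not part of this notebook or of RAPIDS; it relies only on the fact that CuPy arrays expose a `.get()` method that copies data to the host, while NumPy arrays do not.

```python
import numpy as np


def to_host(arr):
    """Return a NumPy array for either a NumPy or a CuPy input (hypothetical helper)."""
    # CuPy arrays expose .get(), which copies device memory to the host.
    if hasattr(arr, "get"):
        return arr.get()
    # Anything else is assumed to already live in host memory.
    return np.asarray(arr)


# Small CPU-only demonstration; on a GPU the input could be a CuPy array instead.
points = to_host(np.arange(6, dtype=np.float32).reshape(3, 2))
```

In this notebook the same effect is achieved with `cp.asnumpy`, as in the next cell.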

In [None]:
train_test_2D = cp.asnumpy(train_test_2D)

Now let's take a look at the data.

In [None]:
plt.scatter(train_test_2D[:,0], train_test_2D[:,1], s = 0.5)


There are some hints of structure, but nothing too dramatic. Maybe this is not surprising; after all, the dataset is supposed to represent 24 different sound categories.

Now let's see what the dataset looks like with UMAP dimensionality reduction.

In [None]:
%%time
umap = UMAP(n_components=2)
train_test_2D = umap.fit_transform(train_test)

That was even faster! 

Let's see what this dimensionality reduction looks like.

In [None]:
train_test_2D = cp.asnumpy(train_test_2D)

In [None]:
plt.scatter(train_test_2D[:,0], train_test_2D[:,1], s = 0.5)


There seems to be more structure with this reduction, but still nothing dramatic, at least not at this scale. UMAP usually produces more outliers, which tend to squeeze the bulk of the datapoints into a small fraction of the plot.
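
One way to keep a handful of outliers from dominating the plot is to set the axis limits from percentiles of the embedding rather than from its full range. The sketch below is an illustration only: `robust_limits` and its parameters are hypothetical names, the 1st/99th percentile cutoffs are arbitrary choices, and the demo data is synthetic rather than taken from the competition.

```python
import numpy as np


def robust_limits(points, lower=1.0, upper=99.0, pad=0.05):
    """Axis limits covering the central bulk of a 2D point cloud (hypothetical helper)."""
    # Per-column percentiles ignore the most extreme points on each side.
    lo = np.percentile(points, lower, axis=0)
    hi = np.percentile(points, upper, axis=0)
    # Add a small margin so points near the cutoff are not drawn on the frame.
    margin = (hi - lo) * pad
    xlim = (lo[0] - margin[0], hi[0] + margin[0])
    ylim = (lo[1] - margin[1], hi[1] + margin[1])
    return xlim, ylim


# Synthetic demo: a Gaussian blob plus five far-away outliers.
rng = np.random.default_rng(0)
demo = rng.normal(size=(1000, 2))
demo[:5] += 100.0
xlim, ylim = robust_limits(demo)
```

The limits could then be applied after a scatter plot with `plt.xlim(*xlim)` and `plt.ylim(*ylim)`, using the NumPy version of the embedding as input.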

We'll now compute the dimensionality reductions for the train set only, and take a look at how the points are distributed with respect to the target.

In [None]:
%%time
tsne = TSNE(n_components=2)
umap = UMAP(n_components=2)
train_2D_tsne = tsne.fit_transform(train)
train_2D_umap = umap.fit_transform(train)

In [None]:
train_2D_tsne = cp.asnumpy(train_2D_tsne)
train_2D_umap = cp.asnumpy(train_2D_umap)

In [None]:
y = TARGET_VALUES[:,0] #cp.asnumpy(TRAIN_TAB['s0'].values)

In [None]:
plt.scatter(train_2D_tsne[:,0], train_2D_tsne[:,1], c = y, s = 0.5)


It's very hard to see, but all of the yellow dots seem to be concentrated in the upper middle area.
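
A possible way to make the rare class easier to spot is to reorder the points so that the higher-valued labels are drawn last and therefore end up on top. The sketch below assumes the labels are numeric with the rare class having the larger value, which is an assumption about this dataset; `positives_last` is a hypothetical helper name.

```python
import numpy as np


def positives_last(labels):
    """Indices that sort points so larger label values come last (hypothetical helper)."""
    # A stable sort keeps the original relative order within each label value.
    return np.argsort(labels, kind="stable")


# Toy labels standing in for one target column.
labels = np.array([0, 1, 0, 0, 1])
order = positives_last(labels)
```

With the arrays from this notebook, the reordered plot would be something like `plt.scatter(train_2D_tsne[order, 0], train_2D_tsne[order, 1], c=y[order], s=0.5)` with `order = positives_last(y)`.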

In [None]:
plt.scatter(train_2D_umap[:,0], train_2D_umap[:,1], c = y, s = 0.5)


In [None]:
fig, axs = plt.subplots(2, 2)
for i in range(2):
    for j in range(2):
        axs[i,j].scatter(train_2D_tsne[:,0], train_2D_tsne[:,1], c = TARGET_VALUES[:,i*2+j], s = 1.0)


In [None]:
fig, axs = plt.subplots(2, 2)
for i in range(2):
    for j in range(2):
        axs[i,j].scatter(train_2D_umap[:,0], train_2D_umap[:,1], c = TARGET_VALUES[:,i*2+j], s = 1.0)