<img src="https://developer.nvidia.com/sites/default/files/pictures/2018/rapids/rapids-logo.png"/>

[RAPIDS](https://rapids.ai) is an open-source suite of GPU-accelerated data science and machine learning libraries, developed and maintained by [NVIDIA](https://www.nvidia.com). It is designed to be compatible with many existing CPU tools, such as pandas, scikit-learn, and NumPy. It can **massively** accelerate many data science and machine learning tasks, oftentimes by a factor of 100X or more. If you are interested in installing and running RAPIDS locally on your own machine, [refer to the following instructions](https://rapids.ai/start.html).

In [None]:
import cudf
import cuml
import cupy as cp
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import glob
import os
from scipy.interpolate import interp1d
import gc
from cuml.linear_model import Ridge
from cuml.neighbors import KNeighborsRegressor
from cuml.svm import SVR
from cuml.ensemble import RandomForestRegressor
from cuml.preprocessing.TargetEncoder import TargetEncoder
from sklearn.model_selection import GroupKFold, KFold
from cuml.metrics import mean_squared_error

from tqdm.notebook import tqdm  # progress bar for the cross-validation loops

In [None]:
train = cudf.read_csv("/kaggle/input/tabular-playground-series-feb-2021/train.csv")
test = cudf.read_csv("/kaggle/input/tabular-playground-series-feb-2021/test.csv")
sample_submission = cudf.read_csv("/kaggle/input/tabular-playground-series-feb-2021/sample_submission.csv")

In [None]:
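target = train['target'].values  # regression target as a GPU array
columns = test.columns[1:]  # feature columns: everything after the first (row id) column
cat_features = columns[:10]  # the first ten feature columns are the categorical ones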
cat_features

In [None]:
train.head()

In [None]:
test.head()

In [None]:
rr_train_oof = cp.zeros((len(train),))  # one out-of-fold prediction per training row
rr_test_preds = 0
rr_train_oof.shape

In this notebook we'll deal with categorical features using target encoding. For the sake of consistency, target encoding needs to be applied within the cross-validation loop; otherwise we would leak target information into the out-of-fold rows, which can lead to serious overfitting.
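
To make the leakage argument concrete, here is a minimal NumPy sketch of fold-wise mean target encoding. It is only an illustration under simplifying assumptions: plain per-category means with no smoothing, toy data made up for the example, and hypothetical helper names (`fit_target_means`, `apply_target_means`). It is not how cuML's `TargetEncoder` is implemented.

```python
import numpy as np

def fit_target_means(categories, targets):
    """Map each category seen in the given rows to its mean target."""
    means = {}
    for cat in np.unique(categories):
        means[str(cat)] = float(targets[categories == cat].mean())
    return means

def apply_target_means(categories, means, default):
    """Encode categories with fitted means; unseen categories get `default`."""
    return np.array([means.get(str(cat), default) for cat in categories])

# Toy data, made up for illustration: one categorical column and a target.
cats = np.array(["a", "a", "b", "b", "a", "b"])
y = np.array([1.0, 3.0, 10.0, 14.0, 5.0, 12.0])

train_idx = np.array([0, 1, 2, 3])  # rows used to fit the encoding
val_idx = np.array([4, 5])          # held-out rows of this fold

# Correct: fit on the training rows of the fold only, then transform validation.
means = fit_target_means(cats[train_idx], y[train_idx])
global_mean = float(y[train_idx].mean())
val_encoded = apply_target_means(cats[val_idx], means, global_mean)

# Leaky: fitting on all rows lets the validation targets enter the means.
leaky_means = fit_target_means(cats, y)
```

With the fold-wise fit, the validation rows are encoded as 2.0 and 12.0. With the leaky fit, category `a` becomes 3.0, because the validation row's own target of 5.0 has been averaged into its feature.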

We'll start with a simple Ridge regression. It is one of the simplest ML algorithms, and it generally gives us a good idea of what the baseline score for our problem would be.
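
For intuition about what the `alpha` parameter does, ridge regression has a closed-form solution, w = (XᵀX + αI)⁻¹Xᵀy. The sketch below implements that formula in NumPy on made-up data. It assumes no intercept term and uses a hypothetical helper name (`ridge_fit`); cuML's `Ridge` has its own solvers and defaults, so treat this as an illustration of the idea rather than of the library.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights without an intercept: (X^T X + alpha*I)^-1 X^T y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)

# Toy data, made up for illustration; y is exactly 1*x0 + 2*x1.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

w_unregularized = ridge_fit(X, y, alpha=0.0)   # recovers the exact weights
w_regularized = ridge_fit(X, y, alpha=100.0)   # weights shrunk towards zero
```

A larger `alpha` shrinks the weights towards zero, trading some fit on the training data for stability.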

In [None]:
NUM_FOLDS = 10
kf = KFold(n_splits=NUM_FOLDS, shuffle=True, random_state=0)

for f, (train_ind, val_ind) in tqdm(enumerate(kf.split(train, target)), total=NUM_FOLDS):
    train_df, val_df = train.iloc[train_ind][columns], train.iloc[val_ind][columns]
    train_target, val_target = target[train_ind], target[val_ind]
    test_df = test.copy()

    # Fit the encoder on this fold's training rows only, then apply it to the
    # validation and test rows, so that no validation targets leak in.
    for cat_col in cat_features:
        te = TargetEncoder()
        train_df[cat_col] = te.fit_transform(train_df[cat_col], train_target)
        val_df[cat_col] = te.transform(val_df[cat_col])
        test_df[cat_col] = te.transform(test_df[cat_col])

    model = Ridge(alpha=0.1)
    model.fit(train_df, train_target)
    temp_oof = model.predict(val_df)
    temp_test = model.predict(test_df[columns])

    rr_train_oof[val_ind] = temp_oof
    rr_test_preds += temp_test / NUM_FOLDS  # average test predictions over the folds

    # Validation RMSE for this fold
    print(mean_squared_error(val_target, temp_oof, squared=False))

In [None]:
mean_squared_error(rr_train_oof, target, squared=False)

In [None]:
val_df.head()

In [None]:
cp.save('rr_train_oof', rr_train_oof)
cp.save('rr_test_preds', rr_test_preds)

Next, we'll take a look at the K-Nearest Neighbors algorithm.

In [None]:
knn_train_oof = cp.zeros((len(train),))  # one out-of-fold prediction per training row
knn_test_preds = 0
knn_train_oof.shape

In [None]:
NUM_FOLDS = 10
kf = KFold(n_splits=NUM_FOLDS, shuffle=True, random_state=0)

for f, (train_ind, val_ind) in tqdm(enumerate(kf.split(train, target)), total=NUM_FOLDS):
    train_df, val_df = train.iloc[train_ind][columns], train.iloc[val_ind][columns]
    train_target, val_target = target[train_ind], target[val_ind]
    test_df = test.copy()

    # Fit the encoder on this fold's training rows only, then apply it to the
    # validation and test rows, so that no validation targets leak in.
    for cat_col in cat_features:
        te = TargetEncoder()
        train_df[cat_col] = te.fit_transform(train_df[cat_col], train_target)
        val_df[cat_col] = te.transform(val_df[cat_col])
        test_df[cat_col] = te.transform(test_df[cat_col])

    model = KNeighborsRegressor(n_neighbors=200)
    model.fit(train_df, train_target)
    temp_oof = model.predict(val_df)
    temp_test = model.predict(test_df[columns])

    knn_train_oof[val_ind] = temp_oof
    knn_test_preds += temp_test / NUM_FOLDS  # average test predictions over the folds

    # Validation RMSE for this fold
    print(mean_squared_error(val_target, temp_oof, squared=False))

In [None]:
cp.save('knn_train_oof', knn_train_oof)
cp.save('knn_test_preds', knn_test_preds)

In [None]:
mean_squared_error(knn_train_oof, target, squared=False)

And now the Support Vector Regressor.

In [None]:
svr_train_oof = cp.zeros((len(train),))  # one out-of-fold prediction per training row
svr_test_preds = 0
svr_train_oof.shape

In [None]:
NUM_FOLDS = 10
kf = KFold(n_splits=NUM_FOLDS, shuffle=True, random_state=0)

for f, (train_ind, val_ind) in tqdm(enumerate(kf.split(train, target)), total=NUM_FOLDS):
    train_df, val_df = train.iloc[train_ind][columns], train.iloc[val_ind][columns]
    train_target, val_target = target[train_ind], target[val_ind]
    test_df = test.copy()

    # Fit the encoder on this fold's training rows only, then apply it to the
    # validation and test rows, so that no validation targets leak in.
    for cat_col in cat_features:
        te = TargetEncoder()
        train_df[cat_col] = te.fit_transform(train_df[cat_col], train_target)
        val_df[cat_col] = te.transform(val_df[cat_col])
        test_df[cat_col] = te.transform(test_df[cat_col])

    model = SVR(C=1)
    model.fit(train_df, train_target)
    temp_oof = model.predict(val_df)
    temp_test = model.predict(test_df[columns])

    svr_train_oof[val_ind] = temp_oof
    svr_test_preds += temp_test / NUM_FOLDS  # average test predictions over the folds

    # Validation RMSE for this fold
    print(mean_squared_error(val_target, temp_oof, squared=False))

In [None]:
mean_squared_error(svr_train_oof, target, squared=False)

In [None]:
cp.save('svr_train_oof', svr_train_oof)
cp.save('svr_test_preds', svr_test_preds)

In [None]:
# Out-of-fold RMSE of a weighted blend of the three models
mean_squared_error(0.8*rr_train_oof + 0.1*knn_train_oof + 0.1*svr_train_oof, target, squared=False)
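
The cell above scores one fixed set of blend weights on the out-of-fold predictions. As a rough sketch of how a few candidate weightings could be compared in the same way, the NumPy example below uses made-up predictions from two hypothetical models and hypothetical helper names (`rmse`, `blend`). Weights picked this way are tuned on the out-of-fold predictions themselves, so the resulting score may be slightly optimistic.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def blend(predictions, weights):
    """Weighted sum of prediction arrays; the weights are expected to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("blend weights should sum to 1")
    return sum(w * p for w, p in zip(weights, predictions))

# Toy data, made up for illustration: one model over-predicts by 0.5 and the
# other under-predicts by 0.5, so their errors cancel in an equal blend.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
oof_a = y_true + 0.5
oof_b = y_true - 0.5

# Score each candidate weighting on the out-of-fold predictions.
candidates = [(1.0, 0.0), (0.8, 0.2), (0.5, 0.5)]
scores = {w: rmse(y_true, blend([oof_a, oof_b], w)) for w in candidates}
best_weights = min(scores, key=scores.get)
```

Blending helps most when the models make different kinds of errors, which is the reason for combining a linear model with neighbor-based and kernel-based ones.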

In [None]:
# Apply the same blend weights to the fold-averaged test predictions
sample_submission['target'] = 0.8*rr_test_preds + 0.1*knn_test_preds + 0.1*svr_test_preds
sample_submission.to_csv('submission.csv', index=False)