# Installing and Testing RAPIDS

* [RAPIDS](https://rapids.ai/start.html)

**First, create a conda environment**

```
conda create --name rapidsai python=3.7
conda activate rapidsai
```

**Second, install RAPIDS**

```
conda install -c rapidsai -c nvidia -c conda-forge \
    -c defaults rapids=0.15 python=3.7 cudatoolkit=10.1
```

**Third, link the environment to Jupyter**

```
conda install nb_conda
python -m ipykernel install --user --name rapidsai --display-name "Python 3.7 (rapidsai)"
```


**Fourth, test**

The following code tests a basic RAPIDS environment: it loads the Iris dataset into a cuDF DataFrame and prints the mean of each numeric column.

In [3]:
import cudf
gdf = cudf.read_csv('https://data.heatonresearch.com/data/t81-558/iris.csv')
for column in ['sepal_l', 'sepal_w', 'petal_l', 'petal_w']:
    print(gdf[column].mean())

5.843333333333334
3.0573333333333332
3.7580000000000005
1.1993333333333331
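If the `import cudf` line in the cell above fails, it can help to see which packages the active kernel can actually locate. The following is a minimal sketch using only the standard library. `check_packages` and `RAPIDS_PACKAGES` are hypothetical names introduced here, and the package list is simply the set of imports used in this notebook. The check only looks the packages up; it does not import them or touch the GPU, so a package that is found may still fail to load.

```python
from importlib.util import find_spec

# Assumption: these are the top-level packages imported in this notebook.
RAPIDS_PACKAGES = ["cudf", "dask_cudf", "dask_cuda", "xgboost"]


def check_packages(names):
    """Return {name: bool} saying whether each top-level package can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(RAPIDS_PACKAGES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A package reported as missing usually means the notebook is running on a different kernel than the `rapidsai` one registered in the third step.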


In [2]:
gdf

| Unnamed: 0 | sepal_l | sepal_w | petal_l | petal_w | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |


# XGBoost on RAPIDS/Dask

Based on this article: [A New, Official Dask API for XGBoost](https://medium.com/rapids-ai/a-new-official-dask-api-for-xgboost-e8b10f3d1eb7)

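To get the sample data, download the compressed file and decompress it (the training code reads the uncompressed `HIGGS.csv`):

```
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz
gunzip HIGGS.csv.gz
```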
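The training code that follows assigns 29 column names (one label plus 28 features) to a file with no header row. As a quick sanity check before starting a cluster, the sketch below counts the fields in the first row of the download. `first_row_width` and `EXPECTED_COLUMNS` are hypothetical names introduced here. The helper uses only the standard library and reads either the compressed or the uncompressed file.

```python
import csv
import gzip
import os


def first_row_width(path):
    """Count the comma-separated fields in the first row of a CSV file.

    Paths ending in .gz are read through gzip, so the check works before
    or after decompression.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        first_row = next(csv.reader(handle))
    return len(first_row)


# "label" plus feature-01 .. feature-28, matching the column names used in training.
EXPECTED_COLUMNS = 1 + 28

if __name__ == "__main__":
    # Check whichever form of the file is present in the working directory.
    for candidate in ("HIGGS.csv", "HIGGS.csv.gz"):
        if os.path.exists(candidate):
            width = first_row_width(candidate)
            print(candidate, "has", width, "columns; expected", EXPECTED_COLUMNS)
            break
```

Only the first row is read, so the check stays fast even on a file of this size.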

In [4]:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask import dataframe as dd
import xgboost as xgb
import dask_cudf

def main(client):
    # HIGGS.csv has no header row: column 0 is the label, followed by 28 features.
    fname = 'HIGGS.csv'
    colnames = ["label"] + ["feature-%02d" % i for i in range(1, 29)]
    dask_df = dask_cudf.read_csv(fname, header=None, names=colnames)

    # Split the label column from the feature columns.
    y = dask_df["label"]
    X = dask_df[dask_df.columns.difference(["label"])]
    dtrain = xgb.dask.DaskDMatrix(client, X, y)

    # Train on the GPU; passing dtrain in evals records the training metric each round.
    output = xgb.dask.train(client,
                            {'tree_method': 'gpu_hist'},
                            dtrain,
                            num_boost_round=100,
                            evals=[(dtrain, 'train')])
    booster = output['booster']  # the trained model
    history = output['history']  # per-round evaluation results
    booster.save_model('xgboost-model')
    print('Training evaluation history:', history)



if __name__ == '__main__':
    with LocalCUDACluster(n_workers=1) as cluster:
        with Client(cluster) as client:
            main(client)



Training evaluation history: {'train': {'rmse': [0.47604, 0.462175, 0.453842, 0.447357, 0.443491, 0.439899, 0.43716, 0.435319, 0.433487, 0.432348, 0.431212, 0.430249, 0.429489, 0.428774, 0.428301, 0.427275, 0.426828, 0.426515, 0.426069, 0.425665, 0.425394, 0.425144, 0.424518, 0.424256, 0.424035, 0.423768, 0.423607, 0.423324, 0.423079, 0.42295, 0.42277, 0.422627, 0.42251, 0.422321, 0.4221, 0.421926, 0.421681, 0.421544, 0.421409, 0.421262, 0.421097, 0.420947, 0.420819, 0.420764, 0.420622, 0.420424, 0.420327, 0.42021, 0.420122, 0.420039, 0.419905, 0.419762, 0.419576, 0.419435, 0.419317, 0.419108, 0.418989, 0.418819, 0.418703, 0.418612, 0.41855, 0.418478, 0.41844, 0.418402, 0.418369, 0.418184, 0.418119, 0.417994, 0.417904, 0.417679, 0.417484, 0.417386, 0.417255, 0.41722, 0.417125, 0.417071, 0.417037, 0.416833, 0.416762, 0.416721, 0.416675, 0.416573, 0.416519, 0.416475, 0.41643, 0.416398, 0.416371, 0.416355, 0.416288, 0.416167, 0.416147, 0.41612, 0.416086, 0.416055, 0.41598, 0.415902, 0.415
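The history printed above is a nested dict, keyed first by evaluation set name (`'train'`) and then by metric (`'rmse'`), with one value per boosting round. A long list like this is easier to read once condensed. The sketch below is one possible way to do that. `summarize_history` is a hypothetical helper, not part of XGBoost, and it assumes a lower metric value is better, which holds for RMSE. The sample input is just the first three values from the output above.

```python
def summarize_history(history):
    """Return {eval_set: {metric: (first, last, best_round)}} for a history dict.

    Assumes lower is better (true for rmse); best_round is a 0-based index.
    """
    summary = {}
    for eval_name, metrics in history.items():
        summary[eval_name] = {}
        for metric_name, values in metrics.items():
            best_round = min(range(len(values)), key=values.__getitem__)
            summary[eval_name][metric_name] = (values[0], values[-1], best_round)
    return summary


# First three rmse values from the training output, used as sample input.
sample = {"train": {"rmse": [0.47604, 0.462175, 0.453842]}}
print(summarize_history(sample))
# prints {'train': {'rmse': (0.47604, 0.453842, 2)}}
```

In the notebook, the same function could be called on the `history` value inside `main`, for example right before the model is saved.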

In [None]:
!ls HIGGS.csv