# RAPIDS cuDF

!!! tip "Run Jupyter Notebook"
    You can run the code for this section in this [Jupyter Notebook](https://github.com/ritchieng/deep-learning-wizard/blob/master/docs/machine_learning/gpu/rapids_cudf.ipynb) on Google Colab. Simply copy the notebook into your Google Drive and run it with Google Colab.

## Environment Setup

### Check Version

#### Python Version

In [1]:
# Check Python Version
!python --version

Python 3.6.7


#### Ubuntu Version

In [2]:
# Check Ubuntu Version
!lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.2 LTS
Release:	18.04
Codename:	bionic


#### Check CUDA Version

In [3]:
# Check CUDA version (nvcc reports the CUDA toolkit release only)
!nvcc -V && which nvcc

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
/usr/local/cuda/bin/nvcc


#### Check GPU Version

In [4]:
# Check GPU
!nvidia-smi

Wed Apr 24 07:41:30 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   75C    P8    18W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

### Installation of cuDF/cuML

In [5]:
!python -m pip install cudf-cuda100==0.6.1 cuml-cuda100==0.6.1



### Installation of NVIDIA Toolkit and Numba

In [6]:
# Install CUDA toolkit
!apt install -y --no-install-recommends -q nvidia-cuda-toolkit
!pip install numba

import os
os.environ['NUMBAPRO_LIBDEVICE'] = "/usr/lib/nvidia-cuda-toolkit/libdevice"
os.environ['NUMBAPRO_NVVM'] = "/usr/lib/x86_64-linux-gnu/libnvvm.so"

Reading package lists...
Building dependency tree...
Reading state information...
nvidia-cuda-toolkit is already the newest version (9.1.85-3ubuntu1).
The following package was automatically installed and is no longer required:
  libnvidia-common-410
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.


## Critical Imports

In [0]:
# Critical imports
import os
import numpy as np
import pandas as pd
import cudf

## DataFrame Operations

### Create a single column dataframe of integers

In [8]:
df = cudf.Series([1, 2, 3, 4, 5, 6])
print(df)
print(type(df))

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
<class 'cudf.dataframe.series.Series'>


### Create a single column dataframe of floats

In [9]:
df = cudf.Series([1., 2., 3., 4., 5., 6.])
print(df)

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
dtype: float64


### Create three column dataframe of dates, integers and floats

In [10]:
# Import
import datetime as dt

# Create blank cudf dataframe
df = cudf.DataFrame()

# Create 10 business dates from 1st January 2019 via pandas
df['dates'] = pd.date_range('1/1/2019', periods=10, freq='B')

# Integers
df['integers'] = [i for i in range(10)]

# Floats
df['floats'] = [float(i) for i in range(10)]

# Print dataframe
print(df)

                     dates  integers  floats
0 2019-01-01T00:00:00.000         0     0.0
1 2019-01-02T00:00:00.000         1     1.0
2 2019-01-03T00:00:00.000         2     2.0
3 2019-01-04T00:00:00.000         3     3.0
4 2019-01-07T00:00:00.000         4     4.0
5 2019-01-08T00:00:00.000         5     5.0
6 2019-01-09T00:00:00.000         6     6.0
7 2019-01-10T00:00:00.000         7     7.0
8 2019-01-11T00:00:00.000         8     8.0
9 2019-01-14T00:00:00.000         9     9.0
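
The output above jumps from 2019-01-04 to 2019-01-07 because `freq='B'` generates business days only. A minimal pandas-only sketch to confirm this (no GPU needed; the variable names `dates` and `weekdays` are illustrative):

```python
import pandas as pd

# Same call as in the cell above: 10 business days starting 1st January 2019
dates = pd.date_range('1/1/2019', periods=10, freq='B')

# weekday() returns Monday=0 ... Sunday=6, so business days are all below 5
weekdays = [d.weekday() for d in dates]
print(all(day < 5 for day in weekdays))         # True
print(dates[3].date(), '->', dates[4].date())   # 2019-01-04 -> 2019-01-07
```

The weekend of 5th and 6th January is skipped, which is why 10 periods end on 2019-01-14.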


### Create a series of the letters a, b and c (strings)

In [11]:
s = cudf.Series(['a', 'b', 'c'])
print(s)

0    a
1    b
2    c
dtype: object


### Create a 2 Column Dataframe of integers and string category
- For now, all string columns must be converted to type `category` for the filtering functions to work intuitively

In [12]:
# Create pandas dataframe
pandas_df = pd.DataFrame({
    'integers': [1, 2, 3, 4], 
    'strings': ['a', 'b', 'c', 'd']
})

# Convert string column to category format
pandas_df['strings'] = pandas_df['strings'].astype('category')

# Bridge from pandas to cudf
df = cudf.DataFrame.from_pandas(pandas_df)

# Print dataframe
print(df)

   integers  strings
0         1        a
1         2        b
2         3        c
3         4        d
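
When a dataframe has several string columns, the conversion step can be applied to all of them at once before bridging to cuDF. The following is a minimal sketch on the pandas side only; `strings_to_category` is a hypothetical helper name, not part of pandas or cuDF:

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def strings_to_category(frame):
    """Return a copy of `frame` with every string column cast to 'category'.

    Hypothetical helper for illustration; not a pandas or cuDF function.
    """
    out = frame.copy()
    for name in out.columns:
        if is_string_dtype(out[name]):
            out[name] = out[name].astype('category')
    return out


pandas_df = pd.DataFrame({
    'integers': [1, 2, 3, 4],
    'strings': ['a', 'b', 'c', 'd']
})

converted = strings_to_category(pandas_df)
print(converted.dtypes)

# The converted frame can then be bridged exactly as in the cell above:
# df = cudf.DataFrame.from_pandas(converted)
```

The helper returns a copy, so the original pandas dataframe keeps its string column unchanged.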


### Printing Column Names

In [13]:
df.columns

Index(['integers', 'strings'], dtype='object')

### Filtering Integers/Floats by Column Values (Method 1)
- `query` only works for integer and float columns, not for string columns

In [14]:
print(df.query('integers == 1'))

   integers  strings
0         1        a


### Filtering Strings by Column Values (Method 1)

In [15]:
# Expected to fail: query() does not work on the string column (use Method 2 instead)
print(df.query('strings == a'))

KeyError: ignored

### Filtering Strings by Column Values (Method 2)


In [16]:
# Filtering based on the string column
print(df[df.strings == 'b'])

   integers  strings
1         2        b


### Filtering Integers/Floats by Column Values (Method 2)

In [17]:
# Filtering based on the integer column
print(df[df.integers == 2])

   integers  strings
1         2        b
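
Since boolean-mask filtering (Method 2) handles both the category column and the integer column, it can be wrapped in one small function. The sketch below uses a pandas dataframe so it runs without a GPU; `filter_equal` is a hypothetical helper, and the assumption is that a cuDF dataframe accepts the same `frame[frame[column] == value]` indexing, as the two cells above suggest:

```python
import pandas as pd


def filter_equal(frame, column, value):
    """Keep the rows where `column` equals `value` using a boolean mask.

    Hypothetical helper for illustration; it only relies on the
    frame[frame[column] == value] pattern shown in Method 2.
    """
    return frame[frame[column] == value]


pdf = pd.DataFrame({
    'integers': [1, 2, 3, 4],
    'strings': ['a', 'b', 'c', 'd']
})
pdf['strings'] = pdf['strings'].astype('category')

by_string = filter_equal(pdf, 'strings', 'b')
by_integer = filter_equal(pdf, 'integers', 2)
print(by_string)
print(by_integer)
```

Both calls select the same single row (index 1), matching the outputs of the two Method 2 cells.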
