### RAPIDS AI speed test comparison: pandas vs. cuDF
The RAPIDS data science framework includes a collection of libraries for executing end-to-end data science pipelines entirely on the GPU (https://rapids.ai/start.html).
It is designed to have a familiar look and feel for data scientists working in Python. In this notebook we read a CSV file with both pandas and cuDF, the RAPIDS DataFrame library, and compare how long each takes to load the file, concatenate it, and compute a column mean.

This test is partly adapted from the GitHub repository:

https://github.com/bhattbhavesh91/cuDF-RAPIDS-demo

In [42]:
import os
import pandas as pd
import numpy as np
import time
import cudf
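`import cudf` raises `ModuleNotFoundError` on machines without a RAPIDS installation. A minimal sketch, assuming you want the notebook to report which DataFrame libraries are importable before running the benchmarks; the helper name `available_backends` is hypothetical and not part of RAPIDS or pandas. It only checks that a module can be found, so a cuDF import can still fail later if no compatible GPU or driver is present.

```python
import importlib.util

def available_backends(candidates=("pandas", "cudf")):
    """Return the subset of `candidates` that can be found on this machine.

    Hypothetical helper: it uses importlib.util.find_spec, so it does not
    actually import the modules and does not verify that a GPU exists.
    """
    return [name for name in candidates if importlib.util.find_spec(name) is not None]

backends = available_backends()
print("Importable DataFrame libraries:", backends)
if "cudf" not in backends:
    print("cuDF not found: the cuDF cells in this notebook will fail here.")
```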

## Size of the input CSV file

In [43]:
print("Size of file is {} MB".format(os.path.getsize('/data-10tb/saruarlive/dataProject/BratsData/5m Sales Records.csv')/(1000*1000)))

Size of file is 624.001733 MB
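The cell above divides the byte count by 1000 × 1000, so the figure is in decimal megabytes (MB), not mebibytes (MiB, 2^20 bytes) as many operating-system tools report. A small sketch of both conversions; the helper `file_size_mb` is hypothetical, and the byte count 624,001,733 used at the end is simply the value implied by the printed 624.001733 MB.

```python
import os
import tempfile

def file_size_mb(path):
    """Return (decimal megabytes, mebibytes) for the file at `path`."""
    size_bytes = os.path.getsize(path)
    return size_bytes / 1000**2, size_bytes / 1024**2

# Illustrative check with a throwaway 3,000,000-byte temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 3_000_000)
mb, mib = file_size_mb(tmp.name)
os.remove(tmp.name)
print(f"{mb:.3f} MB = {mib:.3f} MiB")  # 3.000 MB = 2.861 MiB

# The same conversion for the byte count implied by the output above
print(f"{624_001_733 / 1024**2:.2f} MiB")  # 595.09 MiB
```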


## Reading time with the pandas library

In [44]:
tpd = time.time()
df_pd = pd.read_csv('/data-10tb/saruarlive/dataProject/BratsData/5m Sales Records.csv')
epd = time.time()
pd_time = epd - tpd
print("Time taken to load with pandas (s) = {}".format(pd_time))

Time taken to load with pandas (s) = 5.1055519580841064
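Every benchmark cell repeats the same `tpd = time.time()` / `epd = time.time()` bookkeeping. A minimal sketch of a reusable timer, assuming Python 3.7+; it uses `time.perf_counter()`, which the `time` module documents as the clock intended for measuring short durations, instead of `time.time()`. The `timed` name and the `results` dict are hypothetical.

```python
import time
from contextlib import contextmanager

results = {}  # label -> elapsed seconds (hypothetical store for later comparison)

@contextmanager
def timed(label):
    """Measure the wall-clock time of the enclosed block and record it under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start
        print(f"{label}: {results[label]:.4f} s")

# Usage with a stand-in workload; in the notebook the body would be, e.g.,
#   df_pd = pd.read_csv(path)
with timed("sum of squares"):
    total = sum(i * i for i in range(100_000))
```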


## Reading time with cuDF

In [45]:
tpd = time.time()
cudf_pd = cudf.read_csv('/data-10tb/saruarlive/dataProject/BratsData/5m Sales Records.csv')
epd = time.time()
cupd_time = epd - tpd
print("Time taken to load with cuDF (s) = {}".format(cupd_time))

Time taken to load with cuDF (s) = 0.2445385456085205
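A single timed run can include one-off costs (for the first cuDF call, for example, CUDA context creation) and is subject to run-to-run noise. A hedged sketch of a fairer measurement: make one untimed warm-up call, then keep the best of several repeats using the standard library's `timeit.repeat`, which also disables garbage collection while timing by default. The `benchmark` helper is hypothetical; in this notebook you might pass `lambda: cudf.read_csv(path)` and `lambda: pd.read_csv(path)`.

```python
import timeit

def benchmark(func, repeat=5, warmup=1):
    """Return the fastest of `repeat` timed calls to `func`, after `warmup` untimed calls."""
    for _ in range(warmup):
        func()  # untimed: absorbs one-off setup costs (caches, lazy init, GPU context)
    # number=1 -> each repeat times exactly one call; the result is a list of seconds
    times = timeit.repeat(func, number=1, repeat=repeat)
    return min(times)

# Stand-in workload; replace with e.g. lambda: cudf.read_csv(path)
best = benchmark(lambda: sorted(range(200_000, 0, -1)), repeat=3)
print(f"best of 3: {best:.4f} s")
```

Taking the minimum rather than the mean reports the run least disturbed by other activity on the machine; report the spread as well if variability matters.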


## Concatenation time comparison (10 copies of the DataFrame)

In [46]:
tpd = time.time()
df_pd_concat = pd.concat([df_pd for _ in range(10)])
epd = time.time()
pd_time = epd - tpd
print("Time taken to concatenate with pandas (s) = {}".format(pd_time))

Time taken to concatenate with pandas (s) = 5.351191759109497


In [48]:
tpd = time.time()
cudf_pd_concat = cudf.concat([cudf_pd for _ in range(10)])
epd = time.time()
cupd_time = epd - tpd
print("Time taken to concatenate with cuDF (s) = {}".format(cupd_time))

Time taken to concatenate with cuDF (s) = 0.46720361709594727


## Mean calculation time comparison

In [52]:
tpd = time.time()
df_pd_mean = df_pd['Unit Price'].mean()
epd = time.time()
pd_time = epd - tpd
print("Time taken to calculate the mean with pandas (s) = {}".format(pd_time))

Time taken to calculate the mean with pandas (s) = 0.013599634170532227


In [53]:
tpd = time.time()
cudf_pd_mean = cudf_pd['Unit Price'].mean()
epd = time.time()
cupd_time = epd - tpd
print("Time taken to calculate the mean with cuDF (s) = {}".format(cupd_time))

Time taken to calculate the mean with cuDF (s) = 0.02292943000793457
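To summarize the three comparisons, the speedup is the pandas time divided by the cuDF time, so values above 1 mean cuDF was faster. The sketch below only recomputes these ratios from the printed timings of this single run; nothing is averaged over repeats.

```python
# Timings (seconds) copied from the outputs of the cells above: (pandas, cuDF)
timings = {
    "read_csv": (5.1055519580841064, 0.2445385456085205),
    "concat x10": (5.351191759109497, 0.46720361709594727),
    "mean of 'Unit Price'": (0.013599634170532227, 0.02292943000793457),
}

# speedup > 1 means cuDF finished faster than pandas
speedups = {op: pd_t / cu_t for op, (pd_t, cu_t) in timings.items()}

print(f"{'operation':<22}{'pandas (s)':>12}{'cuDF (s)':>12}{'speedup':>10}")
for op, (pd_t, cu_t) in timings.items():
    print(f"{op:<22}{pd_t:>12.4f}{cu_t:>12.4f}{speedups[op]:>9.2f}x")
```

In this run cuDF was roughly 20.9x faster at loading and 11.5x faster at concatenating, but slower for the mean (about 0.59x). A plausible explanation is that a single reduction over one column is too little work to offset the GPU's fixed kernel-launch and device-to-host transfer overhead.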
