# Use RAPIDS cuDF and cuML
This notebook shows how to build a regression model that predicts the price of train tickets using a cuDF dataframe and cuML. Instead of running on CPUs, the computations run on GPUs.

Import cuDF and read the file from public storage.

In [None]:
import cudf

spain = cudf.read_csv(
    "https://saturn-public-data.s3.us-east-2.amazonaws.com/examples/dashboard/thegurus_opendata_renfe_trips_filtered.csv"
)

### Feature Engineering
Now we will do some feature engineering. First, we keep only the rows that have a price value, then use `astype` to convert the `departure`, `arrival`, and `insert_date` columns from string to datetime. Next, we use `set_index` to make `insert_date` the dataframe index and derive the new columns `Year`, `Hour`, and `Month` from it. Then we select the predictors and the target variable. So far, the syntax is the same as with a pandas dataframe. The one difference is in converting the categorical columns of `X` to binary indicator columns, where we use the `one_hot_encoding` method.

In [None]:
# Keep only the rows where price is not null (Objective 1)
spain = spain[spain["price"].notnull()]

# Convert the 'departure', 'arrival', and 'insert_date' columns to datetime (Objective 2)
spain["departure"] = spain["departure"].astype("datetime64[ns]")
spain["arrival"] = spain["arrival"].astype("datetime64[ns]")
spain["insert_date"] = spain["insert_date"].astype("datetime64[ns]")

# Set column insert_date as the index (Objective 3)
spain.set_index("insert_date", inplace=True)

# Create new columns 'Year', 'Hour', and 'Month' from the index
spain["Year"] = spain.index.year
spain["Hour"] = spain.index.hour
spain["Month"] = spain.index.month

# Split the data into predictors (X) and target (y)
X = spain[["origin", "destination", "duration", "Hour", "Month", "Year"]]
y = spain["price"]

# One-hot encode 'origin' and 'destination' (Objective 4)
X = X.one_hot_encoding(
    "origin", prefix="origin_", cats=["BARCELONA", "SEVILLA", "VALENCIA", "MADRID", "PONFERRADA"]
)
X = X.one_hot_encoding(
    "destination",
    prefix="destination_",
    cats=["BARCELONA", "SEVILLA", "VALENCIA", "MADRID", "PONFERRADA"],
)
X = X.drop(["destination", "origin"], axis=1)
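
The one-hot step above depends on the `one_hot_encoding` method, which may not be available in every cuDF release. As a rough sketch of the same idea in pandas-style syntax, the block below uses `pd.get_dummies` on the CPU. The toy rows are hypothetical, not taken from the Renfe data. cuDF exposes a similarly named `cudf.get_dummies`, but whether it treats fixed category lists identically is an assumption to check against your installed version.

```python
import pandas as pd  # CPU stand-in for illustration; not cuDF

# Hypothetical toy rows, made up for this example
toy = pd.DataFrame(
    {
        "origin": ["MADRID", "SEVILLA", "MADRID"],
        "destination": ["BARCELONA", "MADRID", "VALENCIA"],
        "duration": [2.5, 2.6, 1.7],
    }
)

cities = ["BARCELONA", "SEVILLA", "VALENCIA", "MADRID", "PONFERRADA"]

# Fixing the category list up front yields one column per city,
# even for cities that do not occur in this sample.
for col in ["origin", "destination"]:
    toy[col] = pd.Categorical(toy[col], categories=cities)

# get_dummies replaces each listed column with its indicator columns
encoded = pd.get_dummies(toy, columns=["origin", "destination"], dtype="float32")
print(encoded.columns.tolist())
```

Declaring the categories explicitly plays the same role as the `cats=` argument in the cell above: the training and test frames end up with the same columns regardless of which cities appear in each.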

### Model Creation with Linear Regression
To perform linear regression, we first import `train_test_split` and `LinearRegression` from cuML. Then we split the data, holding out 20% of the rows as a test set, fit the model on the training set, and predict prices for the test set.


In [None]:
from cuml import train_test_split
from cuml import LinearRegression

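# Hold out 20% of the rows as a test set
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Fit a linear model with an intercept on the training set
lr = LinearRegression(fit_intercept=True, normalize=False)
lr.fit(x_train, y_train)
# Predict ticket prices for the held-out rows
preds = lr.predict(x_test)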
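
The cell above stops at `preds` without checking how close the predictions are to the real prices. Below is a minimal, CPU-only sketch of two common regression scores. The numbers are made up, and the helper names `rmse` and `r2` are hypothetical. To apply it to this notebook's data you would first need to bring `y_test` and `preds` to the host, or use the equivalent functions in `cuml.metrics` if your installed version provides them.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in price units."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical prices and predictions, for illustration only
y_true = [30.0, 50.0, 70.0, 90.0]
y_pred = [32.0, 48.0, 73.0, 87.0]

print(rmse(y_true, y_pred))
print(r2(y_true, y_pred))
```

A lower RMSE and an R² closer to 1 indicate a better fit. Computing both for each model gives a simple basis for comparing plain linear regression against a regularized variant.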

### Model Creation with ElasticNet Regression
Now let us regularize the linear regression model by constraining its weights. I chose ElasticNet, which combines the L1 (lasso) and L2 (ridge) penalties. Notice that, apart from the import, the code is the same as it would be in scikit-learn.

In [None]:
from cuml.linear_model import ElasticNet

# alpha sets the overall penalty strength; l1_ratio sets the mix between L1 and L2
regr = ElasticNet(alpha=0.1, l1_ratio=0.5)
result_enet = regr.fit(x_train, y_train)
preds = result_enet.predict(x_test)
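
For reference, the combined penalty can be written out. This is the objective in scikit-learn's convention, which I am assuming cuML follows for `alpha` and `l1_ratio`; check the cuML documentation for your version. Here $n$ is the number of training rows, $w$ is the weight vector, $\alpha$ is `alpha`, and $\rho$ is `l1_ratio`.

```latex
$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \rho \lVert w \rVert_1
  + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
$$
```

Under that assumption, the settings used above ($\alpha = 0.1$, $\rho = 0.5$) give an L1 coefficient of $0.1 \times 0.5 = 0.05$ and an L2 coefficient of $0.1 \times 0.5 / 2 = 0.025$. Setting $\rho = 1$ would reduce the penalty to pure lasso, and $\rho = 0$ to pure ridge.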