## **Setting up the Ray cluster**

**Let's first log in to the OpenShift cluster and switch to the project**

In [None]:
! oc login --token=your-token --server=your-cluster

In [None]:
! oc project default

**We will import the CodeFlare pieces from codeflare-sdk**

In [None]:
from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration

### **Request aggregated resources using CodeFlare**

**`cluster.up()` will create an AppWrapper custom resource that requests aggregated resources and, once they are available, creates a Ray cluster with a Ray head and one Ray worker node (each represented by a pod), as set by `min_worker=1`/`max_worker=1` below. If resources are not available, the AppWrapper waits in a queue and the Ray cluster is deployed as soon as resources free up.**

In [None]:
# Define the cluster (one worker, no GPUs); this generates the AppWrapper, which cluster.up() submits
cluster = Cluster(ClusterConfiguration(name='road-ray', min_worker=1, max_worker=1,
                                       min_cpus=2, max_cpus=2, min_memory=8, max_memory=8, gpu=0))

In [None]:
cluster.up()

In [None]:
cluster.is_ready()

In [None]:
cluster.status()

In [None]:
ray_cluster_uri = cluster.cluster_uri()

**Next, we connect to this cluster so that we can run our code on it.**

In [None]:
# Before proceeding, make sure the cluster exists and the URI is not empty
assert ray_cluster_uri, "Ray cluster needs to be started and set before proceeding"

import ray

# Reset the Ray context in case one already exists
ray.shutdown()

# Install additional libraries required by the training task on the cluster
runtime_env = {"pip": ["scikit-learn"]}

# Establish the connection to the Ray cluster
ray.init(address=ray_cluster_uri, runtime_env=runtime_env)

print("Ray cluster is up and running: ", ray.is_initialized())
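The `assert` in the cell above only checks that the URI is non-empty. A stricter check can catch a malformed address before `ray.init()` fails with a less obvious error. This is a sketch assuming the Ray Client address form `ray://<host>:<port>`; `check_ray_client_uri` is a hypothetical helper and the sample hostname is illustrative only.

```python
from urllib.parse import urlparse


def check_ray_client_uri(uri: str) -> tuple[str, int]:
    """Return (host, port) for a Ray Client URI, or raise ValueError.

    Hypothetical helper; assumes the ray://<host>:<port> address form.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "ray":
        raise ValueError(f"expected a ray:// URI, got {uri!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"URI must include host and port: {uri!r}")
    return parsed.hostname, parsed.port


# Illustrative hostname, not taken from a real cluster
print(check_ray_client_uri("ray://road-ray-head-svc.default.svc:10001"))
```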

## Load Data

In [None]:
# Install the notebook's local dependencies
!pip install -r requirements.txt
import joblib
import pandas as pd

# Load the road roughness dataset and display it
df = pd.read_csv('road_roughness_data.csv')
print(df)

## Features

In [None]:
df.iloc[:,:-1]

## Target variable

In [None]:
df.iloc[:,-1:]
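Before splitting, it is worth checking how the target labels are distributed, since a heavily imbalanced target affects both the split and how meaningful plain accuracy is. The sketch below computes class proportions in pure Python over a made-up label list (`class_proportions` is a hypothetical helper); with the real DataFrame, `df['road_condition'].value_counts(normalize=True)` gives the same information.

```python
from collections import Counter


def class_proportions(labels):
    """Return {label: fraction of samples} for a sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up labels, for illustration only
sample_labels = ["good", "good", "good", "poor"]
print(class_proportions(sample_labels))  # {'good': 0.75, 'poor': 0.25}
```

If the classes turn out to be imbalanced, passing `stratify=y` to `train_test_split` keeps the same proportions in the training and test sets.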

## Split data into Train and Test sets

In [None]:
# Import train_test_split function
from sklearn.model_selection import train_test_split

# Features: every column except the last; target: the road_condition column
X = df.iloc[:, :-1]
y = df['road_condition']

# Split dataset into training set and test set: 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
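With a float `test_size` (and no `train_size`), scikit-learn, to our understanding, sizes the test set as `ceil(test_size * n_samples)` and gives the remaining rows to the training set. The function below reproduces that arithmetic so the 70/30 split can be sanity-checked; `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a float test_size in (0, 1)."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    n_test = math.ceil(test_size * n_samples)  # test share is rounded up
    return n_samples - n_test, n_test


print(split_sizes(100, 0.25))  # (75, 25)
print(split_sizes(10, 0.25))   # (7, 3): 2.5 test samples round up to 3
```

By the same arithmetic, a `test_size` of 0.80 would leave only about 20% of the rows for training, the reverse of a 70/30 split.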

In [None]:
# Create Ray object references
X_train_remote, X_test_remote, y_train_remote, y_test_remote = ray.put(X_train), ray.put(X_test), ray.put(y_train), ray.put(y_test)
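`ray.put()` stores each dataset once in Ray's object store and returns an `ObjectRef`. When a ref is passed as a top-level argument to a remote function, Ray resolves it to the stored value before the function runs, so `train_fn` receives the data itself. The toy model below illustrates only that resolution step in plain Python; `ToyObjectStore`, `ToyRef`, and `call_with_refs` are hypothetical stand-ins, not Ray's API.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a Ray ObjectRef: just an id."""
    object_id: int


class ToyObjectStore:
    """Hypothetical in-memory store mimicking put/get by reference."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count()

    def put(self, value):
        ref = ToyRef(next(self._ids))
        self._objects[ref.object_id] = value
        return ref

    def get(self, ref):
        return self._objects[ref.object_id]


def call_with_refs(store, fn, *args):
    """Resolve top-level ToyRef arguments to their values, then call fn."""
    resolved = [store.get(a) if isinstance(a, ToyRef) else a for a in args]
    return fn(*resolved)


store = ToyObjectStore()
data_ref = store.put([1, 2, 3])
print(call_with_refs(store, sum, data_ref))  # 6
```

As in the toy, real Ray resolves only top-level ref arguments; refs nested inside a list or dict are passed through as refs.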

## Fit a Random Forest classifier on the training set and run predictions on the test set

In [None]:
@ray.remote
def train_fn(X_train, y_train, X_test):
    # Import inside the task so it resolves on the Ray worker
    from sklearn.ensemble import RandomForestClassifier

    # Create a random forest classifier with 100 trees
    clf = RandomForestClassifier(n_estimators=100, verbose=1)

    # Train the model using the training set
    clf.fit(X_train, y_train)

    # Run prediction on the test data and return it with the fitted model
    y_pred = clf.predict(X_test)
    return y_pred, clf
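`RandomForestClassifier` is an ensemble of decision trees. scikit-learn's documentation describes it as averaging the trees' predicted class probabilities and choosing the most probable class, rather than taking a hard majority vote. The pure-Python sketch below shows only that aggregation step, using made-up per-tree probabilities; `soft_vote` is a hypothetical helper.

```python
def soft_vote(tree_probas, classes):
    """Average per-tree class-probability rows and return the top class.

    tree_probas: one list per tree, each aligned with `classes`.
    """
    n_trees = len(tree_probas)
    mean = [sum(p[i] for p in tree_probas) / n_trees for i in range(len(classes))]
    best = max(range(len(classes)), key=lambda i: mean[i])
    return classes[best], mean


# Made-up probabilities from three trees over two classes
classes = ["good", "poor"]
tree_probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
label, mean = soft_vote(tree_probas, classes)
# A hard majority vote would pick "poor" (2 of 3 trees), but averaging picks "good"
print(label)  # good
```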

In [None]:
y_pred, clf = ray.get(train_fn.remote(X_train_remote, y_train_remote, X_test_remote))

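**Training is done. The remaining steps (accuracy check, saving and reloading the model) run locally on the returned model; the final `cluster.down()` cell deletes the Ray cluster, frees up its resources, and deletes the AppWrapper.**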

## Test model accuracy

In [None]:
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
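`metrics.accuracy_score` is the fraction of test samples whose predicted label equals the true label. A plain-Python equivalent, handy for checking the number by hand, is sketched below with made-up labels (`accuracy` is a hypothetical helper).

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy(["good", "poor", "good", "good"],
               ["good", "good", "good", "good"]))  # 0.75
```

For a classifier, `score(X_test, y_test)` also returns mean accuracy, so the score printed after reloading the model should match this value.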

## Save model

In [None]:
# save the model to disk
filename = 'road-model.joblib'
joblib.dump(clf, filename)

## Load the saved model and test its predictions

In [None]:
# load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, y_test)
print(result)
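A stricter check than comparing scores is to confirm that the reloaded model makes exactly the same predictions as the in-memory one. The sketch below shows that round-trip check with the standard library's `pickle` and a hypothetical threshold "model" stored as a dict, so it runs without scikit-learn; with the notebook's objects the comparison would be `(clf.predict(X_test) == loaded_model.predict(X_test)).all()`. As with joblib, only load model files from sources you trust, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def predict(model, xs):
    """Hypothetical stand-in model: label values above the threshold 'poor'."""
    return ["poor" if x > model["threshold"] else "good" for x in xs]


model = {"threshold": 0.5}  # hypothetical model state
samples = [0.1, 0.7, 0.5, 0.9]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)      # save to disk
    with open(path, "rb") as f:
        reloaded = pickle.load(f)  # load it back

same = predict(model, samples) == predict(reloaded, samples)
print(same)  # True
```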

In [None]:
cluster.down()
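If a cell fails between `cluster.up()` and `cluster.down()`, the AppWrapper and its pods keep holding resources. One way to guarantee teardown in a script is a context manager that calls `down()` in a `finally` block. This is a hypothetical helper (`managed_cluster` is not part of codeflare-sdk) that only assumes an object with `up()` and `down()` methods; the demo uses a fake cluster so it runs anywhere.

```python
from contextlib import contextmanager


@contextmanager
def managed_cluster(cluster):
    """Bring `cluster` up, yield it, and always tear it down afterwards."""
    cluster.up()
    try:
        yield cluster
    finally:
        # Runs on normal exit and when the body raises
        cluster.down()


class FakeCluster:
    """Stand-in that records calls instead of touching OpenShift."""

    def __init__(self):
        self.calls = []

    def up(self):
        self.calls.append("up")

    def down(self):
        self.calls.append("down")


fake = FakeCluster()
try:
    with managed_cluster(fake):
        raise RuntimeError("training failed")
except RuntimeError:
    pass
print(fake.calls)  # ['up', 'down']
```

Note that if `up()` itself raises, this sketch does not call `down()`, so a partially created AppWrapper would still need manual cleanup.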

In [None]:
!nvidia-smi