# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4, P4, or P100.
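
If you would rather check the GPU model in code than read the table by eye, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=name` option. `SUPPORTED_GPUS`, `is_supported_gpu`, and `detect_gpu_names` are hypothetical names for this example, and the model list simply mirrors the sentence above rather than the installer's own compatibility check.

```python
import re
import shutil
import subprocess

# GPU models named in the text above. This list is an assumption for the
# sketch; the RAPIDS install script performs its own compatibility check.
SUPPORTED_GPUS = ("T4", "P4", "P100")


def is_supported_gpu(name):
    """Return True if a GPU name such as 'Tesla T4' contains a listed model."""
    # Split on anything that is not a letter or digit so that
    # 'P100-PCIE-16GB' yields the token 'P100' and 'P40' never matches 'P4'.
    tokens = re.split(r"[^A-Z0-9]+", name.upper())
    return any(model in tokens for model in SUPPORTED_GPUS)


def detect_gpu_names():
    """Ask nvidia-smi for GPU names; return an empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = detect_gpu_names()
if not names:
    print("No NVIDIA GPU detected; change the runtime type to GPU.")
for name in names:
    status = "supported" if is_supported_gpu(name) else "not in the list above"
    print(f"{name}: {status}")
```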

In [1]:
!nvidia-smi

Wed Nov 29 16:46:18 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   51C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Setup #
This setup script:

1. Checks that the GPU is RAPIDS-compatible
1. Installs the **current stable version** of the RAPIDS core libraries using pip:
   1. cuDF
   1. cuML
   1. cuGraph
   1. xgboost

**This will complete in about 3-4 minutes**

Please use the [RAPIDS Conda Colab Template notebook](https://colab.research.google.com/drive/1TAAi_szMfWqRfHVfjGSqnGVLr_ztzUM9) if you need to install any of the RAPIDS extended libraries, such as:
- cuSpatial
- cuSignal
- cuxFilter
- cuCIM

OR
- nightly versions of any library


In [2]:
# This gets the RAPIDS-Colab install files and checks your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab instance is not RAPIDS-compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/pip-install.py


Cloning into 'rapidsai-csp-utils'...
remote: Enumerating objects: 395, done.
remote: Counting objects: 100% (126/126), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 395 (delta 92), reused 53 (delta 51), pack-reused 269
Receiving objects: 100% (395/395), 108.50 KiB | 673.00 KiB/s, done.
Resolving deltas: 100% (194/194), done.
Collecting pynvml
  Downloading pynvml-11.5.0-py3-none-any.whl (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.1/53.1 kB 1.8 MB/s eta 0:00:00
Installing collected packages: pynvml
Successfully installed pynvml-11.5.0
***********************************************************************
Woo! Your instance has the right kind of GPU, a Tesla T4!
We will now install RAPIDS cuDF, cuML, and cuGraph via pip! 
Please stand by, should be quick...
***********************************************************************

Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting cudf-cu11
  Downloading https://

# RAPIDS is now installed on Colab.  
You can copy your code into the cells below, or run the cells below as they are to validate your RAPIDS installation and version.  
# Enjoy!

In [3]:
import cudf
cudf.__version__

'23.10.02'

In [4]:
import cuml
cuml.__version__

'23.10.00'

In [5]:
import cugraph
cugraph.__version__

'23.10.00'
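
The three version checks above can also be folded into a single cell. This is a minimal sketch, assuming each library exposes `__version__` as cuDF, cuML, and cuGraph do above. `report_versions` is a hypothetical helper name, not part of RAPIDS.

```python
import importlib


def report_versions(module_names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The library is not installed in this runtime.
            versions[name] = None
            continue
        # Fall back to "unknown" for modules without a __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions(["cudf", "cuml", "cugraph"]).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

A `not installed` line here usually means the setup cell did not finish, so it is worth re-reading that cell's output before going further.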

In [51]:
!pip install optuna



# Next Steps #

For an overview of how you can access and work with your own datasets in Colab, check out [this guide](https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92).

For more RAPIDS examples, check out our RAPIDS notebooks repos:
1. https://github.com/rapidsai/notebooks
2. https://github.com/rapidsai/notebooks-contrib

In [55]:
import cudf
from cuml.ensemble import RandomForestClassifier
from cuml.model_selection import train_test_split
from cuml.metrics import accuracy_score
import optuna

In [18]:
train_df = cudf.read_csv('/content/train.csv')

In [22]:
train_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Sex_female,Sex_male,Embarked_C,Embarked_Q,Embarked_S
0,1,0,3,"Braund, Mr. Owen Harris",22.000000,1,0,A/5 21171,7.2500,,0,1,0,0,1
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.000000,1,0,PC 17599,71.2833,C85,1,0,1,0,0
2,3,1,3,"Heikkinen, Miss. Laina",26.000000,0,0,STON/O2. 3101282,7.9250,,1,0,0,0,1
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.000000,1,0,113803,53.1000,C123,1,0,0,0,1
4,5,0,3,"Allen, Mr. William Henry",35.000000,0,0,373450,8.0500,,0,1,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",27.000000,0,0,211536,13.0000,,0,1,0,0,1
887,888,1,1,"Graham, Miss. Margaret Edith",19.000000,0,0,112053,30.0000,B42,1,0,0,0,1
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",29.699118,1,2,W./C. 6607,23.4500,,1,0,0,0,1
889,890,1,1,"Behr, Mr. Karl Howell",26.000000,0,0,111369,30.0000,C148,0,1,1,0,0


In [20]:
def preprocess_data(df):
    # Handle missing values
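    # Assign the result back instead of calling fillna(inplace=True) on a
    # selected column, which may not update the parent DataFrame
    df['Age'] = df['Age'].fillna(df['Age'].mean())
    df['Fare'] = df['Fare'].fillna(df['Fare'].mean())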

    # Encode categorical features
    df = cudf.get_dummies(df, columns=['Sex', 'Embarked'])

    return df

train_df = preprocess_data(train_df)





In [23]:


# Prepare features and labels
X = train_df.drop(['Survived', 'PassengerId','Ticket','Cabin','Name'], axis=1)
y = train_df['Survived']




In [25]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [26]:
rf_model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = rf_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")



Accuracy: 0.7921348214149475


In [27]:
from sklearn.metrics import classification_report

print("Classification Report:")
print(classification_report(y_test.to_pandas(), y_pred.to_pandas()))

Classification Report:
              precision    recall  f1-score   support

           0       0.86      0.84      0.85       124
           1       0.65      0.69      0.67        54

    accuracy                           0.79       178
   macro avg       0.75      0.76      0.76       178
weighted avg       0.80      0.79      0.79       178
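
To see where the numbers in this report come from, the sketch below recomputes them from confusion counts. The notebook never prints those counts, so the four values here are an inference from the reported support (124 and 54), recall (0.84 and 0.69), and accuracy (0.7921 on 178 rows). `counts` and `class_metrics` are hypothetical names for this example.

```python
# Confusion counts keyed by (true label, predicted label).
# Inferred from the classification report above, not printed by the notebook.
counts = {
    (0, 0): 104,  # true 0, predicted 0
    (0, 1): 20,   # true 0, predicted 1
    (1, 0): 17,   # true 1, predicted 0
    (1, 1): 37,   # true 1, predicted 1
}


def class_metrics(counts, label):
    """Return (precision, recall, f1, support) for one class label."""
    tp = counts[(label, label)]
    fp = sum(n for (true, pred), n in counts.items() if pred == label and true != label)
    fn = sum(n for (true, pred), n in counts.items() if true == label and pred != label)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, tp + fn


# Accuracy is the share of rows on the diagonal of the confusion matrix.
accuracy = sum(n for (true, pred), n in counts.items() if true == pred) / sum(counts.values())

for label in (0, 1):
    p, r, f, support = class_metrics(counts, label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={support}")
print(f"accuracy={accuracy:.2f}")
```

Class 1 (survived) has noticeably lower precision and recall than class 0, which the single accuracy figure hides.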



In [56]:
def objective(trial):
    # Define the search space for hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 200),
        'max_depth': trial.suggest_int('max_depth', 5, 15),
        'max_features': trial.suggest_float('max_features', 0.1, 1.0)
    }

    # Initialize the Random Forest classifier with the sampled hyperparameters
    rf_model = RandomForestClassifier(**params, random_state=42)

    # Train the model on the training set
    rf_model.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = rf_model.predict(X_test)

    # Evaluate the model using accuracy
    accuracy = accuracy_score(y_test, y_pred)

    return accuracy
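
One caveat: the objective above scores every trial on `X_test`, so the best trial is chosen on the same rows that are later used to report accuracy, which tends to make the tuned score optimistic. A common remedy is to hold out a separate validation set for tuning. The sketch below only produces row positions for such a split. `three_way_split` is a hypothetical helper, not part of cuML or Optuna, and the 890-row count is taken from the DataFrame shown earlier.

```python
import random


def three_way_split(n_rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle row positions and cut them into train, validation, and test lists."""
    if not 0 < val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be between 0 and 1")
    positions = list(range(n_rows))
    # A seeded Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(positions)
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test_idx = positions[:n_test]
    val_idx = positions[n_test:n_test + n_val]
    train_idx = positions[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(890)
print(len(train_idx), len(val_idx), len(test_idx))
```

With a split like this, the objective would fit on the training rows and score on the validation rows, for example by selecting rows with `.iloc` (assuming positional indexing behaves as in pandas), and the test rows would be used only for the final report.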

In [66]:
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=500)

# Get the best hyperparameters
best_params = study.best_params
print("Best Hyperparameters:", best_params)

[I 2023-11-29 17:34:54,181] A new study created in memory with name: no-name-1be967a5-bef0-44b2-9b92-64caec3ac549
[I 2023-11-29 17:34:54,502] Trial 0 finished with value: 0.7808988690376282 and parameters: {'n_estimators': 52, 'max_depth': 14, 'max_features': 0.7100994943014544}. Best is trial 0 with value: 0.7808988690376282.
[I 2023-11-29 17:34:54,941] Trial 1 finished with value: 0.7977527976036072 and parameters: {'n_estimators': 139, 'max_depth': 11, 'max_features': 0.8878494151961197}. Best is trial 1 with value: 0.7977527976036072.
[I 2023-11-29 17:34:55,372] Trial 2 finished with value: 0.8089887499809265 and parameters: {'n_estimators': 179, 'max_depth': 9, 'max_features': 0.6300009621706701}. Best is trial 2 with value: 0.8089887499809265.
[I 2023-11-29 17:34:55,564] Trial 3 finished with value: 0.7865168452262878 and parameters: {'n_estimators': 86, 'max_depth': 6, 'max_features': 0.9118929306437669}. Best is trial 2 with

Best Hyperparameters: {'n_estimators': 168, 'max_depth': 9, 'max_features': 0.7205853176767454}


In [67]:
best_rf_model = RandomForestClassifier(**best_params, random_state=42)
best_rf_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_best = best_rf_model.predict(X_test)


In [68]:
print("Classification Report:")
print(classification_report(y_test.to_pandas(), y_pred_best.to_pandas()))

Classification Report:
              precision    recall  f1-score   support

           0       0.88      0.87      0.87       124
           1       0.71      0.72      0.72        54

    accuracy                           0.83       178
   macro avg       0.79      0.80      0.80       178
weighted avg       0.83      0.83      0.83       178

