# Random Forests Multi-node, Multi-GPU demo

The experimental cuML multi-node, multi-GPU (MNMG) implementation of random forests leverages Dask to do embarrassingly-parallel model fitting. For a random forest with `N` trees being fit by `W` workers, each worker will build `N / W` trees. During inference, predictions from all `N` trees will be combined.
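
The tree split and the combination step described above can be pictured with a small CPU-only sketch. It is illustrative only and does not call cuML: `trees_per_worker` and `combine_regression` are hypothetical helpers, the handling of a remainder when `N` is not divisible by `W` is an assumption, and averaging over trees is assumed as the combination rule for regression.

```python
import numpy as np


def trees_per_worker(n_estimators, n_workers):
    """Split n_estimators across workers as evenly as possible (assumed policy)."""
    base, extra = divmod(n_estimators, n_workers)
    # The first `extra` workers each take one additional tree.
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def combine_regression(per_worker_tree_preds):
    """Average predictions over every tree from every worker.

    Each list entry has shape (n_trees_on_worker, n_samples).
    """
    all_trees = np.concatenate(per_worker_tree_preds, axis=0)
    return all_trees.mean(axis=0)


counts = trees_per_worker(n_estimators=10, n_workers=4)

# Stand-in per-tree predictions for 5 samples; real trees would produce these.
rng = np.random.default_rng(0)
fake_preds = [rng.normal(size=(k, 5)) for k in counts]
combined = combine_regression(fake_preds)
```

Because every tree contributes equally to the average, a worker that holds more trees has proportionally more influence on the combined prediction.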

The caller is responsible for partitioning the data efficiently via Dask. To build an accurate model, each worker needs a representative chunk of the data. One way to achieve this is to shuffle the data well and then distribute it evenly across the workers. Alternatively, given sufficient memory capacity, the caller can replicate the full dataset on every worker, which most closely simulates building the model on a single GPU.
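
As a rough illustration of the shuffle-then-distribute option, the NumPy sketch below shuffles rows before splitting them into equal chunks. It runs on the CPU and is not part of the cuML or Dask API: `partition_evenly` is a hypothetical helper, and the sorted toy dataset is made up to show why shuffling matters.

```python
import numpy as np


def partition_evenly(X, y, n_partitions, seed=0):
    """Shuffle rows, then split them into near-equal chunks (one per worker)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    # Apply the same permutation to X and y so rows stay aligned.
    X_parts = np.array_split(X[order], n_partitions)
    y_parts = np.array_split(y[order], n_partitions)
    return list(zip(X_parts, y_parts))


# Toy data sorted by target: a worst case for unshuffled partitioning.
y = np.arange(1000, dtype=np.float32)
X = y.reshape(-1, 1)

parts = partition_evenly(X, y, n_partitions=4)
part_sizes = [len(py) for _, py in parts]
part_means = [float(py.mean()) for _, py in parts]
```

Without the shuffle, the first chunk of this sorted dataset would contain only the 250 smallest targets, so the trees built on it would never see the rest of the target range.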

**Note:** cuML 0.9 contains the first, experimental preview release of the MNMG random forest model. The API is subject to change in future releases, and some known limitations remain (listed in the documentation).

For more information on MNMG Random Forest models, see the documentation:
 * https://rapidsai.github.io/projects/cuml/en/stable/api.html#cuml.dask.ensemble.RandomForestClassifier
 * https://rapidsai.github.io/projects/cuml/en/stable/api.html#cuml.dask.ensemble.RandomForestRegressor

In [1]:
import cudf
import cupy as cp
import dask_cudf

import numpy as np
import pandas as pd

from cuml.dask.ensemble import RandomForestClassifier as cuRFC_mg
from cuml.dask.ensemble import RandomForestRegressor as cuRFR_mg
from cuml.dask.common import utils as dask_utils

from sklearn.datasets import make_regression, make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, r2_score
from dask.distributed import Client


def _prep_training_data(c, X_train, y_train, partitions_per_worker):
    """Move host training data to the GPU and persist it across all workers."""
    workers = c.has_what().keys()
    n_partitions = partitions_per_worker * len(workers)

    # Features: host array -> cuDF DataFrame -> partitioned Dask-cuDF DataFrame.
    X_cudf = cudf.DataFrame.from_pandas(pd.DataFrame(X_train))
    X_train_df = dask_cudf.from_cudf(X_cudf, npartitions=n_partitions)

    # Target: take the first column as a 1-D array before moving it to the GPU.
    y_cudf = cudf.Series(pd.DataFrame(y_train).values[:, 0])
    y_train_df = dask_cudf.from_cudf(y_cudf, npartitions=n_partitions)

    # Persist both collections so each worker holds its partitions in memory.
    X_train_df, y_train_df = dask_utils.persist_across_workers(
        c, [X_train_df, y_train_df], workers=workers)
    return X_train_df, y_train_df


In [2]:
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(protocol="tcp", scheduler_port=8786)
c = Client(cluster)
partitions_per_worker = 1

In [None]:
%%time
X, y = make_regression(n_samples=12500000, n_features=20,
                       n_informative=10, random_state=123)

X = X.astype(np.float32)
y = y.astype(np.float32)

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=1000)

cu_rf_params = {
    'n_estimators': 10,
    'max_depth': 16,
    'n_bins': 16,
}

# Partition and persist the training data with the helper defined earlier.
X_train_df, y_train_df = _prep_training_data(c, X_train, y_train,
                                             partitions_per_worker)

# The test set uses the same number of partitions but is not persisted.
workers = c.has_what().keys()
n_partitions = partitions_per_worker * len(workers)
X_cudf_test = cudf.DataFrame.from_pandas(pd.DataFrame(X_test))
X_test_df = dask_cudf.from_cudf(X_cudf_test, npartitions=n_partitions)

cu_rf_mg = cuRFR_mg(**cu_rf_params)
cu_rf_mg.fit(X_train_df, y_train_df)


In [None]:
%%time

cu_rf_mg_predict = cu_rf_mg.predict(X_test_df).compute()
cu_rf_mg_predict = cp.asnumpy(cp.array(cu_rf_mg_predict))

# r2_score expects (y_true, y_pred); the score is not symmetric in its arguments.
r2 = r2_score(y_test, cu_rf_mg_predict)

assert r2 >= 0.67