### This notebook demonstrates a bug in xgboost (1.2.1 or older) GPU mode: when the input data is `column major`, the model's AUC is significantly worse. Convert the data to `row major` before training.

### Note: this bug is fixed in xgboost 1.3.0, see [#6459](https://github.com/dmlc/xgboost/pull/6459)
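
Because the fix is tied to a version number, a notebook that has to run against several xgboost installs could guard the layout conversion behind a version check. The sketch below is only an illustration: `parse_version` and `needs_row_major` are hypothetical helper names, not part of xgboost, and the parsing assumes the version string starts with a plain `major.minor[.patch]` prefix. Pre-release builds of 1.3.0 are treated as fixed, which is an assumption.

```python
import re

def parse_version(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))

def needs_row_major(version):
    """True for versions older than 1.3.0, which this notebook reports as affected."""
    return parse_version(version) < (1, 3, 0)
```

In this notebook it could be called as `needs_row_major(xgb.__version__)` before deciding whether to convert `X`.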

In [None]:
import xgboost as xgb
import cuml
import cupy

from cuml.metrics import roc_auc_score
from cuml.datasets.classification import make_classification

In [None]:
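def print_version(*names):
    # Look up each module by the name it was imported under and print its version.
    for name in names:
        print(name, globals()[name].__version__)

def print_data_info(*names):
    # Print the type, shape and memory layout of each named global array.
    for name in names:
        data = globals()[name]
        major = 'row major' if data.flags.c_contiguous else 'column major'
        print(name, type(data), data.shape, major)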

Let's check the library versions.

In [None]:
print_version('xgb', 'cuml', 'cupy')

Create some synthetic data for binary classification.

In [None]:
%%time

X, y = make_classification(n_classes=2,
                           n_features=10,
                           n_samples=10000,
                           random_state=0)

print_data_info('X', 'y')

Train a simple xgboost model and print the training AUC.

In [None]:
def train_xgb(X, y):
    params = {'eta': 0.1,
              'max_depth': 3,
              'objective': 'binary:logistic',
              'eval_metric': 'auc',
              'tree_method': 'gpu_hist',
             }

    dtrain = xgb.DMatrix(data=X, label=y)
    bst = xgb.train(params, dtrain=dtrain,
                    num_boost_round=10)

    score = roc_auc_score(y, bst.predict(dtrain))
    print(f"training AUC = {score:.3f}")

In [None]:
train_xgb(X, y)

The AUC is quite poor! Let's check the data layout and change it to row major.

In [None]:
print('Before change:')
print_data_info('X')

print('After change:')
X = cupy.ascontiguousarray(X)
print_data_info('X')
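
To see what the layout flag means without a GPU, here is a small sketch that uses NumPy as a stand-in. That substitution is an assumption made for illustration: the notebook itself works with cupy arrays, which expose the same `flags.c_contiguous` attribute, and `cupy.ascontiguousarray` mirrors `numpy.ascontiguousarray`.

```python
import numpy as np

# Build a small column-major (Fortran-order) array.
a = np.asfortranarray(np.arange(6, dtype=np.float32).reshape(2, 3))
print(a.flags.c_contiguous, a.flags.f_contiguous)

# Copy it into row-major (C-order) memory; values and shape are unchanged.
b = np.ascontiguousarray(a)
print(b.flags.c_contiguous, b.flags.f_contiguous)

# Only the strides (bytes to step along each axis) differ between the two.
print(a.strides, b.strides)
```

The conversion copies the data, so for large arrays it costs extra memory for the duration of the copy.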

In [None]:
train_xgb(X, y)

Voilà! The same model now trains properly on the row-major data. I was astonished when I first found this.