### 12 · Evaluate the trained model  
Pull the XGBoost `Booster` back from the checkpoint, run predictions on the entire validation set, and compute overall accuracy. Converting the Ray Dataset to pandas keeps the example short; in production you could stream batches instead of materialising the whole frame.


In [None]:
# 12. Retrieve Booster object from Ray Checkpoint
booster = RayTrainReportCallback.get_model(best_ckpt)

# Convert Ray Dataset → pandas for quick local scoring
val_pd = val_ds.to_pandas()
dmatrix = xgb.DMatrix(val_pd[feature_columns])
pred_prob = booster.predict(dmatrix)
pred_labels = np.argmax(pred_prob, axis=1)

acc = accuracy_score(val_pd.label, pred_labels)
print(f"Validation accuracy: {acc:.3f}")
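The streaming alternative mentioned above can be sketched as a running tally of correct predictions, so that no more than one batch is held in memory at a time. This is a minimal illustration rather than the tutorial's method: `streaming_accuracy`, `toy_predict`, and the toy batches are hypothetical names and stand-in data, and each batch is assumed to be a dict mapping column names to NumPy arrays.

```python
import numpy as np


def streaming_accuracy(batches, predict_fn, label_col="label"):
    """Accumulate accuracy over an iterable of {column: array} batches."""
    correct = 0
    total = 0
    for batch in batches:
        labels = np.asarray(batch[label_col])
        preds = np.asarray(predict_fn(batch))
        correct += int((preds == labels).sum())
        total += labels.shape[0]
    # An empty stream has no defined accuracy.
    return correct / total if total else float("nan")


# Stand-in data: two small batches shaped like {column name: array}.
toy_batches = [
    {"x": np.array([0.1, 0.9, 0.8]), "label": np.array([0, 1, 0])},
    {"x": np.array([0.2, 0.7]), "label": np.array([0, 1])},
]


# Hypothetical predictor: thresholds the single feature at 0.5.
def toy_predict(batch):
    return (batch["x"] > 0.5).astype(int)


acc_stream = streaming_accuracy(toy_batches, toy_predict)
print(f"Streaming accuracy: {acc_stream:.3f}")
```

In the notebook, the batches could plausibly come from `val_ds.iter_batches(batch_format="numpy")`, with `predict_fn` stacking the feature columns into a `DMatrix` and taking the arg-max of `booster.predict`. That wiring is an assumption and is not shown here.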

### 13 · Confusion matrix visualisation  
Raw counts and row-normalised ratios highlight which cover types the model confuses most often. Diagonal dominance indicates good performance; off-diagonal hot spots may suggest a need for more data or feature engineering for those specific classes.

In [None]:
# 13. Confusion matrix

cm = confusion_matrix(val_pd.label, pred_labels)  # rows = true labels, columns = predictions

sns.heatmap(cm, annot=True, fmt="d", cmap="viridis")
plt.title("Confusion Matrix with Counts")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()

cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.heatmap(cm_norm, annot=True, fmt=".2f", cmap="viridis")
plt.title("Normalized Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
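To turn the heatmaps into a short list of "hot spots", one option is to rank the off-diagonal cells of the row-normalised matrix. The sketch below uses a made-up 3-class matrix; `row_normalise` and `top_confusions` are hypothetical helpers, not part of scikit-learn. It also guards against classes with no true samples, which would otherwise produce NaN during normalisation.

```python
import numpy as np


def row_normalise(cm):
    """Divide each row by its sum; all-zero rows stay zero instead of NaN."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)


def top_confusions(cm, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, rate)."""
    rates = row_normalise(cm)
    np.fill_diagonal(rates, 0.0)  # ignore correct predictions
    order = np.argsort(rates, axis=None)[::-1][:k]
    pairs = [np.unravel_index(i, rates.shape) for i in order]
    return [(int(t), int(p), float(rates[t, p])) for t, p in pairs]


# Toy matrix: rows are true classes, columns are predicted classes.
toy_cm = np.array([
    [8, 2, 0],
    [1, 6, 3],
    [0, 0, 0],  # a class that never appears in the labels
])

per_class_recall = np.diag(row_normalise(toy_cm))
worst = top_confusions(toy_cm, k=2)
print("Per-class recall:", per_class_recall)
print("Most confused (true, predicted, rate):", worst)
```

The diagonal of the row-normalised matrix is per-class recall, so a low value there and a large off-diagonal rate in the same row point at the same class.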

### 14 · CPU batch inference with Ray remote tasks  
To demonstrate out-of-process inference, send a 1024-row pandas batch to a **single CPU worker**. The remote function loads the model from the checkpoint each time it is called, converts the batch to a `DMatrix`, and returns class indices. Computing accuracy on the sample confirms that remote inference is consistent with the earlier full-validation result.


In [None]:
# 14. Example: Run batch inference using Ray remote task on a CPU worker

# This remote function is scheduled on a CPU-enabled Ray worker.
# It loads a trained XGBoost model from a Ray checkpoint and runs predictions on a pandas DataFrame.
@ray.remote(num_cpus=1)
def predict_batch(ckpt, batch_pd):
    # Load the trained XGBoost Booster model from the checkpoint.
    model = RayTrainReportCallback.get_model(ckpt)

    # Convert the input batch (pandas DataFrame) to DMatrix, required by XGBoost for inference.
    dmatrix = xgb.DMatrix(batch_pd[feature_columns])

    # Predict class probabilities for each row in the batch.
    preds = model.predict(dmatrix)

    # Select the class with highest predicted probability for each row.
    return np.argmax(preds, axis=1)

# Take a random sample of 1024 rows from the validation set to use as input.
sample_batch = val_pd.sample(1024, random_state=0)

# Submit the batch inference task to a Ray worker and block until it finishes.
preds = ray.get(predict_batch.remote(best_ckpt, sample_batch))

# Compute and print classification accuracy by comparing predictions to true labels.
print("Sample batch accuracy:", accuracy_score(sample_batch.label, preds))
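Scaling this pattern beyond one batch mostly means splitting the rows into chunks, submitting one task per chunk, and concatenating the results in order. The sketch below shows only the chunking and reassembly logic with NumPy; `batch_slices` and `fake_predict` are hypothetical stand-ins, and the random probabilities are toy data rather than model output.

```python
import numpy as np


def batch_slices(n_rows, batch_size):
    """Yield slices covering range(n_rows) in order, batch_size rows each."""
    for start in range(0, n_rows, batch_size):
        yield slice(start, min(start + batch_size, n_rows))


# Stand-in for the remote call: arg-max over fake class probabilities.
def fake_predict(prob_rows):
    return np.argmax(prob_rows, axis=1)


rng = np.random.default_rng(0)
probs = rng.random((2500, 7))  # toy data: 2500 rows, 7 classes

# One "task" per slice; the last slice is shorter than the others.
slices = list(batch_slices(len(probs), 1024))
parts = [fake_predict(probs[s]) for s in slices]
combined = np.concatenate(parts)
print(len(parts), "batches ->", combined.shape[0], "predictions")
```

In the notebook, each slice could be submitted as `predict_batch.remote(best_ckpt, val_pd.iloc[s])` and the resulting list of futures passed to a single `ray.get` call, which returns results in submission order. Because every task reloads the model, larger batches amortise that cost better.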

### 15 · Feature-importance diagnostics  
XGBoost’s built-in `get_score(importance_type="gain")` ranks each feature by its average gain across all the splits that use it. Visualising the top 15 helps connect model behaviour back to domain knowledge. For example, elevation and soil type often dominate forest-cover prediction.

In [None]:
# 15. Gain-based feature importance
importances = booster.get_score(importance_type="gain")
keys, gains = zip(*sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:15])

plt.barh(range(len(gains)), gains)
plt.yticks(range(len(gains)), keys)
plt.gca().invert_yaxis()
plt.title("Top-15 Feature Importances (gain)")
plt.xlabel("Average gain")
plt.show()
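Raw gain values are hard to compare across models, and features that were never used in a split are typically absent from the dict that `get_score` returns. One way to handle both is to fill the missing features with zero and rescale the scores so they sum to one. This is a minimal sketch: `normalised_importance` is a hypothetical helper, and the scores and feature names are made up for illustration.

```python
def normalised_importance(scores, feature_names):
    """Map every feature to its share of the summed score; unused ones get 0."""
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in feature_names}
    return {name: scores.get(name, 0.0) / total for name in feature_names}


# Toy scores shaped like the dict returned by get_score (values are made up).
toy_scores = {"elevation": 30.0, "soil_type": 15.0, "slope": 5.0}
toy_features = ["elevation", "soil_type", "slope", "aspect"]

shares = normalised_importance(toy_scores, toy_features)
ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
for name, share in ranked:
    print(f"{name:>10s}  {share:.2f}")
```

Because gain is already an average per split, these shares are best read as a relative ranking rather than as an exact decomposition of the model's predictive power.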