How to use bst.eval_set() and bst.update() with xgboost_ray #248

Jeffwan · 2022-12-06T22:57:39Z

I am trying to adopt xgboost_ray for an XGBoost project and have run into a problem. The original code exercises fine-grained control over the training process. On every iteration it does the following:

        eval_results = self.bst.eval_set(
            evals=[(self.dmat_train, "train"), (self.dmat_valid, "valid")], iteration=self.bst.num_boosted_rounds() - 1
        )
        self.log_info(fl_ctx, eval_results)
        auc = float(eval_results.split("\t")[2].split(":")[1])
        for i in range(self.trees_per_round):
            self.bst.update(self.dmat_train, self.bst.num_boosted_rounds())

        # extract newly added self.trees_per_round using xgboost slicing api
        bst = self.bst[self.bst.num_boosted_rounds() - self.trees_per_round : self.bst.num_boosted_rounds()]

Code source: https://github.com/NVIDIA/NVFlare/blob/dev/nvflare/app_opt/xgboost/tree_based/executor.py#L153-L174
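The AUC extraction in the snippet above depends on field position (`split("\t")[2]`). As a minimal sketch, assuming the evaluation string has the form `[iteration]<TAB>name-metric:value<TAB>...`, the same value can be looked up by name instead. The helper `parse_eval_line` and the sample numbers are hypothetical and used only for illustration.

```python
from typing import Dict, Tuple


def parse_eval_line(line: str) -> Tuple[int, Dict[str, float]]:
    """Split an evaluation string into its iteration number and a metric dict."""
    head, *fields = line.strip().split("\t")
    iteration = int(head.strip("[]"))
    metrics: Dict[str, float] = {}
    for field in fields:
        # rpartition splits on the last ':' so a ':' inside a name is tolerated.
        name, _, value = field.rpartition(":")
        metrics[name] = float(value)
    return iteration, metrics


# Sample string in the assumed format; the numbers are made up.
sample = "[3]\ttrain-auc:0.912345\tvalid-auc:0.874321"
iteration, metrics = parse_eval_line(sample)
valid_auc = metrics["valid-auc"]
```

Looking up `metrics["valid-auc"]` keeps working if the order of the evaluation sets changes, which the positional index would not.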

Note: I already get the bst object from xgboost_ray.train().

There are two blockers: bst.eval_set() and bst.update(). Since bst is a Booster from the xgboost library, it won't accept an RDMatrix (RayDMatrix) and throws this error:

  File "/usr/local/lib/python3.8/site-packages/xgboost/core.py", line 1980, in eval_set
    raise TypeError(f"expected DMatrix, got {type(d[0]).__name__}")
TypeError: expected DMatrix, got RayDMatrix

I looked at the documentation and could not find a replacement for these methods, the way there is one for predict. How can I make this work?

/cc @Yard1


Yard1 · 2022-12-06T23:06:25Z

It looks like you are implementing your own training loop. This goes beyond what xgboost-ray provides out of the box.

You'd most likely need to subclass the internal RayXGBoostActor (xgboost_ray/main.py) and replace the logic inside the predict method, which is run on every worker using normal xgboost (which is configured to communicate with other workers through the rabit tracker). We do not provide an API to pass your own Actor class, so you will most likely have to monkey-patch it.

I would be happy to look into making this process smoother by providing developer APIs.
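The monkey-patching idea mentioned above can be sketched generically. This is not the xgboost_ray API: `StandInActor` is a hypothetical stand-in class, its `predict` signature is invented for the example, and the real RayXGBoostActor methods take different arguments. Whether and when such a patch takes effect for remote Ray actors depends on how xgboost_ray constructs them, which this sketch does not cover.

```python
class StandInActor:
    """Hypothetical stand-in for a worker actor class (not xgboost_ray code)."""

    def predict(self, model, data):
        # Placeholder for the per-worker logic that would use plain xgboost.
        return [model * x for x in data]


# Keep a reference to the original method so the patch can delegate to it.
_original_predict = StandInActor.predict


def predict_with_hook(self, model, data):
    # Custom per-worker logic would go here, for example evaluating or
    # updating a booster against the worker's local data shard.
    result = _original_predict(self, model, data)
    self.calls = getattr(self, "calls", 0) + 1
    return result


# Monkey-patch: every instance created from the class now uses the new method.
StandInActor.predict = predict_with_hook

actor = StandInActor()
out = actor.predict(2, [1, 2, 3])
```

Subclassing and overriding `predict` would express the same change more cleanly, but as noted above there is no API for passing a custom actor class, which is why patching the existing class is suggested.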

Jeffwan · 2022-12-07T18:15:47Z

> This goes beyond what xgboost-ray provides out of the box.

Thanks. I know this is beyond the scope right now. Does xgboost_ray plan to support it later?

> We do not provide an API to pass your own Actor class, so you'll have to most likely monkey-patch it.

It seems I need to replicate functions similar to train() or predict(), but using a custom RayXGBoostActor? That requires me to fully understand the xgboost_ray code. Do you think there is an easier way to support my use case?

Yard1 · 2022-12-07T18:32:12Z

I think the train() and predict() methods of RayXGBoostActor are relatively straightforward and do not require knowledge of the entire xgboost-ray codebase. I do not believe there's an easier way.

We can add some extra developer APIs to make modifying the training/prediction behavior easier.

I'd be happy to schedule a chat to talk about this, if you think that'll be helpful! Please email me at antoni [at] anyscale.com

Yard1 mentioned this issue Dec 6, 2022

Can RDMatrix convert to DMatrix? #249

Closed
