Staged prediction/incremental fitting support #304

Open
pkhokhlov opened this issue Nov 29, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@pkhokhlov

Hi @interpret-ml

  1. Do you plan on adding staged prediction similar to XGBoost's ntree_limit/iteration_range?
  2. Do you plan on adding warm starts similar to scikit-learn's GBM warm_start?
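
For reference, here's a minimal sketch of the two patterns I mean, as they exist today in xgboost and scikit-learn (illustration of the request only, not interpret code):

```python
# Sketch of the two APIs referenced above (xgboost and scikit-learn, not interpret).
import numpy as np
import xgboost as xgb
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = (X[:, 0] > 0.5).astype(int)

# 1. Staged prediction: one fitted model, predictions from only the first k rounds.
bst = xgb.XGBClassifier(n_estimators=100).fit(X, y)
p_first_50 = bst.predict_proba(X, iteration_range=(0, 50))  # use only the first 50 trees

# 2. Warm start: keep the existing trees and add more on the next fit() call.
gbm = GradientBoostingClassifier(n_estimators=50, warm_start=True).fit(X, y)
gbm.n_estimators = 100
gbm.fit(X, y)  # continues boosting from round 50 to round 100
```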

Thanks for your great work on this library.

@interpret-ml
Collaborator

Hi @pkhokhlov --

We've talked internally about ways to expose more customization in the boosting stages, mostly in order to give the caller better ways to control pair selection. Allowing for something that looks like scikit-learn's warm_start is one option for handling that. We don't have immediate plans to work on this given other priorities, although we do view it as important in the medium term.

Staged prediction isn't something we've considered. Our model class does a lot more boosting than XGBoost; reaching into the millions of boosting steps is a typical scenario, so the extra storage required to preserve the per-iteration information for an iteration_range feature would be considerable. We have considered using algorithms internally that would require preserving a window of the last N boosting steps, but even there the plan has been to throw that information away once the model is complete.

Can you give us some details on how you'd like to use these features?

-InterpretML team

@pkhokhlov
Author

@interpret-ml Rather than training N separate models to K, 2K, ..., N*K iterations, I'd like to train a single model to N*K iterations and then make predictions on the validation set using only the first K, 2K, ..., N*K iterations, to see model performance as a function of the iteration count. This is very helpful for understanding convergence and overfitting.

Warm start would achieve something similar by letting me train the model "manually" to K iterations and save it, then continue training from K to 2K and save that, and so on. If millions of boosting iterations are typical, then the warm-start approach is definitely the more viable of the two.
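
To make that concrete, here's a sketch of the staging loop using scikit-learn's GBM as a stand-in for a hypothetical warm-startable EBM (the EBM side of this doesn't exist today):

```python
# Sketch of the manual staging loop described above; scikit-learn's GBM stands in
# for a hypothetical EBM with warm-start support.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

K, N = 50, 10
model = GradientBoostingClassifier(n_estimators=K, warm_start=True, random_state=0)
val_curve = []
for stage in range(1, N + 1):
    model.n_estimators = stage * K   # grow to K, 2K, ..., N*K iterations
    model.fit(X_train, y_train)      # warm start: reuses the trees already fit
    proba = model.predict_proba(X_val)
    val_curve.append((stage * K, log_loss(y_val, proba)))  # performance vs. iterations

for n_iter, loss in val_curve:
    print(f"{n_iter:5d} iterations: validation log-loss = {loss:.4f}")
```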

Let me know if anything is unclear.

@bverhoeff

bverhoeff commented Dec 15, 2021

If I understand it correctly, warm_start functionality, in combination with the merge function, would enable federated learning.

This would be a very interesting use case. Or is there another way to continue training an EBM?

A well-performing explainable algorithm such as EBM that enables federated learning would solve some major problems in AI development.

@paulbkoch
Collaborator

There's a longer discussion of an API for staged prediction in #403. This issue is slightly different in that it's about continuing the boosting of an existing model without changing any of the parameters, but you could imagine using the other API to handle warm starts as a special case by passing in the same parameters at each stage without modification.

One issue I see with warm starts in general is that we internally choose which pairs to boost on after fitting the mains, and the warm-start methodology doesn't fit neatly into that scenario. If you wanted to boost just 5 rounds, for instance, would we pick the pairs after those 5 rounds? If we did, the selected pairs would probably not be very good, and then I suppose we'd boost on those pairs for 5 rounds. If someone wanted to warm start this model later, presumably we'd keep the already-selected pairs. I could still see warm starts being a useful feature if the model consists only of mains or of explicitly specified pairs.

@bverhoeff -- The merge_ebms function is now fully supported and available in the latest v0.3.0 release. You should be able to do federated learning with it now without needing the warm start functionality.
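
A minimal sketch of that federated-style workflow (the data split below is just an illustrative stand-in for separate silos):

```python
# Minimal sketch of federated-style training with merge_ebms: train one EBM
# per data silo, then combine them into a single model (per the v0.3.0 release above).
from interpret.glassbox import ExplainableBoostingClassifier, merge_ebms
from sklearn.datasets import make_classification

# Illustrative stand-in for two data silos that cannot share raw data.
X, y = make_classification(n_samples=2000, random_state=0)
X_a, y_a = X[:1000], y[:1000]
X_b, y_b = X[1000:], y[1000:]

ebm_a = ExplainableBoostingClassifier().fit(X_a, y_a)  # trained at site A
ebm_b = ExplainableBoostingClassifier().fit(X_b, y_b)  # trained at site B

merged = merge_ebms([ebm_a, ebm_b])  # single EBM combining both sites
```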
