Output an Rforestry tree in treelite format so that inference is blazing fast #33

Open
linanqiu opened this issue Sep 22, 2022 · 3 comments

Comments

@linanqiu
Collaborator

treelite is a format for serializing trees for prediction only. It takes xgboost, lightgbm, and sklearn trees out of the box. It copies the internal structure of each tree, compiles it into C code, and makes prediction through the resulting structure extremely fast. There's also a CUDA treelite wrapper that converts any treelite model into one that runs on GPUs.

One can also construct a treelite model from a custom trained model. All it needs for each tree is the following: the split points, the feature used at each split, whether each split is numerical or categorical, and the leaf nodes.

If we can get Rforestry to dump its internal structure into JSON or something like that, I can work with that to convert it into a treelite tree. That'd give us most of the prediction performance perks plus the sklearn perks.
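
To make the dump idea concrete, here is a minimal sketch of what such a JSON could look like and how a tree would be walked from it. The schema (field names `feature`, `kind`, `threshold`, `left_categories`, `left`, `right`, `leaf_value`) and the "go left if value < threshold" convention are assumptions for illustration only; they are not Rforestry's actual internals or treelite's format.

```python
import json

# Hypothetical dump format (NOT Rforestry's or treelite's real schema):
# one tree is a flat list of nodes, and children are referenced by index.
TREE_JSON = """
[
  {"feature": 0, "kind": "numerical", "threshold": 2.5, "left": 1, "right": 2},
  {"leaf_value": 10.0},
  {"feature": 1, "kind": "categorical", "left_categories": [0, 2], "left": 3, "right": 4},
  {"leaf_value": 20.0},
  {"leaf_value": 30.0}
]
"""


def predict_tree(nodes, x):
    """Walk one tree from the root (index 0) down to a leaf for feature vector x."""
    i = 0
    while "leaf_value" not in nodes[i]:
        node = nodes[i]
        value = x[node["feature"]]
        if node["kind"] == "numerical":
            # Assumed convention: strictly-less-than goes left.
            go_left = value < node["threshold"]
        else:
            # Categorical split: listed categories go left, the rest go right.
            go_left = value in node["left_categories"]
        i = node["left"] if go_left else node["right"]
    return nodes[i]["leaf_value"]


nodes = json.loads(TREE_JSON)
print(predict_tree(nodes, [1.0, 0]))  # 10.0
print(predict_tree(nodes, [3.0, 2]))  # 20.0
print(predict_tree(nodes, [3.0, 1]))  # 30.0
```

Those four pieces of information per node are the same ones listed above, so a dump along these lines should be enough input for a converter, whatever the final field names turn out to be.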

@JasjeetSekhon @theo-s

@linanqiu
Collaborator Author

Yup, just confirmed that this works with the model matrix as well if we basically (ab)use multiclass leaf vectors and make each leaf node an indicator vector of observation indices lol.
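
A rough sketch of one reading of the indicator-vector trick, in plain Python rather than treelite itself: each leaf stores a length-n 0/1 vector over the training observations, so averaging leaf vectors across trees yields per-observation co-occurrence, and normalizing by leaf size yields weights that reproduce the usual leaf-mean prediction. The toy data and helper names are made up for illustration.

```python
# Toy training responses; n is the number of training observations.
y_train = [1.0, 2.0, 3.0, 5.0]
n = len(y_train)


def indicator(indices, n):
    """Length-n 0/1 vector marking which training observations sit in a leaf."""
    members = set(indices)
    return [1.0 if i in members else 0.0 for i in range(n)]


def average_vectors(vectors):
    """Element-wise mean, i.e. what averaging leaf vectors over trees does."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def normalize(vector):
    """Scale a leaf's indicator vector so its entries sum to 1."""
    total = sum(vector)
    return [v / total for v in vector]


# Hypothetical: the leaf that each of two trees routes one query point to.
leaf_hit_per_tree = [indicator([0, 1], n), indicator([1, 2, 3], n)]

# Fraction of trees in which each training observation shares a leaf with the query.
co_occurrence = average_vectors(leaf_hit_per_tree)
print(co_occurrence)  # [0.5, 1.0, 0.5, 0.5]

# Per-observation weights; their dot product with y_train is the forest prediction.
weights = average_vectors([normalize(v) for v in leaf_hit_per_tree])
prediction = sum(w * y for w, y in zip(weights, y_train))

# Same number computed the ordinary way: the mean of per-tree leaf means.
leaf_means = [(1.0 + 2.0) / 2, (2.0 + 3.0 + 5.0) / 3]
print(abs(prediction - sum(leaf_means) / 2) < 1e-12)  # True
```

Whether the normalization lives inside the stored leaf vectors or is applied after prediction is a design choice this sketch leaves open.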

@linanqiu
Collaborator Author

plottree.R has enough examples for me to get started on dumping the tree structure to exactly what treelite needs. Let me play with it.

@linanqiu
Collaborator Author

@theo-s @JasjeetSekhon I just realized that in order to fully implement doubleOOB, treelite's predict method would have to take in an additional "treesToExclude" vector (or "treesToInclude"). Otherwise it will take the average of all the trees by default. That requires changing the API significantly in treelite and possibly the tree compilation.
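
For reference, the behavior being asked for looks roughly like the sketch below. It is plain Python, not treelite code: `predict_forest` and its `trees_to_exclude` argument are hypothetical names mirroring the "treesToExclude" idea, and the per-tree predictions are assumed to be available already.

```python
def predict_forest(per_tree_predictions, trees_to_exclude=()):
    """Average per-tree predictions, skipping the trees whose indices are excluded."""
    excluded = set(trees_to_exclude)
    kept = [p for t, p in enumerate(per_tree_predictions) if t not in excluded]
    if not kept:
        raise ValueError("every tree was excluded; nothing left to average")
    return sum(kept) / len(kept)


# One observation's prediction from each of three trees (made-up numbers).
per_tree = [1.0, 2.0, 6.0]

print(predict_forest(per_tree))                        # 3.0, the default all-tree average
print(predict_forest(per_tree, trees_to_exclude=[2]))  # 1.5, tree 2 left out
```

The complication is that the exclusion set would presumably differ per observation, so the predict call would need a mask per row rather than a single vector per batch.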
