How to get the output leaf indices of every trees in a LightGBM/Xgboost model #233
In PMML representation, tree nodes are identified by the This is an optional attribute; if missing, the PMML engine shall assign "virtual" 1-based integer identifiers.
The results from tree-based models (decision trees, decision tree ensembles) typically implement the
This marker interface, possibly in combination with the model-level
The situation is a bit more complicated with tree ensemble models (XGBoost, LightGBM, GBT, Random Forest), because the prediction result is "layered", which means that the
The internal structure of the
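To make the "layered" idea concrete, here is a toy sketch in plain Java. It does not use any JPMML-Evaluator classes: `ToyNode`, `LeafIndexSketch`, `leafId` and `leafIds` are hypothetical names made up for illustration, and the split rule (go left when the feature value is at or below the threshold) is an assumption. The point is only that an ensemble prediction decomposes into one winning leaf per member tree, each carrying its own node identifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT the JPMML-Evaluator API, every name here is hypothetical.
class ToyNode {
    final String id;        // node identifier, playing the role of a PMML node id
    final int feature;      // index of the feature tested here (unused for leaves)
    final double threshold; // assumed rule: go left when x[feature] <= threshold
    final ToyNode left;
    final ToyNode right;

    ToyNode(String id, int feature, double threshold, ToyNode left, ToyNode right) {
        this.id = id;
        this.feature = feature;
        this.threshold = threshold;
        this.left = left;
        this.right = right;
    }

    static ToyNode leaf(String id) {
        return new ToyNode(id, -1, Double.NaN, null, null);
    }

    boolean isLeaf() {
        return left == null && right == null;
    }
}

class LeafIndexSketch {
    // Walks a single tree from the root and returns the id of the leaf the sample lands in.
    static String leafId(ToyNode root, double[] x) {
        ToyNode node = root;
        while (!node.isLeaf()) {
            node = (x[node.feature] <= node.threshold) ? node.left : node.right;
        }
        return node.id;
    }

    // The "layered" result: one leaf id per member tree, in ensemble order.
    static List<String> leafIds(List<ToyNode> trees, double[] x) {
        List<String> result = new ArrayList<>();
        for (ToyNode tree : trees) {
            result.add(leafId(tree, x));
        }
        return result;
    }

    // A two-tree ensemble with 1-based node ids, built by hand for the demo.
    static List<ToyNode> demoEnsemble() {
        ToyNode tree1 = new ToyNode("1", 0, 0.5, ToyNode.leaf("2"), ToyNode.leaf("3"));
        ToyNode tree2 = new ToyNode("1", 1, 2.0,
            ToyNode.leaf("2"),
            new ToyNode("3", 0, 0.5, ToyNode.leaf("4"), ToyNode.leaf("5")));
        return Arrays.asList(tree1, tree2);
    }

    public static void main(String[] args) {
        System.out.println(leafIds(demoEnsemble(), new double[] {0.2, 3.0}));
    }
}
```

With a real PMML model the per-tree results come from the evaluator rather than from a hand-written walk, so treat this purely as a picture of the data shape you end up with.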
TLDR: Use the following approach:
See also the following two sample projects about dealing with decision tree ensemble (RF) models:
Closing this issue, as the provided guidance should be sufficient to continue on your own. Feel free to ask clarifying/follow-up questions if necessary.
Thanks for your detailed reply and guidance. Over the past few days I've tried to implement the 'node-id extraction' function following your guidance, and I've run into some new problems.
PS1: Currently, I'm just following your project https://github.com/vruusmann/rf_feature_impact to get the leaf node ids. The code blocks below are from this project without any modification except my custom data, model and
PS2: All the code below is working with jpmml
We may discuss this version problem later. I've arrived at your step 8, using one sample data to do prediction and get my LGB classification model's
And I checked the detail of this
The printed result (an example from my LGB model's 500th tree):
My pmml model file doesn't contain the Another problem is that, if the
(I'm not sure whether this problem is a bit silly as I'm a Java rookie starting Java exactly from this project...) As a reference, here is a part from my pmml model file that shows some basic info and the structure of my 500th tree segmentation:
Version information [Python side]
Please feel free to tell me if you need more info about my program. Thanks.
The most important change between 1.4.X and 1.5.X development branches is that 1.5.X contains many decision tree evaluator implementations, and uses the most "lightweight" implementation that does seem to do the job.
The 1.4.X-compatible decision tree evaluator is
It returns
The newer & lightweight tree evaluator is
As you already observed, it returns
Most decision tree evaluation tasks are fully served by the
However, you want to access extra information that is not available when using
This can be achieved using the

EvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
    // THIS!
    .setExtraResultFeatures(EnumSet.of(ResultFeature.Entity_ID))
    .load(new File());
Evaluator evaluator = evaluatorBuilder.build();

The resulting
The JPMML-LightGBM library initializes the
Node identifiers may get "erased" during decision tree compaction as implemented by the
They are required to be present initially:
But they get "erased":
Decision tree compaction is active by default. If you are interested in preserving LightGBM decision trees in their native layout, then you should disable it by setting the
For example, if you're converting LightGBM models using the SkLearn2PMML package, then you can toggle this option using the

from lightgbm import LGBMClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("classifier", LGBMClassifier())
])
# X and y are your own training features and labels
pipeline.fit(X, y)
# THIS!
pipeline.configure(compact = False)
sklearn2pmml(pipeline, "pipeline.pmml")

Exactly the same applies to XGBoost models - you need to turn off decision tree compaction, which is active by default.
While doing prediction, I want to get the output leaf indices of every tree from my PMML LightGBM/XGBoost model. Any index format is OK, including one-hot, label-encoded, or tree node index.
The PMML model is generated from the Python sklearn package with sklearn2pmml or jpmml-lightgbm.
Actually, my purpose is the same as
LGBMClassifier.predict(data, pred_leaf=True)
in Python. How can I do that in Java using JPMML-Evaluator?
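For reference, the one-hot and label-encoded formats mentioned above differ only in a small post-processing step. The sketch below is plain Java with hypothetical names (`LeafEncodingSketch`, `oneHot`) and assumes you already have, for each tree, a 0-based leaf index and the number of leaves in that tree. The label-encoded form is the index array itself; the one-hot form concatenates one block per tree.

```java
import java.util.Arrays;

// Illustration only: class and method names are hypothetical, not part of any library.
class LeafEncodingSketch {
    // leafIndex[t] is the assumed 0-based index of the winning leaf in tree t,
    // leafCount[t] is the number of leaves in tree t.
    // Returns the per-tree one-hot blocks concatenated in tree order.
    static int[] oneHot(int[] leafIndex, int[] leafCount) {
        if (leafIndex.length != leafCount.length) {
            throw new IllegalArgumentException("One leaf count is required per tree");
        }
        int total = 0;
        for (int count : leafCount) {
            total += count;
        }
        int[] encoded = new int[total];
        int offset = 0; // start of the current tree's block
        for (int t = 0; t < leafIndex.length; t++) {
            if (leafIndex[t] < 0 || leafIndex[t] >= leafCount[t]) {
                throw new IllegalArgumentException("Leaf index out of range for tree " + t);
            }
            encoded[offset + leafIndex[t]] = 1;
            offset += leafCount[t];
        }
        return encoded;
    }

    public static void main(String[] args) {
        int[] leafIndex = {1, 0, 2}; // label-encoded form: one index per tree
        int[] leafCount = {2, 3, 3}; // leaves per tree
        System.out.println(Arrays.toString(oneHot(leafIndex, leafCount)));
    }
}
```

If the model exposes node identifiers rather than leaf indices, a per-tree lookup from identifier to index would be needed first; that mapping is not shown here.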