Xgboost 4j and Predictor implementations in separate sbt subprojects #1

lucagiovagnoli · 2020-05-08T00:36:58Z

This is a diff over combust#645

Main observation

I think the original solution in combust#645 is less disruptive for most people, who don’t want to know about Predictor: everyone keeps using xgboost4j by default. The solution in this PR introduces non-deterministic deserialization behaviour if users import both MLeap subprojects, xgboost-runtime and xgboost-predictor.

Why? The “xgboost.classifier” key in the BundleRegistry ends up holding one Op or the other. I think this depends on how maven merges the reference.conf files via the “AppendingTransformer”, which might be non-deterministic, similar to how maven handles dependency conflicts. It also depends on the Scala hash ordering when the BundleRegistry hash table is filled.
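
To make the collision concrete, here is a minimal sketch. The `Op` case class and `buildRegistry` are hypothetical stand-ins, not MLeap's actual BundleRegistry code; the only thing it assumes is that the registry is a map keyed by op name, so that two ops registered under the same key cannot both survive.

```scala
// Hypothetical stand-ins for illustration only; these are NOT MLeap's real classes.
final case class Op(name: String, implementation: String)

object RegistrySketch {
  // Build a registry keyed by op name.
  // With toMap, a later entry silently replaces an earlier one that has the same key.
  def buildRegistry(ops: Seq[Op]): Map[String, Op] =
    ops.map(op => op.name -> op).toMap

  // Both subprojects register their op under the same key.
  val xgboost4j: Op = Op("xgboost.classifier", "xgboost4j")
  val predictor: Op = Op("xgboost.classifier", "predictor")

  // The same two ops, discovered in two different orders
  // (for example because the merged config lists them differently).
  val runtimeFirst: Map[String, Op] = buildRegistry(Seq(xgboost4j, predictor))
  val predictorFirst: Map[String, Op] = buildRegistry(Seq(predictor, xgboost4j))

  def main(args: Array[String]): Unit = {
    // Only one op survives per registry, and which one depends on the order.
    println(runtimeFirst("xgboost.classifier").implementation)
    println(predictorFirst("xgboost.classifier").implementation)
  }
}
```

In this sketch the winner is whichever op is registered last, so anything that changes the registration order also changes which implementation deserializes the model.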

Other observations

The mleap-xgboost-predictor project needs an sbt dependency on mleap-xgboost-runtime because it uses the testing helpers from there. I am not sure that sharing the testing helpers justifies creating a third, common project.
I can't deserialize using both projects without tweaking the BundleRegistry, so I had to remove the test "A deserialized XGBoost4j has the same results of a deserialized Predictor".
Since I cannot serialize using the Predictor (store() is not implemented), I added static model files to resources; they were serialized via the runtime project.
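
For the first observation, a possible way to limit the coupling is sketched below. It assumes the helpers live under the runtime project's test sources, which may not be the case here, and the project ids and directory names are hypothetical rather than taken from the actual build. sbt's `test->test` configuration mapping exposes one project's test classes to another project's tests without adding a compile-scope dependency.

```scala
// build.sbt fragment (sketch). Project ids and directory names are hypothetical.
lazy val xgboostRuntime = (project in file("mleap-xgboost-runtime"))

lazy val xgboostPredictor = (project in file("mleap-xgboost-predictor"))
  // "test->test": the predictor's tests can see the runtime project's test classes,
  // but the predictor's main code gets no dependency on the runtime project.
  .dependsOn(xgboostRuntime % "test->test")
```

If the helpers sit in the runtime project's main sources instead, a plain `dependsOn(xgboostRuntime)` would be needed.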


lucagiovagnoli added 7 commits February 24, 2020 09:58

Draft for a new MLeap OP to deserialize xgboost models using xgboost-predictor for higher performance

66cc2e7

XGBoostPredictor working. Needs cleanup and tests

600682a

Added a new optional Op for unloading xgboost models as Predictor objects

8474e19

Delete FVecTensorImpl since implementing external classes is dangerous and was broken - New FVec Tensor factories - also fix a copy-pasted test

7d9143e

Merge branch 'master' into luca-xgboost-predictor-performant-op

b872d21

Adding documentation and RELEASE NOTES

502cbef

Trying out Anca's suggestion of separating the Xgboost 4j and Predictor implementations in separate sbt subprojects

f95bf85

lucagiovagnoli force-pushed the luca-xgboost-predictor-performant-op branch from 502cbef to a5c6de1 Compare May 10, 2020 21:17

lucagiovagnoli pushed a commit that referenced this pull request Jul 22, 2020

Merge pull request #1 from combust/master

63a90cc

Sync with combust master
