Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xgboost 4j and Predictor implementations in separate sbt subprojects #1

Open
wants to merge 7 commits into
base: luca-xgboost-predictor-performant-op
Choose a base branch
from

Conversation

lucagiovagnoli
Copy link
Owner

This is a diff over combust#645

Main observation

I think the original solution at combust#645 is less disrupting to most people who don’t want to know about Predictor. Everyone will by default keep using xgboost4j. The solution in this PR brings a non-deterministic deserialization behaviour if users import both MLeap subprojects xgboost-runtime and xgboost-predictor.

Why? The “xgboost.classifier” key in the BundleRegistry contains either one of the Ops. I think this depends on how maven merges the reference.conf files via the “AppendTransformer”. This might be non-deterministic, similarly to how maven handles dependency conflicts? Secondly it also randomly depends on the scala hash key when filling up the BundleRegistry hash table.

Other observations

  • The mleap-xgboost-predictor needs a sbt dependency on mleap-xgboost-runtime because it's using the testing helpers code from there. I am not sure sharing the testing helpers justified creating a third common project.
  • I can't deserialize using both projects without tweaking the Bundle Registry. I had to remove the test "A deserialized XGBoost4j has the same results of a deserialized Predictor".
  • Since I cannot serialize using the Predictor (store() is not implemented) I added static model files in resources that were serialized via the runtime project.

@lucagiovagnoli lucagiovagnoli force-pushed the luca-xgboost-predictor-performant-op branch from 502cbef to a5c6de1 Compare May 10, 2020 21:17
lucagiovagnoli pushed a commit that referenced this pull request Jul 22, 2020
Sync with combust master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant