Xgboost Mojo does not work in spark #8505
Comments
José Luis Cañadas commented: Using the same code but changing modelPath to {{val modelPath = "prostate_gbm.zip"}}, everything works fine and I get my Spark DataFrame with probabilities. {code:scala}scala> dfScore.show() |
José Luis Cañadas commented: I think it does not work because the class {{hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory}} is not serializable, and {{map}} needs everything referenced by its closure to be serializable. |
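To illustrate the mechanism described in this comment, here is a minimal sketch in plain Scala with no Spark or H2O dependency. The classes `EncoderFactory`, `EagerModel` and `LazyModel` are hypothetical stand-ins, not the actual genmodel classes; the sketch only shows how a single non-serializable field makes the whole enclosing object fail Java serialization, and how marking that field `@transient lazy val` lets serialization succeed.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a helper like OneHotEncoderFactory:
// note that it does NOT extend Serializable.
class EncoderFactory {
  def encode(level: Int): Double = level.toDouble
}

// Hypothetical model that keeps the helper in a regular field,
// so Java serialization has to serialize the helper too.
class EagerModel extends Serializable {
  val factory = new EncoderFactory
  def score(level: Int): Double = factory.encode(level) * 0.5
}

// Same model, but the helper is skipped during serialization
// and rebuilt lazily on first use.
class LazyModel extends Serializable {
  @transient lazy val factory = new EncoderFactory
  def score(level: Int): Double = factory.encode(level) * 0.5
}

object SerializationDemo {
  // Returns true when Java serialization of `obj` succeeds.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

With these definitions, `SerializationDemo.canSerialize(new EagerModel)` fails in the same way as the reported stack trace, while the `LazyModel` variant serializes. For the Spark job in this issue, a comparable workaround (an assumption, not a confirmed fix) would be to build the `EasyPredictModelWrapper` on the executor side, for example inside `mapPartitions`, so that the closure captures only the model path string; that requires the MOJO file to be readable from every executor.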
Michal Kurka commented: [~accountid:5c0ed8a7dda509509754f35c] thanks for the report, will be fixed! |
José Luis Cañadas commented: I have a similar problem with predict contributions when I set {{setEnableContributions(true)}} in the same code. {code:scala}Caused by: java.io.NotSerializableException: hex.genmodel.algos.tree.TreeSHAP$PathPointer |
Michal Kurka commented: [~accountid:5c0ed8a7dda509509754f35c] will be fixed in https://0xdata.atlassian.net/browse/PUBDEV-7308 |
JIRA Issue Migration Info Jira Issue: PUBDEV-7133 Linked PRs from JIRA |
I am trying to use an XGBoost MOJO inside Spark using genmodel.jar.
I launched Spark with:
{code:java}~/spark/spark-2.3.2-bin-hadoop2.7/bin/spark-shell --conf spark.driver.memory="3g" --conf spark.executor.memory="2g" --conf spark.executor.instances=2 --conf spark.executor.cores=2 --jars genmodel_xgboost.jar{code}
With a GBM MOJO the following code works fine, but with an XGBoost MOJO it fails:
{code:scala}import _root_.hex.genmodel.GenModel
import _root_.hex.genmodel.easy.{EasyPredictModelWrapper, RowData}
import _root_.hex.genmodel.easy.prediction
import _root_.hex.genmodel.MojoModel
import _root_.hex.genmodel.easy.RowData
import _root_.hex.genmodel.algos.xgboost
import scala.collection.JavaConversions._
import spark.implicits._
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, DoubleType}
// load xgboost model
val modelPath = "mod_xgb_prostate.zip"
// If the MOJO model is a GBM, everything works.
//val modelPath = "prostate_gbm.zip"
// Import data
val dataPath = "prostate_test.csv"
val tabla_origin = spark.read.option("header", "true").
csv(dataPath)
val test = tabla_origin.select(tabla_origin.columns.map(c => col(c).cast(StringType)) : _*)
// Import MOJOmodel
val mojo = MojoModel.load(modelPath)
val easyModel = new EasyPredictModelWrapper(
new EasyPredictModelWrapper.Config().
setModel(mojo).
setConvertUnknownCategoricalLevelsToNa(true).
setConvertInvalidNumbersToNa(true))
val header = test.columns
val dfScore = test.map {
x =>
val r = new RowData
header.indices.foreach(idx => r.put(header(idx), x.getAs[String](idx)))
//r.foreach(println)
val score = easyModel.predictBinomial(r).classProbabilities
(x.getAs[String]("ID"), score(1))
}.toDF("ID","predict")
dfScore.show(){code}
The error is:
{code:java}Caused by: java.io.NotSerializableException: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory
Serialization stack:
- object not serializable (class: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory, value: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory@25777187)
- field (class: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel, name: _1hotFactory, type: class hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory){code}
This code works fine if the MOJO contains a GBM model instead of an XGBoost model.
On the other hand, scoring a single RowData directly, outside of {{map}}, works fine.
{code:scala}// Creating the row
val r = new RowData
r.put("AGE", "68")
r.put("RACE", "2")
r.put("DCAPS", "2")
r.put("VOL", "0")
r.put("GLEASON", "6")
// Performing the Prediction
val prediction = easyModel.predictBinomial(r).classProbabilities
// prediction: Array[Double] = Array(0.8520880937576294, 0.14791187644004822){code}