Xgboost Mojo does not work in spark #8505

exalate-issue-sync · 2023-05-12T08:34:25Z

I am trying to use an XGBoost MOJO inside Spark using genmodel.jar.

I launched Spark with:

{code:java}~/spark/spark-2.3.2-bin-hadoop2.7/bin/spark-shell --conf spark.driver.memory="3g" --conf spark.executor.memory="2g" --conf spark.executor.instances=2 --conf spark.executor.cores=2 --jars genmodel_xgboost.jar{code}

With a GBM MOJO the following code works fine, but with an XGBoost MOJO it fails:

{code:scala}import _root_.hex.genmodel.GenModel
import _root_.hex.genmodel.easy.{EasyPredictModelWrapper, RowData}
import _root_.hex.genmodel.easy.prediction
import _root_.hex.genmodel.MojoModel
import _root_.hex.genmodel.easy.RowData
import _root_.hex.genmodel.algos.xgboost
import scala.collection.JavaConversions._
import spark.implicits._
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, DoubleType}

// load xgboost model
val modelPath = "mod_xgb_prostate.zip"

// If mojo model is a gbm, then all works .
//val modelPath = "prostate_gbm.zip"

// Import data
val dataPath = "prostate_test.csv"
val tabla_origin = spark.read.option("header", "true").
csv(dataPath)

val test = tabla_origin.select(tabla_origin.columns.map(c => col(c).cast(StringType)) : _*)
// Import MOJOmodel
val mojo = MojoModel.load(modelPath)

val easyModel = new EasyPredictModelWrapper(
  new EasyPredictModelWrapper.Config().
    setModel(mojo).
    setConvertUnknownCategoricalLevelsToNa(true).
    setConvertInvalidNumbersToNa(true))

val header = test.columns

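val dfScore = test.map {
  x =>
    val r = new RowData
    // Copy every column of the row into the RowData as a string
    header.indices.foreach(idx => r.put(header(idx), x.getAs[String](idx)))
    //r.foreach(println)
    val score = easyModel.predictBinomial(r).classProbabilities
    // Assumes the input has an "ID" column, as in the output column name
    (x.getAs[String]("ID"), score(1))
}.toDF("ID","predict")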

dfScore.show(){code}

The error is

{code:java}Caused by: java.io.NotSerializableException: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory
Serialization stack:
- object not serializable (class: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory, value: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory@25777187)
- field (class: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel, name: _1hotFactory, type: class hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory){code}

This code works fine if the MOJO contains a GBM model instead of an XGBoost model.

On the other hand, creating a single RowData and scoring it directly works fine.

{code:scala}// Creating the row
val r = new RowData
r.put("AGE", "68")
r.put("RACE", "2")
r.put("DCAPS", "2")
r.put("VOL", "0")
r.put("GLEASON", "6")

// Performing the Prediction
val prediction = easyModel.predictBinomial(r).classProbabilities

scala> val prediction = easyModel.predictBinomial(r).classProbabilities
prediction: Array[Double] = Array(0.8520880937576294, 0.14791187644004822){code}
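
The single-row call succeeds because it runs entirely in the driver JVM, whereas `test.map` makes Spark Java-serialize the closure together with everything it captures, including `easyModel`. The following plain-Scala sketch reproduces that failure mode without Spark or H2O. `EncoderFactory`, `Model`, and `SerializationDemo` are hypothetical stand-ins invented for illustration; they are not classes from genmodel.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a helper that does not implement Serializable.
class EncoderFactory

// Hypothetical stand-in for a model: Serializable itself, but it holds a
// field whose type is not, which is the situation the stack trace describes.
class Model extends Serializable {
  val factory = new EncoderFactory
}

object SerializationDemo {
  // Java-serializes `obj` in memory, the same mechanism Spark's default
  // closure serializer uses. Returns the byte count, or the exception message.
  def trySerialize(obj: AnyRef): Either[String, Int] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try {
      out.writeObject(obj)
      out.flush()
      Right(buffer.size())
    } catch {
      case e: NotSerializableException => Left(e.getMessage)
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(trySerialize("plain string")) // a String serializes without error
    println(trySerialize(new Model))      // fails on the `factory` field
  }
}
```

The second call returns a `Left` because the serializer walks every non-transient field of `Model` and stops at the first one whose class is not `Serializable`.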

exalate-issue-sync · 2023-05-12T08:34:27Z

José Luis Cañadas commented: Using the same code but changing modelPath to {{val modelPath = "prostate_gbm.zip"}}, everything works fine and I get my Spark DataFrame with probabilities.

{code:scala}scala> dfScore.show()
+---+--------------------+
| ID|             predict|
+---+--------------------+
|  1|0.017411978603061484|
|  3| 0.15237338415016477|
|  7|  0.9673828180120637|
|  9|  0.6975588239134931|
| 11|  0.6326511408545452|
| 12|  0.6772755998675147|
| 23|  0.7249044772358193|
| 26| 0.18062940301190955|
| 27|  0.6112314130200889|
| 28| 0.02070870276868032|
| 31|  0.6664765672577156|
| 40| 0.30640687022231006|
| 42| 0.03227430892469307|
| 43| 0.06904906423296006|
| 44| 0.22146086338434523|
| 45|  0.9039854436895566|
| 48|  0.5022209926037005|
| 49|0.012421228012059684|
| 52|  0.9870381011837654|
| 53| 0.32597994849213346|
+---+--------------------+
only showing top 20 rows{code}

exalate-issue-sync · 2023-05-12T08:34:29Z

José Luis Cañadas commented: I think it does not work because the class {{hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$OneHotEncoderFactory}} is not serializable, and {{map}} needs everything captured by its closure to be serializable.
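
A pattern often used for non-serializable helpers in Spark closures is to ship only a small serializable holder (for example, one that knows the model path) and rebuild the heavy object lazily in each JVM. The sketch below shows the idea in plain Scala, assuming a hypothetical `Scorer` stand-in with a dummy scoring rule. `Scorer`, `ScorerHolder`, and `HolderDemo` are illustrative names, not part of genmodel, and this is a possible workaround sketch rather than a description of how the issue was fixed in H2O.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a scorer that is NOT Serializable.
class Scorer(val modelPath: String) {
  // Dummy rule for illustration only; a real scorer would use the model.
  def score(age: Double): Double = if (age > 65) 0.85 else 0.15
}

// Only `modelPath` travels over the wire. The scorer field is transient,
// so it is skipped on write and rebuilt on first access after read.
class ScorerHolder(val modelPath: String) extends Serializable {
  @transient lazy val scorer: Scorer = new Scorer(modelPath)
}

object HolderDemo {
  // Serializes and deserializes `obj` in memory, imitating what happens
  // when a closure is shipped from the driver to an executor.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val holder = new ScorerHolder("mod_xgb_prostate.zip")
    holder.scorer                       // force creation before serializing
    val restored = roundTrip(holder)    // succeeds: the scorer is not written
    println(restored.scorer.score(68.0))
  }
}
```

In the Spark job from the report, the lazy field would instead build the `EasyPredictModelWrapper` from `MojoModel.load(modelPath)`, under the assumption that the MOJO file is readable at that path on every executor.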

exalate-issue-sync · 2023-05-12T08:34:30Z

Michal Kurka commented: [~accountid:5c0ed8a7dda509509754f35c] thanks for the report, will be fixed!

exalate-issue-sync · 2023-05-12T08:34:32Z

José Luis Cañadas commented: I have a similar problem with predicting contributions when I set {{setEnableContributions(true)}} in the same code.

{code:scala}Caused by: java.io.NotSerializableException: hex.genmodel.algos.tree.TreeSHAP$PathPointer
Serialization stack:
- object not serializable (class: hex.genmodel.algos.tree.TreeSHAP$PathPointer, value: hex.genmodel.algos.tree.TreeSHAP$PathPointer@2adc8e41)
- field (class: hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$XGBoostContributionsPredictor, name: _workspace, type: class java.lang.Object)
- object (class hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$XGBoostContributionsPredictor, hex.genmodel.algos.xgboost.XGBoostJavaMojoModel$XGBoostContributionsPredictor@4d6c0824)
- field (class: hex.genmodel.easy.EasyPredictModelWrapper, name: predictContributions, type: interface hex.genmodel.PredictContributions)
- object (class hex.genmodel.easy.EasyPredictModelWrapper, hex.genmodel.easy.EasyPredictModelWrapper@3d34b11c)
{code}

exalate-issue-sync · 2023-05-12T08:34:34Z

Michal Kurka commented: [~accountid:5c0ed8a7dda509509754f35c] will be fixed in https://0xdata.atlassian.net/browse/PUBDEV-7308

h2o-ops · 2023-05-14T22:43:52Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-7133
Assignee: Michal Kurka
Reporter: José Luis Cañadas
State: Resolved
Fix Version: 3.28.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4145

h2o-ops closed this as completed May 14, 2023

h2o-ops added the fixVersion/3.28.0.1 label May 14, 2023
