java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684 when using deeplearning in ensemble #7686

exalate-issue-sync · 2023-05-11T17:28:35Z

raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
OSError: Job with key $03017f00000132d4ffffffff$_90f979146e9d13e0fa230dc8b964786 failed with an exception: DistributedException from /127.0.0.1:54321: 'Index 1684 out of bounds for length 1684', caused by java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
stacktrace:
DistributedException from /127.0.0.1:54321: 'Index 1684 out of bounds for length 1684', caused by java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at water.MRTask.getResult(MRTask.java:494)
at water.MRTask.getResult(MRTask.java:502)
at water.MRTask.doAll(MRTask.java:397)
at water.MRTask.doAll(MRTask.java:403)
at hex.Model.predictScoreImpl(Model.java:1784)
at hex.Model.score(Model.java:1618)
at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:403)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1575)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at hex.genmodel.GenModel.setCats(GenModel.java:707)
at hex.genmodel.GenModel.setInput(GenModel.java:686)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:70)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:158)
at hex.genmodel.algos.ensemble.StackedEnsembleMojoModel.score0(StackedEnsembleMojoModel.java:39)
at hex.generic.GenericModel.score0(GenericModel.java:93)
at hex.Model.score0(Model.java:1992)
at hex.Model.score0(Model.java:1959)
at hex.Model$BigScore.score0(Model.java:1903)
at hex.Model$BigScore.map(Model.java:1881)
at water.MRTask.compute2(MRTask.java:675)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1578)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1574)
... 5 more

exalate-issue-sync · 2023-05-11T17:28:37Z

Hassan Hawilo commented: I tried an older version of H2O, and the error changed to:

java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
OSError: Job with key $03017f00000132d4ffffffff$_b756f6aab3e7b7d12d531ff7aec345c8 failed with an exception: java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
stacktrace:
java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
at hex.generic.Generic$MojoDelegatingModelDriver.computeImpl(Generic.java:91)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:222)
at hex.generic.Generic$MojoDelegatingModelDriver.compute2(Generic.java:70)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1443)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

exalate-issue-sync · 2023-05-11T17:28:39Z

Hassan Hawilo commented: The older version is {{3.26.0.11}}, running on Python 3.6.

exalate-issue-sync · 2023-05-11T17:28:41Z

Hassan Hawilo commented: I will try a different method of saving the model now.

exalate-issue-sync · 2023-05-11T17:28:42Z

Tomas Fryda commented: [~accountid:5c108033b5881d1b2e510659] Thank you for reporting this issue. Unfortunately, I haven't been able to reproduce it yet.

Could you provide more information? I prepared a couple of questions that should help me pinpoint the problem. If you have a reproducible example that you could share with us, that would make things much easier.

Did you train the StackedEnsemble using AutoML? If so, could you provide parameters you used for the H2OAutoML?

Could you try saving just the DeepLearning base model and then load it and predict using it? (To find out if the issue is just with the DeepLearning or with both StackedEnsemble and DeepLearning)

If your DeepLearning models have their default names (starting with DeepLearning) you could use the following (just assign the stacked_ensemble and dataset variables):

{code:python}import shutil
import tempfile

import h2o

stacked_ensemble = se  # CHANGE THIS: StackedEnsemble that you are not able to persist correctly
dataset = test  # CHANGE THIS: H2OFrame on which the scoring/prediction failed

for model_id in [mid for mid in stacked_ensemble.base_models if mid.startswith("DeepLearning")]:
    # Create the directory before the try block so that the cleanup in
    # finally never refers to an undefined name.
    tempdir = tempfile.mkdtemp()
    try:
        mojoname = h2o.get_model(model_id).save_mojo(tempdir)
        model = h2o.import_mojo(mojoname)
        model.predict(dataset)
    finally:
        shutil.rmtree(tempdir){code}

For each deep learning model it will create a temporary directory, save the deep learning model there, load it, predict and finally remove the temporary directory.

Did it work correctly?

Could you also provide more information about the data you used?

I prepared a snippet that prints a simple summary.
It prints the dataset shape, the index of the response column, and one line per column containing: an ordinal integer used as the identifier (instead of the name), the column type, the number of levels for a categorical variable or the mean and variance otherwise, and the number of missing values.

{code:python}dataset = test  # CHANGE THIS: H2OFrame used for predicting
y = "response"  # CHANGE THIS: Response column name

print("dataset.shape =", dataset.shape)
print("response =", dataset.columns.index(y))
for i, name in enumerate(dataset.columns):
    print(dict(
        id=i,
        type=dataset.types[name],
        col_info=dataset[name].nlevels() if dataset.types[name] == "enum" else (dataset[name].mean(), dataset[name].var(na_rm=True)),
        missing_values=dataset[name].isna().sum())){code}

Could you paste here the output of that summary?

Also any other relevant information that you could share would be greatly appreciated.

Thank you!

exalate-issue-sync · 2023-05-11T17:28:44Z

Hassan Hawilo commented: I can share the model and a prediction row that reproduces the error.

exalate-issue-sync · 2023-05-11T17:28:46Z

Hassan Hawilo commented: If you can send me a link or an email address so that I can share the model and a prediction row CSV file privately, that would be appreciated.

exalate-issue-sync · 2023-05-11T17:28:47Z

Tomas Fryda commented: [~accountid:5c108033b5881d1b2e510659] That would be great! [tomas.fryda@h2o.ai|mailto:tomas.fryda@h2o.ai]
Thanks!

exalate-issue-sync · 2023-05-11T17:28:49Z

Hassan Hawilo commented: Done

Many Thanks!

exalate-issue-sync · 2023-05-11T17:28:51Z

Tomas Fryda commented: [~accountid:5c108033b5881d1b2e510659] Thank you for your cooperation! I found the issue; hopefully the fix will be in the next release. The problem was with fold column handling. Since the fold column is the last column of your dataset, I think you can work around it by modifying the MOJO (if you haven't found any other way):

1. Unpack the MOJO.
2. Open the top-level {{model.ini}}.
3. Modify line 11, {{n_features = 1685}} => {{n_features = 1684}}, and save.
4. Compress it again.
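
The steps above can also be scripted. The following is a minimal sketch, assuming the MOJO is an ordinary zip archive whose top-level {{model.ini}} contains a line of the form {{n_features = 1685}}. The function name {{patch_mojo_n_features}} and its parameters are hypothetical and not part of the H2O API; the function writes a patched copy instead of modifying the original file.

```python
import re
import zipfile


def patch_mojo_n_features(src_path, dst_path, old_value, new_value):
    """Copy a MOJO zip to dst_path, rewriting n_features in the top-level model.ini.

    Hypothetical helper (not part of H2O). Returns the number of lines
    changed, which is expected to be 1 for the workaround described above.
    """
    # Match only a whole "n_features = <old_value>" assignment at line start.
    pattern = re.compile(
        r"^([ \t]*n_features[ \t]*=[ \t]*)%d\b" % old_value, re.MULTILINE
    )
    changed = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Only the top-level model.ini is touched; nested files with the
            # same name are copied unchanged.
            if item.filename == "model.ini":
                text, changed = pattern.subn(
                    r"\g<1>%d" % new_value, data.decode("utf-8")
                )
                data = text.encode("utf-8")
            dst.writestr(item, data)
    return changed
```

A call such as {{patch_mojo_n_features(mojo_name, patched_name, 1685, 1684)}} should return 1 if the line was found; the patched file can then be loaded with {{h2o.import_mojo}}. The values 1685 and 1684 are specific to this dataset.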

This worked on the iris dataset; hopefully it will work on yours too. If you use this workaround, please make sure the predictions are the same, for example:

{code:python}import tempfile

import h2o

tempdir = tempfile.mkdtemp()
predictions = se_model.predict(test)
mojo_name = se_model.save_mojo(tempdir)
print(mojo_name)

# PATCH the mojo as described earlier

mojo_model = h2o.import_mojo(mojo_name)
mojo_predictions = mojo_model.predict(test)

(predictions == mojo_predictions).all(){code}
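
If the exact equality check above turns out to be too strict (for example, because of tiny floating-point differences), a tolerance-based comparison may be more practical. The sketch below is an illustration only: {{rows_match}} is a hypothetical helper, not an H2O function, and it works on plain lists of rows rather than on H2OFrames.

```python
import math


def rows_match(rows_a, rows_b, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if two tables (lists of rows) agree cell by cell.

    Hypothetical helper. Cells that both parse as numbers are compared with
    a tolerance; anything else (such as class labels) must match as strings.
    """
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        for a, b in zip(row_a, row_b):
            try:
                fa, fb = float(a), float(b)
            except (TypeError, ValueError):
                # Non-numeric cells: fall back to exact string comparison.
                if str(a) != str(b):
                    return False
                continue
            if math.isnan(fa) and math.isnan(fb):
                continue  # treat two missing values as equal
            if not math.isclose(fa, fb, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True
```

Assuming both prediction frames are first exported to lists of rows (for example with {{as_data_frame(use_pandas=False)}}), the two exports can be passed to this function directly. The default tolerances are arbitrary and should be adjusted to the precision you need.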

h2o-ops · 2023-05-14T20:48:32Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-7962
Assignee: Tomas Fryda
Reporter: Hassan Hawilo
State: Resolved
Fix Version: 3.32.0.4
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#5239

exalate-issue-sync bot added AutoML java Mojo python labels May 11, 2023

h2o-ops assigned tomasfryda May 14, 2023

h2o-ops closed this as completed May 14, 2023

h2o-ops added the fixVersion/3.32.0.4 label May 14, 2023
