Exception while training K-Means model with nfold #7888

exalate-issue-sync · 2023-05-11T18:24:11Z

From Google Groups:

I am using H2O Flow to train a k-means clustering model. When I use the attached data file and select all the columns for training, the job frequently fails with the following exception:

JOB FAILURE.

java.lang.ArrayIndexOutOfBoundsException: 4

java.lang.ArrayIndexOutOfBoundsException: 4
    at water.util.ArrayUtils.add(ArrayUtils.java:239)
    at hex.ModelMetricsClustering$MetricBuilderClustering.reduce(ModelMetricsClustering.java:131)
    at hex.ModelMetricsClustering$MetricBuilderClustering.reduce(ModelMetricsClustering.java:80)
    at hex.ModelBuilder.cv_mainModelScores(ModelBuilder.java:804)
    at hex.ModelBuilder.computeCrossValidation(ModelBuilder.java:518)
    at hex.ModelBuilder$1.compute2(ModelBuilder.java:364)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
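
The trace ends in an element-wise array add called while cross-validation metrics are being reduced. One possible reading, which is an assumption and not something confirmed in this report, is that the arrays being merged have different lengths, for example because estimate_k settled on fewer clusters for one model than for another. The sketch below is not H2O code: `add_in_place` and the cluster sizes are hypothetical and only illustrate how such a mismatch would fail at index 4.

```python
def add_in_place(a, b):
    """Element-wise a[i] += b[i], looping over the length of `a` only.

    Hypothetical stand-in for an array add that assumes equal lengths.
    """
    for i in range(len(a)):
        a[i] += b[i]  # fails once i reaches len(b) if b is shorter
    return a


# Hypothetical per-cluster sizes, invented for illustration only.
five_cluster_sizes = [10, 20, 30, 25, 15]  # model with k = 5
four_cluster_sizes = [40, 30, 20, 10]      # model where fewer clusters were kept

try:
    add_in_place(five_cluster_sizes, four_cluster_sizes)
    error = None
except IndexError as exc:
    # Python raises IndexError where Java raises ArrayIndexOutOfBoundsException.
    error = exc
```

If this reading is right, the failure would depend on whether every model ends up with the same number of clusters, which would also fit the intermittent nature of the exception.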

I changed only three of the default hyperparameters when training this model:

nfolds - 5

k - 5

estimate_k - checked

All other parameters were left at their defaults. The same data frame was used for training and validation.

Here are the complete buildModel parameters:

buildModel 'kmeans', {"model_id":"kmeans-9b5a609c-acd7-4303-9de2-5f084493e75d","training_frame":"MLM_10k.hex","validation_frame":"MLM_10k.hex","nfolds":5,"ignored_columns":[],"ignore_const_cols":true,"k":5,"estimate_k":true,"max_iterations":10,"standardize":true,"init":"Furthest","fold_assignment":"AUTO","score_each_iteration":false,"seed":-1,"max_runtime_secs":0,"categorical_encoding":"AUTO","keep_cross_validation_models":true,"keep_cross_validation_predictions":false,"keep_cross_validation_fold_assignment":false,"cluster_size_constraints":[]}
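
The parameters that differ from the defaults can be checked programmatically before a job is submitted. The snippet below is a minimal sketch: it parses a subset of the JSON above and flags the combination of cross-validation and estimate_k. The helper `uses_cv_with_estimate_k` is hypothetical, and treating this combination as the trigger is an assumption based on this report, not a documented H2O rule.

```python
import json

# Subset of the buildModel parameters quoted in the report.
params_json = (
    '{"training_frame": "MLM_10k.hex", "validation_frame": "MLM_10k.hex", '
    '"nfolds": 5, "k": 5, "estimate_k": true}'
)


def uses_cv_with_estimate_k(params):
    """Return True when cross-validation and estimate_k are both enabled."""
    return params.get("nfolds", 0) > 1 and bool(params.get("estimate_k", False))


params = json.loads(params_json)
flagged = uses_cv_with_estimate_k(params)
```

A check like this could be used to warn users on affected versions until they upgrade to a release that contains the fix.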

I have checked the input data file and don't see anything wrong with it. The only thing I found is that if I ignore the PaymentAmtQtr3 and PaymentAmtQtr4 columns, the model builds successfully. I don't see anything wrong in these columns, though, so I am not sure what the issue is.

I am using the latest H2O version, 3.30.1.1.

Data can be downloaded here:
https://groups.google.com/g/h2ostream/c/UYDyr6hs3zE?pli=1

h2o-ops · 2023-05-14T21:20:25Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-7752
Assignee: Veronika Maurerová
Reporter: Jan Sterba
State: Resolved
Fix Version: 3.36.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#5961

h2o-ops closed this as completed May 14, 2023

h2o-ops added the fixVersion/3.36.0.1 label May 14, 2023
