asH2OFrame Method Could Fail on a String Column Having More Than 10 Million Distinct Values #3106

Closed
exalate-issue-sync bot opened this issue May 22, 2023 · 2 comments

@exalate-issue-sync

By design, H2O-3 does not support categorical columns with a cardinality higher than 10 million values. The H2O.import function throws an appropriate exception and asks the user to import the column as a string column.

The asH2OFrame function tries to mimic the behavior of the h2o.import function so that model quality is the same regardless of how the data were imported into the H2O-3 cluster.

If a column is identified as categorical and its cardinality is higher than 10 million, calling the asH2OFrame function could lead to the following exception:

{{Stacktrace: [DistributedException from ip-100-84-158-78.eu-west-1.compute.internal/100.84.158.78:54321: 'null', caused by java.lang.NegativeArraySizeException, water.MRTask.getResult(MRTask.java:494), water.MRTask.getResult(MRTask.java:502), water.MRTask.doAll(MRTask.java:409), water.MRTask.doAllNodes(MRTask.java:421), ai.h2o.sparkling.extensions.rest.api.ImportFrameHandler.finalize(ImportFrameHandler.scala:42), sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method), sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62), sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43), java.lang.reflect.Method.invoke(Method.java:498), water.api.Handler.handle(Handler.java:60), water.api.RequestServer.serve(RequestServer.java:470), water.api.RequestServer.doGeneric(RequestServer.java:301), water.api.RequestServer.doPost(RequestServer.java:227), javax.servlet.http.HttpServlet.service(HttpServlet.java:707), javax.servlet.http.HttpServlet.service(HttpServlet.java:790), ai.h2o.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848), ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584), ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180), ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512), ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112), ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141), ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119), ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134), water.webserver.jetty9.Jetty9ServerAdapter$LoginHandler.handle(Jetty9ServerAdapter.java:119), ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119), 
ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134), ai.h2o.org.eclipse.jetty.server.Server.handle(Server.java:534), ai.h2o.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320), ai.h2o.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251), ai.h2o.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283), ai.h2o.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108), ai.h2o.org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93), ai.h2o.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303), ai.h2o.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148), ai.h2o.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136), ai.h2o.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671), ai.h2o.org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589), java.lang.Thread.run(Thread.java:748), Caused by:java.lang.NegativeArraySizeException, water.MemoryManager.malloc(MemoryManager.java:243), water.MemoryManager.malloc1(MemoryManager.java:274), water.MemoryManager.malloc1(MemoryManager.java:272), water.parser.PackedDomains.merge(PackedDomains.java:77), ai.h2o.sparkling.extensions.internals.CollectCategoricalDomainsTask$1.compute2(CollectCategoricalDomainsTask.java:79), water.H2O$H2OCountedCompleter.compute(H2O.java:1557), jsr166y.CountedCompleter.exec(CountedCompleter.java:468), jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263), jsr166y.ForkJoinTask.doInvoke(ForkJoinTask.java:360), jsr166y.ForkJoinTask.invokeAll(ForkJoinTask.java:741), ai.h2o.sparkling.extensions.internals.CollectCategoricalDomainsTask.reduce(CollectCategoricalDomainsTask.java:84), 
ai.h2o.sparkling.extensions.internals.CollectCategoricalDomainsTask.reduce(CollectCategoricalDomainsTask.java:29), water.MRTask.reduce4(MRTask.java:776), water.MRTask.reduce3(MRTask.java:763), water.MRTask.postLocal0(MRTask.java:730), water.MRTask.onCompletion(MRTask.java:703), jsr166y.CountedCompleter.__tryComplete(CountedCompleter.java:425), water.RPC$2.compute2(RPC.java:622), water.H2O$H2OCountedCompleter.compute(H2O.java:1557), jsr166y.CountedCompleter.exec(CountedCompleter.java:468), jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263), jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)}}
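
The trace ends in `water.MemoryManager.malloc`, reached from `PackedDomains.merge`, with a `NegativeArraySizeException`. One plausible reading, which the ticket does not confirm, is that the byte length of the merged domain exceeds what a signed 32-bit `int` can hold and wraps to a negative number. The sketch below only illustrates that arithmetic; `to_int32` and `merged_size_int32` are hypothetical helpers, not H2O-3 functions.

```python
def to_int32(n: int) -> int:
    """Wrap an arbitrary Python int the way a signed 32-bit Java int would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def merged_size_int32(len_a: int, len_b: int) -> int:
    """Size of a merged buffer if the sum is computed in 32-bit arithmetic.

    Assumption: the merge allocates one array sized from the two inputs.
    This is an illustration, not the actual PackedDomains.merge logic.
    """
    return to_int32(len_a + len_b)


if __name__ == "__main__":
    # Two packed domains of about 1.5 GB each: the 32-bit sum is negative,
    # which is the kind of value that makes an array allocation fail.
    print(merged_size_int32(1_500_000_000, 1_500_000_000))
```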

The goal of this ticket is to assign the H2O String type to columns in this scenario.
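
The intended fallback can be sketched as a simple cardinality check. This is a minimal illustration under stated assumptions, not the Sparkling Water implementation: `choose_column_type` and the type labels `"enum"` and `"string"` are hypothetical names, and the step that first identifies a column as categorical is not modeled. Only the 10 million limit comes from the ticket.

```python
from typing import Hashable, Iterable

# Limit on categorical cardinality stated in the ticket.
MAX_CATEGORICAL_CARDINALITY = 10_000_000


def choose_column_type(
    values: Iterable[Hashable],
    limit: int = MAX_CATEGORICAL_CARDINALITY,
) -> str:
    """Return "enum" while the distinct count stays within `limit`,
    otherwise fall back to "string".

    Hypothetical helper: it stops scanning as soon as the limit is
    exceeded, so it never holds more than `limit + 1` distinct values.
    """
    seen = set()
    for value in values:
        seen.add(value)
        if len(seen) > limit:
            return "string"
    return "enum"


if __name__ == "__main__":
    # A small limit keeps the demonstration cheap.
    print(choose_column_type(["a", "b", "a"], limit=2))
    print(choose_column_type(["a", "b", "c"], limit=2))
```

Stopping at the first value past the limit avoids collecting the full domain of a column that will be treated as a string anyway.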

@DinukaH2O

JIRA Issue Migration Info

Jira Issue: SW-2449
Assignee: Marek Novotny
Reporter: Marek Novotny
State: Resolved
Fix Version: 3.32.0.1-2
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#2356
#2358
#2341

@hasithjp

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2020-09-24T12:30:10.386-0700
