By design, H2O-3 does not support categorical columns with a cardinality higher than 10 million values. The h2o.import function throws an appropriate exception and asks the user to import the column as a string column.
The asH2OFrame function tries to mimic the behavior of the h2o.import function so that models have the same quality regardless of how the data was imported into the H2O-3 cluster.
If a column is identified as categorical and its cardinality is higher than 10 million, calling asH2OFrame can fail with a java.lang.NegativeArraySizeException. The full stack trace is given after the Jira details below.
Jira Issue: SW-2449
Assignee: Marek Novotny
Reporter: Marek Novotny
State: Resolved
Fix Version: 3.32.0.1-2
Attachments: N/A
Development PRs: Available
Stacktrace:
[DistributedException from ip-100-84-158-78.eu-west-1.compute.internal/100.84.158.78:54321: 'null', caused by java.lang.NegativeArraySizeException,
water.MRTask.getResult(MRTask.java:494),
water.MRTask.getResult(MRTask.java:502),
water.MRTask.doAll(MRTask.java:409),
water.MRTask.doAllNodes(MRTask.java:421),
ai.h2o.sparkling.extensions.rest.api.ImportFrameHandler.finalize(ImportFrameHandler.scala:42),
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method),
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62),
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43),
java.lang.reflect.Method.invoke(Method.java:498),
water.api.Handler.handle(Handler.java:60),
water.api.RequestServer.serve(RequestServer.java:470),
water.api.RequestServer.doGeneric(RequestServer.java:301),
water.api.RequestServer.doPost(RequestServer.java:227),
javax.servlet.http.HttpServlet.service(HttpServlet.java:707),
javax.servlet.http.HttpServlet.service(HttpServlet.java:790),
ai.h2o.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848),
ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584),
ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180),
ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512),
ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112),
ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141),
ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119),
ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134),
water.webserver.jetty9.Jetty9ServerAdapter$LoginHandler.handle(Jetty9ServerAdapter.java:119),
ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119),
ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134),
ai.h2o.org.eclipse.jetty.server.Server.handle(Server.java:534),
ai.h2o.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320),
ai.h2o.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251),
ai.h2o.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283),
ai.h2o.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108),
ai.h2o.org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93),
ai.h2o.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303),
ai.h2o.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148),
ai.h2o.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136),
ai.h2o.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671),
ai.h2o.org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589),
java.lang.Thread.run(Thread.java:748),
Caused by: java.lang.NegativeArraySizeException,
water.MemoryManager.malloc(MemoryManager.java:243),
water.MemoryManager.malloc1(MemoryManager.java:274),
water.MemoryManager.malloc1(MemoryManager.java:272),
water.parser.PackedDomains.merge(PackedDomains.java:77),
ai.h2o.sparkling.extensions.internals.CollectCategoricalDomainsTask$1.compute2(CollectCategoricalDomainsTask.java:79),
water.H2O$H2OCountedCompleter.compute(H2O.java:1557),
jsr166y.CountedCompleter.exec(CountedCompleter.java:468),
jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263),
jsr166y.ForkJoinTask.doInvoke(ForkJoinTask.java:360),
jsr166y.ForkJoinTask.invokeAll(ForkJoinTask.java:741),
ai.h2o.sparkling.extensions.internals.CollectCategoricalDomainsTask.reduce(CollectCategoricalDomainsTask.java:84),
ai.h2o.sparkling.extensions.internals.CollectCategoricalDomainsTask.reduce(CollectCategoricalDomainsTask.java:29),
water.MRTask.reduce4(MRTask.java:776),
water.MRTask.reduce3(MRTask.java:763),
water.MRTask.postLocal0(MRTask.java:730),
water.MRTask.onCompletion(MRTask.java:703),
jsr166y.CountedCompleter.__tryComplete(CountedCompleter.java:425),
water.RPC$2.compute2(RPC.java:622),
water.H2O$H2OCountedCompleter.compute(H2O.java:1557),
jsr166y.CountedCompleter.exec(CountedCompleter.java:468),
jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263),
jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)]
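The root cause in the trace is a byte-array allocation (MemoryManager.malloc1) requested from PackedDomains.merge while categorical domains are being collected. One plausible reading, which the ticket does not confirm, is that a buffer size computed in 32-bit arithmetic wrapped around to a negative value. The sketch below only illustrates that general Java failure mode. It is not the H2O-3 source, and the class name, method names, and sizes are hypothetical.

```java
// Illustration only: NOT the H2O-3 implementation. All names are hypothetical.
final class DomainMergeOverflow {

    // Size of a buffer meant to hold two merged domains, computed with
    // 32-bit int arithmetic. Java int addition wraps silently on overflow.
    static int mergedSize(int leftBytes, int rightBytes) {
        return leftBytes + rightBytes;
    }

    // Returns true when allocating the merged buffer throws
    // NegativeArraySizeException, the exception seen in the trace.
    static boolean allocationFails(int leftBytes, int rightBytes) {
        try {
            byte[] merged = new byte[mergedSize(leftBytes, rightBytes)];
            return merged == null; // allocation succeeded, so this is false
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Two assumed domain sizes that are each valid ints but whose sum is not.
        int left = 1_500_000_000;
        int right = 1_500_000_000;
        System.out.println(mergedSize(left, right));      // a negative number
        System.out.println(allocationFails(left, right)); // true
        System.out.println(allocationFails(16, 16));      // false
    }
}
```

If this reading is right, the failure is a symptom of the unsupported cardinality rather than a memory shortage, which is consistent with the ticket's choice to avoid the categorical path for such columns.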
The goal of this ticket is to assign the H2O String type to such columns in this scenario.
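The intended rule can be sketched as a small type-resolution step. This is a minimal sketch under stated assumptions, not the actual Sparkling Water change: the ColumnTypeChooser class, the ColumnType enum, and the resolve method are hypothetical names, and only the 10 million limit comes from the ticket.

```java
// Minimal sketch of the fallback described in the ticket.
// All names are hypothetical; only the 10 million limit is from the ticket.
final class ColumnTypeChooser {

    enum ColumnType { CATEGORICAL, STRING }

    // H2O-3 does not support categorical columns above this cardinality.
    static final long MAX_CATEGORICAL_CARDINALITY = 10_000_000L;

    // Keeps the detected type unless the column is categorical and its
    // cardinality exceeds the limit, in which case it falls back to STRING.
    static ColumnType resolve(ColumnType detected, long cardinality) {
        if (detected == ColumnType.CATEGORICAL
                && cardinality > MAX_CATEGORICAL_CARDINALITY) {
            return ColumnType.STRING;
        }
        return detected;
    }

    public static void main(String[] args) {
        System.out.println(resolve(ColumnType.CATEGORICAL, 500L));        // CATEGORICAL
        System.out.println(resolve(ColumnType.CATEGORICAL, 10_000_000L)); // CATEGORICAL
        System.out.println(resolve(ColumnType.CATEGORICAL, 10_000_001L)); // STRING
    }
}
```

The comparison is strict because the ticket speaks of cardinality "higher than" 10 million, so a column with exactly 10 million distinct values stays categorical in this sketch.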