sw does not support parquet import #4170
Comments
Michal Malohlava commented: The original intention was not to include h2o-parsers in the SW assembly, since they are not fully featured and depend on fixed versions of libraries (e.g., Avro), which can lead to unexpected behavior.
Michal Kurka commented: This most likely needs to be fixed on H2O's side. We should use Parquet 1.8.x (the version compatible with Spark).
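Aligning versions like this is typically done by pinning the dependency in the build. A hypothetical Gradle sketch (the exact module H2O pins is an assumption here; `parquet-hadoop` is the usual entry point):

```groovy
// Sketch: pin Parquet to the 1.8.x line shipped with Spark 2.1.x
// so H2O's parser and Spark's runtime agree on the Parquet API.
dependencies {
    implementation 'org.apache.parquet:parquet-hadoop:1.8.1'
}
```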
Michal Kurka commented: Was able to reproduce on 2.1.14 as well.
Jakub Hava commented: This was fixed in H2O.
Michal Kurka commented: Parquet import was not working because the Parquet libraries are part of Spark's distribution and are loaded by a different class loader. We were accessing the package-private class InternalParquetRecordReader, which was throwing IllegalAccessException even though the calling class was in the same package (it was, however, loaded by a different class loader with a different protection domain). The solution was to copy InternalParquetRecordReader into H2O's code base and adapt it for our purposes. This class only uses the public developer-facing API. We also adopted Parquet 1.8.1. This was not tested on Spark 2.2.x.
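The failure Michal describes follows from how the JVM defines a "runtime package": the package name plus the defining class loader. Two classes with identical package names but different loaders are in different runtime packages, so package-private access between them fails. A minimal, self-contained sketch (class and loader names here are illustrative, not H2O's code) showing that the same class bytes loaded by a second loader yield a distinct class:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassLoaderDemo {
    // A loader that defines its own copy of ClassLoaderDemo from the class
    // bytes on the classpath instead of delegating to its parent loader.
    static class IsolatingLoader extends ClassLoader {
        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals("ClassLoaderDemo")) {
                try (InputStream in = getSystemResourceAsStream("ClassLoaderDemo.class");
                     ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
                    byte[] bytes = out.toByteArray();
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> copy = new IsolatingLoader().loadClass("ClassLoaderDemo");
        // Same fully-qualified name, but distinct Class objects in
        // distinct runtime packages:
        System.out.println(copy == ClassLoaderDemo.class);                          // false
        System.out.println(copy.getName().equals(ClassLoaderDemo.class.getName())); // true
    }
}
```

Because the two classes land in different runtime packages, any package-private member access between them throws IllegalAccessError at runtime. Copying InternalParquetRecordReader into H2O's own codebase puts caller and callee under the same loader, which is why the fix works.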
JIRA Issue Migration Info
Jira Issue: SW-542
Linked PRs from JIRA
Jira Issue Created Date: 2017-09-26T15:27:59.366-0700
Parquet parser is not registered:
{code:java}
09-26 20:21:13.862 127.0.0.1:54321 4219 #r thread INFO: Registered parsers: [GUESS, ARFF, XLS, SVMLight, CSV]
{code}
Steps to reproduce:
{code}
library(sparklyr)
library(h2o)
options(rsparkling.sparklingwater.version = "2.1.14")
library(rsparkling)
Sys.setenv(SPARK_HOME="~/spark/spark-2.1.0-bin-hadoop2.7")
config <- spark_config()
config$`sparklyr.shell.driver-memory` <- '7G'
config$`sparklyr.shell.executor-memory` <- '7G'
sc <- spark_connect(master='local', version='2.1.0', config=config)
h2o_context(sc)
h2o.clusterInfo()
h2o.importFile("/Users/nidhimehta/full/",destination_frame = "full")
{code}