
saveAsTable doesn't create a table after updating to Spark 3.1.1 #349

Open
kimtox opened this issue Feb 25, 2022 · 1 comment


@kimtox

kimtox commented Feb 25, 2022

Hello,
I have a problem after updating Spark from 2.4.3 to 3.1.1. Previously I used the following code to save Parquet files and create the corresponding table, and everything worked fine in tests:

df.write
    .mode(SaveMode.ErrorIfExists)
    .format("parquet").option("path", location)
    .saveAsTable(tableFqn)

But after I moved to Spark 3.1.1, the last line stopped creating the corresponding table (in tests, at least). spark.table(tableFqn) now returns an empty DataFrame.

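For what it's worth, a minimal way to check whether the table was actually registered in the catalog (a sketch, assuming an active SparkSession named spark and the same tableFqn as in the code above, in the form db.table):

```scala
// Hypothetical diagnostic snippet; assumes an active SparkSession `spark`
// and the fully qualified table name `tableFqn` used above.
val Array(db, table) = tableFqn.split('.')
println(spark.catalog.databaseExists(db))     // does the database exist?
println(spark.catalog.tableExists(db, table)) // was the table registered?
spark.catalog.listTables(db).show(truncate = false)
```

If databaseExists already returns false, the NoSuchObjectException warnings below would be consistent with the metastore never having been set up in the test session.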
I also got new warnings, and I suspect they are the root cause of the problem:

2022-02-24 18:37:29.736 [WARN] SparkContext - Using an existing SparkContext; some configuration may not take effect. <ScalaTest-run>
2022-02-24 18:37:33.148 [WARN] HiveConf - HiveConf of name hive.stats.jdbc.timeout does not exist <ScalaTest-run>
2022-02-24 18:37:33.148 [WARN] HiveConf - HiveConf of name hive.stats.retries.wait does not exist <ScalaTest-run>
2022-02-24 18:37:36.709 [WARN] ObjectStore - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0 <ScalaTest-run>
2022-02-24 18:37:36.709 [WARN] ObjectStore - setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore UNKNOWN@192.168.0.200 <ScalaTest-run>
2022-02-24 18:37:36.725 [WARN] ObjectStore - Failed to get database default, returning NoSuchObjectException <ScalaTest-run>
2022-02-24 18:37:37.158 [WARN] ObjectStore - Failed to get database enriched_db, returning NoSuchObjectException <ScalaTest-run>
2022-02-24 18:37:37.174 [WARN] ObjectStore - Failed to get database enriched_db, returning NoSuchObjectException <ScalaTest-run>
2022-02-24 18:37:37.213 [WARN] ObjectStore - Failed to get database global_temp, returning NoSuchObjectException <ScalaTest-run>
2022-02-24 18:37:37.217 [WARN] ObjectStore - Failed to get database enriched_db, returning NoSuchObjectException <ScalaTest-run>

Full log (continuing after the warnings above):

2022-02-24 18:37:38.694 [WARN] package - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'. <ScalaTest-run-running-KimtoxTest>
2022-02-24 18:37:43.584 [WARN] ProcfsMetricsGetter - Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped <driver-heartbeater>
2022-02-24 18:37:44.381 [WARN] SessionState - METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. <ScalaTest-run-running-KimtoxTest>
2022-02-24 18:37:44.448 [WARN] HiveConf - HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist <ScalaTest-run-running-KimtoxTest>
2022-02-24 18:37:44.448 [WARN] HiveConf - HiveConf of name hive.stats.jdbc.timeout does not exist <ScalaTest-run-running-KimtoxTest>
2022-02-24 18:37:44.448 [WARN] HiveConf - HiveConf of name hive.stats.retries.wait does not exist <ScalaTest-run-running-KimtoxTest>
-chgrp: '<MY_COMPANY_NAME>\<MY_USERNAME>' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...

2022-02-24 18:37:46.942 [WARN] ApacheUtils - NoSuchMethodException was thrown when disabling normalizeUri. This indicates you are using an old version (< 4.5.8) of Apache http client. It is recommended to use http client version >= 4.5.9 to avoid the breaking change introduced in apache client 4.5.7 and the latency in exception handling. See https://github.com/aws/aws-sdk-java/issues/1919 for more information <ScalaTest-run-running-FactActualsToEnrichedIntegrationTest>

.....

[]: Expected 5 values but got 0
java.lang.AssertionError: []: Expected 5 values but got 0

Does anybody have any ideas about this behavior? Versions:
Scala - 2.12.15
Spark - 3.1.1
spark-testing-base_2.12 - 3.1.1_1.1.1

@edmondop
Contributor

@kimtox it looks like the default Hive metastore is not started correctly. Can you share a fully reproducible example, maybe as a gist, that also includes your Gradle/Maven/sbt configuration?
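As a starting point, a test setup along these lines is one common way to get an embedded Hive metastore working under ScalaTest (a sketch, assuming spark-hive is on the test classpath; the warehouse and Derby metastore paths are illustrative, not prescribed):

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a test SparkSession with an embedded Hive metastore.
// Assumes the spark-hive module is on the classpath; without
// enableHiveSupport(), saveAsTable uses the in-memory catalog,
// which does not persist across sessions.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("saveAsTable-test")
  .config("spark.sql.warehouse.dir", "target/spark-warehouse")
  .config("javax.jdo.option.ConnectionURL",
    "jdbc:derby:;databaseName=target/metastore_db;create=true")
  .enableHiveSupport()
  .getOrCreate()
```

Note also the first warning in your log ("Using an existing SparkContext; some configuration may not take effect"): if another test has already created a session without Hive support, getOrCreate() will reuse it and your catalog settings may be silently ignored.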
