va = {} creates an unwritable state #1260

Closed · mpinese opened this issue Jan 16, 2017 · 0 comments

mpinese (Contributor) commented Jan 16, 2017

The following command always fails at the write stage:

hail read test.in.vds annotatevariants expr -c 'va = {}' write -o test.out.vds

The traceback is huge, but I've copied what I think are the relevant parts:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/17 09:24:46 INFO SparkContext: Running Spark version 2.0.2
17/01/17 09:24:46 INFO SecurityManager: Changing view acls to: marpin
17/01/17 09:24:46 INFO SecurityManager: Changing modify acls to: marpin
17/01/17 09:24:46 INFO SecurityManager: Changing view acls groups to:
17/01/17 09:24:46 INFO SecurityManager: Changing modify acls groups to:
17/01/17 09:24:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(marpin); groups with view permissions: Set(); users  with modify permissions: Set(marpin); groups with modify permissions: Set()
17/01/17 09:24:46 INFO Utils: Successfully started service 'sparkDriver' on port 37801.
17/01/17 09:24:46 INFO SparkEnv: Registering MapOutputTracker
17/01/17 09:24:46 INFO SparkEnv: Registering BlockManagerMaster
17/01/17 09:24:46 INFO DiskBlockManager: Created local directory at 
/tmp/hail/blockmgr-522fbeb1-5053-4884-9115-5f2af7bd912a
17/01/17 09:24:46 INFO MemoryStore: MemoryStore started with capacity 15.8 GB
17/01/17 09:24:46 INFO SparkEnv: Registering OutputCommitCoordinator
17/01/17 09:24:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/01/17 09:24:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://129.94.72.55:4040
17/01/17 09:24:46 INFO Executor: Starting executor ID driver on host localhost
17/01/17 09:24:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37833.
17/01/17 09:24:46 INFO NettyBlockTransferService: Server created on 129.94.72.55:37833
17/01/17 09:24:46 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 129.94.72.55, 37833)
17/01/17 09:24:46 INFO BlockManagerMasterEndpoint: Registering block manager 129.94.72.55:37833 with 15.8 GB RAM, BlockManagerId(driver, 129.94.72.55, 37833)
17/01/17 09:24:46 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 129.94.72.55, 37833)
hail: info: running: read test.in.vds
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
hail: info: running: annotatevariants expr -c 'va = {}'
hail: info: running: write -o test.out.vds
[Stage 1:==>                                                      (1 + 24) / 25]
hail: write: caught exception: org.apache.spark.SparkException: Job aborted.
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
        at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
        at org.broadinstitute.hail.variant.RichVDS.write(VariantSampleMatrix.scala:1073)
        at org.broadinstitute.hail.driver.Write$.run(Write.scala:35)
        at org.broadinstitute.hail.driver.Write$.run(Write.scala:6)
        at org.broadinstitute.hail.driver.Command.runCommand(Command.scala:259)
        at org.broadinstitute.hail.driver.Main$.runCommand(Main.scala:91)
        at org.broadinstitute.hail.driver.Main$$anonfun$runCommands$1$$anonfun$1.apply(Main.scala:115)
        at org.broadinstitute.hail.driver.Main$$anonfun$runCommands$1$$anonfun$1.apply(Main.scala:115)
        at org.broadinstitute.hail.utils.package$.time(package.scala:119)
        at org.broadinstitute.hail.driver.Main$$anonfun$runCommands$1.apply(Main.scala:114)
        at org.broadinstitute.hail.driver.Main$$anonfun$runCommands$1.apply(Main.scala:108)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
        at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
        at org.broadinstitute.hail.driver.Main$.runCommands(Main.scala:108)
        at org.broadinstitute.hail.driver.Main$.main(Main.scala:233)
        at org.broadinstitute.hail.driver.Main.main(Main.scala)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 1.0 failed 1 times, most recent failure: Lost task 3.0 in stage 1.0 (TID 4, localhost): org.apache.spark.SparkException: Task failed while writing rows
        at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: org.apache.spark.sql.catalyst.expressions.GenericRow is not a valid external type for schema of boolean
named_struct(contig, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, variant), StructField(contig,StringType,false), StructField(start,IntegerType,false), StructField(ref,StringType,false), StructField(altAlleles,ArrayType(StructType(StructField(ref,StringType,false), StructField(alt,StringType,false)),false),false)), 0, contig), StringType), true), start, validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, variant), StructField(contig,StringType,false), StructField(start,IntegerType,false), StructField(ref,StringType,false), StructField(altAlleles,ArrayType(StructType(StructField(ref,StringType,false), StructField(alt,StringType,false)),false),false)), 1, start), IntegerType), ref, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, variant), StructField(contig,StringType,false), StructField(start,IntegerType,false), StructField(ref,StringType,false), StructField(altAlleles,ArrayType(StructType(StructField(ref,StringType,false), StructField(alt,StringType,false)),false),false)), 2, ref), StringType), true), altAlleles, mapobjects(MapObjects_loopValue8, MapObjects_loopIsNull9, ObjectType(class java.lang.Object), if (isnull(validateexternaltype(lambdavariable(MapObjects_loopValue8, MapObjects_loopIsNull9, ObjectType(class java.lang.Object)), StructField(ref,StringType,false), StructField(alt,StringType,false)))) null else named_struct(ref, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(validateexternaltype(lambdavariable(MapObjects_loopValue8, MapObjects_loopIsNull9, ObjectType(class java.lang.Object)), StructField(ref,StringType,false), StructField(alt,StringType,false)), 0, ref), StringType), true), alt, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(validateexternaltype(lambdavariable(MapObjects_loopValue8, MapObjects_loopIsNull9, ObjectType(class java.lang.Object)), StructField(ref,StringType,false), StructField(alt,StringType,false)), 1, alt), StringType), true)), validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, variant), StructField(contig,StringType,false), StructField(start,IntegerType,false), StructField(ref,StringType,false), StructField(altAlleles,ArrayType(StructType(StructField(ref,StringType,false), StructField(alt,StringType,false)),false),false)), 3, altAlleles), ArrayType(StructType(StructField(ref,StringType,false), StructField(alt,StringType,false)),false)))) AS variant#8
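
I suspect the "GenericRow is not a valid external type for schema of boolean" message means the empty struct produced by va = {} is being represented as a boolean placeholder in the Spark SQL schema, while the annotation value itself is still a Row. I haven't verified that placeholder representation in Hail's source, but assuming it, the following Spark-only sketch reproduces the same encoding failure at write time:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}

object EmptyStructRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("repro").getOrCreate()

    // Schema declares the column as boolean (my assumed placeholder for an
    // empty struct), but the data supplies a nested Row instead.
    val schema = StructType(Seq(StructField("va", BooleanType, nullable = false)))
    val rows = spark.sparkContext.parallelize(Seq(Row(Row())))

    // Encoding is only validated when rows are serialized, so this fails at
    // the write stage with "GenericRow is not a valid external type for
    // schema of boolean", matching the traceback above.
    spark.createDataFrame(rows, schema).write.parquet("/tmp/test.out.parquet")
  }
}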

Attached is a toy test.in.vds that reproduces the problem: test.in.vds.tar.gz. Tested on a clean ed54489 built with gradlew installDist.
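
If the intent is simply to clear va, a possible workaround (untested, and it assumes the expr language accepts non-empty struct literals; the field name keep is arbitrary) would be to replace it with a one-field struct instead of an empty one:

hail read test.in.vds annotatevariants expr -c 'va = {keep: true}' write -o test.out.vds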
