
Assertion error when filtering rows based on locus position #12280

Closed

nidabdella opened this issue Oct 6, 2022 · 1 comment

nidabdella commented Oct 6, 2022

Hail fails with an assertion error when filtering a MatrixTable based on locus position.

To reproduce

import hail as hl
mt = hl.import_vcf("http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz", force_bgz=True)
----------------------------------------------------------------------
Initializing Hail with default parameters...
2022-10-06 15:56:03 WARN  Utils:69 - Your hostname, nid resolves to a loopback address: 127.0.1.1; using 192.168.248.80 instead (on interface wlp0s20f3)
2022-10-06 15:56:03 WARN  Utils:69 - Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/med/.local/lib/python3.8/site-packages/pyspark/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2022-10-06 15:56:03 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.1.3
SparkUI available at http://192.168.248.80:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.100-2ea2615a797a
LOGGING: writing to /
--------------------------------------------------------------------------
mt.filter_rows(mt.locus.position == 2867101).count_rows()

Expected

A count of the rows matching the filter condition.

Error

FatalError: AssertionError: assertion failed

Java stack trace:
java.lang.AssertionError: assertion failed
        at scala.Predef$.assert(Predef.scala:208)
        at is.hail.expr.ir.LoweredTableReader$.makeCoercer(TableIR.scala:135)
        at is.hail.expr.ir.GenericTableValue.getLTVCoercer(GenericTableValue.scala:137)
        at is.hail.expr.ir.GenericTableValue.toTableStage(GenericTableValue.scala:162)
        at is.hail.io.vcf.MatrixVCFReader.lower(LoadVCF.scala:1798)
        at is.hail.expr.ir.lowering.LowerTableIR$.applyTable(LowerTableIR.scala:717)
        at is.hail.expr.ir.lowering.LowerTableIR$.lower$2(LowerTableIR.scala:697)
        at is.hail.expr.ir.lowering.LowerTableIR$.applyTable(LowerTableIR.scala:903)
        at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:467)
        at is.hail.expr.ir.lowering.LowerTableIR$.apply(LowerTableIR.scala:472)
        at is.hail.expr.ir.lowering.LowerToCDA$.lower(LowerToCDA.scala:73)
        at is.hail.expr.ir.lowering.LowerToCDA$.apply(LowerToCDA.scala:18)
        at is.hail.expr.ir.lowering.LowerToDistributedArrayPass.transform(LoweringPass.scala:77)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:27)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
        at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
        at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
        at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
        at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
        at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
        at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
        at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
        at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
        at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
        at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
        at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
        at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
        at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:416)
        at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:452)
        at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:70)
        at is.hail.utils.package$.using(package.scala:646)
        at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:70)
        at is.hail.utils.package$.using(package.scala:646)
        at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
        at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:59)
        at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:310)
        at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:449)
        at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
        at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:448)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.base/java.lang.Thread.run(Thread.java:829)



Hail version: 0.2.100-2ea2615a797a
Error summary: AssertionError: assertion failed
danking (Contributor) commented Oct 6, 2022

Huh. Well, this is a terrible error message, but the short answer is that Hail doesn't support reading directly from an HTTP(S) server. You can either download that file or use a dataset that is available in a cloud storage bucket.
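Something along these lines should work as a workaround. This is a minimal sketch of the download-first approach; the wget call and the local path are illustrative assumptions, not the only way to do it:

# Sketch: download the VCF once, then import from local disk instead of
# over HTTP. The local path here is a hypothetical example.
import subprocess
import hail as hl

url = ("http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/"
       "ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz")
local_path = "/tmp/chrY.genotypes.vcf.gz"  # hypothetical local destination

# Fetch the file a single time; subsequent imports read from local disk.
subprocess.run(["wget", "-O", local_path, url], check=True)

mt = hl.import_vcf(local_path, force_bgz=True)
mt.filter_rows(mt.locus.position == 2867101).count_rows()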

In general, you'll want to convert to Hail's native MatrixTable format before you do further analysis anyway.
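For example, a minimal sketch of that conversion, assuming the VCF was already downloaded locally (paths are illustrative):

import hail as hl

mt = hl.import_vcf("/tmp/chrY.genotypes.vcf.gz", force_bgz=True)
mt.write("/tmp/chrY.mt")  # one-time conversion to Hail's native format

# Reading the native format back is cheap, and queries run much faster
# than re-parsing the VCF each time.
mt = hl.read_matrix_table("/tmp/chrY.mt")
mt.filter_rows(mt.locus.position == 2867101).count_rows()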

I'll fix this to give a more reasonable error message. In general, though, not all HTTP(S) servers support the Range header, so Hail can't efficiently read from them. If you're looking for public datasets to experiment with, I strongly recommend the Dense Hail MatrixTable of the HGDP+1KG dataset, hosted for free by the three major clouds: https://gnomad.broadinstitute.org/downloads#v3-hgdp-1kg. Reading it looks roughly like the sketch below.
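A hedged sketch: the real bucket path is listed on the gnomAD downloads page linked above, and the path below is only a placeholder, not a verified URL.

import hail as hl

# Placeholder path; substitute the actual path from the gnomAD downloads page.
hgdp_1kg_path = "gs://<bucket-from-gnomad-downloads-page>/hgdp_1kg_dense.mt"
mt = hl.read_matrix_table(hgdp_1kg_path)
mt.describe()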

danking closed this as completed Oct 6, 2022