Semantic hash assumes that `params.files` is a list of concrete file paths, but it is actually a list of file paths that may contain glob expressions. Consider the following example. Part of this ticket must also determine why this was not caught by `test_glob`.
(base) dking@wm28c-761 hail % gsutil cp ./src/test/resources/ldprune2.vcf gs://danking/chr1.vcf
Copying file://./src/test/resources/ldprune2.vcf [Content-Type=text/x-vcard]...
/ [1 files][ 11.5 KiB/ 11.5 KiB]
Operation completed over 1 objects/11.5 KiB.
(base) dking@wm28c-761 hail % gsutil cp ./src/test/resources/ldprune2.vcf gs://danking/chr2.vcf
Copying file://./src/test/resources/ldprune2.vcf [Content-Type=text/x-vcard]...
/ [1 files][ 11.5 KiB/ 11.5 KiB]
Operation completed over 1 objects/11.5 KiB.
(base) dking@wm28c-761 hail % ipython
Python 3.10.9 (main, Jan 11 2023, 09:18:18) [Clang 14.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.16.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import hail as hl
...: hl.import_vcf('gs://danking/chr*.vcf').count()
Initializing Hail with default parameters...
/Users/dking/miniconda3/lib/python3.10/site-packages/hailtop/aiocloud/aiogoogle/user_config.py:29: UserWarning: You have specified the GCS requester pays configuration in both your spark-defaults.conf (/Users/dking/miniconda3/lib/python3.10/site-packages/pyspark/conf/spark-defaults.conf) and either an explicit argument or through `hailctl config`. For GCS requester pays configuration, Hail first checks explicit arguments, then `hailctl config`, then spark-defaults.conf.
warnings.warn(
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/Users/dking/miniconda3/lib/python3.10/site-packages/pyspark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.3.3
SparkUI available at http://192.168.1.142:4040
Welcome to
__ __ <>__
/ /_/ /__ __/ /
/ __ / _ `/ / /
/_/ /_/\_,_/_/_/ version 0.2.125-c4e2880b3279
LOGGING: writing to /Users/dking/projects/hail/hail/hail-20231026-0957-0.2.125-c4e2880b3279.log
---------------------------------------------------------------------------
FatalError Traceback (most recent call last)
Cell In[1], line 2
1 import hail as hl
----> 2 hl.import_vcf('gs://danking/chr*.vcf').count()
File ~/miniconda3/lib/python3.10/site-packages/hail/matrixtable.py:2631, in MatrixTable.count(self)
2618 """Count the number of rows and columns in the matrix.
2619
2620 Examples
(...)
2628 Number of rows, number of cols.
2629 """
2630 count_ir = ir.MatrixCount(self._mir)
-> 2631 return Env.backend().execute(count_ir)
File ~/miniconda3/lib/python3.10/site-packages/hail/backend/backend.py:180, in Backend.execute(self, ir, timed)
178 result, timings = self._rpc(ActionTag.EXECUTE, payload)
179 except FatalError as e:
--> 180 raise e.maybe_user_error(ir) from None
181 if ir.typ == tvoid:
182 value = None
File ~/miniconda3/lib/python3.10/site-packages/hail/backend/backend.py:178, in Backend.execute(self, ir, timed)
176 payload = ExecutePayload(self._render_ir(ir), '{"name":"StreamBufferSpec"}', timed)
177 try:
--> 178 result, timings = self._rpc(ActionTag.EXECUTE, payload)
179 except FatalError as e:
180 raise e.maybe_user_error(ir) from None
File ~/miniconda3/lib/python3.10/site-packages/hail/backend/py4j_backend.py:214, in Py4JBackend._rpc(self, action, payload)
212 if resp.status_code >= 400:
213 error_json = orjson.loads(resp.content)
--> 214 raise fatal_error_from_java_error_triplet(error_json['short'], error_json['expanded'], error_json['error_id'])
215 return resp.content, resp.headers.get('X-Hail-Timings', '')
FatalError: FileNotFoundException: File not found: gs://danking/chr*.vcf
Java stack trace:
java.io.FileNotFoundException: File not found: gs://danking/chr*.vcf
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:984)
at is.hail.io.fs.HadoopFS.fileListEntry(HadoopFS.scala:175)
at is.hail.io.fs.HadoopFS.fileListEntry(HadoopFS.scala:87)
at is.hail.io.fs.FS.fileListEntry(FS.scala:417)
at is.hail.io.fs.FS.fileListEntry$(FS.scala:417)
at is.hail.io.fs.HadoopFS.fileListEntry(HadoopFS.scala:87)
at is.hail.expr.ir.analyses.SemanticHash$.getFileHash(SemanticHash.scala:373)
at is.hail.expr.ir.analyses.SemanticHash$.$anonfun$encode$18(SemanticHash.scala:198)
at scala.collection.immutable.List.foreach(List.scala:431)
at is.hail.expr.ir.analyses.SemanticHash$.encode(SemanticHash.scala:198)
at is.hail.expr.ir.analyses.SemanticHash$.$anonfun$apply$6(SemanticHash.scala:42)
at is.hail.expr.ir.analyses.SemanticHash$.$anonfun$apply$6$adapted(SemanticHash.scala:41)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at is.hail.expr.ir.analyses.SemanticHash$.go$1(SemanticHash.scala:41)
at is.hail.expr.ir.analyses.SemanticHash$.$anonfun$apply$4(SemanticHash.scala:54)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.analyses.SemanticHash$.$anonfun$apply$1(SemanticHash.scala:34)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.analyses.SemanticHash$.apply(SemanticHash.scala:26)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:509)
at is.hail.backend.spark.SparkBackend.$anonfun$execute$4(SparkBackend.scala:546)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.backend.spark.SparkBackend.$anonfun$execute$3(SparkBackend.scala:542)
at is.hail.backend.spark.SparkBackend.$anonfun$execute$3$adapted(SparkBackend.scala:541)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:76)
at is.hail.utils.package$.using(package.scala:657)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:76)
at is.hail.utils.package$.using(package.scala:657)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:62)
at is.hail.backend.spark.SparkBackend.$anonfun$withExecuteContext$3(SparkBackend.scala:368)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.utils.ExecutionTimer$.logTime(ExecutionTimer.scala:59)
at is.hail.backend.spark.SparkBackend.$anonfun$withExecuteContext$2(SparkBackend.scala:364)
at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:541)
at is.hail.backend.BackendHttpHandler.handle(BackendServer.scala:51)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:822)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:794)
at sun.net.httpserver.ServerImpl$DefaultExecutor.execute(ServerImpl.java:199)
at sun.net.httpserver.ServerImpl$Dispatcher.handle(ServerImpl.java:544)
at sun.net.httpserver.ServerImpl$Dispatcher.run(ServerImpl.java:509)
at java.lang.Thread.run(Thread.java:750)
Hail version: 0.2.125-c4e2880b3279
Error summary: FileNotFoundException: File not found: gs://danking/chr*.vcf
Version
0.2.124
Relevant log output
No response
Fixes hail-is#13915
`MatrixVCFReader` accepts glob patterns (wildcards in file names). This
bamboozled `SemanticHash` which had assumed all files had been resolved.
This change fixes this by adding explicit `FileNotFoundException`
handling to `SemanticHash` and replacing the `params.files` object of
`MatrixVCFReader` with the resolved paths.
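
The idea behind the fix can be illustrated with a small Python sketch. This is not Hail's Scala implementation: it uses the local filesystem and the standard `glob` module as a stand-in for the cloud filesystem, and the names `resolve_paths`, `file_fingerprint`, and `semantic_hash` are hypothetical. Returning `None` when a file is missing is also an assumption of this sketch, meant only to mirror the "explicit `FileNotFoundException` handling" described above.

```python
import glob
import hashlib
import os
import tempfile
from typing import List, Optional


def resolve_paths(patterns: List[str]) -> List[str]:
    """Expand each glob pattern into concrete paths (hypothetical helper)."""
    resolved: List[str] = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if not matches:
            raise FileNotFoundError(f"no files match: {pattern}")
        resolved.extend(matches)
    return resolved


def file_fingerprint(path: str) -> bytes:
    """Summarise one file by its path, size, and modification time."""
    # os.stat treats the path literally, so an unexpanded pattern such as
    # 'chr*.vcf' fails here, much like the getFileStatus call in the trace.
    st = os.stat(path)
    return f"{path}:{st.st_size}:{st.st_mtime_ns}".encode()


def semantic_hash(paths: List[str]) -> Optional[str]:
    """Hash the fingerprints of all paths; give up if any file is missing."""
    digest = hashlib.sha256()
    try:
        for path in paths:
            digest.update(file_fingerprint(path))
    except FileNotFoundError:
        # Assumption of this sketch: a missing file means "no hash"
        # rather than a fatal error for the whole query.
        return None
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("chr1.vcf", "chr2.vcf"):
            with open(os.path.join(tmp, name), "w") as f:
                f.write("##fileformat=VCFv4.2\n")
        pattern = os.path.join(tmp, "chr*.vcf")
        # Hashing the raw pattern finds no such file and yields None.
        print(semantic_hash([pattern]))
        # Hashing the resolved paths yields a hex digest.
        print(semantic_hash(resolve_paths([pattern])))
```

Resolving the patterns once and handing the concrete paths to every later stage means the hashing code never needs to know that globs exist, which is the same separation the change makes by replacing `params.files` with the resolved paths.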