[query] rare Google Cloud Storage error #13721

@danking

What happened?

https://batch.hail.is/batches/8043496/jobs/22010 (wlu, all-by-aou)

It seems that we encountered an exception on close. Google searches have not turned up much of use. There is a Stack Overflow question where the fix was to use a different part of the API, but we are not using `createFrom`. I can't even find the source code for `JsonResumableSessionFailureScenario`.

Version

0.2.124

Relevant log output

2023-09-24 01:58:16.721 JVMEntryway: INFO: is.hail.JVMEntryway received arguments:
2023-09-24 01:58:16.721 JVMEntryway: INFO: 0: /hail-jars/gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar
2023-09-24 01:58:16.721 JVMEntryway: INFO: 1: is.hail.backend.service.Main
2023-09-24 01:58:16.721 JVMEntryway: INFO: 2: /batch/8cca2fb0e9764b6195f85b899fb76986
2023-09-24 01:58:16.721 JVMEntryway: INFO: 3: /batch/8cca2fb0e9764b6195f85b899fb76986/log
2023-09-24 01:58:16.721 JVMEntryway: INFO: 4: gs://hail-query-ger0g/jars/13536b531342a263b24a7165bfeec7bd02723e4b.jar
2023-09-24 01:58:16.721 JVMEntryway: INFO: 5: worker
2023-09-24 01:58:16.721 JVMEntryway: INFO: 6: gs://aou_tmp/parallelizeAndComputeWithIndex/OLkY5pgCTBWt2Yw4iCp6WsR2N5drFQmMiQJa7wSE_ik=
2023-09-24 01:58:16.721 JVMEntryway: INFO: 7: 9571
2023-09-24 01:58:16.721 JVMEntryway: INFO: 8: 12185
2023-09-24 01:58:16.721 JVMEntryway: INFO: Yielding control to the QoB Job.
2023-09-24 01:58:16.722 Worker$: INFO: is.hail.backend.service.Worker 13536b531342a263b24a7165bfeec7bd02723e4b
2023-09-24 01:58:16.722 Worker$: INFO: running job 9571/12185 at root gs://aou_tmp/parallelizeAndComputeWithIndex/OLkY5pgCTBWt2Yw4iCp6WsR2N5drFQmMiQJa7wSE_ik= with scratch directory '/batch/8cca2fb0e9764b6195f85b899fb76986'
2023-09-24 01:58:16.729 GoogleStorageFS$: INFO: Initializing google storage client from service account key
2023-09-24 01:58:17.061 WorkerTimer$: INFO: readInputs took 338.458743 ms.
2023-09-24 01:58:17.061 : INFO: RegionPool: initialized for thread 10: pool-2-thread-2
2023-09-24 01:58:17.096 : INFO: RegionPool: REPORT_THRESHOLD: 265.0K allocated (201.0K blocks / 64.0K chunks), regions.size = 5, 0 current java objects, thread 10: pool-2-thread-2
2023-09-24 01:58:17.707 : INFO: RegionPool: REPORT_THRESHOLD: 521.0K allocated (457.0K blocks / 64.0K chunks), regions.size = 9, 0 current java objects, thread 10: pool-2-thread-2
2023-09-24 01:58:18.609 : INFO: RegionPool: REPORT_THRESHOLD: 1.1M allocated (698.0K blocks / 410.0K chunks), regions.size = 19, 0 current java objects, thread 10: pool-2-thread-2
2023-09-24 01:58:19.984 : INFO: RegionPool: REPORT_THRESHOLD: 2.0M allocated (1.0M blocks / 1010.0K chunks), regions.size = 19, 0 current java objects, thread 10: pool-2-thread-2
2023-09-24 01:58:24.240 : INFO: RegionPool: REPORT_THRESHOLD: 4.3M allocated (2.2M blocks / 2.1M chunks), regions.size = 19, 0 current java objects, thread 10: pool-2-thread-2
2023-09-24 01:58:24.240 GoogleStorageFS$: INFO: createNoCompression: gs://aou_tmp/tmp/hail/icullIwHC8dQXtq8JU2uDW/aggregate_intermediates/-ntpjdAQ9sKaR8lK26cV0p5790a4d87-9035-41ae-afc6-326f710d9a89
2023-09-24 01:58:24.305 GoogleStorageFS$: INFO: close: gs://aou_tmp/tmp/hail/icullIwHC8dQXtq8JU2uDW/aggregate_intermediates/-ntpjdAQ9sKaR8lK26cV0p5790a4d87-9035-41ae-afc6-326f710d9a89
2023-09-24 01:58:51.513 : INFO: TaskReport: stage=0, partition=9571, attempt=0, peakBytes=4507648, peakBytesReadable=4.30 MiB, chunks requested=51, cache hits=0
2023-09-24 01:58:51.513 : INFO: RegionPool: FREE: 4.3M allocated (2.2M blocks / 2.1M chunks), regions.size = 19, 0 current java objects, thread 10: pool-2-thread-2
2023-09-24 01:58:51.515 JVMEntryway: ERROR: QoB Job threw an exception.
java.lang.reflect.InvocationTargetException: null
	at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_382]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
	at is.hail.JVMEntryway$1.run(JVMEntryway.java:119) ~[jvm-entryway.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_382]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_382]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_382]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
Caused by: is.hail.relocated.com.google.cloud.storage.StorageException: Missing Range header in response
	|> PUT https://storage.googleapis.com/upload/storage/v1/b/aou_tmp/o?name=tmp/hail/icullIwHC8dQXtq8JU2uDW/aggregate_intermediates/-ntpjdAQ9sKaR8lK26cV0p5790a4d87-9035-41ae-afc6-326f710d9a89&uploadType=resumable&upload_id=ADPycdtl5JSqwvftT4W190_-ueC032I_oZcwLAlVVMFkqp06W4eY8b-XMwf8DeT7If9I7uIgmI_PLCuFsExsT0aEh2b4FrHtAiUktumQbvgl1U0icw
	|> content-range: bytes */*
	|  
	|< HTTP/1.1 308 Resume Incomplete
	|< content-length: 0
	|< content-type: text/plain; charset=utf-8
	|< x-guploader-uploadid: ADPycdtl5JSqwvftT4W190_-ueC032I_oZcwLAlVVMFkqp06W4eY8b-XMwf8DeT7If9I7uIgmI_PLCuFsExsT0aEh2b4FrHtAiUktumQbvgl1U0icw
	|  
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:185) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:117) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:98) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionQueryTask.call(JsonResumableSessionQueryTask.java:100) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.query(JsonResumableSession.java:57) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.lambda$put$0(JsonResumableSession.java:73) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.Retrying.lambda$run$0(Retrying.java:102) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.RetryHelper.run(RetryHelper.java:76) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.Retrying.run(Retrying.java:99) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.put(JsonResumableSession.java:68) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.ApiaryUnbufferedWritableByteChannel.internalWrite(ApiaryUnbufferedWritableByteChannel.java:114) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.ApiaryUnbufferedWritableByteChannel.writeAndClose(ApiaryUnbufferedWritableByteChannel.java:65) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.UnbufferedWritableByteChannelSession$UnbufferedWritableByteChannel.writeAndClose(UnbufferedWritableByteChannelSession.java:40) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.DefaultBufferedWritableByteChannel.close(DefaultBufferedWritableByteChannel.java:166) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.StorageByteChannels$SynchronizedBufferedWritableByteChannel.close(StorageByteChannels.java:119) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.StorageException.wrapIOException(StorageException.java:179) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.BaseStorageWriteChannel.close(BaseStorageWriteChannel.java:84) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.$anonfun$close$2(GoogleStorageFS.scala:326) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.doHandlingRequesterPays(GoogleStorageFS.scala:296) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.$anonfun$close$1(GoogleStorageFS.scala:326) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
	at is.hail.services.package$.retryTransientErrors(package.scala:182) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.close(GoogleStorageFS.scala:324) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at java.io.FilterOutputStream.close(FilterOutputStream.java:159) ~[?:1.8.0_382]
	at __C1867collect_distributed_array_table_aggregate.apply(Unknown Source) ~[?:?]
	at __C1867collect_distributed_array_table_aggregate.apply(Unknown Source) ~[?:?]
	at is.hail.backend.BackendUtils.$anonfun$collectDArray$16(BackendUtils.scala:91) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.utils.package$.using(package.scala:637) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.annotations.RegionPool.scopedRegion(RegionPool.scala:162) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.BackendUtils.$anonfun$collectDArray$15(BackendUtils.scala:90) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Worker$.$anonfun$main$12(Worker.scala:167) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
	at is.hail.services.package$.retryTransientErrors(package.scala:182) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Worker$.$anonfun$main$11(Worker.scala:166) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Worker$.$anonfun$main$11$adapted(Worker.scala:164) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.utils.package$.using(package.scala:637) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Worker$.main(Worker.scala:164) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Main$.main(Main.scala:14) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Main.main(Main.scala) ~[gs:__hail-query-ger0g_jars_13536b531342a263b24a7165bfeec7bd02723e4b.jar.jar:0.0.1-SNAPSHOT]
	... 11 more
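
For context on the `308 Resume Incomplete` response above: as I read Google's resumable upload documentation, a status query (`content-range: bytes */*`) that comes back with no `Range` header means no bytes have been persisted yet, so a client can restart the upload from offset 0. Below is a minimal Python sketch of that interpretation. `resume_offset` is a hypothetical helper written for illustration; it is not part of Hail or of the google-cloud-storage client, and it does not claim to mirror what `JsonResumableSessionQueryTask` actually does.

```python
import re
from typing import Mapping, Optional

# A persisted-range header looks like "bytes=0-42" (inclusive last byte).
_RANGE_RE = re.compile(r"^bytes=0-(\d+)$")


def resume_offset(status: int, headers: Mapping[str, str]) -> Optional[int]:
    """Interpret the response to a resumable-upload status query.

    Returns the byte offset at which to resume, or None if the upload
    has already completed. Hypothetical helper, for illustration only.
    """
    if status in (200, 201):
        # The object was finalized; nothing left to send.
        return None
    if status != 308:
        raise ValueError(f"unexpected status for a status query: {status}")

    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    range_value = lowered.get("range")
    if range_value is None:
        # Assumption from the resumable upload docs: no Range header means
        # nothing has been persisted, so start again from the beginning.
        return 0

    match = _RANGE_RE.match(range_value.strip())
    if match is None:
        raise ValueError(f"unparseable Range header: {range_value!r}")
    # The header gives the last persisted byte; resume at the next one.
    return int(match.group(1)) + 1
```

Applied to the headers in the log (only `content-length`, `content-type`, and `x-guploader-uploadid`), this helper returns 0, which suggests the failing request could in principle be handled by restarting the write rather than raising.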
