
Invalid server response makes Google pipelines execution crash #1163

Closed
mozack opened this issue May 28, 2019 · 5 comments

Comments

mozack (Contributor) commented May 28, 2019

Bug report

Expected behavior and actual behavior

Errors returned from the Google Genomics Pipelines API appear to generate exceptions which bubble up and cause the workflow to shut down.

Using retries or an errorStrategy of 'ignore' does not solve the problem.
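
As a hedged illustration (the exact configuration used is not shown in this report; the process name salmon_quant is taken from the logs below), this is the kind of Nextflow config that fails to help:

    // Sketch only: per-process retry/ignore settings in nextflow.config.
    // These appear not to help because the exception is raised in the
    // executor's submit/poll path (see the stack traces below), not in
    // the task's own error-handling scope.
    process {
        withName: salmon_quant {
            errorStrategy 'retry'
            maxRetries 3
        }
    }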

Steps to reproduce the problem

The problem is sporadic. I have run this pipeline against 100 samples without error. However, when run against ~1000 samples, the error usually appears.

Program output

The Nextflow log files from two distinct failures are attached. One fails in GooglePipelinesTaskHandler.checkIfCompleted() with "503 Service Unavailable" and the other in GooglePipelinesTaskHandler.submit() with "410 Gone".

For the "503 Service Unavailable" response, the recommendation is to retry with a backoff.
"The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff."
See: https://cloud.google.com/genomics/reference/rest/Shared.Types/Code

For the second case, it is not clear whether the "410 Gone" is truly accurate or whether this might also be a transient condition.
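
A minimal sketch of the retry-with-backoff the Google docs recommend, as it could be wrapped around the failing API calls (the helper name withBackoff and the exception type handled are assumptions, not Nextflow's actual code):

    // Sketch: retry a transient (5xx) Pipelines API failure with exponential
    // backoff. The closure stands in for the call that throws, e.g. the
    // status poll performed by GooglePipelinesTaskHandler.checkIfCompleted().
    def withBackoff(int maxAttempts = 5, long baseDelayMs = 500, Closure action) {
        int attempt = 1
        while (true) {
            try {
                return action.call()
            }
            catch (com.google.api.client.googleapis.json.GoogleJsonResponseException e) {
                if (attempt >= maxAttempts || e.statusCode < 500)
                    throw e                          // permanent error, or out of attempts
                sleep(baseDelayMs << (attempt - 1))  // 0.5s, 1s, 2s, 4s, ...
                attempt++
            }
        }
    }

Note this sketch would not retry a "410 Gone" (a 4xx status); whether it should depends on the answer to the question above.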

Environment

  • Nextflow version: 19.04
  • Java version: 1.8.0_144
  • Operating system: Linux - CentOS 7

Additional context

nextflow.logs.gz
rna_seq_quant.nf.gz

mozack (Contributor, Author) commented Jun 3, 2019

FYI, nextflow.logs.gz is misnamed: it is actually a compressed tar file, not a plain gzipped log. Please run tar xzf to extract it. Sorry for the confusion.

pditommaso changed the title from "Google Cloud - Genomics Pipelines API errors trigger workflow shutdown" to "Invalid server response makes Google pipelines execution crash" Jun 3, 2019
sivkovic pushed a commit to sivkovic/nextflow that referenced this issue Jun 6, 2019
mozack (Contributor, Author) commented Jul 18, 2019

I've not run into this error for a while; however, I believe I just encountered it again in 19.07.0-edge:

Jul-18 13:31:20.487 [Task submitter] ERROR nextflow.processor.TaskProcessor - Error executing process > 'salmon_quant (TCGA-OR-A5JI-01A)'

Caused by:
  410 Gone
{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "backendError",
    "message": "Backend Error"
   }
  ],
  "code": 503,
  "message": "Backend Error"
 }
}


com.google.cloud.storage.StorageException: 410 Gone
{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "backendError",
    "message": "Backend Error"
   }
  ],
  "code": 503,
  "message": "Backend Error"
 }
}

        at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:220)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.write(HttpStorageRpc.java:704)
        at com.google.cloud.storage.BlobWriteChannel$1.run(BlobWriteChannel.java:51)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
        at com.google.cloud.storage.BlobWriteChannel.flushBuffer(BlobWriteChannel.java:47)
        at com.google.cloud.BaseWriteChannel.close(BaseWriteChannel.java:161)
        at com.google.cloud.storage.contrib.nio.CloudStorageWriteChannel.close(CloudStorageWriteChannel.java:57)
        at java.nio.channels.Channels$1.close(Channels.java:178)
        at java.nio.file.Files.write(Files.java:3300)
        at nextflow.executor.BashWrapperBuilder.build(BashWrapperBuilder.groovy:282)
        at nextflow.executor.BashWrapperBuilder$build.call(Unknown Source)
        at nextflow.cloud.google.pipelines.GooglePipelinesTaskHandler.createTaskWrapper(GooglePipelinesTaskHandler.groovy:291)
        at nextflow.cloud.google.pipelines.GooglePipelinesTaskHandler.submit(GooglePipelinesTaskHandler.groovy:265)
        at nextflow.processor.TaskPollingMonitor.submit(TaskPollingMonitor.groovy:195)
        at nextflow.processor.TaskPollingMonitor.submitPendingTasks(TaskPollingMonitor.groovy:557)
        at nextflow.processor.TaskPollingMonitor.submitLoop(TaskPollingMonitor.groovy:385)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
...
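
For what it's worth, the trace above shows the failure happening while Nextflow writes the task wrapper script to the gs:// work directory (BashWrapperBuilder.build → Files.write), i.e. at submit time, before any errorStrategy can apply. Since a 410 on a Cloud Storage resumable upload generally means the upload session itself is dead, a retry would have to restart the whole write. A rough sketch (helper name assumed, not actual Nextflow code):

    // Sketch: restart the whole wrapper-script upload on a StorageException.
    def writeWithRetry(java.nio.file.Path target, byte[] bytes, int maxAttempts = 3) {
        for (int attempt = 1; ; attempt++) {
            try {
                return java.nio.file.Files.write(target, bytes)
            }
            catch (com.google.cloud.storage.StorageException e) {
                if (attempt >= maxAttempts)
                    throw e
                sleep(1000L * attempt)   // simple linear backoff between restarts
            }
        }
    }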

nextflow.log.gz

pditommaso (Member) commented:

The tar looks corrupted. Could you upload it again (maybe using zip)?

mozack (Contributor, Author) commented Jul 19, 2019

That one is not a tar file, just the log file gzipped. Sorry for the previous confusion.

I've attached it here again zipped.
nextflow.log.zip

I suspect the core issue starts from this line:

Jul-18 13:31:20.487 [Task submitter] ERROR nextflow.processor.TaskProcessor - Error executing process > 'salmon_quant (TCGA-OR-A5JI-01A)'

pditommaso (Member) commented:

The error stack trace in the attached log seems related to a different error. Please open a separate issue for it. Closing this.

pditommaso added this to the v19.07.0 milestone Jul 27, 2019