Azure Batch - java.lang.NullPointerException - findBestVm #1992

Closed
sminot opened this issue Mar 25, 2021 · 11 comments

Comments

@sminot

sminot commented Mar 25, 2021

Bug report

Expected behavior and actual behavior

I'm trying to set up Nextflow with Azure Batch as the executor. Before running this test, I confirmed that nextflow run hello worked with my setup, which suggests that my storage container and Batch service are configured correctly.

After making sure that hello worked, I moved on to testing a larger workflow which runs a series of jobs to generate a useful dataset. At this point I got the error java.lang.NullPointerException described below.

Steps to reproduce the problem

The command I ran was:

nextflow run -c nextflow.azure.config Golob-Minot/geneshot/download_sra.nf --accession PRJEB26531 --output 'az://REDACTED/REDACTED/' -with-report PRJEB26531.download.report.html -resume -r v0.9 -process.maxForks 20 -with-tower

Unfortunately, I'm not 100% sure that this can be reproduced outside of our Azure environment, but the command above should provide a starting place.

Program output

The error I got was:

Error executing process > 'getSRAlist'

Caused by:
  java.lang.NullPointerException

Looking at the logs, the traceback is:

Mar-25 15:11:54.492 [Task submitter] DEBUG n.c.azure.batch.AzBatchTaskHandler - [AZURE BATCH] Submitting task getSRAlist - work-dir=az://cddi-bioinformatics-work/work/ce/05f55b7d0af1ea3b8ce245b1f7704d
Mar-25 15:11:54.717 [Task submitter] ERROR nextflow.processor.TaskProcessor - Error executing process > 'getSRAlist'

Caused by:
  java.lang.NullPointerException

java.lang.NullPointerException: null
        at nextflow.cloud.azure.batch.AzBatchService.findBestVm(AzBatchService.groovy:168)
        at nextflow.cloud.azure.batch.AzBatchService.guessBestVm(AzBatchService.groovy:128)
        at nextflow.cloud.azure.batch.AzBatchService.specFromAutoPool(AzBatchService.groovy:446)
        at nextflow.cloud.azure.batch.AzBatchService.specForTask(AzBatchService.groovy:492)
        at nextflow.cloud.azure.batch.AzBatchService.getOrCreatePool(AzBatchService.groovy:498)
        at nextflow.cloud.azure.batch.AzBatchService.submitTask(AzBatchService.groovy:256)
        at nextflow.cloud.azure.batch.AzBatchTaskHandler.submit(AzBatchTaskHandler.groovy:91)
        at nextflow.processor.TaskPollingMonitor.submit(TaskPollingMonitor.groovy:196)
        at nextflow.processor.TaskPollingMonitor.submitPendingTasks(TaskPollingMonitor.groovy:560)
        at nextflow.processor.TaskPollingMonitor.submitLoop(TaskPollingMonitor.groovy:387)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
        at groovy.lang.MetaClassImpl.invokeMethodClosure(MetaClassImpl.java:1048)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1142)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
        at groovy.lang.Closure.call(Closure.java:412)
        at groovy.lang.Closure.call(Closure.java:406)
        at groovy.lang.Closure.run(Closure.java:493)
        at java.lang.Thread.run(Thread.java:748)
Mar-25 15:11:54.752 [Task submitter] DEBUG nextflow.Session - Session aborted -- Cause: java.lang.NullPointerException

Environment

  • Nextflow version: 21.03.0-edge build 5518
  • Java version: Groovy 3.0.7 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13
  • Operating system: Linux 4.15.0-101-generic
  • Bash version: GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
@abhi18av
Member

Hi @sminot , thanks for raising this issue with us.

Would it be possible for you to share some more specifics of your Azure environment, such as quotas, region, etc.?

@sminot
Author

sminot commented Mar 25, 2021

I'm happy to share anything that would be helpful!

The region is westus2, and I'm not sure about quotas.

@pditommaso
Member

It looks like it's failing here:

return getVmType(location, scores.firstEntry().value)

@sminot could you share your NF config?
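
Editorial sketch of why the quoted line can throw: assuming scores is a sorted map such as java.util.TreeMap (the thread does not show its declaration), firstEntry() returns null when the map is empty, so reading the entry's value fails with a NullPointerException. The class and the bestOrNull helper below are hypothetical and are not Nextflow code; they only illustrate the failure mode and one possible guard.

```java
import java.util.Map;
import java.util.TreeMap;

public class FirstEntryDemo {

    // Hypothetical helper: returns the value of the lowest key,
    // or null when no candidate was scored at all.
    static String bestOrNull(TreeMap<Double, String> scores) {
        Map.Entry<Double, String> first = scores.firstEntry(); // null for an empty map
        return first == null ? null : first.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Double, String> scores = new TreeMap<>();

        // Unguarded access on an empty map, as in the quoted line.
        try {
            scores.firstEntry().getValue();
        } catch (NullPointerException e) {
            System.out.println("NPE on empty map");
        }

        // Guarded access returns null instead of throwing.
        System.out.println(bestOrNull(scores));

        // With at least one scored candidate, the lookup succeeds.
        scores.put(0.5, "Standard_D14_v2");
        System.out.println(bestOrNull(scores));
    }
}
```

A caller could turn the null result into an explicit error message naming the requested CPUs, memory, and VM type, which is easier to act on than a bare NullPointerException.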

@sminot
Author

sminot commented Mar 26, 2021

Masking the private info, my config is:

plugins {
  id 'nf-azure'
}

process {
  executor = 'azurebatch'
}

azure {
  batch {
    location = 'westus2'
    accountName = ACCOUNT
    accountKey = KEY
    autoPoolMode = true
  }
  storage {
    accountName = ACCOUNT
    accountKey = KEY
  }
}

workDir = "az://BUCKET/work"

@pditommaso
Member

I think the problem is that it can't find a VM type matching the cpus/memory requested by your pipeline task. If you specify a concrete one using the vmType attribute, it should bypass the problem.

https://www.nextflow.io/docs/edge/azure.html#pools-configuration

@sminot
Author

sminot commented Mar 26, 2021

Looking at that configuration, I updated my config to include:

plugins {
  id 'nf-azure'
}

process {
  executor = 'azurebatch'
}

azure {
  batch {
    location = 'westus2'
    accountName = ACCOUNT
    accountKey = KEY
    autoPoolMode = true
    deletePoolsOnCompletion = true
    pools {
        auto {
           vmType = 'Standard_D2_v2'
           vmCount = 5
           autoScale = true
           maxVmCount = 50
        }
    }
  }
  storage {
    accountName = ACCOUNT
    accountKey = KEY
  }
}

workDir = "az://BUCKET/work"

And the new error traceback is:

Mar-26 08:37:02.943 [Task submitter] DEBUG n.c.azure.batch.AzBatchTaskHandler - [AZURE BATCH] Submitting task getSRAlist - work-dir=az://cddi-bioinformatics-work/work/0a/a48965996d98d4aaa228e27afbde3c
Mar-26 08:37:03.287 [Task submitter] ERROR nextflow.processor.TaskProcessor - Error executing process > 'getSRAlist'

Caused by:
  java.lang.NullPointerException

java.lang.NullPointerException: null
        at nextflow.cloud.azure.batch.AzBatchService.findBestVm(AzBatchService.groovy:168)
        at nextflow.cloud.azure.batch.AzBatchService.guessBestVm(AzBatchService.groovy:128)
        at nextflow.cloud.azure.batch.AzBatchService.specFromAutoPool(AzBatchService.groovy:446)
        at nextflow.cloud.azure.batch.AzBatchService.specForTask(AzBatchService.groovy:492)
        at nextflow.cloud.azure.batch.AzBatchService.getOrCreatePool(AzBatchService.groovy:498)
        at nextflow.cloud.azure.batch.AzBatchService.submitTask(AzBatchService.groovy:256)
        at nextflow.cloud.azure.batch.AzBatchTaskHandler.submit(AzBatchTaskHandler.groovy:91)
        at nextflow.processor.TaskPollingMonitor.submit(TaskPollingMonitor.groovy:196)
        at nextflow.processor.TaskPollingMonitor.submitPendingTasks(TaskPollingMonitor.groovy:560)
        at nextflow.processor.TaskPollingMonitor.submitLoop(TaskPollingMonitor.groovy:387)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
        at groovy.lang.MetaClassImpl.invokeMethodClosure(MetaClassImpl.java:1048)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1142)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
        at groovy.lang.Closure.call(Closure.java:412)
        at groovy.lang.Closure.call(Closure.java:406)
        at groovy.lang.Closure.run(Closure.java:493)
        at java.lang.Thread.run(Thread.java:748)
Mar-26 08:37:03.292 [Task submitter] DEBUG nextflow.Session - Session aborted -- Cause: java.lang.NullPointerException

@pditommaso
Member

Standard_D2_v2 has only 2 CPUs and 7 GB of memory. Is that enough for your pipeline?

@sminot
Author

sminot commented Mar 26, 2021

Good point! I must admit that this Azure configuration is all very new to me.

I changed the vmType to Standard_D14, but unfortunately I ended up getting the same error.

It's entirely possible that this error is being caused by some problem with my Azure configuration that I'm not aware of. I tried following the setup instructions on the Nextflow docs page, but it's completely possible that I missed something.

Since I'm not giving you particularly good information to go on, I wouldn't object to closing this issue and moving the discussion to an email thread if you would prefer, @pditommaso.

I would be very excited to get up and running on Azure with Nextflow, but I also don't want to ask too much of your time! Many thanks for any help you can provide.

@pditommaso
Member

No problem. I don't see any Standard_D14; it should be Standard_D14_v2 or Standard_D14_v2_Promo.
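
Editorial sketch of the kind of check that catches this mistake early: validate the configured VM type name against a list of known names before submitting, and suggest names that share the same prefix. Everything here is hypothetical, including the class name and the KNOWN list, which contains only the three names mentioned in this thread rather than the real set of VM sizes available in westus2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class VmNameCheck {

    // Hypothetical catalogue: only names mentioned in this thread.
    static final List<String> KNOWN = Arrays.asList(
            "Standard_D2_v2", "Standard_D14_v2", "Standard_D14_v2_Promo");

    // True when the requested name matches a known VM type exactly.
    static boolean isKnown(String name) {
        return KNOWN.contains(name);
    }

    // Known names that start with the requested (possibly incomplete) name.
    static List<String> suggestions(String name) {
        return KNOWN.stream()
                .filter(k -> k.startsWith(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String requested = "Standard_D14"; // the name tried earlier in the thread
        if (!isKnown(requested)) {
            System.out.println("Unknown VM type '" + requested
                    + "'; did you mean one of " + suggestions(requested) + "?");
        }
    }
}
```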

@sminot
Author

sminot commented Mar 26, 2021

Oops! I am clearly learning here...

I can confirm that this fixes the bug! The next error I see is coming from inside the job (and is therefore my responsibility), so I am going to mark this as resolved.

Thank you!!

sminot closed this as completed Mar 26, 2021
@pditommaso
Member

Cool, I've uploaded a patch to prevent the NPE in the future.
