
jbang exceptions while running on cloud #45

Closed
abhi18av opened this issue Aug 18, 2022 · 4 comments

@abhi18av
Contributor

Hi genepi team 👋

Thanks for the neat pipeline!

I was able to successfully run the pipeline locally using the -profile test,docker profiles. However, it seems that jbang's native behavior of downloading JARs on the fly might not be well suited to a cloud environment.

Issue encountered

When I tried to run the pipeline on the cloud (on both Azure Batch and AWS Batch) by adding the cloud-specific configs (azure.config) via the Nextflow CLI and invoking

$ nextflow -c ./azure.config run https://github.com/genepi/nf-gwas -profile test,docker,azb -r main -latest

I kept running into the following issue in the initial caching process:

Error executing process > 'NF_GWAS:CACHE_JBANG_SCRIPTS'

Caused by:
  Process `NF_GWAS:CACHE_JBANG_SCRIPTS` terminated with an error exit status (1)

Command executed:

  jbang export portable -O=RegenieLogParser.jar RegenieLogParser.java
  jbang export portable -O=RegenieFilter.jar RegenieFilter.java
  jbang export portable -O=RegenieValidateInput.jar RegenieValidateInput.java

Command exit status:
  1

Command output:
  (empty)

Command error:
  Exception in thread "main" dev.jbang.cli.ExitException: Resource could not be copied from class path: jbang.properties
        at dev.jbang.source.resolvers.ClasspathResourceResolver.getClasspathResource(ClasspathResourceResolver.java:66)
        at dev.jbang.source.resolvers.ClasspathResourceResolver.resolve(ClasspathResourceResolver.java:31)
        at dev.jbang.source.resolvers.CombinedResourceResolver.lambda$resolve$0(CombinedResourceResolver.java:26)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958)
        at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:127)
        at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:502)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:150)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:543)
        at dev.jbang.source.resolvers.CombinedResourceResolver.resolve(CombinedResourceResolver.java:28)
        at dev.jbang.source.ResourceRef.forResource(ResourceRef.java:136)
        at dev.jbang.Configuration.getBuiltin(Configuration.java:265)
        at dev.jbang.Configuration.defaults(Configuration.java:218)
        at dev.jbang.Configuration.getMerged(Configuration.java:251)
        at dev.jbang.Configuration.instance(Configuration.java:196)
        at dev.jbang.cli.JBang$ConfigurationResourceBundle.getKeys(JBang.java:212)
        at picocli.CommandLine$Model$Messages.keys(CommandLine.java:11250)
        at picocli.CommandLine$Model$Messages.<init>(CommandLine.java:11234)
        at picocli.CommandLine$Model$CommandSpec.setBundle(CommandLine.java:6346)
        at picocli.CommandLine$Model$CommandSpec.resourceBundle(CommandLine.java:6342)
        at picocli.CommandLine.setResourceBundle(CommandLine.java:3319)
        at dev.jbang.cli.JBang.getCommandLine(JBang.java:227)
        at dev.jbang.cli.JBang.getCommandLine(JBang.java:107)
        at dev.jbang.Main.main(Main.java:12)

I suspect that this has to do with how jbang downloads JAR dependencies into a local lib folder, which is not available to tasks on other nodes (or in other instances of the container) in a multi-node setting.

Suggestions

Allow me to share a couple of suggestions that might be worth considering.

  1. Use a compiled shadow/uber JAR (a single JAR with all dependencies baked in), so that the lib folder no longer needs to be available to the downstream processes that rely on these cached JAR files.

  2. Alternatively, and perhaps with less effort, bake the compiled JAR files into the container itself. Since the tool is already available in the container, this would ensure that the dependencies (i.e. the lib folder) as well as the Regenie* JARs are all available within the container instances across different nodes.
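To illustrate the second suggestion, a build step along these lines could compile the JARs at image build time (a minimal sketch only; the base image name, the source paths, and the /opt target location are assumptions, not the actual project Dockerfile):

```dockerfile
# Sketch (assumptions: the base image already ships jbang and a JDK,
# and downstream processes expect the JARs under /opt).
FROM some-base-image-with-jbang:latest

COPY RegenieLogParser.java RegenieFilter.java RegenieValidateInput.java /build/

WORKDIR /build
RUN jbang export portable -O=/opt/RegenieLogParser.jar RegenieLogParser.java \
 && jbang export portable -O=/opt/RegenieFilter.jar RegenieFilter.java \
 && jbang export portable -O=/opt/RegenieValidateInput.jar RegenieValidateInput.java
# A "portable" export should also materialize the dependency lib folder
# alongside the output JARs, so each container instance carries
# everything it needs without any runtime downloads.
```

This way the download happens once, at build time, instead of in every task on every node.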

Collaboration

I'd be happy to test the pipeline on the cloud and to discuss any changes that might be necessary to make the pipeline optimal (hardware configuration, etc.) for the cloud setting.

@seppinho
Member

Hi Abhinav,
Great to see you're trying to run this on Azure. Thanks for the detailed error description and suggested solutions. Bundling the JARs into the Docker container makes sense to us; feel free to move forward with this. Let us know if any input is required from our side.

Collaboration sounds great; we would love to improve the pipeline so that it runs smoothly on Azure.

@abhi18av
Contributor Author

abhi18av commented Aug 19, 2022

Thanks @seppinho

Yes, I'd appreciate some thoughts on the overall approach for getting this done.

As you might have noticed in PR #46:

  1. I had to move the JAR file to a centralized location within the container (/opt/RegenieValidateInput.jar). When Nextflow mounts the task directory volume, this location can still be accessed from within the container.

  2. I also tweaked the inputs of validate_phenotypes.nf, as it no longer relies on the JAR being staged via a channel.

  3. I disabled the CACHE_JBANG_SCRIPTS process, as it no longer serves a purpose in the cloud context.

However, these changes mean that any non-containerized workload (i.e. without Docker) would not find the expected JARs there. Since the pipeline doesn't officially support a Conda-based setup, I'm guessing the above changes would be okay?
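The reworked process could then look roughly like this (a sketch only; the process body, the input/output names, and the CLI arguments are illustrative placeholders, not the actual contents of validate_phenotypes.nf):

```nextflow
process VALIDATE_PHENOTYPES {

    input:
    path phenotypes_file

    output:
    path "validated_${phenotypes_file}"

    // The JAR is read from its fixed location inside the container,
    // so no channel needs to stage it into the task directory.
    """
    java -jar /opt/RegenieValidateInput.jar \
        ${phenotypes_file} validated_${phenotypes_file}
    """
}
```

Since /opt lives in the image itself, every node that pulls the container gets the JAR automatically.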

P.S. I have also explored staging the lib folder across the pipeline (in an internal PR on my fork) but ran into some issues there.

@seppinho
Member

Thanks for the detailed explanation. Right, Conda is currently not supported, so the approach sounds good to me. Really appreciate your work.

@seppinho
Member

Thanks! #46 has now been merged.
