
jbang exceptions while running on cloud #45

Closed
abhi18av opened this issue Aug 18, 2022 · 4 comments

@abhi18av
Contributor

Hi genepi team 👋

Thanks for the neat pipeline!

I was able to successfully run the pipeline locally using the -profile test,docker profiles. However, it seems that jbang's native behavior of downloading JARs on the fly might not be well suited to a cloud environment.

Issue encountered

When I tried to run the pipeline on the cloud (on both Azure Batch and AWS Batch) by adding the cloud-specific configs (azure.config) via the Nextflow CLI and invoking

$ nextflow -c ./azure.config run https://github.com/genepi/nf-gwas -profile test,docker,azb -r main -latest

I kept running into the following issue in the initial caching process:

Error executing process > 'NF_GWAS:CACHE_JBANG_SCRIPTS'

Caused by:
  Process `NF_GWAS:CACHE_JBANG_SCRIPTS` terminated with an error exit status (1)

Command executed:

  jbang export portable -O=RegenieLogParser.jar RegenieLogParser.java
  jbang export portable -O=RegenieFilter.jar RegenieFilter.java
  jbang export portable -O=RegenieValidateInput.jar RegenieValidateInput.java

Command exit status:
  1

Command output:
  (empty)

Command error:
  Exception in thread "main" dev.jbang.cli.ExitException: Resource could not be copied from class path: jbang.properties
        at dev.jbang.source.resolvers.ClasspathResourceResolver.getClasspathResource(ClasspathResourceResolver.java:66)
        at dev.jbang.source.resolvers.ClasspathResourceResolver.resolve(ClasspathResourceResolver.java:31)
        at dev.jbang.source.resolvers.CombinedResourceResolver.lambda$resolve$0(CombinedResourceResolver.java:26)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958)
        at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:127)
        at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:502)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:150)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:543)
        at dev.jbang.source.resolvers.CombinedResourceResolver.resolve(CombinedResourceResolver.java:28)
        at dev.jbang.source.ResourceRef.forResource(ResourceRef.java:136)
        at dev.jbang.Configuration.getBuiltin(Configuration.java:265)
        at dev.jbang.Configuration.defaults(Configuration.java:218)
        at dev.jbang.Configuration.getMerged(Configuration.java:251)
        at dev.jbang.Configuration.instance(Configuration.java:196)
        at dev.jbang.cli.JBang$ConfigurationResourceBundle.getKeys(JBang.java:212)
        at picocli.CommandLine$Model$Messages.keys(CommandLine.java:11250)
        at picocli.CommandLine$Model$Messages.<init>(CommandLine.java:11234)
        at picocli.CommandLine$Model$CommandSpec.setBundle(CommandLine.java:6346)
        at picocli.CommandLine$Model$CommandSpec.resourceBundle(CommandLine.java:6342)
        at picocli.CommandLine.setResourceBundle(CommandLine.java:3319)
        at dev.jbang.cli.JBang.getCommandLine(JBang.java:227)
        at dev.jbang.cli.JBang.getCommandLine(JBang.java:107)
        at dev.jbang.Main.main(Main.java:12)

I suspect that this has to do with how jbang downloads JAR dependencies into a local lib folder, which is not available to tasks on other nodes (or in other instances of the container) in a multi-node setting.

Suggestions

Allow me to share a couple of suggestions that might be worth considering.

  1. Use a compiled shadow/uber JAR (a single JAR with all dependencies baked in), so that the lib folder no longer needs to be available to the downstream processes that rely on these cached JAR files.

  2. Alternatively, and perhaps with less effort, bake the compiled JAR files into the container itself. Since the tool is already available in the container, this would ensure that the dependencies (i.e. the lib folder) as well as the Regenie* JARs are all available within the container instances across different nodes.
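To illustrate the second suggestion, a build step along these lines could compile the JARs at image build time (a minimal sketch only; the base image name, the source paths, and the /opt target location are assumptions, not the actual project Dockerfile):

```dockerfile
# Sketch (assumptions: the base image already ships jbang and a JDK,
# and downstream processes expect the JARs under /opt).
FROM some-base-image-with-jbang:latest

COPY RegenieLogParser.java RegenieFilter.java RegenieValidateInput.java /build/

WORKDIR /build
RUN jbang export portable -O=/opt/RegenieLogParser.jar RegenieLogParser.java \
 && jbang export portable -O=/opt/RegenieFilter.jar RegenieFilter.java \
 && jbang export portable -O=/opt/RegenieValidateInput.jar RegenieValidateInput.java
# A "portable" export should also materialize the dependency lib folder
# alongside the output JARs, so each container instance carries
# everything it needs without any runtime downloads.
```

This way the download happens once, at build time, instead of in every task on every node.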

Collaboration

I'd be happy to test the pipeline on the cloud and to discuss any changes that might be necessary to make the pipeline optimal (hardware configuration, etc.) for the cloud setting.

@seppinho
Member

Hi Abhinav,
Great to see you're trying to run this on Azure. Thanks for the detailed error description and suggested solutions. Bundling the JARs into the Docker container makes sense to us; feel free to move forward with this. Let us know if any input is required from our side.

Collaboration sounds great; we would love to improve the pipeline so that it runs smoothly on Azure.

@abhi18av
Contributor Author

abhi18av commented Aug 19, 2022

Thanks @seppinho

Yes, I'd appreciate some thoughts on the overall approach for getting this done.

As you might have noticed in PR #46:

  1. I had to move the JAR file to a centralized location within the container (/opt/RegenieValidateInput.jar). When Nextflow mounts the task directory volume, this location can still be accessed from within the container.

  2. I also tweaked the inputs of validate_phenotypes.nf, as it no longer relies on the JAR being staged via a channel.

  3. I disabled the CACHE_JBANG_SCRIPTS process, as it no longer serves a purpose in the cloud context.

However, these changes mean that any non-containerized workload (i.e. without Docker) would not find the expected JARs there. Since the pipeline doesn't officially support a Conda-based setup, I'm guessing the above changes would be okay?
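The reworked process could then look roughly like this (a sketch only; the process body, the input/output names, and the CLI arguments are illustrative placeholders, not the actual contents of validate_phenotypes.nf):

```nextflow
process VALIDATE_PHENOTYPES {

    input:
    path phenotypes_file

    output:
    path "validated_${phenotypes_file}"

    // The JAR is read from its fixed location inside the container,
    // so no channel needs to stage it into the task directory.
    """
    java -jar /opt/RegenieValidateInput.jar \
        ${phenotypes_file} validated_${phenotypes_file}
    """
}
```

Since /opt lives in the image itself, every node that pulls the container gets the JAR automatically.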

P.S. I have also explored staging the lib folder across the pipeline (in an internal PR on my fork) but ran into some issues there.

@seppinho
Member

Thanks for the detailed explanation. Right, Conda is currently not supported, so the approach sounds good to me. Really appreciate your work.

@seppinho
Member

Thanks! #46 has now been merged.
