Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DEV] Adding a gatk_jar downloader #58

Closed
wants to merge 5 commits into from

Conversation

Mxrcon
Copy link
Member

@Mxrcon Mxrcon commented Apr 6, 2022

Hey 👋, May I help on this issue? #57 I'm adding a utility module able to download the Gatk_jar using the wget command and them use tar to uncompress it. The output is caught by a glob pattern, Not sure about this addition, maybe we can discuss another way of parsing it!
I've also added the alpine container usage on the docker profile, it has tar and wget. (i've double-checked on alpine package list.

This only add the module as utility, we can discuss if the module should be added on our workflows.

This Pull Request closes #57

Hope this small PR helps!
Feel free to add your thoughts, I'm available to help with this task.

Kindly, Davi

@Mxrcon Mxrcon added the enhancement New feature or request label Apr 6, 2022
@Mxrcon Mxrcon requested a review from abhi18av April 6, 2022 02:06
@Mxrcon Mxrcon self-assigned this Apr 6, 2022
@abhi18av
Copy link
Member

abhi18av commented Apr 6, 2022

Thanks for the initiative Davi! 💚 👍

I'm also working on addressing the installation issue via conda and perhaps we can merge this PR as soon as the problem with gatk-register (and possible upgrade) is resolved in #59 ?

@Mxrcon
Copy link
Member Author

Mxrcon commented Apr 6, 2022

😃 sure! this problem is so odd, I'm interested on how they'll solve it.

@Mxrcon Mxrcon force-pushed the mxrcon/dev-gatk-downloader branch from c837e9b to b3463a9 Compare April 6, 2022 21:41
@Mxrcon Mxrcon force-pushed the mxrcon/dev-gatk-downloader branch from 582f620 to 0da326b Compare April 6, 2022 21:46
@Mxrcon
Copy link
Member Author

Mxrcon commented Apr 6, 2022

Update: I added wget and tar to conda env, as this will prevent some incompatibilities

Copy link
Member

@abhi18av abhi18av left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Davi thanks for this PR and the initiative on fixing these usability issues.

I have been testing the mtbseq-1.0.4 release via (i) conda 🔴 (ii) docker 🔴 and basically if gatk/gatk3 is not the issue then perl Basic::Statistics module is throwing dependency issues!

Even after using conda!!!

Therefore I'd like to be extra sure andd confirm whether you've tested this on a baseline linux machine via conda as well?

Comment on lines 19 to 22
withName:
'DOWNLOAD_GATK_JAR' {
container = 'alpine:3.14'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mxrcon , perhaps we might not need to use an extra container?

In these cases we can reuse (maybe?) the mtbseq container since its derived from debian and would be downloaded (and available) on the underlying machine anyhow.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, even alpine being a minimal container, we can use the mtbseq for it.

Comment on lines 29 to +32
gatk38_jar = "${projectDir}/resources/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar"

gatk_jar_link = "https://storage.googleapis.com/gatk-software/package-archive/gatk/GenomeAnalysisTK-3.8-0-ge9d806836.tar.bz2"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some testing, preferably we can remove the necessity of gatk38_jar completely 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, but we can keep it until we properly test it

conda_envs/mtbseq-nf-env.yml Show resolved Hide resolved
@abhi18av
Copy link
Member

Davi, let's put this PR on hold till the point our analysis for generated reports is completed and then we can merge and test this PR and make the v1.0.0 release 🤓

@Mxrcon
Copy link
Member Author

Mxrcon commented Apr 10, 2022

@abhi18av Sure! I completely agreed.

@Mxrcon
Copy link
Member Author

Mxrcon commented Apr 10, 2022

Davi thanks for this PR and the initiative on fixing these usability issues.

I have been testing the mtbseq-1.0.4 release via (i) conda red_circle (ii) docker red_circle and basically if gatk/gatk3 is not the issue then perl Basic::Statistics module is throwing dependency issues!

Even after using conda!!!

Therefore I'd like to be extra sure andd confirm whether you've tested this on a baseline linux machine via conda as well?

@abhi18av
I tried to install mtbseq 1.0.4 via mamba, and tried to register using gatk3-register, then tried to run MTBseq, this was the output:

Can't locate Statistics/Basic.pm in @INC (you may need to install the Statistics::Basic module) (@INC contains: /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/5.32/site_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/site_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/5.32/vendor_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/vendor_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/5.32/core_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/core_perl .) at /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib/TBtools.pm line 9.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib/TBtools.pm line 9.
Compilation failed in require at /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib/TBbwa.pm line 8.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib/TBbwa.pm line 8.
Compilation failed in require at /home/davi_marcon/miniconda3/envs/mtbseq-104/bin/MTBseq line 10.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq-104/bin/MTBseq line 10.

@abhi18av
Copy link
Member

Okay thanks for the confirmation Davi, is it working fine with mtbseq-1.0.3 ?

@Mxrcon
Copy link
Member Author

Mxrcon commented Apr 11, 2022

@abhi18av, yes! using the version 1.0.3 it works with no problems when using gatk3-register

@abhi18av
Copy link
Member

Okay, then please elaborate step-by-step how you tested with mtbseq-nf:v1.0.3 version.

I just want to confirm whether the default conda setup for that version works with gatk-register (since this is what is available in corresponding biocontainer)

Happy to connect tonight if this is not clear.

@Mxrcon
Copy link
Member Author

Mxrcon commented Apr 13, 2022

@abhi18av, Here's how i setup the testings, You can change the mtbseq=={version} for matching any one interesting

version 1.0.3:

mamba create -n mtbseq_test -c bioconda mtbseq==1.0.3
mamba activate mtbseq_test
gatk3-register GenomeAnalysisTK.jar
MTBseq --check

Output:

<INFO>  [2022-04-13 17:38:22]   Found perl module: MCE
<INFO>  [2022-04-13 17:38:22]   Found perl module: Statistics::Basic
<INFO>  [2022-04-13 17:38:22]   Found bwa in your PATH!
<INFO>  [2022-04-13 17:38:22]   Found samtools in your PATH!
<ERROR> [2022-04-13 17:38:22]   gatk is not installed or not in your PATH!

version 1.0.4:

mamba create -n mtbseq_test -c bioconda mtbseq==1.0.4
mamba activate mtbseq_test
gatk3-register GenomeAnalysisTK.jar
MTBseq --check

output:

Can't locate Statistics/Basic.pm in @INC (you may need to install the Statistics::Basic module) (@INC contains: /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/5.32/site_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/site_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/5.32/vendor_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/vendor_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/5.32/core_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/core_perl .) at /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib/TBtools.pm line 9.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib/TBtools.pm line 9.
Compilation failed in require at /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib/TBbwa.pm line 8.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib/TBbwa.pm line 8.
Compilation failed in require at /home/davi_marcon/miniconda3/envs/mtbseq_test/bin/MTBseq line 10.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq_test/bin/MTBseq line 10.

Very odd, both of them return errors 😭

version 1.0.3 + gatk:

mamba create -n mtbseq_test -c bioconda mtbseq==1.0.4 gatk==3.8
mamba activate mtbseq_test
gatk3-register GenomeAnalysisTK.jar
MTBseq --check

output:

<INFO>  [2022-04-13 17:55:59]   Found perl module: MCE
<INFO>  [2022-04-13 17:55:59]   Found perl module: Statistics::Basic
<INFO>  [2022-04-13 17:55:59]   Found bwa in your PATH!
<INFO>  [2022-04-13 17:55:59]   Found samtools in your PATH!
<ERROR> [2022-04-13 17:55:59]   gatk is not installed or not in your PATH! 

LATEST WORKING SOLUTION: version 1.0.3 + gatk:

mamba create -n mtbseq_test -c bioconda mtbseq=1.0.3=pl526_1 gatk=3.8=py36_0
mamba activate mtbseq_test
gatk-register GenomeAnalysisTK.jar
MTBseq --check

output:

<INFO>  [2022-04-13 18:05:27]   Found perl module: MCE
<INFO>  [2022-04-13 18:05:27]   Found perl module: Statistics::Basic
<INFO>  [2022-04-13 18:05:27]   Found bwa in your PATH!
<INFO>  [2022-04-13 18:05:27]   Found samtools in your PATH!
<INFO>  [2022-04-13 18:05:27]   Found gatk in your PATH!
<INFO>  [2022-04-13 18:05:27]   Found picard in your PATH!

@abhi18av
Copy link
Member

This sounds good Davi, it seems that we've both been able to test the conda based setup independently for mtbseq-1.0.3 but NOT for 1.0.4 (see #58 (review) , #59 and #58 (comment) )

Adding here the final working solution ✅

mamba create -n mtbseq_test -c bioconda mtbseq=1.0.3=pl526_1 gatk=3.8=py36_0
mamba activate mtbseq_test
gatk-register GenomeAnalysisTK.jar
MTBseq --check

@abhi18av
Copy link
Member

This PR will be addressed after the QC workflow has been confirmed to be working.

@abhi18av abhi18av added the hold label Apr 28, 2022
@abhi18av abhi18av marked this pull request as draft April 28, 2022 10:22
@Mxrcon
Copy link
Member Author

Mxrcon commented Aug 29, 2022

On my last review around our overall progress on this task, I'd say that this PR is very close to #74, and the overall design of the custom docker container and the conda_env will take care of this problem.

Before I start to tweak those changes, I'd like to hear your thoughts @abhi18av :

What do you think about this:

  1. If were using docker based profile, well keep using the docker image resulting from the build.sh which should have the gatk.jar inside it as well provide a download instruction into the dockerfile.

  2. The other problem would be the gatkjar register when using conda, for this, I'd like to suggest another change: Add a wget instruction and a gatkregister on setup_conda_envs.sh script.

This way, well have both build.sh for docker and setup_conda_envs.sh for conda, setting up the gatkjar on mtbseq

On both changes, we'll need to update the installation instructions so the user will need to pick one of the following:

  1. run bash container/build.sh
  2. run bash conda_envs/setup_conda_envs.sh

Perhaps, we should publish the resulting image so we can use container directive easily.

@abhi18av
Copy link
Member

@Mxrcon , sure - it makes sense to me 👍 . Feel free to take these forward in #74 directly to avoid merge conflicts and accidental code mess-ups.

@abhi18av
Copy link
Member

abhi18av commented Sep 3, 2022

Closing this PR due to the decision to improve the GATK jar using #74

@abhi18av abhi18av closed this Sep 3, 2022
@abhi18av abhi18av deleted the mxrcon/dev-gatk-downloader branch September 3, 2022 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request hold
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatically download and extract the GATK jar
2 participants