mismatch between model names valid for dragonflye 1.1.0 and medaka 1.8.0 #19

Closed
flass opened this issue May 31, 2023 · 14 comments


@flass

flass commented May 31, 2023

Hi,

I have made a conda install of dragonflye (within a docker image), forcing the dependencies for flye and medaka to be the latest versions:

micromamba install -n base -y -c conda-forge -c bioconda \
    flye=2.9.2 \
    medaka=1.8.0 \
    dragonflye=1.1.0

this works, but if I want to specify the use of the latest model, r1041_e82_400bps_sup_v420, I get an error at the medaka stage:

[...]
[dragonflye] Running: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_v420 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log
[polishing - medaka (1 of 1)] Traceback (most recent call last):
[polishing - medaka (1 of 1)]   File "/opt/conda/lib/python3.10/site-packages/medaka/medaka.py", line 35, in __call__
[polishing - medaka (1 of 1)]     model_fp = medaka.models.resolve_model(val)
[polishing - medaka (1 of 1)]   File "/opt/conda/lib/python3.10/site-packages/medaka/models.py", line 31, in resolve_model
[polishing - medaka (1 of 1)]     raise ValueError(
[polishing - medaka (1 of 1)] ValueError: Model r1041_e82_400bps_sup_v420 is not a known model or existant file.
[dragonflye] Error running command: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_v420 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log

Indeed medaka wants something like this: r1041_e82_400bps_sup_v4.2.0, with dots in the version name.

docker run -v $HOME:$HOME -w $HOME/test gitlab-registry.internal.sanger.ac.uk/sanger-pathogens/docker-images-test/dragonflye:1.1.0 medaka tools list_models
Available: r103_fast_g507, r103_fast_snp_g507, r103_fast_variant_g507, r103_hac_g507, r103_hac_snp_g507, r103_hac_variant_g507, r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r103_sup_g507, r103_sup_snp_g507, r103_sup_variant_g507, r1041_e82_260bps_fast_g632, r1041_e82_260bps_fast_variant_g632, r1041_e82_260bps_hac_g632, r1041_e82_260bps_hac_v4.0.0, r1041_e82_260bps_hac_v4.1.0, r1041_e82_260bps_hac_variant_g632, r1041_e82_260bps_hac_variant_v4.1.0, r1041_e82_260bps_sup_g632, r1041_e82_260bps_sup_v4.0.0, r1041_e82_260bps_sup_v4.1.0, r1041_e82_260bps_sup_variant_g632, r1041_e82_260bps_sup_variant_v4.1.0, r1041_e82_400bps_fast_g615, r1041_e82_400bps_fast_g632, r1041_e82_400bps_fast_variant_g615, r1041_e82_400bps_fast_variant_g632, r1041_e82_400bps_hac_g615, r1041_e82_400bps_hac_g632, r1041_e82_400bps_hac_v4.0.0, r1041_e82_400bps_hac_v4.1.0, r1041_e82_400bps_hac_v4.2.0, r1041_e82_400bps_hac_variant_g615, r1041_e82_400bps_hac_variant_g632, r1041_e82_400bps_hac_variant_v4.1.0, r1041_e82_400bps_hac_variant_v4.2.0, r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_v4.0.0, r1041_e82_400bps_sup_v4.1.0, r1041_e82_400bps_sup_v4.2.0, r1041_e82_400bps_sup_variant_g615, r1041_e82_400bps_sup_variant_v4.1.0, r1041_e82_400bps_sup_variant_v4.2.0, r104_e81_fast_g5015, r104_e81_fast_variant_g5015, r104_e81_hac_g5015, r104_e81_hac_variant_g5015, r104_e81_sup_g5015, r104_e81_sup_g610, r104_e81_sup_variant_g610, r10_min_high_g303, r10_min_high_g340, r941_e81_fast_g514, r941_e81_fast_variant_g514, r941_e81_hac_g514, r941_e81_hac_variant_g514, r941_e81_sup_g514, r941_e81_sup_variant_g514, r941_min_fast_g303, r941_min_fast_g507, r941_min_fast_snp_g507, r941_min_fast_variant_g507, r941_min_hac_g507, r941_min_hac_snp_g507, r941_min_hac_variant_g507, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_min_sup_g507, r941_min_sup_snp_g507, 
r941_min_sup_variant_g507, r941_prom_fast_g303, r941_prom_fast_g507, r941_prom_fast_snp_g507, r941_prom_fast_variant_g507, r941_prom_hac_g507, r941_prom_hac_snp_g507, r941_prom_hac_variant_g507, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_high_g4011, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_sup_g507, r941_prom_sup_snp_g507, r941_prom_sup_variant_g507, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360, r941_sup_plant_g610, r941_sup_plant_variant_g610
Default consensus:  r1041_e82_400bps_sup_v4.2.0
Default variant:  r1041_e82_400bps_sup_variant_v4.2.0

If I try to give that medaka-valid value to dragonflye:

dragonflye \
--reads dragonflye/barcode07.fastq.gz \
--R1 4075_2#2_1.fastq.gz \
--R2 4075_2#2_2.fastq.gz \
--gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 \
--cpus 4 --ram 6 --outdir dragonflye/test_IP6794-89

then dragonflye fails at the argument validation step:

[dragonflye] You ran: /opt/conda/bin/dragonflye --reads dragonflye/barcode07.fastq.gz --R1 dragonflye/4075_2#2_1.fastq.gz --R2 dragonflye/4075_2#2_2.fastq.gz --gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 --cpus 4 --ram 6 --outdir dragonflye/test_IP6794-89
[dragonflye] This is dragonflye 1.1.0
[dragonflye] Written by Robert A Petit III
[dragonflye] Homepage is https://github.com/rpetit3/dragonflye
[dragonflye] Operating system is linux
[dragonflye] Perl version is v5.32.1
[dragonflye] Machine has 256 CPU cores and 2015.34 GB RAM
[dragonflye] Verifying input model (--model): r1041_e82_400bps_sup_v4.2.0
[dragonflye] Unable to verify model 'r1041_e82_400bps_sup_v4.2.0', please check spelling and try again.
[dragonflye] Available Medaka models include:
[dragonflye]    r103_fast_g507
[dragonflye]    r103_hac_g507
[dragonflye]    r103_min_high_g345
[dragonflye]    r103_min_high_g360
[...]

Could you please change your validation scheme so that it matches that of medaka?
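One way to keep the two validation schemes from drifting apart is to ask the installed medaka for its model list instead of maintaining a separate one. The sketch below is only an illustration in shell, not dragonflye's actual code: `model_is_available` is a hypothetical helper that reads the output of `medaka tools list_models` on standard input and succeeds only if the requested name appears in the "Available:" list.

```shell
# Hypothetical helper (not part of dragonflye or medaka).
# Reads `medaka tools list_models` output on stdin and exits 0 only if
# the model name given as $1 appears in the comma-separated list.
model_is_available() {
    awk -v wanted="$1" '
        /^Default/ { next }                  # skip the "Default ..." lines
        {
            sub(/^Available:[ \t]*/, "")     # drop the leading label
            n = split($0, names, ",")        # names are comma-separated
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", names[i])   # trim whitespace
                if (names[i] == wanted) found = 1
            }
        }
        END { exit !found }
    '
}
```

With medaka on the PATH, it could be used as `medaka tools list_models | model_is_available r1041_e82_400bps_sup_v4.2.0 && echo "model verified"`. Because the list comes from medaka itself, new naming conventions such as the dotted `v4.2.0` suffix would be accepted without any change to the check.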

Best wishes,
Florent

@incoherentian

Problem is clearly not enough cores or RAM.

...more seriously, I thought something else might happen when pinning flye 2.9.2 (I'm guessing that's what you meant), thus backed off for someone else to try first 😈 Thank you for being that person!

So the wrapper is actually still passing kit14 along fine to flye 2.9.2, just not medaka models for kit 14? That's actually still pretty awesome, I think I'm going to pin these too. Non-kit 14 models with medaka 1.8.0 are still polishing fine?

@flass
Author

flass commented May 31, 2023

yes, it's flye 2.9.2, sorry (corrected in original post)
the flye step runs fine. I'm still having issues troubleshooting the medaka run.
with a conda install, apparently medaka comes with only the default models (those for Kit14) in /opt/conda/medaka/medaka/data/, so I had to copy the extra model files I needed from the git repo. now I'm getting an error linked to opening the model tar.gz file...
I can update here once I succeed, but I would not want to distract from the main point of this issue, which is that the validation scheme of dragonflye is off.

@rpetit3
Owner

rpetit3 commented May 31, 2023

Hi @flass

I'll get this fixed today, let me know if you think it's worth updating the medaka recipe to include those missing models.

Cheers,
Robert

@rpetit3
Owner

rpetit3 commented May 31, 2023

@flass I think I have this fixed now, if you want to give it a try.

Here's the link to download: https://raw.githubusercontent.com/rpetit3/dragonflye/main/bin/dragonflye

@flass
Author

flass commented May 31, 2023

Thank you Robert. I tried executing your dragonflye 1.1.1 script from within my docker container for v1.1.0 and it seems that it works!

mib114737i:dragonflye fl4$ docker run -v $HOME:$HOME -w $HOME/test/dragonflye gitlab-registry.internal.sanger.ac.uk/sanger-pathogens/docker-images-test/dragonflye:1.1.0 ./dragonflye --reads barcode07.fastq.gz --R1 4075_2#2_1.fastq.gz --R2 4075_2#2_2.fastq.gz --gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 --cpus 4 --ram 6 --outdir test_IP6794-89-4
[dragonflye] Hello mambauser
[dragonflye] You ran: /Users/fl4/test/dragonflye/dragonflye --reads barcode07.fastq.gz --R1 4075_2#2_1.fastq.gz --R2 4075_2#2_2.fastq.gz --gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 --cpus 4 --ram 6 --outdir test_IP6794-89-4
[dragonflye] This is dragonflye 1.1.1
[dragonflye] Written by Robert A Petit III
[dragonflye] Homepage is https://github.com/rpetit3/dragonflye
[dragonflye] Operating system is linux
[dragonflye] Perl version is v5.32.1
[dragonflye] Machine has 4 CPU cores and 7.68 GB RAM
[dragonflye] Verifying input model (--model): r1041_e82_400bps_sup_v4.2.0
[dragonflye] Model r1041_e82_400bps_sup_v4.2.0 verified!
[...going on to run the rest of the pipeline...]

will you have this released as a bioconda recipe / biocontainer image sometime soon?

Best wishes,

Florent

@rpetit3
Owner

rpetit3 commented May 31, 2023

Awesome, I'll get a version release submitted. It'll take a few hours to get synced on Bioconda, as I think I just missed the hourly bot auto-bump job

@flass
Author

flass commented May 31, 2023

Awesome! No worries, I can wait until tomorrow ;-)
thanks a lot again.
Florent

@flass
Author

flass commented May 31, 2023

any chance that the bioconda recipe (and hence the resulting biocontainer) will pick up flye 2.9.2 and medaka 1.8.0 by default?

@rpetit3
Owner

rpetit3 commented May 31, 2023

I'll verify in the build, but assuming yes.

Probably going to pin medaka to >=1.8.0. Think it's worth pinning flye as well? Maybe just flye>=2.9
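For anyone who wants to confirm that an existing environment already satisfies lower bounds like these, a small version comparison is enough. The sketch below is only an illustration: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Relies on `sort -V`, which orders 1.8.0 before 1.10.0.
# Caveat: "2.9" sorts before "2.9.0", so write both versions with the
# same number of components if they might be equal.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge 2.9.2 2.9 && echo "flye is new enough"` prints the message, while `version_ge 1.7.2 1.8.0` fails. The installed version string would have to be extracted from each tool's own version output first.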

@flass
Author

flass commented May 31, 2023

yes, I'd say it's worth pinning both - the flye 2.9.2 assembly has worked fine for me (can't really comment on quality though).

it would definitely be worth having a bunch (or all) of the medaka models included in the bioconda recipe.

I personally would like to have the following:
r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_variant_g615, r104_e81_sup_g610, r104_e81_sup_variant_g610, r941_min_sup_g507, r941_min_sup_variant_g507, r941_min_sup_snp_g507

Thanks!

@incoherentian

incoherentian commented May 31, 2023

If dragonflye is now supporting more efficient kit 14 alignment with newer flye anyway, my vote would def. be yes!

P.S. I tried adding your channel as I thought that would not suffer a delay. Now rereading and see you're just pushing it straight to bioconda!
mamba create -n dragonflye_m180 -c bioconda -c conda-forge -c rpetit3 dragonflye=1.1.1 flye=2.9.2 medaka=1.8.0

@rpetit3
Owner

rpetit3 commented May 31, 2023

@flass Mind double checking those, I think we are getting them now. Here's what I'm getting with v1.1.1 (haha might not have to do anything!)

[dragonflye]    Available:
[dragonflye]    r103_fast_g507
[dragonflye]    r103_fast_snp_g507
[dragonflye]    r103_fast_variant_g507
[dragonflye]    r103_hac_g507
[dragonflye]    r103_hac_snp_g507
[dragonflye]    r103_hac_variant_g507
[dragonflye]    r103_min_high_g345
[dragonflye]    r103_min_high_g360
[dragonflye]    r103_prom_high_g360
[dragonflye]    r103_prom_snp_g3210
[dragonflye]    r103_prom_variant_g3210
[dragonflye]    r103_sup_g507
[dragonflye]    r103_sup_snp_g507
[dragonflye]    r103_sup_variant_g507
[dragonflye]    r1041_e82_260bps_fast_g632
[dragonflye]    r1041_e82_260bps_fast_variant_g632
[dragonflye]    r1041_e82_260bps_hac_g632
[dragonflye]    r1041_e82_260bps_hac_v4.0.0
[dragonflye]    r1041_e82_260bps_hac_v4.1.0
[dragonflye]    r1041_e82_260bps_hac_variant_g632
[dragonflye]    r1041_e82_260bps_hac_variant_v4.1.0
[dragonflye]    r1041_e82_260bps_sup_g632
[dragonflye]    r1041_e82_260bps_sup_v4.0.0
[dragonflye]    r1041_e82_260bps_sup_v4.1.0
[dragonflye]    r1041_e82_260bps_sup_variant_g632
[dragonflye]    r1041_e82_260bps_sup_variant_v4.1.0
[dragonflye]    r1041_e82_400bps_fast_g615
[dragonflye]    r1041_e82_400bps_fast_g632
[dragonflye]    r1041_e82_400bps_fast_variant_g615
[dragonflye]    r1041_e82_400bps_fast_variant_g632
[dragonflye]    r1041_e82_400bps_hac_g615
[dragonflye]    r1041_e82_400bps_hac_g632
[dragonflye]    r1041_e82_400bps_hac_v4.0.0
[dragonflye]    r1041_e82_400bps_hac_v4.1.0
[dragonflye]    r1041_e82_400bps_hac_v4.2.0
[dragonflye]    r1041_e82_400bps_hac_variant_g615
[dragonflye]    r1041_e82_400bps_hac_variant_g632
[dragonflye]    r1041_e82_400bps_hac_variant_v4.1.0
[dragonflye]    r1041_e82_400bps_hac_variant_v4.2.0
[dragonflye]    r1041_e82_400bps_sup_g615
[dragonflye]    r1041_e82_400bps_sup_v4.0.0
[dragonflye]    r1041_e82_400bps_sup_v4.1.0
[dragonflye]    r1041_e82_400bps_sup_v4.2.0
[dragonflye]    r1041_e82_400bps_sup_variant_g615
[dragonflye]    r1041_e82_400bps_sup_variant_v4.1.0
[dragonflye]    r1041_e82_400bps_sup_variant_v4.2.0
[dragonflye]    r104_e81_fast_g5015
[dragonflye]    r104_e81_fast_variant_g5015
[dragonflye]    r104_e81_hac_g5015
[dragonflye]    r104_e81_hac_variant_g5015
[dragonflye]    r104_e81_sup_g5015
[dragonflye]    r104_e81_sup_g610
[dragonflye]    r104_e81_sup_variant_g610
[dragonflye]    r10_min_high_g303
[dragonflye]    r10_min_high_g340
[dragonflye]    r941_e81_fast_g514
[dragonflye]    r941_e81_fast_variant_g514
[dragonflye]    r941_e81_hac_g514
[dragonflye]    r941_e81_hac_variant_g514
[dragonflye]    r941_e81_sup_g514
[dragonflye]    r941_e81_sup_variant_g514
[dragonflye]    r941_min_fast_g303
[dragonflye]    r941_min_fast_g507
[dragonflye]    r941_min_fast_snp_g507
[dragonflye]    r941_min_fast_variant_g507
[dragonflye]    r941_min_hac_g507
[dragonflye]    r941_min_hac_snp_g507
[dragonflye]    r941_min_hac_variant_g507
[dragonflye]    r941_min_high_g303
[dragonflye]    r941_min_high_g330
[dragonflye]    r941_min_high_g340_rle
[dragonflye]    r941_min_high_g344
[dragonflye]    r941_min_high_g351
[dragonflye]    r941_min_high_g360
[dragonflye]    r941_min_sup_g507
[dragonflye]    r941_min_sup_snp_g507
[dragonflye]    r941_min_sup_variant_g507
[dragonflye]    r941_prom_fast_g303
[dragonflye]    r941_prom_fast_g507
[dragonflye]    r941_prom_fast_snp_g507
[dragonflye]    r941_prom_fast_variant_g507
[dragonflye]    r941_prom_hac_g507
[dragonflye]    r941_prom_hac_snp_g507
[dragonflye]    r941_prom_hac_variant_g507
[dragonflye]    r941_prom_high_g303
[dragonflye]    r941_prom_high_g330
[dragonflye]    r941_prom_high_g344
[dragonflye]    r941_prom_high_g360
[dragonflye]    r941_prom_high_g4011
[dragonflye]    r941_prom_snp_g303
[dragonflye]    r941_prom_snp_g322
[dragonflye]    r941_prom_snp_g360
[dragonflye]    r941_prom_sup_g507
[dragonflye]    r941_prom_sup_snp_g507
[dragonflye]    r941_prom_sup_variant_g507
[dragonflye]    r941_prom_variant_g303
[dragonflye]    r941_prom_variant_g322
[dragonflye]    r941_prom_variant_g360
[dragonflye]    r941_sup_plant_g610
[dragonflye]    r941_sup_plant_variant_g610
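Double-checking a wish list against a listing like this one can be scripted. The sketch below is only an illustration: `missing_models` is a hypothetical helper that reads a listing in the `[dragonflye]    <model>` format on standard input and prints every requested model that is absent from it.

```shell
# Hypothetical helper: prints each requested model (given as arguments)
# that does not appear in a "[dragonflye]    <model>" listing on stdin.
missing_models() {
    local available wanted
    # The model name is the last whitespace-separated field on each line.
    available=$(awk '{ print $NF }')
    for wanted in "$@"; do
        # -F: fixed string, -x: whole line must match, -q: quiet
        printf '%s\n' "$available" | grep -Fxq -- "$wanted" \
            || printf '%s\n' "$wanted"
    done
}
```

Assuming the listing has been saved to a file (here the made-up name `models.txt`), the seven models requested earlier in the thread could be checked with `missing_models r1041_e82_400bps_sup_g615 r1041_e82_400bps_sup_variant_g615 r104_e81_sup_g610 r104_e81_sup_variant_g610 r941_min_sup_g507 r941_min_sup_variant_g507 r941_min_sup_snp_g507 < models.txt`; no output means nothing is missing.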

@flass
Author

flass commented May 31, 2023

yep, all good, I have all I need there!

@rpetit3
Owner

rpetit3 commented May 31, 2023

v1.1.1 is now available: quay.io/biocontainers/dragonflye:1.1.1--hdfd78af_0

it should include medaka 1.8.0 and flye 2.9.2, let me know if not.

Otherwise I think we are good here! Thank you for the help, and please feel free to reopen
