Mashtree::createTreeFromPhylip: Can't call method "as_text" on an undefined value - using .msh as input #86

Open
Lumimar opened this issue Feb 21, 2024 · 1 comment


Lumimar commented Feb 21, 2024

Hello there,
I can't get a .dnd output when running mashtree on *.msh files, whereas I do get one when I run mashtree with the fastq files as input. Details below.

I ran
mashtree --mindepth 0 --numcpus 4 --outmatrix mashmatrix.txt ERR*/*msh > mashtree.dnd

on the output generated by
mash sketch -k 21 -s 10000 - -o ${out_dir}"/"${id}

with the following log

mashtree: main: Found mash version 2 - /home/xxx/.local/bin/mash
mashtree: main: Temporary directory will be /tmp/xxx_16154334/MASHTREE.EUJ6ti
mashtree: main: mashtree on 448 files

mashtree: mashSketch(TID2): This thread will work on 112 sketches
mashtree: mashSketch(TID2): Working on file 1 out of 112
mashtree: mashSketch(TID2): Input file is a sketch file itself and will be used as such: ERRxxx/ERRxxx.msh
mashtree: mashSketch(TID2): WARNING: ERRxxx.msh was already mashed.
.....
mashtree: mashDist(TID6): Distances for /tmp/xxx_16154334/MASHTREE.EUJ6ti/ERRxxx1.msh
mashtree: mashDist(TID7): Distances for /tmp/xxx_16154334/MASHTREE.EUJ6ti/ERRxxx2.msh
mashtree: mashDist(TID5): Distances for /tmp/xxx_16154334/MASHTREE.EUJ6ti/ERRxxx3.msh
mashtree: mashDist(TID6): Distances for /tmp/xxx_16154334/MASHTREE.EUJ6ti/ERRxxx4.msh
mashtree: mashDist(TID5): Distances for /tmp/xxx_16154334/MASHTREE.EUJ6ti/ERRxxx5.msh
mashtree: mashDistance: Databasing distances (1/4, TID5)
mashtree: mashDistance: Waiting to join thread (2/4, TID6)
mashtree: mashDistance: Databasing distances (2/4, TID6)
mashtree: mashDistance: Waiting to join thread (3/4, TID7)
mashtree: mashDistance: Databasing distances (3/4, TID7)
mashtree: mashDistance: Waiting to join thread (4/4, TID8)
mashtree: mashDistance: Databasing distances (4/4, TID8)
mashtree: mashDistance: Converting to phylip format into /tmp/xxxx_16154334/MASHTREE.EUJ6ti/distances.phylip
mashtree: mashDistance: Writing a distance matrix to mashmatrix.txt
mashtree: Mashtree::createTreeFromPhylip: Can't call method "as_text" on an undefined value 
Stopped at ...Mashtree.pm line 339.

The outmatrix was generated, but not the .dnd output.
Looking at Mashtree.pm, it seems that $outdir/tree.dnd.tmp was never created (I removed the unlink() on line 343, but no .tmp file appeared).

I am using Mashtree version 1.4.6, installed with conda on a Linux cluster (kernel 4.18.0-513.9.1.el8_9.x86_64).
I could not install it via cpanm because some dependencies were missing and I could not install them without sudo.
Now, if I run cpanm -l ~ Mashtree, I get
Mashtree is up to date. (1.4.6)

Is it advisable to use fastq rather than msh as input? Many thanks!


ohdongha commented May 13, 2024

Sorry for hitchhiking on this issue. I get the exact same error message when running mashtree (1.4.6) with *.msh files as input.

looking at Mashtree.pm it seems that $outdir/tree.dnd.tmp was not created

The issue seems to be in distancesToPhylip: in my case, the distances.phylip file is almost empty. I guess it should contain the matrix of distances in PHYLIP format so that quicktree can build the tree in the next step:

$ cat temp_mashtree/distances.phylip 
    0
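
For anyone wanting to check their own run the same way, here is a minimal sketch of a consistency test for the PHYLIP file. It relies only on the general PHYLIP distance layout (first line is the number of taxa, followed by one row per taxon) and assumes each row sits on a single line. The function name check_phylip is hypothetical and is not part of mashtree.

```shell
#!/bin/sh
# Hypothetical helper (not part of mashtree): report whether a PHYLIP
# distance file lists as many taxon rows as its header line declares.
# Assumption: one taxon row per line (no wrapped rows).
check_phylip() {
    awk '
        BEGIN   { declared = 0; rows = 0 }
        NR == 1 { declared = $1 + 0; next }   # first line: number of taxa
        NF > 0  { rows++ }                    # every later non-blank line: one row
        END {
            if (declared > 0 && declared == rows) {
                print "ok: " declared " taxa"
            } else {
                print "bad: header says " declared ", found " rows " rows"
                exit 1
            }
        }
    ' "$1"
}
```

Calling check_phylip temp_mashtree/distances.phylip on a file like the one above (a header of 0 and no rows) would report it as bad and return a non-zero exit status, which makes the empty matrix easy to spot before the tree-building step.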

I wonder if the problem arises when parsing the genome names to create the PHYLIP distance matrix. In my case, the first several lines of the distances.db.tsv file look like this:

$ head temp_mashtree/distances.db.tsv | column -t -s$'\t'
genome1                                      genome2                                      distance
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_022405125.1_ASM2240512v1_genomic.fna.gz  0
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_028453695.1_APUR_v2.2.0_genomic.fna.gz   0.203582
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_028454255.1_HLIG_v2.2.0_genomic.fna.gz   0.20308
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_029448645.1_ASM2944864v1_genomic.fna.gz  0.19694
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_913789895.3_iySelTumu1.3_genomic.fna.gz  0.202798
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_913789915.3_iySphMoni1.3_genomic.fna.gz  0.209807
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_916610135.2_iyMacEuro1.2_genomic.fna.gz  0.197773
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_916610235.2_iyLasMori1.2_genomic.fna.gz  0.203062
GCA_022405125.1_ASM2240512v1_genomic.fna.gz  GCA_916610255.1_iyLasLatv2.1_genomic.fna.gz  0.202186

Is there a rule that the genome file names need to follow (a length limit, etc.)?
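
To see whether the names in distances.db.tsv at least look complete, a rough sketch like the following counts the rows and the distinct names in each column. It assumes the layout shown above (a header line, then tab-separated genome1, genome2, distance). The function name summarise_distances is hypothetical and is not part of mashtree, and the counts only describe the table; they do not by themselves show where the PHYLIP conversion goes wrong.

```shell
#!/bin/sh
# Hypothetical helper (not part of mashtree): summarise a tab-separated
# distances table with a header line and columns genome1, genome2, distance.
summarise_distances() {
    awk -F '\t' '
        NR == 1 { next }                            # skip the header line
        {
            rows++
            if (!($1 in g1)) { g1[$1] = 1; n1++ }   # distinct names in column 1
            if (!($2 in g2)) { g2[$2] = 1; n2++ }   # distinct names in column 2
        }
        END { printf "rows=%d genome1=%d genome2=%d\n", rows, n1, n2 }
    ' "$1"
}
```

Running summarise_distances temp_mashtree/distances.db.tsv and comparing the two genome counts with the number of input sketches would show whether any names were lost or merged before the matrix was written.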

...

[Edit:] When I tried the same set with the fasta sequence files as input, mashtree ran successfully. It would be good to be able to work with msh files as well, though.
