
metadecoder coverage KeyError #1

@acvill

I am trying to use MetaDecoder to cluster metaSPAdes contigs. I installed MetaDecoder v1.0.9 in a conda environment:

source /home/miniconda3/bin/activate
conda create --name metadecoder python=3.8.6
conda activate metadecoder
conda install -c bioconda fraggenescan==1.31
conda install -c bioconda prodigal==2.6.3
conda install -c bioconda hmmer==3.2.1
pip3 install https://github.com/liu-congcong/MetaDecoder/releases/download/v1.0.9/metadecoder-1.0.9-py3-none-any.whl
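Before running MetaDecoder, it can help to confirm that the external tools installed above are actually on PATH inside the activated environment. The following is a minimal sketch using `shutil.which`. The executable names (`FragGeneScan`, `prodigal`, `hmmsearch`) are assumptions about what each bioconda package provides. They do not come from MetaDecoder's documentation, and MetaDecoder's own lookup may differ.

```python
import shutil

# Assumed executable names per bioconda package; MetaDecoder's own lookup may differ.
EXPECTED_EXECUTABLES = {
    "fraggenescan": "FragGeneScan",
    "prodigal": "prodigal",
    "hmmer": "hmmsearch",
}


def locate_tools(executables=EXPECTED_EXECUTABLES):
    """Return {package: path found on PATH, or None if the executable is missing}."""
    return {package: shutil.which(name) for package, name in executables.items()}


if __name__ == "__main__":
    for package, path in locate_tools().items():
        print(f"{package:<14}{path or 'NOT FOUND on PATH'}")
```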

Environment

# packages in environment at /home/miniconda3/envs/metadecoder:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
ca-certificates           2022.4.26            h06a4308_0
fraggenescan              1.31                 hec16e2b_4    bioconda
hmmer                     3.2.1                he1b5a44_2    bioconda
joblib                    1.1.0                    pypi_0    pypi
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 11.2.0              h1d223b6_16    conda-forge
libgomp                   11.2.0              h1d223b6_16    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_16    conda-forge
libzlib                   1.2.11            h166bdaf_1014    conda-forge
metadecoder               1.0.9                    pypi_0    pypi
ncurses                   6.3                  h27087fc_1    conda-forge
numpy                     1.18.5                   pypi_0    pypi
openssl                   1.1.1o               h166bdaf_0    conda-forge
perl                      5.32.1          2_h7f98852_perl5    conda-forge
pip                       22.0.4             pyhd8ed1ab_0    conda-forge
prodigal                  2.6.3                hec16e2b_4    bioconda
python                    3.8.6           hffdb5ce_5_cpython    conda-forge
python_abi                3.8                      2_cp38    conda-forge
readline                  8.1.2                h7f8727e_1
scikit-learn              0.23.2                   pypi_0    pypi
scipy                     1.5.4                    pypi_0    pypi
setuptools                62.1.0           py38h578d9bd_0    conda-forge
sqlite                    3.38.5               h4ff8645_0    conda-forge
threadpoolctl             3.1.0                    pypi_0    pypi
tk                        8.6.12               h27826a3_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h166bdaf_1014    conda-forge

My unsorted SAM alignments were created with BWA-MEM. When I run metadecoder coverage on my list of SAM files, I get a KeyError:

Input

metadecoder coverage \
  -s ${sams} \
  -o metadecoder.coverage \
  --threads 8
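One way to rule out shell-expansion problems with `${sams}` is to build the argument list explicitly. For example, a Bash array referenced as `${sams}` expands to its first element only, whereas `"${sams[@]}"` expands to all of them. The sketch below is written in Python. It assumes that `-s` accepts several paths in a single call, as the command above implies, and the `alignments/*.sam` location is hypothetical.

```python
import glob
import subprocess
import sys


def build_coverage_command(sam_files, output="metadecoder.coverage", threads=8):
    """Build argv for `metadecoder coverage`, passing each SAM path as its own argument."""
    if not sam_files:
        raise ValueError("no SAM files given")
    return ["metadecoder", "coverage", "-s", *sam_files,
            "-o", output, "--threads", str(threads)]


if __name__ == "__main__":
    # "alignments/*.sam" is a hypothetical location; adjust to the actual layout.
    sams = sorted(glob.glob("alignments/*.sam"))
    command = build_coverage_command(sams)
    print(" ".join(command), file=sys.stderr)
    subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell word splitting entirely, so each path reaches the program unchanged.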

Output

2022-05-11 09:07:30 -> Loading sam files.
2022-05-11 09:14:11 -> Done.
2022-05-11 09:14:11 -> Writing to file.
Traceback (most recent call last):
  File "/home/miniconda3/envs/metadecoder/bin/metadecoder", line 240, in <module>
    metadecoder_coverage.main(parameters)
  File "/home/miniconda3/envs/metadecoder/lib/python3.8/site-packages/metadecoder/metadecoder_coverage.py", line 91, in main
    sequence2bin_coverages[lines[0]][int(lines[1]), coverage_index] += float(lines[2])
KeyError: 'NODE_16_length_211411_cov_56.508393'
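The failing line indexes `sequence2bin_coverages` by contig name (`lines[0]`). A KeyError therefore means that a name read back during the writing step was never registered as a key. The following is a simplified model of that suspected mechanism, not MetaDecoder's actual code. Coverage slots are allocated from one set of contig names, and any record that names a contig outside that set fails in the same way. The "assembly A" contig name is hypothetical.

```python
def accumulate(contig_names, records, n_samples):
    """Sum (contig, sample_index, value) records into per-contig, per-sample slots.

    Slots exist only for contig_names, so a record naming any other contig raises
    KeyError, the same exception type as in the traceback above.
    """
    coverages = {name: [0.0] * n_samples for name in contig_names}
    for contig, sample_index, value in records:
        coverages[contig][sample_index] += value
    return coverages


if __name__ == "__main__":
    # Hypothetical contig from "assembly A"; the record names a contig from another assembly.
    assembly_a = ["NODE_1_length_5000_cov_10.0"]
    records_b = [("NODE_16_length_211411_cov_56.508393", 0, 1.0)]
    try:
        accumulate(assembly_a, records_b, n_samples=1)
    except KeyError as error:
        print("KeyError:", error)
```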

NODE_16_length_211411_cov_56.508393 is the reference sequence name (RNAME) on the first alignment line of the first SAM file loaded. I get the same error when I list the SAM files explicitly instead of using the ${sams} variable. Including or omitting the SAM headers does not change the result. Note that metadecoder seed runs without problems.
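To check directly whether the SAM files reference different sets of contigs, the reference names in each file can be compared. The following is a minimal sketch that assumes plain-text (not BAM) input. It collects names from `@SQ` `SN:` header tags and from the RNAME column (third field) of alignment lines, so it works with or without headers. It then reports which names each later file has that the first file lacks. The script name in the usage message is hypothetical, and because the script reads every line, it will be slow on large files.

```python
import sys


def sam_reference_names(path):
    """Collect reference names from @SQ SN: tags and from the RNAME column (field 3)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        names.add(field[3:])
            elif not line.startswith("@"):
                fields = line.split("\t", 3)
                # "*" marks an unmapped read with no reference sequence.
                if len(fields) > 2 and fields[2] != "*":
                    names.add(fields[2])
    return names


def compare_references(sam_paths):
    """Map each later SAM path to the reference names that the first file lacks."""
    baseline = sam_reference_names(sam_paths[0])
    return {path: sam_reference_names(path) - baseline for path in sam_paths[1:]}


if __name__ == "__main__":
    paths = sys.argv[1:]
    if len(paths) < 2:
        sys.exit("usage: python compare_sam_refs.py first.sam other.sam [...]")
    for path, extra in compare_references(paths).items():
        print(f"{path}: {len(extra)} reference names absent from {paths[0]}")
```

For headerless files, RNAME only reveals contigs that received mapped reads, so a few differences can occur even with a shared assembly. Near-total disagreement points to different assemblies.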

I suspect the failure occurs because each SAM file contains reads aligned to a different assembly, whereas metadecoder coverage expects all alignments to use contig names from a single assembly. What is the recommended pipeline for running MetaDecoder on multiple samples? Should I first build a contig catalog, as with VAMB, and use that catalog as my ASSEMBLY.fasta?
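If a single catalog is the right approach, the concatenation step could be sketched as follows. metaSPAdes numbers contigs as NODE_<n>_length_<len>_cov_<cov> in every assembly, so names are not guaranteed to be unique across samples, and a contig's sample of origin is lost once the files are merged. This sketch is modeled loosely on the VAMB-style catalog idea, not on VAMB's own script. It prefixes each contig ID with a sample label, optionally drops short contigs, and rejects duplicate names. The `label_contig` naming, the `min_length` filter and the command-line usage are illustrative assumptions. After building the catalog, each sample's reads would be mapped with BWA-MEM against this one file, so that every SAM shares the same reference names.

```python
import sys


def read_fasta(path):
    """Yield (contig_id, sequence) pairs; contig_id is the first token after '>'."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_catalog(assemblies, output, min_length=0):
    """Write one FASTA from (sample_label, fasta_path) pairs; return contigs written."""
    seen, written = set(), 0
    with open(output, "w") as out:
        for label, path in assemblies:
            for contig, sequence in read_fasta(path):
                if len(sequence) < min_length:
                    continue  # optional length filter (illustrative)
                name = f"{label}_{contig}"  # hypothetical naming scheme
                if name in seen:
                    raise ValueError(f"duplicate contig name: {name}")
                seen.add(name)
                out.write(f">{name}\n")
                for start in range(0, len(sequence), 80):
                    out.write(sequence[start:start + 80] + "\n")
                written += 1
    return written


if __name__ == "__main__":
    # Hypothetical usage: python make_catalog.py catalog.fasta S1=a.fasta S2=b.fasta
    output, *pairs = sys.argv[1:]
    assemblies = [tuple(pair.split("=", 1)) for pair in pairs]
    print(build_catalog(assemblies, output), "contigs written", file=sys.stderr)
```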
