metadecoder coverage KeyError #1
Description
I am trying to use MetaDecoder to cluster metaSPAdes contigs. I installed MetaDecoder v1.0.9 in a conda environment:
source /home/miniconda3/bin/activate
conda create --name metadecoder python=3.8.6
conda activate metadecoder
conda install -c bioconda fraggenescan==1.31
conda install -c bioconda prodigal==2.6.3
conda install -c bioconda hmmer==3.2.1
pip3 install https://github.com/liu-congcong/MetaDecoder/releases/download/v1.0.9/metadecoder-1.0.9-py3-none-any.whl
Environment
# packages in environment at /home/miniconda3/envs/metadecoder:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
ca-certificates 2022.4.26 h06a4308_0
fraggenescan 1.31 hec16e2b_4 bioconda
hmmer 3.2.1 he1b5a44_2 bioconda
joblib 1.1.0 pypi_0 pypi
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 11.2.0 h1d223b6_16 conda-forge
libgomp 11.2.0 h1d223b6_16 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_16 conda-forge
libzlib 1.2.11 h166bdaf_1014 conda-forge
metadecoder 1.0.9 pypi_0 pypi
ncurses 6.3 h27087fc_1 conda-forge
numpy 1.18.5 pypi_0 pypi
openssl 1.1.1o h166bdaf_0 conda-forge
perl 5.32.1 2_h7f98852_perl5 conda-forge
pip 22.0.4 pyhd8ed1ab_0 conda-forge
prodigal 2.6.3 hec16e2b_4 bioconda
python 3.8.6 hffdb5ce_5_cpython conda-forge
python_abi 3.8 2_cp38 conda-forge
readline 8.1.2 h7f8727e_1
scikit-learn 0.23.2 pypi_0 pypi
scipy 1.5.4 pypi_0 pypi
setuptools 62.1.0 py38h578d9bd_0 conda-forge
sqlite 3.38.5 h4ff8645_0 conda-forge
threadpoolctl 3.1.0 pypi_0 pypi
tk 8.6.12 h27826a3_0 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h166bdaf_1014 conda-forge
My unsorted SAM alignments were created with BWA-MEM. When I run metadecoder coverage on my list of SAM files, I get a KeyError:
Input
metadecoder coverage \
-s ${sams} \
-o metadecoder.coverage \
--threads 8
Output
2022-05-11 09:07:30 -> Loading sam files.
2022-05-11 09:14:11 -> Done.
2022-05-11 09:14:11 -> Writing to file.
Traceback (most recent call last):
File "/home/miniconda3/envs/metadecoder/bin/metadecoder", line 240, in <module>
metadecoder_coverage.main(parameters)
File "/home/miniconda3/envs/metadecoder/lib/python3.8/site-packages/metadecoder/metadecoder_coverage.py", line 91, in main
sequence2bin_coverages[lines[0]][int(lines[1]), coverage_index] += float(lines[2])
KeyError: 'NODE_16_length_211411_cov_56.508393'
NODE_16_length_211411_cov_56.508393 is the reference sequence name in the first line of the first SAM file loaded. I get the same error when I pass the SAM file paths explicitly instead of through the ${sams} variable, and the error occurs whether or not the SAM files include headers. Please note that metadecoder seed works fine.
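One way to see which reference names each SAM file actually carries is a small standalone check like the sketch below. It is not part of MetaDecoder; the function names `reference_names` and `compare` are hypothetical, and it only assumes the standard SAM layout (reference names in the `SN:` field of `@SQ` header lines and in the third column of alignment lines). It reads both places so that it also works on SAM files without headers.

```python
def reference_names(sam_path):
    """Collect reference names from @SQ header lines and the RNAME column."""
    names = set()
    with open(sam_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if line.startswith("@"):
                # Header line: only @SQ records carry reference names (SN: tag).
                if fields[0] == "@SQ":
                    for field in fields[1:]:
                        if field.startswith("SN:"):
                            names.add(field[3:])
            elif len(fields) > 2 and fields[2] != "*":
                # Alignment line: column 3 is RNAME, "*" means unmapped.
                names.add(fields[2])
    return names


def compare(sam_paths):
    """Return the names shared by all files and, per file, the names that are not."""
    per_file = {str(path): reference_names(path) for path in sam_paths}
    shared = set.intersection(*per_file.values()) if per_file else set()
    extras = {path: names - shared for path, names in per_file.items()}
    return shared, extras
```

Calling `compare(["sample1.sam", "sample2.sam"])` (placeholder file names) returns the set of shared names plus a dictionary of per-file leftovers. If the shared set is empty or very small, the SAM files were most likely aligned against different references.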
I suspect this may be failing because each SAM file represents reads aligned to a different assembly, and metadecoder coverage is expecting the contig names to be derived from a single assembly and therefore consistent across alignments. What is the recommended pipeline for running MetaDecoder for multiple samples? Should I first create a contig catalog, as with VAMB, and use that catalog as my ASSEMBLY.fasta?
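To make the question concrete, the kind of contig catalog I have in mind could be built roughly as in the sketch below. This is only an illustration under my own assumptions: the function name `write_catalog`, the sample-name prefix, and the `_` separator are hypothetical, and whether MetaDecoder accepts such a catalog is exactly what I am asking. Each sample's reads would then have to be re-aligned to the catalog before computing coverage.

```python
def write_catalog(assemblies, catalog_path, separator="_"):
    """Concatenate per-sample assemblies into one FASTA with unique contig names.

    assemblies: dict mapping a sample name to the path of its assembly FASTA.
    Returns the number of contigs written.
    """
    seen = set()
    with open(catalog_path, "w") as out:
        for sample, fasta_path in assemblies.items():
            with open(fasta_path) as handle:
                for raw in handle:
                    line = raw.rstrip("\r\n")
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        parts = line[1:].split()
                        if not parts:
                            raise ValueError(f"empty FASTA header in {fasta_path}")
                        # Keep only the contig ID and prefix it with the sample name.
                        name = f"{sample}{separator}{parts[0]}"
                        if name in seen:
                            raise ValueError(f"duplicate contig name: {name}")
                        seen.add(name)
                        out.write(f">{name}\n")
                    else:
                        out.write(line + "\n")
    return len(seen)
```

Prefixing with the sample name keeps contigs such as NODE_16 from different metaSPAdes assemblies distinct, so every SAM file produced against the catalog would share one consistent set of reference names.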