
High level of duplicated protein sequences #41

Closed
hegardon opened this issue Feb 3, 2023 · 1 comment

Comments

@hegardon

hegardon commented Feb 3, 2023

Hi,
I am using PLASS (v4.687d7) on a set of metagenomes from ~100 cheese samples, and it works very well, but I still have a few questions.
In each dataset, a large fraction of the protein sequences (30% on average) are duplicated, i.e. identical to another sequence with 100% identity and coverage. I understand that some sequences could be duplicated (for example, those originating from closely related species), but 30% seems quite high.
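To measure this duplicate rate on a Plass output file, here is a minimal Python sketch. It assumes the output is a standard FASTA file (such as the `METAG_out.fasta` named in the command in this post), and the helper names `read_fasta` and `duplicate_fraction` are hypothetical, not part of Plass. A sequence counts as duplicated only if it is identical over its full length (100% identity and coverage), and the result is the share of redundant copies.

```python
from collections import Counter


def read_fasta(path):
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def duplicate_fraction(sequences):
    """Fraction of sequences that are exact copies of another sequence in the set.

    Each distinct sequence seen n times contributes n - 1 redundant copies.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    redundant = sum(n - 1 for n in counts.values())
    return redundant / total if total else 0.0
```

For example, `duplicate_fraction(seq for _, seq in read_fasta("METAG_out.fasta"))` would report the fraction for one sample. Exact string matching is deliberately strict here; near-identical variants are not counted.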
Another issue is the total number of assembled amino acids. For example, for an initial dataset of 18 million reads (2x150 bp paired-end reads, 2.7 Gbp in total), 7 million proteins are assembled, with about 2e+9 aa in total. That is almost as much as the total number of input nucleotides, and since each amino acid is encoded by three nucleotides, it seems to me like more amino acids than expected...
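A back-of-the-envelope version of this comparison, assuming (as a simplification) that each input nucleotide is translated in only one reading frame and used by only one assembled protein:

$$
\frac{2.7\times10^{9}\ \text{nt}}{3\ \text{nt/aa}} = 9\times10^{8}\ \text{aa}, \qquad \frac{2\times10^{9}\ \text{aa}}{9\times10^{8}\ \text{aa}} \approx 2.2
$$

Under that assumption, the assembly holds roughly 2.2 times as many residues as the reads could encode once, which suggests that much of the assembled sequence is redundant (duplicates or overlapping variants).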
Is there an explanation for these results?

I am using PLASS with the following command (other parameters left at their defaults):
plass assemble METAG_R1.fastq.gz METAG_R2.fastq.gz METAG_out.fasta -e 0.001 --num-iterations 12 --filter-proteins 1 --remove-tmp-files 1

Thanks
Helene

@milot-mirdita
Member

milot-mirdita commented Mar 13, 2023

Since Plass can reuse each read in every iteration, it tends to create a lot of variants that are not necessarily useful. We generally use mmseqs linclust to remove fragments afterwards.
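The kind of redundancy described here can be illustrated with a naive filter. This is a minimal sketch and not linclust: the hypothetical `remove_redundant` function only drops exact duplicates and sequences that occur as exact substrings of a longer sequence, and it runs in quadratic time, so it only suits small sets. MMseqs2 linclust instead clusters by configurable identity and coverage thresholds and is designed to scale to millions of sequences.

```python
def remove_redundant(sequences):
    """Drop exact duplicates and exact-substring fragments of longer sequences.

    Naive O(n^2) illustration only; this is NOT mmseqs linclust, which also
    tolerates mismatches via identity/coverage thresholds.
    """
    # Deduplicate, then sort longest first (ties broken alphabetically) so each
    # candidate fragment is compared against every longer sequence already kept.
    unique = sorted(set(sequences), key=lambda s: (-len(s), s))
    kept = []
    for seq in unique:
        # A sequence fully contained in a kept (longer) one is treated as a fragment.
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept
```

For example, `remove_redundant(["MKVLA", "MKV", "MKVLA", "GGG"])` returns `["MKVLA", "GGG"]`: the duplicate `MKVLA` collapses to one copy and `MKV` is dropped as a fragment of it.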
