It happens quite frequently that frameshifts (FSs) are introduced in consensus sequences. In almost all cases these are errors.
Suggestion:
We could integrate a new tool, proovframe, to correct FSs by aligning reference protein sequences to the consensus sequences. So far I have only tried this with a single example sequence, so this would need more thorough benchmarking:
Top: original sequence with an FS from poreCov
Middle: sequence after proovframe correction with all SARS-CoV-2 (SC2) proteins as reference. However, this introduces another error in ORF1a, likely due to the polyprotein structure of ORF1ab!
Bottom: Thus, I removed the ORF1ab polyprotein sequence from the reference FASTA, and this seems to work: the sequence is fixed.
Reference protein FASTA used, without the ORF1ab polyprotein:
GCF_009858895.2_ASM985889v3_protein_noORF1ab.faa.zip
Commands:

```shell
# map reference proteins to the consensus sequence
proovframe/bin/proovframe map -a GCF_009858895.2_ASM985889v3_protein_noORF1ab.faa -o raw-seqs.tsv sample.consensus.fasta
# fix frameshifts in the consensus sequence
proovframe/bin/proovframe fix -o corrected.fasta sample.consensus.fasta raw-seqs.tsv
```
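The reference without the polyprotein can be derived from the full RefSeq protein FASTA with a small filter. This is a minimal sketch, not necessarily how the attached file was made. It assumes the polyprotein's header line contains the literal text `ORF1ab polyprotein`; check with `grep '^>'` on your file first. The function name `drop_fasta_records` is hypothetical.

```shell
# Sketch: drop every FASTA record whose header contains a literal pattern.
# Assumption: the ORF1ab polyprotein header contains "ORF1ab polyprotein".
drop_fasta_records() {
  # $1: literal substring to match in header lines, $2: input FASTA file.
  # Each header line decides whether it and its sequence lines are printed.
  awk -v pat="$1" '
    /^>/ { keep = (index($0, pat) == 0) }  # keep records NOT matching pat
    keep
  ' "$2"
}
```

Example call, assuming the full protein file is named `GCF_009858895.2_ASM985889v3_protein.faa`: `drop_fasta_records 'ORF1ab polyprotein' GCF_009858895.2_ASM985889v3_protein.faa > GCF_009858895.2_ASM985889v3_protein_noORF1ab.faa`. The match is a literal substring, so a header reading `ORF1a polyprotein` would be kept. Comparing `grep -c '^>'` on both files should show exactly one record fewer.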
However, I would suggest providing these FS-corrected consensus sequences in addition to the default consensus sequences. Proper benchmarking would be needed to determine whether these corrections introduce other errors into SARS-CoV-2 sequences.
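Having both versions side by side allows a quick first check before any proper benchmarking. This is a minimal sketch with the hypothetical helpers `fasta_lengths` and `compare_lengths`. It assumes proovframe keeps the record IDs of the input unchanged, and it lists records whose length differs between the default and the corrected consensus, i.e. records where bases were inserted or deleted. A length change only flags a sequence for inspection. It does not show whether the correction is right.

```shell
# Sketch: list consensus records whose length changed after proovframe fix.
# Assumption: both FASTA files use the same record IDs (first word of header).
fasta_lengths() {
  # Print "<id><TAB><sequence length>" for each record of a FASTA file.
  awk '
    /^>/ {
      if (id != "") print id "\t" len
      id = substr($1, 2); len = 0; next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }  # ignore whitespace and CR
    END { if (id != "") print id "\t" len }
  ' "$1"
}

compare_lengths() {
  # $1: default consensus FASTA, $2: FS-corrected consensus FASTA.
  # Prints "<id> <original len> <corrected len> <difference>" (tab-separated)
  # for records present in both files whose length differs.
  {
    fasta_lengths "$1" | awk '{ print "orig\t" $0 }'
    fasta_lengths "$2" | awk '{ print "corr\t" $0 }'
  } | awk -F '\t' '
    $1 == "orig" { orig[$2] = $3; next }
    ($2 in orig) && orig[$2] != $3 {
      printf "%s\t%d\t%d\t%d\n", $2, orig[$2], $3, $3 - orig[$2]
    }
  '
}
```

With the file names from the commands above, the call would be `compare_lengths sample.consensus.fasta corrected.fasta`. An empty result only means that no lengths changed. It does not rule out same-length edits, so a base-level comparison against a trusted sequence is still needed for benchmarking.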