Sequence alignment with MAFFT or Muscle #3

Closed
rvosa opened this issue Apr 11, 2020 · 1 comment

rvosa commented Apr 11, 2020

The experiences detailed in nextstrain/ncov#268 show that running the multiple sequence alignment (MSA) as one big job eventually becomes prohibitive. This was not a problem for the set of 400 GenBank genomes, but as submissions increase (or once we add GISAID data) it will become an issue.
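One way to avoid a single big run is to split the input into batches and align each batch separately. The sketch below only covers the splitting step; the function names `read_fasta` and `batches` are hypothetical, the batch size is an arbitrary assumption, and how the sub-alignments would be merged afterwards is left open here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size` items."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

For example, `batches(read_fasta(open("data/genomes/sars-cov-2.fasta")), 100)` would yield lists of up to 100 genomes each, which could then be written out and aligned one batch at a time.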

MAFFT has the virtue of being the current standard (used e.g. by Rambaut et al.), but it might be slower than Muscle (this is only @rvosa's subjective experience). Both can be run on the CIPRES cluster. Test and decide.
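A rough way to do the "test and decide" step locally is to time both tools on the same input. This is a minimal sketch, assuming `mafft` and `muscle` are installed and on the PATH; the helper names are hypothetical, and the Muscle command uses the v3 syntax (`-in`/`-out`), whereas Muscle v5 uses `-align`/`-output` instead.

```python
import subprocess
import time
from typing import Callable, List


def mafft_cmd(fasta: str) -> List[str]:
    """MAFFT command line; the alignment is written to stdout."""
    return ["mafft", "--anysymbol", fasta]


def muscle_cmd(fasta: str, out: str) -> List[str]:
    """MUSCLE v3 command line (assumption: v3 is the installed version)."""
    return ["muscle", "-in", fasta, "-out", out]


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_mafft(fasta: str, out: str) -> None:
    """Run MAFFT and redirect its stdout to the output file."""
    with open(out, "w") as handle:
        subprocess.run(mafft_cmd(fasta), stdout=handle, check=True)


def run_muscle(fasta: str, out: str) -> None:
    """Run MUSCLE, which writes the output file itself."""
    subprocess.run(muscle_cmd(fasta, out), check=True)
```

Something like `time_call(lambda: run_mafft("data/genomes/sars-cov-2.fasta", "out.mafft"))` next to the equivalent `run_muscle` call gives a first comparison. A single wall-clock timing is noisy, so it is worth repeating the runs before drawing conclusions.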

@rvosa rvosa added this to the Full workflow milestone Apr 11, 2020
@rvosa rvosa changed the title Decide on alignment: MAFFT vs Muscle, decomposed or one big run? Sequence alignment with MAFFT or Muscle Apr 11, 2020

rvosa commented Apr 13, 2020

cipresrun \
     -y data/cipres_appinfo.yml \
     -t MAFFT_XSEDE \
     -p vparam.anysymbol_=1 \
     -i data/genomes/sars-cov-2.fasta \
     -o data/genomes/output.mafft

@rvosa rvosa closed this as completed Apr 13, 2020