programweb/sequence_alignment


sequence_alignment

Multiple Sequence Alignment

Ebola virus is the causative agent of Ebola hemorrhagic fever (EHF), a disease that affects humans and other primates. The disease is highly contagious and has a high fatality rate (as high as 90%). For this project, I obtained Ebola virus RNA genomes.  

aaa  

The downloaded file contains 249 full-length genomes. As a first step, I align only the first ten full Ebola sequences against one another. (Why? Comparing multiple genomes helps assess the quality of genome assemblies: inconsistencies or gaps in the alignment can highlight potential assembly problems and guide further improvements.) To do this, I first find the IDs (accession numbers) of the first ten sequences and then use them to extract the corresponding sequences.  
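
The counting, ID lookup, and extraction steps described above can be sketched in plain Python. This is a minimal illustration, not the commands used in this project: the function names are hypothetical, and it assumes that a record's ID is the first whitespace-delimited token of its FASTA header.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without '>'."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def record_id(header: str) -> str:
    """Assumption: the ID is the first whitespace-delimited header token."""
    parts = header.split()
    return parts[0] if parts else ""


def first_ids(path: str, n: int = 10) -> List[str]:
    """Return the IDs of the first n records in the file."""
    return [record_id(header) for header, _ in islice(read_fasta(path), n)]


def extract_by_id(path: str, ids: Iterable[str], out_path: str) -> int:
    """Write the records whose ID is in ids to out_path; return how many."""
    wanted = set(ids)
    written = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(path):
            if record_id(header) in wanted:
                out.write(f">{header}\n{seq}\n")
                written += 1
    return written
```

Counting the records is then `sum(1 for _ in read_fasta(path))`, and `extract_by_id(path, first_ids(path, 10), out_path)` writes the first ten genomes to a new file.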

bbb  

I perform the multiple sequence alignment with MAFFT, a multiple sequence alignment program for Unix-like operating systems. It offers a range of alignment methods, including L-INS-i (accurate; for alignments of up to ∼200 sequences) and FFT-NS-2 (fast; for alignments of up to ∼30,000 sequences).  
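
As a rough sketch of how such a run could be scripted, the helper below builds a MAFFT command line for the two methods named above and runs it. The flag sets follow the MAFFT manual (`--localpair --maxiterate 1000` for L-INS-i, `--retree 2 --maxiterate 0` for FFT-NS-2, `--clustalout` for Clustal-format output); the function names and the choice of `--auto` as default are assumptions for illustration, not necessarily the options used here.

```python
import shutil
import subprocess
from typing import List

# Flag sets per strategy, as listed in the MAFFT manual.
STRATEGIES = {
    "auto": ["--auto"],                                # let MAFFT choose
    "linsi": ["--localpair", "--maxiterate", "1000"],  # L-INS-i (accurate)
    "fftns2": ["--retree", "2", "--maxiterate", "0"],  # FFT-NS-2 (fast)
}


def mafft_command(in_fasta: str, strategy: str = "auto",
                  clustal: bool = True) -> List[str]:
    """Build the argument list for a MAFFT run (hypothetical helper)."""
    cmd = ["mafft"] + STRATEGIES[strategy]
    if clustal:
        cmd.append("--clustalout")  # Clustal format instead of FASTA
    cmd.append(in_fasta)
    return cmd


def run_mafft(in_fasta: str, out_path: str, strategy: str = "auto",
              clustal: bool = True) -> None:
    """Run MAFFT; it writes the alignment to stdout, captured in out_path."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft was not found on PATH")
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(in_fasta, strategy, clustal),
                       stdout=out, check=True)
```

For ten genomes, a call such as `run_mafft("first10.fasta", "first10.aln", strategy="linsi")` would be a reasonable choice; both file names are placeholders.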

ccc  

The alignment can be inspected with the head and tail Bash commands, which print the first and last lines of the alignment file and so show the aligned sequences at either end. One visible difference between the genomes is how complete the assemblies are at their edges, that is, at the beginning or end of the sequence, where the sequencing data is often less reliable or less well assembled. The * character marks a consensus position: a column in which all aligned sequences carry the same base.  
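
To make the two observations above concrete, here is a small sketch that recomputes a `*` line from aligned rows and measures how many gap characters each row has at its edges. It assumes the aligned sequences are already available as equal-length strings with `-` as the gap character; the function names are hypothetical.

```python
from typing import List, Tuple


def consensus_line(aligned: List[str]) -> str:
    """Return '*' for columns where every row has the same non-gap
    character, and ' ' for all other columns."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        identical = first != "-" and all(c == first for c in column)
        marks.append("*" if identical else " ")
    return "".join(marks)


def edge_gaps(row: str) -> Tuple[int, int]:
    """Count gap characters at the start and at the end of one aligned row."""
    leading = len(row) - len(row.lstrip("-"))
    trailing = len(row) - len(row.rstrip("-"))
    return leading, trailing
```

A row with large values from `edge_gaps` is one whose assembly is incomplete at that end relative to the other genomes.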

ddd  

Multiple sequence alignment can also be performed on a 10% subset of the genomes.  
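
The text does not say how the 10% were chosen. One possible approach, shown here purely as an assumption, is a seeded random sample of the sequence IDs, which keeps the selection reproducible. The function name and default seed are hypothetical.

```python
import random
from typing import List, Sequence


def sample_ids(ids: Sequence[str], fraction: float = 0.1,
               seed: int = 0) -> List[str]:
    """Pick about `fraction` of the IDs at random, keeping file order.

    Assumption: a uniform random sample is acceptable; a fixed seed
    makes the same subset come out on every run.
    """
    if not ids:
        return []
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(ids)), k))
    return [ids[i] for i in chosen]
```

With 249 genomes and `fraction=0.1` this selects 25 IDs, which can then be used to extract the corresponding sequences before aligning them.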

eee  

fff
