hivdb/Gene_to_Sequences

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Gene_to_Sequences

Gene search pipeline for retrieving gene-coding nucleotide sequences and annotations from a local GenBank database.

This package is described in Rhee, S-Y and Shafer, RW (2017), "Geographically-Stratified Representative HIV-1 Group M pol Subtype and Circulating Recombinant Form Sequences," manuscript submitted for publication.

The package includes:

  1. GB_to_BLASTDB.pl (a Perl script) parses GenBank files, creates a FASTA file whose sequence headers contain GenBank annotations, and converts the FASTA file to a BLAST-searchable database.

  2. Gene_to_Sequences.pl (a Perl script), the second part of the pipeline, performs an amino acid to nucleotide sequence search for a gene and generates a file containing aligned full-length nucleotide sequences with associated GenBank annotations, including AccessionID, Title, Authors, PubMedID, Country, Collection_Date, and TaxonomyID.

  3. Subtree_Sampling.Rmd (an R Markdown file) samples subtype reference sequences from subtrees of a phylogenetic tree.
    Usage guidelines for this R Markdown file
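
As an illustration of the first step (GenBank record to annotated FASTA entry), here is a minimal Python sketch. It is not the code in GB_to_BLASTDB.pl, which is written in Perl. The sample record is made up, and the `|`-separated header layout, the `NA` placeholder, and the function names `first_match` and `genbank_to_fasta` are all assumptions chosen for the example.

```python
import re

# A tiny, made-up GenBank-style record used only for illustration.
SAMPLE_RECORD = """\
LOCUS       XX000001                  24 bp    DNA     linear   VRL 01-JAN-2000
ACCESSION   XX000001
REFERENCE   1  (bases 1 to 24)
  AUTHORS   Doe,J. and Roe,R.
  TITLE     An example title
   PUBMED   12345678
FEATURES             Location/Qualifiers
     source          1..24
                     /country="Kenya"
                     /collection_date="1999"
                     /db_xref="taxon:11676"
ORIGIN
        1 cctcaggtca ctctttggca acga
//
"""


def first_match(pattern, text, default="NA"):
    """Return the first capture group of pattern in text, or a placeholder."""
    m = re.search(pattern, text, flags=re.MULTILINE)
    return m.group(1).strip() if m else default


def genbank_to_fasta(record, sep="|"):
    """Build one FASTA entry whose header carries selected annotations.

    Hypothetical header layout (an assumption, not the package's format):
    AccessionID|Title|Authors|PubMedID|Country|Collection_Date|TaxonomyID
    Only the first line of multi-line TITLE/AUTHORS fields is captured.
    """
    fields = [
        first_match(r"^ACCESSION\s+(\S+)", record),
        first_match(r"^\s+TITLE\s+(.+)$", record),
        first_match(r"^\s+AUTHORS\s+(.+)$", record),
        first_match(r"^\s+PUBMED\s+(\d+)", record),
        first_match(r'/country="([^"]+)"', record),
        first_match(r'/collection_date="([^"]+)"', record),
        first_match(r'/db_xref="taxon:(\d+)"', record),
    ]
    # The sequence sits between the ORIGIN line and the closing "//";
    # drop position numbers and whitespace, keep letters only.
    m = re.search(r"^ORIGIN[^\n]*\n(.*?)^//", record,
                  flags=re.MULTILINE | re.DOTALL)
    sequence = re.sub(r"[^A-Za-z]", "", m.group(1)).upper() if m else ""
    return ">" + sep.join(fields) + "\n" + sequence + "\n"


if __name__ == "__main__":
    print(genbank_to_fasta(SAMPLE_RECORD), end="")
```

A FASTA file built this way can be turned into a nucleotide BLAST database with the BLAST+ tool `makeblastdb` (for example `makeblastdb -in sequences.fasta -dbtype nucl`, where `sequences.fasta` is a placeholder name). Whether the package itself uses BLAST+ or the legacy BLAST tools is not stated here.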

Prerequisites

Usage

For usage information:

perl GB_to_BLASTDB.pl --help

perl Gene_to_Sequences.pl --help

For more information:

perl GB_to_BLASTDB.pl --man

perl Gene_to_Sequences.pl --man

Authors

License

You may distribute this module under the same terms as Perl itself.
