Add "--start" and "--end" arguments to newreference.py to allow for creating subgenic trees #58

Open
Wants to merge 1 commit into master

j23414 (Contributor) commented Mar 19, 2024

Description of proposed changes

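Adds optional "--start" and "--end" arguments that give 0-based start and end positions relative to the "--gene" of interest. As the example below shows, "--end" is exclusive (like a Python slice): "--start 0 --end 9" yields a 9 bp region.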
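The sketch below shows how the two optional arguments could be declared with Python's standard `argparse` module. It is a hypothetical illustration, not the code in this PR. The other flag names come from the example commands. The validation rules are assumptions: `--start`/`--end` require `--gene`, `--start` must be non-negative, and `--start < --end`.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical sketch: flag names are taken from the example commands, but the
    # validation rules below are assumptions, not the PR's actual implementation.
    parser = argparse.ArgumentParser(description="Build a (sub)genic reference")
    parser.add_argument("--reference", required=True)
    parser.add_argument("--output-fasta", required=True)
    parser.add_argument("--output-genbank", required=True)
    parser.add_argument("--gene", help="gene of interest, e.g. E")
    parser.add_argument("--start", type=int, default=None,
                        help="0-based start relative to --gene (inclusive)")
    parser.add_argument("--end", type=int, default=None,
                        help="0-based end relative to --gene (exclusive)")
    args = parser.parse_args(argv)

    # Positions are relative to a gene, so they are meaningless without --gene.
    if (args.start is not None or args.end is not None) and args.gene is None:
        parser.error("--start/--end require --gene")
    if args.start is not None and args.start < 0:
        parser.error("--start must be >= 0")
    if args.start is not None and args.end is not None and args.start >= args.end:
        parser.error("--start must be less than --end")
    return args


args = parse_args(["--reference", "dengue_reference.gb",
                   "--output-fasta", "E.fasta", "--output-genbank", "E.gb",
                   "--gene", "E", "--start", "0", "--end", "9"])
print(args.gene, args.start, args.end)  # E 0 9
```

Leaving both arguments at `None` by default keeps the original whole-gene behavior when they are omitted.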

Because GenBank records can contain extra sequence beyond the end of the polyprotein, the start and end positions are measured relative to the gene of interest rather than to the whole record. This was judged to be the more stable behavior.
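Here is a minimal sketch of the gene-relative coordinate arithmetic, written in plain Python so that no GenBank parser is needed. It assumes the gene's own location in the record is already known as a 0-based, end-exclusive interval. `subgenic_region` is a hypothetical helper, and rejecting regions that run past the gene end is an assumption about the desired behavior. Any trailing sequence after the gene (the `yyyy` in the toy record) never affects the result.

```python
def subgenic_region(record_seq, gene_start, gene_end, start=None, end=None):
    """Hypothetical helper: slice a region given relative to a gene.

    gene_start, gene_end: 0-based, end-exclusive location of the gene in the record.
    start, end: 0-based, end-exclusive offsets relative to the gene; None means
    "from the gene start" / "to the gene end" (the original whole-gene behavior).
    """
    gene_len = gene_end - gene_start
    start = 0 if start is None else start
    end = gene_len if end is None else end
    # Assumption: regions must lie entirely inside the gene.
    if not 0 <= start < end <= gene_len:
        raise ValueError(f"region [{start}, {end}) is outside a gene of length {gene_len}")
    # Shift gene-relative offsets to absolute record coordinates.
    abs_start = gene_start + start
    abs_end = gene_start + end
    return abs_start, abs_end, record_seq[abs_start:abs_end]


# Toy record: 4 bases of leader, an 18-base "gene", then 4 trailing bases.
record = "xxxx" + "atgcgatgcaaatttggg" + "yyyy"
print(subgenic_region(record, 4, 22, start=0, end=9))  # (4, 13, 'atgcgatgc')
```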

Example of extracting only the E gene from Dengue (original behavior)

python scripts/newreference.py \
  --reference dengue_reference.gb \
  --output-fasta E.fasta \
  --output-genbank E.gb \
  --gene E

This generates a reference GenBank file with the following features:

FEATURES             Location/Qualifiers
     CDS             1..1485
                     /gene="E"
                     /product="envelope protein E"
                     /protein_id="NP_740317.1"
     source          1..1485
                     /clone="rDEN4"
                     /db_xref="taxon:11070"
                     /mol_type="genomic RNA"
                     /organism="Dengue virus 4"

Example of extracting a subgenic region of E (new feature)

Run with the new "--start" and "--end" arguments:

python scripts/newreference.py \
  --reference dengue_reference.gb \
  --output-fasta E.fasta \
  --output-genbank E.gb \
  --gene E \
  --start 0 \
  --end 9

This produces:

LOCUS       DENV4/NA/REFERENCE/2003    9 bp    DNA              UNK 01-JAN-1980
DEFINITION  Dengue virus 4, complete genome.
ACCESSION   NC_002640
VERSION     NC_002640.1
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     source          1..9
                     /clone="rDEN4"
                     /db_xref="taxon:11070"
                     /mol_type="genomic RNA"
                     /organism="Dengue virus 4"
     CDS             1..9
                     /gene="E_0_9"
ORIGIN
        1 atgcgatgc
//
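The output above reflects a coordinate shift. The 0-based, end-exclusive region [0, 9) becomes the GenBank location `1..9`, because GenBank locations are 1-based and inclusive. The trimmed record contains only the region, so the CDS spans the whole record. The sketch below reproduces that mapping. `subgenic_feature` is a hypothetical helper. The `gene_start_end` label pattern is inferred from the single `E_0_9` example and is an assumption for other ranges.

```python
def subgenic_feature(gene, start, end):
    """Hypothetical helper: describe the CDS of the trimmed record."""
    length = end - start
    # The new record holds only the region, so its CDS covers 0-based [0, length),
    # which is "1..length" in 1-based, inclusive GenBank notation.
    location = f"1..{length}"
    # Assumed naming pattern, inferred from the "E_0_9" qualifier in the example.
    label = f"{gene}_{start}_{end}"
    return length, location, label


print(subgenic_feature("E", 0, 9))  # (9, '1..9', 'E_0_9')
```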

Related issue(s)

Checklist

  • Checks pass

…enic phylogenetic trees

Adds "--start" and "--end" arguments to provide 0-based start and end positions
relative to a "--gene" of interest.

Since GenBank sequences can contain extra sequence beyond the end of the polyprotein,
making the start and end positions relative to the gene of interest was deemed more stable.
Development

Successfully merging this pull request may close these issues.

Add --start and --end flags to newreferences.py to allow for creating subgenic tree builds