Commit

Merge pull request #25 from standage/docs
Start giving documentation serious attention
standage committed Dec 14, 2016
2 parents 2daee0c + 1b1a95a commit dd8fd80
Showing 22 changed files with 304 additions and 37 deletions.
5 changes: 4 additions & 1 deletion Makefile
Expand Up @@ -8,7 +8,10 @@
SHELL=/bin/bash -o pipefail

test:
py.test -v --cov=tag --doctest-modules tests/*.py
py.test -v --cov=tag --doctest-modules tag/*.py tests/*.py

doc:
cd docs && make html

install:
pip install .
17 changes: 17 additions & 0 deletions docs/api.rst
@@ -0,0 +1,17 @@
The :code:`tag` Python API
==========================

The following classes/modules are included in the :code:`tag` Python API, which
is under `semantic versioning <http://semver.org>`_.

.. toctree::
   :maxdepth: 1

   range
   comment
   directive
   sequence
   feature
   reader
   writer

5 changes: 5 additions & 0 deletions docs/comment.rst
@@ -0,0 +1,5 @@
Comment
=======

.. automodule:: tag.comment
   :members:
4 changes: 2 additions & 2 deletions docs/conf.py
Original file line number Diff line number Diff line change
@@ -116,7 +116,7 @@

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'sphinx_rtd_theme'
html_theme = 'alabaster'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
@@ -145,7 +145,7 @@
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
#html_static_path = ['_static']

# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
5 changes: 5 additions & 0 deletions docs/directive.rst
@@ -0,0 +1,5 @@
Directive
=========

.. automodule:: tag.directive
   :members:
5 changes: 5 additions & 0 deletions docs/feature.rst
@@ -0,0 +1,5 @@
Feature
=======

.. automodule:: tag.feature
   :members:
4 changes: 3 additions & 1 deletion docs/index.rst
@@ -4,7 +4,9 @@ tag: genome annotation analysis in Python!
More coming soon.

.. toctree::
   :maxdepth: 2
   :maxdepth: 1

   api



5 changes: 5 additions & 0 deletions docs/range.rst
@@ -0,0 +1,5 @@
Range
=====

.. automodule:: tag.range
   :members:
8 changes: 8 additions & 0 deletions docs/reader.rst
@@ -0,0 +1,8 @@
Readers
=======

Currently the :code:`tag.reader` module contains only a single class,
:code:`GFF3Reader`, but it may include others in the future.

.. automodule:: tag.reader
   :members:
5 changes: 5 additions & 0 deletions docs/sequence.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Sequence
========

.. automodule:: tag.sequence
   :members:
8 changes: 8 additions & 0 deletions docs/writer.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
Writers
=======

Currently the :code:`tag.writer` module contains only a single class,
:code:`GFF3Writer`, but it may include others in the future.

.. automodule:: tag.writer
   :members:
26 changes: 26 additions & 0 deletions tag/__init__.py
@@ -32,3 +32,29 @@ def open(filename, mode):
        openfunc = gzopen
        mode += 't'
    return openfunc(filename, mode)


def demo_feature():
    gene = feature.Feature(
        'contig1\tsnap\tgene\t1000\t7500\t.\t+\t.\tID=gene1'
    )
    mrna = feature.Feature(
        'contig1\tsnap\tmRNA\t1000\t7500\t.\t+\t.\tID=mrna1;Parent=gene1'
    )
    exon1 = feature.Feature(
        'contig1\tsnap\texon\t1000\t3700\t.\t+\t.\tParent=mrna1'
    )
    exon2 = feature.Feature(
        'contig1\tsnap\texon\t7250\t7500\t.\t+\t.\tParent=mrna1'
    )
    cds1 = feature.Feature(
        'contig1\tsnap\tCDS\t1289\t3700\t.\t+\t0\tID=cds1;Parent=mrna1'
    )
    cds2 = feature.Feature(
        'contig1\tsnap\tCDS\t7250\t7352\t.\t+\t0\tID=cds1;Parent=mrna1'
    )
    cds1.add_sibling(cds2)
    for f in [exon1, exon2, cds1, cds2]:
        mrna.add_child(f)
    gene.add_child(mrna)
    return gene
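`demo_feature()` encodes a gene → mRNA → exon/CDS hierarchy purely through `ID` and `Parent` attributes. The standalone sketch below shows one way to recover that hierarchy from the same six entries by grouping on `Parent`. It is not `tag` code, and `parse_column9` and `children_by_parent` are hypothetical helpers. Entries without a `Parent` are treated as top-level, in the spirit of the `is_toplevel` check in `tag/feature.py`.

```python
from collections import defaultdict

# The six entries that demo_feature() builds, as raw GFF3 lines.
DEMO_LINES = [
    'contig1\tsnap\tgene\t1000\t7500\t.\t+\t.\tID=gene1',
    'contig1\tsnap\tmRNA\t1000\t7500\t.\t+\t.\tID=mrna1;Parent=gene1',
    'contig1\tsnap\texon\t1000\t3700\t.\t+\t.\tParent=mrna1',
    'contig1\tsnap\texon\t7250\t7500\t.\t+\t.\tParent=mrna1',
    'contig1\tsnap\tCDS\t1289\t3700\t.\t+\t0\tID=cds1;Parent=mrna1',
    'contig1\tsnap\tCDS\t7250\t7352\t.\t+\t0\tID=cds1;Parent=mrna1',
]


def parse_column9(line):
    """Return column 9 of a GFF3 line as a simple key -> value dict."""
    column9 = line.split('\t')[8]
    return dict(pair.split('=', 1) for pair in column9.split(';') if pair)


def children_by_parent(lines):
    """Map each Parent ID to the types of its direct children, in order."""
    children = defaultdict(list)
    for line in lines:
        parent = parse_column9(line).get('Parent')
        if parent is not None:
            children[parent].append(line.split('\t')[2])
    return dict(children)


tree = children_by_parent(DEMO_LINES)
toplevel = [line.split('\t')[2] for line in DEMO_LINES
            if 'Parent' not in parse_column9(line)]
print(toplevel)        # ['gene']
print(tree['gene1'])   # ['mRNA']
print(tree['mrna1'])   # ['exon', 'exon', 'CDS', 'CDS']
```

The two CDS entries share `ID=cds1`, which makes them the multi-feature that `cds1.add_sibling(cds2)` sets up. This sketch lists them separately and does not merge them.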
8 changes: 5 additions & 3 deletions tag/comment.py
@@ -13,9 +13,11 @@ class Comment():
    Represents a comment in an annotation (GFF3) file.

    Any GFF3 entry starting with >= 1 '#' characters is treated as a comment,
    with two exceptions: first, the separator directive, a line containing
    '###' and nothing more; second, any entry beginning with just two '#'
    characters is treated as a directive.
    with two exceptions:

    - the separator directive, a line containing '###' and nothing more; and
    - any entry beginning with exactly two '#' characters, which is treated
      as a directive.
    """

    def __init__(self, data):
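The rules in the new `Comment` docstring amount to a small decision procedure. The following is a rough sketch of how a single GFF3 line could be classified under those rules. The `classify` function is hypothetical and not part of `tag`, whose parser may apply these checks in a different order or place.

```python
def classify(line):
    """Classify one GFF3 line using the rules in the Comment docstring."""
    line = line.rstrip('\n')
    if line == '###':
        return 'separator'   # '###' and nothing more
    if line.startswith('##') and not line.startswith('###'):
        return 'directive'   # exactly two leading '#' characters
    if line.startswith('#'):
        return 'comment'     # any other line with one or more leading '#'
    return 'other'           # feature entries, sequence data, blank lines


examples = [
    '###',
    '##gff-version 3',
    '# free-text note',
    '### not a separator',
    'contig1\tsnap\tgene\t1000\t7500\t.\t+\t.\tID=gene1',
]
for text in examples:
    print(classify(text))
# separator, directive, comment, comment, other (one per line)
```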
19 changes: 15 additions & 4 deletions tag/directive.py
@@ -23,6 +23,21 @@ class Directive():
    `Directive` objects should be treated as read-only: modify at your peril!
    Also, separator directives (`###`) and the `##FASTA` directive are handled
    directly by parsers and not by this class.

    Directives not explicitly declared in the GFF3 spec are application
    specific: they will be parsed without complaint, but no guarantees can be
    made about accessing their attributes.

    >>> sr = Directive('##sequence-region chr1 5000 10000')
    >>> sr.type
    'sequence-region'
    >>> sr.seqid
    'chr1'
    >>> gb = Directive('##genome-build BeeBase 4.5')
    >>> gb.type
    'genome-build'
    >>> gb.source
    'BeeBase'
    """

    def __init__(self, data):
@@ -73,10 +88,6 @@ def __init__(self, data):

    @property
    def type(self):
        """
        Directives not following one of the explicitly described formats in the
        GFF3 spec are application specific and not supported.
        """
        if self.dirtype in dirtypes:
            return self.dirtype
        return None
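The new `Directive` doctests show a directive being split into a type and positional fields. Below is a minimal standalone sketch of that idea. `parse_directive`, `sequence_region` and `KNOWN_DIRTYPES` are assumptions for illustration. The set lists directives named in the GFF3 spec and is only assumed to resemble `tag`'s own `dirtypes`. As with `Directive.type`, unrecognized (application-specific) names map to `None`.

```python
# Directive names from the GFF3 spec; assumed to resemble tag's `dirtypes`.
KNOWN_DIRTYPES = {
    'gff-version', 'sequence-region', 'feature-ontology',
    'attribute-ontology', 'source-ontology', 'species', 'genome-build',
}


def parse_directive(data):
    """Split '##name field ...' into (type or None, list of fields)."""
    if not data.startswith('##') or data.startswith('###'):
        raise ValueError('not a directive: {!r}'.format(data))
    tokens = data[2:].split()
    name, fields = tokens[0], tokens[1:]
    # Application-specific names are accepted but get no known type.
    dirtype = name if name in KNOWN_DIRTYPES else None
    return dirtype, fields


def sequence_region(fields):
    """Interpret ##sequence-region fields as (seqid, start, end)."""
    seqid, start, end = fields
    return seqid, int(start), int(end)


sr_type, sr_fields = parse_directive('##sequence-region chr1 5000 10000')
print(sr_type, sequence_region(sr_fields))
# sequence-region ('chr1', 5000, 10000)

gb_type, gb_fields = parse_directive('##genome-build BeeBase 4.5')
print(gb_type, gb_fields[0])  # genome-build BeeBase

print(parse_directive('##my-pipeline-version 2.1'))  # (None, ['2.1'])
```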
79 changes: 74 additions & 5 deletions tag/feature.py
@@ -1,20 +1,52 @@
#!/usr/bin/env python
#
# ------------------------------------------------------------------------------
# -----------------------------------------------------------------------------
# Copyright (C) 2015 Daniel Standage <daniel.standage@gmail.com>
#
# This file is part of tag (http://github.com/standage/tag) and is licensed
# under the BSD 3-clause license: see LICENSE.
# ------------------------------------------------------------------------------
# -----------------------------------------------------------------------------

import tag
from tag.comment import Comment
from tag.directive import Directive
from tag.range import Range
from tag.sequence import Sequence


class Feature(object):
    """Represents a feature entry from a GFF3 file."""
    """
    Represents a feature entry from a GFF3 file.

    >>> feature = tag.demo_feature()
    >>> feature.seqid
    'contig1'
    >>> feature.source
    'snap'
    >>> feature.type
    'gene'
    >>> feature.start, feature.end
    (999, 7500)
    >>> feature.score is None
    True
    >>> feature.strand
    '+'
    >>> feature.phase is None
    True
    >>> feature.attributes
    'ID=gene1'
    >>> feature.num_children
    1
    >>> feature.is_multi
    False
    >>> feature.is_toplevel
    True
    >>> for child in feature:
    ...     if child.type == 'CDS':
    ...         assert child.get_attribute('ID') == 'cds1'
    >>> feature.slug
    'gene@contig1[1000, 7500]'
    """

    def __init__(self, data):
        fields = data.split('\t')
@@ -174,6 +206,7 @@ def _visit(self, L, marked, tempmarked):
            L.insert(0, self)

    def add_child(self, child, rangecheck=False):
        """Add a child feature to this feature."""
        assert self.seqid == child.seqid, \
            (
                'seqid mismatch for feature {} ({} vs {})'.format(
@@ -205,7 +238,13 @@ def fid(self):

    @property
    def slug(self):
        return '{:s}@{:s}[{:d}, {:d})'.format(self.type, self.seqid,
        """
        A concise slug for this feature.

        Unlike the internal representation, which is 0-based half-open, the
        slug is a 1-based closed interval (a la GFF3).
        """
        return '{:s}@{:s}[{:d}, {:d}]'.format(self.type, self.seqid,
                                              self.start + 1, self.end)

    @property
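The new `slug` docstring and the `Feature` doctest rely on converting between 0-based half-open and 1-based closed intervals. The doctest gives `(999, 7500)` internally and `[1000, 7500]` in the slug. Below is a minimal sketch of that conversion. `to_gff3` and `from_gff3` are hypothetical helpers, not part of `tag`.

```python
def from_gff3(start, end):
    """1-based closed [start, end] -> 0-based half-open [start0, end0)."""
    return start - 1, end


def to_gff3(start0, end0):
    """0-based half-open [start0, end0) -> 1-based closed [start, end]."""
    return start0 + 1, end0


def slug(ftype, seqid, start0, end0):
    """Format a slug with the same format string as the new Feature.slug."""
    start, end = to_gff3(start0, end0)
    return '{:s}@{:s}[{:d}, {:d}]'.format(ftype, seqid, start, end)


start0, end0 = from_gff3(1000, 7500)          # the gene in demo_feature()
print(start0, end0)                           # 999 7500
print(end0 - start0 == 7500 - 1000 + 1)       # True: both 6501 bp long
print(slug('gene', 'contig1', start0, end0))  # gene@contig1[1000, 7500]
```

Only the start coordinate shifts. A 1-based closed end and a 0-based half-open end name the same position, which is why `slug` formats `self.start + 1` but leaves `self.end` unchanged.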
@@ -217,6 +256,24 @@ def is_toplevel(self):
        return self.get_attribute('Parent') is None

    def add_sibling(self, sibling):
        """
        Make this feature a multi-feature representative and add a co-feature.

        Some features exist discontinuously on the sequence, and therefore
        cannot be declared with a single GFF3 entry (which can encode only a
        single interval). The canonical encoding for these types of features is
        called a multi-feature, in which a single feature is declared on
        multiple lines with multiple entries all sharing the same feature type
        and ID attribute. This is commonly done with coding sequence (CDS)
        features.

        In this package, each multi-feature has a single "representative"
        feature object, and all other objects/entries associated with that
        multi-feature are attached to it as "siblings".

        Invoking this method will designate the calling feature as the
        multi-feature representative and add the argument as a sibling.
        """
        if self.siblings is None:
            self.siblings = list()
            self.multi_rep = self
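The representative/sibling scheme described in the `add_sibling` docstring can be sketched without `tag` as follows. `Part` and `link_multi_features` are hypothetical. The diff truncates `add_sibling` after `self.multi_rep = self`, so two details are assumptions: how the sibling is updated here (its `multi_rep` points back at the representative) and the `is_multi` test.

```python
class Part(object):
    """Hypothetical stand-in for a GFF3 entry; not tag.feature.Feature."""

    def __init__(self, ftype, fid, start, end):
        self.type, self.fid = ftype, fid
        self.start, self.end = start, end
        self.siblings = None   # filled in only on the representative
        self.multi_rep = None  # the representative, once part of a group

    def add_sibling(self, sibling):
        """Make self the representative (if needed) and attach sibling."""
        if self.siblings is None:
            self.siblings = list()
            self.multi_rep = self
        sibling.multi_rep = self   # assumption: sibling points back to rep
        self.siblings.append(sibling)

    @property
    def is_multi(self):
        return self.multi_rep is not None


def link_multi_features(parts):
    """The first part seen for each (type, ID) becomes the representative."""
    reps = {}
    for part in parts:
        if part.fid is None:
            continue  # entries without an ID cannot form a multi-feature
        key = (part.type, part.fid)
        if key in reps:
            reps[key].add_sibling(part)
        else:
            reps[key] = part
    return reps


exon1 = Part('exon', None, 1000, 3700)
cds1 = Part('CDS', 'cds1', 1289, 3700)
cds2 = Part('CDS', 'cds1', 7250, 7352)
reps = link_multi_features([exon1, cds1, cds2])
print(reps[('CDS', 'cds1')] is cds1)   # True
print(cds2.multi_rep is cds1)          # True
print(exon1.is_multi, cds1.is_multi)   # False True
```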
@@ -266,9 +323,11 @@ def end(self):
        return self._range.end

    def set_coord(self, start, end):
        """Manually reset the feature's coordinates."""
        self._range = Range(start, end)

    def transform(self, offset, newseqid=None):
        """Transform the feature's coordinates by the given offset."""
        for feature in self:
            feature._range.transform(offset)
            if newseqid is not None:
@@ -301,7 +360,9 @@ def attributes(self):

    def add_attribute(self, attrkey, attrvalue, append=False, oldvalue=None):
        """
        Attributes stored as nested dictionaries.
        Add an attribute to this feature.

        Feature attributes are stored as nested dictionaries.

        Each feature can only have one ID, so ID attribute mapping is 'string'
        to 'string'. All other attributes can have multiple values, so mapping
@@ -360,13 +421,21 @@ def get_attribute(self, attrkey, as_string=False, as_list=False):
        return attrvalues

    def drop_attribute(self, attrkey):
        """Drop the specified attribute from the feature."""
        if attrkey in self._attrs:
            del self._attrs[attrkey]

    def get_attribute_keys(self):
        """Return a sorted list of this feature's attribute keys."""
        return sorted(list(self._attrs))

    def parse_attributes(self, attrstring):
        """
        Parse an attribute string.

        Given a string with semicolon-separated key-value pairs, populate a
        dictionary with the given attributes.
        """
        if attrstring == '.':
            return dict()

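The `add_attribute` and `parse_attributes` docstrings describe column 9 as semicolon-separated `key=value` pairs. `ID` holds a single value, while other keys may hold several. Below is a standalone sketch of that parse. `parse_attribute_string` is hypothetical. The diff cuts off the description of `tag`'s nested-dictionary layout, so the sketch simply stores non-`ID` values as lists, splitting on commas as GFF3 does for multiple values. Percent-encoded characters are not decoded.

```python
def parse_attribute_string(attrstring):
    """Parse GFF3 column 9 into {key: value}; hypothetical, not tag's API."""
    if attrstring == '.':
        return {}                          # '.' means "no attributes"
    attrs = {}
    for pair in attrstring.strip().split(';'):
        if not pair:
            continue                       # tolerate a trailing ';'
        key, value = pair.split('=', 1)
        if key == 'ID':
            attrs[key] = value             # at most one ID per feature
        else:
            attrs[key] = value.split(',')  # possibly several values
    return attrs


print(parse_attribute_string('ID=cds1;Parent=mrna1'))
# {'ID': 'cds1', 'Parent': ['mrna1']}
print(parse_attribute_string('Parent=mrna1,mrna2;Note=alt;'))
# {'Parent': ['mrna1', 'mrna2'], 'Note': ['alt']}
print(parse_attribute_string('.'))
# {}
```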
