Merge pull request #121 from mhalagan-nmdp/update-documentation
Update documentation
Michael Halagan committed Nov 14, 2018
2 parents aee4848 + c522e9d commit c758877
Showing 21 changed files with 1,065 additions and 538 deletions.
70 changes: 70 additions & 0 deletions docs/algorithm.rst
@@ -0,0 +1,70 @@
.. highlight:: shell

======================
Annotation Algorithm
======================

.. note:: There are several places where hard-coded logic was added to make the algorithm work with certain sequences. For instance,


#. **Check if locus is provided** (:ref:`blast`)

* yes, then continue to step 2
* no, then blast to get locus

#. **Check if any exact matches exist** (:ref:`ref`)

* yes, then return annotation associated with the exact match and go to step 7
* no, then go on to step 3

#. **Blast sequence and get list of alleles** (:ref:`blast`)

* If all of the returned sequences are partial, then the last sequence is replaced with a fully characterized sequence.

#. **Iterate through the list and try to annotate with reference sequences** (:ref:`seq_search`)

* Break reference up into features
* Search for each feature in the provided sequence
* If all features are mapped, then go to step 7; otherwise continue.
* Try to assemble the remaining features based on what has already been mapped. Since the coordinates of the mapped features and of the remaining unmapped sequence are known, we can determine whether each unmapped block falls between two mapped features, before the first mapped feature, or after the last one.
* If all features could be mapped, then go to step 7; otherwise go back to step 4A using any partial annotations for each reference sequence. If no annotation could be created after searching all of the reference sequences, then move on to step 5.

#. **Loop through each reference sequence and do targeted alignments** (:ref:`align`)

* Break up each reference sequence into features and create feature combinations that will be used for the alignments. Order the combinations so that the most plausible ones are tried first.
* Do targeted alignments on all of the remaining blocks of sequences that have not yet been mapped.

* If a high enough proportion of the unmapped sequence maps and the deletion/insertion rate is low enough, then extract the unmapped sequence from the alignment and map it.
* If all features are mapped, then go to step 7; otherwise continue.
* Run step 4 with the updated partial annotation to see if the annotation can now be assembled. Go to step 7 if all features are mapped; otherwise continue.
* Loop through all feature combinations for all reference sequences. This slows down the annotation when the sequence is very novel, for instance when it contains a new feature sequence and that feature has only been reported in IMGT/HLA a few times for the given locus. The acceptance rate for the alignments is decreased slightly after each loop. For class I that decrease stops after the second reference sequence, but for class II it keeps decreasing.
* Rerun targeted alignment, but with exon-only combinations.

#. **Do a full sequence alignment and use any partial annotation** (:ref:`align`)

* If this fails and the rerun flag is set to ``True``, then rerun the whole annotation process starting from step 1. This time, skip the first reference allele that was used for doing the annotation and increase the number of reference alleles used by 1.

#. **Generate GFE notation** (:ref:`gfe`)

* Once a complete annotation is generated, the GFE notation is created.
* If the sequence contains only A, T, C, or G, then a GFE notation can be created.
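
The assembly idea in step 4 can be illustrated with a short sketch. This is not SeqAnn's implementation: the function name ``unmapped_gaps`` and the coordinate convention (zero-based, half-open intervals) are assumptions made for illustration. Given the coordinates of the mapped features, it lists each unmapped block together with the mapped features on either side of it, where ``None`` marks the beginning or end of the sequence.

.. code-block:: python

    def unmapped_gaps(seq_len, mapped):
        """Return the unmapped blocks of a sequence.

        ``mapped`` maps a feature name to a (start, end) pair, zero-based
        and half-open. Each result is (start, end, left_feature,
        right_feature); a neighbour of None means the block touches the
        beginning or the end of the sequence.
        """
        gaps = []
        pos = 0        # first position not yet covered by a mapped feature
        prev = None    # name of the last mapped feature seen
        # Walk the mapped features in order of their start coordinate.
        for name, (start, end) in sorted(mapped.items(), key=lambda kv: kv[1][0]):
            if start > pos:
                gaps.append((pos, start, prev, name))
            pos = max(pos, end)
            prev = name
        if pos < seq_len:
            gaps.append((pos, seq_len, prev, None))
        return gaps


    if __name__ == "__main__":
        # Hypothetical coordinates, for illustration only.
        for gap in unmapped_gaps(100, {"exon_1": (10, 30), "exon_2": (50, 80)}):
            print(gap)

A block with a mapped feature on both sides is a candidate for the feature that lies between them in the reference, while a block with a ``None`` neighbour sits at one end of the sequence.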

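The nucleotide condition in step 7 amounts to a simple character check. The sketch below is an illustration only; the name ``only_acgt`` and the choice to ignore case and to reject empty input are assumptions, not SeqAnn's documented behaviour.

.. code-block:: python

    VALID_BASES = frozenset("ACGT")


    def only_acgt(sequence):
        """Return True when every character of ``sequence`` is A, C, G or T."""
        seq = sequence.upper()  # assumption: lower-case input is accepted
        return len(seq) > 0 and set(seq) <= VALID_BASES


    if __name__ == "__main__":
        print(only_acgt("ACGTTGCA"))
        print(only_acgt("ACGNNT"))

A sequence containing an ambiguity code such as ``N`` fails this check.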

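The decreasing acceptance rate described in step 5 can be sketched as follows. The starting value, the step size, and the function name ``acceptance_threshold`` are hypothetical placeholders, and reading "stops after the second reference sequence" as "at most two decreases" is an assumption; the values used by SeqAnn are not given here.

.. code-block:: python

    def acceptance_threshold(ref_index, hla_class, start=0.9, step=0.05):
        """Return the alignment acceptance threshold for one reference.

        ``ref_index`` is the number of reference sequences already tried.
        ``start`` and ``step`` are placeholder values, not SeqAnn's.
        """
        decreases = ref_index
        if hla_class == "I":
            # Assumption: for class I the threshold stops dropping
            # once it has been lowered twice.
            decreases = min(ref_index, 2)
        return max(0.0, start - step * decreases)


    if __name__ == "__main__":
        for index in range(5):
            print(index,
                  acceptance_threshold(index, "I"),
                  acceptance_threshold(index, "II"))

With these placeholder values, the class I threshold levels off after two decreases while the class II threshold continues to fall with each additional reference sequence.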

17 changes: 17 additions & 0 deletions docs/blast.rst
@@ -0,0 +1,17 @@
.. highlight:: shell

======================
Creating blastn files
======================

.. note:: Make sure blastn is properly installed before running!

1) Download the allele list and the ``_gen`` and ``_nuc`` FASTA files for each locus

2) Create the blast files by running the **ngs-imgt-db** Perl script

.. code-block:: console

    $ ngs-imgt-db -i /path/to/imgt/files -o /output/dir

3) Add the new blast files to the seqann/data/blast directory and check them in.
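
One way to confirm that blastn is available before creating the files is to look it up on the ``PATH`` from Python. This is a minimal sketch using only the standard library; the helper name ``missing_tools`` is an assumption and is not part of SeqAnn.

.. code-block:: python

    import shutil


    def missing_tools(tools=("blastn",)):
        """Return the names in ``tools`` that are not found on the PATH."""
        return [tool for tool in tools if shutil.which(tool) is None]


    if __name__ == "__main__":
        missing = missing_tools()
        if missing:
            raise SystemExit("Missing required tools: " + ", ".join(missing))
        print("blastn found at", shutil.which("blastn"))

This only shows that an executable with that name exists; it does not verify the blastn version.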
2 changes: 2 additions & 0 deletions docs/db.rst
@@ -1,5 +1,7 @@
.. highlight:: shell

.. _bio:

======================
BioSQL Database
======================
9 changes: 8 additions & 1 deletion docs/index.rst
@@ -6,15 +6,22 @@ Copyright (c) 2018 Be The Match operated by National Marrow Donor Program. All R

readme
installation
algorithm
db
blast

.. toctree::
:maxdepth: 2
:caption: Developer Documentation

debug
testing
seqann
models
seqann.feature_client
seqann.feature_client.apis
seqann.feature_client.models
contributing
debug
history
authors

36 changes: 36 additions & 0 deletions docs/models.rst
@@ -0,0 +1,36 @@
seqann models
==============

.. toctree::

models

.. _ann:

Annotation
----------

.. automodule:: seqann.models.annotation
:members:
:undoc-members:
:show-inheritance:

.. _ref:

Reference Data
---------------
.. automodule:: seqann.models.reference_data
:members:
:undoc-members:
:show-inheritance:

.. _bl:

Blast
---------------
.. automodule:: seqann.models.blast
:members:
:undoc-members:
:show-inheritance:


7 changes: 7 additions & 0 deletions docs/modules.rst
@@ -0,0 +1,7 @@
seqann
======

.. toctree::
:maxdepth: 4

seqann
22 changes: 22 additions & 0 deletions docs/seqann.feature_client.apis.rst
@@ -0,0 +1,22 @@
seqann.feature\_client.apis package
===================================

Submodules
----------

seqann.feature\_client.apis.features\_api module
------------------------------------------------

.. automodule:: seqann.feature_client.apis.features_api
:members:
:undoc-members:
:show-inheritance:


Module contents
---------------

.. automodule:: seqann.feature_client.apis
:members:
:undoc-members:
:show-inheritance:
38 changes: 38 additions & 0 deletions docs/seqann.feature_client.models.rst
@@ -0,0 +1,38 @@
seqann.feature\_client.models package
=====================================

Submodules
----------

seqann.feature\_client.models.feature module
--------------------------------------------

.. automodule:: seqann.feature_client.models.feature
:members:
:undoc-members:
:show-inheritance:

seqann.feature\_client.models.feature\_request module
-----------------------------------------------------

.. automodule:: seqann.feature_client.models.feature_request
:members:
:undoc-members:
:show-inheritance:

seqann.feature\_client.models.sequence module
---------------------------------------------

.. automodule:: seqann.feature_client.models.sequence
:members:
:undoc-members:
:show-inheritance:


Module contents
---------------

.. automodule:: seqann.feature_client.models
:members:
:undoc-members:
:show-inheritance:
46 changes: 46 additions & 0 deletions docs/seqann.feature_client.rst
@@ -0,0 +1,46 @@
seqann.feature\_client package
==============================

Subpackages
-----------

.. toctree::

seqann.feature_client.apis
seqann.feature_client.models

Submodules
----------

seqann.feature\_client.api\_client module
-----------------------------------------

.. automodule:: seqann.feature_client.api_client
:members:
:undoc-members:
:show-inheritance:

seqann.feature\_client.configuration module
-------------------------------------------

.. automodule:: seqann.feature_client.configuration
:members:
:undoc-members:
:show-inheritance:

seqann.feature\_client.rest module
----------------------------------

.. automodule:: seqann.feature_client.rest
:members:
:undoc-members:
:show-inheritance:


Module contents
---------------

.. automodule:: seqann.feature_client
:members:
:undoc-members:
:show-inheritance:
50 changes: 44 additions & 6 deletions docs/seqann.rst
@@ -1,19 +1,57 @@
SeqAnn Module
seqann package
==============

BioSeqAnn
----------
.. toctree::

seqann


seqann.sequence_annotation
---------------------------

.. automodule:: seqann.sequence_annotation
:members:
:undoc-members:
:show-inheritance:

.. _seq_search:

seqann.seq_search
---------------------------

.. automodule:: seqann.seq_search
:members:
:undoc-members:
:show-inheritance:

.. _gfe:

Models
---------------
seqann.gfe
---------------------------

.. automodule:: seqann.models.annotation
.. automodule:: seqann.gfe
:members:
:undoc-members:
:show-inheritance:

.. _blast:

seqann.blast_cmd
---------------------------

.. automodule:: seqann.blast_cmd
:members:
:undoc-members:
:show-inheritance:

.. _align:

seqann.align
---------------------------

.. automodule:: seqann.align
:members:
:undoc-members:
:show-inheritance:


38 changes: 38 additions & 0 deletions docs/testing.rst
@@ -0,0 +1,38 @@
.. highlight:: shell

======================
Testing
======================

.. warning:: Before running the tests, clustalo, blastn, and all of the required Python packages must be installed!

To run all tests, execute the following command in the top directory of the SeqAnn repository.

.. code-block:: console

    $ python -m unittest tests

Different tests
---------------------

.. note:: If you don't have an imgt_biosql database running, then not all of the tests will run!

* test_seqann
* test_align
* test_blast
* test_feature
* test_gfe
* test_refdata
* test_seqsearch
* test_util
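
Tests that depend on a database can be skipped when no server is reachable. The sketch below shows one way to do this with the standard library; it is not taken from the SeqAnn test suite, and the helper name ``db_available``, the host, and the port (3306, the MySQL default) are assumptions.

.. code-block:: python

    import socket
    import unittest


    def db_available(host="localhost", port=3306, timeout=1.0):
        """Return True if a TCP connection to host:port can be opened."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False


    class TestNeedsDatabase(unittest.TestCase):
        # The check runs once, when the class body is evaluated.
        @unittest.skipUnless(db_available(), "imgt_biosql database is not reachable")
        def test_placeholder(self):
            # Hypothetical test body; a real test would query the database.
            self.assertTrue(True)

A skipped test is reported as skipped rather than failed, so the rest of the suite can still run without the database.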

Running specific tests
-----------------------

You can run a specific test by providing the full test path on the command line.

.. code-block:: console

    $ python -m unittest tests.test_seqann.TestBioSeqAnn.test_004_ambig
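
The same dotted test path can also be run from Python with the standard ``unittest`` loader. This is a minimal sketch; the helper name ``run_test_by_name`` is an assumption and is not part of SeqAnn.

.. code-block:: python

    import unittest


    def run_test_by_name(dotted_name):
        """Load and run the test(s) named by ``dotted_name``.

        Returns True when every loaded test passed.
        """
        suite = unittest.defaultTestLoader.loadTestsFromName(dotted_name)
        result = unittest.TextTestRunner(verbosity=2).run(suite)
        return result.wasSuccessful()


    if __name__ == "__main__":
        # Test path taken from the command-line example in this section.
        run_test_by_name("tests.test_seqann.TestBioSeqAnn.test_004_ambig")

As with the command-line form, this needs to be run from the top directory of the repository so that the ``tests`` package can be imported.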
