Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #121 from mhalagan-nmdp/update-documentation
Update documentation
Showing 21 changed files with 1,065 additions and 538 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
.. highlight:: shell | ||
|
||
====================== | ||
Annotation Algorithm | ||
====================== | ||
|
||
.. note:: There are several places where hard-coded logic was added to make the algorithm work with certain sequences. For instance, | ||
|
||
|
||
#. **Check if locus is provided** (:ref:`blast`) | ||
|
||
* yes, then continue to step 2 | ||
* no, then blast to get locus | ||
|
||
#. **Check if any exact matches exist** (:ref:`ref`) | ||
|
||
* yes, then return annotation associated with the exact match and go to step 7 | ||
* no, then go on to step 3 | ||
|
||
#. **Blast sequence and get list of alleles** (:ref:`blast`) | ||
|
||
* If all of the returned sequences are partial then the last sequence will be replaced with a fully characterized sequence. | ||
|
||
#. **Iterate through the list and try to annotate with reference sequences** (:ref:`seq_search`) | ||
|
||
* Break reference up into features | ||
* Search for each feature in the provided sequence | ||
* If all features are mapped, then go to step 7; otherwise continue. | ||
* Try to assemble the remaining features based on what has already been mapped. Since the coordinates of the mapped features and of the remaining unmapped sequence are known, we can determine whether each unmapped block falls between two mapped features, before the first mapped feature, or after the last one. | ||
* If all features could be mapped, then go to step 7; otherwise go back to step 4A using any partial annotations for each reference sequence. If no annotation could be created after searching all of the reference sequences, then move on to step 5. | ||
|
||
#. **Loop through each reference sequence and do targeted alignments** (:ref:`align`) | ||
|
||
* Break up each reference sequence into features and create feature combos that will be used for doing alignments. Order the feature combos so that the most plausible ones are tried first. | ||
* Do targeted alignments on all of the remaining blocks of sequences that have not yet been mapped. | ||
|
||
* If a high enough proportion of the unmapped sequence maps and the deletion/insertion rate is low enough, then extract the unmapped sequence from the alignment and map it. | ||
* If all features are mapped, then go to step 7; otherwise continue. | ||
* Rerun step 4 with the updated partial annotation to see if the annotation can now be assembled. Go to step 7 if all features are mapped; otherwise continue. | ||
* Loop through all feature combinations for all reference sequences. This slows down the annotation when the sequence is very novel, for instance when it contains a new feature sequence and that specific feature has only been reported in IMGT/HLA a few times for a given locus. The acceptance rate for the alignments is decreased slightly after each loop. For class I that decrease stops after the second reference sequence, but for class II it keeps decreasing. | ||
* Rerun the targeted alignment, but with exon-only combinations. | ||
|
||
#. **Do a full sequence alignment and use any partial annotation** (:ref:`align`) | ||
|
||
* If this fails and the rerun flag is set to ``True``, then rerun the whole annotation process starting from step 1. This time, skip the first reference allele that was used for doing the annotation and increase the number of reference alleles used by 1. | ||
|
||
#. **Generate GFE notation** (:ref:`gfe`) | ||
|
||
* Once a complete annotation is generated, the GFE notation is created. | ||
* If the sequence contains only A, T, C, or G, then a GFE notation can be created. | ||
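The character check mentioned in step 7 can be sketched as follows. This is only an illustration, not the code used by seqann: the function name ``is_unambiguous_dna`` is hypothetical, and treating lower-case input as valid and rejecting empty input are assumptions.

```python
VALID_BASES = frozenset("ACGT")


def is_unambiguous_dna(sequence: str) -> bool:
    """Return True if the sequence is non-empty and contains only A, C, G, or T.

    Hypothetical helper: upper-casing the input and rejecting empty
    strings are assumptions, not documented seqann behavior.
    """
    return bool(sequence) and set(sequence.upper()) <= VALID_BASES


print(is_unambiguous_dna("ACGTTGCA"))  # True
print(is_unambiguous_dna("ACGTNNCA"))  # False, contains the ambiguity code N
```

Under these assumptions, a sequence containing an ambiguity code such as ``N`` would not qualify for a GFE notation.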
|
||
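The assembly idea in step 4 (using the coordinates of mapped features to place the unmapped sequence) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the seqann implementation: the function ``unmapped_blocks``, the half-open coordinate convention, and the feature names in the example are all hypothetical.

```python
from typing import Dict, List, Tuple

# Half-open [start, end) coordinates on the query sequence (an assumption).
Interval = Tuple[int, int]


def unmapped_blocks(seq_len: int, mapped: Dict[str, Interval]) -> List[Tuple[Interval, str]]:
    """Return each unmapped block with a label describing where it sits."""
    blocks = []
    cursor = 0        # first position not yet covered by a mapped feature
    previous = None   # name of the last mapped feature seen so far
    # Walk the mapped features in order of their start coordinate.
    for name, (start, end) in sorted(mapped.items(), key=lambda kv: kv[1][0]):
        if start > cursor:
            where = f"between {previous} and {name}" if previous else f"before {name}"
            blocks.append(((cursor, start), where))
        cursor = max(cursor, end)
        previous = name
    # Anything left after the last mapped feature is a trailing block.
    if cursor < seq_len:
        where = f"after {previous}" if previous else "whole sequence"
        blocks.append(((cursor, seq_len), where))
    return blocks


mapped = {"exon_2": (100, 370), "exon_3": (611, 887)}
for block, where in unmapped_blocks(1000, mapped):
    print(block, where)
```

In this toy example the block between the two mapped exons is a natural candidate for the feature that lies between them in the reference, while the leading and trailing blocks would correspond to features before and after the mapped ones.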
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
.. highlight:: shell | ||
|
||
====================== | ||
Creating blastn files | ||
====================== | ||
|
||
.. note:: Make sure blastn is properly installed before running! | ||
|
||
1) Download the allele list and the ``_gen`` and ``_nuc`` FASTA files for each locus | ||
|
||
2) Create the blast files by running the **ngs-imgt-db** Perl script | ||
|
||
.. code-block:: console | ||
|
||
    $ ngs-imgt-db -i /path/to/imgt/files -o /output/dir | ||
|
||
3) Add the new blast files to the ``seqann/data/blast`` directory and check them in. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
.. highlight:: shell | ||
|
||
.. _bio: | ||
|
||
====================== | ||
BioSQL Database | ||
====================== | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
seqann models | ||
============== | ||
|
||
.. toctree:: | ||
|
||
models | ||
|
||
.. _ann: | ||
|
||
Annotation | ||
---------- | ||
|
||
.. automodule:: seqann.models.annotation | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
.. _ref: | ||
|
||
Reference Data | ||
--------------- | ||
.. automodule:: seqann.models.reference_data | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
.. _bl: | ||
|
||
Blast | ||
--------------- | ||
.. automodule:: seqann.models.blast | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
seqann | ||
====== | ||
|
||
.. toctree:: | ||
:maxdepth: 4 | ||
|
||
seqann |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
seqann.feature\_client.apis package | ||
=================================== | ||
|
||
Submodules | ||
---------- | ||
|
||
seqann.feature\_client.apis.features\_api module | ||
------------------------------------------------ | ||
|
||
.. automodule:: seqann.feature_client.apis.features_api | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: seqann.feature_client.apis | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
seqann.feature\_client.models package | ||
===================================== | ||
|
||
Submodules | ||
---------- | ||
|
||
seqann.feature\_client.models.feature module | ||
-------------------------------------------- | ||
|
||
.. automodule:: seqann.feature_client.models.feature | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
seqann.feature\_client.models.feature\_request module | ||
----------------------------------------------------- | ||
|
||
.. automodule:: seqann.feature_client.models.feature_request | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
seqann.feature\_client.models.sequence module | ||
--------------------------------------------- | ||
|
||
.. automodule:: seqann.feature_client.models.sequence | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: seqann.feature_client.models | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
seqann.feature\_client package | ||
============================== | ||
|
||
Subpackages | ||
----------- | ||
|
||
.. toctree:: | ||
|
||
seqann.feature_client.apis | ||
seqann.feature_client.models | ||
|
||
Submodules | ||
---------- | ||
|
||
seqann.feature\_client.api\_client module | ||
----------------------------------------- | ||
|
||
.. automodule:: seqann.feature_client.api_client | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
seqann.feature\_client.configuration module | ||
------------------------------------------- | ||
|
||
.. automodule:: seqann.feature_client.configuration | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
seqann.feature\_client.rest module | ||
---------------------------------- | ||
|
||
.. automodule:: seqann.feature_client.rest | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: seqann.feature_client | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,19 +1,57 @@ | ||
SeqAnn Module | ||
seqann package | ||
============== | ||
|
||
BioSeqAnn | ||
---------- | ||
.. toctree:: | ||
|
||
seqann | ||
|
||
|
||
seqann.sequence_annotation | ||
--------------------------- | ||
|
||
.. automodule:: seqann.sequence_annotation | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
.. _seq_search: | ||
|
||
seqann.seq_search | ||
--------------------------- | ||
|
||
.. automodule:: seqann.seq_search | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
.. _gfe: | ||
|
||
Models | ||
--------------- | ||
seqann.gfe | ||
--------------------------- | ||
|
||
.. automodule:: seqann.models.annotation | ||
.. automodule:: seqann.gfe | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
.. _blast: | ||
|
||
seqann.blast_cmd | ||
--------------------------- | ||
|
||
.. automodule:: seqann.blast_cmd | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
.. _align: | ||
|
||
seqann.align | ||
--------------------------- | ||
|
||
.. automodule:: seqann.align | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
.. highlight:: shell | ||
|
||
====================== | ||
Testing | ||
====================== | ||
|
||
.. warning:: Before running the tests, clustalo, blastn and all of the required Python packages must be installed! | ||
|
||
To run all tests, execute the following command in the top directory of the SeqAnn repository. | ||
|
||
.. code-block:: console | ||
|
||
    $ python -m unittest tests | ||
|
||
Different tests | ||
--------------------- | ||
|
||
.. note:: If you don't have an imgt_biosql database running, then not all of the tests will run! | ||
|
||
* test_seqann | ||
* test_align | ||
* test_blast | ||
* test_feature | ||
* test_gfe | ||
* test_refdata | ||
* test_seqsearch | ||
* test_util | ||
|
||
Running specific tests | ||
----------------------- | ||
|
||
You can run a specific test by providing its full dotted path on the command line. | ||
|
||
.. code-block:: console | ||
|
||
    $ python -m unittest tests.test_seqann.TestBioSeqAnn.test_004_ambig |
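The dotted path above follows the usual ``unittest`` pattern of module, test class, and test method. As a rough illustration of what selecting a single test method means, the sketch below builds and runs a one-test suite from Python. The class ``TestExample`` and its methods are hypothetical stand-ins and are not part of the SeqAnn test suite.

```python
import unittest


class TestExample(unittest.TestCase):
    """Hypothetical stand-in for a test class such as TestBioSeqAnn."""

    def test_001_first(self):
        self.assertEqual(1 + 1, 2)

    def test_004_ambig(self):
        self.assertIn("N", "ACGTN")


# Similar in spirit to `python -m unittest <module>.TestExample.test_004_ambig`:
# build a suite holding exactly one test method and run only that test.
suite = unittest.TestSuite([TestExample("test_004_ambig")])
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

Only ``test_004_ambig`` is executed here; ``test_001_first`` is defined but never run, which mirrors how a full dotted path narrows the run to one method.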