Merge pull request #168 from pagreene/schema-doc
Overhaul Documentation
pagreene committed May 11, 2021
2 parents 5717000 + a65e194 commit 5bc59fc
Showing 22 changed files with 2,504 additions and 1,308 deletions.
4 changes: 2 additions & 2 deletions doc/conf.py
Original file line number Diff line number Diff line change
@@ -41,7 +41,7 @@
'IPython.sphinxext.ipython_directive',
'IPython.sphinxext.ipython_console_highlighting',
'citations',
'm2r'
'm2r2'
]

# Add any paths that contain templates here, relative to this directory.
@@ -314,7 +314,7 @@
'functools32', 'ndex2', 'ndex2.client', 'ndex2.niceCXNetwork',
'nltk', 'reportlab', 'reportlab.lib', 'reportlab.lib.enums',
'reportlab.lib.pagesizes', 'reportlab.platypus', 'reportlab.lib.styles',
'reportlab.lib.units'
'reportlab.lib.units', 'indra.tools.assemble_corpus', 'indra.ontology.bio',
]
for mod_name in MOCK_MODULES:
sys.modules[mod_name] = mock.MagicMock()
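
The loop above registers stand-in objects in `sys.modules` so that Sphinx can import the documented packages without their heavy dependencies installed. A minimal sketch of the effect, assuming a made-up module name `heavy_dep_example` (hypothetical, not part of this repository):

```python
import sys
from unittest import mock

# Hypothetical module names, used only for illustration.
MOCK_MODULES = ['heavy_dep_example', 'heavy_dep_example.sub']

for mod_name in MOCK_MODULES:
    # Register a stand-in so that a later import finds it in sys.modules.
    sys.modules[mod_name] = mock.MagicMock()

# These imports now succeed even though no such package is installed.
import heavy_dep_example
import heavy_dep_example.sub

# Attribute access and calls on the stand-in return further mock objects.
result = heavy_dep_example.anything.at_all(1, 2)
is_mocked = isinstance(heavy_dep_example, mock.MagicMock)
```

Because every attribute of a `MagicMock` resolves to another mock, autodoc can import a module that depends on the mocked names, but anything documented from a mocked module itself will be empty.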
21 changes: 5 additions & 16 deletions doc/modules/client/readonly/index.rst
@@ -12,24 +12,13 @@ our database to access it even as we perform daily updates on the principal
database, without worrying about queries interfering.


Get Pre-Assembled Statements (:py:mod:`indra_db.client.readonly.pa_statements`)
Construct composable queries (:py:mod:`indra_db.client.readonly.query`)
-------------------------------------------------------------------------------

Here are the tools used to get PA Statements from the readonly database, with
the goal of retrieving at least 1,000 Statements with 10 evidence each in under
30 seconds.
This is a sophisticated system of classes that can be used to form queries
for preassembled statements from the readonly database.

.. automodule:: indra_db.client.readonly.pa_statements
.. automodule:: indra_db.client.readonly.query
:members:
:member-order: bysource


Get Simple Interactions from Metadata (:py:mod:`indra_db.client.readonly.interactions`)
---------------------------------------------------------------------------------------

This provides an API to get somewhat less detailed data than above, using just
the metadata of the database (not looking into the Statement JSONs), but is
much faster. These tools can be sufficient if, for example, all that is needed
is an interactome.

.. automodule::indra_db.client.readonly.interactions
:memebrs:
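
The `query` module referenced in this hunk builds queries that combine with logical operators, in the style of the `HasSource(['reach']) & FromMeshIds(['D0001'])` expressions quoted later in this commit. The following is only a toy sketch of that composition pattern over in-memory dicts. The `Toy*` classes and the statement layout are assumptions made for illustration and are not the `indra_db` implementation, which queries a database.

```python
class ToyQuery:
    """Toy base class: a query is a predicate over statement-like dicts."""

    def matches(self, stmt):
        raise NotImplementedError

    def __and__(self, other):
        return ToyIntersection(self, other)

    def __or__(self, other):
        return ToyUnion(self, other)

    def run(self, stmts):
        # Return the hashes of all statements that satisfy this query.
        return [s['hash'] for s in stmts if self.matches(s)]


class ToyHasSource(ToyQuery):
    def __init__(self, sources):
        self.sources = set(sources)

    def matches(self, stmt):
        return bool(self.sources & set(stmt['sources']))


class ToyFromMeshIds(ToyQuery):
    def __init__(self, mesh_ids):
        self.mesh_ids = set(mesh_ids)

    def matches(self, stmt):
        return bool(self.mesh_ids & set(stmt['mesh_ids']))


class ToyIntersection(ToyQuery):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def matches(self, stmt):
        return self.left.matches(stmt) and self.right.matches(stmt)


class ToyUnion(ToyQuery):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def matches(self, stmt):
        return self.left.matches(stmt) or self.right.matches(stmt)


# Hypothetical statement metadata, invented for this example.
stmts = [
    {'hash': 1, 'sources': ['reach'], 'mesh_ids': ['D000001']},
    {'hash': 2, 'sources': ['sparser'], 'mesh_ids': ['D000001']},
    {'hash': 3, 'sources': ['reach'], 'mesh_ids': []},
]

and_hashes = (ToyHasSource(['reach']) & ToyFromMeshIds(['D000001'])).run(stmts)
or_hashes = (ToyHasSource(['reach']) | ToyFromMeshIds(['D000001'])).run(stmts)
```

Overloading `__and__` and `__or__` lets a caller build an arbitrarily nested query tree before anything is executed, which is the property the word "composable" refers to.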
1 change: 1 addition & 0 deletions doc/modules/index.rst
@@ -8,6 +8,7 @@ INDRA Database modules
util/index.rst
managers/index.rst
reading/index.rst
preassembly/index.rst
schemas/index.rst
misc.rst

11 changes: 6 additions & 5 deletions doc/modules/managers/index.rst
@@ -52,11 +52,12 @@ handling.
:members:


Readonly Manager (:py:mod:`indra_db.managers.readonly_manager`)
---------------------------------------------------------------
Static Dump Manager (:py:mod:`indra_db.managers.dump_manager`)
--------------------------------------------------------------

This handles the generation of the content for the readonly database from the
principal database.
This handles the generation of static dumps, including the readonly database
from the principal database.

.. automodule:: indra_db.managers.readonly_manager
.. automodule:: indra_db.managers.dump_manager
:members:
:member-order: bysource
1 change: 1 addition & 0 deletions doc/modules/misc.rst
@@ -14,6 +14,7 @@ access to SQLAlchemy's API.

.. automodule:: indra_db.databases
:members:
:member-order: bysource


Belief Calculator (:py:mod:`indra_db.belief`)
30 changes: 30 additions & 0 deletions doc/modules/preassembly/index.rst
@@ -0,0 +1,30 @@
Database Integrated Preassembly Tools
=====================================

The database runs incremental preassembly on the raw statements to generate
the preassembled (PA) Statements. The code to accomplish this task is defined
here, principally in :class:`DbPreassembler
<indra_db.preassembly.preassemble_db.DbPreassembler>`. This module also
defines procedures for running these jobs on AWS.

Database Preassembly (:py:mod:`indra_db.preassembly.preassemble_db`)
--------------------------------------------------------------------

This module defines a class that manages preassembly for a given list of
statement types on the local machine.

.. automodule:: indra_db.preassembly.preassemble_db
:members:
:member-order: bysource


A Class to Manage and Monitor AWS Batch Jobs (:py:mod:`indra_db.preassembly.submitter`)
---------------------------------------------------------------------------------------

Allow a manager to monitor the Batch jobs to prevent runaway jobs, and smooth
out job runs and submissions.

.. automodule:: indra_db.preassembly.submitter
:members:
:member-order: bysource

23 changes: 9 additions & 14 deletions doc/modules/reading/index.rst
@@ -15,7 +15,8 @@ to a standard interface, which then allows readers to be run in a plug-and-play
manner.

.. automodule:: indra_db.reading.read_db
:members:
:members:
:member-order: bysource


The Database Script for Running on AWS (:py:mod:`indra_db.reading.read_db_aws`)
@@ -25,23 +26,17 @@ This is the script used to run reading on AWS Batch, generally run from an
AWS Lambda function.

.. automodule:: indra_db.reading.read_db_aws
:members:
:members:
:member-order: bysource

The Database Reporter (:py:mod:`indra_db.reading.report_db_aws`)
----------------------------------------------------------------

Create an object that is used to aggregate and report on the reading process,
allowing for effective monitoring.

.. automodule:: indra_db.reading.report_db_aws
:members:

A Class to Manage and Monitor AWS Batch Jobs (:py:mod:`indra_db.reading.submit_reading_pipeline`)
-------------------------------------------------------------------------------------------------
A Class to Manage and Monitor AWS Batch Jobs (:py:mod:`indra_db.reading.submitter`)
-----------------------------------------------------------------------------------

Allow a manager to monitor the Batch jobs to prevent runaway jobs, and smooth
out job runs and submissions.

.. automodule:: indra_db.reading.submit_reading_pipeline
:members:
.. automodule:: indra_db.reading.submitter
:members:
:member-order: bysource

7 changes: 4 additions & 3 deletions doc/modules/schemas/index.rst
@@ -7,11 +7,9 @@ as some useful mixin classes.
Principal Database Schema (:py:mod:`indra_db.schemas.principal_schema`)
-----------------------------------------------------------------------

Defines the `get_schema` function for the principal database, which represents
the "ground truth" of the knowledge we aggregate.

.. automodule:: indra_db.schemas.principal_schema
:members:
:member-order: bysource

Readonly Database Schema (:py:mod:`indra_db.schemas.readonly_schema`)
---------------------------------------------------------------------
@@ -21,6 +19,7 @@ external services to access the Statement knowledge we acquire.

.. automodule:: indra_db.schemas.readonly_schema
:members:
:member-order: bysource

Class Mix-ins (:py:mod:`indra_db.schemas.mixins`)
-------------------------------------------------
@@ -30,6 +29,7 @@ table objects via multiple inheritance.

.. automodule:: indra_db.schemas.mixins
:members:
:member-order: bysource

Indexes (:py:mod:`indra_db.schemas.indexes`)
--------------------------------------------
@@ -40,3 +40,4 @@ class mixin definition.

.. automodule:: indra_db.schemas.indexes
:members:
:member-order: bysource
22 changes: 10 additions & 12 deletions indra_db/client/datasets.py
@@ -26,9 +26,6 @@ def get_statement_essentials(clauses, count=1000, db=None, preassembled=True):
list of sqlalchemy WHERE clauses to pass to the filter query.
count : int
Number of statements to retrieve and process in each batch.
do_stmt_count : bool
Whether or not to perform an initial statement counting step to give
more meaningful progress messages.
db : :py:class:`DatabaseManager`
Optionally specify a database manager that attaches to something
besides the primary database, for example a local database instance.
@@ -148,18 +145,19 @@ def export_relation_dict_to_tsv(relation_dict, out_base, out_types=None):
"""Export a relation dict (from get_relation_dict) to a tsv.
Available output types are:
- "full_tsv" : get a tsv with directed pairs of entities (e.g. HGNC
symbols), the type of relation (e.g. Phosphorylation) and the hash
of the preassembled statement. Columns are agent_1, agent_2 (where
agent_1 affects agent_2), type, hash.
symbols), the type of relation (e.g. Phosphorylation) and the hash
of the preassembled statement. Columns are agent_1, agent_2 (where
agent_1 affects agent_2), type, hash.
- "short_tsv" : like the above, but without the hashes, so only one
instance of each pair and type trio occurs. However, the information
cannot be traced. Columns are agent_1, agent_2, type, where agent_1
affects agent_2.
instance of each pair and type trio occurs. However, the information
cannot be traced. Columns are agent_1, agent_2, type, where agent_1
affects agent_2.
- "pairs_tsv" : like the above, but without the relation type. Similarly,
each row is unique. In addition, the agents are undirected. Thus this
is purely a list of pairs of related entities. The columns are just
agent_1 and agent_2, where nothing is implied by the ordering.
each row is unique. In addition, the agents are undirected. Thus this
is purely a list of pairs of related entities. The columns are just
agent_1 and agent_2, where nothing is implied by the ordering.
Parameters
----------
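
The three output types described in the docstring above differ only in which columns are kept and how duplicates collapse. A small sketch of that relationship, assuming the relations are available as `(agent_1, agent_2, type, hash)` tuples (a hypothetical input layout; the real structure of `relation_dict` is not shown in this diff, and the hash values below are invented):

```python
# Hypothetical relation rows: (agent_1, agent_2, type, hash).
rows = [
    ('MAP2K1', 'MAPK1', 'Phosphorylation', 111),
    ('MAP2K1', 'MAPK1', 'Phosphorylation', 222),
    ('MAPK1', 'MAP2K1', 'Complex', 333),
]


def full_rows(rows):
    """"full_tsv": directed pair, relation type, and statement hash."""
    return sorted(set(rows))


def short_rows(rows):
    """"short_tsv": drop the hash, so each (pair, type) trio occurs once."""
    return sorted({(a1, a2, rel) for a1, a2, rel, _ in rows})


def pairs_rows(rows):
    """"pairs_tsv": drop the type too and ignore direction."""
    return sorted({tuple(sorted((a1, a2))) for a1, a2, _, _ in rows})


def to_tsv(rows):
    """Render rows as tab-separated text, one row per line."""
    return '\n'.join('\t'.join(str(v) for v in row) for row in rows)


full = full_rows(rows)
short = short_rows(rows)
pairs = pairs_rows(rows)
pairs_text = to_tsv(pairs)
```

Sorting each pair before deduplicating is what makes the last format undirected: `(MAPK1, MAP2K1)` and `(MAP2K1, MAPK1)` collapse into a single row.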
54 changes: 27 additions & 27 deletions indra_db/client/readonly/query.py
@@ -691,8 +691,8 @@ def get_interactions(self, ro=None, limit=None, offset=None,
sort_by='ev_count') -> Optional[QueryResult]:
"""Get the simple interaction information from the Statements metadata.
Each entry in the result corresponds to a single preassembled Statement,
distinguished by its hash.
Each entry in the result corresponds to a single preassembled Statement,
distinguished by its hash.
Parameters
----------
@@ -1859,6 +1859,10 @@ def get_clause(ro):
class FromMeshIds(_TextRefCore):
"""Find Statements whose text sources were given one of a list of MeSH IDs.
This object can be constructed from a list of mixed "D" and "C" type mesh
IDs, but for reasons of querying, those IDs will be separated into two
separate classes and a :class:`Union <Union>` of the two classes returned.
Parameters
----------
mesh_ids : list
@@ -1867,9 +1871,11 @@ class FromMeshIds(_TextRefCore):
Attributes
----------
mesh_ids : tuple
The mesh IDs.
The immutable tuple of mesh IDs, in their original string form.
_mesh_type : str
"C" or "D" indicating which types of IDs are held in this object.
_mesh_nums : list[int]
The mesh IDs converted to integers, stripped of their prefix.
"""
list_name = 'mesh_ids'

@@ -1901,7 +1907,6 @@ def __new__(cls, mesh_ids: list):
def __init__(self, mesh_ids):
self.mesh_ids = tuple(set(mesh_ids))
self._mesh_nums = []
self._mesh_concept_nums = []
self._mesh_type = None
for mesh_id in self.mesh_ids:
if self._mesh_type is None:
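
The `FromMeshIds` docstring says that mixed "D" and "C" IDs are separated by type and that `_mesh_nums` holds the IDs as integers with the prefix stripped. The body of `__new__` is not shown in this diff, so the following is only a sketch of that splitting step under those stated assumptions. `split_mesh_ids` is a hypothetical helper, not a function in `indra_db`.

```python
from collections import defaultdict


def split_mesh_ids(mesh_ids):
    """Group MeSH IDs by prefix letter and convert the numeric part to int."""
    groups = defaultdict(list)
    # Deduplicate and sort so that the output order is deterministic.
    for mesh_id in sorted(set(mesh_ids)):
        prefix, digits = mesh_id[0], mesh_id[1:]
        if prefix not in ('C', 'D') or not digits.isdigit():
            raise ValueError(f"Unrecognized MeSH ID: {mesh_id!r}")
        groups[prefix].append(int(digits))
    return dict(groups)


groups = split_mesh_ids(['D000001', 'C000657245', 'D009369', 'D000001'])
```

Each resulting group could then back one single-type query object, with the two objects joined by a union as the docstring describes.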
@@ -2965,29 +2970,24 @@ class EvidenceFilter:
We need to be able to perform logical operations between evidence to handle
important cases:
HasSource(['reach']) & FromMeshIds(['D0001'])
-> we might reasonably want to filter evidence for the second subquery but
not the first.
HasOnlySource(['reach']) & FromMeshIds(['D00001'])
-> Here we would likely want to filter the evidence for both sub queries.
HasOnlySource(['reach']) | FromMeshIds(['D000001'])
-> Not sure what this even means (its purpose)....not sure what we'd do for
evidence filtering when the original statements are or'ed
HasDatabases() & FromMeshIds(['D000001'])
-> Here you COULDN'T perform an & on the evidence, because the two sources
are mutually exclusive (only readings connect to mesh annotations).
However it could make sense you would want to do an "or" between the
evidence, so the evidence is either from a database or from a mesh
annotated document.
"filter all the evidence" and "filter none of the evidence" should
definitely be options. Although "Filter for all" might run into usues with
the "HasDatabase and FromMeshIds" scenario. I think no evidence filter should
be the default, and if you attempt a bogus "filter all evidence" (as with
that scenario) you get an error.
- ``HasSource(['reach']) & FromMeshIds(['D0001'])``: we might reasonably
want to filter evidence for the second subquery but not the first.
- ``HasOnlySource(['reach']) & FromMeshIds(['D00001'])``: Here we would
likely want to filter the evidence for both sub queries.
- ``HasOnlySource(['reach']) | FromMeshIds(['D000001'])``: It is not clear
what this even means (its purpose) or what we'd do for evidence filtering
when the original statements are or'ed
- ``HasDatabases() & FromMeshIds(['D000001'])``: Here you COULDN'T perform
an & on the evidence, because the two sources are mutually exclusive
(only readings connect to mesh annotations). However it could make sense
you would want to do an "or" between the evidence, so the evidence is
either from a database or from a mesh annotated document.
Both "filter all the evidence" and "filter none of the evidence" should
definitely be options. Although "Filter for all" might run into issues with
the "HasDatabase and FromMeshIds" scenario. I think no evidence filter
should be the default, and if you attempt a bogus "filter all evidence" (as
with that scenario) you get an error.
"""

def __init__(self, filters=None, joiner='and'):
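
The docstring above discusses joining evidence filters with "and" or "or", with no filtering as the default. A toy sketch of those semantics over in-memory evidence dicts follows. Only the `filters` and `joiner` parameter names come from the signature shown in the diff; `ToyEvidenceFilter`, the predicate functions, and the evidence layout are assumptions for illustration, not the real class.

```python
class ToyEvidenceFilter:
    """Toy stand-in that joins predicate functions with 'and' or 'or'."""

    def __init__(self, filters=None, joiner='and'):
        if joiner not in ('and', 'or'):
            raise ValueError(f"Unknown joiner: {joiner!r}")
        self.filters = list(filters) if filters else []
        self.joiner = joiner

    def keeps(self, evidence):
        # With no filters, every piece of evidence is kept (the default).
        if not self.filters:
            return True
        results = [f(evidence) for f in self.filters]
        return all(results) if self.joiner == 'and' else any(results)

    def apply(self, evidence_list):
        return [ev for ev in evidence_list if self.keeps(ev)]


# Hypothetical evidence records and predicates.
DATABASE_SOURCES = {'biogrid'}


def from_database(ev):
    return ev['source'] in DATABASE_SOURCES


def has_mesh_id(ev):
    return 'D000001' in ev['mesh_ids']


evidence = [
    {'id': 1, 'source': 'reach', 'mesh_ids': ['D000001']},
    {'id': 2, 'source': 'biogrid', 'mesh_ids': []},
    {'id': 3, 'source': 'reach', 'mesh_ids': []},
]

both = ToyEvidenceFilter([from_database, has_mesh_id], joiner='and')
either = ToyEvidenceFilter([from_database, has_mesh_id], joiner='or')

and_ids = [ev['id'] for ev in both.apply(evidence)]
or_ids = [ev['id'] for ev in either.apply(evidence)]
all_ids = [ev['id'] for ev in ToyEvidenceFilter().apply(evidence)]
```

The "and" result is empty here because, as the docstring notes for `HasDatabases() & FromMeshIds(...)`, database evidence and MeSH-annotated reading evidence are mutually exclusive, while the "or" joiner keeps evidence from either kind of source.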
13 changes: 0 additions & 13 deletions indra_db/client/readonly/relation.py

This file was deleted.
