# The INDRA Database: Description and Demos

This notebook walks through the basic structure of the INDRA Database and then works through some use-case examples. Unless otherwise stated, it assumes that the user has direct access to the database.

## The Need-to-knows of INDRA

As the name suggests, this database is built using the tools of INDRA, and in turn it supports many uses of INDRA. It is thus valuable to go over some key features of the INDRA toolbox.

### The INDRA Statement
The bread and butter of the INDRA Database, and of INDRA itself, is the INDRA Statement, which is described extensively [here](file:///home/patrick/Workspace/indra/doc/_build/html/modules/statements.html). These Statements provide a robust and fairly extensible format for representing mechanistic interactions as Python objects. For the purposes of this tutorial, it is essential to know that Statements:
- Have a **type**, for example:
    - Phosphorylation
    - Complex
- Have **agents**, which in turn have some **db refs**, for example:
    - MEK has the FamPlex db ref id "MEK"
    - Vemurafenib has the ChEBI db ref id "CHEBI:63637" and the ChEMBL db ref id "CHEMBL1229517"

Most have two agents, a subject and an object, for example:
- `Phosphorylation(MEK(), ERK())`
- `Inhibition(Vemurafenib(), BRAF())`

but there are some types of Statement that are notable exceptions:
- Complexes (any number of agents)
- Autophosphorylations (one agent)
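
The structure described above can be sketched in a few lines of plain Python. This is a simplified stand-in written only for illustration, not the INDRA API: the `Agent` and `Statement` classes below are hypothetical dataclasses that mimic the idea of a type, a list of agents, and per-agent db refs. The `"FPLX"`, `"CHEBI"`, and `"CHEMBL"` dictionary keys are assumed namespace labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    """Hypothetical stand-in for an agent: a name plus db refs."""
    name: str
    db_refs: Dict[str, str] = field(default_factory=dict)

    def __str__(self) -> str:
        return f"{self.name}()"


@dataclass
class Statement:
    """Hypothetical stand-in for a Statement: a type plus its agents."""
    stmt_type: str
    agents: List[Agent]

    def __str__(self) -> str:
        return f"{self.stmt_type}({', '.join(str(a) for a in self.agents)})"


# Agents with the db refs mentioned in the text (namespace keys are assumed).
mek = Agent("MEK", {"FPLX": "MEK"})
erk = Agent("ERK")
braf = Agent("BRAF")
vemurafenib = Agent(
    "Vemurafenib", {"CHEBI": "CHEBI:63637", "CHEMBL": "CHEMBL1229517"}
)

# The common case: a subject and an object.
phosphorylation = Statement("Phosphorylation", [mek, erk])
inhibition = Statement("Inhibition", [vemurafenib, braf])

# The exceptions: any number of agents, or just one.
complex_stmt = Statement("Complex", [mek, erk, braf])
autophos = Statement("Autophosphorylation", [braf])

print(phosphorylation)  # Phosphorylation(MEK(), ERK())
print(inhibition)       # Inhibition(Vemurafenib(), BRAF())
```

The real INDRA Statement classes carry considerably more information (evidence, modification sites, agent conditions), so this sketch should be read only as a mental model of the type/agents/db refs layering.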

### Sources of INDRA Statements
INDRA has implemented tools for loading and generating these Statements from several sources. Here, the key points to recall are that:
- INDRA can draw both from **machine reading systems**, such as REACH, and from **mechanism databases**, such as Pathway Commons.
- For readings, INDRA also provides the groundwork for **running certain readers at massive scales** fairly easily, using AWS Batch.
- The results from these sources, especially when combined, **contain a lot of duplicate and closely related information**.

### Preassembly of INDRA Statements
To build useful models from all these sources, INDRA supplies tools to perform what is called "preassembly" (what you do before "assembling" your model), in which:
- grounding (the agent db refs), sites, and agent names are regularized. **Such Preassembled Statements can be uniquely identified by a hash generated from their contents**.
- the **redundant information between sources is merged, *with the original source information and evidence preserved*, into a distilled set of unique mechanisms**.
- the relationships between similar mechanistic information are recorded, such that a more general Statement, such as `Phosphorylation(MEK(), ERK())`, can be identified as generalizing `Phosphorylation(MAP2K1(), MAPK1())`.
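
To make the three preassembly steps more concrete, here is a minimal, self-contained sketch in plain Python. It is not INDRA's implementation: the `content_hash`, `merge_duplicates`, and `refines` helpers, the toy `PARENT` hierarchy, and the placeholder evidence strings are all assumptions made for illustration. It only shows the general idea of hashing a Statement's contents, merging duplicates while keeping every piece of evidence, and checking whether one Statement is a more specific version of another.

```python
import hashlib

# Toy family hierarchy (illustrative only): specific gene -> its family.
PARENT = {"MAP2K1": "MEK", "MAPK1": "ERK"}


def content_hash(stmt_type, agents):
    """Derive a short identifier from a Statement's type and agent names."""
    key = "|".join((stmt_type,) + tuple(agents))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def merge_duplicates(raw):
    """Collapse identical Statements, keeping all of their evidence."""
    unique = {}
    for stmt_type, agents, source, text in raw:
        h = content_hash(stmt_type, agents)
        entry = unique.setdefault(
            h, {"type": stmt_type, "agents": agents, "evidence": []}
        )
        entry["evidence"].append((source, text))
    return unique


def refines(specific, general):
    """Return True if `specific` is a more detailed version of `general`."""
    if specific["type"] != general["type"]:
        return False
    if len(specific["agents"]) != len(general["agents"]):
        return False
    pairs = zip(specific["agents"], general["agents"])
    all_match = all(s == g or PARENT.get(s) == g for s, g in pairs)
    return all_match and specific["agents"] != general["agents"]


# Hypothetical raw Statements: (type, agents, source, placeholder evidence).
raw = [
    ("Phosphorylation", ("MAP2K1", "MAPK1"), "reach", "placeholder sentence A"),
    ("Phosphorylation", ("MAP2K1", "MAPK1"), "pathway_commons", "placeholder record B"),
    ("Phosphorylation", ("MEK", "ERK"), "reach", "placeholder sentence C"),
]

unique = merge_duplicates(raw)
specific = unique[content_hash("Phosphorylation", ("MAP2K1", "MAPK1"))]
general = unique[content_hash("Phosphorylation", ("MEK", "ERK"))]

print(len(unique))                              # 2
print([src for src, _ in specific["evidence"]])  # ['reach', 'pathway_commons']
print(refines(specific, general))               # True
```

In this sketch the hash is order-sensitive and based only on agent names, which is a deliberate simplification; the point is only that identical content yields an identical key, so duplicates from different sources land in the same entry.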


## The Structure of the Database

The INDRA Database is made up of several tables. There are 4 core groups:
- Text to Read (`text_refs`, `text_content`)
- Sources of statements: readings and external databases (`reading`, `db_info`)
- 