# Hands on: Exploring literature-derived networks for data interpretation with INDRA

In [1]:
!pip install -q -r requirements.txt

## 1. Introduction

### Recall - Example Dataset

We use an example dataset produced by an MSstats differential abundance analysis. It comes from a small-molecule experiment with known inhibition targets, and includes 8 small-molecule inhibitors plus a DMSO control.

In [2]:
import pandas as pd

DATA_PATH = "dataProcessOutput.csv"

def import_data(filename):
    """Load the CSV at `filename` into a pandas DataFrame."""
    return pd.read_csv(filename)

input_data = import_data(DATA_PATH)
input_data

| Unnamed: 0 | RUN | Protein | LogIntensities | originalRUN | GROUP | SUBJECT | TotalGroupMeasurements | NumMeasuredFeature | MissingPercentage | more50missing | NumImputedFeature |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | 1 | 1433B_HUMAN | 12.873423 | 230719_THP-1_Chrom_end2end_Plate1_DMSO_A02_DIA | DMSO | 2 | 1210 | 10 | 0.0 | False | 0 |
| 1 | 2 | 1433B_HUMAN | 12.866217 | 230719_THP-1_Chrom_end2end_Plate1_DMSO_A05_DIA | DMSO | 5 | 1210 | 10 | 0.0 | False | 0 |
| 2 | 3 | 1433B_HUMAN | 12.686827 | 230719_THP-1_Chrom_end2end_Plate1_DMSO_A10_DIA | DMSO | 10 | 1210 | 10 | 0.0 | False | 0 |
| 3 | 4 | 1433B_HUMAN | 12.625462 | 230719_THP-1_Chrom_end2end_Plate1_DMSO_A12_DIA | DMSO | 12 | 1210 | 10 | 0.0 | False | 0 |
| 4 | 5 | 1433B_HUMAN | 12.538365 | 230719_THP-1_Chrom_end2end_Plate1_DMSO_B01_DIA | DMSO | 13 | 1210 | 10 | 0.0 | False | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1189821 | 266 | ZZZ3_HUMAN | 10.384438 | 230719_THP-1_Chrom_end2end_Plate3_DMSO_A10 | VTP50469 | 202 | 170 | 10 | 0.0 | False | 0 |
| 1189822 | 267 | ZZZ3_HUMAN | 10.231615 | 230719_THP-1_Chrom_end2end_Plate3_DMSO_B03 | VTP50469 | 207 | 170 | 10 | 0.0 | False | 0 |
| 1189823 | 268 | ZZZ3_HUMAN | 10.502691 | 230719_THP-1_Chrom_end2end_Plate3_DbET6_C07 | VTP50469 | 223 | 170 | 10 | 0.0 | False | 0 |
| 1189824 | 269 | ZZZ3_HUMAN | 10.674776 | 230719_THP-1_Chrom_end2end_Plate3_DMSO_C11 | VTP50469 | 227 | 170 | 10 | 0.0 | False | 0 |
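Before querying any knowledge base, it can help to check which treatment groups are present and how many runs and proteins each one covers. The following is a minimal sketch on a toy DataFrame with made-up values; `summarize_groups` is a hypothetical helper, and only the column names (`GROUP`, `originalRUN`, `Protein`, `LogIntensities`) are taken from the output above.

```python
import pandas as pd

# Toy stand-in for a few rows of the real table; the values are illustrative only.
toy = pd.DataFrame({
    "Protein": ["1433B_HUMAN", "1433B_HUMAN", "ZZZ3_HUMAN", "ZZZ3_HUMAN"],
    "GROUP": ["DMSO", "DMSO", "VTP50469", "VTP50469"],
    "originalRUN": ["run_a", "run_b", "run_c", "run_d"],
    "LogIntensities": [12.0, 13.0, 10.0, 11.0],
})

def summarize_groups(df):
    """Count distinct runs and proteins, and average the log intensity, per group."""
    return (
        df.groupby("GROUP")
          .agg(n_runs=("originalRUN", "nunique"),
               n_proteins=("Protein", "nunique"),
               mean_log_intensity=("LogIntensities", "mean"))
          .sort_index()
    )

summary = summarize_groups(toy)
print(summary)
```

The same function could be applied to `input_data` directly, assuming the full table has the columns shown above.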


### Experimental Factors
| Treatment    | Target |
| :-------- | :------- |
| DMSO  | Control    |
| VTP50469  | MEN1    |
| PF477736 | Chk1    |
| Jakafi    | JAK1/2    |
| K-975  | TEAD1   |
| VE-821 | ATR    |
| dBET6    | BRD2/3/4   |


Our next goal is to make sense of the treatments and their targets. One option is to look for the downstream targets of one or more drugs. Another is to look at the neighborhood, or the upstream controllers, of a protein of interest.

## 2. How can we look for downstream targets for a drug?  What about upstream controllers of a protein?

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system, originally developed for molecular systems biology and then generalized to other domains (see INDRA World). INDRA draws on natural language processing systems and structured databases to collect mechanistic and causal assertions, represents them in a standardized form (INDRA Statements), and assembles them into various modeling formalisms including causal graphs and dynamical models.

At the core of INDRA are its knowledge-level assembly procedures, allowing sources to be assembled into coherent models, a process that involves correcting systematic input errors, finding and resolving redundancies, inferring missing information, filtering to a relevant scope and assessing the reliability of causal information.

The detailed INDRA documentation is available at http://indra.readthedocs.io.

In [3]:
from indra.sources.indra_db_rest.api import get_statements_from_query
from indra.sources.indra_db_rest.query import HasAgent, HasEvidenceBound, HasType
from indra.assemblers.html import HtmlAssembler
from IPython.display import HTML  # IPython.core.display is deprecated for this import

We can ground each drug name to a standard identifier using Gilda. In this example, we look at the downstream targets of dBET6.

In [4]:
# Query for https://db.indra.bio/statements/from_agents?subject=dBET6&format=html
query = HasAgent("dBET6", role="SUBJECT")
p = get_statements_from_query(query, sort_by="belief")

ha = HtmlAssembler(p.statements,
                   title='INDRA subnetwork statements',
                   db_rest_url='https://db.indra.bio',
                   ev_counts=p.get_ev_counts(),
                   source_counts=p.get_source_counts())
html_str = ha.make_model()
# HTML(html_str)

INFO: [2024-04-25 10:23:28] indra_db_rest.query_processor - Retrieving statements that have an agent where NAME=dBET6 with role=SUBJECT.
INFO: [2024-04-25 10:23:28] indra_db_rest.request_logs - Running 0th request for statements
INFO: [2024-04-25 10:23:28] indra_db_rest.request_logs -   LIMIT: None
INFO: [2024-04-25 10:23:28] indra_db_rest.request_logs -   OFFSET: 0


We can also search based on a namespace and ID combination. The query below is for the drug Jakafi, which Gilda previously grounded to the CURIE chebi:66917.

One can also specify an evidence bound to collect only well-supported statements (here, those with more than 10 pieces of evidence). For additional query parameters, see the [query documentation](https://indra.readthedocs.io/en/latest/modules/sources/indra_db_rest/index.html#indra.sources.indra_db_rest.query.Query).
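A CURIE such as chebi:66917 has to be split into the namespace and ID that `HasAgent` takes as separate arguments. The helper below is a hypothetical sketch, not part of INDRA or Gilda. Upper-casing the prefix is an assumption that happens to match `CHEBI`; other namespaces may use different names in INDRA (the log output later in this notebook shows `UP` rather than a spelled-out name, for example).

```python
def split_curie(curie):
    """Split 'prefix:identifier' into an upper-cased namespace and the identifier."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"Not a CURIE of the form prefix:identifier: {curie!r}")
    # Assumption: the INDRA namespace is the upper-cased CURIE prefix.
    return prefix.upper(), identifier

namespace, identifier = split_curie("chebi:66917")
print(namespace, identifier)
```

The two values could then be passed as `HasAgent(identifier, namespace=namespace)`.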

In [5]:
query = HasAgent("66917", namespace="CHEBI") & HasEvidenceBound(["> 10"])
p = get_statements_from_query(query, sort_by="belief")

ha = HtmlAssembler(p.statements,
                   title='INDRA subnetwork statements',
                   db_rest_url='https://db.indra.bio',
                   ev_counts=p.get_ev_counts(),
                   source_counts=p.get_source_counts())
html_str = ha.make_model()
# HTML(html_str)

INFO: [2024-04-25 10:23:33] indra_db_rest.query_processor - Retrieving statements that have > 10 evidence and have an agent where CHEBI=66917.
INFO: [2024-04-25 10:23:33] indra_db_rest.request_logs - Running 0th request for statements
INFO: [2024-04-25 10:23:33] indra_db_rest.request_logs -   LIMIT: None
INFO: [2024-04-25 10:23:33] indra_db_rest.request_logs -   OFFSET: 0
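`HasEvidenceBound` filters on the server side. If a looser query has already been run, a similar cut can be applied locally to the evidence counts. This sketch assumes that `p.get_ev_counts()` returns a dictionary mapping statement hashes to evidence counts; `top_by_evidence` is a hypothetical helper and the counts below are toy values.

```python
def top_by_evidence(ev_counts, min_evidence=10, limit=None):
    """Return (hash, count) pairs with count > min_evidence, highest count first."""
    kept = [(h, n) for h, n in ev_counts.items() if n > min_evidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]  # limit=None keeps every remaining pair

# Toy counts keyed by made-up statement hashes.
toy_counts = {101: 25, 102: 3, 103: 11, 104: 10}
ranked = top_by_evidence(toy_counts, min_evidence=10)
print(ranked)
```

The strict `>` comparison mirrors the `"> 10"` bound used in the query above.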


We can use a similar query to look for upstream controllers of a protein. In the example below, we look for upstream inhibitors of BRD2, BRD3, and BRD4. We expect to see dBET1 among the inhibitors of BRD2.

In [9]:
query = ((HasAgent("BRD2", role="OBJECT") | HasAgent("BRD3", role="OBJECT") | HasAgent("BRD4", role="OBJECT"))
         & HasType(["Inhibition"]))
p = get_statements_from_query(query, sort_by="belief")

ha = HtmlAssembler(p.statements,
                   title='INDRA subnetwork statements',
                   db_rest_url='https://db.indra.bio',
                   ev_counts=p.get_ev_counts(),
                   source_counts=p.get_source_counts())
html_str = ha.make_model()
# HTML(html_str)

INFO: [2024-04-25 10:24:14] indra_db_rest.query_processor - Retrieving statements that (have an agent where NAME=BRD2 with role=OBJECT, have an agent where NAME=BRD3 with role=OBJECT, or have an agent where NAME=BRD4 with role=OBJECT) and have type Inhibition.
INFO: [2024-04-25 10:24:14] indra_db_rest.request_logs - Running 0th request for statements
INFO: [2024-04-25 10:24:14] indra_db_rest.request_logs -   LIMIT: None
INFO: [2024-04-25 10:24:14] indra_db_rest.request_logs -   OFFSET: 0
INFO: [2024-04-25 10:24:21] indra_db_rest.request_logs - Running 1st request for statements
INFO: [2024-04-25 10:24:21] indra_db_rest.request_logs -   LIMIT: None
INFO: [2024-04-25 10:24:21] indra_db_rest.request_logs -   OFFSET: 500
INFO: [2024-04-25 10:24:25] indra.assemblers.html.assembler - Removing CHEBI from refs due to too many matches: {'CHEBI:95080', 'CHEBI:137113'}
INFO: [2024-04-25 10:24:25] indra.assemblers.html.assembler - Removing UP from refs due to too many matches: {'Q17103', 'P01106'}
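To check an expectation such as a particular drug appearing upstream of BRD2 without reading the HTML, the returned statements can be grouped by their object. The sketch below assumes each statement exposes `subj` and `obj` agents with a `name` attribute, as INDRA regulation statements such as `Inhibition` do. It uses stand-in objects with toy names so that it runs without INDRA; `controllers_by_target` is a hypothetical helper.

```python
from collections import defaultdict
from types import SimpleNamespace

def controllers_by_target(statements):
    """Map each object name to the sorted names of the subjects acting on it."""
    grouped = defaultdict(set)
    for stmt in statements:
        subj = getattr(stmt, "subj", None)
        obj = getattr(stmt, "obj", None)
        if subj is None or obj is None:
            continue  # skip statements without both a subject and an object
        grouped[obj.name].add(subj.name)
    return {target: sorted(names) for target, names in grouped.items()}

def toy_stmt(subj_name, obj_name):
    """Build a stand-in for a statement; not a real INDRA object."""
    return SimpleNamespace(subj=SimpleNamespace(name=subj_name),
                           obj=SimpleNamespace(name=obj_name))

toy_statements = [
    toy_stmt("dBET6", "BRD2"),
    toy_stmt("dBET6", "BRD4"),
    toy_stmt("drug_X", "BRD4"),  # hypothetical second controller
]
grouped = controllers_by_target(toy_statements)
print(grouped)
```

With real results, `controllers_by_target(p.statements)` could be used in the same way, under the assumption stated above.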