This notebook demonstrates the basic functionality of DiseaseScope. (Still a work in progress!)

# Introduction

DiseaseScope is a collection of tools for inferring disease-associated genes and pathways, along with their structure. The only prior knowledge required from the user is the Disease Ontology ID (DOID) of the disease of interest. First, DiseaseScope queries publicly available knowledge sources, including the Human Phenotype Ontology (HPO), OMIM, DisGeNET, and the Disease Ontology (DO), to gather a seed set of disease-associated genes.

# Getting Disease Genes and Tissues

In [17]:
%load_ext autoreload
%autoreload 2

from diseasescope import DiseaseScope

# 2841 is the Disease Ontology ID (DOID) for asthma
scope = DiseaseScope(2841, convert_doid=True)
scope

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


DiseaseScope query "asthma (DOID: 2841)" with attributes 

Note that DiseaseScope looks up the Disease Ontology ID to get the English name of the disease. This name is used only by the tissue inference service, if needed. To gather a set of genes, we offer two methods: `biothings`, which queries HPO and DO for relevant genes, and `disgenet`, which uses a pre-aggregated list of disease-associated genes.

In [20]:
scope.get_disease_genes(method="biothings")

INFO:root:Getting HP ids from DOID


http://biothings.io/explorer/api/v2/directinput2output?input_prefix=doid&output_prefix=ncbigene&input_value=2841&format=translator
Returned 99 genes


DiseaseScope query "asthma (DOID: 2841)" with attributes seed genes
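The log above prints the URL that was queried. As a minimal sketch, the same URL can be assembled with the standard library. The endpoint and parameter names are copied from that log line; `build_query_url` is a hypothetical helper, and whether DiseaseScope builds its request this way internally is an assumption. The sketch only constructs the string and sends no request.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are copied from the logged URL above.
BASE_URL = "http://biothings.io/explorer/api/v2/directinput2output"


def build_query_url(doid: int) -> str:
    """Build a DOID -> NCBI Gene query URL (hypothetical helper)."""
    params = {
        "input_prefix": "doid",        # identifier type of the input
        "output_prefix": "ncbigene",   # identifier type of the results
        "input_value": doid,
        "format": "translator",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url(2841)
print(url)
```

Running this for DOID 2841 reproduces the logged URL character for character.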

The genes are available in the `genes` attribute. Here are the first five disease genes, given as Entrez Gene IDs.

In [23]:
scope.genes[:5]

['3565', '222256', '3291', '142', '2068']

To convert between gene identifier types, call the `convert_scope` method on `genes`.

In [27]:
scope.genes.convert_scope("symbol")[:5]

querying 1-99...done.
Finished.


['IL4', 'CDHR3', 'HSD11B2', 'PARP1', 'ERCC2']

If the user already has a query tissue in mind, they can simply assign it to the attribute. Otherwise, we recommend using the "pubmed" method to infer the relevant tissue. This service compares the word embedding vector of the disease to those of a pre-defined set of tissues and returns the tissues that are most similar.

In [30]:
scope.get_disease_tissues(method="pubmed")

DiseaseScope query "asthma (DOID: 2841)" with attributes seed genes, tissues

The tissues can be accessed through the `tissues` attribute, which contains a list of dictionaries, each with three keys: `tissue`, the tissue associated with the disease; `score`, the cosine similarity between the two word embedding vectors; and `percentile`, which compares that similarity to a background of random words in the corpus.

In [31]:
scope.tissues

[{'tissue': 'lung', 'score': '0.3881', 'percentile': '99.9000'},
 {'tissue': 'bronchial_epithelium',
  'score': '0.3592',
  'percentile': '100.0000'},
 {'tissue': 'respiratory_epithelium',
  'score': '0.3238',
  'percentile': '100.0000'},
 {'tissue': 'bronchial_epithelial_cell',
  'score': '0.3162',
  'percentile': '99.8000'},
 {'tissue': 'eosinophil', 'score': '0.3128', 'percentile': '99.8000'}]
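The `score` and `percentile` fields can be illustrated with a small, self-contained sketch. It is not the tissue inference service itself: the three-dimensional vectors are toy values, the background is built from random vectors as a stand-in for "random words in the corpus", and the helper names `cosine_similarity` and `percentile_of` are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def percentile_of(score, background):
    """Percentage of background scores that fall at or below `score`."""
    return 100.0 * sum(b <= score for b in background) / len(background)


# Toy "embeddings" (assumed values); real word vectors are much longer.
disease_vec = [0.9, 0.1, 0.3]
tissue_vecs = {
    "lung": [0.8, 0.2, 0.4],
    "liver": [0.1, 0.9, 0.2],
}

# Background: similarity of the disease vector to random vectors.
rng = random.Random(0)
background = [
    cosine_similarity(disease_vec, [rng.uniform(-1, 1) for _ in range(3)])
    for _ in range(1000)
]

# Score every tissue, then sort from most to least similar.
ranked = []
for name, vec in tissue_vecs.items():
    score = cosine_similarity(disease_vec, vec)
    ranked.append({
        "tissue": name,
        "score": score,
        "percentile": percentile_of(score, background),
    })
ranked.sort(key=lambda row: row["score"], reverse=True)

print(ranked[0]["tissue"])
```

Note that the service output above stores `score` and `percentile` as strings, so they would need to be converted with `float` before numeric comparison.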

# Get Network and Expand

For this example, we will pull a network from NDEx and use it to perform a random walk that expands the seed gene set. (A BigGIM example is coming soon!) This example uses PCNet, a composite network that integrates many different networks into one. While large, it is sparser than very large networks such as STRING and GIANT, and it has been shown to be highly informative for recovering disease genes. See Huang, Carlin, et al., Cell Systems 2018 for more information. The UUID for PCNet is stored as the class attribute `PCNET_UUID` for easy access.

In [47]:
scope.get_network(method="ndex", uuid=scope.PCNET_UUID) 

DiseaseScope query "asthma (DOID: 2841)" with attributes seed genes, genes, tissues, network

In [56]:
scope.genes.convert_scope("symbol", inplace=True)

querying 1-99...done.
Finished.


In [58]:
scope.expand_gene_set(
    method="random walk", 
    alpha=0.56, 
    n=250,
    add_subnetwork=True
)

DiseaseScope query "asthma (DOID: 2841)" with attributes seed genes, genes, tissues, network

In [64]:
(scope
    .network
    .node_table
    .sort_values(by='random walk score', ascending=False)
    .reset_index(drop=True)
    .head())

|   | heat | name | random walk score |
|---|------|------|-------------------|
| 0 | 1 | IL1B | 0.605009 |
| 1 | 1 | TNF | 0.595927 |
| 2 | 1 | CCL2 | 0.595191 |
| 3 | 1 | SCGB1A1 | 0.593805 |
| 4 | 1 | ICAM1 | 0.589855 |


In [63]:
scope.expanded_genes[:5]

['IL1B', 'TNF', 'CCL2', 'SCGB1A1', 'ICAM1']
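To make the expansion step concrete, here is a minimal sketch of a random walk with restart on a toy graph. It is a generic textbook formulation, not DiseaseScope's implementation: the graph, the node labels, and the `restart_prob` parameter are all assumptions, and the notebook does not state how `alpha=0.56` maps onto such a parameter.

```python
def random_walk_with_restart(graph, seeds, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walker that restarts at the seeds.

    graph: dict mapping each node to a list of its neighbors
           (undirected, unweighted, every node has at least one neighbor).
    seeds: set of nodes where the walker starts and restarts.
    """
    nodes = list(graph)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # With probability restart_prob the walker jumps back to a seed...
        nxt = {n: restart_prob * start[n] for n in nodes}
        # ...otherwise it moves to a uniformly chosen neighbor.
        for node in nodes:
            share = (1.0 - restart_prob) * prob[node] / len(graph[node])
            for neighbor in graph[node]:
                nxt[neighbor] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Toy graph with placeholder node labels: a triangle A-B-C plus a tail C-D-E.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

scores = random_walk_with_restart(toy_graph, seeds={"A"})
top_nodes = sorted(scores, key=scores.get, reverse=True)
print(top_nodes)
```

Nodes close to the seed accumulate more probability than distant ones, which is why taking the top `n` nodes by score yields an expanded gene set centered on the seeds.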

Next, add an adjacency matrix and an edge table to the subnetwork so that it can be clustered.

In [77]:
scope.subnetwork.add_adjacency_matrix()
scope.edge_table = scope.subnetwork.add_edge_table()
scope.edge_table.head()

|   | Gene1 | Gene2 | weight |
|---|-------|-------|--------|
| 0 | IL1B | TNF | 1.0 |
| 1 | IL1B | CCL2 | 1.0 |
| 2 | IL1B | ICAM1 | 1.0 |
| 3 | IL1B | MMP9 | 1.0 |
| 4 | IL1B | IL13 | 1.0 |
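The relationship between an edge table and an adjacency matrix can be sketched in plain Python using the five edges shown above. This only illustrates the two representations; how `add_adjacency_matrix` and `add_edge_table` work internally is not shown in this notebook, and treating the network as undirected is an assumption.

```python
# Edge rows copied from the edge table above: (Gene1, Gene2, weight).
edges = [
    ("IL1B", "TNF", 1.0),
    ("IL1B", "CCL2", 1.0),
    ("IL1B", "ICAM1", 1.0),
    ("IL1B", "MMP9", 1.0),
    ("IL1B", "IL13", 1.0),
]

# Give every gene a fixed row/column position, in order of first appearance.
index = {}
for gene1, gene2, _ in edges:
    for gene in (gene1, gene2):
        index.setdefault(gene, len(index))

# Fill a symmetric matrix, treating the network as undirected (an assumption).
size = len(index)
adjacency = [[0.0] * size for _ in range(size)]
for gene1, gene2, weight in edges:
    i, j = index[gene1], index[gene2]
    adjacency[i][j] = weight
    adjacency[j][i] = weight

# Read the upper triangle back into an edge list to confirm the round trip.
genes = list(index)
recovered = [
    (genes[i], genes[j], adjacency[i][j])
    for i in range(size)
    for j in range(i + 1, size)
    if adjacency[i][j] != 0.0
]
print(recovered == edges)
```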


In [79]:
scope.infer_hierarchical_model(
    method='clixo-api', 
    edge_attr="weight",
    method_kwargs={
        "alpha":0.01,
        "beta": 0.5
    }
)

{'parameters': {'alpha': 0.01, 'beta': 0.5, 'hiviewurl': 'http://hiview-test.ucsd.edu', 'interactionfile': 'interactionfile', 'ndexname': 'MyOntology', 'ndexpass': 'ddot_anon', 'ndexserver': 'test.ndexbio.org', 'ndexuser': 'ddot_anon', 'tasktype': 'ddot_ontology', 'uuid': '46715c0f-59fc-4fa5-b54e-50e097df7e04'}, 'result': {'hiviewurl': 'http://hiview-test.ucsd.edu/67437046-aa4c-11e9-a5da-0660b7976219?type=test&server=http://dev2.ndexbio.org', 'ndexurl': 'http://dev2.ndexbio.org/#/network/67437046-aa4c-11e9-a5da-0660b7976219'}, 'status': 'done'}


DiseaseScope query "asthma (DOID: 2841)" with attributes seed genes, genes, tissues, network, hiview_url

In [80]:
scope.hiview_url

'http://hiview-test.ucsd.edu/67437046-aa4c-11e9-a5da-0660b7976219?type=test&server=http://dev2.ndexbio.org'

Note: this example is still being updated. A weighted network will generally produce better results; the edges shown above all have a weight of 1.0.