### Phenotypic Co-occurrence
One measure enabled by the HPOA disease-to-phenotype data is phenotypic co-occurrence.  Phenotypic co-occurrence is of interest because it allows us to hypothesize that there is a dependent relationship between two phenotypes.  Although co-occurrence does not imply causality, it may indicate a causal relationship between two phenotypes, or between each phenotype and a third (or more) latent variable.  For example, _progressive muscle weakness_ causes _falls_.  As an alternative example, a biological process and/or environmental factor may cause allergies and asthma to co-occur.  The latter is of interest in Monarch because we can query biological pathways and processes (GO, Reactome, etc.) related to phenotypes by joining gene-disease-phenotype relationships.  In addition, this type of analysis could be useful for weighting gene-to-phenotype associations where phenotypes co-occur in a Mendelian disease for which we have identified a causal gene-to-disease association.  This analysis could also be used to hypothesize pleiotropic effects.

For simplicity, we will treat all phenotypes and diseases as flat (leaf nodes) in our disease-to-phenotype association data.  In practice we know this is not correct: both phenotype groups and disease groups appear in the association data.

#### Approach
We will query the HPOA association data using the Monarch Neo4J database.  These values may differ from querying the raw dataset due to merging equivalent diseases in MONDO.

This analysis assumes we are starting with a phenotype of interest, although a more comprehensive analysis, such as generating a full co-occurrence matrix, may also be useful.
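As a rough illustration of the more comprehensive option, a co-occurrence matrix can be built by counting every phenotype pair that shares a disease. The sketch below uses a small hand-made `annotations` mapping (toy data, not pulled from HPOA) and treats all terms as flat, as assumed above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical disease -> phenotype annotations (toy data for illustration)
annotations = {
    "disease_a": {"HP:0000426", "HP:0000276", "HP:0000494"},
    "disease_b": {"HP:0000426", "HP:0000276"},
    "disease_c": {"HP:0000494"},
}

# Count each unordered phenotype pair once per disease that carries both
cooccurrence = Counter()
for phenotypes in annotations.values():
    for p1, p2 in combinations(sorted(phenotypes), 2):
        cooccurrence[(p1, p2)] += 1

for pair, disease_count in cooccurrence.most_common():
    print(pair, disease_count)
```

Sorting each phenotype set before pairing keeps the key order stable, so `(p1, p2)` and `(p2, p1)` never end up as separate entries.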


#### About this notebook
This notebook uses _Prominent nasal bridge_ as an example phenotype.  This can be changed in the second cell to analyze different phenotypes.

Dependencies:

* pip install requests
* pip install pandas
* pip install neo4j-driver

In [290]:
import requests

# Query the Monarch database for co-occurrence of prominent nasal bridge and other phenotypes
# This can be performed using a count aggregate function in cypher

phenotype = "HP:0000426" # Prominent nasal bridge

SCIGRAPH = "https://scigraph-data-dev.monarchinitiative.org/scigraph/"
scigraph_exec = SCIGRAPH + "cypher/execute"
scigraph_resolve = SCIGRAPH + "cypher/resolve"

# The WHERE p1 <> p2 clause is needed because clique merging of diseases
# causes duplicate edges; this still needs a proper fix
cypher_query = """
    MATCH (disease:disease)-[:RO:0002200]->(p1:Node{iri:'%s'}),
          (disease)-[:RO:0002200]->(p2:Node)
    WHERE p1 <> p2
    RETURN p2.label as phenotype, COUNT(DISTINCT(disease)) as disease_count
    ORDER BY disease_count DESC
""" % phenotype

params = {
    'cypherQuery': cypher_query,
    'limit': 10
}

scigraph_req = requests.get(scigraph_exec, params=params)
print(scigraph_req.text) # Default format is ascii text table, but we can get back json

+--------------------------------------------------+
| phenotype                        | disease_count |
+--------------------------------------------------+
| "Short stature"                  | 67            |
| "Global developmental delay"     | 67            |
| "Microcephaly"                   | 67            |
| "Intellectual disability"        | 65            |
| "Seizures"                       | 50            |
| "Micrognathia"                   | 48            |
| "Downslanted palpebral fissures" | 46            |
| "High palate"                    | 45            |
| "Cryptorchidism"                 | 44            |
| "Low-set ears"                   | 43            |
+--------------------------------------------------+
10 rows
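For downstream work it can be convenient to turn the printed text table into Python objects. The helper below is a minimal sketch that assumes the layout shown above (`|`-delimited cells, `+---+` borders, a trailing row-count line); `parse_ascii_table` is a name introduced here for illustration and would break on labels that themselves contain `|`.

```python
def parse_ascii_table(text):
    """Parse a '|'-delimited text table into a list of dicts keyed by header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, the row-count footer, and blank lines
        cells = [cell.strip().strip('"') for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Two rows copied from the output above
sample = """\
+--------------------------------------+
| phenotype           | disease_count |
+--------------------------------------+
| "Short stature"     | 67            |
| "Seizures"          | 50            |
+--------------------------------------+
2 rows
"""

table = parse_ascii_table(sample)
counts = {row["phenotype"]: int(row["disease_count"]) for row in table}
print(counts)
```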



#### Normalization
We know that the distribution of phenotypes across diseases is not uniform.  In other words, some phenotypes are annotated to diseases more often than others.  Therefore, we need to normalize this data.  This should not be confused with how frequently patients present with a phenotype, or with expressivity.

Two common methods for normalizing co-occurrence data are Jaccard similarity and Cosine similarity.

We define these two approaches as:

$$Jaccard(P1,P2) = \frac{\mid \ P1 \cap P2 \ \mid}{\mid\  P1 \mid + \mid P2 \mid - \mid P1 \cap P2 \ \mid }$$

Where 
$$ \mid \  P \mid = \text{Number of diseases annotated to phenotype P} $$

For cosine similarity we will use the Ochiai coefficient, defined as:

$$\text{Ochiai coefficient(P1,P2)} = \frac{\mid P1 \cap P2 \mid}{\sqrt{\mid \ P1 \mid \times \mid P2 \ \mid}}$$
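Both coefficients are simple to compute once the three counts are known. The sketch below is a plain Python version of the two formulas; the example counts are the ones reported for _Long face_ in the query results of this notebook (129 diseases for the query phenotype, 117 for _Long face_, 27 shared).

```python
import math


def jaccard(intersection, count1, count2):
    """|P1 n P2| / (|P1| + |P2| - |P1 n P2|)"""
    return intersection / (count1 + count2 - intersection)


def ochiai(intersection, count1, count2):
    """|P1 n P2| / sqrt(|P1| * |P2|)"""
    return intersection / math.sqrt(count1 * count2)


# Counts for "Prominent nasal bridge" vs "Long face" from this notebook
p1_count, p2_count, shared = 129, 117, 27

print(round(jaccard(shared, p1_count, p2_count), 4))
print(round(ochiai(shared, p1_count, p2_count), 4))
```

Because the Ochiai denominator is a geometric mean rather than the size of the union, it penalizes very common phenotypes less severely than Jaccard does, which is one reason the two rankings differ.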

#### Approach
This can be achieved in pure cypher, but we may consider creating a function on the server.


In [291]:
# Normalize with jaccard similarity
cypher_query = """
    MATCH (disease:disease)-[:RO:0002200]->(p1:Node{iri:'%s'}),
          (disease)-[:RO:0002200]->(p2:Phenotype)
    WHERE p2 <> p1
    WITH p2, COUNT(DISTINCT(disease)) as co_count
    MATCH (disease:disease)-[:RO:0002200]->(p1:Node{iri:'%s'})
    WITH COUNT(DISTINCT(disease)) as p1_count, p2, co_count
    MATCH (disease:disease)-[:RO:0002200]->(p2)
    WITH COUNT(DISTINCT(disease)) as p2_count, p1_count, p2, co_count
    RETURN p2.label as phenotype, p1_count, p2_count, co_count as intersection,
           toFloat(co_count)/((p1_count + p2_count)-co_count) as normal_coef
    ORDER BY normal_coef DESC
    """ % (phenotype, phenotype)

params = {
    'cypherQuery': cypher_query,
    'limit': 10
}

scigraph_req = requests.get(scigraph_exec, params=params)
print(scigraph_req.text) # Default format is ascii text table, but we can get back json

+---------------------------------------------------------------------------------------------+
| phenotype                        | p1_count | p2_count | intersection | normal_coef         |
+---------------------------------------------------------------------------------------------+
| "Long face"                      | 129      | 117      | 27           | 0.1232876712328767  |
| "Highly arched eyebrow"          | 129      | 102      | 22           | 0.10526315789473684 |
| "Downslanted palpebral fissures" | 129      | 372      | 46           | 0.1010989010989011  |
| "High palate"                    | 129      | 429      | 45           | 0.08771929824561403 |
| "Thin vermilion border"          | 129      | 110      | 19           | 0.08636363636363636 |
| "Posteriorly rotated ears"       | 129      | 184      | 24           | 0.08304498269896193 |
| "Short philtrum"                 | 129      | 151      | 21           | 0.08108108108108109 |
| "Macrotia"                       | 129

In [292]:
# Normalize with ochiai coefficient
cypher_query = """
    MATCH (disease:disease)-[:RO:0002200]->(p1:Node{iri:'%s'}),
          (disease)-[:RO:0002200]->(p2:Phenotype)
    WHERE p1 <> p2
    WITH DISTINCT p1, p2, disease
    WITH p2, COUNT(DISTINCT(disease)) as co_count
    MATCH (disease:disease)-[:RO:0002200]->(p1:Node{iri:'%s'})
    WITH COUNT(DISTINCT(disease)) as p1_count, p2, co_count
    MATCH (disease:disease)-[:RO:0002200]->(p2)
    WITH COUNT(DISTINCT(disease)) as p2_count, p1_count, p2, co_count
    RETURN p2.label as phenotype, p1_count, p2_count, co_count as intersection,
    toFloat(co_count)/sqrt(p1_count * p2_count) as normal_coef
    ORDER BY normal_coef DESC
    """ % (phenotype, phenotype)

params = {
    'cypherQuery': cypher_query,
    'limit': 10
}

scigraph_req = requests.get(scigraph_exec, params=params)
print(scigraph_req.text) # Default format is ascii text table, but we can get back json

+---------------------------------------------------------------------------------------------+
| phenotype                        | p1_count | p2_count | intersection | normal_coef         |
+---------------------------------------------------------------------------------------------+
| "Long face"                      | 129      | 117      | 27           | 0.21977383072747697 |
| "Downslanted palpebral fissures" | 129      | 372      | 46           | 0.20998656367149773 |
| "Microcephaly"                   | 129      | 941      | 67           | 0.19230259103012684 |
| "Highly arched eyebrow"          | 129      | 102      | 22           | 0.19179078635417854 |
| "High palate"                    | 129      | 429      | 45           | 0.19128856646721754 |
| "Low-set ears"                   | 129      | 488      | 43           | 0.1713814012647042  |
| "Short stature"                  | 129      | 1187     | 67           | 0.17122003759530918 |
| "Ptosis"                         | 129

#### Results
The original results showed several neurological abnormalities: seizures, global developmental delay, and intellectual disability.  In contrast, after normalization the majority of co-occurring phenotypes are morphological abnormalities of the head, with the exception of cryptorchidism.  It should be noted that cryptorchidism and microcephaly are phenotype groups, and considering disease-to-phenotype associations of their subclasses may downweight their co-occurrence.

#### Normalization on phenotypic frequency
For a single disease to phenotype association, the HPOAs provide a frequency field, defined as the frequency of patients that show a particular clinical feature. Examples are Obligate, Frequent, and Occasional.

We can use this data to further weight/normalize phenotypic co-occurrence data.  For example, if two phenotypes both occur "very frequently" in the same disease, this would be weighted higher than if one phenotype occurs very frequently and the other occurs occasionally.  For each disease, we take the intersection to be the minimum of the two phenotypes' frequency weights.  In addition, the total disease count for each phenotype will be adjusted to account for frequency.

As a test we will set the following weights:

| Frequency     | Definition        | Weight |
|:--------------|:------------------|:-------|
| Excluded      | present in 0%     | 0      |
| Very rare     | present in 1-4%   | 0.25   |
| Occasional    | present in 5-29%  | 1.7    |
| Frequent      | present in 30-79% | 5.45   |
| Very frequent | present in 80-99% | 8.95   |
| Obligate      | present in 100%   | 10     |
| Not provided  |                   | 4      |
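The weights in the table are consistent with taking the midpoint of each percent range and dividing by 10; this is an observation about the numbers, not something stated by the data source. The sketch below reproduces them that way and shows the per-disease intersection as the minimum of two weights. `midpoint_weight` and `pair_weight` are names introduced here for illustration.

```python
# Percent ranges taken from the frequency table above
frequency_ranges = {
    "Excluded": (0, 0),
    "Very rare": (1, 4),
    "Occasional": (5, 29),
    "Frequent": (30, 79),
    "Very frequent": (80, 99),
    "Obligate": (100, 100),
}


def midpoint_weight(low, high):
    """Midpoint of a percent range, rescaled to a 0-10 weight."""
    return (low + high) / 2 / 10


weights = {name: midpoint_weight(*bounds) for name, bounds in frequency_ranges.items()}
weights["Not provided"] = 4  # hand-picked in this notebook, not derived from a range


def pair_weight(freq_a, freq_b):
    """Intersection weight for two phenotypes annotated to the same disease."""
    return min(weights[freq_a], weights[freq_b])


print(weights)
print(pair_weight("Very frequent", "Occasional"))
```

Taking the minimum means a pair is only as strong as its rarer member, so one "very frequent" phenotype cannot inflate the score of a partner that is seldom observed.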




In [293]:
import pandas as pd
from neo4j.v1 import GraphDatabase

# Frequency weight map
freq_weights = {
    'HP:0040285': 0,
    'HP:0040284': .25,
    'HP:0040283': 1.7,
    'HP:0040282': 5.45,
    'HP:0040281': 8.95,
    'HP:0040280': 10,
    'unknown':    4 # Not sure how to boost this
}

# Result table
result_table = pd.DataFrame()

# Note this query would be a lot shorter if frequencies were edge properties
cypher_query = """
      MATCH (disease:disease)-[:RO:0002200]->(p1:Node{iri:'%s'}),
            (disease)-[:RO:0002200]->(p2:Phenotype)
      WHERE p1 <> p2
      WITH DISTINCT p1, p2, disease
      OPTIONAL MATCH (assoc:association)-[:OBAN:association_has_subject]->(disease),
            (assoc)-[:OBAN:association_has_object]->(p1),
            (assoc)-[:OBAN:association_has_predicate]->(has_pheno:Node{iri:'http://purl.obolibrary.org/obo/RO_0002200'}),
            (assoc)-[::frequencyOfPhenotype]->(freq_p1)
      WITH p1, p2, disease, freq_p1
      OPTIONAL MATCH (assoc:association)-[:OBAN:association_has_subject]->(disease),
            (assoc)-[:OBAN:association_has_object]->(p2),
            (assoc)-[:OBAN:association_has_predicate]->(has_pheno:Node{iri:'http://purl.obolibrary.org/obo/RO_0002200'}),
            (assoc)-[::frequencyOfPhenotype]->(freq_p2)
      RETURN p1, p2, disease, freq_p1, freq_p2
      """ % phenotype

params = {
    'cypherQuery': cypher_query
}

scigraph_req = requests.get(scigraph_resolve, params=params)
resolved_query = scigraph_req.text # Resolve curies to IRIs

scigraph_bolt = "bolt://neo4j.monarchinitiative.org:443"
driver = GraphDatabase.driver(scigraph_bolt, auth=("neo4j", "password"))

def get_scigraph_results(query):
    with driver.session() as session:
        with session.begin_transaction() as tx:
            for record in tx.run(query):
                yield record

for result in get_scigraph_results(resolved_query):
    row = {}
    row['query_phenotype'] = result['p1']['label']
    row['phenotype'] = result['p2']['label']
    row['disease'] = result['disease']['label']
    row['qphenotype_curie'] = result['p1']['iri'].replace("http://purl.obolibrary.org/obo/HP_", "HP:")
    row['phenotype_curie'] = result['p2']['iri'].replace("http://purl.obolibrary.org/obo/HP_", "HP:")
    row['disease_curie'] = result['disease']['iri'].replace("http://purl.obolibrary.org/obo/MONDO_", "MONDO:")

    if result['freq_p1'] is not None:
        freq_p1_curie = \
            result['freq_p1']['iri'].replace("http://purl.obolibrary.org/obo/HP_", "HP:")
    else:
        freq_p1_curie = 'unknown'
    if result['freq_p2'] is not None:
        freq_p2_curie = \
            result['freq_p2']['iri'].replace("http://purl.obolibrary.org/obo/HP_", "HP:")
    else:
        freq_p2_curie = 'unknown'
        
    row['q_phenotype_frequency'] = freq_weights[freq_p1_curie]
    row['phenotype_frequency'] = freq_weights[freq_p2_curie]
    
    result_table = result_table.append(row, ignore_index=True)
    
result_table.head()

Unnamed: 0,disease,disease_curie,phenotype,phenotype_curie,phenotype_frequency,q_phenotype_frequency,qphenotype_curie,query_phenotype
0,dentinogenesis imperfecta-short stature-hearin...,MONDO:0019102,Cone-shaped epiphysis,HP:0010579,8.95,8.95,HP:0000426,Prominent nasal bridge
1,dentinogenesis imperfecta-short stature-hearin...,MONDO:0019102,"Intellectual disability, mild",HP:0001256,8.95,8.95,HP:0000426,Prominent nasal bridge
2,dentinogenesis imperfecta-short stature-hearin...,MONDO:0019102,Abnormal facial shape,HP:0001999,8.95,8.95,HP:0000426,Prominent nasal bridge
3,dentinogenesis imperfecta-short stature-hearin...,MONDO:0019102,Short philtrum,HP:0000322,8.95,8.95,HP:0000426,Prominent nasal bridge
4,dentinogenesis imperfecta-short stature-hearin...,MONDO:0019102,Sensorineural hearing impairment,HP:0000407,8.95,8.95,HP:0000426,Prominent nasal bridge


In [294]:
aggregate_table = pd.DataFrame()

phenotypes = result_table['phenotype'].unique()

for pheno in phenotypes:
    group_by_pheno = result_table[result_table['phenotype'] == pheno]
    intersection = group_by_pheno.loc[:, ['phenotype_frequency', 'q_phenotype_frequency']].min(axis=1).sum()
    row = {
        'phenotype': pheno,
        'phenotype_curie': group_by_pheno.iloc[0]['phenotype_curie'],
        'intersection': intersection
    }
    aggregate_table = aggregate_table.append(row, ignore_index=True)

aggregate_table.sort_values(by=['intersection'], ascending=False).head(10)
    

Unnamed: 0,intersection,phenotype,phenotype_curie
10,308.85,Global developmental delay,HP:0001263
86,285.9,Microcephaly,HP:0000252
9,285.9,Short stature,HP:0004322
38,275.35,Intellectual disability,HP:0001249
19,213.9,Micrognathia,HP:0000347
101,212.2,Downslanted palpebral fissures,HP:0000494
18,192.5,Cryptorchidism,HP:0000028
17,185.75,High palate,HP:0000218
56,180.3,Low-set ears,HP:0000369
44,176.4,Ptosis,HP:0000508
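The per-phenotype loop above can also be written as a single `groupby`, which avoids growing a DataFrame row by row. This is a sketch on a tiny hand-made table that reuses the column names from `result_table`; the values are made up for illustration.

```python
import pandas as pd

# Toy rows shaped like result_table (one row per disease/phenotype pair)
rows = pd.DataFrame({
    "phenotype": ["Long face", "Long face", "Short philtrum"],
    "phenotype_frequency": [8.95, 1.7, 4.0],
    "q_phenotype_frequency": [5.45, 8.95, 10.0],
})

# Per-row intersection: the smaller of the two frequency weights
rows["pair_weight"] = rows[["phenotype_frequency", "q_phenotype_frequency"]].min(axis=1)

# Sum the per-disease intersections for each co-occurring phenotype
intersection = (
    rows.groupby("phenotype")["pair_weight"]
        .sum()
        .sort_values(ascending=False)
)
print(intersection)
```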


The top ten look similar to our original top ten list. However, we still need to normalize this data.  For the next step we will leverage our solr cache.  Solr/Golr is useful because we can toggle between treating phenotype disease annotations as flat or querying grouping classes (when applicable, such as microcephaly).
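Before normalizing, each phenotype's total disease count has to be weighted in the same way as the intersection. The sketch below shows the arithmetic on a single mock pivot facet: annotations with an explicit frequency term get that term's weight, and the remainder gets the "unknown" weight. The facet layout (`count`, `pivot`, `value`) mirrors what the next cell reads from Solr, but the numbers here are invented.

```python
# Subset of the frequency weight map used in this notebook
freq_weights = {
    'HP:0040283': 1.7,   # Occasional
    'HP:0040281': 8.95,  # Very frequent
    'unknown': 4,        # no frequency provided
}


def weighted_count(facet, weights):
    """Weight a phenotype's disease count by annotation frequency."""
    pivots = facet.get('pivot', [])
    with_frequency = sum(freq['count'] for freq in pivots)
    # Annotations lacking a frequency term fall back to the 'unknown' weight
    total = (facet['count'] - with_frequency) * weights['unknown']
    for freq in pivots:
        total += freq['count'] * weights[freq['value']]
    return total


# Mock facet: 5 annotations, 2 very frequent, 1 occasional, 2 without frequency
mock_facet = {
    'value': 'HP:0000276',
    'count': 5,
    'pivot': [
        {'value': 'HP:0040281', 'count': 2},
        {'value': 'HP:0040283', 'count': 1},
    ],
}

print(weighted_count(mock_facet, freq_weights))
```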

In [295]:
import math

# Pull down whole pivot table
solr = 'https://solr-dev.monarchinitiative.org/solr/golr/select/'
params = {
  "facet.pivot": "object,frequency",
  "fq": [
    "subject_category:disease",
    "object_category:phenotype"
  ],
  "rows": "0",
  "q": "*:*",
  "facet.limit": "12000", # Should get all of HPO
  "f.object_closure.facet.prefix": "HP",
  "facet.method": "enum",
  "facet.mincount": "1",
  "facet": "true",
  "wt": "json",
  "facet.sort": "count"
}

solr_req = requests.get(solr, params=params)
pivot_table = solr_req.json()

aggregate_table['p2_count'] = 0
phenotype_ids = result_table['phenotype_curie'].unique()


def calculate_weighted_frequency(facet):
    count = (int(facet['count']) - sum([freq['count'] for freq in facet['pivot']])) * freq_weights['unknown']
    for freq in facet['pivot']:
        count += (freq['count'] * freq_weights[freq['value']])
    return count

for facet in pivot_table['facet_counts']['facet_pivot']['object,frequency']:
    if facet['value'] == phenotype:
        if 'pivot' in facet:
            count = calculate_weighted_frequency(facet)
        else:
            count = int(facet['count']) * freq_weights['unknown']
        aggregate_table['p1_count'] = count
    elif facet['value'] in phenotype_ids:
        if 'pivot' in facet:
            count = calculate_weighted_frequency(facet)
        else:
            count = int(facet['count']) * freq_weights['unknown']
        aggregate_table.loc[(aggregate_table['phenotype_curie'] == facet['value']), "p2_count"] = count

def calculate_jaccard(intersection, count1, count2):
    return intersection / ((count1 + count2) - intersection)

def calculate_ochiai(intersection, count1, count2):
    return intersection / math.sqrt(count1 * count2)


aggregate_table['jaccard_sim'] = aggregate_table.apply(
        func=lambda row: calculate_jaccard(
                              row['intersection'],
                              row['p1_count'],
                              row['p2_count']),
        axis=1
)

aggregate_table['ochiai_coeff'] = aggregate_table.apply(
        func=lambda row: calculate_ochiai(
                              row['intersection'],
                              row['p1_count'],
                              row['p2_count']),
        axis=1
)

aggregate_table.sort_values(by=['jaccard_sim'], ascending=False).head(15)

Unnamed: 0,intersection,phenotype,phenotype_curie,p2_count,p1_count,jaccard_sim,ochiai_coeff
335,104.45,Long face,HP:0000276,517.25,638.55,0.099348,0.181744
101,212.2,Downslanted palpebral fissures,HP:0000494,1788.05,638.55,0.095827,0.19859
3,101.5,Short philtrum,HP:0000322,708.8,638.55,0.08147,0.150871
160,68.9,Low anterior hairline,HP:0000294,293.85,638.55,0.079792,0.159059
17,185.75,High palate,HP:0000218,1889.0,638.55,0.079319,0.169128
63,88.05,Thin vermilion border,HP:0000233,571.0,638.55,0.078511,0.145819
83,116.15,Macrotia,HP:0000400,970.85,638.55,0.077783,0.147518
57,116.75,Narrow mouth,HP:0000160,1020.75,638.55,0.075686,0.14461
188,76.35,Highly arched eyebrow,HP:0002553,456.3,638.55,0.074963,0.141445
147,99.25,Posteriorly rotated ears,HP:0000358,795.5,638.55,0.074356,0.139256


In [296]:
aggregate_table.sort_values(by=['ochiai_coeff'], ascending=False).head(15)

Unnamed: 0,intersection,phenotype,phenotype_curie,p2_count,p1_count,jaccard_sim,ochiai_coeff
101,212.2,Downslanted palpebral fissures,HP:0000494,1788.05,638.55,0.095827,0.19859
335,104.45,Long face,HP:0000276,517.25,638.55,0.099348,0.181744
86,285.9,Microcephaly,HP:0000252,4470.4,638.55,0.059278,0.169217
17,185.75,High palate,HP:0000218,1889.0,638.55,0.079319,0.169128
160,68.9,Low anterior hairline,HP:0000294,293.85,638.55,0.079792,0.159059
3,101.5,Short philtrum,HP:0000322,708.8,638.55,0.08147,0.150871
113,168.5,Epicanthus,HP:0000286,1977.3,638.55,0.06885,0.149957
49,164.1,Wide nasal bridge,HP:0000431,1885.9,638.55,0.069524,0.149538
56,180.3,Low-set ears,HP:0000369,2288.95,638.55,0.06563,0.149135
10,308.85,Global developmental delay,HP:0001263,6719.8,638.55,0.043812,0.149098


For this example, adding frequency data does not seem to affect the results much, with the exception that _low anterior hairline_ is ranked higher in the frequency-aware lists.

Follow up questions:
- What cutoff is considered significant for grouping phenotypes (or should we generate a matrix and cluster)?
- What genes are associated with this group of phenotypes?
- What processes and pathways are associated with these genes?


In [297]:
# Get the top two hits
top_phenotypes = aggregate_table.sort_values(
    by=['jaccard_sim'],
    ascending=False)['phenotype_curie'][0:2]

# Add query phenotype
pheno_group = top_phenotypes.tolist()
pheno_group.append(phenotype)

def get_solr_docs(solr, params):
    resultCount = params['rows']
    while params['start'] < resultCount:
        solr_request = requests.get(solr, params=params)
        response = solr_request.json()
        resultCount = response['response']['numFound']
        params['start'] += params['rows']
        for doc in response['response']['docs']:
            yield doc
            
# Get intersection of genes
def get_gene_intersect(phenotypes, solr):
    genes = dict()
    is_first = True
    for term in phenotypes:
        params = {
            'wt': 'json',
            'rows': 500,
            'start': 0,
            'q': '*:*',
            'fl': 'subject, subject_label',
            'fq': ['subject_category:"gene"',
                   'object_closure: "{}"'.format(term)
                  ]
        }
        temp_dict = {}
        for doc in get_solr_docs(solr, params):
            temp_dict[doc['subject']] = doc['subject_label']
        
        if is_first:
            genes = temp_dict
        else:
            gene_set = genes.keys() & temp_dict.keys()
            genes = {k: temp_dict[k] for k in gene_set}
        is_first = False
    return genes

genes = get_gene_intersect(pheno_group, solr)

list(genes.values())

['OTUD6B',
 'MED12',
 'TMEM237',
 'CEP152',
 'FBN1',
 'TBX1',
 'CEP290',
 'SNAP29',
 'SIN3A',
 'EBF3',
 'MKS1',
 'HERC1',
 'RECQL4',
 'NPHP1',
 'SATB2',
 'BDH1',
 'KAT6B']

In [330]:
import itertools
# Get all related pathways and GO processes

def get_facet_table(solr, params):
    """
    fetch and convert facet table to dictionary
    """
    results = {}
    solr_request = requests.get(solr, params=params)
    response = solr_request.json()
    for facet in response['facet_counts']['facet_fields'].values():
        temp_dict = dict(itertools.zip_longest(*[iter(facet)] * 2, fillvalue=""))
        results = {**results, **temp_dict}
    return results

# Build a single Solr filter that matches any of the genes
gene_or_filter = " OR ".join(
    'subject_closure:"{}"'.format(gene) for gene in genes.keys()
)

# GO params: RO:0002331 is the 'involved in' relation (processes and pathways)
params = {
    'wt': 'json',
    'rows': 0,
    'q': '*:*',
    'facet': 'true',
    'facet.sort': 'count',
    'facet.method': 'enum',
    'facet.mincount': '1',
    'facet.limit': '3000',
    'facet.field': 'object_label',
    'fl': 'subject, subject_label',
    'fq': [gene_or_filter,
           'relation:"RO:0002331"',
           
          ]
}

go_functions = get_facet_table(solr, params)
go_functions

{'AURKA Activation by TPX2': 4,
 'Adaptive Immune System': 1,
 'Anchoring of the basal body to the plasma membrane': 8,
 'Antigen processing: Ubiquitination & Proteasome degradation': 2,
 'Autophagy - animal': 1,
 'Butanoate metabolism': 1,
 'Cell Cycle': 2,
 'Cell Cycle, Mitotic': 2,
 'Centrosome maturation': 2,
 'Chromatin modifying enzymes': 1,
 'Chromatin organization': 1,
 'Cilium Assembly': 4,
 'Class I MHC mediated antigen processing & presentation': 1,
 'DNA duplex unwinding': 1,
 'DNA repair': 1,
 'DNA replication': 2,
 'DNA strand renaturation': 1,
 'Degradation of the extracellular matrix': 2,
 'Developmental Biology': 1,
 'Elastic fibre formation': 2,
 'Epigenetic regulation of gene expression': 1,
 'Extracellular matrix organization': 1,
 'Factors involved in megakaryocyte development and platelet production': 2,
 'Fatty acid, triacylglycerol, and ketone body metabolism': 3,
 'G2/M Transition': 2,
 'G2/M transition of mitotic cell cycle': 2,
 'Gene Expression': 2,
 'Generi
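Solr returns each facet field as a flat list that alternates term and count, which is why `get_facet_table` pairs the items up. A simpler way to see the same conversion is slicing, sketched below on a mock payload; the three labels and counts are copied from the output above, while the surrounding response shape is an assumption for illustration.

```python
def facet_list_to_dict(flat_facet):
    """Pair up a flat [term, count, term, count, ...] list into {term: count}."""
    return dict(zip(flat_facet[::2], flat_facet[1::2]))


# Mock facet_fields payload in Solr's flat term/count layout
mock_response = {
    "facet_counts": {
        "facet_fields": {
            "object_label": ["Cilium Assembly", 4, "Cell Cycle", 2, "DNA repair", 1]
        }
    }
}

results = {}
for flat_facet in mock_response["facet_counts"]["facet_fields"].values():
    results.update(facet_list_to_dict(flat_facet))

print(results)
```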

In [331]:
# GO params: RO:0002327 is the 'enables' relation (molecular functions)
params = {
    'wt': 'json',
    'rows': 0,
    'q': '*:*',
    'facet': 'true',
    'facet.sort': 'count',
    'facet.method': 'enum',
    'facet.mincount': '1',
    'facet.limit': '3000',
    'facet.field': 'object_label',
    'fl': 'subject, subject_label',
    'fq': [gene_or_filter,
           'relation:"RO:0002327"',
           
          ]
}

go_process = get_facet_table(solr, params)
go_process

{'3-hydroxybutyrate dehydrogenase activity': 1,
 'ARF guanyl-nucleotide exchange factor activity': 1,
 'ATP binding': 1,
 "ATP-dependent 3'-5' DNA helicase activity": 1,
 'DNA binding': 1,
 'DNA binding transcription factor activity': 1,
 'RNA binding': 1,
 'RNA polymerase II activating transcription factor binding': 1,
 'RNA polymerase II distal enhancer sequence-specific DNA binding': 1,
 'RNA polymerase II proximal promoter sequence-specific DNA binding': 2,
 'RNA polymerase II repressing transcription factor binding': 1,
 'RNA polymerase II transcription coactivator activity': 1,
 'RNA polymerase II transcription cofactor activity': 1,
 'RNA polymerase II transcription corepressor activity': 1,
 'SNAP receptor activity': 1,
 'acetyltransferase activity': 1,
 'annealing helicase activity': 1,
 'beta-catenin binding': 1,
 'bubble DNA binding': 1,
 'calcium ion binding': 1,
 'chromatin binding': 3,
 'extracellular matrix constituent conferring elasticity': 1,
 'extracellular matrix st

In [332]:
# pathway params
params = {
    'wt': 'json',
    'rows': 0,
    'q': '*:*',
    'facet': 'true',
    'facet.sort': 'count',
    'facet.method': 'enum',
    'facet.mincount': '1',
    'facet.limit': '3000',
    'facet.field': 'object_label',
    'fl': 'subject, subject_label',
    'fq': [gene_or_filter,
           'object_category:"pathway"',
           
          ]
}

pathways = get_facet_table(solr, params)
pathways

{'AURKA Activation by TPX2': 2,
 'Adaptive Immune System': 1,
 'Anchoring of the basal body to the plasma membrane': 4,
 'Antigen processing: Ubiquitination & Proteasome degradation': 1,
 'Autophagy - animal': 1,
 'Butanoate metabolism': 1,
 'Cell Cycle': 2,
 'Cell Cycle, Mitotic': 2,
 'Centrosome maturation': 2,
 'Chromatin modifying enzymes': 1,
 'Chromatin organization': 1,
 'Cilium Assembly': 4,
 'Class I MHC mediated antigen processing & presentation': 1,
 'Degradation of the extracellular matrix': 1,
 'Developmental Biology': 1,
 'Elastic fibre formation': 1,
 'Epigenetic regulation of gene expression': 1,
 'Extracellular matrix organization': 1,
 'Factors involved in megakaryocyte development and platelet production': 1,
 'Fatty acid, triacylglycerol, and ketone body metabolism': 3,
 'G2/M Transition': 2,
 'Gene Expression': 2,
 'Generic Transcription Pathway': 1,
 'HATs acetylate histones': 1,
 "Hedgehog 'off' state": 1,
 'Hemostasis': 1,
 "Huntington's disease": 1,
 'Immune Sy

In [None]:
# TODO GO term enrichment with ontobio