## Real-time TranQL Query
This notebook will make a real-time TranQL query and construct a model using the returned knowledge graph.

First, everything needs to be loaded.

In [1]:
%load_ext tranql_jupyter
from build_complex import (
    make_model as make_complex_model,
    get_dataset as make_complex_dataset,
    make_type_predicate_mappings
)

Now, let's make a TranQL query to ROBOKOP about measles.

In [2]:
# disease=measles
kg = %tranql_query SELECT chemical_substance->gene->disease from "/graph/gamma/quick" where disease="MONDO:0004619"

2020-08-05 12:08:29,128 - tranql.tranql_ast - DEBUG - Starting queries on service: http://localhost:8099/graph/gamma/quick (asynchronous=True)
2020-08-05 12:08:46,486 - tranql.tranql_ast - DEBUG - Making requests took 17.354593515396118 s (asynchronous = True)
Query completed with 0 errors.


We can look at the graph a bit to get an idea of its structure and the response in general.

In [3]:
kg.render_force_graph_3d()

Sadly, for this graph, the only edge type that exists between genes and measles (disease) is gene_associated_with_condition. Since there aren't any other predicates that the model can predict between a gene and measles, we can't explore this relationship.

Instead, we'll look at the relationship between a chemical_substance and gene. The chemical_substance of choice isn't significant, so we're going to arbitrarily choose the chemical citral, which interacts with the gene IL2 in this graph.

In [4]:
citral = kg.get_node_by_name("citral")["id"]
il2 = kg.get_node_by_name("IL2")["id"]

# Get all edges going from citral->IL2
citral_il2 = kg.get_edge(citral, il2)
# And from IL2->citral
il2_citral = kg.get_edge(il2, citral)

print([e["type"][0] for e in citral_il2])
print([e["type"][0] for e in il2_citral])

print(citral_il2[0]["predicate_id"])

['increases_secretion_of']
[]
biolink:increases_secretion_of


So, the only edge between them is `citral-[increases_secretion_of]->IL2`. Also, for reference, let's check out what relationship there is between IL2 and measles.

In [5]:
measles = kg.get_node_by_name("measles")["id"]
measles_il2 = kg.get_edge(measles, il2)
il2_measles = kg.get_edge(il2, measles)
print([e["type"][0] for e in measles_il2])
print([e["type"][0] for e in il2_measles])

m_il_edge = measles_il2[0]
il_m_edge = il2_measles[0]
print(m_il_edge["relation_label"])
print(il_m_edge["relation_label"])

['gene_associated_with_condition']
['gene_associated_with_condition']
['gene_involved']
['gene_involved']


Alright, to recap what we've seen so far:
1. We have the chemical citral and the gene IL2
2. Citral increases the secretion of the gene IL2
3. IL2 is involved in measles in some way.

With some quick research, we can gain some insight into how IL2 may be involved in measles. It is worth noting that this topic is complicated and lots of research has been done on it. I'm not knowledgeable in the subject, so I'll try to give as accurate a description as I can without making assumptions.

IL-2 (Interleukin-2) is an interleukin, as its name suggests. Interleukins are a class of cytokines, which have various roles in the immune system. Generally, cytokines act as signaling molecules that assist in cell-to-cell communication. Their larger role within the immune system is to mediate and regulate immunity, inflammation, and hematopoiesis (1).

There have been numerous papers on the subject of cytokines and measles, but without spending a lot more time reading them I can't really say anything beyond that they are correlated in some manner or other.

1. https://www.sinobiological.com/resource/cytokines/cytokine-function#:~:text=Cytokines%20are%20a%20large%20group,regulate%20immunity%2C%20inflammation%20and%20hematopoiesis.

**So**, what we can say is this: IL2 is probably related to the immune response to measles somehow. Citral increases the secretion of IL2, so it *might* affect how the immune system responds to a measles infection. All in all, this isn't very useful information, but it's better than nothing.

#### Now, let's just give it a go. We're going to see what other edges the model thinks could exist between citral and IL2.
It's possible that nothing will be helpful, but at this point I can't readily change what chemical I'm examining because I've already written too much.

We're going to use the same `validate_complex` function that's used in the notebook "Validating Results".

In [6]:
%%capture
dataset = make_complex_dataset(kg)
model = make_complex_model(dataset)

In [7]:
import pandas as pd
from stellargraph.mapper import KGTripleGenerator
def validate_complex(predicates):
    edge_df = pd.DataFrame([
        {
            "source": edge[0],
            "label": edge[1],
            "target": edge[2]
        } for edge in predicates
    ])
    print(edge_df.head(), "\n")

    # Make a data generator for working with triple-based knowledge graph models like ComplEx.
    # https://stellargraph.readthedocs.io/en/stable/api.html#stellargraph.mapper.KGTripleGenerator
    generator = KGTripleGenerator(
        dataset,
        batch_size=10
    )
    # Create a Keras Sequence that yields edges in edge_df
    flow = generator.flow(edge_df)
    # Create predictions
    predictions = model.predict(flow)
    for idx, prediction in enumerate(predictions):
        prediction = prediction[0]
        if prediction <= 0: continue
        source, predicate, target = edge_df.iloc[idx]
        print(f"Prediction of edge {source}-[{predicate}]->{target} existing: {prediction}")
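
For intuition about what these prediction values are: ComplEx, the model family named in the comment above, scores a triple as the real part of a trilinear product of complex-valued embeddings, `Re(sum_k s_k * r_k * conj(o_k))`. The sketch below is not StellarGraph's implementation. The two-dimensional embeddings and the `complex_score` helper are made up purely for illustration. It mainly shows why the direction of an edge matters to the score, which is why both citral->IL2 and IL2->citral get tried later on.

```python
def complex_score(subject, relation, obj):
    """ComplEx-style score: Re(sum_k s_k * r_k * conj(o_k)).

    All three arguments are equal-length lists of Python complex numbers.
    """
    total = sum(s * r * o.conjugate() for s, r, o in zip(subject, relation, obj))
    return total.real


# Hypothetical 2-dimensional embeddings (NOT learned values from this notebook).
citral_emb = [1 + 1j, 0.5 - 0.5j]
il2_emb = [1 - 1j, 0.5 + 0.5j]
# A purely imaginary relation embedding makes the score antisymmetric.
relation_emb = [-1j, 1j]

forward = complex_score(citral_emb, relation_emb, il2_emb)
backward = complex_score(il2_emb, relation_emb, citral_emb)
print("citral -> IL2:", forward)
print("IL2 -> citral:", backward)
```

With these toy values the same relation scores positively in one direction and negatively in the other, so a ComplEx-style model can, in principle, tell `citral-[p]->IL2` apart from `IL2-[p]->citral`.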

Before we can do that though, we first need to find out which predicates are possible between a chemical_substance and a gene, in either direction.

In [8]:
mappings = make_type_predicate_mappings(kg)
chem_gene = mappings["chemical_substance"]["gene"]
gene_chem = mappings["gene"]["chemical_substance"]
print(chem_gene, "\n")
print(gene_chem)

['directly_interacts_with', 'decreases_activity_of', 'decreases_secretion_of', 'increases_activity_of', 'increases_expression_of', 'increases_secretion_of', 'decreases_expression_of', 'affects_secretion_of', 'increases_stability_of', 'decreases_molecular_interaction', 'increases_molecular_interaction', 'decreases_response_to', 'affects_expression_of', 'decreases_stability_of', 'interacts_with', 'increases_localization_of'] 

['decreases_activity_of', 'has_gene_product', 'affects_localization_of', 'increases_response_to', 'decreases_response_to', 'affects_response_to']
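
The source of `make_type_predicate_mappings` isn't shown in this notebook, so the following is only a guess at the kind of lookup it builds, not its actual code. It assumes nodes and edges are plain dicts with `id`, `type`, `source_id`, and `target_id` keys, where `type` is a list. That shape is an assumption rather than the confirmed format of `kg`, and `type_predicate_mappings` is a hypothetical name.

```python
from collections import defaultdict


def type_predicate_mappings(nodes, edges):
    """Map source node type -> target node type -> predicates seen on edges.

    Hypothetical stand-in for make_type_predicate_mappings; the dict keys
    used here are assumed, not taken from the real knowledge graph class.
    """
    node_types = {node["id"]: node["type"] for node in nodes}
    mappings = defaultdict(lambda: defaultdict(list))
    for edge in edges:
        for source_type in node_types[edge["source_id"]]:
            for target_type in node_types[edge["target_id"]]:
                seen = mappings[source_type][target_type]
                for predicate in edge["type"]:
                    if predicate not in seen:  # keep each predicate once
                        seen.append(predicate)
    return mappings


# Toy graph using the identifiers that appear in this notebook.
toy_nodes = [
    {"id": "MESH:C007076", "type": ["chemical_substance"]},
    {"id": "NCBIGene:3558", "type": ["gene"]},
    {"id": "MONDO:0004619", "type": ["disease"]},
]
toy_edges = [
    {"source_id": "MESH:C007076", "target_id": "NCBIGene:3558",
     "type": ["increases_secretion_of"]},
    {"source_id": "NCBIGene:3558", "target_id": "MONDO:0004619",
     "type": ["gene_associated_with_condition"]},
]

toy_mappings = type_predicate_mappings(toy_nodes, toy_edges)
print(toy_mappings["chemical_substance"]["gene"])
print(toy_mappings["gene"]["disease"])
```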


Of these predicates, a few may relate to `citral-[increases_secretion_of]->IL2`:
- `affects_secretion_of` maybe
- `interacts_with` (perhaps `increases_secretion_of` can be considered an interaction)
- `increases_activity_of` (this might be a stretch)

None of the candidates above is a `gene->chemical_substance` predicate, but we'll still try that direction as well, just to see what comes out.

In [9]:
predict_edges = [(citral, predicate, il2) for predicate in chem_gene]
predict_edges += [(il2, predicate, citral) for predicate in gene_chem]
validate_complex(predict_edges)

         source                    label         target
0  MESH:C007076  directly_interacts_with  NCBIGene:3558
1  MESH:C007076    decreases_activity_of  NCBIGene:3558
2  MESH:C007076   decreases_secretion_of  NCBIGene:3558
3  MESH:C007076    increases_activity_of  NCBIGene:3558
4  MESH:C007076  increases_expression_of  NCBIGene:3558 

Prediction of edge MESH:C007076-[increases_secretion_of]->NCBIGene:3558 existing: 2.7210638523101807
Prediction of edge MESH:C007076-[increases_stability_of]->NCBIGene:3558 existing: 0.005242411978542805
Prediction of edge MESH:C007076-[decreases_molecular_interaction]->NCBIGene:3558 existing: 0.0020310436375439167
Prediction of edge MESH:C007076-[interacts_with]->NCBIGene:3558 existing: 0.002701463643461466
Prediction of edge NCBIGene:3558-[decreases_activity_of]->MESH:C007076 existing: 0.004864335060119629
Prediction of edge NCBIGene:3558-[decreases_response_to]->MESH:C007076 existing: 0.4299658536911011
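
One rough way to read these numbers is to squash them through a sigmoid. This assumes the model's outputs are raw logits, as they would be for a model trained with a sigmoid cross-entropy loss. The notebook doesn't confirm how `make_complex_model` is compiled, so treat the resulting values as illustrative rather than as calibrated probabilities. The `sigmoid` helper and the shortened edge labels are mine; the scores are copied from the output above.

```python
import math


def sigmoid(score):
    """Map a raw score to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-score))


# Three of the scores printed above (citral = MESH:C007076, IL2 = NCBIGene:3558).
scores = {
    "citral-[increases_secretion_of]->IL2": 2.7210638523101807,
    "IL2-[decreases_response_to]->citral": 0.4299658536911011,
    "citral-[increases_stability_of]->IL2": 0.005242411978542805,
}

# Print from highest to lowest score.
for edge, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{edge}: score={score:.4f}, sigmoid={sigmoid(score):.3f}")
```

Under that assumption, a score near 0 maps to about 0.5, meaning the model is essentially undecided, while the already-known `increases_secretion_of` edge stands well apart from the rest.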


## Conclusion

### Results
The results could have been better, but these models tend to perform worse as the dataset gets smaller, and our graph has only a few hundred nodes and edges. Also, if you run this yourself, you'll likely see results that differ widely from what we have here. This is because on small datasets such as this, the training set has a huge impact on what the model learns about the graph and how it behaves. Since the training/test/validation sets are random samples of the overall graph, there's a fair chance that in any given run the model won't be shown some crucial features of the graph.
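
To get a feel for how much a random split can matter on a tiny graph, here's a small simulation. It has nothing to do with the actual split logic in `build_complex`, which isn't shown here. The toy edge list, the 70% training fraction, and the `missing_rate` helper are all assumptions made for illustration.

```python
import random


def missing_rate(edges, predicate, train_fraction, trials=1000, seed=0):
    """Fraction of random training splits that contain no edge with `predicate`."""
    rng = random.Random(seed)
    train_size = int(round(len(edges) * train_fraction))
    missing = 0
    for _ in range(trials):
        train = rng.sample(edges, train_size)
        if all(edge[1] != predicate for edge in train):
            missing += 1
    return missing / trials


# Toy graph: 19 edges share a common predicate, and one edge has a rare one.
toy_edges = [(f"chem{i}", "interacts_with", f"gene{i}") for i in range(19)]
toy_edges.append(("chem19", "increases_secretion_of", "gene19"))

rate = missing_rate(toy_edges, "increases_secretion_of", train_fraction=0.7)
print(f"Rare predicate absent from training in {rate:.0%} of splits")
```

In this toy setup the single rare edge lands outside the training set whenever it is one of the 6 held-out edges out of 20, so roughly three runs in ten never see that predicate at all.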

It's not the end of the world that the model doesn't predict any new edges particularly strongly, because there just isn't that much for it to go on.

### Takeaways
The main takeaway from this notebook is not the results, which were arguably destined to be lackluster. Rather, I hope you learned something about the process from this and maybe something about how this model can be used for this kind of thinking.