<H1>KG-IDG dataset</H1>
<p>The <a href="https://druggablegenome.net/">Illuminating Druggable Genome (IDG) Consortium</a> aims to highlight current knowledge of protein targets by integrating informatics tools.</p>
<p><a href="https://academic.oup.com/bioinformatics/article/39/7/btad418/7211646">KG-Hub</a> is a platform that enables standardized construction, exchange, and reuse of knowledge graphs (KGs).
Here, we use the KG-IDG graph built with KG-Hub.</p>
<p>This notebook demonstrates how the KG-IDG dataset is generated using GRAPE. We first load the entire graph,
then keep only its largest connected component and remove its dendritic trees. Because we are interested in predicting drug-protein interactions, we
focus on edges whose source node has one of the types "biolink:ChemicalSubstance", "biolink:ChemicalEntity", or "biolink:Drug" (representing medications) and whose destination node has the type "biolink:Protein". We assign these edges the new edge type "minority_edge" (the name is arbitrary).</p>
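<p>The component-selection step can be illustrated with a small, dependency-free sketch: a breadth-first search that keeps the largest connected component of a toy edge list. This only illustrates the idea and is not how GRAPE implements <code>remove_components(top_k_components=1)</code>; the helper <code>largest_connected_component</code> and the toy graph are hypothetical.</p>

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the node set of the largest connected component.

    `edges` is an iterable of (u, v) pairs; edge direction is ignored and
    only nodes that appear in at least one edge are considered.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        # Keep the first component of maximal size.
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy graph with two components: {a, b, c, d} and {x, y}.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
main = largest_connected_component(toy_edges)
# Keep only the edges whose endpoints both lie in the main component.
kept_edges = [(u, v) for u, v in toy_edges if u in main and v in main]
print(sorted(main))  # ['a', 'b', 'c', 'd']
```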
<p>The notebook then shows the training code, run in SMOKE_TEST mode (the actual analysis was performed using the script TODO).</p>

In [5]:
from grape.datasets.kghub import KGIDG
g = KGIDG(version='20230801')  # Dashboard describing the graph: http://kghub.org/kg-hub-dashboard/
g

In [6]:
main_component = g.remove_components(top_k_components=1)
dense_main_component = main_component.remove_dendritic_trees()
dense_main_component
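GRAPE's `remove_dendritic_trees()` strips tree-like structures that hang off the rest of the graph. As a rough, hypothetical illustration of that idea (not GRAPE's implementation, whose exact definition of a dendritic tree may differ), iteratively pruning leaves removes every tree attached to the cyclic core; the helper `prune_hanging_trees` and the toy graph are made up for this sketch.

```python
from collections import defaultdict


def prune_hanging_trees(edges):
    """Iteratively remove degree-1 nodes (leaves) from an undirected graph.

    Repeating this until no leaves remain strips every tree hanging off the
    cyclic core; the result is the graph's 2-core, returned as a set of
    (u, v) pairs with u < v.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops are ignored in this sketch
            adjacency[u].add(v)
            adjacency[v].add(u)

    leaves = [node for node, neighbours in adjacency.items() if len(neighbours) <= 1]
    while leaves:
        leaf = leaves.pop()
        if leaf not in adjacency:
            continue
        # Detach the leaf; a neighbour that drops to degree 1 becomes a new leaf.
        for neighbour in adjacency.pop(leaf):
            adjacency[neighbour].discard(leaf)
            if len(adjacency[neighbour]) == 1:
                leaves.append(neighbour)
    return {(u, v) for u, neighbours in adjacency.items() for v in neighbours if u < v}


# Hypothetical toy graph: triangle a-b-c with a small tree (d, e, f) hanging off c.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e"), ("d", "f")]
core_edges = prune_hanging_trees(toy_edges)
print(sorted(core_edges))
```

Only the triangle survives: pruning `e` and `f` turns `d` into a leaf, which is then removed as well.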

In [7]:
drug_types = ["biolink:ChemicalSubstance", "biolink:ChemicalEntity", "biolink:Drug"]
# edge_of_interest = ["biolink:interacts_with", "biolink:molecularly_interacts_with", "biolink:physically_interacts_with"]
protein_types = ["biolink:Protein"]
minority_edge_type = 'minority_edge'

dense_main_component.replace_edge_type_name_from_edge_node_type_names_inplace(
    edge_type_name=minority_edge_type,
    source_node_type_names=drug_types,
    destination_node_type_names=protein_types
)
dense_main_component
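The relabelling performed by `replace_edge_type_name_from_edge_node_type_names_inplace` can be pictured on a plain edge list. The sketch below assumes an edge qualifies when any of its source node's types is a drug type and any of its destination node's types is a protein type, and it treats edges as directed; how GRAPE handles multi-typed nodes and undirected edges may differ. The helper `relabel_drug_protein_edges` and the toy node names are hypothetical.

```python
# Drug-like and protein node types, copied from the notebook cell above.
DRUG_TYPES = {"biolink:ChemicalSubstance", "biolink:ChemicalEntity", "biolink:Drug"}
PROTEIN_TYPES = {"biolink:Protein"}


def relabel_drug_protein_edges(edges, node_types, new_edge_type="minority_edge"):
    """Return a copy of `edges` in which drug -> protein edges get `new_edge_type`.

    `edges` holds (source, destination, edge_type) triples and `node_types`
    maps each node name to the set of its node-type names.
    """
    relabelled = []
    for source, destination, edge_type in edges:
        # Non-empty set intersections mean the type condition is met.
        if node_types[source] & DRUG_TYPES and node_types[destination] & PROTEIN_TYPES:
            edge_type = new_edge_type
        relabelled.append((source, destination, edge_type))
    return relabelled


# Hypothetical toy nodes and edges.
toy_node_types = {
    "drug:1": {"biolink:Drug"},
    "protein:1": {"biolink:Protein"},
    "disease:1": {"biolink:Disease"},
}
toy_edges = [
    ("drug:1", "protein:1", "biolink:interacts_with"),
    ("drug:1", "disease:1", "biolink:treats"),
    ("protein:1", "drug:1", "biolink:interacts_with"),
]
for edge in relabel_drug_protein_edges(toy_edges, toy_node_types):
    print(edge)
```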

# Classification
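The following code snippet shows the code used to test UND and DANS; the loop runs the same evaluation twice, toggling the `use_scale_free_distribution` argument of `edge_prediction_evaluation`. Note that here we set `SMOKE_TEST` to `True`, which tests the pipeline without actually running the ML code. For the actual analysis, a comparable script was run with `SMOKE_TEST` set to `False`.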

In [8]:
from tqdm import tqdm
import pandas as pd
from grape.edge_prediction import edge_prediction_evaluation
from grape.edge_prediction import PerceptronEdgePrediction

# Set SMOKE_TEST to True to test the pipeline without running the ML code;
# set it to False for the actual analysis.
SMOKE_TEST = True
NUMBER_OF_HOLDOUTS = 10
VALIDATION_UNBALANCE_RATES = (1.0,)
TRAIN_SIZE = 0.75
# Topological edge features; one perceptron is trained per feature.
EDGE_FEATURES = (
    "Degree",
    "AdamicAdar",
    "JaccardCoefficient",
    "ResourceAllocationIndex",
    "PreferentialAttachment",
)

# Subgraph containing only the drug-protein ("minority_edge") edges.
subgraph = dense_main_component.filter_from_names(
    edge_type_names_to_keep=[minority_edge_type]
)

results = []
# Run the evaluation once with and once without scale-free validation sampling.
for validation_use_scale_free in tqdm(
    (True, False),
    desc="Validation use scale free",
    leave=False,
):
    results.append(edge_prediction_evaluation(
        smoke_test=SMOKE_TEST,
        holdouts_kwargs=dict(
            train_size=TRAIN_SIZE,
            edge_types=[minority_edge_type],
        ),
        evaluation_schema="Connected Monte Carlo",
        graphs=dense_main_component,
        models=[
            PerceptronEdgePrediction(
                edge_features=edge_feature,
                number_of_epochs=1000,
                number_of_edges_per_mini_batch=16,
                learning_rate=0.001,
                # Training-time sampling stays fixed (not scale-free).
                use_scale_free_distribution=False,
            )
            for edge_feature in EDGE_FEATURES
        ],
        # number_of_slurm_nodes=NUMBER_OF_HOLDOUTS,
        enable_cache=True,
        number_of_holdouts=NUMBER_OF_HOLDOUTS,
        use_scale_free_distribution=validation_use_scale_free,
        validation_unbalance_rates=VALIDATION_UNBALANCE_RATES,
        subgraph_of_interest=subgraph,
        use_subgraph_as_support=True,
    ))

results = pd.concat(results)
results.to_csv("kg_idg_negative_select.tsv", sep="\t")
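The loop above evaluates every model twice, with `use_scale_free_distribution` set to `True` and to `False`; we take this flag to switch between sampling negative (non-existing) edges with node probabilities proportional to degree and sampling them uniformly. The sketch below contrasts the two strategies in pure Python, assuming an undirected graph and rejection of self-loops and existing edges. `sample_negative_edges` is a hypothetical helper, not GRAPE's sampler.

```python
import random
from collections import Counter


def sample_negative_edges(edges, number_of_negatives, degree_aware, seed=42):
    """Sample node pairs that are not edges of an undirected graph.

    With degree_aware=False both endpoints are drawn uniformly from the nodes;
    with degree_aware=True each endpoint is drawn with probability proportional
    to its degree. Self-loops and existing edges are rejected.
    """
    degree = Counter()
    existing = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if u != v:
            existing.add(frozenset((u, v)))

    nodes = sorted(degree)
    max_negatives = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    if number_of_negatives > max_negatives:
        raise ValueError("not enough non-edges to sample from")

    weights = [degree[node] for node in nodes] if degree_aware else None
    rng = random.Random(seed)
    negatives = set()
    # Rejection sampling: draw two endpoints until enough valid non-edges exist.
    while len(negatives) < number_of_negatives:
        u, v = rng.choices(nodes, weights=weights, k=2)
        pair = frozenset((u, v))
        if u != v and pair not in existing:
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Hypothetical toy graph: a hub with four neighbours plus a separate edge e-f.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("e", "f")]
for degree_aware in (False, True):
    print(degree_aware, sample_negative_edges(toy_edges, 5, degree_aware))
```

Under uniform sampling most negatives involve low-degree nodes, so a degree-based feature can separate them from true edges almost trivially; degree-aware sampling draws high-degree nodes such as `hub` more often, which is the usual motivation for comparing the two settings.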
