
Explaining Protein-Protein Interactions with Knowledge Graph-based Semantic Similarity

This repository provides the implementation of the method described in the paper: https://www.sciencedirect.com/science/article/pii/S0010482524001604.

Prerequisites

  • Install Python 3.6.8;
  • Install Java JDK 11.0.4;
  • Install the required Python libraries by running: pip install -r req.txt.

KGsim2vec

KGsim2vec is a novel method that generates explainable vector representations of entity pairs in a knowledge graph, supporting learning with minimal performance loss compared to opaque models. The framework first computes the explainable vector representations, then applies machine learning algorithms to train predictive models, and finally generates explanations for the predictions.

(1) Generating Explainable Features

KGsim2vec generates an explainable vector representation of entity pairs based on the semantic similarities between the entities according to different semantic aspects of the ontology, i.e., subgraphs of the ontology at the same depth. The semantic aspects are defined by three parameters (a minimal sketch of the resulting pair vectors follows the list below):

  • alpha is the minimum number of semantic aspects and can be set to manipulate the size, and consequently the level of detail, of the explainable vectors (the default value is 10);
  • beta is the distance to a leaf class and can be set to remove subgraphs of insufficient depth (the default value is 0);
  • gamma is the percentage of entities annotated in the semantic aspects (the default value is 0).
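
The result is one vector per entity pair, where each dimension is the pair's similarity under one semantic aspect. A minimal sketch in Python (not the repository's API; the aspect names and similarity scores below are placeholders):

    # Each dimension of the pair vector is the semantic similarity of the two
    # entities under one semantic aspect (an ontology subgraph), so every
    # feature has a direct ontological meaning. Scores here are placeholders;
    # KGsim2vec computes them with KG-based semantic similarity measures.
    aspects = ["aspect_1", "aspect_2", "aspect_3"]
    similarities = {"aspect_1": 0.81, "aspect_2": 0.12, "aspect_3": 0.47}

    def pair_vector(sims, aspect_order):
        """Assemble the explainable vector for one entity pair."""
        return [sims[a] for a in aspect_order]

    print(pair_vector(similarities, aspects))  # [0.81, 0.12, 0.47]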

(2) Supervised Learning

Four types of ML algorithms are used to learn relation prediction models: decision trees (DT and DT6), genetic programming (GP and GP6x), random forests (RF), and eXtreme gradient boosting (XGB).
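
A hedged sketch of this step, assuming scikit-learn and random toy data in place of the actual pair vectors (the max_depth=6 setting is only a guess at what the DT6 variant denotes):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.random((200, 10))                # 200 pairs x 10 semantic aspects (toy data)
    y = (X.mean(axis=1) > 0.5).astype(int)   # toy interaction labels

    dt = DecisionTreeClassifier(max_depth=6).fit(X, y)   # depth-limited tree (assumed DT6)
    rf = RandomForestClassifier(n_estimators=100).fit(X, y)

    print(dt.predict(X[:5]), rf.predict(X[:5]))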

(3) Generating Explanations

For interpretable models (decision trees and genetic programming), the explanation is the model itself. For the black-box models (random forest and eXtreme gradient boosting), local surrogate models are used to explain individual predictions. We employ two well-known post-hoc explainability methods: LIME (Local Interpretable Model-Agnostic Explanations) and LORE (Local Rule-Based Explanations).
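
An illustrative sketch of the LIME step, assuming the lime package (pip install lime) and the same toy random-forest setup as above; the feature names stand in for the semantic-aspect labels:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    rng = np.random.default_rng(0)
    X = rng.random((200, 10))                # toy per-aspect similarity vectors
    y = (X.mean(axis=1) > 0.5).astype(int)   # toy interaction labels
    rf = RandomForestClassifier(n_estimators=100).fit(X, y)

    explainer = LimeTabularExplainer(
        X,
        feature_names=["aspect_%d" % i for i in range(X.shape[1])],
        class_names=["non-interacting", "interacting"],
        mode="classification",
    )

    # Fit a local surrogate model around one pair and report the aspects
    # that weigh most on its prediction.
    exp = explainer.explain_instance(X[0], rf.predict_proba, num_features=5)
    print(exp.as_list())   # list of (aspect condition, weight) tuples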

(4) Evaluating Explanations

To evaluate the explanations, we considered two aspects: size and informativeness.
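
One plausible reading of the size criterion, sketched below (the paper defines the exact metrics; the explanation tuples here are placeholders for the output of a local explainer):

    # Hypothetical local explanation, in the (condition, weight) form that
    # LIME's as_list() returns; the conditions below are made up.
    explanation = [("aspect_3 > 0.62", 0.21), ("aspect_7 <= 0.30", -0.15)]
    size = len(explanation)   # fewer conditions -> a more concise explanation
    print("explanation size:", size)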

Run KGsim2vec

Run the following command, replacing alpha, gamma, and beta with the parameter values described above:

    python3 run_kgsim2vec.py alpha gamma beta
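
For example, with the default values (alpha = 10, gamma = 0, beta = 0):

    python3 run_kgsim2vec.py 10 0 0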

How to Cite

@article{PMID:38308873,
	Title = {Explaining protein-protein interactions with knowledge graph-based semantic similarity},
	Author = {Sousa, Rita T and Silva, Sara and Pesquita, Catia},
	DOI = {10.1016/j.compbiomed.2024.108076},
	Volume = {170},
	Month = {March},
	Year = {2024},
	Journal = {Computers in Biology and Medicine},
	ISSN = {0010-4825},
	Pages = {108076},
	URL = {https://doi.org/10.1016/j.compbiomed.2024.108076},
}
