<!--_____
***

<img width='700' src="https://user-images.githubusercontent.com/8030363/108961534-b9a66980-7634-11eb-96e2-cc46589dcb8c.png" style="vertical-align:middle">Application of PheKnowLator to the construction of RNA-centered Knowledge Graph from the integration of -->

# <p style="text-align: center;">Construction of a simplified RNA-centered Knowledge Graph</p>

***
***

**Authors:** [ECavalleri](https://mail.google.com/mail/u/0/?view=cm&fs=1&tf=1&to=emanuele.cavalleri@unimi.it), [TJCallahan](https://mail.google.com/mail/u/0/?view=cm&fs=1&tf=1&to=callahantiff@gmail.com)

**GitHub Repositories:** [testRNA-KG](https://github.com/emanuelecavalleri/testRNA-KG), [PheKnowLator](https://github.com/callahantiff/PheKnowLator/) (PKT)  
<!--- **Release:** **[v2.0.0](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0)** --->
  
<br>  

**Objective:** Knowledge graphs provide meaningful ways to integrate heterogeneous biological data and represent complex biological mechanisms. 
[PheKnowLator](https://github.com/callahantiff/PheKnowLator) is a system that supports the user in acquiring biomedical entities from different kinds of data sources and representing them as a biomedical knowledge graph that models unbiased molecular mechanisms as prescribed by domain ontologies. In this notebook we show how to use PheKnowLator to generate an RNA-based knowledge graph that involves **gene-disease**, **gene-miRNA**, and **miRNA-disease** relationships. These relationships are available from the following public repositories (we reduced the number of rows to speed up PKT's run):
- [gene-disease](https://storage.googleapis.com/pheknowlator/current_build/data/original_data/curated_gene_disease_associations.tsv), provided by the PheKnowLator ecosystem itself
- [gene-miRNA](https://dianalab.e-ce.uth.gr/downloads/tarbase_v8_data.tar.gz), provided by TarBase
- [miRNA-disease](http://watson.compbio.iupui.edu:8080/miR2Disease/download/AllEntries.txt), provided by miR2Disease

The construction of the knowledge graph should be compliant with the following ontologies:
- Mondo Disease Ontology ([Mondo](https://obofoundry.org/ontology/mondo.html))
- Relations Ontology ([RO](https://obofoundry.org/ontology/ro.html))

<a target="_blank" href="https://github.com/emanuelecavalleri/testRNA-KG/assets/33032169/e6228042-137d-4241-8971-e674fe41d4f3"> <img src="https://github.com/emanuelecavalleri/testRNA-KG/assets/33032169/e6228042-137d-4241-8971-e674fe41d4f3"></a> 

(*Click Figure to Enlarge Image in Current Browser Tab*)

<br>

***
***

## Notebook Purpose
**Wiki Page:** **[`Current PheKnowLator release`](https://github.com/callahantiff/PheKnowLator/wiki/)**

<br>

**Purpose:** This notebook serves as a `main` file for RNA-KG construction based on the PheKnowLator project. The script walks through the program step by step and generates the knowledge graph shown above. Please see the [PheKnowLator README](https://github.com/callahantiff/PheKnowLator/blob/master/README.md) for more information.

<br>

**Assumptions:**
1. All downloaded and generated data sources are provided through [this](https://drive.google.com/drive/folders/14oXaqKbH2JkQ_nEqCRWB3T26ESaWoySK) dedicated Google Drive repository.  

2. Make sure that the following input documents have been constructed (see the [PheKnowLator Dependencies Wiki](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) for more information):  
  - [`resource_info.txt`](https://github.com/emanuelecavalleri/testRNA-KG/blob/master/resources/resource_info.txt)
  - [`ontology_source_list.txt`](https://github.com/emanuelecavalleri/testRNA-KG/blob/master/resources/ontology_source_list.txt)
  - [`edge_source_list.txt`](https://github.com/emanuelecavalleri/testRNA-KG/blob/main/resources/edge_source_list.txt)   

3. Download [`RELATIONS_LABELS.txt`](https://github.com/emanuelecavalleri/testRNA-KG/blob/main/resources/relations_data/RELATIONS_LABELS.txt) relations label file prior to running the scripts. Please see [PheKnowLator Dependencies Wiki](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data) for more information on how it is generated.

4. Select a knowledge graph construction method (i.e. `instance-based` or `subclass-based`).  

<br>

***
### Table of Contents
***
The three primary steps involved in building a knowledge graph are `Downloading Data Sources`, `Creating Edge Lists`, and `Building the knowledge graphs`.

* [Data Sources](#data-source)  
* [Create Edge Lists](#create-edges)  
* [Build Knowledge Graph](#build-kg)  

***

***

_____
### Set-Up Environment

In [1]:
# import needed libraries
#import glob
import json
import pandas
import ray

from pkt_kg.downloads import OntData, LinkedData
from pkt_kg.edge_list import CreatesEdgeList
from pkt_kg.knowledge_graph import FullBuild, PartialBuild, PostClosureBuild

***
## Download Data Sources <a class="anchor" id="data-source"></a>

**Wiki Page:** **[`Dependencies`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies)**  

**Purpose:**
This portion of the algorithm covers the following downloads:
1. [Download Ontology Data](#download-ontology-data)  
2. [Download Edge Data](#download-edge-data)   

<br>

**Input Files:**
  - [`resource_info.txt`](https://github.com/emanuelecavalleri/testRNA-KG/blob/master/resources/resource_info.txt)
  - [`ontology_source_list.txt`](https://github.com/emanuelecavalleri/testRNA-KG/blob/master/resources/ontology_source_list.txt)
  - [`edge_source_list.txt`](https://github.com/emanuelecavalleri/testRNA-KG/blob/master/resources/edge_source_list.txt)

<br>

**Assumption:** All sources used to construct our knowledge graph need to be preprocessed and ready to download prior to running this code. All mapping, filtering, and label data have been generated prior to this step. For assistance with creating these datasets, see the [`RNA-KG_Preparation.ipynb`](https://github.com/emanuelecavalleri/testRNA-KG/blob/main/notebooks/RNA-KG_Preparation.ipynb) Jupyter Notebook.

***
***
### Ontology Data  <a class="anchor" id="download-ontology-data"></a>
Ontologies are the core data structure used when building PheKnowLator.

In [2]:
ont = OntData('resources/ontology_source_list.txt', 'resources/resource_info.txt')

ont.parses_resource_file()

In [3]:
ont.data_files = ont.source_list
ont.generates_source_metadata()


*** Generating Metadata ***



100%|██████████| 2/2 [00:00<00:00, 9436.00it/s]
100%|██████████| 2/2 [00:00<00:00, 7660.83it/s]


In [4]:
ont._writes_source_metadata_locally()

100%|██████████| 2/2 [00:00<00:00, 9709.04it/s]


In [5]:
ont.resource_info

['gene-disease|;;|entity-class|RO_0003302|http://www.ncbi.nlm.nih.gov/gene/|http://purl.obolibrary.org/obo/|t|0;4|1:./resources/processed_data/DISEASE_MONDO_MAP.txt|10;>=;1.0|None',
 'gene-miRNA|;;|entity-entity|RO_0002434|http://www.ncbi.nlm.nih.gov/gene/|https://www.mirbase.org/hairpin/|t|0;2|0:./resources/processed_data/ENSEMBL_GENE_ENTREZ_GENE_MAP.txt;1:./resources/processed_data/MIRBASE_ID_ACCESSION_MAP.txt|None|None',
 'miRNA-disease|;;|entity-class|RO_0003302|https://www.mirbase.org/hairpin/|http://purl.obolibrary.org/obo/|t|1;0|0:./resources/processed_data/MIRBASE_ID_ACCESSION_MAP.txt;1:./resources/processed_data/DISEASE_DOID_MONDO_Map.txt|None|None']

<br>

### Edge Data   <a class="anchor" id="download-edge-data"></a>
In PheKnowLator, classes are nodes that originate from ontologies. Class data sources are Linked Data sources that are used to create edges in the knowledge graph and thus can connect to other class data sources. Sometimes we want to add data that is not already part of an ontology. In that case, the data can either be added as an `instance` of an existing ontology class or as its own `owl:Class`, by adding it to the knowledge graph as a `subclass` of an existing `owl:Class`.
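
The two modelling options can be sketched with plain Python tuples standing in for RDF triples. The helper `attach_entity` and the choice of BRAF and `SO_0000704` as inputs are illustrative assumptions; the axioms PheKnowLator actually writes are richer than this simplified picture.

```python
# Illustrative only: plain tuples stand in for RDF triples.
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
RDFS_SUBCLASS_OF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'
OWL_CLASS = 'http://www.w3.org/2002/07/owl#Class'
OWL_NAMED_INDIVIDUAL = 'http://www.w3.org/2002/07/owl#NamedIndividual'


def attach_entity(entity_uri, ontology_class_uri, approach):
    """Return triples linking a non-ontology entity to an ontology class (hypothetical helper)."""
    if approach == 'instance':
        # the entity is an individual that is a member of the ontology class
        return [(entity_uri, RDF_TYPE, OWL_NAMED_INDIVIDUAL),
                (entity_uri, RDF_TYPE, ontology_class_uri)]
    if approach == 'subclass':
        # the entity becomes a class of its own, placed under the ontology class
        return [(entity_uri, RDF_TYPE, OWL_CLASS),
                (entity_uri, RDFS_SUBCLASS_OF, ontology_class_uri)]
    raise ValueError("approach must be 'instance' or 'subclass'")


braf = 'http://www.ncbi.nlm.nih.gov/gene/673'
gene = 'http://purl.obolibrary.org/obo/SO_0000704'
instance_triples = attach_entity(braf, gene, 'instance')
subclass_triples = attach_entity(braf, gene, 'subclass')
```

In both cases the entity ends up connected to the same ontology class; what differs is the predicate that carries the connection.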

In [6]:
edges = LinkedData('resources/edge_source_list.txt', 'resources/resource_info.txt')

edges.parses_resource_file()

In [7]:
edges.data_files = edges.source_list
edges.generates_source_metadata()


*** Generating Metadata ***



100%|██████████| 3/3 [00:00<00:00, 3097.71it/s]
100%|██████████| 3/3 [00:00<00:00, 16090.68it/s]


In [8]:
edges._writes_source_metadata_locally()

100%|██████████| 3/3 [00:00<00:00, 15907.60it/s]


In [9]:
edges.source_list.keys()

dict_keys(['gene-disease', 'miRNA-disease', 'gene-miRNA'])

***

## Create Edge Lists <a class="anchor" id="create-edges"></a>

**Wiki Page:** **[`Data Sources`](https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources)**

<br>

**Purpose:** The code below will take the dictionaries of processed data described above and use them to create edge lists for each of the edge types specified in the [`resource_info.txt`](https://github.com/emanuelecavalleri/testRNA-KG/blob/master/resources/resource_info.txt). Each edge list will be appended to a nested dictionary (see details below).

<br>

**Assumptions:**  
1. All `ontology` and `edge` data sources have been downloaded.   

2. All code in the [`RNA-KG_Preparation.ipynb`](https://github.com/emanuelecavalleri/testRNA-KG/blob/main/notebooks/RNA-KG_Preparation.ipynb) Jupyter Notebook has been run. This Notebook contains code needed to generate all mapping, filtering, and label data.

<br>

**Output:** `Master_Edge_List_Dict.json`. Below is an example of what the `Master Edge Dictionary` contains for each processed resource:  
```python
master_edges = {'miRNA-disease'  :
                {'source_labels'    : ';;',
                 'data_type'        : 'entity-class',
                 'edge_relation'    : 'RO_0003302',
                 'uri'              : ('https://www.mirbase.org/hairpin/',
                                       'http://purl.obolibrary.org/obo/'),
                 'delimiter'        : 't',
                 'column_idx'       : '2;1',
                 'identifier_maps'  : '0:./[...]/MIRBASE_ID_ACCESSION_MAP.txt;1:./[...]/DOID_MONDO_MAP.txt',
                 'evidence_criteria': 'None',
                 'filter_criteria'  : 'None',
                 'edge_list'        : ['...']}}
```
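
Each dictionary entry corresponds to one pipe-delimited row of `resource_info.txt`. The sketch below splits such a row into named fields. The field order is an assumption inferred from the keys of the example dictionary, and `parse_resource_row` is a hypothetical helper, not part of `pkt_kg`.

```python
# Field order is assumed from the Master Edge Dictionary keys shown in this notebook.
FIELDS = ['edge_type', 'source_labels', 'data_type', 'edge_relation',
          'subject_uri', 'object_uri', 'delimiter', 'column_idx',
          'identifier_maps', 'evidence_criteria', 'filter_criteria']


def parse_resource_row(row):
    """Split one resource_info row into a dict of named fields (hypothetical helper)."""
    values = row.strip().split('|')
    if len(values) != len(FIELDS):
        raise ValueError(f'expected {len(FIELDS)} fields, found {len(values)}')
    record = dict(zip(FIELDS, values))
    # '0;2' -> [0, 2]: the columns of the source file that hold the two node identifiers
    record['column_idx'] = [int(i) for i in record['column_idx'].split(';')]
    # '0:path_a;1:path_b' -> {0: 'path_a', 1: 'path_b'}
    maps = {}
    if record['identifier_maps'] != 'None':
        for item in record['identifier_maps'].split(';'):
            idx, _, path = item.partition(':')
            maps[int(idx)] = path
    record['identifier_maps'] = maps
    return record


row = ('gene-miRNA|;;|entity-entity|RO_0002434|http://www.ncbi.nlm.nih.gov/gene/|'
       'https://www.mirbase.org/hairpin/|t|0;2|'
       '0:./resources/processed_data/ENSEMBL_GENE_ENTREZ_GENE_MAP.txt;'
       '1:./resources/processed_data/MIRBASE_ID_ACCESSION_MAP.txt|None|None')
record = parse_resource_row(row)
print(record['edge_type'], record['edge_relation'], record['column_idx'])
```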

***

In [10]:
import psutil
# set-up environment for parallel processing -- even if running program serially these steps are needed
cpus = psutil.cpu_count(logical=True)
ray.init()

2024-02-06 17:11:00,270	INFO worker.py:1625 -- Started a local Ray instance.


Python version: 3.10.12
Ray version: 2.4.0


In [11]:
# combine data sources
combined_edges = dict(edges.data_files, **ont.data_files)
resource_info_loc = './resources/resource_info.txt'

# initialize edge dictionary class
master_edges = CreatesEdgeList(data_files=combined_edges, source_file=resource_info_loc)
master_edges.runs_creates_knowledge_graph_edges(source_file=resource_info_loc, data_files=combined_edges, cpus=cpus)

(CreatesEdgeList pid=2988577) Finished Edge: miRNA-disease (miRNA = 219, disease = 80); 1148 unique edges
(CreatesEdgeList pid=2988578) Finished Edge: gene-miRNA (gene = 10974, miRNA = 646); 37207 unique edges
(CreatesEdgeList pid=2988576) Finished Edge: gene-disease (gene = 3852, disease = 3018); 7663 unique edges


**Preview Master Edge Data**  
Generate a table that includes each `edge-type`, its primary `relation`, example identifiers, and count of unique edges.

In [12]:
master_edges = json.load(open('resources/Master_Edge_List_Dict.json', 'r'))
master_edges.keys()

dict_keys(['gene-miRNA', 'miRNA-disease', 'gene-disease'])

In [13]:
# read in relation data
relation_data = open('./resources/relations_data/RELATIONS_LABELS.txt').readlines()
relation_dict = {x.split('\t')[0]: x.split('\t')[1].strip('\n') for x in relation_data}

# function to return key for any value
def get_key(my_dict, val):
    for key, value in my_dict.items():
        if val == value:
            return key
 
    return "key doesn't exist"

# print basic stats on each resource
edge_data = [[key,
              get_key(relation_dict, 'http://purl.obolibrary.org/obo/'+master_edges[key]['edge_relation']),
              ', '.join(master_edges[key]['edge_list'][0]),
              len(master_edges[key]['edge_list'])]
             for key in master_edges.keys()]

# convert dict to pandas df for nice printing
df = pandas.DataFrame(edge_data, columns = ['Edge Type', 'Relation', 'Example Edge', 'Unique Edges']) 
df                

| | Edge Type | Relation | Example Edge | Unique Edges |
|---|---|---|---|---|
| 0 | gene-miRNA | interacts with | 285704, MI0001446 | 37207 |
| 1 | miRNA-disease | causes or contributes to condition | MI0000066, MONDO_0004967 | 1148 |
| 2 | gene-disease | causes or contributes to condition | 3339, MONDO_0010650 | 7663 |
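
The `get_key` helper scans the whole relation dictionary on every call. A possible alternative, sketched here under the assumption that `RELATIONS_LABELS.txt` holds one tab-separated `label<TAB>URI` pair per line (as the cell above implies), is to invert the dictionary once and then look labels up directly. The two sample lines reuse the relations from the preview table; `relation_label` is a hypothetical name.

```python
# Two in-memory lines stand in for the contents of RELATIONS_LABELS.txt (assumed layout).
relation_lines = [
    'interacts with\thttp://purl.obolibrary.org/obo/RO_0002434\n',
    'causes or contributes to condition\thttp://purl.obolibrary.org/obo/RO_0003302\n',
]

# label -> URI, as built in the cell above
label_to_uri = {}
for line in relation_lines:
    label, uri = line.rstrip('\n').split('\t')[:2]
    label_to_uri[label] = uri

# invert once: URI -> label, so each lookup is a single dictionary access
uri_to_label = {uri: label for label, uri in label_to_uri.items()}


def relation_label(relation_id, default="key doesn't exist"):
    """Return the label of an OBO relation identifier such as 'RO_0002434'."""
    return uri_to_label.get('http://purl.obolibrary.org/obo/' + relation_id, default)


print(relation_label('RO_0002434'))
```

If two labels ever shared a URI, the inverted dictionary would keep only the last one, whereas `get_key` returns the first match.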


<br><br>

***

## Build Knowledge Graph  <a class="anchor" id="build-kg"></a>
**Wiki Pages:**  
- **[`KG-Construction`](https://github.com/callahantiff/PheKnowLator/wiki/KG-Construction)**  
- **[`relations-data`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data)**  
- **[`node-metadata`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata)** 

**Jupyter Notebooks:**  
- [`RNA-KG_Preparation.ipynb`](https://github.com/emanuelecavalleri/testRNA-KG/blob/master/notebooks/RNA-KG_Preparation.ipynb)  
- [`Ontology_Cleaning.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Ontology_Cleaning.ipynb)  


<br>

**Assumptions:**  
- <u>Construction Approach</u>. If using the `subclass-based` construction approach, please make sure that a `pickled` dictionary mapping each non-ontology data node to an existing ontology class is created and added to the `./resources/knowledge_graph` directory (please see [here](https://github.com/callahantiff/PheKnowLator/tree/master/resources/knowledge_graphs#construction-method) for additional information).   
- <u>Relations Data</u>. If inverse relation data is going to be used to build the knowledge graph, please make sure that it has been generated and added to the `./resources/relations_data` directory (please see [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/relations_data/README.md) for additional information).  
- <u>Node Metadata</u>. If node metadata is going to be used to build the knowledge graph, please make sure that it has been generated and added to the `./resources/node_metadata` directory (please see [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/node_data/README.md) for additional information).  
- <u>Decoding OWL Semantics</u>. If decoding OWL semantics, please make sure that a list of `owl:Property` types to keep has been created and added to the `./resources/knowledge_graph` directory (please see [here](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0) for additional information). 

<br>

**Input:** 
- `Master_Edge_List_Dict.json`  
- Directory of relations data sources - see [here](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data) for more information
- Directory of node data sources - see [here](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) for more information

<br>

**Output:** Please see [`Release v2.0.0 Wiki`](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0) for access to all generated output files.   
- `Knowledge Graph` (`.owl` and Networkx MultiDiGraph `.pkl`)  
- `Class Instance URI-UUID Map` (if "instance" construction approach)   
- `Triple List - Integer`  
- `Triple List - Identifier`  
- `Node Integer-Identifier Map`  
- `Node Attribute Data`  

<br>

The process to build the knowledge graph is somewhat time consuming and can be broken into the following steps:  

1. Merge Ontologies. See [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/ontologies/README.md) for additional information on how to preprocess the ontologies prior to merging them.    

2. Create Edges. Add edge lists to merged ontologies.  

3. Add Inverse Relations and Node Data. See the [Dependencies](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) Wiki page for details on how to construct these resources.  

4. Filter OWL Semantics. Filter the knowledge graph with the goal of removing all edges that contain entities that are needed to support owl semantics, but are not biologically meaningful (please see [here](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0) for additional information).

5. Save Edge Lists and Node Metadata. Several versions of the knowledge graph are saved, including: the full knowledge graph (`owl` or Networkx MultiDiGraph `pickle`), triple lists (i.e. integer index and identifier labeled edge lists with a dictionary that maps between the integer indices and node identifiers), and a file of metadata (i.e. identifiers, labels, synonyms, and descriptions) for all nodes in the knowledge graph.  
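
The integer-indexed triple list and its identifier map from step 5 can be pictured with the following minimal sketch. It is not the `pkt_kg` implementation; `index_triples` is a hypothetical helper, and the three triples are assembled from the example edges and URI prefixes shown earlier in this notebook.

```python
def index_triples(triples):
    """Map every node and relation identifier to an integer (hypothetical helper).

    Returns the integer-encoded triples and the identifier -> integer map.
    """
    identifier_to_int = {}
    int_triples = []
    for triple in triples:
        encoded = []
        for term in triple:
            # assign the next free integer the first time an identifier is seen
            if term not in identifier_to_int:
                identifier_to_int[term] = len(identifier_to_int)
            encoded.append(identifier_to_int[term])
        int_triples.append(tuple(encoded))
    return int_triples, identifier_to_int


OBO = 'http://purl.obolibrary.org/obo/'
triples = [
    ('http://www.ncbi.nlm.nih.gov/gene/285704', OBO + 'RO_0002434',
     'https://www.mirbase.org/hairpin/MI0001446'),
    ('http://www.ncbi.nlm.nih.gov/gene/3339', OBO + 'RO_0003302', OBO + 'MONDO_0010650'),
    ('https://www.mirbase.org/hairpin/MI0000066', OBO + 'RO_0003302', OBO + 'MONDO_0004967'),
]
int_triples, identifier_to_int = index_triples(triples)

# the inverse map recovers the identifier-labeled triples from the integer ones
int_to_identifier = {i: ident for ident, i in identifier_to_int.items()}
decoded = [tuple(int_to_identifier[i] for i in t) for t in int_triples]
```

Keeping both the integer list and the map makes the graph compact for downstream tools while staying reversible.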

<br>

**‼ IMPORTANT:**  
- The file containing the merged ontologies is quite large and can take up to 10 minutes to read in.  This is not a limitation of the code directly, but rather a function of the [`RDFLib Library`](https://github.com/RDFLib). While there are other ways to read in this data, we maintain reliance on this library as it is the most user-friendly for non-RDF users.   
- If you'd like to include [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) when building the knowledge graph, please hold off on building the knowledge graph until you have generated the node data. For details on how to do this, see the [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) section of the `Dependencies` Wiki page; for help with generating the data, please see the [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb) Jupyter Notebook.

***


In [14]:
# specify input arguments
build = 'full'
construction_approach = 'instance'
add_node_data_to_kg = 'yes'
add_inverse_relations_to_kg = 'yes'
decode_owl_semantics = 'yes'
kg_directory_location = './resources/knowledge_graphs'

In [15]:
# construct knowledge graphs
if build == 'partial':
    kg = PartialBuild(construction=construction_approach,
                      node_data=add_node_data_to_kg,
                      inverse_relations=add_inverse_relations_to_kg,
                      decode_owl=decode_owl_semantics,
                      cpus=cpus,
                      write_location=kg_directory_location)
elif build == 'post-closure':
    kg = PostClosureBuild(construction=construction_approach,
                          node_data=add_node_data_to_kg,
                          inverse_relations=add_inverse_relations_to_kg,
                          decode_owl=decode_owl_semantics,
                          cpus=cpus,
                          write_location=kg_directory_location)
else:
    kg = FullBuild(construction=construction_approach,
                   node_data=add_node_data_to_kg,
                   inverse_relations=add_inverse_relations_to_kg,
                   decode_owl=decode_owl_semantics,
                   cpus=cpus,
                   write_location=kg_directory_location)

kg.construct_knowledge_graph()
ray.shutdown()


### Starting Knowledge Graph Build: FULL ###
*** Loading Relations Data ***
Loading and Processing Relation Data
*** Loading Merged Ontologies ***
Merged Ontologies Graph Stats: 2411742 triples, 792655 nodes, 115 predicates, 56282 classes, 32 individuals, 730 object props, 217 annotation props
*** Loading Node Metadata Data ***
Loading and Processing Node Metadata

Extracting Class and Relation Metadata


100%|██████████| 48193/48193 [00:04<00:00, 9806.11it/s] 
100%|██████████| 730/730 [00:00<00:00, 12830.33it/s]


*** Splitting Graph ***
Adding Namespace to BNodes
Creating Logic and Annotation Subsets of Graph


100%|██████████| 345239/345239 [05:55<00:00, 972.42it/s] 


Annotation Assertions (n=1809956 Triples)
Creating Logic Graph (n=601786 Triples)


100%|██████████| 601786/601786 [00:09<00:00, 64928.56it/s]


Merged Ontologies - Logic Subset Graph Stats: 601786 triples, 193948 nodes, 39 predicates, 56282 classes, 32 individuals, 730 object props, 217 annotation props

*** Building Knowledge Graph Edges ***
(EdgeConstructor pid=2992086) Created MIRNA-DISEASE (entity-class) Edges: 6180 OWL Edges, 1148 Original Edges; 2600 OWL Nodes, Original Nodes: 219 miRNA(s), 80 disease(s)
(EdgeConstructor pid=2992028) Created GENE-DISEASE (entity-class) Edges: 45351 OWL Edges, 7552 Original Edges; 21871 OWL Nodes, Original Nodes: 3824 gene(s), 2965 disease(s)
(EdgeConstructor pid=2991963) Created GENE-MIRNA (entity-entity) Edges: 246496 OWL Edges, 74414 Original Edges; 86051 OWL Nodes, Original No

100%|██████████| 891219/891219 [00:23<00:00, 37427.68it/s]


Pickling MultiDiGraph
Generating Network Statistics
Full Logic Subset (OWL) Graph Stats: 295467 nodes, 891219 edges, 1 self-loops, 5 most most common edges: http://www.w3.org/1999/02/22-rdf-syntax-ns#type:370467, http://www.w3.org/2000/01/rdf-schema#subClassOf:111493, http://www.w3.org/2002/07/owl#annotatedTarget:75463, http://www.w3.org/2002/07/owl#annotatedSource:75463, http://purl.obolibrary.org/obo/RO_0002434:74414, http://www.w3.org/2002/07/owl#onProperty:44033, average degree 3.0163063895460405, 5 highest degree nodes: http://www.w3.org/2002/07/owl#NamedIndividual:91826, http://www.w3.org/2002/07/owl#Axiom:75463, http://www.w3.org/2002/07/owl#Class:66007, http://www.w3.org/2002/07/owl#Restriction:44033, http://www.w3.org/2000/01/rdf-schema#subClassOf:35160, http://purl.obolibrary.org/obo/SO_0001217:12169, density: 1.0208641229603544e-05, 2 component(s): {0: 295464, 1: '3 nodes: http://purl.obolibrary.org/obo/mondo/releases/2024-01-03/mondo.owl | http://purl.obolibrary.org/obo/mon

  0%|          | 0/17 [00:00<?, ?it/s]

Removing owl:disjointWith Axioms
Filtering Triples


100%|██████████| 601234/601234 [00:15<00:00, 38604.43it/s]


(OwlNets pid=2993639) Decoding 4916 OWL Classes and Axioms
(OwlNets pid=2993712) Filtering Triples
(OwlNets pid=2994845) Decoding 4915 OWL Classes and Axioms [repeated 15x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(OwlNets pid=2994698) Filtering Triples [repeated 14x across cluster]


  6%|▌         | 1/17 [03:44<59:54, 224.67s/it]

Removing owl:disjointWith Axioms
Filtering Triples


100%|██████████| 97668/97668 [00:06<00:00, 14107.83it/s]


(OwlNets pid=2995023) Filtering Triples
(OwlNets pid=2995023) Filtering Triples [repeated 2x across cluster]
(OwlNets pid=2995069) Decoding 727 OWL Classes and Axioms


 12%|█▏        | 2/17 [04:17<27:53, 111.55s/it]

Removing owl:disjointWith Axioms
Filtering Triples


100%|██████████| 18141/18141 [00:01<00:00, 15511.77it/s]


(OwlNets pid=2995904) Filtering Triples [repeated 16x across cluster]
(OwlNets pid=2995904) Decoding 240 OWL Classes and Axioms [repeated 15x across cluster]


 18%|█▊        | 3/17 [04:24<14:55, 63.96s/it] 

Removing owl:disjointWith Axioms
Filtering Triples


100%|██████████| 1668/1668 [00:00<00:00, 14562.92it/s]
 24%|██▎       | 4/17 [04:27<08:38, 39.92s/it]

Removing owl:disjointWith Axioms
Filtering Triples


0it [00:00, ?it/s]


100%|██████████| 17/17 [04:27<00:00, 15.74s/it]


Ensuring OWL-NETS Graph Contains a Single Connected Component
Obtaining node list


100%|██████████| 402464/402464 [00:00<00:00, 1516131.59it/s]


Identifying root nodes


100%|██████████| 57832/57832 [00:16<00:00, 3563.53it/s]


Updating graph connectivity
369 triples added to make connected
Serializing OWL-NETS Graph
Converting Knowledge Graph to MultiDiGraph


100%|██████████| 201601/201601 [00:05<00:00, 38425.32it/s]


Pickling MultiDiGraph
Generating Network Statistics
OWL-NETS Graph Stats: 57833 nodes, 201601 edges, 10 self-loops, 5 most most common edges: http://www.w3.org/2000/01/rdf-schema#subClassOf:87768, http://purl.obolibrary.org/obo/RO_0002434:74414, http://purl.obolibrary.org/obo/RO_0003302:8690, http://purl.obolibrary.org/obo/BFO_0000050:6353, http://purl.obolibrary.org/obo/RO_0004003:4583, http://purl.obolibrary.org/obo/RO_0004026:2707, average degree 3.485916345339166, 5 highest degree nodes: http://purl.obolibrary.org/obo/SO_0001217:12168, http://purl.obolibrary.org/obo/SO_0000704:3848, http://purl.obolibrary.org/obo/MONDO_0003847:2111, https://www.mirbase.org/hairpin/MI0000085:1799, https://www.mirbase.org/hairpin/MI0000268:1379, https://www.mirbase.org/hairpin/MI0000063:1372, density: 6.02766002444869e-05, 1 component(s): {0: 57833}
Purifying Graph Based on Construction Approach
Determining what triples need purification
Processing 87768 http://www.w3.org/2000/01/rdf-schema#subClassO

100%|██████████| 87768/87768 [00:11<00:00, 7331.12it/s] 


Serializing Instance-Purified OWL-NETS Graph
Converting Knowledge Graph to MultiDiGraph


100%|██████████| 345586/345586 [00:08<00:00, 40390.39it/s]


Pickling MultiDiGraph
Generating Network Statistics
Instance-Purified OWL-NETS Graph Stats: 57833 nodes, 345586 edges, 19 self-loops, 5 most most common edges: http://www.w3.org/1999/02/22-rdf-syntax-ns#type:232122, http://purl.obolibrary.org/obo/RO_0002434:74414, http://purl.obolibrary.org/obo/RO_0003302:8690, http://purl.obolibrary.org/obo/BFO_0000050:6353, http://purl.obolibrary.org/obo/RO_0004003:4583, http://purl.obolibrary.org/obo/RO_0004026:2707, average degree 5.975584873688033, 5 highest degree nodes: http://purl.obolibrary.org/obo/MONDO_0000001:12926, http://purl.obolibrary.org/obo/SO_0001217:12171, http://purl.obolibrary.org/obo/MONDO_0700096:8280, http://purl.obolibrary.org/obo/SO_0000110:3864, http://purl.obolibrary.org/obo/SO_0000001:3859, http://purl.obolibrary.org/obo/SO_0001411:3858, density: 0.00010332661629699877, 1 component(s): {0: 57833}


OWL-NETS Graph Stats: 201601 triples, 57833 nodes, 220 predicates, 0 classes, 0 individuals, 0 object props, 0 annotation prop

100%|██████████| 891219/891219 [00:09<00:00, 89999.76it/s]


Writing Class Metadata


100%|██████████| 2/2 [00:10<00:00,  5.37s/it]
100%|██████████| 891219/891219 [00:00<00:00, 970228.61it/s] 
100%|██████████| 295500/295500 [00:02<00:00, 115866.00it/s]



*** Processing OWL-NETS Graph ***
Mapping Node and Relation Identifiers to Integers


100%|██████████| 201601/201601 [00:02<00:00, 100702.94it/s]


Writing Class Metadata


100%|██████████| 2/2 [00:10<00:00,  5.16s/it]
100%|██████████| 201601/201601 [00:00<00:00, 754513.63it/s]
100%|██████████| 58053/58053 [00:00<00:00, 97057.81it/s]



*** Processing Purified OWL-NETS Graph ***
Mapping Node and Relation Identifiers to Integers


100%|██████████| 345586/345586 [00:03<00:00, 99917.30it/s] 


Writing Class Metadata


100%|██████████| 2/2 [00:10<00:00,  5.04s/it]
100%|██████████| 345586/345586 [00:00<00:00, 978883.66it/s]
100%|██████████| 58052/58052 [00:00<00:00, 101427.11it/s]


Depduplicating File: ./resources/knowledge_graphs/PheKnowLator_v3.1.1_full_instance_inverseRelations_OWL_AnnotationsOnly.nt


100%|██████████| 3142946/3142946 [00:05<00:00, 542532.68it/s]


Depduplicating File: ./resources/knowledge_graphs/PheKnowLator_v3.1.1_full_instance_inverseRelations_OWL_LogicOnly.nt


100%|██████████| 1309894/1309894 [00:02<00:00, 553836.18it/s]


Merging Files: ./resources/knowledge_graphs/PheKnowLator_v3.1.1_full_instance_inverseRelations_OWL_AnnotationsOnly.nt and ./resources/knowledge_graphs/PheKnowLator_v3.1.1_full_instance_inverseRelations_OWL_LogicOnly.nt


Loading Full (Logic + Annotation) Graph

Deriving Stats

Processing pkt-namespaced BNodes in Full (Logic + Annotation) graph
Identifying BNodes with Namespace: https://github.com/callahantiff/PheKnowLator/pkt/bnode/
Identifying BNodes
Removing Namespace from BNodes
Finalizing Updated Graph
Identifying BNodes with Namespace: https://github.com/callahantiff/PheKnowLator/pkt/
Identifying BNodes
Removing Namespace from BNodes
Finalizing Updated Graph




Applying OWL API Formatting to Knowledge Graph OWL File


Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/bind/annotation/adapters/HexBinaryAdapter
	at org.openrdf.rio.helpers.RDFParserBase.createBNode(RDFParserBase.java:485)
	at org.openrdf.rio.turtle.TurtleParser.parseNodeID(TurtleParser.java:1149)
	at org.openrdf.rio.turtle.TurtleParser.parseValue(TurtleParser.java:621)
	at org.openrdf.rio.turtle.TurtleParser.parseSubject(TurtleParser.java:448)
	at org.openrdf.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:382)
	at org.openrdf.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:260)
	at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:215)
	at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:159)
	at org.semanticweb.owlapi.rio.RioParserImpl.parseDocumentSource(RioParserImpl.java:257)
	at org.semanticweb.owlapi.rio.RioParserImpl.parse(RioParserImpl.java:191)
	at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:197)
	at uk.ac.manchester.c

None

Full (Logic + Annotation) Graph Stats: 4452840 triples, 1310942 nodes, 118 predicates, 75086 classes, 98932 individuals, 730 object props, 217 annotation props
(OwlNets pid=2996975) Filtering Triples [repeated 31x across cluster]
(OwlNets pid=2996975) Decoding 13 OWL Classes and Axioms [repeated 31x across cluster]


***
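
The network statistics reported in the build log above can be sanity-checked from the node and edge counts alone. The sketch below assumes the usual definitions for a directed graph, namely average degree as edges divided by nodes and density as edges divided by `n * (n - 1)`; these formulas are inferred from the logged numbers rather than taken from the `pkt_kg` source.

```python
def average_degree(n_nodes, n_edges):
    """Edges per node, which is how the logged 'average degree' appears to be computed."""
    return n_edges / n_nodes


def directed_density(n_nodes, n_edges):
    """Fraction of the n * (n - 1) possible directed node pairs covered by the edge count."""
    return n_edges / (n_nodes * (n_nodes - 1))


# (nodes, edges) copied from the build log
logged_graphs = {
    'OWL-NETS': (57833, 201601),
    'Instance-Purified OWL-NETS': (57833, 345586),
}

for name, (n_nodes, n_edges) in logged_graphs.items():
    print(f'{name}: average degree = {average_degree(n_nodes, n_edges):.6f}, '
          f'density = {directed_density(n_nodes, n_edges):.6e}')
```

Both graphs share the same node set, so the roughly 1.7-fold rise in density after purification comes entirely from the additional edges.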

# Visualize the just obtained RNA-Knowledge Graph  <a class="anchor" id="build-kg"></a>
Here we explore the newly generated KG using RDFLib. To begin, we load the OWL KG, serialized as N-Triples, into a Pandas DataFrame.

In [16]:
import pandas as pd
from rdflib import Graph, URIRef, Literal

# Load the N-Triples data into an RDFLib graph
g = Graph()
with open('./resources/knowledge_graphs/PheKnowLator_v3.1.1_full_instance_inverseRelations_OWL.nt', 'r') as f:
    data = f.read()
    g.parse(data=data, format='nt')

triples = []

# Iterate through the RDF graph and extract the triples
for subj, pred, obj in g:
    triples.append((subj, pred, obj))

# Convert the list of triples into a Pandas DataFrame
df = pd.DataFrame(triples, columns=['Subject', 'Predicate', 'Object'])
df

| | Subject | Predicate | Object |
|---|---|---|---|
| 0 | N8cdcd871cd5448678fa8f03f4eb7e81a | http://www.geneontology.org/formats/oboInOwl#s... | MONDO:equivalentTo |
| 1 | Ne8ef1ff876324a4ebcb42220128229a2 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2002/07/owl#Axiom |
| 2 | N3f030c68103c4ace89788e0b81177395 | http://www.w3.org/2002/07/owl#annotatedTarget | bone marrow disease |
| 3 | N9594214d148b467d8e0559c8192fbb07 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2002/07/owl#Axiom |
| 4 | Nad8945466855440e86f902adb227319b | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2002/07/owl#Axiom |
| ... | ... | ... | ... |
| 4452835 | N8f6c883e85254f308ec53c4624200fc9 | http://www.w3.org/2002/07/owl#annotatedProperty | http://www.geneontology.org/formats/oboInOwl#h... |
| 4452836 | http://purl.obolibrary.org/obo/MONDO_0012817 | http://www.geneontology.org/formats/oboInOwl#h... | UMLS:C3489398 |
| 4452837 | Nd604ed23342d446d8ccd1c97ab4453a7 | http://www.w3.org/2002/07/owl#annotatedTarget | prostate cancer, hereditary, 13 |
| 4452838 | Ne5449de1add246fea5138c6257ef112b | http://www.w3.org/1999/02/22-rdf-syntax-ns#rest | http://www.w3.org/1999/02/22-rdf-syntax-ns#nil |


Let us explore the BRAF gene (http://www.ncbi.nlm.nih.gov/gene/673) as an example. BRAF is an oncogene (a gene can become an oncogene through mutation or increased expression) whose encoded protein is a serine/threonine [kinase](https://en.wikipedia.org/wiki/Kinase). The [B-Raf protein](https://www.uniprot.org/uniprotkb/P15056/entry) takes part in the intracellular signaling that directs cell growth. In 2002, BRAF was shown to be mutated in some human cancers (https://doi.org/10.1038/nature00766).

In [69]:
test_673 = df[df['Subject'].str.endswith("gene/673")]
test_673

Unnamed: 0,Subject,Predicate,Object
1333432,http://www.ncbi.nlm.nih.gov/gene/673,http://www.w3.org/2000/01/rdf-schema#subClassOf,http://purl.obolibrary.org/obo/SO_0000704
2413799,http://www.ncbi.nlm.nih.gov/gene/673,http://www.w3.org/2000/01/rdf-schema#label,BRAF
2931877,http://www.ncbi.nlm.nih.gov/gene/673,http://www.w3.org/2000/01/rdf-schema#subClassOf,http://purl.obolibrary.org/obo/SO_0001217
3475653,http://www.ncbi.nlm.nih.gov/gene/673,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.w3.org/2002/07/owl#Class


BRAF *is_a* (http://www.w3.org/2000/01/rdf-schema#subClassOf) gene (http://purl.obolibrary.org/obo/SO_0000704), specifically a protein coding gene (http://purl.obolibrary.org/obo/SO_0001217).

In [70]:
test_SO_0000704 = df[df['Subject'].str.endswith("SO_0000704")]
test_SO_0000704

Unnamed: 0,Subject,Predicate,Object
460764,http://purl.obolibrary.org/obo/SO_0000704,http://www.w3.org/2000/01/rdf-schema#subClassOf,Nbc2d1b16406a470b8d1d7a458e351cd1
654069,http://purl.obolibrary.org/obo/SO_0000704,http://www.w3.org/2000/01/rdf-schema#label,gene
776765,http://purl.obolibrary.org/obo/SO_0000704,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.w3.org/2002/07/owl#Class
2785128,http://purl.obolibrary.org/obo/SO_0000704,http://www.w3.org/2000/01/rdf-schema#subClassOf,N7d95aee901374565b098ec96f6e61f29
3137131,http://purl.obolibrary.org/obo/SO_0000704,http://www.w3.org/2000/01/rdf-schema#subClassOf,http://purl.obolibrary.org/obo/SO_0001411


A gene *is_a* biological region (http://purl.obolibrary.org/obo/SO_0001411).
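
The two lookups above can be chained automatically. The following is a minimal sketch, not part of PheKnowLator: `named_ancestors` is a hypothetical helper, the triples are copied from the outputs above into plain tuples, and any superclass that is not an http(s) URI is assumed to be a blank node and is skipped.

```python
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# rdfs:subClassOf rows copied from the two dataframes above
sample_triples = [
    ("http://www.ncbi.nlm.nih.gov/gene/673", SUBCLASS_OF, "http://purl.obolibrary.org/obo/SO_0000704"),
    ("http://www.ncbi.nlm.nih.gov/gene/673", SUBCLASS_OF, "http://purl.obolibrary.org/obo/SO_0001217"),
    ("http://purl.obolibrary.org/obo/SO_0000704", SUBCLASS_OF, "Nbc2d1b16406a470b8d1d7a458e351cd1"),
    ("http://purl.obolibrary.org/obo/SO_0000704", SUBCLASS_OF, "N7d95aee901374565b098ec96f6e61f29"),
    ("http://purl.obolibrary.org/obo/SO_0000704", SUBCLASS_OF, "http://purl.obolibrary.org/obo/SO_0001411"),
]


def named_ancestors(start, triples):
    """Collect every named (URI) superclass reachable from `start` via rdfs:subClassOf."""
    parents = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents.setdefault(s, []).append(o)

    found, stack = [], [start]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            # Assumption: identifiers that are not http(s) URIs are blank nodes
            if parent.startswith("http") and parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


ancestors = named_ancestors("http://www.ncbi.nlm.nih.gov/gene/673", sample_triples)
print(ancestors)
```

On these sample rows the helper returns SO_0000704 (gene), SO_0001217 (protein coding gene), and SO_0001411 (biological region). On the full dataframe the same idea applies once its rows are passed as tuples, for instance via `df.itertuples(index=False)`.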

Moreover, BRAF is the object of the following triples.

In [71]:
test_673o = df[df['Object'].str.endswith("gene/673")]
test_673o

Unnamed: 0,Subject,Predicate,Object
24734,Nc8f56d6f3ddc4eb2b093e3e2276070c9,http://www.w3.org/2002/07/owl#someValuesFrom,http://www.ncbi.nlm.nih.gov/gene/673
141593,Nc7641b8ae9f74211b44e0b04824d9fd4,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
162570,N2f793adb3e0046da97bbf3bcf1eb7504,http://www.w3.org/2002/07/owl#someValuesFrom,http://www.ncbi.nlm.nih.gov/gene/673
240811,Ne13ef0f5dc1b4f6aa274c0b104d54aeb,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
354718,N946662813cd445219b7085dfad4345da,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
372674,N3d387d80f95c419ba6cda9fc5061ac22,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
425589,Nd2f83a5ab8b040fb91341b18d3bbfb05,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
605378,N88d1eb310e4f4eceb266e31ebfec5d04,http://www.w3.org/2002/07/owl#someValuesFrom,http://www.ncbi.nlm.nih.gov/gene/673
904917,Nc068d080514147389766675a220667cd,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
977060,Ndb58251fee2447a1a3810f87f5cf6dbe,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673


The node identified by the MD5 hash *Ne13ef0f5dc1b4f6aa274c0b104d54aeb* is the subject of the following triples.

In [72]:
test_N7e = df[df['Subject'].str.contains("Ne13ef0f5dc1b4f6aa274c0b104d54aeb")]
test_N7e

Unnamed: 0,Subject,Predicate,Object
240811,Ne13ef0f5dc1b4f6aa274c0b104d54aeb,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
3364139,Ne13ef0f5dc1b4f6aa274c0b104d54aeb,http://purl.obolibrary.org/obo/RO_0003302,Naf981401be4e482897268696873555ff
3686629,Ne13ef0f5dc1b4f6aa274c0b104d54aeb,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.w3.org/2002/07/owl#NamedIndividual


*Ne13ef0f5dc1b4f6aa274c0b104d54aeb* *corresponds to* (http://www.w3.org/1999/02/22-rdf-syntax-ns#type) the BRAF gene (http://www.ncbi.nlm.nih.gov/gene/673) and *causes or contributes to condition* (http://purl.obolibrary.org/obo/RO_0003302) *Naf981401be4e482897268696873555ff*, so we follow the object of the middle row of this dataframe.

In [73]:
test_N4f = df[df['Subject'].str.contains("Naf981401be4e482897268696873555ff")]
test_N4f

Unnamed: 0,Subject,Predicate,Object
1843163,Naf981401be4e482897268696873555ff,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.w3.org/2002/07/owl#NamedIndividual
2888496,Naf981401be4e482897268696873555ff,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://purl.obolibrary.org/obo/MONDO_0009026


[MONDO_0009026](http://purl.obolibrary.org/obo/MONDO_0009026) represents *Costello syndrome*, a rare multisystemic disorder characterized by failure to thrive, short stature, developmental delay or intellectual disability, joint laxity, soft skin, and distinctive facial features.
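
The path followed above (class, anonymous individual of that class, relation, second anonymous individual, class of that individual) can be written as one function. This is a minimal sketch under stated assumptions: `linked_classes` is a hypothetical helper, and the triples are copied from the outputs above into plain tuples rather than read from `df`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NAMED_INDIVIDUAL = "http://www.w3.org/2002/07/owl#NamedIndividual"
BRAF = "http://www.ncbi.nlm.nih.gov/gene/673"
CAUSES = "http://purl.obolibrary.org/obo/RO_0003302"

# Rows copied from the two dataframes above
sample_triples = [
    ("Ne13ef0f5dc1b4f6aa274c0b104d54aeb", RDF_TYPE, BRAF),
    ("Ne13ef0f5dc1b4f6aa274c0b104d54aeb", CAUSES, "Naf981401be4e482897268696873555ff"),
    ("Ne13ef0f5dc1b4f6aa274c0b104d54aeb", RDF_TYPE, NAMED_INDIVIDUAL),
    ("Naf981401be4e482897268696873555ff", RDF_TYPE, NAMED_INDIVIDUAL),
    ("Naf981401be4e482897268696873555ff", RDF_TYPE, "http://purl.obolibrary.org/obo/MONDO_0009026"),
]


def linked_classes(class_uri, predicate, triples):
    """Classes reached from `class_uri` through individual-level `predicate` edges."""
    # Step 1: individuals typed as the starting class
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == class_uri}
    # Step 2: individuals those instances point to through the predicate
    targets = {o for s, p, o in triples if p == predicate and s in instances}
    # Step 3: classes of the target individuals, ignoring owl:NamedIndividual
    classes = {
        o for s, p, o in triples
        if p == RDF_TYPE and s in targets and o != NAMED_INDIVIDUAL
    }
    return sorted(classes)


conditions = linked_classes(BRAF, CAUSES, sample_triples)
print(conditions)
```

For the sample rows the only condition reached is MONDO_0009026. Passing `http://purl.obolibrary.org/obo/RO_0002434` as the predicate would follow *interacts with* edges in the same way.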

Furthermore, let us see whether we can extract information about interacting miRNAs from our subgraph.

In [74]:
df[df['Object'].str.endswith("gene/673")]

Unnamed: 0,Subject,Predicate,Object
24734,Nc8f56d6f3ddc4eb2b093e3e2276070c9,http://www.w3.org/2002/07/owl#someValuesFrom,http://www.ncbi.nlm.nih.gov/gene/673
141593,Nc7641b8ae9f74211b44e0b04824d9fd4,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
162570,N2f793adb3e0046da97bbf3bcf1eb7504,http://www.w3.org/2002/07/owl#someValuesFrom,http://www.ncbi.nlm.nih.gov/gene/673
240811,Ne13ef0f5dc1b4f6aa274c0b104d54aeb,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
354718,N946662813cd445219b7085dfad4345da,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
372674,N3d387d80f95c419ba6cda9fc5061ac22,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
425589,Nd2f83a5ab8b040fb91341b18d3bbfb05,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
605378,N88d1eb310e4f4eceb266e31ebfec5d04,http://www.w3.org/2002/07/owl#someValuesFrom,http://www.ncbi.nlm.nih.gov/gene/673
904917,Nc068d080514147389766675a220667cd,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
977060,Ndb58251fee2447a1a3810f87f5cf6dbe,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673


In [75]:
test_Neaf = df[df['Subject'].str.contains("Nc7641b8ae9f74211b44e0b04824d9fd4")]
test_Neaf

Unnamed: 0,Subject,Predicate,Object
141593,Nc7641b8ae9f74211b44e0b04824d9fd4,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.ncbi.nlm.nih.gov/gene/673
1121231,Nc7641b8ae9f74211b44e0b04824d9fd4,http://purl.obolibrary.org/obo/RO_0002434,N8ac94f9d67a9410e878d9d1a371f7f10
3588168,Nc7641b8ae9f74211b44e0b04824d9fd4,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.w3.org/2002/07/owl#NamedIndividual


We are interested in the second relationship: *interacts with* ([RO_0002434](http://purl.obolibrary.org/obo/RO_0002434)).

In [76]:
test_Nae9 = df[df['Subject'].str.contains("N8ac94f9d67a9410e878d9d1a371f7f10")]
test_Nae9

Unnamed: 0,Subject,Predicate,Object
1856251,N8ac94f9d67a9410e878d9d1a371f7f10,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.w3.org/2002/07/owl#NamedIndividual
2986618,N8ac94f9d67a9410e878d9d1a371f7f10,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,https://www.mirbase.org/hairpin/MI0000433
3809591,N8ac94f9d67a9410e878d9d1a371f7f10,http://purl.obolibrary.org/obo/RO_0002434,Nc7641b8ae9f74211b44e0b04824d9fd4


BRAF interacts with the [MI0000433](https://www.mirbase.org/hairpin/MI0000433) miRNA precursor (aka *hsa-let-7g*).

Finally, we can save this RNA-KG subgraph to visualize it.

In [78]:
test_673['Subject'] = '<' + test_673['Subject'].astype(str) + '>'
test_673['Predicate'] = '<' + test_673['Predicate'].astype(str) + '>'
test_673['Object'] = ['<http://purl.obolibrary.org/obo/SO_0000704>',
                      "BRAF",
                      '<http://purl.obolibrary.org/obo/SO_0001217>',
                      '<http://www.w3.org/2002/07/owl#Class>']

test_673

Unnamed: 0,Subject,Predicate,Object
1333432,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/SO_0000704>
2413799,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://www.w3.org/2000/01/rdf-schema#label>,BRAF
2931877,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/SO_0001217>
3475653,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.w3.org/2002/07/owl#Class>


In [80]:
test_SO_0000704['Subject'] = '<' + test_SO_0000704['Subject'].astype(str) + '>'
test_SO_0000704['Predicate'] = '<' + test_SO_0000704['Predicate'].astype(str) + '>'
test_SO_0000704['Object'] = ['_:Nbc2d1b16406a470b8d1d7a458e351cd1',
                             "gene",
                             '<http://www.w3.org/2002/07/owl#Class>',
                             '_:N7d95aee901374565b098ec96f6e61f29',
                             '<http://purl.obolibrary.org/obo/SO_0001411>'
                             ]
    
test_SO_0000704

Unnamed: 0,Subject,Predicate,Object
460764,<http://purl.obolibrary.org/obo/SO_0000704>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,_:Nbc2d1b16406a470b8d1d7a458e351cd1
654069,<http://purl.obolibrary.org/obo/SO_0000704>,<http://www.w3.org/2000/01/rdf-schema#label>,gene
776765,<http://purl.obolibrary.org/obo/SO_0000704>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.w3.org/2002/07/owl#Class>
2785128,<http://purl.obolibrary.org/obo/SO_0000704>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,_:N7d95aee901374565b098ec96f6e61f29
3137131,<http://purl.obolibrary.org/obo/SO_0000704>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/SO_0001411>


In [82]:
test_673o['Subject'] = '_:' + test_673o['Subject'].astype(str)
test_673o['Predicate'] = '<' + test_673o['Predicate'].astype(str) + '>'
test_673o['Object'] = '<' + test_673o['Object'].astype(str) + '>'

test_673o

Unnamed: 0,Subject,Predicate,Object
24734,_:Nc8f56d6f3ddc4eb2b093e3e2276070c9,<http://www.w3.org/2002/07/owl#someValuesFrom>,<http://www.ncbi.nlm.nih.gov/gene/673>
141593,_:Nc7641b8ae9f74211b44e0b04824d9fd4,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.ncbi.nlm.nih.gov/gene/673>
162570,_:N2f793adb3e0046da97bbf3bcf1eb7504,<http://www.w3.org/2002/07/owl#someValuesFrom>,<http://www.ncbi.nlm.nih.gov/gene/673>
240811,_:Ne13ef0f5dc1b4f6aa274c0b104d54aeb,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.ncbi.nlm.nih.gov/gene/673>
354718,_:N946662813cd445219b7085dfad4345da,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.ncbi.nlm.nih.gov/gene/673>
372674,_:N3d387d80f95c419ba6cda9fc5061ac22,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.ncbi.nlm.nih.gov/gene/673>
425589,_:Nd2f83a5ab8b040fb91341b18d3bbfb05,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.ncbi.nlm.nih.gov/gene/673>
605378,_:N88d1eb310e4f4eceb266e31ebfec5d04,<http://www.w3.org/2002/07/owl#someValuesFrom>,<http://www.ncbi.nlm.nih.gov/gene/673>
904917,_:Nc068d080514147389766675a220667cd,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.ncbi.nlm.nih.gov/gene/673>
977060,_:Ndb58251fee2447a1a3810f87f5cf6dbe,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.ncbi.nlm.nih.gov/gene/673>


In [83]:
test_N7e['Subject'] = '_:' + test_N7e['Subject'].astype(str)
test_N7e['Predicate'] = '<' + test_N7e['Predicate'].astype(str) + '>'
test_N7e['Object'] = ['<http://www.ncbi.nlm.nih.gov/gene/673>',
                      '_:Naf981401be4e482897268696873555ff',
                      '<http://www.w3.org/2002/07/owl#NamedIndividual>']

test_N7e

Unnamed: 0,Subject,Predicate,Object
240811,_:Ne13ef0f5dc1b4f6aa274c0b104d54aeb,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.ncbi.nlm.nih.gov/gene/673>
3364139,_:Ne13ef0f5dc1b4f6aa274c0b104d54aeb,<http://purl.obolibrary.org/obo/RO_0003302>,_:Naf981401be4e482897268696873555ff
3686629,_:Ne13ef0f5dc1b4f6aa274c0b104d54aeb,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.w3.org/2002/07/owl#NamedIndividual>


In [84]:
test_N4f['Subject'] = '_:' + test_N4f['Subject'].astype(str)
test_N4f['Predicate'] = '<' + test_N4f['Predicate'].astype(str) + '>'
test_N4f['Object'] = '<' + test_N4f['Object'].astype(str) + '>'

test_N4f

Unnamed: 0,Subject,Predicate,Object
1843163,_:Naf981401be4e482897268696873555ff,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.w3.org/2002/07/owl#NamedIndividual>
2888496,_:Naf981401be4e482897268696873555ff,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://purl.obolibrary.org/obo/MONDO_0009026>


In [85]:
test_Nae9['Subject'] = '_:' + test_Nae9['Subject'].astype(str)
test_Nae9['Predicate'] = '<' + test_Nae9['Predicate'].astype(str) + '>'
test_Nae9['Object'] = ['<http://www.w3.org/2002/07/owl#NamedIndividual>',
                       '<https://www.mirbase.org/hairpin/MI0000433>',
                       '_:Nc7641b8ae9f74211b44e0b04824d9fd4',
                       ]

test_Nae9

Unnamed: 0,Subject,Predicate,Object
1856251,_:N8ac94f9d67a9410e878d9d1a371f7f10,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://www.w3.org/2002/07/owl#NamedIndividual>
2986618,_:N8ac94f9d67a9410e878d9d1a371f7f10,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<https://www.mirbase.org/hairpin/MI0000433>
3809591,_:N8ac94f9d67a9410e878d9d1a371f7f10,<http://purl.obolibrary.org/obo/RO_0002434>,_:Nc7641b8ae9f74211b44e0b04824d9fd4


In [86]:
subgraph = pd.concat([test_673, test_SO_0000704, test_673o, test_N7e, test_N4f, test_Nae9
          ], axis=0)
subgraph['.']='.'
subgraph.to_csv('./resources/knowledge_graphs/RNAsubgraph.nt', header=None, sep=' ', index=None)
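
The cells above add the `<...>` and `_:` wrappers by hand, one dataframe at a time. As an alternative, the term type can be inferred from the value itself. The sketch below is only an illustration: `to_nt_term` and `to_nt_line` are hypothetical helpers, and the rule that a blank node is "N" followed by 32 hexadecimal digits is an assumption based on the identifiers shown in this notebook. Literals are written in double quotes, as the N-Triples grammar requires.

```python
import re

# Assumption: blank-node identifiers look like the ones in this notebook,
# i.e. "N" followed by 32 lowercase hexadecimal digits.
BNODE_ID = re.compile(r"N[0-9a-f]{32}")


def to_nt_term(value):
    """Format a single value as an N-Triples term (hypothetical helper)."""
    text = str(value)
    if BNODE_ID.fullmatch(text):
        return "_:" + text
    if text.startswith(("http://", "https://")):
        return "<" + text + ">"
    # Everything else is treated as a plain literal and escaped
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_nt_line(subject, predicate, obj):
    """Join three formatted terms and the closing dot into one N-Triples line."""
    return " ".join([to_nt_term(subject), to_nt_term(predicate), to_nt_term(obj), "."])


line = to_nt_line(
    "http://www.ncbi.nlm.nih.gov/gene/673",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "BRAF",
)
print(line)
```

Applied row by row to the unmodified dataframes, this yields lines that can be written to a file directly, without the hard-coded `Object` lists.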

We can copy-paste some triples from *RNAsubgraph.nt* into https://www.ldf.fi/service/rdf-grapher to visualize them graphically.

<a target="_blank" href="https://github.com/emanuelecavalleri/testRNA-KG/assets/33032169/ea813f50-35f4-4242-9cbc-62ffef62bee1"> <img src="https://github.com/emanuelecavalleri/testRNA-KG/assets/33032169/ea813f50-35f4-4242-9cbc-62ffef62bee1"></a> 

(*Click Figure to Enlarge Image in Current Browser Tab*)

https://issemantic.net/rdf-visualizer lets us inspect the graph above interactively (the figure below is a screenshot of its output).

<a target="_blank" href="https://github.com/emanuelecavalleri/testRNA-KG/assets/33032169/9c1a6d89-f553-49f4-a9b9-78cc41b269b2"> <img src="https://github.com/emanuelecavalleri/testRNA-KG/assets/33032169/9c1a6d89-f553-49f4-a9b9-78cc41b269b2"></a> 

(*Click Figure to Enlarge Image in Current Browser Tab*)

***
# OWL-NETS

From a machine learning perspective, extracting information from triples that deviate from the conventional "subject-predicate-object" pattern can be challenging due to the inclusion of biologically non-relevant information. OWL-NETS ([NEtwork Transformation for Statistical learning](https://pubmed.ncbi.nlm.nih.gov/29218876/)) is a computational method that reversibly abstracts OWL-encoded biomedical knowledge into a network representation tailored for network inference. OWL-NETS can leverage existing ontology-based knowledge representations and network inference methods to generate novel, biologically-relevant hypotheses. Further, the lossless transformation of OWL-NETS allows for seamless integration of inferred edges back into the original knowledge base, extending its coverage and completeness. 

In [88]:
# Raw-string regex, so the backslash is not read as a Python string escape
owlnets = pd.read_csv('./resources/knowledge_graphs/PheKnowLator_v3.1.1_full_instance_inverseRelations_OWLNETS.nt', sep=r'\s+',
                      header=None, engine='python')
owlnets

Unnamed: 0,0,1,2,3
0,<http://purl.obolibrary.org/obo/UBERON_0001156>,<http://purl.obolibrary.org/obo/RO_0002150>,<http://purl.obolibrary.org/obo/UBERON_0001157>,.
1,<http://purl.obolibrary.org/obo/MONDO_0800306>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/MONDO_0020074>,.
2,<https://www.mirbase.org/hairpin/MI0003814>,<http://purl.obolibrary.org/obo/RO_0002434>,<http://www.ncbi.nlm.nih.gov/gene/10847>,.
3,<http://purl.obolibrary.org/obo/ENVO_01001548>,<http://purl.obolibrary.org/obo/BFO_0000051>,<http://purl.obolibrary.org/obo/ENVO_01001618>,.
4,<http://purl.obolibrary.org/obo/MONDO_0034186>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/MONDO_0006025>,.
...,...,...,...,...
320117,<http://purl.obolibrary.org/obo/ECTO_0000657>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/ECTO_0000544>,.
320118,<http://purl.obolibrary.org/obo/MONDO_0019226>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/MONDO_0017706>,.
320119,<http://purl.obolibrary.org/obo/CHR_9606-chr6p22>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/GO_0098687>,.
320120,<http://www.ncbi.nlm.nih.gov/gene/26099>,<http://purl.obolibrary.org/obo/RO_0002434>,<https://www.mirbase.org/hairpin/MI0016848>,.


In [89]:
owlnets[owlnets[0].str.endswith("gene/673>")]

Unnamed: 0,0,1,2,3
9537,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/SO_0000704>,.
37003,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://purl.obolibrary.org/obo/RO_0002434>,<https://www.mirbase.org/hairpin/MI0000454>,.
46010,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://purl.obolibrary.org/obo/RO_0002434>,<https://www.mirbase.org/hairpin/MI0000681>,.
47864,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://purl.obolibrary.org/obo/RO_0003302>,<http://purl.obolibrary.org/obo/MONDO_0016689>,.
65374,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,<http://purl.obolibrary.org/obo/SO_0001217>,.
83365,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://purl.obolibrary.org/obo/RO_0003302>,<http://purl.obolibrary.org/obo/MONDO_0007893>,.
114042,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://purl.obolibrary.org/obo/RO_0002434>,<https://www.mirbase.org/hairpin/MI0000095>,.
122435,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://purl.obolibrary.org/obo/RO_0002434>,<https://www.mirbase.org/hairpin/MI0000433>,.
123699,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://purl.obolibrary.org/obo/RO_0003302>,<http://purl.obolibrary.org/obo/MONDO_0018997>,.
125461,<http://www.ncbi.nlm.nih.gov/gene/673>,<http://purl.obolibrary.org/obo/RO_0003302>,<http://purl.obolibrary.org/obo/MONDO_0017844>,.
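
Because every OWL-NETS row is a direct edge between two named entities, the neighbours of a node can be read in a single pass, with no blank nodes to traverse. The sketch below groups the objects of a subject by predicate. It is illustrative only: `neighbours_by_predicate` is a hypothetical helper, and the input lines are a few rows from the outputs above written back as N-Triples.

```python
from collections import defaultdict

# Three BRAF rows and one unrelated row, taken from the outputs above
sample_lines = [
    "<http://www.ncbi.nlm.nih.gov/gene/673> <http://purl.obolibrary.org/obo/RO_0002434> <https://www.mirbase.org/hairpin/MI0000433> .",
    "<http://www.ncbi.nlm.nih.gov/gene/673> <http://purl.obolibrary.org/obo/RO_0002434> <https://www.mirbase.org/hairpin/MI0000454> .",
    "<http://www.ncbi.nlm.nih.gov/gene/673> <http://purl.obolibrary.org/obo/RO_0003302> <http://purl.obolibrary.org/obo/MONDO_0016689> .",
    "<http://www.ncbi.nlm.nih.gov/gene/26099> <http://purl.obolibrary.org/obo/RO_0002434> <https://www.mirbase.org/hairpin/MI0016848> .",
]


def neighbours_by_predicate(lines, subject):
    """Map each predicate to the list of objects it links `subject` to."""
    grouped = defaultdict(list)
    for line in lines:
        # Assumption: every term is a URI without spaces, so a whitespace split is safe
        s, p, o, _dot = line.split()
        if s == subject:
            grouped[p].append(o)
    return dict(grouped)


braf_edges = neighbours_by_predicate(sample_lines, "<http://www.ncbi.nlm.nih.gov/gene/673>")
for predicate, objects in braf_edges.items():
    print(predicate, len(objects))
```

The whitespace split would not be safe on the full OWL graph, where literals such as labels may contain spaces.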


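The OWL-NETS triples contain only URIs, so annotation properties such as labels, definitions, and synonyms no longer appear in them. They are not lost: they are stored separately in a node-labels file.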

In [90]:
properties = pd.read_csv('./resources/knowledge_graphs/PheKnowLator_v3.1.1_full_instance_inverseRelations_OWLNETS_NodeLabels.txt', sep='\t')
properties

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
0,NODES,2482,<http://www.ncbi.nlm.nih.gov/gene/3155>,HMGCL,,
1,NODES,44082,<http://purl.obolibrary.org/obo/UBERON_0007284>,presumptive neural plate,,
2,NODES,58049,<http://purl.obolibrary.org/obo/HP_0002902>,Hyponatremia,,
3,NODES,35762,<http://purl.obolibrary.org/obo/HP_0000090>,Nephronophthisis,,
4,NODES,14617,<http://purl.obolibrary.org/obo/UBERON_0006230>,extrinsic ocular pre-muscle mass,,
...,...,...,...,...,...,...
58048,NODES,46139,<http://purl.obolibrary.org/obo/CL_0009042>,enteroendocrine cell of colon,,
58049,NODES,45797,<http://purl.obolibrary.org/obo/MONDO_0007709>,"hematuria, benign familial",,"hematuria, benign familial|hematuria, familial..."
58050,NODES,54071,<http://purl.obolibrary.org/obo/NCBITaxon_3052...,Orthobunyavirus oropoucheense,,
58051,NODES,30583,<http://purl.obolibrary.org/obo/UBERON_0005203>,trachea gland,,


In [91]:
properties[properties['entity_uri'] == '<http://www.ncbi.nlm.nih.gov/gene/673>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
14952,NODES,5714,<http://www.ncbi.nlm.nih.gov/gene/673>,BRAF,,


In [92]:
properties[properties['entity_uri'] == '<http://www.w3.org/2000/01/rdf-schema#subClassOf>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
46364,RELATIONS,5,<http://www.w3.org/2000/01/rdf-schema#subClassOf>,subClassOf,The subject is a subclass of a class.,


In [93]:
properties[properties['entity_uri'] == '<http://purl.obolibrary.org/obo/RO_0002434>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
50396,RELATIONS,2,<http://purl.obolibrary.org/obo/RO_0002434>,interacts with,A relationship that holds between two entities...,in pairwise interaction with


In [94]:
list(properties[properties['entity_uri'] == '<http://purl.obolibrary.org/obo/RO_0002434>']['description/definition'])

['A relationship that holds between two entities in which the processes executed by the two entities are causally connected.']

In [95]:
properties[properties['entity_uri'] == '<http://purl.obolibrary.org/obo/RO_0003302>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
49417,RELATIONS,34,<http://purl.obolibrary.org/obo/RO_0003302>,causes or contributes to condition,A relationship between an entity (e.g. a genot...,


In [96]:
properties[properties['entity_uri'] == '<https://www.mirbase.org/hairpin/MI0000433>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
48755,NODES,736,<https://www.mirbase.org/hairpin/MI0000433>,hsa-let-7g,let-7g-3p cloned in [2] has a 1 nt 3' extensio...,Homo sapiens let-7g stem-loop


In [97]:
properties[properties['entity_uri'] == '<http://purl.obolibrary.org/obo/MONDO_0009026>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
27859,NODES,5411,<http://purl.obolibrary.org/obo/MONDO_0009026>,Costello syndrome,Costello syndrome (CS) is a rare multisystemic...,Costello syndrome|FCS syndrome|congenital myop...
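
The individual lookups above can be combined to turn OWL-NETS edges into human-readable triples. The following is a minimal sketch, assuming the `entity_uri` and `label` columns shown above. `with_labels` is a hypothetical helper, and the two small dataframes are hand-copied samples (named `sample_*` so they do not overwrite the notebook's own variables). URIs without a known label are left unchanged.

```python
import pandas as pd

# Labels copied from the lookups above
sample_properties = pd.DataFrame(
    [
        ("<http://www.ncbi.nlm.nih.gov/gene/673>", "BRAF"),
        ("<http://purl.obolibrary.org/obo/RO_0002434>", "interacts with"),
        ("<http://purl.obolibrary.org/obo/RO_0003302>", "causes or contributes to condition"),
        ("<https://www.mirbase.org/hairpin/MI0000433>", "hsa-let-7g"),
    ],
    columns=["entity_uri", "label"],
)

# Two BRAF edges copied from the OWL-NETS output
sample_edges = pd.DataFrame(
    [
        ("<http://www.ncbi.nlm.nih.gov/gene/673>",
         "<http://purl.obolibrary.org/obo/RO_0002434>",
         "<https://www.mirbase.org/hairpin/MI0000433>"),
        ("<http://www.ncbi.nlm.nih.gov/gene/673>",
         "<http://purl.obolibrary.org/obo/RO_0003302>",
         "<http://purl.obolibrary.org/obo/MONDO_0016689>"),
    ],
    columns=["subject", "predicate", "object"],
)


def with_labels(edges, properties):
    """Replace each URI by its label; keep the URI when no label is known."""
    uri_to_label = dict(zip(properties["entity_uri"], properties["label"]))
    return edges.apply(lambda column: column.map(uri_to_label).fillna(column))


readable = with_labels(sample_edges, sample_properties)
print(readable)
```

The first row reads "BRAF, interacts with, hsa-let-7g". The second keeps the MONDO URI because its label is not among the sample rows.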
