_____
***

<img width='700' src="https://user-images.githubusercontent.com/8030363/108961534-b9a66980-7634-11eb-96e2-cc46589dcb8c.png" style="vertical-align:middle">

***
***

**Author:** [TJCallahan](https://mail.google.com/mail/u/0/?view=cm&fs=1&tf=1&to=callahantiff@gmail.com)  
**GitHub Repository:** [PheKnowLator](https://github.com/callahantiff/PheKnowLator/wiki)  
**Current Release:** **[`v2.0.0`](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0)**

<br>

**Objective:** Knowledge graphs provide meaningful ways to integrate heterogeneous biological data and represent complex biological mechanisms. This work seeks to explore the utility of incorporating existing knowledge of molecular mechanism from ontologies, publicly available data, and the literature to construct a biomedical knowledge graph that models unbiased molecular mechanisms of human disease.

<a target="_blank" href="https://user-images.githubusercontent.com/8030363/103158881-11813b00-4780-11eb-8b45-5063765e7645.png"> <img src="https://user-images.githubusercontent.com/8030363/103158881-11813b00-4780-11eb-8b45-5063765e7645.png"></a> 

(*Click Figure to Enlarge Image in Current Browser Tab*)

<br>

***
***

## Notebook Purpose
**Wiki Page:** **[`Release v2.0.0`](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0)**

<br>

**Purpose:** This notebook serves as a `main` file for the PheKnowLator project. It walks through the program step by step and generates the knowledge graph shown above. There is also a command line version of this file ([`main.py`](https://github.com/callahantiff/PheKnowLator/blob/master/main.py)). Please see the [README](https://github.com/callahantiff/PheKnowLator/blob/master/README.md) for more information.

<br>

**Assumptions:**     
1. Make sure that the following input documents have been constructed (see the [Dependencies Wiki](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) for more information):  
  - [`resource_info.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/resource_info.txt)
  - [`ontology_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/ontology_source_list.txt)
  - [`edge_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/edge_source_list.txt)   

2. Prepare [relations](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data) and [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) files prior to running the scripts.  

3. Select a knowledge graph build type (i.e. `full`, `partial`, or `post-closure`) and construction method (i.e. `instance-based` or `subclass-based`).  
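
Before running the cells below, it can help to confirm that the three input documents from assumption 1 are present. The following is a minimal sketch and not part of the PheKnowLator codebase: the helper name `find_missing_inputs` is hypothetical, and the paths assume the notebook is run from the repository root.

```python
from pathlib import Path

# Input documents named in the assumptions above
REQUIRED_INPUTS = (
    'resources/resource_info.txt',
    'resources/ontology_source_list.txt',
    'resources/edge_source_list.txt',
)


def find_missing_inputs(root='.', required=REQUIRED_INPUTS):
    """Return the required input files that do not exist under `root`."""
    root_path = Path(root)
    return [name for name in required if not (root_path / name).is_file()]


missing = find_missing_inputs()
if missing:
    print('Missing input documents: ' + ', '.join(missing))
else:
    print('All input documents found.')
```

If any file is reported missing, the [Dependencies Wiki](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) describes how to construct it.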

<br>

***
### Table of Contents
***
The three primary steps involved in building a knowledge graph are `Downloading Data Sources`, `Creating Edge Lists`, and `Building the knowledge graphs`.

* [Data Sources](#data-source)  
* [Create Edge Lists](#create-edges)  
* [Build Knowledge Graph](#build-kg)  

***

***

_____
### Set-Up Environment

In [1]:
# !export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
# !export PATH=$JAVA_HOME/bin:$PATH
# !java -version

In [2]:
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.environ["PATH"] = f"{os.environ['JAVA_HOME']}/bin:" + os.environ["PATH"]

# Now shell commands will see the updated JAVA_HOME
!java -version

# import needed libraries
import glob
import json
import pandas
import psutil
import ray
import time
import gc

# import module
from pkt_kg.downloads import OntData, LinkedData
from pkt_kg.edge_list import CreatesEdgeList
from pkt_kg.knowledge_graph import FullBuild, PartialBuild, PostClosureBuild

openjdk version "11.0.29" 2025-10-21
OpenJDK Runtime Environment (build 11.0.29+7-post-Ubuntu-1ubuntu124.04)
OpenJDK 64-Bit Server VM (build 11.0.29+7-post-Ubuntu-1ubuntu124.04, mixed mode, sharing)




***
## Download Data Sources <a class="anchor" id="data-source"></a>

**Wiki Page:** **[`Dependencies`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies)**  

**Purpose:**
This portion of the algorithm downloads two kinds of data:
1. [Ontology Data](#download-ontology-data)  
2. [Edge Data](#download-edge-data)   

<br>

**Input Files:**
  - [`resource_info.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/resource_info.txt)
  - [`ontology_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/ontology_source_list.txt)
  - [`edge_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/edge_source_list.txt)

<br>

**Assumptions:**  
- All sources used to construct our knowledge graph need to be preprocessed and ready to download prior to running this code. All mapping, filtering, and label data have been generated prior to this step. For assistance with creating these datasets, see the [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb) Jupyter Notebook.  
- All downloaded and generated data sources for all PheKnowLator builds can be accessed through [Zenodo](https://zenodo.org/doi/10.5281/zenodo.7030039).


***
***
### Ontology Data  <a class="anchor" id="download-ontology-data"></a>
Ontologies are the core data structure used when building PheKnowLator. In the figure above, ontology data are shown in yellow boxes.

In [3]:
ont = OntData('resources/ontology_source_list.txt', 'resources/resource_info.txt')
ont.downloads_data_from_url()

  0%|          | 0/11 [00:00<?, ?it/s]


Downloading: hp_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] hp_with_imports
Just copying: 
Downloading: hp_with_imports
 resources/ontologies/hp_with_imports.owl


  9%|▉         | 1/11 [00:08<01:26,  8.68s/it]

The knowledge graph contains 27169 classes, 341782 axioms, 256 object properties, and 0 individuals

Downloading: go_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] go_with_imports
Just copying: 
Downloading: go_with_imports
 resources/ontologies/go_with_imports.owl


 18%|█▊        | 2/11 [00:20<01:32, 10.30s/it]

The knowledge graph contains 43832 classes, 509854 axioms, 9 object properties, and 0 individuals

Downloading: mondo_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] mondo_with_imports
Just copying: 
Downloading: mondo_with_imports
 resources/ontologies/mondo_with_imports.owl


 27%|██▋       | 3/11 [00:36<01:45, 13.22s/it]

The knowledge graph contains 40975 classes, 612772 axioms, 338 object properties, and 17 individuals

Downloading: vo_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] vo_with_imports
Just copying: 
Downloading: vo_with_imports
 resources/ontologies/vo_with_imports.owl


 36%|███▋      | 4/11 [00:38<01:02,  8.87s/it]

The knowledge graph contains 6825 classes, 62120 axioms, 232 object properties, and 167 individuals

Downloading: chebi_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] chebi_with_imports
Just copying: 
Downloading: chebi_with_imports
 resources/ontologies/chebi_with_imports.owl


 45%|████▌     | 5/11 [01:11<01:44, 17.37s/it]

The knowledge graph contains 150080 classes, 2719571 axioms, 10 object properties, and 0 individuals

Downloading: ext_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] ext_with_imports
Just copying: 
Downloading: ext_with_imports
 resources/ontologies/ext_with_imports.owl


 55%|█████▍    | 6/11 [01:18<01:09, 13.97s/it]

The knowledge graph contains 19096 classes, 266646 axioms, 239 object properties, and 0 individuals

Downloading: clo_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] clo_with_imports
Just copying: 
Downloading: clo_with_imports
 resources/ontologies/clo_with_imports.owl


 64%|██████▎   | 7/11 [01:28<00:50, 12.72s/it]

The knowledge graph contains 44858 classes, 548206 axioms, 112 object properties, and 33 individuals

Downloading: pr_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] pr_with_imports
Just copying: 
Downloading: pr_with_imports
 resources/ontologies/pr_with_imports.owl


 73%|███████▎  | 8/11 [01:40<00:36, 12.23s/it]

The knowledge graph contains 117081 classes, 1385427 axioms, 12 object properties, and 0 individuals

Downloading: so_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] so_with_imports
Just copying: 
Downloading: so_with_imports
 resources/ontologies/so_with_imports.owl


 82%|████████▏ | 9/11 [01:41<00:17,  8.88s/it]

The knowledge graph contains 2363 classes, 23204 axioms, 50 object properties, and 0 individuals

Downloading: pw_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] pw_with_imports
Just copying: 
Downloading: pw_with_imports
 resources/ontologies/pw_with_imports.owl


 91%|█████████ | 10/11 [01:43<00:06,  6.58s/it]

The knowledge graph contains 2600 classes, 21868 axioms, 1 object properties, and 0 individuals

Downloading: ro_with_imports
['so_with_imports.owl', 'ext_with_imports.owl', 'ontology_source_metadata.txt', 'hp_with_imports.owl', 'go_with_imports.owl', 'chebi_with_imports.owl', 'mondo_with_imports.owl', 'clo_with_imports.owl', 'pr_with_imports.owl', 'pw_with_imports.owl', 'vo_with_imports.owl', 'ro_with_imports.owl', 'README.md'] ro_with_imports
Just copying: 
Downloading: ro_with_imports
 resources/ontologies/ro_with_imports.owl


100%|██████████| 11/11 [01:44<00:00,  9.47s/it]


The knowledge graph contains 69 classes, 5823 axioms, 600 object properties, and 5 individuals

*** Generating Metadata ***



100%|██████████| 11/11 [00:00<00:00, 54215.45it/s]
100%|██████████| 11/11 [00:00<00:00, 168384.47it/s]


In [4]:
gc.collect()

187
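
Each ontology download reports a statistics line of the form "The knowledge graph contains N classes, ...". As a rough sketch (not part of `pkt_kg`), these lines could be parsed into dictionaries for comparison across ontologies. The function name `parse_stats` and the regular expression are assumptions based only on the wording of the log lines shown above.

```python
import re

# Pattern inferred from the wording of the statistics lines in the log
STATS_PATTERN = re.compile(
    r'contains (\d+) classes, (\d+) axioms, '
    r'(\d+) object properties, and (\d+) individuals')


def parse_stats(line):
    """Extract the four counts from one statistics line, or None if no match."""
    match = STATS_PATTERN.search(line)
    if match is None:
        return None
    keys = ('classes', 'axioms', 'object_properties', 'individuals')
    return dict(zip(keys, map(int, match.groups())))


# Two lines copied from the download log above
log_lines = [
    'The knowledge graph contains 27169 classes, 341782 axioms, '
    '256 object properties, and 0 individuals',
    'The knowledge graph contains 69 classes, 5823 axioms, '
    '600 object properties, and 5 individuals',
]
for line in log_lines:
    print(parse_stats(line))
```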

<br>

### Edge Data   <a class="anchor" id="download-edge-data"></a>
In PheKnowLator, classes are nodes that originate from ontologies. Class data sources are Linked Data sources that are used to create edges in the knowledge graph and thus can connect to other class data sources. Sometimes we want to add data that is not already part of an ontology. In that case, the data can either be added as an `instance` of an existing ontology class or as its own `owl:class`, added to the knowledge graph as a `subclass` of an existing `owl:class`.
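
To make the two options concrete, the sketch below expresses each one as plain `(subject, predicate, object)` tuples. This is an illustration of the general distinction only; it does not use `pkt_kg`, the project's actual builds may emit additional triples, and the node and class identifiers are hypothetical.

```python
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
RDFS_SUBCLASS_OF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'
OWL_CLASS = 'http://www.w3.org/2002/07/owl#Class'


def instance_triples(node, ontology_class):
    """Add `node` as an instance of an existing ontology class."""
    return [(node, RDF_TYPE, ontology_class)]


def subclass_triples(node, ontology_class):
    """Add `node` as its own owl:Class and as a subclass of an existing class."""
    return [(node, RDF_TYPE, OWL_CLASS),
            (node, RDFS_SUBCLASS_OF, ontology_class)]


# Hypothetical identifiers, used only for illustration
node = 'http://example.org/node/123'
ontology_class = 'http://example.org/ontology/Class_0001'

for triple in instance_triples(node, ontology_class):
    print('instance:', triple)
for triple in subclass_triples(node, ontology_class):
    print('subclass:', triple)
```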

In [5]:
edges = LinkedData('resources/edge_source_list.txt', 'resources/resource_info.txt')
edges.downloads_data_from_url()


*** Downloading Data: edge_source_list to "resources/edge_data/" ***



  0%|          | 0/33 [00:00<?, ?it/s]


Edge: chemical-disease
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: chemical-gene
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: chemical-gobp


  9%|▉         | 3/33 [00:01<00:14,  2.10it/s]

Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: chemical-gocc
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: chemical-gomf


 15%|█▌        | 5/33 [00:03<00:24,  1.16it/s]

Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: chemical-pathway
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: chemical-phenotype


 21%|██        | 7/33 [00:06<00:27,  1.05s/it]

Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: chemical-protein


 55%|█████▍    | 18/33 [00:08<00:04,  3.38it/s]

Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: disease-phenotype
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: gene-disease
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: gene-gene
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: gene-pathway
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: gene-phenotype
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: gene-protein
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: gene-rna
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: gobp-pathway
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: pathway-gocc
Just copying: 
*** Downloading Data: edge_source_list t

 64%|██████▎   | 21/33 [00:08<00:02,  4.27it/s]

Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: protein-catalyst
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: protein-cell
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: protein-cofactor
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: protein-gobp
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: protein-gocc


 73%|███████▎  | 24/33 [00:09<00:02,  4.20it/s]

Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: protein-gomf
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: protein-pathway
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: protein-protein
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: rna-anatomy
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: rna-cell
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: rna-protein
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: variant-disease


 94%|█████████▍| 31/33 [00:31<00:03,  1.63s/it]

Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: variant-gene


100%|██████████| 33/33 [00:50<00:00,  1.54s/it]


Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


Edge: variant-phenotype
Just copying: 
*** Downloading Data: edge_source_list to "resources/edge_data/" ***


*** Generating Metadata ***



100%|██████████| 33/33 [00:00<00:00, 24359.74it/s]
100%|██████████| 33/33 [00:00<00:00, 72771.84it/s]


<br>

***

## Create Edge Lists <a class="anchor" id="create-edges"></a>

**Wiki Page:** **[`Data Sources`](https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources)**

<br>

**Purpose:** The code below will take the dictionaries of processed data described above and use them to create edge lists for each of the edge types specified in the [`resource_info.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/resource_info.txt). Each edge list will be appended to a nested dictionary (see details below).

<br>

**Assumptions:**  
1. All `ontology` and `edge` data sources have been downloaded.   

2. All code in the [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb) Jupyter Notebook has been run. This Notebook contains code needed to generate all mapping, filtering, and label data.

<br>

**Output:** `Master_Edge_List_Dict.json`. Below is an example of what the `Master Edge Dictionary` contains for each processed resource:  
```python
master_edges = {'chemical-disease'  :
                {'source_labels'    : ';MESH_;',
                 'data_type'        : 'class-class',
                 'edge_relation'    : 'RO_0002606',
                 'uri'              : ('http://purl.obolibrary.org/obo/',
                                       'http://purl.obolibrary.org/obo/'),
                 'delimiter'        : '#',
                 'column_idx'       : '1;4',
                 'identifier_maps'  : '0:./MESH_CHEBI_MAP.txt;1:disease-dbxref-map',
                 'evidence_criteria': "5;!=;' ",
                 'filter_criteria'  : 'None',
                 'edge_list'        : ['...']}}
```
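
Several values in this dictionary are delimited strings. The sketch below shows one plausible way to unpack two of them, assuming `;` separates entries and `:` separates a column position from its mapping source. That reading is inferred from the example values above, not taken from the library's own parser, and the helper names are hypothetical.

```python
def parse_column_idx(value):
    """Split a ';'-delimited string of column indices into integers."""
    return [int(idx) for idx in value.split(';')]


def parse_identifier_maps(value):
    """Split 'position:source' pairs into a {position: source} dictionary."""
    if value == 'None':
        return {}
    # split on the first ':' only, so sources may themselves contain ':'
    pairs = (item.split(':', 1) for item in value.split(';'))
    return {int(position): source for position, source in pairs}


entry = {'column_idx': '1;4',
         'identifier_maps': '0:./MESH_CHEBI_MAP.txt;1:disease-dbxref-map'}

print(parse_column_idx(entry['column_idx']))
# → [1, 4]
print(parse_identifier_maps(entry['identifier_maps']))
# → {0: './MESH_CHEBI_MAP.txt', 1: 'disease-dbxref-map'}
```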

***

In [6]:
# set-up environment for parallel processing -- even if running program serially these steps are needed
cpus = psutil.cpu_count(logical=True)
ray.init()

2026-01-08 15:09:55,228	INFO worker.py:2007 -- Started a local Ray instance.


Python version: 3.12.9
Ray version: 2.53.0


In [7]:
gc.collect()

19

In [None]:
# combine data sources
combined_edges = dict(edges.data_files, **ont.data_files)
resource_info_loc = './resources/resource_info.txt'

del edges
del ont
gc.collect()

0

[36m(pid=gcs_server)[0m [2026-01-08 15:10:24,496 E 35800 35800] (gcs_server) gcs_server.cc:303: Failed to establish connection to the event+metrics exporter agent. Events and metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
[33m(raylet)[0m [2026-01-08 15:10:25,180 E 35892 35892] (raylet) main.cc:1032: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
[36m(pid=35936)[0m [2026-01-08 15:10:25,879 E 35936 35990] core_worker_process.cc:842: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
[2026-01-08 15:10:26,240 E 33920 35935] core_worker_process.cc:842: Failed to establish connection to the metrics exporter agent. Met

In [None]:
# initialize edge dictionary class
master_edges = CreatesEdgeList(data_files=combined_edges, source_file=resource_info_loc)
master_edges.runs_creates_knowledge_graph_edges(source_file=resource_info_loc, data_files=combined_edges, cpus=cpus)

**Preview Master Edge Data**  
Generate a table that includes each `edge-type`, its primary `relation`, example identifiers, and count of unique edges.

In [None]:
# read in master edge dictionary
with open('resources/Master_Edge_List_Dict.json', 'r') as f:
    master_edges = json.load(f)

# read in relation data (tab-delimited; first two columns are used)
with open('./resources/relations_data/RELATIONS_LABELS.txt') as f:
    relation_data = f.readlines()
relation_dict = {x.split('\t')[0]: x.split('\t')[1].strip('\n') for x in relation_data}

# gather basic stats on each resource
edge_data = [[key, master_edges[key]['edge_relation'],
              ', '.join(master_edges[key]['edge_list'][0]),
              len(master_edges[key]['edge_list'])]
             for key in master_edges.keys()]

# convert the list of rows to a pandas DataFrame for nice printing
df = pandas.DataFrame(edge_data, columns=['Edge Type', 'Relation', 'Example Edge', 'Unique Edges'])
df
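
The cell above builds `relation_dict` but does not use it in the table. As a hedged sketch, the relation labels could be looked up with a fallback to the raw identifier when no label is found. The helper names and the sample lines are hypothetical; the real `RELATIONS_LABELS.txt` may key relations differently (for example, by full URI).

```python
def load_relation_labels(lines):
    """Build {relation identifier: label} from tab-delimited lines."""
    labels = {}
    for line in lines:
        parts = line.rstrip('\n').split('\t')
        if len(parts) >= 2:  # skip blank or malformed lines
            labels[parts[0]] = parts[1]
    return labels


def label_for(relation, labels):
    """Return the label for a relation, falling back to the raw identifier."""
    return labels.get(relation, relation)


# Hypothetical sample lines in the assumed two-column layout
sample_lines = ['RO_0002606\tis substance that treats\n', '\n']
labels = load_relation_labels(sample_lines)
print(label_for('RO_0002606', labels))
print(label_for('RO_0000000', labels))
```

Skipping lines with fewer than two columns avoids the `IndexError` that a bare `split('\t')[1]` raises on a blank line.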

<br><br>

***

## Build Knowledge Graph  <a class="anchor" id="build-kg"></a>
**Wiki Pages:**  
- **[`KG-Construction`](https://github.com/callahantiff/PheKnowLator/wiki/KG-Construction)**  
- **[`relations-data`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data)**  
- **[`node-metadata`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata)** 

**Jupyter Notebooks:**  
- [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb)  
- [`Ontology_Cleaning.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Ontology_Cleaning.ipynb)  


<br>

**Assumptions:**  
- <u>Construction Approach</u>. If using the `subclass-based` construction approach, please make sure that a `pickled` dictionary mapping each non-ontology data node to an existing ontology class is created and added to the `./resources/knowledge_graph` directory (please see [here](https://github.com/callahantiff/PheKnowLator/tree/master/resources/knowledge_graphs#construction-method) for additional information).   
- <u>Relations Data</u>. If inverse relation data is going to be used to build the knowledge graph, please make sure that it has been generated and added to the `./resources/relations_data` directory (please see [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/relations_data/README.md) for additional information).  
- <u>Node Metadata</u>. If node metadata is going to be used to build the knowledge graph, please make sure that it has been generated and added to the `./resources/node_metadata` directory (please see [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/node_data/README.md) for additional information).  
- <u>Decoding OWL Semantics</u>. If decoding OWL semantics, please make sure that a list of `owl:Property` types to keep is created and added to the `./resources/knowledge_graph` directory (please see [here](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0) for additional information). 

<br>

**Input:** 
- `Master_Edge_List_Dict.json`  
- Directory of relations data sources - see [here](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data) for more information
- Directory of node data sources - see [here](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) for more information

<br>

**Output:** Please see [`Release v2.0.0 Wiki`](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0) for access to all generated output files.   
- `Knowledge Graph` (`.owl` and Networkx MultiDiGraph `.pkl`)  
- `Class Instance URI-UUID Map` (if "instance" construction approach)   
- `Triple List - Integer`  
- `Triple List - Identifier`  
- `Node Integer-Identifier Map`  
- `Node Attribute Data`  

<br>

The process to build the knowledge graph is somewhat time consuming and can be broken into the following steps:  

1. Merge Ontologies. See [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/ontologies/README.md) for additional information on how to preprocess the ontologies prior to merging them.    

2. Create Edges. Add edge lists to merged ontologies.  

3. Add Inverse Relations and Node Data. See the [Dependencies](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) Wiki page for details on how to construct these resources.  

4. Filter OWL Semantics. Filter the knowledge graph with the goal of removing all edges that contain entities that are needed to support owl semantics, but are not biologically meaningful (please see [here](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0) for additional information).

5. Save Edge Lists and Node Metadata. Several versions of the knowledge graph are saved, including: the full knowledge graph (`owl` or Networkx MultiDiGraph `pickle`), triple lists (i.e. integer index and identifier labeled edge lists with a dictionary that maps between the integer indices and node identifiers), and a file of metadata (i.e. identifiers, labels, synonyms, and descriptions) for all nodes in the knowledge graph.  
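
Step 5 mentions integer-indexed triple lists together with a dictionary that maps between integers and node identifiers. The sketch below shows one simple way such a mapping can be produced. It is not the `pkt_kg` implementation, whose numbering scheme may differ, and the function name and identifiers are hypothetical.

```python
def index_triples(triples):
    """Map each identifier to an integer and re-encode the triples."""
    node_to_int = {}
    integer_triples = []
    for triple in triples:
        encoded = []
        for identifier in triple:
            # assign the next free integer the first time an identifier is seen
            if identifier not in node_to_int:
                node_to_int[identifier] = len(node_to_int)
            encoded.append(node_to_int[identifier])
        integer_triples.append(tuple(encoded))
    return integer_triples, node_to_int


# Hypothetical identifier-labeled triples (subject, relation, object)
triples = [('CHEBI_1', 'RO_0002606', 'MONDO_1'),
           ('CHEBI_1', 'RO_0002606', 'MONDO_2')]
integer_triples, node_to_int = index_triples(triples)
print(integer_triples)
# → [(0, 1, 2), (0, 1, 3)]
print(node_to_int)
# → {'CHEBI_1': 0, 'RO_0002606': 1, 'MONDO_1': 2, 'MONDO_2': 3}
```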

<br>

**‼ IMPORTANT:**  
- The file containing the merged ontologies is quite large and can take up to 30 minutes to read in.  This is not a limitation of the code directly, but rather a function of the [`RDFLib Library`](https://github.com/RDFLib). While there are other ways to read in this data, we maintain reliance on this library as it is the most user-friendly for non-RDF users.   
- If you'd like to include [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) when building the knowledge graph, please hold off on building the knowledge graph until you have generated the node data. For details on how to do this, see the [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) section of the `Dependencies` Wiki Page. For help with generating the data, please see the [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb) Jupyter Notebook.

***


In [None]:
# specify input arguments
build = 'full'
construction_approach = 'instance'
add_node_data_to_kg = 'no'
add_inverse_relations_to_kg = 'no'
decode_owl_semantics = 'yes'
kg_directory_location = './resources/knowledge_graphs'
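
Because the build can run for a long time, a quick sanity check on these arguments may save a failed run. The following is a minimal sketch rather than anything provided by `pkt_kg`: the allowed values are inferred from this notebook (the `'subclass'` spelling in particular is an assumption), and `invalid_arguments` is a hypothetical helper.

```python
# Allowed values inferred from this notebook; treat them as assumptions
ALLOWED = {
    'build': {'full', 'partial', 'post-closure'},
    'construction_approach': {'instance', 'subclass'},  # 'subclass' is assumed
    'add_node_data_to_kg': {'yes', 'no'},
    'add_inverse_relations_to_kg': {'yes', 'no'},
    'decode_owl_semantics': {'yes', 'no'},
}


def invalid_arguments(args):
    """Return {name: value} for arguments whose value is not in ALLOWED."""
    return {name: args.get(name) for name, allowed in ALLOWED.items()
            if args.get(name) not in allowed}


args = {'build': 'full',
        'construction_approach': 'instance',
        'add_node_data_to_kg': 'no',
        'add_inverse_relations_to_kg': 'no',
        'decode_owl_semantics': 'yes'}

problems = invalid_arguments(args)
print(problems if problems else 'All arguments look valid.')
```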


In [None]:
# construct knowledge graphs
if build == 'partial':
    kg = PartialBuild(construction=construction_approach,
                      node_data=add_node_data_to_kg,
                      inverse_relations=add_inverse_relations_to_kg,
                      decode_owl=decode_owl_semantics,
                      cpus=cpus,
                      write_location=kg_directory_location)
elif build == 'post-closure':
    kg = PostClosureBuild(construction=construction_approach,
                          node_data=add_node_data_to_kg,
                          inverse_relations=add_inverse_relations_to_kg,
                          decode_owl=decode_owl_semantics,
                          cpus=cpus,
                          write_location=kg_directory_location)
else:
    kg = FullBuild(construction=construction_approach,
                   node_data=add_node_data_to_kg,
                   inverse_relations=add_inverse_relations_to_kg,
                   decode_owl=decode_owl_semantics,
                   cpus=cpus,
                   write_location=kg_directory_location)

kg.construct_knowledge_graph()
ray.shutdown()

In [None]:
kg.ontologies

<br>

***
***

```
@misc{callahan_tj_2019_3401437,
  author       = {Callahan, TJ},
  title        = {PheKnowLator},
  month        = mar,
  year         = 2019,
  doi          = {10.5281/zenodo.3401437},
  url          = {https://doi.org/10.5281/zenodo.3401437}
}
```