_____
***

<img width='700' src="https://user-images.githubusercontent.com/8030363/108961534-b9a66980-7634-11eb-96e2-cc46589dcb8c.png" style="vertical-align:middle">

***
***

**Author:** [TJCallahan](https://mail.google.com/mail/u/0/?view=cm&fs=1&tf=1&to=callahantiff@gmail.com)  
**GitHub Repository:** [PheKnowLator](https://github.com/callahantiff/PheKnowLator/wiki)  
**Current Release:** **[`v2.0.0`](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0)**

<br>

**Objective:** Knowledge graphs provide meaningful ways to integrate heterogeneous biological data and represent complex biological mechanisms. This work seeks to explore the utility of incorporating existing knowledge of molecular mechanism from ontologies, publicly available data, and the literature to construct a biomedical knowledge graph that models unbiased molecular mechanisms of human disease.

<a target="_blank" href="https://user-images.githubusercontent.com/8030363/103158881-11813b00-4780-11eb-8b45-5063765e7645.png"> <img src="https://user-images.githubusercontent.com/8030363/103158881-11813b00-4780-11eb-8b45-5063765e7645.png"></a> 

(*Click Figure to Enlarge Image in Current Browser Tab*)

<br>

***
***

## Notebook Purpose
**Wiki Page:** **[`Release v2.0.0`](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0)**

<br>

**Purpose:** This notebook serves as a `main` file for the PheKnowLator project. It walks through the program step-by-step and generates the knowledge graph shown above. There is also a command line version of this file ([`main.py`](https://github.com/callahantiff/PheKnowLator/blob/master/main.py)). Please see the [README](https://github.com/callahantiff/PheKnowLator/blob/master/README.md) for more information.

<br>

**Assumptions:**
1. Hyperlinks to all downloaded and generated data sources are provided through [this](https://console.cloud.google.com/storage/browser/pheknowlator/release_v2.0.0?project=pheknowlator) dedicated Google Cloud Storage Bucket. <u>This includes examples, from prior builds, of the required input documents mentioned below</u>.     
2. Make sure that the following input documents have been constructed (see the [Dependencies Wiki](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) for more information):  
  - [`resource_info.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/resource_info.txt)
  - [`ontology_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/ontology_source_list.txt)
  - [`edge_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/edge_source_list.txt)   

3. Prepare [relations](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data) and [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) files prior to running the scripts.  

4. Select a knowledge graph build type (i.e. `full`, `partial`, or `post-closure`) and construction method (i.e. `instance-based` or `subclass-based`).  
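The choices in item 4 can be checked before any long-running step starts. The snippet below is a minimal sketch, not part of `pkt_kg`: the helper name `validate_build_choices` is hypothetical, and the allowed values are taken from the list above (the `-based` suffix is dropped because the build cell later in this notebook uses the short form `'instance'`).

```python
# Allowed values as listed in the assumptions above (hypothetical helper, not pkt_kg API).
BUILD_TYPES = {'full', 'partial', 'post-closure'}
CONSTRUCTION_METHODS = {'instance', 'subclass'}


def validate_build_choices(build, construction):
    """Normalize and validate the build type and construction method."""
    build = build.strip().lower()
    # accept both 'instance' and 'instance-based' spellings
    construction = construction.strip().lower().replace('-based', '')
    if build not in BUILD_TYPES:
        raise ValueError(f'build must be one of {sorted(BUILD_TYPES)}, got {build!r}')
    if construction not in CONSTRUCTION_METHODS:
        raise ValueError(
            f'construction must be one of {sorted(CONSTRUCTION_METHODS)}, got {construction!r}')
    return build, construction


build, construction_approach = validate_build_choices('full', 'instance-based')
```

Failing early on a misspelled option is cheaper than discovering it after the ontologies have been loaded.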

<br>

***
### Table of Contents
***
The three primary steps involved in building a knowledge graph are `Downloading Data Sources`, `Creating Edge Lists`, and `Building the knowledge graphs`.

* [Data Sources](#data-source)  
* [Create Edge Lists](#create-edges)  
* [Build Knowledge Graph](#build-kg)  

***

***

_____
### Set-Up Environment

In [1]:
# import needed libraries
import glob
import json
import pandas
import ray
import time

# import module
from pkt_kg import downloads, edge_list, knowledge_graph

***
## Download Data Sources <a class="anchor" id="data-source"></a>

**Wiki Page:** **[`Dependencies`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies)**  

**Purpose:**
This portion of the algorithm downloads the data needed to build the knowledge graph:
1. [Download Ontology Data](#download-ontology-data)  
2. [Download Edge Data](#download-edge-data)   

<br>

**Input Files:**
  - [`resource_info.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/resource_info.txt)
  - [`ontology_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/ontology_source_list.txt)
  - [`edge_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/edge_source_list.txt)
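
A quick way to confirm that the three input files are in place is sketched below. This is an illustration only: the helper `find_missing_inputs` is hypothetical, and it assumes the files live directly under a `resources` directory, as the paths used later in this notebook suggest.

```python
from pathlib import Path

# File names taken from the list above; the directory layout is an assumption.
REQUIRED_INPUTS = ('resource_info.txt', 'ontology_source_list.txt', 'edge_source_list.txt')


def find_missing_inputs(resource_dir='resources', required=REQUIRED_INPUTS):
    """Return the required input files that are absent or empty under resource_dir."""
    base = Path(resource_dir)
    return [name for name in required
            if not (base / name).is_file() or (base / name).stat().st_size == 0]


missing = find_missing_inputs()
if missing:
    print('Missing or empty input files:', ', '.join(missing))
```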

<br>

**Assumptions:**  
- All sources used to construct our knowledge graph need to be preprocessed and ready to download prior to running this code. All mapping, filtering, and label data have been generated prior to this step. For assistance with creating these datasets, see the [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb) Jupyter Notebook.  
- All downloaded and generated data sources for all PheKnowLator builds can be accessed through [this](https://console.cloud.google.com/storage/browser/pheknowlator/release_v2.0.0?project=pheknowlator) dedicated Google Cloud Storage Bucket.


***
***
### Ontology Data  <a class="anchor" id="download-ontology-data"></a>
Ontologies are the core data structure used when building PheKnowLator. In the figure above, ontology data are shown in yellow boxes.

In [2]:
from pkt_kg.downloads import OntData, LinkedData
from pkt_kg.edge_list import CreatesEdgeList
from pkt_kg.knowledge_graph import FullBuild, PartialBuild, PostClosureBuild

In [57]:
ont = OntData('resources/ontology_source_list.txt', 'resources/resource_info.txt')
#not a function of OntData
#ont._writes_source_metadata_locally()

In [58]:
ont.data_path

'resources/ontology_source_list.txt'

In [59]:
ont.parses_resource_file()

In [60]:
ont.gets_data_type()

'Ontology Data'

In [61]:
ont.source_list

{'disease': 'resources/ontologies/mondo_with_imports.owl',
 'napdichem': 'resources/ontologies/chebi_lite_merged_with_imports.owl',
 'protein': 'resources/ontologies/pr_with_imports.owl',
 'pathway': 'resources/ontologies/pw_with_imports.owl',
 'relation': 'resources/ontologies/ro_with_imports_AD_mods.owl',
 'go': 'resources/ontologies/go_with_imports.owl',
 'chemical': 'resources/ontologies/chebi_lite_with_imports.owl',
 'phenotype': 'resources/ontologies/hp_with_imports.owl',
 'anatomy': 'resources/ontologies/ext_with_imports.owl',
 'cell': 'resources/ontologies/clo_with_imports.owl',
 'genomic': 'resources/ontologies/so_with_imports.owl',
 'oae': 'resources/ontologies/oae_merged_with_imports.owl'}

In [62]:
ont.data_files = ont.source_list
ont.generates_source_metadata()


*** Generating Metadata ***



100%|██████████| 12/12 [00:00<00:00, 16799.62it/s]
100%|██████████| 12/12 [00:00<00:00, 66313.11it/s]


In [35]:
dir(ont)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__metaclass__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_writes_source_metadata_locally',
 'data_files',
 'data_path',
 'data_type',
 'downloads_data_from_url',
 'extracts_edge_metadata',
 'generates_source_metadata',
 'gets_data_type',
 'metadata',
 'parses_resource_file',
 'resource_data',
 'resource_dict',
 'resource_info',
 'source_list']

In [63]:
ont._writes_source_metadata_locally()

100%|██████████| 12/12 [00:00<00:00, 54887.29it/s]


In [64]:
ont.resource_info

["chemical-disease|:;MESH_;|class-class|RO_0002606|http://purl.obolibrary.org/obo/|http://purl.obolibrary.org/obo/|t|1;4|0:./resources/processed_data/MESH_CHEBI_MAP.txt;1:./resources/processed_data/DISEASE_MONDO_MAP.txt|5;!=;''|None",
 "chemical-gene|;MESH_;|class-entity|RO_0002434|http://purl.obolibrary.org/obo/|http://www.ncbi.nlm.nih.gov/gene/|t|1;4|0:./resources/processed_data/MESH_CHEBI_MAP.txt|9;affects;not in x|6;==;Homo sapiens::5;.startswith('gene');",
 'chemical-gobp|:;MESH_;GO_|class-class|RO_0002436|http://purl.obolibrary.org/obo/|http://purl.obolibrary.org/obo/|t|1;5|0:./resources/processed_data/MESH_CHEBI_MAP.txt|8;<=;1.04e-47|3;==;Biological Process',
 'chemical-gocc|:;MESH_;GO_|class-class|RO_0002436|http://purl.obolibrary.org/obo/|http://purl.obolibrary.org/obo/|t|1;5|0:./resources/processed_data/MESH_CHEBI_MAP.txt|8;<=;1.04e-47|3;==;Cellular Component',
 'chemical-gomf|:;MESH_;GO_|class-class|RO_0002436|http://purl.obolibrary.org/obo/|http://purl.obolibrary.org/obo/|t

In [65]:
ont.resource_data

'resources/resource_info.txt'

<br>

### Edge Data   <a class="anchor" id="download-edge-data"></a>
In PheKnowLator, classes are nodes that originate from ontologies. Class data sources are Linked Data sources that are used to create edges in the knowledge graph and thus can connect to other class data sources. Sometimes we want to add data that is not already part of an ontology. In that case, the data can be added either as an `instance` of an existing ontology class or as its own `owl:class`, by adding it to the knowledge graph as a `subclass` of an existing `owl:class`.
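
The difference between the two options can be pictured with plain triples. The sketch below is a simplification under stated assumptions: `typing_triples` is a hypothetical helper, the class URI is a placeholder, and the triples PheKnowLator actually writes may be richer than what is shown here.

```python
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
RDFS_SUBCLASSOF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'
OWL_CLASS = 'http://www.w3.org/2002/07/owl#Class'
OWL_NAMED_INDIVIDUAL = 'http://www.w3.org/2002/07/owl#NamedIndividual'


def typing_triples(node, ontology_class, approach):
    """Return the triples that attach a non-ontology node to an ontology class."""
    if approach == 'instance':
        # the node becomes an individual that is an instance of the class
        return [(node, RDF_TYPE, OWL_NAMED_INDIVIDUAL),
                (node, RDF_TYPE, ontology_class)]
    if approach == 'subclass':
        # the node becomes a class of its own, placed under the existing class
        return [(node, RDF_TYPE, OWL_CLASS),
                (node, RDFS_SUBCLASSOF, ontology_class)]
    raise ValueError(f"approach must be 'instance' or 'subclass', got {approach!r}")


# Placeholder identifiers for illustration only.
gene = 'http://www.ncbi.nlm.nih.gov/gene/0000'
parent = 'http://purl.obolibrary.org/obo/EXAMPLE_0000001'
for triple in typing_triples(gene, parent, 'subclass'):
    print(triple)
```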

In [14]:
edges = LinkedData('resources/edge_source_list_TC.txt', 'resources/resource_info.txt')
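# do this the first time to download all the edge data
edges.downloads_data_from_url()
# NOTE: the call below fails with an AttributeError (see the end of the output);
# the method is private and is called as edges._writes_source_metadata_locally() in a later cell
edges.writes_source_metadata_locally()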


*** Downloading Data: edge_source_list_TC to "resources/edge_data/" ***



  0%|          | 0/33 [00:00<?, ?it/s]


Edge: chemical-disease

Edge: chemical-gene

Edge: chemical-gobp


  9%|▉         | 3/33 [00:04<00:43,  1.45s/it]


Edge: chemical-gocc

Edge: chemical-gomf


 15%|█▌        | 5/33 [00:09<00:57,  2.04s/it]


Edge: chemical-pathway


 18%|█▊        | 6/33 [00:10<00:43,  1.63s/it]


Edge: chemical-phenotype


 21%|██        | 7/33 [00:20<01:45,  4.05s/it]


Edge: chemical-protein


 30%|███       | 10/33 [00:28<01:07,  2.92s/it]


Edge: disease-phenotype

Edge: gene-disease

Edge: gene-gene

Edge: gene-pathway

Edge: gene-phenotype

Edge: gene-protein

Edge: gene-rna

Edge: gobp-pathway

Edge: pathway-gocc


 52%|█████▏    | 17/33 [00:29<00:14,  1.07it/s]


Edge: pathway-gomf

Edge: protein-anatomy

Edge: protein-catalyst

Edge: protein-cell


 64%|██████▎   | 21/33 [00:29<00:07,  1.56it/s]


Edge: protein-cofactor

Edge: protein-gobp


 76%|███████▌  | 25/33 [00:31<00:04,  1.75it/s]


Edge: protein-gocc

Edge: protein-gomf

Edge: protein-pathway

Edge: protein-protein

Edge: rna-anatomy

Edge: rna-cell

Edge: rna-protein

Edge: variant-disease

Edge: variant-gene


 97%|█████████▋| 32/33 [00:53<00:01,  1.93s/it]


Edge: variant-phenotype


100%|██████████| 33/33 [01:07<00:00,  2.06s/it]



*** Generating Metadata ***



100%|██████████| 33/33 [00:00<00:00, 13740.89it/s]
100%|██████████| 33/33 [00:00<00:00, 35112.13it/s]


AttributeError: 'LinkedData' object has no attribute 'writes_source_metadata_locally'

In [21]:
edges = LinkedData('resources/edge_source_list.txt', 'resources/resource_info.txt')

In [22]:
edges.resource_data

'resources/resource_info.txt'

In [23]:
edges.parses_resource_file()

In [24]:
edges.source_list

{'chemical-disease': 'resources/edge_data/chemical-disease_CTD_chemicals_diseases.tsv',
 'chemical-gene': 'resources/edge_data/chemical-gene_CTD_chem_gene_ixns.tsv',
 'chemical-gobp': 'resources/edge_data/chemical-gobp_CTD_chem_go_enriched.tsv',
 'chemical-gocc': 'resources/edge_data/chemical-gocc_CTD_chem_go_enriched.tsv',
 'chemical-gomf': 'resources/edge_data/chemical-gomf_CTD_chem_go_enriched.tsv',
 'chemical-pathway': 'resources/edge_data/chemical-pathway_ChEBI2Reactome_All_Levels.txt',
 'chemical-protein': 'resources/edge_data/chemical-protein_CTD_chem_gene_ixns.tsv',
 'chemical-phenotype': 'resources/edge_data/CTD_chemicals_diseases.tsv',
 'disease-phenotype': 'resources/edge_data/phenotype.hpoa',
 'gene-disease': 'resources/edge_data/gene-disease_curated_gene_disease_associations.tsv',
 'gene-gene': 'resources/edge_data/gene-gene_COMBINED.DEFAULT_NETWORKS.BP_COMBINING.txt',
 'gene-pathway': 'resources/edge_data/gene-pathway_CTD_genes_pathways.tsv',
 'gene-phenotype': 'resources

In [25]:
edges.data_files = edges.source_list
edges.generates_source_metadata()


*** Generating Metadata ***



100%|██████████| 37/37 [00:00<00:00, 24290.07it/s]
100%|██████████| 37/37 [00:00<00:00, 54243.01it/s]


In [20]:
dir(edges)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__metaclass__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_writes_source_metadata_locally',
 'data_files',
 'data_path',
 'data_type',
 'downloads_data_from_url',
 'extracts_edge_metadata',
 'generates_source_metadata',
 'gets_data_type',
 'metadata',
 'parses_resource_file',
 'resource_data',
 'resource_dict',
 'resource_info',
 'source_list']

In [26]:
edges._writes_source_metadata_locally()

100%|██████████| 37/37 [00:00<00:00, 51234.48it/s]


In [27]:
edges.data_path

'resources/edge_source_list.txt'

In [28]:
edges.source_list.keys()

dict_keys(['chemical-disease', 'chemical-gene', 'chemical-gobp', 'chemical-gocc', 'chemical-gomf', 'chemical-pathway', 'chemical-protein', 'chemical-phenotype', 'disease-phenotype', 'gene-disease', 'gene-gene', 'gene-pathway', 'gene-phenotype', 'gene-protein', 'gene-rna', 'gobp-pathway', 'pathway-gocc', 'pathway-gomf', 'protein-anatomy', 'protein-catalyst', 'protein-cofactor', 'protein-cell', 'protein-gobp', 'protein-gocc', 'protein-gomf', 'protein-pathway', 'protein-protein', 'rna-anatomy', 'rna-cell', 'rna-protein', 'variant-disease', 'variant-gene', 'variant-phenotype', 'chemical-transporter', 'chemical-molecule', 'chemical-substrate', 'chemical-inhibitor'])

In [66]:
ont.data_files

{'disease': 'resources/ontologies/mondo_with_imports.owl',
 'napdichem': 'resources/ontologies/chebi_lite_merged_with_imports.owl',
 'protein': 'resources/ontologies/pr_with_imports.owl',
 'pathway': 'resources/ontologies/pw_with_imports.owl',
 'relation': 'resources/ontologies/ro_with_imports_AD_mods.owl',
 'go': 'resources/ontologies/go_with_imports.owl',
 'chemical': 'resources/ontologies/chebi_lite_with_imports.owl',
 'phenotype': 'resources/ontologies/hp_with_imports.owl',
 'anatomy': 'resources/ontologies/ext_with_imports.owl',
 'cell': 'resources/ontologies/clo_with_imports.owl',
 'genomic': 'resources/ontologies/so_with_imports.owl',
 'oae': 'resources/ontologies/oae_merged_with_imports.owl'}

In [25]:
edges.data_files

{'chemical-disease': 'resources/edge_data/chemical-disease_CTD_chemicals_diseases.tsv',
 'chemical-gene': 'resources/edge_data/chemical-gene_CTD_chem_gene_ixns.tsv',
 'chemical-gobp': 'resources/edge_data/chemical-gobp_CTD_chem_go_enriched.tsv',
 'chemical-gocc': 'resources/edge_data/chemical-gocc_CTD_chem_go_enriched.tsv',
 'chemical-gomf': 'resources/edge_data/chemical-gomf_CTD_chem_go_enriched.tsv',
 'chemical-pathway': 'resources/edge_data/chemical-pathway_ChEBI2Reactome_All_Levels.txt',
 'chemical-protein': 'resources/edge_data/chemical-protein_CTD_chem_gene_ixns.tsv',
 'gene-disease': 'resources/edge_data/gene-disease_curated_gene_disease_associations.tsv',
 'gene-gene': 'resources/edge_data/gene-gene_COMBINED.DEFAULT_NETWORKS.BP_COMBINING.txt',
 'gene-pathway': 'resources/edge_data/gene-pathway_CTD_genes_pathways.tsv',
 'gene-protein': 'resources/processed_data/ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt',
 'gene-rna': 'resources/processed_data/ENTREZ_GENE_ENSEMBL_TRANSCRIPT_MAP.txt',
 'go

<br>

***

## Create Edge Lists <a class="anchor" id="create-edges"></a>

**Wiki Page:** **[`Data Sources`](https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources)**

<br>

**Purpose:** The code below will take the dictionaries of processed data described above and use them to create edge lists for each of the edge types specified in the [`resource_info.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/resource_info.txt). Each edge list will be appended to a nested dictionary (see details below).
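
The edge and ontology source dictionaries are merged into one mapping before the edge lists are built. With a plain dictionary merge, a key present in both inputs is silently overwritten by the second one. The sketch below shows one way to guard against that; `combine_sources` is a hypothetical helper and the file paths are shortened placeholders.

```python
def combine_sources(edge_files, ontology_files):
    """Merge two source dictionaries, refusing to overwrite duplicate keys."""
    overlap = sorted(set(edge_files) & set(ontology_files))
    if overlap:
        raise KeyError(f'keys defined in both source lists: {overlap}')
    return {**edge_files, **ontology_files}


# Placeholder paths for illustration only.
edge_files = {'gene-rna': 'edge_data/gene-rna.txt'}
ontology_files = {'go': 'ontologies/go_with_imports.owl'}
combined = combine_sources(edge_files, ontology_files)
print(sorted(combined))
```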

<br>

**Assumptions:**  
1. All `ontology` and `edge` data sources have been downloaded.   

2. All code in the [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb) Jupyter Notebook has been run. This Notebook contains code needed to generate all mapping, filtering, and label data.

<br>

**Output:** `Master_Edge_List_Dict.json`. Below is an example of what the `Master Edge Dictionary` contains for each processed resource:  
```python
master_edges = {'chemical-disease'  :
                {'source_labels'    : ';MESH_;',
                 'data_type'        : 'class-class',
                 'edge_relation'    : 'RO_0002606',
                 'uri'              : ('http://purl.obolibrary.org/obo/',
                                       'http://purl.obolibrary.org/obo/'),
                 'delimiter'        : '#',
                 'column_idx'       : '1;4',
                 'identifier_maps'  : '0:./MESH_CHEBI_MAP.txt;1:disease-dbxref-map',
                 'evidence_criteria': "5;!=;' ",
                 'filter_criteria'  : 'None',
                 'edge_list'        : ['...']}}
```
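
The keys of the dictionary above line up with the pipe-delimited rows shown earlier in the `ont.resource_info` output. The sketch below splits one such row into those keys. It is an illustration, not the parser used by `pkt_kg`: the helper name and the assumed field order (edge type first, then the ten values) are inferred from the example row and the dictionary above.

```python
# Field order inferred from the example row; treat it as an assumption.
FIELDS = ('source_labels', 'data_type', 'edge_relation', 'subject_uri', 'object_uri',
          'delimiter', 'column_idx', 'identifier_maps', 'evidence_criteria',
          'filter_criteria')


def parse_resource_row(row):
    """Split one resource_info.txt row into (edge_type, record)."""
    parts = row.rstrip('\n').split('|')
    if len(parts) != len(FIELDS) + 1:
        raise ValueError(f'expected {len(FIELDS) + 1} fields, found {len(parts)}')
    record = dict(zip(FIELDS, parts[1:]))
    # store the two namespaces as a tuple, as in the example dictionary
    record['uri'] = (record.pop('subject_uri'), record.pop('object_uri'))
    return parts[0], record


example_row = ("chemical-disease|:;MESH_;|class-class|RO_0002606|"
               "http://purl.obolibrary.org/obo/|http://purl.obolibrary.org/obo/|t|1;4|"
               "0:./resources/processed_data/MESH_CHEBI_MAP.txt;"
               "1:./resources/processed_data/DISEASE_MONDO_MAP.txt|5;!=;''|None")
edge_type, record = parse_resource_row(example_row)
print(edge_type, record['edge_relation'], record['column_idx'])
```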

***

In [67]:
# set-up environment for parallel processing -- even if running program serially these steps are needed
import psutil
cpus = psutil.cpu_count(logical=True)
ray.init()



{'node_ip_address': '130.49.206.138',
 'raylet_ip_address': '130.49.206.138',
 'redis_address': '130.49.206.138:6379',
 'object_store_address': '/tmp/ray/session_2021-10-04_20-49-31_658649_8877/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-10-04_20-49-31_658649_8877/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2021-10-04_20-49-31_658649_8877',
 'metrics_export_port': 58115,
 'node_id': '45cac134585ff276d61e2bb7090354c52ca3e11b4127702ec48dd1ba'}

In [68]:
# combine data sources
combined_edges = dict(edges.data_files, **ont.data_files)
resource_info_loc = './resources/resource_info.txt'

# initialize edge dictionary class
master_edges = CreatesEdgeList(data_files=combined_edges, source_file=resource_info_loc)
master_edges.runs_creates_knowledge_graph_edges(source_file=resource_info_loc, data_files=combined_edges, cpus=cpus)

(pid=11781) Finished Edge: gene-protein (gene = 19318, protein = 19135); 19523 unique edges
(pid=11789) Finished Edge: gobp-pathway (gobp = 478, pathway = 671); 671 unique edges
(pid=11781) Finished Edge: rna-protein (rna = 44199, protein = 19197); 44202 unique edges
(pid=11785) Finished Edge: gene-rna (gene = 25529, rna = 179874); 182717 unique edges
(pid=11788) Finished Edge: gene-disease (gene = 5037, disease = 4431); 12738 unique edges
(pid=11794) Finished Edge: gene-phenotype (gene = 6785, phenotype = 1531); 23525 unique edges
(pid=11783) Finished Edge: gene-pathway (gene = 10371, pathway = 1860); 107025 unique edges
(pid=11795) Finished Edge: chemical-pathway (chemical = 2169, pathway = 2232); 29248 unique edges
(pid=11784) Finished Edge: disease-phenotype (disease = 11864, phenotype = 9964); 427477 unique edges
(pid=11795) Finished Edge: protein-cell 

**Preview Master Edge Data**  
Generate a table that includes each `edge-type`, its primary `relation`, example identifiers, and count of unique edges.

In [69]:
master_edges = json.load(open('resources/Master_Edge_List_Dict.json', 'r'))
master_edges.keys()

dict_keys(['chemical-gomf', 'rna-anatomy', 'chemical-molecule', 'protein-pathway', 'chemical-protein', 'protein-protein', 'protein-gomf', 'chemical-substrate', 'variant-disease', 'protein-catalyst', 'rna-protein', 'pathway-gomf', 'protein-anatomy', 'gobp-pathway', 'chemical-gene', 'gene-phenotype', 'protein-cell', 'protein-cofactor', 'chemical-disease', 'protein-gocc', 'gene-gene', 'variant-gene', 'chemical-inhibitor', 'disease-phenotype', 'chemical-gobp', 'variant-phenotype', 'chemical-transporter', 'chemical-pathway', 'chemical-phenotype', 'chemical-gocc', 'gene-pathway', 'protein-gobp', 'gene-protein', 'pathway-gocc', 'rna-cell', 'gene-disease', 'gene-rna'])

In [70]:
# # read in master edge dictionary
#master_edges = json.load(open('resources/Master_Edge_List_Dict.json', 'r'))

# read in relation data
relation_data = open('./resources/relations_data/RELATIONS_LABELS.txt').readlines()
relation_dict = {x.split('\t')[0]: x.split('\t')[1].strip('\n') for x in relation_data}

# print basic stats on each resource
edge_data = [[key,
              relation_dict[master_edges[key]['edge_relation']],
              ', '.join(master_edges[key]['edge_list'][0]),
              len(master_edges[key]['edge_list'])]
             for key in master_edges.keys()]

# convert dict to pandas df for nice printing
df = pandas.DataFrame(edge_data, columns = ['Edge Type', 'Relation', 'Example Edge', 'Unique Edges']) 
df                

|   | Edge Type | Relation | Example Edge | Unique Edges |
|---|-----------|----------|--------------|--------------|
| 0 | chemical-gomf | molecularly interacts with | CHEBI_34568, GO_0005488 | 26788 |
| 1 | rna-anatomy | located in | ENST00000442999, CL_0000775 | 444974 |
| 2 | chemical-molecule | molecularly interacts with | CHEBI_6030, PR_P08684 | 391 |
| 3 | protein-pathway | participates in | PR_A0A075B6P5, R-HSA-109582 | 118158 |
| 4 | chemical-protein | interacts with | CHEBI_4592, PR_P07099 | 66828 |
| 5 | protein-protein | molecularly interacts with | PR_P84085, PR_O15020 | 618069 |
| 6 | protein-gomf | has function | PR_A0A024RBG1, GO_0003723 | 70085 |
| 7 | chemical-substrate | is substrate of | CHEBI_8871, PR_P08684 | 514 |
| 8 | variant-disease | causes or contributes to condition | rs11540654, MONDO_0007903 | 40956 |
| 9 | protein-catalyst | molecularly interacts with | PR_Q00266, CHEBI_15377 | 25136 |
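
The summary cell above indexes `relation_dict` and `edge_list[0]` directly, so it would stop on a relation without a label or on an empty edge list. A more defensive variant is sketched below using only the standard library. The helper `summarize_edges` and the toy identifiers are hypothetical; the key names follow the `Master Edge Dictionary` example shown earlier.

```python
def summarize_edges(master_edges, relation_labels):
    """Build one summary row per edge type, largest edge lists first."""
    rows = []
    for edge_type, info in master_edges.items():
        edge_list = info.get('edge_list', [])
        relation = info['edge_relation']
        rows.append({
            'Edge Type': edge_type,
            # fall back to the raw identifier when no label is available
            'Relation': relation_labels.get(relation, relation),
            'Example Edge': ', '.join(edge_list[0]) if edge_list else '',
            'Unique Edges': len(edge_list),
        })
    return sorted(rows, key=lambda row: row['Unique Edges'], reverse=True)


# Toy data for illustration only.
toy_edges = {
    'type-a': {'edge_relation': 'REL_A', 'edge_list': [['X_1', 'Y_1'], ['X_2', 'Y_2']]},
    'type-b': {'edge_relation': 'REL_B', 'edge_list': []},
}
toy_labels = {'REL_A': 'example relation'}
for row in summarize_edges(toy_edges, toy_labels):
    print(row)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame` if a rendered table is preferred.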


<br><br>

***

## Build Knowledge Graph  <a class="anchor" id="build-kg"></a>
**Wiki Pages:**  
- **[`KG-Construction`](https://github.com/callahantiff/PheKnowLator/wiki/KG-Construction)**  
- **[`relations-data`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data)**  
- **[`node-metadata`](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata)** 

**Jupyter Notebooks:**  
- [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb)  
- [`Ontology_Cleaning.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Ontology_Cleaning.ipynb)  


<br>

**Assumptions:**  
- <u>Construction Approach</u>. If using the `subclass-based` construction approach, please make sure that a `pickled` dictionary mapping each non-ontology data node to an existing ontology class is created and added to the `./resources/knowledge_graph` directory (please see [here](https://github.com/callahantiff/PheKnowLator/tree/master/resources/knowledge_graphs#construction-method) for additional information).   
- <u>Relations Data</u>. If inverse relation data is going to be used to build the knowledge graph, please make sure that it has been generated and added to the `./resources/relations_data` directory (please see [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/relations_data/README.md) for additional information).  
- <u>Node Metadata</u>. If node metadata is going to be used to build the knowledge graph, please make sure that it has been generated and added to the `./resources/node_metadata` directory (please see [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/node_data/README.md) for additional information).  
- <u>Decoding OWL Semantics</u>. If decoding OWL semantics, please make sure that a list of `owl:Property` types to keep has been created and added to the `./resources/knowledge_graph` directory (please see [here](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0) for additional information). 
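
The idea of a keep list can be pictured with a toy filter. The sketch below is a deliberately simplified illustration and is not the OWL-NETS procedure linked above: it drops any triple that mentions a term from the OWL, RDF, or RDFS namespaces unless that term appears in a caller-supplied keep set. The helper names are hypothetical.

```python
# Namespaces whose terms mainly carry OWL/RDF semantics (a simplifying assumption).
OWL_SUPPORT_PREFIXES = (
    'http://www.w3.org/2002/07/owl#',
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'http://www.w3.org/2000/01/rdf-schema#',
)
RDFS_SUBCLASSOF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'


def is_owl_support(term, keep=frozenset()):
    """True when a term belongs to a semantics namespace and is not explicitly kept."""
    return term not in keep and term.startswith(OWL_SUPPORT_PREFIXES)


def filter_owl_triples(triples, keep=frozenset()):
    """Keep only triples in which no term is an unkept semantics term."""
    return [t for t in triples if not any(is_owl_support(term, keep) for term in t)]


obo = 'http://purl.obolibrary.org/obo/'
triples = [
    (obo + 'A', RDFS_SUBCLASSOF, obo + 'B'),
    (obo + 'A', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
     'http://www.w3.org/2002/07/owl#Class'),
    (obo + 'A', obo + 'RO_0002436', obo + 'C'),
]
# keep the class hierarchy, drop the typing triple
kept = filter_owl_triples(triples, keep={RDFS_SUBCLASSOF})
print(len(kept))
```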

<br>

**Input:** 
- `Master_Edge_List_Dict.json`  
- Directory of relations data sources - see [here](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data) for more information
- Directory of node data sources - see [here](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) for more information

<br>

**Output:** Please see [`Release v2.0.0 Wiki`](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0) for access to all generated output files.   
- `Knowledge Graph` (`.owl` and Networkx MultiDiGraph `.pkl`)  
- `Class Instance URI-UUID Map` (if "instance" construction approach)   
- `Triple List - Integer`  
- `Triple List - Identifier`  
- `Node Integer-Identifier Map`  
- `Node Attribute Data`  

<br>

The process to build the knowledge graph is somewhat time consuming and can be broken into the following steps:  

1. Merge Ontologies. See [here](https://github.com/callahantiff/PheKnowLator/blob/master/resources/ontologies/README.md) for additional information on how to preprocess the ontologies prior to merging them.    

2. Create Edges. Add edge lists to merged ontologies.  

3. Add Inverse Relations and Node Data. See the [Dependencies](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) Wiki page for details on how to construct these resources.  

4. Filter OWL Semantics. Filter the knowledge graph with the goal of removing all edges that contain entities that are needed to support owl semantics, but are not biologically meaningful (please see [here](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0) for additional information).

5. Save Edge Lists and Node Metadata. Several versions of the knowledge graph are saved, including: the full knowledge graph (`owl` or Networkx MultiDiGraph `pickle`), triple lists (i.e. integer index and identifier labeled edge lists with a dictionary that maps between the integer indices and node identifiers), and a file of metadata (i.e. identifiers, labels, synonyms, and descriptions) for all nodes in the knowledge graph.  
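
The relationship between the two triple lists in step 5 can be sketched as follows. This is a minimal illustration with a hypothetical helper, `index_triples`. It assigns integers in order of first appearance and, as a simplification, indexes relations in the same map as nodes; the files produced by the build may be organized differently.

```python
def index_triples(triples):
    """Return (integer triples, integer-to-identifier map) for identifier triples."""
    identifier_to_int = {}
    int_triples = []
    for triple in triples:
        # setdefault assigns the next free integer the first time a term is seen
        int_triples.append(tuple(
            identifier_to_int.setdefault(term, len(identifier_to_int)) for term in triple))
    int_to_identifier = {idx: term for term, idx in identifier_to_int.items()}
    return int_triples, int_to_identifier


identifier_triples = [
    ('CHEBI_34568', 'RO_0002436', 'GO_0005488'),
    ('GO_0005488', 'RO_0002436', 'EXAMPLE_0000001'),  # placeholder identifier
]
int_triples, int_to_identifier = index_triples(identifier_triples)
print(int_triples)
print(int_to_identifier)
```

Mapping the integers back through `int_to_identifier` recovers the identifier-labeled triple list.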

<br>

**‼ IMPORTANT:**  
- The file containing the merged ontologies is quite large and can take up to 30 minutes to read in.  This is not a limitation of the code directly, but rather a function of the [`RDFLib Library`](https://github.com/RDFLib). While there are other ways to read in this data, we maintain reliance on this library as it is the most user-friendly for non-RDF users.   
- If you'd like to include [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) when building the knowledge graph, please hold off on building the knowledge graph until you have generated the node data. For details on how to do this, see the [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) section of the `Dependencies` Wiki Page. For help with generating the data, please see the [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb) Jupyter Notebook.

***


In [74]:
# specify input arguments
build = 'full'
construction_approach = 'instance'
add_node_data_to_kg = 'yes'
add_inverse_relations_to_kg = 'yes'
decode_owl_semantics = 'yes'
kg_directory_location = './resources/knowledge_graphs'

In [75]:
# construct knowledge graphs, run sed command before this
if build == 'partial':
    kg = PartialBuild(construction=construction_approach,
                      node_data=add_node_data_to_kg,
                      inverse_relations=add_inverse_relations_to_kg,
                      decode_owl=decode_owl_semantics,
                      cpus=cpus,
                      write_location=kg_directory_location)
elif build == 'post-closure':
    kg = PostClosureBuild(construction=construction_approach,
                          node_data=add_node_data_to_kg,
                          inverse_relations=add_inverse_relations_to_kg,
                          decode_owl=decode_owl_semantics,
                          cpus=cpus,
                          write_location=kg_directory_location)
else:
    kg = FullBuild(construction=construction_approach,
                   node_data=add_node_data_to_kg,
                   inverse_relations=add_inverse_relations_to_kg,
                   decode_owl=decode_owl_semantics,
                   cpus=cpus,
                   write_location=kg_directory_location)

kg.construct_knowledge_graph()
ray.shutdown()


### Starting Knowledge Graph Build: FULL ###
*** Loading Relations Data ***
Loading and Processing Relation Data
*** Loading Merged Ontologies ***
Merged Ontologies Graph Stats: 9790425 triples, 4129221 nodes, 344 predicates, 545961 classes, 35 individuals, 804 object props, 622 annotation props
*** Loading Node Metadata Data ***
Loading and Processing Node Metadata

Extracting Class and Relation Metadata


100%|██████████| 411379/411379 [00:59<00:00, 6873.80it/s]
100%|██████████| 804/804 [00:00<00:00, 5256.54it/s]


*** Splitting Graph ***
Adding Namespace to BNodes
Creating Logic and Annotation Subsets of Graph


100%|██████████| 918791/918791 [10:56<00:00, 1399.36it/s] 


Annotation Assertions (n=5732140 Triples)
Creating Logic Graph (n=4058285 Triples)


100%|██████████| 4058285/4058285 [01:17<00:00, 52136.80it/s]


Merged Ontologies - Logic Subset Graph Stats: 4058285 triples, 1401503 nodes, 49 predicates, 545961 classes, 35 individuals, 804 object props, 622 annotation props

*** Building Knowledge Graph Edges ***
(pid=30793) 
(pid=30793) Created CHEMICAL-PROTEIN (class-class) Edges: 333466 OWL Edges, 66693 Original Edges; 144227 OWL Nodes, Original Nodes: 4222 chemical(s), 6616 protein(s)
(pid=30757) 
(pid=30757) Created PROTEIN-GOMF (class-class) Edges: 420512 OWL Edges, 140170 Original Edges; 162420 OWL Nodes, Original Nodes: 17799 protein(s), 4447 gomf(s)
(pid=30683) 
(pid=30683) Created PROTEIN-CELL (class-class) Edges: 441182 OWL Edges, 147060 Original Edges; 157234 OWL Nodes, Original Nodes: 10045 protein(s), 125 cell(s)
(pid=30610) 
(pid=30610) Created PROTEIN-GOCC (class-class) Edges: 494198 OWL Edges, 164732 Original Edges; 184935 OWL Nodes, Original Nodes: 18443 protein(s), 1756 goc

100%|██████████| 24252979/24252979 [54:15<00:00, 7449.12it/s]  


Pickling MultiDiGraph
Generating Network Statistics
Full Logic Subset (OWL) Graph Stats: 8582892 nodes, 24252979 edges, 3 self-loops, 5 most most common edges: http://www.w3.org/1999/02/22-rdf-syntax-ns#type:15136505, http://www.w3.org/2000/01/rdf-schema#subClassOf:1523220, http://purl.obolibrary.org/obo/RO_0002436:1015426, http://purl.obolibrary.org/obo/RO_0001015:688776, http://purl.obolibrary.org/obo/RO_0001025:688776, http://www.w3.org/2002/07/owl#onProperty:516214, average degree 2.825735078572584, 5 highest degree nodes: http://www.w3.org/2002/07/owl#NamedIndividual:6835571, http://www.w3.org/2002/07/owl#Class:891813, http://www.w3.org/2002/07/owl#Restriction:516214, http://purl.obolibrary.org/obo/SO_0000673:190860, http://purl.obolibrary.org/obo/NCBITaxon_9606:146689, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil:134810, density: 3.2922882028591343e-07, 2 component(s): {0: 8582885, 1: '7 nodes: http://purl.obolibrary.org/obo/bfo/2014-05-03/classes-only.owl | http://purl.obolibr

  0%|          | 0/17 [00:00<?, ?it/s]

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/4056604 [00:00<?, ?it/s]
  0%|          | 1/4056604 [00:00<340:14:10,  3.31it/s]
  0%|          | 536/4056604 [00:00<1:15:16, 898.06it/s]
  0%|          | 3075/4056604 [00:00<12:00, 5629.16it/s]
  0%|          | 4278/4056604 [00:01<16:04, 4201.38it/s]
  0%|          | 6964/4056604 [00:01<08:43, 7736.66it/s]
  0%|          | 9569/4056604 [00:01<06:05, 11074.93it/s]
  0%|          | 12210/4056604 [00:01<04:43, 14241.00it/s]
  0%|          | 14915/4056604 [00:01<03:55, 17150.60it/s]
  0%|          | 17527/4056604 [00:01<03:28, 19349.48it/s]
  0%|          | 20107/4056604 [00:01<03:11, 21024.14it/s]
  1%|          | 22917/4056604 [00:01<02:55, 22936.34it/s]
  1%|          | 25541/4056604 [00:01<02:48, 23857.11it/s]
  1%|          | 28208/4056604 [00:02<02:43, 24653.52it/s]
  1%|          | 30981/4056604 [00:02<02:37, 25540.25it/s]
  1%|          | 33639/4056604 [00:02<02:36, 25654.88it/s]
  1%|          | 36277/4056604 [00

(pid=21361) Decoding 27825 OWL Classes and Axioms
(pid=21421) Decoding 27825 OWL Classes and Axioms
(pid=21505) Decoding 27825 OWL Classes and Axioms
(pid=21567) Decoding 27825 OWL Classes and Axioms
(pid=21649) Decoding 27825 OWL Classes and Axioms
(pid=21704) Decoding 27825 OWL Classes and Axioms
(pid=21820) Decoding 27825 OWL Classes and Axioms
(pid=21905) Decoding 27825 OWL Classes and Axioms
(pid=22002) Decoding 27825 OWL Classes and Axioms
(pid=22085) Decoding 27825 OWL Classes and Axioms
(pid=22146) Decoding 27825 OWL Classes and Axioms
(pid=22228) Decoding 27825 OWL Classes and Axioms
(pid=22383) Decoding 27825 OWL Classes and Axioms
(pid=22453) Decoding 27825 OWL Classes and Axioms
(pid=22498) Decoding 27825 OWL Classes and Axioms
(pid=21361) Filtering Triples
(pi

  6%|▌         | 1/17 [12:53<3:26:10, 773.13s/it]

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/632265 [00:00<?, ?it/s]

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/967901 [00:00<?, ?it/s]

(pid=24746) Decoding 1822 OWL Classes and Axioms
(pid=24772) Decoding 1822 OWL Classes and Axioms
(pid=24797) Decoding 1822 OWL Classes and Axioms
(pid=24823) Decoding 1822 OWL Classes and Axioms
(pid=24885) Decoding 1822 OWL Classes and Axioms
(pid=24910) Decoding 1822 OWL Classes and Axioms
(pid=24938) Decoding 1822 OWL Classes and Axioms
(pid=24964) Decoding 1821 OWL Classes and Axioms
(pid=24999) Decoding 1821 OWL Classes and Axioms
(pid=25027) Decoding 1821 OWL Classes and Axioms
(pid=25054) Decoding 1821 OWL Classes and Axioms
(pid=25098) Decoding 1821 OWL Classes and Axioms
(pid=25165) Decoding 1821 OWL Classes and Axioms
(pid=24746) Filtering Triples
(pid=24797) Filtering Triples
(pid=24823) Filtering Triples
(pid=24772) Filtering Triples
(pid=24885)

 18%|█▊        | 3/17 [27:28<2:08:02, 548.75s/it]

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/862766 [00:00<?, ?it/s]

(pid=25365) Filtering Triples

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/305646 [00:00<?, ?it/s]

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/956144 [00:00<?, ?it/s]

(pid=27262) Decoding 12840 OWL Classes and Axioms
(pid=27297) Decoding 12840 OWL Classes and Axioms
(pid=27322) Decoding 12840 OWL Classes and Axioms
(pid=27349) Decoding 12840 OWL Classes and Axioms
(pid=27392) Decoding 12840 OWL Classes and Axioms
(pid=27425) Decoding 12840 OWL Classes and Axioms
(pid=27262) Filtering Triples
(pid=27482) Decoding 12840 OWL Classes and Axioms
(pid=27297) Filtering Triples
(pid=27511) Decoding 12839 OWL Classes and Axioms
(pid=27322) Filtering Triples
(pid=27538) Decoding 12839 OWL Classes and Axioms
(pid=27349) Filtering Triples
(pid=27561) Decoding 12839 OWL Classes and Axioms
(pid=27392) Filtering Triples
(pid=27587) Decoding 12839 OWL Classes and Axioms
(pid=27616) Decoding 12839 OWL Classes and Axioms
(pid=27425) Filt

 35%|███▌      | 6/17 [38:17<54:35, 297.74s/it]  

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/345802 [00:00<?, ?it/s]

(pid=27749) Filtering Triples

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/545947 [00:00<?, ?it/s]

(pid=28335) Decoding 8628 OWL Classes and Axioms
(pid=28360) Decoding 8628 OWL Classes and Axioms
(pid=28415) Decoding 8628 OWL Classes and Axioms
(pid=28441) Decoding 8628 OWL Classes and Axioms
(pid=28469) Decoding 8627 OWL Classes and Axioms
(pid=28335) Filtering Triples
(pid=28494) Decoding 8627 OWL Classes and Axioms
(pid=28360) Filtering Triples
(pid=28520) Decoding 8627 OWL Classes and Axioms
(pid=28415) Filtering Triples
(pid=28545) Decoding 8627 OWL Classes and Axioms
(pid=28441) Filtering Triples
(pid=28571) Decoding 8627 OWL Classes and Axioms
(pid=28469) Filtering Triples
(pid=28601) Decoding 8627 OWL Classes and Axioms
(pid=28494) Filtering Triples
(pid=28626) Decoding 8627 OWL Classes and Axioms
(pid=28520) Filtering Triples
(pid=286

 47%|████▋     | 8/17 [43:39<33:40, 224.48s/it]

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/288562 [00:00<?, ?it/s]

(pid=28789) Filtering Triples

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/275550 [00:00<?, ?it/s]

(pid=29086) Decoding 493 OWL Classes and Axioms
(pid=29090) Decoding 493 OWL Classes and Axioms
(pid=29115) Decoding 493 OWL Classes and Axioms
(pid=29140) Decoding 493 OWL Classes and Axioms
(pid=29165) Decoding 493 OWL Classes and Axioms
(pid=29195) Decoding 492 OWL Classes and Axioms
(pid=29229) Decoding 492 OWL Classes and Axioms
(pid=29271) Decoding 492 OWL Classes and Axioms
(pid=29288) Decoding 492 OWL Classes and Axioms
(pid=29314) Decoding 492 OWL Classes and Axioms
(pid=29334) Decoding 492 OWL Classes and Axioms
(pid=29090) Filtering Triples
(pid=29115) Filtering Triples
(pid=29140) Filtering Triples
(pid=29165) Filtering Triples
(pid=29229) Filtering Triples
(pid=29288) Filtering Triples
(pid=29086) Filtering Triples
(pid=29195) Fil

 59%|█████▉    | 10/17 [46:18<17:22, 148.91s/it]

[2m[36m(pid=29462)[0m Decoding 492 OWL Classes and Axioms
Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/258536 [00:00<?, ?it/s]

(pid=29462) Filtering Triples

(pid=29673) Decoding 425 OWL Classes and Axioms
(pid=29677) Decoding 425 OWL Classes and Axioms
(pid=29677) Filtering Triples
(pid=29702) Decoding 425 OWL Classes and Axioms
(pid=29727) Decoding 425 OWL Classes and Axioms
(pid=29727) Filtering Triples
(pid=29752) Decoding 425 OWL Classes and Axioms
(pid=29777) Decoding 424 OWL Classes and Axioms
(pid=29802) Decoding 424 OWL Classes and Axioms
(pid=29827) Decoding 424 OWL Classes and Axioms
(pid=29878) Decoding 424 OWL Classes and Axioms
(pid=29896) Decoding 424 OWL Classes and Axioms
(pid=29777) Filtering Triples
(pid=29827) Filtering Triples
(pid=29673) Filtering Triples
(pid=29702) Filtering Triples
(pid=29752) Filtering Triples
(pid=29802) Filtering Triples
(pid=29896) Filtering Triples

 65%|██████▍   | 11/17 [47:36<12:43, 127.26s/it]

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/266896 [00:00<?, ?it/s]

(pid=30053) Decoding 424 OWL Classes and Axioms
(pid=30053) Filtering Triples

(pid=30352) Decoding 809 OWL Classes and Axioms
(pid=30356) Decoding 809 OWL Classes and Axioms
(pid=30381) Decoding 809 OWL Classes and Axioms
(pid=30407) Decoding 809 OWL Classes and Axioms
(pid=30433) Decoding 809 OWL Classes and Axioms
(pid=30471) Decoding 809 OWL Classes and Axioms
(pid=30484) Decoding 809 OWL Classes and Axioms
(pid=30509) Decoding 809 OWL Classes and Axioms
(pid=30534) Decoding 809 OWL Classes and Axioms
(pid=30561) Decoding 808 OWL Classes and Axioms
(pid=30582) Decoding 808 OWL Classes and Axioms
(pid=30352) Filtering Triples
(pid=30356) Filtering Triples
(pid=30433) Filtering Triples
(pid=30471) Filtering Triples
(pid=30381) Filtering Triples
(pid=30407) Filtering Triples
(pid=30509) Filtering Triples
(pid=30607) Dec

 71%|███████   | 12/17 [49:55<10:54, 130.80s/it]

[2m[36m(pid=30721)[0m Decoding 808 OWL Classes and Axioms
Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/282714 [00:00<?, ?it/s]

(pid=30721) Filtering Triples

(pid=30995) Decoding 526 OWL Classes and Axioms
(pid=31014) Decoding 526 OWL Classes and Axioms
(pid=31014) Filtering Triples
(pid=31045) Decoding 526 OWL Classes and Axioms
(pid=31080) Decoding 526 OWL Classes and Axioms
(pid=31123) Decoding 526 OWL Classes and Axioms
(pid=31156) Decoding 526 OWL Classes and Axioms
(pid=31183) Decoding 526 OWL Classes and Axioms
(pid=31212) Decoding 525 OWL Classes and Axioms
(pid=31237) Decoding 525 OWL Classes and Axioms
(pid=31262) Decoding 525 OWL Classes and Axioms
(pid=31287) Decoding 525 OWL Classes and Axioms
(pid=31312) Decoding 525 OWL Classes and Axioms
(pid=30995) Filtering Triples
(pid=31123) Filtering Triples
(pid=31045) Filtering Triples
(pid=31080) Filtering Triples
(pid=31237) Filtering Triples
(

 76%|███████▋  | 13/17 [51:33<08:04, 121.06s/it]

[2m[36m(pid=31412)[0m Decoding 525 OWL Classes and Axioms
Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/244652 [00:00<?, ?it/s]

(pid=31412) Filtering Triples

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/277124 [00:00<?, ?it/s]

(pid=31760) Decoding 1661 OWL Classes and Axioms
(pid=31786) Decoding 1661 OWL Classes and Axioms
(pid=31811) Decoding 1661 OWL Classes and Axioms
(pid=31760) Filtering Triples
(pid=31843) Decoding 1661 OWL Classes and Axioms
(pid=31868) Decoding 1661 OWL Classes and Axioms
(pid=31895) Decoding 1661 OWL Classes and Axioms
(pid=31931) Decoding 1661 OWL Classes and Axioms
(pid=31956) Decoding 1661 OWL Classes and Axioms
(pid=31786) Filtering Triples
(pid=31811) Filtering Triples
(pid=31843) Filtering Triples
(pid=31868) Filtering Triples
(pid=31895) Filtering Triples
(pid=31989) Decoding 1661 OWL Classes and Axioms
(pid=32015) Decoding 1661 OWL Classes and Axioms
(pid=32040) Decoding 1661 OWL Classes and Axioms
(pid=32065) Decoding 1661 OWL Classes and Axiom

 88%|████████▊ | 15/17 [54:33<03:33, 106.52s/it]

[2m[36m(pid=32167)[0m Decoding 1660 OWL Classes and Axioms
Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/389036 [00:00<?, ?it/s]

(pid=32167) Filtering Triples

(pid=32315) Decoding 3971 OWL Classes and Axioms
(pid=32341) Decoding 3971 OWL Classes and Axioms
(pid=32366) Decoding 3971 OWL Classes and Axioms
(pid=32394) Decoding 3971 OWL Classes and Axioms
(pid=32419) Decoding 3971 OWL Classes and Axioms
(pid=32444) Decoding 3971 OWL Classes and Axioms
(pid=32470) Decoding 3971 OWL Classes and Axioms
(pid=32315) Filtering Triples
(pid=32341) Filtering Triples
(pid=32366) Filtering Triples
(pid=32548) Decoding 3971 OWL Classes and Axioms
(pid=32625) Decoding 3971 OWL Classes and Axioms
(pid=32394) Filtering Triples
(pid=32652) Decoding 3971 OWL Classes and Axioms
(pid=32725) Decoding 3971 OWL Classes and Axioms
(pid=32419) Filtering Triples
(pid=32444) Filtering Triples
(pid=32470) Filtering Triples
(pid=396

 94%|█████████▍| 16/17 [56:23<01:47, 107.56s/it]

Removing owl:disjointWith Axioms
Filtering Triples



  0%|          | 0/254192 [00:00<?, ?it/s]

(pid=565) Filtering Triples

(pid=851) Decoding 1592 OWL Classes and Axioms
(pid=877) Decoding 1592 OWL Classes and Axioms
(pid=910) Decoding 1592 OWL Classes and Axioms
(pid=935) Decoding 1592 OWL Classes and Axioms
(pid=968) Decoding 1592 OWL Classes and Axioms
(pid=986) Decoding 1592 OWL Classes and Axioms
(pid=851) Filtering Triples
(pid=877) Filtering Triples
(pid=910) Filtering Triples
(pid=1011) Decoding 1592 OWL Classes and Axioms
(pid=1036) Decoding 1592 OWL Classes and Axioms
(pid=1061) Decoding 1592 OWL Classes and Axioms
(pid=1086) Decoding 1592 OWL Classes and Axioms
(pid=1113) Decoding 1592 OWL Classes and Axioms
(pid=935) Filtering Triples
(pid=1036) Filtering Triples
(pid=1061) Filtering Triples
(pid=1086) Filtering Triples
(pid=968) Filtering Triples

100%|██████████| 17/17 [58:00<00:00, 204.72s/it]

[2m[36m(pid=1322)[0m Decoding 1592 OWL Classes and Axioms
[2m[36m(pid=1322)[0m Filtering Triples
Ensuring OWL-NETS Graph Contains a Single Connected Component
Obtaining node list



100%|██████████| 14485140/14485140 [00:09<00:00, 1565996.92it/s]


Identifying root nodes


100%|██████████| 757111/757111 [10:20<00:00, 1219.82it/s]


Updating graph connectivity
848 triples added to make connected
Serializing OWL-NETS Graph
Converting Knowledge Graph to MultiDiGraph


100%|██████████| 7243418/7243418 [11:51<00:00, 10185.71it/s]


Pickling MultiDiGraph
Generating Network Statistics
OWL-NETS Graph Stats: 757112 nodes, 7243418 edges, 408 self-loops, 5 most most common edges: http://www.w3.org/2000/01/rdf-schema#subClassOf:1174591, http://purl.obolibrary.org/obo/RO_0002436:1015391, http://purl.obolibrary.org/obo/RO_0001025:688971, http://purl.obolibrary.org/obo/RO_0001015:688786, http://purl.obolibrary.org/obo/RO_0002201:420566, http://purl.obolibrary.org/obo/RO_0002200:420566, average degree 9.567168397806402, 5 highest degree nodes: http://purl.obolibrary.org/obo/SO_0000673:190850, http://purl.obolibrary.org/obo/SO_0001483:121020, http://purl.obolibrary.org/obo/NCBITaxon_9606:116478, http://purl.obolibrary.org/obo/SO_0001217:105046, http://purl.obolibrary.org/obo/UBERON_0000473:43795, http://purl.obolibrary.org/obo/SO_0002113:29340, density: 1.2636414472655133e-05, 1 component(s): {0: 757112}
Purifying Graph Based on Construction Approach
Determining what triples need purification
Processing 1174591 http://www.w3

100%|██████████| 1174591/1174591 [09:02<00:00, 2163.93it/s] 


Serializing Instance-Purified OWL-NETS Graph
Converting Knowledge Graph to MultiDiGraph


100%|██████████| 10580216/10580216 [18:21<00:00, 9606.26it/s] 


Pickling MultiDiGraph
Generating Network Statistics
Instance-Purified OWL-NETS Graph Stats: 757112 nodes, 10580216 edges, 881 self-loops, 5 most most common edges: http://www.w3.org/1999/02/22-rdf-syntax-ns#type:4512245, http://purl.obolibrary.org/obo/RO_0002436:1015391, http://purl.obolibrary.org/obo/RO_0001025:688971, http://purl.obolibrary.org/obo/RO_0001015:688786, http://purl.obolibrary.org/obo/RO_0002201:420566, http://purl.obolibrary.org/obo/RO_0002200:420566, average degree 13.97443971301472, 5 highest degree nodes: http://purl.obolibrary.org/obo/SO_0001411:218105, http://purl.obolibrary.org/obo/SO_0000001:214744, http://purl.obolibrary.org/obo/SO_0000110:197351, http://purl.obolibrary.org/obo/SO_0000673:191183, http://purl.obolibrary.org/obo/SO_0000831:134520, http://purl.obolibrary.org/obo/SO_0001483:121032, density: 1.8457583779676587e-05, 1 component(s): {0: 757112}


OWL-NETS Graph Stats: 7243418 triples, 757112 nodes, 282 predicates, 0 classes, 0 individuals, 0 object pro

100%|██████████| 24252979/24252979 [31:17<00:00, 12914.29it/s] 


Writing Class Metadata


100%|██████████| 24252979/24252979 [00:30<00:00, 790532.53it/s]
100%|██████████| 8582929/8582929 [09:23<00:00, 15236.87it/s]



*** Processing OWL-NETS Graph ***
Mapping Node and Relation Identifiers to Integers


100%|██████████| 7243418/7243418 [07:10<00:00, 16840.71it/s]


Writing Class Metadata


100%|██████████| 7243418/7243418 [00:08<00:00, 875244.31it/s] 
100%|██████████| 757393/757393 [00:31<00:00, 23705.86it/s]



*** Processing Purified OWL-NETS Graph ***
Mapping Node and Relation Identifiers to Integers


100%|██████████| 10580216/10580216 [10:05<00:00, 17474.77it/s]


Writing Class Metadata


100%|██████████| 10580216/10580216 [00:11<00:00, 903873.85it/s] 
100%|██████████| 757392/757392 [00:31<00:00, 24201.38it/s]


Depduplicating File: ./resources/knowledge_graphs/PheKnowLator_v3.0.0_full_instance_inverseRelations_OWL_AnnotationsOnly.nt


100%|██████████| 6471130/6471130 [00:19<00:00, 323564.60it/s]


Depduplicating File: ./resources/knowledge_graphs/PheKnowLator_v3.0.0_full_instance_inverseRelations_OWL_LogicOnly.nt


100%|██████████| 24252979/24252979 [02:29<00:00, 162695.14it/s]


Merging Files: ./resources/knowledge_graphs/PheKnowLator_v3.0.0_full_instance_inverseRelations_OWL_AnnotationsOnly.nt and ./resources/knowledge_graphs/PheKnowLator_v3.0.0_full_instance_inverseRelations_OWL_LogicOnly.nt


Loading Full (Logic + Annotation) Graph

Deriving Stats

Full (Logic + Annotation) Graph Stats: 30724109 triples, 12030427 nodes, 369 predicates, 891813 classes, 6835571 individuals, 806 object props, 622 annotation props
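
The average degree and density values reported in the network statistics above can be checked by hand from the node and edge counts. The sketch below assumes that average degree is the edge count divided by the node count and that density uses the directed-graph formula m / (n(n - 1)). Both formulas are inferred from the reported numbers rather than taken from the PheKnowLator source, and the function names are hypothetical.

```python
def average_degree(n_nodes: int, n_edges: int) -> float:
    """Edges per node (assumed definition, inferred from the log values)."""
    return n_edges / n_nodes


def directed_density(n_nodes: int, n_edges: int) -> float:
    """Density of a directed graph: m / (n * (n - 1))."""
    return n_edges / (n_nodes * (n_nodes - 1))


# Node and edge counts copied from the log output above.
graphs = {
    "OWL-NETS": (757112, 7243418),
    "Instance-Purified OWL-NETS": (757112, 10580216),
}

for name, (n_nodes, n_edges) in graphs.items():
    print(name)
    print("  average degree:", average_degree(n_nodes, n_edges))
    print("  density:       ", directed_density(n_nodes, n_edges))
```

If the assumed formulas are right, the printed values should agree with the `average degree` and `density` fields in the two `Graph Stats` lines to within floating-point rounding.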


In [None]:
### MODIFICATIONS/ADDITIONS BELOW -- Sanya (09/16/2021)

In [None]:
## create dictionary from NodeLabels and serialize it

In [1]:
import os
from tqdm import tqdm
import networkx as nx
from rdflib import Graph, URIRef, BNode, Namespace, Literal
from rdflib.namespace import RDF, OWL

In [2]:
import pandas as pd

In [3]:
NX_GRAPH_PATH = '/home/sanya/PheKnowLatorv2/resources/knowledge_graphs/'
OUT_NX_GRAPH_NAME = 'PheKnowLator_v3.0.0_full_instance_inverseRelations_OWLNETS_NodeLabels.nt'

In [4]:
nodedf = pd.read_csv(NX_GRAPH_PATH+'PheKnowLator_v3.0.0_full_instance_inverseRelations_OWLNETS_NodeLabels.txt', sep='\t')
nodedf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 752949 entries, 0 to 752948
Data columns (total 6 columns):
 #   Column                  Non-Null Count   Dtype 
---  ------                  --------------   ----- 
 0   entity_type             752838 non-null  object
 1   integer_id              752917 non-null  object
 2   entity_uri              752868 non-null  object
 3   label                   752767 non-null  object
 4   description/definition  752761 non-null  object
 5   synonym                 752711 non-null  object
dtypes: object(6)
memory usage: 34.5+ MB


In [9]:
nodedf2 = pd.read_csv('/home/sanya/kg_test/pl-build_tc/PheKnowLator_v3.0.0_full_instance_inverseRelations_OWLNETS_NodeLabels.txt', sep='\t')
nodedf2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 745701 entries, 0 to 745700
Data columns (total 6 columns):
 #   Column                  Non-Null Count   Dtype 
---  ------                  --------------   ----- 
 0   entity_type             691490 non-null  object
 1   integer_id              745649 non-null  object
 2   entity_uri              745598 non-null  object
 3   label                   691394 non-null  object
 4   description/definition  691394 non-null  object
 5   synonym                 691341 non-null  object
dtypes: object(6)
memory usage: 34.1+ MB


In [7]:
nodedf = nodedf.dropna(subset = ['entity_uri'])
nodedf.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 745598 entries, 0 to 745700
Data columns (total 6 columns):
 #   Column                  Non-Null Count   Dtype 
---  ------                  --------------   ----- 
 0   entity_type             691394 non-null  object
 1   integer_id              745598 non-null  object
 2   entity_uri              745598 non-null  object
 3   label                   691394 non-null  object
 4   description/definition  691394 non-null  object
 5   synonym                 691341 non-null  object
dtypes: object(6)
memory usage: 39.8+ MB


In [None]:
# nodedf.loc[]  # incomplete, unexecuted cell; not valid Python as written

In [84]:
nodedf = nodedf.reset_index(drop=True)

In [87]:
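# Create an rdflib graph of (entity, rdfs:label, label) triples from the DataFrame rows;
# the graph is serialized as an N-Triples file in the next cell.
nodegraph = Graph()
pred_label = URIRef("http://www.w3.org/2000/01/rdf-schema#label")
for i in range(len(nodedf.index)):
    entity_uri = nodedf.at[i, 'entity_uri']
    if isinstance(entity_uri, float):
        # A float here means a missing (NaN) URI; report it and skip the row,
        # since calling .replace() on a float would raise AttributeError.
        print(entity_uri)
        continue
    # Strip the angle brackets that wrap each URI in the NodeLabels file.
    entity_uri = entity_uri.replace('<', '').replace('>', '')
    label = nodedf.at[i, 'label']
    entity_node = URIRef(entity_uri)
    # Any row with a missing label is still added, with a NaN literal as its label.
    nodegraph.add((entity_node, pred_label, Literal(label)))
len(nodegraph)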

752863

In [88]:
nodegraph.serialize(NX_GRAPH_PATH+OUT_NX_GRAPH_NAME, format='nt')
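
For orientation, each `rdfs:label` triple ends up as one line in the N-Triples file. The sketch below builds such a line by hand using only the standard library. It is an illustration of the format, not rdflib's implementation, and what rdflib writes may differ in details such as how non-ASCII characters are escaped. The names `escape_literal` and `label_triple` are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples string literal."""
    # The backslash is escaped first so later escapes are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def label_triple(entity_uri: str, label: str) -> str:
    """Return one N-Triples line of the form: <entity> <rdfs:label> "label" ."""
    # Accept URIs with or without the surrounding angle brackets.
    uri = entity_uri.strip().strip("<>")
    return f'<{uri}> <{RDFS_LABEL}> "{escape_literal(label)}" .'


print(label_triple("<http://purl.obolibrary.org/obo/SO_0000704>", "gene"))
# <http://purl.obolibrary.org/obo/SO_0000704> <http://www.w3.org/2000/01/rdf-schema#label> "gene" .
```

Building the lines through rdflib, as the cell above does, is the safer choice for real data because the library handles escaping and datatype annotations itself.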

In [10]:
# Check for labels containing non-English (Chinese) characters
nodedf2.loc[nodedf2['entity_uri'] == '<http://purl.obolibrary.org/obo/SO_0000704>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
472103,NODES,4024,<http://purl.obolibrary.org/obo/SO_0000704>,gene,A region (or regions) that includes all of the...,INSDC_feature:gene|INSDC_feature:gene


In [6]:
nodedf.loc[nodedf['entity_uri'] == '<http://purl.obolibrary.org/obo/GO_0002682>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
686102,NODES,10467,<http://purl.obolibrary.org/obo/GO_0002682>,免疫系统过程调控,"Any process that modulates the frequency, rate...",


In [14]:
nodedf2.at[1, 'label']

'pantothenic acid metabolic pathway'

In [7]:
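import re

# Collect the URIs of nodes whose label contains at least one character in the
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
cjk_pattern = re.compile("[\u4e00-\u9fff]")
nodes = []

for i in range(len(nodedf.index)):
    label = nodedf.at[i, 'label']
    # Missing labels are read as float NaN, so only string labels are checked.
    if isinstance(label, str):
        uri = nodedf.at[i, 'entity_uri']
        if cjk_pattern.search(label):
            nodes.append(uri)

len(nodes)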

1974
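
The same check can be wrapped in a small helper that tolerates missing labels. This is a minimal sketch with a hypothetical function name, `has_cjk`. It uses the same U+4E00 to U+9FFF range as the cell above, which covers the main CJK Unified Ideographs block but not other scripts or extension blocks (for example kana, hangul, or CJK Extension A), so it may undercount non-English labels in general.

```python
import re

# Same character range as the cell above: the CJK Unified Ideographs block.
CJK_UNIFIED = re.compile("[\u4e00-\u9fff]")


def has_cjk(label) -> bool:
    """True if label is a string containing at least one CJK Unified Ideograph."""
    # Non-string values (such as the float NaN pandas uses for missing labels)
    # are treated as having no CJK characters.
    return isinstance(label, str) and CJK_UNIFIED.search(label) is not None


# Example labels taken from the outputs in this notebook, plus a missing value.
examples = ["gene", "基因", "NICR295 细胞", float("nan")]
print([has_cjk(x) for x in examples])
# [False, True, True, False]
```

With pandas, a helper like this could be applied to a whole column, for example with `nodedf['label'].map(has_cjk)`, to get a boolean mask instead of looping over row positions.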

In [21]:
nodes[10:20]

['<http://purl.obolibrary.org/obo/CLO_0053675>',
 '<http://purl.obolibrary.org/obo/CLO_0051777>',
 '<http://purl.obolibrary.org/obo/CLO_0054281>',
 '<http://purl.obolibrary.org/obo/GO_0051179>',
 '<http://purl.obolibrary.org/obo/CLO_0001952>',
 '<http://purl.obolibrary.org/obo/CLO_0052720>',
 '<http://purl.obolibrary.org/obo/CLO_0051921>',
 '<http://purl.obolibrary.org/obo/CLO_0054124>',
 '<http://purl.obolibrary.org/obo/CLO_0053444>',
 '<http://purl.obolibrary.org/obo/CLO_0051848>']

In [8]:
nodedf.loc[nodedf['entity_uri'] == '<http://purl.obolibrary.org/obo/SO_0000704>']

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
226058,NODES,4394,<http://purl.obolibrary.org/obo/SO_0000704>,基因,A region (or regions) that includes all of the...,INSDC_feature:gene|INSDC_feature:gene


In [25]:
nodedf.loc[nodedf['entity_uri'] == nodes[0]]

Unnamed: 0,entity_type,integer_id,entity_uri,label,description/definition,synonym
1077,NODES,742600,<http://purl.obolibrary.org/obo/CLO_0052035>,NICR295 细胞,,
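
The lookups above compare the label of the same URI across the two NodeLabels files one node at a time. The sketch below shows one way that comparison could be automated, assuming the goal is to find an English label in the other build for each node whose label contains Chinese characters. That goal is an assumption, since the notebook does not state it. The function name `english_fallbacks` is hypothetical, and the example data is limited to labels shown in the outputs above.

```python
import re

CJK = re.compile("[\u4e00-\u9fff]")


def english_fallbacks(labels: dict, reference_labels: dict) -> dict:
    """Map each URI with a CJK label to a non-CJK label from the reference, if any."""
    fixes = {}
    for uri, label in labels.items():
        # Skip missing labels and labels without CJK characters.
        if not (isinstance(label, str) and CJK.search(label)):
            continue
        ref = reference_labels.get(uri)
        # Only accept a reference label that is a string without CJK characters.
        if isinstance(ref, str) and not CJK.search(ref):
            fixes[uri] = ref
    return fixes


# Labels as they appear in the two DataFrames for nodes shown above.
labels = {
    "<http://purl.obolibrary.org/obo/SO_0000704>": "基因",
    "<http://purl.obolibrary.org/obo/GO_0002682>": "免疫系统过程调控",
}
reference_labels = {
    "<http://purl.obolibrary.org/obo/SO_0000704>": "gene",
}

print(english_fallbacks(labels, reference_labels))
# {'<http://purl.obolibrary.org/obo/SO_0000704>': 'gene'}
```

The two dictionaries could be built from the DataFrames with `dict(zip(df['entity_uri'], df['label']))`. URIs with no usable reference label, such as `GO_0002682` in this example, are left out of the result so they can be handled separately.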


<br>

***
***

```
@misc{callahan_tj_2019_3401437,
  author       = {Callahan, TJ},
  title        = {PheKnowLator},
  month        = mar,
  year         = 2019,
  doi          = {10.5281/zenodo.3401437},
  url          = {https://doi.org/10.5281/zenodo.3401437}
}
```