# Setting up a test subgraph and TRAPI endpoint for KG-Bioportal

## Prepare a subgraph of ontologies relevant to clinical data

Load the set of ontologies to work with. These are defined in `clinical_ontologies.yaml`; only the entries flagged with `test_set` are used for this test subgraph.

In [1]:
import yaml

In [2]:
test_ontologies = []
with open('clinical_ontologies.yaml', 'r') as infile:
    ontologies_dict = yaml.safe_load(infile)

# Keep the names of ontologies flagged with a truthy `test_set` value;
# entries without a `test_set` key are skipped.
for ontology in ontologies_dict['ontologies']:
    if ontology.get('test_set'):
        test_ontologies.append(ontology['name'])

test_ontologies

['RADLEX', 'ATC']

Now use `catmerge` to build a merged graph from only these ontologies. By default, this assumes that the transformed BioPortal graphs are in `../transformed/ontologies/`. The merged graph is written to `data/merged/`.

In [3]:
test_ontologies_str = ",".join(test_ontologies)
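To check which files a merge of `test_ontologies_str` would draw on, the directory layout visible in the `catmerge` output that follows (`<NAME>/<NAME>_<version>_nodes.tsv` and `..._edges.tsv`) can be globbed directly. This is a minimal sketch under that assumption; `find_transformed_files` is a hypothetical helper, not how `run.py catmerge` itself locates files.

```python
from pathlib import Path


def find_transformed_files(base_dir, names):
    """Hypothetical helper: list node and edge TSVs for the named ontologies.

    Assumes the layout seen in the catmerge log, e.g.
    <base_dir>/RADLEX/RADLEX_41_nodes.tsv and RADLEX_41_edges.tsv.
    Missing ontology directories simply contribute no files.
    """
    base = Path(base_dir)
    node_files, edge_files = [], []
    for name in names:
        subdir = base / name
        # Match the "<NAME>_<version>_" pattern; sorting keeps output stable.
        node_files += sorted(str(p) for p in subdir.glob(f"{name}_*_nodes.tsv"))
        edge_files += sorted(str(p) for p in subdir.glob(f"{name}_*_edges.tsv"))
    return node_files, edge_files


# Usage (path relative to the repository root, where run.py is executed):
# nodes, edges = find_transformed_files("../transformed/ontologies",
#                                       test_ontologies_str.split(","))
```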

In [4]:
%cd ../
!python run.py catmerge --include_only {test_ontologies_str}

/home/harry/kg-bioportal
Validating RADLEX...
Validating ATC...
Merging KG files...
  name: merged-kg 
  source: None 
  nodes: ['../transformed/ontologies/RADLEX/RADLEX_41_nodes.tsv', '../transformed/ontologies/ATC/ATC_17_nodes.tsv']
  edges: ['blank_header.tsv', '../transformed/ontologies/RADLEX/RADLEX_41_edges.tsv', '../transformed/ontologies/ATC/ATC_17_edges.tsv'] 
  mappings: None
  output_dir: data/merged

Reading node and edge files
Merging...
Generating QC report
['merged-kg_nodes.tsv', 'merged-kg_edges.tsv']
Reading merged graph to process duplicates...
Node count before removing complete duplicates: 53387
Node count after removing complete duplicates: 46813
  uniq_df = nodes_df.groupby('id').agg(lambda x: '|'.join(set(x)))
Node count after merging duplicate nodes: 46813
Complete.
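The end of the log reports a two-step clean-up of the merged nodes. First, exact duplicate rows are dropped (from 53,387 to 46,813, so 6,574 rows were complete duplicates). Then rows sharing an `id` are collapsed; the line echoed in the log shows a pandas `groupby('id').agg(...)` that joins distinct values with `|`. Because the count did not change in the second step, no two remaining rows shared an `id` in this run. The sketch re-expresses both steps with the standard library only, as an illustration of the idea rather than the KG-Bioportal code. Unlike `'|'.join(set(x))`, it sorts the joined values so the output is deterministic, and it drops empty values (a choice made here, not taken from the source).

```python
def remove_complete_duplicates(rows):
    """Drop rows that are identical in every column, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def merge_duplicate_ids(rows):
    """Collapse rows sharing an 'id', joining distinct non-empty values with '|'.

    Assumes every row has the same set of columns.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row["id"], []).append(row)
    merged = []
    for node_id, group in groups.items():
        out = {"id": node_id}
        for column in group[0]:
            if column == "id":
                continue
            values = sorted({r[column] for r in group if r[column]})
            out[column] = "|".join(values)
        merged.append(out)
    return merged
```

Comparing `len(rows)`, `len(remove_complete_duplicates(rows))` and `len(merge_duplicate_ids(...))` reproduces the three counts the log prints.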


Extract the merged archive and take a look at the result.

In [5]:
!tar -xvzf data/merged/merged-kg.tar.gz

merged-kg_nodes.tsv
merged-kg_edges.tsv


In [6]:
!head merged-kg_edges.tsv

id	object	subject	predicate	category	provided_by	relation	primary_knowledge_source	aggregator_knowledge_source
urn:uuid:4e9885ae-7f97-494f-9aa6-0922c935743e	http://radlex.org/RID/RID15494	http://radlex.org/RID/RID15495	biolink:subclass_of		RADLEX_41_edges	rdfs:subClassOf	Radiology Lexicon - submission 41	BioPortal 2022-07-20
urn:uuid:f1784da5-75f5-47c9-b91b-8478123f522b	http://radlex.org/RID/RID40450	http://radlex.org/RID/RID40451	biolink:subclass_of		RADLEX_41_edges	rdfs:subClassOf	Radiology Lexicon - submission 41	BioPortal 2022-07-20
urn:uuid:24c0825e-c8e0-4492-b67f-5f8e5190c8e4	http://radlex.org/RID/RID9922	http://radlex.org/RID/RID39138	biolink:related_to		RADLEX_41_edges	http://radlex.org/RID/Anatomical_Site	Radiology Lexicon - submission 41	BioPortal 2022-07-20
urn:uuid:ee358cd9-2446-4901-a3f1-14fb88656e53	http://radlex.org/RID/RID22874	http://radlex.org/RID/RID44044	biolink:related_to		RADLEX_41_edges	http://radlex.org/RID/Has_Regional_Part	Radiology Lexicon - submission 41	Bio
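Each edge carries both a Biolink `predicate` and the original `relation`: `rdfs:subClassOf` appears as `biolink:subclass_of`, while RadLex properties such as `Has_Regional_Part` appear as `biolink:related_to`. Tallying `(predicate, relation)` pairs shows how the source relations were mapped. A minimal sketch with the standard `csv` module, assuming the tab-separated layout and the column names in the header line above:

```python
import csv
from collections import Counter


def count_predicate_relations(lines):
    """Tally (predicate, relation) pairs in a KGX-style edges TSV.

    `lines` is any iterable of text lines, such as an open file.
    QUOTE_NONE stops stray quote characters in fields from being parsed.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter((row["predicate"], row["relation"]) for row in reader)


# Usage:
# with open("merged-kg_edges.tsv", newline="") as f:
#     for (predicate, relation), n in count_predicate_relations(f).most_common(10):
#         print(n, predicate, relation)
```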

In [7]:
!head merged-kg_nodes.tsv

id	provided_by
http://bioportal.bioontology.org/ontologies/umls/hasSTY	ATC_17_nodes
http://purl.bioontology.org/ontology/STY/T099	ATC_17_nodes
http://purl.bioontology.org/ontology/UATC/A09AA01	ATC_17_nodes
http://purl.bioontology.org/ontology/UATC/A11EA	ATC_17_nodes
http://purl.bioontology.org/ontology/UATC/ATC_LEVEL	ATC_17_nodes
http://purl.bioontology.org/ontology/UATC/C03XA	ATC_17_nodes
http://purl.bioontology.org/ontology/UATC/C10AX10	ATC_17_nodes
http://purl.bioontology.org/ontology/UATC/D03BA03	ATC_17_nodes
http://purl.bioontology.org/ontology/UATC/D05AD02	ATC_17_nodes
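The node IDs are still full IRIs. Grouping them by namespace (everything up to the last `/` or `#`) gives a rough list of the namespaces a CURIE prefix map would need to cover. This is a heuristic sketch: the namespace split is an assumption and will not match every ontology's IRI scheme.

```python
import csv
from collections import Counter


def iri_namespace(iri):
    """Heuristic: return the IRI up to and including its last '/' or '#'."""
    cut = max(iri.rfind("/"), iri.rfind("#"))
    return iri[: cut + 1] if cut >= 0 else iri


def count_namespaces(lines):
    """Count node IDs per namespace in a nodes TSV that has an 'id' column."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(iri_namespace(row["id"]) for row in reader)


# Usage:
# with open("merged-kg_nodes.tsv", newline="") as f:
#     print(count_namespaces(f).most_common(20))
```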


These IRIs were not compressed into CURIEs as expected (the transform is not using the expected prefix set), but we'll proceed anyway.
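One way to compress such IRIs into CURIEs is longest-namespace matching against a prefix map. The prefixes in `PREFIX_MAP` are hypothetical placeholders chosen for illustration; they are not necessarily the prefix set the KG-Bioportal transform expects.

```python
# Hypothetical prefix map for illustration only; the real transform may
# expect different prefixes for these namespaces.
PREFIX_MAP = {
    "RADLEX": "http://radlex.org/RID/",
    "ATC": "http://purl.bioontology.org/ontology/UATC/",
    "STY": "http://purl.bioontology.org/ontology/STY/",
}


def iri_to_curie(iri, prefix_map=PREFIX_MAP):
    """Compress an IRI to a CURIE using the longest matching namespace.

    IRIs that match no namespace are returned unchanged.
    """
    best = None
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return iri
    prefix, namespace = best
    return f"{prefix}:{iri[len(namespace):]}"
```

Preferring the longest namespace matters when one namespace is nested inside another, so that a more specific prefix wins over a general one.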

## Set up and run Plater on its own

## Set up and run Automat