# Querying an OWL ontology with SPARQL

We use owlready2 to load and query our ontology. Specifically, we query for therapy regimens that match the gene variants observed in cBioPortal mutation data.

<font color="red">

We implement this functionality as functions in `query_ontology.py`, with an emphasis on modularity and task decomposition. This notebook is included for demonstration purposes only.

</font>

### Import Modules

```python
import os
import re

import numpy as np
import pandas as pd
from owlready2 import *

from sample_patients import sample_patient_records
```

### Load Ontology

**Note:** Update the ontology path below to match your local file structure.

```python
local_ontology = "github/bmi-210-final-project/ontology/oncokb.owl"
onto = get_ontology(local_ontology).load()
```
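Because the ontology path is machine-specific, one option is to resolve it from an environment variable and fail early with a clear message when the file is missing. This is a minimal sketch: the `ONCOKB_OWL` variable and the `resolve_ontology_path` helper are hypothetical and not part of the project. It returns an absolute `pathlib.Path`, whose `as_uri()` gives the `file://` form that owlready2's `get_ontology` accepts.

```python
import os
from pathlib import Path

# Hypothetical default; adjust to your own checkout.
DEFAULT_ONTOLOGY = "github/bmi-210-final-project/ontology/oncokb.owl"


def resolve_ontology_path(env_var: str = "ONCOKB_OWL", default: str = DEFAULT_ONTOLOGY) -> Path:
    """Return an absolute path to the ontology file.

    The (hypothetical) environment variable takes precedence over the default.
    A FileNotFoundError is raised before owlready2 ever sees a bad path.
    """
    path = Path(os.environ.get(env_var, default)).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Ontology not found at {path}; set {env_var} or edit the default.")
    return path
```

Usage in this notebook would then be `onto = get_ontology(resolve_ontology_path().as_uri()).load()`.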

### Load cBioPortal Mutation Data

**Note:** Update the path to the cBioPortal mutation data to match your local file structure.

```python
# Sample mutation data for a subset of patients.
csv_path = "mutations.csv"
out_path = "mutations_1000.csv"
n_patients = 1000
sample_patient_records(csv_path, out_path)
```

```
Done
```
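The body of `sample_patients.sample_patient_records` is not shown in this notebook. As a rough illustration of what such a sampling step can do, the sketch below keeps every mutation row that belongs to the first `n_patients` distinct patients in file order, using only the standard library. `sample_first_patients` is a hypothetical stand-in, and the `patientId` column name is taken from the table below; the real helper may sample differently (for example, at random).

```python
import csv


def sample_first_patients(csv_path: str, out_path: str, n_patients: int = 1000) -> int:
    """Copy all rows for the first `n_patients` distinct patientIds to out_path.

    Hypothetical stand-in for sample_patient_records; returns the number of
    patients kept. Rows need not be grouped by patient.
    """
    kept = set()
    with open(csv_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            pid = row["patientId"]
            # Start tracking a new patient only while under the limit.
            if pid not in kept and len(kept) < n_patients:
                kept.add(pid)
            # Write every row of a kept patient, including later ones.
            if pid in kept:
                writer.writerow(row)
    return len(kept)
```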


```python
# Load subset of patient data
cBioPortal_mutations = pd.read_csv(out_path)
cBioPortal_mutations = cBioPortal_mutations.loc[:, ["patientId", "proteinChange", "entrezGeneId"]]
cBioPortal_mutations.head()
```

|   | patientId | proteinChange | entrezGeneId |
|---|-----------|---------------|--------------|
| 0 | Patient0001 | G12C | 3845 |
| 1 | Patient0001 | R216* | 324 |
| 2 | Patient0001 | R505C | 55294 |
| 3 | Patient0001 | E1286* | 324 |
| 4 | Patient0001 | R4822H | 58508 |


### Load Gene List

```python
# Load genes matching Entrez Gene ID
gene_list = pd.read_csv("CancerGeneList.tsv", sep="\t", usecols=[0, 1])
cBioPortal_mutations = cBioPortal_mutations.merge(
    gene_list, left_on="entrezGeneId", right_on="Entrez_Id", how="left"
).drop("Entrez_Id", axis=1)
cBioPortal_mutations.head()
```

|   | patientId | proteinChange | entrezGeneId | Gene_Symbol |
|---|-----------|---------------|--------------|-------------|
| 0 | Patient0001 | G12C | 3845 | KRAS |
| 1 | Patient0001 | R216* | 324 | APC |
| 2 | Patient0001 | R505C | 55294 | FBXW7 |
| 3 | Patient0001 | E1286* | 324 | APC |
| 4 | Patient0001 | R4822H | 58508 | KMT2C |
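Because the merge uses `how="left"`, every mutation row is kept even when its Entrez ID is absent from `CancerGeneList.tsv`; such rows get `NaN` in `Gene_Symbol` and cannot match any gene class in the ontology. The standard-library sketch below mimics that left-join behavior with a dictionary lookup so unmatched rows are easy to count. `attach_gene_symbols` is a hypothetical helper, and it uses `None` where pandas would use `NaN`.

```python
from typing import Dict, List, Optional, Tuple


def attach_gene_symbols(
    mutations: List[Dict[str, str]], entrez_to_symbol: Dict[int, str]
) -> Tuple[List[Dict[str, Optional[str]]], int]:
    """Left-join gene symbols onto mutation rows by Entrez ID.

    Rows whose ID has no symbol keep Gene_Symbol=None (pandas would use NaN).
    The second return value counts those unmatched rows.
    """
    joined = []
    unmatched = 0
    for row in mutations:
        symbol = entrez_to_symbol.get(int(row["entrezGeneId"]))
        if symbol is None:
            unmatched += 1
        # Copy the row and add the (possibly missing) symbol column.
        joined.append({**row, "Gene_Symbol": symbol})
    return joined, unmatched
```

In the pandas version, the equivalent check is `cBioPortal_mutations["Gene_Symbol"].isna().sum()`.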


### Generate and Run a SPARQL Query

```python
example_gene = "KRAS"
example_variant = "G12C"
evidence_level = 4

# Find therapy regimens that have a hasEvidenceLevel<N> restriction pointing at a
# biomarker class restricted to both the example gene and the example variant.
# Note: re.escape escapes regular-expression metacharacters, not SPARQL syntax;
# it leaves plain names such as "KRAS" and "G12C" unchanged.
example_regimen = list(
    default_world.sparql(
        f"""
        SELECT DISTINCT ?regimen
        {{
            ?biomarker rdfs:subClassOf oncokb:Biomarker.
            ?biomarker rdfs:subClassOf ?r1.
            ?r1 owl:onProperty oncokb:hasGene.
            ?r1 owl:someValuesFrom oncokb:{re.escape(example_gene)}.

            ?biomarker rdfs:subClassOf ?r2.
            ?r2 owl:onProperty oncokb:hasVariant.
            ?r2 owl:someValuesFrom oncokb:{re.escape(example_variant)}.

            ?regimen rdfs:subClassOf oncokb:TherapyRegimen.
            ?regimen rdfs:subClassOf ?r3.
            ?r3 owl:onProperty oncokb:hasEvidenceLevel{evidence_level}.
            ?r3 owl:someValuesFrom ?biomarker.
        }}
        """
    )
)

# Each result row is a list; keep the single selected column.
example_regimen = [regimen[0] for regimen in example_regimen]
print(example_regimen)
```

```
[oncokb.Trametinib, oncokb.Binimetinib, oncokb.Cobimetinib]
```
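The query above can be wrapped in a function so the same pattern runs for any gene, variant, and evidence level. This is a minimal sketch: `build_regimen_query` and `first_column` are hypothetical helpers, and the class and property names (`oncokb:Biomarker`, `oncokb:hasGene`, `oncokb:hasVariant`, `oncokb:TherapyRegimen`, `oncokb:hasEvidenceLevel<N>`) are copied from the query above. Instead of `re.escape`, which targets regular-expression syntax, it accepts only a conservative subset of SPARQL local names and raises `ValueError` for anything else (such as `R216*`); how such variants are actually named in the ontology is not shown here.

```python
import re
from typing import List, Sequence, Union

# Conservative subset of SPARQL prefixed-name local parts (an assumption).
_SAFE_LOCAL = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")


def _check(name: str) -> str:
    """Reject names that cannot be pasted directly after 'oncokb:'."""
    if not _SAFE_LOCAL.fullmatch(name):
        raise ValueError(f"{name!r} is not a plain local name; look up its IRI instead")
    return name


def build_regimen_query(gene: str, variant: str, evidence_level: Union[int, str]) -> str:
    """Return the SPARQL text used above for one gene/variant/evidence level."""
    gene, variant = _check(gene), _check(variant)
    level = _check(str(evidence_level))
    return f"""
    SELECT DISTINCT ?regimen
    {{
        ?biomarker rdfs:subClassOf oncokb:Biomarker.
        ?biomarker rdfs:subClassOf ?r1.
        ?r1 owl:onProperty oncokb:hasGene.
        ?r1 owl:someValuesFrom oncokb:{gene}.

        ?biomarker rdfs:subClassOf ?r2.
        ?r2 owl:onProperty oncokb:hasVariant.
        ?r2 owl:someValuesFrom oncokb:{variant}.

        ?regimen rdfs:subClassOf oncokb:TherapyRegimen.
        ?regimen rdfs:subClassOf ?r3.
        ?r3 owl:onProperty oncokb:hasEvidenceLevel{level}.
        ?r3 owl:someValuesFrom ?biomarker.
    }}
    """


def first_column(rows: Sequence[Sequence]) -> List:
    """Keep column 0 of each result row (owlready2 returns rows as lists)."""
    return [row[0] for row in rows]
```

With the ontology loaded, usage would look like `first_column(list(default_world.sparql(build_regimen_query("KRAS", "G12C", 4))))`.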


### Custom Command-Line Tool

Query for, and save, the therapy regimens associated with each mutated gene in a specific patient. As an example, we use "Patient1035", the 1000th patient in our cBioPortal mutation output file.

**Usage of command line tool:** `python query_therapy_regimen.py <ONTOLOGY_PATH> <MUTATION_TABLE_PATH> <GENE_LIST_PATH> <PATIENT_NAME> <REGIMEN_SAVE_PATH>`
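How `query_therapy_regimen.py` parses its arguments is not shown in this notebook. One way to declare the five positional arguments from the usage line is with `argparse`, sketched below; the destination names and help strings are assumptions, not the script's actual code.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Positional arguments mirroring the documented usage line (hypothetical parser)."""
    parser = argparse.ArgumentParser(
        prog="query_therapy_regimen.py",
        description="Query therapy regimens for one patient's mutations.",
    )
    parser.add_argument("ontology_path", help="Path to oncokb.owl")
    parser.add_argument("mutation_table_path", help="cBioPortal mutation CSV")
    parser.add_argument("gene_list_path", help="Tab-separated gene list with Entrez IDs")
    parser.add_argument("patient_name", help="Patient ID, e.g. Patient1035")
    parser.add_argument("regimen_save_path", help="Where to write the regimen CSV")
    # argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)
```

Positional arguments keep the call identical to the usage line; missing arguments make `argparse` print a usage message and exit.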

```python
!python query_therapy_regimen.py "github/bmi-210-final-project/ontology/oncokb.owl" "mutations.csv" "CancerGeneList.tsv" "Patient1035" "example_regimen_list.csv"
```

```
Patient 'Patient1035' found!
```
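The per-patient step the tool performs (look up each mutated gene and variant, then save the matching regimens) can be sketched independently of owlready2 by passing the query in as a callback. `regimens_for_patient` and the `query_fn(gene, variant)` callback are hypothetical; in practice the callback would run the SPARQL query shown earlier and return regimen names. The output columns are an assumption, not the format of `example_regimen_list.csv`.

```python
import csv
from typing import Callable, Iterable, List, Mapping


def regimens_for_patient(
    mutations: Iterable[Mapping[str, object]],
    patient_id: str,
    query_fn: Callable[[str, str], List[str]],
    out_path: str,
) -> int:
    """Query regimens for each (gene, variant) of one patient and save them as CSV.

    Returns the number of regimen rows written; raises KeyError if the
    patient has no mutation rows.
    """
    rows = [m for m in mutations if m["patientId"] == patient_id]
    if not rows:
        raise KeyError(f"Patient {patient_id!r} not found")
    written = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["patientId", "Gene_Symbol", "proteinChange", "regimen"])
        for m in rows:
            gene, variant = m.get("Gene_Symbol"), m["proteinChange"]
            # Skip None/NaN symbols (no match in the gene list): nothing to look up.
            if not isinstance(gene, str) or not gene:
                continue
            for regimen in query_fn(gene, variant):
                writer.writerow([patient_id, gene, variant, regimen])
                written += 1
    return written
```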
