


# OntoGPT Examples

### Install and import needed libraries

In [5]:
# %pip install ontogpt
# %pip install llm_gpt4all

import ontogpt
import os

### Set API keys

You need an OpenAI API key and a BioPortal API key to run the code in this notebook. The next cell reads both from the environment variables OPENAI_API_KEY and BIOPORTAL_API_KEY.

In [6]:
oai_key = os.getenv('OPENAI_API_KEY')
bioportal_key = os.getenv('BIOPORTAL_API_KEY')

# Use $name so IPython passes the variable's value, not the literal name
!runoak set-apikey -e openai $oai_key
!runoak set-apikey -e bioportal $bioportal_key
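
If either environment variable is unset, `os.getenv` returns `None` and the key that gets stored will not work. The following is a minimal sketch of a pre-flight check you could run before the cell above. The helper name `missing_api_keys` is hypothetical and is not part of OntoGPT or OAK.

```python
import os

# Variable names taken from the cell above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "BIOPORTAL_API_KEY")


def missing_api_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Set these environment variables before continuing:", ", ".join(missing))
else:
    print("Both API keys were found in the environment.")
```

Passing a plain dictionary as `env` makes it easy to try the check without touching your real environment.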


Make sure OntoGPT is installed properly by running the help command.

In [7]:
!ontogpt --help

Usage: ontogpt [OPTIONS] COMMAND [ARGS]...

  CLI for ontogpt.

  :param verbose: Verbosity while running. :param quiet: Boolean to be quiet
  or verbose.

Options:
  -v, --verbose
  -q, --quiet TEXT
  --cache-db TEXT        Path to sqlite database to cache prompt-completion
                         results
  --skip-annotator TEXT  Skip one or more annotators (e.g. --skip-annotator
                         gilda)
  --version              Show the version and exit.
  --help                 Show this message and exit.

Commands:
  answer                        Answer a set of questions defined in YAML.
  categorize-mappings           Categorize a collection of SSSOM mappings.
  clinical-notes                Create mock clinical notes.
  complete                      Prompt completion.
  convert                       Convert output format.
  convert-examples              Convert training examples from YAML.
  diagnose                      Diagnose a clinical case represented as...
  dump-

### Run OntoGPT against a predefined template

First, create a text file as input. Save the file to the current directory. 

In [8]:
content = 'One treatment for asthma is Albuterol (also known as salbutamol).'

with open('drug_info.txt', 'w') as file:
    file.write(content)

Now run `ontogpt extract` on the input file using an existing template, 'drug'. The command prints the results as YAML.

In [9]:
!ontogpt extract -t drug -i drug_info.txt

---
input_text: One treatment for asthma is Albuterol (also known as salbutamol).
raw_completion_output: |-
  disease: asthma
  drug: Albuterol (also known as salbutamol)
  mechanism_links: asthma; is treated by; Albuterol (also known as salbutamol)
prompt: |+
  Split the following piece of text into fields in the following format:

  subject: <the value for subject>
  predicate: <the value for predicate>
  object: <the value for object>


  Text:
  Albuterol (also known as salbutamol)

  ===

extracted_object:
  disease: MESH:D001249
  drug: drugbank:DB01001
  mechanism_links:
    - subject: MESH:D001249
      predicate: AUTO:%28not%20provided%29
      object: AUTO:%28not%20provided%29
    - predicate: AUTO:is%20treated%20by
    - subject: MESH:D000420
      predicate: AUTO:also%20known%20as
      object: MESH:D000420
named_entities:
  - id: MESH:D001249
    label: asthma
  - id: drugbank:DB01001
    label: Albuterol (also known as salbutamol)
  - id: AUTO:%28not%20provided%29
    lab
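
Several identifiers in the output above, such as `AUTO:%28not%20provided%29`, are percent-encoded text rather than ontology codes. As a small illustration, assuming the part after the colon is standard URL percent-encoding (which matches the values shown), the standard library can decode them back to readable strings. The helper name `readable_id` is hypothetical.

```python
from urllib.parse import unquote


def readable_id(curie):
    """Split a CURIE into (prefix, decoded local part)."""
    prefix, _, local = curie.partition(":")
    return prefix, unquote(local)


# Identifiers copied from the output above.
for curie in ["AUTO:%28not%20provided%29", "AUTO:is%20treated%20by", "MESH:D001249"]:
    print(readable_id(curie))
```

Decoding makes it easier to see which values were matched to an ontology term (for example the MESH ID) and which are just the model's own text carried through under the AUTO prefix.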

### Run OntoGPT against a custom schema

In the last example, OntoGPT returned data about the drug Albuterol according to the structure and annotations set forth in a predefined schema. 

You can also define your own schemas. Say you have text snippets that describe treatments for common conditions. You want to locate the RXNORM code(s) for each treatment. 

First, you need to define a custom schema. You can learn more about the format required here: https://monarch-initiative.github.io/ontogpt/custom/

The simple schema in the next cell defines a root class, TreatmentData, which has one attribute, treatments, whose values are members of the Treatment class. Members of the Treatment class are annotated with RXNORM codes through the bioportal:RXNORM annotator and the RXNORM ID prefix.

In [10]:
rxnorm_schema = """id: http://w3id.org/ontogpt/rxnorm_schema
name: rxnorm_schema
title: rxnorm Information Schema
description: >-
  A schema for extracting information related to common medical procedures via rxnorm.

license: https://creativecommons.org/publicdomain/zero/1.0/
prefixes:
  linkml: https://w3id.org/linkml/
  rxnorm: http://w3id.org/ontogpt/rxnorm_schema/
  SNOMED: http://snomed.info/sct
  MESH: http://mesh.com

default_prefix: rxnorm
default_range: string

imports:
  - linkml:types
  - core

classes:
  TreatmentData:
    tree_root: true
    attributes:
      treatments:
        description: semicolon-separated list of treatments for common conditions
        multivalued: true
        range: Treatment


  Treatment:
    is_a: NamedEntity
    annotations:
      annotators: bioportal:RXNORM
    id_prefixes:
      - RXNORM
"""

Next, save the template to the templates directory inside the OntoGPT installation.

In [11]:
# path to OntoGPT
ontogpt_dir = os.path.dirname(ontogpt.__file__)
# path to the templates directory
templates_dir = os.path.join(ontogpt_dir, 'templates')

# Give your schema a filename
filename = 'new_rxnorm_template.yaml'
new_template_path = os.path.join(templates_dir, filename)

# Write the content to a file 
with open(new_template_path, 'w') as file:
    file.write(rxnorm_schema)

# If you want to verify that the template was saved to the template dir, uncomment the next two lines. 
# files = os.listdir(templates_dir)
# print(files)

Then you need to generate a Pydantic version of your schema, converting .yaml to .py. 

In [12]:
# Navigate to the templates dir 
%cd $templates_dir

# Optionally, uncomment to check that you're in the templates dir and your yaml file has data in it
# %pwd
# %cat new_rxnorm_template.yaml

# Generate Pydantic version
!gen-pydantic --pydantic-version 2 new_rxnorm_template.yaml > new_rxnorm_template.py

/Users/releach/miniconda3/lib/python3.11/site-packages/ontogpt/templates
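
If you prefer not to change the notebook's working directory, the same step can be sketched in plain Python with `subprocess`. This is only an illustration: the helper names `gen_pydantic_command` and `generate_pydantic` are hypothetical, and the command-line flags are the ones used in the cell above.

```python
import shutil
import subprocess
from pathlib import Path


def gen_pydantic_command(yaml_path):
    """Build the gen-pydantic argv and the .py output path for a template."""
    yaml_path = Path(yaml_path)
    argv = ["gen-pydantic", "--pydantic-version", "2", str(yaml_path)]
    return argv, yaml_path.with_suffix(".py")


def generate_pydantic(yaml_path):
    """Run gen-pydantic and write its stdout next to the YAML file."""
    argv, out_path = gen_pydantic_command(yaml_path)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("gen-pydantic is not on PATH")
    # check=True raises if the generator fails, so no output file is written.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    out_path.write_text(result.stdout)
    return out_path


# Example (requires gen-pydantic and an existing template file):
# generate_pydantic(new_template_path)
```

Unlike the shell redirect, this version writes the .py file only after the generator succeeds, so a failed run does not leave an empty module behind.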


Finally, you're ready to use the new template. You can use it with the drug_info.txt file we defined earlier: 

In [13]:
!ontogpt extract -t new_rxnorm_template -i drug_info.txt

---
input_text: One treatment for asthma is Albuterol (also known as salbutamol).
raw_completion_output: 'treatments: Albuterol; salbutamol'
prompt: |+
  From the text below, extract the following entities in the following format:

  treatments: <semicolon-separated list of treatments for common conditions>


  Text:
  One treatment for asthma is Albuterol (also known as salbutamol).

  ===

extracted_object:
  treatments:
    - RXNORM:435
    - AUTO:salbutamol
named_entities:
  - id: RXNORM:435
    label: Albuterol
  - id: AUTO:salbutamol
    label: salbutamol
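
In the result above, Albuterol received an RXNORM code while salbutamol came back as `AUTO:salbutamol`. Assuming the AUTO prefix marks a value that the annotator could not match to an RXNORM term, a short sketch like the following separates the two groups. The function name `split_by_grounding` is hypothetical.

```python
def split_by_grounding(ids, ungrounded_prefix="AUTO"):
    """Partition CURIEs into (grounded, ungrounded) lists by their prefix."""
    grounded, ungrounded = [], []
    for curie in ids:
        prefix = curie.split(":", 1)[0]
        if prefix == ungrounded_prefix:
            ungrounded.append(curie)
        else:
            grounded.append(curie)
    return grounded, ungrounded


# IDs copied from the extracted_object above.
grounded, ungrounded = split_by_grounding(["RXNORM:435", "AUTO:salbutamol"])
print("grounded:", grounded)
print("ungrounded:", ungrounded)
```

A list of ungrounded values is a useful starting point for checking whether a synonym or a different annotator would match them.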


Or you can experiment with different inputs. 

In [14]:
content = 'Treatments for depression include Wellbutrin, Zoloft, Lexapro, and Celexa.'

with open('depression_treatments.txt', 'w') as file:
    file.write(content)

!ontogpt extract -t new_rxnorm_template -i depression_treatments.txt 

---
input_text: Treatments for depression include Wellbutrin, Zoloft, Lexapro, and Celexa.
raw_completion_output: 'treatments: Wellbutrin; Zoloft; Lexapro; Celexa'
prompt: |+
  From the text below, extract the following entities in the following format:

  treatments: <semicolon-separated list of treatments for common conditions>


  Text:
  Treatments for depression include Wellbutrin, Zoloft, Lexapro, and Celexa.

  ===

extracted_object:
  treatments:
    - RXNORM:42568
    - RXNORM:82728
    - RXNORM:352741
    - RXNORM:215928
named_entities:
  - id: RXNORM:42568
    label: Wellbutrin
  - id: RXNORM:82728
    label: Zoloft
  - id: RXNORM:352741
    label: Lexapro
  - id: RXNORM:215928
    label: Celexa


### Create a custom schema with multiple classes

Here's another example of a custom schema, this time with multiple classes. This schema roughly mirrors the elements in the Healthwise Condition Basics content set, though several key ontologies are not available, so the experiment is limited.

In [17]:
condbasics = """
id: http://w3id.org/ontogpt/condition_basics
name: condition_basics_schema
title: Condition Basics Information Schema
description: >-
  A schema for extracting information related to medical conditions, symptoms, and treatments. 

license: https://creativecommons.org/publicdomain/zero/1.0/
prefixes:
  linkml: https://w3id.org/linkml/
  SNOMED: http://snomed.info/sct
  MESH: http://mesh.com
  RXNORM: http://purl.bioontology.org/ontology/RXNORM/
  ICD10CM: https://bioportal.bioontology.org/ontologies/ICD10CM

default_prefix: cbs
default_range: string

imports:
  - linkml:types
  - core

classes:
  ConditionBasicsData:
    tree_root: true
    attributes:
      medications:
        description: semicolon-separated list of common medications for the medical condition discussed in the text.  
        multivalued: true
        range:  Medication
      symptoms: 
        description: semicolon-separated list of symptoms commonly associated with the medical condition discussed in the text. 
        multivalued: true
        range: Symptom
      causes:
        description: semicolon-separated list of the causes of the medical condition discussed in the text.
        multivalued: true
        range: Cause
      


  Medication:
    is_a: NamedEntity
    annotations:
      annotators: bioportal:RXNORM
    id_prefixes:
      - RXNORM


  Symptom:
    is_a: NamedEntity
    annotations:
      annotators: bioportal:SNOMEDCT
    id_prefixes:
      - SNOMEDCT


  Cause:
    is_a: NamedEntity
    annotations:
      annotators: bioportal:SNOMEDCT
    id_prefixes:
      - SNOMEDCT
"""


# Assign a filename
filename = 'condbasics.yaml'
new_template_path = os.path.join(templates_dir, filename)

# Write the content to a file 
with open(new_template_path, 'w') as file:
    file.write(condbasics)

%cd $templates_dir
# %pwd
# %cat condbasics.yaml

# Generate Pydantic version
!gen-pydantic --pydantic-version 2 condbasics.yaml > condbasics.py

/Users/releach/miniconda3/lib/python3.11/site-packages/ontogpt/templates


In [19]:
content = """
What is cirrhosis?
Cirrhosis is a very serious condition in which scarring damages the liver. This scar tissue prevents the liver from working as it should. That can cause problems with blood clotting, which can lead to bleeding and bruising. Cirrhosis can also cause fluid buildup in the belly, jaundice, and severe bleeding in the digestive tract.
What causes it?
Cirrhosis can have many causes. Long-term, heavy use of alcohol can cause cirrhosis. So can chronic viral hepatitis. Other causes include autoimmune diseases, non-alcoholic steatohepatitis (NASH), blocked bile ducts in the liver, and certain diseases.
What are the symptoms?
You may not have symptoms in the early stages of cirrhosis. But as it gets worse, symptoms may include fatigue, yellowing of the skin (jaundice), small red spots and tiny lines on the skin, bruising easily, weight loss, itching, belly pain, and bleeding in the digestive tract.
How is cirrhosis treated?
Treatment may include medicines such as Diuretics, surgery, or lifestyle changes. Treatment can't cure cirrhosis. But it can sometimes prevent or delay more liver damage. To limit the damage to your liver and help control symptoms, you can make lifestyle changes. For example, don't drink alcohol. Limit sodium and fat. And avoid medicines that can harm your liver.
"""

with open('cirrhosis.txt', 'w') as file:
    file.write(content)

!ontogpt extract -t condbasics -i cirrhosis.txt 

---
input_text: |2

  What is cirrhosis?
  Cirrhosis is a very serious condition in which scarring damages the liver. This scar tissue prevents the liver from working as it should. That can cause problems with blood clotting, which can lead to bleeding and bruising. Cirrhosis can also cause fluid buildup in the belly, jaundice, and severe bleeding in the digestive tract.
  What causes it?
  Cirrhosis can have many causes. Long-term, heavy use of alcohol can cause cirrhosis. So can chronic viral hepatitis. Other causes include autoimmune diseases, non-alcoholic steatohepatitis (NASH), blocked bile ducts in the liver, and certain diseases.
  What are the symptoms?
  You may not have symptoms in the early stages of cirrhosis. But as it gets worse, symptoms may include fatigue, yellowing of the skin (jaundice), small red spots and tiny lines on the skin, bruising easily, weight loss, itching, belly pain, and bleeding in the digestive tract.
  How is cirrhosis treated?
  Treatment may inclu

TK: Grounding against a local model?