# OWL Ontology Exploration

This notebook explores OWL ontologies for biological taxonomy. It sets up the required libraries, looks for a downloadable copy of TaxMeOn, and then surveys alternative taxonomic ontologies.

TaxMeOn: https://www.ldf.fi/schema/taxmeon/index.html


In [None]:
# Install required packages if not already installed
import subprocess
import sys

# def install_package(package):
#     try:
#         __import__(package)
#     except ImportError:
#         print(f"Installing {package}...")
#         subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# # Install ontology-related packages
# packages = [
#     "owlready2",  # For working with OWL ontologies
#     "rdflib",     # For RDF/OWL processing
#     "requests",   # For downloading files
#     "lxml",       # For XML parsing
#     "matplotlib", # For visualization
#     "networkx",   # For graph analysis
#     "pandas"      # For data manipulation
# ]

# for package in packages:
#     install_package(package)


In [None]:
# Import libraries
import requests
import os
from pathlib import Path
import owlready2
from owlready2 import *
import rdflib
from rdflib import Graph, Namespace, URIRef, Literal
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from urllib.parse import urlparse

print("Libraries imported successfully!")
# print(f"OWLReady2 version: {owlready2.__spec__}")
print(f"RDFLib version: {rdflib.__version__}")


Libraries imported successfully!
RDFLib version: 7.2.1


## Searching for TaxMeOn OWL File

Let's search for TaxMeOn and explore various sources where taxonomic ontologies might be available.


In [None]:
# Define potential sources for taxonomic ontologies
ontology_sources = {
    "BioPortal": "https://bioportal.bioontology.org/",
    "OBO Foundry": "https://obofoundry.org/",
    "NCBO": "https://www.bioontology.org/",
    "GitHub": "https://github.com/",
    # Schema page linked at the top of this notebook
    "TaxMeOn Official": "https://www.ldf.fi/schema/taxmeon/index.html"
}

print("Potential sources for taxonomic ontologies:")
for source, url in ontology_sources.items():
    print(f"- {source}: {url}")

# Let's search for TaxMeOn in common ontology repositories
taxmeon_search_urls = [
    "https://bioportal.bioontology.org/search?q=TaxMeOn",
    "https://obofoundry.org/ontology/taxmeon",
    "https://github.com/search?q=TaxMeOn+ontology"
]

print("\nTaxMeOn search URLs:")
for url in taxmeon_search_urls:
    print(f"- {url}")


In [None]:
# Let's try to search for TaxMeOn using web requests
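def search_bioportal(query, api_key=None):
    """Search BioPortal for terms matching `query`; return parsed JSON or None"""
    # The BioPortal REST API expects an API key. BIOPORTAL_API_KEY is an
    # environment variable name chosen for this notebook, not a BioPortal convention.
    api_key = api_key or os.environ.get("BIOPORTAL_API_KEY", "")
    if not api_key:
        print("No BioPortal API key set; the request is likely to be rejected.")
    # BioPortal API endpoint
    url = "https://data.bioontology.org/search"
    params = {'q': query, 'apikey': api_key, 'format': 'json'}
    try:
        response = requests.get(url, params=params, timeout=10)
    except requests.RequestException as e:
        print(f"Error searching BioPortal: {e}")
        return None
    if response.status_code == 200:
        return response.json()
    print(f"BioPortal search failed with status code: {response.status_code}")
    return None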

# Search for TaxMeOn
print("Searching BioPortal for TaxMeOn...")
taxmeon_results = search_bioportal("TaxMeOn")
if taxmeon_results:
    print("BioPortal search results:")
    print(taxmeon_results)
else:
    print("No results from BioPortal or search failed")

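The raw JSON printed by the cell above is hard to scan. A minimal sketch of pulling out a few useful fields follows. It assumes the response is a dict with a `collection` list whose items carry `prefLabel`, `@id`, and `links.ontology`; that shape should be checked against an actual response. `summarize_search_results` is a hypothetical helper, and the sample data is hand-written rather than real BioPortal output.

```python
def summarize_search_results(results, limit=5):
    """Return (label, term IRI, ontology URL) tuples from a search response.

    Assumes the response shape described above; missing fields become None.
    """
    if not isinstance(results, dict):
        return []
    summary = []
    for item in results.get("collection", [])[:limit]:
        links = item.get("links") or {}
        summary.append(
            (item.get("prefLabel"), item.get("@id"), links.get("ontology"))
        )
    return summary


# Hand-written sample in the assumed shape (not real BioPortal output)
sample = {
    "collection": [
        {
            "prefLabel": "Example taxon",
            "@id": "http://example.org/term/1",
            "links": {"ontology": "http://example.org/ontologies/EX"},
        },
        {"prefLabel": "Term without links"},
    ]
}

for label, iri, ontology in summarize_search_results(sample):
    print(f"{label}: {iri} (ontology: {ontology})")
```

Passing `taxmeon_results` instead of `sample` would summarize a live response, and a failed search (`None`) simply yields an empty list.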

## Alternative Taxonomic Ontologies

Since TaxMeOn might not be readily available, let's look at some well-known, publicly accessible taxonomic resources. Only some of them are published as OWL; the others are databases that would need to be converted first:


In [None]:
# Well-known taxonomic ontologies that are publicly available
taxonomic_ontologies = {
    "NCBITaxon": {
        "name": "NCBI Taxonomy",
        "url": "https://bioportal.bioontology.org/ontologies/NCBITAXON",
        "description": "NCBI's taxonomic classification system",
        "owl_url": "http://purl.obolibrary.org/obo/ncbitaxon.owl"
    },
    "ITIS": {
        "name": "Integrated Taxonomic Information System",
        "url": "https://www.itis.gov/",
        "description": "Authoritative taxonomic information on plants, animals, fungi, and microbes",
        "owl_url": None  # May need to be converted
    },
    "GBIF": {
        "name": "Global Biodiversity Information Facility",
        "url": "https://www.gbif.org/",
        "description": "Global biodiversity data infrastructure",
        "owl_url": None  # May need to be converted
    },
    "Catalogue of Life": {
        "name": "Catalogue of Life",
        "url": "https://www.catalogueoflife.org/",
        "description": "Global species database",
        "owl_url": None  # May need to be converted
    },
    "TaxonConcept": {
        "name": "TaxonConcept Ontology",
        "url": "https://bioportal.bioontology.org/ontologies/TAXONCONCEPT",
        "description": "Ontology for taxonomic concepts",
        "owl_url": "http://purl.obolibrary.org/obo/taxonconcept.owl"
    }
}

print("Available taxonomic ontologies:")
for key, ontology in taxonomic_ontologies.items():
    print(f"\n{ontology['name']}:")
    print(f"  Description: {ontology['description']}")
    print(f"  URL: {ontology['url']}")
    if ontology['owl_url']:
        print(f"  OWL URL: {ontology['owl_url']}")
    else:
        print("  OWL URL: Not directly available (may need conversion)")

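Entries that have a direct `owl_url` can be turned into download targets by taking the last path segment of the URL as the local filename. The sketch below shows one way to do that with the standard library; `owl_download_targets` is a hypothetical helper name, and the example dict copies only two entries from the table above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def owl_download_targets(ontologies):
    """Map each ontology key to (owl_url, filename) when a direct OWL URL exists."""
    targets = {}
    for key, info in ontologies.items():
        owl_url = info.get("owl_url")
        if not owl_url:
            continue  # no direct OWL file; the source would need converting first
        # Use the last path segment of the URL as the local filename
        filename = PurePosixPath(urlparse(owl_url).path).name
        targets[key] = (owl_url, filename)
    return targets


# Two entries copied from the table above
example = {
    "NCBITaxon": {"owl_url": "http://purl.obolibrary.org/obo/ncbitaxon.owl"},
    "ITIS": {"owl_url": None},
}
print(owl_download_targets(example))
```

Called on the full `taxonomic_ontologies` dict, this would give the URL and filename pairs to pass to a download function.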

## Downloading and Loading OWL Ontologies

Let's create functions to download and load OWL ontologies, starting with NCBI Taxonomy as an example.


In [None]:
# Create a directory for storing downloaded ontologies
ontology_dir = Path("../data/ontologies")
# parents=True also creates ../data when it does not exist yet
ontology_dir.mkdir(parents=True, exist_ok=True)

def download_ontology(url, filename):
    """Download an ontology file from a URL, streaming it to disk"""
    filepath = ontology_dir / filename
    try:
        print(f"Downloading {filename} from {url}...")
        # Stream in chunks so large files (e.g. NCBI Taxonomy) are not held in memory
        with requests.get(url, stream=True, timeout=30) as response:
            response.raise_for_status()
            with open(filepath, 'wb') as f:
                for chunk in response.iter_content(chunk_size=1024 * 1024):
                    f.write(chunk)

        print(f"Downloaded {filename} ({filepath.stat().st_size} bytes)")
        return filepath
    except (requests.RequestException, OSError) as e:
        print(f"Error downloading {filename}: {e}")
        return None

def load_ontology_with_owlready2(filepath):
    """Load an ontology using OWLReady2"""
    try:
        print(f"Loading ontology with OWLReady2: {filepath}")
        onto = get_ontology(f"file://{filepath.absolute()}").load()
        print(f"Successfully loaded ontology: {onto.name}")
        return onto
    except Exception as e:
        print(f"Error loading ontology with OWLReady2: {e}")
        return None

def load_ontology_with_rdflib(filepath):
    """Load an ontology using RDFLib"""
    try:
        print(f"Loading ontology with RDFLib: {filepath}")
        g = Graph()
        g.parse(filepath)
        print(f"Successfully loaded ontology with RDFLib ({len(g)} triples)")
        return g
    except Exception as e:
        print(f"Error loading ontology with RDFLib: {e}")
        return None

print(f"Ontology directory: {ontology_dir.absolute()}")


In [None]:
# NCBI Taxonomy is a large ontology, so we'll exercise the download and
# load functions on a smaller taxonomic ontology first
test_ontology_url = "http://purl.obolibrary.org/obo/taxonconcept.owl"
test_filename = "taxonconcept.owl"

# Download the test ontology
filepath = download_ontology(test_ontology_url, test_filename)

if filepath and filepath.exists():
    print(f"\nOntology file downloaded successfully: {filepath}")
    print(f"File size: {filepath.stat().st_size} bytes")
    
    # Try loading with RDFLib first (usually more reliable for large files)
    rdf_graph = load_ontology_with_rdflib(filepath)
    
    if rdf_graph:
        print(f"\nRDFLib Graph loaded successfully!")
        print(f"Number of triples: {len(rdf_graph)}")
        
        # Show some basic statistics
        subjects = set()
        predicates = set()
        objects = set()
        
        for s, p, o in rdf_graph:
            subjects.add(s)
            predicates.add(p)
            objects.add(o)
        
        print(f"Number of unique subjects: {len(subjects)}")
        print(f"Number of unique predicates: {len(predicates)}")
        print(f"Number of unique objects: {len(objects)}")
        
        # Show some example triples
        print(f"\nFirst 5 triples:")
        for i, (s, p, o) in enumerate(rdf_graph):
            if i >= 5:
                break
            print(f"  {s} -> {p} -> {o}")
else:
    print("Failed to download the test ontology")

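Beyond triple counts, the interesting part of a taxonomic ontology is usually its class hierarchy. The sketch below works on plain `(child, parent)` pairs so that it runs without any ontology file. With the graph loaded above, such pairs could be obtained from `rdf_graph.subject_objects(RDFS.subClassOf)`. The helper names and the toy taxa are illustrative assumptions.

```python
from collections import defaultdict, deque


def build_children_index(subclass_pairs):
    """Index (child, parent) pairs as parent -> sorted list of children."""
    children = defaultdict(set)
    for child, parent in subclass_pairs:
        children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def descendants(children, root):
    """Return all classes below `root`, breadth-first, visiting each once."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guards against cycles and shared subclasses
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Toy (child, parent) pairs standing in for rdfs:subClassOf triples
pairs = [
    ("Aves", "Chordata"),
    ("Mammalia", "Chordata"),
    ("Passeriformes", "Aves"),
    ("Corvidae", "Passeriformes"),
]
index = build_children_index(pairs)
print(descendants(index, "Chordata"))
# -> ['Aves', 'Mammalia', 'Passeriformes', 'Corvidae']
```

A breadth-first walk lists taxa level by level, which matches how ranks are usually read, and the `seen` set keeps the walk finite even if the ontology contains a subclass cycle.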

## Searching for TaxMeOn in Academic Literature

The next cell builds search URLs for academic databases and code repositories. Open them in a browser to look for papers and downloads related to TaxMeOn.


In [None]:
# Build search URLs for TaxMeOn in academic literature and repositories
from urllib.parse import quote_plus

def search_academic_sources():
    """Print search URLs for TaxMeOn on various academic sources (no requests are sent)"""

    search_queries = [
        "TaxMeOn ontology taxonomic",
        "TaxMeOn OWL file download",
        "TaxMeOn taxonomic metadata ontology",
        "TaxMeOn life sciences ontology"
    ]

    academic_sources = {
        "Google Scholar": "https://scholar.google.com/scholar?q=",
        "PubMed": "https://pubmed.ncbi.nlm.nih.gov/?term=",
        "arXiv": "https://arxiv.org/search/?query=",
        "ResearchGate": "https://www.researchgate.net/search?q=",
        "GitHub": "https://github.com/search?q="
    }

    print("Search queries for TaxMeOn:")
    for query in search_queries:
        print(f"- {query}")

    print("\nAcademic sources to search:")
    for source, base_url in academic_sources.items():
        print(f"- {source}: {base_url}")
        for query in search_queries[:2]:  # Show first 2 queries as examples
            # quote_plus also escapes characters such as & and ?, not just spaces
            search_url = base_url + quote_plus(query)
            print(f"  Example: {search_url}")
        print()

search_academic_sources()


## Next Steps and Recommendations

Based on our search, here are the next steps to find and work with TaxMeOn:


In [None]:
# Recommendations for finding TaxMeOn
recommendations = [
    "1. Contact the original developers/authors of TaxMeOn directly",
    "2. Search academic databases (PubMed, Google Scholar) for papers mentioning TaxMeOn",
    "3. Check if TaxMeOn is available through institutional repositories",
    "4. Look for alternative taxonomic ontologies that serve similar purposes",
    "5. Consider creating a custom ontology based on your specific needs",
    "6. Explore the OBO Foundry for related taxonomic ontologies"
]

print("Recommendations for finding TaxMeOn:")
for rec in recommendations:
    print(rec)

print("\nAlternative approaches:")
print("- Use NCBI Taxonomy (well-established, publicly available)")
print("- Use ITIS data (can be converted to OWL)")
print("- Use GBIF taxonomic backbone")
print("- Create a custom ontology for your specific use case")

print("\nIf you find TaxMeOn, you can use the functions in this notebook to:")
print("- Download the OWL file")
print("- Load it with OWLReady2 or RDFLib")
print("- Inspect basic statistics and sample triples")

print("\nPossible follow-up work not covered by those functions:")
print("- Visualize the ontology graph")
print("- Query the ontology for specific taxonomic information")
