## Prerequisites

Before running this notebook, make sure you have done the following:

1. Map `localhost:27018` to the Mongo server you want to use
2. Update `.env` so that the Mongo settings, in particular `MONGO_HOST` and `MONGO_PORT`, match that mapping:

MONGO_HOST=localhost
MONGO_USERNAME=admin
MONGO_DBNAME=nmdc
MONGO_PORT=27018

3. Load a recent dump of the production Mongo database into that Mongo server (see `$ make mongorestore-nmdc-db` for an example)
4. From a GitHub clone of `nmdc-runtime`, rebuild the virtual environment and register a Jupyter kernel:
```
% deactivate
% rm -rf .venv
% python -m venv .venv
% source .venv/bin/activate
% pip install -r requirements/dev.txt
% pip install -r requirements/main.txt
% pip install jupyter
% python -m ipykernel install --user --name=nmdc-runtime --display-name "Python (nmdc-runtime)"
```
In the Jupyter notebook interface, select the kernel you created above, 'Python (nmdc-runtime)'.
Then run the cells below in order.
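
Before running the cells, it can help to sanity-check the settings from step 2. The following is a minimal sketch, not part of `nmdc-runtime`: `check_mongo_settings` is a hypothetical helper, and the list of required keys is simply the four names shown in the `.env` example above.

```python
import os

# Setting names taken from the .env example above.
REQUIRED_KEYS = ("MONGO_HOST", "MONGO_USERNAME", "MONGO_DBNAME", "MONGO_PORT")


def check_mongo_settings(env):
    """Return a list of human-readable problems found in a settings mapping."""
    problems = []
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is missing or empty")
    port = env.get("MONGO_PORT")
    if port and not str(port).isdigit():
        problems.append(f"MONGO_PORT is not a number: {port!r}")
    return problems


# In the notebook, pass os.environ after the .env file has been loaded.
for problem in check_mongo_settings(os.environ):
    print("WARNING:", problem)
```

An empty result only means the names are set; it does not confirm that the Mongo server is reachable on that host and port.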


In [None]:
from dotenv import dotenv_values, load_dotenv
import os

# Path to the repository's .env file, relative to this notebook
ENV_PATH = "../../.env"

# Read the variables defined in the same .env file that is loaded below
env_values = dotenv_values(ENV_PATH)

# Remove each .env variable from os.environ so stale values are not kept
for key in env_values.keys():
    os.environ.pop(key, None)

# Load environment variables from the .env file
load_dotenv(ENV_PATH)

# Confirm that MONGO_HOST and MONGO_PORT were loaded
print("MONGO_HOST:", os.getenv("MONGO_HOST"))
print("MONGO_PORT:", os.getenv("MONGO_PORT"))

In [2]:
import logging

# Create logger
logger = logging.getLogger("my_logger")
logger.setLevel(logging.DEBUG)  # Set level to show messages

# Create console handler with output format
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

# Avoid adding multiple handlers (prevents duplicate logs)
if not logger.hasHandlers():
    logger.addHandler(handler)

# Example usage
logger.info("This is an info message.")
logger.debug("This is a debug message.")
logger.warning("This is a warning message.")


INFO: This is an info message.
DEBUG: This is a debug message.


In [3]:
import sys
print(sys.executable)


/Users/SMoxon/Documents/src/nmdc-runtime/.venv/bin/python


In [4]:
# !pip install ontology-loader==v0.1.4-rc1
from ontology_loader.ontology_load_controller import OntologyLoaderController


def run_ontology_loader(source_ontology="envo", output_directory=None, generate_reports=True):
    """
    Run the OntologyLoaderController inside the Jupyter Notebook.

    Args:
        source_ontology (str): The ontology to load (default: "envo").
        output_directory (str or None): Directory to save reports. If None, uses
            "ontology_reports" under the current working directory.
        generate_reports (bool): Whether to generate reports.
    """
    if output_directory is None:
        output_directory = os.path.join(os.getcwd(), "ontology_reports")  # Save reports in current working dir

    print(f"Running Ontology Loader for ontology: {source_ontology}")

    print(os.getenv("MONGO_HOST"))
    print(os.getenv("MONGO_PORT"))
    loader = OntologyLoaderController(
        source_ontology=source_ontology,
        output_directory=output_directory,
        generate_reports=generate_reports
    )


    try:
        loader.run_ontology_loader()
        print("Ontology load completed successfully!")
    except Exception as e:
        print(f"Error running ontology loader: {e}")

# Run the ontology loader inside the notebook
run_ontology_loader()


2025-03-06 11:05:34,744 - INFO - Processing ontology: envo
2025-03-06 11:05:34,745 - INFO - Preparing ontology: envo
2025-03-06 11:05:34,747 - INFO - Removing existing pystow directory for envo: /Users/SMoxon/.data/envo
2025-03-06 11:05:34,752 - INFO - downloading with urllib from https://s3.amazonaws.com/bbop-sqlite/envo.db.gz to /Users/SMoxon/.data/envo/envo.db.gz/envo.db.gz


Running Ontology Loader for ontology: envo
localhost
27018


Downloading envo.db.gz: 0.00B [00:00, ?B/s]

2025-03-06 11:05:50,434 - INFO - Extracting /Users/SMoxon/.data/envo/envo.db.gz/envo.db.gz to /Users/SMoxon/.data/envo/envo.db.gz/envo.db...
2025-03-06 11:05:50,694 - INFO - Ontology database is ready at: /Users/SMoxon/.data/envo/envo.db.gz/envo.db
2025-03-06 11:05:50,748 - INFO - Locator: /Users/SMoxon/.data/envo/envo.db.gz/envo.db
2025-03-06 11:05:50,748 - INFO - Locator, post-processed: sqlite:////Users/SMoxon/.data/envo/envo.db.gz/envo.db
2025-03-06 11:05:50,759 - INFO - Precomputing lookups
2025-03-06 11:05:51,178 - INFO - Query: SELECT node.id AS node_id 
FROM node 
WHERE (node.id NOT IN (SELECT deprecated_node.id 
FROM deprecated_node))
2025-03-06 11:05:54,993 - INFO - Extracted 4066 ontology classes.
2025-03-06 11:05:54,995 - INFO - Query: SELECT node.id AS node_id 
FROM node 
WHERE (node.id NOT IN (SELECT deprecated_node.id 
FROM deprecated_node))
2025-03-06 11:06:00,830 - INFO - Extracted 32404 ontology relations.
2025-03-06 11:06:00,831 - INFO - MongoDB connection string: mo

['nmdc']


2025-03-06 11:06:12,610 - INFO - Finished upserting 4066 OntologyClass objects into MongoDB.
2025-03-06 11:06:12,611 - INFO - No metadata for ontology_class_set; no derivations
2025-03-06 11:06:12,621 - INFO - No obsolete ontology classes found. No relations deleted.
2025-03-06 11:06:13,694 - INFO - No metadata for ontology_relation_set; no derivations
2025-03-06 11:07:32,068 - INFO - Finished processing 32404 OntologyRelation objects. Inserted 32404 new relations.
2025-03-06 11:07:33,358 - INFO - Report generated: /Users/SMoxon/Documents/src/nmdc-runtime/docs/nb/ontology_reports/ontology_updates.tsv
2025-03-06 11:07:33,360 - INFO - Report generated: /Users/SMoxon/Documents/src/nmdc-runtime/docs/nb/ontology_reports/ontology_inserts.tsv
2025-03-06 11:07:33,361 - INFO - Processing complete. Data inserted into MongoDB.


Ontology load completed successfully!


In [7]:
!less /Users/SMoxon/Documents/src/nmdc-runtime/docs/nb/ontology_reports/ontology_inserts.tsv

id      id      type    name    description     alternative_identifiers alternative_names       definition      relations

In [5]:
!less /Users/SMoxon/Documents/src/nmdc-runtime/docs/nb/ontology_reports/ontology_updates.tsv

id      id      type    name    description     alternative_identifiers alternative_names       definition      relations
ENVO:00000000   ENVO:00000000   nmdc:OntologyClass                      []      ['geographic feature', 'macroscopic spatial feature']   An astrononmical body part which delimited by physical discontinuities with its surroundings.   []
ENVO:00000002   ENVO:00000002   nmdc:OntologyClass                      []      ['manmade feature', 'man-made feature', 'anthropogenic geographic feature']     An anthropogenic geographic feature is a geographic feature resulting from the influence of human beings on nature.     []
ENVO:00000004   ENVO:00000004   nmdc:OntologyClass                      []      ['prefecture', 'civil area', 'administrative region', 'protectorate', 'sheikdom', 'trade zone', 'administrative entity', 'neutral zone (political)', 'leased zone (government)', 'boundary region', 'free trade zone', 'administrative area', 'sultanate', 'governed place', 'district',
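
`less` is an interactive pager, so inside a notebook it can emit terminal control codes instead of clean text. As an alternative, a report can be previewed with the standard `csv` module. This is a sketch under assumptions: `preview_tsv` is a hypothetical helper, and the file is assumed to be tab-separated with a header row, as the output above suggests.

```python
import csv
from itertools import islice


def preview_tsv(path, limit=5):
    """Return the header and up to `limit` data rows of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # empty list if the file has no lines
        rows = list(islice(reader, limit))
    return header, rows


# Example call (the path is relative to the notebook's working directory):
# header, rows = preview_tsv("ontology_reports/ontology_updates.tsv")
```

A report that contains only the header line, such as the `ontology_inserts.tsv` output above, returns an empty `rows` list.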

In [9]:
from ontology_loader.mongodb_loader import MongoDBLoader
from ontology_loader.utils import load_yaml_from_package

nmdc_sv = load_yaml_from_package("nmdc_schema", "nmdc_materialized_patterns.yaml")
mdb = MongoDBLoader(schema_view=nmdc_sv)

2025-03-05 12:49:55,431 - INFO - MongoDB connection string: mongodb://admin:root@localhost:27018/nmdc?authSource=admin
2025-03-05 12:49:55,433 - INFO - Initializing databases
2025-03-05 12:49:55,434 - INFO - Attaching mongodb://admin:root@localhost:27018/nmdc?authSource=admin
2025-03-05 12:49:55,434 - INFO - Connected to MongoDB: <linkml_store.api.stores.mongodb.mongodb_database.MongoDBDatabase object at 0x126e00650>


In [11]:
print(mdb.client.handle)

mongodb://admin:root@localhost:27018/nmdc?authSource=admin
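
The connection string above includes the password in clear text. When sharing notebook output, one option is to print a redacted form and to check that the host and port match the `.env` settings. A minimal sketch using only the standard library; `redact_mongo_uri` is a hypothetical helper, not part of `ontology_loader`, and it assumes a single-host URI like the one shown.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_mongo_uri(uri):
    """Return the URI with any password replaced by '***'."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


uri = "mongodb://admin:root@localhost:27018/nmdc?authSource=admin"
print(redact_mongo_uri(uri))

# Host and port, for comparison with MONGO_HOST and MONGO_PORT
parts = urlsplit(uri)
print(parts.hostname, parts.port)
```

In the notebook, the same function could be applied to `mdb.client.handle` before printing it.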


In [22]:
database = mdb.client.get_database()
ontology_class_set = database.get_collection("ontology_class_set")
qr = ontology_class_set.find()
# Note: rows[1] is the second row returned (rows[0] is the first),
# and the same row is printed twice, once with a label and once without.
print("first ontology_class_set rows: ", qr.rows[1])
print(qr.rows[1])

ontology_relation_set = database.get_collection("ontology_relation_set")
qr = ontology_relation_set.find()
print("number of ontology_relation_set rows: ", qr.num_rows)
# As above, rows[1] is the second row returned.
print("first ontology_relation_set rows: ", qr.rows[1])


first ontology_class_set rows:  {'id': 'ENVO:00000001', 'type': 'nmdc:OntologyClass', 'name': None, 'description': None, 'alternative_identifiers': [], 'alternative_names': ['bedding-plane cave'], 'definition': 'A cavity developed along a bedding-plane and elongate in cross-section as a result.', 'relations': []}
{'id': 'ENVO:00000001', 'type': 'nmdc:OntologyClass', 'name': None, 'description': None, 'alternative_identifiers': [], 'alternative_names': ['bedding-plane cave'], 'definition': 'A cavity developed along a bedding-plane and elongate in cross-section as a result.', 'relations': []}
number of ontology_relation_set rows:  1852298
first ontology_relation_set rows:  {'type': 'nmdc:OntologyRelation', 'subject': 'ENVO:00000000', 'predicate': 'is_a', 'object': 'ENVO:01000813'}
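
With this many relation rows, a per-predicate tally is often more informative than a single row. The following is a sketch, assuming each row is a plain dict with a `predicate` key as in the output above. `count_predicates` is a hypothetical helper, and the second and third sample rows are made up for illustration.

```python
from collections import Counter


def count_predicates(rows):
    """Count how many relation rows use each predicate."""
    return Counter(row.get("predicate") for row in rows)


sample_rows = [
    # Row copied from the output above.
    {"type": "nmdc:OntologyRelation", "subject": "ENVO:00000000",
     "predicate": "is_a", "object": "ENVO:01000813"},
    # Hypothetical rows, for illustration only.
    {"type": "nmdc:OntologyRelation", "subject": "EX:1",
     "predicate": "is_a", "object": "EX:2"},
    {"type": "nmdc:OntologyRelation", "subject": "EX:1",
     "predicate": "part_of", "object": "EX:3"},
]

for predicate, count in count_predicates(sample_rows).most_common():
    print(predicate, count)
```

In the notebook, `count_predicates(qr.rows)` would apply the same tally to the query result, assuming `qr.rows` is a list of dicts as the printed rows suggest.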
