# 📄 ODF HTML Context Loader
Explore and clean the content of the ODF Template HTML file and combine it with Neo4j graph context.

## Import Libraries

In [1]:
from bs4 import BeautifulSoup

## Define File Paths

In [2]:
# 📂 Path to the ODF HTML file
HTM_FILE_PATH = "../../../Data/html_files/introduction/working_with_files/ODF_Template_File_(ODT).htm"
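Because the path is relative, it only resolves when the notebook runs from the expected working directory. A minimal sketch of an early check, assuming the standard-library `pathlib` is acceptable here; `htm_path` is a hypothetical name that does not appear in the original notebook:

```python
from pathlib import Path

# Same relative path as defined in the cell above
HTM_FILE_PATH = "../../../Data/html_files/introduction/working_with_files/ODF_Template_File_(ODT).htm"

# htm_path is a hypothetical helper variable, used only for this check
htm_path = Path(HTM_FILE_PATH)

# Report the absolute location the relative path points to when the file is missing
if not htm_path.is_file():
    print(f"File not found: {htm_path.resolve()}")
```

Printing the resolved path makes a wrong working directory easy to spot before the loading cell runs.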

## Load and Clean HTML Content

In [3]:
# 📥 Load and clean HTML content
try:
    with open(HTM_FILE_PATH, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
        raw_text = soup.get_text(separator="\n").strip()
        lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
        
        # ❌ Remove known boilerplate lines (exact match on the stripped line)
        remove_phrases = {
            "Click here to see this page in full context",
            "*Maximize screen to view table of contents*",
            "Back", "Forward",
            "ODF Template File (ODT)"  # Page title; may appear at the top and again later
        }
        clean_lines = [line for line in lines if line not in remove_phrases]

        # Join cleaned lines
        odf_text = "\n".join(clean_lines)
except Exception as e:
    print(f"⚠️ Failed to load or parse HTML file: {e}")
    odf_text = ""
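The filtering step in the cell above can be isolated and checked without the HTML file. The following is a sketch only: `filter_boilerplate` and the sample string are hypothetical and are not part of the original notebook. It applies the same rule as the cell: strip each line, drop empty lines, and drop lines that exactly match a known phrase.

```python
def filter_boilerplate(raw_text, remove_phrases):
    """Drop blank lines and lines that exactly match a boilerplate phrase.

    Hypothetical helper mirroring the cleaning logic of the cell above.
    """
    stripped = (line.strip() for line in raw_text.splitlines())
    return "\n".join(
        line for line in stripped if line and line not in remove_phrases
    )


# Made-up sample text, standing in for soup.get_text(separator="\n")
sample = "Back\n  ODF Template File (ODT)  \n\nA template (ODT) can be created.\nForward\n"
cleaned = filter_boilerplate(sample, {"Back", "Forward", "ODF Template File (ODT)"})
print(cleaned)
```

Matching is exact, so a line that merely contains a phrase (for example a sentence beginning with "Back") is kept. That is a deliberate trade-off: it avoids deleting real content, at the cost of missing boilerplate that varies in wording.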

## Visualise Sample Extracts

In [4]:
# 📋 Preview cleaned text
print(odf_text[:1000])  # Preview first 1000 characters

A template (ODT) can be created from fresh or from an existing ODF file. This means once a preferred format has been created it can be saved as a template and utilized thereafter.
Unlike a View file (a file with VEW extension), which only saves the track layout, the template function saves all the ODF constituents in a template format. This ensures that preferred components, such as library files (i.e. lithology, headers with included location maps, symbols, modifiers) are incorporated into the template. Any plot objects in the ODF will also be incorporated into the template saved.
Once an ODT file has been created, it may be distributed to other users.
The ODT file is a powerful tool when a final log format has been approved. The ODT file will always contain library information (headers, lithology, modifiers, structures, and symbols), view file contents (track layout information, depth and screen units, scale and pen information (optional)) and ini file settings (curve defaults, compu
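
The preview above stops mid-word because the slice cuts at exactly 1000 characters. A small sketch of a preview that ends on a word boundary instead; `preview` is a hypothetical helper, not part of the original notebook, and the ` [...]` marker is an arbitrary choice:

```python
def preview(text, limit=1000):
    """Return at most `limit` characters of text, cut at a whitespace boundary.

    Hypothetical helper; appends " [...]" when the text was shortened.
    """
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Trim a partial trailing word only when the cut falls inside a word
    if not text[limit].isspace():
        boundary = max(cut.rfind(" "), cut.rfind("\n"))
        if boundary > 0:
            cut = cut[:boundary]
    return cut.rstrip() + " [...]"


print(preview("alpha beta gamma", limit=12))
```

In the notebook this would be called as `print(preview(odf_text))` in place of the slice.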