# 🔮 cellspell: XPath Spell (Colab)

Query XML (and HTML) files using XPath expressions, powered by `xmllint`.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sreent/jupyter-query-magics/blob/main/examples/colab_xpath.ipynb)

## Setup

In [None]:
!apt-get install -y libxml2-utils -qq
!pip install cellspell -q

In [None]:
%load_ext cellspell.xpath
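Before running the magics, you may want to confirm that the `xmllint` binary installed above is actually on the `PATH`. The sketch below uses only the standard library. `xmllint_version` is a hypothetical helper for illustration and is not part of cellspell.

```python
import shutil
import subprocess
from typing import Optional


def xmllint_version() -> Optional[str]:
    """Return xmllint's version banner, or None if the binary is not on PATH.

    Hypothetical helper for illustration; not part of cellspell.
    """
    path = shutil.which("xmllint")
    if path is None:
        return None
    # xmllint has traditionally written its version banner to stderr,
    # so fall back to stdout only if stderr is empty.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


print(xmllint_version() or "xmllint not found; rerun the apt-get cell above")
```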

## Create Sample XML Files

In [None]:
%%writefile books.xml
<?xml version="1.0"?>
<bookstore>
    <book category="fiction">
        <title lang="en">The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price>10.99</price>
    </book>
    <book category="tech">
        <title lang="en">Python Cookbook</title>
        <author>David Beazley</author>
        <year>2013</year>
        <price>39.99</price>
    </book>
    <book category="tech">
        <title lang="th">Database Systems</title>
        <author>Ramez Elmasri</author>
        <year>2015</year>
        <price>45.00</price>
    </book>
    <book category="fiction">
        <title lang="en">1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price>8.99</price>
    </book>
</bookstore>

## Validate & Info

In [None]:
# Check well-formedness
%xpath books.xml

In [None]:
# Show xmllint version
%xpath
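The well-formedness check above relies on `xmllint`. For comparison, the same check can be reproduced in plain Python with the standard library's `xml.etree.ElementTree`, which raises `ParseError` on malformed input. This is a minimal sketch: `is_well_formed` is a hypothetical helper, and ElementTree's error messages will not match xmllint's wording.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ElementTree reports the first error with a line/column position.
        print(f"Not well-formed: {err}")
        return False
    return True


print(is_well_formed("<bookstore><book/></bookstore>"))  # → True
print(is_well_formed("<bookstore><book></bookstore>"))   # prints the parse error, then False
```

In the notebook you could read the file first, e.g. `is_well_formed(open("books.xml").read())`.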

## Basic Queries

In [None]:
# Get all book titles
%%xpath books.xml
//book/title/text()

In [None]:
# Get tech book titles
%%xpath books.xml
//book[@category='tech']/title/text()

In [None]:
# Books over $30
%%xpath books.xml
//book[price > 30]/title/text()

In [None]:
# Count books
%%xpath books.xml
count(//book)

In [None]:
# Get all authors
%%xpath books.xml
//book/author/text()

In [None]:
# Get titles tagged as Thai (lang='th')
%%xpath books.xml
//book/title[@lang='th']/text()
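To cross-check the queries above without `xmllint`, they can be mirrored with `xml.etree.ElementTree`. Note that ElementTree implements only a limited XPath subset: attribute predicates such as `[@category='tech']` work, but `text()`, `count()` and numeric comparisons like `price > 30` do not, so those parts are done in Python. The sketch uses a trimmed inline copy of `books.xml` (authors and years omitted) so it runs on its own; in the notebook, `ET.parse("books.xml").getroot()` would load the full file instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of books.xml so the sketch is self-contained.
BOOKS = """<bookstore>
  <book category="fiction"><title lang="en">The Great Gatsby</title><price>10.99</price></book>
  <book category="tech"><title lang="en">Python Cookbook</title><price>39.99</price></book>
  <book category="tech"><title lang="th">Database Systems</title><price>45.00</price></book>
  <book category="fiction"><title lang="en">1984</title><price>8.99</price></book>
</bookstore>"""

root = ET.fromstring(BOOKS)

# //book/title/text()
all_titles = [t.text for t in root.findall(".//book/title")]

# //book[@category='tech']/title/text()  (attribute predicates are supported)
tech_titles = [t.text for t in root.findall(".//book[@category='tech']/title")]

# //book[price > 30]/title/text()  (numeric comparison done in Python)
pricey_titles = [
    b.findtext("title")
    for b in root.findall(".//book")
    if float(b.findtext("price")) > 30
]

# count(//book)
book_count = len(root.findall(".//book"))

# //book/title[@lang='th']/text()
thai_titles = [t.text for t in root.findall(".//book/title[@lang='th']")]

print(all_titles, tech_titles, pricey_titles, book_count, thai_titles, sep="\n")
```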

## Formatted XML Output

In [None]:
%%xpath --format books.xml
//book[@category='tech']

In [None]:
%%xpath --format books.xml
//book[year < 1950]
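A rough standard-library analogue of `--format` is to select the matching elements and re-indent them with `ET.indent` (available since Python 3.9). This is a sketch only: the filter is done in Python because ElementTree does not support `year < 1950`, and the two-space indentation is a choice made here, not something taken from cellspell.

```python
import xml.etree.ElementTree as ET

# Compact two-book sample so the sketch is self-contained.
BOOKS = (
    '<bookstore>'
    '<book category="fiction"><title>1984</title><year>1949</year></book>'
    '<book category="tech"><title>Python Cookbook</title><year>2013</year></book>'
    '</bookstore>'
)

root = ET.fromstring(BOOKS)

# Rough equivalent of: %%xpath --format ... //book[year < 1950]
formatted = []
for book in root.findall("book"):
    if int(book.findtext("year")) < 1950:
        book.tail = None                 # tostring() would otherwise also emit trailing whitespace
        ET.indent(book, space="  ")      # re-indent the element's children in place
        formatted.append(ET.tostring(book, encoding="unicode"))

print("\n".join(formatted))
```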

## HTML Parsing

In [None]:
%%writefile page.html
<html>
<body>
    <nav>
        <a href="/home">Home</a>
        <a href="/about">About</a>
    </nav>
    <div class="content">
        <h1>Welcome</h1>
        <ul>
            <li class="item">Item 1</li>
            <li class="item">Item 2</li>
            <li class="item">Item 3</li>
        </ul>
        <a href="https://example.com">External Link</a>
    </div>
</body>
</html>

In [None]:
%%xpath --html page.html
//div[@class='content']//li/text()

In [None]:
%%xpath --html page.html
//a/@href
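For comparison, the `//a/@href` query can be mirrored with the standard library's `html.parser` module, which is event-driven and does not require well-formed markup. This is a minimal sketch: `LinkCollector` is a hypothetical class, not part of cellspell, and it uses an inline copy of the links from `page.html`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, roughly like //a/@href (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


PAGE = """<html><body>
<nav><a href="/home">Home</a><a href="/about">About</a></nav>
<div class="content"><a href="https://example.com">External Link</a></div>
</body></html>"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.hrefs)  # → ['/home', '/about', 'https://example.com']
```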

## Query Another File

In [None]:
%%writefile employees.xml
<?xml version="1.0"?>
<company>
    <department name="Engineering">
        <employee id="1">
            <name>Alice</name>
            <role>Senior Engineer</role>
            <salary>120000</salary>
        </employee>
        <employee id="2">
            <name>Bob</name>
            <role>DevOps</role>
            <salary>110000</salary>
        </employee>
    </department>
    <department name="Marketing">
        <employee id="3">
            <name>Charlie</name>
            <role>Content Lead</role>
            <salary>95000</salary>
        </employee>
    </department>
</company>

In [None]:
%%xpath employees.xml
//department[@name='Engineering']/employee/name/text()

In [None]:
%%xpath --format employees.xml
//employee[salary > 100000]

In [None]:
%%xpath employees.xml
count(//employee)
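Since the magic is powered by `xmllint`, a similar query can also be issued to the CLI directly from Python. The sketch below is an assumption about roughly what happens underneath, not cellspell's actual implementation: the hypothetical `run_xpath` helper pipes the XML to `xmllint --xpath EXPR -` (where `-` means read from stdin) and returns its output. With `check=True` it raises if xmllint exits with an error, which can also happen when the expression matches nothing.

```python
import shutil
import subprocess

# Trimmed copy of employees.xml so the sketch is self-contained.
EMPLOYEES = """<company>
  <department name="Engineering">
    <employee id="1"><name>Alice</name><salary>120000</salary></employee>
    <employee id="2"><name>Bob</name><salary>110000</salary></employee>
  </department>
  <department name="Marketing">
    <employee id="3"><name>Charlie</name><salary>95000</salary></employee>
  </department>
</company>"""


def run_xpath(xml_text: str, expr: str) -> str:
    """Evaluate expr against xml_text with the xmllint CLI (hypothetical helper)."""
    xmllint = shutil.which("xmllint")
    if xmllint is None:
        raise RuntimeError("xmllint not found; install libxml2-utils first")
    result = subprocess.run(
        [xmllint, "--xpath", expr, "-"],  # "-" tells xmllint to read the document from stdin
        input=xml_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if shutil.which("xmllint"):
    print(run_xpath(EMPLOYEES, "count(//employee)"))
    print(run_xpath(EMPLOYEES, "//employee[salary > 100000]/name/text()"))
```

How multiple text nodes are separated in the output may vary between libxml2 releases.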

## Access Results in Python

The last query result is stored in `_xpath`.

In [None]:
%%xpath books.xml
//book/title/text()

In [None]:
# Access the result as a Python string
print(type(_xpath))
print(_xpath)
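If `_xpath` is a string, as the `type()` check above should confirm, it often helps to turn a multi-result query into a Python list. The sketch below assumes one result per line; that is an assumption about the output format, since some older xmllint releases concatenate text nodes without a separator. `xpath_lines` is a hypothetical helper, and `sample` stands in for `_xpath` so the code runs outside the notebook.

```python
def xpath_lines(result: str) -> list[str]:
    """Split a newline-separated XPath result into stripped, non-empty items.

    Hypothetical helper; assumes one result per line.
    """
    return [line.strip() for line in result.splitlines() if line.strip()]


# Stand-in for _xpath so the sketch runs on its own.
sample = "The Great Gatsby\nPython Cookbook\nDatabase Systems\n1984\n"
titles = xpath_lines(sample)
print(len(titles), titles)  # → 4 ['The Great Gatsby', 'Python Cookbook', 'Database Systems', '1984']
```

In the notebook, you would call `xpath_lines(_xpath)` instead.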