# Selectors

Selectors are used to search for elements in `BeautifulSoup` objects. This tutorial demonstrates the simple selectors that form the core of `soupsavvy`.

## API

Every `soupsavvy` selector follows a consistent interface, providing an API to:
- Search for elements within `BeautifulSoup` objects.
- Check selectors for equality.
- Combine selectors to create more complex queries.

### Find

The `find` method searches for the first element that matches the selector. 

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
    <span class="title">Animal Farm</span>
    <p class="price">Price: $10</p>
    <p class="price">Price: $20</p>
    <p class="price">Price: $30</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = ClassSelector("price")
selector.find(element)

#### Strict

When no match is found, the behavior of the `find` method is controlled by the `strict` parameter:
- **`True`** - Raises a `TagNotFoundException`.
- **`False`** - Returns `None`.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy
from soupsavvy.exceptions import TagNotFoundException

soup = BeautifulSoup(
    """
    <span class="title">Animal Farm</span>
    <p>Hello World</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = ClassSelector("price")

print(f"NOT STRICT: {selector.find(element)}")

try:
    selector.find(element, strict=True)
except TagNotFoundException as e:
    print(f"STRICT: {e}")
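
The two modes suggest two calling styles. The sketch below mimics the contract in plain Python so that it runs without any dependencies. `find_first` and `NotFoundError` are hypothetical stand-ins made up for illustration; they are not `soupsavvy` names and this is not how the library is implemented.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for an exception such as TagNotFoundException."""


def find_first(items, predicate, strict=False):
    """Return the first item satisfying predicate; otherwise None or raise."""
    for item in items:
        if predicate(item):
            return item
    if strict:
        raise NotFoundError("no item matched the predicate")
    return None


classes = ["title", "description"]

# Non-strict: the caller has to handle a possible None.
result = find_first(classes, lambda name: name == "price")
non_strict_outcome = "missing" if result is None else result

# Strict: a missing match surfaces as an exception at the call site.
try:
    find_first(classes, lambda name: name == "price", strict=True)
    strict_outcome = "found"
except NotFoundError:
    strict_outcome = "raised"
```

Strict mode tends to suit required fields, where a missing element usually means the page layout changed. Non-strict mode suits optional fields.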

#### Recursive

The search depth is defined by the `recursive` parameter:
- **`True`** - Performs a recursive search on the element's descendants.
- **`False`** - Searches only within the direct children of the element.

This parameter also applies to the `find_all` method.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <span class="title">Animal Farm</span>
        <div class="inactive">
            <p class="price">Price: $10</p>
            <p class="price">Price: $20</p>
        </div>
        <p class="price">Price: $30</p>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)
selector = ClassSelector("price")
selector.find(element, recursive=False)
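
The difference between the two search depths can be sketched with the standard library alone. The example below reuses the markup from the cell above, wrapped in a hypothetical `<root>` element so that it parses as XML, and uses `xml.etree.ElementTree` instead of `soupsavvy`. It only illustrates the idea of "direct children" versus "all descendants".

```python
import xml.etree.ElementTree as ET

# Same markup as above, wrapped in a made-up <root> element.
root = ET.fromstring(
    """
    <root>
        <span class="title">Animal Farm</span>
        <div class="inactive">
            <p class="price">Price: $10</p>
            <p class="price">Price: $20</p>
        </div>
        <p class="price">Price: $30</p>
    </root>
    """.strip()
)


def has_price_class(node):
    """True when the node's class attribute contains the token 'price'."""
    return "price" in node.get("class", "").split()


# Children only: iterating an Element yields its direct children.
children_only = [node.text for node in root if has_price_class(node)]

# Descendants: Element.iter() walks the whole subtree in document order.
descendants = [node.text for node in root.iter() if has_price_class(node)]
```

Only the last paragraph is a direct child of the root, so the children-only search skips the two prices nested inside the `div`.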

### Find All

The `find_all` method returns every element that matches the selector.  
The returned list contains unique elements in the order they appear in the document.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
    <span>Hello World</span>
    <p class="price">Price: $10</p>
    <p class="price">Price: $20</p>
    <p class="price">Price: $30</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = ClassSelector("price")
selector.find_all(element)

#### Limit

You can restrict the number of elements returned by using the `limit` parameter:
- **`None`** - Returns all matching elements.
- **`int`** - Returns up to the specified number of matching elements.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
    <span>Hello World</span>
    <p class="price">Price: $10</p>
    <p class="price">Price: $20</p>
    <p class="price">Price: $30</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = ClassSelector("price")
selector.find_all(element, limit=2)
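
The combination of uniqueness, document order and `limit` can be pictured with a small plain-Python sketch. `unique_in_order` is a hypothetical helper written for this illustration, and the strings stand in for elements. It is not taken from `soupsavvy`.

```python
from itertools import islice


def unique_in_order(matches, limit=None):
    """Drop repeated matches, keep first-seen order, then apply the limit."""
    seen = set()

    def first_occurrences():
        for match in matches:
            if match not in seen:
                seen.add(match)
                yield match

    # islice(..., None) yields everything, mirroring limit=None.
    return list(islice(first_occurrences(), limit))


prices = ["$10", "$20", "$10", "$30"]
all_prices = unique_in_order(prices)
first_two = unique_in_order(prices, limit=2)
```

Because the limit is applied lazily, iteration stops as soon as enough unique matches have been collected.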

### Equality

All selectors can be compared. If two selectors are equal, their search results are always identical.

In [None]:
from soupsavvy import TypeSelector

print(f"{TypeSelector('p') == TypeSelector('div') = }")
print(f"{TypeSelector('p') == TypeSelector('p') = }")
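
One common way to get this kind of value-based equality in Python is a frozen dataclass, where two instances compare equal when their fields are equal. The class below is a hypothetical toy written for this illustration, not a `soupsavvy` class, and it searches a list of tag names rather than a document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagNameSelector:
    """Hypothetical selector used only for this sketch."""

    tag: str

    def find_all(self, names):
        # Keep every name equal to the configured tag.
        return [name for name in names if name == self.tag]


names = ["span", "p", "a", "p"]
first, second = TagNameSelector("p"), TagNameSelector("p")
other = TagNameSelector("div")

same_results = first.find_all(names) == second.find_all(names)
```

Here `first == second` holds because both were built with the same parameters, and for the same reason they return the same results on the same input.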

### Combining

Selectors can be combined in various ways to create composite selectors. Composite selectors are covered in the next tutorial.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
    <span>Hello World</span>
    <p class="price">Price: $10</p>
    <a class="price">Price: $20</a>
    <p class="price">Price: $30</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = ClassSelector("price") & TypeSelector("a")
selector.find(element)
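
The `&` syntax relies on Python operator overloading. The sketch below shows the general mechanism with a hypothetical `Predicate` class. It is an illustration of `__and__`, not a description of how `soupsavvy` builds composite selectors, and elements are modelled as simple `(tag, class)` pairs.

```python
class Predicate:
    """Hypothetical wrapper around a function that tests one element."""

    def __init__(self, check):
        self.check = check

    def __and__(self, other):
        # `a & b` builds a new Predicate that requires both checks to pass.
        return Predicate(lambda item: self.check(item) and other.check(item))

    def find(self, items):
        # Return the first item passing the check, or None when nothing does.
        return next((item for item in items if self.check(item)), None)


# Elements modelled as (tag, class) pairs, mirroring the markup above.
elements = [("span", None), ("p", "price"), ("a", "price"), ("p", "price")]

has_price_class = Predicate(lambda item: item[1] == "price")
is_anchor = Predicate(lambda item: item[0] == "a")

match = (has_price_class & is_anchor).find(elements)
```

The first `p` element has the right class but the wrong tag, so the combined predicate skips it and returns the anchor.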

## AttributeSelector

Attribute selectors in `soupsavvy` allow you to select elements based on their attribute values.  
For more information about the CSS counterpart, refer to [Mozilla](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors).

Find an element that has a specific attribute, regardless of the attribute's value.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
    <span>Animal Farm</span>
    <a href="/shop">Price: $20</a>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = AttributeSelector("href")
selector.find(element)

Find an element with an exact attribute value by passing a string.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
    <p class="price">Price: $20</p>
    <a role="main">Home</a>
    <a role="button">Add to Cart</a>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = AttributeSelector("role", value="button")
selector.find(element)

Find elements based on a regular expression pattern.

In [None]:
import re

from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <span>Animal Farm</span>
        <a href="https://www.fictiondb.com/title/animal-farm~george-orwell~161188.htm">fictiondb</a>
        <a href="https://search.worldcat.org/title/1056176764">worldcat</a>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = AttributeSelector("href", value=re.compile(r"worldcat\.org/.*/\d{10}"))
selector.find(element)
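
The pattern above is not anchored to the start of the string, so it can only match these URLs when it is applied with search semantics rather than from the first character. That is an inference from the example, not a statement about the library's internals. The behaviour of the pattern itself can be checked with the standard `re` module:

```python
import re

pattern = re.compile(r"worldcat\.org/.*/\d{10}")
hrefs = [
    "https://www.fictiondb.com/title/animal-farm~george-orwell~161188.htm",
    "https://search.worldcat.org/title/1056176764",
]

# re.search scans the whole string for the pattern.
searched = [bool(pattern.search(href)) for href in hrefs]

# re.match only tests from the first character, so "https://" gets in the way.
matched = [bool(pattern.match(href)) for href in hrefs]
```

Testing a pattern against a few sample values like this is a quick way to catch a missing anchor or an over-strict quantifier before using it in a selector.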

### Specific attribute selectors

The most commonly used attributes for selection have their own dedicated selectors:

- **`IdSelector`**: Matches elements by their `id` attribute value.
- **`ClassSelector`**: Matches elements by their `class` attribute value.

For more information about their CSS counterparts, refer to Mozilla for [Class](https://developer.mozilla.org/en-US/docs/Web/CSS/Class_selectors) and [ID](https://developer.mozilla.org/en-US/docs/Web/CSS/ID_selectors) selectors.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """<span class="title">Animal Farm</span><p class="price">Price: $20</p>""",
    features="lxml",
)
element = to_soupsavvy(soup)
price_selector = ClassSelector("price")
price_selector.find(element)

In [None]:
import re

from bs4 import BeautifulSoup

from soupsavvy import IdSelector, to_soupsavvy

soup = BeautifulSoup(
    """<p id="12ghj8">Book</p><p id="13cji0" class="price">Price: $20</p>""",
    features="lxml",
)
element = to_soupsavvy(soup)
price_selector = IdSelector(re.compile(r"^13.*0$"))
price_selector.find(element)

## TypeSelector

`TypeSelector` is used to select elements based on their tag name. For more information about its CSS counterpart, refer to [Mozilla](https://developer.mozilla.org/en-US/docs/Web/CSS/Type_selectors).

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <span class="title">Animal Farm</span>
        <p class="price">Price: $10</p>
        <span>Hello World</span>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
price_selector = TypeSelector("p")
price_selector.find(element)

## UniversalSelector

`UniversalSelector` is a wildcard selector that matches any tag.  
Its CSS counterpart is `*`. For more information, refer to [Mozilla](https://developer.mozilla.org/en-US/docs/Web/CSS/Universal_selectors).

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import UniversalSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)
any_selector = UniversalSelector()
any_selector.find(element)

## PatternSelector

`PatternSelector` is designed to select elements based on their text content.
While `BeautifulSoup` returns `NavigableString` for such queries, which is limiting, `PatternSelector` returns elements with text content that matches the provided pattern.

Find an element whose text content exactly matches a string.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import PatternSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = PatternSelector("Animal Farm")
selector.find(element)

Find elements based on a regular expression pattern.

In [None]:
import re

from bs4 import BeautifulSoup

from soupsavvy import PatternSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
selector = PatternSelector(re.compile(r"animal", re.IGNORECASE))
selector.find(element)
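
A case-insensitive pattern can match more text than expected. The sketch below applies the same pattern to the three text values with `re.search`. Search semantics are an assumption here. If the selector applies the pattern that way, the first element found would be the description paragraph rather than the title.

```python
import re

pattern = re.compile(r"animal", re.IGNORECASE)
texts = [
    "Rebelión en la granja",
    "Some animals are more equal than others",
    "Animal Farm",
]

# With IGNORECASE both "animals" and "Animal" are hits.
hits = [text for text in texts if pattern.search(text)]

# Without the flag only the lowercase occurrence is found.
case_sensitive_hits = [text for text in texts if re.search(r"animal", text)]
```

Anchoring the pattern, for example `^Animal`, is one way to narrow the match to text that starts with the word.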

## XPathSelector

The `XPathSelector` enables the use of XPath expressions to select elements, a feature not natively supported by `BeautifulSoup`. It relies on `lxml`, which must be installed. The XPath expression must target HTML elements in order to return valid results.

In [18]:
from bs4 import BeautifulSoup

from soupsavvy import XPathSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <span class="id">1234</span>
        <span class="title">Frankenstein</span>
        <p class="title">Wild Animal</p>
        <span class="title">Animal Farm</span>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)
selector = XPathSelector("//span[@class='title'][contains(text(),'Animal')]")
# selector.find(element)
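
To see what the expression selects without installing `lxml`, the sketch below approximates it with the standard library. `xml.etree.ElementTree` supports only a subset of XPath, so the `contains()` condition is applied in Python, and the markup is wrapped in a hypothetical `<root>` element so that it parses as XML. This is a rough equivalent for illustration, not the selector's implementation.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    """
    <root>
        <span class="id">1234</span>
        <span class="title">Frankenstein</span>
        <p class="title">Wild Animal</p>
        <span class="title">Animal Farm</span>
    </root>
    """.strip()
)

# ElementTree understands the tag and attribute steps of the expression...
title_spans = root.findall(".//span[@class='title']")

# ...but not contains(), so that condition is checked in Python.
animal_titles = [node.text for node in title_spans if "Animal" in (node.text or "")]
```

The paragraph "Wild Animal" contains the word but is not a `span`, so only one element satisfies both conditions.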

## ExpressionSelector

The `ExpressionSelector` allows you to define your own custom logic for selecting elements by providing a predicate function. This function evaluates each element and decides whether it should be included in the result set.

This works similarly to the `BeautifulSoup` API, where you can pass a predicate function to `find` methods:

```python
soup.find(lambda tag: tag.name == 'div')
```

In [1]:
from bs4 import BeautifulSoup

from soupsavvy import ExpressionSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <span class="id">1234</span>
        <p class="title">Animal Farm</p>
        <span class="title">Frankenstein</span>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)
selector = ExpressionSelector(
    lambda tag: tag.name != "p" and "title" in tag.get()["class"]
)
selector.find(element)

SoupElement(<span class="title">Frankenstein</span>)
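
A predicate is evaluated against every candidate element, including elements that lack the attribute it inspects, so a defensive lookup is usually safer than direct indexing. The sketch below expresses the same condition with the standard library's `xml.etree.ElementTree`. The extra `span` without a class and the `<root>` wrapper are additions made for this illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    """
    <root>
        <span>no class attribute</span>
        <span class="id">1234</span>
        <p class="title">Animal Farm</p>
        <span class="title">Frankenstein</span>
    </root>
    """.strip()
)


def is_non_paragraph_title(node):
    """True for any non-<p> element whose class list contains 'title'."""
    # get() with a default avoids an error when the attribute is missing.
    classes = node.get("class", "").split()
    return node.tag != "p" and "title" in classes


first_match = next(node for node in root.iter() if is_non_paragraph_title(node))
```

Naming the predicate instead of using a lambda also makes it easier to test on its own.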

## Conclusion

These fundamental selectors form the core of `soupsavvy` and provide the building blocks for more complex queries.  
Read about composite selectors [here](https://soupsavvy.readthedocs.io/en/stable/demos/combining.html).

**Enjoy `soupsavvy` and leave us feedback!**  
**Happy scraping!**