# Selectors

In this tutorial, we'll explore the functionality of various simple `soupsavvy` selectors, showcasing how they can be used to perform searches in `BeautifulSoup` objects.

## AttributeSelector

Attribute selectors select elements based on their attributes: either the presence of an attribute or its value. For more information about this type of selector, refer to [Mozilla](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors).

### Finding by attribute name

`AttributeSelector` can be used to find HTML elements with a specific attribute name. If an element contains the given attribute, it matches the selector, regardless of the attribute's value.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector

soup = BeautifulSoup(
    """<p id="12ghj8">Book</p><p class="price">Price: $20</p>""", features="lxml"
)
price_selector = AttributeSelector("class")
price_selector.find(soup)

### Finding by exact value

`AttributeSelector` can be used to match the exact value of a given attribute. When a string is passed as the `value` parameter, the selector only matches elements whose attribute is equal to this exact value.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector

soup = BeautifulSoup(
    """<span class="title">Animal Farm</span><p class="price">Price: $20</p>""",
    features="lxml",
)
price_selector = AttributeSelector("class", value="price")
price_selector.find(soup)

### Finding by regex

For more flexible searches, `AttributeSelector` can also match elements based on a regular expression pattern. By passing a compiled regex pattern, you can perform partial matches, complex text patterns, or other advanced text queries on the attribute value.

In [None]:
import re

from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector

soup = BeautifulSoup(
    """
        <a href="https://www.fictiondb.com/title/animal-farm~george-orwell~161188.htm">fictiondb</a>
        <a href="https://search.worldcat.org/title/1056176764">worldcat</a>
    """,
    features="lxml",
)
# Match links pointing to a worldcat.org page that ends in a 10-digit identifier
worldcat_selector = AttributeSelector(
    "href", value=re.compile(r"worldcat\.org/.*/\d{10}")
)
worldcat_selector.find(soup)
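
The pattern in this example covers only part of the `href` value. The following minimal sketch uses just the standard `re` module and assumes the selector applies the pattern with search semantics (as `BeautifulSoup` does for compiled patterns). It shows why the `worldcat` link matches, while a full-string match would not:

```python
import re

pattern = re.compile(r"worldcat\.org/.*/\d{10}")
hrefs = [
    "https://www.fictiondb.com/title/animal-farm~george-orwell~161188.htm",
    "https://search.worldcat.org/title/1056176764",
]

# search() looks for the pattern anywhere in the string (partial match)
partial = [href for href in hrefs if pattern.search(href)]
# fullmatch() requires the whole string to match the pattern
full = [href for href in hrefs if pattern.fullmatch(href)]

print(partial)
print(full)
```

To require a match on the whole attribute value, the pattern itself can be anchored with `^` and `$`.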

### Other functionalities

Let's explore some common functionalities available with the `AttributeSelector` and other selectors in `soupsavvy`. These functionalities behave consistently across all selector types, so the following examples apply universally.

#### Using `strict` mode

When no match is found, the behavior of the `AttributeSelector.find` method depends on the `strict` parameter. If `strict` is set to `True`, it raises a `TagNotFoundException`. If `strict` is set to `False` (the default), it returns `None`.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector
from soupsavvy.exceptions import TagNotFoundException

soup = BeautifulSoup(
    """<span class="title">Animal Farm</span><p class="how_much">Price: $20</p>""",
    features="lxml",
)
price_selector = AttributeSelector("class", value="price")

print(f"NOT STRICT: {price_selector.find(soup)}")

try:
    price_selector.find(soup, strict=True)
except TagNotFoundException as e:
    print(f"STRICT: {e}")

#### Finding all elements

The `find_all` method can be used to return all matching elements, similar to the `BeautifulSoup` `Tag.find_all` method. The elements in the result list are always unique and maintain the same order as they appear in the document.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector

soup = BeautifulSoup(
    """
        <span class="title">Animal Farm</span>
        <p class="price">Price: $10</p>
        <p class="price">Price: $20</p>
        <p class="price">Price: $30</p>
    """,
    features="lxml",
)
price_selector = AttributeSelector("class", value="price")
price_selector.find_all(soup)

#### Using `limit` option

When using the `find_all` method, the `limit` parameter can be used to restrict the number of elements returned. If `limit` is set to `None` (the default), all matching elements are returned. This functionality is derived from the `BeautifulSoup` `Tag.find_all` method.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector

soup = BeautifulSoup(
    """
        <span class="title">Animal Farm</span>
        <p class="price">Price: $10</p>
        <p class="price">Price: $20</p>
        <p class="price">Price: $30</p>
    """,
    features="lxml",
)
price_selector = AttributeSelector("class", value="price")
price_selector.find_all(soup, limit=2)

#### Using `recursive` option

Both the `find` and `find_all` methods have a `recursive` parameter. If `recursive` is set to `True` (the default), the search covers all descendants of the element, which for a `BeautifulSoup` object means the entire document. If `recursive` is set to `False`, the search is limited to the direct children of the element. This behavior is consistent with how the `recursive` parameter works in `BeautifulSoup`.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import AttributeSelector

soup = BeautifulSoup(
    """
        <span class="title">Animal Farm</span>
        <div>
            <p class="price">Price: $10</p>
            <p class="price">Price: $20</p>
        </div>
        <p class="price">Price: $30</p>
    """,
    features="html.parser",
)
price_selector = AttributeSelector("class", value="price")
price_selector.find(soup, recursive=False)
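
Since this option mirrors `BeautifulSoup`, the difference between the two modes can be sketched with plain `Tag.find_all` on the same markup. The variable names are illustrative only:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    """
        <span class="title">Animal Farm</span>
        <div>
            <p class="price">Price: $10</p>
            <p class="price">Price: $20</p>
        </div>
        <p class="price">Price: $30</p>
    """,
    features="html.parser",
)

# Default: every descendant is searched, including <p> elements nested in <div>
all_prices = soup.find_all("p", class_="price")
# recursive=False: only direct children of `soup` are considered
top_level_prices = soup.find_all("p", class_="price", recursive=False)

print([p.text for p in all_prices])
print([p.text for p in top_level_prices])
```

`html.parser` is used here because it does not wrap the markup in `<html>` and `<body>` elements, so the top-level elements remain direct children of `soup`.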

### Specific attribute selectors

`soupsavvy` offers convenience classes for selecting elements based on specific attributes, such as `id` and `class`. These classes simplify the selection process by pre-defining commonly used attribute names.

**Convenience Selectors:**

- **`IdSelector`**: Selector for matching elements by their `id` attribute value.

- **`ClassSelector`**: Selector for matching elements by their `class` attribute value.

Both are subclasses of `AttributeSelector`. By pre-assigning attribute names, they offer a more intuitive interface and reduce the amount of boilerplate code needed for common attribute-based searches.

For more information about these selectors, refer to the Mozilla documentation on [Class](https://developer.mozilla.org/en-US/docs/Web/CSS/Class_selectors) and [ID](https://developer.mozilla.org/en-US/docs/Web/CSS/ID_selectors) selectors.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector

soup = BeautifulSoup(
    """<span class="title">Animal Farm</span><p class="price">Price: $20</p>""",
    features="lxml",
)
price_selector = ClassSelector("price")
price_selector.find(soup)

In [None]:
import re

from bs4 import BeautifulSoup

from soupsavvy import IdSelector

soup = BeautifulSoup(
    """<p id="12ghj8">Book</p><p id="13cji0" class="price">Price: $20</p>""",
    features="lxml",
)
price_selector = IdSelector(re.compile(r"^13.*0$"))
price_selector.find(soup)

Because the attribute name is pre-assigned in each of these classes, the `name` parameter is skipped and only the value to match is passed.

## TypeSelector

`TypeSelector` is used to select elements based on their tag name. It selects all elements of the given type within a document. For more information about this selector, refer to [Mozilla](https://developer.mozilla.org/en-US/docs/Web/CSS/Type_selectors).

### Finding by tag name

`TypeSelector` can be used to find elements based solely on their tag name, for example to select all `<p>` elements in a document. This basic usage allows you to target elements without specifying any additional attributes.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import TypeSelector

soup = BeautifulSoup(
    """<p class="price">Price: $10</p><p class="price">Price: $20</p>""",
    features="lxml",
)
price_selector = TypeSelector("p")
price_selector.find(soup)

`TypeSelector` can be combined with `AttributeSelector`, or with any other selector, to create a composite selector and perform more complex searches. This is covered in the `Combining` tutorial.

## UniversalSelector

This selector is a wildcard selector that matches any tag. It can be used to select any element in the document. It is equivalent to using `*` in CSS selectors. For more information about this selector refer to [Mozilla](https://developer.mozilla.org/en-US/docs/Web/CSS/Universal_selectors).

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import UniversalSelector

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="html.parser",
)
any_selector = UniversalSelector()
any_selector.find(soup)

The `find_all` method returns all elements in the document. The search can be restricted to the direct children of the element by setting the `recursive` parameter to `False`.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import UniversalSelector

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="html.parser",
)
any_selector = UniversalSelector()
any_selector.find_all(soup)

## PatternSelector

`PatternSelector` is designed to select elements based on their text content. Unlike the standard `BeautifulSoup` implementation, which returns `NavigableString` when querying by text content, `PatternSelector` provides a more consistent and practical approach.

In `BeautifulSoup`, searching for elements by text content returns a `NavigableString` whenever a match is found, which can be cumbersome and inconsistent, especially when further searching or processing is needed. A `NavigableString` does not support additional searches or operations, making it less useful for many applications.

In [None]:
import re

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="lxml",
)
result = soup.find(string=re.compile("animal", re.IGNORECASE))
print(f"Result is of type {type(result)} : {result}")
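
With plain `BeautifulSoup`, getting from the matched string back to its element takes an extra step. A small sketch of that step, using the standard `.parent` attribute of `NavigableString`:

```python
import re

from bs4 import BeautifulSoup, NavigableString, Tag

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="html.parser",
)
text = soup.find(string=re.compile("animal", re.IGNORECASE))
# The match is the text node itself, not the element containing it
assert isinstance(text, NavigableString)

# The enclosing element has to be retrieved separately
element = text.parent
assert isinstance(element, Tag)
print(element.name, element["class"])
```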

`PatternSelector` addresses this limitation by returning a `Tag` object instead of `NavigableString`. This behavior aligns with other selectors and allows for more effective manipulation and querying of the text-containing elements.

### Finding by plain text

When given a plain string, `PatternSelector` matches elements whose text content is exactly equal to that string.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import PatternSelector

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="lxml",
)
animal_selector = PatternSelector("Animal Farm")
animal_selector.find(soup)

### Finding by regex

For more flexible searches, `PatternSelector` can also match elements based on a regular expression pattern. By passing a compiled regex pattern, you can perform partial matches, complex text patterns, or other advanced text queries.

In [None]:
import re

from bs4 import BeautifulSoup

from soupsavvy import PatternSelector

soup = BeautifulSoup(
    """
        <p class="title" lang="es">Rebelión en la granja</p>
        <p class="description" lang="en">Some animals are more equal than others</p>
        <span class="title" lang="en">Animal Farm</span>
    """,
    features="lxml",
)
animal_selector = PatternSelector(re.compile(r"animal", re.IGNORECASE))
animal_selector.find(soup)

## Summary

These fundamental selectors form the core of `soupsavvy` and provide the building blocks for more complex queries. In the next tutorials, we'll explore advanced options and composite selectors that build upon these basics to enable even more powerful and flexible searching capabilities.

**Enjoy `soupsavvy` and leave us feedback!**  
**Happy scraping!**