# About

## Motivation

While `BeautifulSoup` is great for parsing HTML, its search engine is limited and struggles with complex queries. Its `Tag` class couples node representation with search logic, which makes search behavior difficult to extend or customize.
`soupsavvy` addresses these limitations with a flexible, reusable selector system that integrates seamlessly with `BeautifulSoup` and can be tailored to user needs.

## Selector concept

With `BeautifulSoup`, a dictionary of search parameters can imitate the concept of a selector, but this approach is limited to simple queries such as matching tag names, attributes, and text.

```python
from bs4 import BeautifulSoup

PRICE_SELECTOR = {"name": "p", "attrs": {"class": "price"}}

soup = BeautifulSoup("""<p class="price">Price: $10</p>""", features="lxml")
soup.find(**PRICE_SELECTOR)
```
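The same dictionary can be reused with `find_all` to collect every match, and `BeautifulSoup` also accepts a function as a filter: it is called with each `Tag` and should return `True` for matches. A predicate like this extends matching to arbitrary conditions on a single tag, but relationships between tags (parent, child, sibling) still require manual traversal. The HTML and the `regular_price` function below are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<p class="price">Price: $10</p>
<p class="price sale">Price: $8</p>
<p class="note">Prices include tax</p>
"""
soup = BeautifulSoup(html, "html.parser")

PRICE_SELECTOR = {"name": "p", "attrs": {"class": "price"}}

# Reuse the dictionary with find_all to get every <p> whose classes include "price"
prices = soup.find_all(**PRICE_SELECTOR)


# Illustrative filter function: bs4 calls it with each Tag in the document.
# It matches <p class="price"> tags that do NOT also carry the "sale" class.
def regular_price(tag):
    classes = tag.get("class", [])
    return tag.name == "p" and "price" in classes and "sale" not in classes


regular = soup.find_all(regular_price)

print([p.get_text() for p in prices])
print([p.get_text() for p in regular])
```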

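Alternatively, a function can implement more complex search logic by chaining `BeautifulSoup` calls step by step. Each intermediate step needs its own `None` check, and the function returns only the first match.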

```python
from bs4 import BeautifulSoup, Tag

# CSS Selector equivalent: p.price > span ~ a.link


def select_sth(tag: Tag):
    # Find the first <p> tag with class "price"
    tag1 = tag.find("p", attrs={"class": "price"})

    if not isinstance(tag1, Tag):
        return None

    # Find the first <span> tag that is a direct child of the <p> tag
    tag2 = tag1.find("span", recursive=False)

    if not isinstance(tag2, Tag):
        return None

    # Find the next sibling <a> tag with class "link"
    return tag2.find_next_sibling("a", attrs={"class": "link"})


soup = BeautifulSoup(
    """
    <p class="price">
        <span>Price: $10</span>
        <a class="link"></a>
    </p>
    """,
    features="lxml",
)
select_sth(soup)
```
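One way to reduce the repeated `None` checks is to factor them into a small composition helper. The `chain` function below is hypothetical, not part of `BeautifulSoup` or `soupsavvy`; it is a minimal sketch that runs a list of steps in order and stops at the first miss.

```python
from typing import Callable, Optional

from bs4 import BeautifulSoup, Tag

# A step takes a Tag and returns the next Tag, or None when nothing matched
Step = Callable[[Tag], Optional[Tag]]


def chain(*steps: Step) -> Step:
    """Hypothetical helper: run steps in order, stopping at the first miss."""

    def run(tag: Tag) -> Optional[Tag]:
        current: Optional[Tag] = tag
        for step in steps:
            # Any non-Tag result (None or a text node) ends the search
            if not isinstance(current, Tag):
                return None
            current = step(current)
        return current if isinstance(current, Tag) else None

    return run


# Same logic as select_sth, expressed as a sequence of steps
select_link = chain(
    lambda t: t.find("p", attrs={"class": "price"}),
    lambda t: t.find("span", recursive=False),
    lambda t: t.find_next_sibling("a", attrs={"class": "link"}),
)

soup = BeautifulSoup(
    '<p class="price"><span>Price: $10</span><a class="link"></a></p>',
    "html.parser",
)
link = select_link(soup)
```

The helper removes the boilerplate but not the underlying limitation: each step is still hand-written traversal code, and the result is a single tag rather than all matches.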

`soupsavvy` encapsulates search logic in selector objects and offers a more structured, declarative way to define selectors, including composite ones that express intricate, layered criteria. Selectors in `soupsavvy` are designed to be intuitive and feature an API similar to that of `BeautifulSoup`.

```python
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector

soup = BeautifulSoup("""<p class="price">Price: $10</p>""", features="lxml")
# Equivalent of the PRICE_SELECTOR dictionary: a <p> tag with class "price"
selector = TypeSelector("p") & ClassSelector("price")
selector.find(soup)
```
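To illustrate how an operator such as `&` can combine two selectors, here is a toy sketch built on plain `BeautifulSoup` traversal. `ToySelector`, `type_selector`, and `class_selector` are hypothetical names, and this is not how `soupsavvy` is implemented; it only shows the idea of wrapping a per-tag predicate in an object and overloading `__and__` to intersect two of them.

```python
from typing import Callable, List, Optional

from bs4 import BeautifulSoup, Tag


class ToySelector:
    """Hypothetical toy selector wrapping a per-tag predicate (not soupsavvy's code)."""

    def __init__(self, predicate: Callable[[Tag], bool]) -> None:
        self.predicate = predicate

    def __and__(self, other: "ToySelector") -> "ToySelector":
        # Intersection: a tag matches only if it satisfies both selectors
        return ToySelector(lambda tag: self.predicate(tag) and other.predicate(tag))

    def find_all(self, root: Tag) -> List[Tag]:
        # find_all(True) returns every descendant Tag in document order
        return [tag for tag in root.find_all(True) if self.predicate(tag)]

    def find(self, root: Tag) -> Optional[Tag]:
        matches = self.find_all(root)
        return matches[0] if matches else None


def type_selector(name: str) -> ToySelector:
    return ToySelector(lambda tag: tag.name == name)


def class_selector(name: str) -> ToySelector:
    # bs4 parses the "class" attribute into a list of class names
    return ToySelector(lambda tag: name in tag.get("class", []))


soup = BeautifulSoup(
    '<div class="price">$5</div><p class="price">Price: $10</p>', "html.parser"
)
selector = type_selector("p") & class_selector("price")
match = selector.find(soup)
```

Because `type_selector("p") & class_selector("price")` is itself a `ToySelector`, it can be stored, reused, and combined again with `&`.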

```python
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector

soup = BeautifulSoup(
    """
    <p class="price">
        <span>Price: $10</span>
        <a class="link"></a>
    </p>
    """,
    features="lxml",
)
selector = ClassSelector("price") > TypeSelector("span") + ClassSelector("link")
selector.find(soup)
```
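Structural combinators can be sketched the same way, assuming (as in CSS) that `>` means "direct child" and `+` means "immediately following sibling". `ToyRelSelector`, `by_type`, `by_class`, and `previous_tag` are hypothetical and not part of `soupsavvy`; each combinator simply builds a predicate that checks the candidate tag and then its parent or previous sibling.

```python
from typing import Callable, Optional

from bs4 import BeautifulSoup, Tag

Predicate = Callable[[Tag], bool]


def previous_tag(tag: Tag) -> Optional[Tag]:
    """Return the nearest preceding sibling that is a Tag, skipping text nodes."""
    for sibling in tag.previous_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None


class ToyRelSelector:
    """Hypothetical toy selector with child (>) and next-sibling (+) combinators."""

    def __init__(self, predicate: Predicate) -> None:
        self.predicate = predicate

    def __gt__(self, other: "ToyRelSelector") -> "ToyRelSelector":
        # Child combinator: match `other` whose direct parent matches `self`
        return ToyRelSelector(
            lambda tag: other.predicate(tag)
            and isinstance(tag.parent, Tag)
            and self.predicate(tag.parent)
        )

    def __add__(self, other: "ToyRelSelector") -> "ToyRelSelector":
        # Next-sibling combinator: match `other` immediately preceded by `self`
        def predicate(tag: Tag) -> bool:
            prev = previous_tag(tag)
            return other.predicate(tag) and prev is not None and self.predicate(prev)

        return ToyRelSelector(predicate)

    def find(self, root: Tag) -> Optional[Tag]:
        # find_all(True) yields every descendant Tag in document order
        return next((t for t in root.find_all(True) if self.predicate(t)), None)


def by_type(name: str) -> ToyRelSelector:
    return ToyRelSelector(lambda tag: tag.name == name)


def by_class(name: str) -> ToyRelSelector:
    return ToyRelSelector(lambda tag: name in tag.get("class", []))


soup = BeautifulSoup(
    '<p class="price"><span>Price: $10</span><a class="link"></a></p>',
    "html.parser",
)
# In Python, + binds tighter than >, so this groups as
# by_class("price") > (by_type("span") + by_class("link"))
selector = by_class("price") > by_type("span") + by_class("link")
link = selector.find(soup)
```

In this toy, the grouping still matches an `<a class="link">` right after a `<span>` inside `.price`, because both siblings share the same parent. Note that comparison operators chain in Python: `a > b > c` evaluates as `(a > b) and (b > c)`, so multi-level chains in this toy need explicit parentheses.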

## Conclusion

`soupsavvy` introduces declarative selectors that address the limitations of `BeautifulSoup` and can express even complex, multi-step queries as composable objects. Read more about selectors in the following sections.

**Enjoy `soupsavvy` and leave us feedback!**  
**Happy scraping!**