# Models

`soupsavvy` supports object-oriented web scraping out of the box with user-defined models. A model describes a structure that is expected to be found within a particular scope element, and it can be used to find instances of that structure in the provided HTML. An `Operation` extracts and transforms the desired data from a target HTML element.

## Operation

Operations are simple objects that wrap transformation logic, such as extracting text from an element, converting a value to another type, or applying any custom transformation to the provided object. Operations can be applied after a selector and chained at will. A `selector` is combined with an `operation` using the pipe operator `|`, and operations are chained with `|` in the same way. In the example below, the target information is the price of the book, which is expected to be inside an element with class `price` that is a direct child of an element with class `book`. To extract it in the desired format, we chain operations that extract the text (`Text`), remove the currency symbol (a custom `Operation`), and convert the result to an integer (`Operation(int)`).

`Operation` is the component that handles any user-defined transformation. It expects a callable that takes a single argument and returns the transformed value. This callable is applied to the input object.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector
from soupsavvy.operations import Operation, Text

text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">100$</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
selector = ClassSelector("book") > ClassSelector("price")
operation = Text() | Operation(lambda x: x.strip("$")) | Operation(int)

pipeline = selector | operation
pipeline.find(soup)
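
To make the chaining semantics concrete, here is a minimal pure-Python sketch of how a `|` operator can compose single-argument callables. `MiniOperation` and its `execute` method are hypothetical names used only for illustration; this is not `soupsavvy`'s implementation, just one way such chaining could behave.

```python
from typing import Any, Callable


class MiniOperation:
    """Hypothetical stand-in for an operation; not soupsavvy's implementation."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def execute(self, arg: Any) -> Any:
        # Apply the wrapped callable to the input object.
        return self.func(arg)

    def __or__(self, other: "MiniOperation") -> "MiniOperation":
        # `a | b` builds a new operation that runs `a` first, then `b`.
        return MiniOperation(lambda arg: other.execute(self.execute(arg)))


strip_currency = MiniOperation(lambda x: x.strip("$"))
to_int = MiniOperation(int)

price = (strip_currency | to_int).execute("100$")
print(price)  # 100
```

Each step receives the output of the previous one, which is why the order of chained operations matters: converting to an integer before stripping `$` would fail.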

## Model

### Definition

Requirements for a user-defined model are:
- Class inheriting from `soupsavvy.models.BaseModel`
- `__scope__` class attribute that defines the scope of the model
- At least one field defined as a class attribute

Scope - defines the element that is expected to contain all fields of the model. It must be a `soupsavvy` selector. If an element matches the scope selector, the model is considered found and is subsequently extracted from it.

Field - defines a single piece of information that is expected to be found in the scope element. It needs to be defined as a class attribute, like `title` and `price` in the example below. Any attribute whose value is an instance of `TagSearcher` is considered a model field. The value can be, for example:

- Selector, e.g. `ClassSelector("book")`
- Selector-Operation pipeline, e.g. `ClassSelector("price") | Text()`
- Any model class that inherits from `BaseModel`, e.g. `Author`
- Operation-Selector mixin (a component that can be used both as a selector and as an operation), e.g. `Text()` or `Href()`

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)


text = """
    <div class="book">
        <p class="title">Animal Farm</p>
        <p class="price">100</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)

If no scope is found in the provided tag, the `find` method returns `None` by default and no model is extracted.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)


text = """
    <div class="ebook" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">100</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
result = Book.find(soup)
assert result is None

In `strict` mode, `find` raises `ModelNotFoundException` when the scope is not found.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.exceptions import ModelNotFoundException
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)


text = """
    <div class="ebook" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">100</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")

try:
    Book.find(soup, strict=True)
except ModelNotFoundException as e:
    print(e)

By default, any error that occurs during extraction is propagated and the model is not built. In the example below, the `price` element is not found in the scope, so the selector returns `None` and the `Text` operation fails, because it cannot extract text from `None`. The `strict` parameter applies only to the scope search. A field selector is *forgiving*: it always applies the next operation irrespective of the result of the previous one, so any expected edge cases need to be handled explicitly.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.exceptions import FieldExtractionException
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")

try:
    Book.find(soup)
except FieldExtractionException as e:
    print(e)

If the `price` element might not be present in the scope, one option is the `SkipNone` operation wrapper, which returns `None` when its input is `None` and applies the wrapped operation otherwise. This way, the text is extracted and converted to an integer only if the `price` element is found. By default, all fields are nullable, so if a field selector returns `None`, the field of the model instance is set to `None`.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel, SkipNone
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | SkipNone(Text() | Operation(int))


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)

Another way of handling errors is the `Suppress` operation wrapper, which suppresses any exception raised during operation execution and returns `None` instead. It is useful when the input can have a value that is incompatible with the operation. In the example below, the `price` element is expected to be present, but its text may be empty, so we suppress the exception raised when converting an empty string to an integer, which results in `None`.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel, Suppress
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Suppress(Operation(int))


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price"></p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)
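
The difference between the two wrappers can be sketched in plain Python. The helpers `skip_none` and `suppress` below are hypothetical functions written for this illustration and are not part of `soupsavvy`; they only mirror the behavior described above.

```python
from typing import Any, Callable, Optional

Func = Callable[[Any], Any]


def skip_none(func: Func) -> Func:
    """Hypothetical helper: bypass `func` when the input is None."""

    def wrapper(arg: Any) -> Optional[Any]:
        return None if arg is None else func(arg)

    return wrapper


def suppress(func: Func) -> Func:
    """Hypothetical helper: turn any exception raised by `func` into None."""

    def wrapper(arg: Any) -> Optional[Any]:
        try:
            return func(arg)
        except Exception:
            return None

    return wrapper


# Input is None, so int is never called.
missing = skip_none(int)(None)

# int("") raises ValueError, which is swallowed.
empty = suppress(int)("")

# Input is not None, so the ValueError from int("") propagates.
try:
    skip_none(int)("")
    strict = "no error"
except ValueError:
    strict = "ValueError"
```

Skipping on `None` only guards against a missing element, while suppressing also hides malformed values, so the narrower wrapper is usually the safer choice when only absence is expected.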

Missing values can be handled more specifically with the `Default` field wrapper: when the field selector returns `None`, the default value is used instead. If we know that a `price` element with empty or non-numeric text means that the price is `0`, we can combine `Suppress` with `Default` to handle this case. `Default` itself propagates any exception raised during extraction, so all errors still need to be handled explicitly.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel, Default, Suppress
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = Default(ClassSelector("price") | Text() | Suppress(Operation(int)), 0)


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">hundred</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)

As mentioned before, every field is nullable by default, so when a field selector returns `None`, the field of the model instance is set to `None`. This can be changed with the `Required` field wrapper, which raises `FieldExtractionException` when the field selector returns `None`, ensuring that the field is present in the model instance. `Required` propagates any exception raised during extraction, so all errors still need to be handled explicitly.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.exceptions import FieldExtractionException
from soupsavvy.models import BaseModel, Required, SkipNone
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = Required(ClassSelector("price") | SkipNone(Text() | Operation(int)))


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")

try:
    Book.find(soup)
except FieldExtractionException as e:
    print(e)
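
As a rough mental model, the two field wrappers differ only in how they react to a `None` result. The sketch below uses plain dictionaries in place of HTML elements; `with_default`, `required`, `get_price` and `MissingFieldError` are hypothetical names for this illustration and do not reflect how `soupsavvy` implements `Default` and `Required`.

```python
from typing import Any, Callable, Dict, Optional

Scope = Dict[str, Any]
Extractor = Callable[[Scope], Optional[Any]]


class MissingFieldError(Exception):
    """Hypothetical error standing in for `FieldExtractionException`."""


def with_default(extract: Extractor, default: Any) -> Extractor:
    """Replace a None result with `default`; exceptions are not caught."""

    def wrapper(scope: Scope) -> Any:
        result = extract(scope)
        return default if result is None else result

    return wrapper


def required(extract: Extractor) -> Extractor:
    """Raise when the result is None; other exceptions are not caught."""

    def wrapper(scope: Scope) -> Any:
        result = extract(scope)
        if result is None:
            raise MissingFieldError("required field was not found")
        return result

    return wrapper


def get_price(scope: Scope) -> Optional[int]:
    # Plain dictionary lookup standing in for a field selector.
    return scope.get("price")


scope_without_price = {"title": "Animal Farm"}
defaulted = with_default(get_price, 0)(scope_without_price)

try:
    required(get_price)(scope_without_price)
    outcome = "found"
except MissingFieldError:
    outcome = "missing"
```

Neither helper catches exceptions raised by the extractor itself, which matches the point above that errors must be handled explicitly inside the wrapped pipeline.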

By default, only the first element matching the field selector is used for field extraction. If a different result is expected, it can be defined with an appropriate selector. The `soupsavvy.selectors.nth` selectors can be used to match the nth element matching a selector. In the example below, `NthLastOfSelector` matches the last element with class `price` in the scope element.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.nth import NthLastOfSelector


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = NthLastOfSelector(ClassSelector("price"), nth="1") | Text() | Operation(int)


text = """
    <div class="book" href="www.book.com">
        <p class="price"><s>100</s></p>
        <p class="price"><s>80</s></p>
        <p class="title">Animal Farm</p>
        <p class="price">60</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)

If a list of elements is expected in the scope, we can use the `All` field wrapper. It always extracts all elements matching the field selector and returns a list of extracted values. In the example below, all prices available for the given book are extracted.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import All, BaseModel
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = All(ClassSelector("price") | Text() | Operation(int))


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price"><s>100</s></p>
        <p class="price"><s>80</s></p>
        <p class="price">60</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)

When an extracted field needs to be transformed further, `__post_init__` can be defined in the model class to handle it in a post-initialization step. It works similarly to `__post_init__` in Python's `dataclass`. In the example below, `price` is set to the minimum of all prices for the given book.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import All, BaseModel
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = All(ClassSelector("price") | Text() | Operation(int))

    def __post_init__(self) -> None:
        self.price = min(self.price)  # type: ignore

text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price"><s>100</s></p>
        <p class="price"><s>80</s></p>
        <p class="price">60</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)
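
For comparison, this is the standard-library `dataclass` hook that the model's `__post_init__` is modeled after. The `BookRecord` class is a hypothetical example and is unrelated to `soupsavvy`; it only shows that the hook runs right after the generated `__init__`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class BookRecord:
    title: str
    price: Union[int, List[int]]

    def __post_init__(self) -> None:
        # Runs automatically right after the generated __init__.
        if isinstance(self.price, list):
            self.price = min(self.price)


record = BookRecord(title="Animal Farm", price=[100, 80, 60])
print(record)  # BookRecord(title='Animal Farm', price=60)
```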

A custom `Operation` can be used for any transformation. In the example below, `Operation` removes the currency symbol from the price text before converting it to an integer.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = (
        ClassSelector("price")
        | Text()
        | Operation(lambda x: x.strip("$"))
        | Operation(int)
    )


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">100$</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)

As mentioned previously, any model class can be used as a field selector. In the example below, the `Author` model extracts author information from the `author` element in the scope, and the resulting `Author` instance is set as an attribute of the `Book` model.

In [None]:
import re
from datetime import datetime

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, PatternSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.css import FirstChild


class Author(BaseModel):
    __scope__ = ClassSelector("author")

    birth = (
        PatternSelector(re.compile(r"\d{4}-\d{2}-\d{2}"))
        | Text()
        | Operation(lambda x: datetime.strptime(x, "%Y-%m-%d"))
    )
    name = FirstChild() | Text()


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    author = Author
    title = ClassSelector("title") | Text()


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <div class="author">
            <p>George Orwell</p>
            <p>Great author</p>
            <p>1903-06-25</p>
        </div>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)
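
The `birth` field combines a regular expression with `datetime.strptime`. The standard-library sketch below isolates that step on plain strings. It assumes the pattern is applied with `re.search`, which may differ from how `PatternSelector` matches element text, and the `paragraphs` list is a hypothetical stand-in for the text of the `author` element's children.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

# Text of the three paragraphs inside the `author` element.
paragraphs = ["George Orwell", "Great author", "1903-06-25"]

# Pick the first text that contains a date-like substring.
date_text = next(text for text in paragraphs if DATE_PATTERN.search(text))

# Convert the matched text into a datetime object.
birth = datetime.strptime(date_text, "%Y-%m-%d")
print(birth)  # 1903-06-25 00:00:00
```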

### Inheritance

With inheritance, fields are inherited by default, as expected, and new fields can be specified to extend the parent model. In the example below, `eBook` inherits from the `Book` model and extends it with two extra fields: `link` and `duration`. It also overrides `__scope__`, but this is not required, as special attributes are inherited from the parent model. Note that `eBook` uses `Href` as a field selector, which is the selector-operation mixin mentioned previously. It extracts the `href` attribute directly from the scope element.

In [None]:
import re

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, PatternSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Href, Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)


class eBook(Book):
    __scope__ = TypeSelector("div") & ClassSelector("ebook")

    link = Href()
    duration = PatternSelector(re.compile(r"\d{1,2}:\d{2}")) | Text()


text = """
    <div class="ebook" href="www.ebook.com">
        <p class="title">Animal Farm</p>
        <p class="price">50</p>
        <p>George Orwell</p>
        <p>2:30</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
eBook.find(soup)

This behavior can be turned off by setting `__inherit_fields__` to `False` in the model class. In that case, only fields defined in the subclass are used for extraction. In the example below, the `eBook` model does not inherit fields from the `Book` model, so only `link` and `duration` are extracted.

In [None]:
import re

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, PatternSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Href, Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)


class eBook(Book):
    __inherit_fields__ = False

    link = Href()
    duration = PatternSelector(re.compile(r"\d{1,2}:\d{2}")) | Text()


text = """
    <div class="ebook" href="www.ebook.com">
        <p class="title">Animal Farm</p>
        <p class="price">50</p>
        <p>George Orwell</p>
        <p>2:30</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
eBook.find(soup)
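
One way to picture the effect of `__inherit_fields__` is as a switch on how class attributes are collected. The sketch below is a simplified, hypothetical illustration in plain Python: `Field`, `collect_fields` and the three classes are invented for this example and say nothing about how `soupsavvy` gathers fields internally.

```python
from typing import Dict


class Field:
    """Hypothetical marker object standing in for a field selector."""


def collect_fields(cls: type) -> Dict[str, Field]:
    # With inheritance enabled, walk the MRO from the base class down,
    # so that subclass definitions override parent ones.
    inherit = getattr(cls, "__inherit_fields__", True)
    classes = reversed(cls.__mro__) if inherit else [cls]
    fields: Dict[str, Field] = {}
    for klass in classes:
        for name, value in vars(klass).items():
            if isinstance(value, Field):
                fields[name] = value
    return fields


class Book:
    title = Field()
    price = Field()


class EBook(Book):
    link = Field()


class StandaloneEBook(Book):
    __inherit_fields__ = False
    link = Field()


print(list(collect_fields(EBook)))            # ['title', 'price', 'link']
print(list(collect_fields(StandaloneEBook)))  # ['link']
```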

In general, it is recommended to use as specific a scope selector as possible, to avoid matching elements that have nothing to do with the model. Ensure that the scope selector matches only elements that are expected to contain a model instance. You can use `HasSelector` to match only elements that contain the tags used to extract fields.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, HasSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text

PRICE_SELECTOR = ClassSelector("price")
TITLE_SELECTOR = ClassSelector("title")


class Book(BaseModel):

    __scope__ = (
        ClassSelector("book") & HasSelector(PRICE_SELECTOR) & HasSelector(TITLE_SELECTOR)
    )

    title = TITLE_SELECTOR | Text()
    price = PRICE_SELECTOR | Text() | Operation(int)


text = """
    <div class="book">Unavailable</div>
    <div class="book">
        <p class="title">Animal Farm</p>
        <p>George Orwell</p>
        <p>4:30</p>
    </div>
    <div class="book">
        <p class="price">50</p>
        <p>Lois Lowry</p>
        <p>3:30</p>
    </div>
    <div class="book">
        <p class="title">Brave New World</p>
        <p class="price">50</p>
        <p>Aldous Huxley</p>
        <p>2:30</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)

The `find_all` method uses the same logic as `find`: it first matches the scope element and then extracts all fields from it. It returns a list of model instances, one for each element matching the scope selector.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.css import LastOfType


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)
    author = (LastOfType() & TypeSelector("p")) | Text()


text = """
    <div class="ebook" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">100</p>
        <p>George Orwell</p>
    </div>
    <div class="book">
        <p class="title">Brave New World</p>
        <p class="price">100</p>
        <p>Aldous Huxley</p>
    </div>
    <div class="book">
        <p class="title">The Giver</p>
        <p class="price">80</p>
        <p>Lois Lowry</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find_all(soup)

The `find_all` method propagates any errors: if extraction of a model fails for any of the found scope elements, `find_all` fails as well, raising `FieldExtractionException`.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.exceptions import FieldExtractionException
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.css import LastOfType


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)
    author = (LastOfType() & TypeSelector("p")) | Text()


text = """
    <div class="book">
        <p class="title">Brave New World</p>
        <p class="price">100</p>
        <p>Aldous Huxley</p>
    </div>
    <div class="book">
        <p class="title">The Giver</p>
        <p>Lois Lowry</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")

try:
    Book.find_all(soup)
except FieldExtractionException as e:
    print(e)

If no scope elements are matched, the `find_all` method returns an empty list.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.css import LastOfType


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)
    author = (LastOfType() & TypeSelector("p")) | Text()


text = """
    <div class="ebook" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">100</p>
        <p>George Orwell</p>
    </div>
    <p class="title">Animal Farm</p>
    <p class="price">100</p>
    <p>George Orwell</p>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find_all(soup)

### Recursive option

The `recursive` option applies only to the scope search. When set to `True`, the model scope is searched among all descendants of the provided tag; when `False`, only among its direct children. Once the scope is found, field selectors always search for elements recursively, irrespective of the `recursive` parameter.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.css import LastOfType


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)
    author = (LastOfType() & TypeSelector("p")) | Text()


text = """
    <div class="book" href="www.book.com">
        <span>
            <p class="title">Animal Farm</p>
            <p class="price">100</p>
            <span class="author">
                <p>George Orwell</p>
            </span>
        </span>
    </div>
"""
soup = BeautifulSoup(text, features="html.parser")
Book.find(soup, recursive=False)

To change this behavior and find field elements only among the children of the scope element, a relative selector can be used. It is best created with `Anchor`: `Anchor > selector` narrows the search down to child elements only. In the example below, only `price` elements that are direct children of the `book` element are matched.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import Anchor, ClassSelector, TypeSelector
from soupsavvy.models import All, BaseModel
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = All((Anchor > ClassSelector("price")) | Text() | Operation(int))


text = """
    <div class="book" href="www.book.com">
        <span>
            <p class="title">Animal Farm</p>
            <p class="price">100</p>
            <p class="price">50</p>
            <span class="author">
                <p>George Orwell</p>
            </span>
        </span>
        <p class="price">200</p>
    </div>
"""
soup = BeautifulSoup(text, features="html.parser")
Book.find(soup)

With the non-recursive option, the scope is searched only among the children of the element passed to the `find` method. In the example below, the scope element is found only if the element passed is `span`. When `body` is passed, it has no children of type `div` with class `book`, so the scope is not found and `None` is returned.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.css import LastOfType


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text() | Operation(int)
    author = (LastOfType() & TypeSelector("p")) | Text()


text = """
    <span>
        <div class="book" href="www.book.com">
            <p class="title">Animal Farm</p>
            <p class="price">100</p>
            <p>George Orwell</p>
        </div>
    </span>
"""
soup = BeautifulSoup(text, features="lxml")
result = Book.find(soup.body, recursive=False)  # type: ignore
assert result is None

Book.find(soup.span, recursive=False)  # type: ignore
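
The distinction between searching descendants and searching direct children can also be shown with the standard library alone. The sketch below uses `xml.etree.ElementTree` as a stand-in for the HTML tree; `find_book` is a hypothetical helper for this illustration and not a `soupsavvy` API.

```python
import xml.etree.ElementTree as ET
from typing import Optional

root = ET.fromstring(
    """
    <body>
        <span>
            <div class="book">
                <p class="title">Animal Farm</p>
            </div>
        </span>
    </body>
    """
)


def find_book(element: ET.Element, recursive: bool = True) -> Optional[ET.Element]:
    if recursive:
        # Element.iter() also yields the element itself, so it is excluded.
        candidates = (d for d in element.iter("div") if d is not element)
    else:
        # Iterating an element yields only its direct children.
        candidates = (c for c in element if c.tag == "div")
    return next((c for c in candidates if c.get("class") == "book"), None)


span = root.find("span")
print(find_book(root) is not None)                   # True
print(find_book(root, recursive=False) is not None)  # False
print(find_book(span, recursive=False) is not None)  # True
```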

`__post_init__` can be used to replace some operations or to perform more complex ones that depend on other extracted fields. New attributes can be set in this method as well.

In [None]:
from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.css import LastOfType


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = ClassSelector("title") | Text()
    price = ClassSelector("price") | Text()
    author = (LastOfType() & TypeSelector("p")) | Text()

    def __post_init__(self) -> None:
        self.price = int(str(self.price).strip("$"))
        self.title = str(self.title).upper()
        self.affordable = (self.price < 100) or (self.author == "George Orwell")


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">100$</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
result = Book.find(soup)
print(result)
print(f"Is affordable: {result.affordable}")

For fans of clean and consistent typing, `typing.cast` can be used to tell type checkers the type of an instance field. By default, a type checker infers the type of the field selector itself. In the example below, `typing.cast` hints that the `title` attribute is of type `str` and that `price` is either `int` or `None`.

In [None]:
from typing import cast, Optional

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel, SkipNone
from soupsavvy.operations import Operation, Text


class Book(BaseModel):

    __scope__ = TypeSelector("div") & ClassSelector("book")

    title = cast(str, ClassSelector("title") | Text())
    price = cast(Optional[int], ClassSelector("price") | SkipNone(Text() | Operation(int)))


text = """
    <div class="book" href="www.book.com">
        <p class="title">Animal Farm</p>
        <p class="price">100</p>
        <p>George Orwell</p>
    </div>
"""
soup = BeautifulSoup(text, features="lxml")
Book.find(soup)
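
Keep in mind that `typing.cast` affects only static analysis. At runtime it returns its argument unchanged and performs no conversion or check, as this small standard-library example shows (the variable names are illustrative only):

```python
from typing import Optional, cast

raw = "100"

# Type checkers now treat `price` as Optional[int], but no conversion happens.
price = cast(Optional[int], raw)

print(price is raw)           # True
print(type(price).__name__)   # str
```

For this reason, the cast in a model definition should describe the type that the field's pipeline actually produces.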