# Showcase

We will explore how to use `soupsavvy` to extract information from the [Rotten Tomatoes](https://editorial.rottentomatoes.com/) website, a well-known review aggregator for film and television.

In [None]:
import requests
from bs4 import BeautifulSoup

from soupsavvy import to_soupsavvy

headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9",
}
url = "https://editorial.rottentomatoes.com/guide/best-movies-of-all-time/"
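# a timeout avoids hanging indefinitely; raise_for_status surfaces HTTP errors early
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()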
soup = BeautifulSoup(response.text, "lxml")
element = to_soupsavvy(soup)

## Selectors

Selectors are the core feature of `soupsavvy`, enabling a declarative approach to locating HTML elements. In this example, `movie_selector` identifies movie elements on the webpage: `p` tags that carry the `movie` class. For a detailed guide and additional examples, see the [tutorial](https://soupsavvy.readthedocs.io/en/stable/demos/selectors.html).

In [None]:
from soupsavvy import ClassSelector, TypeSelector

movie_selector = ClassSelector("movie") & TypeSelector("p")
result = movie_selector.find_all(element)

print(f"Found {len(result)} movies.\n")
print(result[0])

## Pipelines

A `soupsavvy` selection pipeline combines a selector with one or more operations, providing a concise way to extract and transform information from a webpage. For instance, `title_pipeline` locates the first 5 movie titles, extracts their text, and converts it to uppercase. For a detailed guide with examples, see the [tutorial](https://soupsavvy.readthedocs.io/en/stable/demos/models.html#combining-with-selector).

In [None]:
from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.operations import Operation, Text

movie_selector = ClassSelector("movie") & TypeSelector("p")
title_pipeline = (
    (movie_selector >> ClassSelector("title")) | Text(strip=True) | Operation(str.upper)
)
result = title_pipeline.find_all(element, limit=5)
print(result)
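
Conceptually, the operations in a pipeline run left to right on each selected element. The plain-Python sketch below mimics that flow on strings only, as an aid to intuition; `chain` and `raw_titles` are hypothetical names, the sample strings are made up, and none of this is how `soupsavvy` is implemented internally.

```python
from functools import reduce
from typing import Any, Callable


def chain(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose steps so that each one receives the previous step's output."""

    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)

    return run


# Stand-ins for the raw text of six matched title elements.
raw_titles = [" alpha ", "bravo\n", "\tcharlie", "delta", " echo", "foxtrot "]

# Roughly the role of Text(strip=True) followed by Operation(str.upper).
to_title = chain(str.strip, str.upper)

# Slicing plays the role of limit=5: only the first five inputs are processed.
titles = [to_title(raw) for raw in raw_titles[:5]]
print(titles)
```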

## Models

Models define **scraping schemas**, using selectors and operations to extract structured information from the webpage. A model is a user-defined data structure that represents an entity of interest. For example, the `Movie` model extracts details such as a movie's `title` and `score` from [Rotten Tomatoes](https://www.rottentomatoes.com/). A comprehensive guide to models, with examples, is available [here](https://soupsavvy.readthedocs.io/en/stable/demos/models.html#model).

In [None]:
from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel, post
from soupsavvy.operations import Text


class Movie(BaseModel):
    __scope__ = ClassSelector("movie") & TypeSelector("p")

    title = ClassSelector("title") | Text(strip=True)
    score = ClassSelector("score") | Text()

    @post("score")
    def process_score(self, score: str) -> int:
        """
        There are multiple methods of transforming field values,
        field post-process methods are one of them.
        """
        return int(score.strip("%"))


Movie.find(element)
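
The post-processor above assumes the score text looks exactly like `97%`. If the extracted text may carry surrounding whitespace, a slightly more defensive conversion could look like the sketch below. `parse_score` is a hypothetical standalone helper, not part of `soupsavvy`, and the sample strings are invented.

```python
def parse_score(raw: str) -> int:
    """Convert text such as ' 97% ' to the integer 97."""
    # Trim whitespace first; otherwise a trailing space would shield
    # the percent sign from strip("%") and int() would fail.
    return int(raw.strip().strip("%"))


print(parse_score("97%"))
print(parse_score(" 97% \n"))
```

The same body could be used inside a `@post("score")` method in place of the one-line conversion.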

### Migrations

Model instances can be migrated to other data structures: the instance's attributes are passed directly to the target class's constructor. This makes it straightforward to integrate with third-party libraries such as `pydantic` or `sqlalchemy`. For example, `Movie` instances can be migrated to a parallel `sqlalchemy` model, `SQLMovie`, and saved to a database. More examples and complex use cases can be found in the [tutorial](https://soupsavvy.readthedocs.io/en/stable/demos/models.html#migration).

In [None]:
from sqlalchemy import Column, Identity, Integer, MetaData, String, create_engine
from sqlalchemy.orm import Session, declarative_base

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel
from soupsavvy.operations import Operation, Text

Base = declarative_base(metadata=MetaData())


class SQLMovie(Base):
    __tablename__ = "movie"

    id = Column(Integer, Identity(start=1, increment=1), primary_key=True)
    title = Column(String(128), nullable=False)
    score = Column(Integer)

    def __repr__(self):
        return f"<Movie(title={self.title}, score={self.score})>"


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)


class Movie(BaseModel):
    __scope__ = ClassSelector("movie") & TypeSelector("p")

    title = ClassSelector("title") | Text(strip=True)
    # chaining operations is another way of transforming field values
    score = ClassSelector("score") | Text() | Operation(lambda x: int(x.strip("%")))


movie = Movie.find(element)
sql_movie = movie.migrate(SQLMovie)

with Session(engine) as session:
    session.add(sql_movie)
    session.commit()

    result = session.query(SQLMovie).one()

result
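
The idea behind migration, passing an object's attributes as keyword arguments to another class's constructor, can be illustrated without any third-party library. The sketch below is conceptual only: `ScrapedMovie`, `MovieRecord`, and the `migrate` function are hypothetical stand-ins, and the real `soupsavvy` implementation may differ in its details.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ScrapedMovie:
    """Hypothetical stand-in for a scraped model instance."""

    title: str
    score: int


class MovieRecord:
    """Hypothetical target class, e.g. an ORM or validation model."""

    def __init__(self, title: str, score: int) -> None:
        self.title = title
        self.score = score


def migrate(source: Any, target_cls: type) -> Any:
    """Build target_cls from the source's fields, passed as keyword arguments."""
    kwargs = {f.name: getattr(source, f.name) for f in fields(source)}
    return target_cls(**kwargs)


record = migrate(ScrapedMovie(title="Example Movie", score=97), MovieRecord)
print(record.title, record.score)
```

Because the attributes are matched by name, the target constructor must accept a keyword argument for every field of the source.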

### Composite Models

Models in `soupsavvy` are flexible: a model can have any number of fields, and a field can itself be another model. Here, `MovieDetails` is used as a field of the `Movie` model to separate movie-specific information from Rotten Tomatoes' `score` and `rank`. For more information, check out the [tutorial](https://soupsavvy.readthedocs.io/en/stable/demos/models.html#model-composition).

In [None]:
from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.models import BaseModel, post
from soupsavvy.operations import Operation, Text
from soupsavvy.selectors.css import FirstOfType


class MovieDetails(BaseModel):
    __scope__ = ClassSelector("details")

    title = ClassSelector("title") | Text(strip=True)
    year = (
        ClassSelector("year")
        | Text()
        | Operation(lambda x: x.strip("()"))
        | Operation(int)
    )


class Movie(BaseModel):
    __scope__ = (ClassSelector("movie") & TypeSelector("p")) << TypeSelector("tr")

    rank = (FirstOfType() & TypeSelector("td")) | Text()
    score = ClassSelector("score") | Text()
    # Model can be a field of another model
    details = MovieDetails

    @post("score")
    def process_score(self, score: str) -> int:
        return int(score.strip("%"))

    def __post_init__(self) -> None:
        """
        Post initialization method is another way of transforming field values.
        Here, access to all fields is available.
        """
        self.rank = int(str(self.rank).strip("."))


result = Movie.find_all(element)
print(f"Found {len(result)} movies")
result[8]
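
To make the shape of the result easier to picture, the sketch below rebuilds the same nested structure with plain dataclasses and applies the same string clean-ups (`"9."` to `9`, `"97%"` to `97`, `"(1999)"` to `1999`). All names here (`MovieShape`, `DetailsShape`, `build_movie`) are hypothetical and the input values are invented; it does not use `soupsavvy` and is not actual scraped output.

```python
from dataclasses import dataclass


@dataclass
class DetailsShape:
    title: str
    year: int


@dataclass
class MovieShape:
    rank: int
    score: int
    details: DetailsShape  # nested structure, like a sub-model field


def build_movie(rank: str, score: str, title: str, year: str) -> MovieShape:
    """Apply the clean-up steps to raw text and assemble the nested result."""
    return MovieShape(
        rank=int(rank.strip().rstrip(".")),  # "9." -> 9
        score=int(score.strip().rstrip("%")),  # "97%" -> 97
        details=DetailsShape(
            title=title.strip(),
            year=int(year.strip().strip("()")),  # "(1999)" -> 1999
        ),
    )


movie = build_movie("9.", "97%", " Example Movie ", "(1999)")
print(movie)
```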

## Conclusion

These are just a few of the many powerful features available in `soupsavvy`.  
To explore them further, dive into the [Documentation](https://soupsavvy.readthedocs.io) and start building your scraping workflows!

**Enjoy `soupsavvy` and leave us feedback!**  
**Happy scraping!**