This repository was archived by the owner on Mar 16, 2025. It is now read-only.

✨ Events and Basic Spider


@roniemartinez roniemartinez released this 23 Mar 19:22
· 401 commits to master since this release
0.11.0
37ccfff

What's Changed

Features

Documentation

Fixes

Other

✨ Basic Spider

Example

dude scrape ... --follow-urls

or

if __name__ == "__main__":
    import dude

    dude.run(..., follow_urls=True)
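
To make the idea of following URLs concrete, here is a minimal, library-independent sketch of what a basic spider does: visit a page, collect its links, and queue any that have not been seen yet. This is not dude's implementation. The `LINKS` mapping and the `crawl` helper are hypothetical stand-ins for real page fetching, and the breadth-first order is an assumption made for illustration.

```python
from collections import deque
from typing import Dict, List

# Hypothetical in-memory "site": each URL maps to the links found on that page.
LINKS: Dict[str, List[str]] = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
}


def crawl(start_urls: List[str], links: Dict[str, List[str]]) -> List[str]:
    """Visit every reachable URL once, breadth-first, and return the visit order."""
    queue = deque(start_urls)
    seen = set(start_urls)  # URLs already queued, so cycles do not loop forever
    visited: List[str] = []
    while queue:
        url = queue.popleft()
        visited.append(url)  # a real spider would fetch and scrape the page here
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


VISITED = crawl(["https://example.com/"], LINKS)
print(VISITED)
```

The `seen` set is what keeps the crawl finite when pages link back to each other, as the first two pages do here.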

✨ Events

More details at https://roniemartinez.github.io/dude/advanced/14_events.html

Example

import uuid
from pathlib import Path

from dude import post_setup, pre_setup, startup

SAVE_DIR: Path


@startup()
def initialize_csv():
    """
    Connect to databases or APIs, or do other one-time setup, here before the web scraping process starts.
    """
    global SAVE_DIR
    SAVE_DIR = Path(__file__).resolve().parent / "temp"
    SAVE_DIR.mkdir(exist_ok=True)


@pre_setup()
def screenshot(page):
    """
    Perform actions here after loading a page (or after a successful HTTP response) and before modifying things in the
    setup stage.
    """
    unique_name = str(uuid.uuid4())
    page.screenshot(path=SAVE_DIR / f"{unique_name}.png")  # noqa


@post_setup()
def print_pdf(page):
    """
    Perform actions here after running the setup stage.
    """
    unique_name = str(uuid.uuid4())
    page.pdf(path=SAVE_DIR / f"{unique_name}.pdf")  # noqa


if __name__ == "__main__":
    import dude

    dude.run(urls=["https://dude.ron.sh"])
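
The order in which the three events in the example above fire can be sketched with a small stand-alone event registry. This is only an illustration of decorator-based hooks under the ordering described in the docstrings (startup once before scraping, pre_setup before the setup stage of each page, post_setup after it). The `on`, `fire`, and `run` helpers are hypothetical and are not part of dude's API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical registry mapping an event name to its handlers; not dude's internals.
_HANDLERS: DefaultDict[str, List[Callable]] = defaultdict(list)


def on(event: str) -> Callable:
    """Return a decorator that registers a function under the given event name."""

    def decorator(func: Callable) -> Callable:
        _HANDLERS[event].append(func)
        return func

    return decorator


def fire(event: str, *args) -> None:
    """Call every handler registered for the event, in registration order."""
    for handler in _HANDLERS[event]:
        handler(*args)


def run(urls: List[str]) -> List[str]:
    """Simulate one scraping run and return a log of what happened, in order."""
    log: List[str] = []
    fire("startup", log)  # once, before any page is processed
    for url in urls:
        fire("pre_setup", log, url)  # after the page loads, before setup
        log.append(f"setup {url}")  # stand-in for the real setup stage
        fire("post_setup", log, url)  # after setup has run
    return log


@on("startup")
def record_startup(log: List[str]) -> None:
    log.append("startup")


@on("pre_setup")
def record_pre_setup(log: List[str], url: str) -> None:
    log.append(f"pre_setup {url}")


@on("post_setup")
def record_post_setup(log: List[str], url: str) -> None:
    log.append(f"post_setup {url}")


EVENT_LOG = run(["https://dude.ron.sh"])
print(EVENT_LOG)
```

With more than one URL, the startup handler still runs only once, while the pre_setup and post_setup handlers run once per page.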

Diagram showing when events are executed


Full Changelog: 0.10.1...0.11.0