Add new decorators: css(), xpath(), regex() and text() #34

Closed
roniemartinez opened this issue Feb 22, 2022 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@roniemartinez
Owner

BeautifulSoup4 (#19, #32) and Parsel (#33) have methods like .css(), .xpath(), etc., and we can use these to add a more useful and readable way to find elements in a page. For developers and web scrapers, the words "CSS" and "XPath" sound more familiar than the word "select".

New decorators:

  1. @css() - for CSS selectors
  2. @xpath() - for XPath selectors
  3. @regex() - for regular expressions (named regex rather than re to avoid confusion with the standard library re module)
  4. @text() - for text selectors; Playwright supports these natively, and for BeautifulSoup4 and Parsel they can be built on top of their regex support
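One way to picture the proposal is to treat each new decorator as a thin alias over a single registration function, with @text() built on regex support by escaping the literal text. This is a toy sketch under stated assumptions: `register`, `HANDLERS`, and the selector-type strings are hypothetical names for illustration, not dude's actual API.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical handler registry of (selector_type, selector, handler) tuples.
# This is a toy model, NOT dude's internal API.
HANDLERS: List[Tuple[str, str, Callable]] = []


def register(selector_type: str, selector: str) -> Callable:
    """Register a handler under one generic, typed selector (hypothetical)."""
    def decorator(func: Callable) -> Callable:
        HANDLERS.append((selector_type, selector, func))
        return func
    return decorator


# The proposed decorators are just aliases that fix the selector type.
def css(selector: str) -> Callable:
    return register("css", selector)


def xpath(selector: str) -> Callable:
    return register("xpath", selector)


def regex(pattern: str) -> Callable:
    return register("regex", pattern)


def text(value: str) -> Callable:
    # A text selector reuses regex support: escape the literal text so that
    # characters such as "(" or "." are matched verbatim, not as regex syntax.
    return register("regex", re.escape(value))


@css("h1.title")
def title(element):
    return {"title": element}


@text("Price (USD)")
def price(element):
    return {"price": element}
```

Each alias adds a public name without adding new behavior, which is the extra surface area the maintainer weighs against readability in the reply below.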
@roniemartinez roniemartinez added enhancement New feature or request help wanted Extra attention is needed labels Feb 22, 2022
@roniemartinez
Owner Author

Closing this, as adding these decorators would make the framework more complicated.

Instead, you can specify the selector type within the @select() decorator: https://roniemartinez.github.io/dude/basic_usage.html#supported-selector-types

Examples (5 options):

```python
from dude import select


@select(selector="<any-selector>")  # Any type of selector
def handler1(element):
    return {"<key>": "<value-extracted-from-element>"}


@select(css="<css-selector>")  # CSS selector
def handler2(element):
    return {"<key>": "<value-extracted-from-element>"}


@select(xpath="<xpath-selector>")  # XPath selector
def handler3(element):
    return {"<key>": "<value-extracted-from-element>"}


@select(text="<text-selector>")  # Text selector
def handler4(element):
    return {"<key>": "<value-extracted-from-element>"}


@select(regex="<regex-selector>")  # Regex selector
def handler5(element):
    return {"<key>": "<value-extracted-from-element>"}
```
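With a single decorator, the selector type travels as a keyword argument instead of a decorator name. The sketch below shows one way such keywords could be resolved into a (type, value) pair while requiring exactly one of them. The function `resolve_selector` and the "any" label are hypothetical illustrations, not dude's actual implementation.

```python
from typing import Optional, Tuple

# Keyword names mirroring the five options above. The resolution logic is a
# hypothetical sketch, not how dude actually parses @select() arguments.
SELECTOR_TYPES = ("selector", "css", "xpath", "text", "regex")


def resolve_selector(
    selector: Optional[str] = None,
    css: Optional[str] = None,
    xpath: Optional[str] = None,
    text: Optional[str] = None,
    regex: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (selector_type, value), requiring exactly one keyword."""
    given = {
        name: value
        for name, value in zip(SELECTOR_TYPES, (selector, css, xpath, text, regex))
        if value is not None
    }
    if len(given) != 1:
        raise ValueError(
            f"expected exactly one of {SELECTOR_TYPES}, got {sorted(given)}"
        )
    name, value = next(iter(given.items()))
    # "selector" means "any type": leave type detection to the backend.
    return ("any" if name == "selector" else name), value


print(resolve_selector(css=".title"))  # ('css', '.title')
```

Keeping one entry point means new selector types add a keyword rather than a new public decorator, which is the trade-off given for closing the issue.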
