Advanced scraping system #48

Closed
metafates opened this issue Jun 29, 2022 · 0 comments · Fixed by #63
Labels: feature (New feature request)

Comments

metafates (Owner) commented Jun 29, 2022

Feature Description

The current scraping system is weak and unstable, with many restrictions. If a site gets blocked, it is very difficult to find a replacement that meets all the requirements. I therefore propose using embedded scripts, which would allow more complex actions to be defined.

Solution you would like

Ferret

Ferret is a declarative query language. It can scrape JS-rendered pages, handle page events, and emulate user interactions.

The syntax looks like this:

LET doc = DOCUMENT('https://github.com/topics')

FOR el IN ELEMENTS(doc, '.py-4.border-bottom')
    LIMIT 10

    LET url = ELEMENT(el, 'a')
    LET name = ELEMENT(el, '.f3')
    LET description = ELEMENT(el, '.f5')

    RETURN {
        name: TRIM(name.innerText),
        description: TRIM(description.innerText),
        url: 'https://github.com' + url.attributes.href
    }
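
The query above returns one object per matched element, with `name`, `description`, and `url` fields. As a rough sketch of how a host program might consume such a result, the Go snippet below decodes a JSON array of those objects into a typed struct. It assumes the result reaches the host as JSON, which this issue does not specify; the `Topic` type, the `decodeTopics` helper, and the sample data are hypothetical and exist only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Topic mirrors the object built in the query's RETURN clause.
// Hypothetical type: the field names are taken from the query above.
type Topic struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	URL         string `json:"url"`
}

// decodeTopics parses a JSON array of RETURN objects into typed values.
// Assumption: the scraping script's output is delivered as JSON bytes.
func decodeTopics(raw []byte) ([]Topic, error) {
	var topics []Topic
	if err := json.Unmarshal(raw, &topics); err != nil {
		return nil, fmt.Errorf("decode query result: %w", err)
	}
	return topics, nil
}

func main() {
	// Hand-written sample, not real scraper output.
	sample := []byte(`[{"name":"Example","description":"A placeholder topic","url":"https://github.com/topics/example"}]`)

	topics, err := decodeTopics(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range topics {
		fmt.Printf("%s: %s (%s)\n", t.Name, t.Description, t.URL)
	}
}
```

Keeping the decoding step separate from the script means the host only depends on the shape of the returned objects, so a blocked site could be replaced by swapping the script rather than changing Go code.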
            

Alternatives you have considered

Integrate Lua scripts with Gopher Lua. However, that is far more complicated than Ferret and, to be honest, unnecessary.

Anko is a great alternative!

Additional context

No response

metafates added the feature (New feature request) label Jun 29, 2022
metafates self-assigned this Jun 29, 2022
metafates pinned this issue Jun 29, 2022
metafates mentioned this issue Aug 8, 2022
metafates unpinned this issue Aug 8, 2022