Update headless browser docs #4613
Codecov Report
@@           Coverage Diff           @@
##           master    #4613   +/-   ##
=======================================
  Coverage   88.42%   88.42%
=======================================
  Files         162      162
  Lines       10522    10522
  Branches     1521     1521
=======================================
  Hits         9304     9304
  Misses        944      944
  Partials      274      274
docs/topics/dynamic-content.rst
Outdated
class PyppeteerSpider(scrapy.Spider):
    name = "pyppeteer"
    start_urls = ["data:,"]  # avoid making an actual upstream request
TIL, nice trick
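For anyone wondering why this works: `data:,` is a data URL with an empty media type and an empty payload, so it can be resolved without touching the network. The snippet below is only a standard-library sketch of that idea, not how Scrapy handles the request internally; it uses `urllib.request.urlopen`, which supports the `data:` scheme.

```python
from urllib.request import urlopen

# "data:," carries its (empty) payload inline, so opening it
# does not require any upstream request.
with urlopen("data:,") as response:
    body = response.read()
    content_type = response.headers.get("Content-Type")

print(repr(body))      # the payload is empty
print(content_type)    # default media type assigned to a bare data URL
```

In the spider above, the same property means the initial request only serves to trigger the callback, where the headless browser can then do the real navigation.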
Given the length of the Puppeteer-specific data, I would create a “Puppeteer” subsection instead of adding this directly under “Using a headless browser”.
On a different topic, was the removal of Selenium on purpose? I’m OK with favoring Puppeteer over Selenium in the documentation, as long as we disclose the pros of the former, but I am not sure about removing Selenium as an option as long as scrapy-selenium works; it’s a Scrapy plugin that people can use without needing the asyncio reactor. And if Puppeteer gets its own section, I guess the same makes sense for Selenium, even if that section is one paragraph long.
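For context on the asyncio reactor point: asyncio-based integrations need the project to opt in to Twisted's asyncio reactor, which in Scrapy is done through the `TWISTED_REACTOR` setting. A minimal sketch of that line, assuming it lives in the project's `settings.py`:

```python
# settings.py (assumed location): switch Scrapy to the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

A plugin that does not rely on asyncio can be used without this setting.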
docs/topics/dynamic-content.rst
Outdated
* https://github.com/elacuesta/scrapy-pyppeteer
* https://github.com/lopuhin/scrapy-pyppeteer
* https://github.com/clemfromspace/scrapy-puppeteer
It would be great if we could feature some pros of using each solution. And if any solution provides no pros over the rest, it may be better not to cover it.
I removed https://github.com/lopuhin/scrapy-pyppeteer and https://github.com/clemfromspace/scrapy-puppeteer because they are unmaintained, and https://github.com/clemfromspace/scrapy-selenium because Selenium is blocking and I don't think we should encourage its usage here.
Should this now be rewritten to use Playwright?
Co-authored-by: Adrián Chaves <adrian@chaves.io>
I suggest keeping the alphabetical sorting of the links at the end, but other than that ✔️
Closes #4484