Update headless browser docs #4613

Merged
merged 7 commits into from
Jul 29, 2021

Conversation

elacuesta
Member

Closes #4484

@codecov

codecov bot commented Jun 6, 2020

Codecov Report

Merging #4613 (b66530c) into master (abe0b37) will not change coverage.
The diff coverage is n/a.

❗ Current head b66530c differs from pull request most recent head 4b62ac6. Consider uploading reports for the commit 4b62ac6 to get more accurate results

@@           Coverage Diff           @@
##           master    #4613   +/-   ##
=======================================
  Coverage   88.42%   88.42%           
=======================================
  Files         162      162           
  Lines       10522    10522           
  Branches     1521     1521           
=======================================
  Hits         9304     9304           
  Misses        944      944           
  Partials      274      274           


class PyppeteerSpider(scrapy.Spider):
    name = "pyppeteer"
    start_urls = ["data:,"]  # avoid making an actual upstream request

TIL, nice trick
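
For readers unfamiliar with the trick: `data:,` is a data URI with an empty payload, so resolving it never touches the network. The sketch below illustrates this with the Python standard library only, not with Scrapy itself; that Scrapy treats the spider's first request analogously is an assumption here, not something shown in this thread.

```python
from urllib.request import urlopen

# "data:," is a data URI with no media type and an empty payload.
# urllib resolves it locally, without opening a network connection.
with urlopen("data:,") as response:
    body = response.read()

print(repr(body))
```

Because the payload is empty, the response body is empty as well, which makes the URL a convenient placeholder when the real navigation happens inside the browser.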

@Gallaecio Gallaecio left a comment


Given the length of the Puppeteer-specific data, I would create a “Puppeteer” subsection instead of adding this directly under “Using a headless browser”.

On a different topic, was the removal of Selenium on purpose? I’m OK with favoring Puppeteer over Selenium in the documentation, disclosing the pros of the former, but I am not sure about removing Selenium as an option as long as scrapy-selenium works; it’s a Scrapy plugin that people can use without needing the asyncio reactor. And if Puppeteer gets its own section, I guess the same makes sense for Selenium, even if the section is one paragraph long.


* https://github.com/elacuesta/scrapy-pyppeteer
* https://github.com/lopuhin/scrapy-pyppeteer
* https://github.com/clemfromspace/scrapy-puppeteer

It would be great if we could feature some pros of using each solution. And if any solution provides no pros over the rest, it may be better not to cover it.


I removed https://github.com/lopuhin/scrapy-pyppeteer and https://github.com/clemfromspace/scrapy-puppeteer because they are unmaintained, and https://github.com/clemfromspace/scrapy-selenium because Selenium is blocking and I don't think we should encourage its usage here.
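
To illustrate why a blocking call is a concern inside an event loop, here is a minimal sketch using only `asyncio`. It does not use Selenium, Pyppeteer, or Scrapy: `time.sleep` and `asyncio.sleep` are hypothetical stand-ins for a synchronous and an awaitable browser call, and `ticker`, `blocking_work`, `cooperative_work`, and `count_ticks` are names invented for this example.

```python
import asyncio
import contextlib
import time


async def ticker(log):
    """Stand-in for other work the event loop should keep servicing."""
    while True:
        log.append("tick")
        await asyncio.sleep(0.01)


async def blocking_work():
    # Hypothetical stand-in for a synchronous browser-driver call.
    time.sleep(0.1)


async def cooperative_work():
    # Hypothetical stand-in for an awaitable browser call.
    await asyncio.sleep(0.1)


async def count_ticks(work):
    """Count how often the ticker ran while `work` was in progress."""
    log = []
    task = asyncio.create_task(ticker(log))
    await asyncio.sleep(0)  # let the ticker record its first tick
    await work()
    ticks = len(log)
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return ticks


blocking_ticks = asyncio.run(count_ticks(blocking_work))
cooperative_ticks = asyncio.run(count_ticks(cooperative_work))
print(blocking_ticks, cooperative_ticks)
```

While the blocking stand-in runs, the ticker gets no chance to run again; with the awaitable stand-in it keeps ticking. The same reasoning applies to any single-threaded event loop, which is the concern raised about a blocking driver.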

@wRAR
Member

wRAR commented Jul 27, 2021

Should this now be rewritten for Playwright?


@Gallaecio Gallaecio left a comment


I suggest keeping the alphabetical sorting of the links at the end, but other than that ✔️

@wRAR wRAR merged commit 22bd012 into scrapy:master Jul 29, 2021
@elacuesta elacuesta deleted the docs-headless-browser branch September 8, 2021 17:09
Successfully merging this pull request may close these issues.

Scrapy with Puppeteer and/or Playwright?