Browser Rendering? #18

Closed
chavenor opened this issue Oct 17, 2019 · 7 comments

Comments

@chavenor

Is in-browser rendering (HTML, CSS, JS) on the roadmap?

Thanks!

@oltarasenko
Collaborator

@chavenor Yes, definitely!

@Ziinc
Collaborator

Ziinc commented Nov 28, 2019

This could be handled by middleware. For browser rendering, there are two stable options:

  • Puppeteer (Chrome only)
  • Hound (requires a running Selenium server plus a browser)

I think the Hound route would be better.

@oltarasenko
Collaborator

I had planned to showcase a library we developed in the past: https://github.com/scrapinghub/splash.

It could be used as a first option (i.e. no development required; just use Splash as a proxy). However, in the long term I would love to support routing requests through headless browsers (I think browsers can now be controlled directly via an HTTP API, without Selenium).

  • Selenium has limitations regarding setting HTTP request headers, etc.

Still thinking here.
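As a rough illustration of the "just use Splash as a proxy" idea, a crawler can rewrite each target URL into a request to Splash's `render.html` endpoint, which loads the page, runs its JavaScript, and returns the resulting HTML. This is a minimal sketch in Python, assuming a Splash instance listening on `localhost:8050` (Splash's default HTTP port); `splash_render_url` and `SPLASH_BASE` are hypothetical names, not part of Crawly or Splash.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is reachable here (8050 is Splash's default port).
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url: str, wait: float = 0.5) -> str:
    """Hypothetical helper: build a Splash /render.html request for target_url.

    Splash fetches target_url, executes its JavaScript, waits `wait` seconds,
    and responds with the rendered HTML.
    """
    # urlencode percent-escapes the target URL so its own query string survives.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH_BASE}/render.html?{query}"


if __name__ == "__main__":
    print(splash_render_url("https://example.com/products?page=2"))
```

Fetching the resulting URL with an ordinary HTTP client returns the rendered page, which is why this route needs no browser-control code on the crawler side; it does not, however, give control over request headers or page interaction.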

@Ziinc
Collaborator

Ziinc commented Nov 28, 2019

The usual solution for modifying HTTP headers is BrowserMob Proxy:

https://github.com/lightbody/browsermob-proxy

Using Splash would require maintaining an Elixir wrapper for its HTTP API, which would be beyond the scope of Crawly as a crawling engine.

Using Hound would let us leverage its existing API wrapper instead.

If the goal is simply to render, Splash might be a good choice. If additional actions are necessary, such as closing modals or otherwise interacting with the page, Hound might be a better choice.

@Ziinc
Collaborator

Ziinc commented Dec 30, 2019

Merging this issue into #27.

Ziinc closed this as completed Dec 30, 2019
@oltarasenko
Collaborator

oltarasenko commented Dec 30, 2019

@Ziinc I want to discuss this again. Splash is not a full-featured replacement for a browser-based request system (it's a JS renderer).

We need to work on support for something like a headless Chrome client. It will be required for targets that ban clients based on fingerprints of their HTTP header strings.

oltarasenko reopened this Dec 30, 2019
@oltarasenko
Collaborator

@chavenor I assume this can be closed now that there is a basic Splash renderer. Of course, we will have to keep working toward headless Chrome support. However, for now I don't see any immediate demand or requests for that feature.
