Browser Rendering? #18
@chavenor Yes, definitely!
This could be handled by a middleware. For the browser rendering, there are two stable options:
I'm thinking the Hound route would be better.
I had planned to showcase a library we developed in the past: https://github.com/scrapinghub/splash. It could be used as a first option (i.e. no development required; just use Splash as a proxy). However, in the long term I would love to support routing requests through headless browsers (I think browsers can now be controlled directly via an HTTP API, without Selenium).
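As a rough illustration of the "Splash as a proxy" idea: Splash exposes an HTTP API, and its `render.html` endpoint returns a page's HTML after its JavaScript has run. The sketch below is in Python only to show the shape of the request, not how Crawly would implement it. It assumes a Splash instance listening on `localhost:8050` (Splash's default port); the helper names `splash_render_url` and `fetch_rendered_html` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is running locally on its default port 8050.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def splash_render_url(target_url: str, wait: float = 0.5,
                      base: str = SPLASH_RENDER_URL) -> str:
    """Build a Splash render.html request URL for target_url (hypothetical helper)."""
    # Splash takes the page to render from the `url` query parameter and
    # waits `wait` seconds after the page loads before returning the HTML.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{base}?{query}"


def fetch_rendered_html(target_url: str, timeout: float = 30.0) -> str:
    """Fetch JS-rendered HTML through Splash instead of requesting the site directly."""
    # The crawler talks only to Splash; Splash loads and renders the target page.
    with urlopen(splash_render_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `splash_render_url("https://example.com/products?page=2")` yields `http://localhost:8050/render.html?url=https%3A%2F%2Fexample.com%2Fproducts%3Fpage%3D2&wait=0.5`. The target URL is percent-encoded so its own query string doesn't clash with Splash's parameters. This is why Splash needs no browser-side development: any HTTP client, including a crawler's existing fetcher, can use it.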
The usual solution for modifying HTTP headers is the BrowserMob proxy (https://github.com/lightbody/browsermob-proxy). Using Splash would require maintaining an Elixir wrapper for its HTTP API, which would be beyond the scope of Crawly as a crawling engine. Using Hound would instead leverage its existing API wrapper. If the goal is simply to render pages, Splash might be a good choice. If other things are necessary, such as closing modals or interacting with the page, Hound might be the better choice.
Merging this issue into #27.
@Ziinc I want to discuss this again. Splash is not a full-featured replacement for a browser-based request system (it's a JS renderer). We need to work on support for something like a headless Chrome client. It will be required for targets that ban clients based on their HTTP header fingerprints.
@chavenor I would assume that, with the basic Splash renderer, this can be closed. Of course, we will have to keep working towards headless Chrome support. However, for now, I don't see immediate demand or requests for that feature.
Is in-browser rendering of HTML, CSS, and JS on the roadmap?
Thanks!