
Is it possible to configure this with a remote service like Browserless.io? #216

Closed
jklina opened this issue Nov 12, 2021 · 17 comments

Comments

@jklina
Contributor

jklina commented Nov 12, 2021

I was curious if it was possible to configure the protocol to work with a remote service rather than a local instance of Chrome. Thanks so much!

@route
Member

route commented Nov 12, 2021

@jklina There's the :url option (https://github.com/rubycdp/ferrum#customization); it should work like browserWSEndpoint does for Browserless. Try it and let us know!

@jklina
Contributor Author

jklina commented Nov 12, 2021

Thanks for the tip! I had tried that, but I think it expects an HTTP URL, not the WSS URL:

irb(main):006:0> b = Ferrum::Browser.new(url: "wss://chrome.browserless.io?token=sometoken")          
Traceback (most recent call last):
       16: from (irb):6:in `rescue in irb_binding'
       15: from (irb):6:in `new'
       14: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser.rb:63:in `initialize'
       13: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser.rb:125:in `start'
       12: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser/process.rb:30:in `start'
       11: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser/process.rb:30:in `new'
       10: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser/process.rb:61:in `initialize'
        9: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:458:in `get'
        8: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:481:in `get_response'
        7: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:606:in `start'
        6: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:933:in `start'
        5: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:483:in `block in get_response'
        4: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:1393:in `request_get'
        3: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:1393:in `new'
        2: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http/request.rb:15:in `initialize'
        1: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http/generic_request.rb:17:in `initialize'
ArgumentError (not an HTTP URI)
irb(main):007:0> 

It looks like the library first requests an HTTP URL to look up the WSS URL. The host and port options might be what I'm looking for, but they seem to be ignored. I'll keep poking around in the meantime!
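A minimal stdlib sketch of the discovery step described above, assuming the standard Chrome DevTools HTTP endpoint (GET /json/version, whose JSON body includes a webSocketDebuggerUrl field). The helper names version_request and websocket_url are hypothetical, not Ferrum's API; the sketch only shows why a wss:// URL is rejected before any network call is made.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: builds the GET request for Chrome's DevTools HTTP
# endpoint, /json/version. Net::HTTP accepts only URI::HTTP (http://) and its
# subclass URI::HTTPS (https://); any other scheme, such as wss://, raises
# ArgumentError("not an HTTP URI"), the error seen in the traceback above.
def version_request(base_url)
  uri = URI(base_url)
  uri.path = "/json/version"
  Net::HTTP::Get.new(uri)
end

# Hypothetical helper: extracts the WebSocket URL a CDP client connects to
# from the JSON body returned by /json/version.
def websocket_url(version_json)
  JSON.parse(version_json).fetch("webSocketDebuggerUrl")
end
```

An http:// or https:// base URL produces a request; a wss:// one fails inside Net::HTTP exactly as in the irb session.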

@route
Member

route commented Nov 12, 2021

Oh, I'm afraid that isn't currently possible, because we expect the browser's HTTP URL rather than a WebSocket URL. Take a look at https://github.com/rubycdp/ferrum/blob/master/lib/ferrum/browser/process.rb#L67. I think we could also accept a ws_url. Try fixing it locally, and if it works we can merge a PR.
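A minimal sketch of the option handling suggested here, assuming a hypothetical ws_url option that takes priority over url: when a WebSocket URL is supplied it is used as-is, and only otherwise is it discovered over HTTP. resolve_ws_url and the injected fetch_version callable are illustrative names, not Ferrum's code.

```ruby
require "json"
require "uri"

# Hypothetical resolution order: an explicit ws_url wins; otherwise ask the
# HTTP endpoint at `url` for /json/version and read webSocketDebuggerUrl.
# fetch_version is injected (any callable taking a URI and returning the JSON
# body) so the logic can be exercised without a running browser.
def resolve_ws_url(url: nil, ws_url: nil, fetch_version:)
  return URI(ws_url) if ws_url
  raise ArgumentError, "either url or ws_url is required" unless url

  version_uri = URI(url)
  version_uri.path = "/json/version"
  URI(JSON.parse(fetch_version.call(version_uri)).fetch("webSocketDebuggerUrl"))
end
```

Skipping the HTTP lookup when a WebSocket URL is given also means its query string (for example an API token) is never rebuilt or lost along the way.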

@geoffharcourt

We were able to use remote Chrome in a separate Docker container (from Browserless's images) with Cuprite by passing the URL to the url option when configuring the driver.

@nickhammond
Contributor

It looks like this might be possible via the section of their documentation that covers connecting by host and port. I don't currently use Browserless, though.

@ktimothy

ktimothy commented Aug 17, 2022

I was also using Capybara + Cuprite + Ferrum with the browserless.io Docker image. It worked; I just had to specify the browser URL for the Ferrum (Cuprite) driver.
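For reference, a Capybara driver registration along the lines these comments describe might look like the config sketch below. It assumes the cuprite gem, which forwards driver options such as url to Ferrum; the CHROME_URL environment variable and the http://chrome:3000 default (a Docker Compose service named chrome running a self-hosted Browserless image on port 3000) are assumptions for illustration.

```ruby
require "capybara/cuprite"

# Assumption: a remote Chrome (e.g. a self-hosted Browserless container)
# exposes its DevTools HTTP endpoint at CHROME_URL.
remote_chrome_url = ENV.fetch("CHROME_URL", "http://chrome:3000")

Capybara.register_driver(:cuprite) do |app|
  # url: points Ferrum at an already-running browser instead of
  # launching a local Chrome process.
  Capybara::Cuprite::Driver.new(app, url: remote_chrome_url, window_size: [1200, 800])
end

Capybara.javascript_driver = :cuprite
```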

@dhnaranjo

Self-hosted Docker images from Browserless do not experience the problem the OP is describing. I spent some time on this today and found myself out of my depth, but here's what I know:

  • Browserless requires SSL connections, but Ferrum enforces HTTP in Ferrum::Browser::Process#parse_browser_versions and also builds a non-SSL socket in Ferrum::Browser::WebSocket#initialize.

  • Both HTTPS and WSS connections to Browserless require you to pass a query param with your API token, which is stripped out during Ferrum::Browser::Process#initialize, in Ferrum::Browser::Process#parse_browser_versions, and various other places.
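The query-string loss described in the second point is easy to reproduce with Ruby's URI library: resolving an absolute path against a base URL keeps the host but, per RFC 3986, discards the base's query. Below is a minimal sketch of one way to keep the token, assuming the /json/version endpoint sits on the same host and port as the WebSocket URL and that wss:// maps to https:// (version_url_for is a hypothetical helper, not Ferrum code).

```ruby
require "uri"

# Resolving "/json/version" against a URL that carries ?token=... drops the
# token, because a reference with its own path does not inherit the query.
def naive_version_url(base_url)
  URI.join(base_url, "/json/version")
end

# Hypothetical helper: map ws:// -> http:// and wss:// -> https:// on the same
# host and port, and carry the query string (e.g. an API token) across.
def version_url_for(ws_url)
  ws = URI(ws_url)
  klass = ws.scheme == "wss" ? URI::HTTPS : URI::HTTP
  klass.build(host: ws.host, port: ws.port, path: "/json/version", query: ws.query)
end
```

The same care would be needed wherever a library derives one URL from another, not just in the initial version lookup.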

I tried hacking something together, but this is out of my nerd comfort zone. Hope this helps someone else figure it out.

@route
Member

route commented Nov 10, 2022

I'll implement this when I find time, but the main issue is that Browserless uses only one connection, whereas Ferrum uses many (one per browser and one per page), which seemed like a good design decision to me at the time and still does.
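To make the per-page design concrete: Chrome serves a separate DevTools WebSocket endpoint for each target, so a client following this model opens one connection to the browser endpoint plus one per page endpoint. The sketch below (hypothetical helper page_ws_url, assuming Chrome's /devtools/page/<targetId> path layout) derives page endpoints from the browser endpoint; a hosted service that proxies only a single browser connection may not expose these extra URLs.

```ruby
require "uri"

# Chrome exposes one DevTools WebSocket endpoint per target, e.g.
#   ws://HOST:PORT/devtools/browser/<browser-id>   browser-level connection
#   ws://HOST:PORT/devtools/page/<target-id>       one connection per page
# Hypothetical helper: derive a page endpoint from the browser endpoint,
# keeping scheme, host, port and any query string.
def page_ws_url(browser_ws_url, target_id)
  uri = URI(browser_ws_url)
  uri.path = "/devtools/page/#{target_id}"
  uri.to_s
end

browser = "ws://127.0.0.1:9222/devtools/browser/6f1c"
pages = %w[A1 B2].map { |id| page_ws_url(browser, id) }
# One browser connection plus one per page: 3 WebSockets for 2 pages.
connections = [browser, *pages]
```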

@dhnaranjo

For now I'm just managing my own lil Chrome services, but I will definitely appreciate it when I can turn an infrastructure problem into a money problem.

Ferrum already simplifies things plenty by letting me skip Puppeteer. I appreciate y'all's work.

@borlafdev

First of all, thanks to @route for this amazing library. I've been using it for years and the API is great.

I came to this issue after trying to connect Ferrum to a Chrome instance provided by a third-party provider (Bright Data's 'Scraping Browser' product).

I modified some methods to include basic authentication in the first request to the browser (json/version) and to use an OpenSSL socket to connect to their WebSocket endpoint (I don't know why the plain TCP connection did not work).
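A rough stdlib sketch of the two changes described here, assuming the provider accepts HTTP Basic credentials on /json/version and speaks TLS on its WebSocket port. A wss:// endpoint expects a TLS handshake first, so a bare TCP socket cannot talk to it directly. version_request_with_auth, fetch_version and open_tls_socket are hypothetical helpers, not Ferrum methods, and the credentials are placeholders.

```ruby
require "net/http"
require "openssl"
require "socket"
require "uri"

# Build (but do not send) an authenticated GET for /json/version.
def version_request_with_auth(base_url, user, password)
  uri = URI(base_url)
  uri.path = "/json/version"
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password) # sets "Authorization: Basic base64(user:password)"
  request
end

# Send it; use_ssl is required for https:// endpoints.
def fetch_version(base_url, user, password)
  uri = URI(base_url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(version_request_with_auth(base_url, user, password)).body
  end
end

# Wrap a plain TCP socket in TLS before the WebSocket handshake.
# hostname= sets SNI and is used for certificate hostname verification.
def open_tls_socket(host, port)
  tcp = TCPSocket.new(host, port)
  context = OpenSSL::SSL::SSLContext.new
  context.set_params(verify_mode: OpenSSL::SSL::VERIFY_PEER)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, context)
  ssl.hostname = host
  ssl.sync_close = true # closing the TLS socket also closes the TCP socket
  ssl.connect
  ssl
end
```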

After getting the WebSocket connection working, Ferrum raises an exception when trying to enable the page (by calling the "Page.enable" method).

I have tried to compare the commands that puppeteer sends via WebSockets and I have tried to patch the ferrum code to call the same methods in the same order (using sessionId in addition to contextId and targetId), but also when calling "Page.enable", the call times out and raises DeadBrowserError exception.

I would love to understand how Ferrum uses multiple connections so I can try to make it work with third-party providers. I could even take charge of developing it and submit a pull request, but I am lost on how to do this.

@route
Member

route commented Sep 14, 2023

@borlafdev I think the difference is that for every page Ferrum opens up a new connection, which is not the case for Puppeteer. To multiplex and use the same WebSocket they used to call something like sendMessageToTarget https://chromedevtools.github.io/devtools-protocol/tot/Target/#method-sendMessageToTarget which is now deprecated.
Instead of using a dedicated WebSocket per page, we might do the same thing they do with sessions, but I'm afraid that requires R&D and things like Target.setAutoAttach or Target.autoAttachRelated to start using sessions.
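A minimal sketch of the session-based alternative, assuming CDP's flattened session mode: after Target.attachToTarget with flatten: true returns a sessionId, page commands go over the same WebSocket with a top-level sessionId field, and replies are matched back by their id. CdpMultiplexer is a hypothetical name; it only builds and routes JSON messages and does not open a socket.

```ruby
require "json"

# Hypothetical multiplexer for CDP's flattened session mode: every command
# travels over ONE WebSocket. Commands for a page carry the sessionId that
# Target.attachToTarget (flatten: true) returned; replies echo the command id.
class CdpMultiplexer
  def initialize
    @next_id = 0
    @pending = {} # command id => session id it was sent on (nil = browser)
  end

  # Serialize a command; pass session_id to address a specific page session.
  def command(method, params: {}, session_id: nil)
    @next_id += 1
    message = { id: @next_id, method: method, params: params }
    message[:sessionId] = session_id if session_id
    @pending[@next_id] = session_id
    JSON.generate(message)
  end

  # Route an incoming frame: replies are matched by id, events carry their
  # own sessionId. Returns [session_id, payload].
  def dispatch(frame)
    message = JSON.parse(frame)
    if message.key?("id")
      [@pending.delete(message["id"]), message["result"]]
    else
      [message["sessionId"], message]
    end
  end
end
```

With this design the per-page state lives in the sessionId rather than in a dedicated socket, which is what a single-connection service would need.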

but also when calling "Page.enable", the call times out and raises DeadBrowserError exception.

I think this is because the page isn't reachable over its own WebSocket, since it sits behind the service.

route added a commit that referenced this issue Jan 7, 2024
route closed this as completed in e1efe3e Jan 7, 2024
@Aubermean

Both HTTPS and WSS connections to Browserless require you to pass a query param with your API token, which is stripped out during Ferrum::Browser::Process#initialize, in Ferrum::Browser::Process#parse_browser_versions, and various other places.

Is this still the case? I am struggling to get Ferrum to pass on the query/url params, it seems to be stripping them..?

@ahiskali

Is this still the case? I am struggling to get Ferrum to pass on the query/url params, it seems to be stripping them..?

Same problem here: passing the proxy URL works for connections to Browserless through DevTools, but not through Ferrum.

@route
Member

route commented May 31, 2024

Guys, you could at least have provided the version and the script you are running. That would be more helpful than a +1.

@PabloScolpino

Hey @route,

First of all thank you for all the work.

Secondly, I have created this example project which reproduces the error.

After cloning, run make build setup_test test and you will see it.

route reopened this Jun 10, 2024
@route
Member

route commented Jun 10, 2024

@PabloScolpino Awesome, thanks! I'll take a look. Will keep it open for now.

@route
Member

route commented Jun 12, 2024

@geoffharcourt

    cuprite (0.15)
      capybara (~> 3.0)
      ferrum (~> 0.14.0)

The latest Cuprite doesn't depend on the latest Ferrum, and I believe this is fixed in the latest Ferrum.
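The lockfile line ferrum (~> 0.14.0) is a pessimistic constraint: it allows any 0.14.x release but nothing from 0.15 on, so with cuprite 0.15 Bundler cannot resolve a newer Ferrum. A quick check with RubyGems' own requirement class (the version numbers other than 0.14.0 are just sample inputs):

```ruby
require "rubygems" # Gem::Requirement and Gem::Version ship with RubyGems

requirement = Gem::Requirement.new("~> 0.14.0") # means >= 0.14.0, < 0.15

%w[0.14.0 0.14.9 0.15 0.15.1].each do |v|
  puts "#{v}: #{requirement.satisfied_by?(Gem::Version.new(v))}"
end
# prints:
# 0.14.0: true
# 0.14.9: true
# 0.15: false
# 0.15.1: false
```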

route closed this as completed Jun 12, 2024

10 participants