Feature request: run puppeteer in the browser #2119

Closed
Janpot opened this issue Feb 28, 2018 · 31 comments

@Janpot
Contributor

Janpot commented Feb 28, 2018

Just testing the waters here, but it seems to me that apart from downloading Chrome and launching a browser, puppeteer isn't really doing anything that can't be done in a browser. I'm thinking puppeteer.connect() in a webpage. Would there be any interest in supporting this? Am I overlooking any barriers that stand in the way of achieving this? I can probably make some time to look into it.

@aslushnikov
Contributor

@Janpot this definitely sounds interesting. I don't think it makes sense to have it as a part of this repository - it deserves a separate project.

Do you have any success with this?

@Janpot
Contributor Author

Janpot commented Apr 11, 2018

I tinkered a bit with it a month ago. I think it's feasible to create a build of puppeteer that runs in the browser but I haven't picked it back up. I think a separate project would only complicate things as the goal would be to not add code at all, just refactor here and there and add a build target.

@aslushnikov
Contributor

I think a separate project would only complicate things as the goal would be to not add code at all, just refactor here and there and add a build target.

This would be ideal. @JoelEinbinder had a prototype as well and refactored pptr codebase to simplify things there. However, iirc he still needed to mock fs and some other node modules we use occasionally.

@Janpot
Contributor Author

Janpot commented Apr 12, 2018

@aslushnikov Ok, so I did a quick and very very dirty test again, building puppeteer with browserify. To make it work:

  • I used exposify to shim require('ws') with WebSocket, and './BrowserFetcher' with null
  • I changed ChromiumRevision in Launcher.js and set it to null
  • I removed the second argument in new WebSocket(url) in Connection.js
  • I shimmed WebSocket with
      WebSocket.prototype.on = function (eventName, handler) {
        WebSocket.prototype.addEventListener.call(this, eventName, ({data}) => handler(data));
      }

Then, with browserless running locally (docker run -p 3000:3000 browserless/chrome), I was able to make

const puppeteer = require('puppeteer');
puppeteer.connect({ browserWSEndpoint: 'ws://localhost:3000' })
  .then(async browser => {
    const page = await browser.newPage();
    await page.goto('https://example.com');
    console.log(await page.content())
  });

work in the browser.

So to list the main problems:

  1. helpers.projectRoot in ChromiumRevision breaks the browser
  2. WebSocket seems to be used in a browser incompatible way
  3. BrowserFetcher needs to be excluded

Will see if one of these days I can find some time to clean this up a bit, and find better solutions to these problems.
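A slightly more careful version of the WebSocket shim from the list above could look like the sketch below. It is not the code from the PR; the helper name `addNodeStyleOn` is made up for illustration, and the argument mapping (payload for `message`, code and reason for `close`) is an assumption about what a Node-style caller expects.

```javascript
// Adds a Node-style `on(eventName, handler)` method to an EventTarget-based
// socket class such as the browser's WebSocket.
// `addNodeStyleOn` is a hypothetical helper name, not part of puppeteer or ws.
function addNodeStyleOn(SocketClass) {
  SocketClass.prototype.on = function (eventName, handler) {
    this.addEventListener(eventName, (event) => {
      if (eventName === 'message') {
        // Node-style listeners expect the payload, not the event object.
        handler(event.data);
      } else if (eventName === 'close') {
        handler(event.code, event.reason);
      } else {
        // 'open', 'error', ...: hand over the raw event.
        handler(event);
      }
    });
    return this; // allow chaining, like EventEmitter#on
  };
}

// In a browser this would be applied once, before bundled code runs:
//   addNodeStyleOn(WebSocket);
```

Unlike the quick shim, this does not destructure `data` from events that have no payload, so `open` and `error` handlers still receive something useful.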

@aslushnikov
Contributor

Thanks @Janpot for the follow-up

BrowserFetcher needs to be excluded

Right, I'd expect both BrowserFetcher and Launcher to be excluded.

I shimmed WebSocket with

This is an interesting approach. Another option might be implementing Connection class with whatever transport is used to drive pptr in browser.

@Janpot
Contributor Author

Janpot commented Apr 12, 2018

Launcher seems to be needed to be able to connect. As for the shimming it was basically a "make it work as quickly as possible" approach. Will look into more maintainable ways later.

@Janpot
Contributor Author

Janpot commented Apr 13, 2018

@aslushnikov I added my proof of concept as PR #2374

@elisherer
Contributor

The developer tools can overcome some browser security enforcements like CORS.
e.g. Accessing an iframe from a different origin and running javascript code on that frame will probably be blocked if tried from inside the browser, wouldn't it?

@aslushnikov
Contributor

A nice summary on what's anti-bundleable in pptr was given here: #2245 (comment)

We should:

  • fix these
  • bundle puppeteer for web as part of our testsuite to make sure we don't regress the bundle'ability in future

@Janpot
Contributor Author

Janpot commented Sep 6, 2018

correct me if I'm wrong, but now that puppeteer-core is a thing, I guess trying to bundle puppeteer doesn't make much sense anymore? I think I should rather concentrate on bundling puppeteer-core. I haven't picked this back up again, but I'd assume the browser incompatible thing that's left in that case is the websocket implementation.

@aslushnikov
Contributor

correct me if I'm wrong, but now that puppeteer-core is a thing, I guess trying to bundle puppeteer doesn't make much sense anymore?

@Janpot: puppeteer-core is the same codebase as puppeteer; but yes, you'd probably want to depend on puppeteer-core.

I'd assume the browser incompatible thing that's left in that case is the websocket implementation.

Also the dynamic imports - I have a promising draft to cleanup these.

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Sep 6, 2018
This patch removes all dynamic requires in Puppeteer. This should
make it much simpler to bundle puppeteer/puppeteer-core packages.

We used dynamic requires in a few places in lib/:
- BrowserFetcher was choosing between `http` and `https` based on some
  runtime value. This was easy to fix with explicit `require`.
- BrowserFetcher and Launcher needed to know project root to store
  chromium revisions and to read package name and chromium revision from
  package.json. (projectRoot value would be different in node6).
  Instead of doing a backwards logic to infer these
  variables, we now pass them directly from `//index.js`.

With this patch, I was able to bundle Puppeteer using browserify and
the following config in `package.json`:

```json
  "browser": {
    "./lib/BrowserFetcher.js": false,
    "ws": "./lib/BrowserWebSocket",
    "fs": false,
    "child_process": false,
    "rimraf": false,
    "readline": false
  }
```

(where `lib/BrowserWebSocket.js` is a courtesy of @Janpot from
puppeteer#2374)

And command:

```sh
$ browserify -r puppeteer:./index.js > ppweb.js
```

References puppeteer#2119
aslushnikov added a commit that referenced this issue Sep 6, 2018
aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Sep 7, 2018
Currently connection assumes that transport is a websocket
and tries to handle websocket-related errors.

This patch:
- moves ConnectionTransport interface to use callbacks instead
  of events. This way it could be used in browser context as well.
- introduces WebSocketTransport that implements ConnectionTransport
  interface for ws.

This is a preparation step for 2 things:
- exposing `transport` option in the `puppeteer.connect` method
- better support for `browserify`

References puppeteer#2119
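As a rough illustration of the callback-based transport idea in this commit message, a browser-side transport might look like the sketch below. The commit message does not spell out the interface, so the member names (`send`, `close`, `onmessage`, `onclose`) and the class name `BrowserWebSocketTransport` are assumptions here, not a copy of Puppeteer's code.

```javascript
// Sketch of a transport that reports incoming data through callbacks
// instead of EventEmitter-style events.
// Assumed interface: send(message), close(), onmessage, onclose.
class BrowserWebSocketTransport {
  // `socket` is anything shaped like a browser WebSocket:
  // addEventListener(), send() and close().
  constructor(socket) {
    this._socket = socket;
    this.onmessage = null; // set by the consumer (e.g. a Connection)
    this.onclose = null;   // set by the consumer
    socket.addEventListener('message', (event) => {
      if (this.onmessage) this.onmessage(event.data);
    });
    socket.addEventListener('close', () => {
      if (this.onclose) this.onclose();
    });
  }

  // Forward a serialized protocol message to the socket.
  send(message) {
    this._socket.send(message);
  }

  // Close the underlying socket; `onclose` fires when the socket reports it.
  close() {
    this._socket.close();
  }
}
```

Because the consumer only touches plain callbacks, the same shape works for `ws` in Node and for the native WebSocket in a page. An instance could then be handed to `puppeteer.connect` through the `transport` option this commit prepares for.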
aslushnikov added a commit that referenced this issue Sep 7, 2018
aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Sep 7, 2018
Bundled version of Puppeteer should rely on native WebSocket.

Luckily, 'ws' module supports the same interface as the native
browser websockets. This patch switches WebSocketTransport to
use the browser-compliant interface of 'ws'.

After this patch, I was able to bundle Puppeteer for browser
using the following config in `package.json`:

```json
"browser": {
  "./lib/BrowserFetcher.js": false,
  "ws": "./lib/BrowserWebSocket",
  "fs": false,
  "child_process": false,
  "rimraf": false,
  "readline": false
}
```

where `./lib/BrowserWebSocket` is:

```js
module.exports = WebSocket;
```

and the bundling command is:

```sh
$ browserify -r ./index.js:puppeteer > ppweb.js
```

References puppeteer#2119
aslushnikov added a commit that referenced this issue Sep 11, 2018
@Janpot closed this as completed Sep 13, 2018
@noamalffasy

When will the updated version be out?

@aslushnikov
Contributor

@noamalffasy The next release is scheduled for October 4 (you can see the next release date at the very beginning of our documentation).

@noamalffasy

Is there a way I can get this version without waiting until the next release?

@aslushnikov
Contributor

@noamalffasy you can either clone from GitHub directly, or install the tip-of-tree release with npm i puppeteer@next.

@noamalffasy

That worked!
Thank you!

@noamalffasy

noamalffasy commented Sep 26, 2018

Okay I have an issue now with bundling,

ERROR in ./node_modules/puppeteer/lib/WebSocketTransport.js
Module not found: Error: Can't resolve 'ws' in '/node_modules/puppeteer/lib'
 @ ./node_modules/puppeteer/lib/WebSocketTransport.js 16:18-31
 @ ./node_modules/puppeteer/lib/Launcher.js
 @ ./node_modules/puppeteer/lib/Puppeteer.js
 @ ./node_modules/puppeteer/index.js

I'm using webpack

@aslushnikov
Contributor

@noamalffasy I'm not sure what lib/WebSocketTransport.js is; there's a suitable one in utils/browser/WebSocket.js.

Note though: we don't currently publish bits we use to bundle, but you can git clone puppeteer and then run npm run bundle to bundle it locally.

@noamalffasy

So this feature is only available if you clone the repository? Or is it temporary?

@aslushnikov
Contributor

@noamalffasy we're not shipping any bundled version of puppeteer for web, but we made sure that there are no obstacles in bundling puppeteer.

@noamalffasy

But there is an issue, maybe I need to change my webpack config?

const path = require("path");

module.exports = {
  entry: "./src/main.ts",
  mode: "production",
  module: {
    rules: [
      {
        test: /\.ts$/,
        loaders: "babel-loader",
        exclude: /node_modules/
      },
      {
        test: /\.js$/,
        use: ["source-map-loader"],
        enforce: "pre"
      }
    ]
  },
  resolve: {
    extensions: [".ts", ".js", ".json"]
  },
  output: {
    filename: "bundle.js",
    path: path.resolve(__dirname, "dist")
  }
};

@brandonros

I wrote some code that scrapes some web pages. It doesn't do too well in a cloud hosted environment like DigitalOcean. It'd be neat if a user could load a page served by my API that would then allow their browser tabs to be controlled through the regular puppeteer API (if they permitted/allowed it, etc. etc.). This is the opposite of me having to waste the server resources to run a web browser, while still allowing me to do scriptable things like user input, clicking, evaluating scripts, etc.

Was that kind of the vision here? Is that possible, or am I just misunderstanding?

@aslushnikov
Contributor

It'd be neat if a user could load a page served by my API that would then allow their browser tabs to be controlled through the regular puppeteer API (if they permitted/allowed it, etc. etc.). This is the opposite of me having to waste the server resources to run a web browser, while still allowing me to do scriptable things like user input, clicking, evaluating scripts, etc.

I think this is possible using the bundled version of puppeteer and extension's chrome.debugger.

Tip: you can pass a custom transport to puppeteer connect using the transport experimental option; check out our test in the /utils/browser/test.js

@woniesong92

woniesong92 commented Nov 18, 2018

@aslushnikov Q: Can I use this to use puppeteer inside of an already opened browser? For example, if I'm already logged into Facebook, can I execute a Puppeteer script inside the same browser so I don't have to login again? I thought it wouldn't work because if I had to launch a new headless browser, the cookies would be gone but comments on this issue and the merged PR give me some hope. Looking forward to your reply!

@Janpot
Contributor Author

Janpot commented Nov 18, 2018

@woniesong92 You can connect puppeteer to any browser that speaks the DevTools protocol. To do that, first start Chrome with the extra CLI flag --remote-debugging-port=9229. Then open http://localhost:9229/json/version and find the webSocketDebuggerUrl. Pass that to puppeteer.connect as browserWSEndpoint instead of calling puppeteer.launch.
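The lookup step described here can be scripted. The sketch below assumes Chrome was started with the flag above and that the code runs somewhere allowed to fetch that URL (for example Node 18+ with a global `fetch`). The function name `getBrowserWSEndpoint` and the injectable `fetchImpl` parameter are made up for illustration.

```javascript
// Reads /json/version from a Chrome started with --remote-debugging-port
// and returns the webSocketDebuggerUrl it reports.
// `fetchImpl` is injectable so the function can be exercised without Chrome.
async function getBrowserWSEndpoint(port, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`http://localhost:${port}/json/version`);
  if (!response.ok) {
    throw new Error(`Unexpected HTTP status ${response.status}`);
  }
  const info = await response.json();
  if (!info.webSocketDebuggerUrl) {
    throw new Error('No webSocketDebuggerUrl in /json/version response');
  }
  return info.webSocketDebuggerUrl;
}

// Usage sketch (requires puppeteer and a running Chrome):
//   const browserWSEndpoint = await getBrowserWSEndpoint(9229);
//   const browser = await puppeteer.connect({ browserWSEndpoint });
```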

@ryzam

ryzam commented Dec 3, 2018

@Janpot do you have any documentation or guidelines on how to run puppeteer in the browser without running Node.js?

@evanrolfe

evanrolfe commented Jul 5, 2019

I think this is possible using the bundled version of puppeteer and extension's chrome.debugger.

Tip: you can pass a custom transport to puppeteer connect using the transport experimental option; check out our test in the /utils/browser/test.js

@aslushnikov since chrome.debugger provides sendCommand and onEvent functions, would it be possible to pass this in without having to use the experimental Target.exposeDevToolsProtocol command? i.e. we could have a ConnectionTransport class like ChromeDebuggerTransport which could wrap the chrome.debugger functions so that Puppeteer could use it?

The reason why I ask this is because I cannot manage to get the Target.exposeDevToolsProtocol command to work properly - I never end up with the window.cdp object defined and there is not much information about this command apart from the API docs. This is what I've been trying:

    chrome.tabs.getCurrent((tab) => {
      let currentTabTarget = {tabId: tab.id};

      chrome.debugger.attach(currentTabTarget, '1.3', () => {
        if(chrome.runtime.lastError) {
          alert(chrome.runtime.lastError.message);
        }
      });

      chrome.debugger.getTargets((targets) => {
        currentTarget = targets.find((info) => { return info.url == tab.url });
        chrome.debugger.sendCommand(currentTabTarget, 'Target.exposeDevToolsProtocol', {targetId: currentTarget.id});
        chrome.debugger.detach(currentTabTarget, () => {
          if(chrome.runtime.lastError) {
            alert(chrome.runtime.lastError.message);
          }else{
            alert(window.cdp)
          }
        });
      });
    });

@aslushnikov
Contributor

@aslushnikov since chrome.debugger provides a sendCommand and onEvent functions, would it be possible to pass this in without having to use the experimental Target.exposeDevToolsProtocol command?

IIRC the chrome.debugger doesn't expose the newest version of DevTools protocol - with proper target ids, flattened session management etc. So I don't think it would be possible.

The reason why I ask this is because I cannot manage to get the Target.exposeDevToolsProtocol command to work properly - I never end up with the window.cdp object defined

Yeah, I think this is because chrome.debugger simply is not up-to-date with the modern DevTools protocol. The only way to get the window.cdp is to use devtools protocol right away externally when launching chrome, e.g. with Puppeteer, and then embed puppeteer-web. This is what we do with our puppeteer-web tests.

@evanrolfe

@aslushnikov many thanks for getting back to me on this. I can't seem to find any information about the DevTools version that chrome.debugger exposes, but since it's an experimental feature I think you're right that this is probably just not possible at the moment.

In the future, if chrome.debugger becomes more stable and up-to-date, being able to run puppeteer-web using the chrome.debugger interface would be very useful for me. This could allow us to write chrome-extensions which could use puppeteer without the need for users to launch the browser from the command line.

@m-rousse

m-rousse commented Aug 6, 2019

Hi, we are trying to use the chrome.debugger api to interact with the browser using puppeteer. We were able to send messages back and forth. We mocked a few calls that are denied by the browser (eg. all Target.* commands), which allowed us to execute a few actions, however we are not able to retrieve the sessionId associated to targets as they are normally retrieved using the Target.attachToTarget. Using chrome.debugger.attach does not return the sessionId either.

@aslushnikov if I understand correctly, the DevTools protocol exposed by chrome.debugger is not the same as the one exposed by the websocket obtained with --remote-debugging-port. Is it planned to update the chrome.debugger API/accessible commands? Also, is there a chromium issue to track this feature request?
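For readers trying the same thing, the "send messages back and forth" part could be sketched roughly as below. This is a hypothetical `ChromeDebuggerTransport`, not something Puppeteer ships. It assumes the callback-style `chrome.debugger` API, that `chrome.debugger.attach` has already succeeded for the target, and the `send`/`close`/`onmessage`/`onclose` transport shape. It does nothing about the session id and `Target.*` limitations discussed in this thread.

```javascript
// Hypothetical transport that relays protocol messages over chrome.debugger.
// `chromeApi` is the extension's global `chrome` object (injected so the
// class can be exercised outside an extension); `target` is e.g. { tabId }.
class ChromeDebuggerTransport {
  constructor(chromeApi, target) {
    this._chrome = chromeApi;
    this._target = target;
    this.onmessage = null;
    this.onclose = null;
    // Forward protocol events from our tab as serialized messages.
    this._onEvent = (source, method, params) => {
      if (source.tabId !== this._target.tabId) return;
      this._emit({ method, params });
    };
    this._chrome.debugger.onEvent.addListener(this._onEvent);
  }

  // Takes a serialized { id, method, params } command and answers with a
  // serialized { id, result } or { id, error } message.
  send(message) {
    const { id, method, params } = JSON.parse(message);
    this._chrome.debugger.sendCommand(this._target, method, params || {}, (result) => {
      const lastError = this._chrome.runtime.lastError;
      if (lastError) {
        this._emit({ id, error: { message: lastError.message } });
      } else {
        this._emit({ id, result: result || {} });
      }
    });
  }

  // Stop listening and detach the debugger from the target.
  close() {
    this._chrome.debugger.onEvent.removeListener(this._onEvent);
    this._chrome.debugger.detach(this._target, () => {
      if (this.onclose) this.onclose();
    });
  }

  _emit(payload) {
    if (this.onmessage) this.onmessage(JSON.stringify(payload));
  }
}
```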

@sachinwins

Is there any update on launching the browser from a client-side JS page rather than from the command line?

Basically, I want to deliver an HTML page with a JavaScript file to the user that will launch the browser (currently, we launch it from the command line). Once the browser is launched, I will get the "webSocketDebuggerUrl" from http://127.0.0.1:9222/json/version to connect.

Any help on how I can achieve this without using a server or the command line?

KtorZ added a commit to CardanoSolutions/ogmios that referenced this issue Aug 2, 2021
…entEmitter.

  I noticed this while trying to actually use the TypeScript client in a
  browser context and went digging a bit into this `isomorphic-ws`
  module. Spoiler: it's a lie! The module is merely a switch which
  selects the right import at runtime. Yet, it does not attempt to fill
  the gaps between the Node.js and the browser-based WebSocket.

  The main issue being that, on Node.js, the WebSocket is an instance of
  [EventEmitter](https://nodejs.org/api/events.html#events_class_eventemitter)
  which comes with fairly useful methods like `once`, `removeAllListeners` and
  so forth. On the browser however, we are doomed with the
  [EventTarget](https://developer.mozilla.org/en-US/docs/Web/API/EventTarget)
  and its crappy API :'( ...

  One particularly surprising thing is that Puppeteer and the supposed
  cross-platform testing aren't of any use here, since Puppeteer seems to
  be using its own emulation of the WebSocket, which isn't at all the
  one used by the browser but has an API closer to the Node.js one. So
  while the browser tests are all passing, they do not actually pass on
  a real browser 🤦

  puppeteer/puppeteer#2119 (comment)

  This PR introduces a slightly better `IsomorphicWebSocket` interface
  as a drop-in replacement for our internal use. It only covers the
  `on`, `once`, `removeListener` and `removeAllListeners` which we use
  internally.

  I had to resort to a JavaScript module for that because I couldn't get
  the TypeScript compiler to cooperate. As a consequence, the .js module
  does not get copied in the `dist` by default, I had to manually copy
  it as part of the build command, which _seems wrong_ but I am too
  unfamiliar with the TypeScript tooling :/
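To make the gap described in this commit message concrete, here is a minimal sketch of a wrapper that offers `on`, `once`, `removeListener` and `removeAllListeners` on top of an EventTarget. It is not the ogmios implementation; the class name `EmitterLikeSocket` is made up, and passing `event.data` to handlers when present is an assumption.

```javascript
// Wraps an EventTarget (e.g. a browser WebSocket) with a small subset of the
// Node.js EventEmitter API. Hypothetical sketch, not the ogmios code.
// Limitation: registering the same handler twice for one event name
// only tracks the most recent registration.
class EmitterLikeSocket {
  constructor(target) {
    this._target = target;
    this._listeners = new Map(); // eventName -> Map(handler -> wrapped listener)
  }

  on(eventName, handler) {
    return this._add(eventName, handler, false);
  }

  once(eventName, handler) {
    return this._add(eventName, handler, true);
  }

  removeListener(eventName, handler) {
    const forEvent = this._listeners.get(eventName);
    const wrapped = forEvent && forEvent.get(handler);
    if (wrapped) {
      this._target.removeEventListener(eventName, wrapped);
      forEvent.delete(handler);
    }
    return this;
  }

  // With no argument, removes every listener registered through this wrapper.
  removeAllListeners(eventName) {
    const names = eventName === undefined ? [...this._listeners.keys()] : [eventName];
    for (const name of names) {
      const forEvent = this._listeners.get(name) || new Map();
      for (const wrapped of forEvent.values()) {
        this._target.removeEventListener(name, wrapped);
      }
      this._listeners.delete(name);
    }
    return this;
  }

  _add(eventName, handler, once) {
    const wrapped = (event) => {
      if (once) this.removeListener(eventName, handler);
      // Hand over the payload when there is one, otherwise the event itself.
      handler(event.data !== undefined ? event.data : event);
    };
    if (!this._listeners.has(eventName)) this._listeners.set(eventName, new Map());
    this._listeners.get(eventName).set(handler, wrapped);
    this._target.addEventListener(eventName, wrapped);
    return this;
  }
}
```

The bookkeeping map is needed because EventTarget can only remove a listener when given the exact function that was registered, and the wrapper registers its own closure rather than the caller's handler.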
KtorZ added a commit to CardanoSolutions/ogmios that referenced this issue Aug 13, 2021
10 participants