The page.setContent() function does not render html source. #2913

Closed
kbbmanse opened this issue Jul 19, 2018 · 11 comments
Labels
chromium Issues with Puppeteer-Chromium


@kbbmanse

kbbmanse commented Jul 19, 2018

  • Puppeteer version: 1.5.0
  • Platform / OS version: macOS Sierra 10.12.6
  • URLs (if applicable):
  • Node.js version: 8.9.0

I'm going to use Puppeteer to replace PhantomJS.

The workflow I used with PhantomJS was as follows.

  1. Download the HTML before rendering.
  2. Read the downloaded HTML file in PhantomJS and call page.setContent().
  3. Get the final rendered content using page.content.

On the command line:

curl 'https://www.pikicast.com' > src.html

load_test_phantomjs.js, used with PhantomJS:

var fs = require('fs'), webPage = require('webpage');

var page = webPage.create();
var htmlUrl = 'https://www.pikicast.com';
var htmlContent = fs.read('./src.html');
console.log("---------Before rendering---------");
console.log(htmlContent);
page.setContent(htmlContent, htmlUrl);

var timer = setTimeout(function() {
    console.log("---------After rendering---------");
    console.log(page.content);
}, 3000);

What is the expected result?
When I ran the 'load_test_phantomjs.js' code above with PhantomJS, I got the rendered result.
I want to get the same rendered HTML using the Puppeteer code below.

load_test_puppeteer.js, used with Puppeteer:

const fs = require('fs');
const puppeteer = require('puppeteer');

const htmlContent = fs.readFileSync('./src.html', {encoding:'utf-8'});
puppeteer.launch({headless:false, dumpio:true}).then(async browser=> {
    const page = await browser.newPage();
    console.log("---------Before rendering---------");
    console.log(htmlContent);
    await page.setContent(htmlContent);
    setTimeout(async () => {
        console.log("---------After rendering---------");
        console.log(await page.content());
    }, 3000);
});

What happens instead?
I got non-rendered HTML.

@vsemozhetbyt
Contributor

vsemozhetbyt commented Jul 19, 2018

I cannot reproduce with 1.5.0-post on Windows 7 x64 using this code:

'use strict';

const puppeteer = require('puppeteer');

const htmlContent = `
  <!doctype html>
  <html>
    <head><meta charset='UTF-8'><title>Test</title></head>
    <body>Test</body>
  </html>
`;

puppeteer.launch({ headless: false }).then(async (browser) => {
  const page = await browser.newPage();
  await page.setContent(htmlContent);
  setTimeout(async () => {
    console.log(await page.content());
  }, 3000);
});

Output and browser screenshot:


@vsemozhetbyt
Contributor

vsemozhetbyt commented Jul 19, 2018

BTW, when I open the file produced by curl 'https://www.pikicast.com' > src.html in the browser, I get a blank page with some errors in the browser console.

@kbbmanse
Author

kbbmanse commented Jul 20, 2018

This is the result from PhantomJS.
The output before and after calling the page.setContent() function is different.
The results were too long, so I captured only a fraction of them.
phantomjs_result

This is the result from Puppeteer.
The output before and after calling the page.setContent() function is the same.
puppeteer_result

@kbbmanse changed the title from "The page.setContent() function does not load html source." to "The page.setContent() function does not render html source." on Jul 20, 2018
@vsemozhetbyt
Contributor

The difference may come from the htmlUrl argument, which you pass to PhantomJS but not to Puppeteer: src.html contains some relative URLs, so PhantomJS can resolve them against htmlUrl, while Puppeteer has no base URL to resolve them against.
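
To illustrate the point above, here is a minimal sketch using Node's built-in URL class. The helper name resolveAgainst and the relative path are hypothetical, chosen only for illustration. It assumes the document created by page.setContent() has the URL about:blank, which matches the "about:blank:1" frames in the console errors later in this thread.

```javascript
'use strict';

// Hypothetical helper: resolve a relative URL against a base URL.
// Returns the absolute URL string, or null if resolution fails.
function resolveAgainst(relative, base) {
  try {
    return new URL(relative, base).href;
  } catch (err) {
    // new URL() throws a TypeError when the base cannot be used
    // to resolve a relative reference.
    return null;
  }
}

// Illustrative relative path, not copied from src.html.
const relativePath = 'js/lib/require.js';

// With the site's URL as the base (what PhantomJS gets via htmlUrl),
// the relative path resolves to an absolute URL on the site.
console.log(resolveAgainst(relativePath, 'https://www.pikicast.com/'));

// With about:blank as the base (assumed for page.setContent()),
// there is nothing to resolve against.
console.log(resolveAgainst(relativePath, 'about:blank'));
```

This only models URL resolution; it does not launch a browser.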

@vsemozhetbyt
Contributor

vsemozhetbyt commented Jul 20, 2018

@aslushnikov Is there a way to provide a base URL for page.setContent(html) to resolve relative URLs?

@aslushnikov
Contributor

@aslushnikov Is there a way to provide a base URL for page.setContent(html) to resolve relative URLs?

The only thing I can think of is injecting <base> tag into html.

@vsemozhetbyt
Contributor

@aslushnikov Unfortunately, this approach does not work here due to CSP (page.setBypassCSP(true) does not help: the page is still blank).

'use strict';

const fs = require('fs');
const puppeteer = require('puppeteer');

const baseURL = 'https://www.pikicast.com/';
const htmlContent = fs.readFileSync('./src.html', {encoding:'utf-8'})
                    .replace(/<head>/i, `<head><base href='${baseURL}'>`);

puppeteer.launch({headless:false, dumpio:true}).then(async browser=> {
    const page = await browser.newPage();
    await page.setBypassCSP(true);
    console.log("---------Before rendering---------");
    console.log(htmlContent);
    await page.setContent(htmlContent);
    setTimeout(async () => {
        console.log("---------After rendering---------");
        console.log(await page.content());
    }, 3000);
});

Errors in the browser console:
VM19:3 A parser-blocking, cross site (i.e. different eTLD+1) script, https://www.pikicast.com/js/lib/require.js, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.html @ VM19:3
VM19:3 A parser-blocking, cross site (i.e. different eTLD+1) script, https://www.pikicast.com/js/lib/require.js, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.html @ VM19:3
about:blank:1 Uncaught DOMException: Failed to set the 'domain' property on 'Document': 'pikicast.com' is not a suffix of ''.
    at about:blank:1:17
(anonymous) @ about:blank:1
app.v3.js?cb=v3.1524100716370:1 Uncaught DOMException: Failed to read the 'cookie' property from 'Document': Access is denied for this document.
    at t.i.get_storage (https://www.pikicast.com/js/app.v3.js?cb=v3.1524100716370:1:12282)
    at t.i.reinit (https://www.pikicast.com/js/app.v3.js?cb=v3.1524100716370:1:9638)
    at t.i.reinit (https://www.pikicast.com/js/app.v3.js?cb=v3.1524100716370:1:9609)
    at https://www.pikicast.com/js/app.v3.js?cb=v3.1524100716370:1:6865
    at Object.execCb (https://www.pikicast.com/js/lib/require.js:1:11575)
    at E.check (https://www.pikicast.com/js/lib/require.js:1:6025)
    at E.<anonymous> (https://www.pikicast.com/js/lib/require.js:1:8198)
    at https://www.pikicast.com/js/lib/require.js:1:642
    at https://www.pikicast.com/js/lib/require.js:1:8630
    at v (https://www.pikicast.com/js/lib/require.js:1:202)
i.get_storage @ app.v3.js?cb=v3.1524100716370:1
i.reinit @ app.v3.js?cb=v3.1524100716370:1
i.reinit @ app.v3.js?cb=v3.1524100716370:1
(anonymous) @ app.v3.js?cb=v3.1524100716370:1
execCb @ require.js:1
check @ require.js:1
(anonymous) @ require.js:1
(anonymous) @ require.js:1
(anonymous) @ require.js:1
v @ require.js:1
emit @ require.js:1
check @ require.js:1
(anonymous) @ require.js:1
(anonymous) @ require.js:1
(anonymous) @ require.js:1
v @ require.js:1
emit @ require.js:1
check @ require.js:1
enable @ require.js:1
init @ require.js:1
x @ require.js:1
completeLoad @ require.js:1
onScriptLoad @ require.js:1
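
The document.domain and cookie errors above suggest that the page's scripts expect to run on the site's own origin rather than on about:blank. One possible workaround, not proposed by anyone in this thread and untested against this site, is to navigate to the real URL and answer that one request with the local HTML through Puppeteer's request interception. In the sketch below, makeLocalHtmlHandler and renderAtURL are hypothetical names; page.setRequestInterception(), request.respond() and request.continue() are Puppeteer APIs.

```javascript
'use strict';

// Hypothetical helper: build a 'request' event handler that serves `html`
// for `targetURL` and lets every other request go to the network.
function makeLocalHtmlHandler(targetURL, html) {
  // Normalize so 'https://example.com' matches 'https://example.com/'.
  const normalized = new URL(targetURL).href;
  return (request) => {
    if (request.url() === normalized) {
      return request.respond({ status: 200, contentType: 'text/html', body: html });
    }
    return request.continue();
  };
}

// Hypothetical wrapper: render `html` as if it had been served from `targetURL`.
async function renderAtURL(html, targetURL) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when rendering
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    page.on('request', makeLocalHtmlHandler(targetURL, html));
    // Wait until the network is idle so that scripts get a chance to run.
    await page.goto(targetURL, { waitUntil: 'networkidle0' });
    return await page.content();
  } finally {
    await browser.close();
  }
}
```

A call might look like renderAtURL(fs.readFileSync('./src.html', 'utf-8'), 'https://www.pikicast.com'). Because the document then has the site's URL, relative URLs resolve without an injected <base> tag.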

@aslushnikov aslushnikov added the chromium Issues with Puppeteer-Chromium label Dec 6, 2018
@aslushnikov
Contributor

Don't know how we can help here - closing.

@semoal
Contributor

semoal commented Apr 29, 2019

@vsemozhetbyt did you fix it? I have the same problem.
The scripts are getting intercepted because the HTML is written via document.write.

@vsemozhetbyt
Contributor

@semoal No, I did not, sorry.

@semoal
Contributor

semoal commented Apr 29, 2019

@vsemozhetbyt I fixed it by passing args: ['--disable-web-security'] to puppeteer.launch().

Give it a try!!
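
For reference, a minimal sketch of how that flag could be passed, reusing the launch options from the original report. The helper names buildLaunchOptions and renderWithoutWebSecurity are hypothetical, and the thread does not say whether the <base> tag injection is still needed alongside the flag. Note that --disable-web-security turns off the browser's same-origin policy, so it is only reasonable for trusted content.

```javascript
'use strict';

// Hypothetical helper: launch options with web security disabled.
// headless and dumpio mirror the options used earlier in this thread.
function buildLaunchOptions(extraArgs = []) {
  return {
    headless: false,
    dumpio: true,
    args: ['--disable-web-security', ...extraArgs],
  };
}

// Hypothetical wrapper: set the content and return the DOM after a fixed wait.
async function renderWithoutWebSecurity(html) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when rendering
  const browser = await puppeteer.launch(buildLaunchOptions());
  try {
    const page = await browser.newPage();
    await page.setContent(html);
    // Same fixed 3-second wait as the original scripts.
    await new Promise((resolve) => setTimeout(resolve, 3000));
    return await page.content();
  } finally {
    await browser.close();
  }
}
```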
