request.continue no longer working as expected with latest Puppeteer #4030

Closed
leem32 opened this issue Feb 17, 2019 · 6 comments
Comments

@leem32

leem32 commented Feb 17, 2019

Puppeteer version: 1.12.2

I've just updated to the latest version of Puppeteer and noticed that part of my script has stopped working.

The code below intercepts each request and, if it is a document request, checks the URL for duplicate paths, e.g. 'example.com/login.php/login.php'. The code used to work fine with older versions of Puppeteer, but now the URL override passed to request.continue no longer seems to take effect: the request URL is not altered.

Has something changed in the request.continue syntax in a recent version of Puppeteer?

request.continue({
    url: newRequestUrl
});

// remove duplicate paths, e.g. example.com/foobar/foobar/ -> example.com/foobar/

await page.setRequestInterception(true);

page.on('request', request => {
    let hasForwardSlash = false;

    if (request.resourceType() === "document") {
        console.log("document request");
        console.log(request.url());

        let requestUrl = request.url();

        // strip trailing slashes so the last two path segments can be compared
        if (requestUrl.endsWith("/")) {
            hasForwardSlash = true; // re-add the slash later
            requestUrl = requestUrl.replace(/(\/)*$/, '');
        }

        let urlIntoArray = requestUrl.split("/");

        // the last segment repeats the one before it, e.g. /login.php/login.php
        if (urlIntoArray[urlIntoArray.length - 1] === urlIntoArray[urlIntoArray.length - 2]) {
            let paths = urlIntoArray[urlIntoArray.length - 1];
            let newRequestUrl = requestUrl.replace("/" + paths, "");

            if (hasForwardSlash) {
                newRequestUrl = newRequestUrl + "/";
            }

            console.log(newRequestUrl);

            request.continue({
                url: newRequestUrl
            });
            return; // prevent calling continue twice
        }
    }
    request.continue();
});
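For anyone reusing the duplicate-path check above, here is a minimal sketch of the same idea built on the standard `URL` parser instead of string splitting, so only the path is touched and the query string is left alone. The helper name `removeDuplicateTrailingSegment` is hypothetical and not part of Puppeteer or the original script.

```javascript
// Hypothetical helper (not from the original script): collapse a repeated
// final path segment, e.g. /login.php/login.php/ -> /login.php/
function removeDuplicateTrailingSegment(rawUrl) {
  const url = new URL(rawUrl);
  const hadTrailingSlash = url.pathname.endsWith('/');

  // Split the path into non-empty segments.
  const segments = url.pathname.split('/').filter(Boolean);
  const n = segments.length;

  // Only rewrite when the last segment repeats the one before it.
  if (n >= 2 && segments[n - 1] === segments[n - 2]) {
    segments.pop();
    url.pathname = '/' + segments.join('/') + (hadTrailingSlash ? '/' : '');
  }
  return url.toString();
}
```

Inside the request handler, the result could be compared with `request.url()` to decide whether a rewrite is needed at all.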

EDIT: I've just tried the above code in Puppeteer 1.3.0 and can confirm that it does change the request URL there. So why not in the latest Puppeteer?

EDIT 2: I've narrowed down when it stopped working to between Puppeteer 1.6.0 and 1.7.0. It works in Puppeteer 1.6.0 but not in 1.7.0.

EDIT 3: The problem starts in Puppeteer 1.7.0; in Puppeteer 1.6.2 it still works. Has something changed with the request interception syntax in 1.7.0? I've taken a look at the docs but couldn't find anything.

@leem32
Author

leem32 commented Mar 1, 2019

I still haven't found a fix for this. Why would the above code correctly intercept and alter the request URL in Puppeteer <= 1.6.2 but not from Puppeteer 1.7.0 onwards?

@aslushnikov
Contributor

@leem32 the URL is changed in a way that's not observable by the page. E.g. you can navigate to https://example.com and serve some other page under this origin instead. This is different from a redirect: redirects are visible to the page. We should clarify this in the docs.

What exactly is not working for you?
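To illustrate the silent forwarding described above, here is a rough sketch of a handler that passes a `url` override to `request.continue`. The request is fetched from the new URL, but the page still sees the original one. The names `forwardSilently` and `rewrite` are hypothetical; only `request.url()` and `request.continue()` come from Puppeteer.

```javascript
// Hypothetical handler: fetch the response from a different URL without
// the page noticing. `rewrite` is any function mapping a URL string to a URL string.
function forwardSilently(request, rewrite) {
  const original = request.url();
  const target = rewrite(original);

  if (target !== original) {
    // Not a redirect: the page keeps reporting the original URL.
    return request.continue({ url: target });
  }
  return request.continue();
}

// Possible wiring, assuming request interception is already enabled:
// page.on('request', (request) => forwardSilently(request, myRewrite));
```

With this approach, a `Page.frameNavigated` listener would be expected to keep reporting the original URL, which matches the behavior described for Puppeteer >= 1.7.0.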

@leem32
Author

leem32 commented Mar 2, 2019

I've added a script below that shows my problem.
When the intercepted request contains a duplicate path, e.g. login.php/login.php, the script removes the duplicate and calls request.continue({url: newRequestUrl}); to pass on the new URL. Next, the Page.frameNavigated handler picks up the frame URL and adds it to an array if it differs from the last array item.

This works perfectly fine in Puppeteer <= 1.6.2: the console.log output shows that frameUrl does not contain the duplicate path. However, if you run the same script in Puppeteer >= 1.7.0, the console.log output shows that frameUrl still contains the duplicate path. So it looks like request.continue is not passing on the rewritten URL.

const puppeteer = require('puppeteer');

// let url = "http://example.com";
let url = 'https://gazellegames.net/login.php/';

(async () => {
  const browser = await puppeteer.launch({args: ['--no-sandbox']});
  const page = await browser.newPage();

  // make sure urls don't have double forward slashes except after the protocol
  url = url.replace(/([^:])(\/\/+)/g, '$1/');

  // note: add trailing slash since chrome adds it
  if (!url.endsWith('/')) {
    url = url + '/';
  }

  // urls holds the redirect chain
  let urls = [url];

  const client = await page.target().createCDPSession();
  await client.send('Page.enable');

  await page.setRequestInterception(true);

  page.on('request', request => {
    let hasForwardSlash = false;

    // remove duplicate paths, e.g. example.com/foobar/foobar/ -> example.com/foobar/
    if (request.resourceType() === "document") {
      console.log("document request");
      console.log(request.url());

      let requestUrl = request.url();

      if (requestUrl.endsWith("/")) {
        hasForwardSlash = true; // re-add the slash later
        requestUrl = requestUrl.replace(/(\/)*$/, '');
      }

      let urlIntoArray = requestUrl.split("/");

      if (urlIntoArray[urlIntoArray.length - 1] === urlIntoArray[urlIntoArray.length - 2]) {
        let paths = urlIntoArray[urlIntoArray.length - 1];
        let newRequestUrl = requestUrl.replace("/" + paths, "");

        if (hasForwardSlash) {
          newRequestUrl = newRequestUrl + "/";
        }

        console.log("newRequestUrl: " + newRequestUrl);

        request.continue({
          url: newRequestUrl
        });

        return; // prevent calling continue twice
      }
    }

    request.continue();
  });

  // get client side navigation redirect urls
  // (client.on registers a listener synchronously, so no await is needed)
  client.on('Page.frameNavigated', (e) => {
    // only consider the top-level frame
    if (!e.frame.parentId) {
      let lastUrl = urls[urls.length - 1];
      let frameUrl = e.frame.url;

      // intercepted request has removed duplicate paths in puppeteer <= 1.6.2 but not in puppeteer >= 1.7.0
      console.log("frame url should now show duplicate paths are removed: ");
      console.log(frameUrl);

      // note: add trailing slash since chrome adds it
      if (!frameUrl.endsWith('/')) {
        frameUrl = e.frame.url + '/';
      }

      console.log("last url");
      console.log(lastUrl);

      if (!lastUrl.endsWith('/')) {
        lastUrl = urls[urls.length - 1] + '/';
      }

      if (frameUrl !== lastUrl && frameUrl !== "chrome-error://chromewebdata/") {
        urls.push(e.frame.url);
      }
    }
  });

  await page.goto(url);

  // wait for the browser to shut down before printing the result
  await browser.close();

  console.log("Redirects: " + urls);
})();
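The redirect-chain bookkeeping in the `Page.frameNavigated` handler above can be pulled out into a small pure function, which makes it easier to check without launching a browser. This is only a sketch; `withTrailingSlash` and `recordNavigation` are hypothetical names, not part of Puppeteer or the script above.

```javascript
// Hypothetical helpers extracted from the frameNavigated logic above.
const withTrailingSlash = (u) => (u.endsWith('/') ? u : u + '/');

// Append frameUrl to the chain unless it is Chrome's error page or the
// same URL as the last entry (ignoring a trailing slash).
function recordNavigation(urls, frameUrl) {
  if (frameUrl === 'chrome-error://chromewebdata/') {
    return urls;
  }
  const last = urls[urls.length - 1];
  if (last !== undefined && withTrailingSlash(last) === withTrailingSlash(frameUrl)) {
    return urls;
  }
  urls.push(frameUrl);
  return urls;
}
```

In the handler, this would be called as `recordNavigation(urls, e.frame.url)` for frames without a `parentId`.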

@aslushnikov
Contributor

@leem32 ah, the old behavior was a buggy one - when you continue to a new URL, the web page should not know there's been a "redirect".

What you probably want instead is a real redirect. Instead of doing request.continue, respond with a redirect:

request.respond({
  status: 302,
  headers: {
    location: newRequestUrl
  },
});

Does this help?
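As a rough sketch of how the suggestion above might slot into a request handler: document requests whose URL needs rewriting get a 302 response, and everything else continues untouched. The names `redirectVisibly` and `rewrite` are hypothetical; `request.respond`, `request.continue`, `request.resourceType` and `request.url` are the Puppeteer calls already used in this thread.

```javascript
// Hypothetical handler: answer document requests with a real redirect so the
// page (and Page.frameNavigated) sees the new URL.
function redirectVisibly(request, rewrite) {
  const original = request.url();
  const target = request.resourceType() === 'document' ? rewrite(original) : original;

  if (target !== original) {
    // Respond instead of continuing: the browser follows the Location header.
    return request.respond({
      status: 302,
      headers: { location: target },
    });
  }
  return request.continue();
}

// Possible wiring, assuming request interception is already enabled:
// page.on('request', (request) => redirectVisibly(request, myRewrite));
```

Because the browser issues a fresh request for the redirect target, the handler runs again for the new URL, so `rewrite` should return its input unchanged once the URL is clean to avoid a redirect loop.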

@leem32
Author

leem32 commented Mar 4, 2019

Yep, that's fixed the issue. Thanks :)

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Mar 5, 2019
Drive-by: add clarification to docs/api.md regarding changing "URL".

References puppeteer#4030
@beeblook

beeblook commented Feb 7, 2022

So, is there any way to rewrite the request URL using Puppeteer >= 1.7.0?
