Puppeteer/chrome does not follow/show valid 302 redirects when the final URI is not known #1453

Closed
eblanshey opened this issue Nov 22, 2017 · 6 comments

Comments

@eblanshey

eblanshey commented Nov 22, 2017

We are using Puppeteer to verify that advertiser links redirect to the Play Store as they are supposed to. Advertisers use a combination of 302 server redirects and front-end JavaScript redirects (served with 200 response codes), which makes Puppeteer a great choice for evaluating final redirect destinations. The issue is that Puppeteer does not appear to follow the 302 redirects.

Steps to reproduce

What steps will reproduce the problem?

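const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // determineDevice() and device are defined elsewhere in our script;
    // it returns the correct device from DeviceDescriptors.js
    await page.emulate(determineDevice(device));

    const finalResponse = await page.goto("http://appclk.me/store.php?page=1", {waitUntil: 'networkidle0'});

    // In current Puppeteer, status and url are methods (early releases exposed them as properties).
    process.stdout.write(JSON.stringify({
        success: true,
        statusCode: finalResponse.status(),
        url: finalResponse.url()
    }));

    await browser.close();
    process.exit(0);
})();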

What is the expected result?

The example URL, http://appclk.me/store.php?page=1, returns a 200 status code with a page that uses meta and JavaScript redirects pointing to http://appclk.me/store.php?page=2. That URL in turn responds with a 302 redirect to a market link: market://details?id=com.kabam.marvelbattle. What SHOULD happen is one of the following:

  • The final URL after the network is idle should be http://appclk.me/store.php?page=2 with a status code of 302. We could then use the response headers (its Location header) to verify that it redirects to the market link.
  • OR The final URL after network is idle should be market://details?id=com.kabam.marvelbattle (though Chrome doesn't know how to handle this URL.)
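Both acceptable outcomes above can be checked with a small pure helper that, given the responses recorded during a run, reports the final URL and, when the last hop is a 3xx, resolves its Location header. This is a sketch under assumptions: the record shape `{ url, status, headers }` and the name `finalDestination` are hypothetical, not part of Puppeteer, and the records would have to be collected from `response` events first.

```javascript
// Hypothetical helper: works on plain records, not on Puppeteer objects.
// Each record is assumed to look like { url, status, headers }, with header
// names lower-cased (as Puppeteer's response.headers() returns them).
function finalDestination(responses) {
  if (responses.length === 0) {
    return null;
  }
  const last = responses[responses.length - 1];
  const location = last.headers && last.headers.location;
  if (last.status >= 300 && last.status < 400 && location) {
    // Resolve relative Location values against the URL that issued the redirect.
    return { url: new URL(location, last.url).href, viaRedirect: true, status: last.status };
  }
  return { url: last.url, viaRedirect: false, status: last.status };
}

// Records mirroring the example in this issue (illustrative, not captured output).
const recorded = [
  { url: 'http://appclk.me/store.php?page=1', status: 200, headers: {} },
  {
    url: 'http://appclk.me/store.php?page=2',
    status: 302,
    headers: { location: 'market://details?id=com.kabam.marvelbattle' },
  },
];

console.log(finalDestination(recorded));
```

Resolving with `new URL(location, base)` keeps absolute targets such as the market:// link unchanged while still handling relative Location values.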

What happens instead?

The final URL that is printed is http://appclk.me/store.php?page=1 with a status code of 200.

Notes

We see the same behavior in Chrome itself: open a new tab with the DevTools Network panel open and visit the first link. The second redirect never appears in the Network panel, which is strange, since that panel should show all requests and responses. Other browsers, such as Firefox, do show the second request in their Network tab.

I'm aware that this is Chrome behavior rather than a Puppeteer problem, but I'm posting here in the hope that someone can point out a flag or option I'm not aware of that exposes ALL network requests, so that we can verify redirects properly. Any suggestions are welcome.

@eblanshey
Author

I found the solution, which is to use page.setRequestInterception():

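const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.emulate(determineDevice(device)); // same helper as in the first snippet
    // Once interception is on, every request must be continued, aborted or answered.
    await page.setRequestInterception(true);

    page.on('request', request => {
        console.log('GOT NEW REQUEST', request.url());
        request.continue();
    });

    page.on('response', response => {
        console.log('GOT NEW RESPONSE', response.status(), response.headers());
    });

    try {
        await page.goto('http://appclk.me/store.php?page=1', {waitUntil: 'networkidle0'});
    } finally {
        await browser.close();
    }
})();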

What's interesting here is that if I comment out await page.setRequestInterception(true); but keep the request/response event listeners, the second request/response is NOT logged to the console. With setRequestInterception enabled, THREE requests are logged: the second correctly shows a 302, and the third is the "market://" link. Why would enabling request interception change which events are fired? Is this a bug?
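The workaround above can be packaged as a reusable recorder that collects every request and response into an array, so the whole chain can be inspected after navigation instead of only being printed. A minimal sketch, assuming interception has already been enabled on the page and a Puppeteer version where `url()`, `status()` and `headers()` are methods; `recordNetwork` and the stand-in objects are hypothetical, used here only to show the shape of the log without launching a browser.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: attaches listeners to a Puppeteer Page (or any emitter
// with the same event shape) and collects a log of requests and responses.
// Assumes page.setRequestInterception(true) was already called, so every
// request must be continued explicitly or it will stall.
function recordNetwork(page) {
  const log = [];
  page.on('request', request => {
    log.push({ type: 'request', url: request.url() });
    request.continue();
  });
  page.on('response', response => {
    log.push({
      type: 'response',
      url: response.url(),
      status: response.status(),
      location: response.headers().location || null,
    });
  });
  return log;
}

// Stand-in objects instead of a real browser (illustrative only).
let continued = 0;
const makeRequest = url => ({ url: () => url, continue: () => { continued += 1; } });
const makeResponse = (url, status, headers) => ({
  url: () => url,
  status: () => status,
  headers: () => headers,
});

const fakePage = new EventEmitter();
const log = recordNetwork(fakePage);
fakePage.emit('request', makeRequest('http://appclk.me/store.php?page=1'));
fakePage.emit('response', makeResponse('http://appclk.me/store.php?page=1', 200, {}));
fakePage.emit('request', makeRequest('http://appclk.me/store.php?page=2'));
fakePage.emit('response', makeResponse('http://appclk.me/store.php?page=2', 302, {
  location: 'market://details?id=com.kabam.marvelbattle',
}));
console.log(log, continued);
```

With a real page, you would call `recordNetwork(page)` right after enabling interception and inspect `log` once `page.goto()` settles.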

@eblanshey
Author

I'll close this issue as the original issue was resolved. Feel free to create another if someone thinks the above bug I mentioned is legitimate.

@7starsone

Hello, could a redirect be the cause of the error "Cannot find context with specified id undefined"? How can I follow the redirect with https://github.com/nesk/puphpeteer? Sorry to ask here: the owner banned me from commenting there over these issues, although I was neither offensive nor rude; I had only posted the log and details. I then worked out on my own that it may be a redirect issue, with the first requested page/URL going away (maybe). Thanks

@zhaojiyu

@eblanshey Hey, I ran into the same problem. Did you solve it?

@eblanshey
Author

@zhaojiyu the solution I posted above is what worked for me.

@elainema

Sometimes the program gets the page's 'requestfailed' event rather than 'response', so I can't handle the 302 status. Do you have any solutions?
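One possible approach, assuming the failing request is the one that followed the 302 (for instance the market:// URL, which Chrome cannot load): Puppeteer's `request.redirectChain()` lists the requests that redirected to the current one, and each of those exposes `response()`, so the 302 status and Location header may still be readable from a 'requestfailed' handler. This is an untested sketch against this particular site; `describeFailure` and the stand-in objects are hypothetical, and the error text shown is illustrative only.

```javascript
// Hypothetical helper: summarises a failed request using Puppeteer request
// methods: url(), failure() and redirectChain(). Each entry in redirectChain()
// is an earlier request whose response() carries the redirect status/headers.
function describeFailure(request) {
  const failure = request.failure();
  const redirects = request.redirectChain().map(r => {
    const res = r.response();
    return {
      url: r.url(),
      status: res ? res.status() : null,
      location: res ? res.headers().location || null : null,
    };
  });
  return {
    url: request.url(),
    errorText: failure ? failure.errorText : null,
    redirects,
  };
}

// In a real script (not executed here):
//   page.on('requestfailed', request => console.log(describeFailure(request)));

// Stand-in request objects showing the shape of the result.
const redirecting = {
  url: () => 'http://appclk.me/store.php?page=2',
  response: () => ({
    status: () => 302,
    headers: () => ({ location: 'market://details?id=com.kabam.marvelbattle' }),
  }),
};
const failed = {
  url: () => 'market://details?id=com.kabam.marvelbattle',
  // Illustrative value; the real error text may differ.
  failure: () => ({ errorText: 'net::ERR_UNKNOWN_URL_SCHEME' }),
  redirectChain: () => [redirecting],
};

console.log(describeFailure(failed));
```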

4 participants