Can't get octet-stream response | net::ERR_ABORTED #2114

Closed
drizzle-mizzle opened this issue Mar 24, 2023 · 10 comments

Comments

@drizzle-mizzle

Description

When I call GoToAsync on a page that responds with an application/octet-stream content type:

  • In headless mode, it throws PuppeteerSharp.NavigationException: net::ERR_ABORTED.
  • In non-headless mode, it treats the response as a file to download and throws the same exception as in headless mode, but it still downloads the response to the Downloads directory as a binary file without an extension.

Minimal example reproducing the issue

using var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new() { Headless = true });

var page = await browser.NewPageAsync();
await page.SetRequestInterceptionAsync(true);

// The handler must be async because it awaits ContinueAsync.
page.Request += async (s, e) =>
{
    // CreateRequestPayload is my own helper: it sets the POST method,
    // adds some headers and binds the serialized data.
    var payload = CreateRequestPayload(HttpMethod.Post, data);

    await e.Request.ContinueAsync(payload);
};

var response = await page.GoToAsync(url); // throws NavigationException here
var content = await response.TextAsync();

Expected behavior:

In my particular case, I know that this application/octet-stream response is really just a text string without an extension, so I expect var content to contain that text.
For example,

fetch(_same_request_).then((response) => response.text())

works absolutely fine, but sadly I can't use it because of Cloudflare protection. It worked when I tested it in my normal browser, but it failed when run through EvaluateFunctionAsync.

Actual behavior:

PuppeteerSharp.NavigationException: net::ERR_ABORTED at _my_url_ at _my_url_
 ---> PuppeteerSharp.NavigationException: net::ERR_ABORTED at _my_url_
   at PuppeteerSharp.FrameManager.NavigateAsync(CDPSession client, String url, String referrer, String frameId) in C:\projects\puppeteer-sharp\lib\PuppeteerSharp\FrameManager.cs:line 197
   at PuppeteerSharp.FrameManager.NavigateFrameAsync(Frame frame, String url, NavigationOptions options) in C:\projects\puppeteer-sharp\lib\PuppeteerSharp\FrameManager.cs:line 79
   --- End of inner exception stack trace ---
   at PuppeteerSharp.FrameManager.NavigateFrameAsync(Frame frame, String url, NavigationOptions options) in C:\projects\puppeteer-sharp\lib\PuppeteerSharp\FrameManager.cs:line 89
Call finished

As I mentioned, in non-headless mode I can still get the data I need, although I have to catch this exception and then read the data as a file from my download directory. But I really need it to work in headless mode.

Versions

9.0.2 / net7.0

@drizzle-mizzle
Author

I'm not sure whether it's appropriate to name the exact service I'm trying to scrape past Cloudflare, or to describe how to reproduce the exact same request, but if you need that information, just tell me.

@kblok
Member

kblok commented Mar 27, 2023

net::ERR_ABORTED comes from the browser. I would try to implement the new headless mode, to see if that fixes it.

@drizzle-mizzle
Author

drizzle-mizzle commented Mar 27, 2023

What do you mean by new headless mode?
(Or was it not addressed to me?)

@kblok
Member

kblok commented Mar 27, 2023

@drizzle-mizzle this.

@drizzle-mizzle
Author

Ah, never knew about it. Thanks, I'll try it today.

@kblok
Member

kblok commented Mar 27, 2023

You can try that out in puppeteer (node.js). We need to make a few changes to support this in .NET.

@amaitland
Contributor

> In headless mode, it throws up PuppeteerSharp.NavigationException: net::ERR_ABORTED.

For application/octet-stream, net::ERR_ABORTED is exactly what I'd expect. Chromium aborts displaying the page and triggers a download instead.

> but successfully downloads response as a binary file without extension to Downloads directory.

You should be able to set the download path to achieve the same behaviour in headless mode. The DevTools protocol method Browser.setDownloadBehavior allows for specifying a folder for downloads.

Puppeteer itself doesn't yet support any of the download-related methods/events.

> I would try to implement the new headless mode, to see if that fixes it.

I'd be very surprised if new headless changed the behaviour.
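
For readers landing here, the workaround described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code the original poster ended up with: it sends the page-scoped Page.setDownloadBehavior command (the counterpart of the Browser.setDownloadBehavior method mentioned above) over the page's CDP session, treats net::ERR_ABORTED as the expected outcome of the navigation, and then reads the downloaded file as text. The URL, the temporary download directory, the polling loop and its 5-second limit are all illustrative assumptions, and the request interception from the original report is left out.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using PuppeteerSharp;

// Placeholder URL (assumption): substitute the endpoint that answers
// with application/octet-stream.
const string url = "https://example.invalid/data";

// Assumption: a throwaway directory so the downloaded file is easy to find.
var downloadDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(downloadDir);

using var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Raw DevTools protocol call: allow downloads and choose where they go.
await page.Client.SendAsync("Page.setDownloadBehavior", new
{
    behavior = "allow",
    downloadPath = downloadDir,
});

try
{
    await page.GoToAsync(url);
}
catch (NavigationException ex) when (ex.Message.Contains("net::ERR_ABORTED"))
{
    // Expected here: Chromium aborts the navigation and starts a download.
}

// Poll for a finished file; Chromium uses a .crdownload suffix while writing.
string content = null;
for (var attempt = 0; attempt < 50 && content == null; attempt++)
{
    var file = Directory.EnumerateFiles(downloadDir)
        .FirstOrDefault(f => !f.EndsWith(".crdownload", StringComparison.OrdinalIgnoreCase));

    if (file != null)
    {
        content = await File.ReadAllTextAsync(file);
    }
    else
    {
        await Task.Delay(100);
    }
}

if (content == null)
{
    throw new TimeoutException("No download appeared within 5 seconds.");
}

Console.WriteLine(content);
```

The exception filter keeps other navigation failures visible instead of swallowing every NavigationException. Polling the directory is a stand-in for proper download events, which, as noted above, were not exposed at the time.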

@drizzle-mizzle
Author

drizzle-mizzle commented Mar 31, 2023

> You should be able to set the download path to achieve the same behaviour.

In headless mode it just won't start the download at all, so it doesn't really matter.

@drizzle-mizzle
Author

Sorry, I was wrong. I had simply configured setDownloadBehavior incorrectly. Now it works.

@drizzle-mizzle
Author

Well, my problem is solved at this point. Thanks for the tip.

Still, I'm not sure whether this issue should be closed. You say the behaviour comes from Chromium in the first place, not from Puppeteer, but maybe it should at least be handled some other way, with a more descriptive exception type?
