How to check if range-requests are in use? #8422

Closed
rogierlommers opened this issue May 18, 2017 · 19 comments

@rogierlommers

We are having a hard time serving large PDFs to our customers with pdf.js. Some investigation taught us that "range requests" could fix this, so we tried to generate a FastWebView-enabled (linearized) PDF with Ghostscript:

gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dFastWebView=true
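Before testing the viewer, it can help to confirm that the generated file really is linearized. The following is a minimal sketch, not part of PDF.js: it relies on the PDF specification's rule that the linearization parameter dictionary (the one carrying the `/Linearized` key) sits within the first 1024 bytes of the file. The function name `looksLinearized` is made up for this illustration.

```javascript
// Hypothetical helper: returns true if the first 1024 bytes of a PDF contain
// the /Linearized key of the linearization parameter dictionary.
function looksLinearized(bytes) {
  const head = bytes.subarray(0, 1024);
  // Map each byte to one character so binary content cannot break decoding.
  const text = Array.from(head, (b) => String.fromCharCode(b)).join("");
  return text.includes("/Linearized");
}
```

In Node.js the input could come from `fs.readFileSync(path)`, since a Buffer is a Uint8Array. This is a plain text search, so it only shows that the marker is present, not that the rest of the file is laid out correctly.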

The generated PDF is being served by Apache/2.4.18, which (correct me if I'm wrong) supports range requests.

Now how can I test that pdf.js actually uses range requests?

@yurydelendik
Contributor

> Now how can I test that pdf.js actually uses range requests?

There is no diagnostic information coming from the PDF.js core yet. However, the network monitor in the browser's developer tools should show 206 responses. If you don't see 206s for files larger than 128k, then there is a problem with the server: inspect the request and response HTTP headers of the initial XHR.

Please note that some WebKit-based browsers (e.g. Safari) still have a defect with caching such requests, so we disable range requests for them.

Closing as answered. Please provide more concrete information or an example if you need a better explanation. See also https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#range
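When inspecting the headers of the initial response, a small checker can make the diagnosis less manual. This is a sketch under assumptions, not PDF.js code: the function name `rangeRequestProblems` is invented here, the 64 KiB chunk size is assumed from the 128k threshold mentioned above (twice the chunk size), and the exact conditions PDF.js applies may differ between versions.

```javascript
// Assumption: 64 KiB chunks, so the threshold is 2 * 65536 bytes ("128k").
const ASSUMED_RANGE_CHUNK_SIZE = 65536;

// Hypothetical helper: lists reasons why range requests would probably not be
// used for a response with the given headers. An empty list means "looks fine".
function rangeRequestProblems(headers, rangeChunkSize = ASSUMED_RANGE_CHUNK_SIZE) {
  // Header names are case-insensitive, so normalise them before lookup.
  const h = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = String(value).trim();
  }

  const problems = [];
  if (h["accept-ranges"] !== "bytes") {
    problems.push("Accept-Ranges: bytes is missing");
  }
  // A compressed body makes byte offsets meaningless for the client.
  const encoding = h["content-encoding"] || "identity";
  if (encoding !== "identity") {
    problems.push(`Content-Encoding is ${encoding}, expected identity`);
  }
  const length = Number.parseInt(h["content-length"], 10);
  if (!Number.isInteger(length)) {
    problems.push("Content-Length is missing or not a number");
  } else if (length <= 2 * rangeChunkSize) {
    problems.push(`file is only ${length} bytes, too small for range requests`);
  }
  return problems;
}
```

In a browser, the headers of a `fetch` response could be passed in as `Object.fromEntries(response.headers)`.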

@yurydelendik
Contributor

Attaching screenshot for expected range request activity:

[Image: screen shot 2017-05-18 at 8 13 23 am]

@rogierlommers
Author

I see 206 responses, but it looks like pdf.js still fetches the whole PDF. I'm using Chrome on macOS. This is a supported combination, right?

@yurydelendik
Contributor

> I see 206 responses, but it looks like pdf.js still fetches the whole PDF. I'm using Chrome on macOS. This is a supported combination, right?

Correct.

@rogierlommers PDF.js attempts to load the entire PDF with the first XHR and aborts that fetch once the headers arrive. On a local connection you might not see this, since the transfer is really fast; try it against a remote server. Also pay attention to caching: it's okay for content to be cached, but it means you might be receiving the entire PDF from the first XHR.

@yurydelendik
Contributor

(Assuming you guys are working on the same problem) See also #8425

@yurydelendik
Contributor

Note that the first 200 response is only 4.0kb in length:

[Image: screen shot 2017-05-19 at 8 34 41 am]

@rogierlommers
Author

rogierlommers commented May 22, 2017

Sorry for all my questions, but please have a look at the attached screenshot. As you can see, I get 206s, indicating that range requests are working fine, right? But for some reason, Chrome is downloading the full PDF, while I expect it to load only the first x bytes.

[Image: screen shot 2017-05-22 at 07 44 15]

@yurydelendik
Contributor

yurydelendik commented May 22, 2017

> But for some reason, Chrome is downloading the full PDF, while I expect it to load only the first x bytes.

I don't understand which file this is about and what is expected. Looking at the 200 response for 9789027673633.pdf, it downloaded 12.7kb, and the next 206 response asked for 64.3kb. Unless your file is only 12.7kb, your next 206 requests/responses look fishy.

@rogierlommers
Author

Then we have a different understanding of this feature. My assumption was that:

  • if a PDF is web-optimized
  • pdf.js only downloads the first x bytes
  • until the user selects other pages of the document
  • then the bytes corresponding to the other pages will be downloaded

Now my conclusion is that:

  • pdf.js starts downloading a web-optimized PDF
  • if page 1 is successfully downloaded, it starts rendering this page client-side
  • and continues downloading the remaining bytes of the document (regardless of whether the user has selected/requested these pages)

@yurydelendik
Contributor

PDF.js has two other options, disableAutoFetch and disableStream. The former stops further range requests once enough data has been fetched; the latter disables streaming of the initial response in browsers capable of progressive download. See also #7937 and https://github.com/mozilla/pdf.js/wiki/Debugging-PDF.js#url-parameters
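As a rough sketch of how these two options might be applied: in recent PDF.js releases they are parameters of `getDocument`, while older releases exposed them as global `PDFJS` settings, so check the version in use. The helper names `lazyLoadingOptions` and `lazyViewerUrl` and the example paths are hypothetical. The hash parameters follow the Debugging wiki page linked above; newer viewers may only honour them when debugging is enabled.

```javascript
// Hypothetical helper: options that ask PDF.js to fetch only what it needs.
function lazyLoadingOptions(url) {
  return {
    url,
    disableAutoFetch: true, // do not keep fetching the rest in the background
    disableStream: true,    // do not stream the initial full response
  };
}

// Hypothetical helper: the same settings expressed as viewer hash parameters.
function lazyViewerUrl(viewerUrl, fileUrl) {
  const file = encodeURIComponent(fileUrl);
  return `${viewerUrl}?file=${file}#disableAutoFetch=true&disableStream=true`;
}

// Usage with the library (assumes pdfjsLib has already been loaded):
//   const task = pdfjsLib.getDocument(lazyLoadingOptions("/docs/large.pdf"));
//   const pdf = await task.promise;
//   const firstPage = await pdf.getPage(1);
```

With both options set, the network monitor should show the aborted initial request followed only by the 206 requests needed for the pages actually viewed.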

@rogierlommers
Author

Thanks; it all works fine now.

@dlandis

dlandis commented Oct 18, 2017

Hi @yurydelendik,

I'm having a similar issue and I had a question about your comment:

> Note that the first 200 response is only 4.0kb in length:

So what should the response body be for the initial 200 response, before pdf.js has made a range request? Can the body be empty, for example, as long as the response carries an Accept-Ranges: bytes header? Will that trigger pdf.js to make a range request?

Thanks

@yurydelendik
Contributor

All HTTP responses need to be valid, so the first response must be piped in full until it's cancelled.

@dlandis

dlandis commented Oct 18, 2017

@yurydelendik Thanks for your response.

I'm wondering if it was ever discussed to just perform range requests from the outset (maybe configurable via a param)?

I noticed the RFC says:

> A client MAY generate range requests without having received this header field for the resource involved.

This would potentially help the server (depending on how it was implemented) so it wouldn't have to load the whole document for that initial request.

And then the client, in those cases, wouldn't need to cancel that initial response and then switch to range requests. Wouldn't that be simpler?
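To illustrate the idea of starting with a range request: a client could ask for a single byte and read the total size from the `Content-Range` header of the 206 response. This is only a sketch of that approach, not something PDF.js does by default; `parseContentRange` and `probeLength` are hypothetical names, and `fetchImpl` is injectable so the logic can be tried without a server.

```javascript
// Parses a header value such as "bytes 0-0/12345".
// Returns null if the value does not have that shape.
function parseContentRange(value) {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(String(value).trim());
  if (!m) {
    return null;
  }
  return {
    first: Number(m[1]),
    last: Number(m[2]),
    // "*" means the server does not know (or does not reveal) the total size.
    total: m[3] === "*" ? null : Number(m[3]),
  };
}

// Requests only the first byte and resolves to the total length, or to null
// if the server ignored the Range header or did not report a total.
async function probeLength(url, fetchImpl = fetch) {
  const res = await fetchImpl(url, { headers: { Range: "bytes=0-0" } });
  if (res.status !== 206) {
    return null;
  }
  const parsed = parseContentRange(res.headers.get("Content-Range"));
  return parsed ? parsed.total : null;
}
```

For cross-origin requests the server would also need to expose `Content-Range` to scripts, otherwise `headers.get` returns null and the probe reports no length.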

Thanks

@yurydelendik
Contributor

@dlandis sorry, I don't follow your thoughts. There is an option to override the default behavior: you can implement PDFDataRangeTransport with only HTTP range requests. It's not possible in the general case IMHO.

@dlandis

dlandis commented Oct 18, 2017

> you can implement PDFDataRangeTransport with only HTTP range requests

@yurydelendik Thanks again, it sounds like that is what I need. I don't suppose there is an example?
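For readers looking for a starting point, here is a minimal sketch of such a transport. It is an illustration rather than an official example: `createHttpRangeTransport` is a made-up name, `pdfjsLib` is passed in as an argument so nothing is assumed about how the library is loaded, and the total file length must be known up front. It relies on `PDFDataRangeTransport` calling `requestDataRange(begin, end)` with an exclusive `end`, and on `onDataRange(begin, chunk)` to hand data back; verify both against the PDF.js version in use.

```javascript
// Hypothetical factory: builds a transport that serves every PDF.js data
// request with an HTTP range request, without an initial full-file request.
function createHttpRangeTransport(pdfjsLib, url, length, fetchImpl = fetch) {
  class HttpRangeTransport extends pdfjsLib.PDFDataRangeTransport {
    requestDataRange(begin, end) {
      // PDF.js passes an exclusive end offset; HTTP byte ranges are inclusive.
      const range = `bytes=${begin}-${end - 1}`;
      fetchImpl(url, { headers: { Range: range } })
        .then((res) => {
          if (res.status !== 206) {
            throw new Error(`expected 206 for ${range}, got ${res.status}`);
          }
          return res.arrayBuffer();
        })
        .then((buffer) => {
          // Hand the received bytes back to PDF.js.
          this.onDataRange(begin, new Uint8Array(buffer));
        })
        .catch((err) => console.error("range request failed:", err));
    }
  }
  // No initial data: everything is fetched on demand.
  return new HttpRangeTransport(length, null);
}

// Usage (assumes pdfjsLib is loaded and the length is already known):
//   const transport = createHttpRangeTransport(pdfjsLib, "/docs/large.pdf", length);
//   const pdf = await pdfjsLib.getDocument({ range: transport }).promise;
```

This sketch has no retry or abort handling; a production transport would also want to surface failed requests to the viewer instead of only logging them.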

@yurydelendik
Contributor

@richiepriya

@yurydelendik I have an issue with downloading a PDF using range requests. I am using a Spring Boot application to provide the download service. viewer.html makes a first request, which is cancelled since the service supports range requests, and then initiates a partial request, which is as expected. But there are no further requests from the browser: there is just that one, whereas I am expecting it to keep requesting until the whole PDF is downloaded. Is there any special header that needs to be added to the response so that the browser sends all requests to the service?

@vedidinakar

We have a PDF of 200 MB in size. I want to display the first page as soon as it has been downloaded to the browser. Can you please help me find a way to achieve this?
