New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pdfjs for QtWebEngine #2330

Closed
The-Compiler opened this Issue Feb 20, 2017 · 38 comments

Comments

Projects
None yet
7 participants
@The-Compiler
Collaborator

The-Compiler commented Feb 20, 2017

No description provided.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 4, 2017

@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented May 8, 2017

I think the problem is not getting pdf.js into WebEngine, as we can use the qute:// scheme and setHtml, just as we do for WebKit.

The problem is that WebEngine doesn't have an unsupportedContent signal. As a workaround, we could use the downloadRequested signal (which we handle anyway) and do some mimetype-based action there, but we'd probably need some way to detect if the user clicked a PDF link or if he clicked the "Download" button inside pdf.js, as both downloads seem to have type QWebEngineDownloadItem::DownloadAttribute. One way would be to detect blob: URLs, but I'm not sure if that's a good idea.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 8, 2017

Hmm, that's indeed problematic... Note that downloadRequested is global, and we have no idea in which tab the request originated.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 8, 2017

Looking at what information we have available, here's the possibilities I can see:

  • Catch .pdf URLs in acceptNavigationRequest and open pdfjs for them - this is per-tab, but we can only catch PDFs which actually have that in the URL
  • Use the global downloadRequested, check for .pdf in the filename there (which should also work for PDFs with e.g. a Content-Disposition header), and just open pdfjs in whatever tab is currently focused
  • Similar to above, but let the user decide to use pdfjs in the download dialog (with e.g. ctrl-p, similar to ctrl-o). That means we won't have to care about the pdfjs download button, and we also can get rid of the pdfjs-enabled option (we'd do the same with QtWebKit then). Still we can't know which tab the request is coming from.
@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented May 8, 2017

If we want to mimic the current behaviour, we could keep a global mapping from navigation request to corresponding tab, so that we can access the right tab in downloadRequested - but that's just crying out to cause bugs and would probably fail in quite some edge cases (e.g. same URL opened twice, requests with redirect, ...).

I think the 3rd idea sounds interesting, though I'm not sure how obtrusive it would be in practice, especially if you open lots of documents or open multiple documents at once in background tabs.

For the 2nd and 3rd cases I'd do mimetype sniffing (QWebEngineDownloadItem.mimeType()), as that seems more robust and is what we do with WebKit.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 8, 2017

Oh, I didn't know about QWebEngineDownloadItem::mimeType()!

When you open multiple documents at once, things are going to suck either way, though, if we always open things in the currently focused tab...

As for the URL mapping: I'll probably need something like that for uMatrix (#28) anyways, though there it doesn't matter too much when we get two possible tabs. While there are going to be some corner cases, I'd wonder how well that'd work in practice...

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 8, 2017

I also opened QTBUG-60668 - I think there would be a better way for Qt to get this information from Chromium.

@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented May 9, 2017

I've thrown together a small prototype that enables pdf.js with webengine. It's just a PoC, but it seems to work surprisingly well and might be worth fleshing out, at least until we get proper unsupportedContent support (if ever).

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 9, 2017

Doesn't look too bad conceptually! Note that we'd need to properly clean up that mapping when closing a tab though (or just save tab IDs instead of tab objects or so).

@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented May 10, 2017

Right, I've changed it so that the last requested URL is an attribute on the WebEnginePage itself, and to find the page I'm iterating over all pages in the objreg. That way, we don't have to add more bookkeeping. Though I'm not completely satisfied with the tab._widget access and the "typechecks".

There are issues with frames (PDFs in a frame are still downloaded) and opening the same page multiple times, but the first problem is probably not really an issue, and the second one could be fixed by removing a tab which has pdf.js loaded from the list of candidates.

Also, pdf.js seems to trigger a lot of warnings about undefined elements.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 17, 2017

FWIW I removed pdfjs from built packages for now in c5957bc - we should revert that once pdfjs works with QtWebEngine.

As for the request interceptor issue, I plan to think about it some more and open a separate issue for it in a bit.

@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented May 17, 2017

Alright. I've been playing around with it a bit, but it seems that QtWebEngine is a bit more strict with respect to security/cross-origin stuff, so getting pdf.js to load its own resources and the PDF document is a bit more tricky and maybe involves solving #1557 as well - or tuning the security parameters for qute://, if that's possible.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 17, 2017

FWIW the more I think about it, the more I think we should just bite the bullet, do a normal download (to disk with QtWebEngine) and then open it from there - i.e. a similar concept to #1635.

I'd rather avoid juggling with a JS bridge (or QtWebChannel) if possible though. Can we just save them to disk in a temporary folder somewhere, and pass a file:// URL (which I think should be accessible from qute://) to pdfjs?

@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented May 18, 2017

Unfortunately, no:

15:36:11 ERROR: [:0] XMLHttpRequest cannot load file:///tmp/qutebrowser-downloads-i7ux_dw1/tmpcc
ksjmjwpdf. Cross origin requests are only supported for protocol schemes: http, data, chrome, ht
tps.

It seems like we need to provide the actual file via qute:// URLs too.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented May 18, 2017

Should be easy to have a qute://file or something to get a file then...

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jun 1, 2017

The comments for QTBUG-53414 also has some talk about how to load pdfjs in a way it has the needed permissions. Haven't looked at them in detail yet, though.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Sep 12, 2017

Random idea regarding finding the correct tab: What about opening a new tab? It might be kind of annoying, sure, but probably still better than a rather odd hack.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Nov 10, 2017

Related: This Qt patch which hopefully will make it to Qt 5.11 (somewhen mid-2018) should make it possible to get the tab a download originated from.

@ddevault

This comment has been minimized.

ddevault commented Jun 11, 2018

Did that patch ever land in Qt?

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jun 11, 2018

Unfortunately not, it wasn't even properly submitted by the author so far. It should still be possible to have pdfjs support without that though, just with PDFs opening in a new tab.

FWIW, #3010 (comment) also had some details related to the patch.

@tgy

This comment has been minimized.

tgy commented Jul 17, 2018

Is it currently possible to view PDFs from within qutebrowser?

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jul 17, 2018

@tgy Only when using the QtWebKit backend (which has other drawbacks, see #4039) - note that you can use ctrl-x (:prompt-open-download) in the download prompt to open them in your PDF reader, though.

@tgy

This comment has been minimized.

tgy commented Jul 17, 2018

@The-Compiler Thanks for your quick answer. I like the single-click-to-open-pdf-in-new-tab workflow. Does brew cask install qutebrowser build qutebrowser with the QtWebKit backend? That's how I installed it on mac OS.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jul 17, 2018

No, that installs the .dmg release, which only comes with QtWebEngine.

@tgy

This comment has been minimized.

tgy commented Jul 17, 2018

Alright. I'll survive without this functionality for now! Thanks for your responsiveness. You're a great maintainer.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Sep 6, 2018

I've had a look at this (see the experiments repo), but ran into a strange issue - the PDF is displayed, but then the renderer process crashes because PDF.js creates an invalid blob: URL from JS or something...

[13767:13773:0906/181417.829985:VERBOSE1:blob_dispatcher_host.cc(70)] BlobDispatcherHost::OnRegisterPublicBlobURL(blob:qute:///ce823582-a3f7-4b90-95f9-b6061774ed5e, 1fc889a4-bdce-4c17-bda2-235db22a2c04): Invalid or prohibited URL.
[13767:13773:0906/181417.829993:ERROR:bad_message.cc(25)] Terminating renderer for bad IPC message, reason 139

139 is BDH_DISALLOWED_ORIGIN, see the Chromium code generating the message

No idea what's going on there... 😟

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Sep 6, 2018

Debugging this from the JS side, the crash happens around here - after that line, the promise gets returned, the "then" on line 32 is highlighted, and one step later the loading continues somehow and crashes.

Looks like that was a red herring - it crashes around here when trying to create an object URL for a blob.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Sep 6, 2018

Minimal JS reproducer:

let blob = new Blob(['foo']);
URL.createObjectURL(blob);

Looks like PDF.js supports setting PDFJS.disableCreateObjectURL = true; to force it to use data: URLs instead though, which seems like a suitable workaround! 🎉

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Sep 6, 2018

Pushed my work to the pdfjs branch in this repo now, as it finally looks like I have something usable.

@Kingdread can I talk you into a review once it's finished, or rather not? 😉

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Sep 10, 2018

I now merged the pdfjs branch, so this is ready now! 🎉

@prosoitos

This comment has been minimized.

prosoitos commented Sep 10, 2018

Thank you SO MUCH @The-Compiler! This is really cool!!! 😄

@nemrod

This comment has been minimized.

nemrod commented Oct 2, 2018

I've been waiting for this, so excited it's here. :D

Is it available in production yet though? I expected it to be in 1.4.2, being the next release after it was merged, but it wasn't in the changelog and I can't seem to set content.pdfjs to true even after upgrading (same message as before, not supported on QtWebEngine). On Arch, with pdfjs installed (shows up on version page). :/

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Oct 2, 2018

v1.4.2 is a patch release based on the v1.4.x branch (i.e. on v1.4.0), and as such it only contains bugfixes.

It'll be in v1.5.0, which will be released once PyQt5 5.11.3 is, which hopefully means tomorrow.

@prosoitos

This comment has been minimized.

prosoitos commented Oct 4, 2018

This is so great!!!! Thank you so much @The-Compiler and everybody else who worked on this!!!! Loving it!

@user202729

This comment has been minimized.

Contributor

user202729 commented Oct 6, 2018

(Possible issue: Using temporary URL for pdfjs means it doesn't work when the browser is closed&reopened. For example with ZZ. Ideally the original URL should be displayed)

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Oct 6, 2018

True, I opened #4303 for that.

@tgy

This comment has been minimized.

tgy commented Oct 16, 2018

I upgraded to qutebrowser v1.5.1 and opening a PDF link still tries to download the file and does not open PDF.js. Is there something I need to do more to have PDF.js be launched when I open a PDF link?

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Oct 16, 2018

@tgy Either :set content.pdfjs true, or hit ctrl-p ("Open download via PDF.js", :prompt-open-download --pdfs) when the download prompt appears.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment