Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache remotely hosted images #409

Closed
wants to merge 7 commits into from

Conversation

dopeboy
Copy link

@dopeboy dopeboy commented Nov 17, 2018

Spurred from #396, this PR caches referenced images that are remotely hosted. As of now, every keystroke initiates a remotely hosted image to be redownloaded. This can tax low data cap connections.

This first draft scans the text for HTML tags that rely on remotely hosted images. Each time such an image is encountered, it gets saved to /tmp/ and added to a dictionary for tracking. When the tab or application is closed, these images are deleted. @mitya57, I'd love to get your feedback on this first take.

Next up is:

  • Supporting markdown images
  • Handling when there is no network connection or poor network connection (will crash right now)
  • Writing some tests

ReText/tab.py Outdated Show resolved Hide resolved
ReText/tab.py Outdated Show resolved Hide resolved
ReText/tab.py Outdated Show resolved Hide resolved
ReText/tab.py Outdated Show resolved Hide resolved
ReText/tab.py Outdated Show resolved Hide resolved
@dopeboy
Copy link
Author

dopeboy commented Nov 23, 2018

Thanks for your feedback @mitya57 - made some changes that should resolve things. Will proceed on the two latter bulletpoints if there aren't any other flags.

@mitya57
Copy link
Member

mitya57 commented Nov 24, 2018

Looks better now, I marked all my other comments as resolved.

@dopeboy
Copy link
Author

dopeboy commented Dec 7, 2018

@mitya57 now includes handling no network and tests. I see the Travis build failing off the xvfb-run command but that does work locally for me; would you mind taking a look at that?

Otherwise, should be all good to go by my count.

tests/test_window.py Outdated Show resolved Hide resolved
@mitya57
Copy link
Member

mitya57 commented Dec 17, 2018

There are some remaining problems with tests. I don't have much time, but I will look myself and merge this later. Sorry for the delay!

@mitya57
Copy link
Member

mitya57 commented Dec 31, 2018

Sorry again for taking so much time for reply. Today I found out that Qt WebEngine already caches all images on the page. You can check it yourself by adding a request interceptor and printing all HTTP requests for images:

diff --git a/ReText/webenginepreview.py b/ReText/webenginepreview.py
index 849a956..15ccff4 100644
--- a/ReText/webenginepreview.py
+++ b/ReText/webenginepreview.py
@@ -20,12 +20,21 @@ from ReText import globalSettings
 from ReText.preview import ReTextWebPreview
 from ReText.syncscroll import SyncScroll
 from PyQt5.QtGui import QDesktopServices, QGuiApplication
+from PyQt5.QtWebEngineCore import QWebEngineUrlRequestInterceptor, QWebEngineUrlRequestInfo
 from PyQt5.QtWebEngineWidgets import QWebEnginePage, QWebEngineView, QWebEngineSettings
 
 
+class ReTextRequestInterceptor(QWebEngineUrlRequestInterceptor):
+    def interceptRequest(self, info):
+        if info.resourceType() == QWebEngineUrlRequestInfo.ResourceTypeImage:
+            print(info.requestMethod(), info.requestUrl())
+
+
 class ReTextWebEnginePage(QWebEnginePage):
     def __init__(self, parent, tab):
         QWebEnginePage.__init__(self, parent)
+        self.interceptor = ReTextRequestInterceptor()
+        self.profile().setRequestInterceptor(self.interceptor)
         self.tab = tab
 
     def setScrollPosition(self, pos):

The request will be printed only the first time you load the page with an image. On further loads the images will be loaded from cache, even if you edit the text.

I believe the Qt WebKit backend behaves the same way.

So maybe there is no sense in implementing our own cache, and we can just rely on the existing caches? If you want then I can implement a mechanism to clear the cache when the preview is closed and opened again.

@dopeboy
Copy link
Author

dopeboy commented Jan 8, 2019

Sorry for the late reply here as well, coming back from holiday.

I haven't tried your code yet; my original observation came from monitoring my network usage. I've attached a video showing more - does that not happen on your end?

https://drive.google.com/file/d/16XNQCUF56WSEDwW03RMtvYUoeg8IvV4c/view?usp=sharing

@mitya57
Copy link
Member

mitya57 commented Jan 9, 2019

Do you use Qt WebKit or Qt WebEngine renderer? (Check what option is present/selected in the Edit menu.)

@dopeboy
Copy link
Author

dopeboy commented Jan 14, 2019

You are right, WebEngine does cache images. WebKit does not. The reason why I didn't use WebEngine to start with is because it doesn't preview my images:

image

Given that the WebEngine is experimental and not the default renderer, I think this PR still has some merit. Happy to adjust the PR code to only function under WebKit if you like. Will defer to your guidance here.

EDIT 01/21: My mistake, WebEngine is the default renderer.

mitya57 added a commit that referenced this pull request Jan 22, 2019
@mitya57
Copy link
Member

mitya57 commented Jan 22, 2019

You are right, WebEngine does cache images. WebKit does not. The reason why I didn't use WebEngine to start with is because it doesn't preview my images

You have some misunderstanding here.

ReText has three preview backends, based on QTextBrowser (the default one), Qt WebKit and Qt WebEngine (Chromium).

Qt WebKit is deprecated upstream. If you are using an official Qt build (e.g. from the PyQt5 wheel), it will not have the Qt WebKit module. However some Linux distributions, such as Debian/Ubuntu, still ship it. If Qt WebKit is installed, ReText will have an option in the Edit menu to use it. If it is not installed, but Qt WebEngine is, ReText will have an option for WebEngine.

QTextBrowser does not display external images at all, it renders only local images (in fact I recommend to save images locally anyway). So cache is not applicable to this backend.

Qt WebEngine uses cache by default, as I illustrated in my example above.

Finally, Qt WebKit supports cache, but apparently does not use it by default. I have just implemented basic cache support and pushed it to webkitpreview-cache branch, see the linked commit. I this this solution is cleaner than your proposed mechanism: it has less lines of code, and it properly supports HTTP Cache-Control / Expires headers. Can you please test it?

I have not yet implemented clearing the cache when page is closed, but I can try to implement it too.

@dopeboy
Copy link
Author

dopeboy commented Jan 30, 2019

Thanks for taking the time to explain that. I've just tested the new branch it is great. Agreed, this is a far more elegant solution.

Closing this one down.

@dopeboy dopeboy closed this Jan 30, 2019
@mitya57
Copy link
Member

mitya57 commented Jan 30, 2019

Thanks for confirming! I will merge my branch a bit later (still I'm very limited in time).

mitya57 added a commit that referenced this pull request Feb 26, 2019
@mitya57 mitya57 mentioned this pull request Nov 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants