
Prefetch/cache images in items #60

Closed
gwern opened this issue Jul 17, 2014 · 14 comments

gwern (Contributor) commented Jul 17, 2014

I have an unfortunately slow Internet connection; when I read through my items in Liferea, there is one thing that makes the experience noticeably less pleasant than Google Reader was: whenever a full-text post appears, with images in it, I must wait for the image to be fetched, which is noticeably slow. Since a good chunk of my statistics & economics feeds include diagrams, graphs, photos etc, that means it's a problem for a lot of them. And then I have a bunch of wallpaper RSS feeds, where the content is just a thumbnail - so on each and every single item, I am left waiting patiently for the darn thing to load.

(Reader was better in that the 'stream' model meant that the images would load at the bottom before I got to them, but Liferea's one-item-at-a-time model doesn't do that.)

So I think it would be a good idea if, on new items being downloaded from an RSS feed, they were parsed for image links & the images prefetched to be stored locally and displayed instantly on viewing an item.

asl97 (Contributor) commented Jul 17, 2014

@gwern
A browser app and a desktop app are VERY different. All Google Reader needed to do was embed the images with image tags in the page for every single feed (basically dump everything onto the page) and let the browser handle the downloading and caching. A browser's default behaviour is to load all images, including those that aren't visible, which is why they load in the background (and also use up a lot of bandwidth).

IIRC, Liferea uses WebKit to render the feed on DEMAND, meaning an item isn't loaded until it is needed/shown. In fact, I am thankful that it doesn't start downloading every single image in my feeds.

And I think it's impossible to do something like that in Liferea itself without major changes, but it might be possible to write a web interface plugin (accessed from a browser) for Liferea to do something like what is stated above, if the plugin is able to get the list of feeds.

If it were done as you suggested (parse/prefetch):

  1. Where would the images be stored?
    • If we want to store them in a config directory or DB, go to 2.
      • Additional note: what if there are conflicting names for the images?
        • Do we store them keyed by the domain name to avoid conflicts? (store them in a folder named after the domain)
          • What about the path? More folders for the path components?
        • How about using an MD5 hash of the URL for the name? The chance of a conflict with that is much lower. (A rough sketch follows right after this list.)
    • If we want to store them in WebKit's cache, go to 3.
  2. How should one replace the link in the feed to point to the locally stored copy?
    • Change the link on the fly before letting WebKit render the feed?
      • If we change it on the fly, we would need to store the location of the file.
    • Change the link and store it in the DB?
      • What if you or someone deletes the image? There would be no way to get another copy, since the link has been changed, unless you manage to get the feed again.
    • What if changing the link breaks the feed?
  3. We would need to know where WebKit stores its cache and how it stores it.
    • IIRC, browsers use magic numbers and strings for the cache file names, so this way is also kind of impossible.
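
For illustration, a minimal sketch of the MD5-naming idea from point 1 (Python, purely illustrative: the cache directory and function name are made up, not anything Liferea actually uses):

```python
import hashlib
import os
from urllib.parse import urlparse

# Hypothetical cache location; Liferea's real cache/config paths may differ.
CACHE_DIR = os.path.expanduser("~/.cache/liferea/images")

def cache_path_for(url: str) -> str:
    """Map an image URL to a local file name via an MD5 hash of the full URL.

    Hashing the whole URL sidesteps both the domain/path folder question and
    name collisions between images from different feeds.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    ext = os.path.splitext(urlparse(url).path)[1]  # keep the extension, if any
    return os.path.join(CACHE_DIR, digest + ext)
```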

As you see, there are a lot of things to think about; most ideas sound simple but are in fact very complex.

@lwindolf, if one wants to make the plugin stated above: is it possible for a plugin to get the list of feeds from Liferea itself, or should the plugin just directly read from the SQL DB?

lwindolf (Owner) commented:

@asl97 Getting the list of feeds should be no problem; directly reading from the DB would not be possible, and I think it is not a good idea.

@gwern As for the feature request: it is frequently requested by users, so there is demand and interest; still, project policy is "no feature requests". This is a 1-developer project, and feature requests do not make sense.

gwern (Author) commented Jul 17, 2014

asl97:

(and also use up a lot of bandwidth)

Bandwidth which is predictably going to be used anyway and which can be loaded in the background when the user is reading something or off doing something else entirely.

In fact, I am thankful that it doesn't start downloading every single image in my feeds.

I'm not. Maybe you're blessed with a connection so fast you can take the UI hit of loading images as late in the process as possible, but I'm not. And the download could happen at any point - after it finishes syncing all the RSS feeds would be a fine time, for example.

Change the link and store it in the DB?

You're making this more complex than it needs to be. Store each image named as the image URL. This is unique, requires no modification, makes lookup very easy, and is easy to GC (simple n^2 algorithm: for each local image, scan all RSS feeds, and if the image URL is not stored in any RSS feed, delete it). If the file isn't there, then load the URL as usual.
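
A rough sketch of what I mean (Python, with a made-up cache directory and helper names; nothing here is an actual Liferea API):

```python
import os
from urllib.parse import quote, unquote

# Hypothetical cache location, purely for illustration.
CACHE_DIR = os.path.expanduser("~/.cache/liferea/images")

def cache_path(url: str) -> str:
    # Deterministic: the file name *is* the percent-encoded image URL, so
    # lookup needs no extra bookkeeping and nothing stored in the DB.
    return os.path.join(CACHE_DIR, quote(url, safe=""))

def garbage_collect(live_image_urls):
    """Simple n^2 sweep: for each cached image, check the set of image URLs
    still referenced by any stored feed item; if the URL no longer appears
    anywhere, delete the file."""
    for name in os.listdir(CACHE_DIR):
        if unquote(name) not in live_image_urls:
            os.remove(os.path.join(CACHE_DIR, name))
```

On viewing an item, the renderer would check `cache_path(url)` first and fall back to the network if the file is missing.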

If we change it on the fly, we would need to store the location of the file.

No, see above. If the cached filename is deterministic, then there's no need to store the location.

What if you or someone deletes the image? There would be no way to get another copy, since the link has been changed, unless you manage to get the feed again.

Uh, what happens if someone deletes an image now? Liferea can't load it, obviously! I don't see how this is any worse (and it sounds much better as it helps with the censorship case where someone posts something and realizes that maybe they shouldn't've).

What if changing the link breaks the feed?

Why would it do that...?

As you see, there are a lot of things to think about; most ideas sound simple but are in fact very complex.

No, I really think you are making this more complex than it needs to be.

asl97 (Contributor) commented Jul 18, 2014

@gwern you aren't reading my post right; try following the list nest.
Edit: btw, have you tried that combined view? It should do what you ask.
It would download all the images in the feed, and most likely will redownload them (unnecessarily*) when the cache magically clears.

*tbh, most people don't read the same feed twice

gwern (Author) commented Jul 18, 2014

I rather think I was reading it right, and what on earth is a 'list nest'?

asl97 (Contributor) commented Jul 18, 2014

  1. Let's forget about the list.
  2. Think about it: if the file is stored under a URL-encoded path name (you can't have / in a file name), then right-click "save image as" would end up saving it with the URL-encoded path name instead of the original file name, and I don't think anyone would want that.
  3. Have you tried that combined view?
  4. You could always use a filter to do what you said: have it parse the RSS, download the images and store them somewhere, then change the links to point to the local storage (rough sketch below).
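
Something along these lines, as a very rough sketch (Python; it assumes Liferea's conversion-filter mechanism pipes the feed source through stdin/stdout, and the cache directory, regex, and error handling are only illustrative):

```python
#!/usr/bin/env python3
"""Very rough sketch of a feed conversion filter: read the feed XML on stdin,
download any <img> sources found, rewrite them to local file:// URLs, and
write the result to stdout. All paths and the regex are illustrative only."""
import hashlib
import os
import re
import sys
import urllib.request

CACHE_DIR = os.path.expanduser("~/.cache/liferea-prefetch")  # made-up location
os.makedirs(CACHE_DIR, exist_ok=True)

def fetch_and_rewrite(match):
    url = match.group(1)
    local = os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest())
    if not os.path.exists(local):
        try:
            urllib.request.urlretrieve(url, local)
        except OSError:
            return match.group(0)  # on failure, leave the original link alone
    return match.group(0).replace(url, "file://" + local)

feed = sys.stdin.read()
# Naive: item HTML inside <description>/<content> is usually entity-escaped,
# so a real filter would need to unescape/re-escape around this step.
feed = re.sub(r'<img[^>]+src="([^"]+)"', fetch_and_rewrite, feed)
sys.stdout.write(feed)
```

A real version would need proper feed parsing and entity handling, but it shows the shape of the approach.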

gwern (Author) commented Jul 18, 2014

right-click "save image as" would end up saving it with the URL-encoded path name instead of the original file name, and I don't think anyone would want that.

How often do you do that? If 'most people don't read the same feed twice', they certainly don't save a lot of images in their RSS reader.

have you tried that combined view?

Yes, I don't much care for it from the UI perspective (it improves on the preloading part, but is too loose an interface for my tastes).

You could always use a filter to do what you said: have it parse the RSS, download the images and store them somewhere, then change the links to point to the local storage.

If I was going to set up a complex out-of-reader architecture, I'd just use Squid.

gwern (Author) commented Jul 29, 2014

have you tried that combined view?

Oh, I never did reply to that one, did I? Yes, I am familiar with the combined view, but that's a cure almost as bad as the disease: the combined view has really nasty behavior with marking items read, which then makes them disappear, and, as far as I can tell, Liferea has no easy way to call back up 'read' items - or any way at all, short of blowing away your entire Liferea installation and refetching every feed to view all read RSS items. (I'd file an issue about that, but I guess that would be a feature request, and we wouldn't want that, now would we?)

Specifically, to give an example from a few seconds ago: I'm reading the combined view when not all the feeds have been fetched yet, and it occurs to me that if I could sort the items oldest to newest, it would make for a more sensible experience. So I right click on the top-level folder and look at the options, but go figure, there's no such sorting option. Oh well. I return to reading, hit space bar, and... Liferea jumps to my last subfolder ('wiki'), because apparently I forgot to click in the main pane and the focus was left in the folder sidebar pane and this has the fun side-effect of marking as read all 125 items, whether or not I had scrolled down to them yet. And as mentioned, there's no way to retrieve 'read' items, so what do I do now? Hope there was nothing important in the 125, I suppose.

So it's a matter of picking my poison: either safely navigate item by item while suffering excruciating image loading times, or use the combined view to load all the images simultaneously but suffer the risk of all unread items being wiped out by an errant mouse click or lack of click.

lwindolf (Owner) commented:

I totally see your point. The combined view is too simple in its current form. It should be at least somewhere near the functionality of online aggregators, where only items that you scrolled past completely (or >50%) are marked read. Sadly I don't have the time to build this.

Since 1.10, going back to previously read items is much improved (when not using the combined view) with the item history feature.

gwern (Author) commented Jul 29, 2014

Since 1.10, going back to previously read items is much improved (when not using the combined view) with the item history feature.

This was with Debian's version 1.10.9 of Liferea. I assume you mean how when you click on a specific RSS feed, rather than folder or subfolder, it'll show you the old items? Yeah, that's better than nothing, but it still doesn't help with my example from last night - am I really going to click through and hand-inspect the read items 347 times for all 347 RSS feeds I have?

lwindolf (Owner) commented:

Nope. I mean the new toolbar buttons to navigate backwards and forwards in your reading history.

lwindolf (Owner) commented:

Closing this ticket as it is a feature request.

lwindolf self-assigned this Jul 29, 2014
gwern (Author) commented Jul 29, 2014

I mean the new toolbar buttons to navigate backwards and forwards in your reading history.

What are those? I don't see anything relevant when I look at all-read folders or specific feeds. (I do see 2 little grayed-out inaccessible arrows which don't do anything, but that can't be what you mean.)

asl97 (Contributor) commented Sep 7, 2014

@lwindolf, if I made an issue asking how to get the list of feeds in a plugin from Liferea, or whether that is currently impossible, you would most likely say I am misusing the ticket system, so I am just going to ask it here.
