Get unread feeds directly from NewsBlur #513

Open
Limero opened this issue May 7, 2019 · 13 comments

@Limero Limero commented May 7, 2019

The way Newsboat currently works with NewsBlur is that it fetches a list of all subscribed feeds and then manually checks every single one of them. I think it would be better if Newsboat fetched the status of the feeds directly from NewsBlur and showed it instantly, and then only rechecked them manually if triggered.

The GET /reader/feeds endpoint currently used will give this information, so it might not be too difficult to adjust the current implementation.

This would make Newsboat with NewsBlur load in seconds instead of minutes, and would make it way more useful.

@Minoru Minoru commented May 8, 2019

@t6 @aniranc Hey, do you still use Newsbeuter/Newsboat with NewsBlur? Do you have time to investigate this suggestion?

@Limero Thanks for the heads-up. I haven't really touched that API, but if no one else looks into it, I'll do it.

@Minoru Minoru added the enhancement label May 8, 2019

@Limero Limero commented Jun 24, 2019

Nobody else seems to be interested in investigating. Did you ever have time to look into this, @Minoru?

@Minoru Minoru commented Jun 25, 2019

No, sorry. All I did just now was take a glance at the API docs for GET /reader/feeds. It only provides unread counts, not lists of unread IDs, so we can't use it to speed up fetching the unread items. Using the counts directly is not an option either: Newsboat never stores them, because it's assumed that each RssFeed object contains all its RssItems. This assumption is cemented by our support for killfiles; we can't calculate the number of (unread) items unless we know how many of them are killed, which implies we need to see all items.
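
To make the killfile point concrete, here is a minimal sketch. The item dictionaries and the `is_killed` predicate are hypothetical stand-ins, not Newsboat's actual RssItem or killfile code; the only point is that a count reported by the server cannot account for items hidden locally.

```python
def visible_unread(items, is_killed):
    """Count unread items that survive the local killfile."""
    return sum(1 for item in items if item["unread"] and not is_killed(item))


# Hypothetical feed contents; the server would report 3 unread items here.
items = [
    {"title": "Release notes", "unread": True},
    {"title": "Sponsored: buy now", "unread": True},
    {"title": "Weekly digest", "unread": True},
    {"title": "Old post", "unread": False},
]
server_unread_count = sum(1 for item in items if item["unread"])


def is_killed(item):
    """Hypothetical killfile rule: hide titles starting with "Sponsored"."""
    return item["title"].startswith("Sponsored")


# The locally correct count needs every item, not just the server's number.
local_unread_count = visible_unread(items, is_killed)
```

In this sketch the two counts disagree, and the local one can only be computed once all items have been seen.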

So I guess this particular route is closed to us; Newsboat is not a client to any of the remote services it interacts with, so we won't special-case this for NewsBlur (even if we could, which I don't think we can).

With that said, I'm not ready to close the issue. The original problem is that Newsboat takes a long time to start up—not the specific API endpoint it uses. So the next step is to look at what our code currently does, compare that to what endpoints NewsBlur currently provides, and decide if there is something we can do to speed things up. I'm not committing to do this anytime soon, though—sorry, but I don't think I'll be able to hold that word.

@Limero Limero commented Jun 25, 2019

GET /reader/feeds returns a list of all subscribed feeds. Feeds where "nt" > 0 have unread items. Isn't it possible to grab the whole list, remove all feeds with "nt" = 0 and then check the rest of the feeds with GET /reader/river_stories?

That would make Newsboat almost instant for NewsBlur. Or is that the special case that should be avoided?

@Minoru Minoru commented Jun 25, 2019

No, the special cases I had in mind are more like "let's treat NewsBlur feeds as special entities, served not by RssFeed/RssItem but by classes specific to NewsBlur".

Your suggestion seems fine, but won't it break if the user marks some feed read before Newsboat fetches it? As long as the count is zero, Newsboat will be missing some articles, which will likely confuse the user.

@Limero Limero commented Jun 25, 2019

The whole process can be done with two GET requests right after each other. If you call GET /reader/feeds and then the user marks one of the feeds read, when the GET /reader/river_stories request is done, it will just return empty for that feed. So shouldn't be a problem.

The items of multiple feeds can be fetched at the same time, by appending the feed ids to the path:
river_stories?feeds=123&feeds=456

This is how the official NewsBlur webui works.

  1. Call GET /reader/feeds
  2. Get feed ids of all feeds with nt > 0 from the results
  3. Call river_stories once with the feed ids
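
The three steps above can be sketched roughly as follows. This is only an illustration: it assumes the /reader/feeds response is a JSON object with a `"feeds"` mapping from feed id to an object carrying `"nt"`, and the helper names are made up for this example. No network requests are made; the functions only pick the ids and build the request path.

```python
from urllib.parse import urlencode


def unread_feed_ids(feeds_response):
    """Return the ids of feeds whose "nt" count is above zero."""
    feeds = feeds_response.get("feeds", {})
    return [feed_id for feed_id, feed in feeds.items() if feed.get("nt", 0) > 0]


def river_stories_path(feed_ids):
    """Build one request path that covers every given feed id."""
    # doseq=True repeats the key once per id: feeds=123&feeds=456
    query = urlencode({"feeds": feed_ids}, doseq=True)
    return "/reader/river_stories?" + query


# Assumed (simplified) shape of a parsed /reader/feeds response.
example_response = {"feeds": {"123": {"nt": 4}, "456": {"nt": 0}, "789": {"nt": 1}}}
example_path = river_stories_path(unread_feed_ids(example_response))
```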

@Minoru Minoru commented Jun 25, 2019

I had a different situation in mind:

  1. Newsboat is not running
  2. User opens NewsBlur, sees there are new items in feed X, and reads them all (or marks the feed read—doesn't make a difference)
  3. User starts Newsboat, which sees nt==0 and doesn't fetch that feed, even though there are some items there that Newsboat doesn't have yet
  4. User opens feed X in Newsboat and sees that there are no items there. Reloading the feed doesn't help—Newsboat detects nt==0 and doesn't load anything.
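
The scenario above can be reproduced with a toy model. Everything here is hypothetical (the dictionaries, the item ids, the function name); it is not Newsboat code, just a sketch of what "skip feeds with nt == 0" does to a local cache.

```python
def reload_skipping_read_feeds(server, cache):
    """Fetch items only for feeds the server reports as having unread items."""
    for feed_id, feed in server.items():
        if feed["nt"] == 0:
            continue  # skipped: assumed to contain nothing new
        cache.setdefault(feed_id, set()).update(feed["items"])
    return cache


# Hypothetical state after step 2: feed X has two new items, already read on the web.
server = {"X": {"nt": 0, "items": {"new-1", "new-2"}}}
cache = reload_skipping_read_feeds(server, {"X": {"old-1"}})

# Items that exist on the server but never reached the local cache.
missing = server["X"]["items"] - cache["X"]
```

In this model `missing` holds both new items, and reloading again changes nothing, because the skip condition is still true.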

But wait, I lost where we were going with this. On startup, Newsboat does two things: authenticates with NewsBlur and requests a list of feed URLs. Those are done with NewsBlurApi::authenticate (one request to /api/login) and NewsBlurApi::get_subscribed_urls (one request to /reader/feeds). You sure that's the stage that takes minutes to complete?

You can estimate this very simply from the output. Newsboat prints "Loading URLs from NewsBlur..." before calling authenticate and get_subscribed_urls. Then it calls them, and then it prints "done.". Then it prints "Loading articles from cache..." and proceeds to parts that don't do any network requests.

I bet it spends most of the time loading the articles. It's a known problem, especially with a large number of items and/or a large number of killfiles.

@Limero Limero commented Jun 26, 2019

I understand what you mean now. The NewsBlur webui solves this by fetching a feed's items only when the feed is opened (with GET /reader/feed/123). Newsboat could do the same when a feed that had nt = 0 on the last sync is opened. The main problems with this would be the added latency when opening feeds and the count being incorrect until the feed is refreshed.

I don't know the optimal solution.

I have many feeds and it takes forever to manually check every single one of them every time I open Newsboat.

@Minoru Minoru commented Jun 30, 2019

Yeah, "fetch items when the feed is opened" is the kind of special-casing I wrote about above. Let's not do that; it doesn't fit Newsboat at all. The "Newsboat way" is "having everything in our cache, and updating the cache explicitly during reloads".

Were you able to confirm that the startup is slow because it takes a long time to open the cache, @Limero?

> I have many feeds and it takes forever to manually check every single one of them every time I open Newsboat.

There is a reload-all operation, bound to "R" by default. You might speed it up some if you increase the number of reload-threads, but NewsBlur might be limiting the number of concurrent connections, so you'll have to experiment with that.
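
As a rough illustration of why more reload threads can help, here is a sketch of a parallel reload. It is not how Newsboat implements reload-threads; `reload_all` and the `fetch` callback are hypothetical names, and `fetch` stands in for whatever downloads one feed.

```python
from concurrent.futures import ThreadPoolExecutor


def reload_all(feed_urls, fetch, threads=1):
    """Fetch every feed, running up to `threads` fetches concurrently."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so results line up with feed_urls.
        return dict(zip(feed_urls, pool.map(fetch, feed_urls)))
```

With `threads=1` the fetches run one after another; a larger value overlaps the time spent waiting on the network, up to whatever concurrency the server tolerates.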

@Limero Limero commented Jul 30, 2019

The thing that takes a long time is the reload-all. Cached entries are instant.

I think we won't be able to fix this. It's an inherent problem with the "Newsboat way" you mention. A way that has both advantages and disadvantages.

The solution would have to be something like an optional setting that made Newsboat act like a dumb client, never manually fetching the feeds. I don't know if this is anything that would be acceptable to add.

@Minoru Minoru commented Aug 3, 2019

> The thing that takes a long time is the reload-all. Cached entries are instant.

Did changing the number of reload threads help?

Getting back to the original issue: What I wanted you to do is start Newsboat and look at its output before the user interface appears. There are four distinct stages there:

  1. Newsboat prints "Loading URLs from NewsBlur...";
  2. Newsboat prints "done.";
  3. Newsboat prints "Loading articles from cache...";
  4. Newsboat shows the user interface.

The delay is somewhere between those four; I'd like to know which ones. This will help me understand what functions cause the issue.
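
If you note the time at which each of the four stages appears (with a stopwatch, for example), a small helper like the one below shows which gap dominates. The function names are made up for this sketch, and nothing here reads Newsboat's output automatically.

```python
def stage_gaps(events):
    """Given (seconds, label) pairs in order, return (from, to, elapsed) triples."""
    return [
        (a_label, b_label, b_time - a_time)
        for (a_time, a_label), (b_time, b_label) in zip(events, events[1:])
    ]


def slowest_gap(events):
    """Return the (from, to, elapsed) triple with the largest elapsed time."""
    return max(stage_gaps(events), key=lambda gap: gap[2])
```

Calling `slowest_gap` with your four observed timestamps returns the pair of stages with the longest delay between them.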

> The solution would have to be something like an optional setting that made Newsboat act like a dumb client, never manually fetching the feeds.

Sorry, but no. This is a fundamental change, and I think it's way easier to create a separate program and re-implement all the relevant Newsboat features than try to bend Newsboat into supporting both ways of doing things.

@Limero Limero commented Sep 10, 2019

Changing the number of threads definitely makes it faster. But it's still not close to instant, because it's not a client and it's manually checking all feeds.

This is the thing that takes time, and yes, I don't think we can ever solve this without making a separate program with different behavior.
image

@Minoru Minoru commented Sep 10, 2019

Ah, okay, so the startup itself is fast—the slow part is initial reloading of all feeds. I see.

Slow reloads can be mitigated somewhat if we keep persistent connection(s) to the server, and route all requests through them instead of creating a new one in each thread. Apart from that, I'm not sure if we can do anything to improve it further.
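
A minimal sketch of the persistent-connection idea, written in Python rather than Newsboat's own code: `SharedConnection` is a hypothetical helper that opens one keep-alive connection lazily and sends every request through it. The `connect` parameter is an assumption added so the sketch can be tried without a network.

```python
import http.client
import threading


class SharedConnection:
    """Route all GET requests through one lazily opened connection."""

    def __init__(self, host, connect=http.client.HTTPSConnection):
        self._host = host
        self._connect = connect  # injectable factory, handy for offline trials
        self._conn = None
        self._lock = threading.Lock()  # one request at a time on the shared socket
        self.connections_opened = 0

    def get(self, path):
        """Send a GET request and return (status, body)."""
        with self._lock:
            if self._conn is None:
                self._conn = self._connect(self._host)
                self.connections_opened += 1
            self._conn.request("GET", path)
            response = self._conn.getresponse()
            # The body must be read fully before the connection can be reused.
            return response.status, response.read()
```

The lock serialises requests, so this trades parallelism for fewer connection setups; a small pool of such objects would sit between the two extremes.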

Lack of maintainers sure doesn't help here. Basically, if you want it to work better, you got to work on it yourself, because no-one else does at the moment.
