Old posts showing up in feed #69

Closed
LinAGKar opened this issue Feb 18, 2022 · 9 comments

Comments

@LinAGKar

Don't know if the issue is with hnrss or with Hacker News itself, but I'm subscribed to the frontpage feed, and there are a few old posts that keep popping up over and over again in the feed. This happens with:

https://news.ycombinator.com/item?id=24352278
https://news.ycombinator.com/item?id=25027279
https://news.ycombinator.com/item?id=26150497
https://news.ycombinator.com/item?id=26531943
https://news.ycombinator.com/item?id=29048079

@edavis
Member

edavis commented Feb 18, 2022

Thanks for the report. That is really strange. I can't think of a reason why these would be lobbed onto /frontpage like that.

  • Are you using any filters (e.g., points=X or comments=X) or search queries on this feed?
  • FWIW, /frontpage is just a wrapper around https://hn.algolia.com/api/v1/search_by_date?tags=front_page and I'm not seeing these old stories at the moment.
  • /frontpage can sometimes be an odd fit for an RSS feed, because a story that newly gets the front_page tag may show up partway down the feed (e.g., as the fifth <item>) rather than at the top.
  • What RSS reader are you using?

At this point I think the most likely scenario is some old data is getting served up by Algolia (maybe from some stale cache?). Though any decent feed reader should track the <guid> so it only shows up once... unless the ID is somehow changing, too?

@LinAGKar
Author

LinAGKar commented Feb 18, 2022

I am not using any filters.

They only seem to show up occasionally, though the same ones each time.

I'm using Nextcloud News. It seems like it only remembers a limited number of items, so it forgets the old entries and when they reappear it shows them as unread again.

Edit: Looks like you can configure how many items it keeps, current is 200. I'll try increasing that.
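To illustrate why a bounded item history makes old stories reappear as unread, here is a minimal sketch of a reader that remembers at most a fixed number of `<guid>` values and forgets the oldest first. This is not how Nextcloud News is actually implemented; the class name `BoundedSeenSet` and the oldest-first eviction policy are assumptions made for illustration.

```python
from collections import OrderedDict


class BoundedSeenSet:
    """Hypothetical model of a feed reader that remembers at most `limit` GUIDs."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        # Insertion-ordered, so the first key is the oldest remembered GUID.
        self._seen = OrderedDict()

    def is_new(self, guid: str) -> bool:
        """Return True (and remember the GUID) if it is not currently remembered."""
        if guid in self._seen:
            return False
        self._seen[guid] = None
        if len(self._seen) > self.limit:
            # Forget the oldest GUID; if that item reappears later, it looks unread.
            self._seen.popitem(last=False)
        return True
```

For example, with `limit=3` and three fetches `["old", "a"]`, `["b", "c"]`, `["old", "d"]`, the stale `"old"` item is reported as unread on both the first and the third fetch, because it was evicted in between. With `limit=10`, the third fetch reports only `"d"`, which mirrors the effect of raising the reader's item limit.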

@edavis
Member

edavis commented Feb 18, 2022

Gotcha, thanks.

Still not sure how/why these old posts are re-appearing, but an increased limit should at least make it less annoying.

When did these old posts start re-appearing? Is this the first time or has it happened before?

@LinAGKar
Author

It's been like that since I added the feed, half a year or so ago.

@ntns

ntns commented Feb 22, 2022

Hi!

I have the same issue when I query https://hnrss.org/frontpage.jsonfeed?count=50

The problem seems to be on Algolia's side, as I get the same old stories if I query https://hn.algolia.com/api/v1/search_by_date?tags=front_page&hitsPerPage=50 (the last five stories are the ones mentioned by @LinAGKar).

Also, Algolia is not respecting the hitsPerPage=50 parameter and only returns 35 results.

Edit: I guess Algolia only returns the stories that are on the front page right now, not the stories that have appeared on the front page in the past. My workaround will be to limit the count to 30.

@edavis
Copy link
Member

edavis commented Feb 22, 2022

Wow, this is a great find. Many thanks. Here's what I think is happening:

It looks like Algolia is always adding and removing that front_page tag as stories rise onto and fall off the HN frontpage. So when everything is working as intended, there should only be 30 stories in the Algolia search index with the front_page tag.

So right now when Algolia answers a request for 50 stories tagged with front_page:

  • What should happen is Algolia returns all 30 current stories tagged with front_page
  • What is happening instead is Algolia returns all 35 stories tagged with front_page — 30 of which are actually on the HN frontpage right now, and five from 2021-10-30 and earlier that never had their front_page tag removed once they fell off the HN frontpage.

As far as Algolia is concerned, those older posts are still on the HN frontpage and it's just doing what you asked.

(So I don't think the 35 results reflect an Algolia limit so much as the total number of stories with the front_page tag.)

In @LinAGKar's case, hnrss.org/frontpage without any filters is an Algolia query for tags=front_page, which by default returns 20 hits. I'm still a little stumped on how those older posts show up, as there should always be at least 30 current stories tagged with front_page. But if enough current HN frontpage stories somehow lost their front_page tag without being replaced by newly tagged stories, the older stories could become visible inside hnrss.
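Based on the description in this thread, a request to /frontpage might be translated into an Algolia query roughly as sketched below. The helper name `frontpage_query_url` and the mapping of hnrss's `count` onto Algolia's `hitsPerPage` are assumptions, not hnrss's actual code; only the endpoint, `tags=front_page`, `hitsPerPage`, and the 20-hit default come from the comments above.

```python
from typing import Optional
from urllib.parse import urlencode

ALGOLIA_SEARCH_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"


def frontpage_query_url(count: Optional[int] = None) -> str:
    """Build an Algolia query for stories tagged front_page (hypothetical helper)."""
    params = {"tags": "front_page"}
    if count is not None:
        # Assumption: hnrss's ?count=N maps onto Algolia's hitsPerPage.
        params["hitsPerPage"] = str(count)
    # Without hitsPerPage, Algolia falls back to its default of 20 hits.
    return f"{ALGOLIA_SEARCH_BY_DATE}?{urlencode(params)}"
```

Under this mapping, `frontpage_query_url(50)` yields the same URL queried above (`...?tags=front_page&hitsPerPage=50`). Every story that still carries the front_page tag matches the query, whether or not it is actually on the HN frontpage, so stale tags surface as soon as the requested count exceeds the number of current stories.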

I'll open an issue with the Algolia folks, maybe they'll have an idea. Feel free to post if you notice anything else relevant. Thanks again!

@agg23

agg23 commented Aug 1, 2022

These same posts are still appearing for me. I am using the basic front page feed: https://hnrss.org/frontpage

@spiffytech

Just FYI: this still happens with new stories, not just stories from the date range described above.

https://news.ycombinator.com/item?id=33766243
https://news.ycombinator.com/item?id=33768449
https://news.ycombinator.com/item?id=33769009

All from 2022-11-27 / 2022-11-28.

I'm using the Bazqux RSS reader with this feed URL: https://hnrss.org/frontpage?comments=5&link=comments&count=100

@edavis
Member

edavis commented Mar 5, 2023

Alright, I've deployed a tentative fix for this.

Since it doesn't seem like Algolia will fix their side, I've added a workaround to hnrss.

When fetching front_page stories from Algolia, only posts created in the past week are returned (218feb8).
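The commit itself isn't reproduced here. As a rough illustration of the idea, the sketch below applies a one-week cutoff on the client side to hits returned by Algolia. It assumes each hit carries a Unix timestamp in `created_at_i`, and the function name `drop_stale_front_page_hits` is hypothetical; the actual change in 218feb8 may instead apply the cutoff inside the Algolia query.

```python
import time
from typing import Any, Dict, List, Optional

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def drop_stale_front_page_hits(
    hits: List[Dict[str, Any]], now: Optional[float] = None
) -> List[Dict[str, Any]]:
    """Keep only hits created within the past week (hypothetical client-side filter)."""
    if now is None:
        now = time.time()
    cutoff = now - ONE_WEEK_SECONDS
    # Stories whose front_page tag was never removed are older than the
    # cutoff, so they are dropped even though Algolia still returns them.
    return [hit for hit in hits if hit["created_at_i"] >= cutoff]
```

A story created an hour ago passes the filter, while one with the front_page tag still attached months after it fell off the frontpage is discarded. The trade-off is that a story legitimately on the frontpage for more than a week would also be dropped, which seems rare enough to be acceptable here.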

If there still seems to be an issue, feel free to re-open. Thanks everybody for your reports on this.

edavis closed this as completed Mar 5, 2023

5 participants