
Articles Being Incorrectly Auto-Purged...? #120

Open
tonyduckles opened this Issue · 2 comments

1 participant

@tonyduckles

I'm seeing a strange problem on my self-hosted install of NewsBlur: only the newest 100 articles from any given feed are sticking around in the back-end database. Or at least that's what it seems like:

  • For high-traffic feeds, after letting them sit for a few days, they're all pegged at 100 unread items. If I view the feed in All + Newest mode, the list ends after 100 articles.
  • For mid-traffic feeds, even though the unread count isn't at 100, I'm still only able to see 100 articles total in All + Newest mode.

I definitely don't see this behavior on http://dev.newsblur.com/. So, that makes me think it's some kind of problem with my config/settings. I didn't see anything obvious in the settings.py or local_settings.py files. And I didn't see any obvious settings anywhere in the back-end MySQL database.

A few more details, in case they're relevant:

  • My install is following the "circular" branch from Github.
  • I used ./manage.py loaddata config/fixtures/bootstrap.json during the install process to bootstrap my DB.
  • I have a crontab entry that is running manage.py refresh_feeds --force every 15 minutes. I also have other crontab entries for running manage.py collect_stats and ... collect_feedback every 15 minutes.
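For reference, the cron setup described above might look roughly like this (paths and the install directory are illustrative, not the actual crontab):

```shell
# m h dom mon dow  command   (run from the NewsBlur checkout; /srv/newsblur is a placeholder path)
*/15 * * * *  cd /srv/newsblur && ./manage.py refresh_feeds --force
*/15 * * * *  cd /srv/newsblur && ./manage.py collect_stats
*/15 * * * *  cd /srv/newsblur && ./manage.py collect_feedback
```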

Any ideas on what's going on here? Is there some setting/tunable that I'm overlooking?

Thanks!

@tonyduckles

Hmm, this sounds like the same problem:
https://getsatisfaction.com/newsblur/topics/feed_cut_off_after_100_entries

On the upside, I can easily tweak the code in my self-host install. :)

FWIW, I tend to agree with the commenters in the above thread. Once some of the scalability concerns on production newsblur.com are eased, I'd love to see relaxed article-retention handling on the production site. Rather than basing the logic on article count + subscriber count, I think it'd (arguably) make sense to make trim_feeds date-based instead, e.g. purge articles older than, say, 6 months (rather than enforcing a fixed upper bound of articles per feed). That kind of date-based logic seems like it would work well on both huge multi-user sites (e.g. newsblur.com) and small single-user self-hosted installs.
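The date-based idea could be sketched like this (purely illustrative — `stories_to_purge`, the `story_date` attribute, and the in-memory story list are hypothetical stand-ins, not NewsBlur's actual models or trim_feeds code):

```python
from datetime import datetime, timedelta

DAYS_UNTIL_PURGE = 180  # hypothetical tunable; roughly 6 months

def stories_to_purge(stories, now=None, days=DAYS_UNTIL_PURGE):
    """Return the stories older than the retention cutoff.

    `stories` is any iterable of objects with a `story_date` attribute.
    A date-based trim keeps everything newer than `now - days`,
    regardless of how many articles the feed has.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days)
    return [s for s in stories if s.story_date < cutoff]
```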

@tonyduckles

I took a stab at this:
https://github.com/tonyduckles/NewsBlur/compare/smarter_trim_feed

The idea is to keep either (a) 500 articles per feed or (b) the past 180 days worth of articles, whichever is greater. I opted to add a DAYS_UNTIL_PURGE option to settings.py so that folks could override this to "0" if they wanted to disable article-purging altogether (e.g. for RSS packrats).
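A minimal sketch of that hybrid rule, under the stated assumptions (the helper name and signature are hypothetical, not the actual branch code): a story survives if it is within the newest 500 OR newer than the cutoff, and setting the purge window to 0 disables trimming entirely.

```python
from datetime import datetime, timedelta

def select_trim_candidates(story_dates, keep_count=500,
                           days_until_purge=180, now=None):
    """Given story dates sorted newest-first, return the indexes to trim.

    A story is kept if it falls within the newest `keep_count` entries
    OR is newer than the purge cutoff — i.e. keep whichever set is
    greater.  days_until_purge == 0 disables purging (keep everything).
    """
    if days_until_purge == 0:
        return []
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days_until_purge)
    return [i for i, d in enumerate(story_dates)
            if i >= keep_count and d < cutoff]
```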

Also, I suppose I could rename "DAYS_UNTIL_PURGE" to "DAYS_UNTIL_TRIM" to try to be consistent with trim_feeds. And I suppose we could add a "TRIM_CUTOFF" option to settings.py if we wanted an easy site-customizable setting.

Let me know if this seems interesting to you.
