I'm running the above code (Ruby 1.9.3). I'm monitoring the URL on my feed reader's window alongside my console. The updated_feed.updated? method seems to be returning true or false totally randomly. It may return true when the feed hasn't been updated and it may return false when the feed has been updated.
updated_feed.new_entries.length will always be 0.
Hi @RedFred7, sorry this isn't working as you'd expect - to be totally honest, I don't think very highly of the update features of Feedjira. What I recommend is that users stick to the
Can you tell me a little more about what you're doing with the library - maybe I can help point you in the right direction.
Hi Jon and thanks for the quick reply.
I'm working on a collaboration app that requires RSS integration, that
The trouble is, as I said in the issue description, that #updated?
I've been getting round the issue by keeping track of the feed entries
Couple of questions if I may:
Any guidance on what I need to do in order to know when the feed's
On Fri 11 Apr 2014 22:26:22 BST, Jon Allured wrote:
Hey @RedFred7, thanks for taking the time to write that out - hopefully I can provide some insight!
I think you're trying to decide if you can trust
    # lib/feedjira/feed_utilities.rb
    def find_new_entries_for(feed)
      # Nothing stored yet, so every fetched entry counts as new.
      return feed.entries if self.entries.length == 0

      latest_entry = self.entries.first
      found_new_entries = []
      feed.entries.each do |entry|
        # Stop at the first entry that matches the latest known one.
        if entry.entry_id.nil? && latest_entry.entry_id.nil?
          break if entry.url == latest_entry.url
        else
          break if entry.entry_id == latest_entry.entry_id || entry.url == latest_entry.url
        end
        found_new_entries << entry
      end
      found_new_entries
    end
Here you can see that
But more broadly, I wanted to talk strategy for a sec. Like I said in my comment yesterday, I don't feel very good about the update parts of Feedjira. Even if they worked a little more consistently, I think the approach is pretty naive.
If you take a look at that
The reality is that feed authors do all sorts of wacky things and break this implementation. What if an article in the feed has a typo in the url and they "fix" it after you've already seen the article? Should that be a new entry or not? Currently, it would be added to the list. What if the content of the post has been updated, but entry_id and url don't change? Should that be a new entry? Because with this implementation it wouldn't be.
What if the order of the posts changes?
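To make the ordering problem concrete, here is a small sketch that reimplements the early-exit loop on plain Structs. The `Entry` struct, the function name, and the example URLs are made up for illustration and are not part of Feedjira; only the comparison logic mirrors the snippet above.

```ruby
# Hypothetical stand-in for a parsed feed entry (not a Feedjira class).
Entry = Struct.new(:entry_id, :url)

# Same early-exit comparison as find_new_entries_for, as a free function.
def new_entries_with_early_exit(known_entries, fetched_entries)
  return fetched_entries if known_entries.empty?

  latest = known_entries.first
  found = []
  fetched_entries.each do |entry|
    if entry.entry_id.nil? && latest.entry_id.nil?
      break if entry.url == latest.url
    else
      break if entry.entry_id == latest.entry_id || entry.url == latest.url
    end
    found << entry
  end
  found
end

a = Entry.new("a", "http://example.com/a")
b = Entry.new("b", "http://example.com/b")
c = Entry.new("c", "http://example.com/c")

known     = [b, a]     # what we saw on the last fetch, newest first
reordered = [b, c, a]  # c is new, but the feed lists it below b

# The loop breaks on b straight away, so c is never reported.
missed = new_entries_with_early_exit(known, reordered)
puts missed.length
```

When the new entry sits at the top of the feed the loop behaves as intended; it only goes wrong once the feed's ordering stops matching the "newest first" assumption.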
There are all kinds of business rules here that a Feedjira user will need to decide for themselves and thus, I believe the update stuff is of no value. I think users of Feedjira should stick to
If you want to see an implementation of updating your Feeds in a Rails context, I'd recommend you take a look at Stringer. It's a user of Feedjira and I like how the updating works, see
The business logic here is still a little naive, but that was the choice they made - they didn't try to rely on Feedjira deciding which articles were new, they just fetch them all each time the job is run and then use their own code to find the ones that are new. Simple and completely in their control.
Sorry for the novel, I've been sketching what Feedjira 2.0 might look like recently and so this stuff has been on my mind. I hope that helps and I'd be really interested in any feedback you might have.
Will chime in (with another novel - sorry!) and say that things aren't all sunshine and rainbows for Stringer either :)
I think a simple but largely effective approach is to use the URL as a unique constraint. Trying to break out early is dangerous - most feeds are ordered by a timestamp (which could be when it was created, updated, published, etc) but there is nothing stopping someone from publishing a draft that was written two weeks ago (and timestamped in the past) at a later date.
You can't really trust the
Even if you completely distrust the feed's timestamps and use a unique URL scheme as I mentioned above - there are some feeds that just update posts with new information, but keep the same URL. One particular example I remember was an auction site that would constantly be updated as items were sold/posted for sale. You can't really know if the author was fixing a typo (probably not necessary to alert an end user) or a larger content update (probably does warrant an alert).
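If you do want to notice same-URL updates, one option is to keep a digest of the entry's content next to its URL. The sketch below is only an illustration: the `Post` struct and its field names are assumptions rather than Feedjira's API, and a digest can only tell you that something changed, not whether it was a typo fix or a meaningful update.

```ruby
require "digest"

# Hypothetical stand-in for a parsed entry; field names are assumptions.
Post = Struct.new(:url, :title, :content)

# Fingerprint the fields we care about so any edit changes the digest.
def fingerprint(post)
  Digest::SHA256.hexdigest([post.title, post.content].join("\0"))
end

# seen maps URL => last fingerprint; a real app would persist it.
def classify(seen, post)
  digest = fingerprint(post)
  status =
    if !seen.key?(post.url)
      :new
    elsif seen[post.url] != digest
      :updated
    else
      :unchanged
    end
  seen[post.url] = digest
  status
end

seen = {}
lot = Post.new("http://example.com/lot/1", "Lot 1", "Current bid: 10")
puts classify(seen, lot)           # first sighting of this URL
lot.content = "Current bid: 25"
puts classify(seen, lot)           # same URL, different content
```

Whether an `:updated` result should alert the user is still a business rule you have to decide for yourself.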
You will still find some cases where a blogger moves platforms or domains and all of the URLs change - now you've got 100 "new posts" that are actually old content!
For your particular use case: I would recommend you store the URLs of every entry in the feed in a database (or some other persistent store) and detect "new entries" by checking for the existence of the URL. I think you'll find that this gets you 90% of the way there with much less headache than trying to compare timestamps.
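As a rough sketch of that idea, the class below keeps the set of seen URLs in a JSON file. `SeenUrlStore`, `filter_new`, and the file layout are all hypothetical names chosen for this example; in a real app a database table with a unique index on the URL would play the same role, and the URLs would come from something like `feed.entries.map(&:url)`.

```ruby
require "json"
require "set"
require "tmpdir"

# Hypothetical helper: remembers every entry URL it has been shown.
class SeenUrlStore
  def initialize(path)
    @path = path
    @urls = File.exist?(path) ? Set.new(JSON.parse(File.read(path))) : Set.new
  end

  # Returns the URLs not seen before (in feed order) and records them.
  def filter_new(urls)
    fresh = urls.uniq.reject { |url| @urls.include?(url) }
    return fresh if fresh.empty?

    @urls.merge(fresh)
    File.write(@path, JSON.generate(@urls.to_a))
    fresh
  end
end

Dir.mktmpdir do |dir|
  store = SeenUrlStore.new(File.join(dir, "seen.json"))
  store.filter_new(["http://example.com/b", "http://example.com/a"])

  # A backdated post listed below known entries is still detected,
  # because every URL is checked instead of stopping at the first match.
  p store.filter_new(["http://example.com/b",
                      "http://example.com/old-draft",
                      "http://example.com/a"])
end
```

This does nothing about the domain-move case mentioned above: if every URL changes, every entry will look new.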
thanks very much for both your replies. I see what you're getting at,
My only concern is the extra overhead it takes to keep track of all
Ideally, what I'd like from an RSS gem or service would be an
If that's agreeable with Jon, maybe I could fork Feedjira and try to
sorry about the length of this email, got myself in the 'novel' mood. :D
On Sun 13 Apr 2014 21:39:35 BST, matt swanson wrote:
@RedFred7 you may want to have a look at https://superfeedr.com/ then - they handle all the feed parsing and hit a callback when new items are added. I don't know how robust their "new post detection" algorithms are, but I assume they have at least the same functionality as feedjira (if not better).
Maybe @julien51 can weigh in :)
very interesting! I'll have a play with their API and see how it goes,
On Mon 14 Apr 2014 17:27:56 BST, matt swanson wrote:
Thanks @swanson for the mention! @RedFred7 I'd be happy to help directly should you have any question/problem. Feel free to post to https://github.com/superfeedr/documentation/issues?state=open
yes please do close this. Also, thanks for the insights during our
On Sun 20 Apr 2014 14:25:40 BST, Jon Allured wrote: