New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thoughts on Version 2.0 #221

Closed
jonallured opened this Issue Apr 15, 2014 · 20 comments

Comments

Projects
None yet
10 participants
@jonallured
Member

jonallured commented Apr 15, 2014

I wrote a blog post outlining some of my thoughts for the next version of Feedjira, which will be a fairly major rewrite.

In that post, I asked for your feedback, so please feel free to share! ❤️ ❤️ ❤️

@amatriain

This comment has been minimized.

Show comment
Hide comment
@amatriain

amatriain Apr 15, 2014

Contributor

I suspect I'll be in the minority, but I'll go ahead anyway: I think feedjira attempts to do too much. Its features can be classified into:

  • fetching and updating feeds. This is the part of the gem that deals with HTTP and HTTP caching headers (if-modified-since, etc). Currently this is implemented as a wrapper around curb.
  • parsing the XML fetched by the above and returning a collection of objects (feed, entries etc).

I see them as functionally independent, and I don't think the gem should be doing both things. I don't see much value added in having those HTTP client capabilities in the gem, I'd prefer that feedjira just parsed XML and returned feed and entry objects. This way the developer would be free to choose whatever HTTP client implementation he preferred, and feedjira itself would be free of dependencies from HTTP gems. Also this would simplify feedjira's design and make maintenance easier.

This is what I'm doing in my current project. I don't use feedjira's fetching capabilities at all: instead I fetch feeds using RestClient and parse them with Feedjira::Feed.parse. This works fine, but because of the way feedjira is currently implemented I have curb as a transitive dependency in my project, despite it not being used at all. Needless to say I'm not happy having to deploy a gem which never gets invoked.

So my suggestion would be this: remove fetching capabilities from feedjira. Do not bother switching curb with another gem, just remove the HTTP wrapper. Some examples in the documentation showing how to use some of the popular HTTP gems (RestClient, curb, even plain Net::HTTP) and how to pass the response body to feedjira for parsing would be enough. Or at least consider moving HTTP client features to a separate project, so that feedjira can be used without dragging the curb dependency.

Contributor

amatriain commented Apr 15, 2014

I suspect I'll be in the minority, but I'll go ahead anyway: I think feedjira attempts to do too much. Its features can be classified into:

  • fetching and updating feeds. This is the part of the gem that deals with HTTP and HTTP caching headers (if-modified-since, etc). Currently this is implemented as a wrapper around curb.
  • parsing the XML fetched by the above and returning a collection of objects (feed, entries etc).

I see them as functionally independent, and I don't think the gem should be doing both things. I don't see much value added in having those HTTP client capabilities in the gem, I'd prefer that feedjira just parsed XML and returned feed and entry objects. This way the developer would be free to choose whatever HTTP client implementation he preferred, and feedjira itself would be free of dependencies from HTTP gems. Also this would simplify feedjira's design and make maintenance easier.

This is what I'm doing in my current project. I don't use feedjira's fetching capabilities at all: instead I fetch feeds using RestClient and parse them with Feedjira::Feed.parse. This works fine, but because of the way feedjira is currently implemented I have curb as a transitive dependency in my project, despite it not being used at all. Needless to say I'm not happy having to deploy a gem which never gets invoked.

So my suggestion would be this: remove fetching capabilities from feedjira. Do not bother switching curb with another gem, just remove the HTTP wrapper. Some examples in the documentation showing how to use some of the popular HTTP gems (RestClient, curb, even plain Net::HTTP) and how to pass the response body to feedjira for parsing would be enough. Or at least consider moving HTTP client features to a separate project, so that feedjira can be used without dragging the curb dependency.

@swanson

This comment has been minimized.

Show comment
Hide comment
@swanson

swanson Apr 19, 2014

Contributor

I see the primary use-case as such (I personally think defining this helps give "vision" and gives us something to fall back on when debating features):

Given the URL of any type of RSS feed, return a normalized Ruby representation of the feed and stories.

As per of other discussions about new_stories - I would shed the portions of the codebase that handle that. If feedjira cannot reliably detect new items, then this feature is not very useful and potentially a source of headache for both users (you could feel mislead) and maintainers (who have to debug individual mal-formed feeds).

I like having the HTTP client built-in - less stuff I have to do. Maybe the solution for @amatriain's concerns is to make the client pluggable (though not sure if this would completely remove the need for the curb/faraday gem).


I'll add my thoughts as a consumer of the gem:

  • Change HTTP library: not of personal interest to me, don't use jRuby.
  • Forget concurrency: +1
  • More Objects: doesn't affect consumers, but if it aids in development +1
  • Fewer parsers: I'd have to see how this shakes out, but I'd like to not write custom parsers - I am taking a dependency of feedjira to handle this for me. Unless this is really an edge-case for very strange feeds, I would be -1 on this
  • Custom parser project: see above
  • Exceptions vs callbacks: +1, I don't think callbacks fits unless you are doing some concurrency stuff
  • Feed updating is business logic: see my above comments
  • Configuration: +1, seems reasonable
Contributor

swanson commented Apr 19, 2014

I see the primary use-case as such (I personally think defining this helps give "vision" and gives us something to fall back on when debating features):

Given the URL of any type of RSS feed, return a normalized Ruby representation of the feed and stories.

As per of other discussions about new_stories - I would shed the portions of the codebase that handle that. If feedjira cannot reliably detect new items, then this feature is not very useful and potentially a source of headache for both users (you could feel mislead) and maintainers (who have to debug individual mal-formed feeds).

I like having the HTTP client built-in - less stuff I have to do. Maybe the solution for @amatriain's concerns is to make the client pluggable (though not sure if this would completely remove the need for the curb/faraday gem).


I'll add my thoughts as a consumer of the gem:

  • Change HTTP library: not of personal interest to me, don't use jRuby.
  • Forget concurrency: +1
  • More Objects: doesn't affect consumers, but if it aids in development +1
  • Fewer parsers: I'd have to see how this shakes out, but I'd like to not write custom parsers - I am taking a dependency of feedjira to handle this for me. Unless this is really an edge-case for very strange feeds, I would be -1 on this
  • Custom parser project: see above
  • Exceptions vs callbacks: +1, I don't think callbacks fits unless you are doing some concurrency stuff
  • Feed updating is business logic: see my above comments
  • Configuration: +1, seems reasonable
@swanson

This comment has been minimized.

Show comment
Hide comment
@swanson

swanson Apr 19, 2014

Contributor

Just a couple random ideas:

  • Auto-detect feeds from non-RSS urls (removes need to also use feedbag/columbus gem for discovery)
  • OPML import/export? Seems like it might be out of scope, but I couldn't find a decent gem for this
  • Increase/bring up to date the sample feeds for integration testing - would be great to say that feedjira has rock-solid parsing of the Top X feeds by popularity
  • Some way for the gem to "phone home" and record feeds that didn't parse correctly - probably not feasible but would be awesome if the gem could detect malformed feeds and record them somewhere to be fixed later.
  • Better documentation of the normalized feed objects - seems like lots of stuff is aliased, would be good to the guaranteed attributes clearly defined (A Story has these fields: X, Y, Z - remove public accessors for non-standard attrs)
  • Better documentation on options: timeout, max_redirects, user_agent were all fields that I needed to configure down the line, but didn't even know existed when I first used the gem
  • Support for authenticated feeds? Not sure what this actually entails, but did receive a few requests for this on Stringer - example: authenticated feeds for paid screencasts
  • Support for audio/video podcast feeds?
  • Auto-fix relative urls - some feeds use relative links ("/blog/a-different-post") that should usually be expanded to "example.com/blog/a-different-post" (here is our implementation: https://github.com/swanson/stringer/blob/master/app/repositories/story_repository.rb#L92)
  • I don't love the name fetch_and_parse - if I am giving feedjira a URL I am expecting the fetching to happen, otherwise how would it parse? Plus the name is hard to type ;)
Contributor

swanson commented Apr 19, 2014

Just a couple random ideas:

  • Auto-detect feeds from non-RSS urls (removes need to also use feedbag/columbus gem for discovery)
  • OPML import/export? Seems like it might be out of scope, but I couldn't find a decent gem for this
  • Increase/bring up to date the sample feeds for integration testing - would be great to say that feedjira has rock-solid parsing of the Top X feeds by popularity
  • Some way for the gem to "phone home" and record feeds that didn't parse correctly - probably not feasible but would be awesome if the gem could detect malformed feeds and record them somewhere to be fixed later.
  • Better documentation of the normalized feed objects - seems like lots of stuff is aliased, would be good to the guaranteed attributes clearly defined (A Story has these fields: X, Y, Z - remove public accessors for non-standard attrs)
  • Better documentation on options: timeout, max_redirects, user_agent were all fields that I needed to configure down the line, but didn't even know existed when I first used the gem
  • Support for authenticated feeds? Not sure what this actually entails, but did receive a few requests for this on Stringer - example: authenticated feeds for paid screencasts
  • Support for audio/video podcast feeds?
  • Auto-fix relative urls - some feeds use relative links ("/blog/a-different-post") that should usually be expanded to "example.com/blog/a-different-post" (here is our implementation: https://github.com/swanson/stringer/blob/master/app/repositories/story_repository.rb#L92)
  • I don't love the name fetch_and_parse - if I am giving feedjira a URL I am expecting the fetching to happen, otherwise how would it parse? Plus the name is hard to type ;)
@benubois

This comment has been minimized.

Show comment
Hide comment
@benubois

benubois Apr 19, 2014

Contributor

@jonallured I'm super happy that you've taken over as maintainer. Feedjira is a great project and it sounds like there is a bright future ahead!

Change HTTP Library

I agree with @amatriain about making the HTTP library pluggable or leaving it out entirely.

The main benefit I get from Feedjira is as a feed parser. Including an HTTP library makes it easy to get started and for updating feeds, but if the latter is no longer on the table a BYO HTTP solution makes more sense.

Forget Concurrency

+1 for this not being the reality. It's worked better for me to process feeds one at a time and leave the concurrency to Sidekiq.

Fewer Parsers

I think the notion of a parser only makes sense if it helps to write the code that parses the feed, but this is an implementation detail users of Feedjira shouldn't have to worry about.

The most helpful representation of a feed would be to provide the data that is present in the feed in a consistent manner like Python's feedparser.

The way parsers are set up now don't really reflect how people write RSS/Atom.

People frequently mix in atom tags in RSS i.e. <atom:link> etc. Also something like iTunes RSS is RSS 2.0. with extra namespaced tags.

Raise Exceptions Instead of Callbacks

Getting rid of callbacks sounds fine, but I don't like the sound of exceptions.

I think exceptions should only be used if something truly exceptional happens.

HTTP error codes are too common to be treated as exceptions. I'd rather see a consistent object returned like:

OpenStruct.new({
    http_request: '...',
    http_response: '...',
    feed: '...'
})

This way in the application you could check for a parsed feed:

if result.feed.present?
  # do stuff with the feed
    result.feed.entries.each...
else
  # see what went wrong
    result.http_response.code...
end

Feed Updating is Business Logic

👍

I never really tried Feedjira's feed updating implementation.

I've done OK with generating ids based on entry attributes that should't change. These ids go into Redis. This way during a refresh the code checks if the key exists in Redis. If not it's probably a new article and gets inserted into the database.

If a guid is present something like sha1(feed_url + guid) works pretty well.

If not sha1(feed_url + entry_url + published + title)

Configuration

👍

Contributor

benubois commented Apr 19, 2014

@jonallured I'm super happy that you've taken over as maintainer. Feedjira is a great project and it sounds like there is a bright future ahead!

Change HTTP Library

I agree with @amatriain about making the HTTP library pluggable or leaving it out entirely.

The main benefit I get from Feedjira is as a feed parser. Including an HTTP library makes it easy to get started and for updating feeds, but if the latter is no longer on the table a BYO HTTP solution makes more sense.

Forget Concurrency

+1 for this not being the reality. It's worked better for me to process feeds one at a time and leave the concurrency to Sidekiq.

Fewer Parsers

I think the notion of a parser only makes sense if it helps to write the code that parses the feed, but this is an implementation detail users of Feedjira shouldn't have to worry about.

The most helpful representation of a feed would be to provide the data that is present in the feed in a consistent manner like Python's feedparser.

The way parsers are set up now don't really reflect how people write RSS/Atom.

People frequently mix in atom tags in RSS i.e. <atom:link> etc. Also something like iTunes RSS is RSS 2.0. with extra namespaced tags.

Raise Exceptions Instead of Callbacks

Getting rid of callbacks sounds fine, but I don't like the sound of exceptions.

I think exceptions should only be used if something truly exceptional happens.

HTTP error codes are too common to be treated as exceptions. I'd rather see a consistent object returned like:

OpenStruct.new({
    http_request: '...',
    http_response: '...',
    feed: '...'
})

This way in the application you could check for a parsed feed:

if result.feed.present?
  # do stuff with the feed
    result.feed.entries.each...
else
  # see what went wrong
    result.http_response.code...
end

Feed Updating is Business Logic

👍

I never really tried Feedjira's feed updating implementation.

I've done OK with generating ids based on entry attributes that should't change. These ids go into Redis. This way during a refresh the code checks if the key exists in Redis. If not it's probably a new article and gets inserted into the database.

If a guid is present something like sha1(feed_url + guid) works pretty well.

If not sha1(feed_url + entry_url + published + title)

Configuration

👍

@jonallured

This comment has been minimized.

Show comment
Hide comment
@jonallured

jonallured Apr 20, 2014

Member

Thanks so much for the great feedback - I really appreciate you guys spending some time writing out your thoughts! 😄

Member

jonallured commented Apr 20, 2014

Thanks so much for the great feedback - I really appreciate you guys spending some time writing out your thoughts! 😄

@gregf

This comment has been minimized.

Show comment
Hide comment
@gregf

gregf Apr 20, 2014

As far as weeding out what features should be part of the core, what about have feedjira-core, and feedjira-more like datamapper and others have done. Keep core stripped down, only parses feeds. Everything else could be stuck in feedjira-more, possibly have it rely more on community contributions.

gregf commented Apr 20, 2014

As far as weeding out what features should be part of the core, what about have feedjira-core, and feedjira-more like datamapper and others have done. Keep core stripped down, only parses feeds. Everything else could be stuck in feedjira-more, possibly have it rely more on community contributions.

@jonallured

This comment has been minimized.

Show comment
Hide comment
@jonallured

jonallured Apr 25, 2014

Member

@gregf interesting idea - I hadn't thought of that for this project. Will ponder! :)

Member

jonallured commented Apr 25, 2014

@gregf interesting idea - I hadn't thought of that for this project. Will ponder! :)

@speedmax

This comment has been minimized.

Show comment
Hide comment
@speedmax

speedmax Jun 18, 2014

Given RSS might contain any sort of extensions (MediaRSS, GeoRSS... etc.)

People frequently mix in atom tags in RSS i.e. atom:link etc. Also something like iTunes RSS is RSS 2.0. with extra namespaced tags.

I'd like to propose a concept of "Extensions" where we start with base RSS/Atom parsers, we mix-in support for itunes, georss, itunes with a simple detection method.

All extensions can be moved to a feedjira-extensions gem

Now, Feedjira's parser becomes a composable. I believe this fits nicely with design goal of XML syndication.

lib/parsers/
    /rss.rb
    /atom.rb
lib/extensions/
    itunes.rb
    feedburner.rb
    media_rss.rb
    geo_rss.rb
    google_docs.rb
# extensions/media_rss.rb
module MediaRSS
    def self.detect?(source)
       source.match(/xmlns:media/)
    end

    def self.extended(base)
       base.instance_eval do
          element "media:content", as: :image, with: { medium: "image" }
          element "media:content", as: :video, with: { medium: "video" }
           # ...
        end
    end
end

speedmax commented Jun 18, 2014

Given RSS might contain any sort of extensions (MediaRSS, GeoRSS... etc.)

People frequently mix in atom tags in RSS i.e. atom:link etc. Also something like iTunes RSS is RSS 2.0. with extra namespaced tags.

I'd like to propose a concept of "Extensions" where we start with base RSS/Atom parsers, we mix-in support for itunes, georss, itunes with a simple detection method.

All extensions can be moved to a feedjira-extensions gem

Now, Feedjira's parser becomes a composable. I believe this fits nicely with design goal of XML syndication.

lib/parsers/
    /rss.rb
    /atom.rb
lib/extensions/
    itunes.rb
    feedburner.rb
    media_rss.rb
    geo_rss.rb
    google_docs.rb
# extensions/media_rss.rb
module MediaRSS
    def self.detect?(source)
       source.match(/xmlns:media/)
    end

    def self.extended(base)
       base.instance_eval do
          element "media:content", as: :image, with: { medium: "image" }
          element "media:content", as: :video, with: { medium: "video" }
           # ...
        end
    end
end
@jonallured

This comment has been minimized.

Show comment
Hide comment
@jonallured

jonallured Jun 18, 2014

Member

Thanks @speedmax, this is a pretty cool idea!

Member

jonallured commented Jun 18, 2014

Thanks @speedmax, this is a pretty cool idea!

@jacobwilliams918

This comment has been minimized.

Show comment
Hide comment
@jacobwilliams918

jacobwilliams918 Aug 1, 2014

I don't know if this is already supported and I'm just not using it properly, or if it isn't supported yet, but I can't seem to access the summary images of a wordpress blog from the parsed RSS feed. I see that there is an attr accessor on image for the entry object, but it is always set to nil, despite the fact that title/summary images are present on the blog.

Again, I suspect that this might just be a failure to set something up properly on my end, but I didn't see any support group contact information, so I figured I'd ask here.

Thanks!

jacobwilliams918 commented Aug 1, 2014

I don't know if this is already supported and I'm just not using it properly, or if it isn't supported yet, but I can't seem to access the summary images of a wordpress blog from the parsed RSS feed. I see that there is an attr accessor on image for the entry object, but it is always set to nil, despite the fact that title/summary images are present on the blog.

Again, I suspect that this might just be a failure to set something up properly on my end, but I didn't see any support group contact information, so I figured I'd ask here.

Thanks!

@swanson

This comment has been minimized.

Show comment
Hide comment
@swanson

swanson Aug 1, 2014

Contributor

Related to: swanson/stringer#324

It looks like at some point in the past, handling gzip by default was turned off (not sure if this reflects the current state: http://www.pauldix.net/2009/04/small-feedzirra-release.html). Would it be worth re-evaluating if it should be on by default going forward?

Contributor

swanson commented Aug 1, 2014

Related to: swanson/stringer#324

It looks like at some point in the past, handling gzip by default was turned off (not sure if this reflects the current state: http://www.pauldix.net/2009/04/small-feedzirra-release.html). Would it be worth re-evaluating if it should be on by default going forward?

@subelsky

This comment has been minimized.

Show comment
Hide comment
@subelsky

subelsky Aug 7, 2014

my wish list would include:

  • Change HTTP library to one that vcr can mock for easier integration testing
  • Ability to pass in an IO object to Feedjira::Feed.parse - I want to do something like: Feedjiri::Feed.parse(URI.parse(url)) and have you stream parse it on the fly. (I've got some good Nokogiri SAX code that's related, if you're interested in seeing it)

subelsky commented Aug 7, 2014

my wish list would include:

  • Change HTTP library to one that vcr can mock for easier integration testing
  • Ability to pass in an IO object to Feedjira::Feed.parse - I want to do something like: Feedjiri::Feed.parse(URI.parse(url)) and have you stream parse it on the fly. (I've got some good Nokogiri SAX code that's related, if you're interested in seeing it)
@jonallured

This comment has been minimized.

Show comment
Hide comment
@jonallured

jonallured Aug 11, 2014

Member

Hi @subelsky, I'll definitely be changed the HTTP library, so hopefully that'll help on the testing front.

As for the stream parse, I'd love to see this code you mentioned, but isn't this what Sax Machine gives us?

Member

jonallured commented Aug 11, 2014

Hi @subelsky, I'll definitely be changed the HTTP library, so hopefully that'll help on the testing front.

As for the stream parse, I'd love to see this code you mentioned, but isn't this what Sax Machine gives us?

@subelsky

This comment has been minimized.

Show comment
Hide comment
@subelsky

subelsky Aug 11, 2014

I noticed that after I posted while researching another issue. If we could pass in an openable-IO object then yes it should work perfectly. My code probably not as good as what you have already. thanks!

subelsky commented Aug 11, 2014

I noticed that after I posted while researching another issue. If we could pass in an openable-IO object then yes it should work perfectly. My code probably not as good as what you have already. thanks!

@wallace

This comment has been minimized.

Show comment
Hide comment
@wallace

wallace Dec 27, 2014

+1 for removing concurrency. I was just trying to use webmock in some specs but unfortunately, webmock doesn't stub Curl::Multi, only Curl::Easy, so I can't mock out responses for feedjira. :(

wallace commented Dec 27, 2014

+1 for removing concurrency. I was just trying to use webmock in some specs but unfortunately, webmock doesn't stub Curl::Multi, only Curl::Easy, so I can't mock out responses for feedjira. :(

@jonallured

This comment has been minimized.

Show comment
Hide comment
@jonallured

jonallured Dec 28, 2014

Member

Well folks, I bit off more than I could chew with my plans for Feedjira 2.0 and ended up falling prey to the allure of a big bang rewrite. Rather than try to tackle everything all at once, I've changed my tactic and I'm just working on replacing curb with faraday - I've got this on a branch called 'faraday'.

While I was at it, I removed the concurrency stuff, the update code and simplified the Feedjira::Feed class quite a bit.

I'd love some feedback on this stuff, so please check out the branch!! :)

Member

jonallured commented Dec 28, 2014

Well folks, I bit off more than I could chew with my plans for Feedjira 2.0 and ended up falling prey to the allure of a big bang rewrite. Rather than try to tackle everything all at once, I've changed my tactic and I'm just working on replacing curb with faraday - I've got this on a branch called 'faraday'.

While I was at it, I removed the concurrency stuff, the update code and simplified the Feedjira::Feed class quite a bit.

I'd love some feedback on this stuff, so please check out the branch!! :)

@amatriain

This comment has been minimized.

Show comment
Hide comment
@amatriain

amatriain Jan 8, 2015

Contributor

I've tested my app using the 'faraday' branch of Feedjira. Tests are passing and after some manual testing I've found no issues.

I'm not fetching feeds with this gem though, I only use Feedjira for its parsing capabilities and do the fetching with an HTTP client gem, so take my feedback with a grain of salt. But at least I can confirm that feed parsing is not affected by the switch from curb to faraday.

Contributor

amatriain commented Jan 8, 2015

I've tested my app using the 'faraday' branch of Feedjira. Tests are passing and after some manual testing I've found no issues.

I'm not fetching feeds with this gem though, I only use Feedjira for its parsing capabilities and do the fetching with an HTTP client gem, so take my feedback with a grain of salt. But at least I can confirm that feed parsing is not affected by the switch from curb to faraday.

@jonallured

This comment has been minimized.

Show comment
Hide comment
@jonallured

jonallured Jan 25, 2015

Member

Thanks for the feedback @amatriain! If anyone else has anything for me on this - now is your chance. My plan is to pull the faraday branch into master this week and in the near future cut a 2.0 release.

Thanks!
Jon

Member

jonallured commented Jan 25, 2015

Thanks for the feedback @amatriain! If anyone else has anything for me on this - now is your chance. My plan is to pull the faraday branch into master this week and in the near future cut a 2.0 release.

Thanks!
Jon

@meliho

This comment has been minimized.

Show comment
Hide comment
@meliho

meliho Apr 19, 2015

What about quality of what gets parsed? Often times there are relative links that don't render properly. We don't get any styles with the content (which is a limitation of the feeds), but it can break how things are parsed.

If there are ways to improve the quality of the output, then I'm happy to volunteer to help with those pieces.

meliho commented Apr 19, 2015

What about quality of what gets parsed? Often times there are relative links that don't render properly. We don't get any styles with the content (which is a limitation of the feeds), but it can break how things are parsed.

If there are ways to improve the quality of the output, then I'm happy to volunteer to help with those pieces.

@jonallured

This comment has been minimized.

Show comment
Hide comment
@jonallured

jonallured Jun 5, 2015

Member

OMG, I just released version 2.0!!

http://feedjira.com/blog/2015/06/05/feedjira-two-point-oh.html

Thanks for everyone's input and help with testing!! ❤️ ❤️ ❤️

Member

jonallured commented Jun 5, 2015

OMG, I just released version 2.0!!

http://feedjira.com/blog/2015/06/05/feedjira-two-point-oh.html

Thanks for everyone's input and help with testing!! ❤️ ❤️ ❤️

@jonallured jonallured closed this Jun 5, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment