Update README.md
mmcdole committed Apr 14, 2016
1 parent a8765af commit 37f532b
Showing 1 changed file with 31 additions and 27 deletions.
58 changes: 31 additions & 27 deletions README.md
@@ -2,7 +2,7 @@

[![Build Status](https://travis-ci.org/mmcdole/gofeed.svg?branch=master)](https://travis-ci.org/mmcdole/gofeed) [![Coverage Status](https://coveralls.io/repos/github/mmcdole/gofeed/badge.svg?branch=master)](https://coveralls.io/github/mmcdole/gofeed?branch=master) [![Go Report Card](https://goreportcard.com/badge/github.com/mmcdole/gofeed)](https://goreportcard.com/report/github.com/mmcdole/gofeed) [![](https://godoc.org/github.com/mmcdole/gofeed?status.svg)](http://godoc.org/github.com/mmcdole/gofeed) [![License](http://img.shields.io/:license-mit-blue.svg)](http://doge.mit-license.org)

The `gofeed` library is a robust feed parser that supports parsing both [RSS](https://en.wikipedia.org/wiki/RSS) and [Atom](https://en.wikipedia.org/wiki/Atom_(standard)) feeds. You can use the universal ```gofeed.Parser``` that will detect the feed type, parse the feed and then normalize either types of feeds into a hybrid ```gofeed.Feed``` representation. You also have the option of parsing them into their respective ```atom.Feed``` and ```rss.Feed``` representations using the feed specific ```atom.Parser``` or ```rss.Parser```.
The `gofeed` library is a robust feed parser that supports parsing both [RSS](https://en.wikipedia.org/wiki/RSS) and [Atom](https://en.wikipedia.org/wiki/Atom_(standard)) feeds. The universal ```gofeed.Parser``` will parse and convert all feed types into a hybrid ```gofeed.Feed``` model. You also have the option of parsing them into their respective ```atom.Feed``` and ```rss.Feed``` models using the feed specific ```atom.Parser``` or ```rss.Parser```.

##### Supported feed types:
* RSS 0.90
@@ -32,19 +32,19 @@ It also provides support for parsing several popular extension modules, includin

##### How does the universal feed parser work?

The universal `gofeed.Parser` works in 3 stages: detection, parsing and translating. It first detects the feed type that it is currently parsing. Then, it parses the feed into its true representation which will be either a `rss.Feed` or `atom.Feed` using their respective feed specific parsers. These models cover every field possible for their feed types. They are finally *translated* into a generic `gofeed.Feed` model that is a hybrid of both feed types. Most feed parsing libraries will parse and translate to a universal model in a single pass. However, by doing it in several stages it allows for more flexibility and keeps the code base more maintainable by separating RSS and Atom parsing into separate packages.
The universal `gofeed.Parser` works in 3 stages: detection, parsing and translation. It first detects the feed type that it is currently parsing. Then, it uses a feed specific parser to parse the feed into its true representation which will be either a `rss.Feed` or `atom.Feed`. These models cover every field possible for their feed types. They are finally *translated* into a `gofeed.Feed` model that is a hybrid of both feed types. Most feed parsing libraries will parse a feed directly into a hybrid model in a single pass. However, doing it in several stages allows for more flexibility and keeps the code base more maintainable by separating RSS and Atom parsing into separate packages.

![Diagram](https://raw.githubusercontent.com/mmcdole/gofeed/master/docs/sequence.png)

The translation step is done by any object which adheres to the `gofeed.Translator` interface. By default the `DefaultRSSTranslator` and `DefaultAtomTranslator` are used behind the scenes when you use `gofeed.Parser` with its default settings. You can see how they translate fields from ```atom.Feed``` or ```rss.Feed``` to the universal ```gofeed.Feed``` struct in the [Default Mappings](#default-mappings) section. However, should you disagree with the way certain fields are translated you can easily supply your own `gofeed.Translator` and override this behavior. See the [Advanced Usage](#advanced-usage) section for an example of how to do this.
The translation step is done by anything which adheres to the `gofeed.Translator` interface. By default the `DefaultRSSTranslator` and `DefaultAtomTranslator` are used behind the scenes when you use `gofeed.Parser` with its default settings. You can see how they translate fields from ```atom.Feed``` or ```rss.Feed``` to the universal ```gofeed.Feed``` struct in the [Default Mappings](#default-mappings) section. However, should you disagree with the way certain fields are translated you can easily supply your own `gofeed.Translator` and override this behavior. See the [Advanced Usage](#advanced-usage) section for an example of how to do this.
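
The three stages can be sketched in a short, self-contained Go program. This is only an illustration and not the actual `gofeed` implementation: the `detectFeedType` and `parse` functions, the trimmed-down `rssFeed`, `atomFeed` and `feed` structs, and the lowercase `translator` interface are all hypothetical stand-ins built on the standard library's `encoding/xml` package.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"strings"
)

// feed is a hypothetical stand-in for the hybrid model (far smaller than gofeed.Feed).
type feed struct {
	Title       string
	Description string
}

// rssFeed and atomFeed are hypothetical stand-ins for the format specific models.
type rssFeed struct {
	Title       string `xml:"channel>title"`
	Description string `xml:"channel>description"`
}

type atomFeed struct {
	Title    string `xml:"title"`
	Subtitle string `xml:"subtitle"`
}

// translator converts a format specific model into the hybrid model (stage 3).
type translator interface {
	Translate(src interface{}) (*feed, error)
}

type rssTranslator struct{}

func (rssTranslator) Translate(src interface{}) (*feed, error) {
	r, ok := src.(*rssFeed)
	if !ok {
		return nil, errors.New("expected *rssFeed")
	}
	return &feed{Title: r.Title, Description: r.Description}, nil
}

type atomTranslator struct{}

func (atomTranslator) Translate(src interface{}) (*feed, error) {
	a, ok := src.(*atomFeed)
	if !ok {
		return nil, errors.New("expected *atomFeed")
	}
	return &feed{Title: a.Title, Description: a.Subtitle}, nil
}

// detectFeedType is stage 1: it reads tokens up to the root element and inspects its name.
func detectFeedType(data string) string {
	dec := xml.NewDecoder(strings.NewReader(data))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "unknown"
		}
		if start, ok := tok.(xml.StartElement); ok {
			switch start.Name.Local {
			case "rss", "RDF":
				return "rss"
			case "feed":
				return "atom"
			}
			return "unknown"
		}
	}
}

// parse runs detection, format specific parsing (stage 2) and translation in order.
func parse(data string) (*feed, error) {
	switch detectFeedType(data) {
	case "rss":
		var r rssFeed
		if err := xml.Unmarshal([]byte(data), &r); err != nil {
			return nil, err
		}
		var t translator = rssTranslator{}
		return t.Translate(&r)
	case "atom":
		var a atomFeed
		if err := xml.Unmarshal([]byte(data), &a); err != nil {
			return nil, err
		}
		var t translator = atomTranslator{}
		return t.Translate(&a)
	}
	return nil, errors.New("unknown feed type")
}

func main() {
	f, err := parse(`<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title><subtitle>An Atom feed</subtitle></feed>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(f.Title, "-", f.Description) // prints: Example - An Atom feed
}
```

Because the translators sit behind an interface, swapping in a different translation for one format does not require touching detection or parsing, which is the flexibility the staged design is aiming for.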

##### When would I want to use the feed specific parsers?

If the fields in the hybrid `gofeed.Feed` structs that the universal `gofeed.Parser` produces do not contain a field from the `atom.Feed` or `rss.Feed` structs that you require, it might be beneficial to use the feed type specific parsers, either `atom.Parser` or `rss.Parser`, so you can get access to all of their original fields. It is also marginally faster because you are able to skip the translation step.
If the hybrid `gofeed.Feed` model that the universal `gofeed.Parser` produces does not contain a field from the `atom.Feed` or `rss.Feed` model that you require, it might be beneficial to use the feed type specific parsers. When using the `atom.Parser` or `rss.Parser` directly, you can access all of the fields found in the `atom.Feed` and `rss.Feed` models. It is also marginally faster because you are able to skip the translation step.

However, for the *vast* majority of users, the `gofeed.Parser` is the best way for them to parse feeds. This allows users of the `gofeed` library to not care about the differences between RSS and Atom feeds.
However, for the *vast* majority of users, the universal `gofeed.Parser` is the best way to parse feeds. This allows users of the `gofeed` library to not care about the differences between RSS and Atom feeds.

##### How are broken feeds handled?
##### How are invalid feeds handled?

A best-effort attempt is made at parsing broken and invalid XML feeds. Currently, `gofeed` can successfully parse feeds with the following issues:
- Unescaped/Naked Markup in feed elements
@@ -58,7 +58,7 @@ A best-effort attempt is made at parsing broken and invalid XML feeds. Currentl

#### Universal Feed Parser

The most common usage scenario will be to use ```gofeed.Parser``` to parse an arbitrary RSS or Atom feed into the hybrid ```gofeed.Feed```. This is useful for when you don't know what feed type your feeds will be ahead of time. This hybrid struct has a lot of the common properties between the two formats (but does not have all the properties). See the [default mappings](#default-mappings) section for more details.
The most common usage scenario will be to use ```gofeed.Parser``` to parse an arbitrary RSS or Atom feed into the hybrid ```gofeed.Feed``` model. This is useful for when you don't know what feed type your feeds will be ahead of time. To see how Atom or RSS fields translate into the `gofeed.Feed` model, see the [default mappings](#default-mappings) section.

##### Parse a feed from a URL:

@@ -93,7 +93,7 @@ fmt.Println(feed.Title)

#### Feed Specific Parsers

If you know in advance that you will be parsing an RSS or Atom feed it can sometimes be desirable to utilize the ```atom.Parser``` or the ```rss.Parser``` directly. Not only will they parse the feed more efficiently but they also expose all fields of their respective feed formats (some of which will be missing from the universal ```gofeed.Feed```).
If you know in advance that you will be parsing an RSS or Atom feed, or if there is some specific field not covered by the hybrid `gofeed.Feed` struct, it can sometimes be desirable to utilize the ```atom.Parser``` or the ```rss.Parser``` directly. Not only will they parse the feed more efficiently but they also expose all fields of their respective feed formats by producing `atom.Feed` and `rss.Feed` models.

##### Parse an RSS feed into a `rss.Feed`

@@ -125,7 +125,9 @@ fmt.Println(atomFeed.Subtitle)

The mappings and precedence order that are outlined in the [Default Mappings](#default-mappings) section are provided by the following two structs: `DefaultRSSTranslator` and `DefaultAtomTranslator`. If you have fields that you think should have a different precedence, or if you want to make a translator that is aware of an unsupported extension you can do this by specifying your own RSS or Atom translator when using the `gofeed.Parser`.

Here is a simple example of creating a custom `Translator` that makes the `/rss/channel/itunes:author` extension field have a higher precedence than the `/rss/channel/managingEditor` field in RSS feeds. We will wrap the existing `DefaultRSSTranslator` since we only want to change the behavior for a single field.
Here is a simple example of creating a custom `Translator` that makes the `/rss/channel/itunes:author` field have a higher precedence than the `/rss/channel/managingEditor` field in RSS feeds. We will wrap the existing `DefaultRSSTranslator` since we only want to change the behavior for a single field.

First we must define a custom translator:

```go
type MyCustomTranslator struct {
@@ -135,8 +137,8 @@ type MyCustomTranslator struct {
func NewMyCustomTranslator() *MyCustomTranslator {
t := &MyCustomTranslator{}

// We create a DefaultRSSTranslator internally so we can wrap its call
// since we only want to modify the precedence for a single field.
// We create a DefaultRSSTranslator internally so we can wrap its Translate
// call since we only want to modify the precedence for a single field.
t.defaultTranslator = &DefaultRSSTranslator{}
return t
}
@@ -159,20 +161,22 @@ func (ct* MyCustomTranslator) Translate(feed interface{}) (*Feed, error) {
}
return f, nil
}
```

Next you must configure your `gofeed.Parser` to utilize the new `gofeed.Translator`:

func main() {
feedData := `<rss version="2.0">
<channel>
<managingEditor>Ender Wiggin</managingEditor>
<itunes:author>Valentine Wiggin</itunes:author>
</channel>
</rss>`
```go
feedData := `<rss version="2.0">
<channel>
<managingEditor>Ender Wiggin</managingEditor>
<itunes:author>Valentine Wiggin</itunes:author>
</channel>
</rss>`

fp := gofeed.NewParser()
fp.RSSTrans = NewMyCustomTranslator()
feed, _ := fp.ParseString(feedData)
fmt.Println(feed.Author) // Valentine Wiggin
}
fp := gofeed.NewParser()
fp.RSSTrans = NewMyCustomTranslator()
feed, _ := fp.ParseString(feedData)
fmt.Println(feed.Author) // Valentine Wiggin
```

## Extensions
@@ -186,7 +190,7 @@ In addition to the generic handling of extensions, `gofeed` also has built in su
The ```DefaultRSSTranslator``` and the ```DefaultAtomTranslator``` map the following ```rss.Feed``` and ```atom.Feed``` fields to their respective ```gofeed.Feed``` fields. They are listed in order of precedence (highest to lowest):


Feed | RSS | Atom
`gofeed.Feed` | RSS | Atom
--- | --- | ---
Title | /rss/channel/title<br>/rdf:RDF/channel/title<br>/rss/channel/dc:title<br>/rdf:RDF/channel/dc:title | /feed/title
Description | /rss/channel/description<br>/rdf:RDF/channel/description<br>/rss/channel/itunes:subtitle | /feed/subtitle<br>/feed/tagline
@@ -202,7 +206,7 @@ Generator | /rss/channel/generator | /feed/generator
Categories | /rss/channel/category<br>/rss/channel/itunes:category<br>/rss/channel/itunes:keywords<br>/rss/channel/dc:subject<br>/rdf:RDF/channel/dc:subject | /feed/category
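
As a rough illustration of what "order of precedence" means in the table above, the sketch below returns the first non-blank candidate from a list ordered highest to lowest. The `firstNonEmpty` and `translateDescription` functions and the `channel` struct are hypothetical and are not part of `gofeed`; the candidate order loosely mirrors the RSS column of the Description row.

```go
package main

import (
	"fmt"
	"strings"
)

// channel is a hypothetical, trimmed-down RSS channel model.
type channel struct {
	Description    string // /rss/channel/description
	ITunesSubtitle string // /rss/channel/itunes:subtitle
}

// firstNonEmpty returns the first candidate that is not blank after trimming,
// or the empty string when every candidate is blank.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if s := strings.TrimSpace(c); s != "" {
			return s
		}
	}
	return ""
}

// translateDescription lists candidates from highest to lowest precedence.
func translateDescription(ch channel) string {
	return firstNonEmpty(ch.Description, ch.ITunesSubtitle)
}

func main() {
	// Only the lower precedence field is set, so it is used.
	fmt.Println(translateDescription(channel{ITunesSubtitle: "A podcast about feeds"}))
	// Both fields are set, so the higher precedence one wins.
	fmt.Println(translateDescription(channel{Description: "Main description", ITunesSubtitle: "ignored"}))
}
```

Changing the precedence for a field then amounts to reordering the arguments, which is the kind of adjustment a custom translator makes.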


Item | RSS | Atom
`gofeed.Item` | RSS | Atom
--- | --- | ---
Title | /rss/channel/item/title<br>/rdf:RDF/item/title<br>/rdf:RDF/item/dc:title<br>/rss/channel/item/dc:title | /feed/entry/title
Description | /rss/channel/item/description<br>/rdf:RDF/item/description<br>/rss/channel/item/dc:description<br>/rdf:RDF/item/dc:description | /feed/entry/summary
@@ -228,7 +232,7 @@ This project is licensed under the [MIT License](https://raw.githubusercontent.c

## Credits

* [Mark Pilgrim](https://en.wikipedia.org/wiki/Mark_Pilgrim) for his work on the excellent [Universal Feed Parser](https://github.com/kurtmckee/feedparser) Python library. This library was referenced several times during the development of `gofeed`. Its unit test cases were also ported to `gofeed` project as well.
* [Dan MacTough](http://blog.mact.me) for his work on [node-feedparser](https://github.com/danmactough/node-feedparser). It provided inspiration for the `gofeed.Feed` properties.
* [Mark Pilgrim](https://en.wikipedia.org/wiki/Mark_Pilgrim) for his work on the excellent [Universal Feed Parser](https://github.com/kurtmckee/feedparser) Python library. This library was referenced several times during the development of `gofeed`. Many of its unit test cases were also ported to the `gofeed` project.
* [Dan MacTough](http://blog.mact.me) for his work on [node-feedparser](https://github.com/danmactough/node-feedparser). It provided inspiration for the set of fields that should be covered in the hybrid `gofeed.Feed` model.
* [Matt Jibson](https://mattjibson.com/) for his date parsing function in the [goread](https://github.com/mjibson/goread) project.
* [Jim Teeuwen](https://github.com/jteeuwen) for his method of representing arbitrary feed extensions in the [go-pkg-rss](https://github.com/jteeuwen/go-pkg-rss) library.
