Common RSS (Rich Site Summary) Elements

The most commonly used elements in RSS (Rich Site Summary) feeds (regardless of version) are title, link, description, publication date, and entry ID. The publication date comes from the pubDate element, and the entry ID comes from the guid element.

This sample RSS (Rich Site Summary) feed is at $READTHEDOCS_CANONICAL_URL/examples/rss20.xml.

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <description>For documentation &lt;em&gt;only&lt;/em&gt;</description>
    <link>http://example.org/</link>
    <pubDate>Sat, 07 Sep 2002 00:00:01 GMT</pubDate>
    <!-- other elements omitted from this example -->
    <item>
      <title>First item title</title>
      <link>http://example.org/item/1</link>
      <description>Watch out for &lt;span style="background-image: url(javascript:window.location='http://example.org/')"&gt;nasty tricks&lt;/span&gt;</description>
      <pubDate>Thu, 05 Sep 2002 00:00:01 GMT</pubDate>
      <guid>http://example.org/guid/1</guid>
      <!-- other elements omitted from this example -->
    </item>
  </channel>
</rss>

The channel elements are available in d.feed.

Accessing Common Channel Elements

>>> import feedparser
>>> d = feedparser.parse('$READTHEDOCS_CANONICAL_URL/examples/rss20.xml')
>>> d.feed.title
'Sample Feed'
>>> d.feed.link
'http://example.org/'
>>> d.feed.description
'For documentation <em>only</em>'
>>> d.feed.published
'Sat, 07 Sep 2002 00:00:01 GMT'
>>> d.feed.published_parsed
(2002, 9, 7, 0, 0, 1, 5, 250, 0)
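
The 9-tuple in published_parsed is often more convenient as a datetime or a Unix timestamp. The following is a minimal sketch using only the standard library. It assumes the tuple is expressed in UTC, and it hard-codes the value from the session above rather than fetching the feed; the variable names are illustrative only.

```python
import calendar
from datetime import datetime, timezone

# Value copied from d.feed.published_parsed in the session above.
published_parsed = (2002, 9, 7, 0, 0, 1, 5, 250, 0)

# The first six fields are year, month, day, hour, minute, second.
# Assumption: the tuple is in UTC, so attach the UTC timezone explicitly.
published_dt = datetime(*published_parsed[:6], tzinfo=timezone.utc)

# calendar.timegm() interprets the tuple as UTC; time.mktime() would
# interpret it as local time instead.
published_ts = calendar.timegm(published_parsed)

print(published_dt.isoformat())
print(published_ts)
```

Using calendar.timegm rather than time.mktime keeps the result independent of the local timezone of the machine running the code.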

The items are available in d.entries, which is a list. Items appear in the list in the same order as in the original feed, so the first item is d.entries[0].

Accessing Common Item Elements

>>> import feedparser
>>> d = feedparser.parse('$READTHEDOCS_CANONICAL_URL/examples/rss20.xml')
>>> d.entries[0].title
'First item title'
>>> d.entries[0].link
'http://example.org/item/1'
>>> d.entries[0].description
'Watch out for <span>nasty tricks</span>'
>>> d.entries[0].published
'Thu, 05 Sep 2002 00:00:01 GMT'
>>> d.entries[0].published_parsed
(2002, 9, 5, 0, 0, 1, 3, 248, 0)
>>> d.entries[0].id
'http://example.org/guid/1'
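
To make the element-to-key mapping concrete (pubDate becomes published, guid becomes id), here is a rough standard-library sketch that reads the same item elements from a trimmed copy of the sample. It is not how feedparser works internally: it performs no HTML sanitizing or date parsing, and the names RSS_TO_KEY and item_to_dict are hypothetical helpers introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample item; values mirror the session above.
RSS = """<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <item>
      <title>First item title</title>
      <link>http://example.org/item/1</link>
      <pubDate>Thu, 05 Sep 2002 00:00:01 GMT</pubDate>
      <guid>http://example.org/guid/1</guid>
    </item>
  </channel>
</rss>"""

# Hypothetical mapping from RSS element names to the key names used above.
RSS_TO_KEY = {
    "title": "title",
    "link": "link",
    "description": "description",
    "pubDate": "published",
    "guid": "id",
}


def item_to_dict(item):
    """Collect the common RSS elements of one <item>, skipping missing ones."""
    result = {}
    for tag, key in RSS_TO_KEY.items():
        text = item.findtext(tag)
        if text is not None:
            result[key] = text.strip()
    return result


root = ET.fromstring(RSS)
first = item_to_dict(root.find("channel/item"))
print(first["id"])
print(first["published"])
```

Because the trimmed item has no description element, the resulting dictionary simply omits that key instead of raising an error.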

Tip

You can also access data from RSS (Rich Site Summary) feeds using Atom terminology. See advanced.normalization for details.