## RSS

You can find RSS feeds on many different sites. The [Library of Congress](https://www.loc.gov/rss/) publishes a large number of them. Most blogs and news websites have them too, for example [TechCrunch](https://techcrunch.com/rssfeeds/), the [New York Times](http://www.nytimes.com/services/xml/rss/index.html), and [NPR](https://help.npr.org/customer/portal/articles/2094175-where-can-i-find-npr-rss-feeds-). The [DC Public Library](http://www.dclibrary.org/) even gives you an RSS feed of your [catalog searches](https://catalog.dclibrary.org/client/rss/hitlist/dcpl/qu=python).

Today we are going to take a look at the [Netflix Top 100 DVDs](https://dvd.netflix.com/RSSFeeds). We will use the Python package [FeedParser](https://pypi.python.org/pypi/feedparser) to work with the RSS feed. FeedParser will allow us to deconstruct the data in the feed.

In [None]:
import feedparser
import pandas as pd

In [None]:
RSS_URL = "http://dvd.netflix.com/Top100RSS"

In [None]:
feed = feedparser.parse(RSS_URL)

In [None]:
type(feed)

"parse" is the primary function in FeedParser. The object it returns is dictionary-like and can be handled much like a dictionary. For example, we can look at the keys it contains.

In [None]:
feed.keys()

We will look at some, but not all, of the data stored in the feed. For more information about the keys, see the [documentation](http://pythonhosted.org/feedparser/).

We can use the version to check which type of feed we have.

In [None]:
feed.version
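The version comes back as a short code rather than a readable name, so a small lookup can help when reporting it. The sketch below is only illustrative: `VERSION_NAMES` and `describe_version` are hypothetical names, the table covers just three of the codes FeedParser can report, and it assumes an empty string means the format was not recognized.

```python
# Hypothetical lookup table; it lists only a few of the codes FeedParser can report.
VERSION_NAMES = {
    "rss10": "RSS 1.0",
    "rss20": "RSS 2.0",
    "atom10": "Atom 1.0",
}


def describe_version(code):
    """Turn a feed version code into a readable label."""
    if not code:
        # Assumption: an empty version string means the format was not recognized.
        return "unknown format"
    # Fall back to the raw code for anything missing from the table.
    return VERSION_NAMES.get(code, code)


print(describe_version("rss20"))
print(describe_version(""))
```

With a parsed feed in hand, this would be called as `describe_version(feed.version)`.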

Bozo is an interesting key to know about if you are going to operationalize RSS feed ingestion. FeedParser sets the bozo bit when it detects that a feed is not well-formed. (FeedParser will still try to parse a feed that is not well-formed.)

In [None]:
if feed.bozo == 0:
    print("Well done, you have a well-formed feed!")
else:
    print("Potential trouble ahead.")
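When the bozo bit is set, FeedParser also stores the exception it ran into under the `bozo_exception` key, which is worth logging in an ingestion job. Here is a minimal sketch of that check. The helper name `feed_health` is hypothetical, and plain dictionaries stand in for parsed feeds so the example runs without a network connection.

```python
def feed_health(parsed):
    """Return a short status string for a dictionary-like parsed feed."""
    if not parsed.get("bozo"):
        return "well-formed"
    # bozo_exception is only present when the bozo bit is set.
    reason = parsed.get("bozo_exception", "unknown problem")
    return f"not well-formed: {reason}"


# Stand-in data (illustrative only), shaped like the keys discussed above.
good = {"bozo": 0, "entries": []}
bad = {"bozo": 1, "bozo_exception": ValueError("mismatched tag"), "entries": []}

print(feed_health(good))
print(feed_health(bad))
```

Because the helper only uses `get`, it should work the same way on the object returned by `feedparser.parse`.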

We can look at some of the feed elements through the feed attribute.

In [None]:
print(feed.feed.title)
print(feed.feed.link)
print(feed.feed.description)

The published date is another element we can look at, but not every feed provides it. If it is missing, the attribute lookup below raises an error.

In [None]:
feed.feed.published

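As with dictionaries, we can use the "get" method to look up a key and fall back to a default if it does not exist. This is useful in code that has to run against feeds with different fields. Note that the published date lives under the feed attribute, not at the top level of the parsed result.

In [None]:
feed.feed.get('published', 'N/A')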
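The same idea extends to trying several keys in order, for instance falling back to `updated` when `published` is absent. This is a sketch under assumptions: `first_available` is a hypothetical helper, the choice of fallback keys is illustrative, and a plain dictionary stands in for the feed metadata.

```python
def first_available(mapping, keys, default="N/A"):
    """Return the first non-empty value found under any of the given keys."""
    for key in keys:
        value = mapping.get(key)
        if value:
            return value
    return default


# Stand-in for feed-level metadata (illustrative data only).
channel = {"title": "Example feed", "updated": "Mon, 01 Jan 2018 00:00:00 GMT"}

print(first_available(channel, ["published", "updated"]))
print(first_available(channel, ["author"]))
```

With a real feed, the first argument would be `feed.feed`.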

The items we are looking for are contained in the entries. Given the feed we are working with, how many entries do you think we have?

In [None]:
len(feed.entries)

The items in entries are stored as a list.

In [None]:
feed.entries[0].title

In [None]:
# enumerate yields the position and the entry together, so no manual counter is needed
for i, entry in enumerate(feed.entries):
    print(i, entry.title)

Given that information, what is something we can do with this data? Why not make it a dataframe?

In [None]:
df = pd.DataFrame(feed.entries)

In [None]:
df

In [None]:
df.summary[0]
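Passing the entries straight to the DataFrame constructor keeps every key each entry happens to carry. If only a few columns are wanted, one option is to flatten the entries into simple records first. The sketch below assumes entries expose `title`, `link`, and `summary`. The helper name `entries_to_records` is hypothetical, and plain dictionaries stand in for real entries.

```python
def entries_to_records(entries, fields=("title", "link", "summary")):
    """Keep only the chosen fields from each dictionary-like entry.

    Fields an entry does not have are filled with None.
    """
    return [{field: entry.get(field) for field in fields} for entry in entries]


# Stand-in entries (illustrative data only).
sample_entries = [
    {"title": "First", "link": "http://example.com/1", "summary": "One", "id": "a"},
    {"title": "Second", "link": "http://example.com/2"},
]

records = entries_to_records(sample_entries)
print(records)
```

The resulting list can then be handed to pandas, for example `pd.DataFrame(entries_to_records(feed.entries))`.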