## RSS

You can find RSS feeds on many different sites. The [Library of Congress](https://www.loc.gov/rss/) offers a large collection. Most blogs and news websites have them too, for example [TechCrunch](https://techcrunch.com/rssfeeds/), [The New York Times](http://www.nytimes.com/services/xml/rss/index.html), and [NPR](https://help.npr.org/customer/portal/articles/2094175-where-can-i-find-npr-rss-feeds-). The [DC Public Library](http://www.dclibrary.org/) even provides an RSS feed of your [catalog searches](https://catalog.dclibrary.org/client/rss/hitlist/dcpl/qu=python).

Today we are going to look at the [Netflix Top 100 DVDs](https://dvd.netflix.com/RSSFeeds) feed. We will use the Python package [FeedParser](https://pypi.python.org/pypi/feedparser) to work with the RSS feed. FeedParser breaks the feed down into its individual pieces of data so we can explore them.

```python
import feedparser
import pandas as pd

RSS_URL = "http://dvd.netflix.com/Top100RSS"

feed = feedparser.parse(RSS_URL)
```

```python
type(feed)
```

```
feedparser.FeedParserDict
```

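`parse` is the primary function in FeedParser. The object it returns is dictionary-like and can be handled much like a dictionary, and its keys can also be read as attributes (so `feed.feed` is the same as `feed['feed']`). For example, we can look at the keys it contains.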

```python
feed.keys()
```

```
dict_keys(['namespaces', 'version', 'entries', 'href', 'bozo', 'encoding', 'status', 'feed', 'headers'])
```

The top-level `feed` key holds the channel-level metadata:

```python
feed.feed.keys()
```

```
dict_keys(['title', 'subtitle_detail', 'ttl', 'link', 'subtitle', 'language', 'title_detail', 'links', 'cf_treatas'])
```

```python
type(feed.bozo)
```

```
int
```

```python
feed.version
```

```
'rss20'
```

We can use the version to check which type of feed we have; `'rss20'` means RSS 2.0. We will look at some, but not all, of the data stored in the feed. For more information about the keys, see the [documentation](http://pythonhosted.org/feedparser/).

`bozo` is an interesting key to know about if you plan to automate RSS feed ingestion. FeedParser sets the bozo bit when it detects that a feed is not well-formed, and it stores the underlying error under the `bozo_exception` key. (FeedParser will still parse a feed that is not well-formed.)

```python
if not feed.bozo:
    print("Well done, you have a well-formed feed!")
else:
    print("Potential trouble ahead.")
```

```
Well done, you have a well-formed feed!
```


We can look at channel-level elements, such as the title, link, and description, through the `feed` attribute.

```python
print(feed.feed.title)
print(feed.feed.link)
# "description" is an alias for the "subtitle" key listed earlier
print(feed.feed.description)
```

```
Netflix Top 100
http://dvd.netflix.com
Top 100 Netflix movies, published every 2 weeks.
```


A published date is another element we could look at, but not every feed includes one. This feed does not, so accessing the attribute raises an error:

```python
feed.feed.published
```

```
AttributeError: object has no attribute 'published'
```

As with a dictionary, we can use the `get` method to return a default value when a key is missing. This avoids the exception, which is useful in scripts that process many feeds.

```python
# The published date would live on feed.feed, so call get() there
feed.feed.get('published', 'N/A')
```

```
'N/A'
```

The items we are looking for are contained in the entries. Given the feed we are working with, how many entries do you think we have?

```python
len(feed.entries)
```

```
100
```

`entries` is a list, so we can index into it. Here is the title of the first entry:

```python
feed.entries[0].title
```

```
'Money Monster'
```

```python
# enumerate() supplies the index, so no manual counter is needed
for i, entry in enumerate(feed.entries):
    print(i, entry.title)
```

```
0 Money Monster
1 Game of Thrones
2 Sully
3 Now You See Me 2
4 Ghostbusters
5 Star Trek Beyond
6 Captain America: Civil War
7 The Legend of Tarzan
8 The Jungle Book (2016)
9 Free State of Jones
10 The Nice Guys
11 The Magnificent Seven
12 Jason Bourne
13 Hell or High Water
14 Central Intelligence
15 Batman v Superman: Dawn of Justice
16 X-Men: Apocalypse
17 The Huntsman: Winter's War
18 Independence Day: Resurgence
19 Whiskey Tango Foxtrot
20 Suicide Squad
21 Bad Moms
22 Me Before You
23 Deepwater Horizon
24 A Hologram for the King
25 The Infiltrator
26 The Shallows
27 The Boss
28 Eye in the Sky
29 Outlander
30 Mother's Day
31 The Accountant
32 Homeland
33 The Secret Life of Pets
34 Criminal
35 Finding Dory
36 Mechanic: Resurrection
37 The Finest Hours
38 The Meddler
39 London Has Fallen
40 Florence Foster Jenkins
41 Joy
42 Inferno
43 Miracles from Heaven
44 The Man Who Knew Infinity
45 War Dogs
46 Hello, My Name Is Doris
47 The Revenant
48 Neighbors 2: Sorority Rising
49 My Big Fat Gr
```

With the entries in hand, what can we do with this data? One option is to load it into a pandas DataFrame.

```python
df = pd.DataFrame(feed.entries)
```

```python
feed.entries[0]
```

```
{'guidislink': False,
 'id': 'https://dvd.netflix.com/Movie/Money-Monster/80084089',
 'link': 'https://dvd.netflix.com/Movie/Money-Monster/80084089',
 'links': [{'href': 'https://dvd.netflix.com/Movie/Money-Monster/80084089',
   'rel': 'alternate',
   'type': 'text/html'}],
 'summary': '<a href="https://dvd.netflix.com/Movie/Money-Monster/80084089"><img src="//secure.netflix.com/us/boxshots/small/80084089.jpg"/></a><br>Landing in dire financial straits after following a stock tip from bombastic TV persona Lee Gates, fuming Kyle Budwell takes the lout hostage on live television and threatens to kill him unless he turns the stock price around before the closing bell.',
 'summary_detail': {'base': 'http://dvd.netflix.com/Top100RSS',
  'language': None,
  'type': 'text/html',
  'value': '<a href="https://dvd.netflix.com/Movie/Money-Monster/80084089"><img src="//secure.netflix.com/us/boxshots/small/80084089.jpg"/></a><br>Landing in dire financial straits after following a stock tip from bom
```

```python
df
```

```
,guidislink,id,link,links,summary,summary_detail,title,title_detail
0,False,https://dvd.netflix.com/Movie/Money-Monster/80...,https://dvd.netflix.com/Movie/Money-Monster/80...,[{'href': 'https://dvd.netflix.com/Movie/Money...,"<a href=""https://dvd.netflix.com/Movie/Money-M...","{'language': None, 'value': '<a href=""https://...",Money Monster,"{'language': None, 'value': 'Money Monster', '..."
1,False,https://dvd.netflix.com/Movie/Game-of-Thrones/...,https://dvd.netflix.com/Movie/Game-of-Thrones/...,[{'href': 'https://dvd.netflix.com/Movie/Game-...,"<a href=""https://dvd.netflix.com/Movie/Game-of...","{'language': None, 'value': '<a href=""https://...",Game of Thrones,"{'language': None, 'value': 'Game of Thrones',..."
2,False,https://dvd.netflix.com/Movie/Sully/80103102,https://dvd.netflix.com/Movie/Sully/80103102,[{'href': 'https://dvd.netflix.com/Movie/Sully...,"<a href=""https://dvd.netflix.com/Movie/Sully/8...","{'language': None, 'value': '<a href=""https://...",Sully,"{'language': None, 'value': 'Sully', 'type': '..."
3,False,https://dvd.netflix.com/Movie/Now-You-See-Me-2...,https://dvd.netflix.com/Movie/Now-You-See-Me-2...,[{'href': 'https://dvd.netflix.com/Movie/Now-Y...,"<a href=""https://dvd.netflix.com/Movie/Now-You...","{'language': None, 'value': '<a href=""https://...",Now You See Me 2,"{'language': None, 'value': 'Now You See Me 2'..."
4,False,https://dvd.netflix.com/Movie/Ghostbusters/800...,https://dvd.netflix.com/Movie/Ghostbusters/800...,[{'href': 'https://dvd.netflix.com/Movie/Ghost...,"<a href=""https://dvd.netflix.com/Movie/Ghostbu...","{'language': None, 'value': '<a href=""https://...",Ghostbusters,"{'language': None, 'value': 'Ghostbusters', 't..."
5,False,https://dvd.netflix.com/Movie/Star-Trek-Beyond...,https://dvd.netflix.com/Movie/Star-Trek-Beyond...,[{'href': 'https://dvd.netflix.com/Movie/Star-...,"<a href=""https://dvd.netflix.com/Movie/Star-Tr...","{'language': None, 'value': '<a href=""https://...",Star Trek Beyond,"{'language': None, 'value': 'Star Trek Beyond'..."
6,False,https://dvd.netflix.com/Movie/Captain-America-...,https://dvd.netflix.com/Movie/Captain-America-...,[{'href': 'https://dvd.netflix.com/Movie/Capta...,"<a href=""https://dvd.netflix.com/Movie/Captain...","{'language': None, 'value': '<a href=""https://...",Captain America: Civil War,"{'language': None, 'value': 'Captain America: ..."
7,False,https://dvd.netflix.com/Movie/The-Legend-of-Ta...,https://dvd.netflix.com/Movie/The-Legend-of-Ta...,[{'href': 'https://dvd.netflix.com/Movie/The-L...,"<a href=""https://dvd.netflix.com/Movie/The-Leg...","{'language': None, 'value': '<a href=""https://...",The Legend of Tarzan,"{'language': None, 'value': 'The Legend of Tar..."
8,False,https://dvd.netflix.com/Movie/The-Jungle-Book-...,https://dvd.netflix.com/Movie/The-Jungle-Book-...,[{'href': 'https://dvd.netflix.com/Movie/The-J...,"<a href=""https://dvd.netflix.com/Movie/The-Jun...","{'language': None, 'value': '<a href=""https://...",The Jungle Book (2016),"{'language': None, 'value': 'The Jungle Book (..."
9,False,https://dvd.netflix.com/Movie/Free-State-of-Jo...,https://dvd.netflix.com/Movie/Free-State-of-Jo...,[{'href': 'https://dvd.netflix.com/Movie/Free-...,"<a href=""https://dvd.netflix.com/Movie/Free-St...","{'language': None, 'value': '<a href=""https://...",Free State of Jones,"{'language': None, 'value': 'Free State of Jon..."
```


```python
df.summary[0]
```

```
'<a href="https://dvd.netflix.com/Movie/Money-Monster/80084089"><img src="//secure.netflix.com/us/boxshots/small/80084089.jpg"/></a><br>Landing in dire financial straits after following a stock tip from bombastic TV persona Lee Gates, fuming Kyle Budwell takes the lout hostage on live television and threatens to kill him unless he turns the stock price around before the closing bell.'
```
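The summary mixes a linked box-shot image with the plain-text synopsis. One way to separate them is Python's standard-library `html.parser`; this is a sketch, not the only approach (a third-party parser such as BeautifulSoup would also work). The `SummaryParser` class is a hypothetical helper, and the input string is the summary shown above.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Hypothetical helper: collect <img> sources and visible text from summary HTML."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img/> tags also arrive here via handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        self.text_parts.append(data)

    @property
    def text(self):
        return "".join(self.text_parts).strip()


summary = (
    '<a href="https://dvd.netflix.com/Movie/Money-Monster/80084089">'
    '<img src="//secure.netflix.com/us/boxshots/small/80084089.jpg"/></a>'
    "<br>Landing in dire financial straits after following a stock tip from "
    "bombastic TV persona Lee Gates, fuming Kyle Budwell takes the lout hostage "
    "on live television and threatens to kill him unless he turns the stock "
    "price around before the closing bell."
)

parser = SummaryParser()
parser.feed(summary)
parser.close()  # flush any buffered text

print(parser.images)  # ['//secure.netflix.com/us/boxshots/small/80084089.jpg']
print(parser.text)
```

The image URL is protocol-relative (it starts with `//`), so prefix it with `https:` before fetching it outside a browser.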