## RSS

You can find RSS feeds on many different sites. The [Library of Congress](https://www.loc.gov/rss/) has a lot. Most blogs and news websites have them, for example [TechCrunch](https://techcrunch.com/rssfeeds/), [New York Times](http://www.nytimes.com/services/xml/rss/index.html), and [NPR](https://help.npr.org/customer/portal/articles/2094175-where-can-i-find-npr-rss-feeds-). The [DC Public Library](http://www.dclibrary.org/) even gives you an RSS feed of your [catalog searches](https://catalog.dclibrary.org/client/rss/hitlist/dcpl/qu=python). iTunes delivers podcasts by [aggregating RSS feeds](http://itunespartner.apple.com/en/podcasts/faq) from content creators.

Today we are going to take a look at a Netflix feed. The [Netflix Top 100 DVDs](https://dvd.netflix.com/RSSFeeds) feed is one option; the code in this section uses the unofficial New On Netflix USA feed. We will use the Python package [FeedParser](https://pypi.python.org/pypi/feedparser) to work with the RSS feed. FeedParser parses the feed and lets us pull apart the data it contains.

In [7]:
import feedparser
import pandas as pd

In [17]:
RSS_URL = "https://usa.newonnetflix.info/feed"

In [18]:
feed = feedparser.parse(RSS_URL)

In [19]:
type(feed)

feedparser.util.FeedParserDict

"parse" is the primary function in FeedParser. The object it returns is dictionary-like and can be handled much like a dictionary. For example, we can look at the keys it contains and the type of value stored under each key.

In [20]:
feed.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

In [21]:
type(feed.bozo)

bool

In [13]:
type(feed.feed)

feedparser.util.FeedParserDict

We will look at some, but not all, of the data stored in the feed. For more information about the keys, see the [documentation](http://pythonhosted.org/feedparser/).

We can use the version to check which type of feed we have.

In [22]:
feed.version

'rss20'
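The version string can also drive a simple branch in code. The following is a minimal sketch, assuming version strings start with the format family name, as 'rss20' does here. `feed_family` is a hypothetical helper, not part of FeedParser.

```python
def feed_family(version):
    """Classify a FeedParser version string as 'rss', 'atom', or 'unknown'."""
    if version.startswith("rss"):
        return "rss"
    if version.startswith("atom"):
        return "atom"
    # Anything else, including an empty string, is treated as unknown here
    return "unknown"


print(feed_family("rss20"))   # rss
print(feed_family("atom10"))  # atom
```

With a parsed feed, the call would be `feed_family(feed.version)`.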

Bozo is an interesting key to know about if you are going to parse an RSS feed in code. FeedParser sets the bozo bit when it detects that a feed is not well-formed. (FeedParser will still parse the feed if it is not well-formed.) You can use the bozo bit to build error handling or just print a simple warning.

In [23]:
if feed.bozo == 0:
    print("Well done, you have a well-formed feed!")
else:
    print("Potential trouble ahead.")

Well done, you have a well-formed feed!
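The bozo check can be wrapped in a small helper for reuse. This is one possible sketch, not part of FeedParser: `feed_status` is a hypothetical name, and the plain dictionaries stand in for parsed feeds so the example runs without a network connection. It assumes that when the bozo bit is set, the parse error is stored under the `bozo_exception` key.

```python
def feed_status(parsed):
    """Return a short status message for a parsed feed (hypothetical helper)."""
    # .get() avoids an error if the key is missing from the result
    if not parsed.get("bozo"):
        return "well-formed"
    # When bozo is set, the parse error is expected under 'bozo_exception'
    reason = parsed.get("bozo_exception", "unknown problem")
    return f"not well-formed: {reason}"


# Plain dicts standing in for feedparser.parse() results (illustration only)
good = {"bozo": False, "entries": []}
bad = {"bozo": True, "bozo_exception": "mismatched tag"}

print(feed_status(good))
print(feed_status(bad))
```

Because the parsed feed is dictionary-like, `feed_status(feed)` should work the same way on a real result.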


We can look at some of the feed elements through the feed attribute.

In [24]:
feed.feed.keys()

dict_keys(['webfeeds_analytics', 'title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'language', 'published', 'published_parsed', 'updated', 'updated_parsed', 'authors', 'author', 'author_detail', 'publisher', 'publisher_detail'])

In [25]:
print(feed.feed.title)
print(feed.feed.link)
print(feed.feed.description)

New On Netflix USA
https://usa.newonnetflix.info
RSS feed for new additions over the last 5 days to Netflix USA (100% unofficial!). A project by MaFt.co.uk


The [reference section](http://pythonhosted.org/feedparser/reference.html) of the feedparser documentation shows us all the information that can be in a feed. [Annotated Examples](http://pythonhosted.org/feedparser/annotated-examples.html) are also provided. But note the caution provided:

"Caution: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present."

For example, our feed is RSS 2.0. One of the elements available in this version is the published date.

In [26]:
feed.feed.published

'Sun, 05 Jun 2022 22:07:08 -0400'

No error is raised, so this feed does provide 'published'. If the element were missing, accessing it as an attribute would raise an AttributeError.

As with [standard Python dictionaries](https://docs.python.org/3.5/library/stdtypes.html#dict), we can use the "get" method to look up a key and fall back to a default value if it does not exist. This is useful when we are writing code that has to handle feeds with missing elements.

In [27]:
feed.feed.get('published', 'N/A')

'Sun, 05 Jun 2022 22:07:08 -0400'
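The same pattern can supply a computed default instead of 'N/A'. The sketch below falls back to today's date when 'published' is missing. `published_or_today` is a hypothetical helper, and the plain dictionaries stand in for `feed.feed`.

```python
from datetime import date


def published_or_today(feed_meta):
    """Return the feed's published date string, or today's date if it is absent."""
    # Default to today's date in ISO format (YYYY-MM-DD) when the key is missing
    return feed_meta.get("published", date.today().isoformat())


# Plain dicts standing in for feed.feed (illustration only)
with_date = {"title": "Example", "published": "Sun, 05 Jun 2022 22:07:08 -0400"}
without_date = {"title": "Example"}

print(published_or_today(with_date))
print(published_or_today(without_date))
```

Note that the two branches return differently formatted strings, so code that needs real dates would have to normalize them afterwards.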

The data we are looking for are contained in the entries. Given the feed we are working with, how many entries do you think we have?

In [28]:
len(feed.entries)

62

The items in entries are stored as a list.

In [29]:
type(feed.entries)

list

In [30]:
feed.entries[0].title

'6th Jun: Bill Burr Presents: Friends Who Kill (2022), 1hr 13m [TV-MA] (6/10)'

In [31]:
for i, entry in enumerate(feed.entries):
    print(i, entry.title)

0 6th Jun: Bill Burr Presents: Friends Who Kill (2022), 1hr 13m [TV-MA] (6/10)
1 6th Jun: Action Pack (2022), 2 Seasons [TV-Y] - New Episodes (6.25/10)
2 5th Jun: Straight Up (2020), 1hr 36m [TV-MA] - Streaming Again (6.4/10)
3 4th Jun: Ammar (2020), 1hr 23m [TV-MA] (6/10)
4 3rd Jun: Change Days (2022), 1 Season [TV-PG] (6/10)
5 3rd Jun: As the Crow Flies (2022), 1 Season [TV-MA] (6/10)
6 3rd Jun: Mr. Good: Cop or Crook? (2022), 1 Season [TV-14] (6/10)
7 3rd Jun: The Perfect Mother (2022), 1 Season [TV-MA] (6/10)
8 3rd Jun: Surviving Summer (2022), 1 Season [TV-PG] (6/10)
9 3rd Jun: Two Summers (2022), 1 Season [TV-MA] (6.55/10)
10 3rd Jun: Floor Is Lava (2022), 2 Seasons [TV-PG] - New Episodes (5.65/10)
11 3rd Jun: Interceptor (2022), 1hr 38m [TV-MA] (6/10)
12 2nd Jun: #ABtalks (2022), 2 Seasons [TV-PG] - New Episodes (6.9/10)
13 2nd Jun: Yuri Marçal: Honest Mistake (2022), 53m [TV-MA] (6/10)
14 2nd Jun: Jana 2022 (Kannada) (2022), 2hr 41m [TV-14] (7.3/10)
15 2nd Jun: Jana 2022 (Tamil
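Each title packs several fields into one string: the date added, the name, the year, the length, the rating, an optional note, and a score. As a rough sketch only, assuming every title follows the pattern seen in the output above, a regular expression can split them apart. `parse_title` and the field names are hypothetical, and titles that do not match return `None`.

```python
import re

# Pattern inferred from the sample titles above; real feeds may deviate from it
TITLE_PATTERN = re.compile(
    r"^(?P<added>\d+\w+ \w+): "          # e.g. "6th Jun"
    r"(?P<title>.+) \((?P<year>\d{4})\), "
    r"(?P<length>[^\[]+?) "              # e.g. "1hr 13m" or "2 Seasons"
    r"\[(?P<rating>[^\]]+)\]"            # e.g. "TV-MA"
    r"(?: - (?P<note>.+?))?"             # optional, e.g. "New Episodes"
    r" \((?P<score>[\d.]+)/10\)$"
)


def parse_title(text):
    """Split a title string into its fields, or return None if it does not match."""
    match = TITLE_PATTERN.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["year"] = int(fields["year"])
    fields["score"] = float(fields["score"])
    return fields


sample = "6th Jun: Action Pack (2022), 2 Seasons [TV-Y] - New Episodes (6.25/10)"
print(parse_title(sample))
```

Applied to the feed, this could look like `[parse_title(entry.title) for entry in feed.entries]`.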

Given that information, what is something we can do with this data? Why not make it a dataframe?

In [32]:
df = pd.DataFrame(feed.entries)

In [33]:
df.head()

| | title | title_detail | links | link | summary | summary_detail | published | published_parsed | id | guidislink |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6th Jun: Bill Burr Presents: Friends Who Kill ... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81222748 | In a night of killer comedy, Bill Burr hosts a... | {'type': 'text/html', 'language': None, 'base'... | Sun, 05 Jun 2022 22:07:08 -0400 | (2022, 6, 6, 2, 7, 8, 0, 157, 0) | https://usa.newonnetflix.info/info/81222748 | False |
| 1 | 6th Jun: Action Pack (2022), 2 Seasons [TV-Y] ... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/80993597 | [New Episodes] With hearts, smarts and superpo... | {'type': 'text/html', 'language': None, 'base'... | Sun, 05 Jun 2022 21:01:17 -0400 | (2022, 6, 6, 1, 1, 17, 0, 157, 0) | https://usa.newonnetflix.info/info/80993597 | False |
| 2 | 5th Jun: Straight Up (2020), 1hr 36m [TV-MA] -... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81229555 | [Streaming Again] When a gay brainiac with OCD... | {'type': 'text/html', 'language': None, 'base'... | Sun, 05 Jun 2022 01:07:23 -0400 | (2022, 6, 5, 5, 7, 23, 6, 156, 0) | https://usa.newonnetflix.info/info/81229555 | False |
| 3 | 4th Jun: Ammar (2020), 1hr 23m [TV-MA] (6/10) | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81551208 | When a family moves into an old castle, excite... | {'type': 'text/html', 'language': None, 'base'... | Sat, 04 Jun 2022 01:07:07 -0400 | (2022, 6, 4, 5, 7, 7, 5, 155, 0) | https://usa.newonnetflix.info/info/81551208 | False |
| 4 | 3rd Jun: Change Days (2022), 1 Season [TV-PG] ... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81474612 | At a romantic getaway, real-life couples on th... | {'type': 'text/html', 'language': None, 'base'... | Fri, 03 Jun 2022 01:07:13 -0400 | (2022, 6, 3, 5, 7, 13, 4, 154, 0) | https://usa.newonnetflix.info/info/81474612 | False |
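Several of these columns (title_detail, links, summary_detail) hold nested dictionaries or lists, which are awkward to work with in a dataframe. One option, sketched here with a hypothetical helper named `flatten_entries`, is to keep only the flat fields before building the dataframe. The sample entries are plain dictionaries standing in for `feed.entries`.

```python
def flatten_entries(entries, fields=("title", "link", "summary", "published")):
    """Keep only the named fields of each entry, using '' for missing ones."""
    rows = []
    for entry in entries:
        # .get() guards against entries that lack one of the fields
        rows.append({field: entry.get(field, "") for field in fields})
    return rows


# Plain dicts standing in for feed.entries (illustration only)
sample_entries = [
    {"title": "First", "link": "https://example.com/1", "summary": "One",
     "published": "Sun, 05 Jun 2022 22:07:08 -0400", "links": [{"rel": "alternate"}]},
    {"title": "Second", "link": "https://example.com/2"},
]

for row in flatten_entries(sample_entries):
    print(row)
```

Passing the result to `pd.DataFrame(...)` would then give a dataframe with just those four columns.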


Challenge: write code to create a dataframe of the top 10 movies from the Netflix Top 100 DVDs and iTunes. Check to see if your feed is well-formed. Compile the name of the feed as the source, the published date, the movie ranking in the list, the movie title, a link to the movie, and the summary. If the published date does not exist in the feed, use the current date. Save your dataframe as a csv. Here is a link to one [possible solution](./rss_challenge.py).