# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
oreilly = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
oreilly.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
oreilly.feed.keys()

dict_keys(['title', 'title_detail', 'id', 'guidislink', 'link', 'updated', 'updated_parsed', 'subtitle', 'subtitle_detail', 'links', 'authors', 'author_detail', 'author', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [6]:
print(oreilly.feed.title)
print(oreilly.feed.subtitle)
print(oreilly.feed.author)
print(oreilly.feed.link)

All - O'Reilly Media
All of our Ideas and Learning material from all of our topics.
O'Reilly Media
https://www.oreilly.com


### 5. Count the number of entries that are contained in this RSS feed.

In [7]:
len(oreilly.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Index into `entries` first, then request the keys of a single entry.*

In [8]:
oreilly.entries[0].keys()

dict_keys(['title', 'title_detail', 'updated', 'updated_parsed', 'id', 'guidislink', 'link', 'content', 'summary', 'links', 'authors', 'author_detail', 'author', 'feedburner_origlink'])

### 7. Extract a list of entry titles.

In [9]:
titles = [entry.title for entry in oreilly.entries]
titles

['Four short links: 7 January 2019',
 'Four short links: 4 January 2019',
 'In the age of AI, fundamental value resides in data',
 'Four short links: 3 January 2019',
 'Four short links: 2 January 2019',
 '250+ live online training courses opened for January, February, and March',
 'Four short links: 1 January 2019',
 'Four short links: 31 December 2018',
 'Four short links: 28 December 2018',
 'Four short links: 27 December 2018',
 'Four short links: 26 December 2018',
 'Four short links: 25 December 2018',
 'Four short links: 24 December 2018',
 'Four short links: 21 December 2018',
 'Trends in data, machine learning, and AI',
 'Four short links: 20 December 2018',
 'What is neural architecture search?',
 'Four short links: 19 December 2018',
 'Deep automation in machine learning',
 '10 top AWS resources on O’Reilly’s online learning platform',
 'Four short links: 18 December 2018',
 'Four short links: 17 December 2018',
 'Four short links: 14 December 2018',
 'Four short links: 13 D

### 8. Calculate the percentage of "Four short links" entry titles.

In [10]:
len([title for title in titles if 'Four short links' in title]) / len(titles)

0.75
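
The value above is a fraction between 0 and 1. To report it as a percentage, multiply by 100. The sketch below does this on a short hypothetical list of titles rather than the live feed. It also uses `startswith` instead of `in`, on the assumption that the phrase always opens the title.

```python
# Hypothetical sample of entry titles, standing in for `titles`.
sample_titles = [
    "Four short links: 7 January 2019",
    "Four short links: 4 January 2019",
    "In the age of AI, fundamental value resides in data",
    "Four short links: 3 January 2019",
]

# True counts as 1, so summing the booleans counts the matching titles.
matches = sum(title.startswith("Four short links") for title in sample_titles)
percentage = 100 * matches / len(sample_titles)

print(f"{percentage:.1f}%")  # 75.0%
```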

### 9. Create a Pandas data frame from the feed's entries.

In [11]:
import pandas as pd

In [12]:
entries = pd.DataFrame(oreilly.entries)
entries

| | author | author_detail | authors | content | feedburner_origlink | guidislink | id | link | links | summary | title | title_detail | updated | updated_parsed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/four-short-links... | True | tag:www.oreilly.com,2019-01-07:/ideas/four-sho... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><em>Named Tensors, Project Management Aphor...` | Four short links: 7 January 2019 | {'type': 'text/plain', 'language': None, 'base... | 2019-01-07T12:00:00Z | (2019, 1, 7, 12, 0, 0, 0, 7, 0) |
| 1 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/four-short-links... | True | tag:www.oreilly.com,2019-01-04:/ideas/four-sho... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><em>State of the World, NLP Toolkit, Fair A...` | Four short links: 4 January 2019 | {'type': 'text/plain', 'language': None, 'base... | 2019-01-04T11:15:00Z | (2019, 1, 4, 11, 15, 0, 4, 4, 0) |
| 2 | Ben Lorica | {'name': 'Ben Lorica'} | [{'name': 'Ben Lorica'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/in-the-age-of-ai... | True | tag:www.oreilly.com,2019-01-03:/ideas/in-the-a... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><img src='https://d3ucjech6zwjp8.cloudfront...` | In the age of AI, fundamental value resides in... | {'type': 'text/plain', 'language': None, 'base... | 2019-01-03T11:30:00Z | (2019, 1, 3, 11, 30, 0, 3, 3, 0) |
| 3 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/four-short-links... | True | tag:www.oreilly.com,2019-01-03:/ideas/four-sho... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><em>Raw Data, Learning Text Adventures, Alg...` | Four short links: 3 January 2019 | {'type': 'text/plain', 'language': None, 'base... | 2019-01-03T11:00:00Z | (2019, 1, 3, 11, 0, 0, 3, 3, 0) |
| 4 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/four-short-links... | True | tag:www.oreilly.com,2019-01-02:/ideas/four-sho... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><em>Robot Cafe, Surveillance Sci-Fi, Hardwa...` | Four short links: 2 January 2019 | {'type': 'text/plain', 'language': None, 'base... | 2019-01-02T11:35:00Z | (2019, 1, 2, 11, 35, 0, 2, 2, 0) |
| 5 | | | | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/250-plus-live-on... | True | tag:www.oreilly.com,2019-01-02:/ideas/250-plus... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><img src='https://d3ucjech6zwjp8.cloudfront...` | 250+ live online training courses opened for J... | {'type': 'text/plain', 'language': None, 'base... | 2019-01-02T11:00:00Z | (2019, 1, 2, 11, 0, 0, 2, 2, 0) |
| 6 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/four-short-links... | True | tag:www.oreilly.com,2019-01-01:/ideas/four-sho... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><em>Amazon Tricks, Public Domain, Blocking ...` | Four short links: 1 January 2019 | {'type': 'text/plain', 'language': None, 'base... | 2019-01-01T15:40:00Z | (2019, 1, 1, 15, 40, 0, 1, 1, 0) |
| 7 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/four-short-links... | True | tag:www.oreilly.com,2018-12-31:/ideas/four-sho... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><em>Schema Crawler, Open Source Bug Bountie...` | Four short links: 31 December 2018 | {'type': 'text/plain', 'language': None, 'base... | 2018-12-31T12:55:00Z | (2018, 12, 31, 12, 55, 0, 0, 365, 0) |
| 8 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/four-short-links... | True | tag:www.oreilly.com,2018-12-28:/ideas/four-sho... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><em>Bayes Notes, Fake Internet, Tensorflow ...` | Four short links: 28 December 2018 | {'type': 'text/plain', 'language': None, 'base... | 2018-12-28T12:55:00Z | (2018, 12, 28, 12, 55, 0, 4, 362, 0) |
| 9 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/ideas/four-short-links... | True | tag:www.oreilly.com,2018-12-27:/ideas/four-sho... | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'href': 'http://feedproxy.google.com/~r/orei... | `<p><em>Reading Minds, Year Gotchas, LSTM Conve...` | Four short links: 27 December 2018 | {'type': 'text/plain', 'language': None, 'base... | 2018-12-27T12:45:00Z | (2018, 12, 27, 12, 45, 0, 3, 361, 0) |
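
The `updated` column holds ISO 8601 strings, which sort correctly as text but do not support date arithmetic or grouping by period. One option is to convert the column with `pd.to_datetime`. The sketch below uses a small hypothetical frame with the same `title` and `updated` columns instead of the full feed.

```python
import pandas as pd

# Hypothetical rows mirroring the `title` and `updated` columns above.
sample_entries = pd.DataFrame({
    "title": [
        "Four short links: 7 January 2019",
        "Four short links: 31 December 2018",
    ],
    "updated": ["2019-01-07T12:00:00Z", "2018-12-31T12:55:00Z"],
})

# The trailing "Z" marks UTC; utc=True yields a timezone-aware datetime column.
sample_entries["updated"] = pd.to_datetime(sample_entries["updated"], utc=True)

# With real datetimes, the .dt accessor allows grouping by year, month, etc.
entries_per_year = sample_entries["updated"].dt.year.value_counts().sort_index()
print(entries_per_year)
```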


### 10. Count the number of entries per author and sort them in descending order.

In [13]:
authors = entries.groupby(['author'], as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

| | author | entries |
|---|---|---|
| 6 | Nat Torkington | 45 |
| 0 | Ben Lorica | 6 |
| 1 | Ben Lorica, Mike Loukides | 1 |
| 2 | Jake Kitchener | 1 |
| 3 | James Furbush | 1 |
| 4 | Jenn Webb | 1 |
| 5 | Liam Li, Ameet Talwalkar | 1 |
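
The counts above add up to 56 rather than 60, most likely because `groupby` skips rows whose `author` is missing by default. If the entries without an author should be counted too, `value_counts(dropna=False)` is one way to include them. The sketch below shows the difference on a small hypothetical frame.

```python
import pandas as pd

# Hypothetical entries; the last one has no author, like some rows in the feed.
sample_entries = pd.DataFrame({
    "author": ["Nat Torkington", "Nat Torkington", "Ben Lorica", None],
    "title": ["Links 1", "Links 2", "An essay", "An announcement"],
})

# Default behaviour drops the missing author, so one entry goes uncounted.
counts_without_missing = sample_entries["author"].value_counts()

# dropna=False keeps a NaN bucket; results are already sorted in descending order.
counts_with_missing = sample_entries["author"].value_counts(dropna=False)

print(counts_with_missing)
```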


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [14]:
entries['title_length'] = entries['title'].apply(len)
entries[['title', 'author', 'title_length']].sort_values('title_length', ascending=False)

| | title | author | title_length |
|---|---|---|---|
| 29 | Tools for generating deep neural networks with... | Ben Lorica | 78 |
| 5 | 250+ live online training courses opened for J... | | 73 |
| 55 | Lessons learned while helping enterprises adop... | Ben Lorica | 64 |
| 33 | Survey reveals the opportunities and realities... | | 63 |
| 47 | 10 top Java resources on O’Reilly’s online lea... | | 60 |
| 19 | 10 top AWS resources on O’Reilly’s online lear... | | 59 |
| 2 | In the age of AI, fundamental value resides in... | Ben Lorica | 51 |
| 31 | Distributed systems: A quick and simple defini... | James Furbush | 50 |
| 28 | Assessing progress in automation technologies | Ben Lorica | 45 |
| 43 | Building tools for enterprise data science | Ben Lorica | 42 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning".

In [15]:
list(entries[entries['summary'].str.contains('machine learning')]['title'])

['250+ live online training courses opened for January, February, and March',
 'Four short links: 28 December 2018',
 'Trends in data, machine learning, and AI',
 'Four short links: 20 December 2018',
 'What is neural architecture search?',
 'Deep automation in machine learning',
 'Assessing progress in automation technologies',
 'Tools for generating deep neural networks with efficient network architectures',
 'Four short links: 4 December 2018',
 'Four short links: 30 November 2018',
 'Building tools for enterprise data science',
 'Four short links: 20 November 2018',
 'Managing risk in machine learning',
 'Four short links: 12 November 2018',
 'Lessons learned while helping enterprises adopt machine learning']
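
Two caveats apply to the filter above: `str.contains` is case-sensitive by default, so "Machine Learning" would be missed, and a missing summary produces `NaN` in the mask, which breaks boolean indexing. A more tolerant variant passes `case=False` and `na=False`. The sketch below demonstrates this on a small hypothetical frame.

```python
import pandas as pd

# Hypothetical entries: mixed capitalisation and one missing summary.
sample_entries = pd.DataFrame({
    "title": ["Post A", "Post B", "Post C"],
    "summary": ["Intro to Machine Learning", None, "Notes on databases"],
})

# case=False ignores capitalisation; na=False treats missing summaries as non-matches.
mask = sample_entries["summary"].str.contains("machine learning", case=False, na=False)
matching_titles = sample_entries.loc[mask, "title"].tolist()

print(matching_titles)  # ['Post A']
```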