# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
oreilly = feedparser.parse(url)

In [4]:
oreilly.feed.title

'Radar'

### 2. Obtain a list of components (keys) that are available for this feed.

In [5]:
llaves = list(oreilly.keys())  # "llaves" is Spanish for "keys"
llaves

['feed',
 'entries',
 'bozo',
 'headers',
 'etag',
 'updated',
 'updated_parsed',
 'href',
 'status',
 'encoding',
 'version',
 'namespaces']

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [6]:
feed_llaves = list(oreilly['feed'].keys())
feed_llaves

['title',
 'title_detail',
 'links',
 'link',
 'subtitle',
 'subtitle_detail',
 'updated',
 'updated_parsed',
 'language',
 'sy_updateperiod',
 'sy_updatefrequency',
 'generator_detail',
 'generator',
 'feedburner_info',
 'geo_lat',
 'geo_long',
 'feedburner_emailserviceid',
 'feedburner_feedburnerhostname']

### 4. Extract and print the feed title, subtitle, author, and link.

In [7]:
feed_title = oreilly.feed.title
feed_subtitle = oreilly.feed.subtitle
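# The feed itself has no 'author' key (see exercise 3), so collect the unique entry authors instead.
authors = list({entry.author for entry in oreilly.entries})
# Unique entry links; the feed-level link is available separately as oreilly.feed.link.
links = list({entry.link for entry in oreilly.entries})

print(feed_title + "| " + feed_subtitle + "\n")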
print(authors)
print("\n")
print(links)

Radar| Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology

['Martin Fowler', 'Nat Torkington', 'Cynthia Owens', 'Mike Loukides', 'Tim O’Reilly', 'Roger Magoulas and Steve Swoyer', 'Pamela Rucker', 'Hugo Bowne-Anderson', 'Rita J. King', 'Jenn Webb', 'Mac Slocum', 'Kai Holnes', 'Rachel Laycock and Neal Ford', 'Peter Skomoroch and Mike Loukides', 'Mark Richards', 'Mary Poppendieck', 'George Fairbanks']


['http://feedproxy.google.com/~r/oreilly/radar/atom/~3/iZyrc89ytVY/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/qlfo450nz88/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/cQpks9Bw11E/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/GGLbZECmPJI/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/51j23cc8IHc/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/Uz6cibmqT1I/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/UNR8GT2bHms/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/FQ2S_dHbH
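
The key list from exercise 3 includes `link` but no `author`, so a direct `oreilly.feed.author` lookup would fail for this feed. A minimal sketch of reading the four feed-level fields with a fallback is shown below. The `describe_feed` helper, the `"(not provided)"` placeholder, and the `sample_feed` dictionary are illustrative assumptions; the sketch relies only on the feed object supporting dictionary-style `.get()`.

```python
def describe_feed(feed):
    """Return feed-level title, subtitle, author, and link, with a fallback for missing keys."""
    fields = ("title", "subtitle", "author", "link")
    return {name: feed.get(name, "(not provided)") for name in fields}


# Hypothetical stand-in for oreilly.feed (made-up values, no 'author' key on purpose).
sample_feed = {
    "title": "Radar",
    "subtitle": "Now, next, and beyond",
    "link": "https://example.com/radar",
}

summary = describe_feed(sample_feed)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Passing `oreilly.feed` instead of `sample_feed` should work the same way, since the helper only calls `.get()` on its argument.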

### 5. Count the number of entries that are contained in this RSS feed.

In [8]:
len(oreilly.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: `entries` is a list, so index into it first (for example, `entries[0]`) before requesting the keys.*

In [9]:
list(oreilly.entries[0].keys())

['title',
 'title_detail',
 'links',
 'link',
 'comments',
 'published',
 'published_parsed',
 'authors',
 'author',
 'author_detail',
 'tags',
 'id',
 'guidislink',
 'summary',
 'summary_detail',
 'content',
 'wfw_commentrss',
 'slash_comments',
 'feedburner_origlink']

### 7. Extract a list of entry titles.

In [10]:
# set() removes duplicate titles, so the original feed order is not preserved.
entries_titles = list(set([t.title for t in oreilly.entries]))
print(entries_titles)

['Four short links: 3 March 2020', 'Four short links: 13 February 2020', 'Four short links: 2 April 2020', 'Four short links: 20 February 2020', 'Four short links: 19 February 2020', 'Four short links: 18 February 2020', 'Four short links: 22 February 2020', 'Four short links: 7 April 2020', 'Great leaders inspire innovation and creativity from within their workforces', 'Four short links: 31 March 2020', 'The state of data quality in 2020', 'Four short links: 17 February 2020', 'Four short links: 23 March 2020', 'Four short links: 3 April 2020', 'Governance and Discovery', 'Four short links: 16 March 2020', 'Radar trends to watch: March 2020', 'The unreasonable importance of data preparation', 'Four short links: 14 February 2020', '5 key areas for tech leaders to watch in 2020', 'Four short links: 6 April 2020', 'What you need to know about product management for AI', 'Four short links: 17 March 2020', 'The death of Agile?', 'An enterprise vision is your company’s North Star', 'Four sh

### 8. Calculate the percentage of "Four short links" entry titles.

In [11]:
four_short_links = {t.title for t in oreilly.entries if t.title.startswith("Four short links")}
percentage = 100 * len(four_short_links) / len(entries_titles)
percentage

60.0
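
The calculation above works on de-duplicated titles, so it measures the share of distinct titles rather than the share of entries. The two agree only when every title is unique. A minimal sketch that counts entries directly is shown below; the `share_with_prefix` helper is a hypothetical name, and the sample list is a small hand-picked subset of the titles shown earlier, not the full feed.

```python
def share_with_prefix(titles, prefix):
    """Return the percentage of titles that start with prefix (0.0 for an empty list)."""
    if not titles:
        return 0.0
    matches = sum(1 for title in titles if title.startswith(prefix))
    return 100 * matches / len(titles)


# Small illustrative sample, not the full feed.
sample_titles = [
    "Four short links: 8 April 2020",
    "Four short links: 7 April 2020",
    "Governance and Discovery",
    "Four short links: 6 April 2020",
    "The death of Agile?",
]

result = share_with_prefix(sample_titles, "Four short links")
print(result)
```

With the real feed, the input would be `[entry.title for entry in oreilly.entries]`.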

### 9. Create a Pandas data frame from the feed's entries.

In [12]:
import pandas as pd

In [13]:
df = pd.DataFrame(oreilly.entries)
df.head()

| | author | author_detail | authors | comments | content | feedburner_origlink | guidislink | id | link | links | published | published_parsed | slash_comments | summary | summary_detail | tags | title | title_detail | wfw_commentrss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | https://www.oreilly.com/radar/four-short-links... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | False | https://www.oreilly.com/radar/?p=12611 | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | Wed, 08 Apr 2020 11:48:28 +0000 | (2020, 4, 8, 11, 48, 28, 2, 99, 0) | 0 | System Design for Advanced Beginners &#8212; a... | {'type': 'text/html', 'language': None, 'base'... | [{'term': 'Four Short Links', 'scheme': None, ... | Four short links: 8 April 2020 | {'type': 'text/plain', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... |
| 1 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | https://www.oreilly.com/radar/four-short-links... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | False | https://www.oreilly.com/radar/?p=12606 | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | Tue, 07 Apr 2020 11:45:13 +0000 | (2020, 4, 7, 11, 45, 13, 1, 98, 0) | 0 | locust &#8212; open source load testing tool: ... | {'type': 'text/html', 'language': None, 'base'... | [{'term': 'Four Short Links', 'scheme': None, ... | Four short links: 7 April 2020 | {'type': 'text/plain', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... |
| 2 | Mike Loukides | {'name': 'Mike Loukides'} | [{'name': 'Mike Loukides'}] | https://www.oreilly.com/radar/governance-and-d... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/governance-and-d... | False | https://www.oreilly.com/radar/?p=12594 | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | Mon, 06 Apr 2020 19:09:29 +0000 | (2020, 4, 6, 19, 9, 29, 0, 97, 0) | 0 | Data Governance sounds like a candidate for th... | {'type': 'text/html', 'language': None, 'base'... | [{'term': 'Radar Column', 'scheme': None, 'lab... | Governance and Discovery | {'type': 'text/plain', 'language': None, 'base... | https://www.oreilly.com/radar/governance-and-d... |
| 3 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | https://www.oreilly.com/radar/four-short-links... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | False | https://www.oreilly.com/radar/?p=12590 | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | Mon, 06 Apr 2020 11:53:01 +0000 | (2020, 4, 6, 11, 53, 1, 0, 97, 0) | 0 | Rufus &#8212; Create bootable USB drives the e... | {'type': 'text/html', 'language': None, 'base'... | [{'term': 'Four Short Links', 'scheme': None, ... | Four short links: 6 April 2020 | {'type': 'text/plain', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... |
| 4 | Nat Torkington | {'name': 'Nat Torkington'} | [{'name': 'Nat Torkington'}] | https://www.oreilly.com/radar/four-short-links... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | False | https://www.oreilly.com/radar/?p=12585 | http://feedproxy.google.com/~r/oreilly/radar/a... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | Fri, 03 Apr 2020 11:59:08 +0000 | (2020, 4, 3, 11, 59, 8, 4, 94, 0) | 0 | The Zero Trust Learning Curve (Palo Alto Netwo... | {'type': 'text/html', 'language': None, 'base'... | [{'term': 'Four Short Links', 'scheme': None, ... | Four short links: 3 April 2020 | {'type': 'text/plain', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... |


### 10. Count the number of entries per author and sort them in descending order.

In [14]:
df['author'].value_counts()

Nat Torkington                       36
Jenn Webb                             4
Roger Magoulas and Steve Swoyer       4
Mike Loukides                         3
George Fairbanks                      1
Pamela Rucker                         1
Tim O’Reilly                          1
Kai Holnes                            1
Cynthia Owens                         1
Rita J. King                          1
Mark Richards                         1
Hugo Bowne-Anderson                   1
Mary Poppendieck                      1
Mac Slocum                            1
Rachel Laycock and Neal Ford          1
Peter Skomoroch and Mike Loukides     1
Martin Fowler                         1
Name: author, dtype: int64
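
`value_counts()` sorts by count in descending order by default, which is why no explicit sort is needed here. For comparison, a minimal sketch of the same tally without pandas uses `collections.Counter`, whose `most_common()` method also returns counts from highest to lowest. The `sample_authors` list is a short made-up sample, not the real feed data.

```python
from collections import Counter

# Short illustrative sample; with the real feed this would be
# [entry.author for entry in oreilly.entries].
sample_authors = [
    "Nat Torkington",
    "Nat Torkington",
    "Mike Loukides",
    "Nat Torkington",
    "Jenn Webb",
    "Mike Loukides",
]

# most_common() returns (author, count) pairs ordered from most to least frequent.
author_counts = Counter(sample_authors).most_common()
for author, count in author_counts:
    print(f"{author:<20}{count}")
```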

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [15]:
df['title length'] = df['title'].apply(len)

In [16]:
new_df = df[['title', 'author', 'title length']].copy()

In [17]:
new_df.sort_values(by=['title length'], ascending=False, inplace=True)

In [18]:
new_df.head(15)

| | title | author | title length |
|---|---|---|---|
| 40 | Highlights from the O’Reilly Software Architec... | Mac Slocum | 78 |
| 14 | Great leaders inspire innovation and creativit... | Jenn Webb | 76 |
| 52 | 10 ways to get untapped talent in your organiz... | Pamela Rucker | 65 |
| 15 | Strong leaders forge an intersection of knowle... | Jenn Webb | 64 |
| 20 | It’s an unprecedented crisis: 8 things to do r... | Cynthia Owens | 54 |
| 8 | What you need to know about product management... | Peter Skomoroch and Mike Loukides | 53 |
| 12 | An enterprise vision is your company’s North Star | Jenn Webb | 49 |
| 13 | Leaders need to mobilize change-ready workforces | Jenn Webb | 48 |
| 9 | The unreasonable importance of data preparation | Hugo Bowne-Anderson | 47 |
| 11 | 3 ways to confront modern business challenges | Rita J. King | 45 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning".

In [19]:
[t.title for t in oreilly.entries if "machine learning" in t.summary.lower()]

['What you need to know about product management for AI',
 'Four short links: 13 February 2020']
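
Entry summaries in this feed contain HTML markup and entities (the data frame preview shows `&#8212;`, for example), so a plain substring test can miss a phrase that is split by a tag, and it raises an error if an entry has no summary. A minimal sketch of a more tolerant search follows. The `plain_text` and `titles_mentioning` helpers and the `sample_entries` data are hypothetical, and the regular expression is a rough tag stripper, not a full HTML parser.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]+>")


def plain_text(markup):
    """Replace tags with spaces, decode HTML entities, and collapse whitespace."""
    without_tags = TAG_PATTERN.sub(" ", markup)
    return " ".join(html.unescape(without_tags).split())


def titles_mentioning(entries, phrase):
    """Return titles of entries whose summary text contains phrase, ignoring case."""
    needle = phrase.lower()
    return [
        entry["title"]
        for entry in entries
        if needle in plain_text(entry.get("summary", "")).lower()
    ]


# Made-up entries standing in for oreilly.entries.
sample_entries = [
    {"title": "Entry A", "summary": "<p>Applied <em>machine</em> learning in practice</p>"},
    {"title": "Entry B", "summary": "Notes on data&nbsp;quality"},
    {"title": "Entry C"},  # no summary at all
]

matches = titles_mentioning(sample_entries, "machine learning")
print(matches)
```

Because feedparser entries support dictionary-style access, `titles_mentioning(oreilly.entries, "machine learning")` should work on the real feed, though the result may differ from the list above if any summary splits the phrase with markup.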