# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser
import pandas as pd

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
reddit = feedparser.parse(url)
print(reddit['feed'])

{'title': 'Radar', 'title_detail': {'type': 'text/plain', 'language': None, 'base': 'http://feeds.feedburner.com/oreilly/radar/atom', 'value': 'Radar'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.oreilly.com/radar'}, {'rel': 'self', 'type': 'application/rss+xml', 'href': 'http://feeds.feedburner.com/oreilly/radar/atom'}, {'rel': 'hub', 'href': 'http://pubsubhubbub.appspot.com/', 'type': 'text/html'}], 'link': 'https://www.oreilly.com/radar', 'subtitle': 'Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology', 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': 'http://feeds.feedburner.com/oreilly/radar/atom', 'value': 'Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology'}, 'updated': 'Wed, 08 Apr 2020 11:48:28 +0000', 'updated_parsed': time.struct_time(tm_year=2020, tm_mon=4, tm_mday=8, tm_hour=11, tm_min=48, tm_sec=28, tm_wday=2, tm_yday=99, tm_isdst

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
reddit.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
reddit.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [6]:
print(reddit.feed.title)
print(reddit.feed.subtitle)
# The feed-level keys listed in step 3 include no 'author', so there is no author to print
print(reddit.feed.link)

Radar
Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
https://www.oreilly.com/radar
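
Not every feed defines every field, so a lookup with a default avoids an error when a key such as `author` is missing. The following is a minimal sketch that uses a plain dictionary (`sample_feed`, a hypothetical stand-in for `reddit.feed` with values copied from the output above); `reddit.feed` accepts the same `.get()` call.

```python
# Hypothetical stand-in for reddit.feed; the values are copied from the output above.
sample_feed = {
    'title': 'Radar',
    'subtitle': ('Now, next, and beyond: Tracking need-to-know trends '
                 'at the intersection of business and technology'),
    'link': 'https://www.oreilly.com/radar',
}

# .get() returns the fallback instead of raising when a key is absent.
fields = ['title', 'subtitle', 'author', 'link']
summary = {name: sample_feed.get(name, 'not provided') for name in fields}

for name, value in summary.items():
    print(f'{name}: {value}')
```

The placeholder text `'not provided'` is an arbitrary choice; any default that is easy to spot in the output would work.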


### 5. Count the number of entries that are contained in this RSS feed.

In [7]:
# Count the entries in the list; len(reddit.entries[0]) would instead count the keys of the first entry
len(reddit.entries)
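
The length of the entries list and the length of a single entry measure different things: the first is the number of entries, the second is the number of keys in one entry. A small sketch on hypothetical sample data (`sample_entries` stands in for `reddit.entries`) makes the difference visible:

```python
# Hypothetical stand-in for reddit.entries: a list with one dict per entry.
sample_entries = [
    {'title': 'Four short links: 8 April 2020', 'author': 'Nat Torkington'},
    {'title': 'Four short links: 7 April 2020', 'author': 'Nat Torkington'},
    {'title': 'Governance and Discovery', 'author': 'Mike Loukides'},
]

entry_count = len(sample_entries)          # number of entries in the list
first_entry_keys = len(sample_entries[0])  # number of keys in the first entry

print(entry_count, first_entry_keys)  # prints: 3 2
```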

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [8]:
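# Displaying the first entry shows each key with its value; reddit.entries[0].keys() returns only the keys
reddit.entries[0]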

{'title': 'Four short links: 8 April 2020',
 'title_detail': {'type': 'text/plain',
  'language': None,
  'base': 'http://feeds.feedburner.com/oreilly/radar/atom',
  'value': 'Four short links: 8 April 2020'},
 'links': [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/V8u8aZ2VPaE/'}],
 'link': 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/V8u8aZ2VPaE/',
 'comments': 'https://www.oreilly.com/radar/four-short-links-8-april-2020/#respond',
 'published': 'Wed, 08 Apr 2020 11:48:28 +0000',
 'published_parsed': time.struct_time(tm_year=2020, tm_mon=4, tm_mday=8, tm_hour=11, tm_min=48, tm_sec=28, tm_wday=2, tm_yday=99, tm_isdst=0),
 'authors': [{'name': 'Nat Torkington'}],
 'author': 'Nat Torkington',
 'author_detail': {'name': 'Nat Torkington'},
 'tags': [{'term': 'Four Short Links', 'scheme': None, 'label': None},
  {'term': 'Signals', 'scheme': None, 'label': None}],
 'id': 'https://www.oreilly.com/radar/?p=12611',
 'gui
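
Entries in one feed do not always share the same keys, so it can help to list the keys of the first entry and also the union of keys across all entries. This is a sketch on plain dictionaries: the first item copies values from the entry above, and the second is a hypothetical entry added only to show a missing key.

```python
sample_entries = [
    {'title': 'Four short links: 8 April 2020',
     'author': 'Nat Torkington',
     'id': 'https://www.oreilly.com/radar/?p=12611'},
    # Hypothetical entry without an author, to show that keys can differ.
    {'title': 'Example entry', 'id': 'example-id'},
]

# Keys of the first entry only (index first, then ask for the keys).
first_keys = list(sample_entries[0].keys())

# Every key seen in any entry, in first-seen order.
all_keys = list(dict.fromkeys(key for entry in sample_entries for key in entry))

# Keys that the second entry lacks.
missing_in_second = [key for key in all_keys if key not in sample_entries[1]]

print(first_keys)
print(all_keys)
print(missing_in_second)
```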

### 7. Extract a list of entry titles.

In [9]:
titles = [entry.title for entry in reddit.entries]
print(titles)

['Four short links: 8 April 2020', 'Four short links: 7 April 2020', 'Governance and Discovery', 'Four short links: 6 April 2020', 'Four short links: 3 April 2020', 'Four short links: 2 April 2020', 'Four short links: 1 April 2020', 'Four short links: 31 March 2020', 'What you need to know about product management for AI', 'The unreasonable importance of data preparation', 'Four short links: 24 March 2020', '3 ways to confront modern business challenges', 'An enterprise vision is your company’s North Star', 'Leaders need to mobilize change-ready workforces', 'Great leaders inspire innovation and creativity from within their workforces', 'Strong leaders forge an intersection of knowledge and experience', 'Four short links: 23 March 2020', 'Four short links: 20 March 2020', '6 trends framing the state of AI and ML', 'Four short links: 19 March 2020', 'It’s an unprecedented crisis: 8 things to do right now', 'AI adoption in the enterprise 2020', 'Four short links: 18 March 2020', 'Four sh

### 8. Calculate the percentage of "Four short links" entry titles.

In [10]:
# Share of titles that belong to the "Four short links" series
four_short_count = sum(title.startswith("Four short links") for title in titles)
percentage = four_short_count * 100 / len(titles)
percentage

60.0
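
The same calculation can be wrapped in a small helper that also guards against an empty title list. This is a sketch: `prefix_percentage` is a hypothetical name, and the sample titles are the first four from the list above.

```python
def prefix_percentage(titles, prefix):
    """Return the percentage of titles that start with prefix (0.0 for an empty list)."""
    if not titles:
        return 0.0
    matches = sum(title.startswith(prefix) for title in titles)
    return matches * 100 / len(titles)


sample_titles = [
    'Four short links: 8 April 2020',
    'Four short links: 7 April 2020',
    'Governance and Discovery',
    'Four short links: 6 April 2020',
]

print(prefix_percentage(sample_titles, 'Four short links'))  # prints: 75.0
```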

### 9. Create a Pandas data frame from the feed's entries.

In [11]:
df = pd.DataFrame(reddit.entries)
df

Unnamed: 0,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink
0,Four short links: 8 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 08 Apr 2020 11:48:28 +0000","(2020, 4, 8, 11, 48, 28, 2, 99, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12611,False,System Design for Advanced Beginners &#8212; a...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
1,Four short links: 7 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 07 Apr 2020 11:45:13 +0000","(2020, 4, 7, 11, 45, 13, 1, 98, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12606,False,locust &#8212; open source load testing tool: ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
2,Governance and Discovery,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/governance-and-d...,"Mon, 06 Apr 2020 19:09:29 +0000","(2020, 4, 6, 19, 9, 29, 0, 97, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,{'name': 'Mike Loukides'},"[{'term': 'Radar Column', 'scheme': None, 'lab...",https://www.oreilly.com/radar/?p=12594,False,Data Governance sounds like a candidate for th...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/governance-and-d...,0,https://www.oreilly.com/radar/governance-and-d...
3,Four short links: 6 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Mon, 06 Apr 2020 11:53:01 +0000","(2020, 4, 6, 11, 53, 1, 0, 97, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12590,False,Rufus &#8212; Create bootable USB drives the e...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
4,Four short links: 3 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Fri, 03 Apr 2020 11:59:08 +0000","(2020, 4, 3, 11, 59, 8, 4, 94, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12585,False,The Zero Trust Learning Curve (Palo Alto Netwo...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
5,Four short links: 2 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 02 Apr 2020 15:05:16 +0000","(2020, 4, 2, 15, 5, 16, 3, 93, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12577,False,Imperial College&#8217;s COVID19 Model &#8212;...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
6,Four short links: 1 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 01 Apr 2020 14:05:05 +0000","(2020, 4, 1, 14, 5, 5, 2, 92, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12568,False,Replaying Traffic to Test Proprietary Systems ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
7,Four short links: 31 March 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 31 Mar 2020 14:10:00 +0000","(2020, 3, 31, 14, 10, 0, 1, 91, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12574,False,Medtronic Releases Ventilator Designs &#8212; ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
8,What you need to know about product management...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/what-you-need-to...,"Tue, 31 Mar 2020 10:00:00 +0000","(2020, 3, 31, 10, 0, 0, 1, 91, 0)",[{'name': 'Peter Skomoroch and Mike Loukides'}],Peter Skomoroch and Mike Loukides,{'name': 'Peter Skomoroch and Mike Loukides'},"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=12463,False,If you’re already a software product manager (...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/what-you-need-to...,0,https://www.oreilly.com/radar/what-you-need-to...
9,The unreasonable importance of data preparation,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/the-unreasonable...,"Tue, 24 Mar 2020 10:00:00 +0000","(2020, 3, 24, 10, 0, 0, 1, 84, 0)",[{'name': 'Hugo Bowne-Anderson'}],Hugo Bowne-Anderson,{'name': 'Hugo Bowne-Anderson'},"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=12448,False,In a world focused on buzzword-driven models a...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/the-unreasonable...,0,https://www.oreilly.com/radar/the-unreasonable...
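
The `published_parsed` column holds time tuples rather than datetime objects, which makes date arithmetic awkward. A minimal sketch of the conversion, using the value from the first entry and assuming the tuple is expressed in UTC (consistent with the `+0000` offset in that entry's `published` string):

```python
import time
from datetime import datetime, timezone

# Value copied from the first entry's published_parsed field above.
published_parsed = time.struct_time((2020, 4, 8, 11, 48, 28, 2, 99, 0))

# The first six fields are year, month, day, hour, minute, second.
# Assumption: the tuple is in UTC, so the UTC timezone is attached explicitly.
published_at = datetime(*published_parsed[:6], tzinfo=timezone.utc)

print(published_at.isoformat())  # prints: 2020-04-08T11:48:28+00:00
```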


### 10. Count the number of entries per author and sort them in descending order.

In [13]:
df = df.groupby('author', as_index=False).agg({'title':'count'})
df

Unnamed: 0,author,title
0,Cynthia Owens,1
1,George Fairbanks,1
2,Hugo Bowne-Anderson,1
3,Jenn Webb,4
4,Kai Holnes,1
5,Mac Slocum,1
6,Mark Richards,1
7,Martin Fowler,1
8,Mary Poppendieck,1
9,Mike Loukides,3


In [15]:
df.columns = ['author', 'count']
df.sort_values('count', ascending=False)

Unnamed: 0,author,count
10,Nat Torkington,36
15,Roger Magoulas and Steve Swoyer,4
3,Jenn Webb,4
9,Mike Loukides,3
0,Cynthia Owens,1
14,Rita J. King,1
13,Rachel Laycock and Neal Ford,1
12,Peter Skomoroch and Mike Loukides,1
11,Pamela Rucker,1
8,Mary Poppendieck,1
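
The per-author tally can be cross-checked without a data frame. This sketch uses `collections.Counter` from the standard library on a hypothetical list of author values (one per entry); it is an alternative to the `groupby` approach above, not a replacement for it.

```python
from collections import Counter

# Hypothetical sample of author values, one per entry.
sample_authors = [
    'Nat Torkington', 'Mike Loukides', 'Nat Torkington',
    'Jenn Webb', 'Nat Torkington', 'Jenn Webb',
]

# most_common() returns (author, count) pairs sorted by count, highest first.
author_counts = Counter(sample_authors).most_common()

for author, count in author_counts:
    print(author, count)
```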


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [19]:
# Rebuild the entries frame (df now holds the per-author counts) and measure each title
df = pd.DataFrame(reddit.entries).assign(title_length=lambda d: d['title'].str.len())
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False)
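
A minimal sketch of the title-length ranking this exercise describes, on three titles and authors copied from the entries shown earlier. The frame name `sample` and the column name `title_length` are illustrative choices.

```python
import pandas as pd

# Titles and authors copied from entries shown earlier in this lab.
sample = pd.DataFrame({
    'title': [
        'Four short links: 8 April 2020',
        'Governance and Discovery',
        'The unreasonable importance of data preparation',
    ],
    'author': ['Nat Torkington', 'Mike Loukides', 'Hugo Bowne-Anderson'],
})

# Character count of each title, computed with the vectorized string accessor.
sample['title_length'] = sample['title'].str.len()

# Keep the three requested columns, longest title first.
result = sample[['title', 'author', 'title_length']].sort_values(
    'title_length', ascending=False
)
print(result)
```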


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [21]:
[e.title for e in reddit.entries if 'machine' in e.summary]

['What you need to know about product management for AI',
 'Four short links: 6 March 2020',
 'Radar trends to watch: March 2020',
 'Four short links: 13 February 2020']
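
Matching on the single word `machine` can also catch summaries that never mention machine learning. A stricter check looks for the whole phrase, ignoring case and line breaks. This is a sketch on hypothetical entries: `mentions_phrase` and `sample_entries` are illustrative names, and the key-style access `e['summary']` also works on the parsed entries.

```python
def mentions_phrase(text, phrase='machine learning'):
    """Case-insensitive phrase check that tolerates line breaks and repeated spaces."""
    normalized = ' '.join(text.split()).lower()
    return phrase.lower() in normalized


# Hypothetical entries standing in for reddit.entries.
sample_entries = [
    {'title': 'A', 'summary': 'Notes on Machine\nLearning in production'},
    {'title': 'B', 'summary': 'A machine shop tour'},
    {'title': 'C', 'summary': 'Why machine learning needs clean data'},
]

matching_titles = [e['title'] for e in sample_entries if mentions_phrase(e['summary'])]
print(matching_titles)  # prints: ['A', 'C']
```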