# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [3]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [4]:
x=feedparser.parse(url)

x.feed.title

'Radar'

### 2. Obtain a list of components (keys) that are available for this feed.

In [5]:
k=list(x.keys())
k

['feed',
 'entries',
 'bozo',
 'headers',
 'etag',
 'updated',
 'updated_parsed',
 'href',
 'status',
 'encoding',
 'version',
 'namespaces']
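
Two of these keys are worth a look before trusting the rest of the result: `bozo` is set when feedparser ran into a problem while parsing (typically malformed XML), and `status` holds the HTTP status code when the feed was fetched over HTTP. The check below is only a sketch. The helper name `feed_looks_ok` is hypothetical, and it is written against any dict-like object so it can be tried without a network connection.

```python
def feed_looks_ok(parsed):
    """Return True if a feedparser-style result looks usable.

    `parsed` is any mapping with the top-level keys listed above.
    The name `feed_looks_ok` is illustrative, not part of feedparser.
    """
    # bozo is truthy when the parser flagged a problem with the feed
    if parsed.get("bozo"):
        return False
    # status is only present for feeds fetched over HTTP
    status = parsed.get("status")
    if status is not None and status >= 400:
        return False
    return True


# Stand-in dictionaries used in place of a real feedparser.parse() result
print(feed_looks_ok({"bozo": 0, "status": 200, "entries": []}))  # True
print(feed_looks_ok({"bozo": 1, "status": 200, "entries": []}))  # False
```

With the objects from this lab, the call would be `feed_looks_ok(x)`.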

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [7]:
k_feed=list(x["feed"].keys())
k_feed

['title',
 'title_detail',
 'links',
 'link',
 'subtitle',
 'subtitle_detail',
 'updated',
 'updated_parsed',
 'language',
 'sy_updateperiod',
 'sy_updatefrequency',
 'generator_detail',
 'generator',
 'feedburner_info',
 'geo_lat',
 'geo_long',
 'feedburner_emailserviceid',
 'feedburner_feedburnerhostname']

### 4. Extract and print the feed title, subtitle, author, and link.

In [13]:
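title = x.feed.title
sub = x.feed.subtitle
# The feed has no 'author' key (see step 3), so collect the distinct entry authors instead
auth = list(set([a.author for a in x.entries]))
# These are the entry links; the feed's own link is stored in x.feed.link
links = list(set([l.link for l in x.entries]))

print(title + " " + sub)
print(auth)
print(links)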

Radar Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
['Kai Holnes', 'Rita J. King', 'Mike Loukides', 'Jenn Webb', 'Mary Poppendieck', 'Rachel Laycock and Neal Ford', 'Tim O’Reilly', 'Mac Slocum', 'Nat Torkington', 'Martin Fowler', 'Peter Skomoroch and Mike Loukides', 'Mark Richards', 'Pamela Rucker', 'George Fairbanks', 'Hugo Bowne-Anderson', 'Roger Magoulas and Steve Swoyer', 'Cynthia Owens']
['http://feedproxy.google.com/~r/oreilly/radar/atom/~3/r2v6X5DEj18/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/_F37wbLQBc4/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/oihNdyY6L8c/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/WTB4fwfK6uY/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/OWHdXEwPypI/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/7OrytSiwf90/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/NPdxqC3wq2E/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/oDUD3rzhH90/'
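
The feed keys listed in step 3 do not include `author`, so that field cannot be read directly from the feed object here. A more defensive lookup could use `.get()` with a fallback, as in the sketch below. The helper `describe_feed` and the placeholder text are assumptions for illustration, and a plain dictionary with made-up values stands in for `x.feed`.

```python
def describe_feed(feed, missing="(not provided)"):
    """Collect feed-level metadata, tolerating absent keys.

    `describe_feed` is a hypothetical helper; `feed` is any dict-like object.
    """
    fields = ("title", "subtitle", "author", "link")
    return {name: feed.get(name, missing) for name in fields}


# Stand-in for x.feed; the values are placeholders, not real feed data
sample_feed = {
    "title": "Example feed",
    "subtitle": "An example subtitle",
    "link": "https://example.com/",
}
for name, value in describe_feed(sample_feed).items():
    print(f"{name}: {value}")
```

Because the result of `feedparser.parse` is dict-like, `describe_feed(x.feed)` should work the same way on the real feed.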

### 5. Count the number of entries that are contained in this RSS feed.

In [14]:
len(x.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index into the list of entries first, then request the keys of that single entry.*

In [15]:
list(x.entries[0].keys())

['title',
 'title_detail',
 'links',
 'link',
 'comments',
 'published',
 'published_parsed',
 'authors',
 'author',
 'author_detail',
 'tags',
 'id',
 'guidislink',
 'summary',
 'summary_detail',
 'content',
 'wfw_commentrss',
 'slash_comments',
 'feedburner_origlink']

### 7. Extract a list of entry titles.

In [16]:
entrys= list(set([t.title for t in x.entries]))
print(entrys)

['Four short links: 31 March 2020', 'Four short links: 12 March 2020', 'Four short links: 3 March 2020', 'What you need to know about product management for AI', 'Highlights from the O’Reilly Software Architecture Conference in New York 2020', 'Four short links: 9 March 2020', 'The elephant in the architecture', '5 key areas for tech leaders to watch in 2020', 'Four short links: 5 March 2020', 'Four short links: 16 March 2020', 'Four short links: 2 April 2020', 'Governance and Discovery', 'Four short links: 13 March 2020', 'Great leaders inspire innovation and creativity from within their workforces', 'Remembering Freeman Dyson', 'Four short links: 3 April 2020', 'Four short links: 19 February 2020', 'Radar trends to watch: March 2020', 'Four short links: 6 March 2020', 'Four short links: 26 February 2020', 'Strong leaders forge an intersection of knowledge and experience', 'Four short links: 19 March 2020', 'Four short links: 14 February 2020', 'Sometimes I draw', 'Four short links: 2

### 8. Calculate the percentage of "Four short links" entry titles.

In [18]:
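# Unique titles that start with "Four short links"
four_short = {t.title for t in x.entries if t.title.startswith("Four short links")}
# Share of those titles among all unique entry titles, as a percentage
links_4 = 100 * len(four_short) / len(entrys)
links_4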

61.666666666666664
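
The same calculation can be wrapped in a small reusable function. The sketch below assumes a hypothetical helper named `percent_with_prefix` and uses made-up titles. Unlike the cell above, it counts every title it is given rather than unique titles, so the two approaches agree only when no title is repeated.

```python
def percent_with_prefix(titles, prefix):
    """Percentage of titles that start with `prefix` (0.0 for an empty input)."""
    titles = list(titles)
    if not titles:
        return 0.0
    matches = sum(1 for title in titles if title.startswith(prefix))
    return 100 * matches / len(titles)


# Made-up titles for illustration only
sample_titles = [
    "Four short links: sample A",
    "Four short links: sample B",
    "Some other article",
    "Another article",
]
print(percent_with_prefix(sample_titles, "Four short links"))  # 50.0
```

For the lab's feed, the call would be `percent_with_prefix([t.title for t in x.entries], "Four short links")`.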

### 9. Create a Pandas data frame from the feed's entries.

In [19]:
import pandas as pd

In [20]:
df = pd.DataFrame(x.entries)
df.head()

Unnamed: 0,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink
0,Four short links: 10 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Fri, 10 Apr 2020 11:33:40 +0000","(2020, 4, 10, 11, 33, 40, 4, 101, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12618,False,FairMOT &#8212; one-shot multi-object tracking...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
1,Four short links: 9 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 09 Apr 2020 11:42:34 +0000","(2020, 4, 9, 11, 42, 34, 3, 100, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12614,False,The Fuzzy Edges of Character Encoding &#8212; ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
2,Four short links: 8 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 08 Apr 2020 11:48:28 +0000","(2020, 4, 8, 11, 48, 28, 2, 99, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12611,False,System Design for Advanced Beginners &#8212; a...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
3,Four short links: 7 April 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 07 Apr 2020 11:45:13 +0000","(2020, 4, 7, 11, 45, 13, 1, 98, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12606,False,locust &#8212; open source load testing tool: ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
4,Governance and Discovery,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/governance-and-d...,"Mon, 06 Apr 2020 19:09:29 +0000","(2020, 4, 6, 19, 9, 29, 0, 97, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,{'name': 'Mike Loukides'},"[{'term': 'Radar Column', 'scheme': None, 'lab...",https://www.oreilly.com/radar/?p=12594,False,Data Governance sounds like a candidate for th...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/governance-and-d...,0,https://www.oreilly.com/radar/governance-and-d...
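
The `published_parsed` column holds nine-element time tuples, which are awkward to sort or filter on. One possible conversion to timezone-aware `datetime` objects is sketched below. It assumes the tuples are expressed in UTC, which is how feedparser documents its `*_parsed` values, and `to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_datetime(parsed_time):
    """Convert a nine-element time tuple (assumed UTC) to an aware datetime."""
    # The first six fields are year, month, day, hour, minute, second
    return datetime(*parsed_time[:6], tzinfo=timezone.utc)


# Tuple copied from the first row of the table above
first_published = (2020, 4, 10, 11, 33, 40, 4, 101, 0)
print(to_datetime(first_published).isoformat())  # 2020-04-10T11:33:40+00:00
```

Applied to the data frame, this could look like `df["published_parsed"].apply(to_datetime)`.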


### 10. Count the number of entries per author and sort them in descending order.

In [21]:
df["author"].value_counts()

Nat Torkington                       37
Jenn Webb                             4
Mike Loukides                         3
Roger Magoulas and Steve Swoyer       3
Hugo Bowne-Anderson                   1
Mac Slocum                            1
Rita J. King                          1
Rachel Laycock and Neal Ford          1
Mary Poppendieck                      1
Tim O’Reilly                          1
Martin Fowler                         1
Peter Skomoroch and Mike Loukides     1
Mark Richards                         1
Pamela Rucker                         1
George Fairbanks                      1
Cynthia Owens                         1
Kai Holnes                            1
Name: author, dtype: int64
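
The same tally can be produced without pandas by using `collections.Counter` from the standard library. This is a sketch only: `entries_per_author` is a hypothetical helper, the sample entries are made up, and entries without an author are grouped under an assumed `"(unknown)"` label.

```python
from collections import Counter


def entries_per_author(entries):
    """Return (author, count) pairs, most frequent author first."""
    counts = Counter(entry.get("author", "(unknown)") for entry in entries)
    # most_common() sorts by count in descending order
    return counts.most_common()


# Made-up entries standing in for x.entries
sample_entries = [
    {"author": "Author A"},
    {"author": "Author B"},
    {"author": "Author A"},
    {},
]
for author, count in entries_per_author(sample_entries):
    print(author, count)
```

Since each feedparser entry is dict-like, `entries_per_author(x.entries)` should give the same counts as `value_counts()` whenever every entry has an author.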

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [22]:
# Number of characters in each entry title
df["Length"] = df["title"].str.len()
# Keep only the requested columns and put the longest titles first
df2 = df[["title", "author", "Length"]].sort_values(by="Length", ascending=False)
df2.head(10)

Unnamed: 0,title,author,Length
45,Highlights from the O’Reilly Software Architec...,Mac Slocum,78
16,Great leaders inspire innovation and creativit...,Jenn Webb,76
54,10 ways to get untapped talent in your organiz...,Pamela Rucker,65
17,Strong leaders forge an intersection of knowle...,Jenn Webb,64
22,It’s an unprecedented crisis: 8 things to do r...,Cynthia Owens,54
10,What you need to know about product management...,Peter Skomoroch and Mike Loukides,53
14,An enterprise vision is your company’s North Star,Jenn Webb,49
15,Leaders need to mobilize change-ready workforces,Jenn Webb,48
11,The unreasonable importance of data preparation,Hugo Bowne-Anderson,47
56,5 key areas for tech leaders to watch in 2020,Roger Magoulas and Steve Swoyer,45


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [31]:
[t.title for t in x.entries if "machine learning" in t.summary.lower()]

['What you need to know about product management for AI']
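
The list comprehension above assumes every entry has a `summary`. A slightly more defensive variant is sketched below: `titles_mentioning` is a hypothetical helper that treats a missing summary as empty text, and the sample entries are made up for illustration.

```python
def titles_mentioning(entries, phrase):
    """Titles of entries whose summary contains `phrase`, ignoring case."""
    needle = phrase.lower()
    return [
        entry.get("title", "")
        for entry in entries
        # A missing summary is treated as empty text instead of raising an error
        if needle in entry.get("summary", "").lower()
    ]


# Made-up entries standing in for x.entries
sample_entries = [
    {"title": "Post one", "summary": "An intro to Machine Learning pipelines"},
    {"title": "Post two", "summary": "Notes on databases"},
    {"title": "Post three"},
]
print(titles_mentioning(sample_entries, "machine learning"))  # ['Post one']
```

On the lab's feed, the equivalent call would be `titles_mentioning(x.entries, "machine learning")`.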