# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [4]:
d = feedparser.parse(url)
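
Before relying on `d`, it can help to check whether the parse went cleanly. feedparser sets the `bozo` flag when a feed is not well-formed and, for feeds fetched over HTTP, stores the response code under `status`. The helper below is a minimal sketch: `parse_looks_clean` is a hypothetical name, and the plain dictionaries are stand-ins for the object returned by `feedparser.parse`, not real output from this feed.

```python
def parse_looks_clean(parsed):
    """Return True if a feedparser-style result shows no obvious problems.

    `parsed` can be any dict-like object, including a real feedparser result.
    """
    # bozo is truthy when the feed was not well-formed.
    if parsed.get("bozo"):
        return False
    # status is only present when the feed was fetched over HTTP.
    status = parsed.get("status")
    if status is not None and status >= 400:
        return False
    return True


# Stand-in results with assumed values.
print(parse_looks_clean({"bozo": 0, "status": 200, "entries": []}))
print(parse_looks_clean({"bozo": 1, "entries": []}))
```

A truthy `bozo` does not always mean the entries are unusable, so treat a `False` result as a prompt to inspect the feed rather than a hard failure.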

### 2. Obtain a list of components (keys) that are available for this feed.

In [14]:
print(list(d.keys()))

['bozo', 'entries', 'feed', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces']


### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [23]:
print(list(d.feed.keys()))

['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname']


### 4. Extract and print the feed title, subtitle, author, and link.

In [60]:
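# The feed-level keys listed in step 3 include no 'author', so only the title, subtitle, and link are printed here.
print(f"Feed title: {d['feed']['title']}\nFeed subtitle: {d['feed']['subtitle']}\nFeed link: {d['feed']['link']}")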

Feed title: Radar
Feed subtitle: Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
Feed link: https://www.oreilly.com/radar
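
Because not every feed supplies every field, one option is to look fields up with `.get()` and a fallback instead of indexing directly. The sketch below assumes a hypothetical helper name, `feed_summary`, and uses a plain dictionary built from the values printed above in place of `d.feed`.

```python
def feed_summary(feed):
    """Build a printable summary of a feed, tolerating missing keys."""
    fields = ["title", "subtitle", "author", "link"]
    lines = []
    for field in fields:
        # Fall back to a placeholder when the feed does not provide the field.
        value = feed.get(field, "(not provided)")
        lines.append(f"Feed {field}: {value}")
    return "\n".join(lines)


# Stand-in for d.feed, using the values shown in the output above.
sample_feed = {
    "title": "Radar",
    "subtitle": "Now, next, and beyond: Tracking need-to-know trends "
                "at the intersection of business and technology",
    "link": "https://www.oreilly.com/radar",
}
print(feed_summary(sample_feed))
```

The same function should accept `d.feed` directly, since feedparser's result objects behave like dictionaries.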


In [55]:
for i, entry in enumerate(d.entries, start=1):
    print(f"{i}. Feed title: {entry['title']}, Feed author: {entry['author']}, Feed link: {entry['link']}")

1. Feed title: Remote Teams in ML/AI, Feed author: Q McCallum, Feed link: https://www.oreilly.com/radar/remote-teams-in-ml-ai/
2. Feed title: Radar trends to watch: November 2021, Feed author: Mike Loukides, Feed link: https://www.oreilly.com/radar/radar-trends-to-watch-november-2021/
3. Feed title: The Sobering Truth About the Impact of Your Business Ideas, Feed author: Eric Colson, Daragh Sibley and Dave Spiegel, Feed link: https://www.oreilly.com/radar/the-sobering-truth-about-the-impact-of-your-business-ideas/
4. Feed title: MLOps and DevOps: Why Data Makes It Different, Feed author: Ville Tuulos and Hugo Bowne-Anderson, Feed link: https://www.oreilly.com/radar/mlops-and-devops-why-data-makes-it-different/
5. Feed title: The Quality of Auto-Generated Code, Feed author: Mike Loukides and Kevlin Henney, Feed link: https://www.oreilly.com/radar/the-quality-of-auto-generated-code/
6. Feed title: Radar trends to watch: October 2021, Feed author: Mike Loukides, Feed link: https://www.ore

### 5. Count the number of entries that are contained in this RSS feed.

In [61]:
len(d.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [73]:
print(list(d.entries[0].keys()))

['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments']


### 7. Extract a list of entry titles.

In [92]:
entry_titles = [entry['title'] for entry in d.entries]
entry_titles

['Remote Teams in ML/AI',
 'Radar trends to watch: November 2021',
 'The Sobering Truth About the Impact of Your Business Ideas',
 'MLOps and DevOps: Why Data Makes It Different',
 'The Quality of Auto-Generated Code',
 'Radar trends to watch: October 2021',
 'Ethical Social Media: Oxymoron or Attainable Goal?',
 '2021 Data/AI Salary Survey',
 'Radar trends to watch: September 2021',
 'Rebranding Data',
 'A Way Forward with Communal Computing',
 'Defending against ransomware is all about the basics',
 'Radar trends to watch: August 2021',
 'Communal Computing’s Many Problems',
 'Thinking About Glue',
 'Radar trends to watch: July 2021',
 'Hand Labeling Considered Harmful',
 'Two economies. Two sets of rules.',
 'Communal Computing',
 'Code as Infrastructure',
 'Radar trends to watch: June 2021',
 'AI Powered Misinformation and Manipulation at Scale #GPT-3',
 'DeepCheapFakes',
 'Radar trends to watch: May 2021',
 'Checking Jeff Bezos’s Math',
 'AI Adoption in the Enterprise 2021',
 'NFT

### 8. Calculate the percentage of "Four short links" entry titles.

In [82]:
four_short_links = [title for title in entry_titles if "Four short links" in title]
four_short_links

['Four short links: 14 Dec 2020',
 'Four short links: 8 Dec 2020',
 'Four short links: 4 Dec 2020',
 'Four short links: 1 Dec 2020',
 'Four short links: 27 Nov 2020',
 'Four short links: 24 Nov 2020',
 'Four short links: 20 Nov 2020',
 'Four short links: 17 Nov 2020',
 'Four short links: 13 Nov 2020',
 'Four short links: 10 November 2020',
 'Four short links: 6 Nov 2020',
 'Four short links: 4 Nov 2020',
 'Four short links: 30 Oct 2020']

In [93]:
f'"Four short links" titles make up {round(100*len(four_short_links)/len(entry_titles), 2)}% of all entry titles'

'"Four short links" titles make up 21.67% of all entry titles'
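
The calculation above can be wrapped in a small reusable function. This is a minimal sketch: `share_of_titles` is a hypothetical name, and the title list is a synthetic stand-in that mirrors the counts in this lab (13 matches out of 60 entries).

```python
def share_of_titles(titles, phrase):
    """Return the percentage of titles containing `phrase`, rounded to 2 places."""
    if not titles:
        return 0.0  # avoid dividing by zero on an empty feed
    matches = sum(1 for title in titles if phrase in title)
    return round(100 * matches / len(titles), 2)


# Synthetic stand-in: 13 matching titles and 47 placeholder titles.
titles = ["Four short links: 1 Dec 2020"] * 13 + ["Some other title"] * 47
print(share_of_titles(titles, "Four short links"))  # 21.67
```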

### 9. Create a Pandas data frame from the feed's entries.

In [94]:
import pandas as pd

In [118]:
df = pd.DataFrame(d.entries)
df.head(10)

| | title | title_detail | links | link | comments | published | published_parsed | authors | author | author_detail | tags | id | guidislink | summary | summary_detail | content | wfw_commentrss | slash_comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Remote Teams in ML/AI | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/remote-teams-in-... | https://www.oreilly.com/radar/remote-teams-in-... | Tue, 09 Nov 2021 14:05:48 +0000 | (2021, 11, 9, 14, 5, 48, 1, 313, 0) | [{'name': 'Q McCallum'}] | Q McCallum | {'name': 'Q McCallum'} | [{'term': 'Building a data culture', 'scheme':... | https://www.oreilly.com/radar/?p=14075 | False | I&#8217;m well-versed in the ups and downs of ... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/remote-teams-in-... | 0 |
| 1 | Radar trends to watch: November 2021 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/radar-trends-to-... | https://www.oreilly.com/radar/radar-trends-to-... | Tue, 02 Nov 2021 11:40:17 +0000 | (2021, 11, 2, 11, 40, 17, 1, 306, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Radar Trends', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=14066 | False | While October’s news was dominated by Facebook... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/radar-trends-to-... | 0 |
| 2 | The Sobering Truth About the Impact of Your Bu... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/the-sobering-tru... | https://www.oreilly.com/radar/the-sobering-tru... | Tue, 26 Oct 2021 13:07:58 +0000 | (2021, 10, 26, 13, 7, 58, 1, 299, 0) | [{'name': 'Eric Colson, Daragh Sibley and Dave... | Eric Colson, Daragh Sibley and Dave Spiegel | {'name': 'Eric Colson, Daragh Sibley and Dave ... | [{'term': 'Business', 'scheme': None, 'label':... | https://www.oreilly.com/radar/?p=14041 | False | The introduction of data science into the busi... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/the-sobering-tru... | 0 |
| 3 | MLOps and DevOps: Why Data Makes It Different | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/mlops-and-devops... | https://www.oreilly.com/radar/mlops-and-devops... | Tue, 19 Oct 2021 14:17:38 +0000 | (2021, 10, 19, 14, 17, 38, 1, 292, 0) | [{'name': 'Ville Tuulos and Hugo Bowne-Anderso... | Ville Tuulos and Hugo Bowne-Anderson | {'name': 'Ville Tuulos and Hugo Bowne-Anderson'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=14018 | False | Much has been written about struggles of deplo... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/mlops-and-devops... | 0 |
| 4 | The Quality of Auto-Generated Code | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/the-quality-of-a... | https://www.oreilly.com/radar/the-quality-of-a... | Tue, 12 Oct 2021 13:45:10 +0000 | (2021, 10, 12, 13, 45, 10, 1, 285, 0) | [{'name': 'Mike Loukides and Kevlin Henney'}] | Mike Loukides and Kevlin Henney | {'name': 'Mike Loukides and Kevlin Henney'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=14007 | False | Kevlin Henney and I were riffing on some ideas... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/the-quality-of-a... | 0 |
| 5 | Radar trends to watch: October 2021 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/radar-trends-to-... | https://www.oreilly.com/radar/radar-trends-to-... | Tue, 05 Oct 2021 11:42:52 +0000 | (2021, 10, 5, 11, 42, 52, 1, 278, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Radar Trends', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=14000 | False | The unwilling star of this month’s trends is c... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/radar-trends-to-... | 0 |
| 6 | Ethical Social Media: Oxymoron or Attainable G... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/ethical-social-m... | https://www.oreilly.com/radar/ethical-social-m... | Tue, 21 Sep 2021 11:55:27 +0000 | (2021, 9, 21, 11, 55, 27, 1, 264, 0) | [{'name': 'Mike Barlow'}] | Mike Barlow | {'name': 'Mike Barlow'} | [{'term': 'Social Media', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=13981 | False | Humans have wrestled with ethics for millennia... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/ethical-social-m... | 0 |
| 7 | 2021 Data/AI Salary Survey | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/2021-data-ai-sal... | https://www.oreilly.com/radar/2021-data-ai-sal... | Wed, 15 Sep 2021 11:32:26 +0000 | (2021, 9, 15, 11, 32, 26, 2, 258, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=13950 | False | In June 2021, we asked the recipients of our&#... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/2021-data-ai-sal... | 0 |
| 8 | Radar trends to watch: September 2021 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/radar-trends-to-... | https://www.oreilly.com/radar/radar-trends-to-... | Wed, 01 Sep 2021 12:18:33 +0000 | (2021, 9, 1, 12, 18, 33, 2, 244, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Radar Trends', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=13943 | False | Let’s start with a moment of silence for O’Rei... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/radar-trends-to-... | 0 |
| 9 | Rebranding Data | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/rebranding-data/ | https://www.oreilly.com/radar/rebranding-data/... | Tue, 24 Aug 2021 14:16:28 +0000 | (2021, 8, 24, 14, 16, 28, 1, 236, 0) | [{'name': 'Q McCallum'}] | Q McCallum | {'name': 'Q McCallum'} | [{'term': 'Data', 'scheme': None, 'label': Non... | https://www.oreilly.com/radar/?p=13932 | False | There&#8217;s a flavor of puzzle in which you ... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/rebranding-data/... | 0 |
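
The `published` column holds date strings and `published_parsed` holds UTC time tuples, so neither is a `datetime` yet. The sketch below shows two standard-library ways to convert them, using the values from row 0 of the table above. `struct_to_datetime` is a hypothetical helper name, and the `time.struct_time` value is constructed by hand here rather than taken from a live feed.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def struct_to_datetime(parsed_time):
    """Convert a UTC time tuple (year, month, day, hour, minute, second, ...)
    into a timezone-aware datetime."""
    return datetime(*parsed_time[:6], tzinfo=timezone.utc)


# Values copied from row 0 of the data frame.
published = "Tue, 09 Nov 2021 14:05:48 +0000"
published_parsed = time.struct_time((2021, 11, 9, 14, 5, 48, 1, 313, 0))

from_tuple = struct_to_datetime(published_parsed)
from_string = parsedate_to_datetime(published)  # parses RFC 2822 style dates

print(from_tuple.isoformat())  # 2021-11-09T14:05:48+00:00
print(from_tuple == from_string)
```

With real datetimes in place, entries can be filtered or grouped by month without string manipulation.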


### 10. Count the number of entries per author and sort them in descending order.

In [119]:
df['author'].value_counts().sort_values(ascending=False)

Mike Loukides                                  27
Nat Torkington                                 13
                                                3
Chris Butler                                    3
Tim O’Reilly                                    3
Q McCallum                                      2
Eric Colson, Daragh Sibley and Dave Spiegel     1
Shayan Mohanty and Hugo Bowne-Anderson          1
Mike Loukides and Kevlin Henney                 1
Nitesh Dhanjani                                 1
Mike Barlow                                     1
Hugo Bowne-Anderson                             1
Kevlin Henney                                   1
Patrick Hall and Ayoub Ouederni                 1
Ville Tuulos and Hugo Bowne-Anderson            1
Name: author, dtype: int64
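
Two details of this output are worth noting: the row with no name appears to correspond to entries whose `author` field is an empty string, and co-authored entries are counted under the combined string rather than under each person. The sketch below is one possible way to count individual authors using only the standard library. `count_individual_authors` is a hypothetical helper, the splitting rule (commas and the word "and") is an assumption about how this feed formats names, and the sample list is a small stand-in rather than the full `df['author']` column.

```python
import re
from collections import Counter

# Split on commas (optionally followed by "and") or on a standalone "and".
_SEPARATOR = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")


def count_individual_authors(author_fields):
    """Count entries per individual author, splitting combined author strings."""
    counts = Counter()
    for field in author_fields:
        names = [name.strip() for name in _SEPARATOR.split(field)]
        names = [name for name in names if name]
        if not names:
            counts["(no author)"] += 1  # label empty author fields explicitly
        counts.update(names)
    return counts


# Small stand-in sample using author strings that appear in the output above.
authors = [
    "Mike Loukides",
    "Mike Loukides and Kevlin Henney",
    "Kevlin Henney",
    "",
    "Eric Colson, Daragh Sibley and Dave Spiegel",
]
counts = count_individual_authors(authors)
print(counts.most_common())
```

Passing `df['author']` should work as well, since iterating over a pandas Series yields its values.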

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [120]:
df['title_length'] = df['title'].str.len()
df_new = df[['title', 'author', 'title_length']].sort_values(by='title_length', ascending=False)
df_new.head()

| | title | author | title_length |
|---|---|---|---|
| 37 | Where Programming, Ops, AI, and the Cloud are ... | Mike Loukides | 60 |
| 2 | The Sobering Truth About the Impact of Your Bu... | Eric Colson, Daragh Sibley and Dave Spiegel | 58 |
| 21 | AI Powered Misinformation and Manipulation at ... | Nitesh Dhanjani | 58 |
| 34 | 5 infrastructure and operations trends to watc... | | 55 |
| 43 | O’Reilly’s top 20 live online training courses... | | 54 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [129]:
# These are entry titles, so the variable is named ml_titles; the match is case-sensitive.
ml_titles = list(df[df['summary'].str.contains('machine learning')]['title'])
ml_titles

['MLOps and DevOps: Why Data Makes It Different',
 'Hand Labeling Considered Harmful',
 'Radar trends to watch: April 2021',
 'Seven Legal Questions for Data Scientists']
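
The search above is case-sensitive and runs against the raw summary, which can contain HTML tags and entities such as `&#8217;`. The sketch below shows one way to normalize the text first, using only the standard library. `summary_mentions` and `titles_mentioning` are hypothetical helpers, the sample entries are invented for illustration, and a search like this may return a different set of titles than the cell above.

```python
import html
import re


def summary_mentions(summary, phrase):
    """Case-insensitive phrase search on a summary after removing HTML markup."""
    text = html.unescape(summary)         # decode entities such as &#8217;
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag removal, enough for a sketch
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return phrase.lower() in text.lower()


def titles_mentioning(entries, phrase):
    """Return the titles of entries whose summary mentions `phrase`."""
    return [
        entry["title"]
        for entry in entries
        if summary_mentions(entry.get("summary", ""), phrase)
    ]


# Invented stand-in entries, not taken from the feed.
sample_entries = [
    {"title": "Entry A", "summary": "<p>Machine <em>learning</em> in production</p>"},
    {"title": "Entry B", "summary": "I&#8217;m well-versed in databases"},
    {"title": "Entry C"},  # no summary at all
]
print(titles_mentioning(sample_entries, "machine learning"))
```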