# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [4]:
atom = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [5]:
atom.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [6]:
atom.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [10]:
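title = atom.feed.title
subtitle = atom.feed.subtitle
# The feed-level keys from step 3 include no 'author', so the first entry's author is used instead.
author = atom.entries[0]['author']
# Note: this is the first entry's link; the feed's own link is available as atom.feed.link.
link = atom.entries[0]['link']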

print(title)
print(subtitle)
print(author)
print(link)

Radar
Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
Nat Torkington
http://feedproxy.google.com/~r/oreilly/radar/atom/~3/QTPIfa1h-uM/
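
The feed-level keys listed in step 3 do not include `author`, which is why the cell above reads the author from the first entry. The following is a minimal sketch of a lookup that falls back explicitly, assuming the parsed feed and its entries behave like dictionaries. `first_available` is a hypothetical helper, and `sample_feed` and `sample_entries` are stand-in data (with a placeholder URL), not a live fetch.

```python
# Stand-in data shaped like atom.feed and atom.entries (plain dicts, no network access).
sample_feed = {
    "title": "Radar",
    "link": "https://example.com/radar",  # placeholder URL, not the real feed link
}
sample_entries = [{"author": "Nat Torkington"}]


def first_available(key, feed, entries, default="unknown"):
    """Return feed[key] if present, else the first entry value found, else default."""
    if key in feed:
        return feed[key]
    for entry in entries:
        if key in entry:
            return entry[key]
    return default


author = first_available("author", sample_feed, sample_entries)
link = first_available("link", sample_feed, sample_entries)
print(author)
print(link)
```

With the real feed, the same helper could be called as `first_available('author', atom.feed, atom.entries)`.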


### 5. Count the number of entries that are contained in this RSS feed.

In [11]:
print(len(atom.entries))

60


### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [12]:
components = list(atom.entries[0].keys())
components

['title',
 'title_detail',
 'links',
 'link',
 'comments',
 'published',
 'published_parsed',
 'authors',
 'author',
 'author_detail',
 'tags',
 'id',
 'guidislink',
 'summary',
 'summary_detail',
 'content',
 'wfw_commentrss',
 'slash_comments',
 'feedburner_origlink']

### 7. Extract a list of entry titles.

In [13]:
entry_titles = [entry.title for entry in atom.entries]
entry_titles

['Four short links: 22 January 2020',
 'Four short links: 21 January 2020',
 'Four short links: 20 January 2020',
 'Four short links: 17 January 2020',
 'Four short links: 16 January 2020',
 'Reinforcement learning for the real world',
 'Four short links: 15 January 2020',
 'Four short links: 14 January 2020',
 'Where programming languages are headed in 2020',
 'Four short links: 13 January 2020',
 'Four short links: 10 January 2020',
 'Radar trends to watch: January 2020',
 'Four short links: 9 January 2020',
 'Four short links: 8 January 2020',
 '9 additional books for the Next Economy',
 '8 AI trends we’re watching in 2020',
 'Four short links: 7 January 2020',
 'Rethinking programming',
 'Four short links: 6 January 2020',
 'Four short links: 3 January 2020',
 'Four short links: 2 January 2020',
 '10+ books for the Next Economy',
 'Four short links: 1 January 2020',
 'Four short links: 31 December 2019',
 'Four short links: 30 December 2019',
 'Four short links: 27 December 2019',


### 8. Calculate the percentage of "Four short links" entry titles.

In [15]:
four_short_links = [i for i in entry_titles if 'Four short links' in i]
percentage = len(four_short_links)/len(entry_titles)*100
print(round(percentage, 2), '%')

73.33 %
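
The same calculation can be wrapped in a small function that guards against an empty feed and ignores letter case. This is only a sketch: `share_of_titles` is a hypothetical helper, and `sample_titles` is a three-item stand-in for `entry_titles`.

```python
def share_of_titles(titles, prefix):
    """Percentage of titles that start with prefix, ignoring case."""
    if not titles:
        return 0.0  # avoid dividing by zero on an empty feed
    wanted = prefix.lower()
    matches = sum(1 for title in titles if title.lower().startswith(wanted))
    return round(matches / len(titles) * 100, 2)


sample_titles = [
    "Four short links: 22 January 2020",
    "Four short links: 21 January 2020",
    "Reinforcement learning for the real world",
]
print(share_of_titles(sample_titles, "four short links"))  # 2 of 3 titles -> 66.67
```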


### 9. Create a Pandas data frame from the feed's entries.

In [16]:
import pandas as pd

In [17]:
entries = atom.entries
df = pd.DataFrame(entries)
df

Unnamed: 0,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink
0,Four short links: 22 January 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 22 Jan 2020 05:01:00 +0000","(2020, 1, 22, 5, 1, 0, 2, 22, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11535,False,Elements of Scheduling &#8212; notable for sev...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
1,Four short links: 21 January 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 21 Jan 2020 05:01:00 +0000","(2020, 1, 21, 5, 1, 0, 1, 21, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11531,False,Cytoscape &#8212; an open source software plat...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
2,Four short links: 20 January 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Mon, 20 Jan 2020 05:01:00 +0000","(2020, 1, 20, 5, 1, 0, 0, 20, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11525,False,AR Contact Lens &#8212; The path ahead is not ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
3,Four short links: 17 January 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Fri, 17 Jan 2020 05:01:00 +0000","(2020, 1, 17, 5, 1, 0, 4, 17, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11519,False,cursedfs &#8212; Make a disk image formatted w...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
4,Four short links: 16 January 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 16 Jan 2020 05:01:00 +0000","(2020, 1, 16, 5, 1, 0, 3, 16, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11515,False,Zero Trust Architecture Principles &#8212; Ten...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
5,Reinforcement learning for the real world,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/reinforcement-le...,"Wed, 15 Jan 2020 11:00:00 +0000","(2020, 1, 15, 11, 0, 0, 2, 15, 0)",[{'name': 'Jenn Webb'}],Jenn Webb,{'name': 'Jenn Webb'},"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=11335,False,Roger Magoulas recently sat down with Edward J...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/reinforcement-le...,0,https://www.oreilly.com/radar/reinforcement-le...
6,Four short links: 15 January 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 15 Jan 2020 05:01:00 +0000","(2020, 1, 15, 5, 1, 0, 2, 15, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11511,False,Performance Degradation and Restoration During...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
7,Four short links: 14 January 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 14 Jan 2020 05:01:00 +0000","(2020, 1, 14, 5, 1, 0, 1, 14, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11506,False,The 2019 Privacy Legislation Bomb Cyclone &#82...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
8,Where programming languages are headed in 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/where-programmin...,"Mon, 13 Jan 2020 11:30:00 +0000","(2020, 1, 13, 11, 30, 0, 0, 13, 0)",[{'name': 'Zan McQuade and Amanda Quinn'}],Zan McQuade and Amanda Quinn,{'name': 'Zan McQuade and Amanda Quinn'},"[{'term': 'Innovation & Disruption', 'scheme':...",https://www.oreilly.com/radar/?p=11305,False,"As we enter a new decade, we asked programming...","{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/where-programmin...,0,https://www.oreilly.com/radar/where-programmin...
9,Four short links: 13 January 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Mon, 13 Jan 2020 05:01:00 +0000","(2020, 1, 13, 5, 1, 0, 0, 13, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11495,False,Simulated Customer &#8212; The site will rando...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
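
The `published_parsed` column holds time tuples rather than datetimes. As a sketch, assuming those tuples are in UTC (which is how feedparser documents its `*_parsed` fields), one value can be converted with the standard library alone. The tuple below is copied from the first row above.

```python
import time
from datetime import datetime, timezone

# Values copied from the first row's published_parsed column.
published_parsed = time.struct_time((2020, 1, 22, 5, 1, 0, 2, 22, 0))

# The first six fields are year, month, day, hour, minute, second.
published = datetime(*published_parsed[:6], tzinfo=timezone.utc)
print(published.isoformat())  # 2020-01-22T05:01:00+00:00
```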


### 10. Count the number of entries per author and sort them in descending order.

In [18]:
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

Unnamed: 0,author,entries
5,Nat Torkington,44
3,Mike Loukides,4
2,Jenn Webb,3
0,,2
1,Alison McCauley,1
4,Mike Loukides and Ben Lorica,1
6,Pamela Rucker,1
7,Patrick Hall and Andrew Burt,1
8,Roger Magoulas,1
9,"Sunil Ranka, Roger Magoulas and Steve Swoyer",1
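
For comparison, the same count can be produced without pandas using `collections.Counter`. This is a sketch on stand-in data: `sample_entries` contains four entries taken from the outputs above, and the `"(no author)"` label for blank authors is an assumption made for readability.

```python
from collections import Counter

# Stand-in for atom.entries: plain dicts with the two fields used here.
sample_entries = [
    {"title": "Four short links: 22 January 2020", "author": "Nat Torkington"},
    {"title": "Four short links: 21 January 2020", "author": "Nat Torkington"},
    {"title": "Reinforcement learning for the real world", "author": "Jenn Webb"},
    {"title": "9 additional books for the Next Economy", "author": ""},
]

# Blank or missing authors are grouped under one explicit label.
author_counts = Counter(
    entry.get("author") or "(no author)" for entry in sample_entries
)

# most_common() returns (author, count) pairs, largest count first.
for author, count in author_counts.most_common():
    print(author, count)
```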


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [19]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False)

Unnamed: 0,title,author,title_length
35,5 industries that demonstrate how blockchains ...,Alison McCauley,63
38,Why you should care about debugging machine le...,Patrick Hall and Andrew Burt,59
59,Why you should care about robotic process auto...,"Sunil Ranka, Roger Magoulas and Steve Swoyer",52
55,Moving AI and ML from research into production,Jenn Webb,46
8,Where programming languages are headed in 2020,Zan McQuade and Amanda Quinn,46
32,AI is computer science disguised as hard work,Jenn Webb,45
5,Reinforcement learning for the real world,Jenn Webb,41
48,Use your people as competitive advantage,Pamela Rucker,40
14,9 additional books for the Next Economy,,39
45,Radar trends to watch: December 2019,Mike Loukides,36


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [20]:
text = "machine learning"
# Filter entries directly so that entries sharing a title are not merged.
machine_learning = [entry.title for entry in atom.entries if text in entry.summary]
machine_learning

['Four short links: 13 January 2020',
 'Why you should care about debugging machine learning models',
 'The road to Software 2.0',
 'Moving AI and ML from research into production']
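
The match above is case-sensitive, so a summary that begins with "Machine learning" would be skipped. Below is a sketch of a case-insensitive variant. `titles_mentioning` is a hypothetical helper, and the three sample entries are invented for illustration rather than taken from the feed.

```python
def titles_mentioning(entries, phrase):
    """Titles of entries whose summary contains phrase, ignoring case."""
    needle = phrase.lower()
    return [
        entry["title"]
        for entry in entries
        if needle in entry.get("summary", "").lower()
    ]


# Invented sample data shaped like atom.entries.
sample_entries = [
    {"title": "Entry A", "summary": "Machine learning in production."},
    {"title": "Entry B", "summary": "Notes on scheduling."},
    {"title": "Entry C", "summary": "Debugging machine learning models."},
]
print(titles_mentioning(sample_entries, "machine learning"))  # ['Entry A', 'Entry C']
```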