# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
reddit1 = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
reddit1.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])
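Not every top-level key is guaranteed. `status` and `etag` come from the HTTP response, so they are typically absent when a feed is parsed from a string or local file, while `bozo` flags malformed input. Because the parsed result is a dictionary subclass, `.get()` with a default avoids `KeyError`s. This is a minimal sketch; the helper name `fetch_summary` and the `offline_result` stand-in are hypothetical.

```python
def fetch_summary(result):
    """Collect a few top-level fields from a feedparser-style result dict.

    HTTP-only fields such as 'status' and 'etag' fall back to None
    when the feed was not fetched over HTTP.
    """
    return {
        "version": result.get("version", ""),
        "status": result.get("status"),
        "etag": result.get("etag"),
        "well_formed": not result.get("bozo", False),
        "entry_count": len(result.get("entries", [])),
    }

# Hypothetical stand-in for a result parsed from a string (no HTTP fields)
offline_result = {"version": "atom10", "bozo": False, "entries": [{}, {}], "feed": {}}
print(fetch_summary(offline_result))
```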

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
reddit1.feed

{'title': "All - O'Reilly Media",
 'title_detail': {'type': 'text/plain',
  'language': None,
  'base': 'http://feeds.feedburner.com/oreilly/radar/atom',
  'value': "All - O'Reilly Media"},
 'id': 'https://www.oreilly.com',
 'guidislink': True,
 'link': 'https://www.oreilly.com',
 'updated': '2019-06-26T15:34:19Z',
 'updated_parsed': time.struct_time(tm_year=2019, tm_mon=6, tm_mday=26, tm_hour=15, tm_min=34, tm_sec=19, tm_wday=2, tm_yday=177, tm_isdst=0),
 'subtitle': 'All of our Ideas and Learning material from all of our topics.',
 'subtitle_detail': {'type': 'text/plain',
  'language': None,
  'base': 'http://feeds.feedburner.com/oreilly/radar/atom',
  'value': 'All of our Ideas and Learning material from all of our topics.'},
 'links': [{'href': 'https://www.oreilly.com',
   'rel': 'alternate',
   'type': 'text/html'},
  {'rel': 'self',
   'type': 'application/atom+xml',
   'href': 'http://feeds.feedburner.com/oreilly/radar/atom'},
  {'rel': 'hub',
   'href': 'http://pubsubhubbub.a

In [6]:
reddit1.feed.keys()

dict_keys(['title', 'title_detail', 'id', 'guidislink', 'link', 'updated', 'updated_parsed', 'subtitle', 'subtitle_detail', 'links', 'authors', 'author_detail', 'author', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])
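Many of these keys come in groups. feedparser pairs a plain value such as `title` with a `title_detail` dictionary (type, language, base, value), and pairs date strings such as `updated` with an `updated_parsed` `time.struct_time`. Grouping the keys by base name makes that structure easier to scan. This is a sketch; `group_feed_keys` is a hypothetical helper, and the key names are copied from the output above.

```python
from collections import defaultdict

def group_feed_keys(keys):
    """Group keys like 'title' and 'title_detail' under a shared base name."""
    groups = defaultdict(list)
    for key in keys:
        base = key
        for suffix in ("_detail", "_parsed"):
            if key.endswith(suffix):
                base = key[: -len(suffix)]
                break
        groups[base].append(key)
    return dict(groups)

# Key names taken from the reddit1.feed.keys() output
feed_keys = ['title', 'title_detail', 'id', 'guidislink', 'link', 'updated',
             'updated_parsed', 'subtitle', 'subtitle_detail', 'links', 'authors',
             'author_detail', 'author']
for base, members in group_feed_keys(feed_keys).items():
    print(base, members)
```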

### 4. Extract and print the feed title, subtitle, author, and link.

In [7]:
reddit1.feed.title

"All - O'Reilly Media"

In [8]:
reddit1.feed.subtitle

'All of our Ideas and Learning material from all of our topics.'

In [9]:
reddit1.feed.author

"O'Reilly Media"

In [10]:
reddit1.feed.link

'https://www.oreilly.com'
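Attribute access such as `reddit1.feed.author` raises an `AttributeError` when a feed omits that element, and feeds differ in which optional fields they include. Because the feed object is also a dictionary, `.get()` with a fallback is a safer way to print all four fields. This is a minimal sketch; `describe_feed` is a hypothetical helper, and the `feed` dictionary is made-up data standing in for `reddit1.feed`.

```python
FIELDS = ("title", "subtitle", "author", "link")

def describe_feed(feed, fields=FIELDS, missing="(not provided)"):
    """Return 'field: value' lines, substituting a placeholder for absent fields."""
    return [f"{name}: {feed.get(name, missing)}" for name in fields]

# Hypothetical feed metadata with no subtitle element
feed = {"title": "Example Feed", "author": "Jane Doe", "link": "https://example.com/"}
for line in describe_feed(feed):
    print(line)
```

With the lab's feed, `describe_feed(reddit1.feed)` would list the four values shown above.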

### 5. Count the number of entries that are contained in this RSS feed.

In [11]:
len(reddit1.entries)

60
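A count alone does not show the time window the entries cover. Each entry carries an `updated_parsed` `time.struct_time`, which feedparser normalizes to UTC, so it converts cleanly into a timezone-aware `datetime`. This sketch assumes hypothetical timestamps in the same format as this feed's `updated` values.

```python
import time
from datetime import datetime, timezone

def to_utc_datetime(struct):
    """Convert a time.struct_time (as in feedparser's *_parsed fields) to an aware datetime."""
    return datetime(*struct[:6], tzinfo=timezone.utc)

# Hypothetical 'updated' strings, parsed into struct_time like feedparser exposes them
stamps = ["2019-06-26T11:15:00Z", "2019-06-20T21:55:00Z", "2019-06-24T10:45:00Z"]
parsed = [time.strptime(s, "%Y-%m-%dT%H:%M:%SZ") for s in stamps]

dates = [to_utc_datetime(p) for p in parsed]
span = max(dates) - min(dates)
print(len(dates), "entries spanning", span)   # 3 entries spanning 5 days, 13:20:00
```

Against the lab's feed, `[to_utc_datetime(e.updated_parsed) for e in reddit1.entries]` builds the equivalent list.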

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [12]:
reddit1.entries[0].keys()

dict_keys(['title', 'title_detail', 'updated', 'updated_parsed', 'id', 'guidislink', 'link', 'content', 'summary', 'links', 'authors', 'author_detail', 'author', 'feedburner_origlink'])
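The hint matters because `reddit1.entries` is a list, and only its elements are dictionaries. Keys can also differ between entries, so inspecting only `entries[0]` may miss optional fields. This sketch counts how many entries contain each key and lists the keys that are not universal. `key_coverage` is a hypothetical helper, and the entries are made-up data.

```python
def key_coverage(entries):
    """Map each key to the number of entries that contain it."""
    all_keys = set().union(*(entry.keys() for entry in entries))
    return {key: sum(key in entry for entry in entries) for key in sorted(all_keys)}

# Hypothetical entries: only the first one has a 'tags' field
entries = [
    {"title": "A", "link": "https://example.com/a", "tags": [{"term": "ai"}]},
    {"title": "B", "link": "https://example.com/b"},
]
coverage = key_coverage(entries)
optional = [key for key, n in coverage.items() if n < len(entries)]
print(coverage)   # {'link': 2, 'tags': 1, 'title': 2}
print(optional)   # ['tags']
```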

### 7. Extract a list of entry titles.

In [13]:
# Collect each entry's title, in feed order
lst = [entry.title for entry in reddit1.entries]
lst

['Four short links: 26 June 2019',
 'AI and machine learning will require retraining your entire organization',
 'Four short links: 25 June 2019',
 'Four short links: 24 June 2019',
 'Four short links: 21 June 2019',
 'Four short links: 20 June 2019',
 'Enabling end-to-end machine learning pipelines in real-world applications',
 'Four short links: 19 June 2019',
 'What are model governance and model operations?',
 'Four short links: 18 June 2019',
 'The quest for high-quality data',
 'Four short links: 17 June 2019',
 'Prioritizing technical debt as if time and money mattered',
 'Choices of scale',
 'From the trenches with Rebecca Parsons',
 'Four short links: 14 June 2019',
 'The cloud native elephant in the room',
 'Infrastructure first: Because solving complex problems needs more than technology',
 'How do we heal?',
 'Cultivating production excellence',
 'Kubernetes for the impatient',
 "Highlights from the O'Reilly Software Architecture Conference in San Jose 2019",
 'Next Archite

### 8. Calculate the percentage of "Four short links" entry titles.

In [14]:
z = 'Four short links:'
count_4 = 0
for title in lst:
    if z in title:
        count_4 += 1
percentage = count_4/len(lst)
print(percentage*100)


46.666666666666664
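A more compact variant counts matches with `sum()` over a generator of booleans and uses `str.startswith`, which only matches the prefix at the start of a title rather than anywhere inside it. It also guards against an empty list. This is a sketch; `percent_with_prefix` is a hypothetical helper, and the three sample titles are taken from the list in exercise 7.

```python
def percent_with_prefix(titles, prefix="Four short links:"):
    """Percentage of titles that begin with prefix (0.0 for an empty list)."""
    if not titles:
        return 0.0
    matches = sum(title.startswith(prefix) for title in titles)
    return 100 * matches / len(titles)

titles = [
    "Four short links: 26 June 2019",
    "AI and machine learning will require retraining your entire organization",
    "Four short links: 25 June 2019",
]
print(f"{percent_with_prefix(titles):.1f}%")   # 66.7%
```

Called as `percent_with_prefix(lst)`, it should reproduce the value above as long as every matching title begins with the prefix.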


### 9. Create a Pandas data frame from the feed's entries.

In [15]:
import pandas as pd

In [16]:
df = pd.DataFrame(reddit1.entries)
df.head(10)

,author,author_detail,authors,content,feedburner_origlink,guidislink,id,link,links,summary,title,title_detail,updated,updated_parsed
0,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-06-26:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Ethics and OKRs, Rewriting Binaries, Di...",Four short links: 26 June 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-06-26T11:15:00Z,"(2019, 6, 26, 11, 15, 0, 2, 177, 0)"
1,Ben Lorica,{'name': 'Ben Lorica'},[{'name': 'Ben Lorica'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/ai-and-machine-l...,True,"tag:www.oreilly.com,2019-06-26:/ideas/ai-and-m...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,AI and machine learning will require retrainin...,"{'type': 'text/plain', 'language': None, 'base...",2019-06-26T11:00:00Z,"(2019, 6, 26, 11, 0, 0, 2, 177, 0)"
2,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-06-25:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Analog Deep Learning, Low-Trust Interne...",Four short links: 25 June 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-06-25T10:50:00Z,"(2019, 6, 25, 10, 50, 0, 1, 176, 0)"
3,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-06-24:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Wacky Timestamps, Computers and Spies, ...",Four short links: 24 June 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-06-24T10:45:00Z,"(2019, 6, 24, 10, 45, 0, 0, 175, 0)"
4,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-06-21:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Private Computation, Robot Framework, 3...",Four short links: 21 June 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-06-21T10:45:00Z,"(2019, 6, 21, 10, 45, 0, 4, 172, 0)"
5,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-06-20:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Model Governance, Content Moderators, I...",Four short links: 20 June 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-06-20T21:55:00Z,"(2019, 6, 20, 21, 55, 0, 3, 171, 0)"
6,Ben Lorica,{'name': 'Ben Lorica'},[{'name': 'Ben Lorica'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/enabling-end-to-...,True,"tag:www.oreilly.com,2019-06-20:/ideas/enabling...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,Enabling end-to-end machine learning pipelines...,"{'type': 'text/plain', 'language': None, 'base...",2019-06-20T11:50:00Z,"(2019, 6, 20, 11, 50, 0, 3, 171, 0)"
7,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-06-19:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Voice2Face, DIY Minivac, Cloud Metrics,...",Four short links: 19 June 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-06-19T11:05:00Z,"(2019, 6, 19, 11, 5, 0, 2, 170, 0)"
8,"Ben Lorica, Harish Doddi, David Talby","{'name': 'Ben Lorica, Harish Doddi, David Talby'}","[{'name': 'Ben Lorica, Harish Doddi, David Tal...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/what-are-model-g...,True,"tag:www.oreilly.com,2019-06-19:/ideas/what-are...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,What are model governance and model operations?,"{'type': 'text/plain', 'language': None, 'base...",2019-06-19T11:00:00Z,"(2019, 6, 19, 11, 0, 0, 2, 170, 0)"
9,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-06-18:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>JavaScript Spreadsheets, Pessimism, Pri...",Four short links: 18 June 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-06-18T11:50:00Z,"(2019, 6, 18, 11, 50, 0, 1, 169, 0)"
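Passing the full entry dictionaries to `pd.DataFrame` keeps every nested field, and many of those are lists or dicts that are awkward to analyze. Passing `columns=` keeps only the named keys, and `pd.to_datetime(..., utc=True)` turns the `updated` strings into real timestamps. This sketch uses hypothetical entries shaped like feedparser entries.

```python
import pandas as pd

# Hypothetical entries shaped like feedparser entries (plain dicts here)
entries = [
    {"title": "Four short links: 26 June 2019", "author": "Nat Torkington",
     "updated": "2019-06-26T11:15:00Z", "links": [{"href": "https://example.com/1"}]},
    {"title": "Choices of scale", "author": "Jane Doe",
     "updated": "2019-06-17T09:00:00Z", "links": [{"href": "https://example.com/2"}]},
]

# Keep only flat, analysis-friendly columns; other keys such as 'links' are dropped
slim = pd.DataFrame(entries, columns=["title", "author", "updated"])
slim["updated"] = pd.to_datetime(slim["updated"], utc=True)
print(slim.dtypes)
```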


### 10. Count the number of entries per author and sort them in descending order.

In [17]:
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

,author,entries
19,Nat Torkington,28
2,Ben Lorica,6
0,Adam Tornhill,1
14,Liz Fong-Jones,1
25,Yaniv Aknin,1
24,"Roger Magoulas, Rachel Roumeliotis",1
23,Rebecca Wirfs-Brock,1
22,"Rebecca Parsons, Neal Ford",1
21,Nikki McDonald,1
20,Nathaniel Schutta,1
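`value_counts()` produces the same count-and-sort in a single call. Note also that co-authored entries appear above as one combined string (for example `"Rebecca Parsons, Neal Ford"`), so co-authors are not credited individually. This sketch splits on `", "`, an assumption based on how names appear in this feed, and uses `explode` to count each person separately. The author strings are hypothetical data.

```python
import pandas as pd

authors = pd.Series([
    "Nat Torkington",
    "Ben Lorica, Harish Doddi, David Talby",
    "Ben Lorica",
    "Nat Torkington",
])

# Count whole author strings; value_counts sorts in descending order by default
print(authors.value_counts())

# Assumes co-authors are separated by ", " as in this feed's author field
per_person = authors.str.split(", ").explode().value_counts()
print(per_person)
```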


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry, sorted by title length in descending order (longest title at the top).

In [18]:
title_length = [len(title) for title in lst]
df['title length'] = title_length
df1 = df[['title', 'author', 'title length']].copy()
df1 = df1.sort_values('title length', ascending=False)
df1

,title,author,title length
17,Infrastructure first: Because solving complex ...,Everett Harper,81
57,Becoming a machine learning company means inve...,Ben Lorica,80
21,Highlights from the O'Reilly Software Architec...,Jenn Webb,78
6,Enabling end-to-end machine learning pipelines...,Ben Lorica,73
1,AI and machine learning will require retrainin...,Ben Lorica,72
54,Applications of data science and machine learn...,Ben Lorica,71
32,Channel into the universe of eventually perfec...,Lena Hall,67
29,Highlights from the O'Reilly Velocity Conferen...,Mac Slocum,65
31,Scaling teams with technology (or is it the ot...,Chen Goldberg,62
36,How to get started with site reliability engin...,Nikki McDonald,58
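The cell above builds the lengths from `lst`, which works because `lst` was created in the same order as the rows of `df`. Deriving the length from the data frame's own `title` column with `.str.len()` removes that dependency, and `nlargest` returns the top rows directly. This sketch uses hypothetical titles and author names.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "title": ["Choices of scale", "How do we heal?", "Kubernetes for the impatient"],
    "author": ["Author A", "Author B", "Author C"],   # hypothetical names
})

# Length is computed from the column itself, so it always lines up with each row
df_demo["title length"] = df_demo["title"].str.len()
longest = df_demo.nlargest(2, "title length")[["title", "author", "title length"]]
print(longest)
```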


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [19]:
lst_m_c = [title for title in lst if 'machine learning' in title]
lst_m_c

['AI and machine learning will require retraining your entire organization',
 'Enabling end-to-end machine learning pipelines in real-world applications',
 'Applications of data science and machine learning in financial services',
 'Becoming a machine learning company means investing in foundational technologies']
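The cell above tests each entry's *title*, while the exercise asks about its *summary*. Summaries in this feed are HTML (they begin with tags such as `<p>`), and the phrase may be capitalized, so a case-insensitive test on the `summary` field is closer to the task. This is a sketch with a hypothetical helper `titles_mentioning` and made-up entries; against the lab's feed it could be called as `titles_mentioning(reddit1.entries, 'machine learning')`.

```python
def titles_mentioning(entries, phrase):
    """Titles of entries whose summary contains phrase, ignoring case."""
    needle = phrase.lower()
    return [entry.get("title", "") for entry in entries
            if needle in entry.get("summary", "").lower()]

# Hypothetical entries with HTML summaries
entries = [
    {"title": "Four short links", "summary": "<p>Notes on Machine Learning ops.</p>"},
    {"title": "Choices of scale", "summary": "<p>Thoughts on architecture.</p>"},
    {"title": "Data quality", "summary": "<p>Why machine learning needs clean data.</p>"},
]
print(titles_mentioning(entries, "machine learning"))   # ['Four short links', 'Data quality']
```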