# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [2]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [3]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'
feedburner = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
feedburner.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])
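
Before relying on these keys, it can help to confirm that the parse produced something usable. The sketch below is one possible check, not part of the lab solution. `summarize_parse` is a hypothetical helper, and the plain dictionaries are stand-ins for a real `feedparser.parse` result so that the example runs without network access. It assumes that `bozo` is truthy when the feed was not well-formed and that `status` is present only when the feed was fetched over HTTP.

```python
def summarize_parse(parsed):
    """Return a short, human-readable summary of a parsed feed.

    `parsed` is anything dict-like with the top-level keys listed above;
    missing keys are tolerated.
    """
    problems = []
    if parsed.get('bozo'):
        problems.append('feed is not well-formed')
    status = parsed.get('status')
    if status is not None and status >= 400:
        problems.append(f'HTTP status {status}')
    if not parsed.get('entries'):
        problems.append('no entries')
    return 'ok' if not problems else '; '.join(problems)


# Hypothetical stand-ins for feedparser results (not real feed data).
good = {'bozo': 0, 'status': 200, 'entries': [{'title': 'Example'}]}
bad = {'bozo': 1, 'status': 404, 'entries': []}

print(summarize_parse(good))
print(summarize_parse(bad))
```

With the lab's object, the equivalent call would be `summarize_parse(feedburner)`.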

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
feedburner.feed.keys()

dict_keys(['title', 'title_detail', 'id', 'guidislink', 'link', 'updated', 'updated_parsed', 'subtitle', 'subtitle_detail', 'links', 'authors', 'author_detail', 'author', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [6]:
feedburner.feed.title

"All - O'Reilly Media"

In [7]:
feedburner.feed.subtitle

'All of our Ideas and Learning material from all of our topics.'

In [8]:
feedburner.feed.author

"O'Reilly Media"

In [9]:
feedburner.feed.link

'https://www.oreilly.com'

### 5. Count the number of entries that are contained in this RSS feed.

In [10]:
len(feedburner.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Index into the list of entries first, then request the keys.*

In [11]:
feedburner.entries[0].keys()

dict_keys(['title', 'title_detail', 'updated', 'updated_parsed', 'id', 'guidislink', 'link', 'content', 'summary', 'links', 'authors', 'author_detail', 'author', 'feedburner_origlink'])
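
Among these keys, `updated_parsed` holds the timestamp as a nine-field time tuple rather than a string, which is easier to compare and sort once converted to a `datetime`. The sketch below shows one way to do that. `struct_time_to_datetime` is a hypothetical helper, the tuple is a stand-in value rather than live feed data, and the UTC time zone is an assumption based on my understanding that feedparser normalizes parsed dates to UTC.

```python
import time
from datetime import datetime, timezone


def struct_time_to_datetime(parsed_time):
    """Convert a nine-field time tuple to a timezone-aware UTC datetime."""
    # Only the first six fields (year through second) are needed.
    return datetime(*parsed_time[:6], tzinfo=timezone.utc)


# Stand-in value shaped like an entry's `updated_parsed` field.
example = time.struct_time((2019, 7, 19, 17, 5, 0, 4, 200, 0))
print(struct_time_to_datetime(example).isoformat())
```

For a real entry, the call would be `struct_time_to_datetime(feedburner.entries[0].updated_parsed)`.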

### 7. Extract a list of entry titles.

In [30]:
titles = [entry.title for entry in feedburner.entries]
print(titles)

['Four short links: 19 July 2019', 'The war for the soul of open source', 'Ask not what Brands™ can do for you', 'O’Reilly Radar: Open source technology trends—What our users tell us', "O'Reilly Open Source and Frank Willison Awards", 'Managing machines', 'Acquiring and sharing high-quality data', 'Four short links: 18 July 2019', 'Better living through software', 'Built to last: Building and growing open source communities', 'The next age of open innovation', "Highlights from the O'Reilly Open Source Software Conference in Portland 2019", 'The role of open source in mitigating natural disasters', 'Why Amazon cares about open source', 'Four short links: 17 July 2019', 'Four short links: 16 July 2019', 'Managing machine learning in the enterprise: Lessons from banking and health care', 'Four short links: 15 July 2019', 'Four short links: 12 July 2019', 'Four short links: 11 July 2019', 'Four short links: 10 July 2019', 'Four short links: 9 July 2019', 'The circle of fairness', "Highligh

### 8. Calculate the proportion of entry titles that contain "Four short links" (multiply by 100 for a percentage).

In [35]:
len([title for title in titles if "Four short links" in title]) / len(titles)

0.43333333333333335
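
The cell above returns a fraction (about 0.433, or roughly 43.3%). As a minimal sketch, the same calculation can be wrapped in a function that returns a value on the 0 to 100 scale and guards against an empty feed. `percent_matching` and the sample titles are hypothetical and exist only for illustration.

```python
def percent_matching(titles, phrase):
    """Return the percentage (0-100) of titles that contain `phrase`."""
    if not titles:
        return 0.0  # avoid dividing by zero on an empty feed
    matches = sum(phrase in title for title in titles)
    return 100 * matches / len(titles)


# Hypothetical titles, used only to exercise the function.
sample_titles = [
    'Four short links: 1 January 2000',
    'An unrelated article',
    'Four short links: 2 January 2000',
    'Another unrelated article',
]
print(percent_matching(sample_titles, 'Four short links'))
```

With the lab's data, the call would be `percent_matching(titles, 'Four short links')`.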

### 9. Create a Pandas data frame from the feed's entries.

In [25]:
import pandas as pd

In [27]:
df = pd.DataFrame(feedburner.entries)
df.head()

Unnamed: 0,author,author_detail,authors,content,feedburner_origlink,guidislink,id,link,links,summary,title,title_detail,updated,updated_parsed
0,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-07-19:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Journal Mining, API Use, Better Convers...",Four short links: 19 July 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-07-19T17:05:00Z,"(2019, 7, 19, 17, 5, 0, 4, 200, 0)"
1,Adam Jacob,{'name': 'Adam Jacob'},[{'name': 'Adam Jacob'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/the-war-for-the-...,True,"tag:www.oreilly.com,2019-07-18:/ideas/the-war-...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,The war for the soul of open source,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"
2,VM Brasseur,{'name': 'VM Brasseur'},[{'name': 'VM Brasseur'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/ask-not-what-bra...,True,"tag:www.oreilly.com,2019-07-18:/ideas/ask-not-...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,Ask not what Brands™ can do for you,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"
3,Roger Magoulas,{'name': 'Roger Magoulas'},[{'name': 'Roger Magoulas'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/oreilly-radar-op...,True,"tag:www.oreilly.com,2019-07-18:/ideas/oreilly-...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,O’Reilly Radar: Open source technology trends—...,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"
4,,,,"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/oreilly-open-sou...,True,"tag:www.oreilly.com,2019-07-18:/ideas/oreilly-...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,O'Reilly Open Source and Frank Willison Awards,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"


### 10. Count the number of entries per author and sort them in descending order.

In [30]:
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

| | author | entries |
|---|---|---|
| 18 | Nat Torkington | 26 |
| 5 | Ben Lorica | 5 |
| 6 | Ben Lorica, Harish Doddi, David Talby | 2 |
| 10 | Jenn Webb | 2 |
| 0 | Abigail Hing Wen | 1 |
| 15 | Michael James | 1 |
| 25 | Tim Kraska | 1 |
| 24 | Tiffani Bell | 1 |
| 23 | Roger Magoulas | 1 |
| 22 | Rebecca Parsons, Neal Ford | 1 |
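
One caveat with the `groupby` approach: by default pandas leaves out rows whose group key is missing, so entries without an author do not show up in these counts. The sketch below is a pandas-free alternative using `collections.Counter` that keeps such entries under an explicit label. `count_entries_per_author`, the `'(no author)'` label, and the sample entries are all hypothetical.

```python
from collections import Counter


def count_entries_per_author(entries, missing_label='(no author)'):
    """Count entries per author, most frequent first."""
    counts = Counter(
        entry.get('author') or missing_label for entry in entries
    )
    return counts.most_common()


# Hypothetical entries; real feedparser entries are dict-like as well.
sample_entries = [
    {'title': 'A', 'author': 'Author One'},
    {'title': 'B', 'author': 'Author Two'},
    {'title': 'C', 'author': 'Author One'},
    {'title': 'D'},  # no author key at all
]
for author, n in count_entries_per_author(sample_entries):
    print(author, n)
```

With the lab's data, the call would be `count_entries_per_author(feedburner.entries)`.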


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [31]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False)

| | title | author | title_length |
|---|---|---|---|
| 40 | RISELab’s AutoPandas hints at automation tech ... | Ben Lorica | 97 |
| 16 | Managing machine learning in the enterprise: L... | Ben Lorica, Harish Doddi, David Talby | 81 |
| 23 | Highlights from the O'Reilly Artificial Intell... | Jenn Webb | 79 |
| 11 | Highlights from the O'Reilly Open Source Softw... | Mac Slocum | 77 |
| 50 | Enabling end-to-end machine learning pipelines... | Ben Lorica | 73 |
| 45 | AI and machine learning will require retrainin... | Ben Lorica | 72 |
| 3 | O’Reilly Radar: Open source technology trends—... | Roger Magoulas | 68 |
| 9 | Built to last: Building and growing open sourc... | Kay Williams | 59 |
| 56 | Prioritizing technical debt as if time and mon... | Adam Tornhill | 57 |
| 24 | Toward learned algorithms, data structures, an... | Tim Kraska | 55 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning".

In [40]:
list_titles_ml = df[df['summary'].str.contains('machine learning')]['title']

In [41]:
print(list_titles_ml)

6               Acquiring and sharing high-quality data
11    Highlights from the O'Reilly Open Source Softw...
16    Managing machine learning in the enterprise: L...
23    Highlights from the O'Reilly Artificial Intell...
31               The future of machine learning is tiny
35               Tools for machine learning development
36                     New live online training courses
40    RISELab’s AutoPandas hints at automation tech ...
45    AI and machine learning will require retrainin...
50    Enabling end-to-end machine learning pipelines...
52      What are model governance and model operations?
54                      The quest for high-quality data
Name: title, dtype: object
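
The result above is a pandas Series, and the match is case-sensitive, so a summary that only says "Machine Learning" would be skipped. The sketch below shows one possible variant that ignores case, tolerates missing summaries, and returns a plain Python list. The `sample` data frame is a hypothetical stand-in for `df`, not real feed data.

```python
import pandas as pd

# Hypothetical rows standing in for the feed's data frame.
sample = pd.DataFrame({
    'title': ['Post A', 'Post B', 'Post C', 'Post D'],
    'summary': [
        'An intro to machine learning.',
        'Machine Learning in production.',
        'Notes on open source.',
        None,  # an entry with no summary
    ],
})

# case=False ignores capitalization, na=False treats missing summaries
# as non-matches, and regex=False matches the phrase literally.
mask = sample['summary'].str.contains(
    'machine learning', case=False, na=False, regex=False
)
ml_titles = sample.loc[mask, 'title'].tolist()
print(ml_titles)
```

Applied to the lab's data, the same mask would be built from `df['summary']` and the titles read from `df.loc[mask, 'title']`.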
