# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [10]:
import feedparser
import re

### 1. Use feedparser to parse the following RSS feed URL.

In [11]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [12]:
fbn = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [13]:
fbn.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [14]:
fbn.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [15]:
title = fbn.feed.title
print(title,'\n')
subtitle = fbn.feed.subtitle
print(subtitle,'\n')
link = fbn.feed.link
print(link,'\n')
# The feed-level keys above include no author, so read it from the first entry instead
author = fbn.entries[0].authors[0].get('name')
print(author)

Radar 

Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology 

https://www.oreilly.com/radar 

Mac Slocum
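
The feed-level keys listed in step 3 do not include an author, which is why the cell above reads the name from the first entry. A minimal sketch of that fallback as a reusable function is shown here. `feed_author` is a hypothetical helper, and `sample` is made-up data shaped like the parsed result rather than real feedparser output.

```python
def feed_author(parsed):
    """Return the feed-level author if present, else the first entry author found, else None."""
    feed = parsed.get('feed', {})
    if feed.get('author'):
        return feed['author']
    for entry in parsed.get('entries', []):
        if entry.get('author'):
            return entry['author']
    return None


# Illustrative data only: a plain dict standing in for the object returned by feedparser.parse
sample = {
    'feed': {'title': 'Radar'},
    'entries': [{'title': 'Four short links: 18 September 2019', 'author': 'Mac Slocum'}],
}
print(feed_author(sample))  # Mac Slocum
```

Since the parsed result exposes `keys()` like a dictionary, calling `feed_author(fbn)` should behave the same way, though that is an assumption and has not been run against the live feed.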


### 5. Count the number of entries that are contained in this RSS feed.

In [16]:
len(fbn.entries)

18

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [17]:
fbn.entries[0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink'])

### 7. Extract a list of entry titles.

In [20]:
entries = fbn.entries
titles = [entry.title for entry in entries]
titles

['Four short links: 18 September 2019',
 'Four short links: 17 September 2019',
 'Four short links: 16 September 2019',
 'Radar trends to watch: September 2019',
 'Four short links: 13 September 2019',
 'Safe and smarter driving, powered by AI',
 'On gradient-based methods for finding game-theoretic equilibria',
 'Accelerate with purpose',
 'Practical insights into deep reinforcement learning',
 'Open-endedness: A new grand challenge for AI',
 'Four short links: 12 September 2019',
 'AI for ophthalmology: Doing what doctors can’t',
 'Enabling AI’s potential through wafer-scale integration',
 'Getting from A to AI',
 'Highlights from the O’Reilly Artificial Intelligence Conference in San Jose 2019',
 'Going beyond fully supervised learning',
 'Developing AI responsibly',
 'Unlocking the value of your data']

### 8. Calculate the percentage of "Four short links" entry titles.

In [21]:
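# Keep only the titles that contain the phrase, then convert the count to a percentage
four_short = [t for t in titles if re.search('Four short links', t)]
round(len(four_short) / len(titles) * 100, 2)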

27.78
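
The same percentage can be computed without a regular expression, because the phrase is a fixed string. The sketch below wraps the calculation in a function that also guards against an empty list. `share_with_phrase` is a hypothetical name, and the sample holds four titles copied from the output of step 7.

```python
def share_with_phrase(titles, phrase):
    """Percentage of titles containing phrase, rounded to two decimals (0.0 for an empty list)."""
    if not titles:
        return 0.0
    # Each True counts as 1, so the sum is the number of matching titles
    matches = sum(phrase in title for title in titles)
    return round(matches / len(titles) * 100, 2)


# Two of these four titles contain the phrase
sample_titles = [
    'Four short links: 18 September 2019',
    'Radar trends to watch: September 2019',
    'Four short links: 13 September 2019',
    'Accelerate with purpose',
]
print(share_with_phrase(sample_titles, 'Four short links'))  # 50.0
```

With the full list from step 7, 5 of the 18 titles match, which gives the 27.78 shown above.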

### 9. Create a Pandas data frame from the feed's entries.

In [22]:
import pandas as pd

In [23]:
df = pd.DataFrame(entries)
df.head(2)

| | title | title_detail | links | link | comments | published | published_parsed | authors | author | author_detail | tags | id | guidislink | summary | summary_detail | content | wfw_commentrss | slash_comments | feedburner_origlink |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Four short links: 18 September 2019 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Wed, 18 Sep 2019 04:01:52 +0000 | (2019, 9, 18, 4, 1, 52, 2, 261, 0) | [{'name': 'Mac Slocum'}] | Mac Slocum | {'name': 'Mac Slocum'} | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=9372 | False | Extracting Insights from the Shape of Complex ... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | 0 | https://www.oreilly.com/radar/four-short-links... |
| 1 | Four short links: 17 September 2019 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Tue, 17 Sep 2019 04:01:36 +0000 | (2019, 9, 17, 4, 1, 36, 1, 260, 0) | [{'name': 'Mac Slocum'}] | Mac Slocum | {'name': 'Mac Slocum'} | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=9350 | False | Mirroring to Build Trust in Digital Assistants... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | 0 | https://www.oreilly.com/radar/four-short-links... |


### 10. Count the number of entries per author and sort them in descending order.

In [24]:
df.groupby('author')['title'].count().sort_values(ascending=False)

author
Mac Slocum    18
Name: title, dtype: int64
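
One detail worth knowing about the `groupby` approach is that pandas leaves rows with a missing group key out of the result by default, so entries without an author would not be counted at all. The sketch below does the same tally with the standard library and keeps such entries under an `'unknown'` label. `entries_per_author` is a hypothetical helper, and the third sample entry is invented to show the missing-author case.

```python
from collections import Counter


def entries_per_author(entries):
    """Return (author, count) pairs sorted by count, highest first."""
    counts = Counter(entry.get('author', 'unknown') for entry in entries)
    return counts.most_common()


# Illustrative data: two titles from the feed plus one invented entry with no author
sample_entries = [
    {'title': 'Accelerate with purpose', 'author': 'Mac Slocum'},
    {'title': 'Getting from A to AI', 'author': 'Mac Slocum'},
    {'title': 'Hypothetical entry without an author'},
]
print(entries_per_author(sample_entries))  # [('Mac Slocum', 2), ('unknown', 1)]
```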

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [41]:
# Vectorized string length avoids indexing the column row by row
df['title_length'] = df['title'].str.len()
df2 = df[['title', 'author', 'title_length']].sort_values(by='title_length', ascending=False)
df2

| | title | author | title_length |
|---|---|---|---|
| 14 | Highlights from the O’Reilly Artificial Intell... | Mac Slocum | 80 |
| 6 | On gradient-based methods for finding game-the... | Mac Slocum | 63 |
| 12 | Enabling AI’s potential through wafer-scale in... | Mac Slocum | 55 |
| 8 | Practical insights into deep reinforcement lea... | Mac Slocum | 51 |
| 11 | AI for ophthalmology: Doing what doctors can’t | Mac Slocum | 46 |
| 9 | Open-endedness: A new grand challenge for AI | Mac Slocum | 44 |
| 5 | Safe and smarter driving, powered by AI | Mac Slocum | 39 |
| 15 | Going beyond fully supervised learning | Mac Slocum | 38 |
| 3 | Radar trends to watch: September 2019 | Mac Slocum | 37 |
| 10 | Four short links: 12 September 2019 | Mac Slocum | 35 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [52]:
ml_lst = list(df.title[df.summary.str.contains('machine learning')])
ml_lst

[]
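
The empty list may simply mean that no summary contains the phrase, but `str.contains` is case-sensitive by default, so a summary with "Machine learning" or "Machine Learning" would also be missed. The output alone does not say which is the case. A minimal sketch of a case-insensitive variant follows. `titles_mentioning` is a hypothetical helper and `sample_df` is invented data, not the feed's entries.

```python
import pandas as pd


def titles_mentioning(df, phrase):
    """Titles whose summary contains phrase, ignoring case and skipping missing summaries."""
    # regex=False treats the phrase as plain text; na=False counts missing summaries as no match
    mask = df['summary'].str.contains(phrase, case=False, na=False, regex=False)
    return df.loc[mask, 'title'].tolist()


# Illustrative data only
sample_df = pd.DataFrame({
    'title': ['Entry A', 'Entry B', 'Entry C'],
    'summary': ['Machine Learning in production', 'Notes on feed parsing', None],
})
print(titles_mentioning(sample_df, 'machine learning'))  # ['Entry A']
```

Because the summaries in this feed are HTML (their `summary_detail` type is `text/html`), markup inside a phrase could still prevent a match even with case ignored.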