# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import requests
import xmltodict
import re
import pandas as pd

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
req = requests.get(url)
data = xmltodict.parse(req.content) 

### 2. Obtain a list of components (keys) that are available for this feed.

In [11]:
data['rss'].keys()

odict_keys(['@xmlns:content', '@xmlns:wfw', '@xmlns:dc', '@xmlns:atom', '@xmlns:sy', '@xmlns:slash', '@xmlns:geo', '@xmlns:feedburner', '@version', 'channel'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [45]:
data['rss']['channel']['item']

[OrderedDict([('title', 'Four short links: 28 Oct 2020'),
              ('link',
               'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/9SAYsodoeJo/'),
              ('comments',
               'https://www.oreilly.com/radar/four-short-links-28-oct-2020/#respond'),
              ('pubDate', 'Wed, 28 Oct 2020 11:39:13 +0000'),
              ('dc:creator', 'Nat Torkington'),
              ('category', ['Four Short Links', 'Signals']),
              ('guid',
               OrderedDict([('@isPermaLink', 'false'),
                            ('#text',
                             'https://www.oreilly.com/radar/?p=13382')])),
              ('description',
               'Phantom of the ADAS &#8212; In this paper, we investigate &#8220;split-second phantom attacks,&#8221; a scientific gap that causes two commercial advanced driver-assistance systems (ADASs), Telsa Model X (HW 2.5 and HW 3) and Mobileye 630, to treat a depthless object that appears for a few milliseconds as a rea
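
The cell above lists the feed's entries rather than the keys of the feed-level component. In the nested structure that `xmltodict` produces for an RSS document, feed-level metadata sits directly on `channel`, next to the `item` list. A minimal sketch, assuming a small hand-written dictionary (`sample`, with hypothetical values) that mimics that structure:

```python
# Hypothetical sample mimicking the nested structure returned by xmltodict
# for an RSS document; the values are illustrative, not taken from the feed.
sample = {
    'rss': {
        '@version': '2.0',
        'channel': {
            'title': 'Example Feed',
            'link': 'https://example.com/',
            'description': 'An example feed',
            'item': [
                {'title': 'First entry', 'link': 'https://example.com/1'},
                {'title': 'Second entry', 'link': 'https://example.com/2'},
            ],
        },
    }
}

# Feed-level components are the keys of 'channel'.
channel_keys = list(sample['rss']['channel'].keys())
print(channel_keys)

# Entries live under 'item'; every other key describes the feed itself.
feed_fields = [key for key in channel_keys if key != 'item']
print(feed_fields)
```

On the real data, the equivalent call would be `data['rss']['channel'].keys()`.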

### 4. Extract and print the feed title, subtitle, author, and link.

In [60]:
titles = []
subtitles = []
links  = []
creators = []
# Iterate over the entries directly instead of indexing by position.
for item in data['rss']['channel']['item']:
    titles.append(item['title'])
    # Keep the text before the first HTML entity as a rough subtitle.
    subtitles.append(item['description'].split('&')[0])
    links.append(item['link'])
    creators.append(item['dc:creator'])

In [61]:
data['rss']['channel']['item'][0]['description'].split(';')[0]

'Phantom of the ADAS &#8212'
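
Splitting on `&` leaves a trailing space, and splitting on `;` leaves a dangling `&#8212` fragment. One alternative is to decode the HTML entities first and then split on the decoded dash. A minimal sketch using the standard library's `html.unescape`, assuming the subtitle is the text before the first em dash (U+2014); the `description` string is the prefix of the first entry shown earlier:

```python
import html

# Description prefix copied from the first entry shown earlier in this notebook.
description = (
    'Phantom of the ADAS &#8212; In this paper, we investigate '
    '&#8220;split-second phantom attacks,&#8221;'
)

# Decode numeric character references such as &#8212; into real characters.
decoded = html.unescape(description)

# Assumption: the subtitle is whatever precedes the first em dash (U+2014).
subtitle = decoded.split('\u2014')[0].strip()
print(subtitle)  # Phantom of the ADAS
```

Entries whose description contains no em dash would return the whole description under this rule, so it may need adjusting for long-form articles.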

In [62]:
df_rss = pd.DataFrame({'Titles':titles,'Subtitles':subtitles,'Creators':creators, 'Links':links})
df_rss

| | Titles | Subtitles | Creators | Links |
|---|---|---|---|---|
| 0 | Four short links: 28 Oct 2020 | Phantom of the ADAS | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 1 | Our Favorite Questions | | Q Ethan McCallum, Chris Butler and Shane Glynn | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 2 | Four short links: 21 Oct 2020 | Justice Department Antitrust Filing Against Go... | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 3 | Four Short Links: 16 October 2020 | Automerge | | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 4 | Four short links: 14 Oct 2020 | Data Organization in Spreadsheets | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 5 | AI Product Management After Deployment | The field of AI product management continues t... | Justin Norman and Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 6 | Four short links: 9 October 2020 | T-SQL in SQLite | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 7 | AI and Creativity | The release of GPT-3 has reinvigorated a discu... | Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 8 | Four short links: 6 October 2020 | Algorithms Can Collude | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 9 | Four short links: 2 October 2020 | Single Device Behaves Like a Neuron | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |


### 5. Count the number of entries that are contained in this RSS feed.

In [49]:
len(data['rss']['channel']['item'])

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [63]:
data['rss']['channel']['item'][0].keys()

odict_keys(['title', 'link', 'comments', 'pubDate', 'dc:creator', 'category', 'guid', 'description', 'content:encoded', 'wfw:commentRss', 'slash:comments', 'feedburner:origLink'])

### 7. Extract a list of entry titles.

In [64]:
titles

['Four short links: 28 Oct 2020',
 'Our Favorite Questions',
 'Four short links: 21 Oct 2020',
 'Four Short Links: 16 October 2020',
 'Four short links: 14 Oct 2020',
 'AI Product Management After Deployment',
 'Four short links: 9 October 2020',
 'AI and Creativity',
 'Four short links: 6 October 2020',
 'Four short links: 2 October 2020',
 'Radar trends to watch: October 2020',
 'Four short links: 29 Sep 2020',
 'Four short links: 25 September 2020',
 'Four short links: 18 Sep 2020',
 'Four short links: 16 Sep 2020',
 'How to Set AI Goals',
 'Four short links: 11 Sep 2020',
 'Four short links: 9 Sep 2020',
 'Pair Programming with AI',
 'Four short links: 4 September 2020',
 'Four short links: 2 September 2020',
 'Radar trends to watch: September 2020',
 'Four short links: 28 August 2020',
 'An Agent of Change',
 'Four short links: 25 August 2020',
 'Four short links: 21 August 2020',
 'Four Short Links: 19 August 2020',
 'Why Best-of-Breed is a Better Choice than All-in-One Platforms

### 8. Calculate the percentage of "Four short links" entry titles.

In [70]:
# Fraction of titles containing 'Four'; divide by the actual number of titles.
sum('Four' in title for title in titles) / len(titles)

0.7166666666666667
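
The value above is a fraction; multiplied by 100 it is roughly 71.7%. A minimal sketch of a slightly stricter check, assuming a hypothetical `sample_titles` list and assuming that a "Four short links" title is one that starts with that phrase regardless of capitalization:

```python
# Hypothetical sample: a few titles in the style of the feed.
sample_titles = [
    'Four short links: 28 Oct 2020',
    'Our Favorite Questions',
    'Four Short Links: 16 October 2020',
    'AI and Creativity',
]

# Compare in lowercase so 'Four Short Links' and 'Four short links' both match.
prefix = 'four short links'
matches = sum(title.lower().startswith(prefix) for title in sample_titles)

# Divide by the list length rather than a hard-coded count.
percentage = 100 * matches / len(sample_titles)
print(f'{percentage:.1f}%')  # 50.0%
```

Matching on the full phrase avoids counting an unrelated title that merely contains the word "Four".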

### 9. Create a Pandas data frame from the feed's entries.

In [None]:
import pandas as pd

In [65]:
df_rss

| | Titles | Subtitles | Creators | Links |
|---|---|---|---|---|
| 0 | Four short links: 28 Oct 2020 | Phantom of the ADAS | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 1 | Our Favorite Questions | | Q Ethan McCallum, Chris Butler and Shane Glynn | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 2 | Four short links: 21 Oct 2020 | Justice Department Antitrust Filing Against Go... | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 3 | Four Short Links: 16 October 2020 | Automerge | | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 4 | Four short links: 14 Oct 2020 | Data Organization in Spreadsheets | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 5 | AI Product Management After Deployment | The field of AI product management continues t... | Justin Norman and Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 6 | Four short links: 9 October 2020 | T-SQL in SQLite | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 7 | AI and Creativity | The release of GPT-3 has reinvigorated a discu... | Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 8 | Four short links: 6 October 2020 | Algorithms Can Collude | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 9 | Four short links: 2 October 2020 | Single Device Behaves Like a Neuron | Nat Torkington | http://feedproxy.google.com/~r/oreilly/radar/a... |


### 10. Count the number of entries per author and sort them in descending order.

In [75]:
df_rss.sort_values(by = 'Creators').reset_index(drop=True)

| | Titles | Subtitles | Creators | Links |
|---|---|---|---|---|
| 0 | How to Set AI Goals | AI Benefits and Stakeholders AI is a field whe... | Alex Castrounis | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 1 | AI Product Management After Deployment | The field of AI product management continues t... | Justin Norman and Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 2 | Bringing an AI Product to Market | The Core Responsibilities of the AI Product Ma... | Justin Norman, Peter Skomoroch and Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 3 | Why Best-of-Breed is a Better Choice than All-... | So you need to redesign your company’s data in... | Matthew Rocklin and Hugo Bowne-Anderson | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 4 | Radar trends to watch: August 2020 | I thought July was going to be a dull month, b... | Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 5 | Pair Programming with AI | In a conversation with Kevlin Henney, we start... | Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 6 | Power, Harms, and Data | A recent article in The Verge discussed PULSE,... | Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 7 | Radar trends to watch: September 2020 | Compared to the last few months, there are rel... | Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 8 | Radar trends to watch: October 2020 | This month, the big surprise is that there’s n... | Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
| 9 | Automated Coding and the Future of Programming | At Microsoft | Mike Loukides | http://feedproxy.google.com/~r/oreilly/radar/a... |
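
The cell above orders the rows alphabetically by author; it does not count entries per author. With pandas, `df_rss['Creators'].value_counts()` would return the counts, sorted in descending order by default. The same idea can be sketched with the standard library, assuming a hypothetical `sample_creators` list in the style of the `Creators` column:

```python
from collections import Counter

# Hypothetical author list in the style of the 'Creators' column.
sample_creators = [
    'Nat Torkington', 'Mike Loukides', 'Nat Torkington',
    'Alex Castrounis', 'Nat Torkington', 'Mike Loukides',
]

# most_common() returns (author, count) pairs, highest count first.
entries_per_author = Counter(sample_creators).most_common()
for author, count in entries_per_author:
    print(f'{author}: {count}')
```

Entries with several authors in one string (for example "Justin Norman and Mike Loukides") are counted as a separate author under this approach, unless the string is split first.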


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [84]:
string_count= [len(string) for string in titles]
new_df = pd.DataFrame({'Titles':titles,'Authors':creators,'Titles Length':string_count})
new_df.sort_values(by = 'Titles Length', ascending= False).reset_index(drop=True)

| | Titles | Authors | Titles Length |
|---|---|---|---|
| 0 | Why Best-of-Breed is a Better Choice than All-... | Matthew Rocklin and Hugo Bowne-Anderson | 79 |
| 1 | Automated Coding and the Future of Programming | Mike Loukides | 46 |
| 2 | AI Product Management After Deployment | Justin Norman and Mike Loukides | 38 |
| 3 | Radar trends to watch: September 2020 | Mike Loukides | 37 |
| 4 | The Least Liked Programming Languages | Mike Loukides | 37 |
| 5 | Radar trends to watch: October 2020 | Mike Loukides | 35 |
| 6 | Four short links: 25 September 2020 | Nat Torkington | 35 |
| 7 | Radar trends to watch: August 2020 | Mike Loukides | 34 |
| 8 | Four short links: 2 September 2020 | Nat Torkington | 34 |
| 9 | Four short links: 4 September 2020 | Nat Torkington | 34 |
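
The exercise asks for a new column on the existing data frame, which in pandas could be written as `df_rss['Title Length'] = df_rss['Titles'].str.len()` rather than building a second frame. The underlying computation is just a length per title followed by a descending sort. A minimal standard-library sketch, assuming a hypothetical `sample_entries` list of (title, author) pairs:

```python
# Hypothetical (title, author) pairs in the style of the feed.
sample_entries = [
    ('AI and Creativity', 'Mike Loukides'),
    ('Four short links: 28 Oct 2020', 'Nat Torkington'),
    ('How to Set AI Goals', 'Alex Castrounis'),
]

# Attach the character count of each title as a third field.
rows = [(title, author, len(title)) for title, author in sample_entries]

# Sort by title length, longest first.
rows.sort(key=lambda row: row[2], reverse=True)
for title, author, length in rows:
    print(length, title, author)
```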


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [92]:
new_titles = []
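# Iterate over the entries directly instead of hard-coding the entry count.
for item in data['rss']['channel']['item']:
    # 'content:encoded' holds the entry's HTML content; the match is case-sensitive.
    if 'machine learning' in item['content:encoded']:
        new_titles.append(item['title'])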
new_titles

['Our Favorite Questions',
 'AI Product Management After Deployment',
 'AI and Creativity',
 'Radar trends to watch: October 2020',
 'Four short links: 16 Sep 2020',
 'How to Set AI Goals',
 'Radar trends to watch: September 2020',
 'Why Best-of-Breed is a Better Choice than All-in-One Platforms for Data Science',
 'Radar trends to watch: August 2020',
 'Bringing an AI Product to Market',
 'Power, Harms, and Data',
 'Four short links: 8 July 2020']
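
The check above is case-sensitive and runs against raw HTML, so "Machine Learning" or a phrase interrupted by a tag would be missed. A rough sketch of a more tolerant match, assuming hypothetical `sample_items` that mimic the entry structure; the regex-based tag stripping in the hypothetical helper `plain_text` is an approximation, not a full HTML parser:

```python
import re

# Hypothetical entries mimicking the item structure used in this notebook.
sample_items = [
    {'title': 'Entry A',
     'content:encoded': '<p>Applying <b>Machine Learning</b> to logs.</p>'},
    {'title': 'Entry B',
     'content:encoded': '<p>A post about databases.</p>'},
    {'title': 'Entry C',
     'content:encoded': '<p>machine <a href="#">learning</a> basics</p>'},
]


def plain_text(markup):
    """Strip tags, collapse whitespace, and lowercase the remaining text."""
    without_tags = re.sub(r'<[^>]+>', ' ', markup)
    return ' '.join(without_tags.split()).lower()


phrase = 'machine learning'
matching_titles = [
    item['title']
    for item in sample_items
    if phrase in plain_text(item['content:encoded'])
]
print(matching_titles)  # ['Entry A', 'Entry C']
```

If "summary" is meant to refer to the shorter `description` field rather than `content:encoded`, the same helper can be applied to that key instead.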