# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [5]:
# Parse the feed at the URL defined above instead of repeating the literal.
reddit = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [17]:
reddit.keys()

dict_keys(['headers', 'encoding', 'etag', 'bozo', 'namespaces', 'version', 'entries', 'updated_parsed', 'updated', 'href', 'feed', 'status'])
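Two of the top-level keys listed above are handy for a quick sanity check before going further: `bozo` is set by feedparser when the feed was not well-formed, and `status` holds the HTTP status code when the feed was fetched over HTTP. The sketch below uses a hypothetical helper, `feed_looks_ok`, which is not part of feedparser. It relies only on dictionary-style access, so it is shown here on plain dictionaries rather than a live feed.

```python
def feed_looks_ok(parsed):
    """Return True if a parsed feed shows no obvious fetch or parse problem.

    `parsed` is assumed to be the dict-like result of feedparser.parse().
    """
    # bozo is truthy when the feed was not well-formed.
    if parsed.get('bozo'):
        return False
    # status is only present for feeds fetched over HTTP; treat 4xx/5xx as failures.
    status = parsed.get('status')
    if status is not None and status >= 400:
        return False
    return True


# Stand-in dictionaries for illustration; with a real feed, pass `reddit`.
print(feed_looks_ok({'bozo': 0, 'status': 200}))
print(feed_looks_ok({'bozo': 1, 'status': 200}))
print(feed_looks_ok({'bozo': 0, 'status': 404}))
```

Treating any `bozo` feed as a failure is a conservative choice made for this sketch; feedparser often still returns usable entries from a malformed feed.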

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [20]:
reddit.feed.keys()

dict_keys(['title', 'geo_lat', 'author_detail', 'author', 'feedburner_feedburnerhostname', 'guidislink', 'subtitle', 'links', 'authors', 'subtitle_detail', 'feedburner_info', 'title_detail', 'id', 'updated_parsed', 'feedburner_emailserviceid', 'updated', 'link', 'geo_long'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [27]:
print(reddit.feed.title)
print(reddit.feed.subtitle)
print(reddit.feed.author)
print(reddit.feed.link)

All - O'Reilly Media
All of our Ideas and Learning material from all of our topics.
O'Reilly Media
https://www.oreilly.com


### 5. Count the number of entries that are contained in this RSS feed.

In [28]:
len(reddit.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [36]:
reddit.entries[0].keys()

dict_keys(['title', 'author_detail', 'feedburner_origlink', 'author', 'guidislink', 'authors', 'links', 'content', 'title_detail', 'id', 'updated_parsed', 'updated', 'link', 'summary'])
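The keys above belong to the first entry only. Entries in the same feed are not guaranteed to carry identical keys, so it can help to look at the union across all entries and to read optional fields with `.get()`. A minimal sketch, using plain dictionaries as stand-ins for `reddit.entries` (the helper name `all_entry_keys` is hypothetical):

```python
def all_entry_keys(entries):
    """Collect every key that appears in at least one entry, sorted by name."""
    keys = set()
    for entry in entries:
        keys.update(entry.keys())
    return sorted(keys)


# Made-up entries for illustration only.
sample_entries = [
    {'title': 'First', 'author': 'A', 'summary': 'text'},
    {'title': 'Second', 'link': 'https://example.com/second'},
]

print(all_entry_keys(sample_entries))
# .get() returns a default instead of raising when a key is missing.
print([entry.get('author', 'unknown') for entry in sample_entries])
```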

### 7. Extract a list of entry titles.

In [40]:
titles = [entry.title for entry in reddit.entries]
print(titles)

['Why companies are in need of data lineage solutions', 'Stablecoins: Solving the cryptocurrency volatility crisis', 'Four short links: 25 April 2019', 'Four short links: 24 April 2019', 'Four short links: 23 April 2019', 'Four short links: 22 April 2019', 'Four short links: 19 April 2019', 'Computational propaganda', 'Decoding the human genome with deep learning', 'Automation of AI: Accelerating the AI revolution', 'Simple, scalable, and sustainable: A methodical approach to AI adoption', 'Software 2.0 and Snorkel', 'Applied machine learning at Facebook', 'Artificial intelligence: The “refinery” for data', 'Making real-world distributed deep learning easy with Nauta', 'Four short links: 18 April 2019', 'Toward ethical AI: Inclusivity as a messy, difficult, but promising answer', 'Fast, flexible, and functional: 4 real-world AI deployments at enterprise scale', 'Machine learning for personalization', 'Automated ML: A journey from CRISPR.ML to Azure ML', 'Checking in on AI tools', 'How 

### 8. Calculate the percentage of "Four short links" entry titles.

In [57]:
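import re

# Test each title on its own rather than searching one concatenated string.
# The raw string keeps \s from being treated as an invalid escape sequence.
pattern = re.compile(r'Four\sshort\slinks')
total = len(titles)
totalfour = sum(1 for title in titles if pattern.search(title))
percent = (totalfour / total) * 100
print(percent, "%")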

35.0 %
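As a cross-check on the figure above, the share can also be computed with a plain prefix test on each title. The helper `percent_with_prefix` is a hypothetical name, and the sample list is made up to mirror the reported figures (35% of 60 entries, which is 21 titles); with the real data, pass `titles` instead.

```python
def percent_with_prefix(titles, prefix):
    """Return the percentage of titles that start with `prefix`."""
    if not titles:
        return 0.0
    matches = sum(1 for title in titles if title.startswith(prefix))
    return matches / len(titles) * 100


# Made-up stand-in: 21 "Four short links" titles plus 39 others.
sample_titles = ['Four short links: 25 April 2019'] * 21 + ['Something else'] * 39

print(percent_with_prefix(sample_titles, 'Four short links'), '%')
```

Unlike a regular expression search, `startswith` only counts titles that begin with the phrase, which may or may not be what the exercise intends.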


### 9. Create a Pandas data frame from the feed's entries.

In [37]:
import pandas as pd

In [39]:
df = pd.DataFrame(reddit.entries)
df.head(3)

Unnamed: 0,author,author_detail,authors,content,feedburner_origlink,guidislink,id,link,links,summary,title,title_detail,updated,updated_parsed
0,Ben Lorica,{'name': 'Ben Lorica'},[{'name': 'Ben Lorica'}],"[{'language': None, 'value': '<p><img src=""htt...",https://www.oreilly.com/ideas/why-companies-ar...,True,"tag:www.oreilly.com,2019-04-25:/ideas/why-comp...",http://feedproxy.google.com/~r/oreilly/radar/a...,"[{'rel': 'alternate', 'href': 'http://feedprox...","<p><img src=""https://d3ucjech6zwjp8.cloudfront...",Why companies are in need of data lineage solu...,"{'language': None, 'value': 'Why companies are...",2019-04-25T11:15:00Z,"(2019, 4, 25, 11, 15, 0, 3, 115, 0)"
1,"Wayne Chang, Gregory Rocco, Jacob Blish","{'name': 'Wayne Chang, Gregory Rocco, Jacob Bl...","[{'name': 'Wayne Chang, Gregory Rocco, Jacob B...","[{'language': None, 'value': '<p><img src=""htt...",https://www.oreilly.com/ideas/stablecoins-solv...,True,"tag:www.oreilly.com,2019-04-25:/ideas/stableco...",http://feedproxy.google.com/~r/oreilly/radar/a...,"[{'rel': 'alternate', 'href': 'http://feedprox...","<p><img src=""https://d3ucjech6zwjp8.cloudfront...",Stablecoins: Solving the cryptocurrency volati...,"{'language': None, 'value': 'Stablecoins: Solv...",2019-04-25T11:00:00Z,"(2019, 4, 25, 11, 0, 0, 3, 115, 0)"
2,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'language': None, 'value': '<p><em>Values Ri...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-04-25:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,"[{'rel': 'alternate', 'href': 'http://feedprox...","<p><em>Values Risk, Brain Interface, Hacking S...",Four short links: 25 April 2019,"{'language': None, 'value': 'Four short links:...",2019-04-25T10:50:00Z,"(2019, 4, 25, 10, 50, 0, 3, 115, 0)"


### 10. Count the number of entries per author and sort them in descending order.

In [70]:
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors = authors.sort_values('title', ascending=False)
# rename() returns a new data frame, so the result must be assigned back.
authors = authors.rename(columns={'title':'entries'})
authors.head()

Unnamed: 0,author,entries
22,Nat Torkington,21
2,Ben Lorica,3
20,Mike Loukides,2
18,Mac Slocum,2
0,Aleksander Madry,1
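
The same tally can be sketched without pandas using `collections.Counter`, whose `most_common()` method returns pairs sorted from most to least frequent. The entries below are made-up stand-ins for `reddit.entries`, and the `'unknown'` fallback for entries without an author is an assumption of this sketch.

```python
from collections import Counter

# Made-up entries for illustration; the last one deliberately lacks an author.
sample_entries = [
    {'title': 'Four short links: 25 April 2019', 'author': 'Nat Torkington'},
    {'title': 'Why companies are in need of data lineage solutions', 'author': 'Ben Lorica'},
    {'title': 'Four short links: 24 April 2019', 'author': 'Nat Torkington'},
    {'title': 'An entry with no author field'},
]

# Count one occurrence per entry, falling back to 'unknown' when no author is set.
author_counts = Counter(entry.get('author', 'unknown') for entry in sample_entries)

for author, count in author_counts.most_common():
    print(author, count)
```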


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [78]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False).head()

Unnamed: 0,title,author,title_length
42,Specialized tools for machine learning develop...,"Ben Lorica, Mike Loukides",94
31,What data scientists and data engineers can do...,Ben Lorica,94
53,Likewar: How social media is changing the worl...,Peter Singer,91
54,It’s time for data scientists to collaborate w...,Ben Lorica,82
47,Chatting with machines: Strange things 60 bill...,Lauren Kunze,81
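
One caveat with `apply(len)` is that it raises a `TypeError` if any title is missing, because `len()` is not defined for `None` or `NaN`. A hedged alternative is the vectorized `.str.len()` accessor, which leaves missing titles as missing values. The small data frame below is made up for illustration and is not taken from the feed.

```python
import pandas as pd

# Made-up sample with one missing title.
sample = pd.DataFrame({
    'title': ['Short', 'A much longer entry title', None],
    'author': ['A', 'B', 'C'],
})

# .str.len() yields a missing value for the missing title instead of raising.
sample['title_length'] = sample['title'].str.len()

# Missing lengths sort to the bottom by default.
ranked = sample[['title', 'author', 'title_length']].sort_values(
    'title_length', ascending=False
)
print(ranked)
```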


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."
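
One possible approach, sketched under a few assumptions: the match is case-insensitive, the period in the heading is sentence punctuation rather than part of the phrase, and entries without a summary are skipped. `titles_mentioning` is a hypothetical helper, and the sample entries are made up; with the real feed, pass `reddit.entries`.

```python
def titles_mentioning(entries, phrase):
    """Return titles of entries whose summary contains `phrase` (case-insensitive)."""
    needle = phrase.lower()
    return [
        entry.get('title', '')
        for entry in entries
        if needle in entry.get('summary', '').lower()
    ]


# Made-up entries for illustration; the last one has no summary at all.
sample_entries = [
    {'title': 'Applied ML', 'summary': '<p>Machine learning at scale.</p>'},
    {'title': 'Short links', 'summary': '<p>Values risk, brain interface.</p>'},
    {'title': 'No summary here'},
]

print(titles_mentioning(sample_entries, 'machine learning'))
```

Summaries in this feed contain HTML, so a phrase that is split by a tag (for example `machine <em>learning</em>`) would not be found by this plain substring test.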