# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [3]:
!pip install feedparser
import feedparser

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting feedparser
  Downloading feedparser-6.0.10-py3-none-any.whl (81 kB)
Collecting sgmllib3k
  Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)
Building wheels for collected packages: sgmllib3k
  Building wheel for sgmllib3k (setup.py) ... done
  Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6066 sha256=a6f9e5cac0ec9ae5407d4e3a1ba162d53e82ec62c4688318e1d9bbb3d9df7000
  Stored in directory: /root/.cache/pip/wheels/73/ad/a4/0dff4a6ef231fc0dfa12ffbac2a36cebfdddfe059f50e019aa
Successfully built sgmllib3k
Installing collected packages: sgmllib3k, feedparser
Successfully installed feedparser-6.0.10 sgmllib3k-1.0.0


### 1. Use feedparser to parse the following RSS feed URL.

In [4]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [5]:
url_parse = feedparser.parse(url)
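
To see what `feedparser.parse` takes care of, here is a minimal offline sketch that swaps in the standard library's `xml.etree.ElementTree` and reads a tiny, hard-coded Atom document. The sample XML and the `parse_atom` helper are hypothetical, not the O'Reilly feed. Unlike feedparser, ElementTree does no normalization: Atom elements live in the `http://www.w3.org/2005/Atom` namespace and must be addressed explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample Atom document (not the real O'Reilly feed)
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link href="https://example.com/"/>
  <entry>
    <title>First post</title>
    <link href="https://example.com/first"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="https://example.com/second"/>
  </entry>
</feed>"""

# Atom elements are namespaced; map a short prefix to the Atom namespace URI
NS = {"atom": "http://www.w3.org/2005/Atom"}

def parse_atom(xml_text):
    """Return the feed title and a list of (title, link) pairs, one per entry."""
    root = ET.fromstring(xml_text)
    # findtext on the root only looks at direct children, so this is the feed title
    feed_title = root.findtext("atom:title", namespaces=NS)
    entries = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", namespaces=NS)
        link = entry.find("atom:link", NS)
        entries.append((title, link.get("href") if link is not None else None))
    return feed_title, entries

feed_title, entries = parse_atom(SAMPLE_ATOM)
print(feed_title)
print(entries)
```

feedparser hides these namespace details and exposes the same information as `url_parse.feed.title` and `url_parse.entries[i].title`.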

### 2. Obtain a list of components (keys) that are available for this feed.

In [6]:
url_parse.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [7]:
url_parse.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [27]:
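feed = url_parse.feed
print(feed.title)
print()
print(feed.subtitle)
print()
# The feed component has no 'author' key (see exercise 3), so fall back to the first entry's author
print(feed.get('author', url_parse.entries[0]['author']))
print()
# Link of the first entry; url_parse.feed.link holds the feed's own link
print(url_parse.entries[0]['links'][0]['href'])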


Radar

Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology

Mike Loukides

https://www.oreilly.com/radar/radar-trends-to-watch-november-2022/
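
The `feed` keys listed in exercise 3 include no `author`, which is why the cell above falls back to the first entry. Because `FeedParserDict` behaves like a dictionary, `.get()` with a default is one way to avoid a `KeyError` when a field is absent. The sketch below uses plain dictionaries as stand-ins for parsed feed data; the `describe_feed` helper and all field values are hypothetical.

```python
def describe_feed(feed, entries):
    """Collect title, subtitle, author and link, falling back when a field is missing."""
    # Prefer the feed-level author, then the first entry's author, then a placeholder
    author = feed.get("author") or (entries[0].get("author") if entries else None) or "unknown"
    return {
        "title": feed.get("title", ""),
        "subtitle": feed.get("subtitle", ""),
        "author": author,
        "link": feed.get("link", ""),
    }

# Hypothetical data shaped like url_parse.feed and url_parse.entries
feed = {"title": "Example Feed", "subtitle": "Sample subtitle", "link": "https://example.com/"}
entries = [{"title": "First post", "author": "A. Writer"}]

fields = describe_feed(feed, entries)
print(fields["author"])
```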


### 5. Count the number of entries that are contained in this RSS feed.

In [28]:
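# Count the entries, then display the first one to inspect its structure
print(len(url_parse.entries))
url_parse.entries[0]

15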
{'title': 'Radar Trends to Watch: November 2022',
 'title_detail': {'type': 'text/plain',
  'language': None,
  'base': 'http://feeds.feedburner.com/oreilly/radar/atom',
  'value': 'Radar Trends to Watch: November 2022'},
 'links': [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.oreilly.com/radar/radar-trends-to-watch-november-2022/'}],
 'link': 'https://www.oreilly.com/radar/radar-trends-to-watch-november-2022/',
 'comments': 'https://www.oreilly.com/radar/radar-trends-to-watch-november-2022/#respond',
 'published': 'Tue, 01 Nov 2022 11:15:57 +0000',
 'published_parsed': time.struct_time(tm_year=2022, tm_mon=11, tm_mday=1, tm_hour=11, tm_min=15, tm_sec=57, tm_wday=1, tm_yday=305, tm_isdst=0),
 'authors': [{'name': 'Mike Loukides'}],
 'author': 'Mike Loukides',
 'author_detail': {'name': 'Mike Loukides'},
 'tags': [{'term': 'Radar Trends', 'scheme': None, 'label': None},
  {'term': 'Signals', 'scheme': None, 'label': None}],
 'id': 'https://www.oreilly.com/radar/?
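
Each entry's `published_parsed` value is a `time.struct_time` (visible in the output above), which makes entries easy to order by date. A minimal sketch, using hypothetical entries as plain dictionaries and a hypothetical `newest_entry` helper, that converts the first six fields to a `datetime` and picks the most recent entry:

```python
import time
from datetime import datetime

def newest_entry(entries):
    """Return the entry with the latest published_parsed timestamp."""
    # struct_time fields 0-5 are year, month, day, hour, minute, second
    return max(entries, key=lambda e: datetime(*e["published_parsed"][:6]))

# Hypothetical entries; the timestamps are struct_time values like feedparser's
entries = [
    {"title": "Older", "published_parsed": time.strptime("2022-10-04 11:15:42", "%Y-%m-%d %H:%M:%S")},
    {"title": "Newer", "published_parsed": time.strptime("2022-11-01 11:15:57", "%Y-%m-%d %H:%M:%S")},
]

print(len(entries))
print(newest_entry(entries)["title"])
```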

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [29]:
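# Call keys(); without the parentheses you get the bound method itself, not the keys
url_keys = url_parse.entries[0].keys()
print(url_keys)

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments'])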
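
One entry's keys are not guaranteed to match every other entry's: an entry that lacks, say, `comments` shows up as `NaN` in that column once the entries become a DataFrame. One quick check, sketched here with a hypothetical `key_coverage` helper over hypothetical plain-dict entries, compares the intersection and union of the key sets:

```python
def key_coverage(entries):
    """Return (keys present in every entry, keys present in only some entries)."""
    key_sets = [set(e.keys()) for e in entries]
    if not key_sets:
        return set(), set()
    common = set.intersection(*key_sets)
    every = set.union(*key_sets)
    return common, every - common

# Hypothetical entries: the second one lacks a 'comments' field
entries = [
    {"title": "A", "link": "https://example.com/a", "comments": "https://example.com/a#respond"},
    {"title": "B", "link": "https://example.com/b"},
]

common, partial = key_coverage(entries)
print(sorted(common))
print(sorted(partial))
```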


### 7. Extract a list of entry titles.

In [30]:
titles = [entry.title for entry in url_parse.entries]
print(titles)

['Radar Trends to Watch: November 2022', 'What We Learned Auditing Sophisticated AI for Bias', 'The Collaborative Metaverse', 'What Is Hyperautomation?', 'Radar Trends to Watch: October 2022', 'The Problem with Intelligence', 'Radar Trends to Watch: September 2022', 'Ad Networks and Content Marketing', 'On Technique', 'Scaling False Peaks', 'The Metaverse Is Not a Place', 'Radar Trends to Watch: August 2022', 'SQL: The Universal Solvent for REST APIs', 'Artificial Creativity?', 'Radar Trends to Watch: July 2022']


### 8. Calculate the percentage of "Four short links" entry titles.

In [31]:
total_titles = len(titles)

# Case-insensitive match, so "Four Short Links" and "four short links" both count
count = sum('four short links' in title.lower() for title in titles)

percent = count / total_titles * 100
print(f'Percentage of "Four short links" entries: {percent:.1f}%')

Percentage of "Four short links" entries: 0.0%
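
The same calculation can be wrapped in a small reusable function. This sketch (the `percent_with_phrase` name and the sample titles are hypothetical) relies on `True` counting as 1 inside `sum()` and returns 0.0 for an empty list, which would otherwise raise `ZeroDivisionError`:

```python
def percent_with_phrase(titles, phrase):
    """Percentage of titles containing phrase (case-insensitive); 0.0 for an empty list."""
    if not titles:
        return 0.0
    needle = phrase.lower()
    hits = sum(needle in title.lower() for title in titles)
    return 100 * hits / len(titles)

# Hypothetical titles for illustration
sample = [
    "Four short links: 1 November",
    "Radar Trends to Watch",
    "four Short Links: 2 November",
    "On Technique",
]
print(percent_with_phrase(sample, "Four short links"))
```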


### 9. Create a Pandas data frame from the feed's entries.

In [19]:
import pandas as pd

In [32]:
df = pd.DataFrame(url_parse.entries) 
df.head()

|   | title | title_detail | links | link | comments | published | published_parsed | authors | author | author_detail | tags | id | guidislink | summary | summary_detail | content | wfw_commentrss | slash_comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Radar Trends to Watch: November 2022 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/radar-trends-to-... | https://www.oreilly.com/radar/radar-trends-to-... | Tue, 01 Nov 2022 11:15:57 +0000 | (2022, 11, 1, 11, 15, 57, 1, 305, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Radar Trends', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=14760 | False | Maintaining a separate category for AI is gett... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/radar-trends-to-... | 0 |
| 1 | What We Learned Auditing Sophisticated AI for ... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/what-we-learned-... | https://www.oreilly.com/radar/what-we-learned-... | Tue, 18 Oct 2022 11:14:23 +0000 | (2022, 10, 18, 11, 14, 23, 1, 291, 0) | [{'name': 'Patrick Hall'}] | Patrick Hall | {'name': 'Patrick Hall'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=14754 | False | A recently passed law in New York City require... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/what-we-learned-... | 0 |
| 2 | The Collaborative Metaverse | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/the-collaborativ... | https://www.oreilly.com/radar/the-collaborativ... | Wed, 12 Oct 2022 20:01:45 +0000 | (2022, 10, 12, 20, 1, 45, 2, 285, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Metaverse', 'scheme': None, 'label'... | https://www.oreilly.com/radar/?p=14750 | False | We want to congratulate Dylan Field on his sta... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/the-collaborativ... | 0 |
| 3 | What Is Hyperautomation? | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/what-is-hyperaut... | https://www.oreilly.com/radar/what-is-hyperaut... | Tue, 11 Oct 2022 10:59:21 +0000 | (2022, 10, 11, 10, 59, 21, 1, 284, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=14733 | False | Gartner has anointed “Hyperautomation” one of ... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/what-is-hyperaut... | 0 |
| 4 | Radar Trends to Watch: October 2022 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/radar-trends-to-... | https://www.oreilly.com/radar/radar-trends-to-... | Tue, 04 Oct 2022 11:15:42 +0000 | (2022, 10, 4, 11, 15, 42, 1, 277, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Radar Trends', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=14726 | False | September was a busy month. In addition to con... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/radar-trends-to-... | 0 |
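
Several columns above (`tags`, `authors`, `links`) hold lists of dictionaries, which are awkward to filter in a DataFrame. One option, sketched here with a hypothetical `flatten_entry` helper and a hypothetical entry shaped like `url_parse.entries[0]`, is to flatten the tag terms into a plain string before building the frame:

```python
def flatten_entry(entry):
    """Return a flat record with title, author and comma-separated tag terms."""
    terms = [tag.get("term", "") for tag in entry.get("tags", [])]
    return {
        "title": entry.get("title", ""),
        "author": entry.get("author", ""),
        "tags": ", ".join(term for term in terms if term),
    }

# Hypothetical entry mirroring the structure of url_parse.entries[0]
entry = {
    "title": "Example post",
    "author": "A. Writer",
    "tags": [
        {"term": "Radar Trends", "scheme": None, "label": None},
        {"term": "Signals", "scheme": None, "label": None},
    ],
}

record = flatten_entry(entry)
print(record["tags"])
```

With pandas, `pd.DataFrame([flatten_entry(e) for e in url_parse.entries])` would then give one plain-string column per field.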


### 10. Count the number of entries per author and sort them in descending order.

In [33]:
# Count titles per author, rename the columns, and sort with the most prolific author first
authors = df.groupby('author', as_index=False).agg({'title': 'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

|   | author | entries |
|---|---|---|
| 2 | Mike Loukides | 10 |
| 0 | Jon Udell | 1 |
| 1 | Kevlin Henney | 1 |
| 3 | Patrick Hall | 1 |
| 4 | Q McCallum | 1 |
| 5 | Tim O’Reilly | 1 |
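
The same tally is available from the standard library's `collections.Counter`, whose `most_common()` already returns counts in descending order; in pandas, `df['author'].value_counts()` is a shorter route to a similar result. A sketch over a hypothetical list of author names, with a hypothetical `entries_per_author` helper:

```python
from collections import Counter

def entries_per_author(authors):
    """Return (author, count) pairs sorted by count, highest first."""
    return Counter(authors).most_common()

# Hypothetical author list, one name per entry
authors = ["A. Writer", "B. Author", "A. Writer", "C. Scribe", "A. Writer"]
result = entries_per_author(authors)
print(result)
```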


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [34]:
df['title_len'] = df['title'].apply(len)
df[['title', 'author', 'title_len']].sort_values('title_len', ascending=False)

|   | title | author | title_len |
|---|---|---|---|
| 1 | What We Learned Auditing Sophisticated AI for ... | Patrick Hall | 50 |
| 12 | SQL: The Universal Solvent for REST APIs | Jon Udell | 40 |
| 6 | Radar Trends to Watch: September 2022 | Mike Loukides | 37 |
| 0 | Radar Trends to Watch: November 2022 | Mike Loukides | 36 |
| 4 | Radar Trends to Watch: October 2022 | Mike Loukides | 35 |
| 11 | Radar Trends to Watch: August 2022 | Mike Loukides | 34 |
| 7 | Ad Networks and Content Marketing | Q McCallum | 33 |
| 14 | Radar Trends to Watch: July 2022 | Mike Loukides | 32 |
| 5 | The Problem with Intelligence | Mike Loukides | 29 |
| 10 | The Metaverse Is Not a Place | Tim O’Reilly | 28 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [35]:
# Collect titles of entries whose summary mentions "machine learning" (case-insensitive)
list_ML = []

for entry in url_parse.entries:
    # Some entries may lack a summary, so default to an empty string
    if 'machine learning' in entry.get('summary', '').lower():
        list_ML.append(entry.title)

print('Entries related to machine learning:')
list_ML
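
The `summary` field is HTML (its `summary_detail` type is `text/html` in the DataFrame above), so markup could split the phrase, as in `machine <em>learning</em>`. A hedged sketch that strips tags with the standard library's `html.parser.HTMLParser` before searching; the `html_to_text` and `titles_mentioning` helpers and the sample entries are hypothetical:

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(fragment):
    """Strip tags from fragment and collapse runs of whitespace to single spaces."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())

def titles_mentioning(entries, phrase):
    """Titles of entries whose summary text contains phrase, ignoring case and markup."""
    needle = phrase.lower()
    return [e.get("title", "") for e in entries
            if needle in html_to_text(e.get("summary", "")).lower()]

# Hypothetical entries with HTML summaries
entries = [
    {"title": "Post A", "summary": "<p>Notes on machine <em>learning</em> in production.</p>"},
    {"title": "Post B", "summary": "<p>Nothing relevant here.</p>"},
]
print(titles_mentioning(entries, "machine learning"))
```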