# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [3]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [4]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [5]:
oreilly = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [6]:
oreilly.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [7]:
oreilly.feed.keys()

dict_keys(['title', 'title_detail', 'id', 'guidislink', 'link', 'updated', 'updated_parsed', 'subtitle', 'subtitle_detail', 'links', 'authors', 'author_detail', 'author', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [20]:
oreilly.feed.title, oreilly.feed.subtitle, oreilly.feed.author, oreilly.feed.link

("All - O'Reilly Media",
 'All of our Ideas and Learning material from all of our topics.',
 "O'Reilly Media",
 'https://www.oreilly.com')

### 5. Count the number of entries that are contained in this RSS feed.

In [27]:
print(len(oreilly.entries))

60


### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [40]:
oreilly.entries[0].keys()

dict_keys(['title', 'title_detail', 'updated', 'updated_parsed', 'id', 'guidislink', 'link', 'content', 'summary', 'links', 'authors', 'author_detail', 'author', 'feedburner_origlink'])

### 7. Extract a list of entry titles.

In [41]:
titles = [entry.title for entry in oreilly.entries]
titles

['Four short links: 22 July 2019',
 'Four short links: 19 July 2019',
 'The war for the soul of open source',
 "O'Reilly Open Source and Frank Willison Awards",
 'O’Reilly Radar: Open source technology trends—What our users tell us',
 'Ask not what Brands™ can do for you',
 'Managing machines',
 'Acquiring and sharing high-quality data',
 'Four short links: 18 July 2019',
 'The role of open source in mitigating natural disasters',
 "Highlights from the O'Reilly Open Source Software Conference in Portland 2019",
 'Better living through software',
 'Why Amazon cares about open source',
 'Built to last: Building and growing open source communities',
 'The next age of open innovation',
 'Four short links: 17 July 2019',
 'Four short links: 16 July 2019',
 'Managing machine learning in the enterprise: Lessons from banking and health care',
 'Four short links: 15 July 2019',
 'Four short links: 12 July 2019',
 'Four short links: 11 July 2019',
 'Four short links: 10 July 2019',
 'Four short 

### 8. Calculate the percentage of "Four short links" entry titles.

In [64]:
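# Count the titles that contain the phrase "Four short links"
calc = 0
for t in titles:
    if "Four short links" in t:
        calc += 1
# Share of those titles among all entries, as a percentage
pourcentage = (calc / len(oreilly.entries)) * 100
pourcentage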

43.333333333333336
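
The result corresponds to 26 matching titles out of 60 entries (26 / 60 ≈ 43.33%). The same count can be written more compactly, because `True` counts as 1 when summed. This is a sketch on a hypothetical `sample_titles` list standing in for the `titles` list from exercise 7:

```python
# Hypothetical sample titles standing in for the `titles` list built in exercise 7.
sample_titles = [
    "Four short links: 22 July 2019",
    "The war for the soul of open source",
    "Four short links: 19 July 2019",
    "Managing machines",
]

# Each matching title contributes 1 to the sum.
matches = sum("Four short links" in title for title in sample_titles)
percentage = 100 * matches / len(sample_titles)
print(f"{percentage:.1f}%")
```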

### 9. Create a Pandas data frame from the feed's entries.

In [43]:
import pandas as pd

In [46]:
df = pd.DataFrame(oreilly.entries)
df

,author,author_detail,authors,content,feedburner_origlink,guidislink,id,link,links,summary,title,title_detail,updated,updated_parsed
0,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-07-22:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Game Source, Procurement Graph, Data Mo...",Four short links: 22 July 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-07-22T14:30:00Z,"(2019, 7, 22, 14, 30, 0, 0, 203, 0)"
1,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-07-19:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Journal Mining, API Use, Better Convers...",Four short links: 19 July 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-07-19T17:05:00Z,"(2019, 7, 19, 17, 5, 0, 4, 200, 0)"
2,Adam Jacob,{'name': 'Adam Jacob'},[{'name': 'Adam Jacob'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/the-war-for-the-...,True,"tag:www.oreilly.com,2019-07-18:/ideas/the-war-...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,The war for the soul of open source,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"
3,,,,"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/oreilly-open-sou...,True,"tag:www.oreilly.com,2019-07-18:/ideas/oreilly-...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,O'Reilly Open Source and Frank Willison Awards,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"
4,Roger Magoulas,{'name': 'Roger Magoulas'},[{'name': 'Roger Magoulas'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/oreilly-radar-op...,True,"tag:www.oreilly.com,2019-07-18:/ideas/oreilly-...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,O’Reilly Radar: Open source technology trends—...,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"
5,VM Brasseur,{'name': 'VM Brasseur'},[{'name': 'VM Brasseur'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/ask-not-what-bra...,True,"tag:www.oreilly.com,2019-07-18:/ideas/ask-not-...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,Ask not what Brands™ can do for you,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"
6,Pete Skomoroch,{'name': 'Pete Skomoroch'},[{'name': 'Pete Skomoroch'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/managing-machines,True,"tag:www.oreilly.com,2019-07-18:/ideas/managing...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,Managing machines,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T20:00:00Z,"(2019, 7, 18, 20, 0, 0, 3, 199, 0)"
7,Ben Lorica,{'name': 'Ben Lorica'},[{'name': 'Ben Lorica'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/acquiring-and-sh...,True,"tag:www.oreilly.com,2019-07-18:/ideas/acquirin...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,Acquiring and sharing high-quality data,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T13:30:00Z,"(2019, 7, 18, 13, 30, 0, 3, 199, 0)"
8,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-07-18:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Weird Algorithms, Open Syllabi, Convers...",Four short links: 18 July 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-07-18T08:00:00Z,"(2019, 7, 18, 8, 0, 0, 3, 199, 0)"
9,"Pedro Cruz, Brad Topol","{'name': 'Pedro Cruz, Brad Topol'}","[{'name': 'Pedro Cruz, Brad Topol'}]","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/the-role-of-open...,True,"tag:www.oreilly.com,2019-07-17:/ideas/the-role...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,The role of open source in mitigating natural ...,"{'type': 'text/plain', 'language': None, 'base...",2019-07-17T20:00:00Z,"(2019, 7, 17, 20, 0, 0, 2, 198, 0)"


### 10. Count the number of entries per author and sort them in descending order.

In [58]:
# count() tallies the non-null values of every column within each author group
series = df.groupby('author').count().sort_values(by='authors', ascending=False)
series

author,author_detail,authors,content,feedburner_origlink,guidislink,id,link,links,summary,title,title_detail,updated,updated_parsed
Nat Torkington,26,26,26,26,26,26,26,26,26,26,26,26,26
Ben Lorica,5,5,5,5,5,5,5,5,5,5,5,5,5
"Ben Lorica, Harish Doddi, David Talby",2,2,2,2,2,2,2,2,2,2,2,2,2
Jenn Webb,2,2,2,2,2,2,2,2,2,2,2,2,2
Abigail Hing Wen,1,1,1,1,1,1,1,1,1,1,1,1,1
Michael James,1,1,1,1,1,1,1,1,1,1,1,1,1
Tim Kraska,1,1,1,1,1,1,1,1,1,1,1,1,1
Tiffani Bell,1,1,1,1,1,1,1,1,1,1,1,1,1
Roger Magoulas,1,1,1,1,1,1,1,1,1,1,1,1,1
"Rebecca Parsons, Neal Ford",1,1,1,1,1,1,1,1,1,1,1,1,1


In [60]:
# Method from the course
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

,author,entries
18,Nat Torkington,26
5,Ben Lorica,5
6,"Ben Lorica, Harish Doddi, David Talby",2
10,Jenn Webb,2
0,Abigail Hing Wen,1
15,Michael James,1
25,Tim Kraska,1
24,Tiffani Bell,1
23,Roger Magoulas,1
22,"Rebecca Parsons, Neal Ford",1
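
Both approaches above go through `groupby`. As an alternative sketch (not the course's method), `Series.value_counts()` returns the per-author counts already sorted in descending order. The `sample_df` below is a hypothetical miniature of the entries data frame, with `None` mimicking an entry that has no author.

```python
import pandas as pd

# Hypothetical miniature of the entries data frame; None mimics an entry with no author.
sample_df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "author": ["Nat Torkington", "Ben Lorica", "Nat Torkington", None, "Nat Torkington"],
})

# value_counts() counts rows per author, sorted from most to least frequent.
# Entries without an author are left out unless dropna=False is passed.
entries_per_author = sample_df["author"].value_counts()
print(entries_per_author)
```

Note that, like `groupby('author')`, this skips entries without an author, so the counts may add up to fewer than the total number of entries.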


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [67]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False).head()


,title,author,title_length
41,RISELab’s AutoPandas hints at automation tech ...,Ben Lorica,97
17,Managing machine learning in the enterprise: L...,"Ben Lorica, Harish Doddi, David Talby",81
24,Highlights from the O'Reilly Artificial Intell...,Jenn Webb,79
10,Highlights from the O'Reilly Open Source Softw...,Mac Slocum,77
51,Enabling end-to-end machine learning pipelines...,Ben Lorica,73
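
`apply(len)` works as long as every title is a string, but it raises a `TypeError` if a title is missing. A hedged alternative, shown on a hypothetical `sample_df` with made-up titles and authors, is the vectorized `.str.len()`, which yields `NaN` for missing values instead of failing:

```python
import pandas as pd

# Hypothetical data; the None title mimics an entry without a title.
sample_df = pd.DataFrame({
    "title": ["Short title", "A considerably longer title", None],
    "author": ["Author A", "Author B", "Author C"],
})

# .str.len() returns NaN for missing titles rather than raising an error.
sample_df["title_length"] = sample_df["title"].str.len()

# Longest titles first; rows with NaN lengths are placed last by default.
ranked = sample_df[["title", "author", "title_length"]].sort_values(
    "title_length", ascending=False
)
print(ranked)
```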


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [74]:
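# Rows whose summary contains the literal text 'machine learning.'
# regex=False keeps '.' from matching any character; na=False treats missing summaries as no match.
machine_learning = df.loc[df['summary'].str.contains('machine learning.', regex=False, na=False)]
machine_learning

# Collect the matching summaries (the output below shows summaries, not titles)
ml = []
for summa in df['summary']:
    if 'machine learning.' in summa:
        ml.append(summa)
ml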

['<p><img src=\'https://d3ucjech6zwjp8.cloudfront.net/600x450/bydgoszcz_przechodzacy_crop-49dd4046437faa3de84b1dd393c128bc.jpg\'/></p><p><em>A look at how guidelines from regulated industries can help shape your ML strategy.</em></p><p>As companies use machine learning (ML) and AI technologies across a broader suite of products and services, it’s clear that new tools, best practices, and new organizational structures will be needed. In recent posts, we described requisite <a href="https://www.oreilly.com/ideas/becoming-a-machine-learning-company-means-investing-in-foundational-technologies">foundational technologies</a> needed to sustain machine learning practices within organizations, and <a href="https://www.oreilly.com/ideas/what-are-model-governance-and-model-operations">specialized tools</a> for model development, model governance, and model operations/testing/monitoring.</p>\r\n\r\n\r\n\r\n<p>What cultural and organizational changes will be needed to accommodate the rise of machi
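
The exercise asks for entry titles, whereas the output above lists the matching summaries. A minimal sketch of returning titles instead, using a hypothetical `sample_df` with made-up titles and summaries, and assuming the match should ignore case and the trailing period:

```python
import pandas as pd

# Hypothetical data; the None summary mimics an entry without a summary.
sample_df = pd.DataFrame({
    "title": ["Post one", "Post two", "Post three"],
    "summary": [
        "<p>Notes on Machine Learning in production.</p>",
        "<p>Four links about databases.</p>",
        None,
    ],
})

# Literal, case-insensitive match; missing summaries count as no match.
mask = sample_df["summary"].str.contains(
    "machine learning", case=False, regex=False, na=False
)

# Select the titles of the matching rows and convert them to a plain list.
ml_titles = sample_df.loc[mask, "title"].tolist()
print(ml_titles)
```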