# How to Import Data from CNN's RSS Feed
Importing data from an RSS feed is straightforward, and the same approach works for any feed. In this tutorial we focus on CNN's RSS feeds, specifically the one for political news. Let's get started.

We first need to import a few libraries: **requests** to download the feed, **Beautiful Soup** to parse it, and **pandas** to load the results into a DataFrame and, optionally, save them as a CSV file.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```

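We define two functions. `getArticles` pulls the title, link and publication date out of each `<item>` element, and `cnn_news_scrapper` downloads the feed and parses it with Beautiful Soup. Passing `features='xml'` tells Beautiful Soup to parse the response as XML, which is what an RSS feed delivers. This parser is provided by the **lxml** package, so lxml must be installed as well.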

```python
# Extract the title, link and publication date from each RSS <item> element
def getArticles(articles):
    all_articles = []
    for article in articles:
        article_title = article.find('title').text
        article_link = article.find('link').text
        article_published = article.find('pubDate').text
        all_articles.append({
            'title': article_title,
            'link': article_link,
            'published': article_published
        })
    return all_articles

# Download the RSS feed, parse it as XML and return a list of article dictionaries
def cnn_news_scrapper(URL):
    try:
        r = requests.get(URL, timeout=10)  # don't hang forever on an unresponsive server
        r.raise_for_status()  # treat 4xx/5xx responses as failures instead of parsing an error page
        soupContent = BeautifulSoup(r.content, features='xml')  # the 'xml' parser requires lxml
        print('Job Succeeded returning Status Code: ', r.status_code)
        items = soupContent.find_all('item')
        print('Total News Content')
        print(len(items))
        return getArticles(items)  # reuse the items found above instead of searching twice
    except Exception as e:
        print('Scraping failed due to the below exception')
        print(e)
        return []  # an empty list still loads into an (empty) DataFrame
```
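To check the extraction logic without hitting the network, it can help to run it on a small inline feed. The sketch below is an alternative, not the method used above: it swaps Beautiful Soup for the standard library's `xml.etree.ElementTree`, so it needs neither bs4 nor lxml. The function name `parse_rss_items` and the `SAMPLE_RSS` snippet (titles and example.com links) are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 snippet for illustration only; real feeds carry many more fields.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example politics feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
      <pubDate>Tue, 20 Apr 2021 02:48:25 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
      <pubDate>Mon, 19 Apr 2021 23:32:56 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Hypothetical helper: return one {'title', 'link', 'published'} dict per <item>."""
    root = ET.fromstring(xml_text)
    articles = []
    # RSS 2.0 nests <item> under <channel>; iter() finds them at any depth
    for item in root.iter('item'):
        articles.append({
            # findtext() returns the default instead of failing when a child is missing
            'title': item.findtext('title', default=''),
            'link': item.findtext('link', default=''),
            'published': item.findtext('pubDate', default=''),
        })
    return articles


articles = parse_rss_items(SAMPLE_RSS)
print(len(articles))  # 2
print(articles[0]['title'])  # First headline
```

The dictionaries have the same keys as those returned by `getArticles`, so either version feeds into `pd.DataFrame` the same way.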

## Scraping Process
We now run the functions defined above against CNN's politics feed at `http://rss.cnn.com/rss/cnn_allpolitics.rss`:

```python
print('Starting scraping')
data = cnn_news_scrapper('http://rss.cnn.com/rss/cnn_allpolitics.rss')
print('Finished scraping')
```

```text
Starting scraping
Job Succeeded returning Status Code:  200
Total News Content
30
Finished scraping
```


Because `data` is a list of dictionaries, loading it into a DataFrame takes a single call, `pd.DataFrame(data)`. Each dictionary becomes a row, and its keys (`title`, `link`, `published`) become the columns.

```python
df = pd.DataFrame(data)
df.head()
```

|   | title | link | published |
|---|-------|------|-----------|
| 0 | State Department to list 80% of countries as '... | http://rss.cnn.com/~r/rss/cnn_allpolitics/~3/y... | Tue, 20 Apr 2021 02:48:25 GMT |
| 1 | Florida governor signs controversial 'pro-law ... | http://rss.cnn.com/~r/rss/cnn_allpolitics/~3/p... | Mon, 19 Apr 2021 23:32:56 GMT |
| 2 | Anti-riot laws vs. police reform as the US wai... | http://rss.cnn.com/~r/rss/cnn_allpolitics/~3/O... | Tue, 20 Apr 2021 00:01:34 GMT |
| 3 | New York attorney general asked to investigate... | http://rss.cnn.com/~r/rss/cnn_allpolitics/~3/Z... | Mon, 19 Apr 2021 20:42:02 GMT |
| 4 | Former Vice President Walter 'Fritz' Mondale d... | http://rss.cnn.com/~r/rss/cnn_allpolitics/~3/o... | Tue, 20 Apr 2021 03:50:27 GMT |
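The `published` column holds plain strings in the RSS date format (for example `Tue, 20 Apr 2021 02:48:25 GMT`), and the rows come back in feed order rather than by time. A minimal sketch of converting those strings to timezone-aware datetimes with the standard library's `email.utils.parsedate_to_datetime` and sorting newest first is shown below. The rows are hypothetical stand-ins for `data` (placeholder titles and example.com links); only the dates are copied from the output above.

```python
from email.utils import parsedate_to_datetime

import pandas as pd

# Hypothetical rows shaped like the output of getArticles(); dates copied from the sample output
data = [
    {'title': 'Article A', 'link': 'https://example.com/a', 'published': 'Tue, 20 Apr 2021 02:48:25 GMT'},
    {'title': 'Article B', 'link': 'https://example.com/b', 'published': 'Mon, 19 Apr 2021 23:32:56 GMT'},
    {'title': 'Article C', 'link': 'https://example.com/c', 'published': 'Tue, 20 Apr 2021 00:01:34 GMT'},
]
df = pd.DataFrame(data)

# RSS pubDate values follow the RFC 822/2822 date format, which parsedate_to_datetime understands
df['published'] = df['published'].map(parsedate_to_datetime)

# Sort chronologically, newest article first, and renumber the rows
df = df.sort_values('published', ascending=False).reset_index(drop=True)
print(df['title'].tolist())  # ['Article A', 'Article C', 'Article B']
```

Sorting the raw strings would order them alphabetically by weekday name, which is why the conversion comes first.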


```python
# Persist data as CSV; index=False leaves the DataFrame's row index out of the file
# (uncomment to write the file)
# df.to_csv('cnn_political_news.csv', index=False)
```
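The `index=False` argument matters when the file is read back later. The sketch below writes a hypothetical one-row DataFrame to in-memory buffers (`io.StringIO`) instead of a real file, once with the default row index and once without, and shows the columns `pd.read_csv` recovers in each case.

```python
import io

import pandas as pd

# Hypothetical single-row DataFrame with the same columns as df
df = pd.DataFrame([{'title': 'Article A', 'link': 'https://example.com/a',
                    'published': 'Tue, 20 Apr 2021 02:48:25 GMT'}])

# Default behaviour: the row index is written as an extra column with an empty header
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)  # rewind the buffer before reading it back
print(pd.read_csv(with_index).columns.tolist())  # ['Unnamed: 0', 'title', 'link', 'published']

# index=False: only the data columns are written, so the CSV reads back cleanly
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())  # ['title', 'link', 'published']
```

The `published` strings contain commas, and pandas quotes them automatically on write, so they survive the round trip intact.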

# Conclusion
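As the example shows, extracting data from an RSS feed with Beautiful Soup works much like any other Beautiful Soup scraping job. The main differences are that the response is parsed as XML (`features='xml'`) and that the data lives in `<item>` elements with predictable child tags such as `title`, `link` and `pubDate`, which makes the scraper short and easy to maintain.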