In [2]:
import feedparser # -> pip install feedparser
import pandas as pd

In [3]:
url = "https://usa.newonnetflix.info/feed"

feed = feedparser.parse(url)

type(feed)

feedparser.util.FeedParserDict

"parse" is the primary function in FeedParser. The object it returns is dictionary-like and can be handled much like a dictionary. For example, we can look at the keys it contains and the types of the values stored under those keys.

In [4]:
feed.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

In [5]:
type(feed.bozo)

bool

In [6]:
type(feed.feed)

feedparser.util.FeedParserDict

In [7]:
feed.version

'rss20'

Bozo is a useful key to know about if you are going to parse an RSS feed in code. FeedParser sets the bozo bit when it detects that a feed is not well-formed. (FeedParser will still parse the feed if it is not well-formed.) You can use the bozo bit to drive error handling or just print a simple warning.

In [8]:
if not feed.bozo:  # bozo is falsy when the feed parsed cleanly
    print("Well done, you have a well-formed feed!")
else:
    print("Potential trouble ahead.")

Well done, you have a well-formed feed!
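
To go one step beyond a printed warning, the bozo bit can be wrapped in a small helper. The sketch below is only an illustration: the name feed_health is hypothetical, and the stand-in dictionaries are made up so that the code runs without a network call. It relies on the parsed result exposing "bozo", "entries" and, for malformed feeds, "bozo_exception".

```python
def feed_health(parsed):
    """Summarize whether a parsed feed was well-formed.

    `parsed` is the result of feedparser.parse(), or any dict-like
    object exposing the same keys (hypothetical helper, not part of feedparser).
    """
    # bozo is falsy (False or 0) for a well-formed feed
    if not parsed.get("bozo"):
        return "ok"
    # For a malformed feed, the triggering exception is stored under "bozo_exception"
    exc = parsed.get("bozo_exception")
    reason = type(exc).__name__ if exc is not None else "unknown reason"
    # Entries may still have been recovered from a malformed feed
    n_entries = len(parsed.get("entries", []))
    return f"malformed ({reason}); {n_entries} entries recovered"


# Stand-in dictionaries (made-up data) so the sketch runs offline;
# in the notebook you would call feed_health(feed) instead.
print(feed_health({"bozo": False, "entries": []}))
print(feed_health({"bozo": True, "bozo_exception": ValueError("bad XML"), "entries": [{}]}))
```

Returning a string rather than printing keeps the decision (log, raise, or skip the feed) with the caller.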


We can look at some of the feed elements through the feed attribute.

In [9]:
feed.feed.keys()

dict_keys(['webfeeds_analytics', 'title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'language', 'published', 'published_parsed', 'updated', 'updated_parsed', 'authors', 'author', 'author_detail', 'publisher', 'publisher_detail'])

In [12]:
print(feed.feed.title)
print(feed.feed.link)
print(feed.feed.published)

New On Netflix USA
https://usa.newonnetflix.info
Sun, 16 Jan 2022 01:07:20 -0500


In [13]:
feed.feed.description

'RSS feed for new additions over the last 5 days to Netflix USA (100% unofficial!). A project by MaFt.co.uk'

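As with standard Python dictionaries, we can use the "get" method to read a key that may not exist: it returns a default value instead of raising an error. This is useful when writing code that has to cope with feeds that omit some elements.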

In [14]:
feed.feed.get('non_existent_key', 'N/A')

'N/A'
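
The same "get" pattern can be applied to several keys at once. The following is a minimal sketch, assuming a hypothetical helper named pick_fields and a plain dictionary standing in for one feed entry. The field names are the ones this feed happens to provide.

```python
def pick_fields(entry, fields=("title", "link", "published"), default="N/A"):
    """Return the requested fields from one entry, filling gaps with `default`."""
    return {name: entry.get(name, default) for name in fields}


# Plain dict standing in for a feed entry; "published" is deliberately missing.
sample_entry = {
    "title": "16th Jan: Phantom Thread (2017), 2hr 10m [R] (6.75/10)",
    "link": "https://usa.newonnetflix.info/info/80195447",
}
print(pick_fields(sample_entry))
```

In the notebook, pick_fields(feed.entries[0]) or pick_fields(feed.feed) should behave the same way, since both objects support "get".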

In [15]:
len(feed.entries)

17

The entries are stored as a list, with one item per feed entry.

In [16]:
type(feed.entries)

list

In [17]:
feed.entries[0].title

'16th Jan: Paddington (2014), 1hr 35m [PG] - Streaming Again (6.65/10)'

In [21]:
# enumerate() supplies the running counter, starting at 1
for i, entry in enumerate(feed.entries, start=1):
    print(f"{i} - {entry.title}")

1 - 16th Jan: Paddington (2014), 1hr 35m [PG] - Streaming Again (6.65/10)
2 - 16th Jan: Phantom Thread (2017), 2hr 10m [R] (6.75/10)
3 - 15th Jan: A・RIGATO ーJARUJARU TOWER 2020&#x30fc (2020), 1hr 40m [TV-G] (6/10)
4 - 14th Jan: Fatuma (2018), 1hr 18m [TV-14] (6/10)
5 - 14th Jan: The Ultimate Braai Master (2021), 1 Season [TV-G] (6.55/10)
6 - 14th Jan: Archive 81 (2022), 1 Season [TV-MA] (6/10)
7 - 14th Jan: The House (2022), 1hr 37m [TV-MA] (6/10)
8 - 14th Jan: Riverdance: The Animated Adventure (2022), 1hr 33m [TV-G] (5.65/10)
9 - 14th Jan: This Is Not a Comedy (2022), 1hr 45m [TV-MA] (6/10)
10 - 14th Jan: Yeh Kaali Kaali Ankhein (2022), 1 Season [TV-MA] (6/10)
11 - 14th Jan: After Life (2022), 3 Seasons [TV-MA] - New Episodes (7.2/10)
12 - 13th Jan: Brazen (2022), 1hr 36m [TV-14] (6/10)
13 - 13th Jan: The Journalist (2022), 1 Season [TV-14] (6/10)
14 - 13th Jan: Photocopier (2022), 2hr 10m [TV-MA] (6/10)
15 - 12th Jan: The God Committee (2021), 1hr 38m [TV-MA] (5.85/10)
16 - 12th Jan

We can create a pandas DataFrame from this data. Each entry becomes a row and each key becomes a column.

In [24]:
df = pd.DataFrame(feed.entries)

In [26]:
df.head(3)

Unnamed: 0,title,title_detail,links,link,summary,summary_detail,published,published_parsed,id,guidislink
0,"16th Jan: Paddington (2014), 1hr 35m [PG] - St...","{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://usa.newonnetflix.info/info/70305929,[Streaming Again] A lovable young bear from th...,"{'type': 'text/html', 'language': None, 'base'...","Sun, 16 Jan 2022 01:07:20 -0500","(2022, 1, 16, 6, 7, 20, 6, 16, 0)",https://usa.newonnetflix.info/info/70305929,False
1,"16th Jan: Phantom Thread (2017), 2hr 10m [R] (...","{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://usa.newonnetflix.info/info/80195447,"A fashion designer is drawn to a waitress, who...","{'type': 'text/html', 'language': None, 'base'...","Sun, 16 Jan 2022 01:07:07 -0500","(2022, 1, 16, 6, 7, 7, 6, 16, 0)",https://usa.newonnetflix.info/info/80195447,False
2,15th Jan: A・RIGATO ーJARUJARU TOWER 2020&#x30fc...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://usa.newonnetflix.info/info/81505825,"Known for their quirky skits, comedy duo JaruJ...","{'type': 'text/html', 'language': None, 'base'...","Sat, 15 Jan 2022 02:18:09 -0500","(2022, 1, 15, 7, 18, 9, 5, 15, 0)",https://usa.newonnetflix.info/info/81505825,False
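
Several of these columns hold nested dictionaries, lists, or time tuples, which are awkward to filter or sort. One possible approach is to flatten each entry before building the DataFrame. The sketch below uses a hypothetical helper named flatten_entry and a plain dictionary in place of a real entry. It assumes, as the output above suggests, that published_parsed is a time tuple in UTC.

```python
from datetime import datetime, timezone


def flatten_entry(entry):
    """Keep the scalar fields of one entry and convert its timestamp."""
    parsed_time = entry.get("published_parsed")
    # The first six items of the time tuple are year, month, day, hour, minute, second
    published = (
        datetime(*parsed_time[:6], tzinfo=timezone.utc)
        if parsed_time is not None
        else None
    )
    return {
        "title": entry.get("title"),
        "link": entry.get("link"),
        "summary": entry.get("summary"),
        "published": published,
    }


# Plain dict standing in for one feed entry, with values taken from the output above
sample_entry = {
    "title": "16th Jan: Phantom Thread (2017), 2hr 10m [R] (6.75/10)",
    "link": "https://usa.newonnetflix.info/info/80195447",
    "published_parsed": (2022, 1, 16, 6, 7, 7, 6, 16, 0),
}
rows = [flatten_entry(sample_entry)]
print(rows[0]["published"].isoformat())
```

In the notebook, pd.DataFrame([flatten_entry(e) for e in feed.entries]) would then give a frame with only these four columns and a timezone-aware "published" column.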
