## Introduction

Netflix is an American subscription-based service that streams a library of films and television series, including titles produced in-house. Similarly, Disney Plus distributes films and television series produced by The Walt Disney Studios and Walt Disney Television, and it is one of Netflix's biggest competitors.

Due to Covid, usage of OTT (over-the-top) platforms has increased tremendously, as cinemas were shut down and more people stayed home.

## Problem Statement

As more people shift towards OTT platforms, we would like to gain more insight into the similarities and differences between two of the most competitive online streaming platforms: Netflix and Disney Plus. This analysis would help users understand which platform suits them better and hence which one to choose.

**This notebook scrapes Netflix posts from [Reddit](https://www.reddit.com/r/netflix/).**


### Contents

- [Scrape Sample Posts](#Scrape-Sample-Posts)
- [Scrape Sufficient Data](#Scrape-Sufficient-Data)
- [Export To CSV](#Export-To-CSV)

In [2]:
import pandas as pd
import numpy as np
import requests 
import random
import time

### Scrape Sample Posts

First, we will fetch sample posts from the Netflix subreddit to understand the JSON data retrieved.

In [41]:
url = 'https://www.reddit.com/r/netflix.json'

In [42]:
res = requests.get(url, headers={'User-agent': 'Arti 1.0 Inc'})

We will check the status code of the response to confirm that the request succeeded.

In [43]:
res.status_code

200
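The status check above can be folded into a small helper so that a failed request never reaches the JSON parsing step. This is a minimal sketch that runs offline: StubResponse and json_or_none are hypothetical names introduced for illustration, and StubResponse only imitates the two parts of a requests response used in this notebook (status_code and json()).

```python
from dataclasses import dataclass, field


@dataclass
class StubResponse:
    """Hypothetical stand-in for a requests response, so the sketch runs offline."""
    status_code: int
    payload: dict = field(default_factory=dict)

    def json(self):
        return self.payload


def json_or_none(res):
    """Return the parsed body for a 200 response, otherwise report the code and return None."""
    if res.status_code != 200:
        print('Status error', res.status_code)
        return None
    return res.json()


ok_body = json_or_none(StubResponse(200, {'kind': 'Listing'}))
blocked_body = json_or_none(StubResponse(429))  # 429 is the HTTP code for "Too Many Requests"
print(ok_body, blocked_body)
```

With a real request, the same helper could be called as json_or_none(res) on the object returned by requests.get.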

In [44]:
reddit_dict = res.json()

In [45]:
print(reddit_dict)



From the above, we can see that the request returns data in JSON format. <br>
First we will list the keys in this data and then inspect the values linked to these keys so that we can locate the exact Netflix post data we need.

In [8]:
reddit_dict.keys()

dict_keys(['kind', 'data'])

In [9]:
reddit_dict['kind']

'Listing'

In [10]:
reddit_dict['data']

{'modhash': '',
 'dist': 27,
 'children': [{'kind': 't3',
   'data': {'approved_at_utc': None,
    'subreddit': 'netflix',
    'selftext': '',
    'author_fullname': 't2_6yobi',
    'saved': False,
    'mod_reason_title': None,
    'gilded': 0,
    'clicked': False,
    'title': 'Netflix now allows you to remove a movie/series from the "continue watching" row! [All]',
    'link_flair_richtext': [],
    'subreddit_name_prefixed': 'r/netflix',
    'hidden': False,
    'pwls': 6,
    'link_flair_css_class': 'one',
    'downs': 0,
    'top_awarded_type': None,
    'hide_score': False,
    'name': 't3_hrlw2a',
    'quarantine': False,
    'link_flair_text_color': 'dark',
    'upvote_ratio': 0.99,
    'author_flair_background_color': None,
    'subreddit_type': 'public',
    'ups': 5380,
    'total_awards_received': 8,
    'media_embed': {'content': '&lt;iframe class="embedly-embed" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fimgur.com%2FXQx3ioW%2Fembed%3Fpub%3Dtrue%26r

The data key in the response again holds a dictionary. We will do the same as before: list the keys and inspect the values associated with them.

In [11]:
reddit_dict['data'].keys()

dict_keys(['modhash', 'dist', 'children', 'after', 'before'])

In [12]:
reddit_dict['data']['children']

[{'kind': 't3',
  'data': {'approved_at_utc': None,
   'subreddit': 'netflix',
   'selftext': '',
   'author_fullname': 't2_6yobi',
   'saved': False,
   'mod_reason_title': None,
   'gilded': 0,
   'clicked': False,
   'title': 'Netflix now allows you to remove a movie/series from the "continue watching" row! [All]',
   'link_flair_richtext': [],
   'subreddit_name_prefixed': 'r/netflix',
   'hidden': False,
   'pwls': 6,
   'link_flair_css_class': 'one',
   'downs': 0,
   'top_awarded_type': None,
   'hide_score': False,
   'name': 't3_hrlw2a',
   'quarantine': False,
   'link_flair_text_color': 'dark',
   'upvote_ratio': 0.99,
   'author_flair_background_color': None,
   'subreddit_type': 'public',
   'ups': 5380,
   'total_awards_received': 8,
   'media_embed': {'content': '&lt;iframe class="embedly-embed" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fimgur.com%2FXQx3ioW%2Fembed%3Fpub%3Dtrue%26ref%3Dhttps%253A%252F%252Fembed.ly%26w%3D809&amp;display_name=Imgur&a

children in data contains all the post data that we will need for our analysis.<br>

This request retrieved 27 posts from Reddit.

In [13]:
len(reddit_dict['data']['children'])

27

In [14]:
reddit_dict['data']['children'][1]

{'kind': 't3',
 'data': {'approved_at_utc': None,
  'subreddit': 'netflix',
  'selftext': "Hello everyone, 8 years ago we woke up one day and found one of our youtube videos had gone crazy on r/Videos. 8 year later thanks to lots of support form the reddit community we got our shot at our own TV show, Aunty Donna's Big Ol' House of Fun on Netflix.\n\nWe’re gonna be answering anything for the next hour or so. Answering today will be the aforementioned three performers, as well as head writer Sam Lingham, film maker Max Miller and composer Tom Armstrong.\n\nIf you don’t know who we are here's a little introductory playlist of our work so far: https://www.youtube.com/playlist?list=PLEzN-y0ZMgptSc7jzquvG0XrdfOnSMAQS\n\n\nProof: \n\n - https://i.redd.it/ryyg4p9amhz51.jpg\n - https://i.redd.it/o982fwqcmhz51.jpg\n - https://i.redd.it/2dn3dpwemhz51.jpg\n\nEDIT 1: Okay, Bro, Mark and I (Zach) gotta sign off for a bit to go record a pod. But the BTS boys might stick around for a little bit. Pls 

Each post entry contains kind and data. The data key holds the post information.
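The nesting explored so far (listing, then data, then children, then each child's own data) can be summarised in a short offline sketch. The sample_listing dictionary below is a hypothetical, heavily trimmed stand-in for the real response: the names and titles are taken from the output in this notebook, but the selftext placeholder is invented and real posts carry many more fields. extract_posts is an illustrative helper name.

```python
# Hypothetical, trimmed stand-in for the JSON returned by Reddit.
sample_listing = {
    'kind': 'Listing',
    'data': {
        'after': 't3_m1xyvd',
        'before': None,
        'children': [
            {'kind': 't3',
             'data': {'name': 't3_m2ymuh',
                      'subreddit': 'netflix',
                      'title': 'Netflix Tests Cracking Down On Password Sharing',
                      'selftext': ''}},
            {'kind': 't3',
             'data': {'name': 't3_m39k8y',
                      'subreddit': 'netflix',
                      'title': 'Behind Her Eyes',
                      'selftext': '(placeholder text for the example)'}},
        ],
    },
}


def extract_posts(listing):
    """Return the inner 'data' dictionary of every child in a listing."""
    return [child['data'] for child in listing['data']['children']]


sample_posts = extract_posts(sample_listing)
for post in sample_posts:
    print(post['name'], '|', post['title'])
```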

In [15]:
reddit_dict['data']['children'][1].keys()

dict_keys(['kind', 'data'])

In [16]:
reddit_dict['data']['children'][1]['kind']

't3'

In [17]:
reddit_dict['data']['children'][1]['data']

{'approved_at_utc': None,
 'subreddit': 'netflix',
 'selftext': "Hello everyone, 8 years ago we woke up one day and found one of our youtube videos had gone crazy on r/Videos. 8 year later thanks to lots of support form the reddit community we got our shot at our own TV show, Aunty Donna's Big Ol' House of Fun on Netflix.\n\nWe’re gonna be answering anything for the next hour or so. Answering today will be the aforementioned three performers, as well as head writer Sam Lingham, film maker Max Miller and composer Tom Armstrong.\n\nIf you don’t know who we are here's a little introductory playlist of our work so far: https://www.youtube.com/playlist?list=PLEzN-y0ZMgptSc7jzquvG0XrdfOnSMAQS\n\n\nProof: \n\n - https://i.redd.it/ryyg4p9amhz51.jpg\n - https://i.redd.it/o982fwqcmhz51.jpg\n - https://i.redd.it/2dn3dpwemhz51.jpg\n\nEDIT 1: Okay, Bro, Mark and I (Zach) gotta sign off for a bit to go record a pod. But the BTS boys might stick around for a little bit. Pls keep asking etc and we'll 

In [18]:
reddit_dict['data']['children'][1]['data']['subreddit']

'netflix'

The value above is the target label: Netflix.

Below are the text fields that will mainly be used for our analysis.

In [19]:
reddit_dict['data']['children'][1]['data']['title']

"Hi we're Mark, Zach and Broden from Aunty Donna, we got our first big break on reddit. We've just released our first tv series on Netflix. AMA."

In [20]:
reddit_dict['data']['children'][1]['data']['selftext']

"Hello everyone, 8 years ago we woke up one day and found one of our youtube videos had gone crazy on r/Videos. 8 year later thanks to lots of support form the reddit community we got our shot at our own TV show, Aunty Donna's Big Ol' House of Fun on Netflix.\n\nWe’re gonna be answering anything for the next hour or so. Answering today will be the aforementioned three performers, as well as head writer Sam Lingham, film maker Max Miller and composer Tom Armstrong.\n\nIf you don’t know who we are here's a little introductory playlist of our work so far: https://www.youtube.com/playlist?list=PLEzN-y0ZMgptSc7jzquvG0XrdfOnSMAQS\n\n\nProof: \n\n - https://i.redd.it/ryyg4p9amhz51.jpg\n - https://i.redd.it/o982fwqcmhz51.jpg\n - https://i.redd.it/2dn3dpwemhz51.jpg\n\nEDIT 1: Okay, Bro, Mark and I (Zach) gotta sign off for a bit to go record a pod. But the BTS boys might stick around for a little bit. Pls keep asking etc and we'll try to jump back on and answer a few more in the next day or so.

Collect the information for all posts into a single DataFrame.

In [21]:
posts = [p['data'] for p in reddit_dict['data']['children']]

In [22]:
pd.DataFrame(posts)

Unnamed: 0,approved_at_utc,subreddit,selftext,author_fullname,saved,mod_reason_title,gilded,clicked,title,link_flair_richtext,...,permalink,parent_whitelist_status,stickied,url,subreddit_subscribers,created_utc,num_crossposts,media,is_video,link_flair_template_id
0,,netflix,,t2_6yobi,False,,0,False,Netflix now allows you to remove a movie/serie...,[],...,/r/netflix/comments/hrlw2a/netflix_now_allows_...,all_ads,True,https://imgur.com/XQx3ioW,808001,1594813000.0,1,{'oembed': {'provider_url': 'http://imgur.com'...,False,
1,,netflix,"Hello everyone, 8 years ago we woke up one day...",t2_3cpw5xub,False,,2,False,"Hi we're Mark, Zach and Broden from Aunty Donn...",[],...,/r/netflix/comments/jvfoxg/hi_were_mark_zach_a...,all_ads,True,https://www.reddit.com/r/netflix/comments/jvfo...,808001,1605562000.0,4,,False,
2,,netflix,,t2_p4hzt66,False,,0,False,Netflix Tests Cracking Down On Password Sharing,[],...,/r/netflix/comments/m2ymuh/netflix_tests_crack...,all_ads,False,https://www.hollywoodreporter.com/live-feed/ne...,808001,1615491000.0,0,,False,
3,,netflix,The show seem neat but it looks more like a k...,t2_agh7r2kj,False,,0,False,Should I watch the show The Dragon Prince as a...,[],...,/r/netflix/comments/m2uqy3/should_i_watch_the_...,all_ads,False,https://www.reddit.com/r/netflix/comments/m2uq...,808001,1615482000.0,0,,False,
4,,netflix,I’m sure that someone has posted this; but if ...,t2_a0z00q7h,False,,0,False,Behind Her Eyes,[],...,/r/netflix/comments/m39k8y/behind_her_eyes/,all_ads,False,https://www.reddit.com/r/netflix/comments/m39k...,808001,1615523000.0,0,,False,
5,,netflix,Hi! I've just finished the good place and park...,t2_2eukfbxa,False,,0,False,"Any shows similar to parks and rec, the office...",[],...,/r/netflix/comments/m3bghu/any_shows_similar_t...,all_ads,False,https://www.reddit.com/r/netflix/comments/m3bg...,808001,1615529000.0,0,,False,
6,,netflix,,t2_2bexeqsm,False,,0,False,Parody review/recap of Riverdale (season 5),[],...,/r/netflix/comments/m2oe94/parody_reviewrecap_...,all_ads,False,https://youtu.be/iFCNxsZf29s,808001,1615463000.0,0,"{'type': 'youtube.com', 'oembed': {'provider_u...",False,
7,,netflix,Had this on my list for awhile and finally tri...,t2_13843vsh,False,,0,False,Nailed It,[],...,/r/netflix/comments/m3d2pt/nailed_it/,all_ads,False,https://www.reddit.com/r/netflix/comments/m3d2...,808001,1615537000.0,0,,False,
8,,netflix,,t2_9c8e1h2r,False,,0,False,Netflix to Start Testing Warnings for People B...,[],...,/r/netflix/comments/m30pn9/netflix_to_start_te...,all_ads,False,https://gammawire.com/netflix-to-start-testing...,808001,1615497000.0,0,,False,
9,,netflix,,t2_81vkr2em,False,,0,False,From 'The White Tiger' to 'Supernova': 10 best...,[],...,/r/netflix/comments/m3bx5c/from_the_white_tige...,all_ads,False,https://hosanna.store/blogs/news/from-the-whit...,808001,1615531000.0,0,,False,


We have explored a single request and the data it returned. We need to scrape more to gather sufficient data for our analysis. <br>
To do that, the reference to the last post in the response tells us where the next batch of posts begins.<br> We can read this reference from the response dictionary as below.

In [25]:
reddit_dict['data']['after']

't3_m1xyvd'

The URL formed below will be used to fetch the next batch of posts from Reddit.

In [26]:
url + '?after=' + reddit_dict['data']['after']

'https://www.reddit.com/r/netflix.json?after=t3_m1xyvd'
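The same URL can be built with a small helper that also covers the first request, where no after value exists yet. This is a sketch only: build_page_url and BASE_URL are hypothetical names, and urlencode from the standard library is used in place of string concatenation so the query string is escaped properly.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.reddit.com/r/netflix.json'


def build_page_url(base_url, after=None):
    """Return the base URL for the first page, or add an 'after' query parameter for later pages."""
    if after is None:
        return base_url
    return base_url + '?' + urlencode({'after': after})


first_page_url = build_page_url(BASE_URL)
next_page_url = build_page_url(BASE_URL, 't3_m1xyvd')
print(first_page_url)
print(next_page_url)
```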

### Scrape Sufficient Data

In [27]:
posts = []
user_agents = ['Arti Inc 1.0', 'AJ 1.0 Inc', 'Arti Inc 2.0', 'AJ 2.0 Inc', 'Arti Inc 3.0', 
               'AJ 3.0 Inc', 'Arti Inc 4.0', 'AJ 4.0 Inc','Arti Inc 5.0', 'AJ 5.0 Inc',]
after = None

for _ in range(100):
    # the first request has no 'after' reference; later requests continue from the last post seen
    if after is None:
        current_url = url
    else:
        current_url = url + '?after=' + after
    print(current_url)
    res = requests.get(current_url, headers={'User-agent': random.choice(user_agents)})
    
    if res.status_code != 200:
        print('Status error', res.status_code)
        break
    
    current_dict = res.json()
    current_posts = [p['data'] for p in current_dict['data']['children']]
    posts.extend(current_posts)
    after = current_dict['data']['after']
    
    # generate a random sleep duration to look more 'natural'
    sleep_duration = random.randint(2,20)
    print(sleep_duration)
    time.sleep(sleep_duration)

https://www.reddit.com/r/netflix.json
15
https://www.reddit.com/r/netflix.json?after=t3_m1xyvd
5
https://www.reddit.com/r/netflix.json?after=t3_m26axk
16
https://www.reddit.com/r/netflix.json?after=t3_m1309t
13
https://www.reddit.com/r/netflix.json?after=t3_m12tlj
7
https://www.reddit.com/r/netflix.json?after=t3_lzh8ts
5
https://www.reddit.com/r/netflix.json?after=t3_lyz5y1
7
https://www.reddit.com/r/netflix.json?after=t3_ly5tbu
12
https://www.reddit.com/r/netflix.json?after=t3_lxdjlp
8
https://www.reddit.com/r/netflix.json?after=t3_lwg1t4
17
https://www.reddit.com/r/netflix.json?after=t3_lvsri5
17
https://www.reddit.com/r/netflix.json?after=t3_lufddi
2
https://www.reddit.com/r/netflix.json?after=t3_lt0sq2
3
https://www.reddit.com/r/netflix.json?after=t3_lsjgrg
10
https://www.reddit.com/r/netflix.json?after=t3_lsm1bi
16
https://www.reddit.com/r/netflix.json?after=t3_lqyqku
7
https://www.reddit.com/r/netflix.json?after=t3_lqapzr
11
https://www.reddit.com/r/netflix.json?after=t3_lolrgs
1
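One thing to watch in a loop like this: if after ever comes back as None, falling back to the base URL would request the first page again. The sketch below shows one way to stop instead, under the assumption (not verified in this notebook) that a None value means no further page exists. It runs offline: scrape_listing, fake_fetch and FAKE_PAGES are hypothetical names, and the fake pages hold invented post names.

```python
def scrape_listing(fetch_page, max_pages=100):
    """Collect posts page by page until max_pages is reached or no 'after' reference is returned."""
    posts = []
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after)
        posts.extend(child['data'] for child in listing['data']['children'])
        after = listing['data']['after']
        if after is None:
            # assumed to mean the listing is exhausted, so stop rather than restart
            break
    return posts


# Invented two-page listing, keyed by the 'after' value that requests each page.
FAKE_PAGES = {
    None: {'data': {'children': [{'data': {'name': 't3_a'}},
                                 {'data': {'name': 't3_b'}}],
                    'after': 't3_b'}},
    't3_b': {'data': {'children': [{'data': {'name': 't3_c'}}],
                      'after': None}},
}


def fake_fetch(after):
    """Stand-in for the HTTP request; returns a canned page."""
    return FAKE_PAGES[after]


scraped = scrape_listing(fake_fetch)
print([post['name'] for post in scraped])
```

Passing the fetch step in as a function keeps the pagination logic testable without network access; a real fetch function would wrap requests.get and the sleep between requests.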

In [28]:
posts_df = pd.DataFrame(posts)
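Because the listing can change between requests, the same post may be collected more than once. The following is a minimal sketch of removing such repeats before analysis, assuming the name field (for example t3_hrlw2a) uniquely identifies a post. drop_duplicate_posts is a hypothetical helper and the sample list is invented for illustration.

```python
def drop_duplicate_posts(posts, key='name'):
    """Keep the first occurrence of each post, identified by the given key."""
    seen = set()
    unique = []
    for post in posts:
        if post[key] in seen:
            continue
        seen.add(post[key])
        unique.append(post)
    return unique


# Invented sample in which the first post appears twice.
duplicated_sample = [{'name': 't3_hrlw2a'}, {'name': 't3_m2ymuh'}, {'name': 't3_hrlw2a'}]
unique_sample = drop_duplicate_posts(duplicated_sample)
print(len(duplicated_sample), '->', len(unique_sample))
```

On the DataFrame itself, posts_df.drop_duplicates(subset='name') would have a similar effect.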

In [29]:
pd.set_option('display.max_columns', len(posts_df.columns))
posts_df

Unnamed: 0,approved_at_utc,subreddit,selftext,author_fullname,saved,mod_reason_title,gilded,clicked,title,link_flair_richtext,subreddit_name_prefixed,hidden,pwls,link_flair_css_class,downs,top_awarded_type,hide_score,name,quarantine,link_flair_text_color,upvote_ratio,author_flair_background_color,subreddit_type,ups,total_awards_received,media_embed,author_flair_template_id,is_original_content,user_reports,secure_media,is_reddit_media_domain,is_meta,category,secure_media_embed,link_flair_text,can_mod_post,score,approved_by,author_premium,thumbnail,edited,author_flair_css_class,author_flair_richtext,gildings,content_categories,is_self,mod_note,created,link_flair_type,wls,removed_by_category,banned_by,author_flair_type,domain,allow_live_comments,selftext_html,likes,suggested_sort,banned_at_utc,url_overridden_by_dest,view_count,archived,no_follow,is_crosspostable,pinned,over_18,all_awardings,awarders,media_only,can_gild,spoiler,locked,author_flair_text,treatment_tags,visited,removed_by,num_reports,distinguished,subreddit_id,mod_reason_by,removal_reason,link_flair_background_color,id,is_robot_indexable,report_reasons,author,discussion_type,num_comments,send_replies,whitelist_status,contest_mode,mod_reports,author_patreon_flair,author_flair_text_color,permalink,parent_whitelist_status,stickied,url,subreddit_subscribers,created_utc,num_crossposts,media,is_video,link_flair_template_id,media_metadata,is_gallery,gallery_data
0,,netflix,,t2_6yobi,False,,0,False,Netflix now allows you to remove a movie/serie...,[],r/netflix,False,6,one,0,,False,t3_hrlw2a,False,dark,0.99,,public,5381,8,"{'content': '&lt;iframe class=""embedly-embed"" ...",,False,[],{'oembed': {'provider_url': 'http://imgur.com'...,False,False,,"{'content': '&lt;iframe class=""embedly-embed"" ...",[META],False,5381,,False,,False,,[],{},,False,,1.594841e+09,text,6,,,text,imgur.com,True,,,,,https://imgur.com/XQx3ioW,,True,False,False,False,False,"[{'giver_coin_reward': None, 'subreddit_id': N...",[],False,False,False,True,,[],False,,,,t5_2qoxj,,,,hrlw2a,True,,ILikeLampz,,261,False,all_ads,False,[],False,,/r/netflix/comments/hrlw2a/netflix_now_allows_...,all_ads,True,https://imgur.com/XQx3ioW,808001,1.594813e+09,1,{'oembed': {'provider_url': 'http://imgur.com'...,False,,,,
1,,netflix,"Hello everyone, 8 years ago we woke up one day...",t2_3cpw5xub,False,,2,False,"Hi we're Mark, Zach and Broden from Aunty Donn...",[],r/netflix,False,6,,0,,False,t3_jvfoxg,False,dark,0.90,,public,6117,27,{},,False,[],,False,False,,{},,False,6117,,False,,1.60557e+09,,[],"{'gid_1': 8, 'gid_2': 2}",,True,,1.605591e+09,text,6,,,text,self.netflix,True,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,,,,,False,False,False,False,False,"[{'giver_coin_reward': 0, 'subreddit_id': None...",[],False,False,False,False,Verified - Official Netflix,[],False,,,,t5_2qoxj,,,,jvfoxg,True,,netflix,,1657,True,all_ads,False,[],False,dark,/r/netflix/comments/jvfoxg/hi_were_mark_zach_a...,all_ads,True,https://www.reddit.com/r/netflix/comments/jvfo...,808001,1.605562e+09,4,,False,,,,
2,,netflix,,t2_p4hzt66,False,,0,False,Netflix Tests Cracking Down On Password Sharing,[],r/netflix,False,6,,0,,False,t3_m2ymuh,False,dark,0.96,,public,614,0,{},,False,[],,False,False,,{},,False,614,,False,,False,,[],{},,False,,1.615520e+09,text,6,,,text,hollywoodreporter.com,True,,,,,https://www.hollywoodreporter.com/live-feed/ne...,,False,False,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qoxj,,,,m2ymuh,True,,JediNotePad,,285,False,all_ads,False,[],False,,/r/netflix/comments/m2ymuh/netflix_tests_crack...,all_ads,False,https://www.hollywoodreporter.com/live-feed/ne...,808001,1.615491e+09,0,,False,,,,
3,,netflix,The show seem neat but it looks more like a k...,t2_agh7r2kj,False,,0,False,Should I watch the show The Dragon Prince as a...,[],r/netflix,False,6,,0,,False,t3_m2uqy3,False,dark,0.87,,public,129,0,{},,False,[],,False,False,,{},,False,129,,False,,False,,[],{},,True,,1.615510e+09,text,6,,,text,self.netflix,False,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,,,,,False,False,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qoxj,,,,m2uqy3,True,,laddbent,,58,True,all_ads,False,[],False,,/r/netflix/comments/m2uqy3/should_i_watch_the_...,all_ads,False,https://www.reddit.com/r/netflix/comments/m2uq...,808001,1.615482e+09,0,,False,,,,
4,,netflix,I’m sure that someone has posted this; but if ...,t2_a0z00q7h,False,,0,False,Behind Her Eyes,[],r/netflix,False,6,,0,,False,t3_m39k8y,False,dark,0.92,,public,9,0,{},,False,[],,False,False,,{},,False,9,,False,,False,,[],{},,True,,1.615552e+09,text,6,,,text,self.netflix,False,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,,,,,False,False,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qoxj,,,,m39k8y,True,,tylerherman13,,4,True,all_ads,False,[],False,,/r/netflix/comments/m39k8y/behind_her_eyes/,all_ads,False,https://www.reddit.com/r/netflix/comments/m39k...,808001,1.615523e+09,0,,False,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2423,,netflix,Beware folks I do believe this is an email imi...,t2_4do16lze,False,,0,False,Netflix spam email,[],r/netflix,False,6,,0,,False,t3_lkgxct,False,dark,0.60,,public,1,0,{},,False,[],,False,False,,{},,False,1,,False,,1.61341e+09,,[],{},,True,,1.613435e+09,text,6,,,text,self.netflix,False,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,,,,,False,True,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qoxj,,,,lkgxct,True,,hillbillyharold101,,2,True,all_ads,False,[],False,,/r/netflix/comments/lkgxct/netflix_spam_email/,all_ads,False,https://www.reddit.com/r/netflix/comments/lkgx...,808016,1.613406e+09,0,,False,,"{'jvrr7lxu2oh61': {'status': 'valid', 'e': 'Im...",,
2424,,netflix,"Hi,\n\ncan we get everybody hates chris on Net...",t2_3reqsx93,False,,0,False,Everybody hates Chris,[],r/netflix,False,6,,0,,False,t3_lkg00v,False,dark,0.50,,public,0,0,{},,False,[],,False,False,,{},,False,0,,False,,False,,[],{},,True,,1.613432e+09,text,6,,,text,self.netflix,False,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,,,,,False,True,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qoxj,,,,lkg00v,True,,WasabiChief,,1,True,all_ads,False,[],False,,/r/netflix/comments/lkg00v/everybody_hates_chris/,all_ads,False,https://www.reddit.com/r/netflix/comments/lkg0...,808016,1.613403e+09,0,,False,,,,
2425,,netflix,I'm having a difficult time finding decent sho...,t2_54wya6au,False,,0,False,Watched Red Dot and was disappointed (spoilers),[],r/netflix,False,6,,0,,False,t3_lk2efg,False,dark,0.88,,public,12,0,{},,False,[],,False,False,,{},,False,12,,False,,False,,[],{},,True,,1.613381e+09,text,6,,,text,self.netflix,False,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,,,,,False,False,False,False,False,[],[],False,False,True,False,,[],False,,,,t5_2qoxj,,,,lk2efg,True,,nonetodaysu,,27,True,all_ads,False,[],False,,/r/netflix/comments/lk2efg/watched_red_dot_and...,all_ads,False,https://www.reddit.com/r/netflix/comments/lk2e...,808016,1.613352e+09,0,,False,,,,
2426,,netflix,,t2_7r62g,False,,0,False,"In light of Hulu's ""Framing Britney Spears,"" N...",[],r/netflix,False,6,,0,,False,t3_lk43dk,False,dark,0.75,,public,10,0,{},,False,[],,False,False,,{},,False,10,,False,,False,,[],{},,False,,1.613387e+09,text,6,,,text,bloomberg.com,False,,,,,https://www.bloomberg.com/news/articles/2021-0...,,False,False,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qoxj,,,,lk43dk,True,,hyogurt,,6,True,all_ads,False,[],False,,/r/netflix/comments/lk43dk/in_light_of_hulus_f...,all_ads,False,https://www.bloomberg.com/news/articles/2021-0...,808016,1.613358e+09,0,,False,,,,


In [30]:
posts_df.columns

Index(['approved_at_utc', 'subreddit', 'selftext', 'author_fullname', 'saved',
       'mod_reason_title', 'gilded', 'clicked', 'title', 'link_flair_richtext',
       ...
       'url', 'subreddit_subscribers', 'created_utc', 'num_crossposts',
       'media', 'is_video', 'link_flair_template_id', 'media_metadata',
       'is_gallery', 'gallery_data'],
      dtype='object', length=107)

All the post information, along with metadata, is available in a single record. We will not use metadata such as clicked, saved, hidden, etc. <br>
We will only use the text fields of a post that contain meaningful words.

In [31]:
posts_df['selftext'].eq('').sum()

624
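The count above shows that many posts have an empty selftext, so the title is likely the more dependable text field. Below is a short sketch of reducing each record to the fields used in this notebook (subreddit, title, selftext) and counting empty bodies. TEXT_FIELDS, keep_text_fields and count_empty_selftext are hypothetical names, and the sample records are trimmed with a placeholder selftext.

```python
TEXT_FIELDS = ['subreddit', 'title', 'selftext']


def keep_text_fields(post):
    """Return a copy of the post holding only the text fields, with '' for any that are missing."""
    return {name: post.get(name, '') for name in TEXT_FIELDS}


def count_empty_selftext(posts):
    """Count posts whose selftext is an empty string."""
    return sum(1 for post in posts if post.get('selftext', '') == '')


# Trimmed sample records; the metadata keys stand in for the many unused columns.
raw_sample = [
    {'subreddit': 'netflix', 'title': 'Netflix Tests Cracking Down On Password Sharing',
     'selftext': '', 'clicked': False, 'saved': False},
    {'subreddit': 'netflix', 'title': 'Behind Her Eyes',
     'selftext': '(placeholder text for the example)', 'hidden': False},
]

text_only = [keep_text_fields(post) for post in raw_sample]
empty_count = count_empty_selftext(text_only)
print(text_only[0])
print(empty_count)
```

The pandas equivalent of the first step is column selection, posts_df[TEXT_FIELDS].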

### Export To CSV

In [33]:
posts_df.to_csv('../Data/Netflix.csv', index=False)