### Scraping Reddit data using Python and PRAW



This tutorial assumes you are already familiar with Python. If you need a refresher, the official Python tutorial gives a quick overview: https://docs.python.org/3/tutorial/index.html

In this tutorial, we introduce PRAW (the Python Reddit API Wrapper) and show how to use it to scrape data from Reddit. Before we get started, there are two prerequisites:

- A Reddit account
- A client ID and client secret: these two values are needed to access Reddit's API as a script application. You get both by creating a "script" app at https://www.reddit.com/prefs/apps (see also http://www.storybench.org/how-to-scrape-reddit-with-python/)

Let's begin!!

#### Initialize Reddit instance
As a first step, we need an instance of the `Reddit` class, since PRAW cannot do anything without one. These instances can be in one of two states: read-only or authorized. A read-only instance, created with just a client ID, client secret, and user agent, lets you scrape publicly available information. To do more, you need an authorized instance, which additionally takes the username and password of a Reddit account. Below, we initialize an authorized instance:

In [2]:
# import dependent libraries
import datetime
import praw
import pandas as pd

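# Reddit API connection (authorized instance).
# Replace the placeholders with your own credentials; keep real secrets out of shared code.
reddit = praw.Reddit(client_id='YOUR_CLIENT_ID',
                     client_secret='YOUR_CLIENT_SECRET',
                     user_agent='comments_scraper',
                     username='YOUR_REDDIT_USERNAME',
                     password='YOUR_REDDIT_PASSWORD')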
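Hard-coding credentials in a notebook makes them easy to leak. A minimal sketch of one alternative, assuming you export them as environment variables, is shown here. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`, `REDDIT_USERNAME` and `REDDIT_PASSWORD`, and the helper `reddit_kwargs_from_env`, are hypothetical choices for this example, not names PRAW requires. Leaving out the username and password produces the arguments for a read-only instance.

```python
import os


def reddit_kwargs_from_env(environ=os.environ):
    """Build keyword arguments for praw.Reddit from environment variables.

    The REDDIT_* variable names are hypothetical choices for this sketch.
    """
    kwargs = {
        "client_id": environ["REDDIT_CLIENT_ID"],
        "client_secret": environ["REDDIT_CLIENT_SECRET"],
        "user_agent": environ.get("REDDIT_USER_AGENT", "comments_scraper"),
    }
    # Username and password are optional: without them praw.Reddit gives a
    # read-only instance, with them an authorized one.
    username = environ.get("REDDIT_USERNAME")
    password = environ.get("REDDIT_PASSWORD")
    if username and password:
        kwargs["username"] = username
        kwargs["password"] = password
    return kwargs
```

You would then call `reddit = praw.Reddit(**reddit_kwargs_from_env())`; PRAW's `reddit.read_only` attribute reports which of the two states the instance is in. PRAW can also read credentials from a `praw.ini` file, as described in its documentation.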

#### Obtain a Subreddit

To obtain a `Subreddit` instance, pass the subreddit's name to the `subreddit()` method of your `Reddit` instance.

In [8]:
# assume you have a Reddit instance bound to variable `reddit`
subreddit = reddit.subreddit('AskReddit')

print(subreddit.display_name)
print(subreddit.title) 
print(subreddit.description)

AskReddit
Ask Reddit...
###### [ [ SERIOUS ] ](http://www.reddit.com/r/askreddit/submit?selftext=true&title=%5BSerious%5D)


##### [Rules](https://www.reddit.com/r/AskReddit/wiki/index#wiki_rules):
1. You must post a clear and direct question in the title. The title may contain two, short, necessary context sentences.
No text is allowed in the textbox. Your thoughts/responses to the question can go in the comments section. [more >>](http://goo.gl/tMUR4k)

2. Any post asking for advice should be generic and not specific to your situation alone. [more >>](http://goo.gl/2L771B)

3. Askreddit is for open-ended discussion questions. [more >>](http://goo.gl/DcPPLf)


5. Askreddit is not your soapbox, personal army, or advertising platform. [more >>](http://goo.gl/DG4Q2M)

6. Questions seeking professional advice are inappropriate for this subreddit and will be removed. [more >>](http://goo.gl/G6Zbap)

7. Soliciting money, goods, services, or favours is not allowed. [more >>](http://goo.gl/Ce
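The same `subreddit()` call works for any number of names. The sketch below assumes a PRAW `Reddit` instance (any object with a compatible `subreddit(name)` method works); the helper `summarize_subreddits` and its `max_chars` parameter are illustrative assumptions. It relies on PRAW loading subreddits lazily: `display_name` is known immediately, while reading `title` or `description` triggers the actual API request. Because a description can be a long Markdown document, as the output above shows, only a preview is kept.

```python
def summarize_subreddits(reddit, names, max_chars=80):
    """Return a short summary dict for each subreddit name.

    Hypothetical helper: `reddit` is assumed to be a praw.Reddit instance.
    """
    summaries = []
    for name in names:
        sub = reddit.subreddit(name)  # lazy: no request made yet
        description = sub.description or ""  # accessing this fetches data
        summaries.append({
            "display_name": sub.display_name,
            "title": sub.title,
            # Keep only the start of the (possibly long) Markdown description.
            "description_preview": description[:max_chars],
        })
    return summaries
```

For example, `summarize_subreddits(reddit, ["AskReddit", "datascience"])` would return one summary per subreddit.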

#### Obtain Submission Instances from a Subreddit

Now that you have a `Subreddit` instance, you can iterate through its submissions, each represented by a `Submission` instance. There are several sorts you can iterate through:

- controversial
- gilded
- hot
- new
- rising
- top

Each of these methods immediately returns a `ListingGenerator`, which you iterate over to retrieve the submissions.
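To choose the sort at run time, one option is to look the method up by name. The sketch below assumes a PRAW `Subreddit` (or any object with the same methods); `fetch_listing` is a hypothetical helper. In PRAW, `top` and `controversial` additionally accept a `time_filter` argument (`"all"`, `"day"`, `"hour"`, `"month"`, `"week"` or `"year"`), so the sketch passes it only to those two.

```python
def fetch_listing(subreddit, sort="hot", limit=5, time_filter=None):
    """Return (title, score, id) tuples from one of a subreddit's listings.

    Hypothetical helper; `sort` must be one of PRAW's listing method names.
    """
    allowed = {"controversial", "gilded", "hot", "new", "rising", "top"}
    if sort not in allowed:
        raise ValueError(f"unknown sort: {sort!r}")
    method = getattr(subreddit, sort)
    # Only top() and controversial() take a time_filter argument.
    if sort in {"top", "controversial"} and time_filter is not None:
        listing = method(time_filter=time_filter, limit=limit)
    else:
        listing = method(limit=limit)
    # The ListingGenerator makes API requests only as it is iterated.
    return [(s.title, s.score, s.id) for s in listing]
```

For instance, `fetch_listing(subreddit, "top", limit=10, time_filter="week")` would return the ten highest-scoring submissions of the past week.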

In [4]:
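# assume you have a Subreddit instance bound to variable `subreddit`
# (the output below was produced with `subreddit` bound to r/datascience)
x = []  # collects one list of comments per submission
for submission in subreddit.hot(limit=4):
    print(submission.title)  # Output: the submission's title
    print(submission.score)  # Output: the submission's score
    print(submission.id)     # Output: the submission's ID
    print(submission.url)    # Output: the URL the submission points to
                             # or the submission's URL if it's a self post
    # comments.list() flattens the comment tree; it may contain
    # MoreComments placeholders alongside Comment objects
    x.append(submission.comments.list())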

Weekly Entering & Transitioning Thread | 24 Mar 2019 - 31 Mar 2019
9
b4y15v
https://www.reddit.com/r/datascience/comments/b4y15v/weekly_entering_transitioning_thread_24_mar_2019/
Are most data science jobs related to marketing or am I having bad luck on the job search?
92
b5q1gr
https://www.reddit.com/r/datascience/comments/b5q1gr/are_most_data_science_jobs_related_to_marketing/
I've open-sourced a Python package that lets you provide an input CSV and a target field to predict to Python/a CLI, and generate a robust machine learning model *and custom Python code* to run it in production workflows.
27
b5quu3
https://www.reddit.com/r/datascience/comments/b5quu3/ive_opensourced_a_python_package_that_lets_you/
How to cultivate data science soft skills?
49
b5mjjf
https://www.reddit.com/r/datascience/comments/b5mjjf/how_to_cultivate_data_science_soft_skills/
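The loop above prints only four attributes, but a `Submission` carries many more, for example `num_comments`, `created_utc` and `selftext`. PRAW's documentation suggests inspecting an object with `vars()`, typically as `pprint.pprint(vars(submission))`, to see what is available. The hypothetical helper below keeps only the public fields, since names starting with an underscore are PRAW internals. Because PRAW objects can be lazy, access an attribute such as `submission.title` first so the data has been fetched.

```python
def public_attributes(obj):
    """Return the attributes loaded on an object, minus underscore internals.

    Hypothetical helper; works on any object that has a __dict__.
    """
    return {key: value for key, value in vars(obj).items()
            if not key.startswith("_")}


# Example with a real submission (needs network access and credentials):
# import pprint
# submission = next(subreddit.hot(limit=1))
# pprint.pprint(sorted(public_attributes(submission)))
```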


In [20]:
# one list per column, to be filled while iterating over submissions
topics_dict = {"submission_id": [],
               "user_id": [],
               "submission_timestamp": [],
               "title": [],
               "score": [],
               "id": [],
               "url": [],
               "comms_num": [],
               "created": [],
               "body": []}
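One way to fill a dictionary like `topics_dict` is to append one value per column for every submission and then hand the result to pandas. A minimal sketch follows. The helper `collect_submissions` is hypothetical; it fills only the columns that map directly onto PRAW `Submission` attributes (`title`, `score`, `id`, `url`, `num_comments`, `created_utc`, `selftext`), and it assumes you want `created` as a timezone-aware UTC datetime.

```python
import datetime


def collect_submissions(submissions):
    """Gather submission fields into a dict of lists, ready for pandas.

    Hypothetical helper; column names follow the tutorial's topics_dict.
    """
    topics_dict = {"title": [], "score": [], "id": [], "url": [],
                   "comms_num": [], "created": [], "body": []}
    for submission in submissions:
        topics_dict["title"].append(submission.title)
        topics_dict["score"].append(submission.score)
        topics_dict["id"].append(submission.id)
        topics_dict["url"].append(submission.url)
        topics_dict["comms_num"].append(submission.num_comments)
        # created_utc is a Unix timestamp in seconds; convert to a UTC datetime.
        topics_dict["created"].append(datetime.datetime.fromtimestamp(
            submission.created_utc, tz=datetime.timezone.utc))
        topics_dict["body"].append(submission.selftext)  # "" for link posts
    return topics_dict
```

With the imports from the first cell, `topics_data = pd.DataFrame(collect_submissions(subreddit.hot(limit=10)))` would give one row per submission.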



In [5]:
print(x)

[[Comment(id='ejcad1d'), Comment(id='ejek92d'), Comment(id='ej9wtfu'), Comment(id='ej9zmlm'), Comment(id='eja2th6'), Comment(id='eja9o6q'), Comment(id='ejaf795'), Comment(id='ejar9lz'), Comment(id='ejeh2zz'), Comment(id='ejf0ozk'), Comment(id='ejfadc4'), Comment(id='ejfj0qt'), Comment(id='ej9zlxu'), Comment(id='ejadcwl'), Comment(id='ejardti'), Comment(id='ejb5tv5'), Comment(id='ejbl6xa'), Comment(id='ejc40ik'), Comment(id='ejckigh'), Comment(id='ejcl080'), Comment(id='ejcp77i'), Comment(id='ejdc81g'), Comment(id='ejdk9zy'), Comment(id='ejdouvn'), Comment(id='ejdvx5o'), Comment(id='eje0nbs'), Comment(id='eje861u'), Comment(id='ejg1u8z'), Comment(id='ejbb1fw'), Comment(id='ejbqtuv'), Comment(id='ejc28yx'), Comment(id='ejc7dm4'), Comment(id='ejc9td2'), Comment(id='ejcazaa'), Comment(id='ejdy0ni'), Comment(id='ejdzir1'), Comment(id='ejewosv'), Comment(id='ejfc995'), Comment(id='ejfrpuj'), Comment(id='ejchpqa'), Comment(id='ejcn8im'), Comment(id='ejc4452'), Comment(id='ejcz34s'), Comment(i
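Printed this way, each `Comment` shows only its ID. To get the comment text, note that `comments.list()` can also return `MoreComments` placeholders (the "load more comments" links), which have no `body`. PRAW's `replace_more(limit=0)` removes all of them without extra requests; a larger limit fetches up to that many instead. A sketch, assuming a PRAW `Submission`; `comment_bodies` and `more_limit` are illustrative names:

```python
def comment_bodies(submission, more_limit=0):
    """Return the text of every comment in a submission's comment tree.

    Hypothetical helper. replace_more(limit=0) drops MoreComments
    placeholders; a positive limit would fetch that many of them instead.
    """
    submission.comments.replace_more(limit=more_limit)
    # list() flattens the (now placeholder-free) comment forest.
    return [comment.body for comment in submission.comments.list()]
```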

In [12]:
# stream.submissions() keeps yielding submissions until interrupted
for submission in subreddit.stream.submissions():
    print(submission.author)  # Output: the username of the submission's author

fat54
Flamepig27
feliciathemule
desmursblancs
zestybiscuit
Graviee
Horrorito
YangKiVee
-KpopTrash-
ScoobertNoJewbert
caitlinmeyer
SteveJackson007
badassite
Longjumping_Walrus
MCRfan1551
JerryKurle
DrKurtCockings
M0NAB33
Sanga212
_YoungLink_
Thaton3dood
WoodyTheCowboi
squidbenji
GenericNameNumber46
lenoodlenugget
frankthelizard
tordue
spacethekidd
druughammerfist
summonblood
sausagesniffer
GoldenOwl25
starzwillsucceed
JerryKurle
hutimuti
MedswithBreakfast
AskMeAboutMyTie
shadowsbored
pratiksawa
YaBoiRoli
Ayru_
MWK666
lonew099
Randy_fulcher_8
JerryKurle
lilmizzvalz
DawnofMidnight7
ogSKH7
paradox717
XxQuarterizexX
TheKnightPainter
lunacy74
rmrgdr
EpicGiggler
GenericNameNumber46
DanishFirhan
bugeyeswhitedragon
smokiefish
eatshittpitt
Writrang77
dabpacito69
Balkarax
capix1
bestloliconRU
bombtheoranges
TrakerGames
3decadesin
S3Dzyy
DawnofMidnight7
klucx
halfcookedspaghet
cruisingjoe01
JTitor5100
DanscoRed
FIGHTWITHLOGIC
Hamplural
iCupepe_
FaceCheck69
chloe2120
dreadpirateroberts92
OldManofth

KeyboardInterrupt: 
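The stream above never ends by itself, which is why the cell had to be interrupted. A bounded alternative is sketched below; `stream_authors` is a hypothetical helper, and `stream` is assumed to be something like `subreddit.stream.submissions(skip_existing=True)`. It uses `itertools.islice` to stop after a fixed number of items and handles deleted accounts, whose `author` is `None`.

```python
import itertools


def stream_authors(stream, max_items=10):
    """Collect author names from a stream of submissions, then stop.

    Hypothetical helper; islice ends the otherwise endless stream.
    """
    authors = []
    for submission in itertools.islice(stream, max_items):
        author = submission.author
        # Deleted accounts come back as None instead of a Redditor.
        authors.append(author.name if author is not None else "[deleted]")
    return authors
```

For example, `stream_authors(subreddit.stream.submissions(skip_existing=True), max_items=20)` would return the authors of the next 20 new submissions; `skip_existing=True` tells PRAW to skip items that existed before the stream started.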