<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 2.2.3 
# *Mining Social Media on Reddit*

## The Reddit API and the PRAW Package

The Reddit API is rich and complex, with many endpoints (https://www.reddit.com/dev/api/). It includes methods for navigating Reddit's collections, which hold various kinds of media as well as comments. Fortunately, the Python library PRAW (the Python Reddit API Wrapper) hides much of this complexity.

Reddit requires developers to create and authenticate an app before they can use the API, but the process is much less onerous than on some other platforms, and there is no waiting period for approval of new developers (as of 18 August 2018).

### 1. Create a Reddit App

Go to https://www.reddit.com/prefs/apps and click "create an app".

Enter the following in the form:

- a name for your app
- select the "script" radio button
- a description
- a redirect URI

(N.B. For pulling data into a data science experiment, a local port can be used for the redirect URI; try http://127.0.0.1:1410)


- click "create app"
- from the form that displays, copy the following to a local text file (or to this notebook):

  - name (the name you gave to your app)
  - redirect URI
  - personal use script (this is your OAuth 2 Client ID)
  - secret (this is your OAuth 2 Secret)

### 2. Register for API Access

- follow the link at https://www.reddit.com/wiki/api and read the terms of use for Reddit API access 
- fill in the form fields at the bottom 
  - make sure to enter your new OAuth Client ID where indicated
  - your use case could be something like "Training in API usage for data science projects"
  - your platform could be something like "Jupyter Notebooks / Python"
  
- click "SUBMIT"
 
- when asked for User-Agent, enter something that fits this pattern:
  `your_os-python:your_reddit_appname:v1.0 (by /u/your_reddit_username)`
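
The User-Agent pattern above can be assembled with a small helper, which avoids typos in the punctuation. This is only a sketch: the function name `build_user_agent` and the sample values are hypothetical, not part of PRAW or the Reddit API.

```python
def build_user_agent(os_name, app_name, username, version="1.0"):
    """Return a User-Agent string following the pattern shown above."""
    return f"{os_name}-python:{app_name}:v{version} (by /u/{username})"


# Hypothetical values for illustration only; substitute your own details.
print(build_user_agent("windows", "my_lab_app", "my_reddit_username"))
# → windows-python:my_lab_app:v1.0 (by /u/my_reddit_username)
```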

### 3. Load Python Libraries

In [3]:
import praw
import requests
import json
import pprint
from datetime import datetime, date, time

### 4. Authenticate from your Python script

You could assign your authentication details explicitly, as follows:

In [4]:
my_user_agent = ''   # your user Agent string goes in here
my_client_id = ''   # your Client ID string goes in here
my_client_secret = ''   # your Secret string goes in here

A better way would be to store these details externally, so they are not displayed in the notebook:

- create a file called "auth_reddit.json" in your "notebooks" directory, and save your credentials there in JSON format:

`{   "my_client_id": "your Client ID string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;` "my_client_secret": "your Secret string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`"my_user_agent": "your user Agent string goes in here"` <br>
`}`

Use the following code to load the credentials:  

In [5]:
pwd()  # make sure your working directory is where the file is

'C:\\Users\\ryan\\Downloads'

In [9]:
path_auth = 'auth_reddit.json'
with open(path_auth) as f:  # close the file once the credentials are read
    auth = json.load(f)
pp = pprint.PrettyPrinter(indent=4)
# For debugging only:
#pp.pprint(auth)

my_user_agent = auth['my_user_agent']
my_client_id = auth['my_client_id']
my_client_secret = auth['my_client_secret']
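
A missing or misspelled key in the JSON file only shows up as a `KeyError` when the value is first used. The sketch below is one possible way to fail early with a clearer message. The helper name `load_credentials` and the `REQUIRED_KEYS` tuple are assumptions made for this example; the key names themselves are the ones used in the file described above.

```python
import json

# Key names taken from the auth_reddit.json layout described above
REQUIRED_KEYS = ("my_client_id", "my_client_secret", "my_user_agent")


def load_credentials(path, required=REQUIRED_KEYS):
    """Load a JSON credentials file and fail early if a key is missing or empty."""
    with open(path, encoding="utf-8") as f:
        auth = json.load(f)
    missing = [key for key in required if not auth.get(key)]
    if missing:
        raise KeyError(f"Missing or empty keys in {path}: {', '.join(missing)}")
    return auth
```

It could then be called as `auth = load_credentials('auth_reddit.json')` in place of the plain `json` call.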

Security considerations: 
- this method only keeps your credentials invisible as long as nobody else gets access to this notebook file 
- if you wanted another user to have access to the executable notebook without divulging your credentials, you should set up an OAuth 2.0 workflow so that they can obtain and apply their own API tokens when using your app
- if you just want to share your analyses, you could use a separate script (which you don't share) to fetch the data and save it locally, then use a second notebook (with no API access) to load and analyse the locally stored data
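
The last option above (fetch in a private script, analyse in a shareable notebook) can be sketched with the standard `json` module. The function names and the sample record are hypothetical. A real fetch script would build the records from submission attributes such as `title` and `author`.

```python
import json
import os
import tempfile


def save_records(records, path):
    """Write a list of plain dicts to a JSON file (the private fetch script's job)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


def load_records(path):
    """Read the records back (the shareable analysis notebook's job)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical record standing in for data fetched from the API
records = [{"title": "Example title", "author": "example_user"}]

# A temporary directory keeps the demonstration from leaving files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "submissions_sample.json")
    save_records(records, path)
    reloaded = load_records(path)

print(reloaded == records)  # → True
```

Because the second notebook only reads the saved file, it needs neither PRAW nor any credentials.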

### 5. Exploring the API

Here is how to connect to Reddit with read-only access:

In [10]:
reddit = praw.Reddit(client_id = my_client_id, 
                     client_secret = my_client_secret, 
                     user_agent = my_user_agent)

print('Read-only = ' + str(reddit.read_only))  # Output: True

Read-only = True


In the next cell, type a '.' after `reddit.front`, then hit the [tab] key to see the available members and methods of the object:

In [22]:
reddit.front

<praw.models.front.Front at 0x23c0d900730>

Consult the PRAW and Reddit API documentation. Print a few of the response members below:

In [25]:
for submission in reddit.front.hot():
    print(submission)

ljpos8
ljq5ww
ljx7rj
ljqf26
lk02ca
lk4jbs
lk3dj7
ljo6li
ljtf3w
ljt7v0
ljoe32
lk4jlv
ljzs88
lk3uyw
lk362l
ljsu04
ljy06n
lk019e
lk0elj
lk4giq
lk4o95
ljzhje
ljzydu
ljqlnd
ljx2pe
ljq9q9
lk399g
ljqgay
lk1y96
lk4rt2
lk2lve
ljqhty
lk1i0l
ljq6dh
ljpba9
ljty54
lk376e
lk1f9q
ljpl0p
ljq9yx
ljryac
ljs41i
ljxbir
lk08bm
ljod1j
lk16tf
ljrglv
lk45wf
lk1y2d
lk15sj
lk1ugp
ljxgni
ljrjim
ljtvga
ljywyn
lk0jtu
ljoofe
ljy3wa
lk18ea
ljr9dr
lk1jj4
lk3hul
ljmmkl
ljxkkv
lk3bar
ljvdjb
lk2kzr
ljov42
ljz4rr
ljpltu
lk2rir
lk26ah
ljvlv3
ljmvzj
ljzsg6
ljziv9
ljz3wl
ljsqx1
lk4lyg
ljo9dy
ljzkkq
lk2utl
lk35or
lk06pe
ljxarw
ljt01v
ljwk2m
lk38d8
ljy5zt
lk2gb6
lk264a
ljzpcm
ljtrjx
ljq4z4
ljqzar
ljzp2i
ljy5xm
lk49sw
ljpc9f
lk1kme


Content in Reddit is grouped by topics called "subreddits". Individual posts, called "submissions", are fetched by calling the `subreddit` method of the connection object (our `reddit` variable) with the name of an existing subreddit.

We then chain a further method call, a "subinstance", to choose which listing of submissions to fetch, such as one of the following:

- controversial
- gilded
- hot
- new
- rising
- top

One of the submission object's members is `title`. Fetch and print 10 submission titles from the 'learnpython' subreddit using one of the subinstances above:

In [26]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.title)

Ask Anything Monday - Weekly Thread
Stock Predictor Python
Rate my blackjack game please!
I made a fractal
Why am I getting an IndexError: list index out of range in this script?
Python for Digital Marketing?
What's the correct way to do FizzBuzz?
Duplicate items in list of lists
How do I change the x and y axes numbers in matplotlib without changing the actual graph?
How would I make a battleship that takes inputs from terminal, and the changes are reflected in html?


Now retrieve 10 authors:

In [27]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.author)

AutoModerator
cavebird2020
cycleking303
just_a_dude2727
prickett94
notrealadvice
McHaaps
breadncheesetheking1
Wallywutsizface
Steven0710


Note that we obtained the titles and authors from separate API calls. Can we expect these to correspond to the same submissions? If not, how could we guarantee that they do?
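
One possible answer, offered as a sketch rather than the only approach: a listing such as `hot` may change between two requests, so the safest pairing reads both attributes from the same submission object in a single pass. The helper name `titles_and_authors` is hypothetical, and the stand-in objects below only mimic the `title` and `author` members so that the example runs without API access.

```python
from types import SimpleNamespace


def titles_and_authors(submissions):
    """Return (title, author) pairs, each read from the same submission object."""
    return [(s.title, str(s.author)) for s in submissions]


# Stand-in objects (assumption: real submissions expose the same two members)
fake_submissions = [
    SimpleNamespace(title="First post", author="user_a"),
    SimpleNamespace(title="Second post", author="user_b"),
]

for title, author in titles_and_authors(fake_submissions):
    print(f"{author}: {title}")
```

With a live connection, the same function could be given `reddit.subreddit('learnpython').hot(limit=10)`.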

In [81]:
for submission in reddit.subreddit('wallstreetbets').hot(limit=10):
    print(submission.title)

Weekend Discussion Thread for the Weekend of February 12, 2021
What a lovely day!
Happy Valentines Day
Can anyone guess when I discovered options trading?
Soon may the tendie man cone
Who's tryna join my new analytics company?
To all of you who lost 80% on GME. This one's for you.
My Awesome WSB Birthday Cake - HOLD 🤲💎🤲
Well, this is what my wife's boyfriend told me to do
Valentines Gift 🎁


Why doesn't the next cell produce output?

In [82]:
for submission in submissions:
    print(submission.comments)

<praw.models.comment_forest.CommentForest object at 0x0000023C10CE0AF0>
<praw.models.comment_forest.CommentForest object at 0x0000023C10F60D00>
<praw.models.comment_forest.CommentForest object at 0x0000023C10FA4E50>
<praw.models.comment_forest.CommentForest object at 0x0000023C10FDC670>


In [76]:
submissions

<praw.models.listing.generator.ListingGenerator at 0x23c0eaea460>
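
A likely reason for a loop over `submissions` printing nothing, or less than expected, is that the object is a lazy generator that an earlier loop has already consumed. The sketch below assumes `ListingGenerator` behaves like an ordinary Python generator in this respect. The function `fake_listing` is a stand-in, not part of PRAW.

```python
def fake_listing(n):
    """Stand-in for a lazily evaluated listing; yields n placeholder IDs."""
    for i in range(n):
        yield f"submission_{i}"


submissions = fake_listing(3)
first_pass = [s for s in submissions]   # consumes the generator
second_pass = [s for s in submissions]  # nothing is left to yield

print(first_pass)   # → ['submission_0', 'submission_1', 'submission_2']
print(second_pass)  # → []

# Materialising the items in a list allows repeated iteration
submissions = list(fake_listing(3))
print(len(submissions), len(submissions))  # → 3 3
```

Calling the listing method again, or wrapping its result in `list(...)`, gives a fresh set of items to loop over.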

Print two comments associated with each of these submissions:

In [86]:
submissions = reddit.subreddit('wallstreetbets').hot(limit=5)
for submission in submissions:
    all_comments = submission.comments.list()
    print(submission.comments)
    for comment in all_comments:
        print(comment.body)

<praw.models.comment_forest.CommentForest object at 0x0000023C11B459A0>
Edit:
Flair or ban results below in replies.

Edit 2:
Its over. Stop commenting flair or ban, or its ban
I am gambling to fill a giant hole in my life
# GME still in sideways - this can mean one thing only
Next week should definitely go up or down
Why are the markets closed Presidents’ Day. They should be open with a 2x multiplier
Turn off the screens, go get pizza, and get laid. It’s the weekend! Don’t forget to support your local drug dealers!
i bought pltr calls for the run up. monday is closed and earnings are pre market on tuesday. 🤡🤡🤡🤡🤡
I’m somewhat new to options. I think I’m getting the concept though - you buy them and then all your money is gone, right?
Wow. Not having any buying power has saved me from making ALOT Of dumb decisions
I don't "hold bags".

I just let everything expire worthless because I've got money to burn. 

It's a power move. Peasants.

Edit: [Just found out it's another 3 day weekend. 

AttributeError: 'MoreComments' object has no attribute 'body'
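
The `AttributeError` above arises because the flattened comment list can contain `MoreComments` placeholders, which have no `body`. A minimal sketch of one workaround is to skip any item lacking a `body` and stop after the first two. The helper name `first_bodies` and the stand-in objects are assumptions made so that the example runs without API access.

```python
from itertools import islice
from types import SimpleNamespace


def first_bodies(comments, n=2):
    """Return the bodies of the first n items that actually have a `body`."""
    with_body = (c.body for c in comments if hasattr(c, "body"))
    return list(islice(with_body, n))


# Stand-ins: three comment-like objects and one placeholder without `body`
sample = [
    SimpleNamespace(body="first"),
    SimpleNamespace(count=5),  # mimics a placeholder with no `body`
    SimpleNamespace(body="second"),
    SimpleNamespace(body="third"),
]

print(first_bodies(sample))  # → ['first', 'second']
```

With PRAW itself, calling `submission.comments.replace_more(limit=0)` before `submission.comments.list()` removes the placeholders, after which the first two comments can be sliced from the list.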

Referring to the API documentation, explore the submissions object and print some interesting data:

In [37]:
submissions = reddit.subreddit('politics').hot(limit=5)
for submission in submissions:
    print(submission.author)

optimalg
dorestes
TJ_SP
Trainrideviews
Ganrokh


#### Posting to Reddit

To be able to post to your Reddit account (i.e. contribute submissions), you need to connect to the API with read/write privileges. This requires an *authorised instance*, which is obtained by including your Reddit username and password in the connection request:

In [40]:
path_auth = 'auth_reddit.json'
with open(path_auth) as f:  # close the file once the credentials are read
    auth = json.load(f)
pp = pprint.PrettyPrinter(indent=4)
# For debugging only:
#pp.pprint(auth)

my_user_agent = auth['my_user_agent']
my_client_id = auth['my_client_id']
my_client_secret = auth['my_client_secret']
username = auth['username']
password = auth['password']

In [49]:
reddit = praw.Reddit(client_id=my_client_id,
                     client_secret=my_client_secret,
                     user_agent=my_user_agent,
                     username=username,
                     password=password)
print(reddit.read_only)  # Output: False

False


As in the cells above, these last two credentials can be kept out of the notebook by adding them to your JSON file and then reading all five values at once.

In [51]:
submissions = reddit.subreddit('wallstreetbets').top(time_filter="all", limit=10)
for submission in submissions:
    print(submission.title)

Times Square right now
UPVOTE so everyone sees we got SUPPORT
GME YOLO update — Jan 28 2021
GME YOLO month-end update — Jan 2021
CLASS ACTION AGAINST ROBINHOOD. Allowing people to only sell is the definition of market manipulation. A class action must be started, Robinhood has made plenty of money off selling info about our trades to the hedge funds to be able to pay out a little for causing people to loose money now
It’s treason then
Used some of my GME tendies to buy Nintendo Switches from Gamestop, then donated them to a Children's Hospital. Got featured on the local news and brought glory to WSB.
IT'S POWER TO THE TRADERS NOW
GME YOLO update — Jan 27 2021 --------------------------------------- guess i need 102 characters in title now
GME YOLO update — Feb 1 2021


In [57]:
submissions = reddit.subreddit('wallstreetbets').hot(limit=10)
for submission in submissions:
    print(submission.is_self)

True
False
False
False
False
False
False
False
False
False





© 2021 Institute of Data





