<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 3.2.2
# *Mining Social Media on Reddit*

## The Reddit API and the PRAW Package

The Reddit API is rich and complex, with many endpoints (https://www.reddit.com/dev/api/). It includes methods for navigating its collections, which include various kinds of media as well as comments. Fortunately, the Python library PRAW reduces much of this complexity.

Reddit requires developers to create and authenticate an app before they can use the API, but the process is less onerous than on some other platforms, and there is no waiting period for approval of new developers.

### 1. Create a Reddit App

Go to https://www.reddit.com/prefs/apps and click "create an app".

Complete the form as follows:

- enter a name for your app
- select the "script" radio button
- enter a description
- enter a redirect URI

(N.B. For pulling data into a data science experiment, a local port can be used for the redirect URI; try http://127.0.0.1:1410)


- click "create app"
- from the form that displays, copy the following to a local text file (or to this notebook):

  - name (the name you gave to your app)
  - redirect URI
  - personal use script (this is your OAuth 2 Client ID)
  - secret (this is your OAuth 2 Secret)

### 2. Register for API Access

- follow the link at https://www.reddit.com/wiki/api and read the terms of use for Reddit API access
- fill in the form fields at the bottom
  - make sure to enter your new OAuth Client ID where indicated
  - your use case could be something like "Training in API usage for data science projects"
  - your platform could be something like "Jupyter Notebooks / Python"
  
- click "SUBMIT"

- when asked for User-Agent, enter something that fits this pattern:
  `your_os-python:your_reddit_appname:v1.0 (by /u/your_reddit_username)`
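
The pattern above can also be assembled in code. The following is a minimal sketch: `build_user_agent` is a helper name invented for this illustration (it is not part of PRAW or the Reddit API), and the app name and username are placeholders.

```python
import platform


def build_user_agent(app_name, username, version="v1.0", os_name=None):
    """Return a User-Agent string in the pattern shown above.

    build_user_agent is a hypothetical helper, not part of PRAW.
    """
    # Fall back to the local operating system name, e.g. "windows" or "linux".
    if os_name is None:
        os_name = platform.system().lower() or "unknown"
    return f"{os_name}-python:{app_name}:{version} (by /u/{username})"


# Example values are placeholders; substitute your own app name and username.
example_agent = build_user_agent("my_lab_app", "my_reddit_username", os_name="windows")
print(example_agent)
```

Building the string from its parts makes it harder to leave out the version or the `/u/` prefix by accident.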

### 3. Load Python Libraries

In [1]:
!pip install praw



In [1]:
import praw
import requests
import json
import pprint
from datetime import datetime, date, time

### 4. Authenticate from your Python script

You could assign your authentication details explicitly, as follows:

In [3]:
my_user_agent = 'Removed for security reason'   # your User-Agent string goes here
my_client_id = 'Removed for security reason'   # your Client ID string goes here
my_client_secret = 'Removed for security reason'   # your Secret string goes here

A better way would be to store these details externally, so they are not displayed in the notebook:

- create a file called "auth_reddit.json" in your "notebooks" directory, and save your credentials there in JSON format:

`{   "my_client_id": "your Client ID string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`"my_client_secret": "your Secret string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`"my_user_agent": "your User-Agent string goes in here"` <br>
`}`

Use the following code to load the credentials:  

In [4]:
pwd()  # make sure your working directory is where the file is

'C:\\Users\\patri\\OneDrive\\UTS and personal doc 2022\\Documents\\Person Docs\\Data science program\\Labs 3'

In [None]:
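path_auth = 'C:/Users/patri/OneDrive/UTS and personal doc 2022/Documents/Person Docs/Data science program/Labs 3/auth_reddit.json'
# Open the file in a context manager so it is closed after reading
with open(path_auth) as f:
    auth = json.load(f)
pp = pprint.PrettyPrinter(indent=4)

# For debugging only: this prints your credentials, so clear the output before sharing
pp.pprint(auth)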


In [8]:
my_user_agent = auth['my_user_agent']
my_client_id = auth['my_client_id']
my_client_secret = auth['my_client_secret']
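
A slightly more defensive variant of the loading steps above might look like the following. It is a sketch only: `load_credentials` and `mask` are helper names invented here, and it assumes the file uses the three keys shown earlier.

```python
import json
from pathlib import Path

# Keys expected in auth_reddit.json, as described earlier in this lab.
REQUIRED_KEYS = ("my_client_id", "my_client_secret", "my_user_agent")


def load_credentials(path):
    """Read the credentials file and check that every expected key is present."""
    with Path(path).open(encoding="utf-8") as handle:
        credentials = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return credentials


def mask(value, visible=3):
    """Show only the first few characters so the full secret never appears in output."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Usage (assumes auth_reddit.json sits in the current working directory):
# auth = load_credentials("auth_reddit.json")
# print({key: mask(value) for key, value in auth.items()})
```

Printing masked values confirms that the file was read without leaving the secrets in the saved notebook output.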

Security considerations:
- this method only keeps your credentials invisible as long as nobody else gets access to this notebook file
- if you wanted another user to have access to the executable notebook without divulging your credentials you should set up an OAuth 2.0 workflow to let them obtain and apply their own API tokens when using your app
- if you just want to share your analyses, you could use a separate script (which you don't share) to fetch the data and save it locally, then use a second notebook (with no API access) to load and analyse the locally stored data
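
The last option above (fetch in one place, analyse in another) can be sketched as follows. The function names, the file name `submissions.json`, and the record contents are all placeholders chosen for this illustration; a temporary directory is used so the demonstration leaves no files behind.

```python
import json
import tempfile
from pathlib import Path


def save_records(records, path):
    """Write a list of plain dictionaries to a JSON file."""
    with Path(path).open("w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)


def load_records(path):
    """Read the records back; no API credentials are needed for this step."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


# Placeholder records standing in for data fetched by the private script.
fetched = [
    {"title": "Example title", "author": "example_user", "score": 1},
]

with tempfile.TemporaryDirectory() as folder:
    data_path = Path(folder) / "submissions.json"
    save_records(fetched, data_path)    # done in the private fetch script
    restored = load_records(data_path)  # done in the shareable notebook

print(restored == fetched)
```

Only `load_records` would need to appear in the notebook you share, so the credentials never travel with it.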

### 5. Exploring the API

Here is how to connect to Reddit with read-only access:

In [9]:
reddit = praw.Reddit(client_id = my_client_id,
                     client_secret = my_client_secret,
                     user_agent = my_user_agent)

print('Read-only = ' + str(reddit.read_only))  # Output: True

Read-only = True


In the next cell, put the cursor after the '.' and press the [tab] key to see the attributes and methods available on the `reddit` object:

In [None]:
reddit.

In [11]:
subreddit_name = 'malaysia'
subreddit = reddit.subreddit(subreddit_name)

In [13]:
comments = []
for comment in subreddit.comments(limit=1000):
    comments.append(comment)

In [14]:
reddit.subreddit(subreddit_name)

Subreddit(display_name='malaysia')

Consult the PRAW and Reddit API documentation. Print a few of the response members below:

In [16]:
hot_posts = subreddit.hot(limit=10)  # posts currently ranked "hot", not the all-time top posts

# Print response members
for post in hot_posts:
    print(f"Title      : {post.title}")
    print(f"Author     : {post.author}")
    print(f"Score      : {post.score}")
    print(f"URL        : {post.url}")
    print(f"Subreddit  : {post.subreddit}")
    print(f"Created    : {post.created_utc}")
    print("-" * 60)

Title      : KL Board Game Night Meetup - Friday 25 July 7.30pm, Vivae Board Games Cafe
Author     : Yugie
Score      : 17
URL        : https://www.reddit.com/r/malaysia/comments/1m61e25/kl_board_game_night_meetup_friday_25_july_730pm/
Subreddit  : malaysia
Created    : 1753148564.0
------------------------------------------------------------
Title      : /r/Malaysia daily random discussion and quick questions thread for 24 July 2025
Author     : AutoModerator
Score      : 0
URL        : https://www.reddit.com/r/malaysia/comments/1m7jj6d/rmalaysia_daily_random_discussion_and_quick/
Subreddit  : malaysia
Created    : 1753300838.0
------------------------------------------------------------
Title      : M40s and T20s of r/malaysia..
Author     : abdulsamri89
Score      : 402
URL        : https://i.redd.it/6erabazoeref1.png
Subreddit  : malaysia
Created    : 1753335524.0
------------------------------------------------------------
Title      : PSA: Dont buy from a Shopee seller without a 
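
The `Created` values printed above are Unix timestamps (seconds since 1 January 1970, UTC). A minimal sketch of converting one to a readable, timezone-aware value is shown below; `to_utc_datetime` is a name chosen here for illustration.

```python
from datetime import datetime, timezone


def to_utc_datetime(created_utc):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc)


# Timestamp copied from the first post in the output above.
created = to_utc_datetime(1753148564.0)
print(created.isoformat())  # 2025-07-22T01:42:44+00:00
```

Passing `tz=timezone.utc` avoids an implicit conversion to the local time zone of the machine running the notebook.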

Content in Reddit is grouped by topics called "subreddits". Individual posts, called "submissions", are fetched by calling the `subreddit` method of the connection object (our `reddit` variable) with the name of an existing subreddit.

We then need to chain a further method call that selects a listing (referred to here as a "subinstance"), such as one of the following:

- controversial
- gilded
- hot
- new
- rising
- top
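
Because each listing is just a method name on the subreddit object, the choice can be made at run time. The sketch below assumes the six names listed above; `fetch_titles` is a hypothetical helper, not a PRAW function.

```python
# Listing names taken from the list above.
LISTING_METHODS = {"controversial", "gilded", "hot", "new", "rising", "top"}


def fetch_titles(subreddit, listing="hot", limit=10):
    """Call the named listing method on a subreddit object and return the titles."""
    if listing not in LISTING_METHODS:
        raise ValueError(f"Unknown listing: {listing!r}")
    listing_method = getattr(subreddit, listing)
    return [submission.title for submission in listing_method(limit=limit)]


# With a live connection this could be used as, for example:
# titles = fetch_titles(reddit.subreddit("learnpython"), listing="new", limit=5)
```

Validating the name first gives a clear error for a typo instead of an `AttributeError` from deep inside the call.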

One of the submission object's members is `title`. Fetch and print 10 submission titles from the 'learnpython' subreddit using one of the subinstances above:

In [17]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.title)

Ask Anything Monday - Weekly Thread
Just my first usable project
uv run ModuleNotFoundError despite pandas being installed in .venv (Windows)
MBA Student New to Python – Need Guidance for Using It in Finance
% works differently on negative negative numbers in python
Humble suggestion:  Please fix or otherwise resolve a non-working link in this subreddit's wiki
Want to learn python, need advice
High Level Python Programmer in 2 years
Bulk link tracking
Making sure i understand how a "for" statement works


Now retrieve 10 authors:

In [18]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.author)

AutoModerator
LiMe2116
Fun_Signature_9812
Ok_Royal4131
o_genie
JonBarPoint
MadFaceInvasion
therottingCinePhile
uniqueusername42O
AlmightyAntwan12


Note that we obtained the titles and authors from separate API calls. Can we expect these to correspond to the same submissions? If not, how could we guarantee that they do?

In [24]:
submissions = reddit.subreddit('learnpython').hot(limit=10)
for submission in submissions:
    print("Author: {} | Title: {}".format(submission.author, submission.title))

Author: AutoModerator | Title: Ask Anything Monday - Weekly Thread
Author: LiMe2116 | Title: Just my first usable project
Author: o_genie | Title: % works differently on negative negative numbers in python
Author: Fun_Signature_9812 | Title: uv run ModuleNotFoundError despite pandas being installed in .venv (Windows)
Author: MadFaceInvasion | Title: Want to learn python, need advice
Author: Ok_Royal4131 | Title: MBA Student New to Python – Need Guidance for Using It in Finance
Author: JonBarPoint | Title: Humble suggestion:  Please fix or otherwise resolve a non-working link in this subreddit's wiki
Author: therottingCinePhile | Title: High Level Python Programmer in 2 years
Author: uniqueusername42O | Title: Bulk link tracking
Author: AlmightyAntwan12 | Title: Making sure i understand how a "for" statement works
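
One way to keep the fields aligned for later analysis is to read them all in a single pass and store them as records. The sketch below uses stand-in objects with placeholder values so that it runs offline; `to_records` is a name invented here, and with a live connection the listing from the cell above would be passed in instead.

```python
from types import SimpleNamespace


def to_records(submissions):
    """Read every field of interest from each submission in one pass."""
    return [
        {
            "id": s.id,
            # author can be None (for example, a deleted account)
            "author": str(s.author) if s.author is not None else None,
            "title": s.title,
        }
        for s in submissions
    ]


# Stand-in objects with placeholder values; not real Reddit data.
sample = [
    SimpleNamespace(id="abc123", author="example_user", title="Example title"),
    SimpleNamespace(id="def456", author=None, title="Post with no author"),
]

records = to_records(sample)
print(records[0]["author"], "|", records[0]["title"])
```

Keeping the submission `id` in each record makes it possible to check later that two datasets refer to the same posts.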


Why doesn't the next cell produce output?

In [25]:
for submission in submissions:
    print(submission.comments)

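Answer: `submissions` is a generator, and the loop in the previous cell already consumed every item in it. Iterating over it a second time yields nothing, so the loop body never runs. (Even with a fresh generator, `print(submission.comments)` would show a `CommentForest` object rather than the comment text, because the comments need to be traversed explicitly.)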
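
The same behaviour can be reproduced without Reddit. The sketch below uses an ordinary Python generator (`numbered_posts`, a stand-in invented for this illustration) to show that a one-shot iterator yields nothing on a second pass, and that storing the items in a list avoids the problem.

```python
def numbered_posts(limit):
    """A plain generator standing in for a lazy listing such as subreddit.hot()."""
    for index in range(limit):
        yield f"post {index}"


posts = numbered_posts(3)
first_pass = [post for post in posts]   # consumes every item
second_pass = [post for post in posts]  # the generator is now exhausted

print(first_pass)   # ['post 0', 'post 1', 'post 2']
print(second_pass)  # []

# Storing the items in a list allows any number of passes.
stored = list(numbered_posts(3))
print(len(stored))  # 3
```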

In [26]:
# The API is lazy, and submissions is a generator -- not a data structure:
submissions
# it must be re-created (or stored in a list) before it can be iterated again.

<praw.models.listing.generator.ListingGenerator at 0x1e1e88d29d0>

Print two comments associated with each of these submissions:

In [27]:
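submissions = reddit.subreddit('learnpython').hot(limit=10)
for submission in submissions:
    # Drop "load more comments" placeholders, which have no body attribute
    submission.comments.replace_more(limit=0)
    # Flatten the comment tree and keep the first two comments
    first_two_comments = submission.comments.list()[:2]
    for comment in first_two_comments:
        print(comment.body)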

As someone learning, the most important thing to focus should be what have you learnt from the project.

what were the errors or issues you faced and how did you overcome them ?

those are more important than the project final version.
I would say there were not that much errors except for sqlite3. I just completed a class on oop so I was able to go through with that easily. Also I had learned sql a few years back and I wanted to try using it again
In JavaScript, `%` is called the remainder operator, and in Python it is called the modulo operation, but this doesn't really change the way in which they work. They just have different conventions as to how to handle negative numbers. However, mathematically they are equivalent. 

Mathematically, the result of modulo division is actually an equivalence class, not a single value. The value shown as result of this operation in different programming languages is simply a matter of convention. In maths, it's usually the smallest nonnegative int

Referring to the API documentation, explore the submissions object and print some interesting data:

In [28]:
for submission in reddit.subreddit('learnpython').hot(limit=5):
    print(f"Title           : {submission.title}")
    print(f"Author          : {submission.author}")
    print(f"Score           : {submission.score}")
    print(f"Upvote Ratio    : {submission.upvote_ratio}")
    print(f"Number of Comments : {submission.num_comments}")
    print(f"Subreddit       : {submission.subreddit}")
    print(f"Created (UTC)   : {submission.created_utc}")
    print(f"URL             : {submission.url}")
    print(f"Is Self Post?   : {submission.is_self}")
    print(f"Post Hint       : {getattr(submission, 'post_hint', 'N/A')}")
    print(f"NSFW?           : {submission.over_18}")
    print(f"Stickied?       : {submission.stickied}")
    print("-" * 60)

Title           : Ask Anything Monday - Weekly Thread
Author          : AutoModerator
Score           : 1
Upvote Ratio    : 0.67
Number of Comments : 0
Subreddit       : learnpython
Created (UTC)   : 1753056054.0
URL             : https://www.reddit.com/r/learnpython/comments/1m543pq/ask_anything_monday_weekly_thread/
Is Self Post?   : True
Post Hint       : N/A
NSFW?           : False
Stickied?       : True
------------------------------------------------------------
Title           : Just my first usable project
Author          : LiMe2116
Score           : 3
Upvote Ratio    : 0.72
Number of Comments : 2
Subreddit       : learnpython
Created (UTC)   : 1753339561.0
URL             : https://www.reddit.com/r/learnpython/comments/1m7x8o9/just_my_first_usable_project/
Is Self Post?   : True
Post Hint       : self
NSFW?           : False
Stickied?       : False
------------------------------------------------------------
Title           : % works differently on negative negative numbers in

#### Posting to Reddit

To post to your Reddit account (i.e. contribute submissions), you need to connect to the API with read/write privileges. This requires an *authorised instance*, which is obtained by including your Reddit username and password in the connection request:

In [29]:
reddit = praw.Reddit(client_id='my client id',
                     client_secret='my client secret',
                     user_agent='my user agent',
                     username='my username',
                     password='my password')
print(reddit.read_only)  # Output: False

False


You could hide these last two credentials by adding them to your JSON file and then reading all five values at once.
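
A minimal sketch of that idea follows. The first three file keys are the ones used earlier in this lab; `my_username` and `my_password` are key names assumed for this illustration, and `reddit_kwargs` is a hypothetical helper rather than part of PRAW.

```python
# Maps keys in the JSON file to praw.Reddit keyword arguments. The last two
# file keys (my_username, my_password) are names assumed for this sketch.
KEY_MAP = {
    "my_client_id": "client_id",
    "my_client_secret": "client_secret",
    "my_user_agent": "user_agent",
    "my_username": "username",
    "my_password": "password",
}


def reddit_kwargs(auth):
    """Translate the credentials dictionary into praw.Reddit keyword arguments."""
    missing = [key for key in KEY_MAP if key not in auth]
    if missing:
        raise KeyError(f"Missing credentials: {', '.join(missing)}")
    return {argument: auth[key] for key, argument in KEY_MAP.items()}


# Usage with a live connection (requires praw, json and a real credentials file):
# with open("auth_reddit.json", encoding="utf-8") as handle:
#     reddit = praw.Reddit(**reddit_kwargs(json.load(handle)))
```

Unpacking the dictionary with `**` keeps all five values out of the notebook while still passing them to the connection in one step.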

---



> > > > > > > > > © 2025 Institute of Data

