Reddit Opinion Mining and Sentiment Analysis

A project written in R and Python to mine a Reddit corpus for opinions and sentiment.

Requirements

Python and its dependencies

  1. Python 3
  2. PRAW
  3. requests
  4. bs4
  5. numpy
  6. fuzzywuzzy
  7. nltk
  8. matplotlib

Recommended: install the Python-related packages in a virtual environment.

Install each using pip install -U <package-name>. NLTK also requires downloading its tokenizer and stopword corpora for English; see the snippet below.
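
A one-off download like the following fetches both; 'punkt' (NLTK's standard English tokenizer model) is an assumption here, since the README only says "tokens":

    import nltk

    # One-off downloads; NLTK stores the data under ~/nltk_data by default.
    nltk.download("punkt")      # tokenizer models ("punkt" is assumed; the scripts may use another)
    nltk.download("stopwords")  # stopword lists, including English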

R and its dependencies

  1. R
  2. sna
  3. ggnetwork
  4. svglite
  5. igraph
  6. intergraph
  7. rsvg
  8. ggplot2

Install each using install.packages("<package-name>") in an R session; note that the package name must be quoted.

Obtaining Reddit API access credentials

  1. Create a Reddit account and, while logged in, navigate to preferences > apps.
  2. Click on the 'Are you a developer? Create an app...' button.
  3. Fill in the details:
    • name: Name of your bot/script
    • Select the option 'script'
    • description: Put in a description of your bot/script
    • redirect uri: http://localhost:8080
  4. Click on Create App.
  5. You will be given a client_id and a client_secret. Keep them confidential.
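
To sanity-check the credentials, a minimal PRAW session such as the following should work; every placeholder value here is hypothetical, and the user_agent string is an arbitrary descriptive label that Reddit requires:

    import praw

    # Substitute the credentials from the app you just created.
    reddit = praw.Reddit(
        client_id="<client_id>",
        client_secret="<client_secret>",
        username="<reddit-username>",
        password="<reddit-password>",
        user_agent="reddit-opinion-mining test",  # any descriptive string works
    )
    print(reddit.user.me())  # prints your username if authentication succeeded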

Extracting edge data from the Pushshift Reddit dataset

  1. Sign up for or log in to Google BigQuery.
  2. Select or create a new project and click on 'Compose Query'.
  3. Paste the contents of the SQL script found in the subreddit-viz folder into the editor and run it.
  4. Download the generated CSV file as reddit-edge-list.csv and save it within the subreddit-viz folder.
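
As a quick check that the export succeeded, something like the following prints the header row and a few edges; the column names depend on the SQL script, so none are assumed here:

    import csv

    # Path follows the save location described in step 4.
    with open("subreddit-viz/reddit-edge-list.csv", newline="") as f:
        reader = csv.reader(f)
        for i, row in enumerate(reader):
            print(row)      # the first row is the header written by BigQuery
            if i == 4:      # header plus four data rows is enough for a preview
                break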

Running the scripts

  1. To obtain the subreddit visualizations, first create an empty folder called subreddit-groups in the same folder as the script, then run the R script using R CMD BATCH reddit.R.
  2. Create a file named praw.ini with its contents as follows (see the PRAW sketch after this list):
    [<bot-name>]
    username: your reddit username
    password: your reddit password
    client_id: the client_id you were given
    client_secret: the client_secret you were given
    
  3. Run the script getdata.py via python3 getdata.py.
  4. It should scrape all the necessary data in approximately 20-25 minutes.
  5. Run analysis.py using python3 analysis.py [args]. The script accepts the following arguments:
    • no arguments - Runs sentiment analysis on the entire dataset.
    • -h or --help - Prints the usage details.
    • -w <string> <type> or --words <string> <type> - Generates a word distribution for the given string and type (positive or negative); requires that sentiment analysis has already been run for that term.
    • <string> - Looks for similar strings in the corpus and performs sentiment analysis on the matches (see the fuzzy-matching sketch after this list).
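
As referenced in step 2, PRAW can load the [<bot-name>] section of praw.ini automatically when the section name is passed as its site name. The sketch below only illustrates that mechanism (getdata.py's actual code may differ), and the user_agent value is an assumption; PRAW requires one, either in praw.ini or as an argument:

    import praw

    # Reads credentials from the [<bot-name>] section of praw.ini in the
    # working directory; pass user_agent here if the ini file omits it.
    reddit = praw.Reddit("<bot-name>", user_agent="reddit-opinion-mining/0.1")
    print(reddit.user.me())  # confirms the stored credentials work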
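
The "similar strings" lookup in the last argument suggests fuzzy matching, presumably via the fuzzywuzzy dependency listed above. The snippet below is only an illustration of that technique with made-up terms, not the actual logic of analysis.py:

    from fuzzywuzzy import process

    # Hypothetical search terms; analysis.py would draw candidates from the scraped corpus.
    corpus_terms = ["iphone x", "iphone 8", "galaxy s9", "pixel 2"]

    # process.extract returns the closest matches with 0-100 similarity scores.
    print(process.extract("iphone", corpus_terms, limit=3))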

Credits
