# Reddit's Most Discussed Stocks
In late January 2021, Reddit, and specifically the subreddit Wallstreetbets, made national headlines with the now-famous 'meme' stock GameStop.  Users on Wallstreetbets noticed that short interest in GameStop exceeded 100% of the float, which set up a short squeeze that sent the price of GameStop from around 4 USD all the way to almost 500 USD.  Given this incredible gain, retail investors flooded the market and the userbase of Wallstreetbets grew by millions.

Since GameStop's run, many stocks have followed similar patterns.  As an investor, one would like to stay ahead of the masses and pick stocks that are just starting to gain interest.  It's very hard, however, to constantly monitor these subreddits by hand in search of the right stocks.

In this project, we use Reddit's API to extract which stocks are being discussed most frequently in a given subreddit or in an individual thread.  The program lets the user choose where to look, which day to look at, and whether to include penny stocks.  It then saves a dataframe of the scraped text along with a stock ticker count.  The program I wrote is proprietary, so I will only be running and discussing it here for illustrative purposes.

Hopefully, by automating the stock ticker counts, we can then research the stocks and make educated decisions on what could possibly be a lucrative stock to invest in.

# Using the Program
The program begins by offering the user a few different options.  We ask if they want to scrape a Thread or Subreddit.  When browsing Reddit, a Subreddit is a section of Reddit dedicated to a topic.  For example, Wallstreetbets is a Subreddit.  Within this Subreddit, there are many different Threads.  Wallstreetbets has daily discussions that are very popular.  A couple of extremely popular Threads are the "Daily Discussion Thread For (specific date)" and "What Are Your Moves For Tomorrow, (specific date)".  Both of those typically get around 20,000 comments in a day.  

When the user chooses Thread, they need to input the exact Thread link.  If they choose Subreddit, they need to specify which one, e.g. Wallstreetbets, Stocks, Investing, etc.  Continuing with a Subreddit, the user needs to specify which day they are interested in, and the program scrapes all the Thread titles for that day.  Then they'll be asked if they want to include the main post for each Thread.  The main post is the body text that the creator of the Thread wrote under the title.
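The day filter and the "include main post" option described above could look roughly like the sketch below.  This is not the author's code: the helper names `posts_on_day` and `post_rows` are hypothetical, and the sketch assumes each post exposes `title`, `selftext`, and `created_utc` attributes (the names PRAW uses on its submission objects) and that "day" means day of the month in UTC.

```python
from datetime import datetime, timezone


def posts_on_day(posts, day_of_month):
    """Keep only the posts created on the given day of the month (UTC assumed)."""
    selected = []
    for post in posts:
        # created_utc is a Unix timestamp in seconds
        created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
        if created.day == day_of_month:
            selected.append(post)
    return selected


def post_rows(post, include_main_post):
    """Return the text rows for one post: the title, plus the main post if requested."""
    rows = [post.title]
    if include_main_post:
        rows.append(post.selftext)
    return rows
```

Filtering on the day of the month alone only works because the prompt restricts the user to the last few days; a longer look-back would need to compare full dates.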

Regardless of whether the user chooses Thread or Subreddit, the next question will be if they want to include Penny Stocks.  Penny Stocks are very risky and volatile.  They also aren't listed on the main Stock Exchanges and some platforms don't even let you trade them.  Due to this, Subreddits like Wallstreetbets and Stocks ban those tickers from being discussed.  So the user may not want to include these since they're not supposed to be discussed in these Subreddits.  However, if the user is interested in Subreddits such as Pennystocks, they'll want to include these.

Lastly, the user is asked what name they want to save the Dataframe as.  Once the program has scraped and counted all tickers, the user can have access to the stock ticker counts and the full dataframe containing all the comments to explore further.

# How the Program Works
The program uses the PRAW library to communicate with Reddit's API and scrape the data.  Behind the scenes, PRAW parses the JSON responses automatically and exposes whatever the user is requesting, such as comments, titles, and dates.
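As a rough illustration of the PRAW step (the actual program is proprietary, so this is only a sketch under assumptions), the snippet below flattens one thread into a list of text rows: title first, then the main post, then every comment.  The function names `thread_to_rows` and `fetch_thread` are hypothetical.  The PRAW calls used are `praw.Reddit(...)`, `reddit.submission(url=...)`, `comments.replace_more(limit=None)`, and `comments.list()`; running `fetch_thread` requires the `praw` package and your own Reddit API credentials.

```python
def thread_to_rows(submission):
    """Flatten a PRAW-style submission into text rows: title, main post, comments."""
    # replace_more(limit=None) expands every "load more comments" stub,
    # which is likely the slow part on a thread with thousands of comments.
    submission.comments.replace_more(limit=None)
    rows = [submission.title, submission.selftext]
    rows.extend(comment.body for comment in submission.comments.list())
    return rows


def fetch_thread(url, client_id, client_secret, user_agent):
    """Fetch a live thread by URL (needs praw and Reddit API credentials)."""
    import praw  # imported here so thread_to_rows can be used without praw installed

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
    return thread_to_rows(reddit.submission(url=url))
```

Keeping the flattening logic separate from the network call makes it possible to try the row layout on stand-in objects before spending API requests.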

Based on what the user asked to scrape for, the program will then compile a Dataframe and print out the Titles, Comments, etc.  For display purposes in Jupyter, we limited this to the first 10 posts.  Once the Dataframe is compiled, we save it as a CSV file.

Next, we import all of the Stock Exchange Tickers and Security Names.  If the user requested to include Penny Stocks, we import those as well.  We do some data cleaning to extract just the Ticker and Security Name in a format that helps with the counting.  We keep a list of frequently misclassified Tickers, which we drop from the Dataframe to keep the counts accurate at the end of the program.  Once the data is clean, we create a dictionary with all the Tickers and Security Names, each starting at a count of 0.  Using regular expressions, we then loop through the Dataframe and, any time a Ticker or Security Name is mentioned, we increase its count.
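A minimal sketch of the counting step follows, assuming case-sensitive whole-word matching with an optional leading `$`.  The function name `count_mentions`, the `blocklist` parameter, and the sample lists are all hypothetical; the real program's matching rules and exclusion list are not shown in this post.

```python
import re
from collections import Counter


def count_mentions(texts, names, blocklist=()):
    """Count whole-word mentions of each ticker or security name in the texts."""
    # Drop frequently misclassified tickers before counting
    terms = [name for name in names if name not in blocklist]
    counts = Counter({term: 0 for term in terms})
    if not terms:
        return counts
    # Longest terms first so the alternation prefers the longest possible match;
    # the lookarounds stop "WISH" from matching inside a longer word like "WISHING".
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(r"(?<![A-Za-z0-9])\$?(" + alternation + r")(?![A-Za-z0-9])")
    for text in texts:
        for match in pattern.finditer(text):
            counts[match.group(1)] += 1
    return counts


titles = [
    "$WISH 🚀 ALL IN",
    "WISH 45 C 7/16",
    "I just bet against SPY and the DOW based on some DD",
]
# "ALL", "DD" and "C" stand in for symbols that collide with ordinary words
counts = count_mentions(titles, ["WISH", "SPY", "ALL", "DD", "C"], blocklist={"ALL", "DD", "C"})
```

Without the blocklist, "ALL IN" and "some DD" would each be counted as a ticker mention, which is the kind of misclassification the exclusion list is meant to prevent.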

Once the loop is complete, we sort the values and print out the top 20 tickers discussed.  For fun, we display how long the program took to run and then exit.  The user now has access to the Ticker Count and Master DataFrame CSV files to explore further.
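The sort-and-report step could be sketched as below.  The helper names `top_tickers` and `format_elapsed` are hypothetical, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```python
import time


def top_tickers(counts, n=20):
    """Return the n most-mentioned tickers as (ticker, count) pairs."""
    # Highest count first; ties broken alphabetically (an assumption)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def format_elapsed(seconds):
    """Format a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} min {secs} s"


start = time.perf_counter()
ranking = top_tickers({"GME": 5, "SPY": 9, "CLF": 5}, n=2)  # made-up counts
elapsed = format_elapsed(time.perf_counter() - start)
```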

# Running the Program on a Subreddit
We will now run the program for a Subreddit's Titles and Main Posts.  Again for illustration purposes, we limited the print out of Post Titles to the first 10 posts.

In [4]:
%run reddit_j.py

Thread or Subreddit?: subreddit
Which subreddit? ie. wallstreetbets: wallstreetbets
What day? Must be in the last few days (9,10,11,etc.): 18
Include main post?: yes
Include Penny Stocks?: no
Save DataFrame under what filename?: wsb_titles+MP_June_18_2021
Extracting comments...
------------------------------------------------------------
POST TITLE#  1 :  Guys SPY usually recovers within a month right?
POST TITLE#  2 :  AMZN PrimeDay Calls. Moonday Tuesday Print?
POST TITLE#  3 :  $WISH 🚀 ALL IN with my father’s inheritance! IMMA MAKE YOU PROUD POPS rip 🥲
POST TITLE#  4 :  I just bet against SPY and the DOW based on some DD. Am I stupid?
POST TITLE#  5 :  Wall Street Week Ahead for the trading week beginning June 21st, 2021
POST TITLE#  6 :  It ain’t much but YOLO quad witching w/ NVDA
POST TITLE#  7 :  Life Savings $WISH Yolo
POST TITLE#  8 :  SP500 Winners and Losers | 6/18/2021
POST TITLE#  9 :  WISH 45 C 7/16
POST TITLE#  10 :  I started monday with 5 contracts, kept doubling down 

It looks like the program was a success!  It took only 1 minute and 37 seconds to extract the most discussed Tickers.  We can see that within the Titles + Main Posts for June 18th on Wallstreetbets, AMC, UWMC, HSY, PALANTIR, and DKGN were the top 5.  We can also see that PLTR (the actual Ticker for Palantir) was in the top 10.  Combining those two counts actually makes PLTR the most talked about.

# Running the Program on a Thread
We will now run the program on a Thread.  Threads typically have thousands of comments, and the more data we get, the more reliable our counts become.  Since Wallstreetbets really started this movement and its Daily Discussion Threads offer the most comments, this will give us a better picture of what's being discussed the most.

Let's run the program on the "What Are Your Moves Tomorrow, June 21, 2021" Thread.  Again for illustration purposes, we limited the printout of comments to the first 10 posts.

In [6]:
%run reddit_j.py

Thread or Subreddit?: thread
What is the full Reddit Thread Link?: https://www.reddit.com/r/wallstreetbets/comments/o4dpla/what_are_your_moves_tomorrow_june_21_2021/
Include Penny Stocks?: no
Save DataFrame under what filename?: wsb_moves_tom_June_21_2021
Extracting comments...
------------------------------------------------------------
TITLE:  What Are Your Moves Tomorrow, June 21, 2021
------------------------------------------------------------
MAIN POST:  Your daily trading discussion thread. Please keep the shitposting to a minimum.  .......
------------------------------------------------------------
Extracting comments...
POST#  1 :  Seeing doom and gloom about futures and the market here in wsb Sunday night.

Wake up Monday to green everywhere. 

This is the way.
POST#  2 :  Lehman Brothers and Bear Stearns seem like obvious buys.
POST#  3 :  GUH 

Sorry guys I’m practicing for tomorrow
POST#  4 :  It’s only when a mosquito lands on your balls that you realize there is always 

Success!  The program took 34 minutes and 27 seconds to finish and our top 5 Stock Tickers are SPY, GME, WKHS, CLOV, and CLF.  Now that we have run this, let's read in the master dataframe and take a look at the first 5 rows.

In [10]:
import pandas as pd
pd.set_option('display.max_rows', None, 'display.max_columns', None)
wsb_moves = pd.read_csv('wsb_moves_tom_June_21_2021/wsb_moves_tom_June_21_2021.csv')

In [11]:
wsb_moves.head()

Unnamed: 0.1,Unnamed: 0,body
0,0,"What Are Your Moves Tomorrow, June 21, 2021"
1,1,Your daily trading discussion thread. Please k...
2,2,Seeing doom and gloom about futures and the ma...
3,3,Lehman Brothers and Bear Stearns seem like obv...
4,4,GUH \n\nSorry guys I’m practicing for tomorrow


Since SPY was the most discussed stock above, it may be helpful to filter the dataframe on SPY and see what the comments look like for that ticker.

In [14]:
SPY = wsb_moves[wsb_moves['body'].str.contains('SPY')]
SPY.head(20)

Unnamed: 0.1,Unnamed: 0,body
76,76,I don't understand why 10Y is dropping so hard...
90,90,"The rate hike is already socialized, SPY 422 eow"
128,128,SPY collapse tomorrow. I’m selling everything...
139,139,You ever made 10x on SPY calls? Well this is y...
158,158,No end in sight to the selloff. No apparent sh...
195,195,SPY is barely even down. Why are people so sca...
201,201,"Guys, historically SPY always bounces off the ..."
251,251,"Guh, futures are bright red…Guess I better sta..."
271,271,"i’m down to my last $500 — started with $2k, w..."
277,277,Fuck SPY it is QQQ summer


Looking at the first 20 comments about SPY, the overall sentiment seems to be negative.

# Next Steps
I am constantly updating this program so that it becomes better and more useful.  One next step that I am currently working on is to have the count combine the Ticker and Security Name to fix the above issue in the Subreddit Ticker Count (e.g. PALANTIR and PLTR).
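One possible way to fold Security Name counts into their Ticker is sketched below.  The function name `merge_name_counts`, the `name_to_ticker` mapping, and the counts in the example are all hypothetical; in practice the mapping would come from the same exchange listings used for the counting.

```python
def merge_name_counts(counts, name_to_ticker):
    """Add each security name's count to the count of its ticker."""
    merged = {}
    for key, count in counts.items():
        # Keys that are already tickers (or have no mapping) are kept as they are
        ticker = name_to_ticker.get(key, key)
        merged[ticker] = merged.get(ticker, 0) + count
    return merged


# Made-up counts for illustration only
merged = merge_name_counts(
    {"AMC": 8, "PALANTIR": 6, "PLTR": 4},
    {"PALANTIR": "PLTR"},
)
```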

The most important next step I'm working on is adding a Machine Learning algorithm to analyze the overall sentiment for each Ticker discussed.  For example, above, looking at the first 20 comments on SPY, the sentiment seemed negative.  The goal of the algorithm would be to go through all of the comments for each ticker and conclude as to whether the sentiment is negative or positive.  
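Until a trained model is in place, a crude word-list baseline can give a first impression of sentiment.  To be clear, the sketch below is not Machine Learning and is not the planned algorithm: it simply counts words from two small, hand-picked lists (both hypothetical) and labels a set of comments by the net total.

```python
import re

# Tiny hand-picked word lists, purely illustrative
POSITIVE = {"green", "moon", "bounce", "bounces", "recover", "recovers", "rally"}
NEGATIVE = {"collapse", "selloff", "crash", "red", "doom", "gloom"}


def sentiment_score(text):
    """Positive word count minus negative word count for one comment."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def overall_sentiment(comments):
    """Label a collection of comments by the sign of their summed scores."""
    total = sum(sentiment_score(comment) for comment in comments)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

A baseline like this misses sarcasm, negation, and trading slang, which is exactly why a learned model is the real goal.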

Knowing how many times a Stock is mentioned is helpful for picking up on trends.  However, to make the program more useful, knowing the overall sentiment toward the stock can help the investor make a smarter purchase.  Knowing the sentiment can help the investor decide whether to buy or sell, or when to use options such as Puts and Calls.