# Analysis of Reddit WallStreetBets Posts: Revealing Market Trends and Investor Sentiment

## Table of Contents

- [Introduction](#introduction)
- [Data Description](#data-description)
- [Market Trends Analysis](#market-trends-analysis)
- [Investor Sentiment Analysis](#investor-sentiment-analysis)
- [Conclusion](#conclusion)
- [Acknowledgements](#acknowledgements)


## Introduction
With the widespread use of social media, the WallStreetBets (WSB) subreddit on Reddit has become a focal point for investors worldwide. This analysis studies a dataset of WSB posts obtained from Kaggle to reveal market trends and investor sentiment. It focuses on four fields: post title (`title`), score (`score`), number of comments (`comms_num`), and post creation time (`created`).

### Import the necessary Python libraries

In [10]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import spacy
from string import punctuation
from collections import Counter
from nltk.sentiment import SentimentIntensityAnalyzer

In [3]:
# load data
wsb = pd.read_csv('reddit_wsb.csv')
wsb.head()

| | title | score | id | url | comms_num | created | body | timestamp |
|---|---|---|---|---|---|---|---|---|
| 0 | It's not about the money, it's about sending a... | 55 | l6ulcx | https://v.redd.it/6j75regs72e61 | 6 | 1611863000.0 | | 2021-01-28 21:37:41 |
| 1 | Math Professor Scott Steiner says the numbers ... | 110 | l6uibd | https://v.redd.it/ah50lyny62e61 | 23 | 1611862000.0 | | 2021-01-28 21:32:10 |
| 2 | Exit the system | 0 | l6uhhn | https://www.reddit.com/r/wallstreetbets/commen... | 47 | 1611862000.0 | The CEO of NASDAQ pushed to halt trading “to g... | 2021-01-28 21:30:35 |
| 3 | NEW SEC FILING FOR GME! CAN SOMEONE LESS RETAR... | 29 | l6ugk6 | https://sec.report/Document/0001193125-21-019848/ | 74 | 1611862000.0 | | 2021-01-28 21:28:57 |
| 4 | Not to distract from GME, just thought our AMC... | 71 | l6ufgy | https://i.redd.it/4h2sukb662e61.jpg | 156 | 1611862000.0 | | 2021-01-28 21:26:56 |


In [4]:
print(wsb.shape[0],"posts \n")

print('average score:',wsb['score'].mean(),'\n'
      'highest score:',wsb['score'].max(), '\n'
      'lowest score:', wsb['score'].min(),'\n')

print('average comment number:',wsb['comms_num'].mean(),'\n'
     'highest comment number:',wsb['comms_num'].max(),'\n'
     'lowest comment number:',wsb['comms_num'].min(),'\n')

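# Note: this reads the first and last rows, so it assumes the rows are ordered by time.
# wsb['timestamp'].min() and wsb['timestamp'].max() do not depend on row order.
print('earliest post:',wsb.iloc[0,7],'\n'
      'newest post:' , wsb.iloc[-1,7])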

53187 posts 

average score: 1382.461052512832 
highest score: 348241 
lowest score: 0 

average comment number: 263.2602515652321 
highest comment number: 93268 
lowest comment number: 0 

earliest post: 2021-01-28 21:37:41 
newest post: 2021-08-02 12:00:14


## Data Description

First, we perform descriptive statistical analysis on the dataset to understand the basic characteristics of the posts.

- Number of posts: There are 53187 posts in the dataset.
- Score distribution: The average score of the posts is 1382, with the highest score being 348241 and the lowest score being 0.
- Comments distribution: The average number of comments per post is 263, with the highest number of comments being 93268 and the lowest being 0.
- Creation time distribution: The posts were created between 2021-01-28 and 2021-08-02.
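
The mean score (1382) is small compared with the maximum (348241), so the mean alone may not describe a typical post well. The following is a minimal sketch, not part of the original notebook, that adds medians and derives the time span from the `timestamp` column instead of from row order. The helper name `summarize_posts` is hypothetical, and the toy frame reuses a few values from the `head()` output to stand in for `wsb`.

```python
import pandas as pd

def summarize_posts(df: pd.DataFrame) -> dict:
    """Return basic summary statistics for a WSB-style posts frame."""
    ts = pd.to_datetime(df["timestamp"])
    return {
        "n_posts": len(df),
        "score_mean": df["score"].mean(),
        "score_median": df["score"].median(),
        "comments_mean": df["comms_num"].mean(),
        "comments_median": df["comms_num"].median(),
        # min/max do not depend on how the rows are ordered
        "earliest": ts.min(),
        "latest": ts.max(),
    }

# Toy rows standing in for reddit_wsb.csv
toy = pd.DataFrame({
    "score": [55, 110, 0, 29],
    "comms_num": [6, 23, 47, 74],
    "timestamp": ["2021-01-28 21:37:41", "2021-01-28 21:32:10",
                  "2021-01-28 21:30:35", "2021-01-28 21:28:57"],
})
summary = summarize_posts(toy)
print(summary)
```

On the real data the call would be `summarize_posts(wsb)`; a median far below the mean would indicate that a few very popular posts pull the average up.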


## Market Trends Analysis

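Next, we analyze the market trends found within WSB post titles.

- Popular stocks: We count how often each lemmatized, lowercased token appears in post titles after removing punctuation and stop words. Ticker symbols such as `gme`, `amc`, `bb`, and `nok` rank among the most frequent tokens, which identifies the stocks the community discussed most.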



In [5]:

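# Load the large English pipeline without the components that are not needed here
nlp = spacy.load("en_core_web_lg", disable=["ner", "textcat"])
stop_words = nlp.Defaults.stop_words

def process(text):
    """Strip ASCII punctuation, lemmatize, lowercase, and drop stop words."""
    text = ''.join(c for c in text if c not in punctuation)
    doc = nlp(' '.join(text.split()))  # collapse repeated whitespace before parsing
    tokens = [token.lemma_.lower() for token in doc if token.lemma_.lower() not in stop_words]
    return tokens

# Lazily process each title, then flatten the per-title token lists
title_comb = (process(title) for title in wsb['title'])

all_titles_processed = [token for title_tokens in title_comb for token in title_tokens]

# Count how often each token occurs across all titles
title_counter = Counter(all_titles_processed)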

In [6]:
title_counter.most_common(100)

[('🚀', 18086),
 ('gme', 8783),
 ('buy', 6274),
 ('💎', 5666),
 ('hold', 5068),
 ('amc', 3492),
 ('robinhood', 3171),
 ('stock', 3095),
 ('sell', 2966),
 ('🙌', 2234),
 ('yolo', 2102),
 ('share', 2088),
 ('short', 2041),
 ('moon', 1905),
 ('like', 1843),
 ('🦍', 1526),
 ('let', 1494),
 ('m', 1465),
 ('today', 1462),
 ('market', 1407),
 ('bb', 1360),
 ('dip', 1304),
 ('time', 1261),
 ('fuck', 1241),
 ('loss', 1221),
 ('day', 1131),
 ('wsb', 1126),
 ('dd', 1115),
 ('nok', 1111),
 ('’', 1095),
 ('money', 1079),
 ('week', 1055),
 ('ape', 1015),
 ('retard', 975),
 ('good', 971),
 ('update', 960),
 ('right', 952),
 ('"', 949),
 ('think', 927),
 ('squeeze', 922),
 ('know', 903),
 ('new', 896),
 ('hand', 883),
 ('gain', 879),
 ('2021', 871),
 ('trading', 857),
 ('need', 855),
 ('price', 850),
 ('gamestop', 848),
 ('fucking', 846),
 ('trade', 846),
 ('guy', 842),
 ('come', 833),
 ('look', 801),
 ('big', 797),
 ('stop', 792),
 ('want', 772),
 ('fund', 772),
 ('lose', 755),
 ('play', 743),
 ('hedge',
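
The token counts mix tickers with ordinary words and emoji. As a minimal sketch under stated assumptions, tickers can be counted on their own by matching titles against a watchlist. The names `TICKERS`, `TICKER_RE`, and `count_tickers` are hypothetical, the watchlist only contains the four symbols that appear in the output, and the sample titles are illustrative rather than taken from the dataset.

```python
import re
from collections import Counter

# Hypothetical watchlist; extend it with the symbols of interest.
TICKERS = {"GME", "AMC", "BB", "NOK"}

# An optional "$" followed by 1-5 letters, bounded on both sides so that
# fragments of longer words (e.g. "Game" in "GameStop") are not matched.
TICKER_RE = re.compile(r"(?<![A-Za-z0-9])\$?([A-Za-z]{1,5})(?![A-Za-z0-9])")

def count_tickers(titles, tickers=TICKERS):
    """Count how many titles mention each ticker (at most once per title)."""
    counts = Counter()
    for title in titles:
        found = {match.upper() for match in TICKER_RE.findall(title)}
        counts.update(found & tickers)
    return counts

# Illustrative titles only
sample_titles = [
    "NEW SEC FILING FOR GME!",
    "$gme and $AMC to the moon",
    "GME GME GME, also AMC",
    "GameStop earnings call today",
]
counts = count_tickers(sample_titles)
print(counts.most_common())
```

Matching is case-insensitive here, so a lowercase word that happens to equal a symbol is counted as a mention. Counting once per title keeps a single repetitive title from inflating the total.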

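- Market sentiment: Changes in post scores and comment counts show how engaged the community is; rising scores and comment counts suggest heightened interest, while falling ones suggest fading interest. To measure tone directly, the sentiment cell in this section scores every post title with NLTK's VADER analyzer and averages the compound scores. A mean above 0.05 is labeled "High", a mean below -0.05 "Low", and anything in between "Neutral". On this dataset the mean is 0.0500, which only just clears the 0.05 cutoff, so the resulting "High" label is marginal.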
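
One way to look at changes in scores and comment counts is to aggregate them per calendar day. The following is a minimal sketch, assuming the `timestamp`, `score`, and `comms_num` columns shown in the `head()` output; the helper name `daily_engagement` is hypothetical, and the third toy row is made up for illustration.

```python
import pandas as pd

def daily_engagement(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate post count, mean score, and mean comment count per day."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["timestamp"]).dt.date
    # groupby sorts the dates in ascending order by default
    return out.groupby("date").agg(
        posts=("score", "count"),
        mean_score=("score", "mean"),
        mean_comments=("comms_num", "mean"),
    )

# Toy rows; the last one is illustrative and not from the dataset
toy = pd.DataFrame({
    "timestamp": ["2021-01-28 21:37:41", "2021-01-28 21:32:10",
                  "2021-01-29 09:00:00"],
    "score": [55, 110, 10],
    "comms_num": [6, 23, 4],
})
daily = daily_engagement(toy)
print(daily)
```

On the real data, `daily_engagement(wsb)` could be plotted with the already imported `matplotlib` to see when engagement rose or fell.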

In [18]:

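# VADER needs its lexicon; download it once with nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()

def get_sentiment_score(text):
    """Return VADER's compound score, from -1 (most negative) to +1 (most positive)."""
    sentiment = sia.polarity_scores(text)
    return sentiment['compound']

# Score every title, then average the scores into a single number
wsb['sent_score'] = wsb['title'].apply(get_sentiment_score)
mean_sentiment = wsb['sent_score'].mean()

# Label the mean using cutoffs of +/-0.05
if mean_sentiment > 0.05:
    market_sentiment = "High"
elif mean_sentiment < -0.05:
    market_sentiment = "Low"
else:
    market_sentiment = "Neutral"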

print("Market Sentiment:", market_sentiment, '\nsimple average sentiment analysis score:', mean_sentiment)

Market Sentiment: High 
simple average sentiment analysis score: 0.05002351702483689


## Investor Sentiment Analysis

To gain a deeper understanding of investor sentiment, we will analyze the content of WSB posts.

- Keyword extraction: Using natural language processing techniques, we can extract keywords from the post bodies to reveal what investors focus on and how they feel about it.
- Sentiment analysis: Sentiment analysis tools classify the tone of each post as positive, negative, or neutral. Tracking these labels over time helps show how investor mood shifts as market conditions change.
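
The sentiment step can be sketched as follows: each compound score is mapped to a coarse label, and the share of each label is computed per month. This is a minimal sketch rather than the notebook's method. The helper names `label_sentiment` and `label_shares_by_month` are hypothetical, the ±0.05 cutoffs mirror the ones the notebook applies to the mean, and the scores in `rows` are illustrative stand-ins for `wsb['sent_score']`.

```python
from collections import Counter, defaultdict

def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def label_shares_by_month(rows):
    """Compute the share of each label per month.

    rows: iterable of (timestamp 'YYYY-MM-DD HH:MM:SS', compound score).
    Returns {'YYYY-MM': {label: share of that month's posts}}.
    """
    counts = defaultdict(Counter)
    for timestamp, compound in rows:
        month = timestamp[:7]  # the 'YYYY-MM' prefix of the timestamp string
        counts[month][label_sentiment(compound)] += 1
    return {
        month: {label: n / sum(c.values()) for label, n in c.items()}
        for month, c in sorted(counts.items())
    }

# Illustrative scores only
rows = [
    ("2021-01-28 21:37:41", 0.60),
    ("2021-01-28 21:32:10", -0.40),
    ("2021-02-01 10:00:00", 0.00),
    ("2021-02-02 11:00:00", 0.30),
]
shares = label_shares_by_month(rows)
print(shares)
```

With the real data, `rows` could be built with `zip(wsb['timestamp'], wsb['sent_score'])` once the sentiment cell has run. Per-month shares show whether the near-neutral average hides a mix of strongly positive and strongly negative titles.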

## Conclusion

This analysis of Reddit WallStreetBets posts shows that post titles are dominated by GME, which appears 8783 times, alongside AMC, BB, and NOK, and that the average title sentiment (0.05) sits at the boundary between neutral and positive. Signals like these can be valuable for investors, analysts, and policymakers. Future research could explore the post bodies and changes over time to uncover more about market dynamics and investor behavior.