### Imports
The notebook first imports pandas for handling the data and NLTK (the Natural Language Toolkit for Python). NLTK's download function fetches the VADER lexicon, which the `SentimentIntensityAnalyzer` needs at run time. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based analyzer rather than a trained model. It was designed for social media text and was also evaluated on news editorials, which makes it a good fit for scoring each post.

```python
import pandas
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Fetch the VADER lexicon (a no-op if it is already up to date)
nltk.download("vader_lexicon")
```

```text
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\darks\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
True
```

### Setup

Next, the preprocessed data is read into a DataFrame. A helper function, `checkPunctuation`, appends a period to a post title unless it already ends with a period, question mark, or exclamation point. This sentence break keeps the title from running into the first sentence of the body when the two are joined for sentiment analysis.

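```python
scores = []
df = pandas.read_csv("data/preprocessed_data.csv")

def checkPunctuation(title):
    """Return the title with a trailing period unless it already ends in ., ?, or !."""
    title = title.rstrip()
    # Guard against empty titles, where title[-1] would raise IndexError
    if not title or title.endswith((".", "?", "!")):
        return title
    return title + "."
```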
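The title-and-body joining described above can also be written as a standalone helper that is easy to test in isolation. This is a sketch, not the notebook's code: `check_punctuation` and `join_post` are hypothetical names, and NaN is detected with `math.isnan` instead of `pandas.isna` so the block runs without pandas.

```python
import math


def check_punctuation(title: str) -> str:
    """Return the title with a trailing period unless it ends in ., ?, or !."""
    title = title.rstrip()
    if not title or title.endswith((".", "?", "!")):
        return title
    return title + "."


def join_post(title: str, body) -> str:
    """Join a title and an optional body into one string for scoring.

    `body` may be None or float NaN (how pandas reads an empty CSV cell).
    """
    if body is None or (isinstance(body, float) and math.isnan(body)):
        body = ""
    # strip() drops the trailing space left behind when there is no body
    return (check_punctuation(title) + " " + body).strip()


print(join_post("Tesla Robovan", float("nan")))   # Tesla Robovan.
print(join_post("Selling stock", "Any advice?"))  # Selling stock. Any advice?
```

Unlike the notebook cell, this version strips the trailing space on title-only posts; VADER ignores that space either way, so the choice is cosmetic.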

### Score Analysis
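With the data cleaned, each post's title and body are joined and scored. VADER takes a lexicon-based approach: each word is looked up in a dictionary of human-rated valence scores, and those scores are adjusted by rules for negation, intensifiers, capitalization, and punctuation. `polarity_scores` returns a dictionary with `neg`, `neu`, `pos`, and `compound` values, and these dictionaries are collected in a list. The compound score combines the positive and negative valence into a single normalized value between -1 (most negative) and +1 (most positive), and it is added to the DataFrame as a new `Sentiment` column.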
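For reference, the original VADER implementation (and NLTK's port of it) turns the sum $x$ of the rule-adjusted word valences into the compound score with a simple normalization, where $\alpha = 15$ is the library's default constant and the result is clipped to $[-1, 1]$:

$$
\text{compound} = \frac{x}{\sqrt{x^{2} + \alpha}}, \qquad \alpha = 15
$$

For example, a summed valence of $x = 2$ gives $2 / \sqrt{19} \approx 0.4588$, and the score approaches $\pm 1$ only as $|x|$ grows large, so a single mildly positive word never saturates the scale.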

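```python
# Reset so that re-running this cell does not append duplicate scores
scores = []
# Create the analyzer once; creating it inside the loop repeats its lexicon setup for every post
analyzer = SentimentIntensityAnalyzer()

for title, post_text in zip(df["Title"], df["Text"]):
    # Posts without body text (e.g. link or image posts) are read in as NaN
    body = "" if pandas.isna(post_text) else post_text
    full_text = checkPunctuation(title) + " " + body
    scores.append(analyzer.polarity_scores(full_text))

# Keep only the compound score, one value per post
df["Sentiment"] = pandas.DataFrame(scores)["compound"]
```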
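To make the lexicon-based idea concrete, here is a minimal, self-contained sketch of the two core steps: look up each word's valence, then normalize the sum the way VADER's compound score does ($\alpha = 15$, clipped to $[-1, 1]$). The `TOY_LEXICON` entries and the `toy_compound` function are hypothetical, and the sketch deliberately skips VADER's negation, intensifier, capitalization, and punctuation rules, so its numbers will not match `polarity_scores`.

```python
import math
import string

# Hypothetical mini-lexicon: word -> valence on VADER's -4..+4 scale.
# These values are made up for illustration; they are NOT VADER's entries.
TOY_LEXICON = {
    "good": 2.0,
    "great": 3.0,
    "bad": -2.5,
    "crash": -2.0,
}


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1], like VADER's compound score."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def toy_compound(text: str) -> float:
    """Sum the valence of known words, then normalize (no VADER heuristics)."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return round(normalize(total), 4)


print(toy_compound("Great earnings, good guidance."))  # 0.7906 (sum = 5.0)
print(toy_compound("Tesla Robovan"))                   # 0.0 (no lexicon words)
```

Words missing from the lexicon contribute nothing, which is why a short title with no sentiment-bearing words scores exactly zero.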

```python
df
```

| Unnamed: 0 | Id | Date | Title | Text | URL | Sentiment |
|---|---|---|---|---|---|---|
| 0 | 1g11s6m | 2024-10-11 | Tesla Robovan | | https://v.redd.it/edo1hio122ud1 | 0.0000 |
| 1 | 1g3rnc9 | 2024-10-14 | Tesla's \$30,000 Robotaxi Hits Major Speed Bump... | | https://www.forbes.com.au/news/innovation/tesl... | -0.1280 |
| 2 | 1grqiw1 | 2024-11-15 | Those who think removing the EV tax credit wil... | 1. Trump removes \$7,500 EV tax credits and imp... | https://www.reddit.com/r/wallstreetbets/commen... | -0.6360 |
| 3 | 1hccdd0 | 2024-12-12 | What it feels like shorting Tesla now... My pu... | | https://i.redd.it/nmrpww419c6e1.jpeg | 0.3612 |
| 4 | 1gu1zp8 | 2024-11-18 | Tesla stock pops 8% in premarket after report ... | | https://www.cnbc.com/2024/11/18/tesla-tsla-sto... | 0.4404 |
| ... | ... | ... | ... | ... | ... | ... |
| 1075 | 1buswl2 | 2024-04-03 | Fancy Names Intel vs Carlisle | I’ve always been proud of owning Intel in a DR... | https://www.reddit.com/r/investing/comments/1b... | 0.8271 |
| 1076 | 1akd4ss | 2024-02-06 | Roth IRA Advice needed please | Hello allI recently opened a Roth IRA account ... | https://www.reddit.com/r/investing/comments/1a... | 0.7777 |
| 1077 | 1ay2itr | 2024-02-23 | Suggestions for selling stock | I have around \$300k stock of the company I wor... | https://www.reddit.com/r/investing/comments/1a... | 0.9100 |
| 1078 | 1axmuim | 2024-02-23 | I already have a Roth IRA, invest half of savi... | I’m 22, soon to be 23, and currently have \$32,... | https://www.reddit.com/r/investing/comments/1a... | 0.0000 |
