In [1]:
import warnings

import pandas as pd
import numpy as np

from tqdm.auto import tqdm  # picks the notebook or console progress bar automatically

warnings.filterwarnings("ignore")

In [2]:
df = pd.read_csv("datasets/reddit_posts_with_topics_keywords.csv", low_memory=False)
df.head()

Unnamed: 0,post_id,title,timestamp,body,body_type,topic_name,cleaned_text,keywords
0,1b0m6c1,r/CasualConversation is looking for new modera...,2024-02-26 16:48:14,"Hello everyone,It's that time again, [we're lo...",post,22_subs_subreddits_trolls_subreddit,hello everyone time look new mod join team doc...,"modmail courteous professional,grow strict quo..."
1,1b5h6x7,Deleting social media was one of the best thin...,2024-03-03 13:29:12,I know that technically reddit is a social med...,post,1_fb_deleted_instagram_facebook,know technically reddit social medium not nega...,"consider delete social,instagram tiktok,health..."
2,1b5lu9j,Lonlieness is not about gender! My Opinion.,2024-03-03 16:56:09,Okay so I just wanted to get this off of my mi...,post,359_loneliness_lonely_epidemic_intiate,okay want get mind keep see people post man lo...,"man loneliness epidemic,understand gender thin..."
3,1b5jo0m,Have you gave up any hobbies?,2024-03-03 15:23:08,I used to be a doll collector. Then when I was...,post,-1_breakfast_cream_listening_bus,use doll collector tell weird old donate every...,"use doll collector,throw away hobby,day happy ..."
4,1b5c8ah,Is it me or has the world just stopped moving ...,2024-03-03 08:24:06,"This is strange, but I feel like:1) I don’t re...",post,-1_breakfast_cream_listening_bus,strange feel like not remember anything happen...,"good year covid,like remember happen,people tr..."


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131952 entries, 0 to 131951
Data columns (total 8 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   post_id       131952 non-null  object
 1   title         2285 non-null    object
 2   timestamp     131952 non-null  object
 3   body          131952 non-null  object
 4   body_type     131952 non-null  object
 5   topic_name    131952 non-null  object
 6   cleaned_text  131952 non-null  object
 7   keywords      131175 non-null  object
dtypes: object(8)
memory usage: 8.1+ MB


# Sentiment Classification

In [4]:
# https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest
from transformers import pipeline

MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"

# Inputs longer than 512 tokens are truncated; shorter ones are padded within a batch.
sentiment_predict_pipe = pipeline(
    "sentiment-analysis",
    model=MODEL,
    tokenizer=MODEL,
    max_length=512,
    padding=True,
    truncation=True,
)




In [6]:
# Example post that the model labels as positive
text1 = df.loc[1, 'body']
print(text1)
print()
sentiment_predict_pipe(text1)

I know that technically reddit is a social media but for me it doesn't negatively effect my mental health, in fact it helps me quite a lot and I have gained quite a bit of knowledge from surfing it  Almost two months ago I deleted Instagram and Tiktok and honestly it was super difficult the first few days but..  \- I had time to do things that I would otherwise have not done due to scrolling endlessly. I started working out and I am super proud of myself for that  \- I don't really care about my looks as much anymore  \- My screen time has decreased by quite a lot, I have social interactions in person way more now  So I guess what I'm saying is if you've been considering deleting social media for whatever reason, do it.



[{'label': 'positive', 'score': 0.8177728056907654}]

In [7]:
# Example post that the model labels as negative
text2 = df.loc[63, 'body']
print(text2)
print()
sentiment_predict_pipe(text2)

My birthday is Tuesday, I was supposed to be driving down to see my parents for an extra long weekend to celebrate and hang out but the weather took a turn for the worst. Now I'm trying to see if any of my friends may want to hang out this weekend instead.I remember feeling this same sense of disappointment when I was a kid and sometimes my friends who lived on farms would have to cancel on my birthday party due to weather conditions.I know it's no one's fault and there's nothing that can be done but right now I just feel like that sad disappointed little girl. It's all rushing back. I wanted to see the dog and the cat this weekend and eat angel food cake 😭🥲Anyways.... those of you with summer birthdays, what's that like LOL?



[{'label': 'negative', 'score': 0.7702678442001343}]

In [8]:
# Example post that the model labels as neutral
text3 = df.loc[116, 'body']
print(text3)
print()
sentiment_predict_pipe(text3)

Pretty much what the title says. English isn't my first language, but I've pretty much dealt with it since I was a baby. Never thought I would write in my life, but it just came to me, so why not. 😂🤷🏼‍♂️ Didn't wanna post on poetry or poetry critique subs, since I don't consider myself a poet. Let me know what y'all think. Thanks! 😁Here goes, I was never born. I will never die. I walk the shadows and bask in the light. To glare into nothingness, to find oneself within. While I climb and fall as I tug at fate's strings.



[{'label': 'neutral', 'score': 0.6266416907310486}]
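The three calls above each return a one-element list holding a dict with `label` and `score` keys. A minimal sketch of unpacking that shape follows; `top_prediction` is a hypothetical helper introduced here for illustration, and the values are copied from the outputs recorded above rather than recomputed.

```python
from typing import Dict, List, Tuple


def top_prediction(output: List[Dict[str, object]]) -> Tuple[str, float]:
    """Return (label, score) from a one-element pipeline result (hypothetical helper)."""
    first = output[0]
    return str(first["label"]), float(first["score"])


# Outputs copied verbatim from the three example cells above.
recorded = {
    "text1": [{'label': 'positive', 'score': 0.8177728056907654}],
    "text2": [{'label': 'negative', 'score': 0.7702678442001343}],
    "text3": [{'label': 'neutral', 'score': 0.6266416907310486}],
}

for name, output in recorded.items():
    label, score = top_prediction(output)
    print(f"{name}: {label} ({score:.2f})")
```

The neutral example has the lowest score of the three, so it may be worth reading such scores as a rough confidence signal rather than a hard verdict.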

In [9]:
# Number of posts sent to the model per call
batch_size = 64

# Collect the post bodies as a plain Python list
texts = df['body'].to_list()

# Classify the posts batch by batch; each result is a {'label': ..., 'score': ...} dict
results = []
for i in tqdm(range(0, len(texts), batch_size)):
    batch = texts[i:i + batch_size]
    sentiments = sentiment_predict_pipe(batch)
    results.extend(sentiments)

  0%|          | 0/2062 [00:00<?, ?it/s]
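The slicing pattern in the loop above can be checked without loading the model. The sketch below assumes a hypothetical `fake_classifier` that stands in for `sentiment_predict_pipe` and a hypothetical `iter_batches` helper; it only illustrates that the batches cover every item exactly once and that 131952 posts at a batch size of 64 give the 2062 iterations shown in the progress bar.

```python
import math
from typing import Dict, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_classifier(batch: Sequence[str]) -> List[Dict[str, object]]:
    """Stand-in for the real pipeline: one result dict per input text."""
    return [{"label": "neutral", "score": 1.0} for _ in batch]


texts = [f"post {n}" for n in range(150)]  # made-up inputs, not the real dataset

results = []
for batch in iter_batches(texts, 64):
    results.extend(fake_classifier(batch))

# One result per input, in input order, so the list can be assigned as a column.
assert len(results) == len(texts)

# Number of batches for the full dataset described by df.info()
n_batches = math.ceil(131952 / 64)
print(n_batches)
```

The final batch is shorter than the others (22 items here), which the slice handles without any special casing.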

In [10]:
# results holds one dict per row, in the same order as df
df['sentiment'] = results

In [15]:
(
    df.pipe(lambda x: x.join(pd.json_normalize(x['sentiment'])))
    .rename(columns={'label':'sentiment_label', 'score':'sentiment_score'})
    .drop('sentiment', axis=1)
    .to_csv("datasets/reddit_posts_with_topics_keywords_sentiments.csv", index=False)
)
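The join above lines up rows by index, which works here because `df` still has its default `RangeIndex` and `pd.json_normalize` also numbers its rows from 0. If the frame had been filtered or re-indexed first, the rows could misalign. The sketch below uses a made-up three-row frame (`posts` and its values are illustrative assumptions, not data from this notebook) to show one way of keeping the alignment explicit.

```python
import pandas as pd

# Hypothetical miniature frame with a non-default index, as filtering might leave behind.
posts = pd.DataFrame({"post_id": ["a", "b", "c"]}, index=[10, 20, 30])
posts["sentiment"] = [
    {"label": "positive", "score": 0.9},
    {"label": "negative", "score": 0.8},
    {"label": "neutral", "score": 0.6},
]

# json_normalize returns rows numbered 0..n-1, so copy the original index before joining.
flat = pd.json_normalize(posts["sentiment"].tolist())
flat.index = posts.index
flat = flat.rename(columns={"label": "sentiment_label", "score": "sentiment_score"})

# Drop the dict column and attach the two flat columns in its place.
out = posts.drop(columns="sentiment").join(flat)
print(out)
```

With the default index the extra assignment changes nothing, so it can be added as a cheap safeguard.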