In [None]:
!pip install datasets



In [None]:
from datasets import load_dataset
import pandas as pd
import re
import spacy
from tqdm.notebook import tqdm
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification, pipeline


## Load the Dataset

In [None]:
dataset = load_dataset("CarperAI/openai_summarize_tldr")
train_dataset = dataset['train']
test_dataset = dataset['test']
train_dataset[1]


{'prompt': 'SUBREDDIT: r/loseit\nTITLE: SV & NSV! Keeping on keeping on.\nPOST: 30F, 5\'6". SW: 236 GW: 150 CW: 219\n\nI weigh myself weekly and measure myself monthly. I\'d hit a plateau the last four weeks or so where I was stuck at 222. Felt like kind of a bummer, but knew it\'s because I haven\'t been as strict as I should with my diet, and the last week and a half have been crazy with life things, so I haven\'t been exercising as frequently as I\'ve gotten used to. When I weighed myself as normal on Monday, I was kind of disappointed to see the scale not budging and figured it was time to buckle down again and really watch my diet. Today was my measure-in day, and I\'ve felt cruddy in general since Monday because I caught some chest congestion/cold bug over the weekend. I get on the scale...it says 219. Whaaaaat? I take my measurements, which are down slightly from last month, and with an total-body loss of 8 inches from my starting point on 12/23/14! Some of my clothes have been 

In [None]:
test_dataset.column_names

['prompt', 'label']

## Inspect Data

In [None]:
# inspect column names
print(f"Dataset Column Names:\n {dataset.column_names}")

# create a DataFrame and inspect
pd.set_option('display.max_colwidth', None)  # show full cell text instead of truncating it
df = pd.DataFrame(train_dataset)
df.iloc[1]


Dataset Column Names:
 {'train': ['prompt', 'label'], 'test': ['prompt', 'label'], 'valid': ['prompt', 'label']}


Unnamed: 0,1
prompt,"SUBREDDIT: r/loseit\nTITLE: SV & NSV! Keeping on keeping on.\nPOST: 30F, 5'6"". SW: 236 GW: 150 CW: 219\n\nI weigh myself weekly and measure myself monthly. I'd hit a plateau the last four weeks or so where I was stuck at 222. Felt like kind of a bummer, but knew it's because I haven't been as strict as I should with my diet, and the last week and a half have been crazy with life things, so I haven't been exercising as frequently as I've gotten used to. When I weighed myself as normal on Monday, I was kind of disappointed to see the scale not budging and figured it was time to buckle down again and really watch my diet. Today was my measure-in day, and I've felt cruddy in general since Monday because I caught some chest congestion/cold bug over the weekend. I get on the scale...it says 219. Whaaaaat? I take my measurements, which are down slightly from last month, and with an total-body loss of 8 inches from my starting point on 12/23/14! Some of my clothes have been feeling a bit looser as of late and now I know it's just not in my head. I'm now the lightest and smallest I've been since right around high school!\nTL;DR:"
label,"Progress is still happening, even when you think it might not be! Don't get discouraged, even if your journey seems to be going slowly. Don't give up, warriors."


## Preprocess the Data

In [None]:
# we only want to use the prompt column
df = df.iloc[:, [0]]

df.iloc[:2]

Unnamed: 0,prompt
0,SUBREDDIT: r/relationships\nTITLE: I (f/22) have to figure out if I want to still know these girls or not and would hate to sound insulting\nPOST: Not sure if this belongs here but it's worth a try. \n\nBackstory:\nWhen I (f/22) went through my first real breakup 2 years ago because he needed space after a year of dating roand it effected me more than I thought. It was a horrible time in my life due to living with my mother and finally having the chance to cut her out of my life. I can admit because of it was an emotional wreck and this guy was stable and didn't know how to deal with me. We ended by him avoiding for a month or so after going to a festival with my friends. When I think back I wish he just ended. So after he ended it added my depression I suffered but my friends helped me through it and I got rid of everything from him along with cutting contact. \n\nNow: Its been almost 3 years now and I've gotten better after counselling and mild anti depressants. My mother has been out of my life since then so there's been alot of progress. Being stronger after learning some lessons there been more insight about that time of my life but when I see him or a picture everything comes back. The emotions and memories bring me back down. \n\nHis friends (both girls) are on my facebook because we get along well which is hard to find and I know they'll always have his back. But seeing him in a picture or talking to him at a convention having a conversation is tough. Crying confront of my current boyfriend is something I want to avoid. \n\nSo I've been thinking that I have to cut contact with these girls because it's time to move on because it's healthier. It's best to avoid him as well. But will they be insulted? Will they accept it? Is there going to be awkwardness? I'm not sure if it's the right to do and could use some outside opinions.\nTL;DR:
1,"SUBREDDIT: r/loseit\nTITLE: SV & NSV! Keeping on keeping on.\nPOST: 30F, 5'6"". SW: 236 GW: 150 CW: 219\n\nI weigh myself weekly and measure myself monthly. I'd hit a plateau the last four weeks or so where I was stuck at 222. Felt like kind of a bummer, but knew it's because I haven't been as strict as I should with my diet, and the last week and a half have been crazy with life things, so I haven't been exercising as frequently as I've gotten used to. When I weighed myself as normal on Monday, I was kind of disappointed to see the scale not budging and figured it was time to buckle down again and really watch my diet. Today was my measure-in day, and I've felt cruddy in general since Monday because I caught some chest congestion/cold bug over the weekend. I get on the scale...it says 219. Whaaaaat? I take my measurements, which are down slightly from last month, and with an total-body loss of 8 inches from my starting point on 12/23/14! Some of my clothes have been feeling a bit looser as of late and now I know it's just not in my head. I'm now the lightest and smallest I've been since right around high school!\nTL;DR:"


In [None]:
total_rows = df.shape[0]
print("Total rows in the DataFrame:", total_rows)

Total rows in the DataFrame: 116722


In [None]:
# Extract the subreddit name from each prompt into its own column
df['subreddit_category'] = df['prompt'].str.extract(r'SUBREDDIT: r/(\w+)')

# Count the occurrences of each category
category_counts = df['subreddit_category'].value_counts()

# Print the total number of unique categories
total_unique_categories = len(category_counts)
print("Total Unique Categories:", total_unique_categories)

# Print number of entries per category
print("\nCategory Counts:")
print(category_counts)

Total Unique Categories: 29

Category Counts:
subreddit_category
relationships          63324
AskReddit              15440
relationship_advice     8691
tifu                    7685
dating_advice           2849
personalfinance         2312
Advice                  2088
legaladvice             1997
offmychest              1582
loseit                  1452
jobs                    1084
self                    1048
BreakUps                 838
askwomenadvice           688
dogs                     638
running                  567
pettyrevenge             548
needadvice               528
travel                   452
Parenting                435
weddingplanning          433
Pets                     366
Dogtraining              362
cats                     324
AskDocs                  283
college                  264
GetMotivated             169
books                    161
Cooking                  114
Name: count, dtype: int64
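The extraction step can be illustrated on a few made-up prompts. This is a minimal sketch using only the standard library; the sample strings and the helper name `subreddit_of` are hypothetical and not part of the notebook.

```python
import re
from collections import Counter

# Same pattern as the notebook: capture the word characters after "r/"
SUBREDDIT_RE = re.compile(r"SUBREDDIT: r/(\w+)")

def subreddit_of(prompt):
    """Return the subreddit name in a prompt, or None if the header is missing."""
    match = SUBREDDIT_RE.search(prompt)
    return match.group(1) if match else None

# Hypothetical prompts that follow the dataset's layout
prompts = [
    "SUBREDDIT: r/loseit\nTITLE: A\nPOST: ...\nTL;DR:",
    "SUBREDDIT: r/relationship_advice\nTITLE: B\nPOST: ...\nTL;DR:",
    "SUBREDDIT: r/loseit\nTITLE: C\nPOST: ...\nTL;DR:",
]

# Count how often each subreddit occurs, most frequent first
counts = Counter(subreddit_of(p) for p in prompts)
print(counts.most_common())
```

Because `\w` matches letters, digits, and underscores, names such as `relationship_advice` are captured whole.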


Next, I downsample the data, keeping 10% of the posts in each subreddit category so that the category proportions are maintained.

In [None]:
# Count the number of words in each row
df['word_count'] = df['prompt'].str.split().str.len()

# Calculate the median word count
median_word_count = df['word_count'].median()

# Display the results
df[['prompt', 'word_count']].head()

Unnamed: 0,prompt,word_count
0,SUBREDDIT: r/relationships\nTITLE: I (f/22) have to figure out if I want to still know these girls or not and would hate to sound insulting\nPOST: Not sure if this belongs here but it's worth a try. \n\nBackstory:\nWhen I (f/22) went through my first real breakup 2 years ago because he needed space after a year of dating roand it effected me more than I thought. It was a horrible time in my life due to living with my mother and finally having the chance to cut her out of my life. I can admit because of it was an emotional wreck and this guy was stable and didn't know how to deal with me. We ended by him avoiding for a month or so after going to a festival with my friends. When I think back I wish he just ended. So after he ended it added my depression I suffered but my friends helped me through it and I got rid of everything from him along with cutting contact. \n\nNow: Its been almost 3 years now and I've gotten better after counselling and mild anti depressants. My mother has been out of my life since then so there's been alot of progress. Being stronger after learning some lessons there been more insight about that time of my life but when I see him or a picture everything comes back. The emotions and memories bring me back down. \n\nHis friends (both girls) are on my facebook because we get along well which is hard to find and I know they'll always have his back. But seeing him in a picture or talking to him at a convention having a conversation is tough. Crying confront of my current boyfriend is something I want to avoid. \n\nSo I've been thinking that I have to cut contact with these girls because it's time to move on because it's healthier. It's best to avoid him as well. But will they be insulted? Will they accept it? Is there going to be awkwardness? I'm not sure if it's the right to do and could use some outside opinions.\nTL;DR:,356
1,"SUBREDDIT: r/loseit\nTITLE: SV & NSV! Keeping on keeping on.\nPOST: 30F, 5'6"". SW: 236 GW: 150 CW: 219\n\nI weigh myself weekly and measure myself monthly. I'd hit a plateau the last four weeks or so where I was stuck at 222. Felt like kind of a bummer, but knew it's because I haven't been as strict as I should with my diet, and the last week and a half have been crazy with life things, so I haven't been exercising as frequently as I've gotten used to. When I weighed myself as normal on Monday, I was kind of disappointed to see the scale not budging and figured it was time to buckle down again and really watch my diet. Today was my measure-in day, and I've felt cruddy in general since Monday because I caught some chest congestion/cold bug over the weekend. I get on the scale...it says 219. Whaaaaat? I take my measurements, which are down slightly from last month, and with an total-body loss of 8 inches from my starting point on 12/23/14! Some of my clothes have been feeling a bit looser as of late and now I know it's just not in my head. I'm now the lightest and smallest I've been since right around high school!\nTL;DR:",215
2,"SUBREDDIT: r/relationships\nTITLE: Me [19F] with my friend [19M] 10 months, Insecurities - Show or Tell?\nPOST: What are your stories about insecurities you've had in past relationships? How have you dealt with them, particularly the ones that you can't hide?\n\nI'm not currently in a relationship, but recently I've realized that there is someone who likes me, and I'm interested in them, too. Frankly, the only reason I'm not asking them out is because I know that I have some insecurities that need to be worked through - particularly in the realm of body image. While I'm confident in the rest of my body, I've had terrible, awful acne both on my arms and breasts since I was very young. It's a special type with no complete cure, but doctors suggested that I keep my skin oiled until it goes away (dryness irritates it). Because of this it's not so much present anymore as large clusters of scars are.\n\nWould I warn someone about this upfront before anything sexual? Would I just let it surprise them when the clothes come off? Do I tell them ""Let's keep on my shirt for now"" while we do our business? \n\nHave you had experiences with anything similar? I want to hear how they went!\nTL;DR:",213
3,"SUBREDDIT: r/personalfinance\nTITLE: Prioritize student debt or saving for down payment?\nPOST: I have $25k in student debt. One private loan at 9.5% (highest priority obviously) and nine others federal between 3.4% and 6.8%. Minimum payment per month total is $301.16. Over the next 9 months, I will pay off $11k of these, which will get rid of everything above 5% interest and will drop the total minimum payment to $150. \n\nAt the end of the 9 months, our savings will be around $35k. At that time my husband will need to purchase a car so some of that will be his down payment. So more realistically $25-30k. \n\nSometime in the future, between a year to two years from now, my husband and I may be moving. Typical single family homes in this area go for around $300k. \n\nAt the end of the 9 months, should I continue to focus on paying down student debt (which will be a balance of $14k by then) or growing our savings/down payment? I have $5200/mo to somehow split between debt and down payment and I'm not sure how best to allocate it.\nTL;DR:",190
4,"SUBREDDIT: r/relationships\nTITLE: My[25m] girlfriend [24f] is only nice and pleasant when I'm aloof and distant. (9 months)\nPOST: Throwaway\n\nI noticed the more I'm cold and distant towards my girlfriend, the more pleasant she becomes. She'll come over and clean my apartment, do laundry, dishes and cook for me, even as far as to offer oral favors while I'm drinking a beer! \n\nShe seems completely happy and content during this time, which makes me happy and I naturally want to do things back for her. As soon as I start doing her favors, she picks fights and complains nonstop. Latest issue was I offered to take her and her mom to dinner. She kept giving me shit about how I'm going to be spending too much time with my brother (who's visiting for a week soon), which she was totally fine with when I was being distant with her. She'll call me a bitch in a joking way, and just take the piss out of me whenever I'm kind or go out of my way to apologize. \n\nThis naturally makes me feel cold and indifferent toward her. Once she senses that, she's all about making me the happiest boyfriend and apologizes for all the shit she was giving me the week previously. It's a vicious cycle but I'm not sure what to do here. I've brought this up with her and she recognizes it and has no solution. She just ""feels differently towards me sometimes"" and can't explain it.\n\n**So what do I do here? Do I keep up the aloof, distant attitude to keep her interested or suffer her negging in kindness, my default setting.\nTL;DR:",278


In [None]:
print(f"Median Word Count: {median_word_count}")

Median Word Count: 266.0
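Word counts and character counts measure different things, which matters whenever a threshold derived from one is compared against the other. A small sketch with made-up strings (the variable names are illustrative only):

```python
from statistics import median

# Hypothetical texts, not taken from the dataset
texts = ["short post", "a somewhat longer post here", "one two three"]

word_counts = [len(t.split()) for t in texts]  # whitespace-separated tokens
char_counts = [len(t) for t in texts]          # characters, including spaces

median_words = median(word_counts)
print(word_counts, char_counts, median_words)
```

In this toy example every character count exceeds the median word count, so a comparison that mixes the two units would not split the texts at the median.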


In [None]:
# To preserve the mix of long and short posts, store each post's length in a new column.
# Note: str.len() counts characters, not words.
df['length'] = df['prompt'].str.len()

# Set seed for random selection of calculated remaining posts
SEED = 42
PERCENTAGE = 0.1 # 10% of data

# Empty list where we append the remaining posts later
remaining_posts = []

# Iterate over every category and keep PERCENTAGE (10%) of the posts in each one
for category, group in df.groupby('subreddit_category'):

    # Divide the posts into two groups # TODO average length
    # NOTE: `length` is in characters while the threshold is the median *word* count,
    # so this cut is not a true median split; comparing group['word_count'] would be.
    short_posts = group[group['length'] < median_word_count]
    long_posts = group[group['length'] >= median_word_count]

    # Calculate how many short and long posts to keep
    num_short_to_keep = int(PERCENTAGE * (len(short_posts)))
    num_long_to_keep = int(PERCENTAGE * (len(long_posts)))

    # Randomly select the short and long posts
    selected_short_posts = short_posts.sample(n=num_short_to_keep, random_state=SEED)
    selected_long_posts = long_posts.sample(n=num_long_to_keep, random_state=SEED)

    # Append remaining posts
    remaining_posts.append(pd.concat([selected_short_posts, selected_long_posts]))

# Store the remaining posts in a new df
shortened_df = pd.concat(remaining_posts, ignore_index=True)

print("Number of remaining posts:", len(shortened_df))

Number of remaining posts: 11653


The sample is now much smaller: 11,653 of the original 116,722 rows, roughly 10%.
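The same kind of stratified downsampling can be sketched more compactly with `DataFrameGroupBy.sample` (available in pandas 1.1 and later). This is an illustration under assumptions, not the notebook's method: the helper name `stratified_sample` and the temporary `is_long` column are hypothetical, the split uses `word_count` rather than character length, and `frac` rounds the per-group size where the loop above truncates with `int()`, so row counts can differ slightly.

```python
import pandas as pd

def stratified_sample(df, frac, seed=42):
    """Sample `frac` of the rows within each (category, long/short) group."""
    df = df.copy()
    # Flag posts at or above the median word count (hypothetical helper column)
    df["is_long"] = df["word_count"] >= df["word_count"].median()
    sampled = df.groupby(["subreddit_category", "is_long"]).sample(
        frac=frac, random_state=seed
    )
    return sampled.drop(columns="is_long")

# Toy data: 40 posts in category "a", 20 in category "b"
toy = pd.DataFrame({
    "subreddit_category": ["a"] * 40 + ["b"] * 20,
    "word_count": list(range(60)),
})
sample = stratified_sample(toy, frac=0.5)
print(sample["subreddit_category"].value_counts())
```

In the toy data the 2:1 ratio between the two categories is the same before and after sampling.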

In [None]:
shortened_df[:5]

Unnamed: 0,prompt,subreddit_category,word_count,length
0,"SUBREDDIT: r/Advice\nTITLE: I'm 23, Moved back in with parents. How do I make new friends reddit?\nPOST: Ok, Ill give you the low down. \n\nI was fairly popular in school always surrounded by friends, but as we all left for university my friendship group separated dramatically. I took 2 gap years and worked as a Teaching assistant. All of my co-workers were older and well.. teachers so I didn't have much of a social life from there.\n\nI then went to Uni again, tonnes of friends until I had to leave due to family shit and stress. My friends are still there. \n\nI haven't been able to go and visit, as I've started a new job (last sept). I work as part of a team of 6 , in estate. My co-workers are nice but not my couple of tea. Plus they have a friendship group established, and In a commission based job, nobody is really your friend anyway.\n\nLastly, My best friend has been with me for 6 years. We made the horrible horrible horrible mistake of hooking up 3 month ago and she broke up with me last night, so she could be single.. Turns out shes back with her ex.... yep. \n\nSo. In a world where I now feel like I have no friends. How do I start to rebuild my circle?\n\nThanks,\nTL;DR:",Advice,228,1185
1,"SUBREDDIT: r/Advice\nTITLE: Always feel an overwhelming lack of confidence while doing things which I excell at and have been doing for years.\nPOST: I don't want to get into specifics, but I've been working in my field for over 6 years. I've accomplished a lot, I've been recognized as good at what I do by superiors and coworkers and I've got a lot of accomplishments and things I've done that should make me more confident and secure, but I still feel nervous, unprepared and anxious before work.\n\nI can't explain it, but I feel anxious and nervous every day before work and during the day even though I should feel comfortable and confident. I love what I do, but I always feel anxious or someone is going to tell me I'm doing a bad job, even though this has never happened.\nTL;DR:",Advice,143,784
2,"SUBREDDIT: r/Advice\nTITLE: Dealing with this break up\nPOST: Hey there,\n\nThis morning I broke up with my girlfriend after 5+ months of having a relationship. This might not be a long time, plus we live in different countries and never once touched each other during this whole relationship, yet I feel broken and she is worse. I talked to her best friend who is really mad at me(understandable of course) and she says that Lucy(my girlfriend) is really in love with me and right now she is a mess; angry, sad, frustrated, betrayed, confused and unloved.\nI get all of that and I feel like the worst human being there is for doing that to her. I still care so much for her and knowing I just kinda destroyed her made me cry all day so far.\nThe reason I broke up with her is that I no longer saw a future for us together. Not without her or me changing, and I also realized I didnt really want to change for her... \nDoes anyone have any idea how I can make this easier for her? What is the best thing I can do right now? I dont need advice for myself, I started this and I feel like I deserve to feel the way I do. But I want to make this easier for her, and I dont know how...\nTL;DR:",Advice,233,1181
3,"SUBREDDIT: r/Advice\nTITLE: Disagreement when picking rooms in the new house\nPOST: My house next year has one basement room no one wants as it's on a floor by itself and next to the kitchen. No one wants to be alone on a floor by themselves with the kitchen and living room and if people are pre drinking, coming into the kitchen you'll be in the room next door. Every room is en-suite except the 2 in the attic which share. No one wants the basement and people don't want but don't mind the attic. It's 2 guys and 8 girls in my house and we agree that 2 girls will take the attic as they work together and hang out a lot. \n \nEventually we decide to draw names and numbers out of a hat for rooms is the most fair method. One of the more vocal girls gets the basement and declares that she refuses to go into that room. She states that ""as a girl"" she doesn't want to be alone on the floor, it's scary for a girl to be there, any noise would scare you. Also the fact that she (and 2 other girls) found the house after going to a few viewings is stated as why its unfair. \n \nAs I expected, they turn to the other guy and myself, requesting we swap. Us lads should be there, it's too scary. However if the house was all girls, someone would have that room. I don't want the room for the same reason as everyone else, I don't want to be isolated and and next to the kitchen. We discuss this for a good half hour but me and the other guy don't change our minds. \n \nI'd say we're both nice guys so it's almost expected we'd take the worse room but on this instance I feel like I have to stand my ground. If we let them bully us with the majority they'll be doing it all of next year. The room isn't even near the front door so it can't be that scary and regardless of it being the 'gentlemanly' thing to do the rooms were picked fairly and it's unfair on us to expect us to swap just because we're the two guys in the house. 
Am I doing the right thing here by not swapping or am I being selfish?\nTL;DR:",Advice,398,2017
4,"SUBREDDIT: r/Advice\nTITLE: I don't know what I believe\nPOST: I don't typically think deeply randomly during the day, but I had a realization today that was a bit off-putting and it stuck with me all day. After not being able to shake it, I would like some advice.\n\nI came to the realization that I don't have any values that I hold close. I don't know what I believe in. I don't know where I reside politically, I don't have a clue. How can I be almost 20 years old and so uninformed. I feel as if everything has my attention for a limited time before I see or hear the next new thing. I can't stop, think, and internalize anything.\n\nIt's somewhat alarming because I'm starting to realize that my whole life has been this way. Is it because I'm young and impressionable? I want to have strong, defined values that I essentially demonstrate and live by every day. I want to know everything but it's not possible which is saddening.\n\nI question reality every day and constantly ponder what is true in our world. Everything is skewed. The news reports, politicians, and corporate industry? Who tells the truth? How do I know who is right? Or are we all wrong? Too many possibilities have me spinning in circles and I feel stuck.\nTL;DR:",Advice,226,1233


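Next, we clean each post by stripping the prompt scaffolding: the subreddit header, the TITLE: and POST: markers, and the trailing TL;DR:.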

In [None]:
# Function to clean up each entry
def remove_heads_and_tails(text):
    # Find the position of the relevant sections
    start = text.find("TITLE") + len("TITLE ")
    end = text.rfind("TL;DR") if "TL;DR" in text else len(text)

    # Extract the content between TITLE and TL;DR
    cleaned = text[start:end].strip()

    # Remove 'POST:' sections followed by any number of '*' characters or spaces
    cleaned = re.sub(r'POST:\s*\*+\s*', '', cleaned)  # Handles "POST: ****" and variations

    # Remove 'POST:' with no asterisks (with or without spaces)
    cleaned = re.sub(r'POST:\s*', '', cleaned)  # Handles "POST: " with or without spaces

    # Replace every run of newlines with a single space.
    # The TL;DR marker was already cut off above, so no further TL;DR handling is needed.
    cleaned = re.sub(r'\n+', ' ', cleaned)

    # Normalize whitespace (removes extra spaces between words)
    cleaned = ' '.join(cleaned.split())

    return cleaned

# Apply the cleaning function to the prompt column, then drop the columns we no longer need
shortened_df['shortened'] = shortened_df['prompt'].apply(remove_heads_and_tails)
shortened_df = shortened_df.drop(columns=['prompt', 'length'])
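To see what the cleaning does to a single entry, here is a condensed sketch run on a made-up prompt. The helper name `strip_prompt_markers` and the sample strings are hypothetical; the function approximates `remove_heads_and_tails` with string splitting rather than reproducing it line by line.

```python
import re

def strip_prompt_markers(text):
    """Keep the title and post body; drop the header, POST: marker and TL;DR tail."""
    body = text.split("TITLE:", 1)[-1]      # everything after the first "TITLE:"
    body = body.rsplit("TL;DR", 1)[0]       # everything before the last "TL;DR"
    body = re.sub(r"POST:\s*\**\s*", "", body)  # "POST:" plus any asterisks after it
    return " ".join(body.split())           # collapse newlines and repeated spaces

# Hypothetical prompt in the dataset's layout
example = (
    "SUBREDDIT: r/test\nTITLE: My title\n"
    "POST: Body line one.\n\nBody line two.\nTL;DR:"
)
print(strip_prompt_markers(example))
```

The title and the body end up in one string separated by a space, which matches the shape of the `shortened` column.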

In [None]:
shortened_df.head()

Unnamed: 0,subreddit_category,word_count,shortened
0,Advice,228,"I'm 23, Moved back in with parents. How do I make new friends reddit? Ok, Ill give you the low down. I was fairly popular in school always surrounded by friends, but as we all left for university my friendship group separated dramatically. I took 2 gap years and worked as a Teaching assistant. All of my co-workers were older and well.. teachers so I didn't have much of a social life from there. I then went to Uni again, tonnes of friends until I had to leave due to family shit and stress. My friends are still there. I haven't been able to go and visit, as I've started a new job (last sept). I work as part of a team of 6 , in estate. My co-workers are nice but not my couple of tea. Plus they have a friendship group established, and In a commission based job, nobody is really your friend anyway. Lastly, My best friend has been with me for 6 years. We made the horrible horrible horrible mistake of hooking up 3 month ago and she broke up with me last night, so she could be single.. Turns out shes back with her ex.... yep. So. In a world where I now feel like I have no friends. How do I start to rebuild my circle? Thanks,"
1,Advice,143,"Always feel an overwhelming lack of confidence while doing things which I excell at and have been doing for years. I don't want to get into specifics, but I've been working in my field for over 6 years. I've accomplished a lot, I've been recognized as good at what I do by superiors and coworkers and I've got a lot of accomplishments and things I've done that should make me more confident and secure, but I still feel nervous, unprepared and anxious before work. I can't explain it, but I feel anxious and nervous every day before work and during the day even though I should feel comfortable and confident. I love what I do, but I always feel anxious or someone is going to tell me I'm doing a bad job, even though this has never happened."
2,Advice,233,"Dealing with this break up Hey there, This morning I broke up with my girlfriend after 5+ months of having a relationship. This might not be a long time, plus we live in different countries and never once touched each other during this whole relationship, yet I feel broken and she is worse. I talked to her best friend who is really mad at me(understandable of course) and she says that Lucy(my girlfriend) is really in love with me and right now she is a mess; angry, sad, frustrated, betrayed, confused and unloved. I get all of that and I feel like the worst human being there is for doing that to her. I still care so much for her and knowing I just kinda destroyed her made me cry all day so far. The reason I broke up with her is that I no longer saw a future for us together. Not without her or me changing, and I also realized I didnt really want to change for her... Does anyone have any idea how I can make this easier for her? What is the best thing I can do right now? I dont need advice for myself, I started this and I feel like I deserve to feel the way I do. But I want to make this easier for her, and I dont know how..."
3,Advice,398,"Disagreement when picking rooms in the new house My house next year has one basement room no one wants as it's on a floor by itself and next to the kitchen. No one wants to be alone on a floor by themselves with the kitchen and living room and if people are pre drinking, coming into the kitchen you'll be in the room next door. Every room is en-suite except the 2 in the attic which share. No one wants the basement and people don't want but don't mind the attic. It's 2 guys and 8 girls in my house and we agree that 2 girls will take the attic as they work together and hang out a lot. Eventually we decide to draw names and numbers out of a hat for rooms is the most fair method. One of the more vocal girls gets the basement and declares that she refuses to go into that room. She states that ""as a girl"" she doesn't want to be alone on the floor, it's scary for a girl to be there, any noise would scare you. Also the fact that she (and 2 other girls) found the house after going to a few viewings is stated as why its unfair. As I expected, they turn to the other guy and myself, requesting we swap. Us lads should be there, it's too scary. However if the house was all girls, someone would have that room. I don't want the room for the same reason as everyone else, I don't want to be isolated and and next to the kitchen. We discuss this for a good half hour but me and the other guy don't change our minds. I'd say we're both nice guys so it's almost expected we'd take the worse room but on this instance I feel like I have to stand my ground. If we let them bully us with the majority they'll be doing it all of next year. The room isn't even near the front door so it can't be that scary and regardless of it being the 'gentlemanly' thing to do the rooms were picked fairly and it's unfair on us to expect us to swap just because we're the two guys in the house. Am I doing the right thing here by not swapping or am I being selfish?"
4,Advice,226,"I don't know what I believe I don't typically think deeply randomly during the day, but I had a realization today that was a bit off-putting and it stuck with me all day. After not being able to shake it, I would like some advice. I came to the realization that I don't have any values that I hold close. I don't know what I believe in. I don't know where I reside politically, I don't have a clue. How can I be almost 20 years old and so uninformed. I feel as if everything has my attention for a limited time before I see or hear the next new thing. I can't stop, think, and internalize anything. It's somewhat alarming because I'm starting to realize that my whole life has been this way. Is it because I'm young and impressionable? I want to have strong, defined values that I essentially demonstrate and live by every day. I want to know everything but it's not possible which is saddening. I question reality every day and constantly ponder what is true in our world. Everything is skewed. The news reports, politicians, and corporate industry? Who tells the truth? How do I know who is right? Or are we all wrong? Too many possibilities have me spinning in circles and I feel stuck."


In [None]:
# spacy.cli.download("en_core_web_sm")
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner", "tagger", "textcat"])

def clean_text_spacy(text):
    doc = nlp(text)  # Process the text
    cleaned_tokens = [
        token.text.lower()  # Convert to lowercase
        for token in doc
    ]
    return ' '.join(cleaned_tokens)

tqdm.pandas(desc="Processing texts")

shortened_df['preprocessed'] = shortened_df['shortened'].progress_apply(clean_text_spacy)

preprocessed_df = shortened_df[['subreddit_category', 'word_count', 'preprocessed']]

Processing texts:   0%|          | 0/11653 [00:00<?, ?it/s]
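Since only tokenization and lowercasing are needed here, a lighter variant is possible. The sketch below assumes spaCy is installed and uses `spacy.blank("en")`, which builds a tokenizer-only English pipeline without a trained model, together with `nlp.pipe` to process texts in batches. The helper name `lowercase_tokens` is hypothetical, and I have not checked that its output is identical to `en_core_web_sm` on every input.

```python
import spacy

# Tokenizer-only English pipeline; no trained model download is required
nlp = spacy.blank("en")

def lowercase_tokens(texts):
    """Tokenize each text and join the lowercased tokens with single spaces."""
    return [
        " ".join(token.text.lower() for token in doc)
        for doc in nlp.pipe(texts)  # streams the texts through the pipeline
    ]

# Hypothetical sample sentence
sample = ["I'm 23, Moved back in with parents."]
print(lowercase_tokens(sample))
```

In the notebook this could be applied as `lowercase_tokens(shortened_df['shortened'])`, which avoids calling the pipeline once per row.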



In [None]:
preprocessed_df.head()

Unnamed: 0,subreddit_category,word_count,preprocessed
0,Advice,228,"i 'm 23 , moved back in with parents . how do i make new friends reddit ? ok , ill give you the low down . i was fairly popular in school always surrounded by friends , but as we all left for university my friendship group separated dramatically . i took 2 gap years and worked as a teaching assistant . all of my co - workers were older and well .. teachers so i did n't have much of a social life from there . i then went to uni again , tonnes of friends until i had to leave due to family shit and stress . my friends are still there . i have n't been able to go and visit , as i 've started a new job ( last sept ) . i work as part of a team of 6 , in estate . my co - workers are nice but not my couple of tea . plus they have a friendship group established , and in a commission based job , nobody is really your friend anyway . lastly , my best friend has been with me for 6 years . we made the horrible horrible horrible mistake of hooking up 3 month ago and she broke up with me last night , so she could be single .. turns out she s back with her ex .... yep . so . in a world where i now feel like i have no friends . how do i start to rebuild my circle ? thanks ,"
1,Advice,143,"always feel an overwhelming lack of confidence while doing things which i excell at and have been doing for years . i do n't want to get into specifics , but i 've been working in my field for over 6 years . i 've accomplished a lot , i 've been recognized as good at what i do by superiors and coworkers and i 've got a lot of accomplishments and things i 've done that should make me more confident and secure , but i still feel nervous , unprepared and anxious before work . i ca n't explain it , but i feel anxious and nervous every day before work and during the day even though i should feel comfortable and confident . i love what i do , but i always feel anxious or someone is going to tell me i 'm doing a bad job , even though this has never happened ."
2,Advice,233,"dealing with this break up hey there , this morning i broke up with my girlfriend after 5 + months of having a relationship . this might not be a long time , plus we live in different countries and never once touched each other during this whole relationship , yet i feel broken and she is worse . i talked to her best friend who is really mad at me(understandable of course ) and she says that lucy(my girlfriend ) is really in love with me and right now she is a mess ; angry , sad , frustrated , betrayed , confused and unloved . i get all of that and i feel like the worst human being there is for doing that to her . i still care so much for her and knowing i just kinda destroyed her made me cry all day so far . the reason i broke up with her is that i no longer saw a future for us together . not without her or me changing , and i also realized i did nt really want to change for her ... does anyone have any idea how i can make this easier for her ? what is the best thing i can do right now ? i do nt need advice for myself , i started this and i feel like i deserve to feel the way i do . but i want to make this easier for her , and i do nt know how ..."
3,Advice,398,"disagreement when picking rooms in the new house my house next year has one basement room no one wants as it 's on a floor by itself and next to the kitchen . no one wants to be alone on a floor by themselves with the kitchen and living room and if people are pre drinking , coming into the kitchen you 'll be in the room next door . every room is en - suite except the 2 in the attic which share . no one wants the basement and people do n't want but do n't mind the attic . it 's 2 guys and 8 girls in my house and we agree that 2 girls will take the attic as they work together and hang out a lot . eventually we decide to draw names and numbers out of a hat for rooms is the most fair method . one of the more vocal girls gets the basement and declares that she refuses to go into that room . she states that "" as a girl "" she does n't want to be alone on the floor , it 's scary for a girl to be there , any noise would scare you . also the fact that she ( and 2 other girls ) found the house after going to a few viewings is stated as why its unfair . as i expected , they turn to the other guy and myself , requesting we swap . us lads should be there , it 's too scary . however if the house was all girls , someone would have that room . i do n't want the room for the same reason as everyone else , i do n't want to be isolated and and next to the kitchen . we discuss this for a good half hour but me and the other guy do n't change our minds . i 'd say we 're both nice guys so it 's almost expected we 'd take the worse room but on this instance i feel like i have to stand my ground . if we let them bully us with the majority they 'll be doing it all of next year . the room is n't even near the front door so it ca n't be that scary and regardless of it being the ' gentlemanly ' thing to do the rooms were picked fairly and it 's unfair on us to expect us to swap just because we 're the two guys in the house . 
am i doing the right thing here by not swapping or am i being selfish ?"
4,Advice,226,"i do n't know what i believe i do n't typically think deeply randomly during the day , but i had a realization today that was a bit off - putting and it stuck with me all day . after not being able to shake it , i would like some advice . i came to the realization that i do n't have any values that i hold close . i do n't know what i believe in . i do n't know where i reside politically , i do n't have a clue . how can i be almost 20 years old and so uninformed . i feel as if everything has my attention for a limited time before i see or hear the next new thing . i ca n't stop , think , and internalize anything . it 's somewhat alarming because i 'm starting to realize that my whole life has been this way . is it because i 'm young and impressionable ? i want to have strong , defined values that i essentially demonstrate and live by every day . i want to know everything but it 's not possible which is saddening . i question reality every day and constantly ponder what is true in our world . everything is skewed . the news reports , politicians , and corporate industry ? who tells the truth ? how do i know who is right ? or are we all wrong ? too many possibilities have me spinning in circles and i feel stuck ."
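
Note that the `word_count` column is carried over unchanged from `shortened_df`, while the `preprocessed` text now has contractions split ("do n't") and punctuation detached (" ."). Splitting the preprocessed text on whitespace will therefore usually give a larger number than `word_count`. The sketch below illustrates this with one sentence taken from row 4; it assumes `word_count` was computed on the text before tokenization, which this part of the notebook does not confirm.

```python
# One sentence from row 4, as it appears before and after preprocessing
original = "I don't know what I believe in."
preprocessed = "i do n't know what i believe in ."

# Whitespace splitting treats "n't" and "." as separate items after preprocessing
original_words = original.split()
preprocessed_tokens = preprocessed.split()

print(f"words before preprocessing: {len(original_words)}")
print(f"tokens after preprocessing: {len(preprocessed_tokens)}")
```

If a later step needs token counts that match the model input, it may be safer to recompute them from `preprocessed` than to reuse `word_count`.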


In [None]:
file_path = f'..\\resources\\preprocessed_df_{PERCENTAGE:.0%}_data.tsv'

try:
    preprocessed_df.to_csv(file_path, index=False, sep="\t")
    print(f"File saved successfully: {file_path}")
except Exception as e:
    print(f"Error saving file: {e}")

File saved successfully: ..\resources\preprocessed_df_10%_data.tsv
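
The path above uses Windows-style backslashes. If the notebook runs on Colab (which is Linux-based, as the `google.colab` import in the next cell suggests), backslashes are not path separators, so the file most likely ends up in the current directory with the backslashes as part of its name. A minimal sketch of a portable alternative using `pathlib` follows; the helper name `build_output_path`, the temporary base directory, and the value `PERCENTAGE = 0.10` are assumptions made for illustration.

```python
from pathlib import Path
import tempfile

PERCENTAGE = 0.10  # assumed value; matches the "10%" in the saved file name


def build_output_path(base_dir, stem, percentage):
    """Return <base_dir>/resources/<stem>_<pct>_data.tsv, creating the folder if needed."""
    out_dir = Path(base_dir) / "resources"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{stem}_{percentage:.0%}_data.tsv"


# A temporary directory stands in for the project root in this sketch
base = tempfile.mkdtemp()
portable_path = build_output_path(base, "preprocessed_df", PERCENTAGE)
print(portable_path.name)
```

`pandas.DataFrame.to_csv` accepts a `Path` object, so the result could be passed in place of the string path.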


In [None]:
# Download the preprocessed file from Colab
from google.colab import files
files.download(file_path)


Okay! Now we can run the preprocessed posts through the `s-nlp/roberta_toxicity_classifier` model.

In [None]:
try:
    processed_df = pd.read_csv(file_path, sep="\t")
except Exception as e:
    print(f"Error loading file: {e}")

In [None]:
processed_df[:2]

Unnamed: 0,subreddit_category,word_count,preprocessed
0,Advice,228,"i 'm 23 , moved back in with parents . how do i make new friends reddit ? ok , ill give you the low down . i was fairly popular in school always surrounded by friends , but as we all left for university my friendship group separated dramatically . i took 2 gap years and worked as a teaching assistant . all of my co - workers were older and well .. teachers so i did n't have much of a social life from there . i then went to uni again , tonnes of friends until i had to leave due to family shit and stress . my friends are still there . i have n't been able to go and visit , as i 've started a new job ( last sept ) . i work as part of a team of 6 , in estate . my co - workers are nice but not my couple of tea . plus they have a friendship group established , and in a commission based job , nobody is really your friend anyway . lastly , my best friend has been with me for 6 years . we made the horrible horrible horrible mistake of hooking up 3 month ago and she broke up with me last night , so she could be single .. turns out she s back with her ex .... yep . so . in a world where i now feel like i have no friends . how do i start to rebuild my circle ? thanks ,"
1,Advice,143,"always feel an overwhelming lack of confidence while doing things which i excell at and have been doing for years . i do n't want to get into specifics , but i 've been working in my field for over 6 years . i 've accomplished a lot , i 've been recognized as good at what i do by superiors and coworkers and i 've got a lot of accomplishments and things i 've done that should make me more confident and secure , but i still feel nervous , unprepared and anxious before work . i ca n't explain it , but i feel anxious and nervous every day before work and during the day even though i should feel comfortable and confident . i love what i do , but i always feel anxious or someone is going to tell me i 'm doing a bad job , even though this has never happened ."


In [None]:
# Load the tokenizer and model
tokenizer = RobertaTokenizer.from_pretrained('s-nlp/roberta_toxicity_classifier')
model = RobertaForSequenceClassification.from_pretrained('s-nlp/roberta_toxicity_classifier')

# Move the model to the GPU once (if available) instead of on every call,
# then set it to evaluation mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

# Function to classify toxicity
def classify_toxicity(text):
    # Tokenize the input text, truncating to at most 512 tokens
    batch = tokenizer.encode(text, return_tensors="pt", truncation=True, max_length=512)

    # Move inputs to the same device as the model
    batch = batch.to(device)

    # Get the model output without tracking gradients
    with torch.no_grad():
        output = model(batch)

    # Apply sigmoid to the logits and take the index of the higher-scoring class
    predicted_label = torch.sigmoid(output.logits).argmax().item()

    return predicted_label

# Use tqdm to display a progress bar
tqdm.pandas(desc="Processing toxicity labels")

#Apply the toxicity classifier to each cleaned content post with a progress bar
processed_df['toxicity_label'] = processed_df['preprocessed'].progress_apply(classify_toxicity)

# Keep the 'subreddit_category', 'word_count', 'preprocessed', and 'toxicity_label' columns
final_df = processed_df[['subreddit_category', 'word_count', 'preprocessed', 'toxicity_label']]

processed_data_path = f'..\\resources\\toxicity_analysis_{PERCENTAGE:.0%}_data.tsv'

try:
    final_df.to_csv(processed_data_path, index=False, sep="\t")
    print(f"File saved successfully: {processed_data_path}")
except Exception as e:
    print(f"Error saving file: {e}")

Some weights of the model checkpoint at s-nlp/roberta_toxicity_classifier were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Processing toxicity labels:   0%|          | 0/11653 [00:00<?, ?it/s]

File saved successfully: ..\resources\toxicity_analysis_10%_data.tsv

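A note on the prediction step in the cell above: `classify_toxicity` applies a sigmoid to the two logits and then takes the argmax. Because the sigmoid is strictly increasing, it never changes which logit is largest, so the predicted label is the same as taking the argmax of the raw logits. The sketch below checks this with plain Python; the logit values are made up for illustration and do not come from the model.

```python
import math


def sigmoid(x):
    """Logistic function, strictly increasing in x."""
    return 1.0 / (1.0 + math.exp(-x))


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical two-class logits (not real model output)
example_logits = [[2.3, -1.7], [-0.4, 0.9], [0.1, 0.05]]

labels_from_logits = [argmax(logits) for logits in example_logits]
labels_from_sigmoid = [argmax([sigmoid(v) for v in logits]) for logits in example_logits]

print(labels_from_logits)
print(labels_from_sigmoid)
```

The sigmoid only matters if the scores themselves are kept, for example to apply a custom threshold instead of a plain argmax.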

In [None]:
# Count how many posts fall into each toxicity class
value_counts = final_df['toxicity_label'].value_counts()
print(value_counts)

toxicity_label
0    11165
1      488
Name: count, dtype: int64
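
To put these counts in proportion, the short sketch below computes the share of posts with label 1. It copies the two counts from the output above and assumes that label 1 denotes the toxic class and label 0 the neutral class, which is how the classifier's model card describes the outputs; that mapping is worth verifying before drawing conclusions.

```python
# Counts copied from the value_counts() output above
label_counts = {0: 11165, 1: 488}

total = sum(label_counts.values())
toxic_share = label_counts[1] / total  # assumes label 1 means "toxic"

print(f"total posts: {total}")
print(f"share with label 1: {toxic_share:.2%}")
```

With classes this imbalanced, overall accuracy would say little on its own, so any later evaluation should probably look at the label-1 posts separately.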


In [None]:
final_df.head()

Unnamed: 0,subreddit_category,word_count,preprocessed,toxicity_label
0,Advice,228,"i 'm 23 , moved back in with parents . how do i make new friends reddit ? ok , ill give you the low down . i was fairly popular in school always surrounded by friends , but as we all left for university my friendship group separated dramatically . i took 2 gap years and worked as a teaching assistant . all of my co - workers were older and well .. teachers so i did n't have much of a social life from there . i then went to uni again , tonnes of friends until i had to leave due to family shit and stress . my friends are still there . i have n't been able to go and visit , as i 've started a new job ( last sept ) . i work as part of a team of 6 , in estate . my co - workers are nice but not my couple of tea . plus they have a friendship group established , and in a commission based job , nobody is really your friend anyway . lastly , my best friend has been with me for 6 years . we made the horrible horrible horrible mistake of hooking up 3 month ago and she broke up with me last night , so she could be single .. turns out she s back with her ex .... yep . so . in a world where i now feel like i have no friends . how do i start to rebuild my circle ? thanks ,",0
1,Advice,143,"always feel an overwhelming lack of confidence while doing things which i excell at and have been doing for years . i do n't want to get into specifics , but i 've been working in my field for over 6 years . i 've accomplished a lot , i 've been recognized as good at what i do by superiors and coworkers and i 've got a lot of accomplishments and things i 've done that should make me more confident and secure , but i still feel nervous , unprepared and anxious before work . i ca n't explain it , but i feel anxious and nervous every day before work and during the day even though i should feel comfortable and confident . i love what i do , but i always feel anxious or someone is going to tell me i 'm doing a bad job , even though this has never happened .",0
2,Advice,233,"dealing with this break up hey there , this morning i broke up with my girlfriend after 5 + months of having a relationship . this might not be a long time , plus we live in different countries and never once touched each other during this whole relationship , yet i feel broken and she is worse . i talked to her best friend who is really mad at me(understandable of course ) and she says that lucy(my girlfriend ) is really in love with me and right now she is a mess ; angry , sad , frustrated , betrayed , confused and unloved . i get all of that and i feel like the worst human being there is for doing that to her . i still care so much for her and knowing i just kinda destroyed her made me cry all day so far . the reason i broke up with her is that i no longer saw a future for us together . not without her or me changing , and i also realized i did nt really want to change for her ... does anyone have any idea how i can make this easier for her ? what is the best thing i can do right now ? i do nt need advice for myself , i started this and i feel like i deserve to feel the way i do . but i want to make this easier for her , and i do nt know how ...",0
3,Advice,398,"disagreement when picking rooms in the new house my house next year has one basement room no one wants as it 's on a floor by itself and next to the kitchen . no one wants to be alone on a floor by themselves with the kitchen and living room and if people are pre drinking , coming into the kitchen you 'll be in the room next door . every room is en - suite except the 2 in the attic which share . no one wants the basement and people do n't want but do n't mind the attic . it 's 2 guys and 8 girls in my house and we agree that 2 girls will take the attic as they work together and hang out a lot . eventually we decide to draw names and numbers out of a hat for rooms is the most fair method . one of the more vocal girls gets the basement and declares that she refuses to go into that room . she states that "" as a girl "" she does n't want to be alone on the floor , it 's scary for a girl to be there , any noise would scare you . also the fact that she ( and 2 other girls ) found the house after going to a few viewings is stated as why its unfair . as i expected , they turn to the other guy and myself , requesting we swap . us lads should be there , it 's too scary . however if the house was all girls , someone would have that room . i do n't want the room for the same reason as everyone else , i do n't want to be isolated and and next to the kitchen . we discuss this for a good half hour but me and the other guy do n't change our minds . i 'd say we 're both nice guys so it 's almost expected we 'd take the worse room but on this instance i feel like i have to stand my ground . if we let them bully us with the majority they 'll be doing it all of next year . the room is n't even near the front door so it ca n't be that scary and regardless of it being the ' gentlemanly ' thing to do the rooms were picked fairly and it 's unfair on us to expect us to swap just because we 're the two guys in the house . 
am i doing the right thing here by not swapping or am i being selfish ?",0
4,Advice,226,"i do n't know what i believe i do n't typically think deeply randomly during the day , but i had a realization today that was a bit off - putting and it stuck with me all day . after not being able to shake it , i would like some advice . i came to the realization that i do n't have any values that i hold close . i do n't know what i believe in . i do n't know where i reside politically , i do n't have a clue . how can i be almost 20 years old and so uninformed . i feel as if everything has my attention for a limited time before i see or hear the next new thing . i ca n't stop , think , and internalize anything . it 's somewhat alarming because i 'm starting to realize that my whole life has been this way . is it because i 'm young and impressionable ? i want to have strong , defined values that i essentially demonstrate and live by every day . i want to know everything but it 's not possible which is saddening . i question reality every day and constantly ponder what is true in our world . everything is skewed . the news reports , politicians , and corporate industry ? who tells the truth ? how do i know who is right ? or are we all wrong ? too many possibilities have me spinning in circles and i feel stuck .",0


In [None]:
# Download the toxicity-labelled file from Colab
from google.colab import files
files.download(processed_data_path)
