# Working 9 to 5: Themes Behind Antiwork Sentiments on Reddit

### Research Question
What factors influence Redditors' decisions to reject traditional employment, and what are the major themes in the subreddit r/Antiwork?

### Background
The past few years have changed the way we work and shifted what people expect from their workplaces, resulting in changing sentiments among workers. These sentiments came to the forefront during the pandemic: people quit their jobs, large labor strikes took place across different industries, and union activity and support rose.[^1]

### Data
Reddit is the social media platform used for this study, leveraging an existing Social Grep dataset made up of two files (posts and comments) covering activity on r/Antiwork up to February 18, 2022.[^2]

### References
[^1]: Goldberg, E. (2023, October). On the Future of Work, a Reporter Looks Back. *New York Times*. https://www.nytimes.com/2023/10/08/insider/future-of-work-reporter.html?smid=url-share <br/>
[^2]: Social Grep (2022, March). The /r/Antiwork Subreddit Dataset. *Social Grep*. https://socialgrep.com/datasets/the-antiwork-subreddit-dataset?utm_source=dataworld&utm_medium=link&utm_campaign=theantiworksubredditdataset


## Initial Data Exploration
I start by importing the dataset and checking what data is available.

Columns that I'll keep:
- id
- created_utc
- permalink
- url
- selftext
- title
- score

Redundant columns, which aren't needed in this initial review because they don't provide any helpful information:
- subreddit.name ➡️ `antiwork` is the only value
- subreddit.nsfw ➡️ `False` is the only value
- subreddit.id ➡️ `2y77d` is the only value
- type ➡️ `post` is the only value
- domain ➡️ doesn't appear to provide much utility
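
The single-value check behind this list can be sketched with pandas as follows. This is a minimal illustration on a toy DataFrame (the rows and the names `sample`, `constant_cols`, and `trimmed` are made up for the example, not taken from the dataset); it assumes a column counts as redundant when it holds exactly one distinct value.

```python
import pandas as pd

# Toy stand-in for the posts table (values are illustrative, not from the dataset).
sample = pd.DataFrame({
    "id": ["a1", "b2", "c3"],
    "type": ["post", "post", "post"],
    "subreddit.name": ["antiwork", "antiwork", "antiwork"],
    "subreddit.nsfw": [False, False, False],
    "score": [15, 1887, 4],
})

# A column is constant when it holds exactly one distinct value (NaN included).
unique_counts = sample.nunique(dropna=False)
constant_cols = unique_counts[unique_counts == 1].index.tolist()

# Drop the constant columns; the remaining ones carry actual information.
trimmed = sample.drop(columns=constant_cols)

print(constant_cols)
print(trimmed.columns.tolist())
```

A column such as `domain` would not be caught by this check, since it has many distinct values; deciding whether it is useful still takes a manual look.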

In [2]:
import pandas as pd

df = pd.read_csv('/Users/ingridarreola/Desktop/Grad School - Data Science/Z639 - Social Media Mining/Ingrid Arreola - Paper 3/Data/antiwork-subreddit-dataset-posts.csv')

df.head(5)

| | type | id | subreddit.id | subreddit.name | subreddit.nsfw | created_utc | permalink | domain | url | selftext | title | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | post | svw6x3 | 2y77d | antiwork | False | 1645228719 | https://old.reddit.com/r/antiwork/comments/svw... | self.antiwork | | I was hired at the \*\*Neon Museum\*\* as a tour g... | Neon Museum Las Vegas took away our tips | 15 |
| 1 | post | svw6jv | 2y77d | antiwork | False | 1645228687 | https://old.reddit.com/r/antiwork/comments/svw... | i.redd.it | https://i.redd.it/vuoctaq0koi81.png | | Working | 1887 |
| 2 | post | svw5e8 | 2y77d | antiwork | False | 1645228588 | https://old.reddit.com/r/antiwork/comments/svw... | self.antiwork | | So, I'm quite new to the jobs front then most ... | Kind of feel like screaming into the cyberspace | 4 |
| 3 | post | svw498 | 2y77d | antiwork | False | 1645228495 | https://old.reddit.com/r/antiwork/comments/svw... | i.redd.it | https://i.redd.it/1w1unxjfjoi81.png | | Democracy is a lie, especially in the modern w... | 14060 |
| 4 | post | svw3qt | 2y77d | antiwork | False | 1645228450 | https://old.reddit.com/r/antiwork/comments/svw... | self.antiwork | | My boss asked me today what I plan on doing wh... | Master's Degree - No Pay Raise but OT | 63 |
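
The `created_utc` values in this preview look like Unix timestamps. A small sketch of converting one to a readable date, assuming the column stores epoch seconds in UTC (the column name suggests this, but the source does not confirm it); `epoch_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds: int) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# created_utc of the first row shown in the preview.
first_post = epoch_to_utc(1645228719)
print(first_post.isoformat())  # 2022-02-18T23:58:39+00:00
```

Under that assumption, the newest post falls on February 18, 2022, which matches the dataset's stated cutoff. For the whole column, `pd.to_datetime(df['created_utc'], unit='s', utc=True)` should give the same conversion in one call.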


Reviewing the `domain` column to see whether it would be helpful for the project. Notes below:
- There are 5126 unique domains
- The domains don't fall into obvious groups, so the column isn't usable in a clear way

In [15]:
from collections import Counter

domain_list = df['domain'].unique()
# print(len(domain_list))      # 5126 unique domains

# Loop through every unique value in the domain column
# for item in domain_list:
#     print(item)

# Note: domain_list holds each domain only once, so every count below is 1.
# Counting the full column (df['domain']) would give the number of posts per domain.
counter = Counter(domain_list)

# Print each domain with its count
for domain, count in counter.items():
    print(f"{domain}: {count} occurrences")


self.antiwork: 1 occurrences
i.redd.it: 1 occurrences
youtu.be: 1 occurrences
businessinsider.com: 1 occurrences
reddit.com: 1 occurrences
nypost.com: 1 occurrences
i.imgur.com: 1 occurrences
singularityhub.com: 1 occurrences
livingwage.mit.edu: 1 occurrences
studypool.com: 1 occurrences
nytimes.com: 1 occurrences
youtube.com: 1 occurrences
jacobinmag.com: 1 occurrences
slate.com: 1 occurrences
washingtonpost.com: 1 occurrences
9to5mac.com: 1 occurrences
old.reddit.com: 1 occurrences
gobankingrates.com: 1 occurrences
cnn.com: 1 occurrences
polygon.com: 1 occurrences
twitter.com: 1 occurrences
docs.google.com: 1 occurrences
abcnews.go.com: 1 occurrences
cynicusrex.com: 1 occurrences
imgur.com: 1 occurrences
bloomberg.com: 1 occurrences
military.com: 1 occurrences
itismycar.com: 1 occurrences
web.archive.org: 1 occurrences
v.redd.it: 1 occurrences
doletown.com: 1 occurrences
wwwsaleemcompk.blogspot.com: 1 occurrences
npr.org: 1 occurrences
preppykitchen.com: 1 occurrences
scontent.ffxe1-
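
Every domain in the output above reports one occurrence because the counter is built from `df['domain'].unique()`, which already contains each domain exactly once. The sketch below contrasts the two approaches on a hypothetical list of domains (the list and its frequencies are invented for illustration).

```python
from collections import Counter

# Hypothetical sample of the 'domain' column, with repeats as in the full data.
domains = ["self.antiwork", "i.redd.it", "self.antiwork", "youtu.be",
           "i.redd.it", "self.antiwork"]

# Counting the de-duplicated values always yields 1 per domain.
counts_of_unique = Counter(set(domains))

# Counting the full column yields the number of posts per domain.
counts_of_all = Counter(domains)

# most_common() lists domains from most to least frequent.
for domain, count in counts_of_all.most_common():
    print(f"{domain}: {count} occurrences")
```

On the real data, `Counter(df['domain'])` or `df['domain'].value_counts()` would produce per-domain post counts, which may make it easier to judge whether a few domains dominate.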

The initial review provided helpful information and eliminated columns that probably won't be useful for this project. Columns that I'll keep: id, created_utc, permalink, url, selftext, title, score


In [5]:
# print(df['score'].sort_values())

# Find the 100 highest scores, then pull the full rows at those index labels
top_100_scores = df['score'].nlargest(100)
top_100_rows = df.loc[top_100_scores.index]
print(top_100_rows)

        type      id subreddit.id subreddit.name  subreddit.nsfw  created_utc  \
180988  post  q82vqk        2y77d       antiwork           False   1634227843   
179838  post  q9dwp6        2y77d       antiwork           False   1634397131   
121437  post  r5tn55        2y77d       antiwork           False   1638296382   
180024  post  q972uf        2y77d       antiwork           False   1634368711   
31263   post  scbw37        2y77d       antiwork           False   1643111225   
...      ...     ...          ...            ...             ...          ...   
194609  post  pazoxk        2y77d       antiwork           False   1629850043   
41263   post  s7xsfw        2y77d       antiwork           False   1642619406   
97841   post  rgcawn        2y77d       antiwork           False   1639502351   
119647  post  r6o5bl        2y77d       antiwork           False   1638390095   
128263  post  r1lpxi        2y77d       antiwork           False   1637807500   

                           

In [6]:
top_100_rows.to_csv('/Users/ingridarreola/Downloads/top_100.csv')
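
One possible way to combine the steps above is to keep only the columns of interest before ranking and exporting. The sketch below runs on a toy DataFrame (its rows and the names `posts`, `keep_cols`, and `top_posts` are illustrative assumptions) and returns the CSV as a string instead of writing a file.

```python
import pandas as pd

# Toy stand-in for the posts table (values are illustrative only).
posts = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4"],
    "title": ["first", "second", "third", "fourth"],
    "type": ["post"] * 4,
    "score": [15, 1887, 4, 14060],
})

keep_cols = ["id", "title", "score"]  # subset of the columns kept in this review

# DataFrame.nlargest returns the n highest-scoring rows, sorted descending.
top_posts = posts[keep_cols].nlargest(2, "score")

# index=False leaves the row index out of the CSV, so re-reading the file
# does not create an extra unnamed column. With no path, to_csv returns a string.
csv_text = top_posts.to_csv(index=False)
print(csv_text)
```

With the real data, passing a file path as the first argument to `to_csv` would write the same content to disk.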