This is a data analysis of Reddit posts pertaining to the effects of AI on the job market. We start by importing the libraries that we will need.

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import re

from collections import Counter

Next, we import our dataset into Pandas for processing.

In [5]:
df = pd.read_csv('ai_automation_job_market_data200_edit.csv')

pd.set_option('display.max_columns', None) # Avoid truncation

Now it's time for initial data exploration.

In [7]:
df

Unnamed: 0,Title,Selftext,Score,Number of Comments,Created UTC,Unnamed: 5
0,CMV: People who think that most jobs are going...,I have been lucky enough to have a career path...,705,511.0,1.708275e+09,
1,Is this the last presidential election in US h...,[Uber is partnering with Chinese firm BYD to b...,406,230.0,1.727364e+09,
2,What will the stock market do if ai creates gr...,I know nobody has a crystal ball and no one ca...,36,166.0,1.726364e+09,
3,"I'm Nick Kolakowski, Senior Editor at Dice. AM...","Hi! I’m Nick Kolakowski, the Senior Editor of ...",102,143.0,1.718886e+09,
4,The Elephant In the Room with AI art and AI Au...,"\n\nAs I see it, the main issue with AI art a...",165,334.0,1.672166e+09,
...,...,...,...,...,...,...
173,What does the future of our work look like in ...,I'm sure we've all been thinking a lot lately ...,18,79.0,1.728255e+09,
174,"Blackmen, is this dystopian reality possible?",Is this dystopian reality possible?\n\n\t??Mas...,10,42.0,1.739900e+09,
175,Learnings from my Experience in USA: [BTech ->...,**TLDR:**\n\n1. US immigration and job landsca...,128,40.0,1.733100e+09,
176,The 15 Best (Free to Use) AI Tools for Creatin...,While we wait for ChatGPT to roll out its own ...,458,64.0,1.695663e+09,


The first thing we notice is an unnecessary column, Unnamed: 5, that holds only NaN ("Not a Number", the marker pandas uses for missing values) in the rows displayed. In addition, the Created UTC column stores Unix timestamps (seconds since 1970-01-01 UTC). That form is convenient for data processing but not easily human-readable, so it should be displayed as a date when presenting results.
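Before dropping the column, it can be worth confirming that it really is empty in every row and not just in the rows displayed. The following is a minimal sketch on a made-up two-row frame, not on the real dataset; the names sample, empty_cols, and cleaned are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the loaded data: one real column and one all-NaN column
sample = pd.DataFrame({"Score": [705, 406], "Unnamed: 5": [np.nan, np.nan]})

# A column is safe to drop only if every value in it is missing
empty_cols = [col for col in sample.columns if sample[col].isna().all()]
print(empty_cols)

# Drop the confirmed-empty columns and keep the rest
cleaned = sample.drop(columns=empty_cols)
print(cleaned.columns.tolist())
```

If the check returns an empty list for a column that looked empty in the preview, some rows contain data and dropping it would lose information.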

In [9]:
df = df.drop('Unnamed: 5', axis=1)

In [10]:
df

Unnamed: 0,Title,Selftext,Score,Number of Comments,Created UTC
0,CMV: People who think that most jobs are going...,I have been lucky enough to have a career path...,705,511.0,1.708275e+09
1,Is this the last presidential election in US h...,[Uber is partnering with Chinese firm BYD to b...,406,230.0,1.727364e+09
2,What will the stock market do if ai creates gr...,I know nobody has a crystal ball and no one ca...,36,166.0,1.726364e+09
3,"I'm Nick Kolakowski, Senior Editor at Dice. AM...","Hi! I’m Nick Kolakowski, the Senior Editor of ...",102,143.0,1.718886e+09
4,The Elephant In the Room with AI art and AI Au...,"\n\nAs I see it, the main issue with AI art a...",165,334.0,1.672166e+09
...,...,...,...,...,...
173,What does the future of our work look like in ...,I'm sure we've all been thinking a lot lately ...,18,79.0,1.728255e+09
174,"Blackmen, is this dystopian reality possible?",Is this dystopian reality possible?\n\n\t??Mas...,10,42.0,1.739900e+09
175,Learnings from my Experience in USA: [BTech ->...,**TLDR:**\n\n1. US immigration and job landsca...,128,40.0,1.733100e+09
176,The 15 Best (Free to Use) AI Tools for Creatin...,While we wait for ChatGPT to roll out its own ...,458,64.0,1.695663e+09
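
The cells above drop the empty column but leave Created UTC as raw epoch seconds. One possible way to add a readable date while keeping the raw value for processing is sketched below. It runs on a made-up frame that reuses two of the rounded timestamps displayed above, and the new column name Created is an assumption, not part of the dataset.

```python
import pandas as pd

# Made-up frame reusing two of the (rounded) timestamps shown in the output above
sample = pd.DataFrame({"Created UTC": [1.708275e+09, 1.727364e+09]})

# Interpret the floats as seconds since the Unix epoch and mark them as UTC.
# The raw column is kept; 'Created' is a hypothetical name for the readable copy.
sample["Created"] = pd.to_datetime(sample["Created UTC"], unit="s", utc=True)

print(sample)
```

Keeping both columns means later numeric work can still use the epoch values, while tables and plots can use the datetime column.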


Having cleaned our dataset, the next thing to do is check the frequency of specific key terms in our post data, both the titles and main text. We will load our terms from an external file to make this easier to maintain.

Example reference: [GeeksForGeeks Python Read Text File into List or Array - Using List Comprehension](https://www.geeksforgeeks.org/python-read-text-file-into-list-or-array/#using-list-comprehension) for guidance on loading lines from an external text file.

In [12]:
# Load one key term per line, stripping the newline and any surrounding whitespace
with open('terms.txt', 'r') as file:
    key_terms = [line.strip() for line in file]

print(key_terms)

['AI', 'artificial intelligence', 'LLM', 'automation', 'impact', 'unemployment', 'risk', 'loss', 'layoff', 'fired', 'redundant', 'struggling', 'obsolete', 'retrain', 'stress', 'concerned', 'scared', 'frustrated', 'hopeless', 'overwhelmed']


This function is largely hand-written. An attempt to generate an example with [Anaconda Assistant](https://www.anaconda.com/capability/anaconda-assistant) produced a function with too many parameters, redundant comments, and variable names that did not fit the conventions used elsewhere in this notebook, so the cell with AI-generated code was removed and replaced with the hand-written function below. Only the trivial code that lower-cases every string was reused from the generated code. Anaconda Assistant has been turned off for this notebook and will not be used for anything else. The function is reused for both titles and post bodies.

In [14]:
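def string_frequency(column='Title'):
    """Count how often each key term appears as a whitespace-delimited word in df[column]."""
    # Lower-case the text only; the entries in key_terms are compared exactly as loaded
    strings_lower = df[column].str.lower()

    # Start every term at zero so that terms without matches still appear in the result
    frequencies = {term: 0 for term in key_terms}

    for string in strings_lower:
        split_string = string.split() # split once per loop for efficiency
        for word in split_string:
            for term in key_terms:
                if word == term:
                    frequencies[term] += 1

    # One row per term, with its count in a single column
    return pd.DataFrame.from_dict(frequencies, orient='index')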

Verify the columns that will have their text processed.

In [16]:
df["Title"]

0      CMV: People who think that most jobs are going...
1      Is this the last presidential election in US h...
2      What will the stock market do if ai creates gr...
3      I'm Nick Kolakowski, Senior Editor at Dice. AM...
4      The Elephant In the Room with AI art and AI Au...
                             ...                        
173    What does the future of our work look like in ...
174        Blackmen, is this dystopian reality possible?
175    Learnings from my Experience in USA: [BTech ->...
176    The 15 Best (Free to Use) AI Tools for Creatin...
177                       Been Unemployed for 1 Year! ?槓
Name: Title, Length: 178, dtype: object

In [17]:
df["Selftext"]

0      I have been lucky enough to have a career path...
1      [Uber is partnering with Chinese firm BYD to b...
2      I know nobody has a crystal ball and no one ca...
3      Hi! I’m Nick Kolakowski, the Senior Editor of ...
4       \n\nAs I see it, the main issue with AI art a...
                             ...                        
173    I'm sure we've all been thinking a lot lately ...
174    Is this dystopian reality possible?\n\n\t??Mas...
175    **TLDR:**\n\n1. US immigration and job landsca...
176    While we wait for ChatGPT to roll out its own ...
177    It's official... I've been unemployed for 1 ye...
Name: Selftext, Length: 178, dtype: object

In [18]:
title_df=string_frequency()

In [19]:
title_df

Unnamed: 0,0
AI,0
artificial intelligence,0
LLM,0
automation,15
impact,5
unemployment,1
risk,0
loss,0
layoff,0
fired,0
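
The zero counts for AI, LLM, and artificial intelligence follow from how string_frequency compares text: the titles are lower-cased but the key terms are not, so 'AI' can never equal 'ai', and a two-word term can never equal a single word produced by split(). Words with attached punctuation, such as 'automation,', are missed for the same kind of reason. The sketch below shows one possible alternative using case-insensitive regular expressions with word boundaries. The helper count_terms and the sample titles are hypothetical and are not part of the notebook or the dataset.

```python
import re

import pandas as pd


def count_terms(texts, terms):
    """Count case-insensitive whole-word or whole-phrase matches (hypothetical helper)."""
    # Treat missing entries as empty text so every item can be searched
    texts = pd.Series(texts).fillna("").astype(str)
    counts = {}
    for term in terms:
        # \b anchors stop 'AI' from matching inside longer words such as 'retraining'
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
        counts[term] = int(sum(len(pattern.findall(text)) for text in texts))
    return pd.DataFrame.from_dict(counts, orient="index", columns=["count"])


# Made-up titles for illustration only, not rows from the dataset
sample_titles = [
    "Will AI and automation replace us?",
    "Artificial intelligence, automation, and retraining",
    None,
]
result = count_terms(
    sample_titles, ["AI", "artificial intelligence", "automation", "retrain"]
)
print(result)
```

With whole-word matching, 'retrain' still does not count 'retraining'. Whether such variants should be counted is a design decision; dropping the trailing word boundary or stemming the text would be two ways to include them.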


In [20]:
selftext_df=string_frequency('Selftext')

AttributeError: 'float' object has no attribute 'split'
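
This error is consistent with some posts having no body text. pandas represents a missing Selftext as NaN, which is a float, and .str.lower() passes missing values through unchanged, so the loop ends up calling split() on a float. A minimal sketch of the diagnosis and one possible fix follows, using a made-up two-row Series in place of the real column.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Selftext column, with one missing post body
sample = pd.Series(["AI will reshape hiring", np.nan], name="Selftext")

# How many entries are missing
print(sample.isna().sum())

# Lower-casing leaves the missing entry as NaN, which has no split() method
lowered = sample.str.lower()

# Replacing missing entries with an empty string keeps every item a str
safe = sample.fillna("").str.lower()
tokens = [text.split() for text in safe]
print(tokens)
```

Applying fillna('') before str.lower() inside string_frequency would be one way to let the Selftext call complete; an empty string simply contributes no words.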

In [None]:
display(selftext_df)

The raw title counts indicate a disproportionate emphasis on automation (15 matches, against 5 for impact and 1 for unemployment); however, graphs will make the frequencies easier to compare.