# Project 3(2): NLP - Data Cleaning

Done by: Richelle-Joy Chia, a Redditor-and-data-science enthusiast! 

Problem statement: Using natural language processing and classification models, how can we help Reddit and other interested parties classify posts based on the text written by people who may be depressed or anxious? Furthermore, how can sentiment analysis be used to detect emotions associated with depression and anxiety?

## Data cleaning

In [1]:
# import libraries

import requests
import pandas as pd

In [2]:
# import datasets

anxiety = pd.read_csv('../anxiety.csv')
depression = pd.read_csv('../depression.csv')

In [3]:
# check anxiety dataset

anxiety.head()

|   | Unnamed: 0 | date_time | subreddit | selftext | title |
|---|---|---|---|---|---|
| 0 | 0 | 2022-10-03 07:50:04 | Anxiety | I seem to just be tired all the time at the mo... | Aching across chest and stomach and tiredness |
| 1 | 1 | 2022-10-03 07:49:48 | Anxiety | Hey, so I have this problem for quite some tim... | Very cold hands and feet while anxious |
| 2 | 2 | 2022-10-03 07:42:30 | Anxiety | I keep feeling like I’m losing the sensation i... | Feeling of lost sensation |
| 3 | 3 | 2022-10-03 07:41:39 | Anxiety | hi im dealing with intense anxiety and depress... | needs someone to talk to.. |
| 4 | 4 | 2022-10-03 07:38:55 | Anxiety | I’ve tried everything cbt therapy antipsychoti... | Nothing helps my anxiety apart from benzodiaze... |


In [4]:
# check depression dataset

depression.head()

|   | Unnamed: 0 | date_time | subreddit | selftext | title |
|---|---|---|---|---|---|
| 0 | 0 | 2022-10-03 07:49:52 | depression | I was in the hospital due to depression. I was... | suicidal ideation spectrum |
| 1 | 1 | 2022-10-03 07:49:11 | depression | It seems like it just gets worse and worse. So... | Need Advice |
| 2 | 2 | 2022-10-03 07:46:13 | depression | I don’t know where to begin, but I think I’m n... | I’m giving up |
| 3 | 3 | 2022-10-03 07:40:33 | depression | [removed] | The world is a lonely place |
| 4 | 4 | 2022-10-03 07:39:03 | depression | [removed] | Trigger warning: My friend commited suicide a ... |


In [5]:
# check null values on depression dataset 

depression['selftext'].isna().sum()

0
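A null count of 0 does not mean every post has usable text: the preview above shows `[removed]` bodies, which are ordinary strings and are not flagged by `isna()`. The following is a minimal sketch of how such placeholders could be counted. The toy rows are hypothetical, and treating `[deleted]` as a placeholder is an assumption about Reddit's conventions rather than something seen in this dataset.

```python
import pandas as pd

# Toy stand-in for the depression data (hypothetical rows, not the real dataset)
toy = pd.DataFrame({
    "selftext": ["I was in the hospital", "[removed]", "[removed]", "[deleted]", ""],
})

# Placeholders are plain strings, so the null check does not catch them
n_null = int(toy["selftext"].isna().sum())

# Count rows whose body is a placeholder or empty after stripping whitespace
# ("[deleted]" is an assumed placeholder; only "[removed]" appears in the preview)
placeholders = {"[removed]", "[deleted]", ""}
is_placeholder = toy["selftext"].str.strip().isin(placeholders)
n_placeholder = int(is_placeholder.sum())

print(n_null, n_placeholder)
```

Whether to drop such rows or fall back on the title alone is a modelling decision; the sketch only measures how many there are.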

In [6]:
# check null values on anxiety dataset 

anxiety['selftext'].isna().sum()

680

In [7]:
# fill na values in anxiety dataset

anxiety['selftext'] = anxiety['selftext'].fillna('')

# check null values 

anxiety['selftext'].isna().sum()

0

In [8]:
# check shape before combining datasets

print(anxiety.shape)
print(depression.shape)

(15242, 5)
(15239, 5)


In [9]:
# combine both datasets

# DataFrame.append is deprecated (removed in pandas 2.0); pd.concat stacks the rows the same way
df = pd.concat([anxiety, depression])
df.shape


(30481, 5)
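Stacking two DataFrames keeps each frame's original row labels, so the combined frame has repeated index values. The sketch below uses two small hypothetical frames to show the effect and how `ignore_index=True` avoids it. It illustrates the pandas behaviour and is not a change to the notebook's own cells.

```python
import pandas as pd

# Two toy frames standing in for the anxiety and depression data (hypothetical rows)
a = pd.DataFrame({"selftext": ["a1", "a2"]})
b = pd.DataFrame({"selftext": ["d1", "d2", "d3"]})

# Default concat keeps the original labels: 0, 1, 0, 1, 2
stacked = pd.concat([a, b])

# ignore_index=True assigns fresh labels: 0, 1, 2, 3, 4
reindexed = pd.concat([a, b], ignore_index=True)

has_duplicates = stacked.index.has_duplicates
rows_at_zero = len(stacked.loc[[0]])  # label 0 matches one row from each frame

print(has_duplicates, rows_at_zero, reindexed.index.tolist())
```

Duplicate labels are harmless for positional operations but can surprise label-based lookups such as `.loc`.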

In [10]:
# look at df

df.head()

|   | Unnamed: 0 | date_time | subreddit | selftext | title |
|---|---|---|---|---|---|
| 0 | 0 | 2022-10-03 07:50:04 | Anxiety | I seem to just be tired all the time at the mo... | Aching across chest and stomach and tiredness |
| 1 | 1 | 2022-10-03 07:49:48 | Anxiety | Hey, so I have this problem for quite some tim... | Very cold hands and feet while anxious |
| 2 | 2 | 2022-10-03 07:42:30 | Anxiety | I keep feeling like I’m losing the sensation i... | Feeling of lost sensation |
| 3 | 3 | 2022-10-03 07:41:39 | Anxiety | hi im dealing with intense anxiety and depress... | needs someone to talk to.. |
| 4 | 4 | 2022-10-03 07:38:55 | Anxiety | I’ve tried everything cbt therapy antipsychoti... | Nothing helps my anxiety apart from benzodiaze... |


In [11]:
# re-label the subreddit topics as numbers (0 = Anxiety, 1 = anything else, i.e. depression)

df['subreddit'] = df['subreddit'].apply(lambda x: 0 if x == 'Anxiety' else 1)
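The lambda above labels every value other than `'Anxiety'` as 1, so a misspelt or unexpected subreddit name would silently count as depression. A possible alternative, sketched here on hypothetical rows, is an explicit mapping that leaves unknown names as `NaN` so they can be inspected. The names `label_map` and `unexpected` are illustrative only.

```python
import pandas as pd

# Toy column with one unexpected spelling (hypothetical rows)
toy = pd.DataFrame({"subreddit": ["Anxiety", "depression", "Anxiety", "anxiety"]})

# Explicit mapping: names missing from the dict become NaN instead of defaulting to 1
label_map = {"Anxiety": 0, "depression": 1}
toy["label"] = toy["subreddit"].map(label_map)

# Names that did not match any key
unexpected = toy.loc[toy["label"].isna(), "subreddit"].tolist()

# Class balance among the rows that did match
counts = toy["label"].value_counts().to_dict()

print(unexpected, counts)
```

Checking `value_counts()` after relabelling is also a quick way to confirm the two classes are roughly balanced, as the shapes printed earlier suggest.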

In [12]:
# drop unnecessary column

df = df.drop(['Unnamed: 0'], axis=1)
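A column named `Unnamed: 0` usually appears when a CSV was saved with its index and then read back without declaring it. Assuming that is what happened here, the sketch below shows on a toy frame how `index_col=0` in `pd.read_csv` avoids creating the extra column in the first place. The in-memory buffer stands in for a file on disk.

```python
import io
import pandas as pd

# Write a toy frame the default way (index included) to an in-memory CSV string
toy = pd.DataFrame({"title": ["t1", "t2"], "selftext": ["s1", "s2"]})
csv_text = toy.to_csv()

# Default read: the saved index comes back as a column named "Unnamed: 0"
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats the first column as the index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```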

In [13]:
# confirm column has been dropped and relabelled 

df.head()

|   | date_time | subreddit | selftext | title |
|---|---|---|---|---|
| 0 | 2022-10-03 07:50:04 | 0 | I seem to just be tired all the time at the mo... | Aching across chest and stomach and tiredness |
| 1 | 2022-10-03 07:49:48 | 0 | Hey, so I have this problem for quite some tim... | Very cold hands and feet while anxious |
| 2 | 2022-10-03 07:42:30 | 0 | I keep feeling like I’m losing the sensation i... | Feeling of lost sensation |
| 3 | 2022-10-03 07:41:39 | 0 | hi im dealing with intense anxiety and depress... | needs someone to talk to.. |
| 4 | 2022-10-03 07:38:55 | 0 | I’ve tried everything cbt therapy antipsychoti... | Nothing helps my anxiety apart from benzodiaze... |


In [14]:
# combine title and text into 1 column

# vectorized string concatenation; builds the same list as looping over rows with iterrows()
joined = (df['title'] + " " + df['selftext']).tolist()
joined

['Aching across chest and stomach and tiredness I seem to just be tired all the time at the moment, I also seem to get a fair few chest aches and pains up and down my abdomen that keep making me think oh god I’ve got that horrible illness starting with C, does anyone else experience anything similar to this in terms of the aches and pains and tiredness too ?',
 "Very cold hands and feet while anxious Hey, so I have this problem for quite some time. It all started 3 years ago, somehow I just got more depressed, stressed out and tired, bored with my life, thinking about all existential things and such, half year ago I even had quite strong panic attacks and now I am already in a better circumstances I don't get them that much anymore, however I still get anxious a few times in a week, slight anxiety which is manageable. And what I found throughout these years that whenever I find myself in a panic attack, anxious or such I always get cold hands and feet, and I never had that in my life, 

## Export data to CSV

In [15]:
# add the combined title and text as a new column

df['joined'] = joined
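The `joined` column assumes every row has both a title and a body. Missing `selftext` values in the anxiety data were filled earlier, but nothing in the notebook checks `title` or the depression bodies for gaps. The sketch below, on hypothetical rows, shows one way the same column could be built so that a missing piece does not produce `NaN` or a stray space.

```python
import pandas as pd

# Toy rows with a missing body and a missing title (hypothetical data)
toy = pd.DataFrame({
    "title": ["Need Advice", "Feeling of lost sensation", None],
    "selftext": ["It seems like it just gets worse.", None, "body only"],
})

# Fill gaps with empty strings, join with a space, then trim leftover whitespace
toy["joined"] = (
    toy["title"].fillna("") + " " + toy["selftext"].fillna("")
).str.strip()

print(toy["joined"].tolist())
```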

In [16]:
# export the full dataset, including the joined column

df.to_csv('./joined.csv', index=False)

In [17]:
# export the text-only and title-only subsets, each with its label

df[['selftext','subreddit']].to_csv('./selftext.csv', index=False)
df[['title','subreddit']].to_csv('./title.csv', index=False)
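One thing to watch when these files are loaded again: empty strings, such as the `selftext` values filled with `''` earlier, are written as empty CSV fields and read back as `NaN` by default. The sketch below demonstrates this on a toy frame with an in-memory buffer standing in for `selftext.csv`. Passing `keep_default_na=False` is one possible way to preserve them, not necessarily what the later notebooks do.

```python
import io
import pandas as pd

# Toy frame with one empty body (hypothetical rows)
toy = pd.DataFrame({"selftext": ["some text", ""], "subreddit": [0, 0]})

# Export without the index, as in the cells above
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Default read: the empty field becomes NaN again
default_read = pd.read_csv(io.StringIO(csv_text))
n_null_default = int(default_read["selftext"].isna().sum())

# keep_default_na=False keeps the empty field as an empty string
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
n_null_kept = int(kept["selftext"].isna().sum())

print(n_null_default, n_null_kept)
```

An alternative is to call `fillna('')` again right after loading, which mirrors the cleaning step used in this notebook.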