# Wellbeing Police

From several subreddits, we scraped close to 4000 posts. The subreddits include "r/SuicideWatch", "r/BipolarReddit", "r/Anxiety", "r/AnxietyDepression", "r/Depression", and "r/Happy". Each subreddit's posts currently sit in their own CSV file. We will check which fields are consistent across the CSV files, keep those that are applicable, and drop the columns that are not needed.

As all the scrapes were created by the same script, the columns of the resulting CSV files are the same across files.
We will label the posts according to the subreddit they came from and combine them into a single dataframe.
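
The column consistency check described above can be sketched as follows. This is a minimal illustration on toy in-memory frames, not the real scrape; the `score` column and the names `frames`, `shared_columns`, and `missing` are assumptions made up for the example.

```python
import pandas as pd

# Toy stand-ins for the per-subreddit CSVs (hypothetical data).
frames = {
    "ptsd": pd.DataFrame(
        {"title": ["a"], "subreddit": ["ptsd"], "body": ["x"], "score": [1]}
    ),
    "Anxiety": pd.DataFrame(
        {"title": ["b"], "subreddit": ["Anxiety"], "body": ["y"], "score": [2]}
    ),
}

# Columns present in every frame, kept in the order of the first frame.
first = next(iter(frames.values()))
common = set.intersection(*(set(df.columns) for df in frames.values()))
shared_columns = [c for c in first.columns if c in common]

# Confirm the columns we intend to keep exist in every file.
wanted = ["title", "subreddit", "body"]
missing = [c for c in wanted if c not in shared_columns]
print(shared_columns, missing)
```

If `missing` is non-empty, selecting `df[wanted]` would raise a `KeyError` for at least one file, so it is worth running a check like this before combining.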

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import glob
import os

In [3]:
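# Collect every per-subreddit CSV produced by the scraper.
path = "./reddit/csvs/"
all_csvs = glob.glob(os.path.join(path, "*.csv"))

all_dfs = []
for filename in all_csvs:
    df = pd.read_csv(filename, index_col=None, header=0)
    # Keep only the columns used downstream.
    df = df[["title", "subreddit", "body"]]
    all_dfs.append(df)

# ignore_index=True gives the combined frame a fresh 0..n-1 index.
main_df = pd.concat(all_dfs, ignore_index=True)
print(main_df.shape)
main_df.head()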

(8899, 3)


| | title | subreddit | body |
|---|---|---|---|
| 0 | Self Help and Self Care Resources | ptsd | Unfortunately this is a small subreddit and as... |
| 1 | Survey thread | ptsd | If you have a survey you would like to share w... |
| 2 | PTSD never getting better, don’t want to be al... | ptsd | Made a throwaway account for this obviously.\n... |
| 3 | I can’t be bothered with people anymore | ptsd | Why do I have to remind people all the time th... |
| 4 | I can't stop peeing my pants | ptsd | This is incredibly embarrassing but I am diagn... |


A few of the subreddits indicate the same underlying mental health problem, so we add a `problem` attribute that groups these minor variations for easier identification.

In [4]:
sub_problem_mapping = {
    "ptsd": "PTSD",
    "Anxiety": "anxiety",
    "AnxietyDepression": "anxiety",
    "SuicideWatch": "suicidal",
    "depression": "depression", 
    "BipolarReddit": "bipolar",
    "schizophrenia": "schizophrenia",
    "EDAnonymous": "eating disorder",
    "EatingDisorders": "eating disorder"
}

# Label each post with its grouped problem; a subreddit missing from the mapping raises KeyError.
main_df["problem"] = [sub_problem_mapping[s] for s in main_df["subreddit"]]
main_df.head()

| | title | subreddit | body | problem |
|---|---|---|---|---|
| 0 | Self Help and Self Care Resources | ptsd | Unfortunately this is a small subreddit and as... | PTSD |
| 1 | Survey thread | ptsd | If you have a survey you would like to share w... | PTSD |
| 2 | PTSD never getting better, don’t want to be al... | ptsd | Made a throwaway account for this obviously.\n... | PTSD |
| 3 | I can’t be bothered with people anymore | ptsd | Why do I have to remind people all the time th... | PTSD |
| 4 | I can't stop peeing my pants | ptsd | This is incredibly embarrassing but I am diagn... | PTSD |
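
As a hedged alternative to the lookup above, the grouping can also be written with `Series.map`, which leaves unmapped subreddits as `NaN` instead of raising. The sketch below uses toy data; `toy_df`, `toy_mapping`, and the unmapped `"happy"` entry are hypothetical and only there to show how gaps in the mapping surface.

```python
import pandas as pd

# Toy data: the last subreddit is deliberately absent from the mapping.
toy_df = pd.DataFrame(
    {"subreddit": ["ptsd", "Anxiety", "AnxietyDepression", "happy"]}
)
toy_mapping = {
    "ptsd": "PTSD",
    "Anxiety": "anxiety",
    "AnxietyDepression": "anxiety",
}

# Series.map returns NaN for keys that are not in the dict.
toy_df["problem"] = toy_df["subreddit"].map(toy_mapping)

# List the subreddits that still need a mapping entry.
unmapped = sorted(toy_df.loc[toy_df["problem"].isna(), "subreddit"].unique())

# Posts per grouped problem (NaN rows are excluded by default).
counts = toy_df["problem"].value_counts()
print(unmapped)
print(counts)
```

Whether a silent `NaN` or a loud `KeyError` is preferable depends on the workflow; the `NaN` route makes it easy to report every unmapped subreddit at once.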


### Text Processing
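- The local `text_processing` helper is applied to the text columns. Judging from the output below compared with the earlier preview, it lower-cases the text and removes newline characters.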

In [5]:
from text_processing import text_processing

In [6]:
clean_text = text_processing()

# Clean the listed columns; the return value is not assigned, so main_df is updated in place.
clean_text.process_data(main_df, headers=['title', 'subreddit', 'body'])

main_df.head()

| | title | subreddit | body | problem |
|---|---|---|---|---|
| 0 | self help and self care resources | ptsd | unfortunately this is a small subreddit and as... | PTSD |
| 1 | survey thread | ptsd | if you have a survey you would like to share w... | PTSD |
| 2 | ptsd never getting better, don’t want to be al... | ptsd | made a throwaway account for this obviously. i... | PTSD |
| 3 | i can’t be bothered with people anymore | ptsd | why do i have to remind people all the time th... | PTSD |
| 4 | i can't stop peeing my pants | ptsd | this is incredibly embarrassing but i am diagn... | PTSD |
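
The source of the `text_processing` module is not shown here, so the following is only a sketch of a cleaning step that would produce similar output. It assumes lower-casing and collapsing whitespace (including newlines) into single spaces; `normalize_text` and `toy_df` are hypothetical names, not part of the project's module.

```python
import re

import pandas as pd


def normalize_text(text):
    """Lower-case the text and collapse whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", str(text).lower()).strip()


# Toy frame standing in for main_df (hypothetical data).
toy_df = pd.DataFrame(
    {
        "title": ["Survey Thread"],
        "body": ["Made a throwaway.\n\nObviously."],
    }
)

# Apply the same normalization to each text column.
for col in ["title", "body"]:
    toy_df[col] = toy_df[col].map(normalize_text)

print(toy_df)
```

The real helper may do more (for example punctuation or stop-word handling); this sketch covers only the changes visible in the output above.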
