# Wellbeing Police

We scraped close to 4000 posts from various subreddits, including "r/SuicideWatch", "r/BipolarReddit", "r/Anxiety", "r/AnxietyDepression", "r/Depression", and "r/Happy". The posts from each subreddit currently sit in their own CSV file. We will check which fields are consistent across the CSV files, keep the applicable ones, and remove the columns that are not needed.

As all the scrapes were created by the same script, the columns of the resulting CSVs are the same across the files.
We will label the posts according to the subreddit they came from and combine them into a single dataframe.
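
The claim that every file shares the same columns can be checked rather than assumed. The snippet below is a minimal sketch of such a check: `common_columns` is a hypothetical helper, and the two small dataframes (including their `score` and `url` columns) are made-up stand-ins for the scraped files, not the real data.

```python
import pandas as pd


def common_columns(frames):
    """Return the column names shared by every frame, in the first frame's order."""
    if not frames:
        return []
    shared = set(frames[0].columns)
    for frame in frames[1:]:
        shared &= set(frame.columns)
    return [col for col in frames[0].columns if col in shared]


# Hypothetical stand-ins for two scraped CSV files
anxiety = pd.DataFrame(
    {"title": ["a"], "subreddit": ["Anxiety"], "body": ["x"], "score": [3]}
)
happy = pd.DataFrame(
    {"title": ["b"], "subreddit": ["happy"], "body": ["y"], "url": ["u"]}
)

shared = common_columns([anxiety, happy])
print(shared)
```

In the notebook, the same helper could be given the list of dataframes read from the CSV files before the columns of interest are selected.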

In [39]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [40]:
import pandas as pd
import glob
import os

In [41]:
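# Folder holding one CSV file per scraped subreddit
path = "./reddit/csvs/"
all_csvs = glob.glob(os.path.join(path, "*.csv"))

all_dfs = []
for filename in all_csvs:
    df = pd.read_csv(filename, index_col=None, header=0)
    # Keep only the columns needed for the analysis
    df = df[["title", "subreddit", "body"]]
    all_dfs.append(df)

# Stack the per-subreddit frames into a single dataframe
main_df = pd.concat(all_dfs)
print(main_df.shape)
main_df.head()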

(8899, 3)


Unnamed: 0,title,subreddit,body
0,Weekly thread: Do I have an eating disorder?,EatingDisorders,This is a weekly thread to ask about eating di...
1,I hope this message helps someone,EatingDisorders,[https://www.pinterest.com/pin/932245191595710...
2,Should I have this bit of cake? 🤔,EatingDisorders,"Was my boyfriend birthday a few days ago, not ..."
3,Constant illness preventing weight gain in rec...,EatingDisorders,I(25F) have had AN now for over half my life. ...
4,how to stay consistent in eating?,EatingDisorders,"i’m not sure if i have an eating disorder, but..."


Several of the subreddits point to the same underlying mental health problem, so we add a "problem" attribute that groups these closely related subreddits for easier identification.

In [42]:
sub_problem_mapping = {
    "ptsd": "PTSD",
    "Anxiety": "anxiety",
    "AnxietyDepression": "anxiety",
    "SuicideWatch": "suicidal",
    "depression": "depression", 
    "BipolarReddit": "bipolar",
    "schizophrenia": "schizophrenia",
    "EDAnonymous": "eating disorder",
    "EatingDisorders": "eating disorder"
}

main_df["problem"] = [sub_problem_mapping[s] for s in main_df["subreddit"]]
main_df.head()

Unnamed: 0,title,subreddit,body,problem
0,Weekly thread: Do I have an eating disorder?,EatingDisorders,This is a weekly thread to ask about eating di...,eating disorder
1,I hope this message helps someone,EatingDisorders,[https://www.pinterest.com/pin/932245191595710...,eating disorder
2,Should I have this bit of cake? 🤔,EatingDisorders,"Was my boyfriend birthday a few days ago, not ...",eating disorder
3,Constant illness preventing weight gain in rec...,EatingDisorders,I(25F) have had AN now for over half my life. ...,eating disorder
4,how to stay consistent in eating?,EatingDisorders,"i’m not sure if i have an eating disorder, but...",eating disorder
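
A plain dictionary lookup per row raises a `KeyError` as soon as a subreddit is missing from the mapping. As a hedged alternative, `Series.map` leaves unmapped values as `NaN`, which makes it easy to list the subreddits that still need a label. The sample frame and the shortened mapping below are illustrative assumptions, not the notebook's data.

```python
import pandas as pd

# Hypothetical sample: "happy" is deliberately absent from the mapping
sample = pd.DataFrame({"subreddit": ["Anxiety", "SuicideWatch", "happy"]})
mapping = {"Anxiety": "anxiety", "SuicideWatch": "suicidal"}

# Series.map yields NaN for keys that are not in the dictionary
sample["problem"] = sample["subreddit"].map(mapping)

# Collect the subreddits that did not receive a label
unmapped = sorted(sample.loc[sample["problem"].isna(), "subreddit"].unique())
print(unmapped)
```

Whether a missing label should stop the run or be reported afterwards is a design choice; the lookup version fails fast, while the `map` version lets all gaps be inspected at once.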


In [43]:
from text_processing import text_processing

In [55]:
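# text_processing is a project-local helper class (its source is not shown here)
clean_text = text_processing()

# Clean the listed text columns; main_df appears to be modified in place,
# since the output below changes without a reassignment
clean_text.process_data(main_df, headers=['title', 'subreddit', 'body'])

main_df.head()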

Unnamed: 0,title,subreddit,body,problem
0,weekly thread: do i have an eating disorder?,eatingdisorders,this is a weekly thread to ask about eating di...,eating disorder
2,should i have this bit of cake?,eatingdisorders,"was my boyfriend birthday a few days ago, not ...",eating disorder
3,constant illness preventing weight gain in rec...,eatingdisorders,i(25f) have had an now for over half my life. ...,eating disorder
4,how to stay consistent in eating?,eatingdisorders,"i’m not sure if i have an eating disorder, but...",eating disorder
5,hypochondriac with an ed. could use advice,eatingdisorders,i'm a textbook hypochondriac and bulimic. sinc...,eating disorder
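
The `text_processing` module is not shown, so its exact behaviour is unknown. Judging only from the output above, the text is lowercased, the emoji in row 2 is gone, and row 1 (whose body was a bare link) has been dropped. The sketch below is one hypothetical way to get a similar result; `clean_frame`, the emoji ranges, and the URL pattern are all assumptions and not the module's actual implementation.

```python
import pandas as pd

# Assumed patterns: a rough emoji range and a body that is nothing but a link
EMOJI_PATTERN = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"
URL_ONLY_PATTERN = r"^\W*https?://\S+\W*$"


def clean_frame(df, headers, url_col="body"):
    """Return a cleaned copy: lowercase text, strip emoji, drop link-only rows."""
    out = df.copy()
    for col in headers:
        out[col] = (
            out[col]
            .fillna("")
            .astype(str)
            .str.lower()
            .str.replace(EMOJI_PATTERN, "", regex=True)
            .str.strip()
        )
    # Remove rows whose body consists only of a URL
    link_only = out[url_col].str.match(URL_ONLY_PATTERN)
    return out[~link_only]


# Hypothetical sample rows modelled on the output above
sample = pd.DataFrame(
    {
        "title": ["Should I have this bit of cake? 🤔", "I hope this message helps someone"],
        "body": ["Was my boyfriend birthday", "[https://example.com/pin/1](https://example.com/pin/1)"],
    }
)

cleaned = clean_frame(sample, ["title", "body"])
print(cleaned)
```

Unlike the call in the notebook, this sketch returns a new dataframe and leaves its input untouched, which keeps the raw scrape available for comparison.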
