<a href="https://colab.research.google.com/github/kstrickland680/MentalHealthRedditAnalysis/blob/main/DataGathering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import requests
import pandas as pd
import json
import csv
import time
import datetime

In [None]:

import sys
sys.path.append('/content/drive/MyDrive/MentalHealthReddit/')
import config

# Data Gathering
This notebook gathers the Reddit data used for analysis. I use the Pushshift API, which is commonly used when analyzing large amounts of Reddit data.

## Helper Functions

In this section, I define helper functions to gather information from Reddit. I use the Pushshift API. Pushshift is a big-data storage project that contains a copy of Reddit comments and submissions. It is more useful than the Reddit API alone when analyzing large quantities of data.

I define a basic function to get the data (getPushshiftData). Each query returns at most 100 results, so I also create getAllPushshiftData, which repeats the query to gather more than 100 results for a search.

collectData, collectCommentData, and collectOtherSubmissionData extract the relevant fields from each search result and store them as a dictionary entry.
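
The paging idea behind getAllPushshiftData can be sketched without touching the network. This is a minimal illustration, not the notebook's actual code: `fetch_all`, `fake_fetch`, and `FAKE_POSTS` are hypothetical names, and the fake API simply returns up to 100 records older than the `before` cursor.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int], List[Dict]], before: int, target: int) -> List[Dict]:
    """Collect up to `target` records by repeatedly moving the `before` cursor back."""
    results: List[Dict] = []
    cursor = before
    while len(results) < target:
        page = fetch_page(cursor)
        if not page:  # nothing older is left, so stop paging
            break
        results.extend(page)
        # The oldest record on this page becomes the next cursor.
        cursor = page[-1]["created_utc"]
    return results[:target]


# Hypothetical stand-in for the API: 250 fake posts, newest first, 100 per page.
FAKE_POSTS = [{"id": i, "created_utc": 1000 - i} for i in range(250)]


def fake_fetch(before: int) -> List[Dict]:
    older = [p for p in FAKE_POSTS if p["created_utc"] < before]
    return older[:100]


posts = fetch_all(fake_fetch, before=2000, target=1000)
print(len(posts))  # 250: the fake source runs out before the target is reached
```

Checking for an empty page before reading `page[-1]` matters, because a search with no matches would otherwise fail when looking up the next cursor.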

In [None]:
def getPushshiftData(query, before, sub, sz):
  """
  Returns Reddit data for the query term in the specified subreddit.
    Parameters:
      query - string: what to search for in submission titles
      before - int: UTC epoch timestamp; only data created before it is gathered
      sub - string: subreddit to gather data from
      sz - int: number of results to request
    Returns:
      list of submission dictionaries
  """
  url = 'https://api.pushshift.io/reddit/search/submission/?title='+str(query)+'&limit='+str(sz)+'&before='+str(before)+'&subreddit='+str(sub)

  print(url)
  r = requests.get(url)
  data = json.loads(r.text)
  return data['data']


In [None]:
def collectData(submission, stats, bpd=False, bipolar=False):
  """
  Collects submission data.
    Parameters:
      submission: the Reddit data
      stats: dictionary you want the information added to
      bpd: whether the data is from the bpd subreddit
      bipolar: whether the data is from the bipolar subreddit
  Adds the gathered information to the dictionary, with the author as key
  (a later post by the same author overwrites an earlier one)
  """
  # takes a submission and the dictionary you want it added to
  subData = list() #list to store data points
  title = submission['title']
  url = submission['url']
  try:
      flair = submission['link_flair_text']
  except KeyError:
      flair = "NaN"    
  author = submission['author']
  sub_id = submission['id']
  score = submission['score']
  created = datetime.datetime.fromtimestamp(submission['created_utc']) #1520561700.0
  created_utc = submission['created_utc']
  numComms = submission['num_comments']
  permalink = submission['permalink']
  try:
    selftext = submission['selftext']
  except KeyError:
    selftext = "NaN"
  subreddit = submission['subreddit']
  #author_id = submission['author_fullname']
  if bpd:
    if flair == 'Person w/o BPD':
      return
  if bipolar:
    if ((flair == 'Undiagnosed') or (flair == 'Friend/Family')):
      return

  subData = [sub_id,title,url,score,created,created_utc,numComms,permalink,flair,selftext,subreddit]
  stats[author] = subData

In [None]:
def collectCommentData(submission, stats):
  """
  Collects comment data and adds it to the dictionary.
  Parameters:
    submission: comment data from Reddit
    stats: dictionary to add the information to
  Adds the gathered information to the dictionary, with the comment ID as key
  """

  # takes a comment and the dictionary you want it added to
  subData = list() #list to store data points
  author = submission['author']
  com_id = submission['id']
  score = submission['score']
  created = datetime.datetime.fromtimestamp(submission['created_utc']) #1520561700.0
  created_utc = submission['created_utc']
  body = submission['body']
  try:
    url = submission['permalink']
  except KeyError:
    url = "NaN"
  subreddit = submission['subreddit']
  #author_id = submission['author_fullname']
 

  subData = [author,score,created,created_utc,url,body,subreddit]
  stats[com_id] = subData

In [None]:
def collectOtherSubmissionData(submission, stats):
  """
  Collects submission data and adds it to the dictionary.
  *Different from collectData in the key it uses to add the information to the dictionary*
  Parameters:
    submission: data from Reddit
    stats: dictionary to add the information to
  Adds the gathered information to the dictionary, with the submission ID as key
  """
  # takes a submission and the dictionary you want it added to
  subData = list() #list to store data points
  title = submission['title']
  url = submission['url']
  try:
      flair = submission['link_flair_text']
  except KeyError:
      flair = "NaN"    
  author = submission['author']
  sub_id = submission['id']
  score = submission['score']
  created = datetime.datetime.fromtimestamp(submission['created_utc']) #1520561700.0
  created_utc = submission['created_utc']
  numComms = submission['num_comments']
  permalink = submission['permalink']
  try:
    selftext = submission['selftext']
  except KeyError:
    selftext = "NaN"
  subreddit = submission['subreddit']
  #author_id = submission['author_fullname']

  subData = [author, title,url,score,created,created_utc,numComms,permalink,flair,selftext,subreddit]
  stats[sub_id] = subData

In [None]:
def getAllPushshiftData(query, before, sub, size):
  """
  Returns Reddit data for the query term in the specified subreddit.
  *Different from getPushshiftData in that it runs more than one Pushshift search*
    Parameters:
      query - string: what to search
      before - int: UTC epoch timestamp to gather data before
      sub - string: subreddit to gather data from
      size - int: total number of results wanted
    Returns:
      list of submission dictionaries
  """
  returndata = getPushshiftData(query, before, sub, size)
  length = len(returndata)
  newdata = returndata  # an empty first page means there is nothing to page through
  while (length < size) and (len(newdata) != 0):
    # page backwards from the oldest submission gathered so far
    newdata = getPushshiftData(query, returndata[-1]['created_utc'], sub, size)
    returndata = returndata + newdata
    length = len(returndata)
    print(len(returndata))
    time.sleep(1)  # pause between requests
  return returndata



## Gathering users for the "experimental" classes

In this section, I go through and identify the users whose posts and comments I'll be examining for the "experimental" classes. 

### Borderline Personality Data 

I gather entries to be classified as borderline personality disorder. Users are classified as belonging to this class if they posted in the subreddit "bpd" and their flair doesn't mark them as a person without BPD (the 'Person w/o BPD' flair). I store the post that identifies them for the dataset. 1417 users were identified for this class.

In [None]:

bpd_dict = {}
bpddata = getAllPushshiftData("", 1612069200, 'bpd', 2000)

for submission in bpddata:
  collectData(submission, bpd_dict, bpd=True)

print(str(len(bpd_dict)) + " submissions have been added to list")


https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1612069200&subreddit=bpd
https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1612026749&subreddit=bpd
200
https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1611970771&subreddit=bpd
300
https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1611911462&subreddit=bpd
400
https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1611868930&subreddit=bpd
500
https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1611809256&subreddit=bpd
600
https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1611675630&subreddit=bpd
700
https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1611624200&subreddit=bpd
800
https://api.pushshift.io/reddit/search/submission/?title=&limit=2000&before=1611586873&subreddit=bpd
900
https://api.pushshift.io/reddit/search/submission/?title=&l

### Depression
I gather entries to be classified as depressed. Users are classified as belonging to this class if they self-report themselves as diagnosed in a post in the subreddit depression. I store the post that identifies them for the dataset. 1009 users were identified for this class.
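
The search phrases for this class contain quotes and spaces, so it helps to see what a percent-encoded version of the query URL looks like. The sketch below uses `urllib.parse.urlencode` from the standard library; `build_search_url` is a hypothetical helper for illustration and is not part of the notebook's own functions.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_search_url(query: str, before: int, sub: str, limit: int) -> str:
    """Build the search URL with every parameter percent-encoded."""
    params = {"title": query, "limit": limit, "before": before, "subreddit": sub}
    return BASE_URL + "?" + urlencode(params)


url = build_search_url('"My depression"', 1612069200, "depression", 100)
print(url)
# https://api.pushshift.io/reddit/search/submission/?title=%22My+depression%22&limit=100&before=1612069200&subreddit=depression
```

Encoding the parameters this way keeps characters such as `&` or `#` inside a search phrase from being read as part of the URL structure.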

In [None]:
dep_search = ['"I was diagnosed with depression"', '"My depression"', '"I was diagnosed with MDD"', 
              '"I am diagnosed with depression"']

dep_dict = {}
depdata = []

for item in dep_search:
  depdata2 = getAllPushshiftData(item, 1612069200, 'depression', 1000)
  depdata = depdata+depdata2

for submission in depdata:
  collectData(submission, dep_dict)

print(str(len(dep_dict)) + " submissions have been added to list")



https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with depression"&limit=1000&before=1612069200&subreddit=depression
https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with depression"&limit=1000&before=1299378808&subreddit=depression
66
https://api.pushshift.io/reddit/search/submission/?title="My depression"&limit=1000&before=1612069200&subreddit=depression
https://api.pushshift.io/reddit/search/submission/?title="My depression"&limit=1000&before=1610807469&subreddit=depression
200
https://api.pushshift.io/reddit/search/submission/?title="My depression"&limit=1000&before=1609506482&subreddit=depression
300
https://api.pushshift.io/reddit/search/submission/?title="My depression"&limit=1000&before=1607980399&subreddit=depression
400
https://api.pushshift.io/reddit/search/submission/?title="My depression"&limit=1000&before=1606382235&subreddit=depression
500
https://api.pushshift.io/reddit/search/submission/?title="My depression"&limit=1000&

In [None]:
dep_dict

{'Funny_Eyebrow': ['k3yk8o',
  'After breakup, I feel completely numb, is this normal? I was diagnosed with depression 3 years ago.',
  'https://www.reddit.com/r/depression/comments/k3yk8o/after_breakup_i_feel_completely_numb_is_this/',
  1,
  datetime.datetime(2020, 11, 30, 16, 0, 33),
  1606752033,
  0,
  '/r/depression/comments/k3yk8o/after_breakup_i_feel_completely_numb_is_this/',
  'NaN',
  '[removed]',
  'depression'],
 'leafy_and_lethal': ['j950tb',
  'Friday I was diagnosed with depression and today is my first day on meds',
  'https://www.reddit.com/r/depression/comments/j950tb/friday_i_was_diagnosed_with_depression_and_today/',
  1,
  datetime.datetime(2020, 10, 11, 13, 1, 39),
  1602421299,
  0,
  '/r/depression/comments/j950tb/friday_i_was_diagnosed_with_depression_and_today/',
  'NaN',
  "Hello everyone? I'm new here! \n\nThe person who gave me the diagnosis said on the outside I'm super bubbly. Then barely under the surface I am a sad person. I am sad. I felt that I wasn'

### Bipolar

I gather entries to be classified as bipolar. Users are classified as belonging to this class if their flair doesn't mark them as a friend/family member or undiagnosed when they posted in the subreddit "BipolarReddit". I store the post that identifies them for the dataset.  1572 users were identified for this class. 

In [None]:
bipolar_dict = {}

bipolardata = getAllPushshiftData("", 1612069200, 'BipolarReddit', 3000)


for submission in bipolardata:
  collectData(submission, bipolar_dict, bipolar=True)

print(str(len(bipolar_dict)) + " submissions have been added to list")


https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=1612069200&subreddit=BipolarReddit
https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=1611806251&subreddit=BipolarReddit
200
https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=1611480420&subreddit=BipolarReddit
300
https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=1611275350&subreddit=BipolarReddit
400
https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=1611022458&subreddit=BipolarReddit
500
https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=1610783624&subreddit=BipolarReddit
600
https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=1610531078&subreddit=BipolarReddit
700
https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=1610339886&subreddit=BipolarReddit
800
https://api.pushshift.io/reddit/search/submission/?title=&limit=3000&before=16101100

### PTSD

I gather entries to be classified as PTSD. Users are classified as belonging to this class if they self-report themselves as diagnosed in a post in the subreddit "ptsd". I store the post that identifies them for the dataset. 1090 users were identified for this class.

In [None]:
ptsd_search = ['"I was diagnosed with PTSD"', '"My PTSD"', '"I am diagnosed with PTSD"', '"I have PTSD"']

ptsd_dict = {}
ptsddata = []

for item in ptsd_search:
  ptsddata2 = getAllPushshiftData(item, 1612069200, 'ptsd', 1000)
  ptsddata = ptsddata+ptsddata2

for submission in ptsddata:
  collectData(submission, ptsd_dict)

print(str(len(ptsd_dict)) + " submissions have been added to list")


https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with PTSD"&limit=1000&before=1612069200&subreddit=ptsd
https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with PTSD"&limit=1000&before=1378877692&subreddit=ptsd
39
https://api.pushshift.io/reddit/search/submission/?title="My PTSD"&limit=1000&before=1612069200&subreddit=ptsd
https://api.pushshift.io/reddit/search/submission/?title="My PTSD"&limit=1000&before=1593560398&subreddit=ptsd
200
https://api.pushshift.io/reddit/search/submission/?title="My PTSD"&limit=1000&before=1576215435&subreddit=ptsd
300
https://api.pushshift.io/reddit/search/submission/?title="My PTSD"&limit=1000&before=1562012318&subreddit=ptsd
400
https://api.pushshift.io/reddit/search/submission/?title="My PTSD"&limit=1000&before=1535813478&subreddit=ptsd
500
https://api.pushshift.io/reddit/search/submission/?title="My PTSD"&limit=1000&before=1480348328&subreddit=ptsd
600
https://api.pushshift.io/reddit/search/submission/?ti

### ADHD

I gather entries to be classified as ADHD. Users are classified as belonging to this class if they self-report themselves as diagnosed in a post in the subreddit "ADHD". I store the post that identifies them for the dataset. 1085 users were identified for this class.

In [None]:
adhd_search = ['"I was diagnosed with ADHD"', '"My ADHD"', '"I am diagnosed with ADHD"']

adhd_dict = {}
adhddata = []

for item in adhd_search:
  adhddata2 = getAllPushshiftData(item, 1612069200, 'ADHD', 1000)
  adhddata = adhddata+adhddata2

for submission in adhddata:
  collectData(submission, adhd_dict)

print(str(len(adhd_dict)) + " submissions have been added to list")


https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with ADHD"&limit=1000&before=1612069200&subreddit=ADHD
https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with ADHD"&limit=1000&before=1531282466&subreddit=ADHD
182
https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with ADHD"&limit=1000&before=1316155132&subreddit=ADHD
182
https://api.pushshift.io/reddit/search/submission/?title="My ADHD"&limit=1000&before=1612069200&subreddit=ADHD
https://api.pushshift.io/reddit/search/submission/?title="My ADHD"&limit=1000&before=1610652742&subreddit=ADHD
200
https://api.pushshift.io/reddit/search/submission/?title="My ADHD"&limit=1000&before=1609122005&subreddit=ADHD
300
https://api.pushshift.io/reddit/search/submission/?title="My ADHD"&limit=1000&before=1607775641&subreddit=ADHD
400
https://api.pushshift.io/reddit/search/submission/?title="My ADHD"&limit=1000&before=1606439500&subreddit=ADHD
500
https://api.pushshift.io/reddit/se

### Anxiety 

I gather entries to be classified as anxious. For this class, I do not require a specific diagnosis; instead, I identify users who report struggling with anxiety in the subreddit "anxiety". I store the post that identifies them for the dataset. 1963 users were identified for this class.

In [None]:
anxiety_search = ['"I was diagnosed with anxiety"', '"My anxiety"', 
               '"I have anxiety"', '"I was diagnosed with GAD"']

anxiety_dict = {}
anxietydata = []

for item in anxiety_search:
  anxietydata2 = getAllPushshiftData(item, 1612069200, 'anxiety', 1000)
  anxietydata = anxietydata+anxietydata2

for submission in anxietydata:
  collectData(submission, anxiety_dict)

print(str(len(anxiety_dict)) + " submissions have been added to list")

https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with anxiety"&limit=1000&before=1612069200&subreddit=anxiety
https://api.pushshift.io/reddit/search/submission/?title="I was diagnosed with anxiety"&limit=1000&before=1346298827&subreddit=anxiety
14
https://api.pushshift.io/reddit/search/submission/?title="My anxiety"&limit=1000&before=1612069200&subreddit=anxiety
https://api.pushshift.io/reddit/search/submission/?title="My anxiety"&limit=1000&before=1610842016&subreddit=anxiety
200
https://api.pushshift.io/reddit/search/submission/?title="My anxiety"&limit=1000&before=1609810736&subreddit=anxiety
300
https://api.pushshift.io/reddit/search/submission/?title="My anxiety"&limit=1000&before=1608626295&subreddit=anxiety
400
https://api.pushshift.io/reddit/search/submission/?title="My anxiety"&limit=1000&before=1607548814&subreddit=anxiety
500
https://api.pushshift.io/reddit/search/submission/?title="My anxiety"&limit=1000&before=1606509945&subreddit=anxiety
600
https

In [None]:
anxiety_dict.keys()

dict_keys(['finessegangx', 'VanStrategist', 'polin_da', 'bobobaa', 'RyanJKaz', 'unofficialmoderator', '123ww55ssopa', 'exceptionallybland', 'kaupa_lupa', 'TripleGenesis', 'Daniaximehc', 'eddie90100', 'chunky_monkey96', '[deleted]', 'DaughterOfRageNLove_', 'keeperoftheworld69', 'iamreallytired123', 'throw4w4ygirl', 'Happy_Artichoke_108', 'OtherwiseCranberry88', 'bab_101', 'IamAlextric', 'SlipOutrageous5333', 'Aztec690', 'yungsaturn827', 'april_eleven', 'sylvia_bloodbath', 'bearface_stan', 'ehnotreallyokay', 'little-big-endian', 'MindsetMoment', 'EclecticCuriosity', 'Feeling-Yak7354', 'DeadHallow', 'Hacibay', 'f30bim', 'Secure-Imagination11', 'ChaseSpike11', 'winnie_bago', 'WisdomLove123', '_ActualHumanGarbage', 'Aromatic_Industry_73', 'Petrichor1995', 'lemonsandkevins', 'jahb0i', 'floorgang_gang', 'ksyrup', 'mtaylorcs', 'coffeegorl22', 'vanillacupcakess', 'ilovemycutechins', 'Sick_Bubbl3gum', 'Somepenguinsss', 'kaphwin', 'Eucharism', 'sweetpeabb', 'MrAlexander18', 'ilikesims4', 'ArcticE

## Protect Usernames for confidentiality purposes

In this section, I encode the usernames of those in the "experimental" classes in order to provide some confidentiality. 
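
As a rough sketch of the encoding idea, the snippet below maps each username to an integer and also keeps the reverse mapping, so that a code can be turned back into a name with a single dictionary lookup. `build_codebook` and the example usernames are hypothetical. Sorting the names is an assumption I add here so that the codes come out the same on every run.

```python
from typing import Dict, Iterable, Tuple


def build_codebook(names: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Return (name -> code, code -> name) dictionaries for the unique names."""
    unique = sorted(set(names))  # sorting makes the codes reproducible across runs
    encode = {name: i for i, name in enumerate(unique)}
    decode = {i: name for name, i in encode.items()}
    return encode, decode


encode, decode = build_codebook(["user_b", "user_a", "user_b", "user_c"])
print(encode)     # {'user_a': 0, 'user_b': 1, 'user_c': 2}
print(decode[0])  # user_a
```

Iterating over a plain set of strings can give a different order in each Python session, so without the sort the same username may receive a different code each time the notebook is rerun.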

In [None]:
# delete users that appear in multiple dictionaries. I chose to do this because the overlap was less than 100
data_dictionaries = [anxiety_dict, adhd_dict, bpd_dict, bipolar_dict, dep_dict, ptsd_dict]
username_list = list(anxiety_dict.keys())+list(adhd_dict.keys()) + list(bpd_dict.keys()) + list( bipolar_dict.keys()) + list(dep_dict.keys()) + list(ptsd_dict.keys())

import collections
duplicates = [item for item, count in collections.Counter(username_list).items() if count > 1]

for entry in data_dictionaries:
  for duplicate in duplicates:
    if duplicate in entry.keys():
      entry.pop(duplicate)

#make new username_list 
username_list = list(anxiety_dict.keys())+list(adhd_dict.keys()) + list(bpd_dict.keys()) + list( bipolar_dict.keys()) + list(dep_dict.keys()) + list(ptsd_dict.keys())
print(len(username_list))
print(len(set(username_list)))

7994
7994


In [None]:
def codeUsernames(names):
  """
  creates a dictionary mapping each name to a number representing it.  Returns the dictionary. 
  """
  username_set = set(names)
  #enumerate returns 2 variables, the count and the value of the item at that iteration 
  token_map = {name: i for i, name in enumerate(username_set)} 
  return token_map


In [None]:
#dictionary of encoded usernames
username_encoded = codeUsernames(username_list)


In [None]:
username_encoded.keys()



In [None]:
def getName(val):
  """When given the number, return the username associated with it"""
  for key, value in username_encoded.items():
    if val == value:
      return key
  return "NaN"

In [None]:
def anonymize_data(data):
  """
  takes in dictionary, and replaces the key (username) with the encoding of the username
  """
  old_names = list(data.keys())
  for name in old_names:
    new_key = username_encoded.get(name)
    data[new_key]=data.pop(name)

In [None]:
# anonymizing all the data 

data_dictionaries = [anxiety_dict, adhd_dict, bpd_dict, bipolar_dict, dep_dict, ptsd_dict]

for entry in data_dictionaries:
  anonymize_data(entry)

dep_dict.keys()

dict_keys([778, 3984, 566, 2704, 3802, 5682, 6973, 660, 2109, 2749, 5276, 3356, 1292, 5032, 2430, 2981, 1384, 4082, 7215, 1698, 1876, 5501, 2938, 6870, 1575, 5085, 2292, 5349, 3978, 5940, 4487, 2852, 6240, 4601, 1152, 931, 6537, 636, 5136, 2807, 6303, 5998, 6376, 7177, 4067, 1230, 4532, 6531, 3979, 5415, 1842, 5126, 3301, 7018, 3653, 7515, 5332, 167, 7641, 728, 5377, 393, 7231, 4350, 592, 6078, 4587, 6370, 75, 6313, 5517, 3137, 5555, 4117, 6036, 907, 4710, 5273, 5622, 6573, 1316, 6145, 4351, 1996, 3555, 5410, 3498, 5626, 3601, 5985, 600, 3061, 3475, 537, 5528, 5390, 5185, 7799, 1534, 7003, 264, 3157, 7088, 661, 785, 7451, 3258, 3440, 5098, 7706, 6956, 6414, 4245, 2692, 6317, 4238, 5606, 6740, 6202, 1488, 6802, 706, 1740, 5308, 3918, 3494, 3577, 543, 4608, 4663, 5755, 1336, 6304, 4764, 6218, 7090, 2116, 982, 6285, 4057, 6212, 5402, 3829, 1653, 4174, 2742, 876, 5033, 7895, 7448, 6842, 4282, 2449, 6529, 7815, 7453, 4379, 47, 1722, 6189, 3166, 198, 1349, 1061, 6696, 4358, 6882, 7666, 6785,

## Creating dataframes for experimental cases 

I create a dataframe for each group and write it to file. 
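
To illustrate the dictionary-to-dataframe step on a small scale, here is a sketch with made-up records and only three columns. The record values are hypothetical. Writing to an in-memory buffer stands in for writing a file, and `index=False` is an optional choice that keeps the row index from being written as an extra column.

```python
import io

import pandas as pd

# Hypothetical records keyed by encoded author id.
records = {
    17: ["abc123", "Example title", 1600000000],
    42: ["def456", "Another title", 1600000500],
}

# Each dictionary key becomes a row label; the list entries fill the named columns.
df = pd.DataFrame.from_dict(records, orient="index",
                            columns=["sub_id", "title", "created_utc"])
# Move the row labels into a regular column called 'author'.
df = df.reset_index().rename(columns={"index": "author"})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```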

In [None]:
def createDataFrame(data):
  """
  Creates a dataframe from a dictionary, set up for submissions
  """
  return_df = pd.DataFrame.from_dict(data, orient='index', columns=
                                    ['sub_id', 'title', 'url', 'score', 'created', 'created_utc',
                                     'numComments', 'permalink', 'flair', 'selfext', 'subreddit'])
  return_df.reset_index(inplace=True)
  return_df = return_df.rename(columns = {'index': 'author'})
  return return_df

In [None]:

anxiety_df = createDataFrame(anxiety_dict)
adhd_df = createDataFrame(adhd_dict)
bpd_df = createDataFrame(bpd_dict)
bipolar_df = createDataFrame(bipolar_dict)
dep_df = createDataFrame(dep_dict)
ptsd_df = createDataFrame(ptsd_dict)

ptsd_df.head()

Unnamed: 0,author,sub_id,title,url,score,created,created_utc,numComments,permalink,flair,selfext,subreddit
0,7778,izgr81,I was diagnosed with PTSD today,https://www.reddit.com/r/ptsd/comments/izgr81/...,1,2020-09-25 09:57:04,1601027824,0,/r/ptsd/comments/izgr81/i_was_diagnosed_with_p...,,So I got an official detailed diagnosis from a...,ptsd
1,787,i5ijy1,I was diagnosed with PTSD yesterday,https://www.reddit.com/r/ptsd/comments/i5ijy1/...,1,2020-08-07 17:55:33,1596822933,4,/r/ptsd/comments/i5ijy1/i_was_diagnosed_with_p...,,I went to a psychiatrist a day ago and after a...,ptsd
2,5895,hopdot,"So, I was diagnosed with PTSD recently...",https://www.reddit.com/r/ptsd/comments/hopdot/...,1,2020-07-10 13:57:12,1594389432,8,/r/ptsd/comments/hopdot/so_i_was_diagnosed_wit...,,I don’t really know how to handle the diagnosi...,ptsd
3,4880,hnub4q,Today I was diagnosed with PTSD.,https://www.reddit.com/r/ptsd/comments/hnub4q/...,1,2020-07-09 01:49:55,1594259395,13,/r/ptsd/comments/hnub4q/today_i_was_diagnosed_...,,TW for brief abuse/addiction/death mention\n\n...,ptsd
4,3733,hmhqox,"I was diagnosed with PTSD today, what are my n...",https://www.reddit.com/r/ptsd/comments/hmhqox/...,1,2020-07-06 22:11:09,1594073469,5,/r/ptsd/comments/hmhqox/i_was_diagnosed_with_p...,,"in 2015, I was nearly murdered and from 2010-2...",ptsd


In [None]:
dep_df.head()

Unnamed: 0,author,sub_id,title,url,score,created,created_utc,numComments,permalink,flair,selfext,subreddit
0,2477,k3yk8o,"After breakup, I feel completely numb, is this...",https://www.reddit.com/r/depression/comments/k...,1,2020-11-30 16:00:33,1606752033,0,/r/depression/comments/k3yk8o/after_breakup_i_...,,[removed],depression
1,7763,j950tb,Friday I was diagnosed with depression and tod...,https://www.reddit.com/r/depression/comments/j...,1,2020-10-11 13:01:39,1602421299,0,/r/depression/comments/j950tb/friday_i_was_dia...,,Hello everyone? I'm new here! \n\nThe person w...,depression
2,6624,ija3bc,i was diagnosed with depression a couple years...,https://www.reddit.com/r/depression/comments/i...,1,2020-08-30 09:29:48,1598779788,2,/r/depression/comments/ija3bc/i_was_diagnosed_...,,most of the time when i look up what depressio...,depression
3,2834,ij8i0i,I was diagnosed with depression about a year a...,https://www.reddit.com/r/depression/comments/i...,1,2020-08-30 06:54:23,1598770463,3,/r/depression/comments/ij8i0i/i_was_diagnosed_...,,I had major depression about a year ago with m...,depression
4,4168,hyh2ac,Should I tell my brother and sisters that I wa...,https://www.reddit.com/r/depression/comments/h...,1,2020-07-26 23:21:35,1595805695,1,/r/depression/comments/hyh2ac/should_i_tell_my...,,So I (30m) was diagnosed with depression and a...,depression


In [None]:
cd drive/


/content/drive


In [None]:
ls

MyDrive/  Shareddrives/


In [None]:
cd MyDrive/

/content/drive/MyDrive


In [None]:
 cd MentalHealthReddit/

/content/drive/MyDrive/MentalHealthReddit


In [None]:
cd Data/

/content/drive/MyDrive/MentalHealthReddit/Data


In [None]:
anxiety_df.to_csv('anxiety.csv')

In [None]:
adhd_df.to_csv('adhd.csv')

In [None]:
bpd_df.to_csv('bpd.csv')

In [None]:
bipolar_df.to_csv('bipolar.csv')

In [None]:
dep_df.to_csv('dep.csv')

In [None]:
ptsd_df.to_csv('ptsd.csv')

# Get Data from other subreddits
I write helper functions to gather comments and submissions from the users identified in the above section. 
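
These helpers have to cope with responses that are not valid JSON, for example an HTML error page. One possible way to handle that is sketched below: `parse_records` is a hypothetical function that returns an empty list whenever the payload is unusable, so the calling loop needs no special placeholder value. The example payloads are invented.

```python
import json
from typing import Dict, List


def parse_records(text: str) -> List[Dict]:
    """Return the list stored under 'data', or an empty list if the payload is unusable."""
    try:
        payload = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []
    if not isinstance(payload, dict):
        return []
    records = payload.get("data", [])
    return records if isinstance(records, list) else []


print(parse_records('{"data": [{"id": "abc"}]}'))     # [{'id': 'abc'}]
print(parse_records("<html>502 Bad Gateway</html>"))  # []
print(parse_records('{"error": "rate limited"}'))     # []
```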


In [None]:
def getOtherComments(author, after):
  """
  Get comment Pushshift data (Reddit data) for the given author after the given time.
  Returns the data, or ['NaN'] if the response is not valid JSON
  """
  url = 'https://api.pushshift.io/reddit/search/comment/?author='+str(author)+'&after='+str(after)+'&size=100'
  print(url)
  r = requests.get(url)
  try:
    data = json.loads(r.text)
    return data['data']
  except ValueError:
    return ['NaN']

In [None]:
def getOtherSubmissions(author, after):
  """
  Get submission pushshift data (reddit data) for the given author after the given time. 
  Returns the data
  """

  url = 'https://api.pushshift.io/reddit/search/submission/?author='+str(author)+'&after='+str(after)+'&size=100'
  print(url)
  r = requests.get(url)
  print("passed request")
  try:
    data = json.loads(r.text)
    return data['data']
  except ValueError:
    # the response was not valid JSON
    return ['NaN']

In [None]:
def getOtherCommentData(df):
  """
  Takes in a dataframe, and uses the author and date columns to search for more comments by that user
  Returns a dictionary with the new information
  """
  comment_dict = {}
  # 'created' is used instead of 'time' to avoid shadowing the time module
  for user, created in zip(df['author'], df['created_utc']):
    username = getName(user)
    # getName returns "NaN" for an unknown code; skip those users
    if (username != 'NaN') and (username is not None):
      data = getOtherComments(username, created)
      for comment in data:
        if comment != 'NaN':
          collectCommentData(comment, comment_dict)
  return comment_dict

In [None]:
def getOtherSubmissionData(df):
  """
  Takes in a dataframe, and uses the author and date columns to search for more submissions by that user
  Returns a dictionary with the new information
  """
  submission_dict = {}
  # 'created' is used instead of 'time' to avoid shadowing the time module
  for user, created in zip(df['author'], df['created_utc']):
    username = getName(user)
    data = getOtherSubmissions(username, created)
    for submission in data:
      if submission != 'NaN':
        collectOtherSubmissionData(submission, submission_dict)
  return submission_dict

In [None]:
def anonymize_submissions(data):
  """
  Takes in a dictionary, anonymizes the username
  """
  for key in data:
    tempname = data.get(key)[0]
    codename = username_encoded.get(tempname)
    data.get(key)[0] = codename


In [None]:
alter_anxietycom = getOtherCommentData(anxiety_df)

https://api.pushshift.io/reddit/search/comment/?author=finessegangx&after=1605823041&size=100
https://api.pushshift.io/reddit/search/comment/?author=VanStrategist&after=1601773048&size=100
https://api.pushshift.io/reddit/search/comment/?author=polin_da&after=1585360223&size=100
https://api.pushshift.io/reddit/search/comment/?author=bobobaa&after=1581643539&size=100
https://api.pushshift.io/reddit/search/comment/?author=RyanJKaz&after=1576678024&size=100
https://api.pushshift.io/reddit/search/comment/?author=unofficialmoderator&after=1547585723&size=100
https://api.pushshift.io/reddit/search/comment/?author=exceptionallybland&after=1572559957&size=100
https://api.pushshift.io/reddit/search/comment/?author=kaupa_lupa&after=1552233921&size=100
https://api.pushshift.io/reddit/search/comment/?author=TripleGenesis&after=1549901547&size=100
https://api.pushshift.io/reddit/search/comment/?author=Daniaximehc&after=1548195120&size=100
https://api.pushshift.io/reddit/search/comment/?author=eddie9

In [None]:
alter_anxietysub = getOtherSubmissionData(anxiety_df)


https://api.pushshift.io/reddit/search/submission/?author=finessegangx&after=1605823041&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=VanStrategist&after=1601773048&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=polin_da&after=1585360223&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=bobobaa&after=1581643539&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=RyanJKaz&after=1576678024&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=unofficialmoderator&after=1547585723&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=exceptionallybland&after=1572559957&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=kaupa_lupa&after=1552233921&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=TripleGenesis&after=1549901547&size=100
passed 

In [None]:
anonymize_submissions(alter_anxietysub)

In [None]:
anonymize_submissions(alter_anxietycom)

In [None]:
def createDataFrameSubs(data):
  """
  Takes in a dictionary and returns a dataframe, set up for Submissions
  """

  return_df = pd.DataFrame.from_dict(data, orient='index', columns=
                                    ['author', 'title', 'url', 'score', 'created', 'created_utc',
                                     'numComms', 'permalink', 'flair', 'selftex', 'subreddit'])
  return_df.reset_index(inplace=True)
  return_df = return_df.rename(columns = {'index': 'sub_id'})
  return return_df


In [None]:
def createDataFrameComments(data):
  """
  Takes in a dictionary and returns a dataframe, set up for Comments
  """
  return_df = pd.DataFrame.from_dict(data, orient='index', columns=
                                    ['author', 'score', 'created', 'created_utc',
                                     'url', 'body', 'subreddit'])
  return_df.reset_index(inplace=True)
  return_df = return_df.rename(columns = {'index': 'com_id'})
  return return_df


In [None]:
anxiety_subs_df = createDataFrameSubs(alter_anxietysub)
anxiety_subs_df.head()

Unnamed: 0,sub_id,author,title,url,score,created,created_utc,numComms,permalink,flair,selftex,subreddit
0,lhukth,3045,Someone drop her OF stuff,https://i.redd.it/l7ro2rawwwg61.jpg,1,2021-02-11 20:58:26,1613077106,4,/r/Leximulaaa/comments/lhukth/someone_drop_her...,,,Leximulaaa
1,lm55xj,3045,Quick Question on fake percs,https://www.reddit.com/r/opiates/comments/lm55...,1,2021-02-17 21:27:01,1613597221,17,/r/opiates/comments/lm55xj/quick_question_on_f...,,So a homie of mine got some suspect looking pe...,opiates
2,mf6nri,3045,Got about 10-15 pics of her dm me. Selling the...,https://i.redd.it/rfq9fcnmatp61.jpg,1,2021-03-28 18:25:28,1616955928,0,/r/JessVinci/comments/mf6nri/got_about_1015_pi...,,,JessVinci
3,mkvmp8,3045,Noob,https://www.reddit.com/r/HappyEndingMassage/co...,1,2021-04-05 21:39:26,1617658766,5,/r/HappyEndingMassage/comments/mkvmp8/noob/,,"I’m 23, short and people have said I looked li...",HappyEndingMassage
4,mmf49y,3045,Scored First Time,https://www.reddit.com/r/HappyEndingMassage/co...,1,2021-04-07 23:45:20,1617839120,3,/r/HappyEndingMassage/comments/mmf49y/scored_f...,,I’ve never done this before but decided to giv...,HappyEndingMassage


In [None]:
anxiety_coms_df = createDataFrameComments(alter_anxietycom)
anxiety_coms_df.head(20)

Unnamed: 0,com_id,author,score,created,created_utc,url,body,subreddit
0,gcvp6f9,3045,2,2020-11-19 22:36:34,1605825394,/r/Anxiety/comments/jxc8vp/how_come_after_i_wa...,Yeah you are right it’s probably my brain anti...,Anxiety
1,gcw1cwq,3045,1,2020-11-20 00:10:50,1605831050,/r/latinas/comments/jxdt28/what_do_you_think_a...,Weak,latinas
2,gcw94v3,3045,1,2020-11-20 01:22:45,1605835365,/r/Anxiety/comments/jxeq5h/can_anxiety_general...,Yes. For me when my anxiety started after havi...,Anxiety
3,gcyec2m,3045,1,2020-11-20 16:42:34,1605890554,/r/DaniellePertusiello/comments/jsu516/hooters...,What’s her new ig,DaniellePertusiello
4,gcyoq7d,3045,1,2020-11-20 18:04:59,1605895499,/r/Anxiety/comments/jxtpgy/always_lightheadedd...,Yes I find myself kind of light headed and diz...,Anxiety
5,ghiu03o,3045,1,2020-12-30 18:36:01,1609353361,/r/damngoodinterracial/comments/kn6udk/big_boo...,Sauce?,damngoodinterracial
6,ghiv9vc,3045,1,2020-12-30 18:46:23,1609353983,/r/InterracialTeenSex/comments/kig7x0/white_gi...,Source,InterracialTeenSex
7,gmzegtw,3045,1,2021-02-11 20:58:44,1613077124,/r/Leximulaaa/comments/lhukth/someone_drop_her...,I need to see more of this lil thot,Leximulaaa
8,gmzwizx,3045,1,2021-02-11 23:14:47,1613085287,/r/Leximulaaa/comments/lhukth/someone_drop_her...,It’s on her IG,Leximulaaa
9,gmzwke5,3045,1,2021-02-11 23:15:06,1613085306,/r/Leximulaaa/comments/l7wpa7/rleximulaaa_loun...,We need her onlyfans,Leximulaaa


In [None]:
from google.colab import drive

In [None]:
anxiety_subs_df.to_csv('anxiety_subs.csv')

In [None]:

anxiety_coms_df.to_csv('anxiety_coms.csv')
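
With its default arguments, `to_csv` also writes the DataFrame's integer index as an unnamed first column, and `read_csv` names that column `Unnamed: 0` when the file is loaded again. The sketch below illustrates this on a small made-up frame, using an in-memory buffer instead of a file; passing `index=False` is one option if the extra column is not wanted, not something the notebook itself does.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for one of the comment DataFrames.
df = pd.DataFrame({"com_id": ["abc123", "def456"], "score": [3, 1]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False writes only the data columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```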

In [None]:
alter_ptsdcom = getOtherCommentData(ptsd_df)

https://api.pushshift.io/reddit/search/comment/?author=off0noff&after=1601027824&size=100
https://api.pushshift.io/reddit/search/comment/?author=-NeoRoseBud-&after=1596822933&size=100
https://api.pushshift.io/reddit/search/comment/?author=alicesann&after=1594389432&size=100
https://api.pushshift.io/reddit/search/comment/?author=not_monica&after=1594259395&size=100
https://api.pushshift.io/reddit/search/comment/?author=beatmethefuckupbro&after=1594073469&size=100
https://api.pushshift.io/reddit/search/comment/?author=TheActualMonthOfJune&after=1592288923&size=100
https://api.pushshift.io/reddit/search/comment/?author=yodawastall&after=1590522228&size=100
https://api.pushshift.io/reddit/search/comment/?author=thetreewasblue&after=1587969634&size=100
https://api.pushshift.io/reddit/search/comment/?author=w0rldsosnow&after=1576464624&size=100
https://api.pushshift.io/reddit/search/comment/?author=Fatsohuggingbear&after=1576842312&size=100
https://api.pushshift.io/reddit/search/comment/?aut

In [None]:
alter_ptsdsub = getOtherSubmissionData(ptsd_df)

https://api.pushshift.io/reddit/search/submission/?author=off0noff&after=1601027824&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=-NeoRoseBud-&after=1596822933&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=alicesann&after=1594389432&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=not_monica&after=1594259395&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=beatmethefuckupbro&after=1594073469&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=TheActualMonthOfJune&after=1592288923&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=yodawastall&after=1590522228&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=thetreewasblue&after=1587969634&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=w0rldsosnow&after=1576464624&size=100
pa

In [None]:
anonymize_submissions(alter_ptsdcom)
anonymize_submissions(alter_ptsdsub)

In [None]:
ptsd_subs_df = createDataFrameSubs(alter_ptsdsub)
ptsd_coms_df = createDataFrameComments(alter_ptsdcom)
ptsd_subs_df.head(10)

Unnamed: 0,sub_id,author,title,url,score,created,created_utc,numComms,permalink,flair,selftex,subreddit
0,j0jgjy,1599,"""I don't care if your supervisor says the same...",https://www.reddit.com/r/talesfromcallcenters/...,1,2020-09-27 02:55:03,1601175303,12,/r/talesfromcallcenters/comments/j0jgjy/i_dont...,L,I work in as a customer service representative...,talesfromcallcenters
1,j49w03,1599,"""Your company's customer service has been real...",https://www.reddit.com/r/talesfromcallcenters/...,1,2020-10-03 04:50:24,1601700624,96,/r/talesfromcallcenters/comments/j49w03/your_c...,S,"""I assume it's because of corona and you guys ...",talesfromcallcenters
2,j5fxu5,1599,Customer thinks that by threatening to give me...,https://www.reddit.com/r/talesfromcallcenters/...,1,2020-10-05 08:47:53,1601887673,0,/r/talesfromcallcenters/comments/j5fxu5/custom...,M,TLDR this guy's booking got declined because h...,talesfromcallcenters
3,jftmzb,1599,I want to resign but I can't,https://www.reddit.com/r/callcentres/comments/...,1,2020-10-22 05:23:05,1603344185,5,/r/callcentres/comments/jftmzb/i_want_to_resig...,,"So I was retreched around May, but I got calle...",callcentres
4,kohnh9,1599,My mother accidentally drove over a cat and I ...,https://www.reddit.com/r/offmychest/comments/k...,1,2021-01-01 19:48:19,1609530499,2,/r/offmychest/comments/kohnh9/my_mother_accide...,,She basically didn't notice that there was any...,offmychest
5,koig5s,1599,My mother drove over a cat &amp; I feel trauma...,https://www.reddit.com/r/TrueOffMyChest/commen...,1,2021-01-01 20:31:20,1609533080,2,/r/TrueOffMyChest/comments/koig5s/my_mother_dr...,,* posting here aswell because this sub seems m...,TrueOffMyChest
6,kpa5ps,1599,"We told the MM to take lord, he refused and th...",https://i.redd.it/fsujv31bo0961.jpg,1,2021-01-03 01:04:22,1609635862,0,/r/mobilelegends/comments/kpa5ps/we_told_the_m...,Gameplay,,mobilelegends
7,ktfgsi,1599,I am getting an authorisation error whenever I...,https://i.redd.it/j9xz7sx087a61.jpg,1,2021-01-09 00:09:43,1610150983,0,/r/CallOfDutyMobile/comments/ktfgsi/i_am_getti...,Support,,CallOfDutyMobile
8,kwsoav,1599,"I keep receiving this authorisation error, ple...",https://i.redd.it/o2tjhrshn6b61.jpg,1,2021-01-13 23:18:46,1610579926,0,/r/CallOfDutyMobile/comments/kwsoav/i_keep_rec...,Support,,CallOfDutyMobile
9,l6t52c,1599,Thoughts on quitting therapy,https://www.reddit.com/r/Advice/comments/l6t52...,1,2021-01-28 10:01:17,1611828077,3,/r/Advice/comments/l6t52c/thoughts_on_quitting...,,"For a little bit of a background, I've been di...",Advice


In [None]:
ptsd_subs_df.to_csv('ptsd_subs.csv')

In [None]:
ptsd_coms_df.to_csv('ptsd_coms.csv')

In [None]:
alter_bipolarcom = getOtherCommentData(bipolar_df)

https://api.pushshift.io/reddit/search/comment/?author=DaemonWorshiper&after=1612069164&size=100
https://api.pushshift.io/reddit/search/comment/?author=aloneagainnaturally1&after=1606921104&size=100
https://api.pushshift.io/reddit/search/comment/?author=cerebral_thunders&after=1612062258&size=100
https://api.pushshift.io/reddit/search/comment/?author=mo282&after=1612053523&size=100
https://api.pushshift.io/reddit/search/comment/?author=jibberjabbery&after=1604358114&size=100
https://api.pushshift.io/reddit/search/comment/?author=CurveGalaxy&after=1610675980&size=100
https://api.pushshift.io/reddit/search/comment/?author=picodegalloyum7&after=1604286638&size=100
https://api.pushshift.io/reddit/search/comment/?author=midheaven111&after=1609342272&size=100
https://api.pushshift.io/reddit/search/comment/?author=twotimesbi&after=1612042553&size=100
https://api.pushshift.io/reddit/search/comment/?author=throwawayaccnt34567&after=1612042393&size=100
https://api.pushshift.io/reddit/search/comm

In [None]:
alter_bipolarsub = getOtherSubmissionData(bipolar_df)

https://api.pushshift.io/reddit/search/submission/?author=DaemonWorshiper&after=1612069164&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=aloneagainnaturally1&after=1606921104&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=cerebral_thunders&after=1612062258&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=mo282&after=1612053523&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=jibberjabbery&after=1604358114&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=CurveGalaxy&after=1610675980&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=picodegalloyum7&after=1604286638&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=midheaven111&after=1609342272&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=twotimesbi&after=1612042553&size=1

In [None]:
anonymize_submissions(alter_bipolarcom)
anonymize_submissions(alter_bipolarsub)

In [None]:
bipolar_subs_df = createDataFrameSubs(alter_bipolarsub)
bipolar_coms_df = createDataFrameComments(alter_bipolarcom)

bipolar_subs_df.head(10)

Unnamed: 0,sub_id,author,title,url,score,created,created_utc,numComms,permalink,flair,selftex,subreddit
0,l9sdjl,844,I physically cringed when I saw this in my Fac...,https://i.redd.it/1k7jipwa0se61.jpg,1928,2021-02-01 02:20:22,1612146022,239,/r/ForwardsFromKlandma/comments/l9sdjl/i_physi...,,,ForwardsFromKlandma
1,lq2kbr,844,Is water filtered by a Zero Water brand filter...,https://www.reddit.com/r/carnivorousplants/com...,1,2021-02-22 22:58:58,1614034738,2,/r/carnivorousplants/comments/lq2kbr/is_water_...,,I specifically got a Drosera Capensis because ...,carnivorousplants
2,lwg49b,844,Should I be at all worried about these brown s...,https://i.redd.it/rgclnsq6epk61.jpg,1,2021-03-02 23:57:42,1614729462,4,/r/carnivorousplants/comments/lwg49b/should_i_...,,,carnivorousplants
3,mvoy4o,844,I shared the article in the waronrugs and pooc...,https://i.redd.it/mlty4gsw8lu61.jpg,8,2021-04-21 20:42:16,1619037736,7,/r/SafeMoon/comments/mvoy4o/i_shared_the_artic...,wen rug!?,,SafeMoon
4,mw5luq,844,r/CryptoCurrency is on the attack now.,https://www.reddit.com/r/SafeMoon/comments/mw5...,1,2021-04-22 14:10:42,1619100642,41,/r/SafeMoon/comments/mw5luq/rcryptocurrency_is...,Information/News,There’s some very popular posts in the r/Crypt...,SafeMoon
5,mz106g,844,"Just a reminder to those checking us out from,...",https://www.reddit.com/r/SafeMoon/comments/mz1...,1,2021-04-26 16:00:36,1619452836,2,/r/SafeMoon/comments/mz106g/just_a_reminder_to...,General,[removed],SafeMoon
6,mz1vdq,844,"Just a reminder to those checking us out from,...",https://www.reddit.com/r/SafeMoon/comments/mz1...,1,2021-04-26 16:38:49,1619455129,4,/r/SafeMoon/comments/mz1vdq/just_a_reminder_to...,General,They have their own crypto currency on that su...,SafeMoon
7,n4ugde,844,Just a reminder for everyone. The community is...,https://www.reddit.com/r/SafeMoon/comments/n4u...,1,2021-05-04 17:42:53,1620150173,3,/r/SafeMoon/comments/n4ugde/just_a_reminder_fo...,Community Unity,I’ve noticed that amidst the community hype th...,SafeMoon
8,nrdbc9,844,Anyone know why I received a random ONE in tru...,https://www.reddit.com/r/trustapp/comments/nrd...,1,2021-06-03 13:18:29,1622726309,3,/r/trustapp/comments/nrdbc9/anyone_know_why_i_...,General question,[removed],trustapp
9,k7994d,4998,Losing friends and relationships during depres...,https://www.reddit.com/r/BipolarReddit/comment...,1,2020-12-05 15:21:39,1607181699,5,/r/BipolarReddit/comments/k7994d/losing_friend...,,Anyone else find it difficult to maintain frie...,BipolarReddit


In [None]:
bipolar_coms_df.to_csv('bipolar_coms.csv')

In [None]:
bipolar_subs_df.to_csv('bipolar_subs.csv')

In [None]:
alter_bpdcom = getOtherCommentData(bpd_df)

https://api.pushshift.io/reddit/search/comment/?author=-artistoverhere-&after=1612069153&size=100
https://api.pushshift.io/reddit/search/comment/?author=AQuietBorderline&after=1612068753&size=100
https://api.pushshift.io/reddit/search/comment/?author=retrogradecapricorn&after=1611308290&size=100
https://api.pushshift.io/reddit/search/comment/?author=treestandingbrow&after=1612065962&size=100
https://api.pushshift.io/reddit/search/comment/?author=tatoo1ne&after=1612065290&size=100
https://api.pushshift.io/reddit/search/comment/?author=StephanieSpoiler&after=1612065273&size=100
https://api.pushshift.io/reddit/search/comment/?author=Ca3Al2Si3O12&after=1612064981&size=100
https://api.pushshift.io/reddit/search/comment/?author=throwaway3098439&after=1612062749&size=100
https://api.pushshift.io/reddit/search/comment/?author=youneedmeat&after=1611853054&size=100
https://api.pushshift.io/reddit/search/comment/?author=Buckmiester2017&after=1612062154&size=100
https://api.pushshift.io/reddit/sea

In [None]:
alter_bpdsub = getOtherSubmissionData(bpd_df)

https://api.pushshift.io/reddit/search/submission/?author=-artistoverhere-&after=1612069153&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=AQuietBorderline&after=1612068753&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=retrogradecapricorn&after=1611308290&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=treestandingbrow&after=1612065962&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=tatoo1ne&after=1612065290&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=StephanieSpoiler&after=1612065273&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=Ca3Al2Si3O12&after=1612064981&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=throwaway3098439&after=1612062749&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=youneedmeat&after=16118

In [None]:
anonymize_submissions(alter_bpdcom)
anonymize_submissions(alter_bpdsub)

In [None]:
bpd_subs_df = createDataFrameSubs(alter_bpdsub)
bpd_coms_df = createDataFrameComments(alter_bpdcom)
bpd_subs_df.head(10)

Unnamed: 0,sub_id,author,title,url,score,created,created_utc,numComms,permalink,flair,selftex,subreddit
0,lc0zcg,2134,My Theory on BPD Behavior and Empathy - (Some ...,https://www.reddit.com/r/BPD/comments/lc0zcg/m...,458,2021-02-03 23:03:35,1612393415,71,/r/BPD/comments/lc0zcg/my_theory_on_bpd_behavi...,Lesson Learned,It is difficult to focus on another persons' f...,BPD
1,lfi3mf,2134,It must be possible to change - (As far as rel...,https://www.reddit.com/r/BPD/comments/lfi3mf/i...,1,2021-02-08 18:22:52,1612808572,0,/r/BPD/comments/lfi3mf/it_must_be_possible_to_...,Relationships,Psychologists are beginning to believe that it...,BPD
2,ljuzpq,2134,What's the best way handle an intense mood shi...,https://www.reddit.com/r/BPD/comments/ljuzpq/w...,1,2021-02-14 19:11:22,1613329882,3,/r/BPD/comments/ljuzpq/whats_the_best_way_hand...,Quiet Borderline,Have you ever been talking to someone when som...,BPD
3,lmni1n,2134,Personality Disorder vs. Learned Behavior,https://www.reddit.com/r/psychologyresearch/co...,1,2021-02-18 14:05:36,1613657136,22,/r/psychologyresearch/comments/lmni1n/personal...,,Imagine a scenario where a child grows up arou...,psychologyresearch
4,lmurha,2134,If BPD symptoms only come out following trauma...,https://www.reddit.com/r/BPD/comments/lmurha/i...,1,2021-02-18 19:20:15,1613676015,0,/r/BPD/comments/lmurha/if_bpd_symptoms_only_co...,Perspective Needed,"Is it really due to personality, or simply a r...",BPD
5,lnjowt,2134,It feels hopeless and like I would be doing th...,https://www.reddit.com/r/SuicideWatch/comments...,1,2021-02-19 16:18:02,1613751482,0,/r/SuicideWatch/comments/lnjowt/it_feels_hopel...,,I'm feeling very suicidal because I struggle s...,SuicideWatch
6,lpy1jg,2134,Did medication actually help you?,https://www.reddit.com/r/depression/comments/l...,1,2021-02-22 19:53:04,1614023584,11,/r/depression/comments/lpy1jg/did_medication_a...,,I've been depressed for over 15 years and some...,depression
7,m0uk1b,2134,"With enough dedication and work, do you believ...",https://www.reddit.com/r/BPD/comments/m0uk1b/w...,1,2021-03-09 00:38:10,1615250290,2,/r/BPD/comments/m0uk1b/with_enough_dedication_...,Relationships,This is the area I keep thinking there has to ...,BPD
8,m0upvh,2134,Another reason I sometimes think I was misdiag...,https://www.reddit.com/r/BPD/comments/m0upvh/a...,1,2021-03-09 00:47:02,1615250822,7,/r/BPD/comments/m0upvh/another_reason_i_someti...,Venting,One of the key features of BPD is unstable rel...,BPD
9,m1biah,2134,Can BPD be misdiagnosed due to the right combi...,https://www.reddit.com/r/BPD/comments/m1biah/c...,1,2021-03-09 17:30:04,1615311004,0,/r/BPD/comments/m1biah/can_bpd_be_misdiagnosed...,Perspective Needed,"For example, I've also been diagnosed with PTS...",BPD


In [None]:
bpd_subs_df.to_csv('bpd_subs.csv')

In [None]:
bpd_coms_df.to_csv('bpd_coms.csv')

In [None]:
alter_adhdcom = getOtherCommentData(adhd_df)

https://api.pushshift.io/reddit/search/comment/?author=aboze04&after=1612026856&size=100
https://api.pushshift.io/reddit/search/comment/?author=Vibrantical&after=1611949543&size=100
https://api.pushshift.io/reddit/search/comment/?author=Mk_it_so&after=1611253679&size=100
https://api.pushshift.io/reddit/search/comment/?author=paul-pryor&after=1611247677&size=100
https://api.pushshift.io/reddit/search/comment/?author=straymender&after=1609613007&size=100
https://api.pushshift.io/reddit/search/comment/?author=kanjobanjo17&after=1608226638&size=100
https://api.pushshift.io/reddit/search/comment/?author=SylviaTheKitty&after=1606931085&size=100
https://api.pushshift.io/reddit/search/comment/?author=kaprielmoon&after=1606402994&size=100
https://api.pushshift.io/reddit/search/comment/?author=Jayu2&after=1606220263&size=100
https://api.pushshift.io/reddit/search/comment/?author=universemessages&after=1603959680&size=100
https://api.pushshift.io/reddit/search/comment/?author=sexi_squidward&after

In [None]:
alter_adhdsub = getOtherSubmissionData(adhd_df)

https://api.pushshift.io/reddit/search/submission/?author=aboze04&after=1612026856&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=Vibrantical&after=1611949543&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=Mk_it_so&after=1611253679&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=paul-pryor&after=1611247677&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=straymender&after=1609613007&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=kanjobanjo17&after=1608226638&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=SylviaTheKitty&after=1606931085&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=kaprielmoon&after=1606402994&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=Jayu2&after=1606220263&size=100
passed request
https://api

In [None]:
anonymize_submissions(alter_adhdcom)
anonymize_submissions(alter_adhdsub)

In [None]:
adhd_subs_df = createDataFrameSubs(alter_adhdsub)
adhd_coms_df = createDataFrameComments(alter_adhdcom)
adhd_subs_df.head(10)

Unnamed: 0,sub_id,author,title,url,score,created,created_utc,numComms,permalink,flair,selftex,subreddit
0,l8pbdd,7528,I've been feeding the wildbirds since I've bee...,https://i.redd.it/ijtndiqbyhe61.jpg,2,2021-01-30 16:31:31,1612024291,4,/r/IDAP/comments/l8pbdd/ive_been_feeding_the_w...,,,IDAP
1,l8r6bs,7528,Now I'm wondering about everything I am and do...,https://www.reddit.com/r/ADHD/comments/l8r6bs/...,1,2021-01-30 17:50:28,1612029028,2,/r/ADHD/comments/l8r6bs/now_im_wondering_about...,Questions/Advice/Support,It's been three days since my ADHD-C diagnosis...,ADHD
2,l750h4,4705,Tips for taking short hair dogs snowshoeing???...,https://i.redd.it/264te4a0d4e61.jpg,3,2021-01-28 18:48:50,1611859730,20,/r/snowshoeing/comments/l750h4/tips_for_taking...,,,snowshoeing
3,ln7mee,4705,Does anyone have suggestions for a freediving ...,https://www.reddit.com/r/freediving/comments/l...,1,2021-02-19 05:16:48,1613711808,2,/r/freediving/comments/ln7mee/does_anyone_have...,,For the next six months I’m going hard into tr...,freediving
4,m2jg04,4705,Recommendations for a GPS tracker for my backc...,https://www.reddit.com/r/dogs/comments/m2jg04/...,1,2021-03-11 06:13:49,1615443229,2,/r/dogs/comments/m2jg04/recommendations_for_a_...,,[removed],dogs
5,m2jmtl,4705,[Discussion] I’m looking for recommendations o...,https://www.reddit.com/r/dogs/comments/m2jmtl/...,1,2021-03-11 06:25:46,1615443946,13,/r/dogs/comments/m2jmtl/discussion_im_looking_...,Misc,Does anyone have any experience with dog GPS’s...,dogs
6,ndgrl5,4705,I’d love to know more about the lichen in the ...,https://i.redd.it/wqs6wiw1yez61.jpg,1,2021-05-16 04:51:30,1621140690,2,/r/Lichen/comments/ndgrl5/id_love_to_know_more...,,,Lichen
7,kpu68m,1334,"The popular ""wasting your life post"" expanded ...",https://www.reddit.com/r/ADHD/comments/kpu68m/...,1,2021-01-03 22:03:24,1609711404,2,/r/ADHD/comments/kpu68m/the_popular_wasting_yo...,,"I'm new to this sub as of last month, and i've...",ADHD
8,kqj8ah,1334,"Coping with grief, guilt, and shame and moving...",https://www.reddit.com/r/BreakUps/comments/kqj...,1,2021-01-04 22:06:01,1609797961,5,/r/BreakUps/comments/kqj8ah/coping_with_grief_...,,"After getting broke up with, obviously you fee...",BreakUps
9,kr7she,1334,Grieving a pet that no longer belongs to you,https://www.reddit.com/r/Petloss/comments/kr7s...,1,2021-01-05 21:05:52,1609880752,5,/r/Petloss/comments/kr7she/grieving_a_pet_that...,,This is different from many of the people on h...,Petloss


In [None]:
adhd_subs_df.to_csv('adhd_subs.csv')

In [None]:
adhd_coms_df.to_csv('adhd_coms.csv')

In [None]:
dep_df.head()

Unnamed: 0,author,sub_id,title,url,score,created,created_utc,numComments,permalink,flair,selfext,subreddit
0,2277,k3yk8o,"After breakup, I feel completely numb, is this...",https://www.reddit.com/r/depression/comments/k...,1,2020-11-30 16:00:33,1606752033,0,/r/depression/comments/k3yk8o/after_breakup_i_...,,[removed],depression
1,529,j950tb,Friday I was diagnosed with depression and tod...,https://www.reddit.com/r/depression/comments/j...,1,2020-10-11 13:01:39,1602421299,0,/r/depression/comments/j950tb/friday_i_was_dia...,,Hello everyone? I'm new here! \n\nThe person w...,depression
2,5046,ija3bc,i was diagnosed with depression a couple years...,https://www.reddit.com/r/depression/comments/i...,1,2020-08-30 09:29:48,1598779788,2,/r/depression/comments/ija3bc/i_was_diagnosed_...,,most of the time when i look up what depressio...,depression
3,6557,ij8i0i,I was diagnosed with depression about a year a...,https://www.reddit.com/r/depression/comments/i...,1,2020-08-30 06:54:23,1598770463,3,/r/depression/comments/ij8i0i/i_was_diagnosed_...,,I had major depression about a year ago with m...,depression
4,4683,hyh2ac,Should I tell my brother and sisters that I wa...,https://www.reddit.com/r/depression/comments/h...,1,2020-07-26 23:21:35,1595805695,1,/r/depression/comments/hyh2ac/should_i_tell_my...,,So I (30m) was diagnosed with depression and a...,depression


In [None]:
alter_depcom = getOtherCommentData(dep_df)

https://api.pushshift.io/reddit/search/comment/?author=Funny_Eyebrow&after=1606752033&size=100
https://api.pushshift.io/reddit/search/comment/?author=leafy_and_lethal&after=1602421299&size=100
https://api.pushshift.io/reddit/search/comment/?author=celtree03&after=1598779788&size=100
https://api.pushshift.io/reddit/search/comment/?author=koobtooboob&after=1598770463&size=100
https://api.pushshift.io/reddit/search/comment/?author=sometimesitbelikedat&after=1595805695&size=100
https://api.pushshift.io/reddit/search/comment/?author=ugly_worthless_trash&after=1595792408&size=100
https://api.pushshift.io/reddit/search/comment/?author=DeadlyDan123&after=1594688115&size=100
https://api.pushshift.io/reddit/search/comment/?author=Ruby_Lowe&after=1589411320&size=100
https://api.pushshift.io/reddit/search/comment/?author=Chrisrawraw&after=1586678766&size=100
https://api.pushshift.io/reddit/search/comment/?author=la_d0xa&after=1585072717&size=100
https://api.pushshift.io/reddit/search/comment/?auth

In [None]:
alter_depsub = getOtherSubmissionData(dep_df)

https://api.pushshift.io/reddit/search/submission/?author=Funny_Eyebrow&after=1606752033&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=leafy_and_lethal&after=1602421299&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=celtree03&after=1598779788&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=koobtooboob&after=1598770463&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=sometimesitbelikedat&after=1595805695&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=ugly_worthless_trash&after=1595792408&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=DeadlyDan123&after=1594688115&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=Ruby_Lowe&after=1589411320&size=100
passed request
https://api.pushshift.io/reddit/search/submission/?author=Chrisrawraw&after=1586678766&siz

In [None]:
anonymize_submissions(alter_depcom)


In [None]:
anonymize_submissions(alter_depsub)

In [None]:
dep_subs_df = createDataFrameSubs(alter_depsub)

In [None]:
dep_subs_df.head()

Unnamed: 0,sub_id,author,title,url,score,created,created_utc,numComms,permalink,flair,selftex,subreddit
0,k40gmp,778,"After breakup, I feel completely numb, is this...",https://www.reddit.com/r/relationship_advice/c...,1,2020-11-30 17:29:42,1606757382,2,/r/relationship_advice/comments/k40gmp/after_b...,,[removed],relationship_advice
1,k416rs,778,I’ve been feeling numb since my breakup that h...,https://www.reddit.com/r/relationship_advice/c...,1,2020-11-30 18:00:53,1606759253,2,/r/relationship_advice/comments/k416rs/ive_bee...,,[removed],relationship_advice
2,k41j0f,778,"Why is this happening? After breakup, I feel c...",https://www.reddit.com/r/relationship_advice/c...,1,2020-11-30 18:15:51,1606760151,3,/r/relationship_advice/comments/k41j0f/why_is_...,,"\nFeeling numb after breakup, is this a normal...",relationship_advice
3,k4b7mu,778,"Why is this happening? After breakup, I feel c...",https://www.reddit.com/r/relationship_advice/c...,1,2020-12-01 02:10:18,1606788618,1,/r/relationship_advice/comments/k4b7mu/why_is_...,,"\n\nFeeling numb after breakup, is this a norm...",relationship_advice
4,k4rqs5,778,I don’t know why this is happening. It’s been ...,https://www.reddit.com/r/relationship_advice/c...,1,2020-12-01 19:13:17,1606849997,9,/r/relationship_advice/comments/k4rqs5/i_dont_...,,"I’m confused about my feelings. So, I have a h...",relationship_advice


In [None]:

dep_coms_df = createDataFrameComments(alter_depcom)

In [None]:
dep_coms_df.head()

Unnamed: 0,com_id,author,score,created,created_utc,url,body,subreddit
0,ge69z36,Funny_Eyebrow,1,2020-11-30 18:48:37,1606762117,/r/relationship_advice/comments/k41j0f/why_is_...,"Yeah, I think so too. I think I’ll be okay for...",relationship_advice
1,geaf0o2,Funny_Eyebrow,1,2020-12-01 19:26:35,1606850795,/r/relationship_advice/comments/k4rqs5/i_dont_...,"Damn, Now that I pictured I kinda feel a littl...",relationship_advice
2,geafutc,Funny_Eyebrow,1,2020-12-01 19:32:58,1606851178,/r/relationship_advice/comments/k4rqs5/i_dont_...,Because that means she probably moved on and I...,relationship_advice
3,geagp0q,Funny_Eyebrow,1,2020-12-01 19:39:22,1606851562,/r/relationship_advice/comments/k4rqs5/i_dont_...,"Nope, it wasn’t mutual. She actually broke up ...",relationship_advice
4,geaj1xa,Funny_Eyebrow,1,2020-12-01 19:57:20,1606852640,/r/relationship_advice/comments/k4rqs5/i_dont_...,"Yeah, Thank you for the advice. Truly apprecia...",relationship_advice


In [None]:
ls

dep_subs.csv  drive/  sample_data/


In [None]:
cd MyDrive/MentalHealthReddit/Data/

/content/drive/MyDrive/MentalHealthReddit/Data


In [None]:
dep_subs_df.to_csv('dep_subs.csv')

In [None]:
dep_coms_df.to_csv('dep_coms.csv')

## Get Control Group

I identified the subreddits that the users above posted to (the analysis is in mentalhealthredditanalysis.ipynb) and gathered users from those subreddits to serve as the "control" group. I create new helper functions that gather data from those subreddits and then gather more data from the users identified there.
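
As a rough illustration of the first step, the sketch below builds a Pushshift submission-search URL for a single subreddit. The function name `build_subreddit_url` is hypothetical and is not one of the notebook's helpers; the `subreddit`, `before`, and `size` parameters are assumed from the request URLs used earlier in this notebook. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.pushshift.io/reddit/search/submission/"

def build_subreddit_url(subreddit, before, size=100):
    """Return a search URL for submissions in `subreddit` posted before `before` (UTC epoch seconds)."""
    # urlencode escapes the values and joins them with '&' in insertion order.
    params = {"subreddit": subreddit, "before": before, "size": size}
    return BASE_URL + "?" + urlencode(params)

example_url = build_subreddit_url("AskReddit", 1612069164)
print(example_url)
```

Building the query with `urlencode` rather than string concatenation keeps names with special characters from producing a malformed URL.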

In [None]:
control_subreddits = ['AskReddit',
 'AmItheAsshole',
 'teenagers',
 'relationship_advice',
 'relationships',
 'aww',
 'offmychest',
 'NoStupidQuestions',
 'unpopularopinion',
 'pics',
 'PurplePillDebate',
 'todayilearned',
 'Advice',
 'AskTeenGirls',
 'bois',
 'AskWomen',
 'hiphopheads',
 'trees',
 'PokemonTCG',
 'stopdrinking',
 'funny',
 'gonewild',
 'worldnews',
 'MaddenUltimateTeam',
 'loseit',
 'WTF',
 'childfree',
 'Depop',
 'trumpet',
 'lawofattraction',
 'AskWomenOver30',
 'Hamilton',
 'askanincel',
 'homestuck',
 'asktrp',
 'Aquariums',
 'PoliticalCompassMemes',
 'sex',
 'MakeNewFriendsHere',
 'Eberron',
 'AskDocs',
 'asktransgender',
 'Showerthoughts',
 'randomactsofamazon',
 'MakeupAddiction',
 'actuallesbians',
 'RealEstate',
 'CBD',
 'politics',
 'ImaginaryHorrors',
 'survivor',
 'FreeCompliments',
 'thesims',
 'WWEGames',
 'memes',
 'ExNoContact',
 'UnsentLetters',
 'dogecoin',
 'weed',
 'AskMen',
 'confession',
 'nba',
 'GERD',
 'intj',
 'gifs',
 'JUSTNOMIL',
 'leagueoflegends',
 'Hashimotos',
 'TumblrInAction',
 'forhonor',
 'TeenMomOGandTeenMom2',
 'mildlyinteresting',
 'TwoXChromosomes',
 'personalfinance',
 'legaladvice',
 'TooAfraidToAsk',
 'Parenting',
 'playboicarti',
 'Coronavirus',
 'horror',
 'gaming',
 'Needafriend',
 'medical',
 'CasualConversation',
 'de',
 'india',
 'dogs',
 'Drugs',
 'confessions',
 'cats',
 'RoastMe',
 'Narcolepsy',
 'nyc',
 'videos',
 'tattoos',
 'bisexual',
 'news',
 'wholesomememes',
 'trashy',
 'AMA',
 'short',
 'Wishlist',
 'lgbt',
 'sewing',
 'ChoosingBeggars',
 'dating',
 'entwives',
 'prettyaltgirls',
 'LawofAttractionAdvice',
 'acne',
 'traaaaaaannnnnnnnnns',
 'AnimalCrossing',
 'mildlyinfuriating',
 'MadeOfStyrofoam',
 'PewdiepieSubmissions',
 'narcissisticparents',
 'Epilepsy',
 'budgies',
 'TrueOffMyChest',
 'RedDeadOnline',
 'RedditSessions',
 'tifu',
 'AskOuija',
 'PublicFreakout',
 'interestingasfuck',
 'Cringetopia',
 'BreakUps',
 'college',
 'bettafish',
 'starterpacks',
 'SubredditDrama',
 'ftm',
 'curlyhair',
 'AdviceAnimals',
 'rant',
 'piercing',
 'deadbydaylight',
 'SafeMoon',
 'socialskills',
 'Tinder',
 'oddlysatisfying',
 'insanepeoplefacebook',
 'INTP',
 'DunderMifflin',
 'MechanicalKeyboards',
 'distantsocializing',
 'Vent',
 'polyamory',
 'jobs',
 'careerguidance',
 'pregnant',
 'MorbidReality',
 'conspiracy',
 'Psychic',
 'formula1',
 'tipofmytongue',
 'SkincareAddiction',
 'Osana',
 'NoFap',
 'nextfuckinglevel',
 'explainlikeimfive',
 'niceguys',
 'changemyview',
 'CasualUK',
 'askwomenadvice',
 'movies',
 'askgaybros',
 'MtF',
 '2meirl4meirl',
 'MadeMeSmile',
 'OldSchoolCool',
 'medical_advice',
 'LifeProTips',
 'service_dogs',
 'CrohnsDisease',
 'AskAnAmerican',
 'ChronicPain',
 'NintendoSwitch',
 'apexlegends',
 'DoesAnybodyElse',
 'cursedcomments',
 'books',
 'gardening',
 'dating_advice',
 'amiugly',
 'iamatotalpieceofshit',
 'food',
 'Wellthatsucks',
 'writing',
 'Conservative',
 'witchcraft',
 'fasting',
 'progresspics',
 'me_irl',
 '4595',
 'dankmemes',
 'birthcontrol',
 'commandandconquer',
 'Minecraft',
 'wallstreetbets',
 'TheMandalorianTV',
 'morbidquestions',
 'asexuality',
 'harrypotter',
 'IdiotsInCars',
 'FortNiteBR',
 'blackdesertonline',
 'pornfree',
 'Christianity',
 'TheYouShow',
 'Meditation',
 'leaves',
 'BeautyGuruChatter',
 'Random_Acts_Of_Amazon',
 'Overwatch',
 'FortniteCompetitive',
 'Psychonaut',
 'ask',
 'GlobalOffensive',
 'pcmasterrace',
 'WhitePeopleTwitter',
 'Rabbits',
 'houseplants',
 'Supplements',
 'whatisthisthing',
 'egg_irl',
 'StarWars',
 'Sims4',
 'selfimprovement',
 'techsupport',
 'selfie',
 'Teachers',
 '2007scape',
 'lonely',
 'pan_media',
 'shrooms',
 'creepyPMs',
 'vegan',
 'Fitness',
 'DecidingToBeBetter',
 'apple',
 'self',
 'spirituality',
 'TikTokCringe',
 'nfl',
 'facepalm',
 'Marriage',
 'languagelearning',
 'titanfolk',
 'Chiraqology',
 'keto',
 'Music',
 'AnimalsOnReddit',
 'suggestmeabook',
 'PsilocybinMushrooms',
 'microdosing',
 'Periods',
 'infp',
 'antiwork',
 'Empaths',
 'kroger',
 'PCOS',
 'AskUK',
 'wow',
 'mbti',
 'AskAstrologers',
 'anime',
 'Sneakers',
 'LSD',
 'stocks',
 'CallOfDutyMobile',
 'h3h3productions',
 'migraine',
 'masskillers',
 'exmuslim',
 'DnD',
 'peaceCorpsCoding',
 'blackcats',
 'GMT400',
 'r4r',
 'EDAnonymous',
 'buildapc',
 '196',
 'AreTheStraightsOK',
 'Dentistry',
 'ibs',
 'Screenwriting',
 'COVID19positive',
 'Crystals',
 'FemaleDatingStrategy',
 'delta8',
 'fantasyfootball',
 'hockey',
 'chat',
 'recruitinghell',
 'UFOs',
 'Hellenism',
 'Codependency',
 'RocketLeagueExchange',
 'breakingmom',
 'CryptoCurrency',
 'stupidpol',
 'PokemonGoRaids',
 'polls',
 'introvert',
 'ClashOfClans',
 'TaylorSwift',
 'indonesia',
 'CysticFibrosis',
 'sportsbook',
 'summonerswar',
 'Amoledbackgrounds',
 'SamONellaAcademy',
 'thebachelor',
 'TallTeenagers',
 'yungblud',
 'soccer',
 'SquaredCircle',
 'Mcat',
 'benzorecovery',
 'NoFeeAC',
 'Sat',
 'cyberpunkgame',
 'CoronavirusUK',
 'FashionReps',
 'Repsneakers',
 'DMT',
 'HeadphoneAdvice',
 'AmongUs',
 'makinghiphop',
 'FemaleLevelUpStrategy',
 'redscarepod',
 'SimDemocracy',
 'DrMartens',
 'Petloss',
 'KGATLW',
 'GoForGold',
 'TransTryouts',
 'Snus',
 'amcstock',
 'onewordeach',
 'MoneyDiariesACTIVE']

In [None]:
len(control_subreddits)

342

In [None]:
def getUserPushData(subreddit, before):
  """
  Returns comment data from Pushshift for the specified subreddit before the specified time; aggregates by author
  """
  url = 'https://api.pushshift.io/reddit/search/comment/?before='+str(before)+'&subreddit='+str(subreddit)+'&aggs=author&agg_size=100'
  print(url)
  r = requests.get(url)
  data = json.loads(r.text)
  return data['data']


In [None]:
def getUserPushData2(subreddit, before):
  """
  Returns comment data from Pushshift for the specified subreddit before the specified time
  """
  url = 'https://api.pushshift.io/reddit/search/comment/?before='+str(before)+'&subreddit='+str(subreddit)+'&size=100'
  print(url)
  r = requests.get(url)
  data = json.loads(r.text)
  return data['data']
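
The URL above is assembled by string concatenation, which works for plain subreddit names. As a hedged alternative, the query string could be built with the standard library's `urllib.parse.urlencode`, which also escapes special characters. The helper name `buildCommentUrl` and the constant `PUSHSHIFT_COMMENT_URL` are hypothetical and not part of this notebook; the parameter names are the ones already used in the URL above.

```python
from urllib.parse import urlencode

PUSHSHIFT_COMMENT_URL = 'https://api.pushshift.io/reddit/search/comment/'

def buildCommentUrl(subreddit, before, size=100):
  """
  Hypothetical helper: builds the comment search URL from keyword parameters
  """
  # dict order is preserved, so the parameters appear in this order in the URL
  params = {'before': before, 'subreddit': subreddit, 'size': size}
  return PUSHSHIFT_COMMENT_URL + '?' + urlencode(params)

print(buildCommentUrl('AskReddit', 1612069200))
```

For the subreddit names used here, this produces the same URL as the concatenated version.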

In [None]:
def getAllUserPushData(subreddit, before, size):
  """
  Runs repeated searches of Pushshift until the desired length of data is reached
  or a search returns no results
  """
  returndata = getUserPushData2(subreddit, before)
  newdata = returndata
  # stop when enough data is gathered or the last search came back empty;
  # this also avoids indexing [-1] on an empty first result
  while (len(returndata) < size) and (len(newdata) != 0):
    newdata = getUserPushData2(subreddit, returndata[-1]['created_utc'])
    returndata = returndata + newdata
    print(len(returndata))
    time.sleep(1)
  return returndata
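
The paging logic can be tried without network access. The sketch below is a stand-in built on assumptions: `fakeFetch` and `paginate` are hypothetical names, and the fake data simply counts timestamps downward. It shows the same idea as above, where each search uses the `created_utc` of the oldest result so far as the next `before` value, and the loop ends once a page comes back empty.

```python
def fakeFetch(before, page_size=3, oldest=1):
  """
  Hypothetical stand-in for a Pushshift search: returns up to page_size
  records with created_utc strictly below `before`, newest first
  """
  stamps = range(before - 1, max(before - 1 - page_size, oldest - 1), -1)
  return [{'created_utc': t} for t in stamps]

def paginate(fetch, before, size):
  """
  Repeats searches until `size` records are gathered or a page is empty
  """
  results = fetch(before)
  page = results
  while len(results) < size and len(page) != 0:
    page = fetch(results[-1]['created_utc'])
    results = results + page
  return results

full = paginate(fakeFetch, before=10, size=5)    # enough data exists
short = paginate(fakeFetch, before=4, size=50)   # data runs out early
print(len(full), len(short))
```

Because whole pages are appended, the result can be slightly longer than `size`, which matches the behaviour of the function above.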

In [None]:
def collectNewUserData(submission, stats):
  """
  Takes in a submission and the dictionary you want to append the information to
  """
  try:    
    author = submission['author']
  except KeyError:
    return
  #created = datetime.datetime.fromtimestamp(submission['created_utc']) #1520561700.0
  try:
    created_utc = submission['created_utc']
  except KeyError:
    return
  #subData = [created_utc, created]
  stats[author] = created_utc
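
A small usage sketch with made-up records (the author names and timestamps are invented for illustration). The function is repeated in condensed form so the cell runs on its own. Because `stats` is keyed by author, a user who appears more than once keeps only the `created_utc` of the record processed last, and records missing either key are skipped.

```python
def collectNewUserData(submission, stats):
  """
  Same logic as the helper above, repeated so this example is self-contained
  """
  try:
    author = submission['author']
    created_utc = submission['created_utc']
  except KeyError:
    return
  stats[author] = created_utc

sample = [
  {'author': 'user_a', 'created_utc': 300},
  {'author': 'user_b', 'created_utc': 200},
  {'author': 'user_a', 'created_utc': 100},  # older comment by the same author
  {'created_utc': 50},                       # no author key, skipped
]
demo_dict = {}
for record in sample:
  collectNewUserData(record, demo_dict)
print(demo_dict)
```

If the searches page backward in time, as the decreasing `before` values in the logged URLs suggest, the stored value would be the oldest collected comment for each user.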


### Get control data

I go through the identified subreddits, and collect usernames for users who post there. 
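
The cells below pass `1612069200` as the `before` cutoff. As a quick check, that Unix timestamp can be converted to a readable UTC date with the standard library (the variable names here are only for illustration):

```python
import datetime

cutoff_utc = 1612069200
# interpret the epoch seconds explicitly as UTC rather than local time
cutoff_dt = datetime.datetime.fromtimestamp(cutoff_utc, tz=datetime.timezone.utc)
print(cutoff_dt.isoformat())  # 2021-01-31T05:00:00+00:00
```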

In [None]:
user_dict = {}
for subreddit in control_subreddits[:50]:
  userdata = getAllUserPushData(subreddit, 1612069200, 1500)
  for submission in userdata:
    collectNewUserData(submission, user_dict)

https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=AskReddit&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612069161&subreddit=AskReddit&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612069113&subreddit=AskReddit&size=100
300
https://api.pushshift.io/reddit/search/comment/?before=1612069078&subreddit=AskReddit&size=100
400
https://api.pushshift.io/reddit/search/comment/?before=1612069033&subreddit=AskReddit&size=100
500
https://api.pushshift.io/reddit/search/comment/?before=1612068990&subreddit=AskReddit&size=100
600
https://api.pushshift.io/reddit/search/comment/?before=1612068947&subreddit=AskReddit&size=100
700
https://api.pushshift.io/reddit/search/comment/?before=1612068896&subreddit=AskReddit&size=100
800
https://api.pushshift.io/reddit/search/comment/?before=1612068846&subreddit=AskReddit&size=100
900
https://api.pushshift.io/reddit/search/comment/?before=1612068799&subreddit=AskReddit&size=100
1000
https://api.p

In [None]:

len(user_dict)

33069

In [None]:
control_subreddits.remove('SafeMoon') #was only recently created 

In [None]:
for subreddit in control_subreddits[50:100]:
  userdata = getAllUserPushData(subreddit, 1612069200, 1500)
  for submission in userdata:
    collectNewUserData(submission, user_dict)

https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=survivor&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612063573&subreddit=survivor&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612059638&subreddit=survivor&size=100
300
https://api.pushshift.io/reddit/search/comment/?before=1612055826&subreddit=survivor&size=100
400
https://api.pushshift.io/reddit/search/comment/?before=1612051655&subreddit=survivor&size=100
500
https://api.pushshift.io/reddit/search/comment/?before=1612048278&subreddit=survivor&size=100
600
https://api.pushshift.io/reddit/search/comment/?before=1612043778&subreddit=survivor&size=100
700
https://api.pushshift.io/reddit/search/comment/?before=1612041017&subreddit=survivor&size=100
800
https://api.pushshift.io/reddit/search/comment/?before=1612036299&subreddit=survivor&size=100
900
https://api.pushshift.io/reddit/search/comment/?before=1612033233&subreddit=survivor&size=100
1000
https://api.pushshift.i

In [None]:
len(user_dict)

66926

In [None]:
for subreddit in control_subreddits[100:150]:
  userdata = getAllUserPushData(subreddit, 1612069200, 1500)
  for submission in userdata:
    collectNewUserData(submission, user_dict)

https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=short&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612037497&subreddit=short&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1611982534&subreddit=short&size=100
300
https://api.pushshift.io/reddit/search/comment/?before=1611936278&subreddit=short&size=100
400
https://api.pushshift.io/reddit/search/comment/?before=1611899847&subreddit=short&size=100
500
https://api.pushshift.io/reddit/search/comment/?before=1611867569&subreddit=short&size=100
600
https://api.pushshift.io/reddit/search/comment/?before=1611846852&subreddit=short&size=100
700
https://api.pushshift.io/reddit/search/comment/?before=1611792944&subreddit=short&size=100
800
https://api.pushshift.io/reddit/search/comment/?before=1611470227&subreddit=short&size=100
900
https://api.pushshift.io/reddit/search/comment/?before=1611446067&subreddit=short&size=100
1000
https://api.pushshift.io/reddit/search/comment/?befor

IndexError: ignored
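
An `IndexError` like the one above would be consistent with a search returning an empty list, since indexing `[-1]` on an empty list fails. One hedged way to keep a long batch running is to catch the error per subreddit and record which ones failed. In this sketch `collectBatch` and `flakyFetch` are hypothetical names, and the fetch function is faked so the cell runs offline.

```python
def collectBatch(subreddits, fetch, stats):
  """
  Hypothetical wrapper: gathers authors per subreddit, skipping any
  subreddit whose fetch raises IndexError, and returns the skipped names
  """
  failed = []
  for subreddit in subreddits:
    try:
      records = fetch(subreddit)
    except IndexError:
      failed.append(subreddit)
      continue
    for record in records:
      if 'author' in record and 'created_utc' in record:
        stats[record['author']] = record['created_utc']
  return failed

def flakyFetch(subreddit):
  """
  Fake fetch for illustration: one subreddit has no data and fails
  """
  pages = {'sub_a': [{'author': 'u1', 'created_utc': 10}], 'sub_b': []}
  return [pages[subreddit][-1]]   # raises IndexError for the empty page

demo_users = {}
skipped = collectBatch(['sub_a', 'sub_b'], flakyFetch, demo_users)
print(demo_users, skipped)
```

Returning the skipped names makes it possible to retry only those subreddits instead of re-slicing the list by hand.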

In [None]:
len(user_dict)

90057

In [None]:
for subreddit in control_subreddits[137:200]:
  userdata = getAllUserPushData(subreddit, 1612069200, 800)
  for submission in userdata:
    collectNewUserData(submission, user_dict)

https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=socialskills&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612059152&subreddit=socialskills&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612049795&subreddit=socialskills&size=100
300
https://api.pushshift.io/reddit/search/comment/?before=1612040529&subreddit=socialskills&size=100
400
https://api.pushshift.io/reddit/search/comment/?before=1612031883&subreddit=socialskills&size=100
500
https://api.pushshift.io/reddit/search/comment/?before=1612024496&subreddit=socialskills&size=100
600
https://api.pushshift.io/reddit/search/comment/?before=1612015577&subreddit=socialskills&size=100
700
https://api.pushshift.io/reddit/search/comment/?before=1612003259&subreddit=socialskills&size=100
800
https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=Tinder&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612067249&subreddit=Tinder&size=100
200


In [None]:
len(user_dict)

112595

In [None]:
for subreddit in control_subreddits[200:300]:
  userdata = getAllUserPushData(subreddit, 1612069200, 600)
  for submission in userdata:
    collectNewUserData(submission, user_dict)

https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=morbidquestions&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612049354&subreddit=morbidquestions&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612034310&subreddit=morbidquestions&size=100
300
https://api.pushshift.io/reddit/search/comment/?before=1612013197&subreddit=morbidquestions&size=100
400
https://api.pushshift.io/reddit/search/comment/?before=1611981731&subreddit=morbidquestions&size=100
500
https://api.pushshift.io/reddit/search/comment/?before=1611956162&subreddit=morbidquestions&size=100
600
https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=asexuality&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612060531&subreddit=asexuality&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612051131&subreddit=asexuality&size=100
300
https://api.pushshift.io/reddit/search/comment/?before=1612043550&subreddit=as

In [None]:
for subreddit in control_subreddits[300:]:
  userdata = getAllUserPushData(subreddit, 1612069200, 200)
  for submission in userdata:
    collectNewUserData(submission, user_dict)

https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=stupidpol&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612066141&subreddit=stupidpol&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=PokemonGoRaids&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612068054&subreddit=PokemonGoRaids&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=polls&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612066214&subreddit=polls&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=introvert&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612031372&subreddit=introvert&size=100
200
https://api.pushshift.io/reddit/search/comment/?before=1612069200&subreddit=ClashOfClans&size=100
https://api.pushshift.io/reddit/search/comment/?before=1612060847&subreddit=ClashOfClans&size=100
200
https://api.pushshift.

In [None]:
len(user_dict)

141691

In [None]:
user_dict.values()

dict_values([1612068623, 1612069046, 1612069198, 1612041653, 1612069198, 1612069149, 1612068496, 1612069196, 1612067035, 1612066724, 1612069196, 1612069196, 1612038191, 1612069195, 1612063532, 1612069194, 1612069194, 1612069194, 1612069073, 1612069193, 1612069193, 1612069192, 1612069192, 1612069192, 1612068881, 1612068772, 1612069191, 1612068425, 1612069092, 1612069190, 1612069188, 1612069188, 1612069188, 1612068375, 1612068576, 1612069186, 1612069186, 1612068774, 1612068298, 1612068887, 1612051688, 1612068932, 1612069182, 1612069182, 1611950291, 1612068502, 1612069147, 1612066192, 1612069036, 1612068671, 1612069178, 1612069178, 1612068989, 1612069178, 1612069178, 1612069100, 1612069177, 1612068571, 1612049586, 1612069176, 1612066194, 1612069174, 1612065477, 1612069173, 1612068881, 1612069172, 1612069172, 1612069171, 1612069170, 1612069170, 1612045740, 1612069169, 1612068862, 1612069169, 1612068978, 1612068859, 1612068572, 1612069168, 1612067198, 1612069167, 1612069030, 1612069167, 161

In [None]:
control_user_df = pd.DataFrame.from_dict(user_dict, orient='index', columns=['created_utc'])
control_user_df.reset_index(inplace=True)
control_user_df = control_user_df.rename(columns={'index':'user'})
control_user_df.head()

NameError: ignored

In [None]:
cd drive/MyDrive/

/content/drive/MyDrive


In [None]:
cd MentalHealthReddit/Data/

/content/drive/MyDrive/MentalHealthReddit/Data


In [None]:
control_user_df.to_csv('control_user_df.csv')

In [None]:
username_df = pd.read_csv('control_user_df.csv')
username_df.head()

Unnamed: 0.1,Unnamed: 0,user,created_utc
0,0,Skurk-the-Grimm,1612068623
1,1,Voltiel,1612069046
2,2,MasterKlaw,1612069198
3,3,AdvocateSaint,1612041653
4,4,FroggiJoy87,1612069198


In [None]:
control_username = username_df['user'].tolist()

In [None]:
len(control_username)

141691

In [None]:
adhd_df = pd.read_csv('adhd.csv')
anxiety_df = pd.read_csv('anxiety.csv')
bipolar_df = pd.read_csv('bipolar.csv')
bpd_df = pd.read_csv('bpd.csv')
dep_df = pd.read_csv('dep.csv')
ptsd_df = pd.read_csv('ptsd.csv')
ptsd_df.head()

Unnamed: 0.1,Unnamed: 0,author,sub_id,title,url,score,created,created_utc,numComments,permalink,flair,selfext,subreddit
0,0,1599,izgr81,I was diagnosed with PTSD today,https://www.reddit.com/r/ptsd/comments/izgr81/...,1,2020-09-25 09:57:04,1601027824,0,/r/ptsd/comments/izgr81/i_was_diagnosed_with_p...,,So I got an official detailed diagnosis from a...,ptsd
1,1,4571,i5ijy1,I was diagnosed with PTSD yesterday,https://www.reddit.com/r/ptsd/comments/i5ijy1/...,1,2020-08-07 17:55:33,1596822933,4,/r/ptsd/comments/i5ijy1/i_was_diagnosed_with_p...,,I went to a psychiatrist a day ago and after a...,ptsd
2,2,1139,hopdot,"So, I was diagnosed with PTSD recently...",https://www.reddit.com/r/ptsd/comments/hopdot/...,1,2020-07-10 13:57:12,1594389432,8,/r/ptsd/comments/hopdot/so_i_was_diagnosed_wit...,,I don’t really know how to handle the diagnosi...,ptsd
3,3,1658,hnub4q,Today I was diagnosed with PTSD.,https://www.reddit.com/r/ptsd/comments/hnub4q/...,1,2020-07-09 01:49:55,1594259395,13,/r/ptsd/comments/hnub4q/today_i_was_diagnosed_...,,TW for brief abuse/addiction/death mention\n\n...,ptsd
4,4,4281,hmhqox,"I was diagnosed with PTSD today, what are my n...",https://www.reddit.com/r/ptsd/comments/hmhqox/...,1,2020-07-06 22:11:09,1594073469,5,/r/ptsd/comments/hmhqox/i_was_diagnosed_with_p...,,"in 2015, I was nearly murdered and from 2010-2...",ptsd


In [None]:
username_encoded.keys()



In [None]:
#remove users who were in the experimental group
control_username = [x for x in control_username if x not in username_encoded]
len(control_username)

141386

In [None]:
len(username_encoded)

7994

In [None]:
def codeMoreUsernames(names):
  #takes in new names to encode
  token_map = {name: i+7994 for i, name in enumerate(names)}
  return token_map
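
The offset `7994` matches the size of `username_encoded` at the time this was run. A hedged variant could take the existing mapping as an argument and derive the offset from it, so the codes stay unique if the first group changes. `codeMoreUsernamesFrom` is a hypothetical name, and the sketch assumes the existing codes are integers.

```python
def codeMoreUsernamesFrom(names, existing):
  """
  Hypothetical variant: assigns codes to new names starting after the
  largest code already present in `existing`
  """
  start = max(existing.values(), default=-1) + 1
  # names that already have a code are left out of the new mapping
  new_names = [name for name in names if name not in existing]
  return {name: start + i for i, name in enumerate(new_names)}

existing_codes = {'alice': 0, 'bob': 1}
new_codes = codeMoreUsernamesFrom(['carol', 'bob', 'dave'], existing_codes)
print(new_codes)
```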


In [None]:
controlnames_encoded = codeMoreUsernames(control_username)

In [None]:
len(controlnames_encoded)

141386

In [None]:
all_names = dict(username_encoded) #copy so that username_encoded itself is not modified
all_names.update(controlnames_encoded)
len(all_names)

149380

In [None]:
ls

adhd_coms.csv     anxiety_subs.csv  bpd.csv              dep_subs.csv
adhd.csv          bipolar_coms.csv  bpd_subs.csv         ptsd_coms.csv
adhd_subs.csv     bipolar.csv       control_user_df.csv  ptsd.csv
anxiety_coms.csv  bipolar_subs.csv  dep_coms.csv         ptsd_subs.csv
anxiety.csv       bpd_coms.csv      dep.csv


In [None]:
all_names_df = pd.DataFrame.from_dict(all_names, orient='index', columns=['code'])
all_names_df.reset_index(inplace=True)
all_names_df = all_names_df.rename(columns={'index': 'user'})
all_names_df.head()

Unnamed: 0,user,code
0,cluelessin,0
1,CPTSD811,1
2,angelhippie,2
3,SquishKittyxoxo,3
4,jennylovesyoualot,4


In [None]:
all_names_df.to_csv('all_usernames_encoded.csv')

In [None]:
username_df.head()

Unnamed: 0.1,Unnamed: 0,user,created_utc
0,0,Skurk-the-Grimm,1612068623
1,1,Voltiel,1612069046
2,2,MasterKlaw,1612069198
3,3,AdvocateSaint,1612041653
4,4,FroggiJoy87,1612069198


In [None]:
username_df.rename(columns = {'user': 'author'}, inplace = True)
username_df.head()

Unnamed: 0.1,Unnamed: 0,author,created_utc
0,0,Skurk-the-Grimm,1612068623
1,1,Voltiel,1612069046
2,2,MasterKlaw,1612069198
3,3,AdvocateSaint,1612041653
4,4,FroggiJoy87,1612069198


## Get data from the identified control users
(continued in DataGathering2.ipynb)

In [None]:
def getControlCommentData(df):
  """
  Takes in a dataframe, gets comments for each user
  """
  comment_dict = {}
  # named created_utc rather than time to avoid shadowing the time module
  for user, created_utc in zip(df['author'], df['created_utc']):
    #username=getName(user)
    data = getOtherComments(user, created_utc)
    for comment in data:
      if comment != 'NaN':
        collectCommentData(comment, comment_dict)
  return comment_dict


In [None]:
def getControlSubmissionData(df):
  """
  Takes in a dataframe, gets submissions for each user
  """
  submission_dict = {}
  # named created_utc rather than time to avoid shadowing the time module
  for user, created_utc in zip(df['author'], df['created_utc']):
    data = getOtherSubmissions(user, created_utc)
    for submission in data:
      if submission != 'NaN':
        #collectData(submission, submission_dict)
        collectOtherSubmissionData(submission, submission_dict)
  return submission_dict


In [None]:
first_comments = getControlCommentData(username_df[0:1000])
type(first_comments)

https://api.pushshift.io/reddit/search/comment/?author=Skurk-the-Grimm&after=1612068623&size=100
https://api.pushshift.io/reddit/search/comment/?author=Voltiel&after=1612069046&size=100
https://api.pushshift.io/reddit/search/comment/?author=MasterKlaw&after=1612069198&size=100
https://api.pushshift.io/reddit/search/comment/?author=AdvocateSaint&after=1612041653&size=100
https://api.pushshift.io/reddit/search/comment/?author=FroggiJoy87&after=1612069198&size=100
https://api.pushshift.io/reddit/search/comment/?author=tequila-la&after=1612069149&size=100
https://api.pushshift.io/reddit/search/comment/?author=brownzilla99&after=1612068496&size=100
https://api.pushshift.io/reddit/search/comment/?author=ALoudMeow&after=1612069196&size=100
https://api.pushshift.io/reddit/search/comment/?author=PaladinMax&after=1612067035&size=100
https://api.pushshift.io/reddit/search/comment/?author=nomoreshoppingsprees&after=1612066724&size=100
https://api.pushshift.io/reddit/search/comment/?author=rylaia&a

dict