## The Depressive Data Source
"Depressive tweets" were retrieved using the Twitter scraping tool TWINT by searching for terms specifically related to depression. Data was gathered for seven terms: 1. Depressed, 2. Depression, 3. Hopeless, 4. Lonely, 5. Suicide, 6. Antidepressant, 7. Antidepressants. These tweets proved to contain lexical features strongly indicative of depression, which made them well suited to training an efficient and robust classifier.

In [1]:
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import re
from bs4 import BeautifulSoup
from nltk.tokenize import WordPunctTokenizer
tok = WordPunctTokenizer()
import warnings
warnings.filterwarnings("ignore")

## Read in files of different scrape terms

In [2]:
depressive_tweets = pd.read_csv('depression/depressive_unigram_tweets.csv')

In [3]:
depressive_tweets.head()

Unnamed: 0.1,Unnamed: 0,id,time,tweet,hashtags,cashtags
0,0,1.15135e+18,21:25:13,"Wow, my dad yday: “you don’t take those stupid...",[],[]
1,1,1.15135e+18,21:25:07,what part of this was really harmfult of a lot...,[],[]
2,2,1.15135e+18,21:25:06,one of the ways I got through my #depression i...,"['#depression', '#uncoveringthenewu', '#change...",[]
3,3,1.15135e+18,21:24:55,see i wanna do one of them but they all say th...,[],[]
4,4,1.15135e+18,21:24:51,IS IT clinical depression or is it the palpabl...,[],[]
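
The `id` column above prints as `1.15135e+18`, which suggests the IDs in this file were parsed as floats rather than integers. Tweet IDs of this size exceed 2^53, so a 64-bit float cannot represent them exactly. The sketch below shows one way to keep them intact, assuming the CSV itself still holds the full digits. The inline CSV is made-up sample data standing in for a scrape file, not the real file.

```python
import io
import pandas as pd

# Made-up two-row sample standing in for a scraped CSV (illustration only).
sample_csv = io.StringIO(
    "id,time,tweet\n"
    "1151347096966041603,21:25:13,first tweet\n"
    "1151347069627576320,21:25:07,second tweet\n"
)

# Reading the id column as a string keeps every digit.
tweets = pd.read_csv(sample_csv, dtype={"id": str})

# Compare with what a float64 column would store for the same ID.
original_id = 1151347096966041603
as_float = float(original_id)
lost_precision = int(as_float) != original_id
```

If the file already stores the IDs in scientific notation, the digits are gone and this approach cannot recover them.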


In [4]:
#depression_tweets = pd.read_csv('depression/tweets.csv')
#depression_tweets.head()

In [5]:
#depression_tweets.drop(['date', 'timezone', 'username', 'name', 'conversation_id', 'created_at', 'user_id', 'place', 'likes_count', 'link', 'retweet', 'quote_url', 'video', 'user_rt_id', 'near', 'geo', 'mentions', 'urls', 'photos', 'replies_count', 'retweets_count'], axis = 1, inplace = True)
#depression_tweets.head()

In [6]:
depressed_tweets = pd.read_csv('depression/tweets.csv')
depressed_tweets.drop(['date', 'timezone', 'username', 'name', 'conversation_id', 'created_at', 'user_id', 'place', 'likes_count', 'link', 'retweet', 'quote_url', 'video', 'user_rt_id', 'near', 'geo', 'mentions', 'urls', 'photos', 'replies_count', 'retweets_count'], axis = 1, inplace = True)
depressed_tweets.head()

Unnamed: 0,id,time,tweet,hashtags,cashtags
0,1151347096966041603,21:25:13,"Wow, my dad yday: “you don’t take those stupid...",[],[]
1,1151347069627576320,21:25:07,what part of this was really harmfult of a lot...,[],[]
2,1151347066255396865,21:25:06,one of the ways I got through my #depression i...,"['#depression', '#uncoveringthenewu', '#change...",[]
3,1151347022789611520,21:24:55,see i wanna do one of them but they all say th...,[],[]
4,1151347006406893568,21:24:51,IS IT clinical depression or is it the palpabl...,[],[]


In [7]:
hopeless_tweets = pd.read_csv('hopeless/tweets.csv')
hopeless_tweets.drop(['date', 'timezone', 'username', 'name', 'conversation_id', 'created_at', 'user_id', 'place', 'likes_count', 'link', 'retweet', 'quote_url', 'video', 'user_rt_id', 'near', 'geo', 'mentions', 'urls', 'photos', 'replies_count', 'retweets_count'], axis = 1, inplace = True)
hopeless_tweets.head()

Unnamed: 0,id,time,tweet,hashtags,cashtags
0,1151526536471728134,09:18:15,"Hopeless, crazed, and dispossessed, I walked o...",[],[]
1,1151526442922139649,09:17:52,KAP haberini beklerken serSERİn olmuştuk,[],[]
2,1151526396210110464,09:17:41,17-july-2019. 🦉💛.,[],[]
3,1151526283890683904,09:17:15,เป็นไรสาวน้อย,[],[]
4,1151526267738628097,09:17:11,انا قاعده اعيش اسعد ايام حياتي💛.,[],[]


In [8]:
lonely_tweets = pd.read_csv('lonely/tweets.csv')
lonely_tweets.drop(['date', 'timezone', 'username', 'name', 'conversation_id', 'created_at', 'user_id', 'place', 'likes_count', 'link', 'retweet', 'quote_url', 'video', 'user_rt_id', 'near', 'geo', 'mentions', 'urls', 'photos', 'replies_count', 'retweets_count'], axis = 1, inplace = True)
lonely_tweets.head()

Unnamed: 0,id,time,tweet,hashtags,cashtags
0,1152982582843326466,09:44:03,i dont know why but he looks so lonely in this...,[],[]
1,1152982578741284865,09:44:02,Я после того как увидела их начала отращивать ...,[],[]
2,1152982577181024259,09:44:02,Even follow you on all social networks,[],[]
3,1152982576153239552,09:44:02,"#Nowplaying: Garmonsway, Gibbon and Harrington...",['#nowplaying'],[]
4,1152982566263296000,09:43:59,Laying in this hammock every Sunday alone is g...,[],[]


In [9]:
antidepressant_tweets = pd.read_csv('antidepressant/tweets.csv')
antidepressant_tweets.drop(['date', 'timezone', 'username', 'name', 'conversation_id', 'created_at', 'user_id', 'place', 'likes_count', 'link', 'retweet', 'quote_url', 'video', 'user_rt_id', 'near', 'geo', 'mentions', 'urls', 'photos', 'replies_count', 'retweets_count'], axis = 1, inplace = True)
antidepressant_tweets.head()

Unnamed: 0,id,time,tweet,hashtags,cashtags
0,1152991919137538048,10:21:09,I hate that the antidepressants made me feel w...,[],[]
1,1152991631722913793,10:20:01,Your beliefs ultimately are manifested in your...,[],[]
2,1152991531789406209,10:19:37,I think current trends lead to a world where e...,[],[]
3,1152991116733628416,10:17:58,Anti-Depressants and Recovery https://www.mar...,"['#medication', '#antidepressants', '#eatingdi...",[]
4,1152990783420751872,10:16:38,Have you thought about getting a sleep study d...,[],[]


In [10]:
antidepressants_tweets = pd.read_csv('antidepressants/tweets.csv')
antidepressants_tweets.drop(['date', 'timezone', 'username', 'name', 'conversation_id', 'created_at', 'user_id', 'place', 'likes_count', 'link', 'retweet', 'quote_url', 'video', 'user_rt_id', 'near', 'geo', 'mentions', 'urls', 'photos', 'replies_count', 'retweets_count'], axis = 1, inplace = True)
antidepressants_tweets.head()

Unnamed: 0,id,time,tweet,hashtags,cashtags
0,1152995178359218176,10:34:06,i can't think logically and all of shit i say ...,[],[]
1,1152994945537576960,10:33:11,Recently moved to Australia and was ASTOUNDED ...,[],[]
2,1152994834359209985,10:32:44,Maybe I should go back on my antidepressants. ...,[],[]
3,1152994452606033920,10:31:13,What It’s Like to Know You’ll Be on Antidepres...,[],[]
4,1152994432188370949,10:31:08,Do antidepressants work? :/,[],[]


In [11]:
suicide_tweets = pd.read_csv('suicide/tweets.csv')
suicide_tweets.drop(['date', 'timezone', 'username', 'name', 'conversation_id', 'created_at', 'user_id', 'place', 'likes_count', 'link', 'retweet', 'quote_url', 'video', 'user_rt_id', 'near', 'geo', 'mentions', 'urls', 'photos', 'replies_count', 'retweets_count'], axis = 1, inplace = True)
suicide_tweets.head()

Unnamed: 0,id,time,tweet,hashtags,cashtags
0,1152996044604682241,10:37:33,Suicide Thoughts ....,[],[]
1,1152995993148899329,10:37:21,If I wake up as a white person in my next life...,[],[]
2,1152995985053900800,10:37:19,I fixed my bio (Cant add a banner because Twit...,[],[]
3,1152995984642887683,10:37:19,Weaponizign Suicide disturbs me a lot Cardi B ...,[],[]
4,1152995955559620608,10:37:12,#sam harcelé par ses camarades de classe se #s...,"['#sam', '#suicide']",[]
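
The loading cells above repeat the same read-and-drop pattern for every search term. As a rough sketch, the pattern could be wrapped in a helper. `load_scrape` and `KEEP_COLUMNS` are hypothetical names that do not appear in the original notebook, and the inline CSV is made-up sample data.

```python
import io
import pandas as pd

# Columns that remain in the notebook's displayed outputs.
KEEP_COLUMNS = ("id", "time", "tweet", "hashtags", "cashtags")

def load_scrape(source, keep=KEEP_COLUMNS):
    """Read one scrape file and keep only the columns of interest.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    frame = pd.read_csv(source)
    # Select only the wanted columns that are actually present in this file.
    present = [column for column in keep if column in frame.columns]
    return frame[present]

# Made-up one-row sample with a few of the metadata columns that get dropped.
sample = io.StringIO(
    "id,date,time,username,tweet,hashtags,cashtags\n"
    "1,2000-01-01,10:00:00,someone,example text,[],[]\n"
)
sample_tweets = load_scrape(sample)
```

Selecting the columns to keep, rather than listing the ones to drop, means a file that lacks one of the metadata columns does not raise a `KeyError`.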


In [12]:
depressed_tweets_combined = pd.concat([depressive_tweets, depressed_tweets, hopeless_tweets, lonely_tweets, antidepressant_tweets, antidepressants_tweets, suicide_tweets], ignore_index=True)
depressed_tweets_combined

Unnamed: 0.1,Unnamed: 0,id,time,tweet,hashtags,cashtags
0,0,1.15135E+18,21:25:13,"Wow, my dad yday: “you don’t take those stupid...",[],[]
1,1,1.15135E+18,21:25:07,what part of this was really harmfult of a lot...,[],[]
2,2,1.15135E+18,21:25:06,one of the ways I got through my #depression i...,"['#depression', '#uncoveringthenewu', '#change...",[]
3,3,1.15135E+18,21:24:55,see i wanna do one of them but they all say th...,[],[]
4,4,1.15135E+18,21:24:51,IS IT clinical depression or is it the palpabl...,[],[]
...,...,...,...,...,...,...
250034,,1152367589030391809,17:00:17,えっ？！オニィ結構なお歳……（今知った） 変な声でちゃった(笑),[],[]
250035,,1152367565483761664,17:00:12,"#PhysicianFriday ""Let's empower doctors to tak...","['#physicianfriday', '#suicide', '#physicians'...",[]
250036,,1152367519283367936,17:00:01,A spike in suicides among teenage boys in the ...,"['#aztrauma', '#traumatraining', '#suicide', '...",[]
250037,,1152367516083204096,17:00:00,Need some support? Check out the following res...,[],[]
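
In the combined frame above, rows from the later files have an empty `Unnamed: 0` field. `pd.concat` aligns frames on column names and fills any column a frame lacks with NaN. The sketch below reproduces that behaviour with made-up frames and shows `join="inner"` as one possible way to keep only the columns shared by every frame.

```python
import pandas as pd

# Made-up frames: only the first one carries a leftover index column.
with_index_col = pd.DataFrame(
    {"Unnamed: 0": [0, 1], "id": [10, 11], "tweet": ["a", "b"]}
)
without_index_col = pd.DataFrame({"id": [12], "tweet": ["c"]})

# Default (outer) join: the missing column is filled with NaN.
outer = pd.concat([with_index_col, without_index_col], ignore_index=True)

# Inner join: only columns present in every frame survive.
inner = pd.concat(
    [with_index_col, without_index_col], ignore_index=True, join="inner"
)
```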


In [13]:
depressed_tweets_combined = depressed_tweets_combined.drop_duplicates()
depressed_tweets_combined

Unnamed: 0.1,Unnamed: 0,id,time,tweet,hashtags,cashtags
0,0,1.15135E+18,21:25:13,"Wow, my dad yday: “you don’t take those stupid...",[],[]
1,1,1.15135E+18,21:25:07,what part of this was really harmfult of a lot...,[],[]
2,2,1.15135E+18,21:25:06,one of the ways I got through my #depression i...,"['#depression', '#uncoveringthenewu', '#change...",[]
3,3,1.15135E+18,21:24:55,see i wanna do one of them but they all say th...,[],[]
4,4,1.15135E+18,21:24:51,IS IT clinical depression or is it the palpabl...,[],[]
...,...,...,...,...,...,...
250034,,1152367589030391809,17:00:17,えっ？！オニィ結構なお歳……（今知った） 変な声でちゃった(笑),[],[]
250035,,1152367565483761664,17:00:12,"#PhysicianFriday ""Let's empower doctors to tak...","['#physicianfriday', '#suicide', '#physicians'...",[]
250036,,1152367519283367936,17:00:01,A spike in suicides among teenage boys in the ...,"['#aztrauma', '#traumatraining', '#suicide', '...",[]
250037,,1152367516083204096,17:00:00,Need some support? Check out the following res...,[],[]
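
`drop_duplicates()` with no arguments removes only rows that match in every column. The first two source files appear to contain the same tweets (compare the outputs of cells 3 and 6), but their rows differ in the `Unnamed: 0` and `id` columns, so a full-row comparison would likely not treat them as duplicates. The sketch below uses made-up rows and assumes that tweet text together with the time is an acceptable duplicate key.

```python
import pandas as pd

# Made-up rows: the first two hold the same tweet but differ in other columns.
combined = pd.DataFrame(
    {
        "Unnamed: 0": [0.0, None, None],
        "id": ["1.15135e+18", "1151347096966041603", "1152367516083204096"],
        "time": ["21:25:13", "21:25:13", "17:00:00"],
        "tweet": ["same text", "same text", "other text"],
    }
)

# Full-row comparison keeps all three rows.
full_row = combined.drop_duplicates()

# Comparing only time and tweet text removes the repeated tweet.
by_text = combined.drop_duplicates(subset=["time", "tweet"]).reset_index(drop=True)
```

`reset_index(drop=True)` renumbers the rows afterwards. Without it the index keeps gaps where duplicates were removed.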


In [14]:
# to_csv returns None when given a path, so the result is not assigned
depressed_tweets_combined.to_csv(r'depressive_tweets_conbined.csv')
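
The `Unnamed: 0` and `Unnamed: 0.1` columns seen earlier are typically what pandas produces when a CSV that was written with its index is read back without `index_col`. The sketch below shows the round trip using in-memory buffers and a made-up frame. Passing `index=False` is one option for avoiding the extra column.

```python
import io
import pandas as pd

# Made-up frame standing in for the combined tweets.
frame = pd.DataFrame({"id": ["1", "2"], "tweet": ["a", "b"]})

# Default export writes the index as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)

# index=False leaves the index out of the file.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = list(pd.read_csv(without_index).columns)
```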