# Analyzing Political Tweets with a Depression Prediction ML Model
### Sam Spell, James Tipton

Political rhetoric and discussion have seemingly become more polarized in recent years. Historically, and especially for people reaching adulthood, being able to vote and take part in politics plays an important role in a stable and healthy society. This project uses machine learning to develop a model that predicts depression from a string of text posted on Twitter. Once the model is developed, it can be used to analyze political messages sent online and to draw out patterns in the tweets that the model classifies as showing signs of depression. A further goal is to extract patterns of text that can be connected to patterns of political messaging, if such patterns exist, and to examine them over time. Given changing views on political polarization, it will be interesting to test whether the prevalence of messages classified as “depression” changes across different political periods.
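The temporal comparison described above amounts to grouping the classifier's outputs by time period and computing the share labelled as depression. A minimal sketch, assuming a hypothetical DataFrame `tweets` with a `created_at` timestamp column and a `predicted` column of 0/1 model outputs (neither exists in the original data):

```python
import pandas as pd

# Hypothetical example data: tweet timestamps and 0/1 predictions from the classifier.
tweets = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-01-05", "2020-01-20", "2020-02-03", "2020-02-14", "2020-02-28",
    ]),
    "predicted": [1, 0, 0, 0, 1],
})

# Share of tweets classified as depression in each calendar month.
monthly_share = (
    tweets.groupby(tweets["created_at"].dt.to_period("M"))["predicted"].mean()
)
print(monthly_share)
```

Grouping by monthly periods is only one choice; the periods could equally be defined around elections or other political events.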


Step 1: Libraries

In [8]:
# Data handling
import numpy as np
import pandas as pd
from numpy import savetxt

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Text processing
import nltk
from nltk.corpus import stopwords

# Modeling
from sklearn import svm
from sklearn.model_selection import train_test_split
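The imports of `train_test_split` and `svm` suggest the model will be a support vector machine evaluated on a held-out split. A minimal sketch of how those pieces could fit together, assuming TF-IDF features (not part of the original notebook) and a hypothetical toy corpus in place of the cleaned posts:

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the cleaned posts and their labels.
texts = [
    "feel empty tired every day", "cannot sleep feel hopeless",
    "nothing matters anymore", "great game last night",
    "excited new job", "lovely weather today",
    "alone sad crying again", "fun trip friends weekend",
]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

# Hold out a stratified test split so both classes appear in training and testing.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# TF-IDF features feeding a linear support vector classifier.
model = make_pipeline(TfidfVectorizer(), svm.LinearSVC())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

On the real data, `new_str` and `depress["is_depression"]` would take the place of `texts` and `labels`; `stratify` keeps the class proportions the same in both splits.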

In [9]:
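# Run once to fetch the NLTK data used below (tokenizer models and the stopword list).
# Newer NLTK releases may also require nltk.download('punkt_tab') for word_tokenize.
# nltk.download('punkt')
# nltk.download('stopwords')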

Step 2: Data Cleaning

In [10]:
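# Load the labelled posts and keep the cleaned text column for tokenizing.
depress = pd.read_csv("depression.csv")
strings = depress["clean_text"]
# NLTK's English stopword list, stored as a set for fast membership tests.
stop_words = set(stopwords.words('english'))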


In [11]:
depress.head()

| Unnamed: 0 | clean_text | is_depression |
|---|---|---|
| 0 | we understand that most people who reply immed... | 1 |
| 1 | welcome to r depression s check in post a plac... | 1 |
| 2 | anyone else instead of sleeping more when depr... | 1 |
| 3 | i ve kind of stuffed around a lot in my life d... | 1 |
| 4 | sleep is my greatest and most comforting escape... | 1 |
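The first rows shown are all labelled `1`, so `head()` says nothing about how the two classes are distributed. Before training, it may be worth checking the label balance. A sketch on a small hypothetical frame, `depress_sample`, that uses the same column names:

```python
import pandas as pd

# Hypothetical stand-in for `depress`; the real frame is loaded from depression.csv.
depress_sample = pd.DataFrame({
    "clean_text": ["sleep is my only escape", "great game tonight",
                   "feel so alone", "new puppy today"],
    "is_depression": [1, 0, 1, 0],
})

# Fraction of rows in each class; a strong imbalance changes how accuracy should be read.
balance = depress_sample["is_depression"].value_counts(normalize=True)
print(balance)
```

On the notebook's data the equivalent call would be `depress["is_depression"].value_counts(normalize=True)`.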


In [12]:
strings.head()

0    we understand that most people who reply immed...
1    welcome to r depression s check in post a plac...
2    anyone else instead of sleeping more when depr...
3    i ve kind of stuffed around a lot in my life d...
4    sleep is my greatest and most comforting escap...
Name: clean_text, dtype: object
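The tokenizer in the next cell expects every entry of `clean_text` to be a string; a missing value read in as `NaN` would likely make it fail. The loop completes on this file, so it appears to have no such gaps, but a guard is cheap if the data changes. A sketch on a hypothetical frame, `sample`, with one missing text:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing text value, mimicking a gap in clean_text.
sample = pd.DataFrame({
    "clean_text": ["i ve kind of stuffed around", np.nan, "sleep is my escape"],
    "is_depression": [1, 1, 1],
})

# Count missing texts, then drop those rows so every remaining entry is a string.
n_missing = sample["clean_text"].isna().sum()
sample = sample.dropna(subset=["clean_text"]).reset_index(drop=True)
print(n_missing, len(sample))
```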

In [13]:
# Remove English stopwords from each post, preserving the order of the posts.
new_str = []

for string in strings:
    words = nltk.word_tokenize(string)
    # Compare in lowercase so capitalized stopwords are removed as well.
    filtered_words = [word for word in words if word.lower() not in stop_words]
    filtered_str = ' '.join(filtered_words)
    new_str.append(filtered_str)
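The loop above can be expressed as a reusable function, which makes it easier to test and to apply later to the political tweets. A sketch that assumes the text is already lowercased and space-separated (as `clean_text` appears to be), so `str.split()` is substituted for `nltk.word_tokenize`, and a small hypothetical set, `STOP_WORDS`, replaces NLTK's list:

```python
# Hypothetical, deliberately tiny stopword set; the notebook uses NLTK's English list.
STOP_WORDS = {"i", "is", "my", "and", "the", "of", "to", "a"}


def remove_stopwords(text: str, stop_words: set) -> str:
    """Drop stopwords from already-cleaned, space-separated text."""
    # str.split() stands in for nltk.word_tokenize; adequate only for pre-cleaned text.
    return " ".join(word for word in text.split() if word.lower() not in stop_words)


print(remove_stopwords("sleep is my greatest and most comforting escape", STOP_WORDS))
# prints "sleep greatest most comforting escape"
```

In the notebook, `new_str = [remove_stopwords(s, stop_words) for s in strings]` would play the role of the loop, although `word_tokenize` may split some tokens differently from `split()`.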

In [14]:
print(len(new_str))
print(len(strings))

7731
7731
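Matching lengths confirm that one filtered string was produced per post, but not that every filtered string still has content: a post made up entirely of stopwords becomes an empty string. A sketch of that extra check on a hypothetical list, `filtered`:

```python
# Hypothetical output of the stopword-removal step; one post consisted only of stopwords.
filtered = ["sleep greatest comforting escape", "", "anyone else sleeping"]

# Equal list lengths can hide rows that became empty; find them explicitly.
empty_rows = [i for i, text in enumerate(filtered) if not text.strip()]
print(empty_rows)  # prints [1]
```

Applied to `new_str`, the same comprehension would list rows that might be dropped before training.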
