# Analyzing Political Tweets on a Depression Prediction ML Model
### Sam Spell, James Tipton

Political rhetoric and discussion appear to have become more polarized in recent years. Voting and taking part in politics are important to a stable and healthy society, particularly for people reaching adulthood. This project uses machine learning to build a model that predicts depression from a string of text from Twitter. Once the model is developed, it can be used to analyze political messages posted online and to draw out patterns in tweets that the model classifies as showing signs of depression. A further goal is to connect those text patterns to patterns of political messaging, if any exist, and to examine how they change over time. Given changing views on polarized politics, it will be interesting to test whether the prevalence of messages classified as “depression” changes across different political periods.


Step 1: Libraries

In [22]:
# Data handling
import numpy as np
import pandas as pd
from numpy import savetxt

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling
from sklearn import svm
from sklearn.model_selection import train_test_split

# Text preprocessing
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer, WordNetLemmatizer

Step 1.5: Download these once

In [27]:
# Uncomment these lines the first time the notebook is run; NLTK then keeps the data locally.
# nltk.download('punkt')
# nltk.download('stopwords')
# nltk.download('wordnet')
# nltk.download('omw-1.4')

[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/jamestipton/nltk_data...


True
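
The downloads above are commented out so that they run only once. As a minimal sketch, and not the authors' code, the same step could be wrapped in a helper that is safe to re-run. The helper name `ensure_nltk_resources` and the `NLTK_PACKAGES` tuple are hypothetical; `nltk.download` and its `quiet` flag are existing NLTK calls. Recent NLTK releases may additionally ask for a `punkt_tab` package when `word_tokenize` is called, in which case that name would need to be added to the tuple.

```python
# Hypothetical helper: fetch the NLTK packages this notebook relies on.
NLTK_PACKAGES = ("punkt", "stopwords", "wordnet", "omw-1.4")


def ensure_nltk_resources(packages=NLTK_PACKAGES):
    """Request each package from NLTK and return the names that failed."""
    import nltk  # imported lazily so this cell loads even before NLTK is used

    failed = []
    for name in packages:
        # nltk.download returns True on success and False otherwise
        if not nltk.download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling `ensure_nltk_resources()` at the top of the notebook would then return an empty list when everything is available. It still needs network access to check the package index.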

Step 2: Data Cleaning

In [24]:
# load the dataset and isolate its text column
d = pd.read_csv("depression.csv")
text = d["clean_text"]

# build the set of English stopwords (a set makes membership tests fast)
stop_words = set(stopwords.words('english'))


In [25]:
# define function to remove stopwords
def remove_stopwords(text):
    tokens = nltk.word_tokenize(text)
    filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
    filtered_text = " ".join(filtered_tokens)
    return filtered_text

text = text.apply(remove_stopwords)
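
To make the effect of this step concrete, here is a small self-contained sketch. It is an illustration under assumptions, not the notebook's code: it splits on whitespace instead of calling `nltk.word_tokenize`, and `DEMO_STOP_WORDS` is a tiny hand-picked set standing in for NLTK's English stopword list, so it runs without any downloaded data.

```python
# Hypothetical stand-in for stopwords.words('english'); the real list is much longer.
DEMO_STOP_WORDS = {"i", "the", "and", "to", "a", "of", "is", "in", "at"}


def remove_stopwords_demo(text, stop_words=DEMO_STOP_WORDS):
    """Drop tokens whose lowercase form is in stop_words (whitespace tokenization)."""
    tokens = text.split()
    kept = [word for word in tokens if word.lower() not in stop_words]
    return " ".join(kept)


sample = "I stay awake at night and avoid the morning"
print(remove_stopwords_demo(sample))  # stay awake night avoid morning
```

As in `remove_stopwords`, the comparison is made on the lowercased token, so "The" and "the" are both removed while the surviving tokens keep their original case.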

In [28]:
# lemmatize each token; WordNetLemmatizer treats tokens as nouns unless a pos argument is given
lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    tokens = nltk.word_tokenize(text)
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in tokens]
    lemmatized_text = " ".join(lemmatized_tokens)
    return lemmatized_text

text = text.apply(lemmatize_text)

In [29]:
# reduce each token to its Porter stem
stemmer = PorterStemmer()

def stem_text(text):
    tokens = nltk.word_tokenize(text)
    stemmed_tokens = [stemmer.stem(token) for token in tokens]
    stemmed_text = " ".join(stemmed_tokens)
    return stemmed_text

text = text.apply(stem_text)
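
The three cleaning functions each tokenize the text again. One possible alternative, sketched here as an assumption rather than the authors' method, tokenizes once and applies the same three steps in the same order. The function name `preprocess` and its parameters are hypothetical. The steps are passed in as callables so the sketch can be run with simple stand-ins, as in the demo lines, or with the NLTK objects defined above. Results may differ slightly from the three-pass version in edge cases where re-tokenizing changes token boundaries.

```python
def preprocess(text, tokenize, stop_words, lemmatize, stem):
    """Tokenize once, then remove stopwords, lemmatize, and stem each token."""
    cleaned = []
    for token in tokenize(text):
        if token.lower() in stop_words:
            continue  # same stopword test as remove_stopwords
        cleaned.append(stem(lemmatize(token)))
    return " ".join(cleaned)


def toy_lemmatize(token):
    """Toy stand-in (not NLTK): strip one trailing 's'."""
    return token[:-1] if token.endswith("s") else token


# Demo with toy stand-ins: whitespace tokenizer, toy lemmatizer,
# and str.lower playing the role of the stemmer.
print(preprocess("The cats sleep", str.split, {"the"}, toy_lemmatize, str.lower))
# prints: cat sleep
```

With the notebook's own objects, the call would look like `text.apply(lambda t: preprocess(t, nltk.word_tokenize, stop_words, lemmatizer.lemmatize, stemmer.stem))`.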


In [30]:
print(text)

0       understand peopl repli immedi op invit talk pr...
1       welcom r depress check post place take moment ...
2       anyon els instead sleep depress stay night avo...
3       kind stuf around lot life delay inevit work jo...
4       sleep greatest comfort escap whenev wake day l...
                              ...                        
7726                                                 snow
7727                                  moulin roug mad cri
7728                            tri shout find peopl list
7729    ughh find red sox hat got ta wear creepi nick ...
7730    slept wonder final tri swatch new project clas...
Name: clean_text, Length: 7731, dtype: object
