# Political Social Media Analysis

In this project, I compare the tweets of Donald Trump, Barack Obama, and Hillary Clinton to extract meaningful insights.

In this notebook, I import the cleaned data and derive as many insights from it as possible.

There are 3 CSV files which will be used:
1. DonaldTrumpClean
2. BarackObamaClean
3. HillaryClintonClean

All three files have the same columns: date, retweet, text, author.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Read the cleaned data.

In [2]:
trump = pd.read_csv("data/DonaldTrumpClean.csv")
obama = pd.read_csv("data/BarackObamaClean.csv")
clinton = pd.read_csv("data/HillaryClintonClean.csv")

In [3]:
print(len(trump), len(obama), len(clinton))

8439 2125 3256


In [4]:
trump.drop('Unnamed: 0', axis=1, inplace=True)
obama.drop('Unnamed: 0', axis=1, inplace=True)
clinton.drop('Unnamed: 0', axis=1, inplace=True)
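
An `Unnamed: 0` column usually appears when a DataFrame was saved to CSV together with its index. As a minimal sketch, assuming that is the case here, the column can be absorbed at read time and the dates parsed in the same call. The two-row sample below is made up for illustration and only mimics the column layout described above.

```python
import io
import pandas as pd

# Hypothetical sample that mimics the layout of the cleaned CSV files,
# including the unnamed leading index column. The rows are invented.
sample_csv = io.StringIO(
    ",date,retweet,text,author\n"
    "0,2016-11-08 12:00:00,False,Example tweet one,ExampleAuthor\n"
    "1,2016-11-09 08:30:00,True,Example tweet two,ExampleAuthor\n"
)

# index_col=0 uses the unnamed first column as the index, so no
# 'Unnamed: 0' column is created; parse_dates converts 'date' on load.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=["date"])

print(sample.columns.tolist())
print(sample["date"].dtype)
```

With this approach the separate `drop` and `pd.to_datetime` steps would not be needed, but the explicit version used in this notebook is equally valid.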

In [5]:
trump['date'] = pd.to_datetime(trump['date'])
obama['date'] = pd.to_datetime(obama['date'])
clinton['date'] = pd.to_datetime(clinton['date'])

## Topic Modelling

In [22]:
trumpTweetList = list(trump.text)

### Cleaning and Preprocessing

Cleaning is an important step before any text-mining task. In this step, we remove punctuation and stopwords and normalize the corpus by lemmatizing each word.

In [23]:
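# Requires the NLTK 'stopwords' and 'wordnet' corpora (see nltk.download)
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer
import string

stop = set(stopwords.words('english'))  # English stopwords to drop
exclude = set(string.punctuation)       # punctuation characters to strip
lemma = WordNetLemmatizer()

def clean(doc):
    # Lowercase the tweet and drop stopwords
    stop_free = " ".join([i for i in doc.lower().split() if i not in stop])
    # Remove punctuation characters
    punc_free = ''.join(ch for ch in stop_free if ch not in exclude)
    # Lemmatize each remaining word
    normalized = " ".join(lemma.lemmatize(word) for word in punc_free.split())
    return normalized

# One list of cleaned tokens per tweet
doc_clean = [clean(doc).split() for doc in trumpTweetList]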
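
To see what the first two cleaning steps do to a single tweet, here is a minimal sketch that avoids the NLTK downloads. It assumes a tiny hand-written stopword set (`TOY_STOPWORDS`) and skips lemmatization, so it is not the `clean` function above, only an illustration of the same order of operations. Both names and both sample sentences are hypothetical.

```python
import string

# Hypothetical, tiny stopword set; the notebook uses NLTK's English list.
TOY_STOPWORDS = {"the", "is", "a"}
PUNCTUATION = set(string.punctuation)

def clean_sketch(doc):
    # Step 1: lowercase, split on whitespace, drop stopwords.
    stop_free = " ".join(w for w in doc.lower().split() if w not in TOY_STOPWORDS)
    # Step 2: strip punctuation characters.
    punc_free = "".join(ch for ch in stop_free if ch not in PUNCTUATION)
    # Lemmatization is omitted in this sketch.
    return punc_free.split()

print(clean_sketch("The wall is great, the BEST!"))
print(clean_sketch("Winning is the."))
```

The second call shows a consequence of filtering stopwords before stripping punctuation: a stopword with punctuation attached (here `the.`) does not match the stopword set and survives as `the`.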

### Preparing Document-Term Matrix


In [25]:
# Importing Gensim
import gensim
from gensim import corpora

# Creating the term dictionary of our corpus, where every unique term is assigned an index.
dictionary = corpora.Dictionary(doc_clean)

# Converting list of documents (corpus) into Document Term Matrix using dictionary prepared above.
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
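
Each entry of the document-term matrix is a sparse bag-of-words: a list of `(term_id, count)` pairs for one document. The sketch below reproduces that idea in plain Python on a made-up two-document corpus. It is an illustration only, not gensim's implementation; in particular, the ids here are assigned in order of first appearance, which may differ from the ids gensim assigns.

```python
from collections import Counter

# Hypothetical toy corpus of already-cleaned, tokenized documents.
docs = [["great", "wall", "great"], ["wall", "border"]]

# Assign each unique term an integer id (order of first appearance).
token2id = {}
for doc in docs:
    for token in doc:
        token2id.setdefault(token, len(token2id))

def to_bow(doc):
    # Count how often each known term id occurs in the document.
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

bow_corpus = [to_bow(doc) for doc in docs]
print(token2id)    # {'great': 0, 'wall': 1, 'border': 2}
print(bow_corpus)  # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```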



### Running LDA Model

In [26]:
# Aliasing the LDA model class from the gensim library
Lda = gensim.models.ldamodel.LdaModel

In [None]:
# Running and training the LDA model on the document-term matrix.
ldamodel = Lda(doc_term_matrix, num_topics=3, id2word=dictionary, passes=50)