Setup

# Twitter Tweet Analysis

## Workflow
1. Install Packages
2. Get data
3. Visually perform sentiment analysis
  * plot number of positive, negative, neutral tweets each day
  * plot the proportion of positive, negative, neutral tweets
  * create a word cloud from the tweets

## 1. Install Packages

In [None]:
from google.colab import output
!curl -Ol https://raw.githubusercontent.com/teaching-repositories/isys2001-worksheets/main/stopwords.py
!pip install TextBlob
output.clear()
print("Required packages installed")

## 2. Get and Clean the Data

In [None]:
!curl -Ol https://raw.githubusercontent.com/teaching-repositories/isys2001-worksheets/main/trump_tweets.csv

In [None]:
import re

def clean(text):
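  ''' Uses a regular expression to remove @mentions, URLs and any character that is not an English letter, digit, space or tab. '''
  # Raw string so that \w and \S reach the regex engine unchanged
  regExp = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"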
  return ' '.join(re.sub(regExp, " ", text).split())
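
To see what the cleaning step does, here is a minimal sketch that runs the same pattern on a made-up tweet. The sample text is hypothetical and is not taken from the data set.

```python
import re

def clean(text):
    # Same pattern as the notebook: @mentions, non-alphanumeric characters, URLs
    regExp = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"
    # Replace every match with a space, then collapse repeated whitespace
    return ' '.join(re.sub(regExp, " ", text).split())

# Hypothetical example tweet
sample = "RT @user123: Great rally today! https://t.co/abc123 #MAGA"
cleaned = clean(sample)
print(cleaned)  # RT Great rally today MAGA
```

The mention, the punctuation and the link are removed, while the hashtag keeps its word but loses the `#`.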


Let's have a look at the data.

In [None]:
import pandas as pd

tweets_df = pd.read_csv('trump_tweets.csv')
tweets_df['Clean Text'] = tweets_df['text'].apply(clean)
tweets_df.head()

## 3. Perform Sentiment Analysis
The `sentiment` property of a TextBlob object returns two values: polarity and subjectivity.

See: https://textblob.readthedocs.io/en/dev/index.html


### Polarity
Polarity is a float in the range [-1, 1], where 1 indicates a positive statement and -1 a negative statement.

### Subjectivity
Subjective sentences generally express personal opinion, emotion or judgment, whereas objective sentences convey factual information. Subjectivity is also a float, in the range [0, 1], where 0 is very objective and 1 is very subjective.

In [None]:
from textblob import TextBlob

# Wrappers so we can use the pandas apply() function on a column
def getSubjectivity(text):
    return TextBlob(str(text)).sentiment.subjectivity

def getPolarity(text):
    return TextBlob(str(text)).sentiment.polarity

# Calculate the sentiment scores
tweets_df['Subjectivity'] = tweets_df['Clean Text'].apply(getSubjectivity)
tweets_df['Polarity'] = tweets_df['Clean Text'].apply(getPolarity)

tweets_df.head()

In [None]:
tweets_df.boxplot(column=['Subjectivity','Polarity'], grid=False, figsize=(12,8))

# Sentiment over time.

From the documentation, the polarity value can be interpreted as a sentiment. So we can *calculate* the sentiment of each tweet as either positive, negative or neutral.

> There are more precise ways, but we are keeping it simple for this exercise.

It could be interesting to plot these over time. Each day there are a number of tweets, so what proportions are positive, negative or neutral?

First, let us add a sentiment column.


In [6]:
def sentiment(polarity):
  if polarity > 0:
      return 'positive'
  elif polarity == 0:
      return 'neutral'
  else:
      return 'negative'
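
The function above treats only a polarity of exactly 0 as neutral. One possible refinement, shown here as a sketch only, is to treat scores close to zero as neutral too. The name `sentiment_with_band` and the `threshold` value of 0.05 are assumptions chosen for illustration and do not come from TextBlob.

```python
def sentiment_with_band(polarity, threshold=0.05):
    """Label a polarity score, treating scores within +/- threshold as neutral.

    The threshold is a hypothetical value chosen for illustration.
    """
    if polarity > threshold:
        return 'positive'
    elif polarity < -threshold:
        return 'negative'
    else:
        return 'neutral'

# Hypothetical polarity scores
scores = [0.8, 0.02, 0.0, -0.03, -0.4]
labels = [sentiment_with_band(s) for s in scores]
print(labels)  # ['positive', 'neutral', 'neutral', 'neutral', 'negative']
```

A wider band moves weakly worded tweets into the neutral group, which changes the proportions plotted later, so the threshold is worth experimenting with.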

In [None]:
tweets_df['Sentiment'] = tweets_df['Polarity'].apply(sentiment)
tweets_df.head()

In [None]:
pos = tweets_df[tweets_df["Sentiment"]=="positive"]
pos.head()

Next, write a function that counts the tweets on each day. We use a function because we want to repeat the count for the negative and neutral tweets.

In [9]:
def num_tweets(df):
    return df['Date'].value_counts().sort_index()

pos_per_day = num_tweets(pos)
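
One thing to watch with `sort_index()` is the type of the `Date` column. The sketch below uses a small made-up DataFrame and assumes the dates are stored as month/day/year strings. That format is an assumption, and the real file may differ. Sorting strings is alphabetical, so parsing them with `pd.to_datetime` first gives a chronological order.

```python
import pandas as pd

def num_tweets(df):
    return df['Date'].value_counts().sort_index()

# Hypothetical data: dates stored as month/day/year strings
sample_df = pd.DataFrame({'Date': ['12/31/2020', '01/02/2021', '12/31/2020']})

# Sorting the string index is alphabetical, so January 2021 comes first
by_string = num_tweets(sample_df)

# Parsing the strings first makes the index sort chronologically
sample_df['Date'] = pd.to_datetime(sample_df['Date'], format='%m/%d/%Y')
by_date = num_tweets(sample_df)

print(by_string)
print(by_date)
```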

In [None]:
pos_per_day.plot()

In [None]:
neg = tweets_df[tweets_df["Sentiment"]=="negative"]
neg_per_day = num_tweets(neg)
neg_per_day.plot()

In [None]:
neu = tweets_df[tweets_df["Sentiment"]=="neutral"]
neu_per_day = num_tweets(neu)
neu_per_day.plot()
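
The three per-day series can also be built in one step. The following sketch uses `pd.crosstab` on a small hypothetical DataFrame that has the same `Date` and `Sentiment` columns as `tweets_df`. Passing `normalize="index"` divides each row by its total, which gives the daily proportions.

```python
import pandas as pd

# Hypothetical miniature of tweets_df with only the two columns used here
sample_df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01",
             "2020-01-02", "2020-01-02", "2020-01-02", "2020-01-02"],
    "Sentiment": ["positive", "negative",
                  "positive", "positive", "neutral", "negative"],
})

# One row per date, one column per sentiment label, values are tweet counts
counts_per_day = pd.crosstab(sample_df["Date"], sample_df["Sentiment"])

# Same table, but each row is divided by its total to give proportions
proportions_per_day = pd.crosstab(sample_df["Date"], sample_df["Sentiment"],
                                  normalize="index")

print(counts_per_day)
print(proportions_per_day)
```

Calling `.plot()` on either table should draw one line per sentiment on a shared axis, which makes the three series easier to compare than three separate charts.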

How about the proportions of each sentiment type?

Visit the Python Graph Gallery, 'Part of the Whole', select 'Pie' and look at the example.

In [None]:
import matplotlib.pyplot as plt
values = [len(pos),len(neg),len(neu)]
plt.pie(values)

We can do better. 

In [None]:
import matplotlib.pyplot as plt

values = [len(pos),len(neg),len(neu)]
labels = ['Positive', 'Negative', 'Neutral']
colors = ['b', 'g', 'r']
# Wedge labels show the counts; the legend maps colours to sentiment names
plt.pie(values, colors=colors, labels=values, counterclock=False, shadow=True)
plt.title('Sentiment Proportions')
plt.legend(labels,loc=3)
plt.show()
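
The pie chart shows proportions visually, but the numbers themselves can be computed directly. This is a minimal sketch using `value_counts(normalize=True)` on a hypothetical series standing in for `tweets_df['Sentiment']`.

```python
import pandas as pd

# Hypothetical stand-in for tweets_df['Sentiment']
sentiments = pd.Series(['positive', 'positive', 'negative', 'neutral'])

# normalize=True returns fractions of the total instead of raw counts
proportions = sentiments.value_counts(normalize=True)

# Convert to percentages rounded to one decimal place
percentages = (proportions * 100).round(1)
print(percentages)
```

To print the percentages on the chart itself, `plt.pie` accepts an `autopct` format string such as `autopct='%1.1f%%'`.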

## Word Cloud

Visit the Python Graph Gallery, 'Ranking', select 'Word Cloud' and look at the example.

In [None]:
# Libraries
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from stopwords import ENGLISH_STOP_WORDS

# Get all the messages
messages = ' '.join(tweets_df['Clean Text'])

# Copy the stop words into a mutable set so that extra words can be added,
# for example stop_words.add('rt')
stop_words = set(ENGLISH_STOP_WORDS)
# Create the wordcloud object
wordcloud = WordCloud(width=680, height=480, margin=0,
                      stopwords=stop_words).generate(messages)

# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
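
A word cloud sizes each word roughly by how often it appears once the stop words are removed. The sketch below illustrates that idea with `collections.Counter`. It is not the `wordcloud` library's own implementation, and the `messages` string and `stop_words` set are hypothetical stand-ins for the notebook's variables.

```python
from collections import Counter

# Hypothetical stand-ins for the notebook's messages string and stop-word set
messages = "great rally today great crowd the crowd was great"
stop_words = {"the", "was"}

# Lower-case, split on whitespace and drop the stop words
words = [w for w in messages.lower().split() if w not in stop_words]

# Count how often each remaining word occurs
word_counts = Counter(words)
top_words = word_counts.most_common(2)
print(top_words)  # [('great', 3), ('crowd', 2)]
```

Inspecting the most common words this way is a quick check on whether additional stop words are needed before regenerating the cloud.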