# Exploring Elon's Tweets

First, we will break down the dataset.

The dataset contains information about tweets from Elon Musk in 2022. It breaks down as follows:

- `Tweets`: original text of the tweet
- `Retweets`: number of retweets of the current tweet
- `Likes`: number of likes of the current tweet
- `Date`: creation date of the tweet
- `Cleaned_Tweets`: text of the original tweet after removing elements such as 'RT', hashtags, mentions, links, emojis and leading/trailing whitespace (this column is created later in this notebook)

In [1]:
# Importing libraries
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob

%matplotlib inline

# Reference: https://www.kaggle.com/code/marta99/elon-musk-s-tweets-sentiment-analysis/notebook

In [2]:
# Import the data
df = pd.read_csv('rawdata.csv')
df.head()

Unnamed: 0,Tweets,Retweets,Likes,Date
0,@PeterSchiff 🤣 thanks,209,7021,2022-10-27 16:17:39
1,@ZubyMusic Absolutely,755,26737,2022-10-27 13:19:25
2,Dear Twitter Advertisers https://t.co/GMwHmInPAS,55927,356623,2022-10-27 13:08:00
3,@BillyM2k 👻,802,19353,2022-10-27 02:32:48
4,Meeting a lot of cool people at Twitter today!,9366,195546,2022-10-26 21:39:32


The raw data has not been cleaned yet, so cleaning is the next stage.

In [3]:
# Creating a function to clean the tweets

def cleantwt (twt):
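  emoj = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002500-\U00002BEF"  # box drawing, geometric shapes, dingbats, arrows
        u"\U00002702-\U000027B0"  # dingbats
        u"\U000024C2-\U0001F251"  # enclosed characters; very broad, also spans CJK ideographs
        u"\U0001f926-\U0001f937"  # gestures (facepalm to shrug)
        u"\U00010000-\U0010ffff"  # all supplementary planes
        u"\u2640-\u2642"  # gender signs
        u"\u2600-\u2B55"  # miscellaneous symbols
        u"\u200d"  # zero-width joiner
        u"\u23cf"  # eject symbol
        u"\u23e9"  # fast-forward symbol
        u"\u231a"  # watch
        u"\ufe0f"  # variation selector-16 (emoji presentation)
        u"\u3030"  # wavy dash
                      "]+", re.UNICODE)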

  twt = re.sub(r'\bRT\b', '', twt) # remove the standalone 'RT' marker (not 'RT' inside other words)
  twt = re.sub(r'#[A-Za-z0-9]+', '', twt) # remove hashtags
  twt = re.sub(r'\n', '', twt) # remove newline characters
  twt = re.sub(r'https?://\S+', '', twt) # remove hyperlinks
  twt = re.sub(r'@\S*', '', twt) # remove @mentions
  twt = emoj.sub('', twt) # remove emojis
  twt = twt.strip() # trim leading and trailing whitespace last, after all removals
  return twt
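
The order of the substitutions matters: trimming whitespace before removing emojis would leave a stray leading space in a tweet such as `🤣 thanks`. The following is a minimal sketch of the same idea, assuming a hypothetical helper `clean_text` and a deliberately reduced emoji range (not the full pattern above), with the trim as the final step:

```python
import re

# Reduced emoji pattern for illustration only; the full pattern above covers more ranges.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\u200d\ufe0f]+")

def clean_text(text):
    """Hypothetical helper: strip links, mentions, hashtags and emojis, then trim."""
    text = re.sub(r"https?://\S+", "", text)   # remove hyperlinks
    text = re.sub(r"@\S+", "", text)           # remove @mentions
    text = re.sub(r"#[A-Za-z0-9]+", "", text)  # remove hashtags
    text = EMOJI.sub("", text)                 # remove emojis
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()                        # trim last, after all removals

samples = [
    "@PeterSchiff 🤣 thanks",
    "@BillyM2k 👻",
    "Dear Twitter Advertisers https://t.co/GMwHmInPAS",
]
cleaned = [clean_text(s) for s in samples]
print(cleaned)
```

The second sample becomes an empty string, which is the case handled a few cells further down.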

In [4]:
# Create a new column with the cleaned tweets

df['Cleaned_Tweets'] = df['Tweets'].apply(cleantwt)

In [5]:
df.head()

Unnamed: 0,Tweets,Retweets,Likes,Date,Cleaned_Tweets
0,@PeterSchiff 🤣 thanks,209,7021,2022-10-27 16:17:39,thanks
1,@ZubyMusic Absolutely,755,26737,2022-10-27 13:19:25,Absolutely
2,Dear Twitter Advertisers https://t.co/GMwHmInPAS,55927,356623,2022-10-27 13:08:00,Dear Twitter Advertisers
3,@BillyM2k 👻,802,19353,2022-10-27 02:32:48,
4,Meeting a lot of cool people at Twitter today!,9366,195546,2022-10-26 21:39:32,Meeting a lot of cool people at Twitter today!


In [6]:
# Check for duplicates
df.duplicated().sum()

0

We can see here that some tweets contained only mentions and emojis, so cleaning them leaves an empty string. These rows will be deleted.

In [7]:
df.drop(df[df['Cleaned_Tweets']==''].index, inplace=True)
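
An alternative that avoids `inplace=True` is to build a boolean mask and keep only the non-empty rows. The sketch below uses a small made-up DataFrame (the values are illustrative, not taken from `rawdata.csv`). It also treats whitespace-only strings as empty, which is an assumption that goes beyond what the cell above does:

```python
import pandas as pd

# Toy data for illustration only.
demo = pd.DataFrame({"Cleaned_Tweets": ["thanks", "", "   ", "Absolutely"]})

# True where the cleaned text has at least one non-whitespace character.
has_text = demo["Cleaned_Tweets"].str.strip().ne("")

print(f"Dropping {(~has_text).sum()} empty rows")

# Keep the rows with text; resetting the index is optional.
demo = demo[has_text].reset_index(drop=True)
print(demo)
```

Printing the number of dropped rows first makes it easier to notice if the cleaning step removed more tweets than expected.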

TextBlob's sentiment analysis returns two scores, `polarity` and `subjectivity`.

`Polarity` is a float in the range [-1.0, 1.0], where 0 indicates neutral sentiment, +1 a very positive sentiment and -1 a very negative sentiment.

`Subjectivity` is a float in the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. Subjective sentences express personal feelings, views, beliefs, opinions, allegations, desires, suspicions and speculations, whereas objective sentences are factual.

This is taken from this Medium post: https://medium.com/analytics-vidhya/sentiment-analysis-using-textblob-ecaaf0373dff#:~:text=Polarity%20is%20a%20float%20value,and%201.0%20is%20very%20subjective. 

We will now examine these features.

In [8]:
# Function to get polarity with textblob
def getPolarity(twt):
    return TextBlob(twt).sentiment.polarity

# Function to get subjectivity
def getSubjectivity(twt):
    return TextBlob(twt).sentiment.subjectivity

Next, we create new columns in the DataFrame to store the output of the functions above.

In [9]:
df['subjectivity']=df['Cleaned_Tweets'].apply(getSubjectivity)
df['polarity']=df['Cleaned_Tweets'].apply(getPolarity)

We can take a look at the result of this addition.

In [10]:
df.head()

Unnamed: 0,Tweets,Retweets,Likes,Date,Cleaned_Tweets,subjectivity,polarity
0,@PeterSchiff 🤣 thanks,209,7021,2022-10-27 16:17:39,thanks,0.2,0.2
1,@ZubyMusic Absolutely,755,26737,2022-10-27 13:19:25,Absolutely,0.9,0.2
2,Dear Twitter Advertisers https://t.co/GMwHmInPAS,55927,356623,2022-10-27 13:08:00,Dear Twitter Advertisers,0.0,0.0
4,Meeting a lot of cool people at Twitter today!,9366,195546,2022-10-26 21:39:32,Meeting a lot of cool people at Twitter today!,0.65,0.4375
5,Entering Twitter HQ – let that sink in! https:...,145520,1043592,2022-10-26 18:45:58,Entering Twitter HQ – let that sink in!,0.0,0.0


We will now create a function that converts the polarity scores into labels, marking each tweet's sentiment as `Positive`, `Negative` or `Neutral`.

In [11]:
def applySentiment(value):
    if value < 0:
        return 'Negative'
    elif value > 0:
        return 'Positive'
    else:
        return 'Neutral'
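
Because only the sign of the polarity is used, a score of exactly 0.0 is the only value labelled `Neutral`. The sketch below restates the same thresholds under a hypothetical name, `apply_sentiment`, and tallies the labels for a handful of polarity values. Most are copied from the rows shown earlier, and the negative value is made up for illustration:

```python
from collections import Counter

def apply_sentiment(value):
    """Same thresholds as applySentiment above: the sign of the polarity decides the label."""
    if value < 0:
        return "Negative"
    if value > 0:
        return "Positive"
    return "Neutral"

# Polarity values from the first rows shown earlier, plus one made-up negative value.
polarities = [0.2, 0.2, 0.0, 0.4375, 0.0, -0.3]
labels = [apply_sentiment(p) for p in polarities]

# Count how many tweets fall under each label.
counts = Counter(labels)
print(counts)
```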


In [12]:
# Create a new column 'Sentiment'
df['Sentiment'] = df['polarity'].apply(applySentiment)

In [13]:
df.head(10)

Unnamed: 0,Tweets,Retweets,Likes,Date,Cleaned_Tweets,subjectivity,polarity,Sentiment
0,@PeterSchiff 🤣 thanks,209,7021,2022-10-27 16:17:39,thanks,0.2,0.2,Positive
1,@ZubyMusic Absolutely,755,26737,2022-10-27 13:19:25,Absolutely,0.9,0.2,Positive
2,Dear Twitter Advertisers https://t.co/GMwHmInPAS,55927,356623,2022-10-27 13:08:00,Dear Twitter Advertisers,0.0,0.0,Neutral
4,Meeting a lot of cool people at Twitter today!,9366,195546,2022-10-26 21:39:32,Meeting a lot of cool people at Twitter today!,0.65,0.4375,Positive
5,Entering Twitter HQ – let that sink in! https:...,145520,1043592,2022-10-26 18:45:58,Entering Twitter HQ – let that sink in!,0.0,0.0,Neutral
8,@ARodTV Definitely closer to citizen journalis...,699,10189,2022-10-26 17:05:16,Definitely closer to citizen journalism – loca...,0.333333,0.166667,Positive
9,@sandyleevincent Nobody bats 1000 🤷‍♂️,126,2920,2022-10-26 15:42:50,Nobody bats 1000,0.0,0.0,Neutral
10,A beautiful thing about Twitter is how it empo...,37951,294406,2022-10-26 15:27:40,A beautiful thing about Twitter is how it empo...,0.8125,0.675,Positive
11,@teslaownersSV I’m a big fan of citizen journa...,488,5529,2022-10-26 15:22:43,I’m a big fan of citizen journalism!,0.1,0.0,Neutral
12,“According to unnamed sources close to the mat...,4603,62693,2022-10-26 14:50:58,“According to unnamed sources close to the mat...,0.0,0.0,Neutral


# Visualisation

In [None]:
# Scatter plot
sns.set_style('darkgrid')
plt.figure(figsize=(8,6))

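# Keys must match the capitalised labels in the 'Sentiment' column;
# all three markers are filled, since seaborn cannot mix filled and unfilled markers
markers = {'Positive':'o', 'Neutral':'s', 'Negative':'X'}

sns.scatterplot(data=df, x='polarity', y='subjectivity', hue='Sentiment', style='Sentiment', markers=markers, palette='gist_earth').set(xlim=(-1,1))

plt.title('Polarity vs. Subjectivity of Tweets')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')

plt.tight_layout()
plt.show()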