Start the Twitter sentiment analysis task by importing the required Python libraries and loading the dataset:

In [3]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import re
import nltk


In [4]:
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/twitter.csv")
print(data.head())

   Unnamed: 0  count  hate_speech  offensive_language  neither  class  \
0           0      3            0                   0        3      2   
1           1      3            0                   3        0      1   
2           2      3            0                   3        0      1   
3           3      3            0                   2        1      1   
4           4      6            0                   6        0      1   

                                               tweet  
0  !!! RT @mayasolovely: As a woman you shouldn't...  
1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...  
2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...  
3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...  
4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...  


In this dataset, the "tweet" column holds the text we need to analyze. Tweets are noisy, though: they contain URLs, markup, punctuation, numbers, and informal spelling, so the column has to be cleaned before scoring. The function below lowercases each tweet, strips that noise with regular expressions, removes English stopwords, and stems the remaining words:

In [5]:
import string
from nltk.corpus import stopwords

nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    # Raw strings keep the regex escapes from being read as string escapes
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # bracketed text
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML-style tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                     # newlines
    text = re.sub(r'\w*\d\w*', '', text)               # tokens containing digits
    # Drop stopwords, then stem what remains
    text = [word for word in text.split(' ') if word not in stopword]
    text = [stemmer.stem(word) for word in text]
    return " ".join(text)

data["tweet"] = data["tweet"].apply(clean)

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/m.daoudadala/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
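
To see what the regular-expression steps in `clean` do on their own, here is a small sketch that applies the same patterns to a made-up tweet. It leaves out stopword removal and stemming so that it runs with the standard library only; `strip_noise` and the sample text are hypothetical and not part of the original notebook.

```python
import re
import string

def strip_noise(text):
    """Apply the same regex steps as clean(), without stopwords or stemming."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # bracketed text
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML-style tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                     # newlines
    text = re.sub(r'\w*\d\w*', '', text)               # tokens containing digits
    return text

# Made-up example tweet (an assumption for illustration only)
sample = "RT @Some_User: Visit https://example.com NOW!!! <b>wow</b> 2day"
cleaned = strip_noise(sample)
print(cleaned.split())
```

Because `@` and `_` are removed along with the rest of the punctuation, a mention such as `@Some_User` collapses into a single plain word rather than disappearing.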


The next step is to compute sentiment scores for the tweets, which tell us how positive, negative, or neutral each one is. Here is how to calculate the scores with NLTK's VADER analyzer:

In [6]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()
# Score each tweet once and reuse the result for all three columns
scores = [sentiments.polarity_scores(i) for i in data["tweet"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/m.daoudadala/nltk_data...


Next, I'll keep only the columns needed for the rest of the Twitter sentiment analysis:

In [7]:
data = data[["tweet", "Positive", 
             "Negative", "Neutral"]]
print(data.head())

                                               tweet  Positive  Negative  \
0   rt mayasolov woman shouldnt complain clean ho...     0.147     0.157   
1   rt  boy dat coldtyga dwn bad cuffin dat hoe  ...     0.000     0.280   
2   rt urkindofbrand dawg rt  ever fuck bitch sta...     0.000     0.577   
3             rt cganderson vivabas look like tranni     0.333     0.000   
4   rt shenikarobert shit hear might true might f...     0.154     0.407   

   Neutral  
0    0.696  
1    0.720  
2    0.423  
3    0.667  
4    0.440  
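
The three columns hold scores, not labels. If a per-tweet label is wanted, one option is to pick whichever score is strictly largest. The sketch below is an assumption rather than part of the original notebook: `label_from_scores` is a hypothetical helper, ties fall back to "Neutral", and the sample rows are copied from the output above.

```python
def label_from_scores(pos, neg, neu):
    """Return the label whose score is strictly largest; ties go to Neutral."""
    if pos > neg and pos > neu:
        return "Positive"
    if neg > pos and neg > neu:
        return "Negative"
    return "Neutral"

# (Positive, Negative, Neutral) scores copied from the first three rows above
rows = [
    (0.147, 0.157, 0.696),
    (0.000, 0.280, 0.720),
    (0.000, 0.577, 0.423),
]
labels = [label_from_scores(*row) for row in rows]
print(labels)  # ['Neutral', 'Neutral', 'Negative']
```

On the full DataFrame, the same helper could be applied row by row, for example with `data.apply(lambda r: label_from_scores(r["Positive"], r["Negative"], r["Neutral"]), axis=1)`.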


Now, let's see which sentiment dominates overall by summing each score column and comparing the totals:

In [8]:
x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive 😊 ")
    elif (b>a) and (b>c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")
sentiment_score(x, y, z)

Neutral 🙂 


So, the neutral score has the largest total, which means most of the tweet text carries no overtly positive or negative sentiment. Now, let's look at the total sentiment scores:

In [9]:
print("Positive: ", x)
print("Negative: ", y)
print("Neutral: ", z)

Positive:  2880.086000000009
Negative:  7201.020999999922
Neutral:  14696.887999999733


These figures are summed scores, not tweet counts. The neutral total far exceeds the other two, but the negative total (about 7,201) is roughly 2.5 times the positive total (about 2,880). This suggests that where the tweets do express sentiment, it leans negative.
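
Raw totals are hard to compare at a glance, so it can help to express each one as a share of the combined score. The sketch below reuses the three totals printed above, rounded to three decimals; the `totals` and `shares` names are illustrative and not from the original notebook.

```python
# Totals copied from the output above, rounded to three decimals
totals = {"Positive": 2880.086, "Negative": 7201.021, "Neutral": 14696.888}

combined = sum(totals.values())
shares = {label: value / combined for label, value in totals.items()}

# Print each sentiment's share of the combined score as a percentage
for label, share in shares.items():
    print(f"{label}: {share:.1%}")

# How many times larger the negative total is than the positive one
ratio = totals["Negative"] / totals["Positive"]
print(f"Negative/Positive ratio: {ratio:.2f}")
```

With these totals, the shares come to roughly 12% positive, 29% negative, and 59% neutral.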

## Summary
So this is how you can perform Twitter sentiment analysis using Python. Sentiment analysis is a natural language processing task, and social media platforms can use it to keep track of the sentiments of people engaged in a discussion. I hope you liked this article on Twitter sentiment analysis using Python. Feel free to ask your questions in the comments section below.