<img src="https://spectrum.ieee.org/image/MzY0MTIwMA.jpeg" width="100%" height="800">

# Pfizer Tweets by Indians

* **Let's analyze tweets posted by Indian users about the Pfizer vaccine against COVID-19.**
* **Let's also run a sentiment analysis, i.e. compute "Polarity" and "Subjectivity", to gauge people's opinions of the vaccine.**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
import math
import re
from wordcloud import WordCloud, STOPWORDS
from textblob import TextBlob                #Library for performing Sentiment Analysis

In [None]:
pd.set_option('display.max_colwidth', None)

In [None]:
df=pd.read_csv('../input/pfizer-vaccine-tweets/vaccination_tweets.csv')

In [None]:
df.head()

In [None]:
df.drop(['id','is_retweet'],axis=1,inplace=True)

In [None]:
df.info()

In [None]:
df.isnull().sum()

In [None]:
# Only one value is missing in this column, so fill it with the most frequent source.

df.source=df.source.fillna('Twitter for iPhone')

In [None]:
df.fillna("Not Available",inplace=True)

In [None]:
df.user_name=df.user_name.apply(lambda x: ''.join([i for i in x if i.isalpha()]))

In [None]:
df.user_created=df.user_created.astype('datetime64[ns]')
df.date=df.date.astype('datetime64[ns]')

In [None]:
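# Convert text to lowercase and strip digits, punctuation and special characters before building a word cloud.

def clean_description(desc):
    # Leave the placeholder for missing values untouched.
    if desc == "Not Available":
        return desc
    desc = str(desc).lower()
    # Replace every non-letter character with a space.
    desc = re.sub('[^a-zA-Z]', ' ', desc)
    # Collapse the resulting runs of whitespace into a single space.
    desc = re.sub(r'\s+', ' ', desc)
    return desc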
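To see what this kind of cleaning does to a single tweet, here is a minimal standalone sketch. The function name `clean_text_sketch` and the sample tweet are made up for illustration; the logic mirrors the cleaning step above but also trims leading and trailing spaces.

```python
import re

def clean_text_sketch(text):
    """Lowercase the text and replace every run of non-letters with one space."""
    text = str(text).lower()
    # Digits, punctuation, emoji and whitespace runs all collapse to a single space.
    return re.sub(r"[^a-z]+", " ", text).strip()

# Hypothetical sample tweet, not taken from the dataset.
sample = "Got my 1st dose of #PfizerBioNTech today!! https://t.co/abc123"
print(clean_text_sketch(sample))
# -> got my st dose of pfizerbiontech today https t co abc
```

Note that URL fragments such as `https`, `t` and `co` survive the cleaning, so they can still show up in a word cloud unless they are filtered out separately.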

In [None]:
# Clean the user_description column so that it can be searched for specific terms.

df['clean_desc']=df.user_description.apply(clean_description)

In [None]:
# Cleaned text

df['clean_text']=df.text.apply(clean_description)

In [None]:
# Sentiment analysis: polarity and subjectivity.

# Polarity ranges from -1 to +1, where -1 = negative, 0 = neutral and +1 = positive.

# Subjectivity ranges from 0 to 1, where 0 = objective and 1 = subjective.

# Objective text reads as factual information, whereas subjective text reads as personal opinion.

textblob=[TextBlob(text) for text in df.text]

df['Polarity']=[p.sentiment.polarity for p in textblob]
df['Subjectivity']=[s.sentiment.subjectivity for s in textblob]
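The polarity scale described above can be turned into coarse labels with a small helper. This is only a sketch: the function name `polarity_label` and the 0.05 cut-off are illustrative assumptions, not part of TextBlob.

```python
def polarity_label(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    The threshold is an assumed cut-off chosen for illustration only.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"

# Hypothetical scores covering the three cases.
scores = [-0.6, 0.0, 0.02, 0.3]
labels = [polarity_label(s) for s in scores]
print(labels)
# -> ['Negative', 'Neutral', 'Neutral', 'Positive']
```

In this notebook the helper could be applied with something like `df['Polarity'].apply(polarity_label)` to get a label per tweet.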

In [None]:
# Keep only tweets from accounts whose location mentions India

India=df[df.user_location.str.contains("india",case=False,regex=True)]
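One thing to keep in mind with a plain substring match is that "india" is also contained in "Indiana" and "Indianapolis". A possible refinement, shown here as a sketch on made-up location strings, is to require word boundaries around the term.

```python
import re

# \b marks a word boundary, so "india" has to stand alone as a word.
INDIA_PATTERN = re.compile(r"\bindia\b", flags=re.IGNORECASE)

# Hypothetical location strings, not taken from the dataset.
locations = ["Mumbai, India", "INDIA", "Indianapolis, Indiana", "New Delhi"]

matches = [loc for loc in locations if INDIA_PATTERN.search(loc)]
print(matches)
# -> ['Mumbai, India', 'INDIA']
```

The same pattern should work with pandas as `df.user_location.str.contains(r"\bindia\b", case=False, regex=True)`. Locations that name only a city, such as "New Delhi", are missed by both approaches.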

# Word Cloud

In [None]:
def wordcloud(df,Feature,title):
    # Join all rows into one string; str(df[Feature]) would only pass the truncated Series printout.
    text = ' '.join(df[Feature].astype(str))
    cloud = WordCloud(width = 800, height = 800, background_color = 'black', stopwords = STOPWORDS, max_words = 1000
                          , min_font_size = 20).generate(text)
    plt.figure(figsize = (8,8), facecolor = None)
    plt.imshow(cloud,interpolation='bilinear')
    plt.title(title,fontsize=18)
    plt.axis('off')
    plt.show()

In [None]:
wordcloud(India,'clean_text',"Word Cloud of Indian Twitter users")
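A word cloud sizes each word by how often it occurs once stopwords are removed. The sketch below shows that counting step with the standard library only; the three cleaned tweets and the tiny stopword set are made up for illustration (the `STOPWORDS` set shipped with `wordcloud` is much longer).

```python
from collections import Counter

# Tiny illustrative stopword set, assumed for this example only.
stopwords = {"the", "a", "of", "my", "is", "to", "and"}

# Hypothetical cleaned tweets, not taken from the dataset.
cleaned_tweets = [
    "got my first dose of the pfizer vaccine",
    "pfizer vaccine is safe and effective",
    "second dose of the vaccine today",
]

# Split every tweet into words and drop the stopwords.
words = [w for tweet in cleaned_tweets for w in tweet.split() if w not in stopwords]
frequencies = Counter(words)
print(frequencies["vaccine"], frequencies["pfizer"], frequencies["dose"])
# -> 3 2 2
```

Frequencies of this shape can also be handed to `WordCloud.generate_from_frequencies` if the counting needs to be controlled by hand.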

In [None]:
Indian_medical=India[India.clean_desc.str.contains('mbbs|md|medicine|surgeon|doctor|medical|researcher',case=False,regex=True)]

In [None]:
wordcloud(Indian_medical,'clean_text',"Word Cloud of Indian Medical Professionals' tweets")

In [None]:
Indian_media=India[India.clean_desc.str.contains('journalism|writer|researcher|journalist|blogger|media|news|channel|Entertainment',
                                                 case=False,regex=True)]

In [None]:
wordcloud(Indian_media,'clean_text',"Word Cloud of Indian Media Tweets")

# Sentiment Analysis

In [None]:
def SentimentAnalysis(data,Feature,title):
    return px.scatter(data,x=Feature,marginal_x='box',marginal_y='box',title=title)
    

In [None]:
SentimentAnalysis(India,'Polarity',"Measure of Polarity of Indian Tweets w.r.t the Vaccine")

In [None]:
SentimentAnalysis(Indian_medical,'Polarity',"Measure of Polarity of Indian Medical Professionals w.r.t the Vaccine")

In [None]:
SentimentAnalysis(Indian_media,'Polarity',"Measure of Polarity of Indian media w.r.t the Vaccine")

In [None]:
SentimentAnalysis(India,'Subjectivity',"Measure of Subjectivity of Indians w.r.t the Vaccine")

In [None]:
SentimentAnalysis(Indian_medical,'Subjectivity',"Measure of Subjectivity of Indian medical Professionals w.r.t the Vaccine")

In [None]:
SentimentAnalysis(Indian_media,'Subjectivity',"Measure of Subjectivity of Indian media w.r.t the Vaccine")

In [None]:
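# Top Indian Tweet Sources

# Name the columns explicitly so the plot does not depend on how pandas labels value_counts() output.
Indian_tweet_sources=India.source.value_counts().rename_axis('source').reset_index(name='tweets')
px.bar(Indian_tweet_sources,x='tweets',y='source',orientation='h')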

In [None]:
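# Top Hashtags used by Indians

Indian_tweet_hashtags=India[India.hashtags != "Not Available"]
# Name the columns explicitly so the plot does not depend on how pandas labels value_counts() output.
Indian_tweet_hashtags=Indian_tweet_hashtags.hashtags.value_counts().head(20).rename_axis('hashtags').reset_index(name='tweets')
px.bar(Indian_tweet_hashtags,x='tweets',y='hashtags',orientation='h')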
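Counting the raw `hashtags` values treats each tweet's whole set of hashtags as one category. If individual hashtags are of interest, the cells can be split first. The sketch below assumes each cell holds a stringified Python list such as `"['PfizerBioNTech', 'vaccine']"`; that format and the sample rows are assumptions, so check them against `df.hashtags.head()` before reusing this.

```python
import ast
from collections import Counter

# Hypothetical sample cells; the stringified-list format is an assumption.
raw_hashtags = [
    "['PfizerBioNTech', 'vaccine']",
    "['PfizerBioNTech']",
    "Not Available",
    "['Vaccine', 'COVID19']",
]

counts = Counter()
for cell in raw_hashtags:
    if cell == "Not Available":
        continue  # placeholder written by the earlier fillna step
    # literal_eval safely turns the string back into a Python list.
    for tag in ast.literal_eval(cell):
        counts[tag.lower()] += 1  # count case-insensitively

print(counts["pfizerbiontech"], counts["vaccine"], counts["covid19"])
# -> 2 2 1
```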

* **Let's analyse the top retweets, the most liked tweets (favorites) and the users with the most followers, split by verified and non-verified Twitter accounts.**

In [None]:
def UserAccAnalysis(Yaxis,category,title):
    Data=India[India.user_verified == category]
    fig=px.bar(Data,x='user_name',y=Yaxis,title=title)
    return fig.update_layout(xaxis={'categoryorder':'total descending'})

In [None]:
UserAccAnalysis('retweets',True,"Top Retweets of Indian Verified Twitter User Accounts")

In [None]:
UserAccAnalysis('user_followers',True,"Top followers of Indian Verified Twitter User Accounts")

In [None]:
UserAccAnalysis('favorites',True,"Top favourite tweets of Indian Verified Twitter User Accounts")

* **The tweet below is the most liked and most retweeted tweet from an Indian verified Twitter account.**

In [None]:
Top_liked_VerifiedAcc_tweet=India[India.favorites == 1786]['text']
Top_liked_VerifiedAcc_tweet

In [None]:
Top_retweeted_VerifiedAcc_tweet=India[India.retweets == 51]['text']
Top_retweeted_VerifiedAcc_tweet

In [None]:
UserAccAnalysis('retweets',False,"Top retweets of Indian Non-Verified Twitter User Accounts")

In [None]:
UserAccAnalysis('user_followers',False,"Top followers of Indian Non-Verified Twitter User Accounts")

In [None]:
UserAccAnalysis('favorites',False,"Top favorite tweets of Indian Non-Verified Twitter User Accounts")

* **The tweet below is the most retweeted tweet from an Indian non-verified account.**

In [None]:
Top_retweets_Non_VerifiedAcc_tweet=India[India.retweets == 112]['text']
Top_retweets_Non_VerifiedAcc_tweet

* **The tweet below is the most liked tweet from an Indian non-verified account.**

In [None]:
Top_liked_Non_VerifiedAcc_tweet=India[India.favorites == 170]['text']
Top_liked_Non_VerifiedAcc_tweet

# Measure of Polarity of entire Dataset

In [None]:
SentimentAnalysis(df,'Polarity',"Measure of Polarity of all people w.r.t the Vaccine")

* **Judging from the scatter plot above, the average polarity is around 0.3, which is slightly positive.**

# Measure of Subjectivity of entire Dataset

In [None]:
SentimentAnalysis(df,'Subjectivity',"Measure of Subjectivity of all people w.r.t the Vaccine")

In [None]:
Average_Subjectivity=df.Subjectivity.mean()
Average_Subjectivity

* **The average subjectivity across the entire dataset is around 0.3, which leans objective (factual information).**