# SENTIMENT ANALYSIS OF COVID-19 TWEETS

This notebook covers the following steps:

Text Preprocessing:

Tokenization

Entity Recognition

Score

The repository contains an analysis of 1500 tweets associated with the novel coronavirus COVID-19, which commenced on January 28, 2020.

**DATA COLLECTION**

I used Twitter's search API to gather historical tweets.

**DATA PREPROCESSING**
1. Conversion of tweet text to lowercase
2. Removal of punctuation
3. Removal of stopwords
4. Replacement of emoji with text
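
The four steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's implementation: `STOPWORDS` and `EMOJI_TO_TEXT` are hypothetical miniature lookup tables standing in for the NLTK stop-word list and the pickled emoji dictionary used later. The sketch replaces emoji before stripping punctuation, on the assumption that a `[^\w\s]` filter would otherwise delete them.

```python
import re

# Hypothetical miniature lookup tables, for illustration only.
STOPWORDS = {"i", "the", "a", "in", "is", "so", "of"}
EMOJI_TO_TEXT = {"😷": "face_with_medical_mask", "🥇": "first_place_medal"}

def preprocess(text: str) -> str:
    # Step 1: convert to lowercase.
    text = text.lower()
    # Step 4 (done early): replace emoji with text, padded with spaces so the
    # name does not fuse with neighbouring words.
    for emoji, name in EMOJI_TO_TEXT.items():
        text = text.replace(emoji, f" {name} ")
    # Step 2: remove punctuation, i.e. anything that is neither a word
    # character nor whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Step 3: remove stopwords and collapse repeated whitespace.
    return " ".join(word for word in text.split() if word not in STOPWORDS)

print(preprocess("I won 🥇 in the Race!"))
```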

**Named Entity Recognition**

The named entity recognition component of the spaCy library was used to find entities in the tweets.

**Polarity Scores**
The tweets are classified as positive, negative, or neutral based on their polarity scores.

**Interpretation of Compound Score:**
The compound score is the sum of all lexicon ratings, normalized to lie between -1 (most extreme negative) and +1 (most extreme positive).

positive sentiment: compound score >= 0.05

neutral sentiment: -0.05 < compound score < 0.05

negative sentiment: compound score <= -0.05
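
The thresholds above translate directly into a small helper. The function name `label_from_compound` is an assumption made for illustration; the notebook itself stores the raw score dictionaries and does not define such a function.

```python
def label_from_compound(compound: float) -> str:
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    # Everything strictly between -0.05 and 0.05 is treated as neutral.
    return "neutral"

for score in (0.6, 0.0, -0.3):
    print(score, label_from_compound(score))
```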




In [None]:
import pandas as pd
import json

In [None]:
import tensorflow as tf

# Get the GPU device name.
device_name = tf.test.gpu_device_name()

# The device name should look like the following:
if device_name == '/device:GPU:0':
    print('Found GPU at: {}'.format(device_name))
else:
    raise SystemError('GPU device not found')

Found GPU at: /device:GPU:0


In [None]:
import torch

# If there's a GPU available...
if torch.cuda.is_available():    

    # Tell PyTorch to use the GPU.    
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

There are 1 GPU(s) available.
We will use the GPU: Tesla T4


In [None]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import tweepy as tw

with open("/content/drive/MyDrive/twitter_keys.json",'r') as f:
  data = json.load(f)

In [None]:
auth = tw.OAuthHandler(data["consumer_key"],data["consumer_secret"])
api = tw.API(auth,wait_on_rate_limit=True)

In [None]:
search_words = ["#coronavirus","#covid19","#covid-19"]
date_since = "2020-04-30"

In [None]:
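# The search API expects a single query string, so join the hashtags with OR.
query = " OR ".join(search_words)
tweets = tw.Cursor(api.search,q=query,lang="en",since=date_since,count=1500).items()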

In [None]:
tweets

<tweepy.cursor.ItemIterator at 0x7f3f7f5ec210>

In [None]:
tweet_details = [[tweet.geo,tweet.text,tweet.user.screen_name,tweet.user.location] for tweet in tweets]




In [None]:
tweets_df = pd.DataFrame(data=tweet_details,columns=["geo","text","user","location"])

In [None]:
tweets_df.to_csv("/content/drive/MyDrive/covid_tweets.csv")  ##Saving tweets in a csv file

In [None]:
tweets_df.head()

Unnamed: 0,geo,text,user,location
0,,RT @sapiofoxxxxxxxy: How globalists planned th...,Mark79641317,"Mallow, Ireland"
1,,The Mastercard Foundation on Tuesday announced...,SABCNews,South Africa
2,,"24,000+ COVID-19 infections so far in June in ...",NewsfirstSL,"Colombo, Sri Lanka"
3,,Covid-19 travel update: South Korea seeks trav...,Al_Maldives,
4,,COVID-19 Home Isolation Care\nWe are here to H...,QuickwellR,


In [None]:
tweets_df["location"].value_counts()

                                 3689
Los Angeles, CA                  2573
India                             341
United Kingdom                    255
Karlsruhe, Germany                233
                                 ... 
patreon.com/wordglass               1
Aurangabad | Maharashtra            1
Bellevue, WA #Navy #Unity2020       1
San Marcos, TX                      1
Unknown                             1
Name: location, Length: 1692, dtype: int64

**DATA PREPROCESSING**


In [None]:

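##Removing retweet markers, mentions and newlines
import re
def clean_tweet_text(text):
  text = re.sub(r"RT @\w*:", "", text)  # drop the "RT @user:" retweet prefix
  text = re.sub(r"@\w*", "", text)      # drop any remaining @mentions
  text = re.sub(r"\n", "", text)        # drop newlines (note: this joins the words on either side)
  return text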

In [None]:
tweets_df["text"] = tweets_df["text"].apply(lambda x: clean_tweet_text(x))

In [None]:
##Converting into lowercase

tweets_df["text"] =  tweets_df['text'].str.lower()

In [None]:
##Removing punctuations

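# regex=True is required on recent pandas versions, where str.replace treats the pattern literally by default
tweets_df['text'] = tweets_df['text'].str.replace(r'[^\w\s]', '', regex=True)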

In [None]:
#Spelling correction (very slow on a large DataFrame; the run below was interrupted)
from textblob import TextBlob
tweets_df["text"] = tweets_df['text'].apply(lambda x: str(TextBlob(x).correct()))

KeyboardInterrupt: ignored

In [None]:
##Stop-word removal

import nltk
from nltk.corpus import stopwords 
nltk.download('stopwords')

# Build the English stop-word set once. Calling stopwords.words() for every
# word is very slow, and without an argument it returns every language.
STOPWORDS = set(stopwords.words('english'))

def remove_stopwords(text):
  return " ".join([word for word in str(text).split() if word not in STOPWORDS])

##Applying the removal of stopwords

tweets_df["text"] = tweets_df["text"].apply(remove_stopwords)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [None]:
##Replacing emoji with text [Emoji conversion into text]

import pickle
import re

with open("/content/drive/MyDrive/Emoji_Dict.p",'rb') as emoji_file:
  print(emoji_file)
  emoji_dict = pickle.load(emoji_file)
  print(emoji_dict)
emoji_dict = {v: k for k, v in emoji_dict.items()} 

def convert_emojis_to_word(text):
    # After the inversion above, emoji_dict maps each emoji to its ":name:" label.
    for emot in emoji_dict:
        # Plain string replacement avoids interpreting an emoji as regex syntax.
        text = text.replace(emot, emoji_dict[emot].replace(':', ""))
    return text

#convert_emojis_to_word("I won 🥇 in 🏏")

tweets_df["text"] = tweets_df["text"].apply(convert_emojis_to_word)


<_io.BufferedReader name='/content/drive/MyDrive/Emoji_Dict.p'>


In [None]:
tweets_df.head()

Unnamed: 0,geo,text,user,location
0,,globalists planned covid19 pandemic 10 years e...,Mark79641317,"Mallow, Ireland"
1,,mastercard foundation tuesday announced 13 bil...,SABCNews,South Africa
2,,24000 covid19 infections far june sri lanka no...,NewsfirstSL,"Colombo, Sri Lanka"
3,,covid19 travel update south korea seeks travel...,Al_Maldives,
4,,covid19 home isolation carewe help 18004194948...,QuickwellR,


In [None]:
tweets_df.to_csv("/content/drive/MyDrive/covid_tweets_clean1.csv")

In [None]:
import spacy
nlp = spacy.load("en_core_web_sm")



In [None]:

tweets_df["entities"] = tweets_df["text"].apply(lambda x: [(ent.text,ent.label_) for ent in nlp(x).ents])






In [None]:
import nltk.sentiment.vader  # provides SentimentIntensityAnalyzer
nltk.download("vader_lexicon")

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [None]:
sia = nltk.sentiment.vader.SentimentIntensityAnalyzer()

In [None]:
tweets_df["sentiment"] = tweets_df["text"].apply(lambda x: sia.polarity_scores(x))
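
Each entry of the `sentiment` column is a dictionary with the keys `neg`, `neu`, `pos` and `compound`. The sketch below shows one way such dictionaries could be split into separate columns using plain Python. The helper name `scores_to_columns` and the sample values are assumptions for illustration, not results from this dataset.

```python
from typing import Dict, List

def scores_to_columns(score_dicts: List[Dict[str, float]]) -> Dict[str, List[float]]:
    """Turn a list of polarity-score dictionaries into one list per score."""
    keys = ("neg", "neu", "pos", "compound")
    return {key: [scores[key] for scores in score_dicts] for key in keys}

# Made-up score dictionaries in the same shape as polarity_scores output.
sample = [
    {"neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.6},
    {"neg": 0.4, "neu": 0.6, "pos": 0.0, "compound": -0.5},
]
columns = scores_to_columns(sample)
print(columns["compound"])
```

In the notebook itself, the equivalent for a single score would be `tweets_df["compound"] = tweets_df["sentiment"].apply(lambda s: s["compound"])`.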

In [None]:
tweets_df.shape

(12409, 6)

In [None]:
tweets_df.to_csv("final_tweets_df_9062021.csv")