# Enriching COVID-19 Tweets with a Popular Social Media Sentiment Analysis Tool

Data Source: https://www.kaggle.com/gpreda/covid19-tweets

An open dataset collected from the Twitter API by querying covid19 hashtags.

In [1]:
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from datetime import datetime, timedelta

In [2]:
df = pd.read_csv("FlatData/covid19_tweets.csv")
df["date"] = pd.to_datetime(df["date"])

#### Vader Sentiment Analysis Introduction

"VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the [MIT License] (we sincerely appreciate all attributions and readily accept most contributions, but please don’t hold us liable)."

https://github.com/cjhutto/vaderSentiment


#### Scoring

positive sentiment: compound score >= 0.05

neutral sentiment: (compound score > -0.05) and (compound score < 0.05)

negative sentiment: compound score <= -0.05
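
The thresholds above can be written as a small helper. This is a minimal sketch: the function name `label_sentiment`, the `threshold` parameter, and the label strings are illustrative assumptions and are not part of the VADER package.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score to a coarse label using the cutoffs listed above."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Scores exactly on a cutoff fall into the positive/negative buckets
for score in (0.05, 0.0, -0.05):
    print(score, label_sentiment(score))
```

Keeping the cutoff as a parameter makes it easy to experiment with a wider neutral band without touching the scoring step.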

In [3]:
# Helper applied to each tweet's text

# Build the analyzer once; constructing it on every call reloads the lexicon
analyzer = SentimentIntensityAnalyzer()

def getSentimentScore(text):
    """
    Score a piece of text with VADER and return its compound score.
    For ease of understanding, this study sticks to the compound score only.
    """
    scores = analyzer.polarity_scores(text)
    return scores['compound']

In [4]:
# Optional: filter the data down to the last 30 days so the demo runs faster.
# The filter is disabled here, so the full dataset is scored.

# chunk_df = df[df['date'] >= datetime.now() - timedelta(days = 30)]
chunk_df = df.copy()
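
A window measured from `datetime.now()` depends on when the notebook is run, so for a static CSV it may be more reproducible to anchor the window to the latest timestamp in the data. The sketch below shows that idea on a toy frame; the frame `toy` and its dates are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the tweets frame (hypothetical values)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-01", "2020-08-15", "2020-08-30"]),
    "text": ["a", "b", "c"],
})

# Keep only rows within 30 days of the most recent row
cutoff = toy["date"].max() - pd.Timedelta(days=30)
recent = toy[toy["date"] >= cutoff].copy()
print(recent)
```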

In [6]:
# Applying the method from above to enrich the data

chunk_df['VaderScore'] = chunk_df['text'].apply(getSentimentScore)

In [None]:
# Cleaning up the data before the groupby
# .copy() avoids pandas' SettingWithCopyWarning when the date column is overwritten
outputdf = chunk_df[['date', 'VaderScore']].copy()
outputdf['date'] = outputdf['date'].dt.date

In [None]:
dailySentiment = outputdf.groupby("date").agg(['min','max','mean'])

In [None]:
# Cleaning up the Column names for easier CSV parsing in later studies

dailySentiment.columns = ['VaderMin', 'VaderMax', 'VaderMean']
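
As an alternative to renaming the columns afterwards, pandas named aggregation can produce flat column names in a single step. The sketch below uses a toy frame with made-up scores; the column name `VaderScore` is assumed here for illustration.

```python
import pandas as pd

# Toy scores for two days (hypothetical values)
toy = pd.DataFrame({
    "date": ["2020-08-01", "2020-08-01", "2020-08-02"],
    "VaderScore": [0.5, -0.1, 0.2],
})

# Each keyword names an output column: (source column, aggregation)
daily = toy.groupby("date").agg(
    VaderMin=("VaderScore", "min"),
    VaderMax=("VaderScore", "max"),
    VaderMean=("VaderScore", "mean"),
)
print(daily)
```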

## Exporting to CSV for joining

In [None]:
dailySentiment.to_csv("FlatData/(forJoins)covid19_tweets.csv")
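
A later study might read the exported file back and join it to another daily table on `date`. The following is only a sketch of that step: the in-memory CSV stands in for the exported file, and the second table with its `new_cases` column is entirely hypothetical.

```python
import io
import pandas as pd

# Stand-in for the exported CSV (made-up values)
exported = io.StringIO(
    "date,VaderMin,VaderMax,VaderMean\n"
    "2020-08-01,-0.5,0.8,0.1\n"
    "2020-08-02,-0.7,0.9,0.0\n"
)
sentiment = pd.read_csv(exported, parse_dates=["date"])

# Hypothetical second daily dataset to join against
other = pd.DataFrame({
    "date": pd.to_datetime(["2020-08-01", "2020-08-02", "2020-08-03"]),
    "new_cases": [100, 120, 90],
})

# Left join keeps every day in `other`; days without tweets get NaN scores
joined = other.merge(sentiment, on="date", how="left")
print(joined)
```

Parsing `date` on read keeps both join keys as datetimes, so the merge does not silently fail to match string dates against timestamps.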