## 1. Introduction

This notebook estimates the sentiment of Twitter conversation about Virat Kohli, based on a small sample of tweets.

We will use three things:

- Search phrase: **Kohli resigns**
- Scraped tweets
- The VADER package in Python to estimate sentiment

This is a basic approach that does not use the latest deep learning techniques to estimate sentiment in text. Let's get started...

## 2. Coding

### 2.1. Import libraries

In [2]:
# Run the pip install command below if you don't already have the library
#!pip install git+https://github.com/JustAnotherArchivist/snscrape.git

# Imports
import os
import pandas as pd

### 2.2. Scrape 10,000 tweets using the search phrase and the snscrape package


In [3]:
# Setting variables to be used in format string command below
tweet_count = 10000
text_query = "Kohli resigns"
since_date = "2022-01-14"
until_date = "2022-01-16"  # must be later than since_date

# Using OS library to call CLI commands in Python
os.system('snscrape --jsonl --max-results {} --since {} twitter-search "{} until:{}"> text-query-tweets.json'.format(tweet_count, since_date, text_query, until_date))

1
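The `1` printed above is the value returned by `os.system`, and a non-zero value usually means the command did not finish cleanly. As a rough sketch (not part of the original notebook), the same command can be assembled as an argument list with a check that the date window is not reversed. The helper name `build_snscrape_command` is hypothetical, and the dates assume the intended window is 14 to 16 January 2022; the flags are the ones already used in the command above.

```python
import shlex
from datetime import date


def build_snscrape_command(query, since, until, max_results):
    """Return the snscrape CLI call as an argument list (hypothetical helper)."""
    # Parse both dates so that a malformed or reversed range fails early.
    since_d = date.fromisoformat(since)
    until_d = date.fromisoformat(until)
    if since_d >= until_d:
        raise ValueError(f"since ({since}) must be earlier than until ({until})")
    return [
        "snscrape", "--jsonl",
        "--max-results", str(max_results),
        "--since", since,
        "twitter-search", f"{query} until:{until}",
    ]


# Assumed window: 14 to 16 January 2022
cmd = build_snscrape_command("Kohli resigns", "2022-01-14", "2022-01-16", 10000)
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
```

The list could then be passed to `subprocess.run` with `stdout` pointing at an open file and `check=True`, so that a failed scrape raises an error instead of silently returning a status code.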

In [4]:
# Read the json generated from the CLI command above and create a pandas dataframe
tweets_df = pd.read_json('text-query-tweets.json', lines=True)

### 2.3. Clean the data

This step takes the renderedContent column and strips out every word that contains an @ (usernames), so that only the tweet text remains.

#### 2.3.1. Define a function that cleans @ words from text

In [13]:
def tweet_cleaner(tweet):
    # Split on spaces and drop every word that contains an "@"
    words = tweet.split(" ")
    cleaned_words = [w for w in words if "@" not in w]

    return " ".join(cleaned_words)

#Example usage of this function:
#tweet_cleaner("""@CricRicky @BoriaMajumdar @imVkohli My hope is, now that he's no more the captain, I want someone to ask Rohit about the 'rift' he has with Virat. I want the stupid indian media to get an ass- whopping it got away from all these years. Kohli had been very formal with them while answering this. Rohit shouldn't.""")
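The function above drops any space-separated word that contains an @. A possible alternative, shown here only as a sketch, is a regular expression that removes just the mention itself. `strip_mentions` and `MENTION_RE` are hypothetical names that do not appear elsewhere in this notebook, and the sample strings are made up for illustration.

```python
import re

# An "@" that is not preceded by a word character (so e-mail addresses are
# left alone), followed by word characters and any trailing whitespace.
MENTION_RE = re.compile(r"(?<!\w)@\w+\s*")


def strip_mentions(tweet):
    """Remove @mentions from a tweet and trim surrounding whitespace."""
    return MENTION_RE.sub("", tweet).strip()


print(strip_mentions("@CricRicky @imVkohli Well played, captain."))
print(strip_mentions("Write to a@b.com for details"))
```

Unlike `tweet_cleaner`, this version keeps punctuation attached to a mention (for example, the `!` in `@BCCI!` survives), so the two approaches can give slightly different cleaned text.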

#### 2.3.2. Use this function and .apply() on the 'renderedContent' column to create a new cleaned column

In [14]:
tweets_df['cleaned_tweets'] = tweets_df['renderedContent'].apply(tweet_cleaner)

### 2.4. Create the sentiment scores in a new column

In [1]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\abhin\AppData\Roaming\nltk_data...


In [21]:
tweets_df['sentiment_scores'] = tweets_df['cleaned_tweets'].apply(lambda x: sid.polarity_scores(x)['compound'])
tweets_df[['cleaned_tweets', 'sentiment_scores']].head(5)

| | cleaned_tweets | sentiment_scores |
|---|---|---|
| 0 | India's most successful captain is Virat Kohli... | 0.8016 |
| 1 | His wokeism let him down, we all know how hypo... | -0.5859 |
| 2 | With the same batting avg. for last 2 years, &... | -0.8216 |
| 3 | Happy retirement bhai🙏🏻hoping to see a video A... | 0.7783 |
| 4 | Saale, Kalkatta ki rosgulla, Bonguly, GANDUNGL... | -0.6369 |
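The compound score ranges from -1 (most negative) to +1 (most positive). One common way to read it, sketched below, is to bucket scores into labels using the ±0.05 cut-offs suggested in the VADER documentation. The function `label_sentiment` is a hypothetical helper, not part of NLTK, and the threshold is an assumption that can be tuned.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to a coarse label (assumed cut-offs)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The five compound scores shown in the output above
scores = [0.8016, -0.5859, -0.8216, 0.7783, -0.6369]
labels = [label_sentiment(s) for s in scores]
print(labels)
# ['positive', 'negative', 'negative', 'positive', 'negative']
```

Applied to the dataframe, the same helper could be used as `tweets_df['sentiment_scores'].apply(label_sentiment)` to add a label column.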
