In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session


**EDA and sentiment analysis on EndSARS tweets.**<br>

**Context**<br>

End SARS is a decentralised social movement and series of mass protests against police brutality in Nigeria. The slogan calls for the disbanding of the Special Anti-Robbery Squad (SARS), a notorious unit of the Nigerian Police with a long record of abuses. The protest movement, which takes its name from the slogan, started in 2017 as a Twitter campaign using the hashtag #EndSARS to demand that the Nigerian government disband the unit. After a revitalisation in October 2020 following further revelations of abuses by the unit, mass demonstrations occurred throughout the major cities of Nigeria, accompanied by vociferous outrage on social media platforms. About 28 million tweets bearing the hashtag have accumulated on Twitter alone.<br>

Source: [Wikipedia](https://en.wikipedia.org/wiki/End_SARS)<br>
Inspiration: The EndSARS Movement<br>
 
To begin with, I built my own Kaggle dataset by scraping tweets, following this [snscrape notebook](https://www.kaggle.com/ulrich07/snscrape-exploration), with guidance from my mentor @ulrich G.<br>

The result is a dataset I uploaded to Kaggle named “Nigeria EndSARS tweets”, which contains 9737 random tweets with the hashtag #EndSARS. The tweets fall between 3rd December 2017 and 22nd April 2021.<br>

I would also like to acknowledge the following notebooks, which were quite useful in this work: [COVID 19 Sentiment Analysis](https://www.kaggle.com/kartikmohan1999/covid19-sentiment-analysis) and [Covid19 Tweets EDA and Sentiment Analysis](https://www.kaggle.com/purvasingh/covid19-tweets-eda-and-sentiment-analysis). Thanks also to my good friend Jawad Ahmad, who assisted me greatly.

This notebook is organized as follows:<br>

**1. Preprocessing on Nigeria EndSARS tweets**<br> 

    * Filtering needed columns
    * Assigning the content column to a text variable
    * Removing URLs from tweets
    * Converting tweets to lowercase
    * Removing punctuation, stopwords, brackets, emojis, links, words containing numbers, etc.

**2. EDA and Sentiment Analysis**<br>

    * Top 5 words and frequency
    * 50 most common words and frequency
    * Getting polarity scores of tweets
    * Labelling scores based on compound polarity value
    * Joining the Labels column to the tweets.
    * Plotting the Sentiment scores
    * Group tweets by date and Labels
    * Plotting the daily tweets sentiment analysis
    * Generating word clouds
    * Sentiment analysis and word cloud for the month of October which was the month of the EndSARS protest.
    * Sentiment Analysis and word cloud for 20th October 2020 which was the day EndSARS protesters were shot at the Lekki Toll gate in Lagos, Nigeria.
    * Plotting the general tweets sentiment distribution
    * Top 10 twitter accounts with positive, negative and neutral tweets
    * Top 10 tweets by each sentiment based on polarity scores in descending order to manually check the validity of the sentiment analysis.
    
    
     
















In [None]:
# Importing the important libraries that will be used throughout the notebook
import pandas as pd 
import numpy as np 
from IPython.display import display

import matplotlib.pyplot as plt 
import re
import string

import nltk
from nltk.tokenize import sent_tokenize
from nltk.corpus import words
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.sentiment.util import *
nltk.download('stopwords')
nltk.download('vader_lexicon')


from collections import Counter

from matplotlib import pyplot as plt
from matplotlib import ticker
import seaborn as sns
import plotly.express as px

sns.set(style="darkgrid")

In [None]:
# reading data from csv file
df = pd.read_csv("../input/nigeria-endsars-tweets/NigeriaEndSars data.csv")
df.head(5)

In [None]:
len(df)

In [None]:
# printing the data shape (number of rows and columns)
df.shape

### Let's filter the columns needed

In [None]:
# filtering needed columns
needed_columns=['username','date','content']
df=df[needed_columns]
df.head()

### Let's convert the username type to category so we can assign a unique numerical code to each username

In [None]:
# convert username type from object to category for assigning the numbers after it
df.username=df.username.astype('category')
df.username=df.username.cat.codes # assign a unique numerical code to each category
df.date=pd.to_datetime(df.date).dt.date # it will give only the date

In [None]:
# printing first 5 rows
df.head(5)

### Let's assign the content column to 'texts' variable

In [None]:
# assigning content column to 'texts' variable
texts=df.content
texts

### Removing URLs from tweets

In [None]:
remove_url=lambda x:re.sub(r'http\S+','',str(x))
texts_lr=texts.apply(remove_url)
texts_lr

### Converting all tweets to lowercase

In [None]:
to_lower=lambda x: x.lower()
texts_lr_lc=texts_lr.apply(to_lower)
texts_lr_lc

### Removing punctuations

In [None]:
remove_puncs= lambda x:x.translate(str.maketrans('','',string.punctuation))
texts_lr_lc_np=texts_lr_lc.apply(remove_puncs)
texts_lr_lc_np
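
The three steps so far (URL removal, lowercasing, punctuation removal) can also be written as one function. This is only a sketch: the name `basic_clean` is made up for illustration, and the final whitespace-collapsing step is an addition that the cells above do not perform. Note that `string.punctuation` covers ASCII punctuation only, so curly quotes and similar characters would survive.

```python
import re
import string

_URL_RE = re.compile(r"http\S+")
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def basic_clean(tweet):
    """Strip URLs, lowercase, drop ASCII punctuation, then collapse whitespace."""
    no_url = _URL_RE.sub("", str(tweet))
    lowered = no_url.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    # Extra step (assumption, not in the cells above): tidy the gaps left by removed URLs
    return " ".join(no_punct.split())

print(basic_clean("Check https://t.co/abc #EndSARS NOW!"))
```

It could be applied with `texts.apply(basic_clean)` in place of the three separate `apply` calls.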

### Removing stopwords

In [None]:
more_words=["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]
stop_words=set(stopwords.words('english')) #nltk package
stop_words.update(more_words)

# str.join is a built-in string method, so no extra package is needed
remove_words=lambda x: ' '.join([word for word in x.split() if word not in stop_words])
texts_lr_lc_np_ns=texts_lr_lc_np.apply(remove_words)
texts_lr_lc_np_ns
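
One thing to keep in mind: the stop-word list above contains "no", "nor" and "not", so negations disappear from the cleaned text. That may matter for sentiment scoring later. The sketch below uses a tiny made-up stop-word set and a hypothetical `keep` argument to show how negations could be preserved; it is not what the notebook does.

```python
# Small illustrative stop-word set; the notebook uses NLTK's full English list.
STOP_WORDS = {"i", "am", "the", "is", "this", "not", "no", "nor", "with"}
# Hypothetical whitelist of negation words to keep for sentiment scoring
NEGATIONS = {"not", "no", "nor"}

def remove_stopwords(text, stop_words, keep=frozenset()):
    """Drop stop words from whitespace-separated text, except those in `keep`."""
    return " ".join(w for w in text.split() if w not in stop_words or w in keep)

tweet = "i am not happy with the police"
print(remove_stopwords(tweet, STOP_WORDS))                  # negation is lost
print(remove_stopwords(tweet, STOP_WORDS, keep=NEGATIONS))  # negation is kept
```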

#### Let's create a big list of words out of all the tweets

In [None]:
words_list=[word for line in texts_lr_lc_np_ns for word in line.split()]
words_list[:5]

### Let's visualise the 50 most common words

In [None]:
# creating dataframe and bar graph of most common 50 words with their frequency
word_counts=Counter(words_list).most_common(50)
word_df=pd.DataFrame(word_counts)
word_df.columns=['word','frq']
display(word_df.head(5))
# px=import plotly.express
#px.bar(word_df,x='word',y='frq',title='Most common words')

fig = plt.figure(figsize = (15, 7))
 
# creating the bar plot
plt.bar(word_df['word'],word_df['frq'])
plt.xticks(rotation=90)
plt.xlabel('word')
plt.ylabel('frq')
plt.title('Most common words')
plt.show()
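
Because every tweet was collected through the #EndSARS hashtag, that token will probably dominate the chart. A small sketch, on made-up tweets and with a hypothetical `exclude` argument, of how the search term could be left out before counting:

```python
from collections import Counter

def top_words(cleaned_tweets, n, exclude=frozenset()):
    """Return the n most frequent words, skipping any word in `exclude`."""
    tokens = (w for line in cleaned_tweets for w in line.split() if w not in exclude)
    return Counter(tokens).most_common(n)

# Made-up cleaned tweets, for illustration only
sample = ["endsars protest lagos", "endsars police", "protest endsars"]
print(top_words(sample, 2))
print(top_words(sample, 1, exclude={"endsars"}))
```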

#### Putting the Cleaned text in main dataframe

In [None]:
display(df.head(5))
df['text']=texts_lr_lc_np_ns  # bracket assignment creates a real column; attribute assignment would not
display(df.head(5))

### Let's do some additional data cleaning


In [None]:
def clean_text(text):
    '''Remove text in square brackets, links, HTML tags, punctuation,
    newlines and words containing numbers.'''
    # raw strings avoid invalid-escape warnings in the regex patterns
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    return text
df['content'] = df['content'].apply(clean_text)
display(df)

### Let's remove emoticons, symbols or flags by their codes

In [None]:
# function to remove emoticons, symbols or flags by their code-point ranges
def remove_emoji(text):
    emoji_pattern = re.compile("["
                           u"\U0001F600-\U0001F64F"  # emoticons
                           u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                           u"\U0001F680-\U0001F6FF"  # transport & map symbols
                           u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           u"\U00002702-\U000027B0"  # dingbats
                           u"\U000024C2-\U0001F251"  # enclosed characters; this wide range also covers CJK and other non-Latin text
                           "]+", flags=re.UNICODE)
    return emoji_pattern.sub(r'', text)

In [None]:
# applying remove_emoji function on tweets
df['content']=df['content'].apply(lambda x: remove_emoji(x))
display(df)
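
The last range in the pattern, `\U000024C2-\U0001F251`, is very wide, so it removes far more than emoji. The sketch below compares it with a narrower pattern that simply drops that range. The names `BROAD` and `NARROW` are made up for illustration, and the narrower pattern is only one possible choice that will miss some symbols.

```python
import re

# Same ranges as remove_emoji above, including the very wide final range
BROAD = re.compile(
    "["
    "\U0001F600-\U0001F64F"
    "\U0001F300-\U0001F5FF"
    "\U0001F680-\U0001F6FF"
    "\U0001F1E0-\U0001F1FF"
    "\U00002702-\U000027B0"
    "\U000024C2-\U0001F251"
    "]+"
)

# Hypothetical narrower variant: the wide final range is left out
NARROW = re.compile(
    "["
    "\U0001F600-\U0001F64F"
    "\U0001F300-\U0001F5FF"
    "\U0001F680-\U0001F6FF"
    "\U0001F1E0-\U0001F1FF"
    "\U00002702-\U000027B0"
    "]+"
)

# "solidarity", two Chinese characters, then a grinning-face emoji
text = "solidarity \u4e2d\u6587 \U0001F600"
print(repr(BROAD.sub("", text)))   # Chinese characters are stripped too
print(repr(NARROW.sub("", text)))  # only the emoji is stripped
```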

## Sentiment Analysis


In [None]:
# getting polarity scores of tweets and storing them in variable 'sentiment_scores'
sid=SentimentIntensityAnalyzer()
ps=lambda x:sid.polarity_scores(x)
sentiment_scores=df.text.apply(ps)
sentiment_scores

In [None]:
# create the data frame of negative, neutral, positive and compound polarity scores
sentiment_df=pd.DataFrame(data=list(sentiment_scores))
display(sentiment_df)

### Labeling the scores based on the compound polarity value

In [None]:
# label a tweet 'neutral' if its compound polarity is 0, 'positive' if it is greater than 0 and 'negative' if it is less than 0
labelize=lambda x:'neutral' if x==0 else('positive' if x>0 else 'negative')
sentiment_df['label']=sentiment_df.compound.apply(labelize)
display(sentiment_df.head(10))
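
Labelling only an exact 0 as neutral means that even a very weak score counts as positive or negative. VADER's documentation commonly suggests a cut-off of ±0.05 on the compound score instead. Below is a sketch of that alternative rule; `label_compound` and its `threshold` parameter are illustrative names, and the notebook itself keeps the rule above.

```python
def label_compound(score, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label using a dead zone around 0."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

for s in (0.6, 0.03, 0.0, -0.02, -0.4):
    print(s, label_compound(s))
```

With this rule, it could be applied as `sentiment_df.compound.apply(label_compound)`, and weakly scored tweets such as 0.03 would fall into the neutral class.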

### Let's join the two dataframes

In [None]:
display(df.head(5))
data=df.join(sentiment_df.label)
sentiment_df = df.join(sentiment_df)
display(data.head(5))

### Plotting the sentiment score counts

In [None]:
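# name the columns explicitly so the result does not depend on the pandas version
counts_df=data.label.value_counts().rename_axis('label').reset_index(name='counts')
display(counts_df)

In [None]:
plt.figure(figsize=(8,5)) 
sns.barplot(x='label',y='counts',data=counts_df)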

### Group the number of tweets by date and label (positive, neutral, negative)

In [None]:
data_agg=data[['username','date','label']]
display(data_agg.head(5))

In [None]:
data_agg=data_agg.groupby(['date','label'])
display(data_agg.head(5))

In [None]:
data_agg=data_agg.count()
display(data_agg.head(5))

In [None]:
data_agg=data_agg.reset_index()
display(data_agg.head(5))

### The 'username' column now holds the number of tweets per date and label, so let's rename it

In [None]:
data_agg.columns=['date','label','counts']
display(data_agg.head())
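
The group, count, reset and rename steps above can also be done in one line. The sketch below uses made-up rows and shows one possible shortcut, plus a wide table in which days without tweets of a given label get a 0 instead of being absent.

```python
import pandas as pd

# Made-up rows, for illustration only
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-10-19", "2020-10-20", "2020-10-20", "2020-10-20"]),
    "label": ["neutral", "negative", "negative", "positive"],
})

# One row per (date, label) pair, with the count already in a 'counts' column
daily_counts = toy.groupby(["date", "label"]).size().reset_index(name="counts")
print(daily_counts)

# One row per date and one column per label; missing combinations become 0
wide = toy.groupby(["date", "label"]).size().unstack(fill_value=0)
print(wide)
```

The wide form can be handy for line plots, because each label then has a value for every date.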

In [None]:
neg = data_agg[data_agg['label']=='negative']
pos = data_agg[data_agg['label']=='positive']
neu = data_agg[data_agg['label']=='neutral']

In [None]:
# px.line(data_agg,x='date',y='counts',color='label',title='Daily Tweet Sentimental Analysis')
fig = plt.figure(figsize = (15, 7))
plt.plot(pos['date'],pos['counts'], label='positive')
plt.plot(neg['date'],neg['counts'], label='negative')
plt.plot(neu['date'],neu['counts'], label='neutral')
 
# Add labels and title
plt.title("Daily Tweet Sentiment Analysis")
plt.xlabel("date")
plt.ylabel("count")
plt.legend()
plt.show()


### Generating a word cloud for the whole period

In [None]:
from wordcloud import WordCloud

In [None]:
cut_text = " ".join(df.text)
max_words=100
word_cloud = WordCloud(
                    background_color='white',
                    stopwords=set(stop_words),
                    max_words=max_words,
                    max_font_size=30,
                    scale=5,
    colormap='magma',
                    random_state=1).generate(cut_text)
fig = plt.figure(1, figsize=(50,50))
plt.axis('off')
plt.title('Word Cloud for Top '+str(max_words)+' words with # EndSARS on Twitter\n', fontsize=100,color='blue')
fig.subplots_adjust(top=2.3)
plt.imshow(word_cloud)
plt.show()

## Let's zoom into October 2020, and 20th October 2020 in particular, and re-run the same analysis for that month and day. This is when the protests and shootings happened.

#### Let's create a big list of words out of all the tweets

In [None]:
df['date'] = pd.to_datetime(df['date'])
df['cleaned'] = texts_lr_lc_np_ns

In [None]:
df_oct = df[(df['date']>='2020-10-01') & (df['date']<='2020-10-31')].reset_index(drop=True)
df_oct_20 = df[df['date']==pd.Timestamp('2020-10-20')].reset_index(drop=True)  # only 20th October 2020, not every later day

texts_lr_lc_np_ns_oct = df_oct['cleaned']
texts_lr_lc_np_ns_oct_20 = df_oct_20['cleaned']
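
A small sketch, on made-up rows, of how the comparison operator changes which tweets a date filter keeps: a bounded range selects the whole month, `==` selects a single calendar day, and `>=` selects that day and everything after it.

```python
import pandas as pd

# Made-up rows, for illustration only
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-09-30", "2020-10-01", "2020-10-20",
                            "2020-10-20", "2020-10-31", "2020-11-02"]),
    "cleaned": ["a", "b", "c", "d", "e", "f"],
})

start, end = pd.Timestamp("2020-10-01"), pd.Timestamp("2020-10-31")
day = pd.Timestamp("2020-10-20")

in_october = toy[toy["date"].between(start, end)]  # both ends inclusive
on_oct_20 = toy[toy["date"] == day]                # that day only
from_oct_20 = toy[toy["date"] >= day]              # that day and all later dates

print(len(in_october), len(on_oct_20), len(from_oct_20))
```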

In [None]:
words_list_oct=[word for line in texts_lr_lc_np_ns_oct for word in line.split()]
print('Oct month',words_list_oct[:5])

words_list_oct_20=[word for line in texts_lr_lc_np_ns_oct_20 for word in line.split()]
print('20th Oct',words_list_oct_20[:5])

In [None]:
# creating dataframe and bar graph of most common 50 words with their frequency
word_counts=Counter(words_list_oct).most_common(50)
word_df=pd.DataFrame(word_counts)
word_df.columns=['word','frq']
display(word_df.head(5))
# px=import plotly.express
#display(px.bar(word_df,x='word',y='frq',title='Most common words for Oct month'))
fig = plt.figure(figsize = (15, 7))
 
# creating the bar plot
plt.bar(word_df['word'],word_df['frq'])
plt.xticks(rotation=90)
plt.xlabel('word')
plt.ylabel('frq')
plt.title('Most common words for Oct month')
plt.show()

word_counts=Counter(words_list_oct_20).most_common(50)
word_df=pd.DataFrame(word_counts)
word_df.columns=['word','frq']
display(word_df.head(5))
# px=import plotly.express
# display(px.bar(word_df,x='word',y='frq',title='Most common words for 20 Oct'))
fig = plt.figure(figsize = (15, 7))
 
# creating the bar plot
plt.bar(word_df['word'],word_df['frq'])
plt.xticks(rotation=90)
plt.xlabel('word')
plt.ylabel('frq')
plt.title('Most common words for 20 Oct')
plt.show()

#### Put the cleaned text in the October dataframes

In [None]:
df_oct['content'] = df_oct['cleaned']
df_oct_20['content'] = df_oct_20['cleaned']

In [None]:
df_oct.drop('cleaned',axis=1,inplace=True)
df_oct_20.drop('cleaned',axis=1,inplace=True)

### Some additional cleaning

In [None]:
def clean_text(text):
    '''Remove text in square brackets, links, HTML tags, punctuation,
    newlines and words containing numbers.'''
    # raw strings avoid invalid-escape warnings in the regex patterns
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    return text
df_oct['content'] = df_oct['content'].apply(clean_text)
df_oct_20['content'] = df_oct_20['content'].apply(clean_text)
display(df_oct)
display(df_oct_20)

In [None]:
# function to remove emoticons, symbols or flags by their codes
def remove_emoji(text):
    emoji_pattern = re.compile("["
                           u"\U0001F600-\U0001F64F"  # emoticons
                           u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                           u"\U0001F680-\U0001F6FF"  # transport & map symbols
                           u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           u"\U00002702-\U000027B0"
                           u"\U000024C2-\U0001F251"
                           "]+", flags=re.UNICODE)
    return emoji_pattern.sub(r'', text)

In [None]:
# applying remove_emoji function on tweets
df_oct['content']=df_oct['content'].apply(lambda x: remove_emoji(x))
df_oct_20['content']=df_oct_20['content'].apply(lambda x: remove_emoji(x))
display(df_oct)
display(df_oct_20)

## Sentiment Analysis for October 2020 and 20th October 2020

In [None]:
# getting polarity scores of tweets and storing them in variable 'sentiment_scores'
sid=SentimentIntensityAnalyzer()
ps=lambda x:sid.polarity_scores(x)
sentiment_scores_oct=df_oct.content.apply(ps)
sentiment_scores_oct_20=df_oct_20.content.apply(ps)
display(sentiment_scores_oct)
display(sentiment_scores_oct_20)

In [None]:
# create the data frame of negative, neutral, positive and compound polarity scores
sentiment_df_oct=pd.DataFrame(data=list(sentiment_scores_oct))
sentiment_df_oct_20=pd.DataFrame(data=list(sentiment_scores_oct_20))
display(sentiment_df_oct)
display(sentiment_df_oct_20)

### Labeling the scores based on the compound polarity value

In [None]:
# label a tweet 'neutral' if its compound polarity is 0, 'positive' if it is greater than 0 and 'negative' if it is less than 0
labelize=lambda x:'neutral' if x==0 else('positive' if x>0 else 'negative')
sentiment_df_oct['label']=sentiment_df_oct.compound.apply(labelize)
sentiment_df_oct_20['label']=sentiment_df_oct_20.compound.apply(labelize)
display(sentiment_df_oct.head(10))
display(sentiment_df_oct_20.head(10))

### Let's join the two dataframes

In [None]:
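display(df_oct.head(5))
data_oct=df_oct.join(sentiment_df_oct.label)  # join on the October subset, whose index lines up with sentiment_df_oct
display(data_oct.head(5))
display(df_oct_20.head(5))
data_oct_20=df_oct_20.join(sentiment_df_oct_20.label)  # same for the 20th October subset
display(data_oct_20.head(5))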

### Plotting the sentiment score counts

In [None]:
# name the columns explicitly so the result does not depend on the pandas version
counts_df_oct=data_oct.label.value_counts().rename_axis('label').reset_index(name='counts')
counts_df_oct_20=data_oct_20.label.value_counts().rename_axis('label').reset_index(name='counts')
display(counts_df_oct)
display(counts_df_oct_20)

In [None]:
plt.figure(figsize=(8,5)) 
sns.barplot(x='label',y='counts',data=counts_df_oct)

plt.figure(figsize=(8,5)) 
sns.barplot(x='label',y='counts',data=counts_df_oct_20)

### Group the number of tweets by date and label (positive, neutral, negative)

In [None]:
data_agg_oct=data_oct[['username','date','label']]
data_agg_oct_20=data_oct_20[['username','date','label']]
display(data_agg_oct.head(5))
display(data_agg_oct_20.head(5))

In [None]:
data_agg_oct=data_agg_oct.groupby(['date','label'])
data_agg_oct_20=data_agg_oct_20.groupby(['date','label'])
display(data_agg_oct.head(5))
display(data_agg_oct_20.head(5))

In [None]:
data_agg_oct=data_agg_oct.count()
data_agg_oct_20=data_agg_oct_20.count()
display(data_agg_oct.head(5))
display(data_agg_oct_20.head(5))

In [None]:
data_agg_oct=data_agg_oct.reset_index()
data_agg_oct_20=data_agg_oct_20.reset_index()
display(data_agg_oct.head(5))
display(data_agg_oct_20.head(5))

### The 'username' column now holds the number of tweets per date and label, so let's rename it

In [None]:
data_agg_oct.columns=['date','label','counts']
data_agg_oct_20.columns=['date','label','counts']
display(data_agg_oct.head())
display(data_agg_oct_20.head())

In [None]:
neg = data_agg_oct[data_agg_oct['label']=='negative']
pos = data_agg_oct[data_agg_oct['label']=='positive']
neu = data_agg_oct[data_agg_oct['label']=='neutral']

# display(px.line(data_agg_oct,x='date',y='counts',color='label',title='Tweet Sentimental Analysis Oct'))
fig = plt.figure(figsize = (15, 7))
plt.plot(pos['date'],pos['counts'], label='positive')
plt.plot(neg['date'],neg['counts'], label='negative')
plt.plot(neu['date'],neu['counts'], label='neutral')
 
# Add labels and title
plt.title("Tweet Sentiment Analysis Oct")
plt.xlabel("date")
plt.ylabel("counts")
plt.legend()
plt.show()

neg = data_agg_oct_20[data_agg_oct_20['label']=='negative']
pos = data_agg_oct_20[data_agg_oct_20['label']=='positive']
neu = data_agg_oct_20[data_agg_oct_20['label']=='neutral']

# display(px.line(data_agg_oct_20,x='date',y='counts',color='label',title='Daily Tweet Sentimental Analysis 20th Oct'))
fig = plt.figure(figsize = (15, 7))
plt.plot(pos['date'],pos['counts'], label='positive')
plt.plot(neg['date'],neg['counts'], label='negative')
plt.plot(neu['date'],neu['counts'], label='neutral')
 
# Add labels and title
plt.title("Daily Tweet Sentiment Analysis 20th Oct")
plt.xlabel("date")
plt.ylabel("counts")
plt.legend()
plt.show()


### Generating wordcloud for the specific period

In [None]:
from wordcloud import WordCloud

In [None]:
cut_text = " ".join(df_oct['content'])  # use the October subset rather than the full dataset
max_words=100
word_cloud = WordCloud(
                    background_color='white',
                    stopwords=set(stop_words),
                    max_words=max_words,
                    max_font_size=30,
                    scale=5,
                    colormap='magma',
                    random_state=1).generate(cut_text)
fig = plt.figure(1, figsize=(50,50))
plt.axis('off')
plt.title('Word Cloud for Top '+str(max_words)+' words with # EndSARS on Twitter in October 2020\n', fontsize=100,color='blue')
fig.subplots_adjust(top=2.3)
plt.imshow(word_cloud)
plt.show()

In [None]:
date_df = df[['date']].copy()  # copy so that adding a column does not trigger SettingWithCopyWarning
date_df['count'] = 1

In [None]:
df[(df['date']>='2020-10-01') & (df['date']<='2020-10-31')]

In [None]:
df1 = df.groupby(df['date'].dt.to_period('M'))['content'].count()
df1 = df1.resample('M').asfreq().fillna(0)
df1.plot(kind='bar',figsize=(20,10))

In [None]:
daily_tweets = df.groupby(['date'])['content'].count()

fig = plt.figure(figsize = (15,5))
plt.plot(daily_tweets.index,daily_tweets.values)
plt.title('Daily Tweets\' Trend', fontsize=16)
plt.xlabel('Dates')
plt.ylabel('# of Tweets')
plt.show()

In [None]:
sentiment_dist = data.label.value_counts()

plt.pie(sentiment_dist, labels=sentiment_dist.index, explode= (0.1,0,0),
        colors=['yellowgreen', 'gold', 'lightcoral'],
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.title("Tweets\' Sentiment Distribution \n", fontsize=16, color='Black')
plt.axis('equal')
plt.tight_layout()
plt.show()

In [None]:
sentiment_df['username'] = sentiment_df['username'].astype(str)
# Function to get the top 10 accounts (by number of tweets) for a given sentiment
def top10AccountsBySentiment(sentiment):
    subset = sentiment_df.query("label==@sentiment")  # local name avoids shadowing the global df
    top10 = subset.groupby(by=["username"])['label'].count().sort_values(ascending=False)[:10]
    return top10

### Let's look at the top 10 accounts for each sentiment


In [None]:
# Top 10 accounts for each sentiment
top10_pos = top10AccountsBySentiment("positive")
top10_neg = top10AccountsBySentiment("negative")
top10_neu = top10AccountsBySentiment("neutral")

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, squeeze=True, figsize=(16,8))
fig.suptitle('Top 10 Twitter Accounts \n', fontsize=20)

ax1.barh(top10_pos.index, top10_pos.values, color='yellowgreen')
ax1.set_title("\n\n Positive Tweets", fontsize=16)

ax2.barh(top10_neg.index, top10_neg.values, color='lightcoral')
ax2.set_title("\n\n Negative Tweets", fontsize=16)

ax3.barh(top10_neu.index, top10_neu.values, color='gold')
ax3.set_title("\n\n Neutral Tweets", fontsize=16);

fig.tight_layout()
fig.show()

### Let's print and go through the top 10 tweets for each sentiment, ranked by polarity score

In [None]:
pd.set_option('display.max_colwidth', None)
print('Top 10 positive tweets')
display(sentiment_df[sentiment_df['label']=='positive'].sort_values('compound',ascending=False)[0:10])
print('Top 10 negative tweets')
display(sentiment_df[sentiment_df['label']=='negative'].sort_values('compound')[0:10])
print('Top 10 neutral tweets')
display(sentiment_df[sentiment_df['label']=='neutral'].sort_values('compound')[0:10])

**Conclusion**<br> 

The EndSARS movement peaked in October 2020. The analysis shows that tweet frequency declined shortly afterwards; however, it is still higher now than it was before the protests and killings.<br> 

The most common words for October and for 20th October 2020 include "protesters" and "killing". The sentiment analysis for 20th October had the highest number of negative labels.<br> 

In 2018 and 2019 there were sudden increases in the use of the #EndSARS hashtag, possibly triggered by viral cases of police brutality around those times. The movement was also gaining momentum during these periods, which eventually led to the nationwide protests in October.<br> 

In conclusion, as this was a beginner's attempt, it will be interesting to explore further and see what more can be done with this dataset, especially regarding the performance and accuracy of the sentiment analysis, which still seems to be a challenge here. Although the top ten tweets for each sentiment seem quite accurate, the work can be improved, especially with respect to the labelling of neutral sentiment.<br> 

The performance of a text classification model depends heavily on the type of words used and the type of features created for classification. It may be useful to work towards re-training a pre-existing sentiment algorithm and customising it to my specific case, which could improve the results of the analysis in this notebook.
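
One way to put a number on the accuracy question is to hand-label a small random sample of tweets and compare those labels with the predicted ones. The sketch below assumes such hand labels exist; the function name `evaluate` and the example labels are made up for illustration and are not results from this dataset.

```python
from collections import Counter

def evaluate(true_labels, predicted_labels):
    """Return overall accuracy and a Counter of (true, predicted) label pairs."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = Counter(zip(true_labels, predicted_labels))
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    return correct / len(true_labels), pairs

# Made-up labels: 'hand' is the manual annotation, 'pred' is the model output
hand = ["negative", "negative", "neutral", "positive", "negative"]
pred = ["negative", "neutral", "neutral", "positive", "positive"]

accuracy, confusion = evaluate(hand, pred)
print(accuracy)
for (t, p), n in sorted(confusion.items()):
    print(f"true={t:8s} predicted={p:8s} count={n}")
```

The pair counts show which classes get confused with each other, which would help check whether the neutral label is the main source of error.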
