# Black Panther 18: An Analysis of Twitter's Perception

# 1. Introduction

In this Data analytics Project, I downloaded Black Panther 18 dataset from kaggle and then performed a sentiment analysis on the data using the VaderSentiment library in Python.

# Contents
1. Introduction
2. Data source
3. Data Preprocessing
4. Sentiment Analysis
5. Conclusion

# 2. Data Source

 I downloaded black panther 18 dataset from kaggle

In [2]:
#importing liberies
import pandas as pd
import numpy as np
import regex as re
import cleantext
import string
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from cleantext import clean
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import warnings 
from collections import Counter


Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.


In [3]:
#importing data frame
DataFrame=pd.read_csv("Black Panther.csv", encoding='latin1')

In [4]:
#selecting tweets in english only
df1=DataFrame[DataFrame['Language']=='en']

In [5]:
#droping duplicates and null values
df1.duplicated(subset='User_name').sum()

19892

In [6]:
df=df1.drop_duplicates()

In [7]:
#Inspect Dataframe
df.shape

(49298, 5)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 49298 entries, 0 to 57116
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Tweets     49298 non-null  object
 1   User_name  49298 non-null  object
 2   Language   49298 non-null  object
 3   Location   34997 non-null  object
 4   Time       49298 non-null  object
dtypes: object(5)
memory usage: 2.3+ MB


# 3. Data Preprocessing

In [9]:
#filling null values in the location column
df['Location']=df['Location'].fillna('No location')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Location']=df['Location'].fillna('No location')


In [10]:
# Define function to extract hashtags and remove # with REGEX
def getHashtags(tweet):
    tweet = tweet.lower()  
    tweet = re.findall(r'\#\w+',tweet) 
    return " ".join(tweet)

df['Hashtags'] = df['Tweets'].apply(getHashtags)
df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Hashtags'] = df['Tweets'].apply(getHashtags)


Unnamed: 0,Tweets,User_name,Language,Location,Time,Hashtags
0,RT @CoachWilmore: #120: William OÕNeal and the...,SusieNattibree,en,No location,Sun Mar 04 10:28:35 +0000 2018,#120
1,RT @soprettyinlou: I hope my girl Shuri can br...,zinedine_7x,en,P,Sun Mar 04 10:28:35 +0000 2018,
2,"RT @PollsNig: Ok guys get in here, who do you ...",edxxtrock,en,"Toulouse, France",Sun Mar 04 10:28:35 +0000 2018,
3,the thing is... black panther was so so good b...,zekejaegers,en,No location,Sun Mar 04 10:28:36 +0000 2018,
4,RT @HillaryClinton: Saw Black Panther with Bil...,quirion77,en,"Loire-Atlantique, Pays de la Loire",Sun Mar 04 10:28:36 +0000 2018,


In [11]:
hashtags_list = df['Hashtags'].tolist()

# Iterate over all hashtags and split where there is more than one hashtag
hashtags = []
for item in hashtags_list:
    item = item.split()
    for i in item:
        hashtags.append(i)

# Determine Unique count of all hashtags used
counts = Counter(hashtags)
hashtags_df = pd.DataFrame.from_dict(counts, orient='index').reset_index()
hashtags_df.columns = ['Hashtags', 'Count']
hashtags_df.sort_values(by='Count', ascending=False, inplace=True)

In [12]:
hashtags_df.head(10)

Unnamed: 0,Hashtags,Count
19,#blackpanther,1462
4,#wakanda,492
34,#wakandaforever,290
41,#triggeraliberalin4words,204
10,#fortnite,174
11,#iheartawards,147
21,#sundaytoday,139
13,#exol,119
50,#triggeraconservativein2words,118
12,#bestfanarmy,116


In [13]:
casts=["black panther","eric killmonger","okoye","nakia","shuri","m'baku","ramonda","zuri"]

In [14]:
# Define function to extract the casts from each Tweet
def get_cast(tweet):
    tweet = tweet.lower() 
    tweet_tokens = nltk.word_tokenize(tweet)
    cast = [char for char in tweet_tokens if char in casts] 
    return " ".join(cast)

In [15]:
# Extract casts to a new column
df['cast'] = df['Tweets'].apply(get_cast)
df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['cast'] = df['Tweets'].apply(get_cast)


Unnamed: 0,Tweets,User_name,Language,Location,Time,Hashtags,cast
0,RT @CoachWilmore: #120: William OÕNeal and the...,SusieNattibree,en,No location,Sun Mar 04 10:28:35 +0000 2018,#120,
1,RT @soprettyinlou: I hope my girl Shuri can br...,zinedine_7x,en,P,Sun Mar 04 10:28:35 +0000 2018,,shuri
2,"RT @PollsNig: Ok guys get in here, who do you ...",edxxtrock,en,"Toulouse, France",Sun Mar 04 10:28:35 +0000 2018,,okoye
3,the thing is... black panther was so so good b...,zekejaegers,en,No location,Sun Mar 04 10:28:36 +0000 2018,,
4,RT @HillaryClinton: Saw Black Panther with Bil...,quirion77,en,"Loire-Atlantique, Pays de la Loire",Sun Mar 04 10:28:36 +0000 2018,,


In [16]:
# Define function to replace characters names with correct spellings
def castNames(casts):
    replacements = [('zury','zuri'), ('zurie', 'zuri'), ('shury', 'shuri'), ('shurie', 'shuri'),('nakiya','nakia')]
    for pat,repl in replacements:
        casts = re.sub(pat, repl, casts)
    return casts
df['cast'] = df['cast'].apply(castNames)
df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['cast'] = df['cast'].apply(castNames)


Unnamed: 0,Tweets,User_name,Language,Location,Time,Hashtags,cast
0,RT @CoachWilmore: #120: William OÕNeal and the...,SusieNattibree,en,No location,Sun Mar 04 10:28:35 +0000 2018,#120,
1,RT @soprettyinlou: I hope my girl Shuri can br...,zinedine_7x,en,P,Sun Mar 04 10:28:35 +0000 2018,,shuri
2,"RT @PollsNig: Ok guys get in here, who do you ...",edxxtrock,en,"Toulouse, France",Sun Mar 04 10:28:35 +0000 2018,,okoye
3,the thing is... black panther was so so good b...,zekejaegers,en,No location,Sun Mar 04 10:28:36 +0000 2018,,
4,RT @HillaryClinton: Saw Black Panther with Bil...,quirion77,en,"Loire-Atlantique, Pays de la Loire",Sun Mar 04 10:28:36 +0000 2018,,


In [17]:
cast_list = df['cast'].tolist()

# Iterate over all cast names and split where there is more than one cast
casts = []
for item in cast_list:
    item = item.split()
    for i in item:
        casts.append(i)

# Determine Unique count of all cast
counts = Counter(casts)
cast_df = pd.DataFrame.from_dict(counts, orient='index').reset_index()
cast_df.columns = ['casts', 'count']
cast_df.sort_values(by='count', ascending=False, inplace=True)

In [18]:
#check for top five casts
cast_df.head(5)

Unnamed: 0,casts,count
1,okoye,9570
0,shuri,1035
2,nakia,40
3,m'baku,16
4,ramonda,15


In [19]:
#Cleaning tweeter data for sentiment analysis
def text_process(tweet):
    #Converting tweets to lowercase 
    tweet=tweet.lower()
    #Removing emojis
    clean(tweet, no_emoji=True)
    #Removing URL's
    tweet=re.sub(r"http\S+|www\S+|https\S+",'',tweet,flags=re.MULTILINE)
    #Removing repeating characters
    tweet=re.sub(r'\@\w+|\#\w+|\d+', '', tweet)
    #Removing stopwords
    tokens=nltk.word_tokenize(tweet)
    filted_words=[w for w in tokens if w not in stopwords.words('english')]
    #Removing punctuations
    nopunc=[w for w in filted_words if w not in string.punctuation]
    lemmatizer=WordNetLemmatizer()
    lemma_words=[lemmatizer.lemmatize(w) for w in nopunc]
    return " ".join(lemma_words)
#Applying text_process function to the data frame
df['text']=df['Tweets'].apply(text_process)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['text']=df['Tweets'].apply(text_process)


# 4. Sentiment Analysis

In [20]:
analyzer=SentimentIntensityAnalyzer()
df['scores']=df['text'].apply(lambda text: analyzer.polarity_scores(text) )

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['scores']=df['text'].apply(lambda text: analyzer.polarity_scores(text) )


In [21]:
df.head(2)

Unnamed: 0,Tweets,User_name,Language,Location,Time,Hashtags,cast,text,scores
0,RT @CoachWilmore: #120: William OÕNeal and the...,SusieNattibree,en,No location,Sun Mar 04 10:28:35 +0000 2018,#120,,rt william oõneal murder fred hampton william ...,"{'neg': 0.281, 'neu': 0.719, 'pos': 0.0, 'comp..."
1,RT @soprettyinlou: I hope my girl Shuri can br...,zinedine_7x,en,P,Sun Mar 04 10:28:35 +0000 2018,,shuri,rt hope girl shuri bring back erik round black...,"{'neg': 0.0, 'neu': 0.756, 'pos': 0.244, 'comp..."


In [22]:
#identify the polarity
def sentimentpredict(sentiment):
    if sentiment['compound']>=0.05:
        return "Positive"
    elif sentiment['compound']<=-0.05: 
        return "Negative"
    else:
        return "Neutral"
df['label']=df['scores'].apply(lambda x: sentimentpredict(x))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['label']=df['scores'].apply(lambda x: sentimentpredict(x))


In [23]:
df.head(5)

Unnamed: 0,Tweets,User_name,Language,Location,Time,Hashtags,cast,text,scores,label
0,RT @CoachWilmore: #120: William OÕNeal and the...,SusieNattibree,en,No location,Sun Mar 04 10:28:35 +0000 2018,#120,,rt william oõneal murder fred hampton william ...,"{'neg': 0.281, 'neu': 0.719, 'pos': 0.0, 'comp...",Negative
1,RT @soprettyinlou: I hope my girl Shuri can br...,zinedine_7x,en,P,Sun Mar 04 10:28:35 +0000 2018,,shuri,rt hope girl shuri bring back erik round black...,"{'neg': 0.0, 'neu': 0.756, 'pos': 0.244, 'comp...",Positive
2,"RT @PollsNig: Ok guys get in here, who do you ...",edxxtrock,en,"Toulouse, France",Sun Mar 04 10:28:35 +0000 2018,,okoye,rt ok guy get think would win battle okoye bla...,"{'neg': 0.126, 'neu': 0.583, 'pos': 0.291, 'co...",Positive
3,the thing is... black panther was so so good b...,zekejaegers,en,No location,Sun Mar 04 10:28:36 +0000 2018,,,thing ... black panther good session late woke...,"{'neg': 0.0, 'neu': 0.775, 'pos': 0.225, 'comp...",Positive
4,RT @HillaryClinton: Saw Black Panther with Bil...,quirion77,en,"Loire-Atlantique, Pays de la Loire",Sun Mar 04 10:28:36 +0000 2018,,,rt saw black panther bill afternoon amp loved ...,"{'neg': 0.0, 'neu': 0.522, 'pos': 0.478, 'comp...",Positive


In [24]:
df['Country']=df['Location'].apply(lambda x: x.split(',')[-1])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Country']=df['Location'].apply(lambda x: x.split(',')[-1])


In [28]:
df.dropna(subset=['cast'],axis=0,inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.dropna(subset=['cast'],axis=0,inplace=True)


In [29]:
df.describe()

Unnamed: 0,Tweets,User_name,Language,Location,Time,Hashtags,cast,text,scores,label,Country
count,49298,49298,49298,49298,49298,49298.0,49298.0,49298,49298,49298,49298
unique,15369,44118,1,17476,16693,1020.0,20.0,13511,3837,3,14009
top,RT @Fatnando: Okoye was about to kill her man ...,MarvelCmcs_Newz,en,No location,Sun Mar 04 18:43:10 +0000 2018,,,rt okoye kill man name wakanda think sheõll th...,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",Positive,No location
freq,7430,37,49298,14301,47,45043.0,38672.0,7431,14603,20843,14301


In [30]:
df.to_csv('final_file_updated.csv',index=False)

# Conclusion

I exported the data to power bi for furthur cleaning and visualisation 