<div class="alert alert-block alert-info"><b>IAB303</b> - Data Analytics for Business Insight</div>

## Social Media Analysis


Covid-19 has affected businesses in many different ways. With the arrival of Covid-19 vaccines, interest in policies such as no jab no job has grown. Some companies, such as Qantas, have already decided to impose this policy. Considering the power of social media, we want to analyse people's position on the no jab no job policy before making this decision.

How do you feel about the no jab no job policy? Have you shared your thoughts on social media? How can we understand sentiments behind those thoughts?

1. Connect to Twitter API and extract tweets
2. Basic sentiment analysis
3. Basic visualisation

## Connect to Twitter API and extract tweets

1. How can we connect to Twitter API?
2. How can we extract tweets of a particular hashtag?
3. What is the structure of a tweet?
4. How can we extract more information about a tweet?

### Generating Twitter API keys

1. Go to the [Twitter Developers Page](https://developer.twitter.com/en) and apply for a developer account.
2. Once in the developer portal, create a new project and an app. After completing this process, you will see the consumer keys (API-Key and API-Secret-Key). You MUST copy and save these keys in a secure place, as this information WILL NOT BE ACCESSIBLE AGAIN.
3. After this process, you will be able to create the access token and access token secret for your account. As with the consumer keys, you MUST copy and save these tokens in a secure place, as this information WILL NOT BE ACCESSIBLE AGAIN.

tweepy is a Python library that we can use to connect to Twitter and extract tweets. Store your Twitter application's credentials in variables, then write a function to connect to the Twitter API.
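
The notebook reads the credentials from a JSON file. The sketch below shows one layout that would work, assuming the four key names match those read in the code cell that follows (`api_key`, `api_secret_key`, `access_token`, `access_token_secret`). The file name, the temporary folder and the values are placeholders for illustration only.

```python
import json
import os
import tempfile

# Placeholder values only; real keys come from the Twitter developer portal.
sample_credentials = {
    "api_key": "YOUR-API-KEY",
    "api_secret_key": "YOUR-API-SECRET-KEY",
    "access_token": "YOUR-ACCESS-TOKEN",
    "access_token_secret": "YOUR-ACCESS-TOKEN-SECRET",
}

# Hypothetical location; keep the real file out of shared or versioned folders.
secrets_path = os.path.join(tempfile.mkdtemp(), "twitter_keys.json")
with open(secrets_path, "w") as file:
    json.dump(sample_credentials, file, indent=4)

# Reading it back mirrors what the notebook does with `my_secrets`.
with open(secrets_path, "r") as file:
    credentials = json.load(file)

print(sorted(credentials.keys()))
```

Keeping the keys in a separate file means the notebook itself can be shared without exposing them.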

In [None]:
!pip install tweepy
!pip install textblob

In [None]:
# import required libraries
import tweepy           # To access and consume Twitter's API
import pandas as pd     # To handle data
import numpy as np      # For number computing
import json

# Twitter App access keys
my_secrets = " " # The location of your Twitter API keys
with open(my_secrets,'r') as file:
    credentials = json.load(file)

# Consume:
CONSUMER_KEY    = credentials["api_key"]
CONSUMER_SECRET = credentials["api_secret_key"]

# Access:
ACCESS_TOKEN  = credentials["access_token"]
ACCESS_SECRET = credentials["access_token_secret"]

# API's setup:
def connectToTwitterAPI():
    """
    Utility function to set up the Twitter API
    with access keys.
    """
    # Authentication and access using keys
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

    # Return API with authentication
    api = tweepy.API(auth)
    return api

Call the function and connect to Twitter API. Extract Tweets using a search term. In this example we use a hashtag to search for specific tweets.

In [None]:
# Create an extractor object
extractor = connectToTwitterAPI()

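# Specify search criteria and extract tweets into a list
# Note: tweepy 4.x renamed API.search to API.search_tweets; use that name if this call raises AttributeError
tweets = extractor.search(q="# ", lang = "en", count=50) # Find the most appropriate hashtag for our purpose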

# Print the total number of extracted tweets
print("Number of tweets extracted: {}.\n".format(len(tweets)))

Print the most recent 5 tweets and observe the nature of extracted tweets.

In [None]:
# Print the most recent 5 tweets:
print("5 recent tweets:\n")
for tweet in tweets[:5]:
    print(tweet.text)
    print()

Data frames are useful for storing data in tabular format. Create a data frame and store the relevant data from each tweet. Display the first 10 tweets stored in the data frame in table format.

In [None]:
# Create a pandas dataframe with columns: Tweets, len, ID, Date, Source, Likes, RTs
data = 

# Add relevant data from each tweet - Hint: Check how we printed the tweet text


# Display the first 10 elements of the dataframe
data.head(10)
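
One possible way to approach the cell above is sketched here. It uses stand-in objects with made-up values instead of real API results, and it assumes each tweet exposes the attributes `text`, `id`, `created_at`, `source`, `favorite_count` and `retweet_count`, as tweepy's tweet objects do. Treat it as a starting point rather than the expected answer.

```python
from datetime import datetime
from types import SimpleNamespace

import pandas as pd

# Stand-in tweets with invented values; real ones come from the search call.
tweets = [
    SimpleNamespace(text="Example tweet one", id=1,
                    created_at=datetime(2021, 3, 1, 9, 0),
                    source="Twitter Web App",
                    favorite_count=3, retweet_count=1),
    SimpleNamespace(text="A second example tweet", id=2,
                    created_at=datetime(2021, 3, 1, 9, 5),
                    source="Twitter for Android",
                    favorite_count=0, retweet_count=4),
]

# One column per piece of information, one row per tweet.
data = pd.DataFrame({
    "Tweets": [tweet.text for tweet in tweets],
    "len": [len(tweet.text) for tweet in tweets],
    "ID": [tweet.id for tweet in tweets],
    "Date": [tweet.created_at for tweet in tweets],
    "Source": [tweet.source for tweet in tweets],
    "Likes": [tweet.favorite_count for tweet in tweets],
    "RTs": [tweet.retweet_count for tweet in tweets],
})

print(data.head(10))
```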

Extract the most liked tweet and the most retweeted tweet.

In [None]:
# Extract the tweet with the highest number of likes and retweets

maxLikes = 
maxRetweets  = 

fav = data[data.Likes == maxLikes].index[0]
rt  = data[data.RTs == maxRetweets].index[0]

# Max FAVs:
print("The tweet with the most likes is: \n{}".format(data['Tweets'][fav]))
print("Number of likes: {}".format(maxLikes))

# Max RTs:
print("The tweet with the most retweets is: \n{}".format(data['Tweets'][rt]))
print("Number of retweets: {}".format(maxRetweets))
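
As a hedged illustration of the idea, the sketch below finds the highest values on a small invented data frame with the same `Tweets`, `Likes` and `RTs` columns. It uses `Series.max()` for the values and `Series.idxmax()` as an alternative way to get the row label of the first maximum.

```python
import pandas as pd

# Invented data, only to show the mechanics.
data = pd.DataFrame({
    "Tweets": ["first", "second", "third"],
    "Likes": [2, 9, 4],
    "RTs": [7, 1, 3],
})

# Highest value in each column.
maxLikes = data["Likes"].max()
maxRetweets = data["RTs"].max()

# Row label of the first row holding that value.
fav = data["Likes"].idxmax()
rt = data["RTs"].idxmax()

print(data["Tweets"][fav], maxLikes)
print(data["Tweets"][rt], maxRetweets)
```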

## Basic Sentiment Analysis

1. Use the TextBlob Python library to classify the polarity of a tweet
2. Compute the sentiment for each tweet


We can determine whether a tweet is positive, negative or neutral using sentiment analysis. The TextBlob library provides a simple API to support sentiment analysis. It has an already trained analyser to classify the polarity of a given text. We define two functions in the following code: one to pre-process and clean the tweet content, and the other to compute the sentiment associated with each tweet.

In [None]:
from textblob import TextBlob
import re

def cleanTweet(tweet):
    '''
    Utility function to clean the text in a tweet by removing
    mentions, links and special characters using regex.
    '''
    # Raw string so that \w, \/ and \S reach the regex engine unchanged
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

def analyseSentiment(tweet):
    '''
    Utility function to classify the polarity of a tweet
    using textblob.
    '''
    analysis = TextBlob(cleanTweet(tweet))
    if analysis.sentiment.polarity > 0:
        return 1
    elif analysis.sentiment.polarity == 0:
        return 0
    else:
        return -1
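
To see what the cleaning step does before any sentiment is computed, the sketch below applies the same regular expression to an invented sample tweet. The sample text is an assumption made up for illustration.

```python
import re

def cleanTweet(tweet):
    # Same pattern as above: mentions, non-alphanumeric characters, links.
    pattern = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"
    return ' '.join(re.sub(pattern, " ", tweet).split())

# Invented example tweet.
sample = "@someone No jab, no job? Read more: https://example.com/story #NoJabNoJob"
cleaned = cleanTweet(sample)
print(cleaned)  # No jab no job Read more NoJabNoJob
```

The mention, the punctuation and the link are removed, while the hashtag text survives without its `#` symbol.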

Compute sentiment for each tweet and add it to the data frame we created previously.

In [None]:
# Compute sentiment for each tweet and add the result into a new column


# Display the first 10 elements of the dataframe
display(data.head(10))
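
One way to add the new column is to apply the scoring function to every row of the `Tweets` column. The sketch below shows only the mechanics: to stay self-contained it replaces `analyseSentiment` with a toy word-list scorer, which is NOT TextBlob, and the column name `SA` and the sample tweets are hypothetical.

```python
import pandas as pd

def analyseSentiment(tweet):
    # Stand-in scorer for illustration only; the notebook uses TextBlob instead.
    positive = {"good", "great", "support"}
    negative = {"bad", "unfair", "against"}
    words = set(tweet.lower().split())
    score = len(words & positive) - len(words & negative)
    # Map the score to 1, 0 or -1, like the notebook's function.
    return (score > 0) - (score < 0)

# Invented tweets.
data = pd.DataFrame({
    "Tweets": ["I support this policy", "This is unfair", "Announcement at noon"],
})

# Apply the function to each tweet and store the result in a new column.
data["SA"] = data["Tweets"].apply(analyseSentiment)
print(data)
```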

## Basic Visualisation

1. Calculate percentages of positive, neutral and negative tweets
2. Plot percentages in a pie chart

In [None]:
# Construct lists with classified tweets

positiveTweets = 
neutralTweets = 
negativeTweets = 

# Calculate percentages

positivePercent = 
neutralPercent = 
negativePercent = 

# Print percentages

print("Percentage of positive tweets: {}%".format(positivePercent))
print("Percentage of neutral tweets: {}%".format(neutralPercent))
print("Percentage of negative tweets: {}%".format(negativePercent))
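
A minimal sketch of the calculation, assuming the sentiment labels (1, 0, -1) were stored in a column named `SA`. That column name and the data are hypothetical.

```python
import pandas as pd

# Invented data with a hypothetical sentiment column "SA".
data = pd.DataFrame({
    "Tweets": ["a", "b", "c", "d"],
    "SA": [1, 1, 0, -1],
})

# Split the tweets by their sentiment label.
pairs = list(zip(data["Tweets"], data["SA"]))
positiveTweets = [tweet for tweet, label in pairs if label > 0]
neutralTweets = [tweet for tweet, label in pairs if label == 0]
negativeTweets = [tweet for tweet, label in pairs if label < 0]

# Share of each group as a percentage of all tweets.
total = len(data)
positivePercent = len(positiveTweets) * 100 / total
neutralPercent = len(neutralTweets) * 100 / total
negativePercent = len(negativeTweets) * 100 / total

print(positivePercent, neutralPercent, negativePercent)
```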

Plot the sentiment percentages in a pie chart.

In [None]:
# For plotting and visualization
from IPython.display import display
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

labels = ['Positive', 'Neutral', 'Negative']
sizes = [positivePercent, neutralPercent, negativePercent]

# Set different colors
colors = ['green', 'grey', 'red']

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)
plt.axis('equal')
plt.show()

## Challenge Scenario

Here's a challenge for those who are interested in further analysis.

The previous insights are quite useful for getting an idea of the general conversations on Twitter. However, a tweet from an influential person can shift the sentiment of a group of people. In this sense, how influential are the people who have a negative sentiment towards the no jab no job policy?
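
One possible starting point, sketched under assumptions: treat follower count as a rough proxy for influence (tweepy exposes it as `tweet.user.followers_count`) and rank the authors of negative tweets by it. The `User`, `Followers` and `SA` columns and all values below are invented, and follower count is only one of several possible measures of influence.

```python
import pandas as pd

# Invented data; "Followers" would come from tweet.user.followers_count.
data = pd.DataFrame({
    "User": ["user_a", "user_b", "user_c", "user_d"],
    "Followers": [120, 54000, 800, 15],
    "SA": [-1, -1, 1, -1],
})

# Keep only negative tweets and rank their authors by follower count.
negative = data[data["SA"] == -1].sort_values("Followers", ascending=False)

print(negative)
print("Median followers among negative authors:", negative["Followers"].median())
```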