# Module 5 Tutorial - Social Media Analysis

In this tutorial, we'll be looking at one possible way of extracting data from a social media outlet, namely, Twitter. In doing so, we'll also conduct some basic sentiment analysis, and implement an appropriate form of visualisation for our results.

## Retrieving Data From A 'Social Media' API - Twitter Data

### [1] Question - Finding 'Trending' Topics For Publication

Let us consider the business concern of today's tutorial:

> **CONCERN:** A popular news publication in Brisbane has been trying to improve its readership. Possible ways of doing this include covering more relevant topics, or understanding how readers feel about those topics. They have reached out to you for assistance, in the hope that your findings could meaningfully guide their decisions about which articles to publish. Can you assist them?

As stated in the business concern, the question focuses on improving the readership of a news publication.

>**QUESTION:** What possible forms of information are relevant for this question?

>>**ANSWER:** ???

Related to this question, a second question emerges:

>**QUESTION:** Where might we acquire this information?

>>**ANSWER:** ???

__NOTE:__ Be careful in your answer. Some sources give high-quality information but take considerable effort to set up; an alternative source that is only slightly less helpful may be the better choice if it is much easier to access.

### [2] Data - Finding 'Trending' Topics For Publication


For this task, we are going to delve into the 'Twittersphere' (Twitter) as our choice of social media for the business concern. Before we can analyse anything, there are some steps needed to set up access to Twitter's API.

__NOTE:__ These instructions assume that you have a Twitter account, and that you are currently signed into your account in your web browser.

To gather data from Twitter, we will need to create a Twitter App. This is essentially a small intermediate application that allows you to connect with Twitter. You can read more about it here: https://developer.twitter.com/en/products/twitter-api

>**QUESTION:** Why does Twitter require that individuals use apps to connect to their service?

>>**ANSWER:** ???

__Step 1.__ Go to https://developer.twitter.com/en/apps/ and click the _Create An App_ button in the top right corner:

<img src="graphics/5_s_1.png" style="margin-left: 50px; width: 40%;">

__Step 2.__ In the form that appears, you will need to fill in the fields entitled _App Name_, _Web URL_, _App Description_, and the question regarding how the app will be used. Give the app a unique and meaningful name, state https://qut.edu.au for the web URL, and feel free to provide a paraphrase of the following statement for both the description of the app, as well as how it will be used:

>_This is an app that we will be using for academic purposes, to understand what people are feeling with relation to certain topics on Twitter._


<img src="graphics/5_s_2.png" style="margin-left: 50px; width: 40%;">

__Step 3.__ Once you have completed the form, click _Submit_, and then select the _Create_ button in the _Developer Terms_ window that appears:

<img src="graphics/5_s_3.png" style="margin-left: 50px; width: 40%;">

__Step 4.__ You should then be directed to a page that contains information about the new Twitter _App_ that you have created. Click the _Keys and Tokens_ tab, and then scroll down:

<img src="graphics/5_s_4.png" style="margin-left: 50px; width: 40%;">

__Step 5.__ Note your _API Key_ and _API Secret_. You will need these for the upcoming analysis:

<img src="graphics/5_s_5.png" style="margin-left: 50px; width: 40%;">

__Step 6.__ Lastly, click the _Generate_ button in the _API Token_ section. In the window that appears, note both the _Access Token_ and _Access Secret_ that have been created:

<img src="graphics/5_s_6.png" style="margin-left: 50px; width: 40%;">



### [3] Analysis - Twitter Trend Discovery, Tweet Retrieval And Sentiment Analysis

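Now we can begin our analysis. We'll start by installing the __tweepy__ package, which lets us access our Twitter App from Python. The calls used in this tutorial (`trends_closest`, `trends_place` and `search`) come from tweepy 3.x; tweepy 4.0 renamed them (to `closest_trends`, `get_place_trends` and `search_tweets`), so we pin the version below 4 when installing:

```python
# Pin tweepy 3.x so the method names used below exist
!pip install "tweepy<4"
```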

Then we will import the required packages:

#### You will need to use the following python commands:

```python
    import
```

#### And you will need to import the following packages:

```python
    tweepy
    numpy
    pandas
```

```python
# Import the required libraries
import ???           # To access and consume Twitter's API
import ??? as pd     # To handle data
import ??? as np     # For numerical computing
```

Next, we'll check that we can connect to Twitter. Using the keys and tokens you noted during the __Data__ phase of our analytics process, fill in the required fields to set up the API:

```python
# Twitter App access keys

APP_KEY    = ???
APP_SECRET = ???

ACCESS_TOKEN  = ???
ACCESS_SECRET = ???

# API setup:
def connectToTwitterAPI():
    """
    Utility function to set up the Twitter API
    with access keys.
    """
    # Authenticate using the app keys and access tokens:
    auth = tweepy.OAuthHandler(APP_KEY, ???)
    auth.set_access_token(ACCESS_TOKEN, ???)

    # Return an authenticated API object:
    api = tweepy.API(auth)
    return api
```
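Pasting the four keys directly into the notebook makes them easy to leak if the notebook is shared. One option is to export them as environment variables and read them at runtime. The sketch below assumes hypothetical variable names (`TWITTER_APP_KEY` and so on) and a hypothetical helper, `load_twitter_credentials`; neither comes from Twitter or tweepy.

```python
import os

# Hypothetical environment-variable names; any names work as long as
# they match what you export in your shell before starting the notebook.
CREDENTIAL_VARS = {
    "APP_KEY": "TWITTER_APP_KEY",
    "APP_SECRET": "TWITTER_APP_SECRET",
    "ACCESS_TOKEN": "TWITTER_ACCESS_TOKEN",
    "ACCESS_SECRET": "TWITTER_ACCESS_SECRET",
}


def load_twitter_credentials(environ=os.environ):
    """Return the four Twitter credentials read from environment variables.

    Raises KeyError naming every missing (or empty) variable, so a
    misconfigured notebook fails early instead of at request time.
    """
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

With the variables exported, the setup cell could assign `APP_KEY = creds["APP_KEY"]` (and likewise for the other three) after calling `creds = load_twitter_credentials()`, and then call `connectToTwitterAPI()` as before.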

Recalling the business concern, we're interested in finding out what trends are most relevant in Brisbane. To do this, we'll need to review the __tweepy__ documentation (see this URL: http://docs.tweepy.org/en/v3.5.0/api.html#).

>**QUESTION:** Are there any functions that can help us find out what is trending in Brisbane?

>>**ANSWER:** ???

Next, we'll use these functions to find out what is trending in Brisbane. Let's first get the details of Brisbane (according to Twitter):

#### You will need to use the following python commands:

```python
    connectToTwitterAPI()
    api.trends_closest() # HINT: The latitude and longitude of Brisbane are -27.476102 and 153.028024
```

```python
# Create the API:
api = ???()

api.???(???, ???)
```

>**QUESTION:** From looking at the data that was returned, what can be said about how Twitter interprets Brisbane?

>>**ANSWER:** ???

Next, let's extract the trends from Brisbane:

#### You will need to use the following python commands:

```python
    api.trends_place()
```

```python
extracted_trends = api.???(???)
```

Let's inspect a sample of the returned data. What can be said about it?

```python
???[0][???][0]
```

Next, let's create a dataframe that lists each trending topic alongside its tweet volume:

```python
trend_volume = []
trend_hashtag = []

# Populate the lists from the extracted trends
for trend in ???:
    if trend[???] is not None:
        trend_hashtag.append(trend[???])
        trend_volume.append(trend[???])

# Make a new dataframe
trend_hashtag_volume = pd.DataFrame({'Hashtag': ???, 'Volume': ???})
trend_hashtag_volume
```
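Before choosing which topics to explore further, it can help to sort this table so the highest-volume trends come first, since the trends may not be returned in volume order. The sketch below uses pandas' `sort_values` on a made-up stand-in for `trend_hashtag_volume`; the hashtags and volumes are illustrative only, not real Twitter data.

```python
import pandas as pd

# Hypothetical stand-in for trend_hashtag_volume (made-up values).
trend_hashtag_volume = pd.DataFrame({
    "Hashtag": ["#TopicA", "#TopicB", "#TopicC"],
    "Volume": [12000, 45000, 8000],
})

# Sort so the most-tweeted topics come first, then renumber the rows 0, 1, 2...
top_trends = (
    trend_hashtag_volume
    .sort_values("Volume", ascending=False)
    .reset_index(drop=True)
)
print(top_trends.head(10))
```

The sorted table can also be passed to the `plt.bar` call used in the visualisation step, so the bars run from most to least popular.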

While this data is excellent for helping us understand what is trending through Twitter, it's only part of the concern. Recall that the client wants us to also find out how individuals feel about said topics.

To explore this, we can first retrieve some tweets that use one of the popular hashtags:

```python
tweets = api.search(q=???, count=50)

# Print the total number of extracted tweets
print("Number of tweets extracted: {}.\n".format(len(???)))
```
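Retweets repeat the original text word for word, so a handful of heavily retweeted tweets can dominate a sample of 50 and skew the sentiment percentages computed later. Twitter's standard search accepts a `-filter:retweets` operator in the query string to exclude them. The helper below, `build_search_query`, is a hypothetical convenience function, not part of tweepy.

```python
def build_search_query(topic, exclude_retweets=True):
    """Return a standard-search query string for one trending topic.

    topic: a hashtag or phrase taken from the trends table, e.g. "#Example".
    exclude_retweets: if True, append Twitter's '-filter:retweets' operator.
    """
    query = topic.strip()
    if exclude_retweets:
        query += " -filter:retweets"
    return query


# Example usage with the tweepy 3.x search call from the cell above:
# tweets = api.search(q=build_search_query("#Example"), count=50)
print(build_search_query("#Example"))  # prints: #Example -filter:retweets
```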

Next, we can display the 5 most recent tweets for this hashtag and observe what the extracted tweets look like:

```python
# Print the most recent 5 tweets:
print("5 recent tweets:\n")
for tweet in tweets[:???]:
    print(tweet.text)
```

We can organise these tweets more usefully in a dataframe. Let's construct one, and then display its first 10 rows:

```python
# Create a pandas dataframe as follows:
data = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])

# Add relevant data from each tweet:
data['len']    = np.array([len(tweet.text) for tweet in tweets])     # Length of the tweet text
data['ID']     = np.array([tweet.id for tweet in tweets])
data['Date']   = np.array([tweet.created_at for tweet in tweets])
data['Source'] = np.array([tweet.source for tweet in tweets])
data['Likes']  = np.array([tweet.favorite_count for tweet in tweets]) # Number of likes
data['RTs']    = np.array([tweet.retweet_count for tweet in tweets])  # Number of retweets

# Display the first 10 rows of the dataframe:
display(data.head(10))
```

How might we go about finding the most liked or most retweeted tweets?

```python
# Extract the tweets with the most likes and the most retweets:
maxLikes    = np.???(data['Likes'])
maxRetweets = np.???(data['RTs'])

fav = data[data.Likes == ???].index[0]
rt  = data[data.RTs == ???].index[0]

# Most likes:
print("The tweet with the most likes is: \n{}".format(data['Tweets'][fav]))
print("Number of likes: {}".format(maxLikes))

# Most retweets:
print("The tweet with the most retweets is: \n{}".format(data['Tweets'][rt]))
print("Number of retweets: {}".format(maxRetweets))
```
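pandas can also find these rows directly: `Series.idxmax` returns the index label of the largest value (the first one, if there are ties), which avoids the separate equality lookup. The dataframe below is a made-up stand-in for `data`, used only so the sketch runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the tweet dataframe (made-up values).
data = pd.DataFrame({
    "Tweets": ["first tweet", "second tweet", "third tweet"],
    "Likes": [3, 10, 7],
    "RTs": [5, 1, 9],
})

# idxmax gives the index label of the first maximum in each column.
fav = data["Likes"].idxmax()
rt = data["RTs"].idxmax()

print("The tweet with the most likes is: \n{}".format(data.loc[fav, "Tweets"]))
print("Number of likes: {}".format(data.loc[fav, "Likes"]))
print("The tweet with the most retweets is: \n{}".format(data.loc[rt, "Tweets"]))
print("Number of retweets: {}".format(data.loc[rt, "RTs"]))
```

For more than one result, `data.nlargest(3, "Likes")` returns the three most liked rows in descending order.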

So far, the techniques we've explored are good for retrieving tweets and summarising their statistics. However, our client wants to go further and understand how people actually feel. Fortunately, a Python package can help with this.

The __textblob__ library can estimate whether the text of a tweet is positive or negative in tone. Its default sentiment analyzer is lexicon-based and scores the polarity of a text between -1.0 (negative) and 1.0 (positive). In the following code we define two functions: one to pre-process and clean the tweet content, and another to compute the sentiment associated with each tweet.

```python
!pip install textblob
```

```python
from ??? import TextBlob
import re

def cleanTweet(tweet):
    '''
    Utility function to clean the text in a tweet by removing
    mentions, links and special characters using regex.
    '''
    # A raw string (r"...") stops Python from treating \w, \S and \/
    # as (invalid) string escapes before the regex engine sees them.
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

def analyseSentiment(???):
    '''
    Utility function to classify the polarity of a tweet
    using textblob: 1 = positive, 0 = neutral, -1 = negative.
    '''
    analysis = TextBlob(???(???))
    if analysis.sentiment.polarity > 0:
        return 1
    elif analysis.sentiment.polarity == 0:
        return 0
    else:
        return -1
```
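`analyseSentiment` treats only an exact polarity of 0 as neutral, so a tweet with a barely positive score (say 0.05) is counted as positive. One possible variant, sketched below, treats scores within a small band around zero as neutral. The function name `polarity_to_label` and the `neutral_band` parameter are hypothetical, and any particular band width (such as 0.1) is an arbitrary choice rather than a TextBlob default.

```python
def polarity_to_label(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to 1 (positive), 0 (neutral) or -1 (negative).

    With neutral_band=0.0 this matches analyseSentiment: only an exact 0.0
    is neutral. A positive band (e.g. 0.1) also treats weakly polarised
    scores in [-band, band] as neutral.
    """
    if polarity > neutral_band:
        return 1
    if polarity < -neutral_band:
        return -1
    return 0


# Example usage alongside the functions above (requires textblob):
# polarity_to_label(TextBlob(cleanTweet(tweet)).sentiment.polarity, neutral_band=0.1)
```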

Then we can compute the sentiment for each tweet and add it to the dataframe we created previously.

```python
# Compute sentiment for each tweet and add the result into a new column:
data['SA'] = np.array([ analyseSentiment(tweet) for tweet in data['Tweets'] ])

# Display the first 10 rows of the dataframe:
display(data.head(10))
```

>**QUESTION:** Looking at the sentiment analysis results for these tweets, what can be said about how their authors were feeling?

>>**ANSWER:** ???

### [4] Visualisation - Pie Charts

As may be evident from the previous step, reading sentiment from a dataframe is quite difficult. That is why we will now use a graphing technique, specifically a pie chart, to visualise the sentiment of the tweet authors.

Let's begin by calculating the percentages of positive, negative, and neutral tweets for our previously examined hashtag:

```python
# Construct lists of classified tweets:
positiveTweets = [ tweet for index, tweet in enumerate(data['Tweets']) if data['SA'][index] ? ???]
neutralTweets  = [ tweet for index, tweet in enumerate(data['Tweets']) if data['SA'][index] ?? ???]
negativeTweets = [ tweet for index, tweet in enumerate(data['Tweets']) if data['SA'][index] ? ???]

# Calculate percentages:
positivePercent = len(???)*100/len(data['Tweets'])
neutralPercent  = len(???)*100/len(data['Tweets'])
negativePercent = len(???)*100/len(data['Tweets'])

# Print percentages:
print("Percentage of positive tweets: {}%".format(???))
print("Percentage of neutral tweets: {}%".format(???))
print("Percentage of negative tweets: {}%".format(???))
```
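The three list comprehensions each scan the whole column of tweets. As an alternative sketch, `collections.Counter` can tally the sentiment labels in a single pass. `sentiment_percentages` is a hypothetical helper name, and the sample labels in the example are made up.

```python
from collections import Counter


def sentiment_percentages(labels):
    """Return the percentage of positive (1), neutral (0) and negative (-1) labels."""
    counts = Counter(labels)  # Missing labels count as 0
    total = len(labels)
    if total == 0:
        return {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
    return {
        "positive": counts[1] * 100 / total,
        "neutral": counts[0] * 100 / total,
        "negative": counts[-1] * 100 / total,
    }


print(sentiment_percentages([1, 0, -1, 1]))
# {'positive': 50.0, 'neutral': 25.0, 'negative': 25.0}
```

With real data, `sentiment_percentages(data['SA'])` would give the three values needed for the pie chart's `sizes` list.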

Then we can plot these figures on a pie chart, which suits this data well because it shows each category as a share of the whole:

```python
# For plotting and visualisation:
import matplotlib.pyplot as plt

labels = [???, ???, ???]
sizes = [???, ???, ???]

# Set a different colour for each sentiment
colors = ['green', 'grey', 'red']

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)
plt.axis('equal')  # Draw the pie as a circle
plt.show()
```

We can go even further, and also compare the popularity of our previously examined trends:

```python
plt.bar(trend_hashtag_volume.loc[:,'Hashtag'], trend_hashtag_volume.loc[:,'Volume'])
plt.xticks(trend_hashtag_volume.loc[:,'Hashtag'], rotation=90)
plt.show()
```

### [5] Insight - Answering The Concern

While we've only examined one hashtag in this analytics process so far, it is now up to you to evaluate the other trending hashtags to better understand how people feel about various topics on Twitter. This should inform a descriptive recommendation for the business concern, and should address both of the aforementioned questions: the popularity of trends, and the sentiment of readers.

As some final questions, it is worth considering:

>**QUESTION:** Is Twitter the most accurate source for understanding sentiment of individuals? If not, why? 

>>**ANSWER:** ???

>**QUESTION:** Was the applied sentiment analysis effective in categorising feelings of readers? 

>>**ANSWER:** ???


>**QUESTION:** Is sentiment the best indicator of a topic's popularity? 

>>**ANSWER:** ???