# Twitter Sentiment Analysis
###### Calculating Subjectivity and Polarity Score of a Twitter Account

#### Sentiment analysis is one of the most common tasks in Data Science and AI. 
  
#### We use Python, Tweepy and TextBlob to analyse the sentiment of a selected Twitter account, using the Twitter API and Natural Language Processing.

## Learnings

- Performing basic sentiment analysis with **TextBlob**, a Natural Language Processing library for Python
- Visualising the findings with the **WordCloud** library
- Working with the **Twitter API** (familiarity with APIs is a useful skill, since they are a very common way of getting data from the internet)

## Problem definition

The task is to analyse the tweets of an individual Twitter account in terms of **Subjectivity** and **Polarity**. 
We will classify each tweet as **positive, negative or neutral** and calculate the percentage of positive tweets. 
We will then use the WordCloud library to display a word cloud of the most positive words from the tweets.
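
TextBlob reports polarity as a float from -1.0 (negative) to 1.0 (positive) and subjectivity as a float from 0.0 (objective) to 1.0 (subjective). As a rough sketch of the classification step, assuming that a polarity of exactly 0 counts as neutral, the labelling and the percentage of positive tweets could be computed as follows. The helper names `classify_polarity` and `percent_positive` and the example scores are illustrative only and are not taken from the notebook.

```python
def classify_polarity(score):
    """Label a polarity score; treating exactly 0 as neutral is an assumption."""
    if score > 0:
        return 'Positive'
    if score < 0:
        return 'Negative'
    return 'Neutral'


def percent_positive(scores):
    """Return the share of positive scores as a percentage (0.0 for an empty list)."""
    labels = [classify_polarity(s) for s in scores]
    if not labels:
        return 0.0
    return 100.0 * labels.count('Positive') / len(labels)


# Made-up polarity scores for illustration, not real TextBlob output.
example_scores = [0.5, -0.25, 0.0, 0.8]
print(percent_positive(example_scores))  # 50.0
```

In practice the scores would come from TextBlob, one per cleaned tweet, rather than from a hand-written list.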

## Get Twitter Application Keys

- Create a **Twitter account**
- Go to **developer.twitter.com**
- Apply for developer access at developer.twitter.com/en/apply-for-access by clicking **Apply for developer account**, then answer a few questions to get started.
- Confirm your email address by clicking the link that Twitter sends you. Once it is confirmed, access to the Twitter developer account should be granted almost immediately.
- Once access is granted, go to https://developer.twitter.com/en/apps and **Create an app**.
- Give your app a unique name
- Go to the settings tab and copy these four values:
  - **API key**
  - **API secret key**
  - **Access token**
  - **Access token secret**
- We need all four values for Twitter Sentiment Analysis
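
One way to keep the four values out of the notebook itself is a small CSV file with one header row and one data row. The sketch below assumes the column names that the notebook uses when it loads `keys.csv` (`twitterApiKey`, `twitterApiSecret`, `twitterApiAccessToken`, `twitterApiAccessTokenSecret`) and uses placeholder values. It reads the row with the standard-library `csv` module purely for illustration (the notebook itself uses pandas) and checks that no value is missing.

```python
import csv
import io

# Placeholder contents standing in for a real keys.csv (these are NOT real credentials).
sample_csv = io.StringIO(
    "twitterApiKey,twitterApiSecret,twitterApiAccessToken,twitterApiAccessTokenSecret\n"
    "YOUR_API_KEY,YOUR_API_SECRET,YOUR_ACCESS_TOKEN,YOUR_ACCESS_TOKEN_SECRET\n"
)

# DictReader maps each header name to the value in the first data row.
keys = next(csv.DictReader(sample_csv))

required = [
    "twitterApiKey",
    "twitterApiSecret",
    "twitterApiAccessToken",
    "twitterApiAccessTokenSecret",
]

# Collect the names of any values that are absent or empty.
missing = [name for name in required if not keys.get(name)]
print(missing)  # []
```

With a real file, `io.StringIO(...)` would be replaced by `open('./keys.csv', newline='')`, and the file should be kept out of version control.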

## Import libraries

In [2]:
```python
# NumPy - mathematical functions on multi-dimensional arrays
import numpy as np

# Pandas - data manipulation and analysis library (working with tabular data)
import pandas as pd

# Matplotlib - Python library for plotting graphs and visualisations
import matplotlib.pyplot as plt
```


In [14]:
```python
# Regular expression module from the Python standard library
import re

# Tweepy - Python library for accessing the Twitter API
import tweepy

# TextBlob - NLP library for processing textual data
from textblob import TextBlob

# WordCloud - Python library for creating word clouds of frequently used words
from wordcloud import WordCloud
```


### Load the API keys and access tokens

In [4]:
```python
config = pd.read_csv('./keys.csv')
print(type(config))
```

In [6]:
```python
config.head()
```

| | twitterApiKey | twitterApiSecret | twitterApiAccessToken | twitterApiAccessTokenSecret |
|---|---|---|---|---|
| 0 | taO8URia9jrMTD4sJvgctRuxR | TA1BJ1SylrtPp8MSlNrvSctLj0zbiCmTJ49XMWFMhvRk4I... | 1358085162093793280-WIXTcJJ8biYTfMvX8ZGHY8l43k... | 4OHpjVJzRDCXxx4REJrxmn8E532dsRSjCq2JgXcvy5Z6n |


In [11]:
```python
print(config['twitterApiKey'])
print(type(config['twitterApiKey']))
print(config['twitterApiKey'][0])
# print(config['twitterApiKey'][1])  # raises KeyError: 1, because the Series has only one row (index 0)
```

```text
0    taO8URia9jrMTD4sJvgctRuxR
Name: twitterApiKey, dtype: object
<class 'pandas.core.series.Series'>
taO8URia9jrMTD4sJvgctRuxR
```


Set all Twitter API config variables required for authentication with Tweepy.

In [12]:
```python
twitterApiKey = config['twitterApiKey'][0]
twitterApiSecret = config['twitterApiSecret'][0]
twitterApiAccessToken = config['twitterApiAccessToken'][0]
twitterApiAccessTokenSecret = config['twitterApiAccessTokenSecret'][0]
```

We authenticate with Tweepy so that we can retrieve the latest tweets from the specified account.

In [16]:
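```python
# Create the OAuth handler from the API key and API secret key
auth = tweepy.OAuthHandler(twitterApiKey, twitterApiSecret)
```

In [17]:
```python
# Attach the access token and access token secret
auth.set_access_token(twitterApiAccessToken, twitterApiAccessTokenSecret)
```

In [18]:
```python
# Build the API client; wait_on_rate_limit makes Tweepy pause when the rate limit is hit
twiterApi = tweepy.API(auth, wait_on_rate_limit=True)
```

In [19]:
```python
twitterAccount = 'elonmusk'
```

In [23]:
```python
# Page through the account's timeline and stop after 50 tweets
tweets = tweepy.Cursor(twiterApi.user_timeline,
                       screen_name=twitterAccount,
                       count=None,
                       since_id=None,
                       max_id=None,
                       trim_user=True,
                       exclude_replies=True,
                       contributor_details=False,
                       include_entities=False
                       ).items(50)
```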

In [24]:
```python
df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweet'])
```

In [25]:
```python
df.head()
```

| | Tweet |
|---|---|
| 0 | Frodo was the underdoge,\nAll thought he would... |
| 1 | Just agree to do Clubhouse with @kanyewest |
| 2 | Bought some Dogecoin for lil X, so he can be a... |
| 3 | This is true power haha https://t.co/Fc9uhQSd7O |
| 4 | RT @SpaceX: NASA has selected Falcon Heavy to ... |


In [26]:
```python
type(df['Tweet'])
```

```text
pandas.core.series.Series
```

In [27]:
```python
df['Tweet'][0]
```

```text
'Frodo was the underdoge,\nAll thought he would fail,\nHimself most of all. https://t.co/zGxJFDzzrM'
```

In [29]:
```python
# Collect the first 10 tweets into a plain Python list
tweets_li = df['Tweet'].head(10).tolist()
```

In [30]:
```python
tweets_li
```

```text
['Frodo was the underdoge,\nAll thought he would fail,\nHimself most of all. https://t.co/zGxJFDzzrM',
 'Just agree to do Clubhouse with @kanyewest',
 'Bought some Dogecoin for lil X, so he can be a toddler hodler',
 'This is true power haha https://t.co/Fc9uhQSd7O',
 'RT @SpaceX: NASA has selected Falcon Heavy to launch the first two elements of the lunar Gateway together on one mission! https://t.co/3pWt…',
 'XPrize team will manage the $100M carbon capture prize https://t.co/fSw5IanL0r',
 'Back to work I go …',
 'Ð is for Ðogecoin! Instructional video.\nhttps://t.co/UEEocOfcTb',
 'The people have spoken … https://t.co/x41oVMzTGo',
 '🎶 Who let the Doge out 🎶']
```

Before doing sentiment analysis, we first clean each tweet of unnecessary data.

We are going to create a cleanTweets function that will:
- remove mentions
- remove hashtags
- remove retweets
- remove urls
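
The four steps in this list can be sketched with `re.sub` from the standard library. This is only one possible reading of the list, not the notebook's own `cleanTweets` implementation: the helper name `clean_tweet_sketch` is hypothetical, "remove retweets" is assumed to mean stripping the leading `RT @user:` marker, and "remove hashtags" is assumed to mean dropping the `#` symbol while keeping the word.

```python
import re


def clean_tweet_sketch(text):
    """Hypothetical cleaner: strip retweet markers, URLs, mentions and '#' symbols."""
    text = re.sub(r'^RT\s+@\w+:\s*', '', text)  # leading "RT @user: " retweet marker
    text = re.sub(r'https?://\S+', '', text)    # URLs
    text = re.sub(r'@\w+', '', text)            # remaining @mentions
    text = re.sub(r'#', '', text)               # '#' symbol only; the tagged word is kept
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r'\s+', ' ', text).strip()


print(clean_tweet_sketch('This is true power haha https://t.co/Fc9uhQSd7O'))
# This is true power haha
```

The order of the substitutions matters here: the retweet marker is removed before the generic mention pattern, otherwise a stray `RT :` would be left behind.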

In [31]:
```python
# IPython help syntax: show the signature and docstring of re.sub
re.sub?
```