# Sentiment Analysis

In this lesson, we're going to learn how to use [VADER](https://github.com/cjhutto/vaderSentiment), an English-language sentiment analysis tool designed for use with social media texts. This tool helps to calculate whether texts express positive or negative sentiment.

## Install and Import Libraries/Packages

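First, install the `vaderSentiment` package if you have not already done so (for example, with `pip install vaderSentiment`). Then we will import `SentimentIntensityAnalyzer` and initialize it.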

In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Initialize VADER so we can use it later
sentimentAnalyser = SentimentIntensityAnalyzer()

We are also going to import pandas for working with data and set the maximum column width for pandas DataFrames to 400 characters, so that long tweets are not truncated when displayed.

In [3]:
import pandas as pd
pd.options.display.max_colwidth = 400

## Calculate Sentiment Scores

To calculate sentiment scores for a sentence or paragraph, we can use `sentimentAnalyser.polarity_scores()` and input a string of text.

This method returns a Python dictionary of sentiment scores: how negative, how neutral, and how positive the text is (each from 0 to 1), as well as a compound score that ranges from -1 (most negative) to 1 (most positive).

Most scholars use the compound score to represent sentiment in their analyses, and we will use the compound score in this lesson as well. Let's test it out with some sample sentences!

In [None]:
sentimentAnalyser.polarity_scores("The ref made the call. The goalie was fired up.")

In [None]:
sentimentAnalyser.polarity_scores("Prejudice is awful.")

## Make a Function

Make a function that accepts any text and returns just the compound sentiment score for that text.

In [18]:
def calculate_sentiment(text):
    # Run VADER on the text
    # Your code here
    
    # Extract the compound score
    # Your code here
    
    # Return compound score
    # Your code here

Test it out!

In [26]:
calculate_sentiment('Prejudice is awful.')

-0.743

## Calculate Sentiment Scores for Donald Trump's Tweets

Let's try using VADER on many texts: Donald Trump's corpus of tweets from 2009 to 2021 (when he was suspended from Twitter).

We will load the Trump tweets CSV file with pandas, parsing the "date" column as dates.

In [32]:
trump_df = pd.read_csv('Trump-Tweets.csv', parse_dates=['date'])

To calculate the sentiment for each tweet in the CSV file and add a new column that contains this information, we will apply a function that will take in any text and output the compound sentiment score.

Apply the function to every row in the "text" column and store the results in a new column, "sentiment_score".


In [12]:
trump_df['sentiment_score'] = trump_df['text']  # Your code here
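As a hint for how a function is applied to a whole column, here is a small sketch on made-up data. `toy_df` and `toy_sentiment` are hypothetical stand-ins (a crude word counter, not VADER), used only so that the example runs without the tweet file.

```python
import pandas as pd

# Hypothetical stand-in for calculate_sentiment(); it is NOT VADER
def toy_sentiment(text):
    positive_words = {"great", "love"}
    negative_words = {"awful", "sad"}
    words = text.lower().split()
    # Count positive words minus negative words
    return sum(w in positive_words for w in words) - sum(w in negative_words for w in words)

toy_df = pd.DataFrame({"text": ["What a great day", "This is awful", "Nothing to report"]})

# .apply() calls the function once for each value in the column
toy_df["sentiment_score"] = toy_df["text"].apply(toy_sentiment)
print(toy_df)
```

With the lesson's data, the same pattern would pass your completed `calculate_sentiment` function to `.apply()` on the "text" column.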

Let's sort the DataFrame and examine the top 10 tweets with the highest compound sentiment.

In [None]:
# Sort and find the 10 highest-scoring tweets
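One way to approach this is to sort in descending order and keep the first rows. The sketch below does so on a made-up DataFrame (`toy_df` and its values are invented for illustration) and keeps two rows rather than ten.

```python
import pandas as pd

# Made-up data for illustration only
toy_df = pd.DataFrame({
    "text": ["a", "b", "c", "d"],
    "sentiment_score": [0.10, 0.95, -0.60, 0.40],
})

# Sort from highest to lowest score, then keep the first 2 rows
top_two = toy_df.sort_values(by="sentiment_score", ascending=False).head(2)
print(top_two)
```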

How well do you think VADER is working on Trump's tweets? What patterns do you notice in these most positive tweets?

Let's sort the DataFrame and examine the 10 tweets with the lowest compound sentiment.

In [None]:
# Sort and find the 10 lowest-scoring tweets
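For the lowest scores, sorting in ascending order works, and pandas also offers `.nsmallest()` as a shortcut. A minimal sketch on made-up data (`toy_df` is invented for illustration):

```python
import pandas as pd

# Made-up data for illustration only
toy_df = pd.DataFrame({
    "text": ["a", "b", "c", "d"],
    "sentiment_score": [0.10, 0.95, -0.60, 0.40],
})

# Keep the 2 rows with the smallest scores, lowest first
lowest_two = toy_df.nsmallest(2, "sentiment_score")
print(lowest_two)
```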

### Plot Sentiment Over Time

We can plot how the sentiment of Trump's tweets fluctuates over time. First, we make the date column the index of the DataFrame, which makes it easier to work with time series data.

In [34]:
trump_df = trump_df.set_index('date')

Then we will group the tweets by month using `.resample()`, a special method for datetime indices, and calculate the average (`.mean()`) compound score for each month. Finally, we will plot these averages.

In [None]:
trump_df.resample('M')['sentiment_score'].mean().plot(title="Trump's Tweet Sentiment by Month")
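To see what the grouping step produces before it is plotted, here is a small sketch on made-up scores. `toy_scores` is invented for illustration. The sketch uses the `'MS'` (month start) alias instead of `'M'` purely as an assumption-driven choice: newer pandas releases may emit a deprecation warning for `'M'`, and `'MS'` groups by the same calendar months but labels each group with the first day of the month.

```python
import pandas as pd

# Made-up scores with a datetime index, for illustration only
toy_scores = pd.DataFrame(
    {"sentiment_score": [0.5, -0.1, 0.3, 0.9, -0.7]},
    index=pd.to_datetime(
        ["2020-01-03", "2020-01-20", "2020-02-05", "2020-02-25", "2020-03-10"]
    ),
)

# Group rows by calendar month and average the scores in each month
monthly = toy_scores.resample("MS")["sentiment_score"].mean()
print(monthly)
```

The result is a Series with one averaged value per month, which is what `.plot()` then draws as a line.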

By using `.loc`, we can also zoom in on particular time periods, such as the year 2020.

In [None]:
trump_df.loc["2020"].resample('W')['sentiment_score'].mean().plot(title="Trump's Tweet Sentiment by Week in 2020");
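Selecting with `.loc["2020"]` relies on the index being a datetime index: a year given as a string matches every row from that year. The sketch below shows this on made-up data (`toy_df` is invented for illustration) and then averages by week without plotting.

```python
import pandas as pd

# Made-up scores with a sorted datetime index, for illustration only
toy_df = pd.DataFrame(
    {"sentiment_score": [0.1, 0.4, 0.8, -0.2]},
    index=pd.to_datetime(["2019-12-30", "2020-01-01", "2020-01-02", "2020-01-08"]),
)

# A year string selects all rows whose date falls in that year
only_2020 = toy_df.loc["2020"]

# 'W' groups rows into weeks (by default, weeks ending on Sunday)
weekly = only_2020.resample("W")["sentiment_score"].mean()
print(weekly)
```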

## Zoom In and Explore

In [None]:
trump_df.loc["2020"]

- How well do you think VADER sentiment analysis works with tweets?
- What do you think VADER is really measuring?
- Could you imagine using sentiment analysis in a project? If so, how?