# Sentiment Analysis

![](../figs/intro_nlp/sa/entelecheia_smile.png)

## What is sentiment analysis

- Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral. 
- It is also known as opinion mining, because it derives the opinion or attitude of a speaker or writer.
- It applies natural language processing, text analysis, computational linguistics, and machine learning to identify and extract subjective information in source materials such as reviews, comments, and news articles.
- The goal of sentiment analysis is to determine the attitude of a speaker or writer toward some topic, or the overall polarity of, or emotional reaction to, a document, interaction, or event.

## Why sentiment analysis

- Sentiment analysis helps businesses understand how customers feel about their brand, product, or service.
- It can also be used to track how customers feel about competitors' brands and products.
- More broadly, it can be used to gauge the attitude of the general public toward particular issues.

## Sentiment analysis types

Depending on the scale of the sentiment analysis, there are three types of sentiment analysis:

- **Document-level sentiment analysis**: This is the most common type of sentiment analysis. It determines the overall sentiment of a whole document.
- **Sentence-level sentiment analysis**: This type determines the sentiment of each sentence within a document.
- **Aspect-level sentiment analysis**: This type determines the sentiment expressed toward a specific aspect mentioned in a document, for example the battery life in a phone review.

These three types of sentiment analysis can be grouped into two categories:

- **Coarse-grained sentiment analysis**: This category determines the overall sentiment of a document. It corresponds to document-level sentiment analysis.
- **Fine-grained sentiment analysis**: This category works below the document level and determines the sentiment of individual sentences or of specific aspects. It corresponds to sentence-level and aspect-level sentiment analysis.
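
The difference between the levels can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon (`TOY_LEXICON`, hypothetical and not taken from any real resource) and a naive sentence splitter. It is meant only to show how one review can yield a single document-level score or several sentence-level scores.

```python
import re

# Hypothetical toy lexicon, for illustration only (not a real resource).
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def score_text(text):
    """Sum the toy valence of every known word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(word, 0.0) for word in words)


def sentence_level(document):
    """Score each sentence separately (naive split on '.', '!' and '?')."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]
    return [(sentence, score_text(sentence)) for sentence in sentences]


def document_level(document):
    """Score the document as a whole."""
    return score_text(document)


review = "The acting was great. The plot was terrible. The soundtrack was good."
per_sentence = sentence_level(review)
overall = document_level(review)

for sentence, score in per_sentence:
    print(f"{score:+.1f}  {sentence}")
print(f"document: {overall:+.1f}")
```

The document-level score hides the fact that the review is positive about the acting and negative about the plot. An aspect-level system would go one step further and attach each score to its target ("acting", "plot", "soundtrack").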



Or, depending on the number of classes, there are two types of sentiment analysis:

- **Binary sentiment analysis**: This type of sentiment analysis is used to determine whether a document is positive or negative.
- **Multi-class sentiment analysis**: This type of sentiment analysis assigns a document to one of more than two classes, for example positive, negative, or neutral, or a finer scale such as five levels from very negative to very positive.


## How sentiment analysis works

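Sentiment analysis is most often framed as a supervised machine learning task: a model is trained on a dataset of texts that are already labelled with their sentiment, and is then used to predict the sentiment of new texts. Lexicon- and rule-based approaches, which score a text using a dictionary of word valences instead of labelled training data, are a common alternative.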
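
The train-then-predict workflow can be sketched with a small multinomial Naive Bayes classifier written in plain Python. The classifier choice, the six training sentences in `TRAIN`, and the helper names are all assumptions made for illustration. A real project would use a much larger labelled dataset and an established library.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (text, label) pairs.
TRAIN = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("I loved this movie", "pos"),
    ("a dull and boring film", "neg"),
    ("terrible acting and a weak story", "neg"),
    ("I hated this movie", "neg"),
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in class_counts:
        logp = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            logp += math.log((count + 1) / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train_naive_bayes(TRAIN)
print(predict("a great and moving story", model))
print(predict("a dull and weak film", model))
```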

## Sentiment analysis datasets

There are many datasets available for sentiment analysis. Here are a few:

* [IMDB movie reviews](http://ai.stanford.edu/~amaas/data/sentiment/)
* [Amazon product reviews](http://jmcauley.ucsd.edu/data/amazon/)
* [Twitter US Airline Sentiment](https://www.kaggle.com/crowdflower/twitter-airline-sentiment)
* [Sentiment140](http://help.sentiment140.com/for-students/)
* [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/index.html)
* [Yelp reviews](https://www.yelp.com/dataset/challenge)
* [SemEval-2017 Task 4](http://alt.qcri.org/semeval2017/task4/)



## Sentiment analysis with NLTK

NLTK ships a `SentimentIntensityAnalyzer` class (in `nltk.sentiment.vader`) that implements VADER, a lexicon- and rule-based approach to sentiment scoring. Its `polarity_scores` method returns four values: `neg`, `neu`, and `pos`, the proportions of the text that fall into each category, and `compound`, an overall polarity score in the range [-1.0, 1.0]. The compound score is the sum of the valence scores of the words in the sentence, adjusted according to the rules (for example for negation and intensifiers), and then normalized to lie between -1.0 and 1.0. Positive values indicate positive valence; negative values indicate negative valence.
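
The normalization step can be sketched as follows. To my understanding, the VADER reference implementation maps a raw valence sum `x` to `x / sqrt(x^2 + alpha)` with `alpha = 15`. Treat both the formula and the constant as assumptions to check against the library source. The function below is an illustration, not the NLTK code itself.

```python
import math


def normalize(raw_score, alpha=15.0):
    """Squash an unbounded valence sum into [-1, 1].

    alpha controls how quickly the result saturates; 15 is assumed here
    to match the VADER reference implementation.
    """
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    # Clamp defensively so rounding can never push the value out of range.
    return max(-1.0, min(1.0, norm))


for raw in (-4.0, -1.9, 0.0, 1.9, 4.0):
    print(raw, round(normalize(raw), 4))
```

Under this formula a raw sum of 1.9 maps to about 0.4404, and larger sums approach 1.0 without ever exceeding it.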

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
sid = SentimentIntensityAnalyzer()
sid.polarity_scores("This is a good movie")
```

    {'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.4404}

```python
sid.polarity_scores("This is a bad movie")
```

    {'neg': 0.583, 'neu': 0.417, 'pos': 0.0, 'compound': -0.4404}

```python
sid.polarity_scores("This is a bad movie but it has good acting")
```

    {'neg': 0.0, 'neu': 0.667, 'pos': 0.333, 'compound': 0.4404}
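
To turn a compound score into a class label, a common convention is to treat scores at or above 0.05 as positive, scores at or below -0.05 as negative, and everything in between as neutral. The sketch below assumes that convention. The `label_from_compound` helper and its `threshold` parameter are hypothetical names, and the cut-off should be tuned for the data at hand.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a three-class label.

    threshold is an assumed cut-off; adjust it for your data.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores taken from the outputs shown above, plus a neutral case.
for compound in (0.4404, -0.4404, 0.0):
    print(compound, label_from_compound(compound))
```

Dropping the neutral band (setting `threshold=0.0` and deciding how to handle an exact zero) turns the same idea into a binary classifier.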