# Sentiment analysis

In [1]:
import pandas as pd

In [2]:
# Loading sample data
df = pd.read_csv('twitter_dataset.csv')

## Detecting the language of a text

Source: langdetect (https://pypi.python.org/pypi/langdetect)

Step 1. In Terminal or Anaconda Prompt, install:

```pip install langdetect```


Step 2. In your Notebook, include the following code:

In [3]:
from langdetect import detect

In [4]:
df['langdetect'] = df['text'].apply(detect)

In [5]:
df['langdetect'].value_counts()

en    2418
it      30
es      30
de      24
ca      17
fr       9
tl       4
nl       2
id       1
Name: langdetect, dtype: int64

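### Backup option if the direct call to `detect` above fails

`detect` raises an exception when a text has no usable features (for example an empty string or digits only), which stops the whole `apply`. Wrapping the call lets those rows be recorded as `'error'` instead.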

In [8]:
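from langdetect.lang_detect_exception import LangDetectException

def apply_langdetect(text):
    """Return the detected language code, or 'error' if detection fails."""
    text = str(text)  # NaN and other non-string values become strings
    try:
        lang = detect(text)
    except LangDetectException:
        # Raised when the text has no usable features (e.g. empty or digits only)
        lang = 'error'
    return lang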

In [9]:
df['langdetect2'] = df['text'].apply(apply_langdetect)

In [10]:
df['langdetect2'].value_counts()

en    2418
it      34
es      30
de      21
ca      13
fr      11
tl       3
nl       2
id       2
af       1
Name: langdetect2, dtype: int64

## Sentiment Analysis in English

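Several packages perform sentiment analysis on English text. VADER (https://github.com/cjhutto/vaderSentiment) is one of them, and a version of it is included in the NLTK library. For each text it returns the proportions of negative, neutral and positive content plus a normalized `compound` score between -1 and 1.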

Step 1. In Terminal or Anaconda Prompt, install:

```conda install nltk```

Step 2. Run the following code in your notebook:

In [7]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')  # one-time download of the lexicon VADER needs
sid = SentimentIntensityAnalyzer()

In [9]:
testtext = 'I love this'
sid.polarity_scores(testtext)

{'compound': 0.6369, 'neg': 0.0, 'neu': 0.192, 'pos': 0.808}
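
The `compound` value is the usual single-number summary of the result. The VADER README suggests thresholds of 0.05 and -0.05 for labelling a text as positive, negative or neutral. A minimal sketch of that convention follows; the function name `label_from_compound` is hypothetical and not part of NLTK:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# 0.6369 is the compound score of 'I love this' from the cell above
print(label_from_compound(0.6369))  # → positive
```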

In [10]:
def vader_text(row):
    # IMPORTANT: change 'text' to the name of the column you need to analyse
    text = str(row['text'])
    try:
        results = sid.polarity_scores(text)
        row['vader_compound'] = results['compound']
        row['vader_negative'] = results['neg']
        row['vader_neutral'] = results['neu']
        row['vader_positive'] = results['pos']
    except Exception:
        # Leave this row without scores if scoring fails
        pass

    return row


In [11]:
df = df.apply(vader_text, axis=1)
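
Row-wise `apply` works, but it rebuilds every row and can be slow on larger datasets. One alternative, sketched here under assumptions, is to score the text column once and expand the resulting dictionaries into columns. `toy_scorer` is a hypothetical stand-in for `sid.polarity_scores` so the sketch runs without NLTK, and `df_demo` is illustrative data:

```python
import pandas as pd

def toy_scorer(text):
    # Stand-in for sid.polarity_scores; returns the same keys with fixed values
    return {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

df_demo = pd.DataFrame({'text': ['I love this', None]})

# Score the text column once, then turn the list of dicts into columns
scores = pd.DataFrame(df_demo['text'].astype(str).map(toy_scorer).tolist(),
                      index=df_demo.index)
scores = scores.rename(columns={'compound': 'vader_compound',
                                'neg': 'vader_negative',
                                'neu': 'vader_neutral',
                                'pos': 'vader_positive'})
df_demo = df_demo.join(scores)
print(df_demo.columns.tolist())
```

In the notebook itself, `toy_scorer` would be replaced by `sid.polarity_scores` and `df_demo` by `df`.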

In [12]:
df[['vader_compound', 'vader_negative', 'vader_neutral', 'vader_positive']].describe()

| | vader_compound | vader_negative | vader_neutral | vader_positive |
|---|---|---|---|---|
| count | 2535.0 | 2535.0 | 2535.0 | 2535.0 |
| mean | 0.141392 | 0.009686 | 0.92257 | 0.067746 |
| std | 0.257358 | 0.038054 | 0.10742 | 0.100505 |
| min | -0.836 | 0.0 | 0.448 | 0.0 |
| 25% | 0.0 | 0.0 | 0.853 | 0.0 |
| 50% | 0.0 | 0.0 | 1.0 | 0.0 |
| 75% | 0.3612 | 0.0 | 1.0 | 0.146 |
| max | 0.9136 | 0.453 | 1.0 | 0.532 |
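
The summary above covers all 2535 rows, including those detected as non-English earlier. Because VADER's lexicon is English, one option is to restrict the summary to rows detected as English. A minimal sketch, using a hypothetical three-row frame with placeholder scores rather than the real dataset:

```python
import pandas as pd

# Hypothetical miniature of the notebook's DataFrame; scores are placeholders
df_demo = pd.DataFrame({
    'text': ['I love this', 'Ik vind dit erg', 'What a bad day'],
    'langdetect2': ['en', 'nl', 'en'],
    'vader_compound': [0.6, 0.0, -0.5],
})

# Keep only the rows detected as English before summarising VADER scores
english_only = df_demo[df_demo['langdetect2'] == 'en']
print(english_only['vader_compound'].describe())
```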


## Sentiment Analysis in EN, FR, IT, NL

Pattern (https://www.clips.uantwerpen.be/pages/pattern) is an NLP package with modules for English, Spanish, German, French, Italian and Dutch; this section uses its sentiment analysis for English, French, Italian and Dutch. Pattern is still in beta for Python 3, so the installation is a bit more complex. The steps below are for macOS (OS X). If you have a Windows computer and they do not work, please come to the office hours.

Step 1. In Terminal or Anaconda Prompt, install:

```
git clone -b development https://github.com/clips/pattern
cd pattern
sudo python3.6 setup.py install
```

Step 2. Run the following code in your notebook:

In [2]:
from pattern3.nl import sentiment  # change .nl to .en, .fr or .it depending on the language needed

In [3]:
testtext = 'Ik vind dit erg'
sentiment(testtext)

(-0.35, 1.0)
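
`sentiment` returns a `(polarity, subjectivity)` tuple. According to Pattern's documentation, polarity ranges from -1.0 (negative) to 1.0 (positive) and subjectivity from 0.0 (objective) to 1.0 (subjective). The sketch below unpacks the output shown above; `describe_sentiment` is a hypothetical helper, and its cut-offs (0 for polarity, 0.5 for subjectivity) are assumptions chosen for illustration:

```python
# Output of sentiment('Ik vind dit erg') copied from the cell above
polarity, subjectivity = (-0.35, 1.0)

def describe_sentiment(polarity, subjectivity):
    """Turn Pattern's two numbers into a short readable summary."""
    if polarity < 0:
        tone = 'negative'
    elif polarity > 0:
        tone = 'positive'
    else:
        tone = 'neutral'
    # Assumed cut-off: above 0.5 counts as subjective
    stance = 'subjective' if subjectivity > 0.5 else 'objective'
    return f'{tone}, {stance}'

print(describe_sentiment(polarity, subjectivity))  # → negative, subjective
```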

In [15]:
def pattern_text(row):
    # IMPORTANT: change 'text' to the name of the column you need to analyse
    text = str(row['text'])
    try:
        # sentiment() returns a (polarity, subjectivity) tuple
        row['pattern_polarity'], row['pattern_subjectivity'] = sentiment(text)
    except Exception:
        # Leave this row without scores if scoring fails
        pass

    return row

In [16]:
df = df.apply(pattern_text, axis=1)
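
Because `sentiment` was imported from the `.nl` module, the call above scores every row with the Dutch model, whatever language the row is in. A sketch of picking the module from the language code detected earlier follows. Both helper names are hypothetical, the `pattern3.<code>` module path is taken from the import above, and the set of languages is the one this section names:

```python
import importlib

# Languages this section uses Pattern for; codes match langdetect's output
PATTERN_LANGS = {'en', 'fr', 'it', 'nl'}

def pattern_module_name(lang_code):
    """Return the module path for a language, or None if it is not covered."""
    return f'pattern3.{lang_code}' if lang_code in PATTERN_LANGS else None

def sentiment_for(text, lang_code):
    """Score text with the matching module; (None, None) if not covered."""
    name = pattern_module_name(lang_code)
    if name is None:
        return (None, None)
    module = importlib.import_module(name)  # requires pattern3 to be installed
    return module.sentiment(str(text))

print(pattern_module_name('nl'), pattern_module_name('de'))
```

In the notebook this could be applied row by row with the `langdetect2` column as `lang_code`, leaving rows in other languages unscored.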

In [17]:
df[['pattern_polarity', 'pattern_subjectivity']].describe()

| | pattern_polarity | pattern_subjectivity |
|---|---|---|
| count | 2535.0 | 2535.0 |
| mean | 0.020429 | 0.107793 |
| std | 0.192833 | 0.295896 |
| min | -0.75 | 0.0 |
| 25% | 0.0 | 0.0 |
| 50% | 0.0 | 0.0 |
| 75% | 0.0 | 0.0 |
| max | 1.0 | 1.0 |
