# Requirements
* [VADER](https://github.com/cjhutto/vaderSentiment) - `pip install vaderSentiment`

In a previous notebook, we saved our dataset to a `.csv` file called `RE0_Data.csv`. We can use this file to create a new `pandas.DataFrame`.

In [16]:
import os # Allows us to retrieve file paths
import pandas # Needed to open and read the dataset

def get_path_to_file(file: str) -> str:
    ''' Returns the absolute path to a file, resolved relative to
        the current working directory.
        Note that the function will still return a path even if the file
        doesn't exist in that directory.'''
    return os.path.realpath(file)

re0_data = pandas.read_csv(get_path_to_file('RE0_Data.csv'), encoding='utf-8')
re0_data

Unnamed: 0,Character,Gender,Line
0,Narrator,,A small mid-western town in America: Raccoon C...
1,Man,Male,Really?
2,Woman,Female,"Hmm, do you think so too?"
3,Bald Man,Male,Yeah.
4,White Man,Male,.do about it?
...,...,...,...
228,Billy,Male,"Rebecca, hurry!"
229,Rebecca,Female,Hey that must be the old mansion Enrico was ta...
230,Rebecca,Female,"I guess it's time to say goodbye. Officially, ..."
231,Billy,Male,"Yeah, I'm just a zombie now."
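
The `Unnamed: 0` column in the output above is what pandas shows when a CSV's first column has no header, which typically happens when a DataFrame's index was written out along with the data. A minimal sketch, assuming that is what happened here, of reading that column back as the row index with `index_col=0`. The in-memory CSV below is a hypothetical stand-in for `RE0_Data.csv`, not the real file:

```python
import io
import pandas

# Hypothetical stand-in for RE0_Data.csv: the header row starts with an empty field.
csv_text = (
    ",Character,Gender,Line\n"
    "0,Man,Male,Really?\n"
    "1,Bald Man,Male,Yeah.\n"
)

# Default read: the unnamed first column is kept as a regular column.
with_extra_column = pandas.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the row index instead.
with_index = pandas.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra_column.columns))
print(list(with_index.columns))
```

Either form works for the analysis that follows; the second simply keeps the saved index out of the data columns.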


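VADER generates four sentiment scores: `neg`, `neu`, `pos`, and `compound`. The `neg`, `neu`, and `pos` scores are the proportions of the text that fall into each category, while `compound` represents the overall sentiment: it is the sum of the valence scores of the words in the text, adjusted by VADER's rules and normalized to lie between -1 (most negative) and +1 (most positive). So if we want a single measure of a character's emotions, the `compound` score is the most useful metric.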
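
For reference, the VADER README describes `compound` as the sum of the rule-adjusted word valences, normalized to lie between -1 and +1. Below is a minimal sketch of that normalization step only, assuming the library's default constant `alpha = 15`. The function name `normalize_valence_sum` and the input values are illustrative and not part of the library's API:

```python
import math

def normalize_valence_sum(valence_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the range [-1, 1]."""
    normalized = valence_sum / math.sqrt(valence_sum ** 2 + alpha)
    # Clamp defensively so the result never leaves [-1, 1].
    return max(-1.0, min(1.0, normalized))

# Illustrative valence sums (not taken from the dataset).
for valence_sum in (0.0, 1.0, -1.0, 6.0):
    print(valence_sum, round(normalize_valence_sum(valence_sum), 4))
```

The curve is steep near zero and flattens towards the extremes, so a line with a few mildly charged words can still move noticeably away from 0.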

In [19]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
re0_data['Sentiment'] = re0_data['Line'].apply(lambda x: analyzer.polarity_scores(x)['compound'])
re0_data

Unnamed: 0,Character,Gender,Line,Sentiment
0,Narrator,,A small mid-western town in America: Raccoon C...,-0.3034
1,Man,Male,Really?,0.0000
2,Woman,Female,"Hmm, do you think so too?",0.0000
3,Bald Man,Male,Yeah.,0.2960
4,White Man,Male,.do about it?,0.0000
...,...,...,...,...
228,Billy,Male,"Rebecca, hurry!",0.0000
229,Rebecca,Female,Hey that must be the old mansion Enrico was ta...,0.0000
230,Rebecca,Female,"I guess it's time to say goodbye. Officially, ...",-0.6486
231,Billy,Male,"Yeah, I'm just a zombie now.",0.2960


## Question 1: What are the average sentiments for each character?

In [33]:
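def label_sentiment_score(score: float) -> str:
    ''' Converts a VADER compound score into a literal label
        based on the typical threshold values of +/-0.05.
        Note that VADER's documentation treats the thresholds as
        inclusive (>= 0.05, <= -0.05); strict comparisons are used here. '''
    if score > 0.05:
        return 'Positive'
    elif score < -0.05:
        return 'Negative'
    return 'Neutral'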

average_sentiment_scores = round(re0_data.groupby('Character')['Sentiment'].mean(), 2)
average_sentiment_scores = average_sentiment_scores.apply(label_sentiment_score)
average_sentiment_scores

Character
Bald Man            Positive
Billy                Neutral
Birkin              Positive
Black Man            Neutral
Commander           Negative
Computer            Negative
Edward              Negative
Enrico              Negative
Man                  Neutral
Man With Glasses    Negative
Marcus              Negative
Narrator            Negative
Queen Leech         Positive
Rebecca              Neutral
Soldier             Positive
Wesker              Negative
White Man            Neutral
Woman                Neutral
Name: Sentiment, dtype: object
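
Because the means are rounded to two decimals before they are labeled, a mean just above 0.05 can land exactly on the threshold and be labeled `Neutral`. A small sketch with a made-up score (not taken from the dataset) illustrating the effect; the function is a copy of `label_sentiment_score` so that the block runs on its own:

```python
def label_sentiment_score(score: float) -> str:
    """Copy of the notebook's labeling function (strict comparisons)."""
    if score > 0.05:
        return 'Positive'
    elif score < -0.05:
        return 'Negative'
    return 'Neutral'

mean_score = 0.054  # hypothetical average compound score

label_unrounded = label_sentiment_score(mean_score)
label_rounded = label_sentiment_score(round(mean_score, 2))

print(label_unrounded)  # Positive
print(label_rounded)    # Neutral
```

If borderline characters matter for the analysis, one option is to label the unrounded means and round only for display.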

## Question 2: How are women emotionally characterized compared to men?

In [37]:
average_sentiment_scores_by_gender = round(re0_data.groupby('Gender')['Sentiment'].mean(), 2)
average_sentiment_scores_by_gender = average_sentiment_scores_by_gender.apply(label_sentiment_score)
average_sentiment_scores_by_gender

Gender
Female    Neutral
Male      Neutral
Name: Sentiment, dtype: object
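
Both groups average out to `Neutral`, which hides how far apart the underlying means are and how many lines each mean is based on. A minimal sketch, using a small hypothetical frame in place of `re0_data`, of reporting the numeric mean and the line count per gender. Note that `groupby` drops rows whose key is missing by default, so lines with no `Gender` (such as the Narrator's) are excluded unless `dropna=False` is passed (available in pandas 1.1 and later):

```python
import pandas

# Hypothetical stand-in for re0_data; the scores are made up.
sample = pandas.DataFrame({
    'Gender': ['Male', 'Female', None, 'Male', 'Female'],
    'Sentiment': [0.30, -0.10, -0.30, -0.20, 0.10],
})

# Mean compound score and number of lines per gender (missing genders dropped).
summary = sample.groupby('Gender')['Sentiment'].agg(['mean', 'count'])
print(summary.round(2))

# Same summary, keeping rows with a missing gender as their own group.
summary_with_missing = (
    sample.groupby('Gender', dropna=False)['Sentiment'].agg(['mean', 'count'])
)
print(summary_with_missing.round(2))
```

Seeing the counts next to the means makes it easier to judge whether a difference between groups rests on many lines or only a handful.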