# Emotional AI: Feature Engineering

Analysis by Frank Flavell

## How do people communicate about their emotions through text?
By identifying the way people communicate about their emotions, we can then determine specific features to engineer that could capture each use case.

* ***Specific Emotion Words:*** Ideally the person uses specific words that get right at the emotion they currently feel as well as the causes and consequences of that emotion.  

    * Example: "I feel alienated because my friends didn't invite me to a concert and I'm not sure if they like me."
<br/><br/>
* ***Causes:*** The person can articulate the cause of the emotion they feel without having identified the emotion that has been caused.  This is like knowing the definition of a word without knowing the actual word.  The emotion must be inferred using the context of the message.

    * Example: "My friends didn't invite me to a concert."
<br/><br/>
* ***Consequences:*** Similar to causes, the person can articulate the consequences of the emotion they are feeling or how it is affecting them and/or others.  Causes and consequences are not mutually exclusive and it is sometimes easier to see the consequences of an emotion without understanding the cause.

    * Example: "I'm not sure if my friends like me."
<br/><br/>
* ***Inchoate:*** The person knows they are feeling something and that it is having an impact on them, but they haven't yet been able to articulate even the consequences of the emotion.  They use vague emotion words that signal a positive or negative feeling, like "good" or "bad".

    * Example: "I'm feeling bad."
<br/><br/>
* ***Buried, but Willing:*** In the near-worst case, which is probably to be expected, the person isn't even thinking about their emotions.  They are willing to discuss them, but they aren't aware of how they feel or of the causes and consequences of those emotions.  They most likely mask their true feelings unwittingly by using vague emotion words like "fine," "okay," "alright," etc.

    * Example: "I'm fine."
<br/><br/>
* ***Buried, but Unwilling:*** In the worst case, the person is unaware of their feelings and unwilling to talk about them.

    * Example: "Can't talk right now."
<br/><br/>

## What new features can be engineered to match communication?
In order of development priority.

* ***Emotion Score:*** Calculate the emotional intensity of keywords in order to prioritize one emotion over the others.
<br/><br/>
* ***Key Punctuation:*** Counts of key punctuation marks (? ! .) could help us identify the emotion and possibly its intensity.
<br/><br/>
* ***Parts of Speech:*** Extract the parts of speech tags to gain a more accurate contextual understanding of the person's message to help with emotion words and synonyms.
<br/><br/>
* ***Capitalization Ratio:*** The number of capital letters divided by the number of words in the utterance.
<br/><br/>

The following brute-force methods lead to data leakage.

* ***Emotion Words:*** Connect the person's words back to a specific emotion using an emotion lexicon with intensity scores.
<br/><br/>
* ***Synonyms:*** Use synonyms of the person's keywords to connect the message to a specific emotion.
<br/><br/>


## Table of Contents<span id="0"></span>

1. [**Emotion Score Calculator**](#1)
<br/><br/>
2. [**Punctuation Counter**](#2)
<br/><br/>
3. [**Parts of Speech**](#3)
<br/><br/>
4. [**Capitalization Ratio**](#4)
<br/><br/>


# Package Import

In [2]:
# import external libraries
import pandas as pd
import numpy as np
import matplotlib as cm
import matplotlib.pyplot as plt
import seaborn as sns

from collections import Counter
import re  # regex
import random

import nltk
from nltk.probability import FreqDist
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import linear_kernel
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')

# Configure matplotlib for jupyter.
%matplotlib inline

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/matthewflavell/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/matthewflavell/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/matthewflavell/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


# Data Import

In [4]:
#Import cleaned data from pickle
df = pd.read_pickle('data/dialogue_cleaned.pickle')

In [5]:
df.head(2)

| Unnamed: 0 | dialogue | topic | emotion | type |
|---|---|---|---|---|
| 0 | The kitchen stinks. | 1 | 2 | 3 |
| 1 | I’ll throw out the garbage. | 1 | 0 | 4 |


# <span id="1"></span>1. Emotion Score Calculator
#### [Return Contents](#0)
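
The emotion score feature described earlier could be sketched roughly as follows. This is a minimal illustration, not the code used in this analysis: it assumes a lexicon that maps each word to an emotion and an intensity, and the names `TOY_LEXICON`, `emotion_scores`, and `top_emotion`, along with every word and intensity value in the toy lexicon, are hypothetical placeholders.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: word -> (emotion, intensity between 0 and 1).
# A real emotion-intensity lexicon would replace these made-up entries.
TOY_LEXICON = {
    "alienated": ("sadness", 0.7),
    "lonely": ("sadness", 0.8),
    "furious": ("anger", 0.9),
    "annoyed": ("anger", 0.4),
    "thrilled": ("joy", 0.9),
}


def emotion_scores(text, lexicon=TOY_LEXICON):
    """Sum the intensity of every lexicon word found in the text, per emotion."""
    scores = defaultdict(float)
    # Lowercase and keep only alphabetic tokens (apostrophes allowed).
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in lexicon:
            emotion, intensity = lexicon[token]
            scores[emotion] += intensity
    return dict(scores)


def top_emotion(text, lexicon=TOY_LEXICON):
    """Return the highest-scoring emotion, or None if no lexicon word matches."""
    scores = emotion_scores(text, lexicon)
    return max(scores, key=scores.get) if scores else None
```

Summing intensities is only one possible aggregation; taking the maximum or the mean per emotion would be equally plausible. An utterance with no lexicon match, such as "I'm fine.", yields no score at all, which is itself a signal worth keeping.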

# <span id="2"></span>2. Punctuation Counter
#### [Return Contents](#0)
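
A minimal sketch of the punctuation feature, assuming that a simple count of each key mark (? ! .) per utterance is what is wanted. The names `KEY_PUNCTUATION` and `punctuation_counts` are illustrative and do not come from the original notebook.

```python
# The three marks named in the feature list above.
KEY_PUNCTUATION = ("?", "!", ".")


def punctuation_counts(text):
    """Count how often each key punctuation mark appears in one utterance."""
    return {mark: text.count(mark) for mark in KEY_PUNCTUATION}
```

Applied to the `dialogue` column, each utterance would produce one count per mark, which could then be stored as three separate numeric columns.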

# <span id="3"></span>3. Parts of Speech
#### [Return Contents](#0)
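
One way the parts-of-speech feature might be approached is to tag each utterance and then count coarse categories. The sketch below is an assumption-laden illustration: `tag_utterance` relies on NLTK's `word_tokenize` and `pos_tag`, which need tokenizer and tagger resources to be downloaded first (the resource names vary by NLTK version), and the grouping of Penn Treebank tags in `COARSE_PREFIXES` is a hypothetical choice rather than the author's.

```python
from collections import Counter

# Assumed grouping of Penn Treebank tag prefixes into coarse categories.
COARSE_PREFIXES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
    "PRP": "pronoun",
}


def coarse_pos_counts(tagged_tokens):
    """Count coarse POS categories from an iterable of (token, tag) pairs."""
    counts = Counter()
    for _, tag in tagged_tokens:
        for prefix, label in COARSE_PREFIXES.items():
            if tag.startswith(prefix):
                counts[label] += 1
                break  # each token falls into at most one category
    return dict(counts)


def tag_utterance(text):
    """Tokenize and tag one utterance with NLTK (resources must be installed)."""
    import nltk  # imported lazily so the counting helper works without NLTK

    return nltk.pos_tag(nltk.word_tokenize(text))
```

Keeping the counting step separate from the tagging step means the counts can be checked with hand-written tags, and a different tagger could be swapped in later.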

# <span id="4"></span>4. Capitalization Ratio
#### [Return Contents](#0)
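
The capitalization ratio defined above (capital letters divided by the number of words) can be sketched in a few lines. Splitting words on whitespace and returning 0.0 for an empty utterance are assumptions made for this illustration, and the function name `capitalization_ratio` is hypothetical.

```python
def capitalization_ratio(text):
    """Return the number of capital letters divided by the number of words."""
    words = text.split()  # assumption: words are whitespace-separated
    if not words:
        return 0.0  # assumption: an empty utterance gets a ratio of zero
    capitals = sum(1 for ch in text if ch.isupper())
    return capitals / len(words)
```

Because the numerator counts letters and the denominator counts words, the ratio can exceed 1 for shouted text: "I am SO MAD" has six capital letters across four words.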

# <span id="5"></span>5. Synonym Generator
#### [Return Contents](#0)
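
A rough sketch of how synonyms could connect a keyword to an emotion, assuming WordNet as the synonym source (the `wordnet` corpus is downloaded in the import cell). The function names, the `emotion_words` mapping, and the `synonym_fn` parameter are hypothetical, introduced here only for illustration.

```python
def wordnet_synonyms(word):
    """Collect WordNet lemma names for a word (needs NLTK's 'wordnet' corpus)."""
    from nltk.corpus import wordnet  # imported lazily; only needed for real lookups

    return {
        lemma.name().replace("_", " ").lower()
        for synset in wordnet.synsets(word)
        for lemma in synset.lemmas()
    }


def emotions_via_synonyms(keywords, emotion_words, synonym_fn=wordnet_synonyms):
    """Map each keyword to the emotions reachable through it or its synonyms.

    emotion_words is a hypothetical mapping of word -> emotion label.
    """
    matches = {}
    for keyword in keywords:
        candidates = {keyword.lower()} | set(synonym_fn(keyword))
        found = {emotion_words[c] for c in candidates if c in emotion_words}
        if found:
            matches[keyword] = found
    return matches
```

Passing the lookup in as `synonym_fn` keeps the matching logic independent of WordNet, so another thesaurus could be substituted. As noted in the feature list, this kind of direct word-to-emotion matching is a brute-force method that leads to data leakage, so it should be used with care.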

# <span id="6"></span>6. Emotion Score Calculator
#### [Return Contents](#0)