# Inferencing a chat dialogue as one feature

This notebook covers inference on the features that are observed while a session is streaming.

The features to retrieve during streaming (that is, the ones unique to each session interaction) include:
1. Dialogue
    - Candidate's speech behaviour throughout the conversation
2. Candidate's input form details

The first section is a general analysis of the text dialogue.

## Dialogue

To mock live events, chat dialogue from `li2017dailydialog/daily_dialog` (https://huggingface.co/datasets/li2017dailydialog/daily_dialog) on Hugging Face was used for inference and model selection.

**Dialogue Intention vs. Emotional Intensity** 

Instead of transforming the list of utterance sequences on every instance, this section looks for dialogue-level features that can describe the dialogue's context. This could be an alternative to using vector embeddings of the text corpus.

The target is to find out whether neutral emotion should be expected in a dialogue, as opposed to emotional intensity.

**Background notes**
Features include the user's intent or action, labelled as `act`, and the emotion, labelled as `emotion`, in the dataset.
The possible categories of user intent are `inform`, `question`, `directive` and `commissive`. To clarify the difference between `directive` and `commissive` speech:

| **Aspect**          | **Directive Speech Acts**                   | **Commissive Speech Acts**                |
|---------------------|---------------------------------------------|------------------------------------------|
| **Purpose**         | To get the listener to do something.        | To commit the speaker to do something.   |
| **Examples**        | Request, command, suggestion, advice.       | Promise, offer, vow, guarantee.          |
| **Speaker's Role**  | Directs the listener’s actions.             | Commits the speaker to future actions.   |
| **Listener's Role** | Listener is expected to perform the action. | No direct expectation from the listener. |

My guess is that a dialogue's happiness count is linearly related to its commissive speech count.

Ways of using this dataset in the application (inference stage, assisting the agent's emotions):
1. Interpreting the full chat dialogue (whole dialogue vs. chunks of messages) to evaluate an agent's expressiveness or behaviour
2. Predicting the expected intent / emotion to return (by preprocessing and collecting every second response)

### Data Analytics: interpreting as full dialogue

In [1]:
import numpy as np
from datasets import load_dataset

data = load_dataset("li2017dailydialog/daily_dialog")

In [3]:
from datasets import concatenate_datasets
concatenate_datasets([data["train"], data["validation"]])

Dataset({
    features: ['dialog', 'act', 'emotion'],
    num_rows: 12118
})

In [11]:
data = data["train"]  # work on the train split from here on (the outputs below have 11,118 rows)
data[:5]['act']

[[3, 4, 2, 2, 2, 3, 4, 1, 3, 4],
 [2, 1, 2, 2, 1, 1],
 [2, 1, 2, 1, 1],
 [2, 1, 1, 1],
 [2, 1, 2, 1, 1, 2, 1, 3, 4]]
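
The integer ids above are easier to read once they are mapped back to label names. The following is a minimal sketch, assuming the id order used by the `intent_map` and `emote_map` lists defined later in this notebook; `decode` is a hypothetical helper, not part of the dataset API.

```python
# Label names in id order; the same lists appear later in this notebook
# as `intent_map` and `emote_map` (id 0 of the act labels is never used).
INTENT_MAP = ["no_intent", "inform", "question", "directive", "commissive"]
EMOTE_MAP = ["noemote", "anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def decode(ids, names):
    """Translate a list of integer label ids into their names."""
    return [names[i] for i in ids]


# First row of the `act` output above
first_dialogue_acts = [3, 4, 2, 2, 2, 3, 4, 1, 3, 4]
decoded_acts = decode(first_dialogue_acts, INTENT_MAP)
print(decoded_acts)
```

Read this way, the first dialogue opens with a directive that is answered by a commissive, which is the pattern the hypothesis in the background notes is about.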

The modelling problem here can be defined as constructing a predictor / indicator that returns the likelihood of emotional intensity given the dialogue actions and emotional states. (Note that the emotional states of both users are taken into account to represent the dialogue's expressive / emotional state.)

#### Data Preprocessing: Transforming the action and emotes to multiclass 

In [None]:
# Right-pad the act sequences to a common length (treating the problem as a multi-label classification)

emote_input_config = ['noemote', 'anger', 'disgust', 'fear', 'happiness', 'sadness', 'surprise']
max_right_pad = 35

def emote_transform(x):
    # binarises the emotion labels: 0 (no emotion) stays 0, every other label becomes 1
    for i in x:
        yield 0 if i == 0 else 1

def padding(input_lists, pad_value=-1, max_length=None):
    # Default to the length of the longest sublist when no max_length is given
    if max_length is None:
        max_length = max(len(lst) for lst in input_lists)

    # Create a NumPy array with the desired shape, filled with the pad_value
    padded_array = np.full((len(input_lists), max_length), pad_value)

    # Fill in the original values, truncating anything longer than max_length
    for i, lst in enumerate(input_lists):
        padded_array[i, :min(len(lst), max_length)] = lst[:max_length]

    return padded_array

data = data.add_column('padded_act', padding(data['act']).tolist())
# emote_mu: share of utterances in the dialogue that carry any emotion (label != 0)
data = data.add_column('emote_mu', [np.mean(list(emote_transform(x))) for x in data['emotion']])
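
To make the two derived columns concrete, here is a dependency-free sketch of the same idea on three short act sequences taken from the output earlier in the notebook. `pad_right` and `emotional_share` are hypothetical stand-ins for `padding` and the `emote_mu` computation, written with plain lists instead of NumPy.

```python
def pad_right(sequences, pad_value=-1, max_length=None):
    """Right-pad every sequence to a common length (the longest one by default)."""
    if max_length is None:
        max_length = max(len(seq) for seq in sequences)
    return [
        list(seq[:max_length]) + [pad_value] * (max_length - len(seq))
        for seq in sequences
    ]


def emotional_share(emotions):
    """Share of utterances whose emotion label is not 0 (no emotion)."""
    return sum(1 for label in emotions if label != 0) / len(emotions)


acts = [[2, 1, 2, 2, 1, 1], [2, 1, 2, 1, 1], [2, 1, 1, 1]]
padded = pad_right(acts)
share = emotional_share([0, 0, 4, 4])  # hypothetical labels: 2 neutral, 2 happy

print(padded)
print(share)
```

The pad value `-1` cannot collide with a real act id, since the act labels in this dataset are 1 to 4.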

#### Adding Dialogue features

In [81]:
# The act column has been right-padded with -1, and the emote_mu column holds the share of emotional utterances per dialogue
# The intention here is to one-hot encode the act and emotion labels and sum them into per-dialogue counts

import numpy as np
import pandas as pd

intent_map = ['no_intent', 'inform', 'question', 'directive', 'commissive']
emote_map = ['noemote', 'anger', 'disgust', 'fear', 'happiness', 'sadness', 'surprise']

df = data.to_pandas()

dummies = pd.get_dummies(df.explode('act')['act'], prefix='act', dtype=int)
count_df = dummies.groupby(dummies.index).sum()
e_dummies = pd.get_dummies(df.explode('emotion')['emotion'], prefix='emote', dtype=int)
e_count_df = e_dummies.groupby(e_dummies.index).sum()

df = pd.concat([df, count_df], axis=1)
df = pd.concat([df, e_count_df], axis=1)
# Size of the dialogue
df['dialog_size'] = df['dialog'].str.len()

In [13]:
# share of neutral (label 0) utterances: the highest value is 1, and the higher it is, the more neutral / emotionless the dialogue
df['emotion_intensity'] = df['emote_0'] / df['dialog_size']


In [None]:
# Dividing the act / emotion counts by the dialogue size gives the proportion of each act / emotion in the dialogue,
# so the act columns and the emotion columns each sum to 1 per row

# defining the column labels
act_col = [f"act_{i}" for i in range(1, len(intent_map))]
emote_col = [f"emote_{i}" for i in range(len(emote_map))]
# normalizing the act and emotion columns by the dialog size
df[act_col] /= df['dialog_size'].values[:, None]
df[emote_col] /= df['dialog_size'].values[:, None]
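
As a quick sanity check of the normalisation, the sketch below computes the same proportions for a single dialogue without pandas. The labels are those of the first dialogue (six neutral and four happy utterances), and `label_proportions` is a hypothetical helper.

```python
from collections import Counter


def label_proportions(labels, n_classes):
    """Per-class share of the labels in one dialogue; the shares sum to 1."""
    counts = Counter(labels)
    return [counts.get(k, 0) / len(labels) for k in range(n_classes)]


# Emotion labels of the first dialogue: 6 x no emotion (0), 4 x happiness (4)
emotions = [0, 0, 0, 0, 0, 0, 4, 4, 4, 4]
props = label_proportions(emotions, n_classes=7)

print(props)
print(sum(props))
```

The first entry corresponds to `emote_0` and agrees with the value of 0.6 shown for row 0 in the next output.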

In [90]:
df['emote_0']

0        0.600000
1        0.833333
2        1.000000
3        1.000000
4        0.777778
           ...   
11113    0.777778
11114    0.833333
11115    0.812500
11116    1.000000
11117    1.000000
Name: emote_0, Length: 11118, dtype: float64

In [104]:
set([j for i in df['act'].values for j in i])

{1, 2, 3, 4}

In [100]:
df[['emote_0', 'emote_1', 'emote_2', 'emote_3', 'emote_4', 'emote_5', 'emote_6', 'act_1', 'act_2', 'act_3', 'act_4', 'dialog_size']].corr()

| | emote_0 | emote_1 | emote_2 | emote_3 | emote_4 | emote_5 | emote_6 | act_1 | act_2 | act_3 | act_4 | dialog_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| emote_0 | 1.0 | -0.185839 | -0.13141 | -0.072033 | -0.882315 | -0.191184 | -0.211492 | -0.21428 | 0.22969 | 0.108448 | -0.031595 | 0.051299 |
| emote_1 | -0.185839 | 1.0 | 0.011847 | -0.005069 | -0.078864 | 0.037469 | 0.004559 | 0.057445 | -0.069291 | -0.013118 | -0.004142 | -0.023229 |
| emote_2 | -0.13141 | 0.011847 | 1.0 | -0.002115 | -0.044883 | -0.017038 | 0.000391 | 0.092071 | -0.050558 | -0.055881 | -0.038747 | -0.054836 |
| emote_3 | -0.072033 | -0.005069 | -0.002115 | 1.0 | -0.018843 | -0.005839 | -0.001208 | 0.021277 | -0.034707 | 0.004371 | -0.002571 | -0.011796 |
| emote_4 | -0.882315 | -0.078864 | -0.044883 | -0.018843 | 1.0 | -0.069683 | -0.027849 | 0.153229 | -0.202316 | -0.069376 | 0.062766 | -0.026519 |
| emote_5 | -0.191184 | 0.037469 | -0.017038 | -0.005839 | -0.069683 | 1.0 | -0.00837 | 0.056232 | -0.09167 | -0.003776 | 0.01538 | -0.048561 |
| emote_6 | -0.211492 | 0.004559 | 0.000391 | -0.001208 | -0.027849 | -0.00837 | 1.0 | 0.102282 | 0.050265 | -0.12561 | -0.096317 | 0.009299 |
| act_1 | -0.21428 | 0.057445 | 0.092071 | 0.021277 | 0.153229 | 0.056232 | 0.102282 | 1.0 | -0.1655 | -0.794508 | -0.672977 | -0.061544 |
| act_2 | 0.22969 | -0.069291 | -0.050558 | -0.034707 | -0.202316 | -0.09167 | 0.050265 | -0.1655 | 1.0 | -0.384405 | -0.469346 | 0.141934 |
| act_3 | 0.108448 | -0.013118 | -0.055881 | 0.004371 | -0.069376 | -0.003776 | -0.12561 | -0.794508 | -0.384405 | 1.0 | 0.704668 | -0.021714 |


In [97]:
# My guess is that the more commissive the conversation is, the more likely it is to have a higher share of the happiness emotion (or positive emotions in general), and vice versa. Let's see if this is true

df[df['act_4'] >= 0.5]['emote_4'].describe() # commissive share >= 0.5: distribution of the happiness share

count    243.000000
mean       0.196845
std        0.332304
min        0.000000
25%        0.000000
50%        0.000000
75%        0.250000
max        1.000000
Name: emote_4, dtype: float64

In [98]:
df[df['act_4'] < 0.5]['emote_4'].describe() # commissive share < 0.5: distribution of the happiness share

count    10875.000000
mean         0.130029
std          0.234148
min          0.000000
25%          0.000000
50%          0.000000
75%          0.166667
max          1.000000
Name: emote_4, dtype: float64
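
The two `describe()` outputs can be compared more formally with the summary statistics alone. The sketch below computes Welch's t statistic from the reported counts, means and standard deviations. It is only a rough indicator: both distributions are heavily skewed (the median is 0 in both groups), so a rank-based test may be a better fit, and `welch_t` is a hypothetical helper rather than a library call.

```python
import math


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Values copied from the two describe() outputs above
t_stat = welch_t(
    mean_a=0.196845, std_a=0.332304, n_a=243,      # commissive share >= 0.5
    mean_b=0.130029, std_b=0.234148, n_b=10875,    # commissive share < 0.5
)
print(f"Welch t = {t_stat:.2f}")
```

A statistic of roughly 3 points in the direction of the guess (more commissive dialogues carry a somewhat higher happiness share), but the commissive-heavy group is small (243 dialogues) and the difference in means is under 7 percentage points.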

In [107]:
df.explode('emotion')[act_col + emote_col + ['emotion']].corr()

| | act_1 | act_2 | act_3 | act_4 | emote_0 | emote_1 | emote_2 | emote_3 | emote_4 | emote_5 | emote_6 | emotion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| act_1 | 1.0 | -0.126438 | -0.801684 | -0.670036 | -0.209971 | 0.05897 | 0.084056 | 0.004483 | 0.155443 | 0.041651 | 0.116353 | 0.122694 |
| act_2 | -0.126438 | 1.0 | -0.409742 | -0.49817 | 0.210122 | -0.07057 | -0.03764 | -0.026819 | -0.190819 | -0.081924 | 0.058722 | -0.112684 |
| act_3 | -0.801684 | -0.409742 | 1.0 | 0.70245 | 0.114191 | -0.013517 | -0.05343 | 0.015053 | -0.077785 | 0.00499 | -0.139387 | -0.072846 |
| act_4 | -0.670036 | -0.49817 | 0.70245 | 1.0 | -0.029743 | -0.003588 | -0.040209 | 0.005046 | 0.060219 | 0.019651 | -0.109534 | 0.012707 |
| emote_0 | -0.209971 | 0.210122 | 0.114191 | -0.029743 | 1.0 | -0.182269 | -0.109553 | -0.068137 | -0.894725 | -0.171812 | -0.217418 | -0.58281 |
| emote_1 | 0.05897 | -0.07057 | -0.013517 | -0.003588 | -0.182269 | 1.0 | 0.025265 | -0.000749 | -0.081931 | 0.045374 | 0.016368 | 0.00209 |
| emote_2 | 0.084056 | -0.03764 | -0.05343 | -0.040209 | -0.109553 | 0.025265 | 1.0 | -0.002721 | -0.040431 | -0.013694 | 0.001459 | 0.018849 |
| emote_3 | 0.004483 | -0.026819 | 0.015053 | 0.005046 | -0.068137 | -0.000749 | -0.002721 | 1.0 | -0.022772 | -0.00051 | 0.006871 | 0.027631 |
| emote_4 | 0.155443 | -0.190819 | -0.077785 | 0.060219 | -0.894725 | -0.081931 | -0.040431 | -0.022772 | 1.0 | -0.071535 | -0.021788 | 0.532266 |
| emote_5 | 0.041651 | -0.081924 | 0.00499 | 0.019651 | -0.171812 | 0.045374 | -0.013694 | -0.00051 | -0.071535 | 1.0 | 0.007818 | 0.130734 |


In [109]:
df = df.explode('emotion')

In [118]:
# Since the correlation between happiness and no emotion (label 0) was strong, check whether it comes from the dataset's composition (far more no-emotion and happiness labels than other emotions) or reflects a real relationship between the two

df['emotion'].value_counts(normalize=True) # no-emotion labels make up ~83% of the dataset

emotion
0    0.827613
4    0.128278
6    0.018355
5    0.011116
1    0.009487
2    0.003476
3    0.001675
Name: proportion, dtype: float64
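
One way to answer the question in the comment above: per-dialogue shares sum to 1, so two dominant classes are negatively correlated by construction. The sketch below assumes, purely for illustration, that every utterance label is an independent draw with the shares just printed and that every dialogue has a fixed length of 8 turns. Under that assumption the correlation between two class counts of a multinomial draw is `-sqrt(p_i * p_j / ((1 - p_i) * (1 - p_j)))`.

```python
import math
import random

# Label shares for emotions 0..6, copied from the value_counts output above
SHARES = [0.827613, 0.009487, 0.003476, 0.001675, 0.128278, 0.011116, 0.018355]


def expected_corr(p_i, p_j):
    """Correlation between two class counts of a multinomial draw."""
    return -math.sqrt(p_i * p_j / ((1 - p_i) * (1 - p_j)))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
dialog_size = 8  # assumed fixed dialogue length for the simulation
neutral, happy = [], []
for _ in range(5000):
    labels = rng.choices(range(7), weights=SHARES, k=dialog_size)
    neutral.append(labels.count(0) / dialog_size)
    happy.append(labels.count(4) / dialog_size)

analytic = expected_corr(SHARES[0], SHARES[4])
simulated = pearson(neutral, happy)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

Even with fully independent labels, the expected correlation is already about -0.84, close to the -0.88 and -0.89 observed between `emote_0` and `emote_4`. This suggests the strong correlation is mostly a consequence of the label distribution rather than evidence of a dialogue-level relationship.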

In [135]:
# transforming the dataset again for binary classification: no emotion vs emotion
df['emotion'] = np.where(df['emotion'] != 0, 1, 0)
df['emotion'].value_counts(normalize=True) # Reviewing the emotionless vs emotional states again: ~83% of the dataset is emotionless

emotion
0    0.827613
1    0.172387
Name: proportion, dtype: float64

In [137]:
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# NOTE: the inputs are not scaled in this cell. The most convenient way to add scaling is a pipeline.
seed = 42
pred_cols = ['emote_0', 'emote_1', 'emote_2', 'emote_3', 'emote_4', 'emote_5', 'emote_6', 'act_1', 'act_2', 'act_3', 'act_4', 'dialog_size']

X = df[pred_cols].values
y = df['emotion'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)

def linear_regression(X_train, y_train, X_test, y_test):
    """
    Fits a Linear Regression model, prints the train and eval metrics, and returns the fitted model.

    Parameters:
    X_train: array-like or pandas DataFrame, shape (n_samples, n_features)
        The training input samples.
    y_train: array-like, shape (n_samples,)
        The target values for training.
    X_test: array-like or pandas DataFrame, shape (n_eval_samples, n_features)
        The held-out input samples used for evaluation.
    y_test: array-like, shape (n_eval_samples,)
        The target values for evaluation.

    Returns:
    model: LinearRegression
        The fitted Linear Regression model. The metrics (MAE, MSE, R^2) are printed, not returned.
    """
    # Initialize the Linear Regression model
    model = linear_model.LinearRegression()

    # Fit the model
    model.fit(X_train, y_train)

    # Make predictions on both training and testing data
    predictions_train = model.predict(X_train)
    predictions_test = model.predict(X_test)

    # Calculate evaluation metrics for testing data
    mae_test = mean_absolute_error(y_test, predictions_test)
    mse_test = mean_squared_error(y_test, predictions_test)

    # Store metrics in dictionaries
    metrics_train = pd.DataFrame([{
        'Mean Absolute Error (MAE)': mean_absolute_error(y_train, predictions_train),
        'Mean Squared Error (MSE)': mean_squared_error(y_train, predictions_train),
        'R^2 Score': r2_score(y_train, predictions_train)
    }])

    metrics_test = pd.DataFrame([{
        'Mean Absolute Error (MAE)': mae_test,
        'Mean Squared Error (MSE)': mse_test,
        'R^2 Score': r2_score(y_test, predictions_test)
    }])

    print('Train\n', metrics_train)
    print('Eval\n', metrics_test)

    return model

lr = linear_regression(X_train, y_train, X_test, y_test)

Train
    Mean Absolute Error (MAE)  Mean Squared Error (MSE)  R^2 Score
0                   0.176124                  0.087868   0.384122
Eval
    Mean Absolute Error (MAE)  Mean Squared Error (MSE)  R^2 Score
0                   0.177608                  0.088171   0.381981
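
The R^2 score can be read against a constant baseline. For a 0/1 target with positive share `p`, always predicting `p` has an MSE of `p * (1 - p)`, and R^2 is one minus the ratio of the model's MSE to that baseline. The sketch below redoes this arithmetic with the numbers reported in this notebook. It assumes the eval split has the same class share as the full data, which the stratified split approximately guarantees.

```python
positive_share = 0.172387  # share of emotional utterances (value_counts output)
eval_mse = 0.088171        # Eval MSE reported above

# MSE of always predicting the mean of a 0/1 target equals its variance
baseline_mse = positive_share * (1 - positive_share)
r2_from_mse = 1 - eval_mse / baseline_mse

print(f"baseline MSE: {baseline_mse:.6f}")
print(f"R^2 recovered from the MSE: {r2_from_mse:.4f}")
```

The recovered value agrees with the reported eval R^2 of about 0.382, so the model removes a little over a third of the squared error of the constant predictor.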


- Predictions towards 1: likelihood of no emotion (in usage, maybe `1 - pred` = neutral probability)
- Predictions towards 0 (or negative): likelihood of emotion being present

In [127]:
# note: 11 values are passed here, while pred_cols lists 12 features
lr.predict([[2, 5, 0, 0, 0, 0, 0, 0, 0, 0, 3]])

array([-0.09931345])

In [218]:
lr.predict([[0, 10, 0, 0, 0, 0, 0,10, 0, 0, 0, 10]]) # 10 angry utterances over 20

array([-0.14866205])

In [219]:
lr.predict([[20, 2, 0, 0, 0, 0, 0,10, 0, 0, 0, 10]])

array([1.92351798])

In [None]:
lr.predict([np.zeros(len(pred_cols))]) # Hypothetical scenario: no emotions inferred from the previous speech and no speech intent returned

array([0.82100351])

In [None]:
lr.predict([np.ones(len(pred_cols))]) # Hypothetical scenario: at least one emote was felt and an intent was returned

array([0.63875645])
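
Several of the predictions above fall outside the [0, 1] range, which is expected: a linear regression is unbounded even when its target is a 0/1 label or a share. If the output is to be used as a probability-like score, one crude option is to clip it, as sketched below with the five values printed above. `clip_unit` is a hypothetical helper; the logistic regression in Method 2 avoids the issue by construction.

```python
def clip_unit(value):
    """Clamp a raw regression output to the [0, 1] interval."""
    return min(1.0, max(0.0, value))


# Raw outputs copied from the lr.predict calls above
raw_predictions = [-0.09931345, -0.14866205, 1.92351798, 0.82100351, 0.63875645]
clipped = [clip_unit(p) for p in raw_predictions]

print(clipped)
```

Clipping hides how far outside the range a prediction was, so it is worth logging the raw value as well when inputs are far from the training distribution (for example raw counts passed to a model fitted on proportions).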

#### Clean up code with proper training

In [252]:
def load_dialog(split_type):
    """Loads one split and returns (features, target). The act / emotion features are raw per-dialogue counts, not proportions."""
    data = load_dataset("li2017dailydialog/daily_dialog", split=split_type)
    df = data.to_pandas()

    # one-hot encode every utterance's act / emotion label and sum the indicators per dialogue
    dummies = pd.get_dummies(df.explode('act')['act'], prefix='act', dtype=int)
    count_df = dummies.groupby(dummies.index).sum()
    e_dummies = pd.get_dummies(df.explode('emotion')['emotion'], prefix='emote', dtype=int)
    e_count_df = e_dummies.groupby(e_dummies.index).sum()
    df = pd.concat([df, count_df], axis=1)
    df = pd.concat([df, e_count_df], axis=1)
    df['dialog_size'] = [len(i) for i in df['dialog']]
    # share of neutral (label 0) utterances: the highest value is 1, and the higher it is, the more neutral the dialogue
    df['emotion_intensity'] = df['emote_0'] / df['dialog_size']

    return df[pred_cols].values, df['emotion_intensity'].values

In [253]:
x_train, y_train = load_dialog(split_type='train')
x_valid, y_valid = load_dialog(split_type='validation')
x_test, y_test = load_dialog(split_type='test')

In [255]:
lr = linear_regression(x_train, y_train, x_valid, y_valid)

Train
    Mean Absolute Error (MAE)  Mean Squared Error (MSE)  R^2 Score
0                   0.076375                  0.014731   0.770187
Eval
    Mean Absolute Error (MAE)  Mean Squared Error (MSE)  R^2 Score
0                   0.058813                  0.007901   0.755051


In [258]:
lr.score(x_test, y_test), lr.score(x_valid, y_valid), lr.score(x_train, y_train)

(0.7710130835412832, 0.7550514991831193, 0.7701873983454256)

#### Method 2: Logistic Regression

In [144]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE  # For oversampling
import numpy as np

SEED = 42
pred_cols = ['emote_0', 'emote_1', 'emote_2', 'emote_3', 'emote_4', 'emote_5', 'emote_6', 'act_1', 'act_2', 'act_3', 'act_4', 'dialog_size']

X = df[pred_cols].values
y = df['emotion'].values

# 1. Train-Test Split (Maintains class distribution)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=SEED)

# 2. Apply SMOTE (Oversampling minority class in training set)
smote = SMOTE(random_state=SEED)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

# 3. Scale features AFTER splitting (to avoid data leakage)
scaler = StandardScaler()
X_train_resampled = scaler.fit_transform(X_train_resampled)
X_test = scaler.transform(X_test)

# 4. Train Model
# After SMOTE both classes have the same number of samples, so the "balanced" weights
# come out as 1.0 each and class_weight has no additional effect here
class_weights = compute_class_weight("balanced", classes=np.unique(y_train_resampled), y=y_train_resampled)
model = LogisticRegression(class_weight={0: class_weights[0], 1: class_weights[1]})
model.fit(X_train_resampled, y_train_resampled)

# 5. Evaluate Model
y_pred = model.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))



Classification Report:
               precision    recall  f1-score   support

           0       0.94      0.83      0.88     14429
           1       0.48      0.75      0.59      3005

    accuracy                           0.82     17434
   macro avg       0.71      0.79      0.73     17434
weighted avg       0.86      0.82      0.83     17434
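
The f1-score column follows directly from the precision and recall columns, which makes the trade-off for the minority class easier to see. A small sketch of the arithmetic, using the rounded values from the report above (`f1_from` is a hypothetical helper, and rounding of the inputs can shift the last digit in general):

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_neutral = f1_from(0.94, 0.83)    # class 0: no emotion
f1_emotional = f1_from(0.48, 0.75)  # class 1: emotional

print(round(f1_neutral, 2), round(f1_emotional, 2))
```

For the emotional class, the recall of 0.75 is bought with a precision of 0.48: roughly half of the utterances flagged as emotional are false alarms, which is the usual side effect of oversampling the minority class.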



In [187]:
label_map = {0: "no_emotion", 1: "emotional"}

dict(zip(label_map.values(), *model.predict_proba(X_test[:1])))

{'no_emotion': 0.06377385366365329, 'emotional': 0.9362261463363467}

In [192]:
X_test[:1].shape

(1, 12)

## Using the Dialogue for predicting the next intent

Predicting the returning intent conditioned on the full dialogue is difficult and may not be useful.
Instead, the dataset can be preprocessed further into a paired-speech dataset, where every second intent is the target variable and the input is the preceding utterance (encoded with `sentence_transformers`).

In [1]:
from datasets import load_dataset

ds = load_dataset("li2017dailydialog/daily_dialog", split='train[:100]')

In [2]:
df = ds.to_pandas()

### Data Transformation: dialogue to paired chunks

In [None]:
col_names_map = {
    'dialog': 'a',
    'act': 'act_a',
    'emotion': 'emote_a'
}
new_col = ['b', 'act_b', 'emote_b']

def transform_to_pairs(df):
    """ Returns one row per dialogue: with aggfunc='first', only the first (a, b) pair of each dialogue is kept. The dialogue ids are dropped from the index, but the paired columns are built efficiently. """

    explode = df.explode(['dialog', 'act', 'emotion']).reset_index(names='dialog_id')
    explode['pair_id'] = explode.groupby('dialog_id').cumcount() % 2
    # Pivot the table while avoiding duplicate index issues
    df_paired = explode.pivot_table(index='dialog_id', columns='pair_id', values=['dialog', 'emotion', 'act'], aggfunc='first')
    # Flatten MultiIndex columns and rename using col_names_map
    df_paired.columns = [col_names_map[col[0]] if col[1] == 0 else new_col[list(col_names_map.keys()).index(col[0])] for col in df_paired.columns]
    df_paired.reset_index(drop=True, inplace=True)
    return df_paired


In [85]:
new_df = transform_to_pairs(df)

In [144]:
new_df

| | act_a | act_b | a | b | emote_a | emote_b |
|---|---|---|---|---|---|---|
| 0 | 3 | 4 | Say , Jim , how about going for a few beers af... | You know that is tempting but is really not g... | 0 | 0 |
| 1 | 2 | 1 | Can you do push-ups ? | Of course I can . It's a piece of cake ! Beli... | 0 | 0 |
| 2 | 2 | 1 | Can you study with the radio on ? | No , I listen to background music . | 0 | 0 |
| 3 | 2 | 1 | Are you all right ? | I will be all right soon . I was terrified wh... | 0 | 0 |
| 4 | 2 | 1 | Hey John , nice skates . Are they new ? | Yeah , I just got them . I started playing ic... | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | 2 | 1 | How was your education going on in Australia ? | I'm going to graduate this summer . | 0 | 0 |
| 96 | 2 | 1 | Do you have any particular hobbies , Tom ? | Oh , yes . I love playing badminton , table t... | 0 | 0 |
| 97 | 2 | 1 | What ’ s the plot of your new movie ? | It ’ s a story about a policemen who is inves... | 0 | 0 |
| 98 | 2 | 1 | Who's that old lady trimming the trees ? | She's my grandma . | 0 | 0 |


In [None]:
def transform_dataset(row):
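    # Pairs consecutive turns with zip while preserving the dialogue ids (the row index).
    # zip stops at the shorter slice, so a trailing unpaired turn in an odd-length dialogue is dropped.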

    dialog = row['dialog']
    act = row['act']
    emotion = row['emotion']
    # Pair them
    paired = list(zip(dialog[::2], dialog[1::2], emotion[::2], emotion[1::2], act[::2], act[1::2]))
    dialog_a, dialog_b, emote_a, emote_b, act_a, act_b = zip(*paired)

    return {
        'dialog_a': list(dialog_a),
        'dialog_b': list(dialog_b),
        'emote_a': list(emote_a),
        'emote_b': list(emote_b),
        'act_a': list(act_a),
        'act_b': list(act_b),
        'dialogue_size': len(dialog)
    }

def prepare_dataset(ds):

    new = ds.map(transform_dataset, remove_columns=['dialog', 'act', 'emotion'])
    new_df = new.to_pandas().explode(['dialog_a', 'dialog_b', 'emote_a', 'emote_b', 'act_a', 'act_b'])
    return new_df

new_cols = ['dialog_a', 'dialog_b', 'emote_a', 'emote_b', 'act_a', 'act_b']
new_ds = ds.map(transform_dataset, remove_columns=['dialog', 'act', 'emotion'])
new = new_ds.to_pandas().explode(new_cols)

In [157]:
new

| | dialog_a | dialog_b | emote_a | emote_b | act_a | act_b | dialogue_size |
|---|---|---|---|---|---|---|---|
| 0 | Say , Jim , how about going for a few beers af... | You know that is tempting but is really not g... | 0 | 0 | 3 | 4 | 10 |
| 0 | What do you mean ? It will help us to relax . | Do you really think so ? I don't . It will ju... | 0 | 0 | 2 | 2 | 10 |
| 0 | I guess you are right.But what shall we do ? ... | I suggest a walk over to the gym where we can... | 0 | 0 | 2 | 3 | 10 |
| 0 | That's a good idea . I hear Mary and Sally of... | Sounds great to me ! If they are willing , we... | 4 | 4 | 4 | 1 | 10 |
| 0 | Good.Let ' s go now . | All right . | 4 | 4 | 3 | 4 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 97 | Did you do you own stunts in the movie ? | I wanted to , but my insurance company wouldn... | 0 | 0 | 2 | 1 | 11 |
| 97 | Thank you very much for doing this interview . | My pleasure . Have you seen the movie yet ? | 4 | 4 | 1 | 2 | 11 |
| 98 | Who's that old lady trimming the trees ? | She's my grandma . | 0 | 0 | 2 | 1 | 4 |
| 98 | She's looks very healthy.How old is she ? | 92 . | 0 | 0 | 2 | 1 | 4 |
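
The slicing trick used in `transform_dataset` has one behaviour worth knowing: `zip` stops at the shorter slice, so the last turn of an odd-length dialogue (dialogue 97 above has 11 turns) never appears in a pair. A toy run with placeholder turn names, where `pair_turns` is a hypothetical helper mirroring the slicing:

```python
def pair_turns(utterances):
    """Pair turn 1 with 2, 3 with 4, and so on; a trailing unpaired turn is dropped."""
    return list(zip(utterances[::2], utterances[1::2]))


even_pairs = pair_turns(["a1", "b1", "a2", "b2"])
odd_pairs = pair_turns(["a1", "b1", "a2"])  # "a2" has no reply and is dropped
single_turn = pair_turns(["a1"])            # no pair can be formed

print(even_pairs)
print(odd_pairs)
print(single_turn)
```

With an empty pair list, the `zip(*paired)` unpacking in `transform_dataset` would raise a `ValueError`, so dialogues with fewer than two turns would need to be filtered out first if they ever occur.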


In [143]:
pd.get_dummies(new['act_a'], prefix='act_a', dtype=int).groupby(new.index).sum()

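| | act_a_1 | act_a_2 | act_a_3 | act_a_4 |
|---|---|---|---|---|
| 0 | 0 | 2 | 2 | 1 |
| 1 | 1 | 2 | 0 | 0 |
| 2 | 0 | 2 | 0 | 0 |
| 3 | 1 | 1 | 0 | 0 |
| 4 | 2 | 2 | 0 | 0 |
| ... | ... | ... | ... | ... |
| 95 | 0 | 2 | 1 | 0 |
| 96 | 1 | 3 | 0 | 0 |
| 97 | 1 | 4 | 0 | 0 |
| 98 | 0 | 2 | 0 | 0 |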
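
Before training a model on sentence embeddings, a simple frequency baseline for the next intent can be read off the paired columns: count how often each `act_b` follows each `act_a`. This baseline is not part of the notebook. It is a sketch under that assumption, shown on the five `(act_a, act_b)` pairs of dialogue 0 from the paired table, with `next_act_distribution` as a hypothetical helper.

```python
from collections import Counter, defaultdict

# (act_a, act_b) pairs of dialogue 0, read from the paired table above
pairs = [(3, 4), (2, 2), (2, 3), (4, 1), (3, 4)]

# transitions[act_a][act_b] = number of times act_b followed act_a
transitions = defaultdict(Counter)
for act_a, act_b in pairs:
    transitions[act_a][act_b] += 1


def next_act_distribution(act_a):
    """Empirical distribution of the replying act given the opening act."""
    counts = transitions[act_a]
    total = sum(counts.values())
    return {act_b: n / total for act_b, n in counts.items()}


after_directive = next_act_distribution(3)
after_question = next_act_distribution(2)

print(after_directive)
print(after_question)
```

In this single dialogue every directive (3) is answered by a commissive (4). Computed over the full training split, the same table would give a reference accuracy that any embedding-based next-intent model should beat.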
