### Politeness prediction with ConvoKit

This notebook demonstrates how to train a simple classifier to predict the politeness level of a request by considering the politeness strategies used, as seen in the paper [A computational approach to politeness with application to social factors](https://www.cs.cornell.edu/~cristian/Politeness.html), using ConvoKit. Note that this notebook is *not* intended to reproduce the paper results: legacy code for reproducibility is available at this [repository](https://github.com/sudhof/politeness). 

In [1]:
import pandas as pd
import numpy as np
from tqdm import tqdm
from collections import defaultdict

In [3]:
# import sys
# sys.path.insert(0, "../github/Cornell-Conversational-Analysis-Toolkit/")

# import convokit
# print(convokit.__file__)





In [4]:
from convokit import Corpus, User, Utterance

In [5]:
from pandas import DataFrame
from typing import List, Dict, Set

#### 1: Loading (and converting) annotated dataset

We will be using the Wikipedia annotations from the [Stanford Politeness Corpus](https://www.cs.cornell.edu/~cristian/Politeness.html).

The code below shows how to convert the original CSV file into the corpus format expected by ConvoKit. The resulting corpus can also be downloaded directly with the helper function `download("wiki-politeness-annotated")`.

In [6]:
# you may need to modify the filepath depending on where your downloaded version is stored 
df = pd.read_csv("Stanford_politeness_corpus/wikipedia.annotated.csv")

To see how the data looks:

In [7]:
df.head(2)

Unnamed: 0,Community,Id,Request,Score1,Score2,Score3,Score4,Score5,TurkId1,TurkId2,TurkId3,TurkId4,TurkId5,Normalized Score
0,Wikipedia,629705,Where did you learn English? How come you're t...,13,9,11,11,5,A2UFD1I8ZO1V4G,A2YFPO0N4GIS25,AYG3MF094634L,A38WUWONC7EXTO,A15DM9BMKZZJQ6,-1.120049
1,Wikipedia,244336,Thanks very much for your edit to the <url> ar...,23,16,24,21,25,A2QN0EGBRGJU1M,A2GSW5RBAT5LQ5,AO5E3LWBYM72K,A2ULMYRKQMNNFG,A3TFQK7QK8X6LM,1.313955


First, we need to convert the dataframe to the format ConvoKit expects. The simple helper function below does the job.

In [8]:
def convert_df_to_corpus(df: DataFrame, id_col: str, text_col: str, meta_cols: List[str]) -> Corpus:
    
    """ Helper function to convert data to Corpus format
     
    Arguments:
        df {DataFrame} -- the data, in a pandas DataFrame
        id_col {str} -- name of the column that holds the utterance IDs
        text_col {str} -- name of the column that holds the utterance texts
        meta_cols {List[str]} -- names of the columns that hold relevant metadata
    
    Returns:
        Corpus -- the converted corpus
    """
    
    # in this particular case, user, reply_to, and timestamp information are all not applicable 
    # and we will simply either create a placeholder entry, or leave it as None 
        
    user = User("wiki_user")
    time = "NOT_RECORDED"

    utterance_list = []    
    for index, row in tqdm(df.iterrows()):
        
        # extracting meta data
        metadata = {}
        for meta_col in meta_cols:
            metadata[meta_col] = row[meta_col]
        
        utterance_list.append(Utterance(str(row[id_col]), user, row[id_col], None, time, \
                                        row[text_col], meta=metadata))
    
    return Corpus(utterances = utterance_list)

For metadata, we will include the normalized score, its corresponding binary label (based on the 75th and 25th percentile cutoffs; technically there are three classes, but we only look at the two ends, hence "binary"), and all original annotations with Turker information.

- adding detailed annotation information to the dataframe

In [9]:
# for simplicity, we will condense the turker information together
df["Annotations"] = [dict(zip([df.iloc[i]["TurkId{}".format(j)] for j in range(1,6)], \
                             [df.iloc[i]["Score{}".format(j)] for j in range(1,6)])) for i in tqdm(range(len(df)))]

100%|██████████| 4353/4353 [00:10<00:00, 421.31it/s]
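
For a single row, the condensing step pairs each Turker ID with the score that Turker gave. Here is a minimal sketch using the first row shown in `df.head(2)`, with a plain dict standing in for the pandas row (the names `row` and `annotations` are illustrative only):

```python
# A plain dict stands in for one DataFrame row; values are copied from the first row above.
row = {
    "TurkId1": "A2UFD1I8ZO1V4G", "TurkId2": "A2YFPO0N4GIS25", "TurkId3": "AYG3MF094634L",
    "TurkId4": "A38WUWONC7EXTO", "TurkId5": "A15DM9BMKZZJQ6",
    "Score1": 13, "Score2": 9, "Score3": 11, "Score4": 11, "Score5": 5,
}

# Pair TurkId1..TurkId5 with Score1..Score5 and collect the pairs into one dict.
annotations = {row[f"TurkId{j}"]: row[f"Score{j}"] for j in range(1, 6)}
print(annotations)
```

The resulting dict has one entry per annotator, which is the shape stored under the `Annotations` metadata key.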


- obtaining the polite vs. impolite label (+1 for the top quartile, -1 for the bottom quartile; we are only interested in these two labels, and everything in between gets 0)

In [11]:
# computing the binary label based on Normalized score
top = np.percentile(df['Normalized Score'], 75)
bottom = np.percentile(df["Normalized Score"], 25)
df['Binary'] = [int(score >= top) - int(score <= bottom) for score in df['Normalized Score']]
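
To make the labelling rule concrete, here is a dependency-free sketch on made-up scores. It uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points and should agree with the default behaviour of `np.percentile` used above; the toy values are hypothetical and not taken from the corpus.

```python
from statistics import quantiles

# Toy normalized scores (made up for illustration, not taken from the corpus).
scores = [-1.5, -0.5, 0.0, 0.5, 1.5, -1.0, 1.0, 0.2]

# Quartile cut points, with linear interpolation between data points.
bottom, _median, top = quantiles(scores, n=4, method="inclusive")

# +1 for the top quartile, -1 for the bottom quartile, 0 for everything in between.
labels = [int(s >= top) - int(s <= bottom) for s in scores]
print(bottom, top)
print(labels)
```

Because the two indicator terms are subtracted, a score can only receive +1, -1, or 0, and the middle half of the data ends up with 0.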

- converting dataframe to corpus

In [15]:
wiki_corpus = convert_df_to_corpus(df, "Id", "Request", ["Normalized Score", "Binary", "Annotations"])

4353it [00:00, 4521.61it/s]


In [39]:
# if you were to download the data directly, here is how: 
# from convokit import download
# wiki_corpus = Corpus(download("wiki-politeness-annotated"))

#### 2: Annotate the corpus with politeness strategies

To get politeness strategies for each utterance, we will first obtain dependency parses for the utterances, and then check for strategy use. 

- adding dependency parses

In [19]:
from convokit import TextParser
parser = TextParser(verbosity=50)

In [20]:
wiki_corpus = parser.transform(wiki_corpus)

050/4353 utterances processed
100/4353 utterances processed
150/4353 utterances processed
...

- adding strategy information

In [21]:
from convokit import PolitenessStrategies
ps = PolitenessStrategies()

In [22]:
wiki_corpus = ps.transform(wiki_corpus)

Below is an example of how a processed utterance now looks. Dependency parses are stored in the utterance metadata under `parsed`, and politeness strategies under `politeness_strategies`.

In [28]:
wiki_corpus.get_utterance('629705')

Utterance({'id': '629705', 'user': User([('name', 'wiki_user')]), 'root': 629705, 'reply_to': None, 'timestamp': 'NOT_RECORDED', 'text': "Where did you learn English? How come you're taking on a third language?", 'meta': {'Normalized Score': -1.1200492637766977, 'Binary': -1, 'Annotations': {'A2UFD1I8ZO1V4G': 13, 'A2YFPO0N4GIS25': 9, 'AYG3MF094634L': 11, 'A38WUWONC7EXTO': 11, 'A15DM9BMKZZJQ6': 5}, 'parsed': [{'rt': 3, 'toks': [{'tok': 'Where', 'tag': 'WRB', 'dep': 'advmod', 'up': 3, 'dn': []}, {'tok': 'did', 'tag': 'VBD', 'dep': 'aux', 'up': 3, 'dn': []}, {'tok': 'you', 'tag': 'PRP', 'dep': 'nsubj', 'up': 3, 'dn': []}, {'tok': 'learn', 'tag': 'VB', 'dep': 'ROOT', 'dn': [0, 1, 2, 4, 5]}, {'tok': 'English', 'tag': 'NNP', 'dep': 'dobj', 'up': 3, 'dn': []}, {'tok': '?', 'tag': '.', 'dep': 'punct', 'up': 3, 'dn': []}]}, {'rt': 4, 'toks': [{'tok': 'How', 'tag': 'WRB', 'dep': 'advmod', 'up': 4, 'dn': []}, {'tok': 'come', 'tag': 'VB', 'dep': 'aux', 'up': 4, 'dn': []}, {'tok': 'you', 'tag': 'PR
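
The `parsed` entry is a list with one dict per sentence. The sketch below walks the first sentence from the output above. Reading `rt` as the index of the root token, `up` as the index of a token's parent, and `dn` as the indices of its dependents is an interpretation of the printed structure, not something taken from the ConvoKit documentation.

```python
# First sentence of the 'parsed' metadata, copied from the output above.
sentence = {
    "rt": 3,
    "toks": [
        {"tok": "Where", "tag": "WRB", "dep": "advmod", "up": 3, "dn": []},
        {"tok": "did", "tag": "VBD", "dep": "aux", "up": 3, "dn": []},
        {"tok": "you", "tag": "PRP", "dep": "nsubj", "up": 3, "dn": []},
        {"tok": "learn", "tag": "VB", "dep": "ROOT", "dn": [0, 1, 2, 4, 5]},
        {"tok": "English", "tag": "NNP", "dep": "dobj", "up": 3, "dn": []},
        {"tok": "?", "tag": ".", "dep": "punct", "up": 3, "dn": []},
    ],
}

toks = sentence["toks"]
root = toks[sentence["rt"]]                       # assumed: 'rt' indexes the root token
children = [toks[i]["tok"] for i in root["dn"]]   # assumed: 'dn' lists dependent indices
print(root["tok"], children)

for t in toks:
    # assumed: 'up' is the parent index; the root token has no 'up' key in this output
    parent = toks[t["up"]]["tok"] if "up" in t else None
    print(t["tok"], t["dep"], parent)
```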

You may want to save the corpus by doing `wiki_corpus.dump("wiki-politeness-annotated")` for further exploration. Note that if you do not specify a base path, data will be saved to `.convokit/saved-corpora` in your home directory by default. 

[TODO] To get a glimpse of the overall use of politeness strategies in this corpus: 

In [26]:
ps.summarize(wiki_corpus)

#### 3: Predict politeness

We will see how a simple classifier that considers the use of politeness strategies performs, using `Classifier` (note that this is only for demonstration, and is not geared towards achieving the best performance).

In [29]:
import random
from sklearn import svm
from scipy.sparse import csr_matrix
from sklearn.metrics import classification_report

In [30]:
from convokit import Classifier

As a preliminary step, we subset the corpus, since we only consider the polite vs. impolite classes for prediction (i.e., utterances whose "Binary" field is either +1 or -1).

In [38]:
binary_corpus = Corpus(utterances=[utt for utt in wiki_corpus.iter_utterances() if utt.meta["Binary"] != 0])

#### 3.1 Direct evaluation

If you are interested in how effective these politeness strategies are as features, `Classifier` provides evaluation both with a train/test split and with cross-validation.

- cross validation accuracies

In [39]:
clf_cv = Classifier(obj_type="utterance", 
                        pred_feats=["politeness_strategies"], 
                        labeller=lambda utt: utt.meta['Binary'] == 1)

clf_cv.evaluate_with_cv(binary_corpus)

Using corpus objects...
Running a cross-validated evaluation...




Done.


array([0.7706422 , 0.74770642, 0.73394495, 0.75172414, 0.73793103])
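
The five values are per-fold accuracies. A quick way to summarize them is the mean and the spread across folds, sketched here with the standard library and the numbers copied from the output above:

```python
from statistics import mean, stdev

# Per-fold accuracies copied from the cross-validation output above.
fold_accuracies = [0.7706422, 0.74770642, 0.73394495, 0.75172414, 0.73793103]

mean_acc = mean(fold_accuracies)
spread = stdev(fold_accuracies)  # sample standard deviation across the five folds
print(f"mean accuracy = {mean_acc:.3f}, std = {spread:.3f}")
```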

- train/test split

In [40]:
clf_split = Classifier(obj_type="utterance", 
                        pred_feats=["politeness_strategies"], 
                        labeller=lambda utt: utt.meta['Binary'] == 1)

clf_split.evaluate_with_train_test_split(binary_corpus)

Using corpus objects...
Running a train-test-split evaluation...




Done.


(0.7270642201834863, array([[169,  39],
        [ 80, 148]]))
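
The output appears to be a pair of overall accuracy and confusion matrix. As a sanity check, the accuracy can be recomputed from the matrix as the share of counts on the diagonal:

```python
# Confusion matrix copied from the output above.
confusion = [[169, 39],
             [80, 148]]

correct = confusion[0][0] + confusion[1][1]    # diagonal entries are correct predictions
total = sum(sum(row) for row in confusion)     # size of the evaluated test split
accuracy = correct / total
print(correct, total, accuracy)
```

The recomputed value matches the first element of the returned tuple.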

#### 3.2 Training a classifier to predict on other utterances

In [41]:
test_ids = binary_corpus.get_utterance_ids()[-100:]
train_corpus = Corpus(utterances=[utt for utt in binary_corpus.iter_utterances() if utt.id not in test_ids])
test_corpus = Corpus(utterances=[utt for utt in binary_corpus.iter_utterances() if utt.id in test_ids])
print("train size = {}, test size = {}".format(len(train_corpus.get_utterance_ids()),
                                               len(test_corpus.get_utterance_ids())))

train size = 2078, test size = 100
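
The split holds out the last 100 utterance IDs and trains on the rest. The same logic is sketched below without ConvoKit, using hypothetical integer IDs in place of the real ones and a `set` for the held-out IDs so that each membership check is constant time:

```python
# Hypothetical integer IDs standing in for the 2178 utterance IDs in binary_corpus.
all_ids = list(range(2178))

held_out = set(all_ids[-100:])  # a set gives constant-time membership checks
train_ids = [i for i in all_ids if i not in held_out]
held_out_ids = [i for i in all_ids if i in held_out]
print(len(train_ids), len(held_out_ids))
```

Note that this holds out the final 100 items in ID order rather than a random sample, so if the order of IDs is not arbitrary, a shuffled split may be preferable.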


We can also train a classifier on one corpus and use it to predict politeness labels for other utterances. As an example, we first train on the training corpus created above, and then check predictions on the held-out test utterances.

In [42]:
clf = Classifier(obj_type="utterance", 
                        pred_feats=["politeness_strategies"], 
                        labeller=lambda utt: utt.meta['Binary'] == 1)
clf.fit(train_corpus)



<convokit.classifier.classifier.Classifier at 0x7fe476eb4b50>

- predicting on the test corpus (you can also predict on a list of utterances by using `clf.transform_objs()` instead)

In [43]:
test_pred = clf.transform(test_corpus)

In [44]:
clf.summarize(test_pred)

id,prediction,score
486441,1,0.729354
626897,0,0.112681
626894,0,0.244715
626728,0,0.068628
620909,1,0.826003
...,...,...
60798,0,0.113524
156734,0,0.138481
147665,0,0.469575
234095,1,0.654587
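
The visible rows are consistent with reading `score` as the estimated probability of the positive (polite) class and `prediction` as that score thresholded at 0.5. This is an assumption checked only against the rows shown, not a statement about how `Classifier` is implemented:

```python
# (id, prediction, score) rows copied from the summary output above.
rows = [
    ("486441", 1, 0.729354), ("626897", 0, 0.112681), ("626894", 0, 0.244715),
    ("626728", 0, 0.068628), ("620909", 1, 0.826003), ("60798", 0, 0.113524),
    ("156734", 0, 0.138481), ("147665", 0, 0.469575), ("234095", 1, 0.654587),
]

THRESHOLD = 0.5  # assumed decision threshold, not confirmed by the source
consistent = all(pred == int(score >= THRESHOLD) for _, pred, score in rows)
print(consistent)
```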


To look at a few example predictions:

In [47]:
pred2label = {1: "polite", 0: "impolite"}

for i, idx in enumerate(test_ids[0:5]):
    print(i)
    test_utt = test_corpus.get_utterance(idx)
    ypred, yprob = test_utt.meta['prediction'], test_utt.meta['score']
    print("test utterance:\n{}".format(test_utt.text))
    print("------------------------")
    print("Result: {}, probability estimates = {}\n".format(pred2label[ypred], yprob))

0
test utterance:
I understood just fine, but wasn't at my computer. Are you in a hurry?
------------------------
Result: polite, probability estimates = 0.7293542964575708

1
test utterance:
I've always been intrigued by 'dark-complected man.' What's with the radio, and fist in the air?
------------------------
Result: impolite, probability estimates = 0.112680884866657

2
test utterance:
Your early edit's clearly indicate that you were not a newbie. How do explain this?
------------------------
Result: impolite, probability estimates = 0.24471475719553584

3
test utterance:
Instead of another 3O, why don't you put in a <url>. And no, it's not a threat - it's an observation - why don't you <url>?
------------------------
Result: impolite, probability estimates = 0.0686276937064236

4
test utterance:
Great Article RaveenS, Do u want me to add this to the template (Sri Lankan Conflict)? I think it should be included in the ''see also'' section what do you suggest?
----------------------

We can also check the confusion matrix and the classification report:

In [48]:
clf.confusion_matrix(test_corpus)

array([[34, 12],
       [17, 37]])

In [50]:
print(clf.classification_report(test_corpus))

              precision    recall  f1-score   support

       False       0.67      0.74      0.70        46
        True       0.76      0.69      0.72        54

    accuracy                           0.71       100
   macro avg       0.71      0.71      0.71       100
weighted avg       0.71      0.71      0.71       100
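
The per-class numbers in the report can be derived from the confusion matrix. The sketch below assumes scikit-learn's convention that rows are true labels and columns are predicted labels, in the order (`False`, `True`); the helper `prf` is illustrative and not part of ConvoKit.

```python
# Confusion matrix copied from the output above.
# Assumed layout: rows = true labels, columns = predicted labels, order (False, True).
cm = [[34, 12],
      [17, 37]]

def prf(cm, k):
    """Precision, recall and F1 for class index k of a 2x2 confusion matrix."""
    tp = cm[k][k]
    predicted_k = cm[0][k] + cm[1][k]   # column sum: everything predicted as class k
    actual_k = sum(cm[k])               # row sum: everything truly in class k
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for k, name in enumerate(["False", "True"]):
    p, r, f = prf(cm, k)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Under that assumption, the rounded values reproduce the `False` and `True` rows of the report, and the diagonal (34 + 37 out of 100) gives the reported accuracy of 0.71.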



We note that this is one implementation of a politeness classifier, trained on a specific dataset (Wikipedia) and on a specific binarization of politeness classes. Depending on your scenario, you might find it preferable to use the politeness strategies directly, as exemplified in the [conversations gone awry example](https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/blob/master/examples/conversations-gone-awry/Conversations_Gone_Awry_Prediction.ipynb), rather than a politeness label or score.