# CRAFT demo (inference only) using ConvoKit

This notebook shows how to apply an already-trained CRAFT model to conversational data to forecast future derailment. It uses the fully trained Wikiconv-based model reported in the "Trouble on the Horizon" paper and applies it to ConvoKit's version of the labeled Wikiconv corpus.


In [2]:
import convokit

Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.


In [3]:
from convokit import Forecaster, Corpus, download

In [4]:
MAX_LENGTH = 80

In [5]:
from convokit.forecaster.CRAFTModel import CRAFTModel

In [6]:
craft_model = CRAFTModel(device_type="cpu", model_path="finetuned_model.tar")

Initializing CRAFT model with options:
{'hidden_size': 500, 'encoder_n_layers': 2, 'context_encoder_n_layers': 2, 'decoder_n_layers': 2, 'dropout': 0.1, 'batch_size': 64, 'clip': 50.0, 'learning_rate': 1e-05, 'print_every': 10, 'train_epochs': 30, 'validation_size': 0.2, 'max_length': 80, 'trained_model_output_filepath': 'finetuned_model.tar'}
Could not find CRAFT model tar file at: finetuned_model.tar
Loading saved parameters...
Building encoders, decoder, and classifier...
Models built and ready to go!


In [7]:
forecaster = Forecaster(forecaster_model=craft_model,
                        forecast_mode="future",
                        convo_structure="linear",
                        text_func=lambda utt: utt.meta["tokens"][:(MAX_LENGTH - 1)],
                        label_func=lambda utt: int(utt.meta["comment_has_personal_attack"]),
                        forecast_attribute_name="prediction",
                        forecast_prob_attribute_name="pred_score",
                        use_last_only=True,
                        skip_broken_convos=False)
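The `text_func` above slices each utterance's token list to `MAX_LENGTH - 1` entries. The sketch below isolates that slice on made-up token lists so the effect is easy to see. The helper name `truncate_tokens` and the toy tokens are hypothetical, and the reading that the `- 1` leaves room for an end-of-sequence marker is an assumption rather than something this notebook states.

```python
MAX_LENGTH = 80  # same value as the notebook cell above


def truncate_tokens(tokens, max_length=MAX_LENGTH):
    """Keep at most max_length - 1 tokens, mirroring the slice in text_func."""
    return tokens[:max_length - 1]


# Hypothetical token lists, not taken from the corpus
short_utt = ["thanks", "for", "the", "edit"]
long_utt = [f"tok{i}" for i in range(200)]

print(len(truncate_tokens(short_utt)))  # short utterances pass through unchanged
print(len(truncate_tokens(long_utt)))   # long utterances are cut to MAX_LENGTH - 1
```

Utterances shorter than the limit are returned untouched, so the slice only affects unusually long comments.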

In [8]:
corpus = Corpus(filename=download("conversations-gone-awry-corpus"))

Dataset already exists at /kitchen/convokit-corpora-jpc/conversations-gone-awry-corpus


## Part 2: load the data

Now we load the labeled Wikiconv corpus from ConvoKit and run some transformations to prepare it for use with PyTorch.

In [9]:
from convokit.forecaster.CRAFT import craft_tokenize

In [10]:
for utt in corpus.iter_utterances():
    utt.add_meta("tokens", craft_tokenize(craft_model.voc, utt.text))

In [11]:
forecaster.transform(corpus, selector=lambda convo: convo.meta["split"] == "train",
                    ignore_utterances=lambda utt: utt.meta["is_section_header"])

Iteration: 1; Percent complete: 2.5%
Iteration: 2; Percent complete: 5.0%
Iteration: 3; Percent complete: 7.5%
Iteration: 4; Percent complete: 10.0%
Iteration: 5; Percent complete: 12.5%
Iteration: 6; Percent complete: 15.0%
Iteration: 7; Percent complete: 17.5%
Iteration: 8; Percent complete: 20.0%
Iteration: 9; Percent complete: 22.5%
Iteration: 10; Percent complete: 25.0%
Iteration: 11; Percent complete: 27.5%
Iteration: 12; Percent complete: 30.0%
Iteration: 13; Percent complete: 32.5%
Iteration: 14; Percent complete: 35.0%
Iteration: 15; Percent complete: 37.5%
Iteration: 16; Percent complete: 40.0%
Iteration: 17; Percent complete: 42.5%
Iteration: 18; Percent complete: 45.0%
Iteration: 19; Percent complete: 47.5%
Iteration: 20; Percent complete: 50.0%
Iteration: 21; Percent complete: 52.5%
Iteration: 22; Percent complete: 55.0%
Iteration: 23; Percent complete: 57.5%
Iteration: 24; Percent complete: 60.0%
Iteration: 25; Percent complete: 62.5%
Iteration: 26; Percent complete: 65.0%

<convokit.model.corpus.Corpus at 0x7ff539d91a50>

In [12]:
forecasts_df = forecaster.summarize(corpus)

In [13]:
forecasts_df.head(20)

| utt_id | prediction | pred_score |
|---|---|---|
| 800622928.18454.18454 | 1.0 | 0.98963 |
| 351871224.50472.50472 | 1.0 | 0.988888 |
| 409048245.4938.4938 | 1.0 | 0.988836 |
| 751475142.54124.54124 | 1.0 | 0.988621 |
| 308491753.38115.38115 | 1.0 | 0.988546 |
| 404257585.20200.20200 | 1.0 | 0.988429 |
| 159022461.6705.6705 | 1.0 | 0.988336 |
| 746296311.83642.83642 | 1.0 | 0.988175 |
| 117788418.23770.23770 | 1.0 | 0.987995 |
| 18657304.7525.7525 | 1.0 | 0.987427 |
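The forecasts above are made per utterance, while derailment is usually discussed per conversation. One common way to bridge the two is to flag a conversation as soon as any of its utterance-level forecasts is positive. The following is a minimal sketch of that aggregation in plain Python. The rows, the conversation IDs, and the helper `aggregate_by_conversation` are hypothetical and only illustrate the idea; they are not part of ConvoKit or of this notebook's results.

```python
from collections import defaultdict

# Hypothetical rows: (utt_id, conversation_id, prediction, pred_score).
# These values are invented for illustration, not drawn from the corpus.
toy_forecasts = [
    ("u1", "c1", 0.0, 0.31),
    ("u2", "c1", 1.0, 0.87),
    ("u3", "c2", 0.0, 0.12),
    ("u4", "c2", 0.0, 0.40),
    ("u5", "c3", 1.0, 0.95),
]


def aggregate_by_conversation(rows):
    """Return {conversation_id: (predicted_derail, max_score)}."""
    flags = defaultdict(list)
    scores = defaultdict(list)
    for _utt_id, convo_id, prediction, score in rows:
        flags[convo_id].append(prediction)
        scores[convo_id].append(score)
    # A conversation is flagged if any of its utterances was forecast positive
    return {
        cid: (any(p == 1.0 for p in flags[cid]), max(scores[cid]))
        for cid in flags
    }


convo_forecasts = aggregate_by_conversation(toy_forecasts)
for cid, (derail, score) in sorted(convo_forecasts.items()):
    print(cid, derail, score)
```

To apply the same idea to `forecasts_df`, each `utt_id` would first need to be mapped to its conversation, for example by looking the utterance up in `corpus`; the exact attribute holding the conversation ID depends on the ConvoKit version, so check it before relying on it.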
