# CRAFT demo (inference only) using ConvoKit

This example notebook shows how to apply an already-trained CRAFT model to conversational data in order to forecast future derailment. It uses the fully trained Wikiconv-based model reported in the "Trouble on the Horizon" paper and applies it to ConvoKit's version of the labeled Wikiconv corpus.
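Forecasting derailment means scoring a conversation while it unfolds rather than after it ends. At each point the model sees only the utterances posted so far. The plain-Python sketch below illustrates that idea. `prefix_contexts` is a hypothetical helper, not part of ConvoKit. The sketch also does not try to reproduce details of the real `Forecaster`, such as whether the final utterance is scored.

```python
from typing import Iterator, List, Sequence, Tuple


def prefix_contexts(utterances: Sequence[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (index, context) pairs, where context holds every utterance up to index.

    Hypothetical helper: it only illustrates the "score the conversation so far" idea.
    """
    for i in range(len(utterances)):
        yield i, list(utterances[: i + 1])


convo = ["Hi, I reverted your edit.", "Why?", "It was unsourced."]
for i, context in prefix_contexts(convo):
    # A forecaster would produce one derailment score per context.
    print(i, context)
```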

In [1]:
import os
# Change the working directory to four levels above the notebook's location
os.chdir('../../../..')

In [2]:
import convokit



In [3]:
from convokit import Forecaster, Corpus, download

In [4]:
MAX_LENGTH = 80

In [5]:
# Load the pretrained CRAFT model and run inference on the CPU
craft_model = convokit.CRAFTModel(device_type="cpu", batch_size=100, max_length=MAX_LENGTH)

Loading saved parameters...
Building encoders, decoder, and classifier...
Models built and ready to go!
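The `batch_size=100` argument presumably sets how many inputs the model scores in each step. The progress log that `forecaster.transform` prints later fits a run of 44 batches in total. For example, 1/44 ≈ 2.3% and 26/44 ≈ 59.1%.

The sketch below uses a hypothetical `batch_progress` helper to show how fixed-size batching produces this kind of iteration and percent-complete sequence. It does not reproduce CRAFT's actual inference loop.

```python
import math
from typing import Iterator, Tuple


def batch_progress(num_items: int, batch_size: int) -> Iterator[Tuple[int, float]]:
    """Yield (iteration, percent_complete) for each fixed-size batch.

    Hypothetical helper; the last batch may hold fewer than batch_size items.
    """
    num_batches = math.ceil(num_items / batch_size)
    for iteration in range(1, num_batches + 1):
        yield iteration, round(100 * iteration / num_batches, 1)


# Any item count from 4301 to 4400 gives 44 batches of at most 100 items.
progress = list(batch_progress(4400, 100))
print(progress[0], progress[25])  # (1, 2.3) (26, 59.1)
```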


In [6]:
forecaster = Forecaster(
    forecaster_model=craft_model,
    convo_structure="linear",
    # Truncate each utterance to at most MAX_LENGTH - 1 tokens
    text_func=lambda utt: utt.meta["tokens"][:MAX_LENGTH - 1],
    # Skip section-header utterances
    utt_selector_func=lambda utt: not utt.meta["is_section_header"],
    # Only forecast on conversations in the test split
    convo_selector_func=lambda convo: convo.meta["split"] == "test",
    forecast_feat_name="prediction",
    forecast_prob_feat_name="score",
    skip_broken_convos=False,
)
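The three function arguments decide what the model sees:

- `convo_selector_func` keeps only test-split conversations.
- `utt_selector_func` drops section headers.
- `text_func` truncates each utterance's tokens to `MAX_LENGTH - 1`, presumably leaving one slot for an end-of-sequence token.

The sketch below applies the same three functions to plain dictionaries that stand in for ConvoKit utterances and conversations. These stand-ins are hypothetical; real ConvoKit objects expose `meta` as an attribute, not a dictionary key.

```python
MAX_LENGTH = 80

# Hypothetical stand-ins for ConvoKit objects: plain dicts with a "meta" entry.
utterances = [
    {"meta": {"is_section_header": True, "tokens": ["==", "heading", "=="]}},
    {"meta": {"is_section_header": False, "tokens": ["word"] * 120}},
]
conversations = [{"meta": {"split": "train"}}, {"meta": {"split": "test"}}]


def text_func(utt):
    # Keep at most MAX_LENGTH - 1 tokens per utterance.
    return utt["meta"]["tokens"][: MAX_LENGTH - 1]


def utt_selector_func(utt):
    # Ignore section-header utterances.
    return not utt["meta"]["is_section_header"]


def convo_selector_func(convo):
    # Only forecast on the test split.
    return convo["meta"]["split"] == "test"


model_inputs = [text_func(u) for u in utterances if utt_selector_func(u)]
selected_convos = [c for c in conversations if convo_selector_func(c)]
print(len(model_inputs), len(model_inputs[0]), len(selected_convos))  # 1 79 1
```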

In [7]:
corpus = Corpus(filename=download("conversations-gone-awry-corpus"))

Dataset already exists at /Users/calebchiam/.convokit/downloads/conversations-gone-awry-corpus


## Part 2: load the data

Next, we prepare the labeled Wikiconv corpus loaded above for use with PyTorch by tokenizing every utterance with the CRAFT model's vocabulary.
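PyTorch sequence models typically consume fixed-width integer tensors rather than token strings. The minimal sketch below shows that preparation step. It assumes a hypothetical `word2index` vocabulary mapping and hypothetical padding, end-of-sequence, and unknown-word indices; the real CRAFT vocabulary object and its indices may differ.

The sketch does three things:

- maps each token list to indices,
- appends an end-of-sequence index,
- right-pads every row to the same width.

The resulting nested list could then be passed to `torch.tensor`.

```python
from typing import Dict, List, Sequence, Tuple


def to_padded_indices(
    token_lists: Sequence[Sequence[str]],
    word2index: Dict[str, int],
    max_length: int,
    pad_index: int = 0,  # hypothetical padding index
    eos_index: int = 2,  # hypothetical end-of-sequence index
    unk_index: int = 3,  # hypothetical unknown-word index
) -> Tuple[List[List[int]], List[int]]:
    """Return right-padded index rows plus the unpadded length of each row."""
    rows = []
    for tokens in token_lists:
        # Truncate to max_length - 1 tokens so the EOS index still fits.
        ids = [word2index.get(tok, unk_index) for tok in tokens[: max_length - 1]]
        rows.append(ids + [eos_index])
    lengths = [len(row) for row in rows]
    width = max(lengths)
    padded = [row + [pad_index] * (width - len(row)) for row in rows]
    return padded, lengths


padded, lengths = to_padded_indices([["hi", "there"], ["hi"]], {"hi": 4, "there": 5}, 80)
print(padded, lengths)  # [[4, 5, 2], [4, 2, 0]] [3, 2]
```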

In [8]:
from convokit import craft_tokenize

In [9]:
# Tokenize every utterance with the CRAFT vocabulary; text_func above reads these tokens
for utt in corpus.iter_utterances():
    utt.add_meta("tokens", craft_tokenize(craft_model.voc, utt.text))
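`craft_tokenize` is ConvoKit's own function, and this notebook does not show its exact rules. The following sketch is a rough, hypothetical illustration of the usual steps behind such a tokenizer:

- lowercasing the text,
- splitting it into word and punctuation tokens,
- replacing out-of-vocabulary words with an unknown-word token.

Both `simple_tokenize` and the `"UNK"` token name are assumptions.

```python
import re
from typing import List, Set


def simple_tokenize(vocab: Set[str], text: str, unk: str = "UNK") -> List[str]:
    """Lowercase, split into word/punctuation tokens, and map unknown tokens to `unk`.

    Hypothetical stand-in for craft_tokenize; the real function may differ.
    """
    tokens = re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())
    return [tok if tok in vocab else unk for tok in tokens]


vocab = {"please", "stop", "reverting", "my", "edits", "!"}
print(simple_tokenize(vocab, "Please STOP reverting my edits, seriously!"))
# ['please', 'stop', 'reverting', 'my', 'edits', 'UNK', 'UNK', '!']
```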

In [None]:
forecaster.transform(corpus)

Iteration: 1; Percent complete: 2.3%
Iteration: 2; Percent complete: 4.5%
Iteration: 3; Percent complete: 6.8%
Iteration: 4; Percent complete: 9.1%
Iteration: 5; Percent complete: 11.4%
Iteration: 6; Percent complete: 13.6%
Iteration: 7; Percent complete: 15.9%
Iteration: 8; Percent complete: 18.2%
Iteration: 9; Percent complete: 20.5%
Iteration: 10; Percent complete: 22.7%
Iteration: 11; Percent complete: 25.0%
Iteration: 12; Percent complete: 27.3%
Iteration: 13; Percent complete: 29.5%
Iteration: 14; Percent complete: 31.8%
Iteration: 15; Percent complete: 34.1%
Iteration: 16; Percent complete: 36.4%
Iteration: 17; Percent complete: 38.6%
Iteration: 18; Percent complete: 40.9%
Iteration: 19; Percent complete: 43.2%
Iteration: 20; Percent complete: 45.5%
Iteration: 21; Percent complete: 47.7%
Iteration: 22; Percent complete: 50.0%
Iteration: 23; Percent complete: 52.3%
Iteration: 24; Percent complete: 54.5%
Iteration: 25; Percent complete: 56.8%
Iteration: 26; Percent complete: 59.1%

In [None]:
forecasts_df = forecaster.summarize(corpus)

In [None]:
forecasts_df.head(20)
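This notebook's output does not show the exact columns returned by `forecaster.summarize`. One common way to turn utterance-level scores into a per-conversation summary is to take the maximum score in each conversation and flag the conversation if that maximum exceeds a threshold. The pandas sketch below illustrates this. The data, column names, and 0.5 threshold are all hypothetical, and ConvoKit's own aggregation may differ.

```python
import pandas as pd

# Hypothetical utterance-level forecasts: one row per scored context.
utt_forecasts = pd.DataFrame({
    "conversation_id": ["c1", "c1", "c2", "c2", "c2"],
    "utterance_id": ["u1", "u2", "u3", "u4", "u5"],
    "score": [0.2, 0.8, 0.1, 0.3, 0.4],
})

THRESHOLD = 0.5  # hypothetical decision threshold

# A conversation is flagged if any of its scores exceeds the threshold.
convo_forecasts = (
    utt_forecasts.groupby("conversation_id")["score"].max()
    .to_frame("max_score")
    .assign(prediction=lambda df: (df["max_score"] > THRESHOLD).astype(int))
    .sort_values("max_score", ascending=False)
)
print(convo_forecasts.head(20))
```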