# CRAFT demo (inference only) using ConvoKit

This example notebook shows how an already-trained CRAFT model can be applied to conversational data to forecast future derailment. It uses the fully trained Wikiconv-based model reported in the "Trouble on the Horizon" paper and applies it to ConvoKit's version of the labeled Wikiconv corpus.


In [1]:
import convokit

In [2]:
from convokit import Forecaster, Corpus, download

In [3]:
MAX_LENGTH = 80

In [7]:
from convokit.forecaster.CRAFTModel import CRAFTModel

In [None]:
craft_model = CRAFTModel(initial_weights="craft-wiki-pretrained", torch_device="cpu")

Downloading craft-wiki-pretrained to /Users/mishkin/.convokit/saved-models/craft-wiki-pretrained
Downloading craft-wiki-pretrained/craft_pretrained.tar from https://zissou.infosci.cornell.edu/convokit/models/craft_wikiconv/craft_pretrained.tar (974.6MB)... Done
Downloading craft-wiki-pretrained/index2word.json from https://zissou.infosci.cornell.edu/convokit/models/craft_wikiconv/index2word.json (998.5KB)... Done
Downloading craft-wiki-pretrained/word2index.json from https://zissou.infosci.cornell.edu/convokit/models/craft_wikiconv/word2index.json (898.4KB)... Done


In [9]:
# Wrap the pretrained CRAFT model in a ConvoKit Forecaster
forecaster = Forecaster(forecaster_model=craft_model,
                        forecast_mode="future",
                        convo_structure="linear",
                        # keep at most MAX_LENGTH-1 tokens per utterance
                        text_func=lambda utt: utt.meta["tokens"][:(MAX_LENGTH-1)],
                        # binary label: 1 if the comment contains a personal attack
                        label_func=lambda utt: int(utt.meta["comment_has_personal_attack"]),
                        forecast_attribute_name="prediction",
                        forecast_prob_attribute_name="pred_score",
                        use_last_only=True,
                        skip_broken_convos=False)

TypeError: Forecaster.__init__() got an unexpected keyword argument 'forecast_mode'

In [None]:
corpus = Corpus(filename=download("conversations-gone-awry-corpus"))

Downloading conversations-gone-awry-corpus to /Users/mishkin/.convokit/saved-corpora/conversations-gone-awry-corpus
Downloading conversations-gone-awry-corpus from http://zissou.infosci.cornell.edu/convokit/datasets/conversations-gone-awry-corpus/conversations-gone-awry-corpus.zip (45.2MB)... Done


## Part 2: load the data

Now we load the labeled Wikiconv corpus from ConvoKit and run some transformations to prepare it for use with PyTorch.

In [None]:
from convokit.forecaster.CRAFT import craft_tokenize

In [None]:
# Tokenize every utterance with the CRAFT vocabulary and store the result in its metadata
for utt in corpus.iter_utterances():
    utt.add_meta("tokens", craft_tokenize(craft_model.voc, utt.text))
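
As a rough illustration of the token preparation, the sketch below tokenizes a string and truncates the result the same way the `text_func` lambda does. The `simple_tokenize` helper is hypothetical and only stands in for `craft_tokenize`; it does not reproduce that function's vocabulary lookup or unknown-word handling. The comment about why one slot is left free is an assumption, not something this notebook states.

```python
import re

MAX_LENGTH = 80  # same value as in the notebook


def simple_tokenize(text):
    """Hypothetical stand-in for craft_tokenize: lowercase, then split
    into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def truncate_tokens(tokens, max_length=MAX_LENGTH):
    """Keep at most max_length - 1 tokens, mirroring the notebook's text_func.
    Assumption: the free slot is reserved for an end-of-sentence token."""
    return tokens[:(max_length - 1)]


short_tokens = truncate_tokens(simple_tokenize("Please stop reverting my edits!"))
long_tokens = truncate_tokens(["word"] * 200)

print(short_tokens)      # short utterances pass through unchanged
print(len(long_tokens))  # long utterances are cut to MAX_LENGTH - 1 tokens
```

Storing the full token list in metadata and truncating only inside `text_func` means the cutoff can be changed later without tokenizing the corpus again.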

In [None]:
# Run CRAFT on conversations in the "train" split, skipping section-header utterances
forecaster.transform(corpus, selector=lambda convo: convo.meta["split"] == "train",
                     ignore_utterances=lambda utt: utt.meta["is_section_header"])

Iteration: 1; Percent complete: 2.5%
Iteration: 2; Percent complete: 5.0%
Iteration: 3; Percent complete: 7.5%
Iteration: 4; Percent complete: 10.0%
Iteration: 5; Percent complete: 12.5%
Iteration: 6; Percent complete: 15.0%
Iteration: 7; Percent complete: 17.5%
Iteration: 8; Percent complete: 20.0%
Iteration: 9; Percent complete: 22.5%
Iteration: 10; Percent complete: 25.0%
Iteration: 11; Percent complete: 27.5%
Iteration: 12; Percent complete: 30.0%
Iteration: 13; Percent complete: 32.5%
Iteration: 14; Percent complete: 35.0%
Iteration: 15; Percent complete: 37.5%
Iteration: 16; Percent complete: 40.0%
Iteration: 17; Percent complete: 42.5%
Iteration: 18; Percent complete: 45.0%
Iteration: 19; Percent complete: 47.5%
Iteration: 20; Percent complete: 50.0%
Iteration: 21; Percent complete: 52.5%
Iteration: 22; Percent complete: 55.0%
Iteration: 23; Percent complete: 57.5%
Iteration: 24; Percent complete: 60.0%
Iteration: 25; Percent complete: 62.5%
Iteration: 26; Percent complete: 65.0%

<convokit.model.corpus.Corpus at 0x7ff539d91a50>

In [None]:
forecasts_df = forecaster.summarize(corpus)

NameError: name 'forecaster' is not defined

In [None]:
forecasts_df.head(20)

utt_id,prediction,pred_score
800622928.18454.18454,1.0,0.98963
351871224.50472.50472,1.0,0.988888
409048245.4938.4938,1.0,0.988836
751475142.54124.54124,1.0,0.988621
308491753.38115.38115,1.0,0.988546
404257585.20200.20200,1.0,0.988429
159022461.6705.6705,1.0,0.988336
746296311.83642.83642,1.0,0.988175
117788418.23770.23770,1.0,0.987995
18657304.7525.7525,1.0,0.987427
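
A table like this can be post-processed with ordinary Python. The sketch below is not part of the original notebook: it parses the first three rows shown above from a CSV string, using the `prediction` and `pred_score` column names set in the `Forecaster` call, and lists the utterances whose score exceeds a threshold. The `THRESHOLD` value of 0.5 is an assumption chosen for illustration and is not necessarily the decision threshold CRAFT applies.

```python
import csv
import io

# First three rows of the forecast table, copied verbatim from the output above.
TABLE = """utt_id,prediction,pred_score
800622928.18454.18454,1.0,0.98963
351871224.50472.50472,1.0,0.988888
409048245.4938.4938,1.0,0.988836
"""

THRESHOLD = 0.5  # assumed cutoff, for illustration only

rows = list(csv.DictReader(io.StringIO(TABLE)))

# Map each utterance id to its forecast probability.
scores = {row["utt_id"]: float(row["pred_score"]) for row in rows}

# Utterances whose forecast probability exceeds the assumed threshold.
flagged = [utt_id for utt_id, score in scores.items() if score > THRESHOLD]

# Utterance with the highest forecast probability.
top_id = max(scores, key=scores.get)

print(len(flagged), top_id)
```

With the real `forecasts_df`, the same filtering could be done directly on the DataFrame; the CSV string is used here only to keep the example self-contained.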
