## Preliminaries

In [1]:
import convokit

In [2]:
ROOT_DIR = '/kitchen/convokit_corpora/tennis-corpus/'

In [3]:
corpus = convokit.Corpus(ROOT_DIR)
corpus.load_info('utterance', ['arcs_censored', 'question_motifs', 'question_motifs__sink'])

In [4]:
VERBOSITY = 10000

## Using PromptTypes to get types of questions

In [5]:
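def question_filter(utt, aux_input=None):
    # prompts: utterances tagged as questions
    return utt.meta['is_question']

def response_filter(utt, aux_input=None):
    # references: non-question utterances that reply to another utterance
    return (not utt.meta['is_question']) and (utt.reply_to is not None)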
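
To make the effect of the two filters concrete, here is a minimal sketch that runs them on stand-in objects instead of real ConvoKit utterances. `make_utt` is a hypothetical helper built on `types.SimpleNamespace`; it only mimics the `meta` and `reply_to` attributes that the filters read, and the id `'q1'` is made up for illustration. The filters are restated so the block runs on its own.

```python
from types import SimpleNamespace

def question_filter(utt, aux_input=None):
    return utt.meta['is_question']

def response_filter(utt, aux_input=None):
    return (not utt.meta['is_question']) and (utt.reply_to is not None)

def make_utt(is_question, reply_to):
    # Hypothetical stand-in for a ConvoKit utterance: only the two
    # attributes used by the filters are provided.
    return SimpleNamespace(meta={'is_question': is_question}, reply_to=reply_to)

examples = {
    'question': make_utt(True, None),   # a question that opens an exchange
    'answer': make_utt(False, 'q1'),    # a non-question replying to 'q1'
    'orphan': make_utt(False, None),    # a non-question with no parent
}

# For each example: (selected as prompt?, selected as reference?)
results = {name: (question_filter(u), response_filter(u))
           for name, u in examples.items()}
for name, flags in results.items():
    print(name, flags)
```

Under these assumptions, only the question passes the prompt filter and only the answer passes the reference filter; a non-question with no parent is picked up by neither.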

In [6]:
from convokit.prompt_types import PromptTypes

In [7]:
pt = PromptTypes(prompt_field='question_motifs', ref_field='arcs_censored',
                 prompt_filter=question_filter, ref_filter=response_filter,
                 prompt_transform_field='question_motifs__sink',
                 output_field='prompt_types', prompt__tfidf_min_df=50,
                 ref__tfidf_min_df=50,
                 random_state=1000, verbosity=1)

In [8]:
pt.fit(corpus)

fitting 64571 input pairs
fitting ref tfidf model
fitting prompt tfidf model
fitting svd model
fitting 8 prompt types


In [9]:
test_utt_id = '5188_0.q'
utt = corpus.get_utterance(test_utt_id)

In [10]:
utt.text

'How do you feel? Watching the Australian Open it was very scary watching your ankle buckle. How does your ankle feel now?'

In [11]:
utt.get_info('question_motifs__sink')

['feel_*__feel_do__how>*',
 'feel_*__feel_does__feel_how feel_*__feel_now__how>*']
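
The motif strings above can be hard to read at a glance. The sketch below splits them apart, assuming (from the look of this output, not from a documented format) that each list element holds space-separated motifs and that each motif is a set of arcs joined by `__`. `split_motifs` is a hypothetical helper, not part of ConvoKit.

```python
# Value copied from the output of utt.get_info('question_motifs__sink') above.
motif_field = ['feel_*__feel_do__how>*',
               'feel_*__feel_does__feel_how feel_*__feel_now__how>*']

def split_motifs(elements):
    """For each list element, return its motifs, each as a list of arcs.

    Assumes motifs are separated by whitespace and arcs by '__'.
    """
    return [[motif.split('__') for motif in element.split()]
            for element in elements]

parsed = split_motifs(motif_field)
for idx, motifs in enumerate(parsed):
    for arcs in motifs:
        print(idx, arcs)
```

Read this way, the first element carries a single three-arc motif and the second carries two motifs that share the `feel_*` arc.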

### Per-utterance behavior

In [12]:
utt = pt.transform_utterance(utt)

In [13]:
utt.get_info('prompt_types__prompt_repr')

[0.04614480560093584,
 0.12523176450740134,
 -0.054075221364537254,
 -0.16488221294276534,
 -0.2622154385109799,
 0.06781246550773161,
 0.3161285586302379,
 -0.11348710370703316,
 -0.30383347022247026,
 -0.36553848329504285,
 0.3362904355819349,
 0.048598494761011045,
 -0.2667895757175696,
 -0.14162106717317796,
 0.45694706042159894,
 -0.09758869692105782,
 0.06397073233732087,
 -0.04206323851119204,
 0.07013280562733451,
 0.007080600454132783,
 0.1910828848593811,
 -0.15056265464413737,
 0.21175490736928645,
 0.010463721267544844]

In [18]:
utt.get_info('prompt_types__prompt_dists.8')

[1.1542767147522681,
 1.1836322878230092,
 1.1403839094093262,
 1.080862067095506,
 1.3137305097357392,
 0.36497031797843443,
 1.3267111408021754,
 1.1126782406483375]

In [19]:
utt.get_info('prompt_types__prompt_type.8')

5.0

In [20]:
utt.get_info('prompt_types__prompt_type_dist.8')

0.36497031797843443
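
The three fields above are consistent with a nearest-centroid reading: the assigned type is the index of the smallest entry in `prompt_dists.8`, and `prompt_type_dist.8` is that smallest distance. The sketch below checks this on the printed values; it is an observation about this output, not a description of how ConvoKit computes the fields internally.

```python
# Distances copied from utt.get_info('prompt_types__prompt_dists.8') above.
prompt_dists = [1.1542767147522681,
                1.1836322878230092,
                1.1403839094093262,
                1.080862067095506,
                1.3137305097357392,
                0.36497031797843443,
                1.3267111408021754,
                1.1126782406483375]

# Index of the closest type, and the distance to it.
type_id = min(range(len(prompt_dists)), key=prompt_dists.__getitem__)
type_dist = prompt_dists[type_id]

print(type_id, type_dist)
```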

### Model output

In [49]:
for i in range(8):
    print(i)
    pt.display_type(i, corpus=corpus, k=5)
    print('\n\n')

0
top prompt:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
try_*,0.664072,1.090674,1.31958,1.249188,1.087935,1.363645,1.155572,1.191221,0.0
how>*__keep_*__keep_how,0.669997,1.136478,1.267361,1.099178,1.043041,1.360519,1.171606,1.098759,0.0
do_*__do_do__do_what,0.686736,0.887752,1.325944,1.289076,1.022363,1.353739,1.343151,1.068488,0.0
do_*__do_do,0.690401,0.978398,1.269106,1.202296,0.977706,1.297059,1.308501,1.003302,0.0
prepare_*__prepare_for,0.692725,0.925045,1.307146,1.273835,1.118497,1.314042,1.307853,1.204177,0.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
work_for,0.690725,1.043187,1.39162,1.146133,1.049347,1.270692,1.24596,1.159645,0.0
work_*,0.693785,1.098218,1.385079,1.114691,1.074291,1.277762,1.233976,1.176795,0.0
control_*,0.697097,1.072021,1.325468,1.25293,1.098485,1.391434,1.204093,1.116907,0.0
focus_on,0.709495,0.960281,1.25995,1.186321,1.000408,1.26875,1.24174,1.216111,0.0
focus_*,0.710258,0.970667,1.266089,1.192621,1.004985,1.265143,1.228822,1.21364,0.0


top prompts:
2020_8.q On a physical level, you're stronger maybe than ever. How do you work hard if the tournament is so close? What do you do in the gym? How do you keep the regimen going on?
['how>*__how>do work_*', 'do_*__do_in__do_what__what>* do_*__do_what__what>*__what>do', 'how>*__how>do how>*__keep_*__keep_how keep_*__keep_do']

887_3.q When you're in command of a match like that, do you actually try different things or work on something?
['try_*__try_do when>* work_*']

4182_6.q In the beginning of this season you lost in Melbourne, the final. Did you try to make a difference in preparation before today's game and why it didn't work?
['did>* try_* work_*']

3180_28.q How do you do that? What do you do?
['do_*__do_do do_*__do_how how>*__how>do', 'do_*__do_what__what>*__what>do']

5541_2.q What do you do between now and Wimbledon? What things do you work on?
['do_*__do_what__what>*__what>do', 'what>* work_*']

top responses:
3881_6.a No, it's me definitely. I'm really trying. Th

Unnamed: 0,0,1,2,3,4,5,6,7,type_id
will>*,1.029253,0.532545,1.153288,1.282362,0.991471,1.335638,1.512199,1.106152,1.0
going_*__going_are,1.021476,0.592999,1.26801,1.30461,1.115215,1.279629,1.498275,1.178506,1.0
are>*__going_*,1.018917,0.593653,1.240439,1.298572,1.066024,1.293147,1.486427,1.176297,1.0
are_*__are_what,1.030714,0.594019,1.008208,1.33854,1.127179,1.420803,1.55172,1.017175,1.0
be_*__be_what,1.09422,0.599262,1.142316,1.287656,1.027747,1.246965,1.520386,1.010938,1.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
have_'ll,1.081366,0.672651,1.094378,1.260036,1.113196,1.343347,1.566586,0.956881,1.0
have_hopefully,1.033055,0.725882,1.191892,1.187055,1.164994,1.125782,1.537591,1.062537,1.0
'm_yet,1.074761,0.73307,1.270383,1.236572,1.094308,1.051111,1.488099,1.045396,1.0
'm_worried,1.078581,0.76951,1.244417,1.164254,1.016196,1.077727,1.463963,1.045015,1.0
take_will,0.820443,0.777609,1.284624,1.265605,1.041802,1.174257,1.420946,1.139167,1.0


top prompts:
6071_1.q You have been working with Thomas Enqvist for a couple of months now. What are your thoughts on your cooperation so far? Will you continue to work together throughout the year?
['are_*__what>*', 'will>*']

3880_19.q Will there be lots more Grand Slam champions out of China in tennis now, do you think?
['think_*__think_be__think_do will>*']

5213_7.q What are your plans for offseason? Any vacation? Are you going to the Davis Cup final? Any plans on that?
['are_*__what>*', '', 'are>*__going_*', '']

3720_11.q We know how tough David Ferrer can be. Are you going to try to do some things differently against him? What are your thoughts on that matchup?
['are>*__going_*', 'are_*__what>*']

6183_7.q Are you going to stay on in Paris for a little bit? What are your plans for the grass season?
['are>*__going_*', 'are_*__what>*']

top responses:
4347_5.a No, because I already achieved my goal. I would love to finish No.1 as well for the end of the year. For that it's going 

Unnamed: 0,0,1,2,3,4,5,6,7,type_id
think_*__think_what,1.197465,1.046705,0.670099,1.29163,1.045561,1.467219,1.330959,0.945044,2.0
have>*__played_*__played_have,1.156198,1.13816,0.675621,1.312374,1.293401,1.25834,1.252053,1.210052,2.0
played_*,1.178912,1.191127,0.67999,1.308993,1.325899,1.257607,1.211431,1.227966,2.0
think_*__what>*,1.213091,1.056169,0.682394,1.30661,1.083586,1.480133,1.323852,0.956038,2.0
think_*__think_of__think_what,1.215357,1.096311,0.689644,1.305629,1.028032,1.43072,1.270178,0.955797,2.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
won_*,1.258377,1.196535,0.558545,1.285411,1.24632,1.287155,1.200692,1.188083,2.0
won_mean,1.205385,1.143031,0.562176,1.258819,1.255426,1.205559,1.334921,1.057083,2.0
won_know,1.223428,1.232449,0.585416,1.24203,1.263787,1.247248,1.207182,1.150428,2.0
won_in,1.259561,1.19966,0.612412,1.269198,1.275801,1.204014,1.205743,1.195606,2.0
know_well,1.165847,1.012158,0.623633,1.232463,1.062415,1.381328,1.374095,0.911233,2.0


top prompts:
1622_5.q How big is the difference between this sort of match, you've been playing a lot of futures match, but I don't know how many challengers you've been playing, and a junior Grand Slam, for instance?
['how>*__know_* how>*__playing_* know_*__know_do playing_*__playing_is']

5427_16.q What do you know of her as a player and how would you describe your respective styles?
['describe_*__describe_how know_*__know_do__what>*']

6432_5.q You were playing tennis way above your ranking in the first set. Have you ever played that level before, and do you think you can now start playing that level consistently?
['have>*__played_*__played_have played_*__played_before__played_have think_*__think_do']

3111_8.q From your matches earlier, what do you remember from those matches?
['from>* remember_*__remember_what']

1778_15.q What do you make of the notion that squash shots are occasionally being played or shots that come out of the squash game are being played today?
['played_* what

Unnamed: 0,0,1,2,3,4,5,6,7,type_id
been_*__been_has,1.154575,1.274408,1.130674,0.436758,1.096999,1.095909,1.290464,1.026623,3.0
been_*,1.153646,1.269146,1.159866,0.454188,1.052782,1.204385,1.238575,1.107916,3.0
been_*__what>*,1.116167,1.220201,1.208998,0.481859,1.095586,1.181512,1.2815,1.121013,3.0
been_*__have>*,1.093404,1.215954,1.179324,0.491063,0.962101,1.219532,1.249094,1.098328,3.0
been_*__been_has__been_there,1.108008,1.257066,1.147437,0.502284,1.116277,1.185385,1.279214,1.087618,3.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
been_always,1.157884,1.22284,1.166764,0.588508,1.16769,1.243089,1.313821,1.140989,3.0
been_great,1.163742,1.210095,1.173097,0.590712,1.175477,1.215457,1.31981,1.16759,3.0
been_with,1.14255,1.218397,1.184616,0.597394,1.173116,1.223021,1.303265,1.165604,3.0
been_really,1.158042,1.221788,1.193701,0.603344,1.167613,1.218293,1.299536,1.173653,3.0
been_just,1.156198,1.23969,1.208971,0.604555,1.197551,1.201427,1.282709,1.199582,3.0


top prompts:
6243_3.q You have a Chinese sponsor, Li Ning. How has that been after winning the Open? Have there been a lot of activities, more so than you might have had to do a couple years ago?
['been_*__been_has__how>* how>*__how>has', 'been_*__been_there been_*__have>*']

2026_7.q What have the demands on your time been like since Sports Illustrated? Obviously that's a very high profile thing to be in. Has that changed your life a lot or at all?
['been_*__been_have been_*__what>*', 'changed_*__changed_has has>*']

5709_2.q A lot of players have found it difficult to refind the consistency after their first Grand Slam title. What has been difficult for you? What have been the challenges? Has Novak been able to give you any advice to see you through this period?
['been_*__been_for been_*__been_has__what>* been_*__been_what__what>* what>*__what>has', 'been_*__been_have been_*__been_what__what>* what>*__what>have', 'been_*__been_able been_*__been_has__has>*']

3897_6.q Li Na being the 

Unnamed: 0,0,1,2,3,4,5,6,7,type_id
say_*__say_did__what>*,1.055753,1.105856,1.172143,1.06495,0.54102,1.362242,1.036849,1.121043,4.0
say_*__say_to__say_what__what>*,1.06516,1.122508,1.180323,1.02556,0.543373,1.350329,1.045646,1.097821,4.0
say_*__say_did__say_to__say_what__what>*,1.06301,1.130922,1.184778,1.044063,0.558098,1.348753,1.012438,1.135273,4.0
say_*__what>*,1.039548,1.07504,1.091245,1.053678,0.559151,1.382721,1.125849,1.048972,4.0
talk_*__talk_did,1.041667,1.153686,1.140871,1.014077,0.566663,1.426454,1.008444,1.090276,4.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
asked_*,1.087368,1.111035,1.144027,1.034541,0.547727,1.415786,1.069267,1.140354,4.0
sitting_*,1.087271,1.186492,1.214257,0.942601,0.560868,1.427531,1.008696,1.110207,4.0
oh>*,1.119154,1.096515,1.148108,1.029201,0.56144,1.354423,1.153463,0.966446,4.0
said_well,1.069284,1.134293,1.179574,1.083234,0.562052,1.345357,1.022609,1.131403,4.0
told_was,1.068833,1.189593,1.167921,1.069154,0.57305,1.397914,0.936245,1.139699,4.0


top prompts:
4063_10.q You didn't talk to her in Brisbane?
['talk_*__talk_did talk_*__talk_to']

1859_16.q Have you spoken to any of your Fed Cup teammates this week? And have they told you how they feel about you coming back into the tomorrow?
['spoken_*__spoken_have__spoken_to', 'and>* told_*']

3923_3.q Did you speak to him after it? Did you talk to David? Did he apologize?
['did>* speak_*', 'did>*__talk_* talk_*__talk_to', 'did>*']

5263_7.q Who did you talk to during the delay?
['talk_*__talk_did talk_*__talk_to who>*']

3321_7.q Did you talk to Victoria after the match yet? Have you had a chance to ask her what happened?
['did>*__talk_* talk_*__talk_to', 'had_*__have>*']

top responses:
3781_8.a No. He just asked me if it's the shoulder and I said yes. That's it.
['no>* no_*', "asked_'s asked_* asked_just", 'and>* said_* said_yes', "'s_*"]

4253_17.a No. I mean my coach he was always sitting alone, and my team, my father and my boyfriend and physio they were sitting in my chair a

Unnamed: 0,0,1,2,3,4,5,6,7,type_id
feel_*__feel_now,1.099974,1.129066,1.2396,1.152036,1.289675,0.408886,1.31432,1.155465,5.0
feeling_*__feeling_are__how>*,1.176024,1.208878,1.2067,1.086709,1.312381,0.409343,1.260881,1.214859,5.0
do>*__feel_*__feel_do__feel_playing,1.14669,1.193027,1.152799,1.152215,1.346524,0.421844,1.32292,1.127076,5.0
feel_*__feel_do,1.104209,1.117418,1.203544,1.142313,1.265165,0.42657,1.373412,1.044297,5.0
feel_*__feel_do__how>*,1.190418,1.176945,1.124022,1.17844,1.381882,0.441167,1.298054,1.212602,5.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
'm_physically,1.155682,1.143277,1.217395,1.125303,1.235212,0.474613,1.266853,1.146912,5.0
'm_tired,1.137926,1.12598,1.173303,1.148966,1.239097,0.508785,1.301687,1.113153,5.0
feel_physically,1.144334,1.177881,1.209186,1.12304,1.27997,0.513591,1.289905,1.150014,5.0
feel_confident,1.12119,1.16485,1.167346,1.159332,1.300933,0.515009,1.304628,1.136917,5.0
feel_good,1.144566,1.172629,1.182966,1.159265,1.282155,0.527436,1.289555,1.142724,5.0


top prompts:
343_2.q You have an extra difficult game where you have to use an extra spin of your game. So given the fact that you have to play some extra games  Nadal now is out, which gives you maybe an extra motivation. How do you feel? Do you feel extra motivated for the rest of the tournament?
['feel_*__feel_do__how>*', 'do>*__feel_*']

5188_0.q How do you feel? Watching the Australian Open it was very scary watching your ankle buckle. How does your ankle feel now?
['feel_*__feel_do__how>*', 'feel_*__feel_does__feel_how feel_*__feel_now__how>*']

15_2.q Physically and mentally do you feel okay?
['feel_*__feel_do feel_*__feel_physically physically>*']

2308_4.q You played two and a half hours two days ago, two hours fifteen minutes today. Aggressive tennis. Physically do you feel fine?
['feel_*__feel_do feel_*__feel_physically physically>*']

4595_2.q Is that the longest match of your career? How are you feeling physically now?
['is>*', 'feeling_*__feeling_are__feeling_now feeling_

Unnamed: 0,0,1,2,3,4,5,6,7,type_id
was_*,1.11005,1.489184,1.158403,1.239469,1.086001,1.336831,0.347455,1.371045,6.0
was_*__what>*,1.089684,1.455052,1.173272,1.261514,1.070273,1.346836,0.355167,1.40352,6.0
was_*__was_what,1.081862,1.449563,1.176958,1.239616,1.033535,1.359986,0.359882,1.39284,6.0
was_*__was_what__what>*,1.087176,1.448471,1.174467,1.260128,1.056595,1.348275,0.363705,1.402448,6.0
were_*,1.101461,1.442414,1.157626,1.23866,0.984888,1.354865,0.393764,1.371292,6.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
was_just,1.088369,1.454979,1.250559,1.193605,1.033478,1.309714,0.407539,1.378404,6.0
was_actually,1.103721,1.439569,1.192116,1.176255,0.951562,1.354406,0.414391,1.344319,6.0
was_yeah,1.143058,1.46507,1.164378,1.250613,1.109954,1.339046,0.423675,1.390772,6.0
gave_*,1.126723,1.460922,1.183128,1.199738,1.061919,1.282884,0.432826,1.424482,6.0
was_mean,1.144432,1.468158,1.178777,1.22542,1.079049,1.287545,0.433126,1.350299,6.0


top prompts:
1226_7.q Youzhny is a player with a lot of big wins.  He's won against Nadal, won against Federer.  Was it a sufficient match, and what was the approach for you before you went into the game?
['was>* was_*__was_what']

3419_4.q You somehow changed your game towards the end. Was it what your coach that told you to change it?
['was>* was_*__was_what']

3726_12.q Then your reaction at the end of the match, what was your feeling? Was it happy? Relief? A little bit of unhappiness because you almost let it go?
['was_*__was_what', 'was>*', '', '']

1984_8.q Was it mental or what was it?
['was>* was_*__was_what']

5869_5.q How much were you following the US Open at all? What was your reaction to seeing Marin and Kei in the final? Was the result surprising to you?
['how>*', 'was_*__was_what__what>*', 'was>*']

top responses:
2850_8.a Actually, September 1st was the first day back to school, and I missed it (smiling). But it started already. So my friends attended some classes alrea

Unnamed: 0,0,1,2,3,4,5,6,7,type_id
do>*__think_'s__think_*,1.155789,0.99674,1.013655,1.067685,1.062727,1.260903,1.513347,0.531644,7.0
is>*,1.06249,1.067104,1.080466,1.062832,1.071693,1.18933,1.466067,0.538167,7.0
think_'s__think_*,1.169565,0.959485,1.010322,1.08559,1.038018,1.252852,1.516804,0.552625,7.0
like_*__like_do,1.071264,0.986405,1.009678,1.153968,1.063828,1.215064,1.495786,0.581972,7.0
is_*__is_like,1.128784,1.161261,1.067291,0.934017,1.014746,1.132462,1.387028,0.589775,7.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
think_is,1.12977,1.083397,0.934999,1.095067,1.139335,1.220646,1.479282,0.544207,7.0
is_there,1.125157,1.102398,1.057348,1.093204,1.092229,1.258255,1.426755,0.552333,7.0
think_are,1.147441,1.094366,0.939655,1.069253,1.088532,1.184831,1.462403,0.561257,7.0
think_'s,1.135482,1.098581,0.953449,1.067127,1.089396,1.259378,1.460707,0.574519,7.0
's_about,1.088023,1.117607,1.035078,1.076636,1.064573,1.207458,1.427095,0.577848,7.0


top prompts:
6218_1.q Do you think it's just confidence? Is that something that confidence can do in those big moments?
["do>*__think_'s__think_*", 'is>*']

1802_3.q It's obviously very entertaining to watch slices, lobs and dropshots. Do you think it's possible to do that against some of the players you might face later? Is that just a one-off today?
["do>*__think_'s__think_*", 'is>*']

1389_5.q The crowd likes you very much because not only do you play great tennis, but also you have the quality of being able to have fun and joke on the court. But now you seem more calm. Is it because you really want to focus and be the best in tennis? Do you think it's sort of a better discipline?
['is>*', "do>*__think_'s__think_*"]

5447_2.q I know that injuries are a personal thing. There's been a big trend of people pulling out today. Do you think it's something about the surface? Is this all just a coincidence?
["do>*__think_'s__think_*", 'is>*']

4898_10.q Do you think it's about patience durin

### Transforming the entire corpus

In [21]:
corpus = pt.transform(corpus)

In [22]:
utt1 = corpus.get_utterance('2020_8.q')

In [23]:
utt1.text

"On a physical level, you're stronger maybe than ever. How do you work hard if the tournament is so close? What do you do in the gym? How do you keep the regimen going on?"

In [24]:
utt1.get_info('prompt_types__prompt_type.8')

0.0

## Storing models

In [25]:
import os

In [26]:
pt.dump_model(os.path.join(ROOT_DIR, 'pt_model'))

dumping embedding model
dumping training embeddings
dumping type model 8


In [27]:
new_pt = PromptTypes(prompt_field='question_motifs', ref_field='arcs_censored',
                     prompt_filter=question_filter, ref_filter=response_filter,
                     prompt_transform_field='question_motifs__sink',
                     output_field='prompt_types_new', prompt__tfidf_min_df=50,
                     ref__tfidf_min_df=50,
                     random_state=1000, verbosity=1)

In [28]:
pt_model_dir = os.path.join(ROOT_DIR, 'pt_model')
!ls $pt_model_dir

km_model.8.joblib	   svd_model.joblib	   train_ref_ids.npy
prompt_df.8.tsv		   train_prompt_df.8.tsv   train_ref_vects.npy
prompt_tfidf_model.joblib  train_prompt_ids.npy    U_prompt.npy
ref_df.8.tsv		   train_prompt_vects.npy  U_ref.npy
ref_tfidf_model.joblib	   train_ref_df.8.tsv


In [29]:
new_pt.load_model(pt_model_dir)

loading embedding model
loading training embeddings
loading type model 8


In [30]:
utt = new_pt.transform_utterance(utt)

In [32]:
utt.get_info('prompt_types_new__prompt_type.8')

5.0

## Changing the number of types

In [35]:
pt.refit_types(4)

fitting 4 prompt types


In [47]:
for i in range(4):
    print(i)
    pt.display_type(i, type_key=4, k=5)
    print('\n\n')

0
top prompt:


Unnamed: 0,0,1,2,3,type_id
will>*,0.63095,1.298031,1.065377,1.418889,0.0
are>*__going_*,0.679646,1.262736,1.145024,1.406238,0.0
are_*__are_what,0.686385,1.343151,0.991287,1.467312,0.0
going_*__going_are,0.690053,1.25175,1.158506,1.423621,0.0
going_*,0.706212,1.269886,1.170854,1.368935,0.0


top response:


Unnamed: 0,0,1,2,3,type_id
take_will,0.725783,1.161619,1.122077,1.34526,0.0
know_yet,0.739357,1.295608,1.023017,1.251791,0.0
have_'ll,0.749098,1.28924,0.960487,1.480158,0.0
will_*,0.785161,1.262358,1.010793,1.288739,0.0
have_hopefully,0.78672,1.110688,1.054252,1.460314,0.0





1
top prompt:


Unnamed: 0,0,1,2,3,type_id
do>*__feel_*__feel_do__feel_playing,1.170831,0.508297,1.12055,1.303392,1.0
feeling_*__feeling_are__how>*,1.177613,0.512859,1.191507,1.243356,1.0
feel_*__feel_do__how>*,1.167678,0.512973,1.191546,1.284891,1.0
feel_*__feel_now,1.107351,0.532761,1.152466,1.29033,1.0
feeling_*__how>*,1.212046,0.532931,1.227951,1.174939,1.0


top response:


Unnamed: 0,0,1,2,3,type_id
feel_confident,1.135918,0.578863,1.127169,1.281757,1.0
'm_physically,1.128963,0.581577,1.130372,1.243505,1.0
feel_physically,1.152812,0.583199,1.137885,1.266194,1.0
feel_good,1.147258,0.594057,1.132743,1.265594,1.0
feels_yeah,1.203118,0.597427,1.026486,1.326435,1.0





2
top prompt:


Unnamed: 0,0,1,2,3,type_id
do>*__think_'s__think_*,1.004293,1.195127,0.598695,1.435172,2.0
think_'s__think_*,0.981542,1.197007,0.609744,1.43512,2.0
think_*__think_do,0.93545,1.225525,0.627383,1.424596,2.0
is_*__is_like,1.111017,1.074835,0.628596,1.316906,2.0
is>*,1.026136,1.131318,0.630521,1.394571,2.0


top response:


Unnamed: 0,0,1,2,3,type_id
think_is,1.060018,1.14232,0.611579,1.409632,2.0
think_are,1.066465,1.114231,0.612049,1.389874,2.0
think_'s,1.060338,1.174667,0.629274,1.386505,2.0
is_there,1.073984,1.184221,0.643225,1.359634,2.0
think_now,0.976245,1.106304,0.644016,1.384676,2.0





3
top prompt:


Unnamed: 0,0,1,2,3,type_id
was_*__was_what,1.295417,1.241937,1.300852,0.428956,3.0
was_*__what>*,1.304098,1.22849,1.314929,0.437765,3.0
was_*__was_what__what>*,1.297477,1.230933,1.312691,0.439647,3.0
were_*,1.288189,1.243271,1.277479,0.44014,3.0
was_*,1.335885,1.213531,1.286668,0.445144,3.0


top response:


Unnamed: 0,0,1,2,3,type_id
was_actually,1.290001,1.245703,1.251458,0.445544,3.0
was_just,1.308849,1.208106,1.294344,0.471725,3.0
took_so,1.248846,1.235757,1.286594,0.484393,3.0
gave_*,1.313197,1.169273,1.33038,0.488466,3.0
was_surprised,1.299895,1.245011,1.271755,0.492075,3.0







## Example variation: using arcs instead of motifs

In [44]:
pt_arcs = PromptTypes(prompt_field='arcs_censored', ref_field='arcs_censored',
                      prompt_filter=question_filter, ref_filter=response_filter,
                      prompt_transform_field='arcs_censored',
                      output_field='prompt_types_arcs', prompt__tfidf_min_df=50,
                      ref__tfidf_min_df=50, n_types=4,
                      random_state=1000, verbosity=1)

In [45]:
pt_arcs.fit(corpus)

fitting 78215 input pairs
fitting ref tfidf model
fitting prompt tfidf model
fitting svd model
fitting 4 prompt types


In [46]:
for i in range(4):
    print(i)
    pt_arcs.display_type(i, k=5)
    print('\n\n')

0
top prompt:


Unnamed: 0,0,1,2,3,type_id
feel_does,0.665389,0.96588,1.331548,1.141363,0.0
feel_do,0.674004,1.103788,1.349711,1.08503,0.0
feel_now,0.702311,1.164617,1.280405,1.125567,0.0
feel_*,0.709954,1.191471,1.180174,1.217915,0.0
feeling_now,0.720253,1.154975,1.247254,1.152868,0.0


top response:


Unnamed: 0,0,1,2,3,type_id
'm_fit,0.656866,1.124501,1.303152,1.183858,0.0
feels_good,0.675811,1.071909,1.271421,1.159534,0.0
feels_*,0.685173,1.041319,1.307112,1.151629,0.0
feels_yeah,0.693127,1.113668,1.293664,1.172959,0.0
feel_been,0.695478,1.13734,1.295989,1.19268,0.0





1
top prompt:


Unnamed: 0,0,1,2,3,type_id
like_*,1.066842,0.592536,1.369603,0.930745,1.0
there>*,1.080142,0.608612,1.203342,1.065093,1.0
's_*,1.056124,0.616678,1.432085,0.859614,1.0
is>*,0.982752,0.618435,1.411804,0.96311,1.0
wonder_*,1.109403,0.626687,1.291283,0.964637,1.0


top response:


Unnamed: 0,0,1,2,3,type_id
understand_*,1.058724,0.589922,1.250244,1.024241,1.0
like_do,0.998437,0.618985,1.370673,0.925776,1.0
think_do,1.075442,0.622273,1.307021,0.957153,1.0
respect_*,1.064281,0.626583,1.296694,0.978071,1.0
's_like,1.082253,0.633566,1.27681,1.043093,1.0





2
top prompt:


Unnamed: 0,0,1,2,3,type_id
was_*,1.173871,1.256789,0.422653,1.369839,2.0
was>*,1.172642,1.250716,0.44274,1.372602,2.0
were_*,1.170927,1.227038,0.443781,1.349003,2.0
was_what,1.191153,1.272507,0.444555,1.357529,2.0
were_in,1.154412,1.246567,0.453565,1.358357,2.0


top response:


Unnamed: 0,0,1,2,3,type_id
was_actually,1.180324,1.244486,0.464286,1.354411,2.0
was_yeah,1.192064,1.305419,0.47827,1.36117,2.0
was_just,1.157896,1.277145,0.481335,1.366745,2.0
was_really,1.156121,1.291358,0.483771,1.37089,2.0
was_there,1.161162,1.242514,0.486821,1.382533,2.0





3
top prompt:


Unnamed: 0,0,1,2,3,type_id
are_what,1.232947,1.039911,1.451706,0.653125,3.0
expect_do,1.152844,1.033127,1.46143,0.659696,3.0
will>*,1.197333,1.060435,1.453046,0.663972,3.0
be_will,1.17233,1.108597,1.459242,0.682239,3.0
going_'re,1.149314,1.054815,1.458452,0.697464,3.0


top response:


Unnamed: 0,0,1,2,3,type_id
see_'ll,1.100716,1.058801,1.395763,0.718711,3.0
have_'ll,1.187852,0.978814,1.476159,0.729195,3.0
see_happens,1.135575,1.113555,1.375629,0.743777,3.0
see_will,1.091617,1.055427,1.348109,0.745719,3.0
see_goes,1.092629,1.105348,1.381137,0.746129,3.0







In [54]:
df = pt.train_types[8]['prompt_df']

In [55]:
df['type_id'].value_counts()

 7.0    17273
 6.0    13076
 1.0     8328
-1.0     7765
 4.0     5791
 5.0     4379
 2.0     4213
 3.0     1775
 0.0     1567
Name: type_id, dtype: int64
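
The counts above are easier to compare as shares of the training prompts. The sketch below recomputes them from the printed numbers. The `-1.0` row is presumably a null type given to prompts that are too far from every centroid (ConvoKit's documentation describes a `max_dist` cutoff for this), but that reading is an assumption and is not shown in this notebook.

```python
# Counts copied from the type_id value counts printed above.
type_counts = {7.0: 17273, 6.0: 13076, 1.0: 8328, -1.0: 7765, 4.0: 5791,
               5.0: 4379, 2.0: 4213, 3.0: 1775, 0.0: 1567}

total = sum(type_counts.values())
shares = {type_id: count / total for type_id, count in type_counts.items()}

# Print the types from most to least frequent.
for type_id, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{type_id:>5}: {share:.1%}")

largest = max(shares, key=shares.get)
```

Seen as shares, the distribution is quite uneven, with types 7 and 6 together covering close to half of the prompts.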

In [56]:
pt.display_type(7, corpus=corpus, k=15)

top prompt:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
do>*__think_'s__think_*,1.155789,0.99674,1.013655,1.067685,1.062727,1.260903,1.513347,0.531644,7.0
is>*,1.06249,1.067104,1.080466,1.062832,1.071693,1.18933,1.466067,0.538167,7.0
think_'s__think_*,1.169565,0.959485,1.010322,1.08559,1.038018,1.252852,1.516804,0.552625,7.0
like_*__like_do,1.071264,0.986405,1.009678,1.153968,1.063828,1.215064,1.495786,0.581972,7.0
is_*__is_like,1.128784,1.161261,1.067291,0.934017,1.014746,1.132462,1.387028,0.589775,7.0
like_*__like_do__like_what,1.15826,1.085909,1.07393,1.106,1.081123,1.172475,1.445129,0.596329,7.0
think_*__think_is__think_why,1.106006,1.153582,1.005937,1.060637,1.031982,1.275838,1.354624,0.596931,7.0
like_*__what>*__what>do,1.067488,1.098119,1.102917,1.078895,1.039799,1.149338,1.420797,0.602374,7.0
's_*,1.058483,0.873674,1.098032,1.07907,0.990816,1.222524,1.540646,0.610405,7.0
like_*__like_do__what>*,1.079468,1.042654,1.095969,1.105133,1.049147,1.168217,1.453664,0.610522,7.0


top response:


Unnamed: 0,0,1,2,3,4,5,6,7,type_id
think_is,1.12977,1.083397,0.934999,1.095067,1.139335,1.220646,1.479282,0.544207,7.0
is_there,1.125157,1.102398,1.057348,1.093204,1.092229,1.258255,1.426755,0.552333,7.0
think_are,1.147441,1.094366,0.939655,1.069253,1.088532,1.184831,1.462403,0.561257,7.0
think_'s,1.135482,1.098581,0.953449,1.067127,1.089396,1.259378,1.460707,0.574519,7.0
's_about,1.088023,1.117607,1.035078,1.076636,1.064573,1.207458,1.427095,0.577848,7.0
think_always,1.16045,1.146366,1.056997,1.030997,1.148802,1.090964,1.443062,0.579397,7.0
's_really,1.102912,1.138633,1.041846,1.07341,1.062061,1.272286,1.408632,0.579524,7.0
are_*,1.150882,1.140159,0.990419,1.088843,1.123055,1.227889,1.412462,0.584224,7.0
's_for,1.174044,1.151047,0.933325,1.103528,1.145505,1.252021,1.403483,0.584919,7.0
's_have,1.134782,1.183457,1.011616,1.062724,1.103204,1.238154,1.378383,0.584923,7.0


top prompts:
6218_1.q Do you think it's just confidence? Is that something that confidence can do in those big moments?
["do>*__think_'s__think_*", 'is>*']

1802_3.q It's obviously very entertaining to watch slices, lobs and dropshots. Do you think it's possible to do that against some of the players you might face later? Is that just a one-off today?
["do>*__think_'s__think_*", 'is>*']

1389_5.q The crowd likes you very much because not only do you play great tennis, but also you have the quality of being able to have fun and joke on the court. But now you seem more calm. Is it because you really want to focus and be the best in tennis? Do you think it's sort of a better discipline?
['is>*', "do>*__think_'s__think_*"]

5447_2.q I know that injuries are a personal thing. There's been a big trend of people pulling out today. Do you think it's something about the surface? Is this all just a coincidence?
["do>*__think_'s__think_*", 'is>*']

4898_10.q Do you think it's about patience durin