The notebook generates utterance pairs from tagged utterances, where tags are {B-START, I-START, B-OTHER, I-OTHER, O}.

From the start of every conversation, i.e. an utterance tagged B-START, an utterance-response pair is constructed by assuming that the next utterance in the conversation is the response. Pairing stops when the end of the conversation is reached.

e.g.
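
A minimal sketch of the pairing rule on a toy conversation. The utterances and the helper name `make_pairs` are hypothetical and not part of the notebook; the sketch illustrates the rule described above rather than the exact code used in the cells below.

```python
# Toy tagged paragraphs (hypothetical data); rows tagged 'O' are narration.
tagged = [
    ("Chapter 1", "O"),
    ('"Good morning."', "B-START"),
    ('"Good morning to you."', "B-OTHER"),
    ("She paused.", "O"),
    ('"Shall we walk?"', "B-OTHER"),
    ('"Is dinner ready?"', "B-START"),
    ('"Almost."', "B-OTHER"),
]


def make_pairs(rows):
    """Pair each utterance with the next utterance of the same conversation."""
    utterances = [(text, tag) for text, tag in rows if tag != "O"]
    pairs = []
    for (text, _), (next_text, next_tag) in zip(utterances, utterances[1:]):
        # A B-START on the next utterance opens a new conversation,
        # so the current utterance has no response to pair with.
        if next_tag != "B-START":
            pairs.append((text, next_text))
    return pairs


pairs = make_pairs(tagged)
for utter_1, utter_2 in pairs:
    print(utter_1, "->", utter_2)
```

Here the last utterance of the first conversation, `"Shall we walk?"`, is not paired with `"Is dinner ready?"`, because the latter starts a new conversation.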


# Import Libraries and Data

In [10]:
import pandas as pd
import os
import datetime
pd.set_option('display.max_colwidth', None)  # None disables truncation (-1 is deprecated since pandas 1.0)
pd.set_option("display.max_rows", 1000)

In [11]:
df_to_generate = pd.read_csv('../data/predicted/seq_label_ner-v4-pred.csv', index_col=[0])

In [12]:
df_to_generate.head()

Unnamed: 0,para,pred_label
0,Chapter 19,O
1,"The next day opened a new scene at Longbourn. Mr Collins made his declaration in form. Having resolved to do it without loss of time, as his leave of absence extended only to the following Saturday, and having no feelings of diffidence to make it distressing to himslf even at the moment, he set about it in a very orderly manner, with all the observances, which he supposed a regular part of the business. On finding Mrs Bennet, Elizabeth, and one of the younger girls together, soon after breakfast, he addressed the mother in these words:",O
2,"""May I hope, madam, for your interest with your fair daughter Elizabeth, when I solicit for the honour of a private audience with her in the course of this morning?""",B-START
3,"Before Elizabeth had time for anything but a blush of surprise, Mrs Bennet answered instantly,",O
4,"""Oh dear!—yes—certainly. I am sure Lizzy will be very happy—I am sure she can have no objection. Come, Kitty, I want you up stairs.""",B-OTHER


In [13]:
def add_chapters(df):
    # Forward-fill chapter headings: map every row index to the most recent
    # non-empty chapter tag seen so far ('' before the first heading).
    chapter_dict = dict()
    chapter_tag = ''
    for i in df.index:
        curr_chapter_tag = df.loc[i, 'chapter_tag']
        if curr_chapter_tag != '':
            chapter_tag = curr_chapter_tag
        chapter_dict[i] = chapter_tag
    return chapter_dict

In [14]:
df_to_generate['chapter_tag'] = df_to_generate['para'].apply(lambda x: x if 'chapter ' in x.lower() else '')
df_to_generate['chapter_tag'] = list(add_chapters(df_to_generate).values())

In [15]:
df_to_generate.shape

(281, 3)
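
The chapter tagging above amounts to a forward fill: a paragraph that looks like a chapter heading sets the current tag, and every following paragraph inherits it. A minimal standard-library sketch of the same idea, using hypothetical paragraphs (the variable names are illustrative and not from the notebook):

```python
from itertools import accumulate

# Hypothetical paragraphs; only the chapter headings carry a tag of their own.
paras = [
    "Chapter 19",
    "The next day opened a new scene.",
    '"May I hope, madam?"',
    "Chapter 20",
    '"Mr Collins was not left long."',
]

# Same heading test as the notebook: the paragraph contains 'chapter '.
raw_tags = [p if "chapter " in p.lower() else "" for p in paras]

# Carry the last non-empty tag forward over the empty ones.
filled_tags = list(accumulate(raw_tags, lambda prev, cur: cur or prev))

print(filled_tags)
# ['Chapter 19', 'Chapter 19', 'Chapter 19', 'Chapter 20', 'Chapter 20']
```

In pandas, the same effect could probably be obtained by replacing the empty strings with missing values and calling `Series.ffill()`.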

# Generate Utterance Pairs

In [16]:
# Keep only spoken utterances; rows tagged 'O' are narration.
df_utter = df_to_generate[df_to_generate['pred_label'] != 'O']
utter_list = df_utter['para'].tolist()

print('Number of utterances: {}'.format(len(utter_list)))

# Pair each utterance with the one that follows it.
utter_pair_list = list(zip(utter_list, utter_list[1:]))
# Pad with an empty pair so the list has one entry per utterance.
utter_pair_list.append(('', ''))

df_pairs = pd.DataFrame()
df_pairs['utter_1'] = [x[0] for x in utter_pair_list]
df_pairs['utter_2'] = [x[1] for x in utter_pair_list]
df_pairs['label'] = df_utter['pred_label'].values
df_pairs['chapter_tag'] = df_utter['chapter_tag'].values

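# Drop pairs that straddle two conversations: if row i starts a new
# conversation (B-START), the pair in row i-1 has no real response.
indices_to_rm = []
for i in df_pairs.index:
    if df_pairs.loc[i, 'label'] == 'B-START' and i != 0:
        indices_to_rm.append(i-1)

df_pairs = df_pairs.drop(indices_to_rm, axis=0)
df_pairs = df_pairs.reset_index(drop=True)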

# Split by chapter: chapters 19-26 are the test set, chapters 27-33 the
# validation set, and everything else is training data.
test_set_chapters = ['Chapter {}'.format(x) for x in range(19,27)]
validation_set_chapters = ['Chapter {}'.format(x) for x in range(27,34)]
def custom_train_test_split(field):
    if field in test_set_chapters:
        return 'test'
    elif field in validation_set_chapters:
        return 'validation'
    else:
        return 'train'
df_pairs['split_tag'] = df_pairs['chapter_tag'].apply(custom_train_test_split)

print('Generated {} utterance pairs'.format(df_pairs.shape[0]))

print('Saving to csv..')

# Save as csv, stamping the file name with today's date (YYYY-MM-DD).
# The output path is relative to the notebook's working directory.
NAME = 'seqlab-v4'
output_path = '../data/utterance_pairs/'
current_date = datetime.date.today().isoformat()
csv_name = '{}-utter-pairs-{}.csv'.format(NAME, current_date)
df_pairs.to_csv(os.path.join(output_path, csv_name))

print('Done')

Number of utterances: 173
Generated 143 utterance pairs
Saving to csv..
Done
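
The two counts printed above are consistent with the filtering step: the cell builds one row per utterance (173) and drops the row that precedes each `B-START` other than the first, so 173 - 143 = 30 rows were dropped. A minimal sketch of that bookkeeping on hypothetical labels (the label list is illustrative, not taken from the data):

```python
# Hypothetical labels for six utterances that form two conversations.
labels = ["B-START", "B-OTHER", "I-OTHER", "B-START", "B-OTHER", "B-OTHER"]

# One row per utterance, including the padded row for the last utterance.
n_rows = len(labels)

# One row is dropped for every B-START that is not the very first utterance.
n_dropped = sum(1 for i, tag in enumerate(labels) if tag == "B-START" and i != 0)

n_kept = n_rows - n_dropped
print(n_rows, n_dropped, n_kept)  # 6 1 5
```

As the cell is written, the padded final row is not among the dropped indices, so the reported count appears to include one empty pair.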


# Preview

In [17]:
# preview
df_pairs_seqlabel_v4 = pd.read_csv('../data/utterance_pairs/seqlab-v4-utter-pairs-2019-05-06.csv', index_col=[0])

In [18]:
df_pairs_seqlabel_v4.head(30)

Unnamed: 0,utter_1,utter_2,label,chapter_tag,split_tag
0,"""May I hope, madam, for your interest with your fair daughter Elizabeth, when I solicit for the honour of a private audience with her in the course of this morning?""","""Oh dear!—yes—certainly. I am sure Lizzy will be very happy—I am sure she can have no objection. Come, Kitty, I want you up stairs.""",B-START,Chapter 19,test
1,"""Oh dear!—yes—certainly. I am sure Lizzy will be very happy—I am sure she can have no objection. Come, Kitty, I want you up stairs.""","""Dear madam, do not go. I beg you will not go. Mr Collins must excuse me. He can have nothing to say to me that anybody need not hear. I am going away myself.""",B-OTHER,Chapter 19,test
2,"""Dear madam, do not go. I beg you will not go. Mr Collins must excuse me. He can have nothing to say to me that anybody need not hear. I am going away myself.""","""No, no, nonsense, Lizzy. I desire you to stay where you are.""",B-OTHER,Chapter 19,test
3,"""No, no, nonsense, Lizzy. I desire you to stay where you are.""","""Lizzy, I insist upon your staying and hearing Mr Collins.""",B-OTHER,Chapter 19,test
4,"""Lizzy, I insist upon your staying and hearing Mr Collins.""","""Believe me, my dear MissElizabeth, that your modesty, so far from doing you any disservice, rather adds to your other perfections. You would have been less amiable in my eyes had there not been this little unwillingness; but allow me to assure you, that I have your respected mother's permisson for this address. You can hardly doubt the purport of my discourse, however your natural delicacy may lead you to dissemble; my attentions have been too marked to be mistaken. Almost as soon as I entered the house, I singled you out as the companion of my future life. But before I am run away with by my feelings on this subject, perhaps it would be advisable for me to state my reasons for marrying—and, moreover, for coming into Hertfordshire with the design of selecting a wife, as I certainly did.""",I-OTHER,Chapter 19,test
5,"""Believe me, my dear MissElizabeth, that your modesty, so far from doing you any disservice, rather adds to your other perfections. You would have been less amiable in my eyes had there not been this little unwillingness; but allow me to assure you, that I have your respected mother's permisson for this address. You can hardly doubt the purport of my discourse, however your natural delicacy may lead you to dissemble; my attentions have been too marked to be mistaken. Almost as soon as I entered the house, I singled you out as the companion of my future life. But before I am run away with by my feelings on this subject, perhaps it would be advisable for me to state my reasons for marrying—and, moreover, for coming into Hertfordshire with the design of selecting a wife, as I certainly did.""","""My reasons for marrying are, first, that I think it a right thing for every clergyman in easy circumsances (like myself) to set the example of matrimony in his parish; secondly, that I am convinced that it will add very greatly to my happiness; and thirdly—which perhaps I ought to have mentioned earlier, that it is the particular advice and recommendation of the very noble lady whom I have the honour of calling patroness. Twice has she condescended to give me her opinion (unasked too!) on this subject; and it was but the very Saturday night before I left Hunsford—between our pools at quadrille, while Mrs Jenkinson was arranging Missde Bourgh's footstool, that she said, 'Mr Collins, you must marry. A clergyman like you must marry. Choose properly, choose a gentlewoman for my sake; and for your own, let her be an active, useful sort of person, not brought up high, but able to make a small income go a good way. This is my advice. Find such a woman as soon as you can, bring her to Hunsford, and I will visit her.' 
Allow me, by the way, to observe, my fair cousin, that I do not reckon the notice and kindness of Lady Catherine de Bourgh as among the least of the advantages in my power to offer. You will find her manners beyond anything I can describe; and your wit and vivacity, I think, must be acceptable to her, especially when tempered with the silence and respect which her rank will inevitably excite. Thus much for my general intention in favour of matrimony; it remains to be told why my views were directed towards Longbourn instead of my own neighbourhood, where I can assure you there are many amiable young women. But the fact is, that being, as I am, to inherit this estate after the death of your honoured father (who, however, may live many years longer), I could not satisfy myself without resolving to choose a wife from among his daughters, that the loss to them might be as little as possible, when the melancholy event takes place—which, however, as I have already said, may not be for several years. This has been my motive, my fair cousin, and I flatter myself it will not sink me in your esteem. And now nothing remains for me but to assure you in the most animated language of the violence of my affection. To fortune I am perfectly indifferent, and shall make no demand of that nature on your father, since I am well aware that it could not be complied with; and that one thousand pounds in the four per cents, which will not be yours till after your mother's decease, is all that you may ever be entitled to. On that head, therefore, I shall be uniformly silent; and you may assure yourself that no ungenerous reproach shall ever pass my lips when we are married.""",B-OTHER,Chapter 19,test
6,"""My reasons for marrying are, first, that I think it a right thing for every clergyman in easy circumsances (like myself) to set the example of matrimony in his parish; secondly, that I am convinced that it will add very greatly to my happiness; and thirdly—which perhaps I ought to have mentioned earlier, that it is the particular advice and recommendation of the very noble lady whom I have the honour of calling patroness. Twice has she condescended to give me her opinion (unasked too!) on this subject; and it was but the very Saturday night before I left Hunsford—between our pools at quadrille, while Mrs Jenkinson was arranging Missde Bourgh's footstool, that she said, 'Mr Collins, you must marry. A clergyman like you must marry. Choose properly, choose a gentlewoman for my sake; and for your own, let her be an active, useful sort of person, not brought up high, but able to make a small income go a good way. This is my advice. Find such a woman as soon as you can, bring her to Hunsford, and I will visit her.' Allow me, by the way, to observe, my fair cousin, that I do not reckon the notice and kindness of Lady Catherine de Bourgh as among the least of the advantages in my power to offer. You will find her manners beyond anything I can describe; and your wit and vivacity, I think, must be acceptable to her, especially when tempered with the silence and respect which her rank will inevitably excite. Thus much for my general intention in favour of matrimony; it remains to be told why my views were directed towards Longbourn instead of my own neighbourhood, where I can assure you there are many amiable young women. 
But the fact is, that being, as I am, to inherit this estate after the death of your honoured father (who, however, may live many years longer), I could not satisfy myself without resolving to choose a wife from among his daughters, that the loss to them might be as little as possible, when the melancholy event takes place—which, however, as I have already said, may not be for several years. This has been my motive, my fair cousin, and I flatter myself it will not sink me in your esteem. And now nothing remains for me but to assure you in the most animated language of the violence of my affection. To fortune I am perfectly indifferent, and shall make no demand of that nature on your father, since I am well aware that it could not be complied with; and that one thousand pounds in the four per cents, which will not be yours till after your mother's decease, is all that you may ever be entitled to. On that head, therefore, I shall be uniformly silent; and you may assure yourself that no ungenerous reproach shall ever pass my lips when we are married.""","""You are too hasty, sir,""",B-OTHER,Chapter 19,test
7,"""You are too hasty, sir,""","""You forget that I have made no answer. Let me do it without further loss of time. Accept my thanks for the compliment you are paying me. I am very sensible of the honour of your proposals, but it is impossible for me to do otherwise than to decline them.""",B-OTHER,Chapter 19,test
8,"""You forget that I have made no answer. Let me do it without further loss of time. Accept my thanks for the compliment you are paying me. I am very sensible of the honour of your proposals, but it is impossible for me to do otherwise than to decline them.""","""I am not now to learn,""",I-OTHER,Chapter 19,test
9,"""I am not now to learn,""","""that it is usual with young ladies to reject the addresses of the man whom they secretly mean to accept, when he first applies for their favour; and that sometimes the refusal is repeated a second, or even a third time. I am therefore by no means discouraged by what you have just said, and shall hope to lead you to the altar ere long.""",B-OTHER,Chapter 19,test
