
Data processing for the MAN model #7

Open
songdezhao opened this issue Dec 23, 2021 · 9 comments

@songdezhao

Hi Di, thanks a lot for sharing the code of this QA system. I have been trying to apply it to my own data. I skipped the pre-training and the multi-task learning; instead, I was trying to apply the MAN architecture (i.e., BertForMultipleChoice_SAN) to a single dataset of my own.

I didn't find the exact code that produces the two additional masks, premise_mask and hyp_mask, so I tried to implement them myself. Sorry for the ask, but I am wondering if the following implementation makes sense:

  1. In run_classifier_bert.py:
    a) Right after Line 143, I added the following. My understanding is that the hypothesis contains the answer only, so we should do this after tokens_b gets its "fair share" of the total max_len but before we concatenate it with the question in the next line.
    hypothesis = ["[CLS]"] + tokens_b + ["[SEP]"]

    b) Then, before Line 151, I added the following:

        # The premise is the concatenation of the passage/dialogue and the question plus additional special tokens
        premise = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_c + ["[SEP]"]
        
        # Convert to IDs
        premise_ids = tokenizer.convert_tokens_to_ids(premise)
        hypothesis_ids = tokenizer.convert_tokens_to_ids(hypothesis)
        
        # Compute how much to pad and then build the mask with the actual content length and the pad length
        premise_pad_length = max_seq_length - len(premise_ids)
        # create a mask with the actual content only
        premise_mask = [1] * len(premise_ids)
        # do padding
        premise_ids += [0] * premise_pad_length
        # append the padded length to mask
        premise_mask += [0] * premise_pad_length

        hypothesis_pad_length = max_seq_length - len(hypothesis_ids)
        hypothesis_mask = [1] * len(hypothesis_ids)
        hypothesis_ids += [0] * hypothesis_pad_length
        hypothesis_mask += [0] * hypothesis_pad_length
  2. With the above, I also modified InputFeatures to include these two additional masks and passed them along to the forward function.

Sorry for the long message, but I am wondering if the additional data processing above looks correct for using MAN? Many thanks!

@jind11
Owner

jind11 commented Dec 23, 2021

Hi, for MAN, to find the positions of the premise and hypothesis, we can make use of the segment IDs. For example, in my code, a segment ID of 0 means the dialogue/context and a segment ID of 1 means the concatenation of the question and answer. But of course, you can also use a special mask to indicate which tokens belong to the premise and which belong to the hypothesis; they serve the same purpose. Lines 152 and 156 set up the segment IDs.
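As a minimal sketch (mine, not code from the repo), the two masks could be derived from the standard `segment_ids` and `input_mask` lists that run_classifier_bert.py already builds per example, something like:

```python
def build_man_masks(segment_ids, input_mask):
    """Derive premise/hypothesis masks from BERT inputs.

    Premise = tokens with segment ID 0 (dialogue/context),
    hypothesis = tokens with segment ID 1 (question + answer);
    padding positions (input_mask == 0) are excluded from both.
    """
    premise_mask = [m * (1 - s) for s, m in zip(segment_ids, input_mask)]
    hyp_mask = [m * s for s, m in zip(segment_ids, input_mask)]
    return premise_mask, hyp_mask

# Toy example: 3 context tokens, 2 question+answer tokens, 2 padding slots
segment_ids = [0, 0, 0, 1, 1, 0, 0]
input_mask = [1, 1, 1, 1, 1, 0, 0]
premise_mask, hyp_mask = build_man_masks(segment_ids, input_mask)
# premise_mask -> [1, 1, 1, 0, 0, 0, 0]
# hyp_mask     -> [0, 0, 0, 1, 1, 0, 0]
```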

@songdezhao
Author

Thanks a lot for the quick response!

I guess I had one misunderstanding before, but just to clarify: is the hypothesis the concatenation of the question and answer, or the answer only?

From the other conversation here, it seems the hypothesis is the answer only? If so, then I guess the segment IDs cannot be directly used to derive the two masks, since a segment ID of 1 means the concatenation of the question and answer.

Thanks again.

@jind11
Owner

jind11 commented Dec 23, 2021

Good question. I have tried both versions: the hypothesis consisting of the question and answer, and the hypothesis consisting of the answer only. At least for the DREAM dataset, I did not find much difference. The first choice also makes intuitive sense: we carry the information from both the question and the answer, seek out the most relevant information from the context, and see whether we can find evidence to support this particular question-answer pair, which is similar to a factual correctness task.

@songdezhao
Author

Got it, and thanks again. I will use the first choice. I agree it makes more sense and also requires fewer changes to the data processing code, i.e., I can directly use the segment IDs to derive the two additional masks.

Two other quick questions (sorry):

  1. I assume this is the model: BertForMultipleChoice_SAN? I am asking because I also see "SAN2".

  2. For initialization, I simply replaced the general model with the following. For "opt", I simply put "use_SAN" there. Is there anything else I should put into this "opt" parameter?
    model = BertForMultipleChoice_SAN.from_pretrained(args.bert_model, opt={"use_SAN": 1}, num_choices=[5])

@jind11
Owner

jind11 commented Dec 24, 2021

  1. Yes.
  2. That's it.

@songdezhao
Author

Thanks a lot

@songdezhao
Author

Thanks again for your help. I am now able to train the model with MAN (i.e., BertForMultipleChoice_SAN).

Just one quick question. When training, in the log, I see this warning:
pytorch_pretrained_bert.modeling - Failed for Randomly initialize the top level classifiers!

I checked the code, and I think this is because modeling.py tries to randomly initialize the variable "classifiers", while there is no such variable in BertForMultipleChoice_SAN. Can I simply ignore this warning, or should I do the following initialization at Line 776:

logger.info("Randomly initialize the top level classifiers!")
for i in range(len(model.out_proj)):
    model.out_proj[i].classifier.proj.weight.data.normal_(mean=0.0, std=config.initializer_range)
    model.out_proj[i].classifier.proj.bias.data.zero_()

@jind11
Owner

jind11 commented Dec 24, 2021

The code you mentioned is all right, but you can also ignore the warning, since PyTorch initializes any tensor with its default initialization method.
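For instance, a freshly constructed layer already has non-trivial parameters (PyTorch's default for `nn.Linear` weights is Kaiming-uniform initialization), which is why skipping the BERT-style truncated-normal init is usually harmless; a quick check:

```python
import torch
from torch import nn

# A linear head of the rough shape used for multiple choice
# (768 hidden size, 5 choices); numbers are illustrative only.
layer = nn.Linear(768, 5)

# The weights are already initialized by nn.Linear's constructor,
# not left as zeros or uninitialized memory.
assert layer.weight.abs().sum().item() > 0
assert layer.bias is not None
```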

@songdezhao
Author

I see. Thanks.
