Data processing for the MAN model #7
Hi, for MAN, to find the positions of the premise and the hypothesis, we can make use of the segment IDs. For example, in my code, a segment ID of 0 means the dialogue/context and a segment ID of 1 means the concatenation of the question and the answer. But of course, you can also use special masks to indicate which tokens belong to the premise and which belong to the hypothesis; they serve the same purpose. Lines 152 and 156 set up the segment IDs.
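The equivalence between segment IDs and special masks can be illustrated with a small sketch. This is a hypothetical example, not code from the repository; it assumes segment ID 0 marks the context (premise), segment ID 1 marks the question+answer (hypothesis), and `input_mask` is 1 on real tokens and 0 on padding:

```python
import torch

def masks_from_segment_ids(segment_ids, input_mask):
    """Derive premise/hypothesis masks from segment IDs (assumed convention:
    segment 0 = dialogue/context, segment 1 = question + answer)."""
    premise_mask = (segment_ids == 0) & (input_mask == 1)
    hyp_mask = (segment_ids == 1) & (input_mask == 1)
    return premise_mask.long(), hyp_mask.long()

# Toy sequence: 3 context tokens, 2 question+answer tokens, 1 padding token.
segment_ids = torch.tensor([0, 0, 0, 1, 1, 0])
input_mask = torch.tensor([1, 1, 1, 1, 1, 0])
p, h = masks_from_segment_ids(segment_ids, input_mask)
# p -> [1, 1, 1, 0, 0, 0], h -> [0, 0, 0, 1, 1, 0]
```

Intersecting with `input_mask` keeps padding (which also carries segment ID 0) out of the premise mask.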
Thanks a lot for the quick response! I guess I had one misunderstanding before, but just to clarify: is the hypothesis the concatenation of the question and the answer, or the answer only? From the other conversation here, it seems the hypothesis is the answer only? If so, then I guess the segment IDs cannot be directly used to derive the two masks, since 1 in the segment IDs means the concatenation of the question and the answer. Thanks again.
Good question. I have tried both versions: the hypothesis consists of the question and the answer, or the hypothesis consists of only the answer. At least for the DREAM dataset, I did not find much difference. The first choice also makes intuitive sense: we carry the information from both the question and the answer, seek the most relevant information in the context, and check whether we can find evidence to support this particular question-answer pair, which is similar to the factual correctness task.
Got it, and thanks again. I will use the first choice. I agree it makes more sense and also requires fewer changes to the data processing code, i.e., I can directly use the segment IDs to derive the two additional masks. Two other quick questions (sorry):
Thanks a lot!
Thanks again for your help; I am now able to train the model with MAN (i.e., BertForMultipleChoice_SAN). Just one quick question. When training, I see this warning in the log: I checked the code, and I think this happens because modeling.py tries to randomly initialize the variable "classifiers", while there is no such variable in BertForMultipleChoice_SAN. Can I simply ignore this warning, or should I do the following initialization at Line 776?
logger.info("Randomly initialize the top level classifiers!")
for i in range(len(model.out_proj)):
    model.out_proj[i].classifier.proj.weight.data.normal_(mean=0.0, std=config.initializer_range)
    model.out_proj[i].classifier.proj.bias.data.zero_()
The code you mentioned is all right, but you can also ignore the warning, since PyTorch initializes any parameter tensor with its default initialization method.
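This point can be verified with a short sketch (my own illustration, not code from the repository): a freshly constructed `nn.Linear` already has non-zero weights from PyTorch's default initialization, and the explicit BERT-style re-initialization is optional. The layer sizes and the 0.02 value are assumptions standing in for `config.initializer_range`:

```python
import torch.nn as nn

proj = nn.Linear(768, 1)  # default init (Kaiming uniform) runs on construction
print(proj.weight.abs().sum() > 0)  # non-zero without any manual initialization

# Optional BERT-style re-initialization, to match the paper's scheme:
proj.weight.data.normal_(mean=0.0, std=0.02)  # 0.02 stands in for initializer_range
proj.bias.data.zero_()
```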
I see. Thanks.
Hi Di, thanks a lot for sharing the code of this QA system. I have been trying to apply it to my own data. I skipped the pre-training and the multi-task learning; instead, I was trying to apply the MAN architecture (i.e., BertForMultipleChoice_SAN) to a single dataset of my own.
I didn't find the exact code that produces the two additional masks, premise_mask and hyp_mask, so I tried to implement them myself. Sorry for the ask, but I am wondering if the following implementation makes sense:
In run_classifier_bert.py:
a) Right after Line 143, I added the following. My understanding is that the hypothesis contains the answer only, so we should do this after tokens_b gets its "fair share" of the total max_len, but before we concatenate it with the question in the next line.
hypothesis = ["[CLS]"] + tokens_b + ["[SEP]"]
b) Then, before Line 151, I added the following:
Sorry for the long message, but I am wondering whether the above additional data processing is correct for using MAN. Many thanks!
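For what it's worth, the processing in (a) and (b) could be sketched roughly as below. This is my own hypothetical reconstruction, not the repository's code: it assumes `tokens_a` is the truncated context, `tokens_b` is the truncated question+answer, and that the [CLS]/[SEP] tokens count toward the premise side, all of which may differ from the actual run_classifier_bert.py conventions:

```python
def build_masks(tokens_a, tokens_b, max_seq_length):
    """Build segment IDs plus premise/hypothesis masks for one example.
    Assumed convention: segment 0 = context (premise), segment 1 = Q+A (hypothesis)."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # Derive the two masks directly from the segment IDs, before padding.
    premise_mask = [1 if s == 0 else 0 for s in segment_ids]
    hyp_mask = [1 if s == 1 else 0 for s in segment_ids]
    # Zero-pad everything to max_seq_length.
    pad = max_seq_length - len(tokens)
    segment_ids += [0] * pad
    premise_mask += [0] * pad
    hyp_mask += [0] * pad
    return tokens, segment_ids, premise_mask, hyp_mask
```

Computing the masks before padding keeps the padding positions out of both masks, since padded positions also carry segment ID 0.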