How to represent the sentence in the Template Denoising step? #13

Hi there,

I have recently been rebuilding your work in fairseq. Your model is really impressive.

I was able to reproduce your results in Table 8: with different templates, I get an average score of 78.41 (RoBERTa_base as the backbone model).

However, when I try to reproduce your default method, i.e. different templates with denoising, the highest score I can get is 78.54 (RoBERTa_base as the backbone model).

At the Template Denoising step, I tried using either 1) the [MASK] token's representation or 2) the [CLS] token's representation to represent the template.

Can you clarify which one you use as the template bias?

Many thanks!

Comments
Given the template "The sentence : ‘[X]’ means [MASK] .", I understand that you use the [MASK] token's vector to represent X. I am curious which vector you use to represent the template bias, given the template "The sentence : ‘’ means [MASK] ." without [X].
On my side, I tried both [MASK] and [CLS] to represent the template bias. The first performed badly, with an average score of 75.89, while the second reached 78.54.
Thanks for your interest in our work. For template denoising, we use [MASK] with the same position ids as the template tokens have in the full input. You can also refer to this issue: #11 (comment).
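For concreteness, here is a minimal sketch of that idea: encode the template-only input with position ids copied from the positions its tokens occupy in the full input, then subtract its [MASK] hidden state as the template bias. This is written against HuggingFace transformers with an illustrative template and model (bert-base-uncased); it is not the repository's actual code, and all variable names are made up for the example.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

sentence = "A man is playing guitar ."
full_text = f'The sentence : " {sentence} " means [MASK] .'
tmpl_text = 'The sentence : " " means [MASK] .'

full = tok(full_text, return_tensors="pt")
tmpl = tok(tmpl_text, return_tensors="pt")

# How many sub-tokens were removed together with the sentence.
delta = full["input_ids"].size(1) - tmpl["input_ids"].size(1)

# Give the template-only tokens the position ids they have in the full input:
# everything from the closing quote onward is shifted right by `delta`.
quote_id = tok.convert_tokens_to_ids('"')
close = (tmpl["input_ids"][0] == quote_id).nonzero()[-1].item()
position_ids = torch.arange(tmpl["input_ids"].size(1)).unsqueeze(0)
position_ids[0, close:] += delta

with torch.no_grad():
    h_full = model(**full).last_hidden_state
    h_tmpl = model(**tmpl, position_ids=position_ids).last_hidden_state

mask_id = tok.mask_token_id
sent_vec = h_full[0, (full["input_ids"][0] == mask_id).nonzero().item()]
bias_vec = h_tmpl[0, (tmpl["input_ids"][0] == mask_id).nonzero().item()]
# (the [CLS] alternative discussed in this thread would instead be h_tmpl[0, 0])
denoised = sent_vec - bias_vec
```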
Thank you for the quick response! However, I am already using the method you described, and it performs worse than a version that uses [CLS] to represent the template bias. Currently, my [MASK] version gets 75.89, while the [CLS] version gets 78.93. Did you run any tests using [CLS] to represent the template bias? If so, what performance did you get? Many thanks!
Hello, can you share your implementation? For reference, the relevant parts of ours are:
- Prompt-BERT/prompt_bert/models.py, lines 78 to 108 (commit 8c0cb4c)
- Prompt-BERT/prompt_bert/models.py, lines 110 to 113 (commit 8c0cb4c)
- Prompt-BERT/prompt_bert/models.py, lines 163 to 177 (commit 8c0cb4c)
Our results are from `./run.sh unsup-roberta $SEED`. As for [CLS], I have not tried using it for template denoising.
Sure, and thanks for the details of your implementation. I had actually already read it, but I appreciate it anyway! I attach the main part of my template denoising computation below. In my code, starting from the original input, I first concatenate bs and es together without adding [s1]. Then I get the position of [MASK] in the template. Next, I call the sentence encoder to get the template representation; note that here I only use new_src_tokens for computing the position ids, which I will cover later, and I use bs_length and x_length to get the index when I compute the mask representation.

At last, I get the mask representation. Inside self.sentence_encoder, I copied the main part, which is the part that computes the position ids. For your reference, the original transformer in fairseq computes the position ids directly from the input sequence tokens. That is the main part of the code; for all other parts, I use the default fairseq code to compute attention, segment embeddings, and so on. Thanks for your time, and I look forward to your reply!
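A rough sketch of what such a position-id computation could look like in fairseq terms is below. This is a hypothetical helper, not the poster's actual code; `bs_length` and `x_length` follow the naming used above, and `utils.make_positions` is fairseq's standard position helper.

```python
import torch
from fairseq import utils


def template_position_ids(template_tokens, bs_length, x_length, padding_idx):
    """template_tokens: (batch, T) tokens of the template-only input (bs + es).
    x_length: (batch,) lengths of the sentences that were removed."""
    # fairseq's default positions: padding_idx + 1, padding_idx + 2, ..., pads excluded.
    positions = utils.make_positions(template_tokens, padding_idx)
    # Shift the es part right by the sentence length so the template tokens keep
    # the positions they would have had in the full input bs + x + es.
    shift = torch.zeros_like(positions)
    shift[:, bs_length:] = x_length.unsqueeze(1).to(positions)
    shift = shift * template_tokens.ne(padding_idx).long()
    return positions + shift
```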
Indeed, I found one slight difference between my code and yours: my template has a space between the word 'means' and [MASK], but your version does not, according to https://github.com/kongds/Prompt-BERT/blob/main/train.py#L145. Do you think this slight difference could cause a performance difference?
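For anyone wanting to check this directly, a quick tokenizer comparison along these lines shows whether the space changes the sub-tokens around the mask. This is a HuggingFace-based check with illustrative strings, not the exact template from train.py, and roberta-base is assumed since the experiments above use a RoBERTa_base backbone (the thread's own code is in fairseq).

```python
from transformers import AutoTokenizer

# Compare the sub-tokens produced with and without a space before the mask token.
tok = AutoTokenizer.from_pretrained("roberta-base")
print(tok.tokenize("This sentence : ' ' means<mask>."))
print(tok.tokenize("This sentence : ' ' means <mask> ."))
```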
Hello, I don't think the space will cause the performance difference.
I see, that's a great point. I will remove the pad in src_tokens and train the model again. Many thanks and enjoy the weekend! |
Thank you. The relevant code is at lines 674 to 699 (commit 8c0cb4c).
I see. So basically you encode each sentence again, prepending bs in front and appending es afterward. Because you use a for loop to do this one sentence at a time, [pad] will not appear before es in your case. Can I ask a stupid question: it seems you add the [pad] tokens after collecting all sentences, but do you add the pads (line 698) after the bos token? Is that the correct way to do it? An alternative would be adding [pad] between '.' and [SEP]; which way should I use?
The [SEP] is already included; see line 675 (commit 8c0cb4c).
That makes sense. Thank you so much for your insightful answers! I will let you know whether it works or not, and I could open-source your model in the fairseq architecture once I finish my project.
Hi, sorry to bother you again. I changed the way I calculate new_src_tokens and x_length accordingly. The performance of both the [MASK] and [CLS] versions improved slightly, from 75.89 to 76.52 ([MASK]) and from 78.54 to 79.08 ([CLS]). However, the best performance using [MASK] for template denoising still does not reach the numbers in your paper. I would appreciate it if you could help me check the code when you have time. Many thanks!
Next, I iterate over each src_tokens[i] to get its new form with the template added, i.e., new_src_tokens[i]. From this, I can get the pad length that should be appended after the eos in new_src_tokens[i]. Then there are three situations to handle.
In this way, all new_src_tokens and each x_length[i] are re-computed. I also rewrote the part of the sentence encoder that gets the template representation, adding one more parameter, es_length. Inside self.sentence_encoder, I first compute the positional embedding from the shape of new_src_tokens. Thanks for your time, and I look forward to your reply!
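As a rough sketch of the batch construction described above (a hypothetical helper, not the poster's actual code): each sentence x is wrapped as bs + x + es, and pads are appended after the eos so that every row of new_src_tokens shares one length. Here `bs` and `es` are assumed to be 1-D LongTensors holding the two template halves, with the eos/[SEP] already inside es, and the input sentences are assumed to carry no bos/eos of their own.

```python
import torch


def build_templated_batch(sentences, bs, es, pad_idx):
    """sentences: list of 1-D LongTensors with the raw sentence tokens."""
    rows = [torch.cat([bs, x, es]) for x in sentences]
    x_length = torch.tensor([len(x) for x in sentences])
    max_len = max(len(r) for r in rows)
    new_src_tokens = torch.full((len(rows), max_len), pad_idx, dtype=torch.long)
    for i, r in enumerate(rows):
        new_src_tokens[i, : len(r)] = r  # any needed pads land after the eos in es
    return new_src_tokens, x_length
```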
Hello, I don't find any problem with your calculation.
Thanks for the help! Yeah, performance differences between different architectures do happen sometimes. For example, my re-implemented SimCSE gets 77.45 in fairseq, while the original HuggingFace SimCSE gets 76.57. Anyway, your answers helped me a lot, and I believe the [CLS] version, even with a slight difference from the [MASK] one, is still a good reproduction of PromptBERT.