
a few questions about the 'MBR' decoding strategy. #67

Closed · xiaotingxuan opened this issue Dec 11, 2023 · 2 comments

@xiaotingxuan

Hi, your code is incredibly useful. However, I have a few questions regarding the 'MBR' decoding strategy.

In your paper, you mentioned, "We first generate a set of candidate samples S from different random seeds of DIFFUSEQ and select the best output sequence that achieves the minimum expected risk under a meaningful loss function."

As I'm new to the 'MBR' decoding strategy, I might be misunderstanding something. My understanding is that you calculate the BLEU score of each candidate against the reference and then select the one with the highest score as the final output. You then run the evaluation on that selected output.

I've noticed that if we evaluate the candidates individually, we get 10 similar results, none of which are impressive. However, after using the evaluation metric to select the output, the result is significantly better. I'm unsure whether this is a valid method, or whether it could be considered "cheating" (apologies if this comes off as offensive): by using the evaluation metric itself to select the output, we would naturally obtain better metric results.

Additionally, I have read your code (https://github.com/Shark-NLP/DiffuSeq/blob/8bfafcbb26df218073b8117234afb9de9dfcbec9/scripts/eval_seq2seq.py#L16C1-L26C1), and it appears that you don't calculate the BLEU score between the recovered sentences and the reference sentences there. Instead, you only calculate the BLEU score between each recovered sentence and the other candidate recovered sentences.

@zzbuzzard

Hi! Not an author, but I've looked at this code (eval_seq2seq) too. I believe MBR here computes the BLEU score between each pair of the S candidates (never looking at the reference!) and selects the candidate with the highest total BLEU against the other candidates. The BLEU score is then computed between that chosen candidate and the actual reference. It isn't cheating, because the candidate is chosen without using the reference at all.

Lines 168-177 in eval_seq2seq.py show this fairly clearly: we select the 'best' candidates (by comparing them to one another, not to the reference) and then compare these selected candidates to the reference.
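
For concreteness, here's a minimal, self-contained sketch of that selection step in Python. This is not the repository's exact code: the `pair_bleu`/`mbr_select` helpers and the toy candidate strings are made up for illustration, and it assumes NLTK is installed.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

_smooth = SmoothingFunction().method4

def pair_bleu(hyp: str, ref: str) -> float:
    # BLEU of one string against one other string (whitespace-tokenized).
    return sentence_bleu([ref.split()], hyp.split(), smoothing_function=_smooth)

def mbr_select(candidates: list[str]) -> str:
    """Return the candidate with the highest total BLEU against all the
    other candidates. The gold reference is never consulted here."""
    scores = []
    for i, hyp in enumerate(candidates):
        scores.append(sum(pair_bleu(hyp, other)
                          for j, other in enumerate(candidates) if j != i))
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

# Toy example: the reference is only used afterwards, for evaluation.
candidates = ["the cat sat on the mat",
              "a cat sat on a mat",
              "dogs run in the park"]
reference = "the cat sat on a mat"
chosen = mbr_select(candidates)
print(chosen, pair_bleu(chosen, reference))
```

The key point is that `mbr_select` only ever compares candidates to each other; the reference appears only in the final evaluation line.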

@xiaotingxuan (Author)

Oh, I got it. Thank you!
