
a few questions about the 'MBR' decoding strategy. #67

Closed · xiaotingxuan opened this issue Dec 11, 2023 · 2 comments

@xiaotingxuan

Hi, your code is incredibly useful. However, I have a few questions regarding the 'MBR' decoding strategy.

In your paper, you mentioned, "We first generate a set of candidate samples S from different random seeds of DIFFUSEQ and select the best output sequence that achieves the minimum expected risk under a meaningful loss function."

As I'm new to the 'MBR' decoding strategy, I might be misunderstanding something. My understanding is that you calculate the BLEU score of each candidate against the reference and then select the one with the highest score as the final output. You then run the evaluation on that selected output.

I've noticed that if we evaluate the candidates individually, we get 10 similar results, none of which are impressive. However, after using the evaluation metric to select the output, the result is significantly better. I'm unsure whether this is a valid method, or whether it could be considered "cheating" (apologies if this comes off as offensive): by using the evaluation metric itself to select the output, we would naturally obtain better metric results.

Additionally, I have read your code (https://github.com/Shark-NLP/DiffuSeq/blob/8bfafcbb26df218073b8117234afb9de9dfcbec9/scripts/eval_seq2seq.py#L16C1-L26C1), and it appears that you don't calculate the BLEU score between the recovered sentences and the reference sentences there. Instead, you only calculate the BLEU score between each recovered sentence and the other candidate recovered sentences.

@zzbuzzard

Hi! Not an author, but I've looked at this code (eval_seq2seq) too. I believe MBR here computes the BLEU score between each pair of the S candidates (never looking at the reference!) and selects the candidate with the highest total BLEU against the other candidates. The BLEU score is then computed between that chosen candidate and the actual reference. It isn't cheating, because the candidate is chosen without using the reference at all.

Lines 168-177 in eval_seq2seq.py show this fairly clearly: we select the 'best' candidates (by comparing them to one another, not to the reference) and then compare these selected candidates to the reference.
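
For concreteness, here's a minimal, self-contained sketch of that selection step in Python. This is not the repository's exact code: the `pair_bleu`/`mbr_select` helpers and the toy candidate strings are made up for illustration, and it assumes NLTK is installed.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

_smooth = SmoothingFunction().method4

def pair_bleu(hyp: str, ref: str) -> float:
    # BLEU of one string against one other string (whitespace-tokenized).
    return sentence_bleu([ref.split()], hyp.split(), smoothing_function=_smooth)

def mbr_select(candidates: list[str]) -> str:
    """Return the candidate with the highest total BLEU against all the
    other candidates. The gold reference is never consulted here."""
    scores = []
    for i, hyp in enumerate(candidates):
        scores.append(sum(pair_bleu(hyp, other)
                          for j, other in enumerate(candidates) if j != i))
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

# Toy example: the reference is only used afterwards, for evaluation.
candidates = ["the cat sat on the mat",
              "a cat sat on a mat",
              "dogs run in the park"]
reference = "the cat sat on a mat"
chosen = mbr_select(candidates)
print(chosen, pair_bleu(chosen, reference))
```

The key point is that `mbr_select` only ever compares candidates to each other; the reference appears only in the final evaluation line.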

@xiaotingxuan (Author)

Oh, I got it. Thank you!
