Improved Sampling (Nucleus Sampling) #51
A recent paper finds that beam search actually does even worse than greedy search for GPT-2 117M, and proposes a new sampling method to improve text output.
From what I understand, it's top-k sampling, except instead of
Here's an example from the paper:
Top-k Sampling (k=8): "I'm sorry, sir." "No, it's okay,
I am trying to understand this example, now that v0.5 includes nucleus sampling.
I need to read about the meaning of
However, you might be able to tell me more about the temperature for top-k sampling and nucleus sampling. Is it set to 1.0? 0.7? 0.9? Or doesn't it matter too much?
For top-k sampling, temperature should influence the sampling (because probabilities change with temperature), but the top-k tokens (and their order) remain the same.
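As a rough illustration of the top-k point, here is a minimal NumPy sketch. It is not this module's code: the logits are made up, and `softmax` and `top_k_indices` are hypothetical helpers written for this comment. Dividing logits by a positive temperature is a monotonic transformation, so the ranking (and therefore the top-k set) stays the same while the probabilities themselves change.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax over logits scaled by 1/temperature (hypothetical helper)."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def top_k_indices(logits, k, temperature=1.0):
    """Indices of the k most probable tokens, most probable first."""
    probs = softmax(logits, temperature)
    return np.argsort(-probs, kind="stable")[:k].tolist()

# Made-up logits for a 5-token vocabulary (illustration only).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]

for t in (0.7, 1.0, 1.5):
    print(t, top_k_indices(logits, k=3, temperature=t), softmax(logits, t).round(3))
```

The printed index lists should be identical for every temperature, while the probability of the first token shrinks as the temperature grows.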
For nucleus sampling, the parameter
I could not find the info regarding the value of the temperature in the paper. I guess the authors chose temperature = 1, so that it has no effect on the probabilities, according to formula (4). However, I am not sure because Figure 8 relies on a temperature of 0.8.
In the code of this Python module, I see that the temperature is used at this line, even for nucleus sampling. So, it is good to keep in mind that the number of top tokens depends on two parameters (
    logits = next_outputs['logits'][:, -1, :] / tf.to_float(temperature)
    if top_p > 0.0:
        logits = top_p_logits(logits, p=top_p)
    else:
        logits = top_k_logits(logits, k=top_k)
    samples = tf.multinomial(
        logits, num_samples=1, output_dtype=tf.int32)
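Because the temperature is applied before the top-p filter in the snippet above, the number of tokens kept by nucleus sampling can change with the temperature. The sketch below is only an approximation under stated assumptions: `nucleus_size` is a hypothetical helper, the logits are made up, and the nucleus is taken to be the smallest set of top tokens whose cumulative probability reaches `top_p`. The module's own `top_p_logits` may handle the boundary differently.

```python
import numpy as np

def nucleus_size(logits, top_p, temperature=1.0):
    """Number of top tokens whose cumulative probability first reaches top_p.

    Hypothetical helper for illustration; not the module's top_p_logits.
    """
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    cumulative = np.cumsum(np.sort(probs)[::-1])  # descending probabilities
    # First position where the cumulative mass is >= top_p, as a count.
    n = int(np.searchsorted(cumulative, top_p)) + 1
    return min(n, len(probs))  # guard against rounding when top_p is ~1.0

# Made-up logits for a 5-token vocabulary (illustration only).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]

for t in (0.5, 1.0, 2.0):
    print("temperature", t, "nucleus size", nucleus_size(logits, top_p=0.9, temperature=t))
```

With the same `top_p`, a lower temperature sharpens the distribution and the nucleus gets smaller, while a higher temperature flattens it and the nucleus grows.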
tl;dr: if you change