Logprobs suggestions #948
Conversation
@@ -8,11 +8,12 @@
"\n",
"This notebook demonstrates the use of the `logprobs` parameter in the Chat Completions API. When `logprobs` is enabled, the API returns the log probabilities of each output token, along with a limited number of the most likely tokens at each token position and their log probabilities. The relevant request parameters are:\n",
"* `logprobs`: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. This option is currently not available on the `gpt-4-vision-preview` model.\n",
"* `top_logprobs`: An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.\n",
"* `top_logprobs`: An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to true if this parameter is used.\n",
just adding backticks
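For reference (not part of this diff), a minimal sketch of a request that uses both parameters with the openai Python SDK might look like the following; the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Request per-token logprobs plus the 2 most likely alternatives at each position.
response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{"role": "user", "content": "The capital of France is"}],
    logprobs=True,
    top_logprobs=2,
)

# Each entry holds the sampled token, its logprob, and the top alternatives.
for item in response.choices[0].logprobs.content:
    alternatives = [(t.token, round(t.logprob, 3)) for t in item.top_logprobs]
    print(item.token, round(item.logprob, 3), alternatives)
```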
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the other tokens in the sentence. Some key points about `logprobs`:\n", | ||
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered. \n", | ||
"* It also allow us to compute the overall probability of a sequence as the sum of the log probs of the individual tokens. This is useful for scoring and ranking model outputs. It's pretty common to take the average logprob of a sentence to choose the best generation.\n", | ||
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about `logprobs`:\n", |
Slightly clarified:
- causal masking: the distribution only depends on previous tokens
- replaced "sentence" with "context"
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered. \n", | ||
"* It also allow us to compute the overall probability of a sequence as the sum of the log probs of the individual tokens. This is useful for scoring and ranking model outputs. It's pretty common to take the average logprob of a sentence to choose the best generation.\n", | ||
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about `logprobs`:\n", | ||
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered.\n", |
removed trailing space
"* It also allow us to compute the overall probability of a sequence as the sum of the log probs of the individual tokens. This is useful for scoring and ranking model outputs. It's pretty common to take the average logprob of a sentence to choose the best generation.\n", | ||
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about `logprobs`:\n", | ||
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered.\n", | ||
"* Logprob can be any negative number or `0.0`. `0.0` corresponds to 100% probability.\n", |
I think it is sometimes surprising that logprobs are negative, so it's important to point out. It also contextualizes the next point, explaining why summing results in a joint probability that should be lower than the individual token probabilities.
good point
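A quick numeric illustration of this point (the logprob values below are made up, not API output): exponentiating a logprob recovers the probability, so `0.0` maps to 100%, and summing negative per-token logprobs can only make the joint logprob smaller:

```python
import math

# A logprob of 0.0 corresponds to probability exp(0.0) == 1.0, i.e. 100%.
print(math.exp(0.0))     # 1.0
print(math.exp(-0.105))  # ~0.90

# Made-up per-token logprobs: summing them gives the joint logprob of the
# sequence, which is at most as large as any individual term.
token_logprobs = [-0.02, -0.4, -1.1]
joint_logprob = sum(token_logprobs)            # -1.52
print(joint_logprob, math.exp(joint_logprob))  # joint probability ~0.22
```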
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about `logprobs`:\n", | ||
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered.\n", | ||
"* Logprob can be any negative number or `0.0`. `0.0` corresponds to 100% probability.\n", | ||
"* Logprobs allow us to compute the joint probability of a sequence as the sum of the logprobs of the individual tokens. This is useful for scoring and ranking model outputs. Another common approach is to take the average per-token logprob of a sentence to choose the best generation.\n", |
overall -> joint: "overall" can sometimes be read as the total probability, which is kind of the opposite.
Slight re-wording to point out that summing and averaging are two different approaches; you choose one or the other, not both.
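To make the two approaches concrete, here is a small sketch (with made-up per-token logprobs for two hypothetical candidate generations) contrasting the joint-logprob sum with the length-normalized average:

```python
import math

# Made-up per-token logprobs for two hypothetical candidate generations.
candidates = {
    "generation_a": [-0.1, -0.3, -0.2],
    "generation_b": [-0.05, -0.05, -0.05, -0.05, -2.0],
}

for name, logprobs in candidates.items():
    joint = sum(logprobs)            # joint logprob of the whole sequence
    average = joint / len(logprobs)  # average per-token logprob (length-normalized)
    print(name, round(joint, 3), round(average, 3), round(math.exp(joint), 3))

# Summing scores the likelihood of the whole sequence, while averaging
# normalizes for length; when ranking generations you pick one criterion, not both.
```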
Summary
Some minor re-wording to clarify a few points.
Motivation
Why are these changes necessary? How do they improve the cookbook?
I'll comment on the changes individually.