Logprobs suggestions #948
Conversation
@@ -8,11 +8,12 @@
"\n",
"This notebook demonstrates the use of the `logprobs` parameter in the Chat Completions API. When `logprobs` is enabled, the API returns the log probabilities of each output token, along with a limited number of the most likely tokens at each token position and their log probabilities. The relevant request parameters are:\n",
"* `logprobs`: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. This option is currently not available on the `gpt-4-vision-preview` model.\n",
"* `top_logprobs`: An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.\n",
"* `top_logprobs`: An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to true if this parameter is used.\n",
just adding backticks
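For reference (not part of this diff), a minimal sketch of a request that uses both parameters with the openai Python SDK might look like the following; the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Request per-token logprobs plus the 2 most likely alternatives at each position.
response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{"role": "user", "content": "The capital of France is"}],
    logprobs=True,
    top_logprobs=2,
)

# Each entry holds the sampled token, its logprob, and the top alternatives.
for item in response.choices[0].logprobs.content:
    alternatives = [(t.token, round(t.logprob, 3)) for t in item.top_logprobs]
    print(item.token, round(item.logprob, 3), alternatives)
```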
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the other tokens in the sentence. Some key points about `logprobs`:\n", | ||
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered. \n", | ||
"* It also allow us to compute the overall probability of a sequence as the sum of the log probs of the individual tokens. This is useful for scoring and ranking model outputs. It's pretty common to take the average logprob of a sentence to choose the best generation.\n", | ||
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about `logprobs`:\n", |
Slightly clarified:
- causal masking: the distribution only depends on previous tokens
- replaced "sentence" with "context"
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered. \n", | ||
"* It also allow us to compute the overall probability of a sequence as the sum of the log probs of the individual tokens. This is useful for scoring and ranking model outputs. It's pretty common to take the average logprob of a sentence to choose the best generation.\n", | ||
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about `logprobs`:\n", | ||
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered.\n", |
removed trailing space
"* It also allow us to compute the overall probability of a sequence as the sum of the log probs of the individual tokens. This is useful for scoring and ranking model outputs. It's pretty common to take the average logprob of a sentence to choose the best generation.\n", | ||
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about `logprobs`:\n", | ||
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered.\n", | ||
"* Logprob can be any negative number or `0.0`. `0.0` corresponds to 100% probability.\n", |
I think it is sometimes surprising that logprobs are negative, so it's important to point out. It also contextualizes the next point, explaining why summing results in a joint probability that should be lower than the individual token probabilities.
good point
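A quick numeric illustration of this point (the logprob values below are made up, not API output): exponentiating a logprob recovers the probability, so `0.0` maps to 100%, and summing negative per-token logprobs can only make the joint logprob smaller:

```python
import math

# A logprob of 0.0 corresponds to probability exp(0.0) == 1.0, i.e. 100%.
print(math.exp(0.0))     # 1.0
print(math.exp(-0.105))  # ~0.90

# Made-up per-token logprobs: summing them gives the joint logprob of the
# sequence, which is at most as large as any individual term.
token_logprobs = [-0.02, -0.4, -1.1]
joint_logprob = sum(token_logprobs)            # -1.52
print(joint_logprob, math.exp(joint_logprob))  # joint probability ~0.22
```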
"Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about `logprobs`:\n", | ||
"* Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered.\n", | ||
"* Logprob can be any negative number or `0.0`. `0.0` corresponds to 100% probability.\n", | ||
"* Logprobs allow us to compute the joint probability of a sequence as the sum of the logprobs of the individual tokens. This is useful for scoring and ranking model outputs. Another common approach is to take the average per-token logprob of a sentence to choose the best generation.\n", |
overall -> joint: "overall" can sometimes be read as the total probability, which is kind of the opposite.
Slight re-wording to point out that summing and averaging are two different approaches; you choose one or the other, not both.
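To make the two approaches concrete, here is a small sketch (with made-up per-token logprobs for two hypothetical candidate generations) contrasting the joint-logprob sum with the length-normalized average:

```python
import math

# Made-up per-token logprobs for two hypothetical candidate generations.
candidates = {
    "generation_a": [-0.1, -0.3, -0.2],
    "generation_b": [-0.05, -0.05, -0.05, -0.05, -2.0],
}

for name, logprobs in candidates.items():
    joint = sum(logprobs)            # joint logprob of the whole sequence
    average = joint / len(logprobs)  # average per-token logprob (length-normalized)
    print(name, round(joint, 3), round(average, 3), round(math.exp(joint), 3))

# Summing scores the likelihood of the whole sequence, while averaging
# normalizes for length; when ranking generations you pick one criterion, not both.
```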
Summary
Some minor re-wording to clarify a few points.
Motivation
Why are these changes necessary? How do they improve the cookbook?
I'll comment on the changes individually.