Add perplexity example to the logprobs user guide #1071
Conversation
Looks great! Thanks Ankur. Added some small nits. Generally, I think showing each token's probability would be helpful too, maybe just for one sentence or for both. That way we could see which tokens the model is more or less confident in, which might help people grasp this intuitively. You could also adjust the prompt to make the sentences briefer if showing logprobs for every token of the longer sentences is too much.
Generally good stuff though! Thanks
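A minimal sketch of the per-token probability idea suggested above: exponentiating each logprob recovers that token's probability. The tokens and logprob values here are illustrative placeholders, not output from an actual API call.

```python
import math

# Converting each token's logprob back to a probability (p = e^logprob)
# makes it easy to see which tokens the model is more or less confident in.
# These values are illustrative, not from a real API response.
tokens = ["The", " cat", " sat"]
logprobs = [-0.02, -0.8, -0.1]

for tok, lp in zip(tokens, logprobs):
    print(f"{tok!r}: p = {math.exp(lp):.3f}")
```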
examples/Using_logprobs.ipynb
Outdated
"* Users can easily create a token highlighter using the built in tokenization that comes with enabling `logprobs`. Additionally, the bytes parameter includes the ASCII encoding of each output character, which is particularly useful for reproducing emojis and special characters.\n",
"\n",
"4. Calculating perplexity\n",
"* `logprobs1 can be used to help us assess the model's overall confidence in a result and help us compare the confidence of results from different prompts."
replace the 1 with a closing `
examples/Using_logprobs.ipynb
Outdated
"4. Token highlighting and outputting bytes\n",
"* Users can easily create a token highlighter using the built in tokenization that comes with enabling `logprobs`. Additionally, the bytes parameter includes the ASCII encoding of each output character, which is particularly useful for reproducing emojis and special characters.\n",
"\n",
"4. Calculating perplexity\n",
change to 5.
examples/Using_logprobs.ipynb
Outdated
"## 5. Conclusion"
"## 5. Calculating perplexity\n",
"\n",
"When looking to assess the model's confidence in a result, it can be useful to calculate perplexity, which is a measure of the uncertainty. Perplexity can be calculated by exponentiating the negative of the average of the logprobs. Generally, a higher perplexity indicates a more uncertain result, and a lower perplexity indicates a more confident result. As such, perplexity can be used to both assess the result of an individual model run and also to helpfully compare the relative confidence of results between model runs. While a high confidence doesn't guarantee result accuracy, it can be a helpful signal that can be paired with other evaluation metrics to build a better understanding of your prompt's behavior.\n",
"is a measure of the model's uncertainty."
.."average of the output logprobs"
would remove "helpfully" before compare
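For reference, the calculation described in the quoted passage (exponentiating the negative of the average of the output logprobs) can be sketched as follows. The logprob values are illustrative placeholders, not taken from an actual API response.

```python
import math

def perplexity(logprobs: list[float]) -> float:
    """Perplexity = exp(-mean(logprobs)); lower means more confident."""
    return math.exp(-sum(logprobs) / len(logprobs))

# Illustrative per-token logprobs for two hypothetical responses.
confident = [-0.1, -0.05, -0.2]   # model fairly sure of each token
uncertain = [-1.5, -2.0, -1.0]    # model much less sure

print(perplexity(confident))  # closer to 1 (more confident)
print(perplexity(uncertain))  # larger (more uncertain)
```

A perplexity near 1 means the model assigned high probability to every token it produced; comparing the two values shows the relative-confidence use case the guide describes.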
lgtm
examples/Using_logprobs.ipynb
Outdated
"* Users can easily create a token highlighter using the built in tokenization that comes with enabling `logprobs`. Additionally, the bytes parameter includes the ASCII encoding of each output character, which is particularly useful for reproducing emojis and special characters.\n",
"\n",
"4. Calculating perplexity\n",
"* `logprobs1 can be used to help us assess the model's overall confidence in a result and help us compare the confidence of results from different prompts."
nit - fix logprobs formatting
Summary
Adds a section to demonstrate how logprobs can be used to assess model confidence in overall results. The changes introduce the concept of perplexity, add code to calculate it for two examples, and display the output.
Motivation
These changes build on a suggestion at the end of the original guide to use logprobs for different evaluation metrics, filling in one of the recommended extensions.
For new content
When contributing new content, read through our contribution guidelines, and mark the following action items as completed:
We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.