Joe at openai/summarize with controllable detail #1128
Conversation
lgtm
| Criteria | Description | Score |
|---|---|---|
| Relevance | Is the content related to building with OpenAI technologies? Is it useful to others? | 4 |
| Uniqueness | Does the content offer new insights or unique information compared to existing documentation? | 4 |
| Clarity | Is the language easy to understand? Are things well-explained? Is the title clear? | 2 |
| Correctness | Are the facts, code snippets, and examples correct and reliable? Does everything execute correctly? | 4 |
| Conciseness | Is the content concise? Are all details necessary? Can it be made shorter? | 4 |
| Completeness | Is the content thorough and detailed? Are there things that weren't explained fully? | 4 |
| Grammar | Are there grammatical or spelling errors present? | 4 |
This is a super cool cookbook! I really like the methodology and the walkthrough of the examples. It could mostly benefit from some high-level explanations of the concepts you're demonstrating (and commenting) with code. People are going to love this!
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summarizing with Controllable Detail"
nit: maybe "summarization" or "how to summarize..."?
"cell_type": "markdown",
"metadata": {},
"source": [
"The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail. If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output."
nit: first line is awesome and punchy - put rest of paragraph in separate text block (or add two newlines)
"    return combined_chunks\n",
"\n",
"\n",
"def combine_chunks_with_no_minimum(\n",
Might be useful to introduce these functions at a high level, and maybe also how they fit together. (So a skimming reader can still follow along).
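To illustrate the kind of high-level introduction suggested here: the combining step could be sketched roughly like this. This is a hedged illustration, not the cookbook's actual implementation; the word-count proxy for tokens (the notebook would count real tokens) is an assumption.

```python
def combine_chunks_with_no_minimum(pieces: list[str], max_words: int) -> list[str]:
    """Greedily pack consecutive pieces into chunks of at most max_words words.

    A piece that is itself larger than max_words still becomes its own
    chunk (hence "no minimum" guarantee on splitting oversized pieces).
    """
    combined, current, current_len = [], [], 0
    for piece in pieces:
        n = len(piece.split())
        # Flush the current chunk when adding this piece would overflow it.
        if current and current_len + n > max_words:
            combined.append(" ".join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += n
    if current:
        combined.append(" ".join(current))
    return combined
```

A skimming reader can then see at a glance that chunk size (and thus summary detail) is controlled by `max_words`.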
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's inspect the summaries to get a feel for what that means."
Always love "let's get a feel for..." sections!
(edit: kept reading, would be useful if you added a brief comment telling the (skimming) reader what the "feel" is for each of the summaries)
}
},
{
"cell_type": "code",
delete empty cell
"collapsed": false
}
}
],
nit: add a brief conclusion section – always nice to "close the book" officially/decisively
"metadata": {},
"outputs": [],
"source": [
"def summarize(text: str,\n",
Same comment as for the functions above – especially since this is the "meat" of the cookbook. You explain the steps in the comments, but indented comments are always tougher to follow. E.g. there's some interesting nuance around candidates, dropped chunks, and the recursive nature that's worth talking through (even briefly) at the top!
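The recursive nature mentioned here could be sketched briefly like this. This is a hedged sketch of the general technique, not the notebook's `summarize` implementation; `model_summarize` is a hypothetical stand-in for a real GPT call, and word counts stand in for token counts.

```python
def recursive_summarize(text: str, model_summarize,
                        chunk_words: int = 50, max_words: int = 60) -> str:
    """Summarize each chunk, then recursively summarize the combined result.

    Assumes model_summarize always shrinks its input; otherwise this
    could recurse indefinitely.
    """
    words = text.split()
    # Base case: short enough to summarize in one call.
    if len(words) <= max_words:
        return model_summarize(text)
    # Split into fixed-size chunks and summarize each piecewise.
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    combined = " ".join(model_summarize(chunk) for chunk in chunks)
    # Recurse until the combined summaries fit the budget.
    return recursive_summarize(combined, model_summarize, chunk_words, max_words)
```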
Summary
The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail. If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output.
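The idea above can be sketched in a few lines. This is a hedged illustration, not the notebook's code: `summarize_chunk` stands in for a real GPT call, and plain word splitting stands in for token-based chunking (the notebook would use a tokenizer such as tiktoken).

```python
def split_into_chunks(text: str, num_chunks: int) -> list[str]:
    """Split text into roughly equal word-based chunks."""
    words = text.split()
    size = max(1, -(-len(words) // num_chunks))  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def summarize_with_detail(text: str, num_chunks: int, summarize_chunk) -> str:
    """More chunks -> more piecewise summaries -> a longer overall summary."""
    chunks = split_into_chunks(text, num_chunks)
    return "\n".join(summarize_chunk(chunk) for chunk in chunks)
```

Raising `num_chunks` is what raises the level of detail: each chunk gets its own model query, and the piecewise summaries are concatenated into the full summary.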
Motivation
I hope it's self-evident!
For new content
When contributing new content, read through our contribution guidelines, and mark the following action items as completed:
We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.