Joe at openai/summarize with controllable detail #1128

Merged

Conversation

joe-at-openai
Contributor

Summary

The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail. If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output.
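As a rough illustration of the idea above (not the notebook's actual code), here is a minimal sketch in which a `detail` value between 0 and 1 picks the number of chunks and each chunk is summarized independently. The names `split_into_chunks`, `summarize_piecewise`, `summarize_chunk`, and `max_chunks` are hypothetical, chunks are measured in words rather than tokens, and the model call is replaced by a caller-supplied function so the sketch runs without an API key.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], num_chunks: int) -> List[str]:
    """Split a list of words into `num_chunks` contiguous, roughly equal pieces."""
    # Clamp so we never ask for more chunks than there are words.
    num_chunks = max(1, min(num_chunks, len(words) or 1))
    size, remainder = divmod(len(words), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks each take one extra word.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(" ".join(words[start:end]))
        start = end
    return chunks


def summarize_piecewise(
    text: str,
    detail: float,
    summarize_chunk: Callable[[str], str],  # hypothetical stand-in for a model call
    max_chunks: int = 8,                    # assumed upper bound, not from the notebook
) -> str:
    """Summarize `text` chunk by chunk; a higher `detail` means more chunks."""
    if not 0.0 <= detail <= 1.0:
        raise ValueError("detail must be between 0 and 1")
    words = text.split()
    if not words:
        return ""
    # detail=0 gives a single chunk; detail=1 gives `max_chunks` chunks.
    num_chunks = 1 + round(detail * (max_chunks - 1))
    chunks = split_into_chunks(words, num_chunks)
    return "\n\n".join(summarize_chunk(chunk) for chunk in chunks)
```

Because each chunk yields its own short summary, the combined output grows with the number of chunks, which is what makes the level of detail controllable.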

Motivation

I hope it's self-evident!

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

Collaborator

@shyamal-anadkat shyamal-anadkat left a comment

lgtm

@joe-at-openai joe-at-openai merged commit 3c4e4bd into main Apr 8, 2024
@joe-at-openai joe-at-openai deleted the joe-at-openai/summarize-with-controllable-detail branch April 8, 2024 23:55
Collaborator

@ibigio ibigio left a comment

| Criteria | Description | Score |
| --- | --- | --- |
| Relevance | Is the content related to building with OpenAI technologies? Is it useful to others? | 4 |
| Uniqueness | Does the content offer new insights or unique information compared to existing documentation? | 4 |
| Clarity | Is the language easy to understand? Are things well-explained? Is the title clear? | 2 |
| Correctness | Are the facts, code snippets, and examples correct and reliable? Does everything execute correctly? | 4 |
| Conciseness | Is the content concise? Are all details necessary? Can it be made shorter? | 4 |
| Completeness | Is the content thorough and detailed? Are there things that weren’t explained fully? | 4 |
| Grammar | Are there grammatical or spelling errors present? | 4 |

This is a super cool cookbook! Really like the methodology and going over the examples – mostly could benefit from some high-level explanations of the concepts you're demonstrating (and commenting) with code. People are going to love this talk!

"cell_type": "markdown",
"metadata": {},
"source": [
"# Summarizing with Controllable Detail"
Collaborator

nit: maybe "summarization" or "how to summarize..."?

"cell_type": "markdown",
"metadata": {},
"source": [
"The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail. If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output."
Collaborator

nit: first line is awesome and punchy - put rest of paragraph in separate text block (or add two newlines)

" return combined_chunks\n",
"\n",
"\n",
"def combine_chunks_with_no_minimum(\n",
Collaborator

Might be useful to introduce these functions at a high level, and maybe also how they fit together. (So a skimming reader can still follow along).
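For a skimming reader, the kind of high-level picture requested here might look like the sketch below: small pieces of text are packed greedily into larger chunks that stay under a size budget, and any piece that cannot fit on its own is counted as dropped. This is an assumption about the general technique, not a copy of the notebook's `combine_chunks_with_no_minimum`; the name `combine_pieces`, its parameters, and the use of word counts instead of token counts are all hypothetical.

```python
from typing import List, Tuple


def combine_pieces(
    pieces: List[str],
    max_words: int,
    delimiter: str = "\n\n",
) -> Tuple[List[str], int]:
    """Greedily pack `pieces` into chunks of at most `max_words` words.

    Returns the combined chunks and the number of pieces dropped because
    they exceed the budget on their own.
    """
    combined: List[str] = []
    current: List[str] = []
    current_len = 0
    dropped = 0
    for piece in pieces:
        n = len(piece.split())
        if n > max_words:
            # Too large to fit in any chunk; skip it and record the loss.
            dropped += 1
            continue
        if current and current_len + n > max_words:
            # Adding this piece would overflow, so close the current chunk.
            combined.append(delimiter.join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += n
    if current:
        combined.append(delimiter.join(current))
    return combined, dropped
```

Reporting the dropped count alongside the chunks lets the caller decide whether losing oversized pieces is acceptable or whether the text needs a finer initial split.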

"cell_type": "markdown",
"metadata": {},
"source": [
"Let's inspect the summaries to get a feel for what that means."
Collaborator

Always love "let's get a feel for..." sections!

(edit: kept reading, would be useful if you added a brief comment telling the (skimming) reader what the "feel" is for each of the summaries)

}
},
{
"cell_type": "code",
Collaborator

delete empty cell

"collapsed": false
}
}
],
Collaborator

nit: add a brief conclusion section – always nice to "close the book" officially/decisively

"metadata": {},
"outputs": [],
"source": [
"def summarize(text: str,\n",
Collaborator

Same comment as above functions – especially since this is the "meat" of the cookbook. You explain the steps in the comments but indented comments are always tougher.

e.g. there's some interesting nuance around candidates, dropped chunks, and the recursive nature that's worth talking through (even briefly) at the top!
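One plausible reading of the "recursive nature" mentioned here is that each chunk is summarized with the summaries of the earlier chunks supplied as context, so later summaries can stay consistent with earlier ones. The sketch below illustrates that reading only; `summarize_with_context`, `summarize_chunk`, the `recursive` flag, and the prompt wording are hypothetical and are not taken from the notebook's `summarize` function.

```python
from typing import Callable, List


def summarize_with_context(
    chunks: List[str],
    summarize_chunk: Callable[[str], str],  # hypothetical stand-in for a model call
    recursive: bool = True,
) -> List[str]:
    """Summarize chunks in order, optionally feeding earlier summaries forward."""
    summaries: List[str] = []
    for chunk in chunks:
        if recursive and summaries:
            # Prepend everything summarized so far as context for this chunk.
            context = "\n\n".join(summaries)
            prompt = (
                f"Previous summaries:\n\n{context}\n\n"
                f"Text to summarize next:\n\n{chunk}"
            )
        else:
            prompt = chunk
        summaries.append(summarize_chunk(prompt))
    return summaries
```

The trade-off is that the prompt grows with every chunk, so a real implementation would need to keep the accumulated context within the model's context window.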

3 participants