Joe at openai/summarize with controllable detail #1128
Conversation
lgtm
| Criteria | Description | Score |
|---|---|---|
| Relevance | Is the content related to building with OpenAI technologies? Is it useful to others? | 4 |
| Uniqueness | Does the content offer new insights or unique information compared to existing documentation? | 4 |
| Clarity | Is the language easy to understand? Are things well-explained? Is the title clear? | 2 |
| Correctness | Are the facts, code snippets, and examples correct and reliable? Does everything execute correctly? | 4 |
| Conciseness | Is the content concise? Are all details necessary? Can it be made shorter? | 4 |
| Completeness | Is the content thorough and detailed? Are there things that weren't explained fully? | 4 |
| Grammar | Are there grammatical or spelling errors present? | 4 |
This is a super cool cookbook! I really like the methodology and the walkthrough of the examples. It could mostly benefit from some high-level explanations of the concepts you're demonstrating (and commenting) with code. People are going to love this!
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summarizing with Controllable Detail"
nit: maybe "summarization" or "how to summarize..."?
"cell_type": "markdown",
"metadata": {},
"source": [
"The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail. If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output."
nit: first line is awesome and punchy - put rest of paragraph in separate text block (or add two newlines)
"    return combined_chunks\n",
"\n",
"\n",
"def combine_chunks_with_no_minimum(\n",
Might be useful to introduce these functions at a high level, and maybe also how they fit together. (So a skimming reader can still follow along).
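To illustrate the kind of high-level introduction suggested here: the combining step could be sketched roughly like this. This is a hedged illustration, not the cookbook's actual implementation; the word-count proxy for tokens (the notebook would count real tokens) is an assumption.

```python
def combine_chunks_with_no_minimum(pieces: list[str], max_words: int) -> list[str]:
    """Greedily pack consecutive pieces into chunks of at most max_words words.

    A piece that is itself larger than max_words still becomes its own
    chunk (hence "no minimum" guarantee on splitting oversized pieces).
    """
    combined, current, current_len = [], [], 0
    for piece in pieces:
        n = len(piece.split())
        # Flush the current chunk when adding this piece would overflow it.
        if current and current_len + n > max_words:
            combined.append(" ".join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += n
    if current:
        combined.append(" ".join(current))
    return combined
```

A skimming reader can then see at a glance that chunk size (and thus summary detail) is controlled by `max_words`.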
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's inspect the summaries to get a feel for what that means."
Always love "let's get a feel for..." sections!
(edit: kept reading, would be useful if you added a brief comment telling the (skimming) reader what the "feel" is for each of the summaries)
}
},
{
"cell_type": "code",
delete empty cell
"collapsed": false
}
}
],
nit: add a brief conclusion section – always nice to "close the book" officially/decisively
"metadata": {},
"outputs": [],
"source": [
"def summarize(text: str,\n",
Same comment as for the functions above – especially since this is the "meat" of the cookbook. You explain the steps in the comments, but indented comments are always tougher to follow. E.g. there's some interesting nuance around candidates, dropped chunks, and the recursive nature that's worth talking through (even briefly) at the top!
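The recursive nature mentioned here could be sketched briefly like this. This is a hedged sketch of the general technique, not the notebook's `summarize` implementation; `model_summarize` is a hypothetical stand-in for a real GPT call, and word counts stand in for token counts.

```python
def recursive_summarize(text: str, model_summarize,
                        chunk_words: int = 50, max_words: int = 60) -> str:
    """Summarize each chunk, then recursively summarize the combined result.

    Assumes model_summarize always shrinks its input; otherwise this
    could recurse indefinitely.
    """
    words = text.split()
    # Base case: short enough to summarize in one call.
    if len(words) <= max_words:
        return model_summarize(text)
    # Split into fixed-size chunks and summarize each piecewise.
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    combined = " ".join(model_summarize(chunk) for chunk in chunks)
    # Recurse until the combined summaries fit the budget.
    return recursive_summarize(combined, model_summarize, chunk_words, max_words)
```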
Summary
The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail. If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output.
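The idea above can be sketched in a few lines. This is a hedged illustration, not the notebook's code: `summarize_chunk` stands in for a real GPT call, and plain word splitting stands in for token-based chunking (the notebook would use a tokenizer such as tiktoken).

```python
def split_into_chunks(text: str, num_chunks: int) -> list[str]:
    """Split text into roughly equal word-based chunks."""
    words = text.split()
    size = max(1, -(-len(words) // num_chunks))  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def summarize_with_detail(text: str, num_chunks: int, summarize_chunk) -> str:
    """More chunks -> more piecewise summaries -> a longer overall summary."""
    chunks = split_into_chunks(text, num_chunks)
    return "\n".join(summarize_chunk(chunk) for chunk in chunks)
```

Raising `num_chunks` is what raises the level of detail: each chunk gets its own model query, and the piecewise summaries are concatenated into the full summary.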
Motivation
I hope it's self-evident!
For new content
When contributing new content, read through our contribution guidelines, and mark the following action items as completed:
We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.