In this tutorial, we demonstrate how to use an OpenAI chat model to recursively summarize text from an arbitrary URL. We use Python and the following libraries: openai, requests, and beautifulsoup4, plus IPython to render Markdown output. The code is explained with comments and Markdown text between cells.

First, let's import the required libraries:

In [None]:
import openai
from IPython.display import display, Markdown
import requests
from bs4 import BeautifulSoup

Next, set your OpenAI API key:

In [None]:
openai.api_key = "INSERT YOUR API KEY HERE"
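
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The helper name load_api_key is hypothetical and not part of this tutorial's code; OPENAI_API_KEY is the variable name conventionally used with the openai package.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key

# Usage (assumes the variable is set in the shell that launched Jupyter):
# openai.api_key = load_api_key()
```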

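Define a function chat_query that sends a single user message to the chat model and, optionally, renders the reply as Markdown. It uses the openai.ChatCompletion interface, which is available in versions of the openai package before 1.0: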

In [9]:
def chat_query(model, query, print_response=True):
    """Send query to the chat model and return the raw API response."""
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": query},
        ],
    )

    # Optionally render the reply in the notebook as Markdown.
    if print_response:
        display(Markdown(resp['choices'][0]['message']['content']))
    return resp

Create a function model_context_window that returns the context window size, in tokens, for a given model:

In [None]:
def model_context_window(model_name):
    model_windows = {
        "gpt-3.5-turbo": 4096,
        "text-davinci-002": 2048,
        # Add other models here
    }
    return model_windows.get(model_name, 2048)  # Default to 2048 if model not found

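Define a helper function tokens_to_char that converts a token count to an approximate character count, using the rule of thumb of roughly four characters per token for English text:

In [None]:
def tokens_to_char(tokens):
    # Approximation only: about four characters per token for English text.
    return tokens * 4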
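
To make the conversion concrete, here is a small sketch that applies it in both directions. The inverse helper chars_to_tokens is a hypothetical addition for illustration; it rounds up so that the estimate errs on the high side.

```python
import math

def tokens_to_char(tokens):
    # Same heuristic as above: about four characters per token.
    return tokens * 4

def chars_to_tokens(chars):
    # Hypothetical inverse: estimate how many tokens a string of this length uses.
    return math.ceil(chars / 4)

print(tokens_to_char(4096))  # 16384
print(chars_to_tokens(10))   # 3
```

Because the ratio is only a heuristic, text with a lot of code, numbers, or non-English content may use noticeably more tokens than this estimate suggests.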

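Create the main function recursive_summarization. Each pass summarizes the leading chunk of the text that fits in the context window, puts that summary in front of the remaining text, and recurses. Once the text is no longer than summary_length, one final pass condenses it and the recursion stops: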

In [None]:
def recursive_summarization(model, text, summary_length=1000, last_run=False):
    # Once the text fits within summary_length, make one final pass to condense it.
    if len(text) <= summary_length and not last_run:
        return recursive_summarization(model, text, summary_length, last_run=True)
    print(f"current length: {len(text)}")

    # Characters of input per request: the approximate window size, minus room
    # for the prompt wording and the reply.
    context_window = model_context_window(model)
    truncation_length = tokens_to_char(context_window) - 2 * summary_length
    if truncation_length <= 0:
        raise ValueError("summary_length is too large for this model's context window")

    summary_query = f"Please summarize the following text in {summary_length} characters or less: {text[:truncation_length]}"
    summary_response = chat_query(model, summary_query, print_response=False)
    summary = summary_response['choices'][0]['message']['content'].strip()

    print("num chars in summary: ", len(summary))
    # The summary replaces the chunk it covers; the rest is carried forward.
    remaining_text = text[truncation_length:]
    if last_run:
        return summary + " " + remaining_text
    return recursive_summarization(model, summary + " " + remaining_text, summary_length)
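
The control flow can be exercised without calling the API. The sketch below restates the same loop iteratively and takes the summarizer as a parameter. The names iterative_summarization, summarize_fn, and chunk_chars, as well as the toy fake_summarize stand-in, are hypothetical and introduced only for illustration.

```python
def iterative_summarization(text, summarize_fn, summary_length=1000, chunk_chars=12384):
    """Condense text chunk by chunk until it fits, then make one final pass."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    while len(text) > summary_length:
        summary = summarize_fn(text[:chunk_chars], summary_length).strip()
        # The summary replaces the chunk it covers; the rest is carried forward.
        text = summary + " " + text[chunk_chars:]
    # Final pass over text that already fits (mirrors last_run=True).
    summary = summarize_fn(text[:chunk_chars], summary_length).strip()
    return summary + " " + text[chunk_chars:]

def fake_summarize(chunk, limit):
    # Toy stand-in for the model: keep the first limit // 2 characters.
    return chunk[: limit // 2]

result = iterative_summarization(
    "word " * 5000, fake_summarize, summary_length=100, chunk_chars=1000
)
print(len(result))
```

Like the recursive version, this assumes each summary is shorter than the chunk it replaces; if the model returned summaries as long as their input, the loop would not terminate.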

Define a function extract_text_from_url that downloads a page and returns the text of its <p> elements joined into a single string:

In [None]:
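def extract_text_from_url(url):
    # Fail fast on network problems or HTTP error pages.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # Keep only paragraph text, joined into a single string.
    text = ' '.join(p.get_text() for p in soup.find_all('p'))
    return text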
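
To see what paragraph-only extraction keeps and drops, the following sketch runs the same idea on an inline HTML string. It deliberately uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without network access or extra packages; the class name ParagraphCollector and the sample markup are made up for illustration, and the parser assumes every <p> tag is explicitly closed.

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Collect the text found inside <p>...</p> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # > 0 while inside a <p> element
        self._buffer = []      # text pieces of the paragraph being read
        self.paragraphs = []   # finished paragraphs, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.paragraphs.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)

sample_html = (
    "<html><body><h1>Title</h1>"
    "<p>First <b>bold</b> paragraph.</p>"
    "<div>Navigation</div>"
    "<p>Second paragraph.</p></body></html>"
)
collector = ParagraphCollector()
collector.feed(sample_html)
collector.close()
text = " ".join(collector.paragraphs)
print(text)  # First bold paragraph. Second paragraph.
```

Headings, navigation blocks, and other non-paragraph elements are dropped, which is usually what is wanted for an article page but can lose content on pages that place body text in other tags.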

Now, use the extract_text_from_url function to get the original text, and then call recursive_summarization to generate the summary:

In [11]:
original_text = extract_text_from_url("https://www.trenzle.com/a-midjourney-ai-complete-guide-how-to-use-cheatsheet-prompts-with-examples-for-beginners/")
model_name = "gpt-3.5-turbo"
summary_length = 2000
summary = recursive_summarization(model_name, original_text, summary_length=summary_length)
print("Summary:", summary)

current length: 84270
num chars in summary:  673
current length: 72560
num chars in summary:  724
current length: 60901
num chars in summary:  655
current length: 49173
num chars in summary:  636
current length: 37426
num chars in summary:  694
current length: 25737
num chars in summary:  638
current length: 13992
num chars in summary:  813
current length: 2422
num chars in summary:  875
current length: 876
num chars in summary:  687
Summary: Midjourney is a beta AI-powered app that generates images based on text prompts with various settings, such as resolution, aspect ratio, and styling options. It offers control and possibilities for users to experiment with their image creations, including lens type, lighting, realism level, and filters. Users can also discard unwanted subjects via the "--no" keyword and increase or decrease abstraction with the "--chaos" setting. Other AI-based writing tools, such as Smart Copy, ChatGPT-4, ParagraphAI, and Neuroflash, can also help users with copy
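
The progress log above is consistent with the chunking arithmetic in recursive_summarization: with a 4096-token window and summary_length of 2000, each pass replaces the first 12,384 characters with the summary plus one joining space. The sketch below replays that arithmetic against the printed numbers; the variable and function names are illustrative only.

```python
chunk_chars = 4096 * 4 - 2 * 2000  # characters consumed per pass

# (current length, num chars in summary) pairs copied from the log above.
log = [
    (84270, 673), (72560, 724), (60901, 655), (49173, 636),
    (37426, 694), (25737, 638), (13992, 813), (2422, 875), (876, 687),
]

def next_length(current, summary_chars):
    remaining = max(current - chunk_chars, 0)
    return summary_chars + 1 + remaining  # summary + joining space + remainder

# Every logged length should follow from the previous pass.
for (current, summary_chars), (following, _) in zip(log, log[1:]):
    assert next_length(current, summary_chars) == following

print(chunk_chars)  # 12384
```

The last logged pass starts from 876 characters, which is already below summary_length, so it corresponds to the final condensing pass.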