This notebook generates a **Key Takeaways** summary for each chapter of a book using the OpenAI API.

In [7]:
#@markdown # Enter the required information for generating Key Takeaways.
#@markdown ---

#@markdown **Step 1:** Enter API Key and ISBN

open_api_key = '' #@param {type:"string"}
isbn = '' #@param {type:"string"}
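Pasting the key into the form field saves it in the notebook source, where it can leak if the notebook is shared. Below is a minimal sketch that assumes the key may instead be stored in an `OPENAI_API_KEY` environment variable, which is the variable the OpenAI Python client reads when no `api_key` argument is passed. `resolve_api_key` is a hypothetical helper, not part of the original notebook. It prefers the form value and otherwise falls back to the environment.

```python
import os


def resolve_api_key(form_value: str, env_var: str = "OPENAI_API_KEY") -> str:
    """Return the form value if set, else the environment variable (hypothetical helper)."""
    # Prefer whatever was typed into the form; fall back to the environment variable
    key = form_value.strip() or os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"No API key: fill in the form field or set {env_var}.")
    return key
```

In the notebook, calling `open_api_key = resolve_api_key(open_api_key)` after this cell would let the form field stay empty.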

In [8]:
!pip install openai
!pip install transformers
import json
import time
import re
from transformers import GPT2TokenizerFast
from openai import OpenAI
from concurrent.futures import ProcessPoolExecutor
import concurrent.futures
import io
from google.colab import files



In [9]:
#@markdown **Step 2:** Upload Chapter Extracts File

#@markdown Click on the **"Choose Files"** button to upload your chapter extracts file.

#@markdown The file should be a JSON array with one object per chapter, each with the following structure:

#@markdown ```
#@markdown {
#@markdown   "isbn": "ISBN-10",
#@markdown   "name": "Chapter Name",
#@markdown   "sequence_index": index,
#@markdown   "contents": "Chapter text",
#@markdown   "part": "Chapter part number"
#@markdown }
#@markdown ```
#@markdown Once the file is selected, it will be automatically uploaded.

uploaded_files = files.upload()

if not uploaded_files:
    raise RuntimeError("File upload failed, please try again.")

# Use only the first uploaded file
uploaded_filename = next(iter(uploaded_files))
uploaded_file = uploaded_files[uploaded_filename]

# Parse the file and check that every chapter has the required keys
required_keys = ["isbn", "name", "sequence_index", "contents", "part"]
try:
    data = json.load(io.BytesIO(uploaded_file))
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of chapter objects.")
    if not all(key in chapter for chapter in data for key in required_keys):
        raise ValueError("Some chapters do not contain all required keys.")
    print(f"File '{uploaded_filename}' successfully uploaded and read.")
except (json.JSONDecodeError, ValueError) as e:
    print(f"An error occurred while reading the file: {e}")
    raise

Saving 1626813582_autosplits.json to 1626813582_autosplits (1).json
File '1626813582_autosplits (1).json' successfully uploaded and read.
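The upload cell only checks whether any key is missing somewhere. Below is a slightly stricter sketch. It assumes the file is a JSON array and that the `sequence_index` values can be compared with each other (for example, all integers). The hypothetical `load_chapters` reports which keys each chapter lacks and returns the chapters in reading order, so the processed output is easier to compare against the book.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = ("isbn", "name", "sequence_index", "contents", "part")


def load_chapters(raw: bytes) -> List[Dict[str, Any]]:
    """Parse uploaded bytes, validate each chapter, and sort by sequence_index (hypothetical helper)."""
    data = json.loads(raw)  # json.loads accepts UTF-8 bytes directly
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of chapter objects.")
    for i, chapter in enumerate(data):
        if not isinstance(chapter, dict):
            raise ValueError(f"Item {i} is not a JSON object.")
        # Name the missing keys instead of a generic failure message
        missing = [key for key in REQUIRED_KEYS if key not in chapter]
        if missing:
            raise ValueError(f"Chapter {i} ({chapter.get('name', '?')!r}) is missing {missing}.")
    # Assumes sequence_index gives the reading order
    return sorted(data, key=lambda ch: ch["sequence_index"])
```

In the upload cell, `data = load_chapters(uploaded_file)` could stand in for the `json.load` call and the key check.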


In [10]:
#@markdown **Step 3:** Generate Key Takeaways and Download Processed Data

#@markdown Click the **"Play"** button on the left of this cell to generate the key takeaways and download the processed data.

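# Load the GPT-2 tokenizer once and reuse it, instead of reloading it on every call.
# GPT-2's vocabulary differs from the GPT-3.5/GPT-4 tokenizers, so counts are approximate.
# The "longer than the specified maximum sequence length" warning is harmless here,
# because the tokens are only counted and never run through the GPT-2 model.
_tokenizer = None

def get_token_count(text):
    global _tokenizer
    if _tokenizer is None:
        _tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    return len(_tokenizer(text)["input_ids"])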

def prompt_model(text, model="gpt-3.5-turbo", open_api_key=open_api_key):
    try:
        start = time.time()
        client = OpenAI(api_key=open_api_key)
        print("Prompting model...")
        chat_completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": text}]
        )
        response = chat_completion.choices[0].message.content
        print("Response received -- Time taken: {:.2f} seconds".format(time.time() - start))
        return response
    except Exception as e:
        print(f"Failed to prompt model: {e}")
        raise

def prompt_response_to_pov(response):
    # str.lower() returns a new string, so the result must be assigned
    response = (response or "").lower()
    if "first" in response:
        return "first person"
    elif "second" in response:
        return "second person"
    else:
        return "third person"

def determine_pov(text):
    model = 'gpt-4-1106-preview'
    prompt = f'''
    You are an editor. Determine the point of view of the text enclosed in backticks (```).
    Respond with one of the following: first person, second person, or third person.

    ```{text}```
    '''
    result = prompt_model(prompt, model=model)
    pov = prompt_response_to_pov(result)
    return pov

def remove_leading_and_trailing_fluff(text):
    model = 'gpt-4-turbo-preview'
    prompt = f'''

    You are a professional editor. You will be given a summary of a chapter from a book.

    * Remove the leading and trailing backticks (```) from the text if they exist.
    * Remove any leading or trailing whitespace from the text.
    * Remove any leading title or leading chapter name from the text.

    ```{text}```

    YOUR EDITED TEXT:
    '''

    result = prompt_model(prompt, model=model)
    return result

def rewrite_in_pov(text, pov):
    model = 'gpt-4-1106-preview'
    prompt = f'''

    You are a professional editor. You will be given a summary of a chapter from a book.

    * Rewrite the summary from the {pov} point of view.

    ```{text}```

    YOUR REWRITTEN TEXT:
    '''

    result = prompt_model(prompt, model=model)
    return result

def key_takeaway(text):
    model = 'gpt-4-1106-preview'
    prompt = f'''
    You are a professional writer. You will be given a chapter from a book.
    Your job is to identify the main idea of the text and summarize it in a few sentences.

    * The main idea should have a title.
    * The main idea should be a few sentences long.
    * The main idea should follow the format [Title]: [Main Idea]

    ```{text}```

    Here is an example of the format:

    ```The Allure of "FREE!": The chapter examines the psychological impact of the concept of "free" on consumer behavior, demonstrating through experiments and anecdotes how people are irrationally drawn to free products or services, even when they are not the best option. Despite only a trivial price difference, the presence of a free option can significantly shift consumers' preferences. This effect is so strong that it can influence not only purchasing decisions but also the success of public policies and social programs, highlighting the unique and powerful role of "zero price" in human psychology and decision-making.```

    MAIN IDEA:
    '''

    result = prompt_model(prompt, model=model)
    return result

def key_takeaway_and_clean(chapter):
    summary = key_takeaway(chapter["contents"])
    cleaned_summary = remove_leading_and_trailing_fluff(summary)
    return cleaned_summary

def process_chapter(chapter):
    # Process a single chapter; on failure, log the error and return None
    try:
        summary = key_takeaway_and_clean(chapter)
        return summary
    except Exception as e:
        print(f"Error processing chapter {chapter.get('name', '?')!r}: {e}")
        return None

def generate_key_takeaways_summary(data):
    # API calls are I/O-bound, so a thread pool is sufficient
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        # Map each future to the index of the chapter it was submitted for
        future_to_index = {executor.submit(process_chapter, chapter): i for i, chapter in enumerate(data)}
        # as_completed yields futures in completion order, not submission order,
        # so look up the chapter index instead of counting completions
        for future in concurrent.futures.as_completed(future_to_index):
            i = future_to_index[future]
            result = future.result()
            if result is not None:
                data[i]["key_takeaway"] = result
            else:
                data[i]["key_takeaway"] = "Error processing this chapter"
    return data

# Call the function to process the data
processed_data = generate_key_takeaways_summary(data)

def count_chapter_lengths(data):
    for chapter in data:
        num_tokens = get_token_count(chapter['contents'])
        print(f"Chapter {chapter['name']} has {num_tokens} tokens.")

count_chapter_lengths(processed_data)

# Save the processed data to a file and offer it for the user to download
output_filename = f"{isbn}_key_takeaways.json"
with open(output_filename, "w") as output_file:
    json.dump(processed_data, output_file, ensure_ascii=False, indent=4)

files.download(output_filename)


Prompting model...
Prompting model...
Prompting model...Prompting model...
Prompting model...
Prompting model...

Prompting model...Prompting model...

Response received -- Time taken: 6.68 seconds
Prompting model...
Response received -- Time taken: 7.68 seconds
Response received -- Time taken: 7.79 seconds
Prompting model...Response received -- Time taken: 7.82 seconds

Prompting model...
Prompting model...
Response received -- Time taken: 9.33 seconds
Response received -- Time taken: 9.55 seconds
Prompting model...Response received -- Time taken: 9.65 seconds

Prompting model...
Prompting model...
Response received -- Time taken: 11.97 seconds
Prompting model...
Response received -- Time taken: 4.68 seconds
Prompting model...
Response received -- Time taken: 5.77 seconds
Prompting model...
Response received -- Time taken: 6.35 seconds
Prompting model...
Response received -- Time taken: 6.12 seconds
Prompting model...
Response received -- Time taken: 9.25 seconds
Prompting model...
Re
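The interleaved log above shows eight workers finishing in an unpredictable order. `concurrent.futures.as_completed` yields futures as they finish, not in submission order. As a result, the `enumerate` counter in the original `generate_key_takeaways_summary` counts completions rather than chapters, and a takeaway can be written to the wrong chapter.

Below is a minimal sketch of an order-preserving alternative. `map_in_order` is a hypothetical helper. It uses threads on the assumption that each job mostly waits on network I/O. `executor.map` also returns results in input order and is a simpler option when per-item progress is not needed.

```python
import concurrent.futures
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_in_order(fn: Callable[[T], R], items: List[T], max_workers: int = 8) -> List[Optional[R]]:
    """Run fn over items concurrently; results[i] always belongs to items[i]."""
    results: List[Optional[R]] = [None] * len(items)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which input index each future was submitted for
        future_to_index = {executor.submit(fn, item): i for i, item in enumerate(items)}
        # Futures arrive in completion order; the dict restores the original position
        for future in concurrent.futures.as_completed(future_to_index):
            results[future_to_index[future]] = future.result()
    return results
```

With the notebook's functions, `map_in_order(process_chapter, data)` would return one takeaway (or `None`) per chapter, in the same order as `data`.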

Token indices sequence length is longer than the specified maximum sequence length for this model (2208 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1849 > 1024). Running this sequence through the model will result in indexing errors


Chapter Foreword has 2208 tokens.
Chapter Exponential Organizations has 1849 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (4582 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (6352 > 1024). Running this sequence through the model will result in indexing errors


Chapter Chapter One: Illuminated by Information has 4582 tokens.
Chapter Chapter Two: A Tale of Two Companies has 6352 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (2138 > 1024). Running this sequence through the model will result in indexing errors


Chapter Chapter Three: The Exponential Organization has 513 tokens.
Chapter Massive Transformative Purpose (MTP) has 2138 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1808 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1687 > 1024). Running this sequence through the model will result in indexing errors


Chapter Staff on Demand has 1808 tokens.
Chapter Community & Crowd has 1687 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (2469 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3653 > 1024). Running this sequence through the model will result in indexing errors


Chapter Algorithms has 2469 tokens.
Chapter Engagement has 3653 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1995 > 1024). Running this sequence through the model will result in indexing errors


Chapter Dashboards has 1995 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (3068 > 1024). Running this sequence through the model will result in indexing errors


Chapter Experimentation has 3068 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (2408 > 1024). Running this sequence through the model will result in indexing errors


Chapter Autonomy has 2408 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (3059 > 1024). Running this sequence through the model will result in indexing errors


Chapter Social Technologies has 3059 tokens.
Chapter Chapter Five: Implications of Exponential Organizations has 400 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1700 > 1024). Running this sequence through the model will result in indexing errors


Chapter 1. Information Accelerates Everything has 1700 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1905 > 1024). Running this sequence through the model will result in indexing errors


Chapter 6. Smaller Beats Bigger has 1905 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1712 > 1024). Running this sequence through the model will result in indexing errors


Chapter 9. Everything is Measurable and Anything is Knowable has 1712 tokens.
Chapter Chapter Six: Starting an ExO has 861 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1544 > 1024). Running this sequence through the model will result in indexing errors


Chapter Ignition has 1544 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1691 > 1024). Running this sequence through the model will result in indexing errors


Chapter Step 4: Breakthrough Idea has 1691 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (2085 > 1024). Running this sequence through the model will result in indexing errors


Chapter Example 2: GitHub has 2085 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (2356 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1936 > 1024). Running this sequence through the model will result in indexing errors


Chapter Example 5: GoPro has 2356 tokens.
Chapter Chapter Eight: ExOs for Large Organizations has 1936 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1621 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1654 > 1024). Running this sequence through the model will result in indexing errors


Chapter 2. Partner with, Invest in or Acquire ExOs has 1621 tokens.
Chapter Inspire ExOs at the Edge has 1654 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (1850 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2558 > 1024). Running this sequence through the model will result in indexing errors


Chapter Partner with Accelerators, Incubators and Hackerspaces has 1850 tokens.
Chapter Chapter Ten: The Exponential Executive has 2558 tokens.


Token indices sequence length is longer than the specified maximum sequence length for this model (4066 > 1024). Running this sequence through the model will result in indexing errors


Chapter Epilogue: A New Cambrian Explosion has 4066 tokens.
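The instructions in `remove_leading_and_trailing_fluff` are largely mechanical and could be handled locally, saving one model call per chapter. The notebook already imports `re` without using it.

Below is a minimal sketch. `strip_fluff` is a hypothetical helper. Its chapter-name check is a heuristic that drops only a first line repeating the chapter name, so it does not replicate a model's judgment about what counts as a title.

```python
import re


def strip_fluff(text: str, chapter_name: str = "") -> str:
    """Strip whitespace, one pair of ``` fences, and a leading chapter-name line (hypothetical helper)."""
    text = text.strip()
    # Drop an opening fence. A word directly followed by a newline is treated as a
    # language tag (e.g. ```text), so a one-word title alone on that line would also be dropped.
    text = re.sub(r"^```(?:[\w-]*\n)?", "", text, count=1)
    # Drop a closing fence, with or without a preceding newline
    text = re.sub(r"\n?```$", "", text, count=1)
    text = text.strip()
    if chapter_name:
        # Drop a first line that only repeats the chapter name, optionally as a "#" heading
        first, _, rest = text.partition("\n")
        if first.strip().lstrip("#").strip().lower() == chapter_name.strip().lower():
            text = rest.strip()
    return text
```

In `key_takeaway_and_clean`, `strip_fluff(summary, chapter["name"])` could replace the `remove_leading_and_trailing_fluff` call where these simple cases are all that occur.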

