# Generative AI for Document Analysis and Summarization

From the blog post: [Mastering Long Document Insights: Advanced Summarization with Amazon Bedrock and Anthropic Claude 2 Foundation Model](https://garystafford.medium.com/mastering-long-document-insights-advanced-summarization-with-amazon-bedrock-and-anthropic-claude-2-2fe13d5ae8d8). In this post, we move beyond simple summarization and explore advanced techniques to analyze long texts using Amazon Bedrock and the Anthropic Claude 2 Foundation Model.

In [3]:
%%sh
python -m pip install pip -Uq --root-user-action=ignore
python -m pip install -r requirements.txt -Uq --root-user-action=ignore

In [4]:
%%sh
python -m pip list | grep 'anthropic\|boto3\|botocore'

anthropic                            0.5.0
boto3                                1.28.78
botocore                             1.31.78


In [5]:
import datetime
import re
from statistics import mean
import string

from anthropic import Anthropic

## Load and Prepare Long Text

In [6]:
# Use regular expressions to split the book into chapters
def split_book(book_text):
    # Specific to this Gutenberg eBooks format
    chapters = re.split(r"^CHAPTER [IVXLCDM]+$", book_text, flags=re.MULTILINE)

    # remove everything prior to chapter 1
    chapters.pop(0)

    # Split the last chapter into two parts and remove everything after "THE END"
    chapter26 = re.split(r"^.*THE END.*$", chapters[26], flags=re.MULTILINE)[0]
    chapters.pop(26)
    chapters.append(chapter26)

    return chapters

In [7]:
# specify the path to the text file you want to read
# https://www.gutenberg.org/cache/epub/345/pg345-images.html
file_path = "./input/dracula.txt"

# Open the file in read mode and read its contents into a string
with open(file_path, "r") as file:
    book_text = file.read()

chapters = split_book(book_text)

In [8]:
print(chapters[0][0:2048])



JONATHAN HARKER’S JOURNAL
(Kept in shorthand.)

3 May. Bistritz.—Left Munich at 8:35 P. M., on 1st May, arriving at Vienna early next morning; should have arrived at 6:46, but train was an hour late. Buda-Pesth seems a wonderful place, from the glimpse which I got of it from the train and the little I could walk through the streets. I feared to go very far from the station, as we had arrived late and would start as near the correct time as possible. The impression I had was that we were leaving the West and entering the East; the most western of splendid bridges over the Danube, which is here of noble width and depth, took us among the traditions of Turkish rule.

We left in pretty good time, and came after nightfall to Klausenburgh. Here I stopped for the night at the Hotel Royale. I had for dinner, or rather supper, a chicken done up some way with red pepper, which was very good but thirsty. (Mem., get recipe for Mina.) I asked the waiter, and he said it was called “paprika hendl,”

## Long Text Statistics

In [9]:
chapter_nums = []
character_count = []
word_count = []
token_count = []
char_token_ratio = []
para_count = []

client = Anthropic()

print("chpt\tparas\twords\tchars\ttokens\tratio")
print(f"----------------------------------------------")


for i, chapter in enumerate(chapters):
    try:
        chapter_nums.append(f"chpt {i+1}")

        token_count_tmp = client.count_tokens(chapter.strip())
        token_count.append(token_count_tmp)

        character_count_tmp = len(chapter.strip())
        character_count.append(character_count_tmp)

        word_count_tmp = sum(
            [i.strip(string.punctuation).isalpha() for i in chapter.split()]
        )
        word_count.append(word_count_tmp)

        para_count_tmp = len(chapter.split("\n\n"))
        para_count.append(para_count_tmp)

        char_token_ratio_tmp = character_count_tmp / token_count_tmp
        char_token_ratio.append(char_token_ratio_tmp)

        print(
            f"{i+1}\t{para_count_tmp:,.0f}\t{word_count_tmp:,.0f}\t{character_count_tmp:,.0f}\t{token_count_tmp:,.0f}\t{char_token_ratio_tmp:,.2f}"
        )

    except Exception as ex:
        print(ex)

print("\n---")
print(f"Complete Novel (raw long document)")
print(f"---")
print(f"sum chars:\t{len(book_text):,.0f}")
print(
    f"sum words:\t{sum([i.strip(string.punctuation).isalpha() for i in book_text]):,.0f}"
)
print(f"sum tokens:\t{client.count_tokens(book_text):,.0f}")

print("\n---")
print("Chapters")
print(f"---")
print(f"chpt count:\t{len(chapters)}")
print(f"---")
print(f"min paras:\t{min(para_count):,.0f}")
print(f"max paras:\t{max(para_count):,.0f}")
print(f"mean words:\t{int(mean(para_count)):,.0f}")
print(f"sum paras:\t{sum(para_count):,.0f}")
print("---")
print(f"min words:\t{min(word_count):,.0f}")
print(f"max words:\t{max(word_count):,.0f}")
print(f"mean words:\t{int(mean(word_count)):,.0f}")
print(f"sum words:\t{sum(word_count):,.0f}")
print("---")
print(f"min chars:\t{min(character_count):,.0f}")
print(f"max chars:\t{max(character_count):,.0f}")
print(f"mean chars:\t{int(mean(character_count)):,.0f}")
print(f"sum chars:\t{sum(character_count):,.0f}")
print("---")
print(f"min tokens:\t{min(token_count):,.0f}")
print(f"max tokens:\t{max(token_count):,.0f}")
print(f"mean tokens:\t{int(mean(token_count)):,.0f}")
print(f"sum tokens:\t{int(sum(token_count)):,.0f}")
print("---")
print(f"min chars/token:\t{min(char_token_ratio):,.2f}")
print(f"max chars/token:\t{max(char_token_ratio):,.2f}")
print(f"mean chars/token:\t{mean(char_token_ratio):,.2f}")

chpt	paras	words	chars	tokens	ratio
----------------------------------------------
1	39	5,547	30,624	7,218	4.24
2	62	5,305	28,510	6,833	4.17
3	46	5,571	29,805	7,075	4.21
4	86	5,703	30,267	7,338	4.12
5	28	3,390	18,019	4,650	3.88
6	64	5,299	29,195	7,524	3.88
7	62	5,424	29,964	7,120	4.21
8	59	6,044	32,637	7,970	4.09
9	64	5,709	30,180	7,477	4.04
10	100	5,623	30,817	7,706	4.00
11	78	4,754	26,991	7,014	3.85
12	95	6,993	37,944	9,372	4.05
13	107	6,242	34,198	8,490	4.03
14	97	6,053	32,612	8,277	3.94
15	101	5,485	29,787	7,509	3.97
16	62	4,381	23,928	5,895	4.06
17	80	5,264	29,074	7,140	4.07
18	84	6,615	35,948	8,924	4.03
19	46	5,505	29,462	7,041	4.18
20	104	5,467	31,241	7,883	3.96
21	69	5,905	32,220	7,940	4.06
22	65	5,249	28,130	6,822	4.12
23	84	5,403	29,551	7,351	4.02
24	75	6,057	32,123	7,924	4.05
25	87	5,907	32,612	8,153	4.00
26	106	6,818	37,084	9,266	4.00
27	78	7,733	40,677	10,055	4.05

---
Complete Novel (raw long text)
---
sum chars:	856,545
sum words:	658,827
sum tokens:	211,209

---
Chapter

In [10]:
# import pandas as pd

# data = {
#     "chars": character_count,
#     "words": word_count,
#     "tokens": token_count,
#     "char/token": char_token_ratio,
# }

# df = pd.DataFrame(data, index=chapter_nums)

# df

## Long Text Summarization Examples

In [11]:
import json

import boto3
from botocore.exceptions import ClientError

In [12]:
client_bedrock = boto3.client("bedrock-runtime", "us-east-1")

In [13]:
def create_summary(prompt):
    try:
        body = json.dumps(
            {
                "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                "max_tokens_to_sample": 4096,
                "temperature": 0.2,
                "top_k": 250,
                "top_p": 1,
                "stop_sequences": ["\n\nHuman"],
            }
        )
        # print(f"Request body: {body}")

        accept = "application/json"
        content_type = "application/json"

        response = client_bedrock.invoke_model(
            body=body,
            modelId="anthropic.claude-v2",
            accept=accept,
            contentType=content_type,
        )
        response_body = json.loads(response.get("body").read())
        # print(f"Response body: {response_body}")

        # remove the first line of text that explains the task completed
        # e.g. " Here are three hypothetical questions that the passage could help answer:\n\n"
        formatted_response = ""
        try:
            formatted_response = (
                response_body.get("completion").split("\n", 2)[2].strip()
            )
        except:
            formatted_response = response_body.get("completion")
        return formatted_response
    except ClientError as ex:
        print(ex)
        exit(1)

### Basic Prompting

Progressively more precise prompts.

In [14]:
# basic prompt example #1

prompt = f"""Write a short summary of the following chapter:
    {chapters[0].strip()}"""

print(create_summary(prompt))

- The chapter is from Jonathan Harker's journal, written in shorthand. He is traveling to Transylvania to meet Count Dracula.

- Harker takes a train from Munich to Buda-Pesth, where he gets an impression of entering the East. He continues on to Klausenburgh and has dinner at his hotel. 

- At Bistritz, he stays at the Golden Krone Hotel. The landlady seems reluctant for him to continue his journey the next day. 

- Harker departs Bistritz in a coach. His fellow passengers make signs to ward off evil and warn him of danger ahead. 

- After nightfall, the passengers grow excited as they near the Borgo Pass. A mysterious driver arrives to take Harker in his calèche the rest of the way.

- They travel through the Borgo Pass at night. Wolves can be heard howling. The driver seems unconcerned. 

- Nearing Castle Dracula, Harker sees blue flames in the darkness which seem to frighten the horses. The driver leaves to investigate.

- They arrive at the castle, an ancient, immense, ruined struc

In [15]:
# basic prompt example #2

prompt = f"""Write a concise, grammatically correct, single-paragraph summary of the following chapter:
    {chapters[0].strip()}"""

print(create_summary(prompt))

Jonathan Harker describes his journey from Munich to Transylvania to meet Count Dracula, detailing the beautiful but ominous landscape and strange encounters with locals who warn him against continuing. Upon arriving at the Borgo Pass, he is met by a driver in Dracula's carriage who takes him through the mountains to the Count's ruined castle. Harker feels unease about his mysterious host and the howling wolves that seem to surround them.


In [16]:
# basic prompt example #3

prompt = f"""### INSTRUCTIONS ###
    Write a concise, grammatically correct, single-paragraph summary of the following chapter.
    
    ### CHAPTER ###
    {chapters[0].strip()}"""

print(create_summary(prompt))

Jonathan Harker's journal describes his journey from Munich to Transylvania to meet Count Dracula, depicting the beautiful but ominous Carpathian landscape and recounting strange events and superstitions that foreshadow Dracula's supernatural nature. After delays and mysteries, Harker arrives at the Borgo Pass as wolves howl, where a carriage takes him through the mountains to Dracula's ruined castle.


In [17]:
# basic prompt example #4

prompt = f"""Write a concise, grammatically correct, single-paragraph summary of the chapter's main points, events, and ideas contained inside the <chapter><\chapter> XML tags below.  
    The Assistant will refrain from using bullet-point lists.
    The Assistant will refrain from including XML tags in the response.
    
    <chapter>
    {chapters[0].strip()}
    </chapter>"""

print(create_summary(prompt))



### Generate Chapter-level Summaries

In [18]:
# generate chapter-level summaries

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Write a concise, grammatically correct, single-paragraph summary of the chapter's main points, events, and ideas contained inside the <chapter><\chapter> XML tags below. 
            The Assistant will refrain from using bullet-point lists.
            The Assistant will refrain from including XML tags in the response.

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_chapter_summaries.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 03:01:35.369900
Finish time: 2023-11-06 03:09:13.529252


### Generate Summary of Summaries

In [38]:
# generate summary of summaries (using summary from above cell)

print(f"Start time: {datetime.datetime.now()}")

prompt = f"""Write a concise, grammatically correct, single-paragraph summary of the novel's main points, events, and ideas, contained inside the <summaries><\summaries> XML tags below. 
    The Assistant will refrain from using bullet-point lists.
    The Assistant will refrain from including XML tags in the response.
    
    <summaries>
    {summary}
    </summaries>"""

try:
    chapter_summary = create_summary(prompt)
except Exception as ex:
    print(ex)
    exit(0)

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_summary_of_summaries.txt", "w") as f:
    f.write(chapter_summary)

Start time: 2023-11-06 16:33:22.706006
Finish time: 2023-11-06 16:33:52.935164


### Generate Chapter-level Main Character Descriptions

In [20]:
# generate chapter-level main character descriptions

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Provide a list of the chapter's 3-4 main characters and a brief description of each based on chapter inside the <chapter><\chapter> XML tags below.          
            The Assistant will order the main characters by how many times they are mentioned.
            The Assistant will number each character in the list starting at 1.
            The Assistant will refrain from including square brackets and XML tags in the response.

            Follow the template inside the <template><\template> XML tags below and replace the placeholders with the relevant information:
            <template>
            [Number]. [Character]: [Description]
            <template>

            Here is an example inside the <example><\example> XML tags below:
            <example>
            1. Pink Panther: A suave and smooth-talking anthropomorphic animated panther.
            </example>

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_character_descs.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 03:09:33.078763
Finish time: 2023-11-06 03:16:54.476901


### Generate Character Description of Count Dracula

In [39]:
# generate a description of count dracula (using summary from above cell)

print(f"Start time: {datetime.datetime.now()}")

try:
    prompt = f"""Write a concise, grammatically correct, single-paragraph description of the main character, Dracula (aka Count Dracula), based on the following individual character descriptions inside the <descriptions><\descriptions> XML tags below. 
    The Assistant will refrain from using bullet-point lists.
    The Assistant will refrain from including XML tags in the response.
    
    <descriptions>
    {summary}
    </descriptions>"""

    chapter_summary = create_summary(prompt)
except Exception as ex:
    print(ex)
    exit(0)

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_character_desc_dracula.txt", "w") as f:
    f.write(chapter_summary)

Start time: 2023-11-06 16:35:04.814167
Finish time: 2023-11-06 16:35:24.786173


### Generate Chapter-level Character Type Descriptions

In [22]:
# generate chapter-level character type descriptions

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""The following list of character types inside the <character_types></character_types> XML tags below, are often found in fictional literature: 
            <character_types>
            - Protagonist
            - Antihero
            - Antagonist
            - Guide
            - Contagonist
            - Sidekicks (Deuteragonist)
            - Henchmen
            - Love Interest
            - Temptress
            - Confidant
            - Foil
            </character_types>

            Based on this list of character types, give 3-4 examples of character types based on chapter inside the <chapter><\chapter> XML tags below.
            The Assistant will including the character name and an explanation of why.
            The Assistant will refrain from including square brackets and XML tags in the response.
            
            Follow the template inside the <template><\template> XML tags below and replace the placeholders, in square brackets, with the character name, character type, and explanation:
            <template>
            [Character_Name] - [Character_Type]: [Explanation]
            <template>

            Here is an example inside the <example><\example> XML tags below:
            <example>
            Love Interest - Minnie Mouse: Mickey Mouse's lifelong romantic interest.
            </example>

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_character_types.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 03:17:04.158120
Finish time: 2023-11-06 03:24:01.066948


### Generate Chapter-level Literary Device Descriptions

In [23]:
# generate chapter-level literary device descriptions

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""The following list of literary devices inside the <literary_devices></literary_devices> XML tags below, are often found in fictional literature: 
            <literary_devices>
            Allegory, Alliteration, Allusion, Amplification, Anagram, Analogy, Anthropomorphism, Antithesis, 
            Chiasmus, Colloquialism, Circumlocution, Epigraph, Euphemism, Foreshadowing, Hyperbole, Imagery, 
            Metaphor, Mood, Motif, Onomatopoeia, Oxymoron, Paradox, Personification, Portmanteau, Puns, Satire, 
            Simile, Symbolism, Tone
            </literary_devices>

            Based on the list of literary devices, give 2-3 examples of literary devices found in the chapter inside the <chapter><\chapter> XML tags below, and explain why.
            The Assistant will use a bullet-point list.
            The Assistant will refrain from including square brackets and XML tags in the response.
            
            Follow the template inside the <template><\template> XML tags below for your response. Replace the placeholders, in square brackets, with the literary device and the explanation:
            <template>
            - [Literary_Device]: [Explanation]
            <template>

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_literary_devices.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 03:24:01.112827
Finish time: 2023-11-06 03:33:41.821564


### Generate Chapter-level Setting Descriptions

In [24]:
# generate chapter-level setting descriptions

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Provide a list of the no more than three settings and a brief description of each setting in the chapter inside the <chapter><\chapter> XML tags below.
            The Assistant will order the settings by how many times they are mentioned in the chapter.
            The Assistant will number the list of settings.
            The Assistant will refrain from including square brackets and XML tags in the response.
            
            Follow the template inside the <template><\template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
            <template>
            [Number]. [Setting]: [Description]
            <template>

            Here is an example inside the <example><\example> XML tags below:
            <example>
            1. Hoboken, New Jersey: Part of the New York metropolitan area on the banks of the Hudson River across from lower Manhattan, where the story takes place.
            </example>

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_chapter_settings.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 03:33:41.867103
Finish time: 2023-11-06 03:40:22.419477


### Descriptive Adjectives

In [25]:
# generate chapter-level descriptive adjectives

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Provide a list of 5-6 adjectives that best describe the chapter inside the <chapter><\chapter> XML tags below. Also, provide a brief reason for each adjective chosen.
            The Assistant will refrain from using a bullet-point or numbered list.
            The Assistant will refrain from including square brackets and XML tags in the response.
            
            Follow the template inside the <template><\template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
            <template>
            [Adjective]: [Reason]
            <template>

            Here is an example inside the <example><\example> XML tags below:
            <example>
            Relentless: The riders and their hounds were desperately chasing after the poor fox.
            </example>

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_adjectives.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 03:40:22.457079
Finish time: 2023-11-06 03:48:33.752949


### Generate Chapter-level Questions and Answers

In [26]:
# generate chapter-level questions and answers

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Generate a list of 6 hypothetical questions that the following chapter, inside the <chapter><\chapter> XML tags below, could be used to answer. 
            The Assistant will provide both the question and the answer.
            The Assistant will refrain from asking overly broad questions.
            The Assistant will refrain from using bullet-point lists.
            The Assistant will refrain from including square brackets and XML tags in the response.
            
            Follow the template inside the <template><\template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
            <template>
            Q: [Question]
            A: [Answer]
            <template>

            Here is an example inside the <example><\example> XML tags below:
            <example>
            Q: What is the weather like in Spain?
            A: The rain in Spain stays mainly in the plain.
            </example>

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_question_answer.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 03:48:33.800358
Finish time: 2023-11-06 04:02:19.589223


### Generate Chapter-level Multiple Choice Questions

In [27]:
# generate chapter-level multiple choice questions

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Generate a list of 6 hypothetical multiple-choice questions that the following chapter, inside the <chapter><\chapter> XML tags below, could be used to answer. 
            The Assistant will provide the question, four possible answers lettered a,b,c, and d, and the correct answer.
            The Assistant will ask brief, specific questions.
            The Assistant will refrain from using bullet-point lists.
            The Assistant will refrain from including square brackets and XML tags in the response.
            
            Follow the template inside the <template><\template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
            Q: [Question]
            (a) [Choice_1]
            (b) [Choice_2]
            (c) [Choice_3]
            (d) [Choice_4]
            A: (Letter) [Correct_Answer]
            <template>

            Here is an example inside the <example><\example> XML tags below:
            <example>
            Q: What color is fresh grass?
            (a) Red
            (b) Blue
            (c) Green
            (d) Yellow
            A: (c) Green
            </example>

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_multiple_choice.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 04:02:19.635007
Finish time: 2023-11-06 04:14:48.910214


## Using Foundation Models to Optimize Anthropic Prompts

Using foundation models to optimize the prompts demonstrated above.

### Using ChatGPT 3.5 to Optimize Prompts

[https://chat.openai.com/](https://chat.openai.com/)

My prompt: "Optimize the following prompt, written in Python, for the Anthropic Claude 2 foundation model:..."

Per ChatGPT 3.5, "This version of the prompt maintains the same instructions and expectations while being more concise and clear."


In [33]:
# generate chapter-level multiple-choice questions (chatgpt 3.5 optimized)

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Generate 6 multiple-choice questions that pertain to the content of the chapter within the <chapter> XML tags below. Each question should have four answer choices, labeled a, b, c, and d, with one of them being the correct answer. Keep the questions concise and avoid bullet-point lists or the inclusion of square brackets and XML tags in your response.

            Use the following template within the <template> XML tags:

            Q: [Question]
            (a) [Choice_1]
            (b) [Choice_2]
            (c) [Choice_3]
            (d) [Choice_4]
            A: (Letter) [Correct_Answer]
            <template>

            For example, within the <example> XML tags:

            <example>
            Q: What color is fresh grass?
            (a) Red
            (b) Blue
            (c) Green
            (d) Yellow
            A: (c) Green
            </example>

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_multiple_choice_chatgpt35_optimized.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 14:25:48.316079
Finish time: 2023-11-06 14:37:51.989010


### Using Anthropic Claude to Optimize Prompts

[https://claude.ai/chat](https://claude.ai/chat)

My prompt: "Optimize the following prompt, written in Python, for the Anthropic Claude 2 foundation model:..."

Per Claude, the key changes:

* Simplified instructions to generate 6 Q&A pairs about the chapter.
* Asked for brief, specific questions to avoid broad ones.
* Removed restrictions about bullet points and tags - the template makes it clear.
* Removed the example - the template shows the format.
* Streamlined the template to just Q and A.
* Used more natural language overall.

This focuses the prompt on generating concise Q&A pairs about the key ideas in the chapter. The simplified instructions make it clearer for the model. Removing unnecessary restrictions gives it more freedom. Overall this should produce more targeted questions and answers.

In [None]:
# generate chapter-level questions and answers (claude-optimized prompt)

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Generate a list of 6 brief, specific questions and answers about the chapter below.

            Follow this template:

            Q: [Question?]
            A: [Answer] 

            Make the questions clear and focused. Avoid broad, generic questions.  

            Keep the questions and answers short. Do not use bullet points, brackets or XML tags.

            <chapter>
            {chapter.strip()}
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_question_answer_claude_optimized.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 14:37:52.036256
Finish time: 2023-11-06 14:47:46.207766


My prompt: "Optimize the following prompt, written in Python, for the Anthropic Claude 2 foundation model:..."

Per Claude, the key changes:

* Simplified the instructions to focus on generating 6 MCQs for the chapter.
* Removed the example - the template makes it clear what's needed.
* Streamlined the template to just have the core elements (Q, choices A-D, correct answer).
* Asked the model to make questions and answers brief and clear.
* Removed unnecessary restrictions like bullet points and XML tags.
* Used more natural phrasing overall.

This focuses the prompt on the key task, simplifies the instructions, and lets the model know what makes a good MCQ. The result should be more targeted and higher quality questions.

In [36]:
# generate chapter-level multiple-choice questions (claude-optimized prompt)

summary = ""

print(f"Start time: {datetime.datetime.now()}")

for i, chapter in enumerate(chapters[0::]):
    try:
        prompt = f"""Generate a list of 6 multiple choice questions about the chapter below. 

            For each question, provide 4 answer choices labeled A, B, C and D, with the correct answer labeled in parentheses at the end. 

            Make the questions and answers brief and clear. Do not use bullet points, brackets or XML tags.

            Follow this template:

            Q: [Question?]
            A) [Choice 1]  
            B) [Choice 2]
            C) [Choice 3]  
            D) [Choice 4]
            A: (Letter) [Correct Answer]

            <chapter>
            {chapter.strip()} 
            </chapter>"""

        chapter_summary = create_summary(prompt)
        chapter_summary = f"\nChapter {i + 1}:\n{chapter_summary}\n\n"
        summary += chapter_summary
        print(f"Chapter {i + 1}/{len(chapters)} completed...", end="\r")
    except Exception as ex:
        chapter_summary = f"Chapter {i + 1}/{len(chapters)} failed: {ex}"
        print(chapter_summary)
        summary += chapter_summary

print(f"Finish time: {datetime.datetime.now()}")

with open(f"./output/dracula_multiple_choice_claude_optimized.txt", "w") as f:
    f.write(summary)

Start time: 2023-11-06 14:59:21.004121
Finish time: 2023-11-06 15:12:14.287979
