<a href="https://colab.research.google.com/github/michellepace/ai-ml-notebook-explorations/blob/main/notebook-01/Fix_Word_Doc_Claude.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. About & Usage** 🟢

## What this Notebook Does

>Elevate your Word document with AI-powered proofreading that goes beyond standard spell-check. This notebook uses Anthropic's [Claude Sonnet model](https://www.anthropic.com/claude) to scan your Word documents, catching subtle language and style errors that often slip past Microsoft Word. Whether you're writing in English, German, Italian, or French, it preserves your intended meaning while polishing your prose to perfection—even for very large documents.

<!-- Figure Clickable 700px Width -->
<figure>
<a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/01-what-this-notebook-does-using-claude.jpg"
target="_blank">
  <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/01-what-this-notebook-does-using-claude-trimmed.jpg"
  width="900" alt="What this notebook does" />
</a>
</figure>

### Notebook Capabilities

1. Uses the Claude API for intelligent text processing
1. Handles large-scale text correction in multiple languages
1. Highlights corrections in colour for easy identification
1. Preserves original document semantic meaning and structure
1. Includes comprehensive testing and evaluation

### Showcasing Claude as my Mentor & Co-Creator

This notebook also doubles as a demonstration. It showcases how Claude can help create tools - like this very notebook - with basic coding skills and no idea of where to start. I am a product manager by profession, and without Claude next to me, I would never have built this. The solution was designed by Claude and about 70% of the code written and then refined by Claude. That's phenomenal.

<!-- Figure with caption 700px width -->
<figure>
 <figcaption>Opening Scene: Explaining my situation and problem to Claude</figcaption>
 <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/02-initial-brainstorming-with-claude.jpg" target="_blank">
   <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/02-initial-brainstorming-with-claude.jpg"
        width="700"
        alt="Opening Scene: Explaining my situation and problem to Claude" />
 </a>
</figure>

---

## Who this Notebook is for

**Someone with a Word document:**

That needs correction beyond what Word provides. In this case, you don’t need to read anything. Just run the Notebook, and wait for your corrected document to download.

**The Curious Beginner:**

If you're feeling like I was at the beginning of this project - knowing it is possible to programmatically use Claude but unsure where to begin - this Notebook is for you.

**For People Good at Coding:**

This Notebook showcases Claude's coding capabilities. About 70% of this Notebook was coded by Claude in response to my unsophisticated questions. Yes, there was a lot of refactoring, but say what you will, someone who knew very little still built something that works.

**And lastly, for my sister:**

Whose definition of AI was Canva and now happily Claude too. I know you won't make it this far, but here's hoping anyway.

---


## Notebook Usage ⭐

### ⬤ Running the Notebook

To run this Notebook, you must have an [Anthropic Pro account](https://www.anthropic.com/pricing) with $0.05+ [credit](https://console.anthropic.com/settings/plans) for API usage. You'll need to create an [Anthropic API key](https://console.anthropic.com/settings/keys) too. The Notebook will stop to guide you on how to do this in the [Configuration](#id-configuration) section. Steps are:


1. Save your own copy of this Notebook: **File > Save a copy in Drive.** <font color='#FF1493'>*Important! This ensures 100% data privacy!*</font>

1. Click **Runtime > Run all**

1. Go to sections [Set Your Secret API Key⭐](#id-configuration) and [Set Your Word Document⭐](#id-configuration), enter your details and run again.

### ⬤ Without an Anthropic Pro Account

No problem. **Each code block has a screenshot of example output beneath it**, showing how this Notebook works without you running it. Start by scanning the Table of Contents for an overview. Then focus on the explanations between code blocks – no need to read the code itself. You'll learn what I learned in twenty minutes, told in story form. You may find the [FAQ](#id-faq) section useful too.

### ⬤ Example Input and Output

To give you an idea of how the notebook works, I've provided an example:

- **Input file:** [MyWordDoc.docx](https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/example-input-output/MyWordDoc.docx) - sample Word document with various errors.
- **Output file:** [MyWordDoc.docx.PROCESSED.html](https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/example-input-output/MyWordDoc.docx.PROCESSED.html) - corrected version with Claude's changes highlighted.
- **All screenshots** that you'll find below each code block were made from these two files.


---

## Origin Story

This project fell into my lap over a Sunday lunch that I wasn’t at. My sister's friend was wrapping up her research paper - a hefty 40,000-word beast in German. With the deadline looming, she was spending too much time looking for mistakes Word had missed.

So, I decided to create this Notebook to use Claude to correct her Word documents. What makes it different from using the [normal Claude chat interface](https://claude.ai/) is that you can see the entire corrected document, and the colour corrections are very easy to spot.

Did I know how to build something like this? No. But I surmised (correctly) that with Claude's help, I would figure it out.

<!-- /Figure simple not clickable -->
<figure>
  <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/03-origin-lunch-munich.jpg"
       alt="Origin of the project: A lunch I wasn't at" />
</figure>

---

## Claude as my Co-Creator

Claude may have generated close to 70% of the code in this Notebook. And it is true that everything I learned I learned through my interactions with Claude. But it started, very importantly, with two people.

After a few conversations with Claude about my goal, I sketched out my best guess of the "code process". Two strangers from an online community, Unmesh and Aaron, generously agreed to take my video call. Unmesh validated my approach, and Aaron inspired me to programmatically test the corrected document. But what I truly gained were two people who, without knowing it, had provided me with emotional backing. A sincere thank-you.

<!-- Figure with caption 300px width -->
<figure>
 <figcaption>Beginner's Blueprint: Imagining the coding journey ahead</figcaption>
 <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/04-sketch-in-guessing-the-notebook-workflow.jpg" target="_blank">
   <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/04-sketch-in-guessing-the-notebook-workflow.jpg"
        width="300"
        alt="Beginner's Blueprint: Imagining the coding journey ahead" />
 </a>
</figure>

I then turned to Claude as my AI co-creator for this Notebook. This marked the shift from concept to development. Claude became my constant companion, sitting beside me every day in my browser. I created a dedicated [Anthropic project](https://support.anthropic.com/en/articles/9517075-what-are-projects) for our chats and gave Claude specific instructions to enhance his assistance:

<!-- /Figure with caption 300px width-->
<figure>
 <figcaption>Character Creation: Tweaking Claude's expertise for our project</figcaption>
 <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/05-telling-claude-who-he-is.jpg"
    target="_blank">
   <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/05-telling-claude-who-he-is.jpg"
        width="300"
        alt="Character Creation: Tweaking Claude's expertise for our project" />
 </a>
</figure>

Collaborating with Claude was crucial in every aspect of developing this Notebook. Here are a very small handful of examples:

- **Problem-solving:** Claude designed the solution and helped me improve it (see section [Claude's Solution Picture](#id-claude-solution-picture))
- **Technical guidance:** Introduced me to Python libraries to streamline the coding process.
- **API integration:** Provided instructions for connecting to Claude programmatically.
- **Code development:** Wrote about 70% of the code and then improved it.
- **Testing:** Helped me create a testing plan and generated test data for prompt testing (see section [Test My Prompt](#id-test-my-prompt))

I learned an enormous amount, especially when I argued with Claude. I’m a much better coder now too. You probably wouldn’t have thought so. Discussing concepts before diving into syntax made the coding process much easier and faster.

If you are new to AI development, don't be intimidated. When I started this project, I was starting near square one too. This Notebook is a testament to the power of AI-assisted learning and code generation.

### Why Claude?

Because, put simply, the Claude Sonnet model is the most capable. He is the best. I trust Anthropic more than any other option. And I love their branding - typography, colours, simplicity, writing style - everything.

<!-- /Figure with caption 750px width-->
<figure>
 <figcaption>Performance Timeline: Charting Claude's rise in the AI landscape</figcaption>
 <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/06-claude-compared-to-other-options.jpg"
    target="_blank">
   <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/06-claude-compared-to-other-options.jpg"
        width="750"
        alt="Performance Timeline: Charting Claude's rise in the AI landscape" />
 </a>
</figure>

---

<a name="id-claude-solution-picture"></a>
## Claude's Solution Picture

This was on my first day of Notebook development with Claude. It provides a glimpse of how I brainstormed with him.

<!-- /Figure with caption 300px width-->
<figure>
 <figcaption>Dialogue to Design: Capturing our initial problem-solving conversation</figcaption>
 <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/07-solution-brainstorm-extensive.jpg"
    target="_blank">
   <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/07-solution-brainstorm-extensive.jpg"
        width="300"
        alt="Dialogue to Design: Capturing our initial problem-solving conversation" />
 </a>
</figure>


<!-- /Figure with caption 300px width-->
<figure>
  <figcaption>Visual Thinking: Claude's initial notebook solution</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/09-solution-picture-claude.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/09-solution-picture-claude.jpg"
         width="300"
         alt="Digital Drafting: Claude's visual take on our solution" />
  </a>
</figure>

<!-- /Figure with caption 500px width-->
<figure>
  <figcaption>Cemented Understanding: Making the solution easier to see</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/10-solution-picture-my-drawing-is-nicer.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/10-solution-picture-my-drawing-is-nicer.jpg"
         width="500"
         alt="Cemented understanding: Happens when I redraw pictures myself" />
  </a>
</figure>

---


# <font color="#5830D9">**2. Setup**</font> 🟣

## Python Libraries

Before I dive into actually “doing the work”, I need to install and import the Python libraries this Notebook will use. Libraries are code written by other people, which means there's less code for me to write. Many of these libraries were suggested by Claude; the others I found through Google.

In [None]:
# Install libraries
try:
    !pip install --upgrade-strategy only-if-needed --quiet \
        anthropic \
        langchain \
        markdown \
        numpy \
        python-docx \
        scikit-learn \
        sentence-transformers \
        strip-markdown \
        tqdm
    print("Success! All required libraries are installed.")

except Exception as e:
    print(f"An error occurred during library installation: {str(e)}")

# Import libraries
try:
    from google.colab import drive, files # Request access to your Word Document in Colab
    from google.colab import userdata # Request access to your secure Colab Secret: ANTHROPIC_API_KEY
    import docx # Read and write Word documents; extract paragraphs and create test Word docs.
    from pathlib import Path # Handle files easily and concisely
    import anthropic # Interact with Claude Model via Anthropic's API
    from langchain.text_splitter import MarkdownTextSplitter # Split markdown text into chunks
    from strip_markdown import strip_markdown # Remove markdown for content comparison
    from markdown import markdown # Convert corrected markdown file into a pretty HTML page
    import re # Analyze text using regular expressions
    from collections import Counter # To easily print test results
    import numpy as np # Process numerical data for preservation scores
    from tqdm.auto import tqdm # Display progress bar for chunk processing
    from sentence_transformers import SentenceTransformer # Transform text chunks into numeric vectors that represent their meaning
    from sklearn.metrics.pairwise import cosine_similarity # Calculates semantic similarity between text chunk vectors
    print("Success! Library imports are complete.")

except Exception as e:
    print(f"An error occurred during library importation: {str(e)}")

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1-Pvw3IHrkl-hWEj6PlWRwB5Bdd4x5cWz"
/>


---

<a name="id-configuration"></a>

## Configuration

Specify your Word document and Anthropic API Key here. Other AI-related global settings are defined here too (e.g. the prompt, the models, and the chunk size), as well as the working files.

In [None]:
# @title Set Your Secret API key ⭐ { vertical-output: true, display-mode: "form" }
#@markdown **Instructions:** Run this block and follow the step-by-step guidance.<br>

#@markdown

anthropic_api_secret_name = 'ANTHROPIC_API_KEY'  # @param {type: "string"}

def validate_anthropic_api_key(api_key):
    if not api_key.startswith('sk-'):
        raise ValueError("Anthropic API keys start with \"sk-\"")
    if ' ' in api_key:
        raise ValueError("Anthropic API keys don't have white spaces.")
    if len(api_key) <= 100:
        raise ValueError("Anthropic API keys are longer than 100 characters.")

def get_anthropic_api_key(secret_name):
    try:
        api_key = userdata.get(secret_name)
        validate_anthropic_api_key(api_key)
        print("Success!")
        print(f"• Colab secret '{secret_name}' found.")
        print(f"• If the secret contains a valid API Key, we can connect to Claude.")
        print(f"• To change API Key: Click the \"key\" icon, delete '{secret_name}', rerun this block.")
        return api_key

    except userdata.SecretNotFoundError:
        print(f"🛑 Error: Colab secret '{secret_name}' not found in your Colab environment")
        print(" To fix:")
        print(f" 1. Click the \"key\" icon on the left of this Notebook")
        print(f" 2. Add new secret with name '{secret_name}'")
        print(f" 3. Set value to Anthropic API key from: https://console.anthropic.com/settings/keys")
        print(f" 4. Rerun this block and follow next instructions")
        print(" About Colab secrets: https://bit.ly/4cad0v7")
        print("🛑🛑🛑\n")
        raise
    except userdata.NotebookAccessError:
        print(f"🛑 Error: You denied this Notebook access to your Colab secret '{secret_name}'")
        print(" To fix:")
        print(" 1. Rerun this block and click \"Grant access\"")
        print(" About Colab secrets: https://bit.ly/4cad0v7")
        print(" Worried about safety? Save your own copy of this Notebook and run that.")
        print("🛑🛑🛑\n")
        raise
    except ValueError as ve:
        print(f"🛑 Error: Invalid format, {str(ve)}")
        print(" To fix:")
        print(f" 1. Click the \"key\" icon on the left of this Notebook")
        print(f" 2. Delete '{secret_name}'")
        print(f" 3. Rerun this block and follow next instructions")
        print("🛑🛑🛑\n")
        raise
    except Exception as e:
        print("🛑 Unexpected error occurred")
        print(" Please check:")
        print(f" 1. '{secret_name}' secret exists in Colab (click \"key\" icon on the left)")
        print(" 2. Secret value is a valid Anthropic API key")
        print(" Get API key: https://console.anthropic.com/settings/keys")
        print(" About Colab secrets: https://bit.ly/4cad0v7")
        print("🛑🛑🛑\n")
        raise


### Do the work
MY_ANTHROPIC_API_KEY = get_anthropic_api_key(anthropic_api_secret_name)

<!-- /Image simple not clickable -->

<img src="https://drive.google.com/uc?export=view&id=1DlZ-6qr7Lljwr4KeBC4d0xRPJM5P6JKE"
/>

---

In [None]:
# @title Set Your Word Document ⭐ { vertical-output: true, display-mode: "form" }
#@markdown **Instructions:** Specify the Word document you want corrected.

#@markdown **Step 1:** Upload the Word document to your [Google Drive](https://drive.google.com/drive/my-drive)<br>
#@markdown **Step 2:** Input the full name of the Word document file (see example)<br>
#@markdown **Step 3:** Run this cell by clicking the little "play" icon just under the title<br>

#@markdown **Example** Google Drive files start with <font color='#FF1493'>/content/​drive/​MyDrive/</font>
#@markdown just like: `/content/drive/MyDrive/my-animal-folder/pony.docx`

my_input_docx_file = '/content/drive/MyDrive/TESTX/DONE.docx' #@param {type:"string", placeholder:"(here is an example)   /content/drive/MyDrive/MyWordDoc.docx"}

def validate_google_drive_docx(file_path: str) -> Path:
    """
    Validate the input Word document file in Google Drive.

    :param file_path: String path to Word document on Google Drive
    :return: Path object of the validated file
    :raises: Various exceptions for invalid cases
    """

    try:
        # Check if input is empty
        if not file_path.strip():
            raise ValueError("No file path provided in input box")

        # Mount Google Drive if not already mounted
        if not Path('/content/drive').exists():
            try:
                drive.mount('/content/drive')
            except Exception as e:
                if "credential propagation was unsuccessful" in str(e).lower():
                    raise PermissionError("You denied access to Google Drive.")
                else:
                    raise e

        file_path = Path(file_path)

        # Check if the file exists
        if not file_path.exists():
            raise FileNotFoundError(f"This file doesn't exist in your Google Drive: '{file_path}'")

        # Check if the file has a .docx extension
        if file_path.suffix.lower() != '.docx':
            raise ValueError(f"This file doesn't have a .docx extension: '{file_path}'")

        # Check if the file is not empty
        if file_path.stat().st_size == 0:
            raise ValueError(f"This file is empty: '{file_path}'")

        # Check if the file can be opened as a Word document
        try:
            docx.Document(file_path)
        except Exception as e:
            raise ValueError(f"This file couldn't be opened: '{file_path}'. Error: {str(e)}")

        print("Success!")
        print(f"• Word document found: {file_path.absolute()}")
        print(f"• I'll be sending this to Claude for correction")
        return file_path

    except ValueError as ve:
        print(f"🛑 Error: {str(ve)}")
        print("To fix:")
        if str(ve) == "No file path provided in input box":
            print(" 1. You must input your Word document file name in the input box above.")
            print(" 2. Read and follow each step above.")
        else:
            print(" 1. Ensure the file is a valid .docx document")
            print(" 2. Check if the file is not corrupted or empty")
        print("🛑🛑🛑\n")
        raise
    except PermissionError as pe:
        print(f"🛑 Error: {str(pe)}")
        print("To fix:")
        print(" 1. Rerun this block and click \"Connect to Google Drive\"")
        print(" Worried about safety? Save your own copy of this Notebook and run that.")
        print("🛑🛑🛑\n")
        raise
    except FileNotFoundError as fnf:
        print(f"🛑 Error: {str(fnf)}")
        print("To fix:")
        print(" 1. In Input Box instructions, look at the pink Example given")
        print(" 2. Verify your file exists in your Google Drive on that exact path")
        print(" 3. Remember, file paths and names are case-sensitive")
        print("🛑🛑🛑\n")
        raise
    except Exception as e:
        print("🛑 Unexpected error occurred")
        print(f"Error details: {str(e)}")
        print("Sorry... really don't know how you got here.")
        print("🛑🛑🛑\n")
        raise


### Do the work
try:
    my_input_docx_file = validate_google_drive_docx(my_input_docx_file)
except Exception:
    print("Please update the file path and run this block again.")
    raise

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1EAZoEYYfLXAlG1YM_A4F3THMWXBIWygM"
/>


---

<a name="id-ai-settings"></a>
### AI Settings

Nothing for you to do here. These are the settings for prompting and for Claude, plus the working files.

- Working files (not AI settings, they just live here)

In [None]:
# Output Files (not AI Settings)
original_md_file = Path(f"{my_input_docx_file}.ORIG.md") # Your Word doc extracted into markdown format (.md).
processed_md_file = Path(f"{my_input_docx_file}.PROCESSED.md") # Your Word doc with all the corrections, in markdown format.
processed_html_file = Path(f"{my_input_docx_file}.PROCESSED.html") # Your Word doc with all Claude's corrections, in HTML format for easy reading.

- AI settings (excluding prompt)

In [None]:
# Chunking and Similarity
TARGET_CHUNK_CHARACTERS = 4000 # The target chunk size (in characters) to split your Word document into.
SIMILARITY_LARGE_LANGAUGE_MODEL = 'paraphrase-multilingual-mpnet-base-v2' # LLM for checking meaning similarity between chunks

# Everything Claude:
CLAUDE_MODEL  = "claude-3-5-sonnet-20240620"
PROMPT_TEMP = 0  # Low temperature for more probable and consistent output (0 to 1)

MY_ANTHROPIC_CLIENT = anthropic.Anthropic(
    api_key=MY_ANTHROPIC_API_KEY, # Set in Notebook section: "Set Your Secret API key ⭐"
    max_retries=2,  # Maximum retry attempts per API request (text chunk)
    timeout=20.0   # Timeout in seconds for each individual API request (text chunk)
)

 - AI settings (prompt for Claude)

 This is all the work Claude is going to do for us. A prompt is how you tell Claude what to do.

In [None]:
PROMPT_TEMPLATE = """
CRITICAL: PROVIDE ONLY THE CORRECTED TEXT WITHOUT ANY ADDITIONAL COMMENTARY.

Your task is to take the provided text and rewrite it into a clear, grammatically correct version while preserving the original meaning as closely as possible. Correct any spelling mistakes, punctuation errors, verb tense issues, word choice problems, and other grammatical mistakes.

MANDATORY INSTRUCTIONS:

1. Determine and use the same linguistic language as the original text (e.g., English, German)
2. Preserve all existing markdown formatting, including heading levels, paragraphs, and lists
3. Make necessary grammatical corrections, including spelling, punctuation, verb tense, word choice, and other grammatical issues. Only make stylistic changes if essential for clarity
4. Mark corrections with markdown syntax, apply one of these choices only:
   - For changed text use bold: e.g., **changed** and **multiple changed words**
   - For new text use bold: **new words**
   - For removed text use bold strikethrough: **~~removed words~~**
5. Maintain the original structure:
   - Don't add new lines of text
   - Don't include additional commentary at all
   - Don't convert markdown elements to different types
6. For ambiguous corrections, choose the option that best preserves original meaning and style
7. Ensure consistency in corrections throughout the text
8. Return the corrected text in markdown syntax
9. DO NOT add any explanations, introductions, or conclusions to your response

FINAL REMINDER: Your output should consist SOLELY of the corrected text. Do not include phrases like "Here is the corrected text" or any other form of commentary.

The text to be corrected is provided between the triple tildes (~~~):

~~~
{the_markdown_chunk}
~~~

REMEMBER: Provide ONLY the corrected text without any additional words or explanations."""
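
To make the placeholder in the prompt concrete, here is a minimal sketch of how one text chunk could be dropped into a template and packaged as a request for Claude. It is an illustration under assumptions, not necessarily how the rest of this Notebook does it: `DEMO_TEMPLATE` and `build_request` are hypothetical names, and the template is a shortened stand-in for the real prompt. The actual API call is left as comments because it needs a valid key.

```python
# Hypothetical, shortened stand-in for the real prompt template
DEMO_TEMPLATE = "Correct the text between the tildes:\n~~~\n{the_markdown_chunk}\n~~~"

def build_request(chunk: str, template: str, model: str,
                  temperature: float = 0, max_tokens: int = 4096) -> dict:
    """Fill the template with one chunk and return keyword arguments for a Messages API call."""
    return {
        "model": model,
        "max_tokens": max_tokens,      # upper limit on the length of Claude's reply
        "temperature": temperature,    # 0 keeps the output as consistent as possible
        "messages": [
            {"role": "user", "content": template.format(the_markdown_chunk=chunk)}
        ],
    }

request = build_request("Ths is a tset.", DEMO_TEMPLATE, "claude-3-5-sonnet-20240620")
print(request["messages"][0]["content"])

# With a real API key, the request could then be sent like this:
#   import anthropic
#   client = anthropic.Anthropic(api_key="...")
#   response = client.messages.create(**request)
#   corrected_chunk = response.content[0].text
```

Keeping the request as a plain dictionary makes it easy to print and inspect exactly what would be sent before spending any API credit.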

- Confirmation of what has just been configured.

In [None]:
print("Success! All configuration complete:")
print(" • Your Anthropic API Key⭐")
print(" • Your Word document (Google Drive)⭐")
print(" • AI settings (Working files. Not AI settings, just lives there.)")
print(" • AI settings (excluding prompt)")
print(" • AI settings (prompt for Claude)")

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1FUChoqUCvxWYTn6ziSxQU5B5l91owvzC" />

---

# <font color="#5830D9">**3. Pre-processing**</font> 🟣

Initially I thought I could simply send your entire Word document to Claude for correction. But this isn't possible when you are using Claude programmatically. Instead, Claude told me I first need to do these "pre-processing" steps:

1. **Extract** the text from your Word document
1. **Convert** it into markdown format
1. **Split** the markdown text into smaller pieces (which I'll call "**chunks**")

When these steps are done, we'll move on to sending each "text chunk" to Claude for processing. In the case of this Notebook, processing means correcting the text in each chunk. The sections which follow explain and implement these steps.

## Extract Word doc

In this first step, I extract the text from your Word document. I make sure to maintain the same headings, bullet lists, and paragraphs as in your document. For simplicity, I ignored everything else like text found in tables, images, charts, headers and footers. **The output below the code** shows a summary of the paragraphs that were extracted from your Word document.

In [None]:
def extract_docx_paragraphs(docx_file: Path) -> list[dict]:
    """
    Extract non-empty paragraphs from a Word document.

    :param docx_file: Path object pointing to the Word document file (.docx)
    :return: List of dictionaries, each containing 'text', 'style', 'heading_level', and 'word_count' of a paragraph
    """
    doc = docx.Document(docx_file)

    def extract_heading_level(paragraph):
        if paragraph.style.name.startswith('Heading'):
            try:
                return int(paragraph.style.name.split()[-1])
            except ValueError:
                pass
        return None

    return [
        {
            'text': para.text,
            'style': para.style.name,
            'heading_level': extract_heading_level(para),
            'word_count': len(para.text.split())
        }
        for para in doc.paragraphs
        if para.text.strip()
    ]

def print_paragraph_summary(paragraphs: list[dict]) -> None:
    """
    Print a summary of paragraphs extracted from a Word document.

    :param paragraphs: List of dictionaries containing paragraph information
    """
    style_counts = Counter(para['style'] for para in paragraphs)
    total_word_count = sum(para['word_count'] for para in paragraphs)

    # Column width constants
    STYLE_WIDTH = 20
    HEADING_LEVEL_WIDTH = 12
    PARAGRAPH_COUNT_WIDTH = 16
    WORD_COUNT_WIDTH = 16

    # Total width of the table
    TABLE_WIDTH = STYLE_WIDTH + HEADING_LEVEL_WIDTH + PARAGRAPH_COUNT_WIDTH + WORD_COUNT_WIDTH

    print(f"Success!")
    print(f" • Document extracted into {len(paragraphs)} paragraphs with {len(style_counts)} styles")
    print("-" * TABLE_WIDTH)

    # Print header
    print(f"{'Word Style':<{STYLE_WIDTH}}"
          f"{'HeadingLevel':^{HEADING_LEVEL_WIDTH}}"
          f"{'No.Paragraphs':>{PARAGRAPH_COUNT_WIDTH}}"
          f"{'No.Words':>{WORD_COUNT_WIDTH}}")
    print("-" * TABLE_WIDTH)

    # Print rows
    for style, count in sorted(style_counts.items()):
        style_info = next(para for para in paragraphs if para['style'] == style)
        heading_level = str(style_info['heading_level'] or "")
        word_count = sum(para['word_count'] for para in paragraphs if para['style'] == style)
        print(f"{style:<{STYLE_WIDTH}}"
              f"{heading_level:^{HEADING_LEVEL_WIDTH}}"
              f"{count:>{PARAGRAPH_COUNT_WIDTH},}"
              f"{word_count:>{WORD_COUNT_WIDTH},}")

    # Print footer
    unique_styles = f"({len(style_counts)} unique styles)"
    print("-" * TABLE_WIDTH)
    print(f"{unique_styles:<{STYLE_WIDTH}}"
          f"{'':<{HEADING_LEVEL_WIDTH}}"
          f"{len(paragraphs):>{PARAGRAPH_COUNT_WIDTH},}"
          f"{total_word_count:>{WORD_COUNT_WIDTH},}")


### Do the work
docx_paragraphs = extract_docx_paragraphs(my_input_docx_file)
print(f"Extracted document: {my_input_docx_file.absolute()}\n")
print_paragraph_summary(docx_paragraphs)

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1MPdPh5vQfdsLHbPLlIhVmNZjTpaZr4aX"
/>

## Convert to Markdown

After extracting the text, I convert it into [markdown format](https://markdownguide.offshoot.io/getting-started) and save it as a markdown file (.md). Here's why I chose markdown:

1. I can preserve the structure of your Word document - like headings, bullet lists, and paragraphs - in a text-based format. This means the corrected document can have the same structure.
1. Claude can format corrections in bold, and later I can add colour.
1. It's a clean format that large language models can read efficiently (unlike HTML).
1. Rupert from Slack told me to (Claude initially recommended HTML and I believed him, but Rupert was right).
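
To illustrate the second point, here is a small sketch of how bold markers in corrected markdown could later be turned into colour. This is one possible approach, not necessarily the one used further down in this Notebook: the function name `colour_corrections` and the colour values are my own assumptions. It relies on the fact that most markdown renderers pass inline HTML through unchanged.

```python
import re

# Hypothetical colours: green for changed or new text, red for removed text
CHANGED_COLOUR = "#1a7f37"
REMOVED_COLOUR = "#cc0000"

def colour_corrections(markdown_text: str) -> str:
    """Replace bold correction markers with coloured inline HTML spans."""
    # Handle removed text first (**~~text~~**), so the plain bold rule below cannot match it
    text = re.sub(
        r"\*\*~~(.+?)~~\*\*",
        rf'<span style="color:{REMOVED_COLOUR}"><del>\1</del></span>',
        markdown_text,
    )
    # Then handle changed or new text (**text**)
    text = re.sub(
        r"\*\*(.+?)\*\*",
        rf'<span style="color:{CHANGED_COLOUR}">\1</span>',
        text,
    )
    return text

sample = "The **quick** fox **~~jumpped~~** **jumped**."
print(colour_corrections(sample))
```

One limitation of this sketch: any text that was already bold in the original document would be coloured too, because bold is the only signal it looks for.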



In [None]:
def create_simple_markdown_file(paragraphs: list[dict], markdown_file: Path) -> Path:
    """
    Convert paragraphs to Markdown format and save to a file.

    :param paragraphs: List of dictionaries containing paragraph information
    :param markdown_file: Path to save the generated Markdown file
    :return: Path of the created Markdown file
    """
    def format_paragraph(para):
        if para['heading_level'] is not None:
            return f"{'#' * para['heading_level']} {para['text']}"
        elif para['style'].startswith('List'):
            return f"- {para['text']}"
        else:
            return para['text']

    # Filter out empty paragraphs and format the rest
    formatted_paragraphs = [format_paragraph(para) for para in paragraphs if para['text'].strip()]

    markdown_content = []
    for i, current_para in enumerate(formatted_paragraphs):
        if i > 0:
            prev_is_list = formatted_paragraphs[i-1].startswith("- ")
            current_is_list = current_para.startswith("- ")
            # Single newline for consecutive list items, double for others
            markdown_content.append("\n" if prev_is_list and current_is_list else "\n\n")

        markdown_content.append(current_para)

    # Join paragraphs and normalise spacing:
    markdown_content = "".join(markdown_content)
    # Remove consecutive empty lines and strip leading/trailing whitespace
    markdown_content = re.sub(r'\n{3,}', '\n\n', markdown_content.strip())

    markdown_file.write_text(markdown_content, encoding='utf-8')
    return markdown_file


### Do the work
original_md_file = create_simple_markdown_file(docx_paragraphs, original_md_file)
print(f"Success!\n Your Word document text has been saved as a markdown file:\n '{original_md_file.absolute()}'")

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1gesXZuj99cqn6I54kuR-RH4YPR0xK_JT"
/>

## Split into Chunks

Now that I’ve converted all the extracted text from your Word document into one markdown file, we’re onto splitting this file into little chunks. So instead of one big file, I might end up with, say, 12 chunks that, if joined back together, would be identical to the markdown file. I do this so I can send these smaller chunks to Claude for correction, one by one. This picture explains it very well:

<!-- /Figure with caption 500px width-->
<figure>
  <figcaption>Chunking: From big file into little text chunks</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/10-solution-picture-my-drawing-is-nicer.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/10-solution-picture-my-drawing-is-nicer.jpg"
         width="500"
         alt="Splitting your Word document into little text chunks to send to Claude" />
  </a>
</figure>

You might wonder, **"Why not just send everything to Claude at once?"** I wondered the same. Claude has a limit on how much text he can send back in reply to any prompt (a maximum of 4,096 tokens). So, if I sent all the markdown text at once, much of the content would be missing from Claude's corrected reply.

Small chunks are also quite handy. They make it easy to compare the "original chunk" with the "processed chunk" I get back from Claude. For instance, I can quickly spot if the semantic meaning of the corrected chunk has changed compared to the original chunk. It's much easier to notice oddities with little chunks than one enormous chunk. Plus, if something goes wrong, I'd rather have a small chunk fail than the entire document.
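
As a rough illustration of how easy small chunks are to compare, here is a sketch that lists the words that differ between an original chunk and its corrected version. It uses Python's standard `difflib` module; the function name `changed_words` is my own invention and this is not one of the notebook's tests.

```python
import difflib


def changed_words(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for each run of words that differs."""
    before_words = original.split()
    after_words = corrected.split()
    matcher = difflib.SequenceMatcher(a=before_words, b=after_words, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is 'replace', 'delete' or 'insert'
            changes.append((" ".join(before_words[i1:i2]), " ".join(after_words[j1:j2])))
    return changes


example_changes = changed_words("The cat sat on teh mat", "The cat sat on the mat")
print(example_changes)
```

With a chunk of a few hundred words the list stays short enough to read at a glance, which is the point of keeping chunks small.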

**The output below the code** shows a summary of the chunks your Word document got split into. The number of chunks is determined by the target chunk size I set. More on that below if you are interested.

### Why I chunked with characters (not tokens)

I chose to split the text into chunks based on character count rather than tokens. For instance, “split the document into chunks each about 1,000 characters long.” When working with large language models, it is far more common to split text by tokens, not character count. If you're curious as to why I chose characters (instead of tokens), here's the explanation:

**First, let's talk about "tokens."** They're just a way to measure text size, like kilograms measure a person's weight. But here's the twist: while a kilogram is the same everywhere, tokens aren't. Each large language model has its own way of counting text size using its own “tokeniser”. So "87 Anthropic Claude tokens" isn't the same amount of text as "87 OpenAI GPT tokens". Language models don’t measure the size of text by counting words or characters like we do.

Given that Claude is limited to replying with no more than 4,096 "Anthropic tokens", it would obviously make sense to split text by counting tokens. So why did I use characters instead?

I got a bit lazy. My text is in Markdown format, and I didn't want to break apart Markdown elements accidentally when splitting the text into chunks. I found a library that splits Markdown text while respecting its structure. The problem is that this library only allows me to specify target chunk size in characters (not tokens). Still, I thought it a better workaround than writing more code. As in, I thought doing rough math estimates was easier.
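
To give a feel for what "splitting while respecting structure" means, here is a simplified sketch. It is not the library's algorithm, just a stand-in I am assuming for illustration: it only ever cuts at blank lines between paragraphs and packs whole paragraphs into chunks up to a character budget. The function name `pack_paragraphs` is hypothetical.

```python
def pack_paragraphs(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole in its own
    (oversized) chunk, because this sketch never cuts inside a paragraph.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


example_text = "# Title\n\nAlpha para.\n\nBeta para.\n\nGamma para."
example_chunks = pack_paragraphs(example_text, max_chars=25)
print(example_chunks)
```

Joining the chunks back together with blank lines gives the original text again, which is the property I rely on later when reassembling.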

### How I determined my target chunk size (in characters)

To determine my ideal chunk size (in characters), I first analysed token, character, and word counts on random text I generated with Claude. I used the Anthropic Tokeniser to count the tokens.

<!-- /Figure with caption full width-->
<figure>
  <figcaption>Estimating Conversion: Between words, characters, and Anthropic Tokens</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/11-sizing-text-by-counting-words-characters-and-anthropic-tokens.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/11-sizing-text-by-counting-words-characters-and-anthropic-tokens.jpg"
         alt="Estimating conversion rates between words, characters, and Anthropic Tokens" />
  </a>
</figure>

Then I did some math:

1. Claude can return a maximum of 4,096 tokens per reply.
1. My equation: (prompt size) + (chunk size) + (wiggle room for corrections) <= 4,096 tokens.
1. Anthropic’s tokeniser measured my prompt (without a chunk attached) at 405 tokens.
1. I decided 500 tokens for correction wiggle room would be enough.
1. So: (405 tokens) + (chunk size) + (500 tokens) <= 4,096 tokens.
1. This means that my chunk size must be no more than 3,191 tokens.

And now using the data above to approximate:

- 3,191 tokens x 5.89 characters = 18,795 characters

But hold on, 18,795 characters in English is about 2,565 words which is 5 pages in Microsoft Word. I want smaller chunks to pinpoint subtle changes more easily. In my prompt I instruct Claude to “make corrections but retain original meaning.” Having smaller chunks to compare means I can detect changes in meaning with more sensitivity.

**I chose 4,000 characters as my target chunk size because** this is about 545 words (just less than one page in Word) and about 679 Anthropic tokens (4,000 / 5.89). Remember my wiggle room of 500 tokens? That is about 400 words, which is more than enough as I’m only sending Claude 545 words for correction anyhow. My numbers are rough approximations using the sample data above. But with a target chunk size of 4,000 characters (or 679 tokens) there are 2,512 tokens of room left anyhow:

- So: (405 tokens) + (chunk size: 679 tokens) + (500 tokens) <= 4,096 tokens
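
The arithmetic above can be replayed in a few lines. This is only a sketch using the numbers quoted in this section; the 5.89 characters-per-token ratio is the rough figure from my sample data, and the variable names are made up for illustration.

```python
MAX_REPLY_TOKENS = 4096   # Claude's reply limit quoted above
PROMPT_TOKENS = 405       # my prompt, measured without a chunk attached
WIGGLE_TOKENS = 500       # room I allow for corrections
CHARS_PER_TOKEN = 5.89    # rough ratio from my sample data

# Largest chunk that satisfies: prompt + chunk + wiggle <= limit
max_chunk_tokens = MAX_REPLY_TOKENS - PROMPT_TOKENS - WIGGLE_TOKENS
max_chunk_chars = round(max_chunk_tokens * CHARS_PER_TOKEN)

# The smaller target I actually chose
chosen_chunk_chars = 4000
chosen_chunk_tokens = round(chosen_chunk_chars / CHARS_PER_TOKEN)
tokens_to_spare = MAX_REPLY_TOKENS - PROMPT_TOKENS - chosen_chunk_tokens - WIGGLE_TOKENS

print(f"Largest allowed chunk: {max_chunk_tokens:,} tokens (about {max_chunk_chars:,} characters)")
print(f"Chosen chunk: {chosen_chunk_chars:,} characters (about {chosen_chunk_tokens:,} tokens)")
print(f"Tokens to spare: {tokens_to_spare:,}")
```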

The only downside I see in using small chunks is a little more cost. API usage is billed by the number of tokens used, not per prompt. Smaller chunks mean more prompts, so the 405 prompt tokens (i.e., the prompt measured without the chunk) get sent more times than strictly necessary. That’s okay.
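
To put a rough number on that overhead, here is a small sketch for a hypothetical 100,000-character document. It assumes every chunk is filled right up to the target size, which the real splitter will not guarantee, so treat the result as a lower bound on the number of chunks. The function name is made up for illustration.

```python
import math

PROMPT_TOKENS = 405  # my prompt, measured without a chunk attached


def prompt_overhead_tokens(document_chars: int, chunk_chars: int) -> int:
    """Tokens spent re-sending the fixed prompt, assuming evenly filled chunks."""
    number_of_chunks = math.ceil(document_chars / chunk_chars)
    return number_of_chunks * PROMPT_TOKENS


overhead_small_chunks = prompt_overhead_tokens(100_000, 4_000)    # my chosen target size
overhead_large_chunks = prompt_overhead_tokens(100_000, 18_795)   # the largest size that would fit
print(overhead_small_chunks, overhead_large_chunks)
```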

**In the output below the code**, you’ll see a summary of how your Word document got chunked.


In [None]:
def split_markdown_into_chunks(
    markdown_file: Path,
    target_chunk_size_chars: int = 2000,
    chunk_overlap: int = 0
) -> list[str]:
    """
    Split a Markdown file into text chunks using MarkdownTextSplitter.

    :param markdown_file: Path to the Markdown file
    :param target_chunk_size_chars: Target size of each chunk in characters
    :param chunk_overlap: Number of overlapping characters between chunks
    :return: List of text chunks (that don't exceed the target chunk size)
    """
    text = markdown_file.read_text(encoding='utf-8')
    text_splitter = MarkdownTextSplitter(chunk_size=target_chunk_size_chars, chunk_overlap=chunk_overlap)
    chunks = text_splitter.split_text(text)
    print("Success!")
    print(f"• The markdown file containing your Word document text has been split into {len(chunks)} manageable chunks.")
    print(f"• Each chunk aims to be around {target_chunk_size_chars:,} characters long (never more).")
    print("• These chunks are now ready to be sent to Claude for processing (i.e. correction).\n\n")
    return chunks

def count_claude_tokens(text):
    """Count tokens using Anthropic's tokenizer for Claude."""
    return MY_ANTHROPIC_CLIENT.count_tokens(text)

def print_chunking_summary(chunks: list[str]) -> None:
    chunk_sizes = [len(chunk) for chunk in chunks]
    chunk_sizes_tokens = [count_claude_tokens(chunk) for chunk in chunks]
    avg_size_chars = sum(chunk_sizes) / len(chunks)
    avg_size_tokens = sum(chunk_sizes_tokens) / len(chunks)

    print("Summary of All Chunks:")
    print("~" * 58)
    print(f"Total Chunks: {len(chunks):7}")
    print(f"   Avg.Chunk size: {avg_size_chars:>7,.0f} chars | {avg_size_tokens:>4,.0f} anthropic tokens")
    print(f"   Min.Chunk size: {min(chunk_sizes):>7,} chars | {min(chunk_sizes_tokens):>4,} anthropic tokens")
    print(f"   Max.Chunk size: {max(chunk_sizes):>7,} chars | {max(chunk_sizes_tokens):>4,} anthropic tokens\n\n")

def print_chunk_table(chunks: list[str]) -> None:
    print("Summary per Chunk:")
    print("~" * 108)
    print(f"{'':36}{'Anthropic':>15}")
    print(f"{'Original':<10}{'Lines':>10}{'Words':>10}{'Chars':>10}{'Tokens':>11}    {'Original Chunk Start':<70}")
    print("~" * 108)

    total_lines = total_words = total_chars = total_tokens = 0

    for i, chunk in enumerate(chunks, 1):
        chunk_num = i
        lines = len(chunk.splitlines())
        words = len(chunk.split())
        chars = len(chunk)
        tokens = count_claude_tokens(chunk)
        chunk_start = chunk.replace('\n', ' ').replace('\r', '')[:50] + "..."

        total_lines += lines
        total_words += words
        total_chars += chars
        total_tokens += tokens

        print(f"Chunk {chunk_num:<4}{lines:>10,}{words:>10,}{chars:>10,}{tokens:>11,}    {chunk_start:<70}")

    print("~" * 108)
    print(f"{'Total':<10}{total_lines:>10,}{total_words:>10,}{total_chars:>10,}{total_tokens:>11,}")


### Do the work
original_chunks = split_markdown_into_chunks(original_md_file, target_chunk_size_chars=TARGET_CHUNK_CHARACTERS)
print_chunking_summary(original_chunks)
print_chunk_table(original_chunks)

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1rIhgyr2WDAvBrl6DjAwmXqg8gGy3HMmY"
/>

- Uncomment the below to see what is inside a chunk

In [None]:
def print_chunks(chunks: list[str], chunks_to_print: list[int] | None = None) -> None:
    """
    Print specified chunks from a list of text chunks.

    :param chunks: List of text chunks to be printed
    :param chunks_to_print: Optional list of chunk numbers to print. If None, all chunks are printed
    :return: None
    """
    total_chunks = len(chunks)

    if chunks_to_print is None:
        chunks_to_print = list(range(1, total_chunks + 1))

    for chunk_number, chunk in enumerate(chunks, start=1):
        if chunk_number in chunks_to_print:
            print(f"🟨 Chunk {chunk_number} contains:{'🟨' * 33}")
            print(chunk)
            print("🟨" * 9)

    # Check for requested chunks that don't exist
    for requested_chunk in chunks_to_print:
        if requested_chunk > total_chunks:
            print(f"🟥 Chunk {requested_chunk} does not exist - there are {total_chunks} chunks.")


# Example Usage
# Try print chunk 1, chunk 105, chunk 122
# print_chunks(original_chunks, [1, 105, 122])

- Uncomment the below if you want to count words, characters, and tokens

In [None]:
def analyze_text_files_words_chars_tokens(file_names):
    for file_name in file_names:
        file_path = Path(file_name)

        if not file_path.exists():
            print(f"File not found: {file_name}")
            continue

        content = file_path.read_text(encoding='utf-8')

        word_count = len(content.split())
        char_count = len(content)
        token_count = count_claude_tokens(content)
        avg_chars_per_token = char_count / token_count if token_count > 0 else 0
        avg_tokens_per_word = token_count / word_count if word_count > 0 else 0

        print(f"File name: {file_name.upper()}")
        print(f"{'Word count:':<30} {word_count:>15,.2f}")
        print(f"{'Character count:':<30} {char_count:>15,.2f}")
        print(f"{'Anthropic token count:':<30} {token_count:>15,.2f}")
        print(f"{'Avg. Characters per token:':<30} {avg_chars_per_token:>15.2f}")
        print(f"{'Avg. Tokens per word:':<30} {avg_tokens_per_word:>15.2f}")
        print()


# Example Usage
# file_list = ['english.txt', 'german.txt', 'prompt.txt']
# analyze_text_files_words_chars_tokens(file_list)

# <font color="#5830D9">**4. Processing**</font> 🟣

## The Prompt (+chunk)

To send an individual chunk to Claude, I embed it in the below prompt. That means, if there are say 20 chunks, then Claude will be prompted 20 times. Read the output below the code, most especially between the `~~~` triple tildes for a deeper explanation.
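
Here is a minimal sketch of what "embedding a chunk in the prompt" means mechanically. The template text below is a made-up stand-in, not my real prompt; only the placeholder name `the_markdown_chunk` matches what the notebook uses.

```python
# Hypothetical, much shorter stand-in for the real prompt template.
EXAMPLE_TEMPLATE = """Please correct the markdown text between the tildes.
~~~
{the_markdown_chunk}
~~~
Return only the corrected text."""

example_chunks = ["# Chapter 1\n\nFirst chunk of text.", "Second chunk of text."]

# One prompt per chunk: everything outside the '~~~' band stays identical
example_prompts = [
    EXAMPLE_TEMPLATE.format(the_markdown_chunk=chunk) for chunk in example_chunks
]

print(f"{len(example_prompts)} prompts built")
print(example_prompts[1])
```

One thing to keep in mind with `str.format`: curly braces inside a chunk are harmless because the chunk is passed in as a value, but any literal curly braces in the template itself would need doubling (`{{` and `}}`).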

In [None]:
def print_prompt_with_chunk_example():
    example_chunk = f"""# I'M A LITTLE EXAMPLE CHUNK OF MARKDOWN TEXT TELLING A STORY.
- Chunks are placed in between the '~~~' band, this is exactly where I am right now.
- Everything above or below the '~~~' band always stays the same for every chunk sent to Claude for processing.
- To make a chunk, your Word document was converted into markdown format and then split into little 'pieces' (i.e. chunks)
- To do the splitting, the chunk size was set to a maximum of {TARGET_CHUNK_CHARACTERS} characters (Michelle chose it).
- When I get sent to Claude in this prompt, he'll apply this prompt to me, and then return a corrected version of me.
- If your Word doc got split into 19 chunks, then Claude will be prompted 19 times and return 19 corrected chunks.
- When Claude has processed all of us chunks, we're reassembled together in order so we form one document again.
- But remember we're still in markdown format. So we get converted into a pretty HTML file just for you.
- In this html file you'll be able to see all the corrections Claude made as instructed by this prompt.
- Finally, looking at this pretty html file you can decide which changes to incorporate into your Word document.
- Again, I'm just an **example** chunk. The real ones will be sent to Claude. Not me!"""

    prompt_with_chunk = PROMPT_TEMPLATE.format(the_markdown_chunk=example_chunk)
    print(prompt_with_chunk)


### Do the work
print("WHAT IS HAPPENING?")
print("- Below this SOLID line is the actual prompt that will be sent to Claude for every chunk.")
print("- In between the '~~~' is where an individual chunk is embedded into the prompt.")
print("- Confused? Read the story the little example chunk tells you in this prompt:")
print("_" * 90, "\n")

print_prompt_with_chunk_example()

<!-- /Image: Thumbnail different to full image -->
<a href="https://drive.google.com/file/d/1PI1aFEiQMWf03fjtlUhCjTRcDiwALNQF/view?usp=drive_link"
   target="_blank">
  <img src="https://drive.google.com/uc?export=view&id=1vxQFW-ZuKEkUqZflP9ikVDWmDYZ470sN"
  />
</a>


## Connect to Claude

If there was one part that scared me the most about attempting this Notebook, it’s this part. I’ve always struggled to understand API documentation. API stands for "Application Programming Interface" and because I am conversing with Claude programmatically, I am using the [Anthropic Python API](https://github.com/anthropics/anthropic-sdk-python/blob/main/README.md).

To get help, I told Claude that he was an Anthropic API expert. Surprisingly, he generated code based on an old version of the API. So I attached the latest README file, told him to study it, and then to help me. Below is the prompt I used to start our conversation.

<!-- /Figure with caption 900px width-->
<figure>
  <figcaption>Helping Claude: You're an API expert but read the help file</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/12-tell-claude-he-is-an-api-expert-and-give-him-the-help-file.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/12-tell-claude-he-is-an-api-expert-and-give-him-the-help-file.jpg"
         width="900"
         alt="Claude you're an API expert, but just read this help file quickly." />
  </a>
</figure>

In [None]:
def test_anthropic_connection(anthropic_client: anthropic.Anthropic) -> None:
    """
    Test the connection to the Anthropic API by sending a simple message.

    :param anthropic_client: An instance of the Anthropic client
    :raises KeyboardInterrupt: If the connection fails
    """
    my_test_prompt = "Hello Claude, have I really connected to you?"
    try:
        message = anthropic_client.messages.create(
            model=CLAUDE_MODEL,
            max_tokens=30, # Reduced for quick response
            temperature=0.99,
            messages=[
                {"role": "user", "content": my_test_prompt}
            ]
        )
        print("Success! API key is valid and working.")
        print(f"  My prompt was:  {my_test_prompt}")
        print(f"  Claude replied: {message.content[0].text}")

    except anthropic.APIError as e:
        print(f"API error occurred: {e}")
        raise KeyboardInterrupt("Connection test failed. Stopping execution 🛑.") from e
    except Exception as e:
        print(f"Unexpected error occurred: {e}")
        raise KeyboardInterrupt("Connection test failed. Stopping execution 🛑.") from e


### Do the work
test_anthropic_connection(MY_ANTHROPIC_CLIENT)

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1w545XP1bMkhsYEM1hjafLjaUokhihncE"
/>

## Process Chunks

Let’s summarise where we are so far. We’ve extracted the text from your Word document, we’ve converted it into a markdown file, we’ve split that file into chunks of markdown text, and we’ve established we can connect to Claude using the Anthropic API.

In this step I send each chunk one by one, embedded into the prompt, to Claude. The prompt instructs Claude to make corrections to the chunk. For each chunk, Claude then processes it and sends the corrected chunk back to me. I collect these chunks back in order because I’m going to join them all together again into one big document.

In the code below, you'll see I got a bit carried away with exception handling and "friendly error messages." Then I gold-plated it further by showing a fancy progress bar to keep you entertained while the chunks are processing.
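
Stripped of the API call and the progress bar, the core idea looks something like the sketch below. The corrector is passed in as a plain function so the example runs without an API key; `fake_corrector` is a made-up stand-in for Claude, and `correct_in_order` is a hypothetical name, not a function from this notebook.

```python
from typing import Callable


def correct_in_order(chunks: list[str], correct: Callable[[str], str]) -> list[str]:
    """Correct chunks one by one, keeping their order and never losing a slot."""
    results = []
    for number, chunk in enumerate(chunks, start=1):
        try:
            results.append(correct(chunk))
        except Exception as error:
            # A failed chunk leaves a visible marker in its place instead of vanishing
            results.append(f"ERROR (chunk {number}) was not processed because {error}")
    return results


def fake_corrector(chunk: str) -> str:
    """Stand-in for the real API call: fixes one typo, fails on empty chunks."""
    if not chunk.strip():
        raise ValueError("the chunk was empty")
    return chunk.replace("teh", "the")


example_results = correct_in_order(["teh start", "   ", "teh end"], fake_corrector)
print(example_results)
```

Because a failed chunk still occupies its position in the list, the later reassembly step keeps the remaining text in the right order and the gap is easy to find.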

<!-- /Figure with caption 500px width-->
<figure>
  <figcaption>Processing Chunks: Send it, Fix it (Claude), Give it Back</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/10-solution-picture-my-drawing-is-nicer.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/10-solution-picture-my-drawing-is-nicer.jpg"
         width="500"
         alt="Processing text chunks with Claude: Send, Fix, Get Back" />
  </a>
</figure>

In [None]:
def process_all_chunks(chunks: list[str]) -> list[str]:
    """
    Process all chunks of markdown text and show progress bar.

    :param chunks: List of markdown text chunks
    :return: List of processed text chunks (error message for failed chunks)
    """
    processed_chunks = []
    processing_desc = "Processing: sending text chunks to Claude for correction! "
    for i, chunk in tqdm(enumerate(chunks, 1), total=len(chunks), desc=processing_desc, bar_format='{l_bar}{bar} {n_fmt}/{total_fmt} Chunks'):
        result = process_one_chunk(i, chunk)
        processed_chunks.append(result)

    # Failed chunks come back as messages starting with "ERROR (chunk " (see get_friendly_error_msg)
    successful_chunks = sum(1 for chunk in processed_chunks if not chunk.startswith("ERROR (chunk "))
    total_chunks = len(chunks)

    if successful_chunks == total_chunks:
        print(f"Processing complete: {successful_chunks}/{total_chunks} chunks processed successfully.")
    else:
        print(f"Processing complete: Warning!! Only {successful_chunks}/{total_chunks} chunks were processed successfully.")

    return processed_chunks

def process_one_chunk(
    chunk_count: int,
    chunk: str,
    client: anthropic.Anthropic = MY_ANTHROPIC_CLIENT,
    model: str = CLAUDE_MODEL,
    temperature: float = PROMPT_TEMP,
    max_tokens: int = 4096  # This is Claude Sonnet's maximum size of text, measured in tokens, that he is able to reply with.
) -> str:
    """
    Process a single chunk of markdown text using the Anthropic API.

    :param chunk_count: The index of the current chunk
    :param chunk: The markdown text to process
    :param client: The Anthropic client (default: MY_ANTHROPIC_CLIENT)
    :param model: The Claude model to use (default: CLAUDE_MODEL)
    :param temperature: The temperature setting for the model (default: PROMPT_TEMP)
    :param max_tokens: The maximum number of tokens in the response (default: 4096)
    :return: Processed text or error message if processing failed
    """
    try:
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            temperature=temperature,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": PROMPT_TEMPLATE.format(the_markdown_chunk=chunk)
                        }
                    ]
                }
            ]
        )
        if not response.content or not response.content[0].text:
            raise ValueError(f"Empty response from API for chunk {chunk_count}")

        return response.content[0].text

    except Exception as e:
        error_message = get_friendly_error_msg(e, chunk_count)
        print(error_message)  # Print error message for logging
        return error_message  # Return error message for failed chunks

def get_friendly_error_msg(exception: Exception, chunk_count: int) -> str:
    CHUNK_ERROR_TEMPLATE = "ERROR (chunk {}) was not processed because {}"

    error_msgs = {
        anthropic.AuthenticationError: "there's an issue with your Anthropic API key",
        anthropic.PermissionDeniedError: "your API key does not have permission to use the specified resource",
        anthropic.RateLimitError: "your account has hit a rate limit. It's measured in requests/minute, tokens/minute, and tokens/day",
        anthropic.APITimeoutError: "Anthropic took too long to respond. Likely an issue on Anthropic's end",
        anthropic.APIConnectionError: "failed to connect to Anthropic's API. Perhaps a network issue on your end",
        anthropic.APIStatusError: "Anthropic returned an unsuccessful status code",
        anthropic.APIError: "a general Anthropic API error has occurred internal to Anthropic’s systems",
    }

    for error_type, msg in error_msgs.items():
        if isinstance(exception, error_type):
            error_msg = f"{msg}: {exception}"
            break
    else:
        if isinstance(exception, ValueError) and "Empty response" in str(exception):
            error_msg = f"Anthropic returned an empty response for chunk {chunk_count} (maybe we sent an empty chunk?): {exception}"
        else:
            error_msg = f"Unexpected error on chunk {chunk_count}: {exception}"

    return CHUNK_ERROR_TEMPLATE.format(chunk_count, error_msg)


### Do the work
processed_chunks = process_all_chunks(chunks=original_chunks)

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1eT3y7P5zg-pbLGkXVXib7q9qFUAVMxMk"
/>

# <font color="#5830D9">**5. Post-Processing**</font> 🟣

At this stage, I've sent all the text chunks extracted from your Word document to Claude. He's processed each chunk as per the instructions in the prompt and sent each corrected chunk back to me. Now I need to reassemble these chunks and put them in a file format where the corrections are easy to spot (HTML).

## Reassemble Processed Chunks

This step is straightforward: just join the processed chunks back together. Since I collected the processed chunks from Claude in the same order I sent them, they fit right back into their original spots, following the flow of your Word document. The result? One big markdown file containing all the corrected text. It's essentially your original document, but now with Claude's corrections neatly incorporated.
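
A tiny worked example of the joining step, using the same blank-line clean-up the code cell applies. The helper name `join_chunks` and the two sample chunks are made up for illustration.

```python
import re


def join_chunks(chunks: list[str]) -> str:
    """Join chunks with a blank line between them and tidy surplus newlines."""
    joined = "\n\n".join(chunks)
    # Collapse runs of three or more newlines down to a single blank line
    joined = re.sub(r"\n{3,}", "\n\n", joined)
    return joined.strip()


example_document = join_chunks(["# Title\n\nFirst paragraph.\n", "\nSecond paragraph.\n"])
print(example_document)
```

Stray newlines at the edges of a chunk would otherwise pile up at every join, so the clean-up keeps the reassembled file looking like the original.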


In [None]:
def reassemble_chunks(chunks: list[str], output_file: Path) -> Path:
    """
    Reassemble processed chunks of text and save to an output file.

    This function joins the processed chunks with double newlines, removes
    excessive empty lines, trims leading and trailing whitespace, and saves
    the result to the specified output file.

    :param chunks: List of processed markdown text chunks
    :param output_file: Path to the output file where the reassembled text will be saved
    :return: Path to the output file containing the reassembled text
    """
    # Join chunks with double newlines
    reassembled_chunks = "\n\n".join(chunks)

    # Remove consecutive empty lines
    reassembled_chunks = re.sub(r'\n{3,}', '\n\n', reassembled_chunks)

    # Remove leading and trailing whitespace
    reassembled_chunks = reassembled_chunks.strip()

    # Save the corrected document
    output_file.write_text(reassembled_chunks, encoding='utf-8')

    return output_file


### Do the work
processed_md_file = reassemble_chunks(processed_chunks, processed_md_file)
print(f"Success!")
print(f"• All processed chunks from Claude have been saved into one file:")
print(f"  {processed_md_file.absolute()} ")

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1GWaDYj-Sp_iuc0JfxgrwcjOODBtiFJfD"
/>

## Create Pretty HTML File

Now that I have a markdown file that contains all the processed chunks, we’re ready for this step. I convert this file into an HTML file so the corrections are easy to spot and the document looks pretty. All corrections are in bold and, thanks to HTML, now also in colour. The original structure of your Word document has also been maintained.

**The code below** saves your corrected Word document as an HTML file into your Google Drive. It also downloads this file automatically to your computer; open it in any browser to see the corrections.
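
One detail of the conversion is worth a small illustration: before the markdown is converted, any `~~struck-through~~` text is rewritten into HTML `<del>` tags. The sketch below uses the same regular expression as the code cell in this step, but the function name and the sample sentence are made up.

```python
import re

STRIKETHROUGH_PATTERN = r"(\*\*)?~~(.*?)~~(\*\*)?"


def strikethrough_to_html(markdown_text: str) -> str:
    """Rewrite ~~text~~ as <del>, wrapping in <strong> when it was also bold."""
    def replace(match: re.Match) -> str:
        content = match.group(2)
        if match.group(1):  # an opening '**' came before the '~~'
            return f"<strong><del>{content}</del></strong>"
        return f"<del>{content}</del>"

    return re.sub(STRIKETHROUGH_PATTERN, replace, markdown_text)


example_html = strikethrough_to_html("Keep **~~teh~~** **the** cat and ~~old~~ text")
print(example_html)
```

Ordinary bold text such as `**the**` is left alone here; it is turned into `<strong>` later by the markdown converter, which is where the colour gets applied through CSS.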

<!-- /Figure with caption 650px width-->
<figure>
  <figcaption>Getting Claude to style your corrected document</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/13-asking-claude-to-style-the-corrected-document-html-file.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/13-asking-claude-to-style-the-corrected-document-html-file.jpg"
         width="650"
         alt="Getting Claude to style your corrected document." />
  </a>
</figure>

But my work isn’t done. I’m onto testing: the output file, the prompt itself, and my functional code.

In [None]:
def convert_markdown_to_html(markdown_file: Path, html_file: Path) -> Path:
    """
    Convert a Markdown file to HTML with custom styling and preprocessing.

    :param markdown_file: Path object representing the input Markdown file
    :param html_file: Path object representing the output HTML file
    :return: Path object of the output HTML file
    """
    # Read the Markdown file
    markdown_text = markdown_file.read_text(encoding='utf-8')

    # Convert custom Markdown strikethrough syntax to HTML, handling both regular and bold strikethroughs
    def process_strikethrough(match):
        content = match.group(2)
        is_bold = bool(match.group(1))
        if is_bold:
            return f"<strong><del>{content}</del></strong>"
        else:
            return f"<del>{content}</del>"

    strikethrough_pattern = r'(\*\*)?~~(.*?)~~(\*\*)?'
    markdown_text = re.sub(strikethrough_pattern, process_strikethrough, markdown_text)

    # Convert to HTML
    html_body = markdown(markdown_text)

    # Define CSS styles, including the improved responsive design
    css_styles = """
    <style>
        body {
            margin: 0 auto;
            padding: 0 5%;
            max-width: 50em;
            line-height: 1.5em;
            font-family: 'Inter', Arial, sans-serif;
            font-size: 16px;
            background-color: #F0EFEA;
            color: #141413;
        }

        @media (max-width: 768px) {
            body {
                padding: 0 3%;
            }
        }

        strong {
            color: #E46264;
        }
    </style>
    """

    # Create the full HTML document
    full_html = f"""
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Corrected Document</title>
        <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;700&display=swap" rel="stylesheet">
        {css_styles}
    </head>
    <body>
        {html_body}
    </body>
    </html>
    """

    # Write the HTML to a file
    html_file.write_text(full_html, encoding='utf-8')

    return html_file

def download_drive_file(file_path: Path):
    file_path = Path(file_path)

    if not file_path.is_file():
        print(f"File not found: {file_path}")
        return

    # Download file directly from Google Drive
    files.download(str(file_path))

    print(f"• Browser download initiated...")


### Do the work
processed_html_file = convert_markdown_to_html(processed_md_file, processed_html_file)
print(f"Success!")
print(f"• Saved corrected Word document as: {processed_html_file.absolute()}")

download_drive_file(processed_html_file)
print(f"• Downloaded as: {processed_html_file.name}")
print(f"• Enjoy the corrections Word missed, we're done!")
print("🙂")

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1ogNcRikmq1N4-QHiBIPym_bMMCbDHzv_"
/>

# <font color="#5830D9">**6. Testing**</font> 🟣

## Test plan with Claude

After conversations with Claude, I decided to divide testing into three key areas:

1. **Functional testing:** Verifies if specific code components, like markdown stripping, work correctly. This is traditional testing.
1. **Prompt testing:** A new concept for me, crucial for refining the prompt and ensuring desired outcomes.
1. **Output file testing:** Another new approach, focusing on preserving the original document's meaning in the corrected version.

Below are some example questions I asked Claude during this process.

<!-- /Figure with caption 750px width-->
<figure>
  <figcaption>Talking Testing: Deriving the test plan</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/14-talk-testing-with-claude.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/14-talk-testing-with-claude.jpg"
         width="750"
         alt="Talking testing with Claude and designing a test plan." />
  </a>
</figure>

From there, I drew up my best guess of a test plan. The sections that follow will go into the details and implementation of each, but visually here it is:

<!-- /Figure with caption 750px width-->
<figure>
  <figcaption>Test Plan: Applying what I learnt from Claude</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/15-sketched-out-test-plan-using-what-i-learnt.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/15-sketched-out-test-plan-using-what-i-learnt.jpg"
         width="750"
         alt="Sketched out test plan using what I learnt from Claude." />
  </a>
</figure>

---

## 6.1 Test Processed Doc

I tested my prompt, and Claude was quite dependable in retaining original semantic meaning when doing corrections. However, I still wanted an automated way to check that this holds true for every Word document corrected. So, I set up three different ways to evaluate the corrected text against your original text. These provide confidence that Claude has enhanced your writing without changing its core message, structure, or content.

Here's what I'm checking in each section that follows:

1. **Document structure:** Are the headings and paragraphs still in the right place?
1. **Document content - Word count:** Has the overall length changed dramatically?
1. **Document content - Semantic meaning:** Does the corrected text still mean the same as the original?

**"Michelle, so much testing is overkill,"** I hear you saying. I quite disagree, and here's why: Imagine I decide to switch to the free Meta Llama 3.1 model instead of Claude Sonnet. How would I know if it's performing as well as Claude did? What if I have 20 large Word documents to correct? It wouldn't be practical to manually check each one. What if my document is in Italian? Should I assume Claude will perform as well, or should I have an automated way of testing it? Lastly, just because I've thoroughly tested one document, that doesn't imply the same level of dependability for another as content varies vastly.

Manually testing for all these scenarios would be incredibly time-consuming. That's why these automated tests are so important to me. With very little effort, I can now have confidence in the quality of corrections across a wide range of documents and potential future changes.

### Structure Preservation

This checks whether the overall structure of your Word document, like headings and bullet lists, remains intact in the corrected document. It's like making sure the skeleton hasn't been rearranged. Keep in mind that Claude might combine some paragraphs during correction. For example, if you accidentally started a new paragraph mid-sentence in your Word document, Claude would probably fix that. So don't judge an imperfect match too harshly. It's just an indication that something has changed. Overall, I don't expect much to change, and that is what I am checking.
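When the counts don't match, it helps to see which element types moved and by how much. Here is a minimal sketch of that idea. The helper name `diff_element_counts` and the example counts are made up for illustration and are not part of the notebook's own code.

```python
def diff_element_counts(original: dict[str, int], processed: dict[str, int]) -> dict[str, int]:
    """Return processed-minus-original for every element type whose count changed."""
    keys = sorted(set(original) | set(processed))
    return {
        key: processed.get(key, 0) - original.get(key, 0)
        for key in keys
        if processed.get(key, 0) != original.get(key, 0)
    }


# Hypothetical counts, for illustration only
original_counts = {"headings": 12, "paragraphs": 80, "list_items": 25, "empty_lines": 90}
processed_counts = {"headings": 12, "paragraphs": 78, "list_items": 25, "empty_lines": 88}

print(diff_element_counts(original_counts, processed_counts))
```

In this made-up case only `paragraphs` and `empty_lines` differ, which would be consistent with Claude merging a couple of accidentally split paragraphs.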

In [None]:
def evaluate_document_structure_preservation(original_file: Path, processed_file: Path) -> str:
    """
    Test if the document structure is preserved after processing.

    :param original_file: Path to the original markdown file
    :param processed_file: Path to the processed markdown file
    :return: A string summarizing the evaluation results
    """
    def count_markdown_elements(markdown_text: str) -> dict[str, int]:
        """
        Extract the structure of a markdown document.
        """
        lines = markdown_text.splitlines()
        element_counts = {
            'total_lines': len(lines),
            'headings': 0,
            'paragraphs': 0,
            'list_items': 0,
            'empty_lines': 0
        }

        heading_pattern = re.compile(r'^#+\s')

        for line in lines:
            if not line.strip():  # Blank or whitespace-only line
                element_counts['empty_lines'] += 1
            elif heading_pattern.match(line):
                element_counts['headings'] += 1
            elif line.startswith(('- ', '* ')):
                element_counts['list_items'] += 1
            else:  # Any other non-empty line counts as a paragraph
                element_counts['paragraphs'] += 1

        return element_counts

    original_content = original_file.read_text(encoding='utf-8')
    processed_content = processed_file.read_text(encoding='utf-8')

    original_structure = count_markdown_elements(original_content)
    processed_structure = count_markdown_elements(processed_content)

    pass_fail = original_structure == processed_structure

    result_summary = (
        "Evaluate Document Structure Preservation: Counting Markdown elements (entire documents)\n"
        f"{'~' * 125}\n"
        f"{'Document structure match!' if pass_fail else 'Document structure mismatch(!)'}\n"
        f" Extracted from Word doc:     {original_structure}\n"
        f" Assembled processed chunks:  {processed_structure}"
    )

    return result_summary


### Do the work
result_eval_structure = evaluate_document_structure_preservation(original_md_file, processed_md_file)
print(result_eval_structure)

<!-- /Image Clickable -->
<a href="https://drive.google.com/file/d/1s_PLQf_wCArHzxG4LS6Ioc7SxVhak662/view?usp=drive_link"
   target="_blank">
  <img src="https://drive.google.com/uc?export=view&id=1s_PLQf_wCArHzxG4LS6Ioc7SxVhak662"
  />
</a>

### Content Preservation: Simple Word Count

Here, I do a quick comparison of word counts. If there's a big difference between the original and processed word count from Claude, it might mean the content has changed too much. It's a simple but effective first check that helps us spot any major unexpected changes in content. If you do see something unexpected, then it’s a flag to go and look. Overall, here too, I don’t expect word count to change by a lot, and this is what I am checking.
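To make "a big difference" concrete, one option is to flag any chunk whose word count moves by more than a chosen percentage. This is only a sketch: the function name `flag_word_count_changes` and the 10% default tolerance are my assumptions, not values used elsewhere in this notebook.

```python
def flag_word_count_changes(
    original_texts: list[str],
    processed_texts: list[str],
    tolerance_pct: float = 10.0,  # Assumed threshold, tune to taste
) -> list[int]:
    """Return the 1-based chunk numbers whose word count changed by more than tolerance_pct."""
    flagged = []
    for chunk_number, (original, processed) in enumerate(zip(original_texts, processed_texts), 1):
        original_count = len(original.split())
        processed_count = len(processed.split())
        if original_count == 0:
            # Avoid dividing by zero: flag only if words appeared from nothing
            if processed_count:
                flagged.append(chunk_number)
            continue
        change_pct = abs(processed_count - original_count) / original_count * 100
        if change_pct > tolerance_pct:
            flagged.append(chunk_number)
    return flagged


# Hypothetical chunks: the second one grows from 4 to 6 words (+50%)
originals = ["one two three four five six seven eight nine ten", "a b c d"]
corrected = ["one two three four five six seven eight nine ten", "a b c d e f"]
flagged_chunks = flag_word_count_changes(originals, corrected)
print(flagged_chunks)
```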

In [None]:
def word_count_comparison(
    original_texts: list[str],
    processed_texts: list[str],
) -> list[dict[str, str | int | float]]:
    """
    Compare word counts between lists of original and corrected texts.

    :param original_texts: List of original text contents
    :param processed_texts: List of corrected text contents
    :return: List of dictionaries containing word count comparison information
    """
    results = []
    for chunk_number, (original_text, processed_text) in enumerate(zip(original_texts, processed_texts), 1):
        original_count = len(original_text.split())
        processed_count = len(processed_text.split())

        # print("---ORIG")
        # print(original_text)
        # print("---PROCE")
        # print(processed_text)

        difference = processed_count - original_count
        percentage_diff = (difference / original_count * 100) if original_count else 0

        sign = '+' if difference >= 0 else ''

        results.append({
            "chunk_number": chunk_number,
            "similarity_test": "Word Count",
            "original_count": original_count,
            "corrected_count": processed_count,
            "difference": difference,
            "percentage_difference": percentage_diff,
            "message_tabular": f"Word Count    Original: {original_count:4d} | Processed: {processed_count:4d} | Difference: {difference:+4d}   {percentage_diff:+.1f}%",
            "message_arrow": f"Word Count: {original_count:4d} → {processed_count:4d}  |  {sign}{difference} words  {sign}{percentage_diff:.0f}%"
        })

    return results

def strip_off_markdown(chunks: list[str]) -> list[str]:
    """
    Remove markdown formatting, specifically bold and strikethrough, from a list of text chunks.

    This function processes each chunk to:
    1. Delete words with bold and strikethrough markdown formatting (**~~text~~**)
    2. Strip all other markdown formatting

    :param chunks: List of text chunks potentially containing markdown formatting
    :return: List of plain text chunks with markdown formatting removed
    """
    def remove_bold_strikethrough_words(text):
        # This pattern matches:
        # 1. Start of string or a single space/tab (captured)
        # 2. Bold and struck-through text (**~~any content~~**)
        # 3. Optional single space or tab at the end
        # 4. The first capturing group (1) is used in replacement, preserving leading space/tab if present
        pattern = r'(^|[ \t])(\*\*~~.*?~~\*\*)([ \t])?'
        return re.sub(pattern, r'\1', text)

    chunks_without_markdown = []
    for chunk in chunks:
        # Remove bold and strikethrough words because I prompted Claude not
        # to delete words, but rather to bold strikethrough them. Removing them avoids false positives.
        text_without_bold_strikethrough = remove_bold_strikethrough_words(chunk)

        plain_text = strip_markdown(text_without_bold_strikethrough)

        chunks_without_markdown.append(plain_text)

    return chunks_without_markdown

def evaluate_content_preservation_by_word_count(
    original_chunks: list[str],
    processed_chunks: list[str],
) -> str:
    """
    Test content preservation by comparing word counts of original and processed chunks.

    :param original_chunks: List of original text chunks
    :param processed_chunks: List of processed text chunks
    :return: A string summarizing the evaluation results
    """
    results = word_count_comparison(original_chunks, processed_chunks)

    result_summary = "Evaluate Content Preservation: Word Count chunk to chunk (markdown syntax stripped out)\n"
    result_summary += "~" * 96 + "\n"

    for result in results:
        result_summary += f"Chunk {result['chunk_number']:3d}    {result['message_tabular']}\n"

    return result_summary


### Do the work
original_chunks_clean = strip_off_markdown(original_chunks)
processed_chunks_clean = strip_off_markdown(processed_chunks)
result_eval_content_word_count = evaluate_content_preservation_by_word_count(original_chunks_clean, processed_chunks_clean)
print(result_eval_content_word_count)

<!-- /Image Clickable -->
<a href="https://drive.google.com/file/d/1pCeIR4bIkPAw5OC0fnOlFfbx7_RFWGKe/view?usp=drive_link"
   target="_blank">
  <img src="https://drive.google.com/uc?export=view&id=1pCeIR4bIkPAw5OC0fnOlFfbx7_RFWGKe"
  />
</a>

### Content Preservation: Semantic Similarity

This is the most sophisticated check of the three, diving into the actual semantic similarity of the text. I wanted to ensure that Claude hadn't changed the original "meaning" of the text when making corrections.

Measuring semantic similarity can be complex, with many approaches available. I posted on a few discussion forums, and found many conflicting opinions. So I asked Claude to brainstorm options and generate example test data. Based on these results, I chose the sentence-transformers library, as it provided the most sensible scores against the dummy test data.

Why this method? It's like having a well-read assistant who understands context and nuance, not just individual words. You can express the same idea in multiple ways, and this check ensures that the corrected text conveys the same meaning as the original. While not as sophisticated as models like Claude Sonnet or GPT-4, it effectively verifies that Claude retained the "original meaning" during corrections, as instructed in the prompt.

When using sentence-transformers, you need to specify a language model for text comparison. I consulted Claude for options and selected the second-largest LLM on the list, "paraphrase-multilingual-mpnet-base-v2." This model offers high accuracy across multiple languages, including English and German. Claude suggested the second option in the list below, which is half the size. I declined.

<!-- /Figure with caption 800px width-->
<figure>
  <figcaption>Analysis Paralysis: Narrowing down model options for testing semantic similarity</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/16-choosing-a-semantic-similarity-large-language-r2.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/16-choosing-a-semantic-similarity-large-language-r2.jpg"
         width="800"
         alt="Asking Claude for language model options to use with Sentence Transformer"/>
  </a>
</figure>

Here's how sentence-transformers works:

1. It loads the language model I selected
1. It converts text chunks into number vectors (a way computers understand text)
1. It computes the cosine of the angle between the two vectors in each chunk pair (original and Claude-corrected chunk)

The smaller the angle between these vectors in vector space, the higher the similarity score (that is, the closer their "meaning"). A score of 100% within a "chunk pair" means identical meaning, while 0% indicates completely different content. I generally look for scores above 80% to be confident Claude has preserved the original meaning while making corrections. A 70% score might indicate that the general topic is the same, but some details or nuances have changed. It's not necessarily bad, but it signals that a closer look at that chunk may be needed.
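To see where the percentage comes from, here is a small sketch of cosine similarity using plain NumPy and toy two-dimensional vectors. Real embeddings from the model have hundreds of dimensions, and the helper name `cosine_similarity_pct` is my own for illustration; the notebook itself uses a library `cosine_similarity` function.

```python
import numpy as np


def cosine_similarity_pct(vec_a, vec_b) -> float:
    """Cosine of the angle between two vectors, expressed as a percentage."""
    a = np.asarray(vec_a, dtype=float)
    b = np.asarray(vec_b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) * 100)


# Toy vectors, not real embeddings
same_direction = cosine_similarity_pct([1.0, 2.0], [2.0, 4.0])  # Angle of 0 degrees
right_angle = cosine_similarity_pct([1.0, 0.0], [0.0, 1.0])     # Angle of 90 degrees

print(f"Same direction: {same_direction:.0f}%")
print(f"Right angle:    {right_angle:.0f}%")
```

Note that only the direction of the vectors matters, not their length, which is why `[1, 2]` and `[2, 4]` count as a perfect match.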

<!-- /Figure with caption 600px width-->
<figure>
  <figcaption>Semantic Similarity Visualised: Smaller angles = More similar meanings</figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/17-testing-for-semantic-similarity-(meaning)-explained.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/17-testing-for-semantic-similarity-(meaning)-explained.jpg"
         width="600"
         alt="Semantic Similarity Visualised: Smaller angles = More similar meanings"/>
  </a>
</figure>

I also look for differences between all the "chunk pairs." For example, if all score above 80% but one pair scores, say, 60%, that's a flag to take a quick look. It means the corrected chunk in that pair has changed much more compared to the original than in the other pairs. This isn't necessarily bad—perhaps Claude fixed a poorly written section quite substantially—but it's worth examining.
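This "look for the odd one out" check can be sketched in a few lines. The function name `flag_low_similarity` and the 15-point gap are assumptions of mine; the 80% floor is the rule of thumb mentioned above.

```python
from statistics import median


def flag_low_similarity(scores: list[float], floor: float = 80.0, max_gap: float = 15.0) -> list[int]:
    """Return 1-based chunk numbers scoring below the floor or far below the typical score."""
    typical = median(scores)
    return [
        chunk_number
        for chunk_number, score in enumerate(scores, 1)
        if score < floor or typical - score > max_gap
    ]


# Hypothetical similarity scores for four chunk pairs
example_scores = [96.0, 94.0, 61.0, 97.0]
chunks_to_review = flag_low_similarity(example_scores)
print(chunks_to_review)
```

Using the median as the "typical" score keeps a single poor chunk from dragging the baseline down with it.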

Remember, these scores are guides, not absolute judgements. They help you focus on areas that might need attention, ensuring Claude has enhanced your text while staying true to the original message.

**The code below** displays the similarity scores for each chunk pair, along with the beginning of each chunk to help you quickly locate it in your original Word document. **Don't worry about the warnings**; they just occur the first time the semantic model loads and don't affect similarity scoring.


In [None]:
def sentence_transformer_similarity(texts1: list[str], texts2: list[str]) -> list[dict]:
    """
    Compare similarity between lists of original and corrected texts using sentence transformers.

    :param texts1: List of original text contents
    :param texts2: List of corrected text contents
    :return: List of dictionaries containing similarity comparison information for each chunk
    """
    steps = [
        "Step 1/3 - Firing up a large language model... to be able to understand text for comparison",
        "Step 2/3 - Turning text chunks into number vectors... makes it possible to compare text",
        "Step 3/3 - Comparing original text chunks vs. Claude's corrections... super quick with these vectors"
    ]
    with tqdm(total=3, desc=steps[0].split("...", 1)[0], bar_format='{l_bar}{bar} {n_fmt}/{total_fmt}', mininterval=2.0) as pbar:
        print()
        for i, step in enumerate(steps, 1):
            if i == 1:
                model = SentenceTransformer(SIMILARITY_LARGE_LANGAUGE_MODEL)
            elif i == 2:
                embeddings1, embeddings2 = model.encode(texts1), model.encode(texts2)
            elif i == 3:
                similarities = cosine_similarity(embeddings1, embeddings2)

            print(f"DONE!\t🌸{step}")  # Print the full step description after performing the operation

            pbar.update(1)

            if i < 3:  # Update description for next step, except for the last iteration
                pbar.set_description(f"{steps[i].split('...', 1)[0]}")
        print()

    similarity_scores = np.minimum(np.diag(similarities) * 100, 100)

    return [
        {
            "chunk_number": i,
            "similarity_test": "Sentence Transformer",
            "similarity_score": score,
            "message": f"Similarity Score \t({score:.0f}%) Original <> Processed"
        }
        for i, score in enumerate(similarity_scores, 1)
    ]

def evaluate_content_preservation_by_similarity(original_chunks: list[str], processed_chunks: list[str]) -> str:
    """
    Evaluate content preservation by comparing meaning similarity of original and processed chunks.

    :param original_chunks: List of original text chunks
    :param processed_chunks: List of processed text chunks
    :return: A string summarizing the evaluation results, including similarity scores for each chunk
    """
    results = sentence_transformer_similarity(original_chunks, processed_chunks)

    result_summary = "Evaluate Content Preservation: Semantic Meaning chunk to chunk (markdown syntax stripped out)\n"
    result_summary += "~" * 96 + "\n"

    for result in results:
        chunk_num = f"{result['chunk_number']:>3}"
        score = f"{result['similarity_score']:.0f}%"
        result_summary += f"Chunk {chunk_num}   Similarity Score (original<>processed):    {score:>4}\n"

    return result_summary


### Do the work
result_eval_content_similarity = evaluate_content_preservation_by_similarity(original_chunks_clean, processed_chunks_clean)
print(result_eval_content_similarity)

<!-- /Image Clickable -->
<a href="https://drive.google.com/file/d/1oIIo6cR-Ve1NhmR_0jZ2ey71sz1vq9lL/view?usp=drive_link"
   target="_blank">
  <img src="https://drive.google.com/uc?export=view&id=1oIIo6cR-Ve1NhmR_0jZ2ey71sz1vq9lL"
  />
</a>

### All Results

See all results for evaluating the output file in one place so that it is easier to make connections.

In [None]:
print(f"{result_eval_structure}\n\n")
print(f"{result_eval_content_word_count}\n")
print(f"{result_eval_content_similarity}")

print("\n\nREFERENCE - Use 'Chunk Start' for finding chunks in original Word document")
print_chunk_table(original_chunks)

<!-- /Image: Thumbnail different to full image -->
<a href="https://drive.google.com/file/d/1elh0pCwdRIhOz7fTQdVjCHXHb1n3SsDz/view?usp=drive_link"
target="_blank">
<img src="https://drive.google.com/uc?export=view&id=1CnfqsDxoDy6z3SPXLVh7hEDJe5AgYXV0"
/>
</a>

- Uncomment the code below to investigate a particular chunk

In [None]:
# For example, say chunk 4 had an explosion in word count and a low similarity score
# Take a look at both the original and processed chunks to see what is going on:

# print_chunks(original_chunks, [4])  # Uncomment me to run me
# print_chunks(processed_chunks, [4]) # Uncomment me to run me

<a name="id-test-my-prompt"></a>

## 6.2 Test My Prompt

Creating the perfect instructions (or "prompt") for Claude is a mix of creativity and precision. It took me a bit of trial and error to refine the prompt to get Claude to perform as I wanted him to. As I was tweaking my prompt, I kept manually retesting the same things. That is why I wrote these tests: to automate my manual tests so I could tweak faster and with more confidence. They send example chunks to Claude to ensure that, no matter the tweak, he is still doing the very specific things I care about, like correcting inappropriate word choice or bolding a correction using markdown format.

For more encompassing testing of both Claude's prompt and my code together, see section 6.1, 'Test Processed Doc.' In these prompt tests, I isolate the testing to only the prompt. None of my other code is involved in any way. That was important.

**You'll see in the output below that there are failing tests**. Try as I might, I just couldn't get Claude to consistently distinguish British English from American English and make the appropriate corrections. As someone who prefers the Queen's English, I spent many hours trying to tweak my prompt. Finally, I removed all associated prompt instructions and have given up for now. Claude must be American.


In [None]:
# Just a little helper function to help my testing
def run_prompt_or_code_test(test_func):
    test_name = test_func.__name__
    try:
        test_func()
        print(f"✅PASSED TEST: {test_name}")
    except AssertionError as e:
        print(f"🛑FAILED TEST: {test_name}: {str(e)}")
    except Exception as e:
        print(f"ERROR: {test_name} raised an unexpected error: {str(e)}")

In [None]:
def show(test_case: dict[str, str], processed: str) -> str:
    return f"""
    Sent to Claude: {test_case['test']}
    Expected Back:  {test_case['expect']}
    Actual Back:    {processed}
    """

def test_no_additional_commentary():
    test_case = {
        "test": "This is a test sentence.",
        "expect": "This is a test sentence."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert len(processed.split('\n')) == 1, "Additional lines were added to the text" + show(test_case, processed)
    assert processed == test_case['expect'], "Text was added when it shouldn't have been" + show(test_case, processed)

def test_structure_maintenance():
    test_case = {
        "test": "Paragraph 1.\n\nParagraph 2.\n\nParagraph 3.",
        "expect": "Paragraph 1.\n\nParagraph 2.\n\nParagraph 3."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed.count('\n\n') == 2, "Paragraph structure was not maintained" + show(test_case, processed)
    assert processed == test_case['expect'], "Text was changed when it shouldn't have been" + show(test_case, processed)

def test_markdown_preservation():
    test_case = {
        "test": "# Heading 1\n## Heading 2\n* List item 1\n* List item 2",
        "expect": "# Heading 1\n## Heading 2\n* List item 1\n* List item 2"
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Markdown formatting was not preserved" + show(test_case, processed)

def test_spelling_correction():
    test_case = {
        "test": "This senteence has a spelling error.",
        "expect": "This **sentence** has a spelling error."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Exact expected correction marking not made" + show(test_case, processed)

def test_dupe_words_are_bold_strike():
    test_case = {
        "test": "This dog dog has blue eyes.",
        "expect": "This **~~dog~~** dog has blue eyes."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Duplicate word was not strikethrough bold" + show(test_case, processed)

def test_language_detection_german():
    test_case = {
        "test": "Das ist ein Test. Es enthällllt einige Fehler.",
        "expect": "Das ist ein Test. Es **enthält** einige Fehler."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "German text was not correctly identified or corrected" + show(test_case, processed)

def test_grammatical_corrections():
    test_case = {
        "test": "She buyed some milk and go home.",
        "expect": "She **bought** some milk and **went** home."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Grammatical errors were not corrected" + show(test_case, processed)

def test_inappropriate_word_choice_easy():
    test_case = {
        "test":   "He garnished support for perspective employees. His inciteful comments added value.",
        "expect": "He **garnered** support for **prospective** employees. His **insightful** comments added value."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Inappropriate word choice was not corrected as expected:" + show(test_case, processed)

def test_inappropriate_word_choice_creative():
    test_case = {
        "test":   "She was literally dying of embarrassment as her speech was bad.",
        "expect": "She was dying of embarrassment as her speech was bad."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert "literally" not in processed, "Inappropriate word choice 'literally' was not removed" + show(test_case, processed)
    assert "bad" not in processed, "Inappropriate word choice 'bad' was not removed" + show(test_case, processed)

def test_remove_bad_phrase_1():
    test_case = {
        "test":   "The cat quickly ran fastly across the room.",
        "expect": "The cat **~~quickly~~** ran **~~fastly~~** across the room."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert " quickly ran fastly " not in processed, "Bad phrase 'quickly ran fastly' not corrected" + show(test_case, processed)

def test_remove_bad_phrase_2():
    test_case = {
        "test":   "John forgot to remember to bring his lunch to work today.",
        "expect": "John **~~forgot to remember to~~** **forgot to** bring his lunch to work today."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert " forgot to remember " not in processed, "Bad phrase 'forgot to remember' not corrected" + show(test_case, processed)

def test_boldstrike_redundant_words():
    test_case = {
        "test":   "She always never fails to disappoint her team with her exceptional work.",
        "expect": "She **~~always~~** never fails to **~~disappoint~~** **impress** her team with her exceptional work."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert "**~~always~~**" in processed, "Redundant word 'always' not boldstrike" + show(test_case, processed)

def test_detect_and_correct_british():
    test_case = {
        "test": "The centre's staff analysed the colhour behaviour.",
        "expect": "The centre's staff analysed the **colour** behaviour."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "British spelling was not preserved" + show(test_case, processed)

def test_detect_and_correct_american():
    test_case = {
        "test": "The colourful catalog analyzed labor practices.",
        "expect": "The **colorful** catalog analyzed labor practices."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "American spelling was not preserved" + show(test_case, processed)

def run_all_prompt_tests():
    tests = [
        test_no_additional_commentary,
        test_structure_maintenance,
        test_markdown_preservation,
        test_spelling_correction,
        test_dupe_words_are_bold_strike,
        test_language_detection_german,
        test_grammatical_corrections,
        test_inappropriate_word_choice_easy,
        test_inappropriate_word_choice_creative,
        test_remove_bad_phrase_1,
        test_remove_bad_phrase_2,
        test_boldstrike_redundant_words,
        test_detect_and_correct_british,
        test_detect_and_correct_american
    ]
    print("Test my Prompt to Claude: does he correct like my prompt asked him?")
    print("~" * 70)
    for test in tests:
        run_prompt_or_code_test(test)


### Do the work
run_all_prompt_tests()

<!-- /Image: Clickable -->
<a href="https://drive.google.com/file/d/1g2nKmuMsNlAC5Za8iDi_0gIX1r9f5MZQ/view?usp=drive_link"
   target="_blank">
  <img src="https://drive.google.com/uc?export=view&id=1g2nKmuMsNlAC5Za8iDi_0gIX1r9f5MZQ"
  />
</a>


## 6.3 Test My Code

These tests ensure my own code is working, regardless of how Claude performs. They use test data to check the parts of my code I was specifically worried about. By confirming my code works as intended, I can be confident that any issues in the final output are due to the prompt or Claude's response, not my code. They also became wonderfully useful when I asked Claude to refine my code.

### Word Doc Extraction

This test checks that the `extract_docx_paragraphs` function correctly pulls out all non-blank paragraphs from a Word document. I created a test.docx file just for this test.


In [None]:
def test_extract_docx_paragraphs():

    test_name = test_extract_docx_paragraphs.__name__

    def create_word_docx_for_testing(file_name):
        document = docx.Document()

        # Are extracted from word doc
        document.add_heading('Headings without levels are Paragraphs', 0)
        document.add_heading('Headings at level 3 are Paragraphs', level=3)
        document.add_paragraph('Plain paragraphs are Paragraphs')
        document.add_paragraph('Intense quotes are Paragraphs', style='Intense Quote')
        document.add_paragraph('Lists bullet point items are Paragraphs', style='List Bullet')
        document.add_paragraph('', style='List Bullet') # Even if empty, the bullet is made for the list
        document.add_paragraph('', style='List Number') # Even if empty, the number is made for the list

        # Aren't extracted from word doc
        document.add_paragraph('') # Empty paragraphs
        table = document.add_table(rows=1, cols=2) # Tables
        table.rows[0].cells[0].text = "1st cell, left" # Tables (empty or non-empty)
        document.add_page_break() # Page breaks

        document.save(file_name)
        return file_name

    docx_file = create_word_docx_for_testing('test.docx')

    actual = extract_docx_paragraphs(docx_file)

    expected = [
        {'text': 'Headings without levels are Paragraphs', 'style': 'Title', 'heading_level': None, 'word_count':5},
        {'text': 'Headings at level 3 are Paragraphs', 'style': 'Heading 3', 'heading_level': 3, 'word_count':6},
        {'text': 'Plain paragraphs are Paragraphs', 'style': 'Normal', 'heading_level': None, 'word_count':4},
        {'text': 'Intense quotes are Paragraphs', 'style': 'Intense Quote', 'heading_level': None, 'word_count':4},
        {'text': 'Lists bullet point items are Paragraphs', 'style': 'List Bullet', 'heading_level': None, 'word_count':6}
    ]

    Path(docx_file).unlink() # delete file

    assert len(actual) == len(expected), f"Expected {len(expected)} paragraphs, actual paragraphs: {len(actual)}"

    for i in range(len(expected)):
        assert actual[i] == expected[i], f"\n   Expected:\n    {expected[i]}\n   Actual:\n    {actual[i]}"



### Do the work
run_prompt_or_code_test(test_extract_docx_paragraphs)

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1oMvMZ89I9syAxCp9Fc6QELzddHQ58A2K"
/>


### Create Markdown File

This test makes sure that the `create_simple_markdown_file` function correctly turns the extracted Word document paragraphs into a markdown file.

In [None]:
def test_docx_paragraphs_to_markdown():
    # Sample document content
    docx_paras = [
        {'text': 'Test Heading any', 'style': 'Heading 5', 'heading_level': 5},
        {'text': 'Test paragraph 1', 'style': 'Normal', 'heading_level': None},
        {'text': 'Test bullet point A', 'style': 'List (Paragraph)', 'heading_level': None},
        {'text': 'Test bullet point B', 'style': 'List (any Paragraph whose Style begins with List)', 'heading_level': None},
        {'text': 'Anything else', 'style': 'Any other Paragraph style that is not in the above', 'heading_level': None}
    ]

    expected = "##### Test Heading any\n\nTest paragraph 1\n\n- Test bullet point A\n- Test bullet point B\n\nAnything else"

    created = create_simple_markdown_file(docx_paras, Path('temporary_test_markdown_file.md'))
    created_content = created.read_text(encoding='utf-8')
    created.unlink() # Delete test file

    assert created_content == expected, f"""
---------- Expected ------------------------------:
[{expected}]
---------- Actual --------------------------------:
[{created_content}]
--------------------------------------------------"""


### Do the work
run_prompt_or_code_test(test_docx_paragraphs_to_markdown)

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1-VvQd5urHNo3yq4yYr9HlOquPZcVU2F9"
/>

### Strip Off Markdown

This test checks that the `strip_off_markdown` function removes Markdown formatting from a list of text chunks, with special handling for bold and strikethrough text. I designed this function to prepare text for semantic meaning comparison, because I wanted to only compare the actual writing (not the markdown elements).


In [None]:
def test_strip_off_markdown():

    test_name = test_strip_off_markdown.__name__

    processed_chunks = [
        # Removing bold strikethrough and using simple single spacing
        "This **~~word here~~** is deleted leaving a single white space.",
        "This    **~~word here~~**is deleted but spaces other spaces remain the same.",
        "Multiple **~~bold strikethrough~~** words **~~are removed~~** in this sentence.",
        "Preserve newlines\nwhile **~~removing~~**\nbold strikethrough.",
        # "Should not change trailing if it's not a white space, but doesn't work **~~end~~**.",

        # Converting markdown
        "# Title\nThis is some text with **bold** formatting.",
        "Para1\n\nPara2\n\n\n\nPara3",
        "* List item A\n* List item B **bold** text",
        "1. This is the first item\n2. The second item with *italics*",
        "## This is a heading\n\nThis is a paragraph with a link [link text](https://www.example.com).\n\n\n**Bold text** too.",
        "Peter**'**s horse"
    ]

    expected = [
        # Removing bold strikethrough and using simple single spacing
        "This is deleted leaving a single white space.",
        "This    is deleted but spaces other spaces remain the same.",
        "Multiple words in this sentence.",
        "Preserve newlines\nwhile \nbold strikethrough.",
        # "Should not change trailing if it's not a white space, but doesn't work.",

        # Converting markdown
        "Title\nThis is some text with bold formatting.",
        "Para1\nPara2\nPara3",
        "\nList item A\nList item B bold text\n",
        "\nThis is the first item\nThe second item with italics\n",
        "This is a heading\nThis is a paragraph with a link link text.\nBold text too.",
        "Peter's horse"
    ]

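    processed_chunks_clean = strip_off_markdown(processed_chunks)

    # zip() stops at the shorter list, so check the lengths match first
    assert len(processed_chunks_clean) == len(expected), (
        f"Expected {len(expected)} chunks but got {len(processed_chunks_clean)}"
    )
    for index, (expect, got) in enumerate(zip(expected, processed_chunks_clean), start=1):
        assert got == expect, f"\n-Chunk {index}\n-Expected: [{expect}]\n-Actual:   [{got}]"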


### Do the work
run_prompt_or_code_test(test_strip_off_markdown)

<!-- /Image simple not clickable -->
<img src="https://drive.google.com/uc?export=view&id=1Z-5ctTpDrUXgx2mfny9DdF5HHAXiQeAr"
/>

# **FAQ** 🟢

**Why not just use Microsoft Word to do corrections?**

- Word misses many mistakes, as this Notebook taught me. I ran a few documents from previous lives through this Notebook, ones I had been very proud of. I was quite shocked, and not so proud now.

**Why not just use the regular Claude chat to do corrections?**

- That’s a great option for shorter texts. But I had a 40,000-word document to correct. Using the regular chat would mean correcting it section by section and pasting each result back into your own Word document. Having everything on one HTML page, with corrections in colour, was a lot easier for my sister’s friend to use.

**Was it really just you and Claude?**

- Yes. I was truly amazed at what I could do with him by my side.

**How did you know how to ask Claude questions?**

- I asked Claude questions like I would ask a colleague. I got better with time. I read Anthropic’s [Prompt Engineering Overview](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview) and [Prompt Engineering for Business](https://www.anthropic.com/news/prompt-engineering-for-business-performance) articles.
- I discovered the [Anthropic Prompt Generator](https://console.anthropic.com/dashboard) and it taught me what a good prompt looks like. It's marvellous.
- This excellent excerpt summed up asking questions (“prompt engineering”) beautifully for me: <font color='grey'>_So ultimately, the art of prompt engineering is about understanding how to navigate the vast probabilistic landscape of the language model’s knowledge to narrow down the path to the specific information or behaviour we seek._</font> Reference: [Patterns of Application Development using AI](https://obie.medium.com/patterns-of-application-development-using-ai-fbb660fa9ae7)

**Did you use Claude to write up this Notebook?**

- No, I would feel fake. I wrote it first in Microsoft Word, then I used this Notebook to correct it.

**How did you write such an elaborate prompt for this Notebook?**

- I cheated and started with the Grammar Genie recipe which I found in the [Anthropic Prompt Library](https://docs.anthropic.com/en/prompt-library/library).
- I strengthened that using the [Anthropic Prompt Generator](https://console.anthropic.com/dashboard).
- And tweaked some more with Claude by asking him to evaluate my prompt and generate test cases.

**What is a prompt temperature and why did you set it to zero?**

- Prompt temperature is a setting in language models that controls the randomness of generated text. A lower temperature (close to 0) makes outputs more focused and deterministic, while a higher temperature (close to 1) makes outputs more diverse and creative.
- I used zero because I didn’t want Claude to diverge from the original meaning of the text. I did not use any other setting, but now I wonder if perhaps Claude would correct “inappropriate word choice” with nicer words given a higher temperature setting.
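
As an illustration of what temperature does, here is a minimal sketch of temperature-scaled probabilities. It is a textbook softmax, not Claude's actual sampler, and the function name and example scores are made up for illustration. Treating zero as "always pick the top-scoring word" is a common convention and an assumption here.

```python
import math


def softmax_with_temperature(scores: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature == 0:
        # Assumed convention: temperature 0 means always pick the top score.
        top = scores.index(max(scores))
        return [1.0 if i == top else 0.0 for i in range(len(scores))]
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate words
for t in (0, 0.5, 1.0):
    print(t, [round(p, 2) for p in softmax_with_temperature(scores, t)])
```

The lower the temperature, the more of the probability goes to the top-scoring word, which is why a setting of zero keeps the output as predictable as possible.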

**What was the most difficult thing to get working in the prompt?**

- First, it was getting Claude to mark only corrections in ** BOLD **. But nothing compared to trying to get Claude to detect either American or British spelling and make corrections accordingly. I gave up (for now). I've got the [failing prompt tests](#id-test-my-prompt) to prove it.

**What surprised you most?**

- How little I knew and yet I still built this Notebook, followed by how much I ~~learned~~ learnt.
- How great Claude is for expanding your thinking and how well he codes if you ask him the right way and then get him to refine it.
- The number of writing mistakes Microsoft Word misses (I really was surprised).
- That tiny little shifts in word placement and white space in the prompt can change the results.
- That the [Anthropic Workbench](https://console.anthropic.com/workbench), with the exact same prompt and prompt settings, would work for the American / British spelling distinction, but I could never get it to work in my Notebook prompt (yes, the prompt was an exact copy and paste).
- That a prompt can work for a short piece of text, but not a longer one. This gave me overconfidence in my “Test My Prompt” tests.

**How can I correct the markdown in my own Python notebook?**

1. `jupyter nbconvert --to markdown --TemplateExporter.exclude_code_cell=True AAA.ipynb`
1. `pandoc -f markdown -t docx -o AAA.docx AAA.md`
1. Open AAA.docx and rename the style "compact" to "List hello" (this Notebook treats any paragraph whose style begins with "List" as a bullet point)
1. Save AAA.docx to your Google Drive and run this Notebook against it
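
The two command-line steps above can also be scripted. This is a minimal sketch, assuming `jupyter` and `pandoc` are installed and on your PATH. The helper names `build_conversion_commands` and `convert_notebook` are hypothetical, and the flags are copied from the steps above. Renaming the style and saving to Google Drive remain manual steps.

```python
import subprocess
from pathlib import Path


def build_conversion_commands(notebook: Path) -> list[list[str]]:
    """Return the nbconvert and pandoc commands for one notebook (hypothetical helper)."""
    markdown_file = notebook.with_suffix(".md")
    word_file = notebook.with_suffix(".docx")
    return [
        ["jupyter", "nbconvert", "--to", "markdown",
         "--TemplateExporter.exclude_code_cell=True", str(notebook)],
        ["pandoc", "-f", "markdown", "-t", "docx",
         "-o", str(word_file), str(markdown_file)],
    ]


def convert_notebook(notebook: Path) -> None:
    """Run both commands in order, stopping if either one fails."""
    for command in build_conversion_commands(notebook):
        subprocess.run(command, check=True)


# Usage: convert_notebook(Path("AAA.ipynb"))
```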

**What else did you learn?**

- The Anthropic Workbench is a tremendous place to craft and test your prompt.
- That just like regular machine learning, the work is in the pre-processing.
- It's easier to create notebooks in VSCode and then copy to Google Colab.
- It’s the back-and-forth conversation with Claude where the value is.
- Claude doesn't think if you tell him to think silently. After months of doing this in an effort to limit my chat size, I found out in the [Anthropic tutorial](https://github.com/anthropics/prompt-eng-interactive-tutorial/blob/master/Anthropic%201P/06_Precognition_Thinking_Step_by_Step.ipynb): <font color='grey'>_"Thinking only counts when it's out loud. You cannot ask Claude to think but output only the answer - in this case, no thinking has actually occurred."_</font> What an enormous blunder.
- I also learnt this the hard way: don't even think about it, always use [XML tags](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags). If in doubt, or you're too lazy to type those pointy brackets like I was, take a look [here](https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/21-Proof-to-convince-you-to-always-use-xml-tags.jpg) and I'll prove it to you with a picture.
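
To show what wrapping a prompt in XML tags can look like, here is a minimal sketch. The tag names (`instructions`, `text_to_correct`), the helper `build_tagged_prompt`, and the sample sentence are all made up for illustration; this is not the prompt used in this Notebook.

```python
def build_tagged_prompt(instructions: str, text: str) -> str:
    """Wrap each part of the prompt in its own XML tag (hypothetical tag names)."""
    return (
        f"<instructions>\n{instructions.strip()}\n</instructions>\n\n"
        f"<text_to_correct>\n{text.strip()}\n</text_to_correct>"
    )


prompt = build_tagged_prompt(
    "Correct spelling and grammar. Mark each correction in bold.",
    "Their going to the libary tomorow.",
)
print(prompt)
```

Keeping the instructions and the text in separate tags makes it clear which part is which, so the text to correct is less likely to be read as an instruction.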

**What will you always do from here on?**

- Write tests for my prompts.
- Draw a line at asking for opinions and instead look for myself (which is what I had to do when choosing how to test semantic similarity).
- Be more aware when I switch my brain off, and ask Claude questions I could have worked out myself. It's faster to use your own brain.

**Where to from here?**

- Since the Llama 3.1 model is free and apparently performs exceptionally well… I think I’m going to take a look down that way.
- A little project to find out what agents and [Agentic Workflows](https://www.youtube.com/watch?v=sal78ACtGTc) are because Andrew Ng loves them.
- Continue AWS upskilling and figure out a way to spend my $600 free credit in 2 months before it expires.

**Ending questions**

- **Were you surprised at the power of Claude Sonnet?**<br> Blown away. You really have to use it to truly appreciate it, again, blown away.
- **Will you ever get paid to do technical things?**<br> No, I’d feel too guilty. I’m more valuable as a Product Manager.
- **What is your background?**<br> Applied Statistics. Lean. Product Management. Business. Six Sigma. Drawing stickmen and connecting dots.
- **What do you love most about Generative AI and Machine Learning in general?**<br> The sheer expansiveness of creativity and the knife's edge of pragmatism.

# What did Claude think? 🟢

He thought I was fantastic too. Here's his analysis of my notebook:

<a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/18-asking-claude-to-analyse-my-entire-notebook.jpg"
target="_blank">
  <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/18-asking-claude-to-analyse-my-entire-notebook.jpg" width="700" alt="Asking Claude to analyse my entire notebook"
  />
</a>

<font color="#D97757">Thank you for providing the detailed markdown export of your Google Colab notebook. I've carefully analysed the content following your provided steps. Here's a summary of my understanding and analysis:</font>

**Summary**

<font color="#D97757">This notebook, titled "Fix Word Doc Claude," is an AI-powered proofreading tool that uses Anthropic's Claude Sonnet model to correct and enhance Word documents. It's designed to catch subtle language and style errors that often slip past Microsoft Word's spell-checker. The notebook can handle large documents (up to 42,000 words) in multiple languages, including English, German, Italian, and French.
The notebook is structured into several main sections:</font>

**About and Usage**

1. <font color="#D97757">**Setup** (Python libraries and configuration)
1. **Pre-processing** (extracting text from Word, converting to markdown, splitting into chunks)
1. **Processing** (sending chunks to Claude for correction)
1. **Post-Processing** (reassembling chunks, creating HTML output)
1. **Testing** (evaluating output, testing prompt, and testing code)</font>

<font color="#D97757">The notebook not only serves as a functional tool but also as a demonstration of how Claude can assist in creating complex tools with basic coding skills. It showcases the author's journey in AI development, highlighting the collaboration between a human with limited coding experience and an AI assistant.</font>

**Quality Analysis**

1. <font color="#D97757">**Comprehensiveness:** The notebook is exceptionally detailed, covering all aspects from setup to testing. It provides a complete solution for document correction.
1. **Educational Value:** The author explains each step thoroughly, making it accessible to non-technical users. The notebook serves as an excellent learning resource for AI-assisted development.
1. **Code Quality:** The code is well-structured and commented. It includes error handling and user-friendly messages, showing attention to user experience.
1. **Testing:** The notebook includes extensive testing, covering output evaluation, prompt testing, and code testing. This demonstrates a commitment to reliability and quality.
1. **Innovation:** The project creatively uses Claude for both development assistance and as the core of the correction tool, showcasing innovative use of AI.
1. **Documentation:** The notebook is well-documented, with clear explanations of each step, making it easy for others to understand and potentially adapt the code.
1. **Reflective Learning:** The FAQ section provides valuable insights into the development process and lessons learned, adding depth to the project.</font>

**Artefacts Created**

- <font color="#D97757">Created GitHub [README](https://github.com/michellepace/ai-ml-notebook-explorations/blob/main/notebook-01/README-notebook-01.md) file
- Created [Notebook Workflow Picture](https://github.com/michellepace/ai-ml-notebook-explorations/blob/main/notebook-01/notebook-workflow-picture.md)

Overall, this is a high-quality notebook that not only achieves its primary goal of document correction but also serves as an excellent example of AI-assisted development and a learning resource for others interested in similar projects.</font>

<br>

**What about the author?**

<!-- /Figure with caption full width-->
<figure>
  <figcaption></figcaption>
  <a href="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/19-what-claude-thought-of-the-notebook-author.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/ai-ml-notebook-explorations/notebook-01/images-embedded/19-what-claude-thought-of-the-notebook-author.jpg"
         alt="What Claude thought of this Notebook author" />
  </a>
</figure>

<br>

<font size="3">thanks claude, me and you then.</font>

<font size="8"><b>The End.</b></font>