<a href="https://colab.research.google.com/github/michellepace/word-document-corrector-claude/blob/main/word_document_corrector_claude.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. ABOUT**

## Overview

Corrects large Word documents more thoroughly than Microsoft Word's built-in checker. Upload any .docx file and see the corrections in colour.
- **Deep language correction** beyond Microsoft Word's capabilities.
- **Corrects** grammar, spelling, and inappropriate word choice in colour.
- **Large documents** (tested up to 100,000 words, approx. 150 pages).
- **Multi-language:** English, German, Italian.
- **Preserves** writing style and semantic meaning.
- **Comprehensive** testing suite to validate correction integrity.

<!-- Figure Clickable 700px Width -->
<figure>
 <figcaption></figcaption>
<a href="https://michellepace.github.io/word-document-corrector-claude/images/overview.jpg"
target="_blank">
  <img src="https://michellepace.github.io/word-document-corrector-claude/images/overview.jpg"
  width="450" alt="What this notebook does" />
</a>
</figure>

## Example Output

View an example of the corrections made:

- **Input file:** [MyWordDoc.docx](https://michellepace.github.io/word-document-corrector-claude/example-files/MyWordDoc.docx) - sample Word document with various errors.
- **Output file with visual corrections:** [MyWordDoc.docx.PROCESSED.html](https://michellepace.github.io/word-document-corrector-claude/example-files/MyWordDoc.docx.PROCESSED.html)

<br>

## Solution

This notebook was created with little pre-existing knowledge, in partnership with Claude 3.5 Sonnet, which was used for:

- **Solution design:** Starting from the project goals, followed by an hour of brainstorming.
- **Technical guidance:** Choosing appropriate Python libraries.
- **Code generation:** About 60% of the code was generated by Claude.
- **Testing:** Test plan creation and test data for prompt testing.

<!-- /Figure with caption 150px width-->
<figure>
 <figcaption>Fig 1: Initial Solution Brainstorming</figcaption>
 <a href="https://michellepace.github.io/word-document-corrector-claude/images/initial-solution-brainstorming.jpg"
    target="_blank">
   <img src="https://michellepace.github.io/word-document-corrector-claude/images/initial-solution-brainstorming.jpg"
        width="150"
        alt="Dialogue to Design: Capturing our initial problem-solving conversation" />
 </a>
</figure>

<!-- /Figure with caption 300px width-->
<figure>
  <figcaption>Fig 2: Initial Solution Plan </figcaption>
  <a href="https://michellepace.github.io/word-document-corrector-claude/images/initial-solution-sketch.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/word-document-corrector-claude/images/initial-solution-sketch.jpg"
         width="300"
         alt="Initial Solution Sketch" />
  </a>
</figure>

<!-- /Figure with caption 400px width-->
<figure>
  <figcaption>Fig 3: Final Solution Diagram (generated from notebook code)</figcaption>
  <a href="https://michellepace.github.io/word-document-corrector-claude/notebook-images/final-solution-diagram.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/word-document-corrector-claude/images/final-solution-diagram.jpg"
         width="400"
         alt="Digital Drafting: Claude's visual take on our solution" />
  </a>
</figure>

<br>

# **2. SETUP**

## Python Libraries

In [1]:
# Install libraries that Colab does not have pre-installed.
# `!pip` runs as a shell command: a failed install prints pip's error output
# instead of raising a Python exception, so a try/except cannot catch it here.
# Any missing library will surface as an error in the import step below.
!pip install --upgrade-strategy only-if-needed --quiet anthropic langchain markdown numpy python-docx scikit-learn sentence-transformers strip-markdown tabulate tqdm
print("Library installation step finished.")

# Import all required libraries.
try:
    from google.colab import drive, files # Request access to your Word Document in Colab
    from google.colab import userdata # Request access to your secure Colab Secret: ANTHROPIC_API_KEY
    import docx # Read and write Word documents; extract paragraphs and create test Word docs.
    from pathlib import Path # Handle files easily and concisely
    import anthropic # Interact with Claude Model via Anthropic's API
    from langchain.text_splitter import MarkdownTextSplitter # Split markdown text into chunks
    from strip_markdown import strip_markdown # Remove markdown for content comparison
    from markdown import markdown # Convert corrected markdown file into a pretty HTML page
    import re # Analyze text using regular expressions
    from collections import Counter # To easily print test results
    from tabulate import tabulate # To easily print test results
    import numpy as np # Process numerical data for preservation scores
    from tqdm.auto import tqdm # Display progress bar for chunk processing
    from sentence_transformers import SentenceTransformer # Transform text chunks into comparable vectors of meaning
    from sklearn.metrics.pairwise import cosine_similarity # Calculates semantic similarity between text chunk vectors
    print("Success! Library imports are complete.")

except Exception as e:
    print(f"An error occurred during library importation: {str(e)}")


<a name="id-configuration"></a>

## Your Settings

**Complete Notebook Usage Steps:**
1. Top left corner of Notebook: **File > Save a copy in Drive.**
1. Top left corner of Notebook: **Runtime > Run all.**
1. Follow the steps in the cell below: "**Set Word Document ⭐**"
1. Follow the steps in the next cell: "**Set Secret API key ⭐**"
1. Top left corner of Notebook: **Runtime > Run cell and below**
1. Wait for the notebook to finish running all the code.
1. Find the corrections in the same folder as your Word document.

<br>

In [2]:
# @title Set Word Document ⭐ { vertical-output: true, display-mode: "form" }
#@markdown **Instructions:**

#@markdown **Step 1:** Upload the Word document to your [Google Drive](https://drive.google.com/drive/my-drive)<br>
#@markdown **Step 2:** Enter the full path of the Word document file (see Example below)<br>
#@markdown **Step 3:** Run this cell by clicking the little "play" icon just under the title<br>

#@markdown **Example:**<br>
#@markdown - Google Drive files ALWAYS start with: <font color='#FF1493'>/content/​drive/​MyDrive/</font><br>
#@markdown - For example: `/content/drive/MyDrive/my-animal-folder/pony.docx`<br><br>

my_input_docx_file = '/content/drive/MyDrive/TEST/Word.docx' #@param {type:"string", placeholder:"(here is an example)   /content/drive/MyDrive/MyWordDoc.docx"}

def validate_google_drive_docx(file_path: str) -> Path:
    """
    Validate the input Word document file in Google Drive.

    :param file_path: String path to Word document on Google Drive
    :return: Path object of the validated file
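    :raises ValueError: If the path is empty, or the file is not a .docx, is empty, or can't be opened
    :raises PermissionError: If access to Google Drive is denied
    :raises FileNotFoundError: If the file doesn't exist at that path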
    """

    try:
        # Check if input is empty
        if not file_path.strip():
            raise ValueError("No file path provided in input box")

        # Mount Google Drive if not already mounted
        if not Path('/content/drive').exists():
            try:
                drive.mount('/content/drive')
            except Exception as e:
                if "credential propagation was unsuccessful" in str(e).lower():
                    raise PermissionError("You denied access to Google Drive.")
                else:
                    raise

        file_path = Path(file_path)

        # Check if the file exists
        if not file_path.exists():
            raise FileNotFoundError(f"This file doesn't exist in your Google Drive: '{file_path}'")

        # Check if the file has a .docx extension
        if file_path.suffix.lower() != '.docx':
            raise ValueError(f"This file doesn't have a .docx extension: '{file_path}'")

        # Check if the file is not empty
        if file_path.stat().st_size == 0:
            raise ValueError(f"This file is empty: '{file_path}'")

        # Check if the file can be opened as a Word document
        try:
            docx.Document(file_path)
        except Exception as e:
            raise ValueError(f"This file couldn't be opened: '{file_path}'. Error: {str(e)}")

        print("Success!")
        print(f"• Word document found: {file_path.absolute()}")
        print(f"• I'll be sending this to Claude for correction")
        return file_path

    except ValueError as ve:
        print(f"🛑 Error: {str(ve)}")
        print("To fix:")
        if str(ve) == "No file path provided in input box":
            print(" 1. You must input your Word document file name in the input box above.")
            print(" 2. Read and follow each step above.")
        else:
            print(" 1. Ensure the file is a valid .docx document")
            print(" 2. Check that the file is not corrupted or empty")
        print("🛑🛑🛑\n")
        raise
    except PermissionError as pe:
        print(f"🛑 Error: {str(pe)}")
        print("To fix:")
        print(" 1. Rerun this block and click \"Connect to Google Drive\"")
        print(" Worried about safety? Save your own copy of this Notebook and run that.")
        print("🛑🛑🛑\n")
        raise
    except FileNotFoundError as fnf:
        print(f"🛑 Error: {str(fnf)}")
        print("To fix:")
        print(" 1. In Input Box instructions, look at the pink Example given")
        print(" 2. Verify your file exists in your Google Drive on that exact path")
        print(" 3. Remember, file paths and names are case-sensitive")
        print("🛑🛑🛑\n")
        raise
    except Exception as e:
        print("🛑 Unexpected error occurred")
        print(f"Error details: {str(e)}")
        print("Sorry... really don't know how you got here.")
        print("🛑🛑🛑\n")
        raise


### Do the work
try:
    my_input_docx_file = validate_google_drive_docx(my_input_docx_file)
except Exception:
    print("Please update the file path and run this block again.")
    raise

Mounted at /content/drive
Success!
• Word document found: /content/drive/MyDrive/TEST/Word.docx
• I'll be sending this to Claude for correction


In [3]:
# @title Set Secret API key ⭐ { vertical-output: true, display-mode: "form" }
#@markdown **Instructions:**<br>

#@markdown **Step 1:** Create an [Anthropic API key](https://console.anthropic.com/settings/keys).<br>
#@markdown **Step 2:** Run this cell by clicking the 'play' icon just under the title.<br>
#@markdown **Step 3:** You will be guided to set up a Colab secret if you don't have one.<br>

#@markdown

anthropic_api_secret_name = 'ANTHROPIC_API_KEY'  # @param {type: "string"}

def test_anthropic_connection(anthropic_client: anthropic.Anthropic) -> None:
    my_test_prompt = "Hello Claude, have I connected to you? (answer briefly!)"
    print()
    try:
        message = anthropic_client.messages.create(
            model="claude-3-5-sonnet-latest",
            max_tokens=20,
            messages=[{"role": "user", "content": my_test_prompt}]
        )
        print("Success!\nAPI key is valid and working.")
        print(f"• My prompt was:  {my_test_prompt}")
        print(f"• Claude responded: {message.content[0].text}")

    except anthropic.APIError as e:
        print(f"API error occurred: {e}")
        raise KeyboardInterrupt("Connection test failed. Stopping execution 🛑.") from e
    except Exception as e:
        print(f"Unexpected error occurred: {e}")
        raise KeyboardInterrupt("Connection test failed. Stopping execution 🛑.") from e


def validate_anthropic_api_key_format(api_key):
    if not api_key.startswith('sk-'):
        raise ValueError("Anthropic API keys start with \"sk-\"")
    if ' ' in api_key:
        raise ValueError("Anthropic API keys don't have white spaces.")
    if len(api_key) <= 100:
        raise ValueError("Anthropic API keys are longer than 100 characters.")


def get_anthropic_api_key(secret_name):
    try:
        api_key = userdata.get(secret_name)
        validate_anthropic_api_key_format(api_key)
        print("Success!")
        print(f'Your Colab secret "{secret_name}" was found.')
        print(f"• If the secret contains a valid API Key, we can connect to Claude.")
        print(f'• To change API Key: Click the "key" icon in the left-hand side panel, delete "{secret_name}", rerun this block.')
        return api_key

    except userdata.SecretNotFoundError:
        print(f"🛑 Error: Colab secret '{secret_name}' not found in your Colab environment")
        print(" To fix:")
        print(f" 1. Click the \"key\" icon on the left of this Notebook")
        print(f" 2. Add new secret with name '{secret_name}'")
        print(f" 3. Set value to Anthropic API key from: https://console.anthropic.com/settings/keys")
        print(f" 4. Rerun this block and follow next instructions")
        print(" About Colab secrets: https://bit.ly/4cad0v7")
        print("🛑🛑🛑\n")
        raise
    except userdata.NotebookAccessError:
        print(f"🛑 Error: You denied this Notebook access to your Colab secret '{secret_name}'")
        print(" To fix:")
        print(" 1. Rerun this block and click \"Grant access\"")
        print(" About Colab secrets: https://bit.ly/4cad0v7")
        print(" Worried about safety? Save your own copy of this Notebook and run that.")
        print("🛑🛑🛑\n")
        raise
    except ValueError as ve:
        print(f"🛑 Error: Invalid format, {str(ve)}")
        print(" To fix:")
        print(" 1. Click the \"key\" icon on the left of this Notebook")
        print(f" 2. Delete '{secret_name}'")
        print(" 3. Rerun this block and follow the next instructions")
        print("🛑🛑🛑\n")
        raise
    except Exception as e:
        print("🛑 Unexpected error occurred")
        print(" Please check:")
        print(f" 1. '{secret_name}' secret exists in Colab (click \"key\" icon on the left)")
        print(" 2. Secret value is a valid Anthropic API key")
        print(" Get API key: https://console.anthropic.com/settings/keys")
        print(" About Colab secrets: https://bit.ly/4cad0v7")
        print("🛑🛑🛑\n")
        raise


### Do the work
MY_ANTHROPIC_API_KEY = get_anthropic_api_key(anthropic_api_secret_name)

anthropic_client = anthropic.Anthropic(
    api_key=MY_ANTHROPIC_API_KEY, # From: "Set Secret API key ⭐"
    max_retries=2, # Maximum retry attempts per request
    timeout=10, # Timeout in seconds for each request
)

test_anthropic_connection(anthropic_client)

Success!
Your Colab secret "ANTHROPIC_API_KEY" was found.
• If the secret contains a valid API Key, we can connect to Claude.
• To change API Key: Click the "key" icon in left handside panel, delete "ANTHROPIC_API_KEY", rerun this block.

Success!
API key is valid and working.
• My prompt was:  Hello Claude, have I connected to you? (answer briefly!)
• Claude responded: Yes, I'm here and ready to chat!


## All Other Settings

- Output files

In [4]:
# Output Files (not AI Settings)
original_md_file = Path(f"{my_input_docx_file}.ORIG.md") # Word doc extracted into markdown format.
processed_md_file = Path(f"{my_input_docx_file}.PROCESSED.md") # Word doc with all the corrections, in markdown format.
processed_html_file = Path(f"{my_input_docx_file}.PROCESSED.html") # Word doc corrections, in HTML format for easy reading.

- AI settings (excluding prompt)

In [5]:
# Chunking Word Doc. Splitter preserves markdown elements.
TARGET_CHUNK_CHARACTERS = 4400 # (approx. 600 words)

# Testing similarity between original and processed chunks
# (a sentence-transformers embedding model, despite the variable name)
SIMILARITY_LARGE_LANGAUGE_MODEL = 'paraphrase-multilingual-mpnet-base-v2'

# Everything Claude:
CLAUDE_MODEL = "claude-3-5-sonnet-latest"
PROMPT_TEMP = 0  # Low temperature for more probable and consistent output (0 to 1)

MY_ANTHROPIC_CLIENT = anthropic.Anthropic(
    api_key=MY_ANTHROPIC_API_KEY, # From: "Set Secret API key ⭐"
    max_retries=2,  # Maximum retry attempts per API request (text chunk)
    timeout=20.0   # Timeout in seconds for each individual API request (text chunk)
)

- AI settings (prompt for Claude)

In [6]:
PROMPT_TEMPLATE = """CRITICAL: PROVIDE ONLY THE CORRECTED TEXT WITHOUT ANY ADDITIONAL COMMENTARY.

Your task is to take the provided text and rewrite it into a clear, grammatically correct version while preserving the original meaning as closely as possible. Correct any spelling mistakes, punctuation errors, verb tense issues, word choice problems, and other grammatical mistakes.

MANDATORY INSTRUCTIONS:

1. Determine and use the same linguistic language as the original text (e.g., English, German)
2. Preserve all existing markdown formatting, including heading levels, paragraphs, and lists
3. Make necessary grammatical corrections, including spelling, punctuation, verb tense, word choice, and other grammatical issues. Only make stylistic changes if essential for clarity
4. Mark corrections with markdown syntax, apply one of these choices only:
   - For changed text use bold: e.g., **changed** and **multiple changed words**
   - For new text use bold: **new words**
   - For removed text use bold strikethrough: **~~removed words~~**
5. Maintain the original structure:
   - Don't add new lines of text
   - Don't include additional commentary at all
   - Don't convert markdown elements to different types
6. For ambiguous corrections, choose the option that best preserves original meaning and style
7. Ensure consistency in corrections throughout the text
8. Return the corrected text in markdown syntax
9. DO NOT add any explanations, introductions, or conclusions to your response

FINAL REMINDER: Your output should consist SOLELY of the corrected text. Do not include phrases like "Here is the corrected text" or any other form of commentary.

The text to be corrected is provided between the triple tildes (~~~):

~~~
{the_markdown_chunk}
~~~

REMEMBER: Provide ONLY the corrected text without any additional words or explanations."""
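The prompt defines a small markup for corrections: bold for changed or new text, and bold strikethrough for removed text. As a minimal sketch, assuming Claude follows that convention exactly, the markup in a response can be tallied with two regular expressions. `count_marked_corrections` is a hypothetical helper, not part of this notebook. Because the text extracted from Word carries no bold formatting, any `**` in a response should come from Claude's markup.

```python
import re

# Removed text: bold strikethrough, e.g. **~~old words~~**
REMOVED_PATTERN = re.compile(r"\*\*~~(.+?)~~\*\*")
# Changed or new text: bold only, e.g. **new words**
BOLD_PATTERN = re.compile(r"\*\*(.+?)\*\*")


def count_marked_corrections(corrected_chunk: str) -> dict[str, int]:
    """Count removals and changes/additions marked up in a corrected chunk."""
    removed = REMOVED_PATTERN.findall(corrected_chunk)
    # Strip removals first so their ** markers are not counted again as bold.
    without_removals = REMOVED_PATTERN.sub("", corrected_chunk)
    changed_or_new = BOLD_PATTERN.findall(without_removals)
    return {"removed": len(removed), "changed_or_new": len(changed_or_new)}


sample = "She **went** to the shop **~~yesterday~~** and **bought** bread."
print(count_marked_corrections(sample))  # {'removed': 1, 'changed_or_new': 2}
```

Changed and new text share the same bold markup in the prompt, so this sketch cannot tell them apart.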

In [7]:
print("Success! All settings configured.")

Success! All settings configured.


---

<br>

# **3. PRE-PROCESSING**

A Word document can't be sent to Claude as-is. The steps below convert the document into small chunks of markdown, which are then sent to Claude for correction one chunk at a time.

<!-- /Figure with caption 500px width-->
<figure>
  <figcaption>Fig 4: Pre-processing up to "little chunks"</figcaption>
  <a href="https://michellepace.github.io/word-document-corrector-claude/images/initial-solution-sketch.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/word-document-corrector-claude/images/initial-solution-sketch.jpg"
         width="500"
         alt="Initial Solution Sketch" />
  </a>
</figure>

## Extract Word doc text

- Preserve the same headings, bullet lists, and paragraphs as in the Word document.
- For simplicity, ignore everything else like text found in tables, images, charts, headers and footers.

In [8]:
def extract_docx_paragraphs(docx_file: Path) -> list[dict]:
    """Extract non-empty paragraphs from Word document"""
    doc = docx.Document(docx_file)

    def get_heading_level(para):
        if para.style.name.startswith('Heading'):
            try:
                return int(para.style.name.split()[-1])
            except ValueError:
                return None
        return None

    return [{
        'text': para.text,
        'style': para.style.name,
        'heading_level': get_heading_level(para),
        'word_count': len(para.text.split())
    } for para in doc.paragraphs if para.text.strip()]

def print_summary(paragraphs: list[dict], docx_file: Path) -> None:
    """Print document summary with style statistics"""
    # Calculate stats and build table
    style_stats = {}
    for style in set(p['style'] for p in paragraphs):
        style_paras = [p for p in paragraphs if p['style'] == style]
        style_stats[style] = {
            'level': style_paras[0]['heading_level'],
            'para_count': len(style_paras),
            'words': sum(p['word_count'] for p in style_paras)
        }

    rows = [[
        style,
        stats['level'] or '',
        f"{stats['para_count']:,}",
        f"{stats['words']:,}"
    ] for style, stats in sorted(style_stats.items())]

    print(f"Success! • Extracted text from: {docx_file.name}\n")
    print("EXTRACTION SUMMARY")
    print(f" Total styles: {len(style_stats)}")
    print(f" Total paragraphs: {len(paragraphs):,}")
    print(f" Total word count: {sum(stats['words'] for stats in style_stats.values()):,}\n")


    print(tabulate(
        rows,
        headers=['Style', 'Heading Level', 'Paragraphs', 'Words'],
        tablefmt='simple',
        colalign=('left', 'center', 'right', 'right')
    ))

# Use the functions
docx_paragraphs = extract_docx_paragraphs(my_input_docx_file)
print_summary(docx_paragraphs, my_input_docx_file)

Success! • Extracted text from: Word.docx

EXRACTION SUMMARY
 Total styles: 4
 Total paragraphs: 117
 Total word count: 3,506

Style            Heading Level     Paragraphs    Words
--------------  ---------------  ------------  -------
Heading 1              1                    7       35
Heading 2              2                   16       66
List Paragraph                             21      242
Normal                                     73    3,163


## Convert to Markdown


Convert the extracted text into a [markdown](https://markdownguide.offshoot.io/getting-started) file.

Markdown format was chosen because:
1. It preserves the Word document's structure (headings, lists, paragraphs).
1. It's a clean format that large language models can read efficiently (unlike HTML).
1. The splitter can mostly avoid breaking a heading section across multiple chunks, so Claude has more context for its corrections.
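Because headings, list items and paragraphs each have a distinct markdown form, the structure of a chunk can be summarised as a sequence of element types. This is a minimal sketch, assuming the simple markdown this notebook writes (`#` headings, `- ` list items, plain paragraphs separated by blank lines); `markdown_structure` is a hypothetical helper that could be used to check that a corrected chunk kept the same structure as its original.

```python
import re


def markdown_structure(markdown_text: str) -> list[str]:
    """Return one label per non-empty line: 'h1'..'h6', 'list' or 'para'."""
    labels = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines only separate blocks
        heading = re.match(r"(#{1,6}) ", stripped)
        if heading:
            labels.append(f"h{len(heading.group(1))}")
        elif stripped.startswith("- "):
            labels.append("list")
        else:
            labels.append("para")
    return labels


original = "# Title\n\nSome text with eror.\n\n- item one\n- item too"
corrected = "# Title\n\nSome text with **error**.\n\n- item one\n- item **two**"
print(markdown_structure(original))  # ['h1', 'para', 'list', 'list']
print(markdown_structure(original) == markdown_structure(corrected))  # True
```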

In [9]:
def create_simple_markdown_file(paragraphs: list[dict], markdown_file: Path) -> Path:
    """ Convert paragraphs to Markdown format and save to a file.
    :param paragraphs: List of dictionaries containing paragraph information
    :param markdown_file: Path to save the generated Markdown file
    :return: Path of the created Markdown file
    """
    def format_paragraph(para):
        if para['heading_level'] is not None:
            return f"{'#' * para['heading_level']} {para['text']}"
        elif para['style'].startswith('List'):
            return f"- {para['text']}"
        else:
            return para['text']

    # Filter out empty paragraphs and format the rest
    formatted_paragraphs = [format_paragraph(para) for para in paragraphs if para['text'].strip()]

    markdown_content = []
    for i, current_para in enumerate(formatted_paragraphs):
        if i > 0:
            prev_is_list = formatted_paragraphs[i-1].startswith("- ")
            current_is_list = current_para.startswith("- ")
            # Single newline for consecutive list items, double for others
            markdown_content.append("\n" if prev_is_list and current_is_list else "\n\n")

        markdown_content.append(current_para)

    # Join paragraphs and normalise spacing:
    markdown_content = "".join(markdown_content)
    # Remove consecutive empty lines and strip leading/trailing whitespace
    markdown_content = re.sub(r'\n{3,}', '\n\n', markdown_content.strip())

    markdown_file.write_text(markdown_content, encoding='utf-8')
    return markdown_file


### Do the work
original_md_file = create_simple_markdown_file(docx_paragraphs, original_md_file)
print(f'Success!\n• Word document text saved as a markdown file\n• It is here: "{original_md_file.absolute()}"')

Success!
• Word document text saved as a markdown file
• It is here: "/content/drive/MyDrive/TEST/Word.docx.ORIG.md"


## Split into Chunks

While Claude 3.5 Sonnet can return up to 8,192 output tokens (~6,000 words) per response, smaller chunks make it easier to compare the original and processed text.

**Character-Based Chunking**

I use character-based chunking instead of token-based chunking to avoid breaking markdown elements (headings, lists, etc.). The markdown splitter used here, LangChain's `MarkdownTextSplitter`, measures chunk size in characters.
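To illustrate how a character limit can coexist with keeping markdown elements whole, here is a simplified sketch that greedily packs whole blocks (headings, paragraphs, runs of list items, separated by blank lines) into chunks up to a character limit. It is not LangChain's algorithm: `MarkdownTextSplitter` recursively tries a longer list of markdown-aware separators. `pack_blocks` is a hypothetical name, and in this sketch a single block longer than the limit is kept whole rather than split.

```python
def pack_blocks(markdown_text: str, max_chars: int) -> list[str]:
    """Greedily pack blank-line-separated blocks into chunks of <= max_chars.

    A block longer than max_chars on its own becomes its own (oversized)
    chunk instead of being cut mid-sentence.
    """
    separator = "\n\n"
    blocks = [b for b in markdown_text.split(separator) if b.strip()]
    chunks: list[str] = []
    current = ""
    for block in blocks:
        candidate = block if not current else current + separator + block
        if len(candidate) <= max_chars:
            current = candidate  # block fits: keep growing this chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = block  # start a new chunk with this block
    if current:
        chunks.append(current)
    return chunks


text = "# Intro\n\nFirst paragraph.\n\n## Part\n\n- a\n- b\n\nLast paragraph."
for chunk in pack_blocks(text, max_chars=30):
    print(repr(chunk))
```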

**Target Chunk Size**

A starting guess of 600 words (just over one page) was used. Then, based on token analysis using the Anthropic Tokenizer (Figure 5), I determined:
- Target: 600 words (~4,400 characters) per chunk.
- Total prompt size: prompt template (269 words) + chunk (600 words) ≈ 1,100 tokens, far below Claude 3.5 Sonnet's 200K-token context window.
- Because the prompt instructs Claude to preserve meaning and add no commentary, each response should be about the size of its chunk, well under the 8,192-token output limit.
- Set target chunk size: 4,400 characters.
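The budget above can be reproduced in a few lines. This back-of-the-envelope sketch uses only the ratios implied by those figures (600 words ≈ 4,400 characters; 269 + 600 words ≈ 1,100 tokens). They are assumptions, not tokenizer output; the notebook measures real token counts through the Anthropic API further below. It also assumes a corrected chunk is about as long as the original.

```python
# Ratios implied by the estimates above (rough assumptions, not tokenizer output).
CHARS_PER_WORD = 4400 / 600            # ~7.3 characters per word, incl. spaces/markdown
TOKENS_PER_WORD = 1100 / (269 + 600)   # ~1.27 tokens per word
MAX_OUTPUT_TOKENS = 8192               # Claude 3.5 Sonnet output limit per response
TEMPLATE_WORDS = 269


def estimate_prompt_tokens(chunk_chars: int) -> int:
    """Estimate input tokens for one request: prompt template + one chunk."""
    chunk_words = chunk_chars / CHARS_PER_WORD
    return round((TEMPLATE_WORDS + chunk_words) * TOKENS_PER_WORD)


def estimate_response_tokens(chunk_chars: int) -> int:
    """Estimate output tokens, assuming the corrected chunk is about as long as the original."""
    return round(chunk_chars / CHARS_PER_WORD * TOKENS_PER_WORD)


prompt_tokens = estimate_prompt_tokens(4400)
response_tokens = estimate_response_tokens(4400)
print(prompt_tokens, response_tokens, response_tokens < MAX_OUTPUT_TOKENS)
```

The `**` and `~~` correction markup adds some tokens on top of the estimated response, but at this chunk size there is a wide margin before the output limit.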

<!-- /Figure with caption full width-->
<figure>
  <figcaption>Fig 5: Table for estimating conversion between words, characters, and Anthropic tokens</figcaption>
  <a href="https://michellepace.github.io/word-document-corrector-claude/images/text-analysis-with-tokeniser.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/word-document-corrector-claude/images/text-analysis-with-tokeniser.jpg"
         alt="Estimating conversion rates between words, characters, and Anthropic Tokens" />
  </a>
</figure>

**Note:** While smaller chunks increase API costs due to more "prompt template" submissions, they improve quality control by making semantic changes easier to spot.
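This trade-off can be put into numbers. A minimal sketch, assuming the rough ratio implied by the earlier estimate (269 + 600 words ≈ 1,100 tokens) and a 100,000-word document (the largest size tested): because the template is re-sent with every chunk, halving the chunk size roughly doubles the template tokens paid for. `template_overhead` is a hypothetical helper.

```python
import math

# Rough ratio implied by the estimate above: 269 + 600 words ≈ 1,100 tokens.
TOKENS_PER_WORD = 1100 / (269 + 600)
# Prompt template overhead re-sent with every request (~341 tokens).
TEMPLATE_TOKENS = round(269 * TOKENS_PER_WORD)


def template_overhead(document_words: int, chunk_words: int) -> tuple[int, int]:
    """Return (number of requests, input tokens spent re-sending the template)."""
    n_requests = math.ceil(document_words / chunk_words)
    return n_requests, n_requests * TEMPLATE_TOKENS


for chunk_words in (300, 600, 1200):
    n_requests, extra_tokens = template_overhead(100_000, chunk_words)
    print(f"{chunk_words:>5} words/chunk: {n_requests:>3} requests, ~{extra_tokens:,} template tokens")
```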

In [10]:
def split_markdown_into_chunks(
    markdown_file: Path,
    target_chunk_size_chars: int = 2000,
    chunk_overlap: int = 0
) -> list[str]:
    """ Split a Markdown file into text chunks using MarkdownTextSplitter.
    :param markdown_file: Path to the Markdown file
    :param target_chunk_size_chars: Target size of each chunk in characters
    :param chunk_overlap: Number of overlapping characters between chunks
    :return: List of text chunks (that don't exceed the target chunk size)
    """
    text = markdown_file.read_text(encoding='utf-8')
    text_splitter = MarkdownTextSplitter(chunk_size=target_chunk_size_chars, chunk_overlap=chunk_overlap)
    chunks = text_splitter.split_text(text)
    print("Success!")
    print(f"• The markdown file containing your Word document text has been split into {len(chunks)} manageable chunks.")
    print(f"• Each chunk aims to be around {target_chunk_size_chars:,} characters long (never more).")
    print("• These chunks are now ready to be sent to Claude for processing (i.e. correction).")
    return chunks


def count_claude_tokens(text: str) -> int:
    count = MY_ANTHROPIC_CLIENT.beta.messages.count_tokens(
        model=CLAUDE_MODEL,
        messages=[{"role": "user", "content": text}]
    )
    return count.input_tokens


def print_chunking_summary(chunks: list[str]) -> None:
    sizes = [len(chunk) for chunk in chunks]
    tokens = [count_claude_tokens(chunk) for chunk in chunks]

    print(f"\n\nSummary of All Chunks\n{'-' * 48}")
    print(tabulate([
        ["Total Chunks", f"{sum(sizes):,}", f"{sum(tokens):,}", f"{len(chunks)}"],
        ["Avg. Chunk size", f"{sum(sizes)/len(chunks):,.0f}", f"{sum(tokens)/len(chunks):,.0f}", "*"],
        ["Min. Chunk size", f"{min(sizes):,}", f"{min(tokens):,}", "*"],
        ["Max. Chunk size", f"{max(sizes):,}", f"{max(tokens):,}", "*"]
    ],
        headers=['What', 'Characters', 'Tokens', 'Count'],
        tablefmt='simple',
        colalign=('left', 'right', 'right', 'right')
    ))


def print_chunk_table(chunks: list[str]) -> None:
    # Generate chunk statistics rows
    rows = [[
        f"Chunk {i}",
        f"{len(chunk.splitlines()):,}",
        f"{len(chunk.split()):,}",
        f"{len(chunk):,}",
        f"{count_claude_tokens(chunk):,}",
        chunk.replace('\n', ' ').replace('\r', '')[:40] + "..."
    ] for i, chunk in enumerate(chunks, 1)]

    # Add separator and totals row
    rows.append(["-" * 10] * 6)  # Add separator row
    rows.append([
        "Total"
    ] + [f"{sum(int(row[i].replace(',','')) for row in rows[:-1]):,}" for i in range(1, 5)] + [""])

    print(f"\n\nSummary per Chunk\n{'-' * 104}")
    print(tabulate(
        rows,
        headers=['Chunk', 'Lines', 'Words', 'Chars', 'Tokens', 'Chunk Start'],
        tablefmt='simple',
        colalign=('left', 'right', 'right', 'right', 'right', 'left')
    ))

### Do the work
original_chunks = split_markdown_into_chunks(original_md_file, target_chunk_size_chars=TARGET_CHUNK_CHARACTERS)
print_chunking_summary(original_chunks)
print_chunk_table(original_chunks)

Success!
• The markdown file containg your Word document text has been split into 5 manageable chunks.
• Each chunk aims to be around 4,400 characters long (never more).
• These chunks are now ready to be sent to Claude for processing (ie correction).


Summary of All Chunks
------------------------------------------------
What               Characters    Tokens    Count
---------------  ------------  --------  -------
Total Chunks           20,747     4,755        5
Avg. Chunk size         4,149       951        *
Min. Chunk size         3,584       758        *
Max. Chunk size         4,395     1,069        *


Summary per Chunk
--------------------------------------------------------------------------------------------------------
Chunk            Lines       Words       Chars      Tokens  Chunk Start
----------  ----------  ----------  ----------  ----------  --------------------------------------------
Chunk 1             48         689       4,177         956  # 🟡 A Story with Cl

<br>

- Uncomment the last line of the cell below to print the contents of specific chunks.

In [11]:
def print_chunks(chunks: list[str], chunks_to_print: list[int] | None = None) -> None:
    total_chunks = len(chunks)

    if chunks_to_print is None:
        chunks_to_print = list(range(1, total_chunks + 1))

    print("\nCHUNK CONTENTS")
    print("=" * 80)

    for chunk_number, chunk in enumerate(chunks, start=1):
        if chunk_number in chunks_to_print:
            print(f"\n🌸 CHUNK {chunk_number} contains:")
            print("-" * 80)
            print(f"\n{chunk}\n")

    # Error messages for non-existent chunks
    invalid_chunks = [n for n in chunks_to_print if n > total_chunks]
    if invalid_chunks:
        print("\nERROR:")
        print(f"Requested chunks {invalid_chunks} do not exist (max: {total_chunks})")


# Example usage: show chunks 1, 3
# print_chunks(original_chunks, [1, 3])

<br>

# **4. PROCESSING**

## Prompt Template (+chunk)

Each chunk is embedded into the prompt template, so if there are, say, 20 chunks, Claude is prompted 20 times. The output below shows an example of the complete prompt sent to Claude.
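Embedding a chunk is plain Python string formatting: `PROMPT_TEMPLATE.format(the_markdown_chunk=...)` is called once per chunk. A minimal sketch with a cut-down, hypothetical template shows one prompt per chunk, and that braces inside the chunk text are safe, because `str.format` only interprets placeholders in the template itself.

```python
# A cut-down, hypothetical template; the real one is PROMPT_TEMPLATE above.
TEMPLATE = "Correct the text between the tildes:\n~~~\n{the_markdown_chunk}\n~~~"

chunks = ["# Heading\n\nTeh first chunk.", "A chunk with {curly braces} in it."]
prompts = [TEMPLATE.format(the_markdown_chunk=chunk) for chunk in chunks]

print(len(prompts))  # one prompt per chunk: 2
print(prompts[1])    # the braces in the chunk text are passed through unchanged
```

If the template itself ever needs a literal brace, it must be written as `{{` or `}}`; a single brace in the template would be treated as a placeholder.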

In [12]:
def print_prompt_with_chunk_example():
    example_chunk = f"""# I'M A LITTLE EXAMPLE CHUNK OF MARKDOWN TEXT TELLING A STORY.
- Chunks are placed in between the '~~~' band, this is exactly where I am right now.
- Text above or below the '~~~' band is the same for all prompts sent to Claude.
- To make a chunk, the Word document was converted into markdown and split into pieces (i.e. chunks)
- Target chunk size was set to a maximum of {TARGET_CHUNK_CHARACTERS} characters, respecting markdown elements.
- If there are 19 chunks, Claude will be prompted 19 separate times and return 19 corrected chunks.
- Corrected chunks are reassembled together and then converted into html with corrections in colour.
- Again, I'm just an **example** chunk. The real ones will be sent to Claude. Not me!"""

    prompt_with_chunk = PROMPT_TEMPLATE.format(the_markdown_chunk=example_chunk)
    print(prompt_with_chunk)


### Do the work
print("Between solid lines: Prompt sent to Claude (Prompt template + example chunk)")
print("_" * 90, "\n")
print_prompt_with_chunk_example()
print("_" * 90, "\n")

Between solid lines: Prompt sent to Claude (Prompt template + example chunk)
__________________________________________________________________________________________ 

CRITICAL: PROVIDE ONLY THE CORRECTED TEXT WITHOUT ANY ADDITIONAL COMMENTARY.

Your task is to take the provided text and rewrite it into a clear, grammatically correct version while preserving the original meaning as closely as possible. Correct any spelling mistakes, punctuation errors, verb tense issues, word choice problems, and other grammatical mistakes.

MANDATORY INSTRUCTIONS:

1. Determine and use the same linguistic language as the original text (e.g., English, German)
2. Preserve all existing markdown formatting, including heading levels, paragraphs, and lists
3. Make necessary grammatical corrections, including spelling, punctuation, verb tense, word choice, and other grammatical issues. Only make stylistic changes if essential for clarity
4. Mark corrections with markdown syntax, apply one of these choices 

## Process Chunks

Each chunk is embedded in the base prompt and sent to Claude with instructions to make corrections. The processed chunks are collected in order and later reassembled into a complete document.
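A minimal offline sketch of that ordering guarantee. `process_in_order` and `fake_correct` are hypothetical names, and `fake_correct` merely stands in for the real call to Claude so the sketch runs without an API key. A failed chunk keeps its slot as an "ERROR ..." string (the same convention `process_all_chunks` uses below), so result *i* still lines up with chunk *i* when the document is reassembled.

```python
from typing import Callable


def process_in_order(chunks: list[str], correct: Callable[[str], str]) -> list[str]:
    """Send each chunk through `correct`, keeping results in the original order.

    A failed chunk is replaced by an "ERROR ..." string instead of being dropped.
    """
    results = []
    for number, chunk in enumerate(chunks, start=1):
        try:
            results.append(correct(chunk))
        except Exception as exc:  # broad on purpose: any failure becomes a marker
            results.append(f"ERROR (chunk {number}) was not processed because {exc}")
    return results


# Hypothetical stand-in for the Claude call
def fake_correct(chunk: str) -> str:
    if not chunk.strip():
        raise ValueError("empty chunk")
    return chunk.replace("teh", "**the**")


results = process_in_order(["teh cat", "   ", "a dog"], fake_correct)
succeeded = sum(1 for r in results if not r.startswith("ERROR"))
print(f"{succeeded}/{len(results)} chunks processed successfully")  # 2/3 chunks processed successfully
```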

In [26]:
def process_all_chunks(chunks: list[str]) -> list[str]:
    """ Process all chunks of markdown text and show progress bar.
    :param chunks: List of markdown text chunks
    :return: List of processed text chunks (error message for failed chunks)
    """
    processed_chunks = []
    processing_desc = "Processing: sending text chunks to Claude for correction! "
    for i, chunk in tqdm(enumerate(chunks, 1), total=len(chunks), desc=processing_desc, bar_format='{l_bar}{bar} {n_fmt}/{total_fmt} Chunks'):
        result = process_one_chunk(i, chunk)
        processed_chunks.append(result)

    successful_chunks = sum(1 for chunk in processed_chunks if not chunk.startswith("ERROR"))
    total_chunks = len(chunks)

    if successful_chunks == total_chunks:
        print(f"Processing complete: {successful_chunks}/{total_chunks} chunks processed successfully.")
    else:
        print(f"Processing complete: Warning!! Only {successful_chunks}/{total_chunks} chunks were processed successfully.")

    return processed_chunks

def process_one_chunk(
    chunk_count: int,
    chunk: str,
    client: anthropic.Anthropic = MY_ANTHROPIC_CLIENT,
    model: str = CLAUDE_MODEL,
    temperature: float = PROMPT_TEMP,
    max_tokens: int = 4096  # Upper limit on the length of Claude's reply, measured in tokens
) -> str:
    """ Process a single chunk of markdown text using the Anthropic API.
    :param chunk_count: The index of the current chunk
    :param chunk: The markdown text to process
    :param client: The Anthropic client (default: MY_ANTHROPIC_CLIENT)
    :param model: The Claude model to use (default: CLAUDE_MODEL)
    :param temperature: The temperature setting for the model (default: PROMPT_TEMP)
    :param max_tokens: The maximum number of tokens in the response (default: 4096)
    :return: Processed text or error message if processing failed
    """
    try:
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            temperature=temperature,
            messages=[{
                "role": "user",
                "content": [{"type": "text", "text": PROMPT_TEMPLATE.format(the_markdown_chunk=chunk)}]
            }]
        )
        if not (response.content and response.content[0].text):
            raise ValueError(f"Empty response from API for chunk {chunk_count}")

        return response.content[0].text

    except Exception as e:
        error_message = get_friendly_error_msg(e, chunk_count)
        print(error_message)  # Print error message for logging
        return error_message  # Return error message for failed chunks

def get_friendly_error_msg(exception: Exception, chunk_count: int) -> str:
    CHUNK_ERROR_TEMPLATE = "ERROR (chunk {}) was not processed because {}"

    # isinstance() also matches subclasses, so entries run from most specific to most
    # general: APIError is the base class of the others and must come last.
    # Classes are looked up by name because not every SDK version defines all of them.
    error_msgs = [
        ("AuthenticationError", "there's an issue with your Anthropic API key"),
        ("PermissionDeniedError", "your API key does not have permission to use the specified resource"),
        ("RateLimitError", "your account has hit a rate limit. It's measured in requests/minute, tokens/minute, and tokens/day"),
        ("BadRequestError", "there was an issue with the format or content of your request"),
        ("NotFoundError", "the requested resource was not found"),
        ("RequestTooLargeError", "the request exceeds the maximum allowed number of bytes"),
        ("OverloadedError", "Anthropic's API is temporarily overloaded. Please retry your request after a short delay"),
        ("InternalServerError", "an unexpected error has occurred internal to Anthropic's systems"),
        ("APITimeoutError", "Anthropic took too long to respond, likely an issue on Anthropic's end"),
        ("APIConnectionError", "failed to connect to Anthropic's API, perhaps because of network issues on your end"),
        ("APIStatusError", "Anthropic returned an unsuccessful status code"),
        ("APIError", "an unexpected error occurred while calling Anthropic's API"),
    ]

    for class_name, msg in error_msgs:
        error_type = getattr(anthropic, class_name, None)
        if error_type is not None and isinstance(exception, error_type):
            error_msg = f"{msg}: {exception}"
            break
    else:
        if isinstance(exception, ValueError) and "Empty response" in str(exception):
            error_msg = f"Anthropic returned an empty response for chunk {chunk_count} (maybe we sent an empty chunk?): {exception}"
        else:
            error_msg = f"Unexpected error on chunk {chunk_count}: {exception}"

    return CHUNK_ERROR_TEMPLATE.format(chunk_count, error_msg)


### Do the work
processed_chunks = process_all_chunks(chunks=original_chunks)

Processing: sending text chunks to Claude for correction! :   0%|           0/5 Chunks

Processing complete: 5/5 chunks processed successfully.
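A note on mapping exceptions to friendly messages: because `isinstance()` also matches subclasses, a lookup table of exception types is order-sensitive. The first matching entry wins, so broad base classes belong at the end. The toy hierarchy below is hypothetical; as far as I know it mirrors the `anthropic` SDK, where `APITimeoutError` subclasses `APIConnectionError`, which in turn subclasses `APIError`.

```python
# Toy exception hierarchy (hypothetical names) mirroring "timeout is a kind of
# connection error, which is a kind of API error".
class ToyAPIError(Exception): ...
class ToyConnectionError(ToyAPIError): ...
class ToyTimeoutError(ToyConnectionError): ...


def first_match(exc: Exception, table: list[tuple[type, str]]) -> str:
    """Return the message of the first table entry whose type matches `exc`."""
    for error_type, msg in table:
        if isinstance(exc, error_type):
            return msg
    return "unexpected error"


general_first = [(ToyAPIError, "generic API error"), (ToyTimeoutError, "timed out")]
specific_first = [(ToyTimeoutError, "timed out"), (ToyAPIError, "generic API error")]

print(first_match(ToyTimeoutError(), general_first))   # generic API error (base class shadows)
print(first_match(ToyTimeoutError(), specific_first))  # timed out
```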


<br>

# **5. POST-PROCESSING**

## Reassemble Chunks

Reassemble the processed (corrected) chunks into a single markdown file.
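To see what the clean-up does, here is the same joining logic as `reassemble_chunks` (in the cell below) without the file write, applied to three small made-up chunks. `join_chunks` is a hypothetical name used only for this illustration.

```python
import re


def join_chunks(chunks: list[str]) -> str:
    """Join chunks with a blank line, then tidy the result (same steps as reassemble_chunks)."""
    text = "\n\n".join(chunks)
    # Any run of 3+ newlines (two or more blank lines) collapses to exactly one blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


chunks = ["# Title\n", "First paragraph.\n\n", "\nSecond paragraph."]
print(repr(join_chunks(chunks)))  # '# Title\n\nFirst paragraph.\n\nSecond paragraph.'
```

Without the `re.sub` step, trailing newlines inside chunks would stack up with the `"\n\n"` separator and leave extra blank lines between chunks.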

In [27]:
def reassemble_chunks(chunks: list[str], output_file: Path) -> Path:
    """ Reassemble processed chunks of text and save to an output file.
    :param chunks: List of processed markdown text chunks
    :param output_file: Path to the output file where the reassembled text will be saved
    :return: Path to the output file containing the reassembled text
    """
    # Join chunks with double newlines
    reassembled_chunks = "\n\n".join(chunks)

    # Remove consecutive empty lines
    reassembled_chunks = re.sub(r'\n{3,}', '\n\n', reassembled_chunks)  # collapse 2+ blank lines into one

    # Remove leading and trailing whitespace
    reassembled_chunks = reassembled_chunks.strip()

    # Save the corrected document
    output_file.write_text(reassembled_chunks, encoding='utf-8')

    return output_file


### Do the work
processed_md_file = reassemble_chunks(processed_chunks, processed_md_file)
print(f"Success!")
print(f"• Processed chunks assembled into one file:")
print(f'• here it is: "{processed_md_file.absolute()}"')

Success!
• Processed chunks assembled into one file:
• here it is: "/content/drive/MyDrive/TEST/Word.docx.PROCESSED.md"


## Corrections In Colour

Convert the markdown file into HTML so that corrections appear in colour and are easy to spot.
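A standalone sketch of the strikethrough step, assuming the same `~~x~~` and `**~~x~~**` markup as the cell below. It uses a regex conditional `(?(1)\*\*)`, so the closing `**` is only consumed when an opening `**` was matched; a pattern that ends in a plain optional `(\*\*)?` can swallow the `**` that opens a following bold word. `strike_to_html` is a hypothetical name.

```python
import re

# Group 1: optional leading **; group 2: the struck-through text.
# (?(1)\*\*) requires a closing ** only if group 1 matched.
STRIKE = re.compile(r"(\*\*)?~~(.*?)~~(?(1)\*\*)")


def strike_to_html(text: str) -> str:
    """Turn ~~x~~ into <del>x</del> and **~~x~~** into <strong><del>x</del></strong>."""
    def repl(match: re.Match) -> str:
        inner = f"<del>{match.group(2)}</del>"
        return f"<strong>{inner}</strong>" if match.group(1) else inner
    return STRIKE.sub(repl, text)


print(strike_to_html("These **~~dogs~~** dogs belong to my sister."))
# These <strong><del>dogs</del></strong> dogs belong to my sister.
print(strike_to_html("~~old~~**new**"))
# <del>old</del>**new**   (the ** of the bold word is left intact)
```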

In [28]:
def convert_markdown_to_html(markdown_file: Path, html_file: Path) -> Path:
    """ Convert a Markdown file to HTML with custom styling and preprocessing.
    :param markdown_file: Path object representing the input Markdown file
    :param html_file: Path object representing the output HTML file
    :return: Path object of the output HTML file
    """
    # Read the Markdown file
    markdown_text = markdown_file.read_text(encoding='utf-8')

    # Convert custom Markdown strikethrough syntax to HTML, handling regular and bold strikethroughs
    def process_strikethrough(match):
        content = match.group(2)
        is_bold = bool(match.group(1))
        if is_bold:
            return f"<strong><del>{content}</del></strong>"
        else:
            return f"<del>{content}</del>"

    # The closing ** is only consumed if an opening ** was matched: (?(1)\*\*)
    strikethrough_pattern = r'(\*\*)?~~(.*?)~~(?(1)\*\*)'
    markdown_text = re.sub(strikethrough_pattern, process_strikethrough, markdown_text)

    # Convert to HTML
    html_body = markdown(markdown_text)

    # Define CSS styles, including narrower side padding on small screens
    css_styles = """
    <style>
        body {
            margin: 0 auto;
            padding: 0 5%;
            max-width: 50em;
            line-height: 1.5em;
            font-family: 'Inter', Arial, sans-serif;
            font-size: 16px;
            background-color: #F0EFEA;
            color: #141413;
        }

        @media (max-width: 768px) { body { padding: 0 3%;}}

        strong { color: #E46264;}
    </style>
    """

    # Create the full HTML document
    full_html = f"""
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Corrected Document</title>
        <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;700&display=swap" rel="stylesheet">
        {css_styles}
    </head>
    <body>
        {html_body}
    </body>
    </html>
    """

    # Write the HTML to a file
    html_file.write_text(full_html, encoding='utf-8')

    return html_file

def download_drive_file(file_path: Path):
    file_path = Path(file_path)

    if not file_path.is_file():
        print(f"File not found: {file_path}")
        return

    # Download file directly from Google Drive
    files.download(str(file_path))

    print(f"• Browser download initiated...")


### Do the work
processed_html_file = convert_markdown_to_html(processed_md_file, processed_html_file)
print(f"Success!")
print(f"• Saved corrected Word document as: {processed_html_file.absolute()}")

download_drive_file(processed_html_file)
print(f"• Downloaded as: {processed_html_file.name}")
print("• Enjoy the corrections Word missed, we're done! 🙂")

Success!
• Saved corrected Word document as: /content/drive/MyDrive/TEST/Word.docx.PROCESSED.html

• Browser download initiated...
• Downloaded as: Word.docx.PROCESSED.html
• Enjoy the corrections Word missed, we're done! 🙂


<br>

# **6. TESTING**

<a name="id-test-plan"></a>
## Test Plan (critical♥️)

Creating this notebook taught me that automated testing, especially when working with LLMs, is the difference between success and failure. You must go slower to go faster.
<br>

The test plan is divided into 3 major areas:

1. **Test Processed Doc:** End-to-end testing, verifying code and prompt.
1. **Test My Prompt:** Prompt evaluation to ensure responses are as expected (using generated test data).
1. **Test My Code:** Traditional functional testing for core code components.

<br>

The first two areas both test Claude's responses. This may seem like overkill, but I disagree, and here is why:

- What if I have 20 large Word documents to correct? It is not practical to check each one manually. What if my document is in Italian? Should I assume Claude will perform as well, or should I have an automated way of testing it? Thoroughly testing one document does not guarantee the same dependability for another, because content varies vastly.

- As a Product Manager, what if I decide to switch from Claude Sonnet to the free Meta Llama 3.1 model, or to a model like DeepSeek v3 that costs a tenth as much? How would I have any chance of knowing that my "product" performs as well as it did with Claude 3.5 Sonnet? The same applies to model updates.

- Manually testing all these scenarios is time-consuming. Automated tests improved the quality of corrections across a wide range of documents and made me more confident about adopting model updates (and, soon, a model switch).

- Testing is marked with a heart "♥️" because it must be at the heart of any product that leverages an LLM, since LLM responses are non-deterministic.

<br>

Finally, the test plan was developed in collaboration with Claude:
<!-- /Figure with caption 200px width-->
<figure>
  <figcaption>Fig 6: Test Plan Brainstorming</figcaption>
  <a href="https://michellepace.github.io/word-document-corrector-claude/images/test-plan-brainstorming.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/word-document-corrector-claude/images/test-plan-brainstorming.jpg"
         width="200"
         alt="Talking testing with Claude and designing a test plan." />
  </a>
</figure>

<!-- /Figure with caption 200px width-->
<figure>
  <figcaption>Fig 7: Final Test Plan</figcaption>
  <a href="https://michellepace.github.io/word-document-corrector-claude/images/final-test-plan.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/word-document-corrector-claude/images/final-test-plan.jpg"
         width="200"
         alt="Sketched out test plan using what I learnt from Claude." />
  </a>
</figure>

<a name="id-test-processed-doc"></a>
## 6.1 Test Processed Doc

Test that Claude has corrected and enhanced the writing without changing its core structure or meaning, specifically:

- **a) Content Preservation: Document Structure:** Are the headings, paragraphs and lists still in the right place?
- **b) Content Preservation: Simple Word Count:** Has the overall length changed dramatically?
- **c) Content Preservation: Semantic Similarity:** Does the corrected text still mean the same thing?

### a) Content Preservation: Document Structure

Checks that the structure of the Word document, such as headings and bullet lists, has remained intact. Some structural changes may be valid: for example, if you accidentally started a new paragraph mid-sentence in your Word document, Claude would likely fix that.
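A small worked example of the counting rules used in the next cell, applied to the mid-sentence paragraph break described above. `count_elements` is a hypothetical re-implementation for illustration. Note that "paragraphs" really counts non-empty lines that are neither headings nor list items, so a legitimate fix like this one shows up as a mismatch. A mismatch therefore calls for a closer look rather than automatic rejection.

```python
import re

HEADING = re.compile(r"^#+\s")
LIST_MARKERS = ("- ", "* ")


def count_elements(text: str) -> dict[str, int]:
    """Count markdown elements line by line (same rules as count_markdown_elements)."""
    lines = text.splitlines()

    def is_heading(line: str) -> bool:
        return bool(HEADING.match(line))

    def is_list_item(line: str) -> bool:
        return line.startswith(LIST_MARKERS)

    return {
        "headings": sum(1 for line in lines if is_heading(line)),
        # Any other non-empty line counts as a "paragraph" line
        "paragraphs": sum(1 for line in lines if line.strip() and not is_heading(line) and not is_list_item(line)),
        "list_items": sum(1 for line in lines if is_list_item(line)),
        "total_lines": len(lines),
        "empty_lines": sum(1 for line in lines if not line.strip()),
    }


original = "# Title\n\nThis sentence was\n\nsplit by accident.\n\n- item"
processed = "# Title\n\nThis sentence was split by accident.\n\n- item"

before, after = count_elements(original), count_elements(processed)
mismatched = [key for key in before if before[key] != after[key]]
print(mismatched)  # ['paragraphs', 'total_lines', 'empty_lines']
```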

In [29]:
def display_markdown_structure_comparison(original_file: Path, processed_file: Path) -> None:
    """Test if document structure is preserved after processing."""

    def count_markdown_elements(text: str) -> dict[str, int]:
        lines = text.splitlines()
        heading_pattern = re.compile(r'^#+\s')

        return {
            'headings': sum(1 for line in lines if heading_pattern.match(line)),
            'paragraphs': sum(1 for line in lines if line.strip() and not heading_pattern.match(line)
                            and not line.startswith(('- ', '* '))),
            'list_items': sum(1 for line in lines if line.startswith(('- ', '* '))),
            'total_lines': len(lines),
            'empty_lines': sum(1 for line in lines if not line.strip())
        }

    # Count elements in both files
    original = count_markdown_elements(original_file.read_text(encoding='utf-8'))
    processed = count_markdown_elements(processed_file.read_text(encoding='utf-8'))

    # Prepare comparison table
    rows = [[
        element.replace('_', ' ').title(),
        f"{original[element]:,}",
        f"{processed[element]:,}",
        "✓" if original[element] == processed[element] else "×"
    ] for element in original]

    # Print results
    print(f"\nEvaluate Content Preservation: By Document Structure\n{'-' * 66}")
    print(f"Status: {'MATCH' if original == processed else 'MISMATCH(!)'}\n")

    print(tabulate(
        rows,
        headers=['Markdown Element', 'Original', 'Processed', 'Match'],
        tablefmt='simple',
        colalign=('left', 'right', 'right', 'center')
    ))


# Do the work
display_markdown_structure_comparison(original_md_file, processed_md_file)


Evaluate Content Preservation: By Document Structure
------------------------------------------------------------------
Status: MATCH

Markdown Element      Original    Processed   Match
------------------  ----------  -----------  -------
Headings                    23           23     ✓
Paragraphs                  73           73     ✓
List Items                  21           21     ✓
Total Lines                217          217     ✓
Empty Lines                100          100     ✓


### b) Content Preservation: Simple Word Count

Compares the word count of each original chunk with that of its corrected counterpart. Large differences could mean Claude has not preserved the meaning. It is a simple but effective check.
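The arithmetic behind the "Difference" and "Change" columns, worked through for Chunk 1 of the results table further below (678 words became 665) and for a one-sentence example. Words are counted with `str.split()`, which is why the notebook strips markdown markers first; otherwise `**~~dogs~~**` would count as an extra word. `word_count_change` is a hypothetical helper.

```python
def word_count_change(original: str, processed: str) -> tuple[int, float]:
    """Return (difference in words, percentage change) using whitespace word counts."""
    original_count = len(original.split())
    difference = len(processed.split()) - original_count
    percentage = difference / original_count * 100 if original_count else 0.0
    return difference, percentage


# Chunk 1: 678 words became 665
difference = 665 - 678
percentage = difference / 678 * 100
print(f"{difference:+,} words, {percentage:+.1f}%")  # -13 words, -1.9%

# A duplicated word removed: 7 words became 6
diff, pct = word_count_change("These dogs dogs belongs to my sister.", "These dogs belong to my sister.")
print(f"{diff:+,} words, {pct:+.1f}%")  # -1 words, -14.3%
```

Short chunks swing by large percentages for small edits, so the percentage is most meaningful for chunks of several hundred words.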

In [30]:
def word_count_comparison(
    original_texts:list[str],
    processed_texts: list[str],
) -> list[dict[str, str | int | float]]:
    """ Compare word counts between lists of original and corrected texts.
    :param original_texts: List of original text contents
    :param processed_texts: List of corrected text contents
    :return: List of dictionaries containing word count comparison information
    """
    results = []
    for chunk_number, (original_text, processed_text) in enumerate(zip(original_texts, processed_texts), 1):
        original_count = len(original_text.split())
        processed_count = len(processed_text.split())
        difference = processed_count - original_count
        percentage_diff = (difference / original_count * 100) if original_count else 0
        sign = '+' if difference >= 0 else ''

        results.append({
            "chunk_number": chunk_number,
            "similarity_test": "Word Count",
            "original_count": original_count,
            "corrected_count": processed_count,
            "difference": difference,
            "percentage_difference": percentage_diff,
            "message_tabular": f"Word Count    Original: {original_count:4d} | Processed: {processed_count:4d} | Difference: {difference:+4d}   {percentage_diff:+.1f}%",
            "message_arrow": f"Word Count: {original_count:4d} → {processed_count:4d}  |  {sign}{difference} words  {sign}{percentage_diff:.0f}%"
        })

    return results


def strip_off_markdown(chunks: list[str]) -> list[str]:
    """ Remove markdown formatting, specifically bold and strikethrough, from a list of text chunks
    :param chunks: List of text chunks potentially containing markdown formatting
    :return: List of plain text chunks with markdown formatting removed
    """
    def remove_bold_strikethrough_words(text):
        # This pattern matches:
        # 1. Start of a line or a single space/tab (captured as group 1)
        # 2. Bold and struck-through text (**~~any content~~**)
        # 3. An optional single space or tab after it
        # Only group 1 is kept in the replacement, preserving any leading space/tab.
        # re.MULTILINE makes ^ match at the start of every line, not just the string.
        pattern = r'(^|[ \t])(\*\*~~.*?~~\*\*)([ \t])?'
        return re.sub(pattern, r'\1', text, flags=re.MULTILINE)

    chunks_without_markdown = []
    for chunk in chunks:
        # Claude was prompted not to delete words but to mark them in bold strikethrough.
        # Remove those marked words first so they aren't counted (avoids false positives).
        text_without_bold_strikethrough = remove_bold_strikethrough_words(chunk)

        plain_text = strip_markdown(text_without_bold_strikethrough)

        chunks_without_markdown.append(plain_text)

    return chunks_without_markdown


def display_word_count_comparison(
    original_chunks: list[str],
    processed_chunks: list[str]
) -> None:
    # Generate comparison data
    rows = []
    for i, (orig, proc) in enumerate(zip(original_chunks, processed_chunks), 1):
        orig_count = len(orig.split())
        proc_count = len(proc.split())
        diff = proc_count - orig_count
        pct = (diff / orig_count * 100) if orig_count else 0

        rows.append([f"Chunk {i}", f"{orig_count:,}", f"{proc_count:,}", f"{diff:+,}",f"{pct:+.1f}%"])

    print(f"\nEvaluate Content Preservation: By Simple Word Count\n{'-' * 66}")
    print(tabulate(
        rows,
        headers=['Chunk', 'Original', 'Processed', 'Difference', 'Change'],
        tablefmt='simple',
        colalign=('left', 'right', 'right', 'right', 'right')
    ))


### Do the work
original_chunks_clean = strip_off_markdown(original_chunks)
processed_chunks_clean = strip_off_markdown(processed_chunks)
display_word_count_comparison(original_chunks_clean, processed_chunks_clean)


Evaluate Content Preservation: By Simple Word Count
------------------------------------------------------------------
Chunk      Original    Processed    Difference    Change
-------  ----------  -----------  ------------  --------
Chunk 1         678          665           -13     -1.9%
Chunk 2         742          722           -20     -2.7%
Chunk 3         740          740            +0     +0.0%
Chunk 4         744          746            +2     +0.3%
Chunk 5         602          603            +1     +0.2%


### c) Content Preservation: Semantic Similarity

Semantic similarity measures whether texts convey the same meaning, regardless of wording. After extensive testing of various approaches, I chose the sentence-transformers library with the "paraphrase-multilingual-mpnet-base-v2" model (Fig 8), as it proved sufficiently accurate for this use case.

<!-- /Figure with caption 800px width-->
<figure>
  <figcaption>Fig 8: Narrowing model options for testing semantic similarity</figcaption>
  <a href="https://michellepace.github.io/word-document-corrector-claude/images/semantic-simularity-text-transformer-model-options.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/word-document-corrector-claude/images/semantic-simularity-text-transformer-model-options.jpg"
         width="800"
         alt="Asking Claude for language model options to use with Sentence Transformer"/>
  </a>
</figure>

**How Sentence-Transformer works:**
1. It loads the language model I chose (paraphrase-multilingual-mpnet-base-v2).
1. It converts text chunks into numerical vectors, called embeddings (a way for computers to represent the meaning of text).
1. It measures the angle between the two vectors of each chunk pair (the original chunk and the Claude-corrected chunk) by computing the cosine of that angle.
1. The smaller the angle, the closer the meaning and the higher the similarity score.
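Steps 3 and 4 can be sketched in a few lines of NumPy. The three-dimensional vectors are toy stand-ins for embeddings (real sentence-transformer vectors have hundreds of dimensions), and `cosine_similarity_pct` is a hypothetical helper; the cell below uses a library `cosine_similarity` call instead, but the quantity is the same: the dot product divided by the product of the vector lengths.

```python
import numpy as np


def cosine_similarity_pct(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors, as a percentage (100% = same direction)."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(cos * 100)


# Toy "embeddings"
original = np.array([1.0, 2.0, 3.0])
close = np.array([1.0, 2.0, 2.5])        # small angle, similar meaning
orthogonal = np.array([3.0, 0.0, -1.0])  # 90 degrees, no relation

print(f"{cosine_similarity_pct(original, close):.1f}%")       # 99.6%
print(f"{cosine_similarity_pct(original, orthogonal):.1f}%")  # 0.0%
```

Because only the angle matters, a vector and a scaled copy of it (for example `2 * original`) score 100%.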

<br>

<!-- /Figure with caption 450px width-->
<figure>
  <figcaption>Fig 9: Semantic Similarity: A smaller angle implies more similar meaning</figcaption>
  <a href="https://michellepace.github.io/word-document-corrector-claude/images/testing-for-semantic-similarity-explained-cosine.jpg"
     target="_blank">
    <img src="https://michellepace.github.io/word-document-corrector-claude/images/testing-for-semantic-similarity-explained-cosine.jpg"
         width="440"
         alt="Semantic Similarity Visualised: Smaller angles = More similar meanings"/>
  </a>
</figure>

<br>

**Semantic Similarity Scoring Guide (derived from testing):**

| Similarity Score | Interpretation | Action Needed |
| --- | :--- | :--- |
| 100% | Identical meaning | None - perfect match |
| > 80% | Meaning preserved | None - good match |
| ~ 70% | Same topic, but details may have changed | Review chunk for changes |
| < 70% | Significant changes in meaning | Detailed review required |
| 0% | No relation at all | Something is wildly wrong |

<br>

**Note:** Large score differences between chunk pairs (e.g., one 60% among several 80%+) warrant review, though they may simply indicate substantial improvements to poorly written sections.
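The scoring guide and the note above could be applied automatically. The sketch below is an assumption, not part of this notebook: `interpret_similarity` treats the guide's "~ 70%" band as 70% up to and including 80%, and `flag_outliers` uses a hypothetical 15-point gap below the median to catch cases like one 60% among several 80%+ scores. Apply it to the unrounded scores, since a score printed as "100%" may really be 99.7.

```python
from statistics import median


def interpret_similarity(score: float) -> str:
    """Map a similarity percentage to the scoring guide (assumed band edges)."""
    if score >= 100:
        return "Identical meaning: no action needed"
    if score > 80:
        return "Meaning preserved: no action needed"
    if score >= 70:
        return "Same topic, possible detail changes: review chunk"
    if score > 0:
        return "Significant meaning changes: detailed review required"
    return "No relation at all: something is wildly wrong"


def flag_outliers(scores: list[float], gap: float = 15.0) -> list[int]:
    """Return 1-based chunk numbers scoring more than `gap` points below the median."""
    mid = median(scores)
    return [number for number, score in enumerate(scores, start=1) if mid - score > gap]


scores = [85, 88, 60, 91]
print(flag_outliers(scores))        # [3]
print(interpret_similarity(60))     # Significant meaning changes: detailed review required
```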

In [31]:
def sentence_transformer_similarity(texts1: list[str], texts2: list[str]) -> list[dict]:
    """Compare similarity between lists of original and corrected texts using sentence transformers.
    :param texts1: List of original text contents
    :param texts2: List of corrected text contents
    :return: List of dictionaries containing similarity comparison information for each chunk
    """
    def initialize_model():
        return SentenceTransformer(
            SIMILARITY_LARGE_LANGAUGE_MODEL,
            tokenizer_kwargs={'clean_up_tokenization_spaces': True})

    def generate_embeddings(model, texts1, texts2):
        return model.encode(texts1), model.encode(texts2)

    def calculate_similarities(embeddings1, embeddings2):
        similarities = cosine_similarity(embeddings1, embeddings2)
        return np.minimum(np.diag(similarities) * 100, 100)

    def create_result_dict(score: float, index: int) -> dict:
        return {
            "chunk_number": index,
            "similarity_test": "Sentence Transformer",
            "similarity_score": score,
            "message": f"Similarity Score \t({score:.0f}%) Original <> Processed"}

    steps = [
        "🌸 Step 1/3 - Firing up a language model to be able to understand text for comparison",
        "🌸 Step 2/3 - Turning text chunks into number vectors so similarity can be compared",
        "🌸 Step 3/3 - Comparing original chunk vector with processed chunk vector and scoring"]

    with tqdm(total=3, desc=steps[0].split("...", 1)[0],
                bar_format='{l_bar}{bar} {n_fmt}/{total_fmt}',
                mininterval=2.0,) as pbar:

        # Step 1: Initialize model
        model = initialize_model()
        print(f"\n{steps[0]}... Done!")
        pbar.update(1)
        pbar.set_description(steps[1].split("...", 1)[0])

        # Step 2: Generate embeddings
        embeddings1, embeddings2 = generate_embeddings(model, texts1, texts2)
        print(f"{steps[1]}... Done!")
        pbar.update(1)
        pbar.set_description(steps[2].split("...", 1)[0])

        # Step 3: Calculate similarities
        similarity_scores = calculate_similarities(embeddings1, embeddings2)
        print(f"{steps[2]}... Done!")
        pbar.update(1)
        print()

    return [create_result_dict(score, i) for i, score in enumerate(similarity_scores, 1)]


def get_chunk_simularity_scores(original_chunks: list[str], processed_chunks: list[str]) -> str:
    """Evaluate content preservation by comparing meaning similarity of original and processed chunks."""
    results = sentence_transformer_similarity(original_chunks, processed_chunks)

    # Prepare rows for tabulate
    rows = [[
        f"Chunk {result['chunk_number']} (original <> processed)",
        f"{result['similarity_score']:.0f}%"
    ] for result in results]

    return (
        f"\nEvaluate Content Preservation: Semantic Meaning Within Chunk Pairs\n{'-' * 66}\n"
        f"{tabulate(rows, headers=['Chunk', 'Similarity Score'], tablefmt='simple', colalign=('left', 'right'))}"
    )


# Do the work
result_eval_content_similarity = get_chunk_simularity_scores(original_chunks_clean, processed_chunks_clean)
print(result_eval_content_similarity)

🌸 Step 1/3 - Firing up a LLM to be able to understand text for comparison:   0%|           0/3


🌸 Step 1/3 - Firing up a LLM to be able to understand text for comparison... Done!
🌸 Step 2/3 - Turning text chunks into number vectors so simularity can be compared... Done!
🌸 Step 3/3 - Comparing original chunk vector with processed chunk vector and scoring... Done!


Evaluate Content Preservation: Semantic Meaning Within Chunk Pairs
------------------------------------------------------------------
Chunk                              Similarity Score
-------------------------------  ------------------
Chunk 1 (original <> processed)                 98%
Chunk 2 (original <> processed)                100%
Chunk 3 (original <> processed)                100%
Chunk 4 (original <> processed)                100%
Chunk 5 (original <> processed)                100%


### All Results

All the evaluation results in one place, so that it is easier to make connections between them.

In [32]:
display_markdown_structure_comparison(original_md_file, processed_md_file)
print()
display_word_count_comparison(original_chunks_clean, processed_chunks_clean)
print()
print(result_eval_content_similarity)
print()
print_chunk_table(original_chunks)


Evaluate Content Preservation: By Document Structure
------------------------------------------------------------------
Status: MATCH

Markdown Element      Original    Processed   Match
------------------  ----------  -----------  -------
Headings                    23           23     ✓
Paragraphs                  73           73     ✓
List Items                  21           21     ✓
Total Lines                217          217     ✓
Empty Lines                100          100     ✓


Evaluate Content Preservation: By Simple Word Count
------------------------------------------------------------------
Chunk      Original    Processed    Difference    Change
-------  ----------  -----------  ------------  --------
Chunk 1         678          665           -13     -1.9%
Chunk 2         742          722           -20     -2.7%
Chunk 3         740          740            +0     +0.0%
Chunk 4         744          746            +2     +0.3%
Chunk 5         602          603            +1

<br>

- Uncomment the code below to investigate a particular chunk

In [33]:
# For example:
#  if chunk 4 had an explosion in word count and a low similarity score,
#  investigate the original and processed chunks:

# print_chunks(original_chunks, [4])  # Uncomment me to run me
# print_chunks(processed_chunks, [4]) # Uncomment me to run me

## 6.2 Test My Prompt

Prompt evaluation involves testing whether an LLM's responses align with the prompt's instructions. This testing is crucial, as highlighted in the <a href="#id-test-plan">Test Plan (critical♥️)</a> section.

But perhaps this quotation is more convincing:

> *The (in)ability for teams to measure the performance of their models is the biggest blocker of production use cases for LLMs and also makes prompting an art instead of a science. Even though evals take a lot of time, doing them up front will save developer time in the long run and result in better products getting out much sooner.* [Anthropic](https://github.com/anthropics/courses/blob/b4f26aedef55e06ad5eead5de83985249d1fab2f/prompt_evaluations/01_intro_to_evals/01_intro_to_evals.ipynb).

Unlike <a href="#id-test-processed-doc">6.1 Test Processed Doc</a>, which covered end-to-end testing (including code and prompt), this section focuses solely on prompt testing. This first attempt at prompt evaluation proved invaluable: it helped develop an effective prompt faster. Some tests below still fail, highlighting areas where the prompt needs further refinement.
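Because LLM responses are non-deterministic, a single pass or fail can be luck. A hedged sketch, not part of this notebook: express the test cases below as data and run each one several times to get a pass rate. `run_pass_rates` and `fake_correct` are hypothetical; `fake_correct` is an offline stand-in, and with the real model you might pass something like `lambda text: process_one_chunk(chunk_count=1, chunk=text)` instead.

```python
from typing import Callable

# (name, sent to Claude, expected back), taken from the test functions below
CASES = [
    ("no_additional_commentary", "This is a test sentence.", "This is a test sentence."),
    ("spelling_correction", "This senteence has a spelling error.", "This **sentence** has a spelling error."),
    ("dupe_words_are_bold_strike", "These dogs dogs belongs to my sister.", "These **~~dogs~~** dogs belong to my sister."),
]


def run_pass_rates(correct: Callable[[str], str], cases=CASES, runs: int = 3) -> dict[str, float]:
    """Run every case `runs` times and return the fraction of exact matches per case."""
    return {
        name: sum(correct(sent) == expected for _ in range(runs)) / runs
        for name, sent, expected in cases
    }


# Offline stand-in: fixes one spelling mistake and nothing else
def fake_correct(text: str) -> str:
    return text.replace("senteence", "**sentence**")


rates = run_pass_rates(fake_correct)
for name, rate in rates.items():
    print(f"{rate:.0%}  {name}")
```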

In [34]:
# Helper function for testing
def run_prompt_or_code_test(test_func):
    test_name = test_func.__name__
    try:
        test_func()
        print(f"✅PASSED TEST: {test_name}")
    except AssertionError as e:
        print(f"🛑FAILED TEST: {test_name}: {str(e)}")
    except Exception as e:
        print(f"ERROR: {test_name} raised an unexpected error: {str(e)}")

In [35]:
def show(test_case: dict[str, str], processed: str) -> str:
    return f"""
    Sent to Claude: {test_case['test']}
    Expected Back:  {test_case['expect']}
    Actual Back:    {processed}
    """

def test_no_additional_commentary():
    test_case = {
        "test": "This is a test sentence.",
        "expect": "This is a test sentence."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert len(processed.split('\n')) == 1, "Additional lines were added to the text" + show(test_case, processed)
    assert processed == test_case['expect'], "Text was added when it shouldn't have been" + show(test_case, processed)

def test_structure_maintenance():
    test_case = {
        "test": "Paragraph 1.\n\nParagraph 2.\n\nParagraph 3.",
        "expect": "Paragraph 1.\n\nParagraph 2.\n\nParagraph 3."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed.count('\n\n') == 2, "Paragraph structure was not maintained" + show(test_case, processed)
    assert processed == test_case['expect'], "Text was changed when it shouldn't have been" + show(test_case, processed)

def test_markdown_preservation():
    test_case = {
        "test": "# Heading 1\n## Heading 2\n* List item 1\n* List item 2",
        "expect": "# Heading 1\n## Heading 2\n* List item 1\n* List item 2"
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Markdown formatting was not preserved" + show(test_case, processed)

def test_spelling_correction():
    test_case = {
        "test": "This senteence has a spelling error.",
        "expect": "This **sentence** has a spelling error."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Exact expected correction marking not made" + show(test_case, processed)

def test_dupe_words_are_bold_strike():
    test_case = {
        "test": "These dogs dogs belongs to my sister.",
        "expect": "These **~~dogs~~** dogs belong to my sister."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Duplicate word was not strikethrough bold" + show(test_case, processed)

def test_language_detection_german():
    test_case = {
        "test": "Das ist ein Test. Es enthällllt einige Fehler.",
        "expect": "Das ist ein Test. Es **enthält** einige Fehler."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "German text was not correctly identified or corrected" + show(test_case, processed)

def test_grammatical_corrections():
    test_case = {
        "test": "She buyed some milk and go home.",
        "expect": "She **bought** some milk and **went** home."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Grammatical errors were not corrected" + show(test_case, processed)

def test_inappropriate_word_choice_easy():
    test_case = {
        "test":   "He garnished support for perspective employees. His inciteful comments added value.",
        "expect": "He **garnered** support for **prospective** employees. His **insightful** comments added value."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "Inappropriate word choice was not corrected as expected:" + show(test_case, processed)

def test_inappropriate_word_choice_creative():
    test_case = {
        "test":   "She was literally dying of embarrassment as her speech was bad.",
        "expect": "She was dying of embarrassment as her speech was bad."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert "literally" not in processed, "Inappropriate word choice 'literally' was not removed" + show(test_case, processed)
    assert "bad" not in processed, "Inappropriate word choice 'bad' was not removed" + show(test_case, processed)

def test_remove_bad_phrase_1():
    test_case = {
        "test":   "The cat quickly ran fastly across the room.",
        "expect": "The cat **~~quickly~~** ran **~~fastly~~** across the room."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert " quickly ran fastly " not in processed, "Bad phrase 'quickly ran fastly' not corrected" + show(test_case, processed)

def test_remove_bad_phrase_2():
    test_case = {
        "test":   "John forgot to remember to bring his lunch to work today.",
        "expect": "John **~~forgot to remember to~~** **forgot to** bring his lunch to work today."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert " forgot to remember " not in processed, "Bad phrase 'forgot to remember' not corrected" + show(test_case, processed)

def test_boldstrike_redundant_words():
    test_case = {
        "test":   "She always never fails to disappoint her team with her exceptional work.",
        "expect": "She **~~always~~** never fails to **~~disappoint~~** **impress** her team with her exceptional work."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert "**~~always~~**" in processed, "Redundant word 'always' not boldstrike" + show(test_case, processed)

def test_detect_and_correct_british():
    test_case = {
        "test": "The centre's staff analysed the colhour behaviour.",
        "expect": "The centre's staff analysed the **colour** behaviour."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "British spelling was not preserved" + show(test_case, processed)

def test_detect_and_correct_american():
    test_case = {
        "test": "The colourful catalog analysed labor practices.",
        "expect": "The **colorful** catalog **analyzed** labor practices."
    }
    processed = process_one_chunk(chunk_count=1, chunk=test_case['test'])
    assert processed == test_case['expect'], "American spelling was not preserved" + show(test_case, processed)

def run_all_prompt_tests():
    tests = [
        test_no_additional_commentary,
        test_structure_maintenance,
        test_markdown_preservation,
        test_spelling_correction,
        test_dupe_words_are_bold_strike,
        test_language_detection_german,
        test_grammatical_corrections,
        test_inappropriate_word_choice_easy,
        test_inappropriate_word_choice_creative,
        test_remove_bad_phrase_1,
        test_remove_bad_phrase_2,
        test_boldstrike_redundant_words,
        test_detect_and_correct_british,
        test_detect_and_correct_american
    ]
    print("Test my Prompt to Claude: does he correct like my prompt asked him?")
    print("~" * 70)
    for test in tests:
        run_prompt_or_code_test(test)


### Do the work
run_all_prompt_tests()

Test my Prompt to Claude: does he correct like my prompt asked him?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
✅PASSED TEST: test_no_additional_commentary
✅PASSED TEST: test_structure_maintenance
✅PASSED TEST: test_markdown_preservation
✅PASSED TEST: test_spelling_correction
🛑FAILED TEST: test_dupe_words_are_bold_strike: Duplicate word was not strikethrough bold
    Sent to Claude: These dogs dogs belongs to my sister.
    Expected Back:  These **~~dogs~~** dogs belong to my sister.
    Actual Back:    **These** dogs **belong** to my sister.
    
✅PASSED TEST: test_language_detection_german
✅PASSED TEST: test_grammatical_corrections
✅PASSED TEST: test_inappropriate_word_choice_easy
✅PASSED TEST: test_inappropriate_word_choice_creative
✅PASSED TEST: test_remove_bad_phrase_1
✅PASSED TEST: test_remove_bad_phrase_2
🛑FAILED TEST: test_boldstrike_redundant_words: Redundant word 'always' not boldstrike
    Sent to Claude: She always never fails to disappoint her te
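In the failing duplicate-word test, Claude removed the repeated "dogs" and bolded "These" and "belong" instead of striking the duplicate. One possible mitigation, not something the notebook does, is a deterministic regex pass that marks repeated words, either to pre-mark the text or to cross-check Claude's reply. A minimal sketch, assuming the notebook's `**~~word~~**` deletion markup; `mark_duplicate_words` is a hypothetical name, and it would also flag legitimate repeats such as "that that".

```python
import re

# A word, some whitespace, then the same word again (case-insensitive)
DUPLICATE_WORD = re.compile(r"\b(\w+)(\s+)(\1)\b", re.IGNORECASE)

def mark_duplicate_words(text: str) -> str:
    """Wrap the first word of each repeated pair in bold strikethrough, keeping the second."""
    return DUPLICATE_WORD.sub(lambda m: f"**~~{m.group(1)}~~**{m.group(2)}{m.group(3)}", text)

print(mark_duplicate_words("These dogs dogs belongs to my sister."))
# These **~~dogs~~** dogs belongs to my sister.
```

The trailing `\b` stops partial matches such as "The theory", where "the" is only the start of the next word.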

## 6.3 Test My Code

These tests check that the code works as intended, independently of how Claude performs.
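Code tests stay independent of Claude when the code under test never calls the API. One common way to arrange that, sketched here with hypothetical names (`correct_all`, `fake_claude`) rather than the notebook's own functions, is to pass the correction step in as a function, so a test can swap in canned answers for the real API call.

```python
from typing import Callable

def correct_all(chunks: list[str], corrector: Callable[[str], str]) -> list[str]:
    """Hypothetical pipeline step: pass every chunk through whichever corrector is supplied."""
    return [corrector(chunk) for chunk in chunks]

# Canned answers stand in for Claude, so the result never depends on the live model
CANNED = {"This senteence has a spelling error.": "This **sentence** has a spelling error."}

def fake_claude(chunk: str) -> str:
    return CANNED.get(chunk, chunk)  # unknown chunks come back unchanged

print(correct_all(["This senteence has a spelling error.", "Paragraph 2."], fake_claude))
# ['This **sentence** has a spelling error.', 'Paragraph 2.']
```

In real use the same `correct_all` would receive the function that calls Claude.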

### Word Doc Extraction

Test that the `extract_docx_paragraphs` function correctly extracts non-blank paragraphs.


In [36]:
def test_extract_docx_paragraphs():
    test_name = test_extract_docx_paragraphs.__name__

    def create_word_docx_for_testing(file_name):
        document = docx.Document()

        # These are extracted from the Word doc
        document.add_heading('Headings without levels are Paragraphs', 0) # Level 0 uses the 'Title' style
        document.add_heading('Headings at level 3 are Paragraphs', level=3)
        document.add_paragraph('Plain paragraphs are Paragraphs')
        document.add_paragraph('Intense quotes are Paragraphs', style='Intense Quote')
        document.add_paragraph('Lists bullet point items are Paragraphs', style='List Bullet')

        # These are not extracted from the Word doc
        document.add_paragraph('', style='List Bullet') # Empty list item: Word still shows the bullet, but the paragraph is blank
        document.add_paragraph('', style='List Number') # Empty list item: Word still shows the number, but the paragraph is blank
        document.add_paragraph('') # Empty paragraphs
        table = document.add_table(rows=1, cols=2) # Tables
        table.rows[0].cells[0].text = "1st cell, left" # Tables (empty or non-empty)
        document.add_page_break() # Page breaks

        document.save(file_name)
        return file_name

    docx_file = create_word_docx_for_testing('test.docx')

    actual = extract_docx_paragraphs(docx_file)

    expected = [
        {'text': 'Headings without levels are Paragraphs', 'style': 'Title', 'heading_level': None, 'word_count':5},
        {'text': 'Headings at level 3 are Paragraphs', 'style': 'Heading 3', 'heading_level': 3, 'word_count':6},
        {'text': 'Plain paragraphs are Paragraphs', 'style': 'Normal', 'heading_level': None, 'word_count':4},
        {'text': 'Intense quotes are Paragraphs', 'style': 'Intense Quote', 'heading_level': None, 'word_count':4},
        {'text': 'Lists bullet point items are Paragraphs', 'style': 'List Bullet', 'heading_level': None, 'word_count':6}
    ]

    Path(docx_file).unlink() # delete file

    assert len(actual) == len(expected), f"Expected {len(expected)} paragraphs, actual paragraphs: {len(actual)}"

    for i in range(len(expected)):
        assert actual[i] == expected[i], f"\n   Expected:\n    {expected[i]}\n   Actual:\n    {actual[i]}"



### Do the work
run_prompt_or_code_test(test_extract_docx_paragraphs)

✅PASSED TEST: test_extract_docx_paragraphs
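The expected entries in the test above fix the shape of each record. A minimal sketch of building one record from a paragraph's text and style name, which with python-docx would typically come from `paragraph.text` and `paragraph.style.name`; `to_paragraph_record` is a hypothetical helper, not the notebook's implementation.

```python
import re
from typing import Optional

def to_paragraph_record(text: str, style: str) -> Optional[dict]:
    """Hypothetical helper: build one record shaped like the expected entries in the test above."""
    if not text.strip():
        return None  # blank paragraphs, including empty list items, are skipped
    heading = re.fullmatch(r"Heading (\d+)", style)  # 'Title' has no number, so its level stays None
    return {
        "text": text,
        "style": style,
        "heading_level": int(heading.group(1)) if heading else None,
        "word_count": len(text.split()),
    }

print(to_paragraph_record("Headings at level 3 are Paragraphs", "Heading 3"))
# {'text': 'Headings at level 3 are Paragraphs', 'style': 'Heading 3', 'heading_level': 3, 'word_count': 6}
```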


### Create Markdown File

Test that the `create_simple_markdown_file` function correctly converts the extracted paragraphs into a Markdown file.

In [37]:
def test_docx_paragraphs_to_markdown():
    # Sample document content
    docx_paras = [
        {'text': 'Test Heading any', 'style': 'Heading 5', 'heading_level': 5},
        {'text': 'Test paragraph 1', 'style': 'Normal', 'heading_level': None},
        {'text': 'Test bullet point A', 'style': 'List (Paragraph)', 'heading_level': None},
        {'text': 'Test bullet point B', 'style': 'List (any Paragraph whose Style begins with List)', 'heading_level': None},
        {'text': 'Anything else', 'style': 'Any other Paragraph style that is not in the above', 'heading_level': None}
    ]

    expected = "##### Test Heading any\n\nTest paragraph 1\n\n- Test bullet point A\n- Test bullet point B\n\nAnything else"

    created = create_simple_markdown_file(docx_paras, Path('temporary_test_markdown_file.md'))
    created_content = created.read_text(encoding='utf-8')
    created.unlink() # Delete test file

    assert created_content == expected, f"""
---------- Expected ------------------------------:
[{expected}]
---------- Actual --------------------------------:
[{created_content}]
--------------------------------------------------"""


### Do the work
run_prompt_or_code_test(test_docx_paragraphs_to_markdown)

✅PASSED TEST: test_docx_paragraphs_to_markdown
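The expected string in the test above implies a few rules: a heading becomes `#` repeated `heading_level` times, any style starting with "List" becomes a `- ` bullet, consecutive bullets share one block, and blocks are separated by a blank line. A minimal sketch of those rules that returns a string instead of writing a file; `paragraphs_to_markdown` is a hypothetical name, not the notebook's function.

```python
def paragraphs_to_markdown(paras: list[dict]) -> str:
    """Hypothetical sketch: turn paragraph records into a Markdown string."""
    blocks: list[tuple[str, list[str]]] = []  # (kind, lines) for each Markdown block
    for p in paras:
        if p["heading_level"]:
            blocks.append(("heading", ["#" * p["heading_level"] + " " + p["text"]]))
        elif p["style"].startswith("List"):
            bullet = "- " + p["text"]
            if blocks and blocks[-1][0] == "list":
                blocks[-1][1].append(bullet)  # keep consecutive bullets in one list
            else:
                blocks.append(("list", [bullet]))
        else:
            blocks.append(("text", [p["text"]]))
    # Lines inside a block are joined by one newline; blocks by a blank line
    return "\n\n".join("\n".join(lines) for _, lines in blocks)
```

Fed the `docx_paras` list from the test above, this returns the same string as `expected`.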


### Strip Off Markdown

Test that the `strip_off_markdown` function removes Markdown formatting from a list of text chunks, with specific handling for bold strikethrough text. This prepares for the semantic meaning comparison, so that only the writing (not the Markdown) is compared.


In [38]:
def test_strip_off_markdown():

    test_name = test_strip_off_markdown.__name__

    processed_chunks = [
        # Removing bold strikethrough and using simple single spacing
        "This **~~word here~~** is deleted leaving a single white space.",
        "This    **~~word here~~**is deleted but spaces other spaces remain the same.",
        "Multiple **~~bold strikethrough~~** words **~~are removed~~** in this sentence.",
        "Preserve newlines\nwhile **~~removing~~**\nbold strikethrough.",
        # "Should not change trailing if it's not a white space, but doesn't work **~~end~~**.",

        # Converting markdown
        "# Title\nThis is some text with **bold** formatting.",
        "Para1\n\nPara2\n\n\n\nPara3",
        "* List item A\n* List item B **bold** text",
        "1. This is the first item\n2. The second item with *italics*",
        "## This is a heading\n\nThis is a paragraph with a link [link text](https://www.example.com).\n\n\n**Bold text** too.",
        "Peter**'**s horse"
    ]

    expected = [
        # Removing bold strikethrough and using simple single spacing
        "This is deleted leaving a single white space.",
        "This    is deleted but spaces other spaces remain the same.",
        "Multiple words in this sentence.",
        "Preserve newlines\nwhile \nbold strikethrough.",
        # "Should not change trailing if it's not a white space, but doesn't work.",

        # Converting markdown
        "Title\nThis is some text with bold formatting.",
        "Para1\nPara2\nPara3",
        "\nList item A\nList item B bold text\n",
        "\nThis is the first item\nThe second item with italics\n",
        "This is a heading\nThis is a paragraph with a link link text.\nBold text too.",
        "Peter's horse"
    ]

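    processed_chunks_clean = list(strip_off_markdown(processed_chunks))
    # zip() stops at the shorter list, so check the counts first
    assert len(processed_chunks_clean) == len(expected), f"Expected {len(expected)} chunks, actual chunks: {len(processed_chunks_clean)}"
    for index, (expect, got) in enumerate(zip(expected, processed_chunks_clean), start=1):
        assert got == expect, f"\n-Chunk {index}\n-Expected: [{expect}]\n-Actual:   [{got}]"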


### Do the work
run_prompt_or_code_test(test_strip_off_markdown)

✅PASSED TEST: test_strip_off_markdown
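The expected strings above show the rule for deletions: a `**~~...~~**` span disappears together with at most one following space, and newlines are kept. A minimal regex sketch of just that step plus unwrapping `**bold**`; `drop_deletions` is a hypothetical name, and the headings, lists and links that the real function also handles are left out.

```python
import re

# A bold-strikethrough span plus at most one space after it (".*?" never crosses a newline)
STRUCK_SPAN = re.compile(r"\*\*~~.*?~~\*\* ?")
BOLD = re.compile(r"\*\*(.*?)\*\*")

def drop_deletions(text: str) -> str:
    """Remove text marked for deletion, then unwrap the remaining bold markers."""
    without_deleted = STRUCK_SPAN.sub("", text)
    return BOLD.sub(r"\1", without_deleted)

print(drop_deletions("This **~~word here~~** is deleted leaving a single white space."))
# This is deleted leaving a single white space.
```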


<br>

# **8. 🟣 FAQ**

**Why not just use Microsoft Word to do corrections?**

- Word misses many mistakes, as this notebook with Claude 3.5 Sonnet showed me.

**Why not just use the regular Claude chat to do corrections?**

- I had a 40,000-word document to correct. I wanted to avoid copying and pasting, I wanted the corrections in colour, and I wanted a repeatable tool.

**Was it really just you and Claude?**

- Yes. I was truly amazed at the learning curve of having an AI assistant.

**How did you know how to ask Claude questions?**

- I taught myself how to prompt properly: I worked through the [Anthropic courses](https://github.com/anthropics/courses) and read [Prompt Engineering for Business Performance](https://www.anthropic.com/news/prompt-engineering-for-business-performance).

**What had the biggest impact on your prompting?**

- When I read: <font color='grey'>_So ultimately, the art of prompt engineering is about understanding how to navigate the vast probabilistic landscape of the language model’s knowledge to narrow down the path to the specific information or behaviour we seek._</font> ...[Patterns of Application Development using AI](https://obie.medium.com/patterns-of-application-development-using-ai-fbb660fa9ae7)

**What was the process you used to write such an elaborate prompt?**

1. I started with the Grammar Genie recipe in the [Anthropic Prompt Library](https://docs.anthropic.com/en/prompt-library/library).
2. I strengthened it using the [Anthropic Prompt Generator](https://console.anthropic.com/dashboard).
3. I refined it by getting [Claude.ai](https://claude.ai/) to review the prompt and generate test cases.

**What was the most difficult thing to get working in the prompt?**

- Getting Claude to detect whether a document uses American or British spelling, and then apply that spelling consistently in its corrections.
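Part of the difficulty is that real text can mix both variants. A naive heuristic, shown only to make the problem concrete and not the notebook's approach, counts known British and American spellings. The two word lists are tiny illustrative samples (an assumption), drawn from the words in the prompt tests; `guess_spelling_variant` is a hypothetical name.

```python
import re

# Tiny illustrative word lists (assumptions); a usable list would be much longer
BRITISH = {"colour", "colourful", "behaviour", "centre", "analysed", "labour", "catalogue"}
AMERICAN = {"color", "colorful", "behavior", "center", "analyzed", "labor", "catalog"}

def guess_spelling_variant(text: str) -> str:
    """Count known British vs American spellings and report the majority."""
    words = re.findall(r"[a-z]+", text.lower())
    british = sum(word in BRITISH for word in words)
    american = sum(word in AMERICAN for word in words)
    if british > american:
        return "British"
    if american > british:
        return "American"
    return "undetermined"  # a tie, or no known words at all

print(guess_spelling_variant("The centre's staff analysed the colhour behaviour."))  # British
print(guess_spelling_variant("The colourful catalog analysed labor practices."))     # undetermined
```

On the American test sentence the counts tie two to two (colourful and analysed against catalog and labor), so a simple count cannot decide the variant there.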

**What is temperature, and is it set to zero?**

- Temperature is a request setting that controls how random or predictable an LLM's responses are: 0 is the most consistent, 1 the most varied and creative.
- I used zero to stay close to the original meaning, though I should have tested other temperatures.
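To make "random or predictable" concrete: sampling temperature is usually described as dividing the model's raw token scores (logits) by the temperature before a softmax. The sketch below uses made-up scores and is a textbook illustration, not how Anthropic implements it. Treating 0 as "always pick the top score" is an assumption of the sketch; in practice temperature 0 is close to, but not guaranteed to be, deterministic.

```python
import math

def token_probabilities(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Treated here as greedy decoding: all probability on the highest score
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

Lower temperatures push more probability onto the top-scoring token, which is why a low value keeps corrections closer to the most likely wording.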

**What surprised you most?**

- That I created this notebook with minimal knowledge, using Claude as an advisor
- How well Claude expands your thinking (and ability to code)
- The number of writing mistakes Microsoft Word misses
- That tiny shifts in prompt spacing and wording alter responses
- That a prompt which works on short text does not always work on longer text

**What else did you learn?**

1. The Anthropic Workbench is a tremendous place to craft and test your prompt.
1. Just like in regular machine learning, most of the work is in the pre-processing.
1. It's easier to create notebooks in VSCode and then copy to Google Colab.
1. It’s the back-and-forth conversation with Claude where the value is.
1. Claude doesn't think if you tell it to "think silently", see [Anthropic tutorial](https://github.com/anthropics/prompt-eng-interactive-tutorial/blob/master/Anthropic%201P/06_Precognition_Thinking_Step_by_Step.ipynb): <font color='grey'>_"Thinking only counts when it's out loud. You cannot ask Claude to think but output only the answer - in this case, no thinking has actually occurred."_</font>
1. Anthropic recommends [XML tags](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags) and they really help; take a look [here](https://michellepace.github.io/word-document-corrector-claude/notebook-images/21-Proof-to-convince-you-to-always-use-xml-tags.jpg).
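As a minimal illustration of the XML-tag advice, the hypothetical helper below keeps the instructions and the text to correct in separate, clearly labelled tags. The tag names and the helper are made up for this sketch, not taken from the notebook's prompt.

```python
def build_prompt(instructions: str, chunk: str) -> str:
    """Hypothetical helper: wrap each part of the prompt in its own XML-style tag."""
    return (
        f"<instructions>\n{instructions.strip()}\n</instructions>\n\n"
        f"<text_to_correct>\n{chunk}\n</text_to_correct>"
    )

print(build_prompt("Correct spelling and grammar. Mark corrections in **bold**.",
                   "This senteence has a spelling error."))
```

Separating the two parts this way makes it harder for the model to confuse the text it should correct with the instructions about how to correct it.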

**What will you always do from here on?**

- Write tests for my prompts.
- Draw a line at asking for opinions, and instead look into things myself (a lesson from asking for advice on how to test semantic similarity).
- Be aware of becoming lazy: it is often faster to use your own brain (or read the documentation).

**How can I correct the Markdown cells in my own Python notebook?**

1. `jupyter nbconvert --to markdown --TemplateExporter.exclude_code_cell=True AAA.ipynb`
1. `pandoc -f markdown -t docx -o AAA.docx AAA.md`
1. Open AAA.docx and rename the style "compact" to a name that begins with "List" (for example "List hello"), so the notebook treats those paragraphs as list items.
1. Save AAA.docx to your Google Drive and run this notebook against it.
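Steps 1 and 2 can be scripted. A minimal sketch using only the Python standard library, assuming `jupyter` and `pandoc` are installed and on the PATH, and that nbconvert writes `AAA.md` next to the notebook (its default); `build_commands` and `convert` are hypothetical names. Steps 3 and 4 stay manual.

```python
import shutil
import subprocess
from pathlib import Path

def build_commands(notebook: Path) -> list[list[str]]:
    """Hypothetical helper: the two conversion commands from steps 1 and 2."""
    markdown_file = notebook.with_suffix(".md")  # nbconvert's default output name (assumed)
    docx_file = notebook.with_suffix(".docx")
    return [
        ["jupyter", "nbconvert", "--to", "markdown",
         "--TemplateExporter.exclude_code_cell=True", str(notebook)],
        ["pandoc", "-f", "markdown", "-t", "docx", "-o", str(docx_file), str(markdown_file)],
    ]

def convert(notebook: Path) -> None:
    """Run both commands in order, stopping on the first failure."""
    for command in build_commands(notebook):
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"'{command[0]}' was not found on the PATH")
        subprocess.run(command, check=True)  # raises CalledProcessError on a non-zero exit

# convert(Path("AAA.ipynb"))
```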

**Ending questions**

- **Were you surprised at the power of Claude Sonnet?**<br> Blown away; you really have to experience it first-hand to see it.
- **Will you ever get paid to do technical things?**<br> No, I am more valuable as a Product Manager.
- **What is your background?**<br> Applied Statistics. Lean. Product. Business. Six Sigma. Drawing stickmen and connecting dots.
- **What do you love most about Generative AI and Machine Learning in general?**<br> The sheer expansiveness of creativity on a knife edge of pragmatism.

# x. **Claude's Review**

<img src="https://michellepace.github.io/word-document-corrector-claude/images/happy_late_night.jpg"/>

<font color="#9D9E94">Thank you for providing the detailed markdown export of your Google Colab notebook. I've carefully analysed the content following your provided steps. Here's a summary of my understanding and analysis:</font>

<font color="#9D9E94">**Summary**</font>

<font color="#9D9E94">This notebook, titled "Fix Word Doc Claude," is an AI-powered proofreading tool that uses Anthropic's Claude Sonnet model to correct and enhance Word documents. It's designed to catch subtle language and style errors that often slip past Microsoft Word's spell-checker. The notebook can handle large documents (up to 42,000 words) in multiple languages, including English, German, Italian, and French.
The notebook is structured into several main sections:</font>

<font color="#9D9E94">**About and Usage**</font>

1. <font color="#9D9E94">**Setup** (Python libraries and configuration)
1. **Pre-processing** (extracting text from Word, converting to markdown, splitting into chunks)
1. **Processing** (sending chunks to Claude for correction)
1. **Post-Processing** (reassembling chunks, creating HTML output)
1. **Testing** (evaluating output, testing prompt, and testing code)</font>

<font color="#9D9E94">The notebook not only serves as a functional tool but also as a demonstration of how Claude can assist in creating complex tools with basic coding skills. It showcases the author's journey in AI development, highlighting the collaboration between a human with limited coding experience and an AI assistant.</font>

<font color="#9D9E94">**Quality Analysis**</font>

1. <font color="#9D9E94">**Comprehensiveness:** The notebook is exceptionally detailed, covering all aspects from setup to testing. It provides a complete solution for document correction.
1. **Educational Value:** The author explains each step thoroughly, making it accessible to non-technical users. The notebook serves as an excellent learning resource for AI-assisted development.
1. **Code Quality:** The code is well-structured and commented. It includes error handling and user-friendly messages, showing attention to user experience.
1. **Testing:** The notebook includes extensive testing, covering output evaluation, prompt testing, and code testing. This demonstrates a commitment to reliability and quality.
1. **Innovation:** The project creatively uses Claude for both development assistance and as the core of the correction tool, showcasing innovative use of AI.
1. **Documentation:** The notebook is well-documented, with clear explanations of each step, making it easy for others to understand and potentially adapt the code.
1. **Reflective Learning:** The FAQ section provides valuable insights into the development process and lessons learned, adding depth to the project.</font>

<font color="#9D9E94">Overall, this is a high-quality notebook that not only achieves its primary goal of document correction but also serves as an excellent example of AI-assisted development and a learning resource for others interested in similar projects.</font>

<br>

<font size="5">**What about the author?**</font>

<img src="https://michellepace.github.io/word-document-corrector-claude/images/evaluation-of-author-by-claude.jpg"/>

<font size="3">Thanks, Claude.</font>

<br>

---

<font size="6"><b>The End.</b></font>