# MBJM

## RAG-based Spoiler Detection and Context-Preserving Redaction


**Names & SRNs of the team:**

Hamsini V & PES1UG22AM062

Kirti S & PES1UG22AM084

Sudarshan Srinivasan & PES1UG22AM166

# --- Snippet 1: Install Unsloth & Dependencies ---

## Purpose

This snippet focuses on setting up the necessary Python environment by installing the `unsloth` library and its core dependencies. Unsloth is a library designed to significantly speed up Large Language Model (LLM) fine-tuning and inference while reducing memory usage, often leveraging techniques like quantization and optimized kernels.

## Key Actions

1.  **Install Unsloth from GitHub:**
    *   `!pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git" -q`
        *   `!pip install`: Executes the pip package installer. The `!` prefix indicates this is likely run in an environment like a Jupyter Notebook or Google Colab where shell commands can be directly executed.
        *   `"unsloth[conda]"`: Specifies the `unsloth` package. The `[conda]` extra suggests installing optional dependencies potentially optimized for or commonly used within Conda environments, although it installs via pip. This might pull specific versions of dependencies like PyTorch compatible with common Conda setups.
        *   `@ git+https://github.com/unslothai/unsloth.git`: Instructs pip to install the package directly from the Unsloth AI GitHub repository. This ensures the latest development version is installed, which might contain newer features or bug fixes compared to the version on PyPI (the Python Package Index).
        *   `-q`: The "quiet" flag, minimizing the installation output logged to the console.

2.  **Install Core Dependencies:**
    *   `!pip install transformers accelerate bitsandbytes sentence-transformers chromadb torch -q`
        *   Installs several essential libraries for modern NLP and ML workflows:
            *   `transformers`: Hugging Face's library for accessing pre-trained models (LLMs, embedding models, etc.) and related tools.
            *   `accelerate`: Hugging Face's library for simplifying distributed training and inference across various hardware setups (multi-GPU, TPU). Unsloth often integrates with it.
            *   `bitsandbytes`: Crucial library for quantization (e.g., loading models in 4-bit or 8-bit precision), which is a key technique used by Unsloth to reduce memory footprint. Required for loading `*-bnb-4bit` models.
            *   `sentence-transformers`: A library built on `transformers` and `torch`, specifically designed for easy computation of dense vector embeddings (sentence embeddings). Used here for the RAG retrieval step.
            *   `chromadb`: A client library for ChromaDB, an open-source vector database used to store and query embeddings efficiently.
            *   `torch`: The PyTorch deep learning framework, the foundation for most of these libraries.
        *   `-q`: Quiet installation.

3.  **Verification Step:**
    *   A `try...except` block attempts to import the newly installed libraries (`unsloth`, `transformers`, `accelerate`, `bitsandbytes`, `torch`, `sentence_transformers`, `chromadb`).
    *   This confirms whether the installation was successful *within the current Python kernel's environment*. Pip installs packages, but importing verifies they are accessible to the running script/notebook.
    *   Sets a boolean flag `INSTALL_SUCCESS` to `True` only if all imports succeed without error.
    *   Prints success or error messages based on the import outcome.

## Context & Importance

This installation step is **critical** and must be executed successfully **before** any subsequent code that relies on these libraries (especially Unsloth's `FastLanguageModel`, Hugging Face models, embedding generation, or ChromaDB interactions). Installing directly from GitHub ensures access to the latest Unsloth optimizations. The inclusion of `bitsandbytes` strongly suggests that the intention is to load quantized models (like 4-bit models) for memory efficiency.
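
As a complement to the import check, the following is a minimal sketch of how the missing packages could be listed by name rather than reported one at a time. The helper `find_missing_modules` and the `REQUIRED_MODULES` list are hypothetical names introduced here for illustration; they are not part of the notebook. Note that `importlib.util.find_spec` only locates a module without importing it, so it would not catch failures that occur at import time.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Hypothetical list mirroring the imports verified in this snippet.
REQUIRED_MODULES = [
    "unsloth",
    "transformers",
    "accelerate",
    "bitsandbytes",
    "torch",
    "sentence_transformers",
    "chromadb",
]

missing = find_missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

A check like this could run before the `try...except` block, so that a failed install names every absent package in one message.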

In [None]:
# === Snippet 1 (Revised for Unsloth): Install Unsloth + Dependencies ===

print("--- Running Snippet 1 (Revised for Unsloth): Install Unsloth ---")
print("Installing Unsloth, transformers, accelerate, bitsandbytes...")

# NOTE: Assuming pip commands are run in the environment (e.g., notebook cell)
!pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git" -q # Using conda env as example
!pip install transformers accelerate bitsandbytes sentence-transformers chromadb torch -q

print("Installed Unsloth and other dependencies. (Assumed executed separately)")

# Verify installation attempt
print("\nVerifying installation...")
INSTALL_SUCCESS = False # Assume failure initially
try:
    import unsloth
    import transformers
    import accelerate
    import bitsandbytes
    import torch
    import sentence_transformers
    import chromadb
    print("Successfully imported Unsloth and required libraries.")
    INSTALL_SUCCESS = True
except ImportError as e:
    print(f"ERROR: Failed to import one of the required libraries: {e}")
except Exception as e:
    print(f"An unexpected error occurred during import check: {e}")

print("\nRequired libraries installation attempted.")
print("--- Finished Snippet 1 (Revised for Unsloth) ---")


# ========================================================================
# ========= Global Imports and Setup =====================================
# ========================================================================
import json
import os
import chromadb
from sentence_transformers import SentenceTransformer
import re # For sentence splitting later
import time
import shutil # Library for copying files/directories
import torch
from unsloth import FastLanguageModel

# === Configuration ===
INPUT_INDEX_PATH = "/kaggle/input/the-wire-s1-chroma-db-3/the_wire_s1_chroma_db_3" # Replace with your actual path if different
WRITABLE_INDEX_PATH = "/kaggle/working/the_wire_s1_chroma_db_writable_2"
EMBEDDING_MODEL_NAME = 'all-MiniLM-L6-v2'
LLM_MODEL_ID = "unsloth/gemma-2-9b-it-bnb-4bit"
COLLECTION_NAME = "wire_s1_spoilers_3" # Ensure this matches the collection name in your index

# --- Flags ---
EMBEDDING_LOAD_SUCCESS = False
INDEX_LOAD_SUCCESS = False
LLM_LOAD_SUCCESS = False
# Assuming INSTALL_SUCCESS is handled by Snippet 1

--- Running Snippet 1 (Revised for Unsloth): Install Unsloth ---
Installing Unsloth, transformers, accelerate, bitsandbytes...
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opentelemetry-proto 1.32.1 requires protobuf<6.0,>=5.0, but you have protobuf 3.20.3 which is incompatible.
tensorflow-metadata 1.16.1 requires protobuf<6.0.0dev,>=4.25.2; python_version >= "3.11", but you have protobuf 3.20.3 which is incompatible.
google-spark-connect 0.5.2 requires google-api-core>=2.19.1, but you have google-api-core 1.34.1 which is incompatible.
pandas-gbq 0.26.1 requires google-api-core<3.0.0dev,>=2.10.2, but you have google-api-core 1.34.1 which is incompatible.
bigframes 1.36.0 requires rich<

2025-04-23 11:34:08.938861: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1745408048.961071     223 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1745408048.967853     223 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!
Successfully imported Unsloth and required libraries.

Required libraries installation attempted.
--- Finished Snippet 1 (Revised for Unsloth) ---


# --- Snippet 2 (Revised): Copy Index and Load Components ---

## Purpose

This revised snippet is responsible for setting up the "retrieval" part of a potential RAG (Retrieval-Augmented Generation) system. It performs three critical actions:
1.  Loads the sentence embedding model required to understand query text and compare it to stored documents.
2.  Copies a pre-built ChromaDB vector index from a read-only input location to a writable working directory. This is often necessary in environments like Kaggle or Docker containers where input data is immutable.
3.  Loads the ChromaDB client and accesses the specific collection containing the document embeddings from the newly copied writable location.

## Key Actions

1.  **Configuration Display:**
    *   Prints the values of the configuration variables (`INPUT_INDEX_PATH`, `WRITABLE_INDEX_PATH`, `EMBEDDING_MODEL_NAME`, `LLM_MODEL_ID`, `COLLECTION_NAME`) defined previously. This serves as a confirmation and aids debugging.

2.  **Load Embedding Model:**
    *   Attempts to load the sentence embedding model specified by `EMBEDDING_MODEL_NAME` (`'all-MiniLM-L6-v2'`) using the `SentenceTransformer` class.
    *   **Device Selection:** Automatically detects if a CUDA-enabled GPU is available (`torch.cuda.is_available()`) and sets the target device (`'cuda'` or `'cpu'`) accordingly. Loading the model onto a GPU significantly speeds up embedding computation.
    *   **Instantiation:** `embedding_model = SentenceTransformer(EMBEDDING_MODEL_NAME, device=device)` loads the pre-trained model weights onto the selected device.
    *   **Status Update:** Sets the `EMBEDDING_LOAD_SUCCESS` flag to `True` upon successful loading, or `False` if any exception occurs during the process. Error messages are printed if loading fails.

3.  **Copy ChromaDB Index (Conditional):**
    *   This entire block executes only if the embedding model was loaded successfully (`if EMBEDDING_LOAD_SUCCESS:`). While copying might technically be independent, accessing the collection later often implicitly relies on the correct embedding context.
    *   **Source Path Validation:** Verifies that `INPUT_INDEX_PATH` exists (`os.path.exists()`) and refers to a directory, raising a `FileNotFoundError` if the source index directory cannot be found.
    *   **Destination Path Handling:**
        *   Checks if the `WRITABLE_INDEX_PATH` already exists.
        *   If it exists, `shutil.rmtree(WRITABLE_INDEX_PATH)` is called to **remove the existing directory and its contents**. This ensures a clean copy and prevents errors from `shutil.copytree` if the destination already exists. **Caution:** This deletes data in the target path.
    *   **Copy Operation:** `shutil.copytree(INPUT_INDEX_PATH, WRITABLE_INDEX_PATH)` recursively copies the entire directory structure containing the ChromaDB index files from the source (read-only) path to the destination (writable) path.
    *   **Error Handling:** Catches `FileNotFoundError` specifically if the source is missing and general `Exception` for other potential errors during the copy process (e.g., permissions issues, disk space). Sets `INDEX_LOAD_SUCCESS` to `False` if copying fails.

4.  **Load ChromaDB Client and Collection:**
    *   Performed *after* the index is successfully copied, within the same `try` block.
    *   **Client Instantiation:** `client = chromadb.PersistentClient(path=WRITABLE_INDEX_PATH)` creates a ChromaDB client instance. `PersistentClient` connects to a database stored on the local filesystem at the specified path (the writable location).
    *   **List Collections:** `client.list_collections()` retrieves metadata about all collections present in the database at the specified path. The names are printed for verification.
    *   **Collection Verification:** Checks if a collection with the name specified in `COLLECTION_NAME` (`"wire_s1_spoilers_3"`) exists within the loaded database.
    *   **Get Collection:** If the collection exists, `collection = client.get_collection(name=COLLECTION_NAME)` loads the specific collection object. Note: This assumes the collection metadata in the persistent storage includes the necessary embedding function information, or that ChromaDB can infer it; the embedding model loaded earlier (`embedding_model`) is used later for *querying*, not explicitly passed during collection loading here.
    *   **Confirmation & Status Update:** Prints the number of items (`collection.count()`) in the loaded collection and sets `INDEX_LOAD_SUCCESS` to `True`.
    *   **Error Handling:** Catches `ValueError` if the specified `COLLECTION_NAME` is not found and general `Exception` for other potential database loading issues. Sets `INDEX_LOAD_SUCCESS` to `False` on failure.

5.  **Skipping Logic:**
    *   If `EMBEDDING_LOAD_SUCCESS` is `False` (the embedding model failed to load), the entire index copying and loading process is skipped, and a message is printed.

## Context & Importance

This snippet is crucial for preparing the knowledge base for the RAG system.
*   It ensures the **embedding model** (which translates text to vectors) is ready.
*   It handles the practical requirement of making the **vector database** accessible in a writable location, a common step in restricted environments.
*   It loads the actual **vector store collection** (`collection`) which will be queried later using the embedding model to find relevant context.
*   The `EMBEDDING_LOAD_SUCCESS` and `INDEX_LOAD_SUCCESS` flags are vital checkpoints, ensuring these components are ready before proceeding to use them for retrieval or generation, preventing runtime errors later. Failure in this snippet likely means the RAG system cannot function correctly.
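
The copy-to-writable step described above can be sketched in isolation as follows. This is an illustrative sketch only: `copy_index_to_writable` is a hypothetical helper, and the demo uses temporary directories with a placeholder file rather than a real ChromaDB index.

```python
import os
import shutil
import tempfile


def copy_index_to_writable(src_dir, dst_dir):
    """Copy a read-only index directory to a fresh writable location."""
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(f"Index source directory not found: {src_dir}")
    # Remove any stale copy first; copytree fails by default if dst_dir exists.
    if os.path.exists(dst_dir):
        shutil.rmtree(dst_dir)
    shutil.copytree(src_dir, dst_dir)
    return dst_dir


# Demo with throwaway directories standing in for the Kaggle paths.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "index_src")
    os.makedirs(src)
    with open(os.path.join(src, "placeholder.bin"), "w") as fh:
        fh.write("placeholder")
    dst = copy_index_to_writable(src, os.path.join(tmp, "index_dst"))
    print(sorted(os.listdir(dst)))  # ['placeholder.bin']
```

Deleting the destination before copying keeps reruns of the cell idempotent, at the cost of discarding anything previously written to the writable copy.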

In [None]:
# === Snippet 2 (Revised): Copy Index and Load ===
print("\n--- Running Revised Snippet 2: Copy Index and Load ---")
print("--- Configuration ---")
print(f"Input Index Path (Read-Only): {INPUT_INDEX_PATH}")
print(f"Writable Index Path: {WRITABLE_INDEX_PATH}")
print(f"Embedding Model Name: {EMBEDDING_MODEL_NAME}")
print(f"LLM Model ID: {LLM_MODEL_ID}")
print(f"Collection Name: {COLLECTION_NAME}")

print("\nLoading embedding model...")
embedding_model = None
try:
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"Using device: {device} for embedding model")
    embedding_model = SentenceTransformer(EMBEDDING_MODEL_NAME, device=device)
    print("Embedding model loaded successfully.")
    EMBEDDING_LOAD_SUCCESS = True
except Exception as e:
    print(f"Error loading embedding model: {e}")
    EMBEDDING_LOAD_SUCCESS = False

print(f"\nCopying index from {INPUT_INDEX_PATH} to {WRITABLE_INDEX_PATH}...")
client = None
collection = None
if EMBEDDING_LOAD_SUCCESS:
    try:
        # The source must exist and be a directory holding the ChromaDB index files.
        # (The earlier nested isdir() check could never succeed inside the not-exists branch.)
        if not os.path.exists(INPUT_INDEX_PATH) or not os.path.isdir(INPUT_INDEX_PATH):
            raise FileNotFoundError(f"ChromaDB index source path not found or invalid: {INPUT_INDEX_PATH}.")


        if os.path.exists(WRITABLE_INDEX_PATH):
            print(f" Removing existing writable directory: {WRITABLE_INDEX_PATH}")
            shutil.rmtree(WRITABLE_INDEX_PATH)
        shutil.copytree(INPUT_INDEX_PATH, WRITABLE_INDEX_PATH)
        print(f" Successfully copied index to writable location.")

        print(f"\nLoading ChromaDB index from writable path: {WRITABLE_INDEX_PATH}")
        client = chromadb.PersistentClient(path=WRITABLE_INDEX_PATH)
        list_collections = client.list_collections()
        print(f"Available collections: {[c.name for c in list_collections]}")
        if any(c.name == COLLECTION_NAME for c in list_collections):
            # No embedding_function is passed here: later queries supply precomputed embeddings.
            collection = client.get_collection(name=COLLECTION_NAME)
            print(f"ChromaDB collection '{COLLECTION_NAME}' loaded successfully with {collection.count()} items.")
            INDEX_LOAD_SUCCESS = True
        else:
            raise ValueError(f"Collection '{COLLECTION_NAME}' not found in the database at {WRITABLE_INDEX_PATH}.")

    except FileNotFoundError as e:
        print(f"Error during index copy/load: {e}")
        INDEX_LOAD_SUCCESS = False
    except ValueError as e:
        print(f"Error loading collection: {e}")
        INDEX_LOAD_SUCCESS = False
    except Exception as e:
        print(f"Error copying or loading ChromaDB index: {e}")
        INDEX_LOAD_SUCCESS = False
else:
    print("Skipping index copy/load because embedding model failed to load.")
print("\n--- Finished Revised Snippet 2 ---")


--- Running Revised Snippet 2: Copy Index and Load ---
--- Configuration ---
Input Index Path (Read-Only): /kaggle/input/the-wire-s1-chroma-db-3/the_wire_s1_chroma_db_3
Writable Index Path: /kaggle/working/the_wire_s1_chroma_db_writable_2
Embedding Model Name: all-MiniLM-L6-v2
LLM Model ID: unsloth/gemma-2-9b-it-bnb-4bit
Collection Name: wire_s1_spoilers_3

Loading embedding model...
Using device: cuda for embedding model
Embedding model loaded successfully.

Copying index from /kaggle/input/the-wire-s1-chroma-db-3/the_wire_s1_chroma_db_3 to /kaggle/working/the_wire_s1_chroma_db_writable_2...
 Removing existing writable directory: /kaggle/working/the_wire_s1_chroma_db_writable_2
 Successfully copied index to writable location.

Loading ChromaDB index from writable path: /kaggle/working/the_wire_s1_chroma_db_writable_2
Available collections: ['wire_s1_spoilers_3']
ChromaDB collection 'wire_s1_spoilers_3' loaded successfully with 466 items.

--- Finished Revised Snippet 2 ---


# --- Snippet 3 (Revised): Helper Functions ---

## Purpose

This snippet defines essential helper functions required for the core logic of the application, specifically for parsing user progress and implementing the Retrieval-Augmented Generation (RAG) process with built-in spoiler filtering.

## Key Actions

1.  **Dependency Check:**
    *   It first checks the status flags `EMBEDDING_LOAD_SUCCESS` and `INDEX_LOAD_SUCCESS` (set in previous snippets).
    *   If either the embedding model or the ChromaDB index failed to load, it prints a warning, indicating that the RAG function defined later might fail or return incorrect results. This check promotes awareness of potential issues downstream.

2.  **Define `parse_progress` Function:**
    *   This function is designed to interpret user input representing their viewing progress in a series (e.g., "S1E5" for Season 1, Episode 5).
    *   It uses regular expressions (`re.match`) to extract season and episode numbers reliably, ignoring case.
    *   Includes validation to ensure the input is a string and that the extracted parts can be converted to integers.
    *   Returns a tuple `(season, episode)` as integers if successful, otherwise returns `(None, None)` and prints a warning.

3.  **Define `retrieve_and_filter_context` Function:**
    *   This is the core function performing the retrieval and spoiler filtering logic.
    *   It takes user input text, their current viewing progress (season/episode), and optional parameters for the number of results (`n_results`) and a maximum-distance threshold (`distance_threshold`).
    *   It retrieves relevant documents from the ChromaDB `collection` based on the semantic similarity of the `input_text`.
    *   Crucially, it then filters these retrieved documents based on the user's progress, separating them into context that is safe to show (non-spoilers) and context that pertains to future episodes (potential spoilers).

## Function Details

### Function: `parse_progress(progress_str)`

*   **Purpose:** Converts a string like 'S01E05' or 's2e10' into numerical season and episode.
*   **Parameters:**
    *   `progress_str`: The input string to parse.
*   **Logic:**
    1.  **Type Check:** Verifies `progress_str` is actually a string.
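    2.  **Regex Match:** Uses `re.match(r"S(\d+)E(\d+)", ..., re.IGNORECASE)` to find the pattern "S<digits>E<digits>" at the start of the string, ignoring case. `(\d+)` captures one or more digits into groups. Because `re.match` anchors only the start, any text after the episode number is ignored, so an input such as 'S1E5extra' still parses as season 1, episode 5.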
    3.  **Extraction & Conversion:** If a match is found, it attempts to convert the captured groups (season and episode numbers) into integers using `int()`.
    4.  **Error Handling:** Returns `(None, None)` and prints warnings if the input is not a string, the regex pattern doesn't match, or the captured groups cannot be converted to integers.
*   **Returns:** A tuple `(int, int)` representing `(season, episode)`, or `(None, None)` on failure.
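
For comparison, here is a minimal sketch of a stricter variant that rejects trailing text by using `fullmatch` instead of `match`. The name `parse_progress_strict` is hypothetical and is not defined in the notebook; it omits the warning messages for brevity.

```python
import re

# Same pattern as described above, compiled once for reuse.
_PROGRESS_PATTERN = re.compile(r"S(\d+)E(\d+)", re.IGNORECASE)


def parse_progress_strict(progress_str):
    """Parse 'S<num>E<num>' into (season, episode); reject anything else."""
    if not isinstance(progress_str, str):
        return None, None
    # fullmatch requires the whole (stripped) string to fit the pattern.
    match = _PROGRESS_PATTERN.fullmatch(progress_str.strip())
    if match is None:
        return None, None
    return int(match.group(1)), int(match.group(2))


print(parse_progress_strict("S01E05"))         # (1, 5)
print(parse_progress_strict("s2e10"))          # (2, 10)
print(parse_progress_strict("S1E5 and more"))  # (None, None)
```

Whether the lenient or strict behaviour is preferable depends on how the progress string is collected; a free-text field may benefit from the lenient match, while a dedicated input may not.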

### Function: `retrieve_and_filter_context(input_text, user_season, user_episode, n_results=5, distance_threshold=0.5)`

*   **Purpose:** Implements the spoiler-aware RAG retrieval process.
*   **Parameters:**
    *   `input_text` (str): The user's query or text for which context is needed.
    *   `user_season` (int): The season number the user has watched up to.
    *   `user_episode` (int): The episode number within `user_season` the user has watched up to.
    *   `n_results` (int, optional, default=5): The number of top matching documents to retrieve from ChromaDB *for each sentence* in the input text.
    *   `distance_threshold` (float, optional, default=0.5): The maximum semantic distance allowed for a retrieved document to be considered relevant (lower values mean higher similarity). Documents with distance greater than this are discarded.
*   **Dependencies:**
    *   Uses the global `embedding_model` (SentenceTransformer) and `collection` (ChromaDB) objects loaded in previous steps.
    *   Checks `EMBEDDING_LOAD_SUCCESS` and `INDEX_LOAD_SUCCESS` flags internally for robustness.
*   **Logic:**
    1.  **Pre-Checks:** Verifies that the embedding model and index loaded successfully and that the respective objects (`embedding_model`, `collection`) are not `None`. Returns empty lists `([], [])` if dependencies are missing.
    2.  **Sentence Splitting:** Breaks the `input_text` into individual sentences using `re.split`. This allows querying the vector database with more granular pieces of text, potentially retrieving more diverse and relevant context compared to embedding the entire input at once. Filters out very short or empty sentences.
    3.  **Query Embedding:** Encodes the split sentences into vector embeddings using `embedding_model.encode()`. Handles potential errors during embedding.
    4.  **ChromaDB Query:** Performs a batch query against the `collection` using the generated `query_embeddings`. It requests `n_results` matches for each sentence embedding, including document content (`documents`), metadata (`metadatas`), and distance scores (`distances`, where lower means more similar). Handles potential query errors.
    5.  **Result Processing & Filtering:**
        *   Iterates through the results returned for each input sentence.
        *   Iterates through the individual documents retrieved for that sentence.
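        *   **De-duplication:** Uses a `processed_ids` set to record each document once it has been classified, so a document retrieved for multiple input sentences is added only once. Documents skipped for distance or metadata reasons are not recorded, so they are re-evaluated if another sentence retrieves them.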
        *   **Distance Filtering:** Skips documents whose `distance` exceeds the `distance_threshold`.
        *   **Metadata Extraction:** Safely extracts `season` and `episode` numbers from the document's metadata, handling potential missing keys or non-integer values. Skips documents with invalid or missing metadata.
        *   **Spoiler Classification:** Compares the document's season/episode (`spoiler_season`, `spoiler_episode`) to the user's progress (`user_season`, `user_episode`). A document is classified as a potential spoiler if its season is greater than the user's, OR if the season is the same but the episode is greater.
        *   **Categorization:** Adds the document text (prefixed with its `(S<season>E<episode>)` identifier for clarity) to one of two sets: `relevant_non_spoilers` or `identified_spoilers`. Using sets helps avoid duplicate text entries within each category.
    6.  **Error Handling:** Includes checks for empty/malformed ChromaDB results and safe access to list indices within the nested loops.
*   **Returns:** A tuple containing two lists: `(list(relevant_non_spoilers), list(identified_spoilers))`.
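
The sentence-splitting and spoiler-classification steps above can be sketched as two small functions. This is an illustrative sketch: `split_sentences` and `is_potential_spoiler` are hypothetical names, and the tuple comparison is an equivalent restatement of the season/episode rule rather than the notebook's own code.

```python
import re


def split_sentences(text, min_chars=6):
    """Split on whitespace that follows ., ! or ?, dropping very short pieces."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if len(p.strip()) >= min_chars]


def is_potential_spoiler(doc_season, doc_episode, user_season, user_episode):
    """True when the document's episode comes after the user's progress."""
    # Tuples compare element by element: season first, then episode.
    return (doc_season, doc_episode) > (user_season, user_episode)


print(split_sentences("First sentence here. Ok! Second sentence follows?"))
# ['First sentence here.', 'Second sentence follows?']
print(is_potential_spoiler(1, 6, 1, 5))   # True: later episode, same season
print(is_potential_spoiler(1, 5, 1, 5))   # False: the user has watched this episode
print(is_potential_spoiler(2, 1, 1, 13))  # True: later season
```

The lexicographic tuple comparison gives the same result as the two-branch check (season greater, or season equal and episode greater) in a single expression.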

## Context & Importance

These helper functions are the building blocks for the application's core RAG functionality.
*   `parse_progress` provides essential input validation and standardization for user viewing progress.
*   `retrieve_and_filter_context` encapsulates the complex logic of:
    *   Performing semantic search using vector embeddings (the "Retrieval" part).
    *   Applying domain-specific rules (spoiler filtering based on season/episode metadata) to the retrieved results.
    *   Preparing the context (separated into non-spoilers and spoilers) that will likely be fed into a Large Language Model (LLM) in a subsequent step (the "Augmented Generation" part).

The robustness checks (dependency flags, error handling within functions) are critical for ensuring the application can handle potential issues like failed model loading or malformed data gracefully.
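
To make the shape of the retrieval results concrete, the sketch below walks a mock result dictionary laid out the way a batch query returns it: one inner list per query sentence under each key. The helper `filter_query_results` and the mock data are hypothetical and exist only for illustration; no real index is queried, and metadata validation is omitted.

```python
def filter_query_results(results, distance_threshold=0.5):
    """Flatten per-query result lists, keep close matches, skip repeated IDs."""
    kept = []
    seen_ids = set()
    per_query = zip(
        results["ids"],
        results["documents"],
        results["metadatas"],
        results["distances"],
    )
    for ids, docs, metas, dists in per_query:
        for doc_id, doc, meta, dist in zip(ids, docs, metas, dists):
            # Skip documents already kept, or too far from the query.
            if doc_id in seen_ids or dist > distance_threshold:
                continue
            seen_ids.add(doc_id)
            kept.append((doc_id, doc, meta, dist))
    return kept


# Mock results for two query sentences with two hits each (made-up values).
mock_results = {
    "ids": [["a", "b"], ["b", "c"]],
    "documents": [["doc a", "doc b"], ["doc b", "doc c"]],
    "metadatas": [
        [{"season": 1, "episode": 2}, {"season": 1, "episode": 7}],
        [{"season": 1, "episode": 7}, {"season": 1, "episode": 9}],
    ],
    "distances": [[0.2, 0.4], [0.3, 0.8]],
}

for doc_id, doc, meta, dist in filter_query_results(mock_results):
    print(doc_id, doc, meta, dist)
```

Here document `b` is kept once even though both sentences retrieve it, and document `c` is dropped because its distance of 0.8 exceeds the 0.5 threshold.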

In [None]:
# === Snippet 3 (Revised): Helper Functions ===
print("\n--- Running Snippet 3: Defining Helper Functions ---")

# Ensure necessary variables/objects from previous steps exist
if not (EMBEDDING_LOAD_SUCCESS and INDEX_LOAD_SUCCESS):
    print("Warning: Embedding model or index did not load successfully. RAG function might fail.")
    # We'll add checks within the functions instead for more graceful failure

# --- Helper Function to Parse Progress ---
def parse_progress(progress_str):
    """
    Parses a progress string like 'S1E5' or 's01e10' into season and episode integers.
    Returns (season, episode) or (None, None) if parsing fails.
    """
    if not isinstance(progress_str, str):
        print(f"Warning: Progress input is not a string: {progress_str}")
        return None, None
    match = re.match(r"S(\d+)E(\d+)", progress_str, re.IGNORECASE)
    if match:
        try:
            season = int(match.group(1))
            episode = int(match.group(2))
            return season, episode
        except ValueError:
            print(f"Warning: Could not parse numbers in progress string '{progress_str}'.")
            return None, None
    else:
        print(f"Warning: Could not parse progress string format '{progress_str}'. Expected S<num>E<num>.")
        return None, None
print("Defined parse_progress function.")


# --- RAG + Filtering Function (Revised Name and Output) ---
def retrieve_and_filter_context(input_text, user_season, user_episode, n_results=5, distance_threshold=0.5):
    """
    Retrieves relevant events from ChromaDB for the input_text, filters them based on user progress,
    and returns two lists:
    1. relevant_non_spoilers: Events at or before user's progress (safe context).
    2. identified_spoilers: Events after user's progress (potential spoilers).
    """
    global embedding_model, collection # Make sure globals are accessible
    if not EMBEDDING_LOAD_SUCCESS or not INDEX_LOAD_SUCCESS:
        print("Error in retrieve_and_filter_context: Embedding model or index not loaded.")
        return [], [] # Return empty lists
    if embedding_model is None or collection is None:
        print("Error in retrieve_and_filter_context: embedding_model or collection object is None.")
        return [], []

    # print(f"\n--- Retrieving context for user S{user_season}E{user_episode} ---")
    # print(f"Input Text (first 100 chars): {input_text[:100]}...")

    # Simple sentence splitting (can be improved)
    sentences = re.split(r'(?<=[.!?])\s+', input_text)
    sentences = [s.strip() for s in sentences if s and len(s.strip()) > 5]
    if not sentences:
        # print("No meaningful sentences found in input text for RAG.")
        return [], []

    # print(f"Split into {len(sentences)} sentences for querying.")
    try:
        query_embeddings = embedding_model.encode(sentences, convert_to_numpy=True, show_progress_bar=False)
    except Exception as e:
        print(f"Error during sentence embedding for RAG: {e}")
        return [], []

    try:
        results = collection.query(
            query_embeddings=query_embeddings.tolist(),
            n_results=n_results,
            include=['metadatas', 'documents', 'distances']
        )
    except Exception as e:
        print(f"Error querying ChromaDB: {e}")
        return [], []

    relevant_non_spoilers = set()
    identified_spoilers = set()
    processed_ids = set() # Keep track of processed document IDs to avoid duplicates

    if not results or not results.get('ids'):
        # print("Warning: ChromaDB query returned no results.")
        return [], []

    for i in range(len(sentences)): # Iterate through each sentence query's results
        # Check if results are valid for this sentence
        ids_list = results.get('ids', [])[i] if results.get('ids') and i < len(results['ids']) else None
        metadatas_list = results.get('metadatas', [])[i] if results.get('metadatas') and i < len(results['metadatas']) else None
        documents_list = results.get('documents', [])[i] if results.get('documents') and i < len(results['documents']) else None
        distances_list = results.get('distances', [])[i] if results.get('distances') and i < len(results['distances']) else None

        if not (ids_list and metadatas_list and documents_list and distances_list):
            # print(f"Warning: Inconsistent or missing results structure for sentence index {i}. Skipping.")
            continue

        # Iterate through results for this specific sentence query
        for j in range(len(ids_list)):
            doc_id = ids_list[j]
            # Skip if we've already processed this document ID from another sentence query
            if doc_id in processed_ids:
                continue

            # Check index bounds for safety
            if j < len(metadatas_list) and j < len(documents_list) and j < len(distances_list):
                doc_meta = metadatas_list[j] if metadatas_list[j] is not None else {}
                doc_text = documents_list[j] if documents_list[j] is not None else "N/A"
                distance = distances_list[j] if distances_list[j] is not None else 1.0

                # Filter by distance threshold
                if distance > distance_threshold:
                    continue

                # Extract and validate metadata
                spoiler_season_raw = doc_meta.get('season')
                spoiler_episode_raw = doc_meta.get('episode')
                spoiler_season, spoiler_episode = None, None
                try:
                    if spoiler_season_raw is not None: spoiler_season = int(spoiler_season_raw)
                    if spoiler_episode_raw is not None: spoiler_episode = int(spoiler_episode_raw)
                except (ValueError, TypeError):
                    # print(f"Warning: Invalid S/E format in metadata ID {doc_id}. Skipping.")
                    continue

                if spoiler_season is None or spoiler_episode is None:
                    # print(f"Warning: Missing or invalid metadata S/E in ID {doc_id}. Skipping.")
                    continue

                # Classify as spoiler or non-spoiler context based on user progress
                is_potential_spoiler = False
                if spoiler_season > user_season:
                    is_potential_spoiler = True
                elif spoiler_season == user_season and spoiler_episode > user_episode:
                    is_potential_spoiler = True

                # Add to the appropriate set, including S/E info for context
                if is_potential_spoiler:
                    # print(f" Potential Spoiler Found (S{spoiler_season}E{spoiler_episode}, Dist: {distance:.2f}): {doc_text[:80]}...")
                    identified_spoilers.add(f"(S{spoiler_season}E{spoiler_episode}) {doc_text}") # Add context for LLM
                else:
                    # print(f" Relevant Non-Spoiler Found (S{spoiler_season}E{spoiler_episode}, Dist: {distance:.2f}): {doc_text[:80]}...")
                    relevant_non_spoilers.add(f"(S{spoiler_season}E{spoiler_episode}) {doc_text}") # Add context for LLM

                processed_ids.add(doc_id) # Mark this document ID as processed
            else:
                # print(f"Warning: Index out of bounds for inner loop (j={j}) at outer loop index i={i}. Skipping result.")
                pass # Should not happen if initial list checks pass, but safe to ignore


    # print(f"Retrieved {len(relevant_non_spoilers)} relevant non-spoilers and {len(identified_spoilers)} potential spoilers.")
    return list(relevant_non_spoilers), list(identified_spoilers)

print("Defined retrieve_and_filter_context function.")
print("--- Finished Snippet 3 ---")


--- Running Snippet 3: Defining Helper Functions ---
Defined parse_progress function.
Defined retrieve_and_filter_context function.
--- Finished Snippet 3 ---
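
The spoiler-versus-safe classification inside `retrieve_and_filter_context` compares season first and episode second. As a minimal sketch (not part of the notebook), the same rule can be written as one lexicographic tuple comparison; the helper name `is_future_episode` is hypothetical.

```python
def is_future_episode(doc_season, doc_episode, user_season, user_episode):
    """Return True if the document's episode is strictly after the user's progress.

    Hypothetical helper: mirrors the if/elif classification in
    retrieve_and_filter_context using Python's lexicographic tuple ordering.
    """
    return (doc_season, doc_episode) > (user_season, user_episode)


# A user who has watched up to S2E5:
print(is_future_episode(2, 6, 2, 5))   # True: later episode, same season
print(is_future_episode(3, 1, 2, 5))   # True: later season
print(is_future_episode(2, 5, 2, 5))   # False: the episode just watched is safe
print(is_future_episode(1, 13, 2, 5))  # False: earlier season
```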


# --- Snippet 4 (Revised for Unsloth): Load LLM and Define Generation Function ---

## Purpose

This snippet is responsible for loading the specified Large Language Model (LLM) using the Unsloth library, which is optimized for speed and memory efficiency. It also defines a reusable function for generating text using the loaded model and tokenizer, incorporating standard generation parameters and error handling.

## Key Actions

1.  **Prerequisite Checks:**
    *   Verifies the `INSTALL_SUCCESS` flag (from Snippet 1) to ensure that libraries such as `unsloth`, `transformers`, `torch`, and `bitsandbytes` were installed correctly. If the flag is missing or false, it prints an error; because the `exit()` call is commented out, execution then continues rather than halting.
    *   Verifies the `EMBEDDING_LOAD_SUCCESS` and `INDEX_LOAD_SUCCESS` flags (from Snippet 2 Revised) to ensure the retrieval components (embedding model and vector index) are ready. These are not strictly required to load the LLM, but the overall application depends on them, so loading the LLM would be futile if they failed. Here too the check only prints an error, since `exit()` is commented out.

2.  **Load LLM with Unsloth:**
    *   **Model Identifier:** Specifies the model to load using `LLM_MODEL_ID` (`"unsloth/gemma-2-9b-it-bnb-4bit"`). This is a 9 billion parameter Gemma 2 instruction-tuned model, quantized to 4-bit precision (`bnb-4bit`) and potentially optimized by Unsloth.
    *   **Unsloth `FastLanguageModel`:** Uses `FastLanguageModel.from_pretrained()` provided by the `unsloth` library. This class is designed to load models with optimizations (like Flash Attention, optimized kernels) and quantization applied for faster inference and reduced VRAM usage compared to standard Hugging Face loading.
    *   **Loading Parameters:**
        *   `max_seq_length=4096`: Sets the maximum sequence length the model can handle. This impacts VRAM usage and the amount of context the model can process.
        *   `dtype=None`: Allows Unsloth to automatically determine the optimal data type (like `bfloat16` on newer GPUs or `float16`) for computation, balancing speed and precision.
        *   `load_in_4bit=True`: Explicitly instructs the loader to use 4-bit quantization via `bitsandbytes`. This is crucial for fitting large models like Gemma 9B into typical GPU VRAM limits.
        *   `# token = "hf_..."`: A placeholder comment indicating where a Hugging Face access token would be added if required (e.g., for gated models).
    *   **Output:** Returns the optimized `model` object and the corresponding `tokenizer`.
    *   **Status Update:** Sets the `LLM_LOAD_SUCCESS` flag to `True` on success, `False` on failure.
    *   **Error Handling:** Includes a `try...except` block to catch potential errors during model loading (e.g., download issues, VRAM exhaustion, configuration errors) and prints informative error messages.

3.  **Define Unified Generation Function (`generate_text_unsloth`):**
    *   **Purpose:** Creates a standardized function to interact with the loaded Unsloth model for text generation.
    *   **Parameters:**
        *   `prompt_text` (str): The input prompt to feed the LLM.
        *   `max_new_toks` (int, default=350): The maximum number of new tokens to generate.
        *   `temp` (float, default=0.5): The temperature for sampling. Lower values make the output more deterministic; higher values increase randomness.
        *   `top_p_val` (float, default=0.9): The nucleus sampling threshold (top-p). Sampling is restricted to the smallest set of most likely tokens whose cumulative probability reaches this value.
    *   **Dependencies:** Uses the global `model` and `tokenizer` loaded previously. Includes a check for `LLM_LOAD_SUCCESS`.
    *   **Logic:**
        1.  **LLM Check:** Returns an error message if the LLM didn't load successfully.
        2.  **Chat Templating:** Formats the `prompt_text` into the required structure for the Gemma 2 instruction-tuned model using `tokenizer.apply_chat_template`. This adds necessary role tags (e.g., `<start_of_turn>user\n...<end_of_turn>\n<start_of_turn>model\n`).
        3.  **Pad Token Handling:** Ensures the tokenizer has a `pad_token` set (often defaulting to `eos_token` if missing), which is required for batching and consistent generation behavior.
        4.  **Input Tensor:** Tokenizes the formatted prompt and converts it to PyTorch tensors (`return_tensors="pt"`), placing them on the appropriate device (`"cuda"`).
        5.  **Generation Parameters:** Defines a dictionary `generation_params` containing settings like `max_new_tokens`, sampling parameters (`do_sample`, `temperature`, `top_p`), and special token IDs (`eos_token_id`, `pad_token_id`). `use_cache=True` enables the key-value cache for faster generation.
        6.  **Inference Mode:** Wraps the generation call in `with torch.inference_mode():` to disable gradient calculations, reducing memory usage and speeding up inference.
        7.  **Generate Call:** Calls `model.generate()` with the input tensors and generation parameters.
        8.  **Decoding:** Decodes the generated token IDs back into text using `tokenizer.batch_decode`. Crucially, it slices the output tensor (`outputs[:, inputs.shape[1]:]`) to decode *only the newly generated tokens*, excluding the input prompt. `skip_special_tokens=True` removes tokens like `<eos>`.
        9.  **Return Value:** Returns the cleaned, generated text string.
        10. **Error Handling:** Includes a `try...except` block to catch errors during the generation process itself (e.g., CUDA Out-of-Memory) and returns an error message.
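
To make the `top_p_val` parameter concrete, here is a minimal pure-Python sketch of nucleus (top-p) filtering. It is illustrative only and does not reproduce how `model.generate` works internally; the function name `top_p_filter` and the toy distribution are assumptions.

```python
def top_p_filter(probs, top_p=0.9):
    """Return indices of the smallest set of tokens whose cumulative probability reaches top_p.

    Illustrative only: real implementations work on logit tensors and
    renormalise the kept probabilities before sampling.
    """
    # Sort token indices from most to least likely
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for idx in order:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break  # the nucleus is complete once the threshold is reached
    return kept


# Toy distribution over five tokens (values chosen to be exact in binary)
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]
print(top_p_filter(probs, top_p=0.9))  # [0, 1, 2, 3]
```

With `top_p=0.9` the least likely token is excluded from sampling; lowering `top_p` shrinks the candidate set further.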

## Context & Importance

This snippet implements the "Generation" part of the RAG system.
*   Loading the LLM via **Unsloth** with **4-bit quantization** is key to making a powerful model like Gemma 2 9B feasible on consumer/modest hardware by significantly reducing VRAM requirements and potentially speeding up inference.
*   The prerequisite checks ensure that this resource-intensive step is only attempted if the environment setup and retrieval components are likely functional.
*   The `generate_text_unsloth` function provides a clean, reusable interface for interacting with the LLM, incorporating best practices like chat templating, inference mode, and decoding of only the newly generated tokens. Snippet 5 calls this function with prompts built from the context retrieved by `retrieve_and_filter_context` (from Snippet 3).
*   The success of this snippet, indicated by `LLM_LOAD_SUCCESS`, is critical for the final text generation capability of the application.
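
Decoding only the new tokens works because, for a decoder-only model, the sequence returned by `generate` starts with the prompt tokens. A minimal sketch with plain lists standing in for tensors; the token IDs are made up for illustration.

```python
# Made-up token IDs; in the notebook these would be PyTorch tensors
prompt_ids = [2, 106, 1645, 108, 4521, 107]          # templated prompt
output_ids = prompt_ids + [651, 3448, 603, 5985, 1]  # prompt followed by the continuation

# Single-sequence equivalent of outputs[:, inputs.shape[1]:]
new_ids = output_ids[len(prompt_ids):]
print(new_ids)  # [651, 3448, 603, 5985, 1]
```

Without the slice, the decoded string would repeat the whole prompt before the model's answer.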

In [None]:
# === Snippet 4 (Revised for Unsloth): Load LLM ===
print("\n--- Running Snippet 4 (Revised for Unsloth): Load LLM ---")
if not ('INSTALL_SUCCESS' in globals() and INSTALL_SUCCESS):
    print("Error: Base libraries did not install successfully. Halting.")
    # exit()
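# Use globals().get so a missing flag is reported instead of raising NameError
if not (globals().get('EMBEDDING_LOAD_SUCCESS', False) and globals().get('INDEX_LOAD_SUCCESS', False)):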
    print("Error: Embedding model or index failed to load. Halting LLM load.")
    # exit()

print(f"\nLoading LLM: {LLM_MODEL_ID} using Unsloth...")
print("This may take several minutes and significant VRAM...")
max_seq_length = 4096 # Can adjust based on VRAM
dtype = None        # Auto-detect (e.g., Bfloat16 on Ampere+)
load_in_4bit = True # Use 4-bit quantization
model = None
tokenizer = None

try:
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=LLM_MODEL_ID,
        max_seq_length=max_seq_length,
        dtype=dtype,
        load_in_4bit=load_in_4bit,
        # token = "hf_...", # Add huggingface token if needed
    )
    print("Unsloth FastLanguageModel loaded successfully.")
    LLM_LOAD_SUCCESS = True
except Exception as e:
    print(f"\nERROR loading LLM with Unsloth: {e}")
    LLM_LOAD_SUCCESS = False

# --- Define a unified generation function ---
def generate_text_unsloth(prompt_text, max_new_toks=350, temp=0.5, top_p_val=0.9):
    """Generates text using the loaded Unsloth model."""
    global model, tokenizer # Ensure access to loaded model/tokenizer
    if not LLM_LOAD_SUCCESS or model is None or tokenizer is None:
        print("LLM not loaded successfully, cannot generate text.")
        return "Error: LLM not available."

    # Using Gemma 2 chat template (applied by tokenizer.apply_chat_template)
    messages = [{"role": "user", "content": prompt_text}]
    # Ensure tokenizer has pad_token set if it doesn't naturally
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        # print("Warning: pad_token was None, setting to eos_token.") # Can uncomment for debug

    try:
        inputs = tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_tensors="pt"
        ).to("cuda") # Ensure input is on CUDA device

        generation_params = {
            "max_new_tokens": max_new_toks,
            "use_cache": True,
            "do_sample": True,
            "temperature": temp,
            "top_p": top_p_val,
            "eos_token_id": tokenizer.eos_token_id,
            "pad_token_id": tokenizer.pad_token_id,
        }

        # print(f"\n--- Generating (max_new_tokens={max_new_toks}, temp={temp}, top_p={top_p_val}) ---") # Can uncomment for debug
        # print(f"Input prompt (templated): {tokenizer.decode(inputs[0])}") # Debugging

        with torch.inference_mode(): # Ensure inference mode for efficiency
            outputs = model.generate(input_ids=inputs, **generation_params)

        # Decode only the newly generated part
        decoded_output = tokenizer.batch_decode(outputs[:, inputs.shape[1]:], skip_special_tokens=True)[0]
        # print("--- Generation Complete ---") # Can uncomment for debug
        return decoded_output.strip()

    except Exception as e:
        print(f"Error during text generation: {e}")
        # Consider more specific error handling if needed (e.g., CUDA OOM)
        return f"Error during generation: {e}"

print("Defined unified generation function 'generate_text_unsloth'.")
print("--- Finished Snippet 4 (Revised for Unsloth) ---")


--- Running Snippet 4 (Revised for Unsloth): Load LLM ---

Loading LLM: unsloth/gemma-2-9b-it-bnb-4bit using Unsloth...
This may take several minutes and significant VRAM...
==((====))==  Unsloth 2025.3.19: Fast Gemma2 patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth FastLanguageModel loaded successfully.
Defined unified generation function 'generate_text_unsloth'.
--- Finished Snippet 4 (Revised for Unsloth) ---
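
As a rough back-of-the-envelope check on why 4-bit loading matters on the Tesla T4 reported in the log above, weight memory can be estimated from the parameter count and the bits per weight. This sketch assumes a nominal 9 billion parameters and ignores activations, the KV cache, quantization metadata, and any layers kept in higher precision, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 9e9  # nominal parameter count, an assumption for illustration

for label, bits in [("float16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the float16 weights alone (about 18 GB) would exceed the roughly 14.7 GB reported for the T4, while the 4-bit weights (about 4.5 GB) leave room for the context and generation cache.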


# --- Snippet 5 (Revised): Main Logic Function ---

## Purpose

This snippet defines the core orchestration function, `rewrite_and_verify`. Its primary goal is to take an input text (`original_text`) and iteratively rewrite it until it is deemed safe (spoiler-free) for a user who has only watched "The Wire" up to a specific point (`user_progress_str`). It employs a multi-stage approach involving Retrieval-Augmented Generation (RAG), LLM-based verification, and LLM-based rewriting, incorporating anti-hallucination checks and progressively stricter criteria.

## Key Actions

1.  **Prerequisite & Input Validation:**
    *   Checks if the necessary components (LLM, Embedding Model, Index) from previous snippets loaded successfully (`LLM_LOAD_SUCCESS`, etc.). Returns an error immediately if not.
    *   Parses the `user_progress_str` using the `parse_progress` helper function. Returns an error if the format is invalid.
    *   Logs the start of the process and the user's progress point.

2.  **Inner Helper Function Definition (`find_potential_spoiler_spans`):**
    *   Defines a local helper function to locate potential spoiler sentences within a given text based on phrases identified by RAG.
    *   **Input:** `text`, `spoiler_phrases` (list of strings, potentially prefixed with an episode tag such as `(S1E3)`).
    *   **Logic:**
        *   Extracts the core text from each entry in `spoiler_phrases` by removing the leading `(SxEy)` tag and anything after an optional ` - ` separator.
        *   Performs a case-insensitive search (`str.find`) for every occurrence of each phrase within the `text`.
        *   For each match, determines the start and end indices of the surrounding sentence using `. `, `! `, and `? ` (punctuation followed by a space) as delimiters. Handles edge cases (start/end of text).
        *   Skips any span that overlaps one already recorded, so multiple spoiler phrases matching the same sentence yield a single span.
    *   **Output:** A list of tuples `(start_index, end_index, triggering_phrase)` representing identified potential spoiler sentence locations.

3.  **Initial Context Gathering (RAG):**
    *   Calls `retrieve_and_filter_context` with a *permissive* distance threshold (`0.65`) and high result count (`n_results=8`) to gather `safe_context`. This context represents events confirmed to be *at or before* the user's progress and is used later to help the LLM avoid misidentifying past events as spoilers.
    *   Calls `retrieve_and_filter_context` again with a slightly stricter threshold (`0.58`, `n_results=6`) on the `original_text` to get an initial list of `spoilers_to_remove` (potential spoilers present in the input).

4.  **Initialization for Iterative Process:**
    *   Sets up history lists (`candidates_history`, `feedback_history`) and a list to track spoiler locations (`spoiler_locations`).
    *   Initializes `current_candidate_text` with the `original_text`.
    *   Defines a list of `thresholds` (`[0.58, 0.54, 0.50, 0.46]`) representing progressively stricter RAG distance thresholds to be used in verification across attempts.

5.  **Main Refinement Loop (`for attempt in range(1, max_attempts + 1):`)**:
    *   Iterates up to `max_attempts`.
    *   **Verification Phase:**
        *   Selects the `current_threshold` based on the attempt number.
        *   **RAG for Verification Hints:** Runs `retrieve_and_filter_context` on the `current_candidate_text` using `current_threshold` to identify potential spoilers (`spoilers_in_candidate`) *in the current version* of the text.
        *   **Locate Spoilers:** Uses `find_potential_spoiler_spans` to pinpoint where these hints might appear in the `current_candidate_text`. Stores these locations with the attempt number.
        *   **Build Verifier Prompt:** Constructs a detailed prompt for the LLM (`generate_text_unsloth`):
            *   Sets the persona (spoiler detector).
            *   Specifies user progress (`S{user_season}E{user_episode}`).
            *   Includes attempt-specific focus instructions (`verification_focus`) that get stricter over attempts.
            *   Provides the `current_candidate_text`.
            *   Includes **CRITICAL ANTI-HALLUCINATION INSTRUCTIONS** (e.g., never flag past events, ground in text, uncertainty=pass).
            *   Emphasizes **Timeline Awareness** (what is safe vs. spoiler).
            *   Includes `safe_context_info` (previously retrieved safe events).
            *   Includes `spoilers_in_candidate` (RAG hints).
            *   Specifies the required output format ("PASS" or "FAIL: [reason]").
        *   **Call Verifier LLM:** Executes `generate_text_unsloth` with a **low temperature** (`0.08`) for near-deterministic verification (sampling remains enabled).
        *   **Parse Primary Result:** Checks if the result indicates a "FAIL".
        *   **Secondary Verification (Anti-Hallucination Check):**
            *   If the primary verification indicated "FAIL".
            *   Extracts the claimed spoiler reason/text.
            *   Builds a **Secondary Verification Prompt** asking the same LLM, now prompted as a timeline expert, to check *specifically* whether the claimed spoiler is *actually* from after the user's progress, under stringent anti-hallucination rules.
            *   Calls `generate_text_unsloth` with a **very low temperature** (`0.05`).
            *   If the secondary check returns "FALSE ALARM", it overrides the primary verification result to "PASS".
    *   **Process Verification Result (Post-Secondary Check):**
        *   **If Passed:** If the current `attempt` is the `max_attempts`, return the text and `True`. Otherwise, log passage and continue to the next, stricter attempt.
        *   **If Failed (Confirmed):**
            *   Extract feedback (the reason for failure) from the verifier's response.
            *   Append feedback to `feedback_history`.
            *   **Rewriting Phase:**
                *   Build **Rewriter Prompt** for the LLM:
                    *   Sets persona (expert editor).
                    *   Specifies user progress.
                    *   Includes attempt-specific `rewrite_strategy` (e.g., rewrite major spoilers vaguely, preserve names/locations).
                    *   Provides the `formatted_feedback` (including recent problematic spans found in this attempt).
                    *   Reiterates timeline rules (preserve past, modify future).
                    *   Includes `safe_context_info`.
                    *   Provides detailed instructions (preserve names, rewrite vague, maintain length, focus on feedback).
                    *   Includes the `current_candidate_text` to be rewritten.
                *   Estimate `max_new_toks` for the rewriter based on input length.
                *   Call Rewriter LLM (`generate_text_unsloth`) with a slightly higher temperature (`0.2` to `0.28` depending on attempt) to allow for creative rewriting.
                *   Handle potential LLM errors during rewrite gracefully (return the previous candidate text and `False`).
                *   Update `current_candidate_text` with the rewritten version.
                *   Add the *previous* candidate text to `candidates_history`.
                *   Continue to the next loop iteration with the rewritten text.

6.  **Post-Loop Final Verification:**
    *   This section executes only if the loop completes all `max_attempts` *without* passing the verification on the final attempt.
    *   Performs one last verification using a strict RAG threshold (`0.46`).
    *   Builds a **Final Verifier Prompt** similar to the loop's verifier but emphasizing it's the final check.
    *   Calls Verifier LLM with a low temperature (`0.05`).
    *   Includes a **Final Secondary Verification** step (identical logic to the loop's secondary check, using temp `0.03`) if the final primary verification fails, to catch potential last-minute hallucinations.
    *   Based on the potentially corrected final verification result, returns the final `current_candidate_text` along with `True` if it passed, or `False` if spoilers are still detected.
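
The control flow described in steps 5 and 6 can be sketched as a small skeleton with the three LLM roles injected as plain functions. This is a simplified illustration, not the notebook's implementation: the names `refine`, `parse_verdict`, `verify`, `confirm`, and `rewrite` are hypothetical, and the RAG calls, history tracking, temperatures, and error handling are omitted.

```python
def parse_verdict(verdict):
    """Return (failed, reason) using the same PASS/FAIL test as the notebook."""
    cleaned = verdict.strip().upper()
    failed = not cleaned.startswith("PASS") and "FAIL" in cleaned
    reason = verdict.split(":", 1)[1].strip() if ":" in verdict else verdict.strip()
    return failed, reason


def refine(text, verify, confirm, rewrite, max_attempts=4):
    """Verify -> confirm -> rewrite loop with hypothetical stand-in callables.

    verify(text, attempt) -> "PASS" or "FAIL: reason"  (primary verifier)
    confirm(text, reason) -> True if the failure is real, False for a false alarm
    rewrite(text, reason) -> rewritten text
    """
    for attempt in range(1, max_attempts + 1):
        failed, reason = parse_verdict(verify(text, attempt))
        if failed and not confirm(text, reason):
            failed = False  # secondary check overrides a hallucinated failure
        if not failed:
            if attempt == max_attempts:
                return text, True
            continue  # passed this round; re-check under stricter criteria
        text = rewrite(text, reason)

    # Reached only when the last attempt ended in a rewrite: one final check
    failed, reason = parse_verdict(verify(text, max_attempts))
    if failed:
        failed = confirm(text, reason)
    return text, not failed


# Toy stand-ins: the "spoiler" is simply the word "dies"
def toy_verify(text, attempt):
    return 'FAIL: "dies" reveals a later episode' if "dies" in text else "PASS"

def toy_confirm(text, reason):
    return True  # always treat the failure as real

def toy_rewrite(text, reason):
    return text.replace("dies", "faces consequences")

result, is_safe = refine("The character dies later.", toy_verify, toy_confirm, toy_rewrite)
print(result, is_safe)  # The character faces consequences later. True
```

Passing the verifier, confirmer, and rewriter in as arguments keeps the loop testable without a GPU; in the notebook each role is a `generate_text_unsloth` call with its own prompt.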

## Context & Importance

This function is the heart of the spoiler-handling application. It implements a sophisticated, multi-step process designed to be robust:
*   **Iterative Refinement:** Instead of a single rewrite attempt, it progressively tries to fix the text, using increasingly strict criteria.
*   **RAG Integration:** Uses vector search (RAG) to find potentially relevant passages (both safe context and potential spoilers) to guide the LLMs.
*   **LLM Verification:** Leverages an LLM specifically prompted for verification, focusing on timeline accuracy.
*   **LLM Rewriting:** Uses another LLM call, guided by the verifier's feedback, to perform targeted rewrites.
*   **Anti-Hallucination:** Incorporates multiple layers of checks (strict prompting, RAG for grounding, secondary verification LLM calls) specifically designed to prevent the LLMs from incorrectly flagging non-spoilers or inventing timeline issues.
*   **State Management:** Keeps track of text versions and feedback history.
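
The "increasingly strict criteria" come from indexing the `thresholds` list with a clamped attempt number. The sketch below isolates that one expression; the wrapper name `threshold_for` is hypothetical.

```python
thresholds = [0.58, 0.54, 0.50, 0.46]  # same schedule as in rewrite_and_verify


def threshold_for(attempt, schedule=thresholds):
    """Pick the RAG distance threshold for a 1-based attempt, clamping to the last value."""
    return schedule[min(attempt - 1, len(schedule) - 1)]


for attempt in range(1, 7):
    print(attempt, threshold_for(attempt))
```

Because of the clamp, calling `rewrite_and_verify` with `max_attempts` greater than 4 reuses the strictest threshold (`0.46`) for every additional attempt instead of raising an `IndexError`.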



In [None]:
# === Snippet 5 (Revised): Main Logic Function ===
print("\n--- Running Snippet 5: Defining Main Rewrite & Verify Logic ---")

def rewrite_and_verify(original_text, user_progress_str, max_attempts=4):
    """
    Progressive refinement approach with enhanced past-episode recognition and
    anti-hallucination safeguards for spoiler detection.
    """
    # Make sure models/index are loaded before proceeding
    if not (LLM_LOAD_SUCCESS and EMBEDDING_LOAD_SUCCESS and INDEX_LOAD_SUCCESS):
        print("Error: Required models or index not loaded.")
        return "Error: Required models or index not loaded.", False

    user_season, user_episode = parse_progress(user_progress_str)
    if user_season is None or user_episode is None:
        print(f"Error: Invalid user progress format '{user_progress_str}'.")
        return f"Error: Invalid user progress format '{user_progress_str}'.", False

    print(f"\n=== Starting Spoiler Rewriting for User at S{user_season}E{user_episode} ===")
    print(f"Original Text (first 150 chars): {original_text[:150]}...") # Shortened print

    # --- Helper function defined inside for locality ---
    def find_potential_spoiler_spans(text, spoiler_phrases):
        spans = []
        text_lower = text.lower() # Lowercase text once
        for phrase in spoiler_phrases:
            # Extract key part if separator exists, handle potential errors
            try:
                key_phrase = phrase.split(') ', 1)[1] if ') ' in phrase else phrase # Try to get text after (SxE)
                key_phrase = key_phrase.split(' - ')[0].strip() # Get part before optional ' - '
            except IndexError:
                key_phrase = phrase.strip() # Fallback to using the whole phrase if split fails

            if not key_phrase: continue # Skip empty phrases

            phrase_lower = key_phrase.lower()

            # Find all occurrences, not just the first
            start_idx = 0
            while start_idx < len(text_lower):
                found_idx = text_lower.find(phrase_lower, start_idx)
                if found_idx == -1:
                    break # Not found in the rest of the text

                # Get surrounding context (sentence)
                # Sentence start: just after the last '. ', '! ' or '? ' before the match,
                # or the start of the text when no such marker precedes it
                # (this also covers a match at index 0, which previously yielded sent_start = 1)
                last_marker = max(text_lower.rfind(m, 0, found_idx) for m in ('. ', '! ', '? '))
                sent_start = last_marker + 2 if last_marker != -1 else 0

                # Find end of sentence (next period/!/? or end of text)
                sent_end = text_lower.find('. ', found_idx)
                sent_end_q = text_lower.find('? ', found_idx)
                sent_end_e = text_lower.find('! ', found_idx)

                # Find the earliest sentence end marker
                possible_ends = [e for e in [sent_end, sent_end_q, sent_end_e] if e != -1]
                if not possible_ends:
                    sent_end = len(text) # End of text if no marker found
                else:
                    sent_end = min(possible_ends) + 1 # Include the punctuation

                # Avoid adding duplicate spans if multiple search phrases match same text part
                is_duplicate = False
                for existing_start, existing_end, _ in spans:
                    # Check for significant overlap
                    if max(sent_start, existing_start) < min(sent_end, existing_end):
                        is_duplicate = True
                        break
                if not is_duplicate:
                    spans.append((sent_start, sent_end, key_phrase)) # Store key_phrase found

                start_idx = found_idx + 1 # Continue search after this find

        return spans
    # --- End of inner helper function ---


    # Get episode-specific safe context with more results to better understand past episodes
    safe_threshold = 0.65 # Very permissive for identifying safe content
    # We only need the text part for the prompt, discard potential future spoilers from this call
    safe_context, _ = retrieve_and_filter_context(
        original_text, user_season, user_episode, n_results=8, distance_threshold=safe_threshold
    )
    print(f"Safe Context: Retrieved {len(safe_context)} items about past episodes (up to S{user_season}E{user_episode})")

    # Initial RAG with permissive threshold for spoiler detection
    initial_threshold = 0.58 # Permissive starting point
    # Here we *do* care about potential spoilers found in the original text
    _, spoilers_to_remove = retrieve_and_filter_context(
        original_text, user_season, user_episode, n_results=6, distance_threshold=initial_threshold
    )
    print(f"Initial RAG (threshold {initial_threshold:.2f}): Found {len(spoilers_to_remove)} potential spoilers.")

    # History tracking
    candidates_history = []
    feedback_history = []
    spoiler_locations = [] # Track where spoilers have been found

    # Start with original as first candidate
    current_candidate_text = original_text
    candidates_history.append(current_candidate_text)

    # Progressive threshold tightening - reasonable pace
    thresholds = [0.58, 0.54, 0.50, 0.46] # Gradually more strict

    # --- Main Loop ---
    for attempt in range(1, max_attempts + 1):
        print(f"\n--- Progressive Refinement: Attempt {attempt}/{max_attempts} ---")

        # Get current threshold
        current_threshold = thresholds[min(attempt-1, len(thresholds)-1)]
        print(f"Using verification threshold: {current_threshold:.2f}")

        # --- Verify current candidate ---
        print(">> Calling Verifier LLM for current candidate...")

        # RAG with current threshold - get hints for the verifier
        _, spoilers_in_candidate = retrieve_and_filter_context(
            current_candidate_text, user_season, user_episode, n_results=5, distance_threshold=current_threshold
        )

        # Find potential spoiler locations in the *current* text
        potential_spans = find_potential_spoiler_spans(current_candidate_text, spoilers_in_candidate)
        # Note: spoiler_locations list grows over attempts, could prune old ones if needed
        for span in potential_spans:
            # Avoid adding duplicates based on text content and attempt
            is_new_location = True
            for loc in spoiler_locations:
                if loc['text'] == current_candidate_text[span[0]:span[1]] and loc['attempt'] == attempt:
                    is_new_location = False
                    break
            if is_new_location:
                spoiler_locations.append({
                    'start': span[0],
                    'end': span[1],
                    'text': current_candidate_text[span[0]:span[1]],
                    'phrase': span[2], # Store the RAG phrase that triggered this
                    'attempt': attempt
                })


        # Format safe context information for the verifier prompt
        safe_context_info = ""
        if safe_context:
            safe_context_info = f"CONFIRMED SAFE EVENTS (up to S{user_season}E{user_episode}):\n"
            # Limit the number shown to avoid excessive prompt length
            safe_context_info += "\n".join([f"- {item}" for item in safe_context[:5]])

        # Create primary verification prompt
        verification_focus = ""
        if attempt == 1:
            verification_focus = f"Focus ONLY on obvious MAJOR plot spoilers beyond S{user_season}E{user_episode}. DO NOT flag events from S{user_season}E{user_episode} or earlier - these are NOT spoilers."
        elif attempt == 2:
            verification_focus = f"Focus on clear spoilers about character developments beyond S{user_season}E{user_episode}. CAREFULLY CHECK if events mentioned are from before or after S{user_season}E{user_episode}. DO NOT mark past episodes as spoilers."
        elif attempt >= 3:
            verification_focus = f"Be thorough but grounded. Clearly distinguish between events before vs. after S{user_season}E{user_episode}. Events up to S{user_season}E{user_episode} are NEVER spoilers. Avoid flagging content unless you're confident it reveals FUTURE events."

        verifier_prompt = f"""You are a specialized spoiler detection system for 'The Wire' with timeline awareness.
A user has watched ONLY up to Season {user_season}, Episode {user_episode}.
Your task is to verify that the text contains NO spoilers from FUTURE episodes (after S{user_season}E{user_episode}).

**Current Analysis Phase: {attempt} of {max_attempts}**
{verification_focus}

**Candidate Text:**
{current_candidate_text}

**CRITICAL ANTI-HALLUCINATION INSTRUCTIONS:**
1. NEVER flag content from S{user_season}E{user_episode} or earlier as spoilers.
2. Stay grounded in what the text ACTUALLY says - don't invent connections.
3. A name, location, or item alone is NOT a spoiler unless it explicitly reveals future events.
4. Content is only a spoiler if it DEFINITELY reveals events beyond S{user_season}E{user_episode}.
5. If unsure when an event occurs, DO NOT flag it as a spoiler.
6. UNCERTAINTY = PASS (default to passing unless certain it's a future spoiler).

**Timeline Awareness:**
- Everything up to and including S{user_season}E{user_episode} is SAFE content.
- Only content about episodes after S{user_season}E{user_episode} counts as spoilers.
- ASK YOURSELF: "Does this text explicitly reveal events that happen after S{user_season}E{user_episode}?"

{safe_context_info}

**System-Flagged Potential Spoilers (Hints for Analysis):**
{chr(10).join(f' - {s}' for s in spoilers_in_candidate) if spoilers_in_candidate else ' - None automatically flagged (still review manually)'}

**Output Format:**
- If no FUTURE spoilers: Respond with "PASS"
- If FUTURE spoilers found: Respond with "FAIL: [Quote the exact spoiler text and specify which future episode it spoils, e.g., S1E10]"

Provide your analysis.
"""
        # Use lower temp for more deterministic verification
        verification_result = generate_text_unsloth(verifier_prompt, temp=0.08, max_new_toks=200)

        # Truncate long verifier output for logging; the label is always printed
        print(f"Verifier Result: {verification_result[:150]}{'...' if len(verification_result) > 150 else ''}")

        verification_result_clean = verification_result.strip().upper()
        is_primary_fail = not verification_result_clean.startswith("PASS") and "FAIL" in verification_result_clean

        # --- Secondary Verification (Anti-Hallucination) if primary failed ---
        if is_primary_fail:
            print(">> Running secondary verification to prevent hallucination...")

            # Extract the claimed spoiler for double-checking
            claimed_spoiler_text = "Unknown"
            if ":" in verification_result:
                # Take everything after the first colon as the claimed issue
                claimed_spoiler_text = verification_result.split(":", 1)[1].strip()
            else:
                # If no colon, use the whole result as context (less ideal)
                claimed_spoiler_text = verification_result.strip()

            # Secondary verification prompt
            secondary_prompt = f"""You are a specialized 'The Wire' timeline expert.
CRITICAL: Double-check if this alleged spoiler actually reveals events after Season {user_season}, Episode {user_episode}.

**Text being reviewed:**
{current_candidate_text}

**Claimed spoiler / Context:**
{claimed_spoiler_text}

**TIMELINE VERIFICATION INSTRUCTIONS:**
1. Very carefully determine if this content TRULY reveals events after S{user_season}E{user_episode}.
2. Content about events in or before S{user_season}E{user_episode} is NEVER a spoiler.
3. A name, location, or item alone is NOT a spoiler unless it explicitly reveals future events.
4. General themes or vague statements are NOT spoilers without specific plot revelations.
5. Be extremely careful not to hallucinate timeline information.

{safe_context_info}

Output ONLY:
"CONFIRMED SPOILER" if you are certain this reveals events after S{user_season}E{user_episode}.
"FALSE ALARM" if this content is about events before or during S{user_season}E{user_episode}, or if it's too vague to be a spoiler.
"""
            # Use very low temp for deterministic secondary check
            secondary_verification = generate_text_unsloth(secondary_prompt, temp=0.05, max_new_toks=50)
            print(f"Secondary verification result: {secondary_verification.strip()}")

            # Override the verification result if it was a false alarm
            if "FALSE ALARM" in secondary_verification.upper():
                verification_result = "PASS (corrected after secondary verification)"
                verification_result_clean = "PASS" # Update clean result too
                is_primary_fail = False # Mark as no longer failed
                print(">> Verification corrected: False alarm detected")
            # else: Confirmed spoiler, proceed with feedback extraction

        # --- Process Verification Result ---
        if verification_result_clean.startswith("PASS"):
            # If it passed AND it's the last attempt, we're done
            if attempt == max_attempts:
                print(f"\n=== Verification PASSED on final attempt {attempt} ===")
                # No need for final check if it passed the last attempt's strictness
                return current_candidate_text, True
            else:
                # Passed this phase, continue to next attempt for stricter check
                print(f"Phase {attempt} verification passed, proceeding to stricter phase {attempt+1}")
                # Loop continues naturally
        else:  # Reached when the secondary check confirmed a FAIL, or when the reply was neither PASS nor FAIL (no secondary check ran)
            # Extract feedback
            if verification_result.strip().upper().startswith("FAIL:"):
                feedback_content = verification_result.strip()[len("FAIL:"):].strip()
            else:
                # Use the full result if format is unexpected, better than no feedback
                feedback_content = verification_result.strip()

            feedback_history.append(feedback_content)
            print(f"Verification identified issues. Feedback: {feedback_content[:100]}...") # Shortened print

            # --- LLM Rewriter for targeted rewriting ---
            print(">> Calling Rewriter LLM for targeted rewriting...")

            # Define rewriting strategy based on attempt
            rewrite_strategy = ""
            rewrite_temp = 0.2 # Default temp
            if attempt == 1:
                rewrite_temp = 0.2
                rewrite_strategy = f"REWRITE major spoilers to make future events vaguer while PRESERVING names, locations, and items. DO NOT modify content about events from S{user_season}E{user_episode} or earlier."
            elif attempt == 2:
                rewrite_temp = 0.25
                rewrite_strategy = f"REWRITE character development spoilers by making outcomes and future events vaguer. RETAIN all names, places, and objects but make their future actions less specific. DO NOT change past episode content."
            elif attempt >= 3:
                rewrite_temp = 0.28
                rewrite_strategy = f"THOROUGHLY REWRITE any future plot hints while PRESERVING all names, places, and references. Replace specific outcomes with more general descriptions. ABSOLUTELY PRESERVE all content about S{user_season}E{user_episode} or earlier."

            # Format feedback, potentially including recent locations
            formatted_feedback = feedback_content
            # Show only locations found in *this* attempt for clarity
            current_attempt_locations = [loc for loc in spoiler_locations if loc['attempt'] == attempt]
            if current_attempt_locations:
                formatted_feedback += "\n\nProblematic sections identified in this pass may include:"
                # Show max 3 most recent locations from *this* attempt
                for loc in current_attempt_locations[-3:]:
                    formatted_feedback += f"\n- \"{loc['text']}\" (related to '{loc['phrase']}')"

            # Build rewriter prompt
            rewriter_prompt = f"""You are an expert TV show content editor specializing in spoiler management for 'The Wire'.
Your task is to rewrite content to be safe for a viewer who has only watched up to Season {user_season}, Episode {user_episode}.

**CURRENT PHASE: {attempt} of {max_attempts} - PRECISE REWRITING**
{rewrite_strategy}

**FEEDBACK FROM VERIFICATION (Needs Addressing):**
{formatted_feedback}

**TIMELINE PRESERVATION RULES:**
1. Content about S{user_season}E{user_episode} and earlier episodes MUST be preserved exactly.
2. Only modify content that reveals events AFTER S{user_season}E{user_episode}.
3. When rewriting future events, preserve names but make outcomes vaguer.

**SAFE CONTEXT (Confirmed to be at/before S{user_season}E{user_episode}):**
{safe_context_info if safe_context_info else 'None provided'}

**REWRITING INSTRUCTIONS:**
1. PRESERVE NAMES, LOCATIONS, AND ITEMS - these are NOT spoilers by themselves.
2. DO NOT REMOVE information - instead REWRITE it to be vaguer about future outcomes.
3. Try to maintain text length and information density where possible without spoiling.
4. Focus on precision - modify only what needs changing based on feedback.
5. Apply the phase-specific strategy: {rewrite_strategy}.
6. CRITICAL: If a character faces major changes beyond S{user_season}E{user_episode}, don't remove their name but rewrite to hide their specific fate.

**Current Text (Needs Rewriting):**
{current_candidate_text}

Rewrite the text addressing the feedback. Output ONLY the rewritten text with no explanations.
"""
            # Estimate max tokens based on current text length + some buffer
            estimated_max_tokens = max(200, int(len(current_candidate_text.split()) * 1.5))

            generated_text = generate_text_unsloth(
                rewriter_prompt,
                temp=rewrite_temp,
                max_new_toks=estimated_max_tokens
            )

            # Handle LLM errors during rewrite
            if generated_text.startswith("Error:"):
                print(f"Rewriter LLM failed: {generated_text}")
                # Fall back to the text *before* this failed rewrite attempt
                if len(candidates_history) > 1: # Ensure there's a previous state
                    return candidates_history[-1], False # Return the last known good state
                else:
                    return original_text, False # Or just return original if first attempt failed badly
                # return "Error during rewriting process.", False # Old return

            # Store history *before* updating current text
            candidates_history.append(current_candidate_text) # Save the text *before* rewrite
            current_candidate_text = generated_text # Update to the newly rewritten text
            print(f"Rewriter Output (first 150 chars): {current_candidate_text[:150]}...") # Shortened print
            # Loop continues to next attempt

    # --- Post-Loop Final Verification (Only if loop finished without passing on last attempt) ---
    # Note: If loop finished because it passed on attempt == max_attempts, it already returned True above.
    # This section runs if the loop completed all attempts and the last one *failed* primary verification.
    print("\n>> Final Verification Check (After Max Attempts)...")
    final_threshold = 0.46 # Strict but reasonable final RAG threshold
    _, final_spoilers = retrieve_and_filter_context(
        current_candidate_text, user_season, user_episode, n_results=5, distance_threshold=final_threshold
    )

    # Re-use safe context info from earlier
    final_verifier_prompt = f"""You are a specialized spoiler detection system for 'The Wire' with timeline awareness.
A user has watched ONLY up to Season {user_season}, Episode {user_episode}.
This is the FINAL verification check for the candidate text after multiple rewrite attempts.

**Candidate Text:**
{current_candidate_text}

**CRITICAL ANTI-HALLUCINATION INSTRUCTIONS:**
1. NEVER flag content from S{user_season}E{user_episode} or earlier as spoilers.
2. Stay grounded in what the text ACTUALLY says - don't invent connections.
3. A name, location, or item alone is NOT a spoiler unless it explicitly reveals future events.
4. Content is only a spoiler if it DEFINITELY reveals events beyond S{user_season}E{user_episode}.
5. If unsure when an event occurs, DO NOT flag it as a spoiler.
6. UNCERTAINTY = PASS (default to passing unless certain it's a future spoiler).

**Timeline Awareness:**
- Everything up to and including S{user_season}E{user_episode} is SAFE content.
- Only content about episodes after S{user_season}E{user_episode} counts as spoilers.

{safe_context_info}

**System-Flagged Potential Final Spoilers (Hints for Analysis):**
{chr(10).join(f' - {s}' for s in final_spoilers) if final_spoilers else ' - None automatically flagged (still review manually)'}

**Final Verification Instructions:**
- Be thorough but fair in your analysis.
- Flag ONLY content that explicitly reveals events AFTER S{user_season}E{user_episode}.
- NAMES, LOCATIONS, and ITEMS alone are NOT spoilers unless they directly reveal future plot points.

Output ONLY "PASS" if safe for S{user_season}E{user_episode} viewer, or "FAIL: [reason]" if spoilers remain.
"""
    # Use low temp for final check
    final_verification = generate_text_unsloth(final_verifier_prompt, temp=0.05, max_new_toks=150)
    final_verification_clean = final_verification.strip().upper()

    is_final_fail = not final_verification_clean.startswith("PASS") and "FAIL" in final_verification_clean

    # --- Final Secondary Verification if final primary failed ---
    if is_final_fail:
        print(">> Running final secondary verification to prevent hallucination...")
        # Extract claimed spoiler
        claimed_spoiler = final_verification.split(":", 1)[1].strip() if ":" in final_verification else final_verification

        secondary_final_prompt = f"""You are a specialized 'The Wire' timeline expert with anti-hallucination safeguards.
CRITICAL: Double-check if this alleged spoiler actually reveals events after Season {user_season}, Episode {user_episode}.

**Text being reviewed:**
{current_candidate_text}

**Claimed spoiler / Context:**
{claimed_spoiler}

**ANTI-HALLUCINATION CHECKLIST:**
1. Does the text EXPLICITLY mention events that happen AFTER S{user_season}E{user_episode}?
2. Are you CERTAIN this isn't about events in or before S{user_season}E{user_episode}?
3. Does it reveal SPECIFIC future outcomes rather than general themes?
4. Are you avoiding reading between the lines or making unwarranted inferences?
5. Are you certain you're not confusing character mentions with spoilers?

{safe_context_info}

Output ONLY:
"CONFIRMED SPOILER" if you are certain this reveals specific events after S{user_season}E{user_episode}.
"FALSE ALARM" if this content is about events in or before S{user_season}E{user_episode}, or if it's too vague to be a spoiler.
"""
        # Use very low temp
        secondary_final = generate_text_unsloth(secondary_final_prompt, temp=0.03, max_new_toks=50)
        print(f"Final secondary verification result: {secondary_final.strip()}")

        # Override final result if false alarm detected
        if "FALSE ALARM" in secondary_final.upper():
            final_verification = "PASS (corrected after final verification)"
            final_verification_clean = "PASS"
            is_final_fail = False # Mark as no longer failed
            print(">> Final verification corrected: False alarm detected")
        # else: Final fail is confirmed

    # Return final result based on potentially corrected final verification
    if final_verification_clean.startswith("PASS"):
        print("\n=== Final Verification PASSED (After Loop Completion) ===")
        return current_candidate_text, True
    else:
        print(f"\n=== Final Verification FAILED (After Loop Completion): {final_verification} ===")
        return current_candidate_text, False

# --- End of rewrite_and_verify function definition ---

print("\nDefined rewrite_and_verify function.")
print("--- Finished Snippet 5 ---")


--- Running Snippet 5: Defining Main Rewrite & Verify Logic ---

Defined rewrite_and_verify function.
--- Finished Snippet 5 ---
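
The verdict handling inside `rewrite_and_verify` treats any verifier reply that does not start with `PASS` as a failure, but only replies containing `FAIL` trigger the secondary anti-hallucination check. The sketch below isolates that three-way distinction so it can be inspected on its own. `Verdict` and `classify_verdict` are hypothetical names introduced for illustration; they are not part of the notebook.

```python
from __future__ import annotations

from enum import Enum


class Verdict(Enum):
    """Hypothetical labels for the three kinds of verifier reply."""
    PASS = "pass"
    FAIL = "fail"
    UNCLEAR = "unclear"  # Neither PASS nor FAIL; the notebook sends this straight to the rewriter


def classify_verdict(raw: str) -> tuple[Verdict, str]:
    """Classify a raw verifier reply and extract its feedback text."""
    cleaned = raw.strip()
    upper = cleaned.upper()
    if upper.startswith("PASS"):
        return Verdict.PASS, ""
    if "FAIL" in upper:
        # As in the notebook, everything after the first colon is the claimed spoiler
        feedback = cleaned.split(":", 1)[1].strip() if ":" in cleaned else cleaned
        return Verdict.FAIL, feedback
    # Unexpected format: keep the full reply as feedback
    return Verdict.UNCLEAR, cleaned
```

Separating `UNCLEAR` from `FAIL` makes it explicit that a malformed reply leads to a rewrite without a second opinion, which may or may not be the intended behavior.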


# --- Snippet 6: Example Usage (Expanded with Diverse S1 Test Cases - S1 INDEX AWARE) ---

## Purpose

This snippet demonstrates how to use the core `rewrite_and_verify` function defined in Snippet 5. It sets up a series of diverse test cases, specifically tailored to events within **Season 1 of 'The Wire'**, reflecting the assumed content of the loaded ChromaDB index. It then runs each case through the spoiler detection and rewriting process, logs the results, measures performance, and provides a final summary.

## Key Actions

1.  **Prerequisite Check:**
    *   Before running any tests, it verifies that all preceding setup steps were successful by checking the global flags: `INSTALL_SUCCESS`, `EMBEDDING_LOAD_SUCCESS`, `INDEX_LOAD_SUCCESS`, and `LLM_LOAD_SUCCESS`.
    *   If any of these flags are `False`, it skips the example usage section and prints a message, preventing errors due to missing dependencies.

2.  **Define Test Cases (`test_cases`):**
    *   A list of dictionaries, where each dictionary represents a specific test scenario.
    *   **Structure:** Each dictionary contains:
        *   `id` (str): A unique identifier for the test case (e.g., "Case S1-01").
        *   `text` (str): The input text containing potential spoilers to be processed.
        *   `user_progress` (str): The user's viewing progress (e.g., "S1E2"), used for spoiler timeline comparison.
        *   `expected_outcome_comment` (str): A comment indicating whether the test *should* ideally PASS or FAIL final verification, *based on the assumption that the knowledge base (ChromaDB index) only contains Season 1 information*.
    *   **S1 Index Awareness:** The test cases are deliberately crafted using events primarily from Season 1. This aligns testing with the expected scope of the `the_wire_s1_chroma_db` index.
    *   **Diversity:** Includes various scenarios:
        *   Users at different stages (early, mid, late S1).
        *   Clear spoilers from later S1 episodes.
        *   Text containing only past events (should pass).
        *   Subtle character arc spoilers within S1.
        *   Name drops (should pass if no future plot revealed).
        *   **Known Limitation Cases (S1-11, S1-12):** Includes text mentioning events from future seasons (S2, S3, S5). These are *expected to PASS* because the system's knowledge base (the S1 index) lacks the information needed to identify these as spoilers. This demonstrates the boundary of the system's current knowledge.
    *   **Reused Scenarios:** Some cases reuse text snippets with different user progress points to test the system's sensitivity to the `user_progress` parameter.

3.  **Execute Test Cases:**
    *   Iterates through the `test_cases` list.
    *   **Logging & Formatting:** Prints clear separators (`---`, `>>>`, etc.) and headers for each test case, including its ID, user progress, and expected outcome, making the console output easier to follow.
    *   **Call Main Logic:** For each case, it calls the `rewrite_and_verify(text, progress)` function.
    *   **Timing:** Measures the execution time for each individual test case using `time.time()`.
    *   **Store Results:** Stores the outcome (`success` boolean), the `final_text` produced, the `original` text, the `expected_comment`, and the `duration_seconds` in a `results` dictionary, keyed by the `case_id`.
    *   **Display Case Result:** Prints the final PASS/FAIL status for the case, its duration, the original text, and the potentially rewritten `final_text`.

4.  **Print Summary:**
    *   After processing all test cases, it prints a summary section.
    *   **Formatting:** Uses prominent separators (`###`) to clearly mark the summary block.
    *   **Content:** Displays:
        *   The PASS/FAIL status and duration for each individual case alongside its expected outcome comment.
        *   Total number of cases run.
        *   Counts of cases that passed vs. failed the final verification.
        *   The sum of individual case durations.
        *   The total wall-clock time for running all tests.
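
Each test case carries its `user_progress` as a string such as `"S1E2"`, and the expected outcomes depend on whether an event falls after that point. A minimal sketch of that comparison is shown here; `parse_progress` and `is_future` are hypothetical helpers, and the notebook's own parsing inside `rewrite_and_verify` may differ.

```python
from __future__ import annotations

import re

# Matches strings like "S1E2" or "s01e10" (case-insensitive)
_PROGRESS_RE = re.compile(r"^S(\d+)E(\d+)$", re.IGNORECASE)


def parse_progress(progress: str) -> tuple[int, int]:
    """Turn 'S1E2' into (1, 2); raise ValueError on anything else."""
    match = _PROGRESS_RE.match(progress.strip())
    if match is None:
        raise ValueError(f"Unrecognised progress string: {progress!r}")
    return int(match.group(1)), int(match.group(2))


def is_future(event: str, user_progress: str) -> bool:
    """True if the event airs strictly after the user's last watched episode."""
    # Tuples compare season first, then episode
    return parse_progress(event) > parse_progress(user_progress)
```

For example, `is_future("S1E10", "S1E6")` is true, while an event in the episode the user has just watched is treated as safe, in line with the "up to and including" rule in the verifier prompts.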

## Context & Importance

This snippet serves as the primary validation and demonstration component.
*   It **tests the end-to-end functionality** by integrating the RAG (`retrieve_and_filter_context`), LLM (`generate_text_unsloth`), and orchestration logic (`rewrite_and_verify`).
*   It provides **concrete examples** of how the system handles different types of potential spoilers relative to user progress.
*   The **S1-aware design** makes the results meaningful within the context of the limited (Season 1 only) knowledge base, highlighting both capabilities and limitations (e.g., inability to detect S2+ spoilers).
*   **Performance measurement** gives insights into the computational cost of the iterative rewrite/verify process.
*   The detailed logging and summary allow for quick assessment of the system's accuracy and behavior across various scenarios. It's essential for debugging, evaluation, and understanding the system's strengths and weaknesses.
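
The summary prints each outcome next to its expectation but leaves the comparison to the reader. One way to tally agreement is sketched below, assuming every `expected_comment` begins with `Should PASS` or `Should FAIL` as in the test cases. `expected_passes` and `agreement` are hypothetical helpers, and the `demo_results` entries are made-up placeholders, not outputs of the notebook.

```python
from __future__ import annotations


def expected_passes(comment: str) -> bool | None:
    """Read the expectation from a comment like 'Should FAIL (...)'."""
    upper = comment.strip().upper()
    if upper.startswith("SHOULD PASS"):
        return True
    if upper.startswith("SHOULD FAIL"):
        return False
    return None  # No recognisable expectation (e.g. 'N/A')


def agreement(results: dict) -> tuple[int, int]:
    """Return (matching cases, comparable cases) for a results dictionary."""
    matched = compared = 0
    for outcome in results.values():
        expected = expected_passes(outcome["expected_comment"])
        if expected is None:
            continue
        compared += 1
        if expected == outcome["success"]:
            matched += 1
    return matched, compared


# Placeholder data in the same shape as the `results` dictionary
demo_results = {
    "demo-a": {"success": True, "expected_comment": "Should PASS (placeholder)"},
    "demo-b": {"success": True, "expected_comment": "Should FAIL (placeholder)"},
    "demo-c": {"success": False, "expected_comment": "N/A"},
}
matched, compared = agreement(demo_results)
print(f"{matched}/{compared} cases matched their expectation")
```

A mismatch is not automatically a defect: because `rewrite_and_verify` returns `True` once a rewritten text passes verification, a "Should FAIL" input that is rewritten successfully can still end as PASSED.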

In [None]:
# === Snippet 6: Example Usage (Expanded with Diverse S1 Test Cases - S1 INDEX AWARE) ===
import time  # Used for per-case and total timing; harmless if already imported earlier

print("\n--- Running Snippet 6: Example Usage (Expanded - S1 INDEX AWARE) ---")

# Ensure all components are ready before running tests
if not (INSTALL_SUCCESS and EMBEDDING_LOAD_SUCCESS and INDEX_LOAD_SUCCESS and LLM_LOAD_SUCCESS):
    print("Skipping example usage due to earlier setup errors.")
else:
    # Define test cases (using the provided S1 focused list)
    test_cases = [
        # --- Basic Setup & Early Spoilers (Focus: S1 Accuracy) ---
        {
            "id": "Case S1-01 (Very Early User, Mid-Season Spoilers)",
            "text": "After Brandon is tortured and killed for robbing the stash, Omar vows revenge. This leads him to kill Stinkum later on.",
            "user_progress": "S1E2", # User is before Brandon's death (S1E4) and Stinkum's death (S1E5).
            "expected_outcome_comment": "Should FAIL (Contains S1E4/E5 spoilers)"
        },
        {
            "id": "Case S1-02 (Early User, Past Events Only)",
            "text": "The investigation kicks off when McNulty talks to Judge Phelan about the Barksdale crew following the Gant murder trial where D'Angelo got off.",
            "user_progress": "S1E3", # User is past these initial events (S1E1).
            "expected_outcome_comment": "Should PASS (All events are S1E1, user is past)"
        },
        {
            "id": "Case S1-03 (Mid-Season User, Late Season Spoilers)",
            "text": "Things escalate drastically when Kima Greggs is shot during an undercover buy operation, putting the entire detail in jeopardy.",
            "user_progress": "S1E6", # User is well before Kima's shooting (S1E10).
            "expected_outcome_comment": "Should FAIL (Contains S1E10 spoiler)"
        },
        {
            "id": "Case S1-04 (Mid-Season User, Mixed Past/Future)",
            "text": "D'Angelo teaches Wallace and Bodie chess in the low-rises. Sadly, Wallace's story ends tragically when he's killed by his friends on Stringer's orders.",
            "user_progress": "S1E5", # User is past chess (S1E3) but before Wallace's death (S1E12).
            "expected_outcome_comment": "Should FAIL (Contains S1E12 spoiler for Wallace)"
        },
        {
            "id": "Case S1-05 (Mid-Season User, Past Events Only)",
            "text": "The detail faces bureaucratic hurdles getting equipment, while Herc, Carver, and Prez make early mistakes. Freamon quietly proves his skills finding D'Angelo's picture.",
            "user_progress": "S1E7", # User is past these early/mid-season events (approx E1-E4).
            "expected_outcome_comment": "Should PASS (Events are early S1, user is past)"
        },
        # --- Late Season Spoilers & Nuance (Focus: S1 Accuracy) ---
        {
            "id": "Case S1-06 (Late User, Finale Spoilers)",
            "text": "In the season finale, Avon Barksdale is arrested, but on lesser charges, while Stringer Bell walks free. Wee-Bey takes the fall for multiple murders.",
            "user_progress": "S1E10", # User is before the final arrests and outcomes (S1E13).
            "expected_outcome_comment": "Should FAIL (Contains S1E13 spoilers)"
        },
        {
            "id": "Case S1-07 (Late User, Specific Detail Spoiler)",
            "text": "The crucial break comes when the detail successfully clones the Barksdale crew's pagers, allowing them to track messages despite the changing codes.",
            "user_progress": "S1E5", # User is before the pager cloning works (around S1E6/E7).
            "expected_outcome_comment": "Should FAIL (Contains S1E6/E7 spoiler)"
        },
        {
            "id": "Case S1-08 (Late User, Subtle Character Arc Spoiler)",
            "text": "Wallace becomes disillusioned with the game and tries to leave, but his actions ultimately lead to fatal consequences.",
            "user_progress": "S1E8", # User is before Wallace leaves (S1E9) & death (S1E12).
            "expected_outcome_comment": "Should FAIL (Hints strongly at S1E12 spoiler)"
        },
        {
            "id": "Case S1-09 (Very Late User, Past Events Only)",
            "text": "Reflecting on the season, key moments included Omar testifying against Bird in court and the detail's failed sting attempt at Orlando's.",
            "user_progress": "S1E13", # User has finished the season. Orlando's (S1E7), Bird Trial (S1E11).
            "expected_outcome_comment": "Should PASS (Events are past for user)"
        },
        # --- Edge Cases & Future Season Spoilers (Focus: S1 INDEX LIMITATION) ---
        {
            "id": "Case S1-10 (Edge Case - Name Drop)",
            "text": "The complex web of alliances involves figures like Proposition Joe, who runs a separate East side crew.",
            "user_progress": "S1E6", # Prop Joe appears briefly early on. Name alone isn't a spoiler.
            "expected_outcome_comment": "Should PASS (Name drop within S1, not spoiler)"
        },
        {
            "id": "Case S1-11 (Future Season 2 Spoiler)",
            "text": "While Season 1 focuses on the Barksdales, the investigation later shifts focus to the Baltimore port system and the Sobotka family.",
            "user_progress": "S1E12", # Mentions S2 characters/plot.
            "expected_outcome_comment": "Should PASS (S2 info NOT in S1 index, system cannot verify)"
        },
        {
            "id": "Case S1-12 (Very Distant Future Spoiler)",
            "text": "Stringer Bell eventually tries to go legitimate but is ultimately killed by Omar and Brother Mouzone. Later, Marlo Stanfield rises to power.",
            "user_progress": "S1E4", # Mentions S3/S5 events.
            "expected_outcome_comment": "Should PASS (S3/S5 info NOT in S1 index, system cannot verify)"
        },
        {
            "id": "Case S1-13 (Heavy Multi-Spoiler - Reuse)",
            "text": "The season culminates tragically with Wallace killed by Bodie and Poot under Stringer's orders, Kima shot during a buy, and D'Angelo ultimately taking a long prison sentence to protect Avon and the family.",
            "user_progress": "S1E2", # Events are S1E10-S1E13.
            "expected_outcome_comment": "Should FAIL (Multiple late S1 spoilers)"
        },
        {
            "id": "Case S1-14 (Past & Future Spoilers - Reuse)",
            "text": "After Gant's murder early on, Omar gets revenge for Brandon by killing Stinkum in broad daylight, which forces Stringer to change pager codes.",
            "user_progress": "S1E4", # Gant (S1E1) is past. Stinkum (S1E5)/Pagers (S1E6) are future.
            "expected_outcome_comment": "Should FAIL (Contains S1E5/E6 spoilers)"
        },
        {
            "id": "Case S1-15 (Past Spoilers Only - Reuse)",
            "text": "Reviewing early events, we saw McNulty trigger the investigation via Judge Phelan, and D'Angelo teach Wallace and Bodie chess during downtime in the Pit.",
            "user_progress": "S1E5", # Phelan (S1E1)/Chess (S1E3) are past.
            "expected_outcome_comment": "Should PASS (Events are past for user)"
        },
        {
            "id": "Case S1-16 (Character Fate - D'Angelo)",
            "text": "D'Angelo struggles with his conscience throughout the season, a conflict that eventually leads to his imprisonment at the end.",
            "user_progress": "S1E9", # Imprisonment outcome is S1E13.
            "expected_outcome_comment": "Should FAIL (Contains S1E13 character fate spoiler)"
        }
    ]

    # --- Run Test Cases ---
    results = {}
    start_time_all_tests = time.time() # Time all tests

    for i, case in enumerate(test_cases):
        case_id = case["id"]
        text = case["text"]
        progress = case["user_progress"]
        expected_comment = case.get("expected_outcome_comment", "N/A")

        # Separator and header before each case
        print("\n" + "-"*80)
        print(f"--- Running Test Case {i+1}/{len(test_cases)}: {case_id} (User at {progress}) ---")
        print(f" (Expected based on S1 Index: {expected_comment})")
        print("-"*80 + "\n")

        start_time_case = time.time()
        # Assuming rewrite_and_verify and supporting functions are defined above
        final_text, success = rewrite_and_verify(text, progress)
        end_time_case = time.time()
        case_duration = end_time_case - start_time_case

        results[case_id] = {
            "success": success,
            "final_text": final_text,
            "original": text,
            "expected_comment": expected_comment,
            "duration_seconds": round(case_duration, 2),
        }

        result_status = "PASSED" if success else "FAILED"
        # Final result block for this case
        print(f"\n>>> FINAL RESULT Block ({case_id}): {result_status} FINAL VERIFICATION ({case_duration:.2f} seconds)")
        print("-" * 40)

        print("Original Text:")
        print(text)
        print("-" * 20)
        print(f"Final Text ({result_status}):")
        print(final_text)
        print("-" * 40)
        print(f"--- Finished {case_id} ---")
        # Blank line before the next case's header
        print("\n")


    end_time_all_tests = time.time()
    total_duration = end_time_all_tests - start_time_all_tests

    # --- Optional: Print Summary ---
    # Banner: the title is centered so the middle row is 80 characters wide, matching the border rows
    print("\n" + "#"*80)
    print("#" + "Test Case Summary (S1 Index Aware)".center(78) + "#")
    print("#"*80 + "\n")

    passed_count = 0
    failed_count = 0
    total_duration_reported = 0.0
    for case_id, result in results.items():
        status = "PASSED" if result["success"] else "FAILED"
        expected = result["expected_comment"]
        duration = result["duration_seconds"]
        total_duration_reported += duration  # Sum the rounded per-case durations
        print(f"{case_id}: {status} ({duration}s) (Expected: {expected})")
        if result["success"]:
            passed_count += 1
        else:
            failed_count += 1

    print("-" * 30)
    print(f"Total Cases Run: {len(test_cases)}")
    print(f"Passed Final Verification: {passed_count}")
    print(f"Failed Final Verification: {failed_count}")
    # The summed per-case durations cover only the rewrite_and_verify calls;
    # the wall-clock total also includes logging between cases
    print(f"Sum of Individual Case Durations: {total_duration_reported:.2f} seconds")
    print(f"Total Test Execution Time (Wall Clock): {total_duration:.2f} seconds")
    print("--- End of Summary ---")
    print("\n" + "#"*80 + "\n")


print("\n--- Finished Snippet 6 (Expanded - S1 Index Aware) ---")


--- Running Snippet 6: Example Usage (Expanded - S1 INDEX AWARE) ---

--------------------------------------------------------------------------------
--- Running Test Case 1/16: Case S1-01 (Very Early User, Mid-Season Spoilers) (User at S1E2) ---
 (Expected based on S1 Index: Should FAIL (Contains S1E4/E5 spoilers))
--------------------------------------------------------------------------------


=== Starting Spoiler Rewriting for User at S1E2 ===
Original Text (first 150 chars): After Brandon is tortured and killed for robbing the stash, Omar vows revenge. This leads him to kill Stinkum later on....
Safe Context: Retrieved 0 items about past episodes (up to S1E2)
Initial RAG (threshold 0.58): Found 12 potential spoilers.

--- Progressive Refinement: Attempt 1/4 ---
Using verification threshold: 0.58
>> Calling Verifier LLM for current candidate...
Verifier Result: FAIL: "After Brandon is tortured and killed for robbing the stash, Omar vows revenge. This leads him to kill Stinkum la