# LLM-as-a-Judge with Advanced Azure AI Evaluation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sachin0034/MLAI-community-labs/blob/main/Class-Labs/Lab-14(LLM-Judge-lab)/Lab-2(LLM-as-a-Judge%20with%20Advanced%20Azure%20AI%20Evaluation)/LLM_as_a_Judge_with_Advanced_Azure_AI_Evaluation.ipynb)






In earlier iterations of our Legal Document Analyzer, we used OpenAI models both to extract key terms from legal contracts and to evaluate those extractions. That prior lab gave a first look at what large language models can do for legal text analysis. This lab keeps the OpenAI-based extraction but replaces the evaluation step with built-in evaluators from Azure AI, which score each extraction on groundedness, coherence, relevance, and fluency and so give a more complete picture of model performance.

# Intro to Azure AI Evaluation
Azure AI evaluations are a set of tools and features within Azure AI Foundry (formerly Azure AI Studio) designed to assess the performance and quality of generative AI models and applications. They provide a structured way to measure various aspects of AI responses, including accuracy, groundedness, and safety, using both built-in and customizable metrics.

[Read more about Azure AI Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/evaluate-sdk)

---

## Prerequisites

> ⚠️ **Note:** Make sure you have an active **Azure subscription account**. Without it, you won't be able to run this notebook with Azure integrations.  

---

Before you get started, please make sure you have the following ready:

---

### 1. Sample Contract File for Testing

To try out the contract analysis workflow, download the sample contract file provided below:

- [Download Sample Contract (Google Drive)](https://drive.google.com/file/d/1E557kdNBZ5cDUvVDLNrEVRuKcRSYDG3Z/view?usp=sharing)

---

### 2. Ground Truth CSV File

Download the ground truth CSV file from the link below:

- [Download Ground Truth CSV (Google Drive)](https://drive.google.com/file/d/1E557kdNBZ5cDUvVDLNrEVRuKcRSYDG3Z/view?usp=sharing)

---

### 3. OpenAI API Key

You’ll need your own OpenAI API key to access the language models used for contract evaluation. If you don’t have one yet, follow this step-by-step guide to generate your API key:

- [How to get your own OpenAI API key (Medium article)](https://medium.com/@lorenzozar/how-to-get-your-own-openai-api-key-f4d44e60c327)

---

### 4. Azure Setup (for Azure OpenAI / Azure AI Foundry Users)

If you're using Azure services, ensure you have the following:

- ✅ **An active Azure subscription**
- ✅ **Sufficient Azure portal permissions** (Contributor, Cognitive Services Contributor, or Owner at the subscription level)
- ✅ **An Azure AI Foundry resource or an Azure OpenAI resource deployed**

---

### 📘 Useful Documentation Links

**Configure and Connect to Azure AI Foundry:**
- [Create a new Azure AI Foundry project (Official docs)](https://learn.microsoft.com/en-us/azure/ai-services/foundry/how-to-create-project)
- [Configure a connection to use Azure AI Foundry Models](https://learn.microsoft.com/en-us/azure/ai-services/foundry/how-to-use-model)

**Azure OpenAI API Reference:**
- [Azure OpenAI parameters and authentication](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)
- [LangChain `AzureChatOpenAI` Configuration Example](https://python.langchain.com/docs/integrations/chat/azure_openai)

**General Azure AI Setup Guide:**
- [Configure your environment for Azure AI resources](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/configure-environment)

---


# 📦 Cell 1 - Installing the Essentials

Hey there! 👋  
Before we dive into the magic, let’s pack our toolkit. This cell installs all the essential Python packages we'll need on our journey.  
Think of it as preparing your inventory before entering the dungeon. 🧙‍♂️⚔️🛡️

| Package               | What it Does                                                                 |
|-----------------------|------------------------------------------------------------------------------|
| `langchain`           | Your command center for building with language models. Makes chaining prompts a breeze! |
| `pypdf`               | Helps you read and extract text from PDF files. Like opening scrolls of ancient text. 📜 |
| `docx2txt`            | Extracts text from `.docx` files—Word documents won't hide anything from you now! |
| `pandas`              | Powerful tool for data manipulation and analysis. Like Excel, but 100x cooler. 📊 |
| `openai`              | The official package to talk to OpenAI’s powerful GPT models. Say hello to your AI buddy! 👋 |
| `gradio`              | Quickly build web UIs to interact with your models. Test your ideas live, instantly! ⚡ |
| `azure-ai-generative`| Enables you to connect with Azure’s AI services. If you’re an Azure fan, you’ll love this. ☁️ |

And here's the magic wand to install them all at once:


In [4]:
! pip install langchain pypdf docx2txt pandas openai gradio azure-ai-generative



# 📦 Cell 2 - Bringing the Community Power with `langchain-community`

Hey, welcome back! 👋  
Now that we’ve got the base tools ready, let’s add something special: the **LangChain Community** package! 🌍✨

This package includes connectors and integrations built by the awesome LangChain community—  
so you get access to a wider range of tools, wrappers, loaders, and chains that aren't part of the core library.

| Package               | What it Does                                                                 |
|-----------------------|------------------------------------------------------------------------------|
| `langchain-community` | A collection of community-contributed tools and integrations for LangChain. Think of it as a treasure chest 🧰 full of plugins to level up your LangChain experience! |


In [5]:
! pip install langchain-community

Collecting langchain-community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.10.1-py3-none-any.whl.metadata (3.4 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB)
Downloading langchain_community-0.3.27-py3-none-any.whl (2.5 MB)

# 🧪 Cell 3 - Evaluating Like a Pro with `azure-ai-evaluation`

Hey again! 🎯  
Now that you've got tools to build cool LLM apps, let’s add something that helps you **evaluate how well your AI is doing**. Because hey, building is one thing—but knowing if it actually works well? That's next-level! 💡

In this cell, we install:

| Package               | What it Does                                                                 |
|-----------------------|------------------------------------------------------------------------------|
| `azure-ai-evaluation` | A Microsoft Azure package to help you evaluate AI models and workflows. Great for benchmarking, scoring, and analyzing LLM output quality in a more structured and automated way! |


In [6]:
! pip install azure-ai-evaluation



# 🏗️ Cell 4 - Getting Set Up: Imports & Libraries

Hey hey! 👋 You've made it to Cell 4 — nice work!  
This one is like unpacking your toolbox 🧰 — we’re importing everything we’ll need to build, analyze, and serve your LLM-based app.

Let’s break down what’s happening here:

| Module / Package                             | Why We Need It                                                                 |
|---------------------------------------------|--------------------------------------------------------------------------------|
| `os`, `io`, `tempfile`, `re`, `json`, `time`| Core Python modules for file handling, regex, JSON manipulation, and timing. Basic stuff, but super powerful! |
| `pandas as pd`                               | For tabular data handling and display. Think of it as Excel in code form. 🧮     |
| `gradio as gr`                               | To build a beautiful web-based UI for your app. Plug and play GUI components! 🎛️ |
| `langchain.document_loaders`                | We’re loading text from PDFs, Word docs, and plain `.txt` files. Time to feed the LLM! 📄 |
| `langchain.text_splitter`                   | Splits large chunks of text into manageable pieces. No more overwhelming the poor model! ✂️ |
| `langchain.schema.Document`                 | Represents a single document in a structured way. Used downstream for processing. 📚 |
| `openai` and error classes                  | Used to interact with OpenAI’s GPT models and handle API-related errors gracefully. 🤖❌ |
| `azure.ai.evaluation`                       | Imports the evaluators: `GroundednessEvaluator` (checks factual grounding), `CoherenceEvaluator` (checks flow and consistency), `RelevanceEvaluator` (checks how on-topic the response is), `FluencyEvaluator` (checks grammar and readability). |

In [None]:
import os
import io
import tempfile
import re
import json
from langchain.document_loaders import PyPDFLoader, Docx2txtLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import pandas as pd
from langchain.schema import Document
from openai import OpenAI
from openai import APIError, APIConnectionError, RateLimitError
import time
import gradio as gr

# Azure AI Evaluation imports
from azure.ai.evaluation import GroundednessEvaluator, CoherenceEvaluator, RelevanceEvaluator, FluencyEvaluator

# 🔐 Cell 5 - Setting Up OpenAI & Azure Connections

Now that we’ve got all the imports ready, it’s time to connect with the powerful brains behind your app: **OpenAI** and **Azure OpenAI Service**.

Let’s walk through what’s happening here:

| Section                       | What It Does                                                                 |
|------------------------------|------------------------------------------------------------------------------|
| **OpenAI Client Initialization** | This sets up a secure connection to OpenAI's API using your API key. This client is what you’ll use to send prompts and get responses from GPT models. 🤖 |
| **Azure AI Project Setup**     | You define the Azure AI project info here — subscription ID, resource group, and project name. This ensures Azure knows where your project lives. 🌍 |
| **Azure OpenAI Model Config**  | You specify important config like the endpoint, API key, deployment name (e.g., `gpt-4o-mini`), and API version. This tells your app which model to talk to and where. ⚙️ |

### ✅ A Few Notes:

- Make sure you **keep your API keys secret** 🔐. The strings in the cell below are placeholders; never commit real keys to a public repo.
- Ensure you **place all your keys and config values in a `.env` file or a secure secrets manager** — **never hard-code them directly in scripts**. 🔑
- The values in the Azure config are needed to authenticate and route your request to the right deployment.
- Using `gpt-4o-mini`? Awesome choice! It's fast and lightweight, but still powerful for evaluations and generation tasks.



In [14]:
# Initialize OpenAI client
client = OpenAI(api_key='Insert Your API key')
# Initialize Azure AI project and Azure OpenAI connection with your environment variables
azure_ai_project = {
    "subscription_id": "Insert your subscription_id",
    "resource_group_name": "Insert your resource_group_name",
    "project_name": "Insert your project_name",
}

model_config = {
    "azure_endpoint": "Insert your azure_endpoint",
    "api_key": "Insert your api_key",
    "azure_deployment": "Insert your azure_deployment",
    "api_version": "Insert your api_version"
}
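
Since the notes above recommend keeping keys out of the script, here is a minimal sketch of building `model_config` from environment variables instead of literals. The variable names (`AZURE_OPENAI_ENDPOINT` and friends) and the helper `load_model_config` are assumptions made for illustration; use whatever names your `.env` file or secrets manager defines.

```python
import os

def load_model_config(env=os.environ):
    """Build the Azure OpenAI config dict from environment variables."""
    # Mapping of config key -> environment variable name.
    # These variable names are illustrative assumptions, not SDK requirements.
    required = {
        "azure_endpoint": "AZURE_OPENAI_ENDPOINT",
        "api_key": "AZURE_OPENAI_API_KEY",
        "azure_deployment": "AZURE_OPENAI_DEPLOYMENT",
        "api_version": "AZURE_OPENAI_API_VERSION",
    }
    # Fail early with a readable message if anything is unset or empty.
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in required.items()}

# Possible usage once the variables are set:
# model_config = load_model_config()
# client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```

Failing early on a missing variable tends to be easier to debug than an authentication error raised later by the first evaluator call.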

# 🧠 Cell 6 - Time to Evaluate! Azure AI Evaluators + Key Terms

Hey again, look at you flying through this! ✈️  
In this cell, we’re setting up **automated evaluation tools** powered by Azure — because building smart apps is cool, but building *smart apps that know they’re smart*? That’s elite. 🧪💡

---

### 🎯 What’s Happening in This Cell:

| Element                         | What It Does                                                                 |
|---------------------------------|------------------------------------------------------------------------------|
| `GroundednessEvaluator`         | Checks if the AI’s response is **fact-based and grounded in source content**. Super useful to avoid hallucinations. 📚 |
| `CoherenceEvaluator`            | Evaluates if the response is **logically consistent** and makes sense from start to finish. 🧩 |
| `RelevanceEvaluator`            | Measures how **on-topic** the response is with respect to the prompt or context. 🎯 |
| `FluencyEvaluator`              | Assesses grammar, structure, and readability. Helps you polish your AI output to sound natural and professional. ✍️ |
| `KEY_TERMS`                     | A list of important contractual/legal terms to focus on. Cells 9 and 10 loop over this list to extract the matching clauses. 📜 |

---

### ✅ Why This Matters:

These evaluators allow your app to:
- Auto-score and critique LLM outputs
- Enforce higher quality standards
- Pinpoint weak spots in the response (grounding, clarity, relevance, etc.)

✨ **Pro tip:** Automating these evaluations helps in QA workflows, model comparison, and ensuring your product stays reliable as it scales.


In [None]:
# Initialize Azure evaluators
groundedness_eval = GroundednessEvaluator(model_config=model_config)
coherence_eval = CoherenceEvaluator(model_config=model_config)
relevance_eval = RelevanceEvaluator(model_config=model_config)
fluency_eval = FluencyEvaluator(model_config=model_config)

KEY_TERMS = [
    "Service Warranty",
    "Limitation of Liability",
    "Governing Law",
    "Termination for Cause",
    "Payment Terms",
    "Confidentiality Obligations"
]
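
To make the evaluators a bit more concrete, here is a hedged sketch of how the four of them might be called on one extraction. The helper `score_extraction` is hypothetical, and the keyword arguments (`query`, `response`, `context`) reflect the SDK documentation as I understand it; check them against the `azure-ai-evaluation` version you installed. The helper takes the evaluators as plain callables so it can be tried out with stand-ins before spending Azure credits.

```python
def score_extraction(evaluators, query, response, context):
    """Run each evaluator and collect its result dict under the metric name."""
    scores = {}
    # Groundedness compares the response against the source text (context).
    scores["groundedness"] = evaluators["groundedness"](
        response=response, context=context
    )
    # Coherence and relevance are judged against the query behind the response.
    scores["coherence"] = evaluators["coherence"](query=query, response=response)
    scores["relevance"] = evaluators["relevance"](query=query, response=response)
    # Fluency looks at the response text alone.
    scores["fluency"] = evaluators["fluency"](response=response)
    return scores

# With the notebook's objects this might look like (needs Azure credentials):
# scores = score_extraction(
#     {"groundedness": groundedness_eval, "coherence": coherence_eval,
#      "relevance": relevance_eval, "fluency": fluency_eval},
#     query="What does the contract say about Payment Terms?",
#     response=extracted_summary,  # hypothetical variable
#     context=contract_text,       # hypothetical variable
# )
```

Each evaluator returns a dictionary of its own; the exact keys depend on the SDK version, so print one result before relying on a particular field.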

# 🧾 Cell 7 - Extracting & Chunking Your Documents

Now that your evaluators are ready to judge, let’s give them something to work with! This cell focuses on **loading documents** and **prepping them for AI consumption**.

---

## 📤 Function 1: `extract_text_from_file(file_path)`

This function is your universal document reader — it handles PDFs, Word docs, text files, and even CSVs.

| Parameter        | What It Means                         |
|------------------|----------------------------------------|
| `file_path`      | The full path to the document you want to extract text from. |

### 🔄 What It Does:
- Checks the file extension
- Uses the appropriate **LangChain loader** for `.pdf`, `.docx`, `.txt`
- Converts `.csv` files to strings using Pandas
- Returns:
  - `text`: A raw string of all the content
  - `docs`: A list of `Document` objects (LangChain’s format for handling documents)

### 🛑 Error Handling Features:
- Gracefully catches:
  - Missing files
  - Empty or malformed CSVs
  - Any other sneaky exceptions
- Raises clean errors and logs helpful messages

🎯 **Why it's cool:**  
This function makes your app compatible with a **variety of file types** without having to write separate logic for each one.

---

## ✂️ Function 2: `chunk_document(docs, chunk_size=1000, chunk_overlap=200)`

Alright, now that you've got a document loaded — it's probably huge, right? This function breaks it into digestible **chunks** for the LLM to handle better.

| Parameter         | What It Means                                                        |
|-------------------|-----------------------------------------------------------------------|
| `docs`            | List of LangChain `Document` objects (from the previous function)     |
| `chunk_size`      | Max number of characters per chunk                                    |
| `chunk_overlap`   | Number of overlapping characters between chunks (helps with context!) |

### 🔧 How It Works:
- Uses `RecursiveCharacterTextSplitter` from LangChain
- Breaks big docs into multiple smaller pieces
- Adds overlap so no context is lost between chunks

💡 **Use case tip:** This chunking is **essential** when you want to pass documents to an LLM without overwhelming it with too much content.

In [None]:
def extract_text_from_file(file_path):
    """
    Extracts text from various file types with improved error handling.

    Args:
        file_path (str): The path to the file.

    Returns:
        tuple: A tuple containing the extracted text (str) and a list of Document objects.

    Raises:
        ValueError: If the file type is unsupported.
        Exception: For errors during file reading or processing.
    """
    ext = os.path.splitext(file_path)[1].lower()
    try:
        if ext == ".pdf":
            loader = PyPDFLoader(file_path)
            docs = loader.load()
            text = "\n".join([doc.page_content for doc in docs])
        elif ext in [".docx", ".doc"]:
            loader = Docx2txtLoader(file_path)
            docs = loader.load()
            text = "\n".join([doc.page_content for doc in docs])
        elif ext in [".txt"]:
            loader = TextLoader(file_path)
            docs = loader.load()
            text = "\n".join([doc.page_content for doc in docs])
        elif ext == ".csv":
            df = pd.read_csv(file_path)
            text = df.to_string(index=False)
            docs = [Document(page_content=text, metadata={"page": "N/A"})]
        else:
            raise ValueError("Unsupported file type")
        return text, docs
    except FileNotFoundError:
        print(f"Error: File not found at {file_path}")
        raise
    except pd.errors.EmptyDataError:
        print(f"Error: CSV file is empty or malformed at {file_path}")
        raise
    except Exception as e:
        print(f"An error occurred while reading the file {file_path}: {e}")
        raise

def chunk_document(docs, chunk_size=1000, chunk_overlap=200):
    """
    Splits a list of Document objects into smaller chunks.

    Args:
        docs (list): A list of Langchain Document objects.
        chunk_size (int): The maximum size of each chunk in characters.
        chunk_overlap (int): The number of characters to overlap between chunks.

    Returns:
        list: A list of smaller Langchain Document chunks.
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        is_separator_regex=False,
    )
    chunks = text_splitter.split_documents(docs)
    return chunks
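
To see what `chunk_size` and `chunk_overlap` mean in practice, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to split on separators such as paragraph breaks); the function `sliding_window_chunks` is a hypothetical stand-in that only illustrates how neighbouring chunks share characters.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Notice that `d` and `g` each appear in two chunks: that shared tail is what lets a clause that straddles a boundary still show up whole in at least one chunk when the overlap is large enough.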

# 🔄 Cell 8 - Making LLM Calls with Retry Logic (Because Errors Happen!)

You're about to learn how to make your app **error-tolerant** and **resilient** when communicating with OpenAI's API. This cell defines a function that retries the call if something goes wrong. Think of it as a polite "try again please" mechanism for your LLM.

---

## 🧠 Function: `query_llm_with_retry(...)`

This function sends a chat message to the LLM (like GPT-4o), and if the API hiccups — it *tries again*, up to a maximum number of retries. 💪

| Parameter       | What It Does                                                                 |
|-----------------|--------------------------------------------------------------------------------|
| `messages`      | A list of chat messages in the format OpenAI’s Chat API expects. 📨             |
| `model`         | The LLM model to use (default is `"gpt-4o"` — great choice!) 🤖                 |
| `max_retries`   | How many times to retry the call if it fails. Defaults to 3. 🔁                |
| `delay`         | Initial wait time (in seconds) between retries. Increases exponentially. ⏳    |

---

### 🔄 What's Happening Internally:

- Wraps the LLM call in a loop with try-except blocks.
- If an API error occurs (like `RateLimitError`, `APIConnectionError`, or a general `APIError`), it:
  - Logs the error with a message.
  - Waits using exponential backoff (`delay * 2**i`, where `i` is the zero-based attempt index)
  - Tries again until `max_retries` is hit.
- If it fails all attempts, it raises the error with a final message.
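
The wait times this loop produces can be worked out with a tiny helper. This is only a sketch of the arithmetic; `backoff_schedule` is a hypothetical name and is not part of the notebook.

```python
def backoff_schedule(max_retries=3, delay=1):
    """Return the sleep durations (in seconds) between attempts."""
    # A sleep follows every failed attempt except the last one,
    # so max_retries attempts produce max_retries - 1 waits.
    return [delay * (2 ** i) for i in range(max_retries - 1)]

print(backoff_schedule())        # [1, 2]
print(backoff_schedule(5, 0.5))  # [0.5, 1.0, 2.0, 4.0]
```

With the defaults, a call that keeps failing waits 1 second, then 2 seconds, and gives up after the third attempt.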

---

### 🧯 Handles These Errors Like a Champ:

| Error Type            | Why It Might Happen                                        |
|------------------------|------------------------------------------------------------|
| `APIError`             | General server-side error from OpenAI                     |
| `APIConnectionError`   | Network or connectivity issue                             |
| `RateLimitError`       | You’re sending too many requests too fast — slow down! 🐢 |
| `Exception`            | Anything else unexpected gets caught here                |

---

✅ **Why This Rocks:**
- Prevents your app from crashing on temporary issues
- Helps maintain smooth user experience
- Keeps retry behavior clean and configurable

---


In [None]:
def query_llm_with_retry(messages, model="gpt-4o", max_retries=3, delay=1):
    """
    Sends a query to the LLM with retry logic for API errors.

    Args:
        messages (list): The list of messages for the chat completion.
        model (str): The LLM model to use.
        max_retries (int): Maximum number of retries.
        delay (int): Initial delay in seconds; doubled after each failed attempt.

    Returns:
        str: The content of the LLM's response.

    Raises:
        Exception: If the LLM call fails after all retries.
    """
    for i in range(max_retries):
        try:
            completion = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return completion.choices[0].message.content
        except (APIError, APIConnectionError, RateLimitError) as e:
            print(f"API error during LLM call (Attempt {i+1}/{max_retries}): {e}")
            if i < max_retries - 1:
                time.sleep(delay * (2 ** i))
            else:
                print("Max retries reached. LLM call failed.")
                raise
        except Exception as e:
            print(f"An unexpected error occurred during LLM call (Attempt {i+1}/{max_retries}): {e}")
            if i < max_retries - 1:
                time.sleep(delay * (2 ** i))
            else:
                print("Max retries reached. LLM call failed.")
                raise

# 📄 Cell 9 - Extracting Legal Clauses Using GPT and Chunked Analysis

Welcome to Cell 9 — the **brainiest cell yet**! 🧠  
This function takes your documents and intelligently pulls out legal clauses related to key terms like `"Service Warranty"` or `"Payment Terms"` — and it does so chunk by chunk using a GPT model.

---

## 🔍 Function: `extract_key_terms(docs, key_terms)`

This function:
- Splits documents into small pieces using `chunk_document()`
- Sends each chunk to the GPT model with a **well-crafted prompt**
- Asks the model to tell us whether a specific **key term** is mentioned
- If yes, asks it to provide a **brief summary** of the clause

| Parameter     | What It Means                                                                 |
|---------------|--------------------------------------------------------------------------------|
| `docs`        | A list of LangChain `Document` objects (usually chunks from previous steps)   |
| `key_terms`   | A list of key phrases or legal terms to search for in the document            |

---

## 🧠 How It Works:

1. 🪓 **Splits the document** once into smaller, overlapping chunks (to respect token limits and retain context).
2. 🔁 **Loops over each term** in your `key_terms` list.
3. 📬 **For each chunk**, sends a system+user message to GPT asking:
    - Is the term present?
    - If yes, summarize it in **2 sentences** max.
4. ✅ If GPT replies `"Relevant: Yes"`, it:
    - Extracts the summary using regex
    - Tracks the chunk number and page number
5. ❌ If GPT replies `"Relevant: No"`, it moves on to the next chunk.
6. 📊 Final result is a **dictionary per key term**, with:
    - A combined summary of all matches
    - Page numbers where it was found

---

## 🛡️ Resilience Features:

| Feature               | Why It Helps                                                            |
|------------------------|------------------------------------------------------------------------|
| `query_llm_with_retry` | Handles OpenAI hiccups with smart retry logic — no crashing! 🔁          |
| Regex parsing          | Extracts structured output from GPT reliably, with fallbacks. 🕵️‍♂️      |
| Page metadata tracking | Associates summaries with original page numbers for better context 📍 |

---

## 🧠 Output Example (per term):

```json
{
  "Service Warranty": {
    "answer": "Found relevant information. Summary: The warranty covers service defects within 12 months.",
    "page_number": "1, 3"
  },
  "Termination for Cause": {
    "answer": "Not found.",
    "page_number": "Not applicable"
  }
}
```


In [None]:
def extract_key_terms(docs, key_terms):
    """
    Extracts clauses related to key terms from document chunks using LLM.
    Uses chunking to handle token limits and attempts to correlate results with page numbers.

    Args:
        docs (list): A list of Langchain Document objects (can be chunks).
        key_terms (list): List of key terms to extract.

    Returns:
        dict: A dictionary with each key term and its associated extracted answer and page number.
    """
    results = {}
    chunks = chunk_document(docs)

    for term in key_terms:
        term_results = []
        found_in_chunks = False
        for i, chunk in enumerate(chunks):
            prompt = (
                f"You are a legal document analysis assistant.\n"
                f"Your task is to determine if the following document snippet contains information pertaining to the term: '{term}'.\n"
                f"If it does, extract the relevant clause(s) and provide a very brief summary (no more than 2 sentences, focusing only on the key obligation or restriction).\n"
                f"If relevant information is found, respond in the following structured format:\n\n"
                f"Relevant: Yes\n"
                f"Summary: <A very brief summary of the relevant clause(s)>\n\n"
                f"If the term is not found in this snippet, respond exactly with:\n"
                f"Relevant: No\n\n"
                f"Document Snippet (Chunk {i+1}):\n{chunk.page_content}"
            )

            messages = [
                {"role": "system", "content": "You are a legal contract analysis assistant."},
                {"role": "user", "content": prompt}
            ]

            try:
                answer = query_llm_with_retry(messages, model="gpt-4o-mini")
                print(f"****************LLM Answer for '{term}' in Chunk {i+1}*******************")
                print(answer)

                if "Relevant: Yes" in answer:
                    summary_match = re.search(r"Summary:\s*(.*)", answer, re.DOTALL)
                    summary = summary_match.group(1).strip() if summary_match else "Summary extraction failed."
                    page_number = chunk.metadata.get("page", "N/A")
                    term_results.append({"clause_summary": summary, "page_number": page_number, "chunk_index": i})
                    found_in_chunks = True
                elif "Relevant: No" in answer:
                    pass
                else:
                    print(f"Warning: Unexpected LLM response format for term '{term}' in chunk {i+1}. Response:\n{answer}")

            except Exception as e:
                print(f"An error occurred while processing chunk {i+1} for term '{term}': {e}")

        if not found_in_chunks or not term_results:
            results[term] = {"answer": "Not found.", "page_number": "Not applicable"}
        else:
            combined_summary = " ".join([res["clause_summary"] for res in term_results])
            page_numbers = sorted(list(set([res["page_number"] for res in term_results])))
            page_info = ", ".join(map(str, page_numbers)) if page_numbers and page_numbers != ["N/A"] else "N/A"

            results[term] = {
                "answer": f"Found relevant information. Summary: {combined_summary}",
                "page_number": page_info
            }

    return results


# 🧾 Cell 10 - Extracting Verbatim Ground Truth from Legal Documents

Boom! You’ve arrived at Cell 10 — the ultimate extractor. 🧠✨  
While Cell 9 gave you brief summaries of legal clauses, this one is laser-focused on grabbing the **exact original wording** (aka "ground truth") for each key term. No paraphrasing — we want it raw and accurate. 💯

---

## 🔍 Function: `extract_ground_truth(docs, key_terms)`

This function extracts **verbatim clauses** related to specific legal terms using GPT. It processes each chunk of the document, asks for a structured JSON response, and builds a dictionary of exact matches.

| Parameter     | Description                                                                  |
|---------------|------------------------------------------------------------------------------|
| `docs`        | A list of LangChain `Document` objects (pre-processed chunks from earlier)  |
| `key_terms`   | List of legal/contract terms you're trying to extract from the document      |

---

### 🧠 How It Works:

1. 📖 **Splits your documents** into smaller chunks with `chunk_document()`
2. 🔁 **Loops over each `term`** you want to search
3. 📨 Sends each chunk to the LLM with a prompt asking for:
   - **Exact match text** if the term is found
   - Page number metadata for traceability
   - A strict JSON format for easier parsing
4. ✅ If GPT responds with a valid JSON and relevant clause, it's added to your results
5. 🧹 If no result is found in any chunk, marks the term as `"Not found"`

---

### 🔐 Prompt Magic:

The prompt tells GPT to return its response like this:

```json
{
  "term": "Confidentiality Obligations",
  "ground_truth_answer": "<verbatim text>",
  "snippet_page": "3"
}
```


In [None]:
def extract_ground_truth(docs, key_terms):
    """
    Extracts ground truth answers for each key term from document chunks using LLM.
    Parses JSON output and attempts to correlate results with page numbers.

    Args:
        docs (list): A list of Langchain Document objects (can be chunks).
        key_terms (list): List of key terms to extract.

    Returns:
        dict: A dictionary with each key term and its associated extracted ground truth and page number.
    """
    ground_truth = {}
    chunks = chunk_document(docs)

    for term in key_terms:
        ground_truth_answer = "Not found."
        page_number = "Not applicable"
        found_verbatim = False

        for i, chunk in enumerate(chunks):
            prompt = (
                f"You are a legal document analysis assistant. "
                f"Your task is to extract the *ground truth* from the following document snippet for the key term: '{term}'. "
                f"The ground truth is the exact text (verbatim) from the snippet that directly addresses or defines the key term. "
                f"Return your response in the following JSON format:\n"
                f'{{\n  "term": "{term}",\n  "ground_truth_answer": "<verbatim text or Not found>",\n  "snippet_page": "<page number of snippet or N/A>"\n}}\n\n'
                f"If the term is not mentioned or no relevant section exists in this snippet, the JSON should be:\n"
                f'{{\n  "term": "{term}",\n  "ground_truth_answer": "Not found",\n  "snippet_page": "N/A"\n}}\n\n'
                f"Document Snippet (Chunk {i+1}, Page: {chunk.metadata.get('page', 'N/A')}):\n{chunk.page_content}"
            )

            messages = [
                {"role": "system", "content": "You are a legal contract analysis assistant."},
                {"role": "user", "content": prompt}
            ]

            try:
                answer_str = query_llm_with_retry(messages, model="gpt-4o-mini")
                print(f"****************GROUNDTRUTHANSWER for '{term}' in Chunk {i+1}*******************")
                print(answer_str)

                try:
                    answer_json = json.loads(answer_str)
                    llm_ground_truth = answer_json.get("ground_truth_answer", "Parsing failed.")
                    llm_snippet_page = answer_json.get("snippet_page", "N/A")

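                    # Normalize so variants such as "not found." are also treated as misses
                    normalized = str(llm_ground_truth).strip().rstrip(".").lower()
                    if normalized not in ("", "not found", "parsing failed"):
                        ground_truth_answer = llm_ground_truth
                        # Use the chunk's own metadata for the page rather than the LLM-reported value
                        page_number = chunk.metadata.get("page", "N/A")
                        found_verbatim = True
                        break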

                except json.JSONDecodeError:
                    print(f"Warning: LLM response is not valid JSON for term '{term}' in chunk {i+1}. Response:\n{answer_str}")

            except Exception as e:
                print(f"An error occurred while processing chunk {i+1} for term '{term}': {e}")

        ground_truth[term] = {
            "ground_truth_answer": ground_truth_answer,
            "page_number": page_number
        }

    return ground_truth
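
Models sometimes wrap a JSON reply in Markdown code fences, in which case `json.loads` raises and the chunk is skipped with a warning. The sketch below shows one way to tolerate that. `parse_llm_json` is a hypothetical helper, not part of the notebook, and it assumes the reply contains a single JSON object.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_llm_json(answer_str):
    """Return the parsed JSON object from an LLM reply, or None if parsing fails.

    Hypothetical helper: tolerates replies wrapped in Markdown code fences.
    """
    text = answer_str.strip()
    # Remove an opening fence (optionally tagged, e.g. "json") and a closing fence.
    text = re.sub(rf"^{FENCE}[a-zA-Z]*\s*", "", text)
    text = re.sub(rf"\s*{FENCE}$", "", text)
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Only dict-shaped replies match the schema requested in the prompt.
    return parsed if isinstance(parsed, dict) else None


fenced_reply = (
    FENCE + "json\n"
    '{"term": "Governing Law", "ground_truth_answer": "Not found", "snippet_page": "N/A"}\n'
    + FENCE
)
print(parse_llm_json(fenced_reply)["ground_truth_answer"])  # prints "Not found"
print(parse_llm_json("Sorry, I cannot help with that."))    # prints "None"
```

A helper like this could replace the direct `json.loads(answer_str)` call, with a `None` result handled the same way as the existing `JSONDecodeError` branch.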

### ✂️ truncate_text(text, max_length)

**Purpose:**  
Truncates text to its first `max_length` characters for display and appends `"..."` when truncation occurs, so a truncated result can be up to `max_length + 3` characters long.

**Args:**
- `text` (`str`): The input string to be truncated.
- `max_length` (`int`): The maximum number of characters kept from the input.

**Returns:**
- The first `max_length` characters followed by `"..."` if the text is longer than `max_length`; otherwise the original text.


In [None]:
def truncate_text(text, max_length):
    """Truncate text to specified length for display purposes"""
    return text[:max_length] + "..." if len(text) > max_length else text
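
If the display has a hard width limit, a variant that counts the suffix against the limit may be preferable. The following is a minimal sketch under that assumption; `truncate_text_strict` and its `suffix` parameter are hypothetical and do not appear elsewhere in the notebook.

```python
def truncate_text_strict(text, max_length, suffix="..."):
    """Truncate so the result, suffix included, never exceeds max_length."""
    if len(text) <= max_length:
        return text
    if max_length <= len(suffix):
        # Not enough room for the suffix, so fall back to a hard cut.
        return text[:max_length]
    return text[: max_length - len(suffix)] + suffix


print(truncate_text_strict("Limitation of Liability", 10))  # prints "Limitat..."
```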

### `evaluate_entry(entry, results_df)`

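Evaluates a single key term entry from a legal document against four criteria: **groundedness**, **coherence**, **relevance**, and **fluency**, using the Azure AI evaluators. As written, the evaluators score the `ground_truth` text as the response, while `llm_response` is carried through to the output row for comparison. The function returns the running results DataFrame with the new row of scores and metadata appended.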

#### **Parameters**
- `entry` (`dict`):  
  A dictionary containing:
  - `'Key Term Name'` (`str`): The name of the legal clause (e.g., "Payment Terms").
  - `'query'` (`str`): The search query or clause being evaluated.
  - `'context'` (`str`): The document snippet or paragraph from which the ground truth is extracted.
  - `'ground_truth'` (`str`): The verbatim text from the document associated with the key term.
  - `'llm_response'` (`str`): The output provided by the LLM, stored in the result row for comparison.

- `results_df` (`pd.DataFrame`):  
  The running dataframe to which the evaluated result will be appended.

#### **Returns**
- `pd.DataFrame`:  
  A new DataFrame with one additional row containing:
  - Key Term Name
  - Context
  - Query
  - Ground Truth
  - LLM Response
  - Groundedness Score
  - Coherence Score
  - Relevance Score
  - Fluency Score

#### **Dependencies**
- `truncate_text(text, max_length)` - helper function to shorten long queries for console output.
- `groundedness_eval`, `coherence_eval`, `relevance_eval`, `fluency_eval` - initialized Azure or OpenAI evaluation objects.

#### **Example**
```python
entry = {
    "Key Term Name": "Confidentiality Obligations",
    "query": "What are the confidentiality obligations?",
    "context": "The parties shall keep all confidential information private...",
    "ground_truth": "The parties agree not to disclose any proprietary or confidential info...",
    "llm_response": "Parties must not share confidential information during or after the agreement."
}

results_df = evaluate_entry(entry, results_df)
```

This function helps build a structured and automated evaluation pipeline for contract analysis or legal clause extraction systems.


In [None]:
def evaluate_entry(entry, results_df):
    """Evaluate a single entry using all evaluators and add to results dataframe"""
    print(f"Evaluating query: {truncate_text(entry['query'], 40)}")

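    # Format inputs for each evaluator.
    # Note: the "response" scored below is entry["ground_truth"]; entry["llm_response"]
    # is only stored in the output row. Pass entry["llm_response"] to score the LLM output instead.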
    groundedness_input = {
        "query": entry["query"],
        "context": entry["context"],
        "response": entry["ground_truth"]
    }

    coherence_input = {
        "query": entry["query"],
        "response": entry["ground_truth"]
    }

    relevance_input = {
        "query": entry["query"],
        "response": entry["ground_truth"]
    }

    fluency_input = {
        "response": entry["ground_truth"]
    }

    # Get all scores
    groundedness_score = groundedness_eval(**groundedness_input)
    coherence_score = coherence_eval(**coherence_input)
    relevance_score = relevance_eval(**relevance_input)
    fluency_score = fluency_eval(**fluency_input)

    # Create new row for the dataframe
    new_row = {
        'Key Term Name': entry['Key Term Name'], # Ensure Key Term Name is passed through
        'Context': entry['context'],
        'Query': entry['query'],
        'Ground Truth': entry['ground_truth'],
        'LLM Response': entry['llm_response'],
        'Groundedness Score': groundedness_score['groundedness'],
        'Coherence Score': coherence_score['coherence'],
        'Relevance Score': relevance_score['relevance'],
        'Fluency Score': fluency_score['fluency']
    }

    # Use concat with a pre-defined DataFrame
    new_row_df = pd.DataFrame([new_row])
    return pd.concat([results_df, new_row_df], ignore_index=True)
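
The function above treats each evaluator as a callable that accepts keyword arguments and returns a dictionary keyed by the metric name. That call shape can be dry-run offline with stub evaluators before spending Azure quota. The sketch below is an assumption-laden illustration: `make_stub_evaluator`, `build_scores`, and the fixed score of `3.0` are hypothetical placeholders, not real evaluator output.

```python
# Input keys mirror the dictionaries built in evaluate_entry.
EVALUATOR_INPUTS = {
    "groundedness": ["query", "context", "response"],
    "coherence": ["query", "response"],
    "relevance": ["query", "response"],
    "fluency": ["response"],
}


def make_stub_evaluator(metric, score=3.0):
    """Return a callable that mimics an evaluator's call shape (hypothetical stub)."""
    def evaluator(**kwargs):
        missing = [key for key in EVALUATOR_INPUTS[metric] if key not in kwargs]
        if missing:
            raise ValueError(f"{metric} evaluator is missing inputs: {missing}")
        # A real evaluator would call a model here; the stub returns a fixed placeholder.
        return {metric: score}
    return evaluator


def build_scores(entry, evaluators, response_key="ground_truth"):
    """Call every evaluator with only the fields it needs and collect the scores."""
    fields = {
        "query": entry["query"],
        "context": entry["context"],
        "response": entry[response_key],
    }
    scores = {}
    for metric, keys in EVALUATOR_INPUTS.items():
        result = evaluators[metric](**{key: fields[key] for key in keys})
        scores[f"{metric.capitalize()} Score"] = result[metric]
    return scores


stubs = {metric: make_stub_evaluator(metric) for metric in EVALUATOR_INPUTS}
entry = {
    "Key Term Name": "Confidentiality Obligations",
    "query": "What are the confidentiality obligations?",
    "context": "The parties shall keep all confidential information private...",
    "ground_truth": "The parties agree not to disclose any proprietary or confidential info...",
    "llm_response": "Parties must not share confidential information during or after the agreement.",
}
print(build_scores(entry, stubs))
```

The default `response_key="ground_truth"` mirrors what `evaluate_entry` scores; passing `response_key="llm_response"` would score the LLM output instead.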

### `create_evaluation_dataset(llm_results, ground_truth_results, document_text)`

Generates a structured evaluation dataset combining LLM-generated answers and ground truth answers for a fixed set of legal key terms.

---

#### **Parameters**
- `llm_results` (`dict`):  
  Dictionary where keys are legal key terms and values are dictionaries with at least an `"answer"` field containing the LLM's output.

- `ground_truth_results` (`dict`):  
  Dictionary where keys are the same legal key terms and values contain a `"ground_truth_answer"` (text extracted from the original document).

- `document_text` (`str`):  
  The full legal document text. Only the first 2000 characters are used as "context" to limit evaluation input size.

---

#### **Returns**
- `list[dict]`:  
  A list of evaluation entries where each item is a dictionary with the following structure:
  ```python
  {
      'Key Term Name': "<Term Name>",
      'query': "Extract information about <Term> from the legal document.",
      'context': "<First 2000 chars of document_text>",
      'ground_truth': "<Extracted answer from document>",
      'llm_response': "<LLM's answer>"
  }
  ```

---

#### **Dependencies**
- `KEY_TERMS` (global list):  
  A list of predefined legal clauses like:
  ```python
  [
      "Service Warranty",
      "Limitation of Liability",
      "Governing Law",
      "Termination for Cause",
      "Payment Terms",
      "Confidentiality Obligations"
  ]
  ```

---

#### **Example**
```python
dataset = create_evaluation_dataset(llm_results, ground_truth_results, full_doc_text)
for row in dataset:
    print(row['Key Term Name'], row['query'], row['llm_response'])
```

---

This function is essential to bridge model outputs and ground truth values before passing them to automated evaluators.


In [None]:
def create_evaluation_dataset(llm_results, ground_truth_results, document_text):
    """Create evaluation dataset from LLM and ground truth results"""
    evaluation_data = []

    for term in KEY_TERMS:
        if term in llm_results and term in ground_truth_results:
            llm_answer = llm_results[term].get('answer', 'Not found.')
            ground_truth_answer = ground_truth_results[term].get('ground_truth_answer', 'Not found.')

            # Create evaluation entry
            entry = {
                'Key Term Name': term,  # Add the key term name here
                'query': f"Extract information about {term} from the legal document.",
                'context': document_text[:2000],  # Use first 2000 chars as context
                'ground_truth': ground_truth_answer,
                'llm_response': llm_answer
            }
            evaluation_data.append(entry)

    return evaluation_data
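
Because the context is always the first 2000 characters, a clause that appears later in the contract will not be present in the text the groundedness evaluator sees. One possible alternative is to take a window around the extracted snippet. The sketch below assumes the ground truth is a verbatim substring of the document; `context_window`, `radius`, and `fallback_chars` are hypothetical names introduced for illustration.

```python
def context_window(document_text, snippet, radius=1000, fallback_chars=2000):
    """Return the text surrounding `snippet`, or the document's opening if it is not found."""
    position = document_text.find(snippet) if snippet else -1
    if position == -1:
        # Same behaviour as the notebook: fall back to the start of the document.
        return document_text[:fallback_chars]
    start = max(0, position - radius)
    end = min(len(document_text), position + len(snippet) + radius)
    return document_text[start:end]


# Synthetic document: the clause sits well past the first 2000 characters.
clause = "Governing Law: New York."
document = "A" * 3000 + clause + "B" * 3000

window = context_window(document, clause, radius=50)
print(clause in document[:2000])  # prints "False"
print(clause in window)           # prints "True"
```

Whitespace or line-break differences between the extracted text and the source can defeat the exact `find` match, in which case the helper falls back to the opening of the document.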

### `evaluate_responses(llm_results, ground_truth_results, document_text)`

Evaluates LLM-generated responses against extracted ground truth answers using a suite of Azure/OpenAI evaluation models. It computes metrics such as groundedness, coherence, relevance, and fluency, and returns a structured DataFrame of results.

---

#### **Parameters**
- `llm_results` (`dict`):  
  Dictionary mapping each key term to a dictionary that includes:
  - `'answer'`: The LLM's output for that term.

- `ground_truth_results` (`dict`):  
  Dictionary mapping each key term to:
  - `'ground_truth_answer'`: Verbatim extracted text from the document.
  - `'page_number'`: (Optional) Page number from which the text was extracted.

- `document_text` (`str`):  
  The full text of the original legal document (or a combined snippet text). Only the first 2000 characters are passed to the evaluators as context.

---

#### **Returns**
- `pd.DataFrame`:  
  A DataFrame with one row per evaluated key term, including:
  - `'Key Term Name'`
  - `'Context'`
  - `'Query'`
  - `'Ground Truth'`
  - `'LLM Response'`
  - `'Groundedness Score'`
  - `'Coherence Score'`
  - `'Relevance Score'`
  - `'Fluency Score'`

If an error occurs during evaluation, returns a DataFrame with an `"Error"` column describing the issue.

---

#### **Dependencies**
- `create_evaluation_dataset(...)`:  
  Assembles entries combining LLM outputs and ground truths for consistent evaluation.
- `evaluate_entry(...)`:  
  Evaluates a single entry and appends it to the result DataFrame.

---

#### **Example**
```python
results_df = evaluate_responses(llm_results, ground_truth_results, full_document_text)

# Show results
print(results_df.head())
```

This function is intended for automated scoring and benchmarking of legal clause extraction or compliance generation pipelines.


In [None]:
def evaluate_responses(llm_results, ground_truth_results, document_text):
    """Evaluate LLM responses against ground truth using Azure evaluators"""
    try:
        # Create evaluation dataset
        evaluation_data = create_evaluation_dataset(llm_results, ground_truth_results, document_text)

        # Initialize results dataframe with all necessary columns
        results_df = pd.DataFrame(columns=[
            'Key Term Name', 'Context', 'Query', 'Ground Truth', 'LLM Response', 'Groundedness Score',
            'Coherence Score', 'Relevance Score', 'Fluency Score'
        ])

        # Evaluate each entry
        for entry in evaluation_data:
            results_df = evaluate_entry(entry, results_df)

        return results_df

    except Exception as e:
        print(f"Error during evaluation: {e}")
        # Return an empty DataFrame or a DataFrame with error info for Gradio
        return pd.DataFrame({'Error': [f"Evaluation failed: {e}"]})
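
Once the per-term rows exist, a per-metric average gives a quick overview of a run. The sketch below works on plain row dictionaries, which can be obtained from the DataFrame with `results_df.to_dict("records")`. `summarize_scores` is a hypothetical helper and the sample values are made up for illustration, not real evaluator output.

```python
from statistics import mean

SCORE_COLUMNS = [
    "Groundedness Score", "Coherence Score", "Relevance Score", "Fluency Score",
]


def _is_number(value):
    """True for ints and floats, excluding booleans and NaN."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value == value  # NaN is the only number not equal to itself


def summarize_scores(rows):
    """Average each score column over the rows, skipping missing or non-numeric values."""
    summary = {}
    for column in SCORE_COLUMNS:
        values = [row[column] for row in rows if _is_number(row.get(column))]
        summary[column] = mean(values) if values else None
    return summary


# Illustrative rows only.
sample_rows = [
    {"Key Term Name": "Payment Terms", "Groundedness Score": 4, "Coherence Score": 5,
     "Relevance Score": 4, "Fluency Score": 5},
    {"Key Term Name": "Governing Law", "Groundedness Score": 2, "Coherence Score": 3,
     "Relevance Score": None, "Fluency Score": 3},
]
print(summarize_scores(sample_rows))
```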

### `process_document(contract_file, ground_truth_file=None)`

| **Aspect**         | **Details**                                                                                                                                           |
|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Description**    | Processes a legal contract and optional ground truth file to extract key legal terms and evaluate them using Azure AI Evaluators.                     |
| **Parameters**     | `contract_file` (`str`): Path to the contract file. <br> `ground_truth_file` (`str`, optional): Path to the ground truth file. Defaults to `None`.    |
| **Returns**        | A tuple of 3 elements:<br>• `llm_results` (`dict`): Key term answers from the LLM<br>• `ground_truth_results` (`dict` or `None`)<br>• `evaluation_results` (`pd.DataFrame` or `None`) |
| **Steps**          | 1. Extract text from contract file<br>2. Run LLM extraction for `KEY_TERMS`<br>3. If ground truth is provided:<br>&nbsp;&nbsp;&nbsp;a. Extract ground truth answers<br>&nbsp;&nbsp;&nbsp;b. Run evaluation using Azure evaluators |
| **Evaluation**     | If ground truth is provided, each term's response is evaluated on:<br>• Groundedness<br>• Coherence<br>• Relevance<br>• Fluency                      |
| **Error Handling** | If any error occurs:<br>• Logs the error<br>• Returns error dictionaries for `llm_results` and `ground_truth_results`<br>• Returns an error DataFrame |
| **Example**        | `llm_out, gt_out, eval_df = process_document("contract.pdf", "ground_truth.pdf")`<br>`print(eval_df.head())`                                           |
| **Depends On**     | • `extract_text_from_file()`<br>• `extract_key_terms()`<br>• `extract_ground_truth()`<br>• `evaluate_responses()`<br>• Global: `KEY_TERMS`            |


In [None]:
def process_document(contract_file, ground_truth_file=None):
    """
    Processes the uploaded contract and optional ground truth document.

    Args:
        contract_file (str): Path to the contract document.
        ground_truth_file (str, optional): Path to the ground truth document. Defaults to None.

    Returns:
        tuple: A tuple containing the LLM extraction results, ground truth results, and evaluation results.
    """
    llm_results = {}
    ground_truth_results = None
    evaluation_results = None
    document_text = ""

    try:
        # Step 1: Extract text and documents from the contract file
        print(f"Processing contract file: {contract_file}")
        contract_text, contract_docs = extract_text_from_file(contract_file)
        document_text = contract_text

        # Step 2: Extract key terms using the LLM on document chunks
        print("Extracting key terms using LLM...")
        llm_results = extract_key_terms(contract_docs, KEY_TERMS)
        print("Key term extraction complete.")

        # Step 3: Extract ground truth if provided
        if ground_truth_file:
            print(f"\nProcessing ground truth file: {ground_truth_file}")
            ground_truth_text, ground_truth_docs = extract_text_from_file(ground_truth_file)

            print("Extracting ground truth using LLM...")
            ground_truth_results = extract_ground_truth(ground_truth_docs, KEY_TERMS)
            print("Ground truth extraction complete.")

            # Step 4: Evaluate LLM responses against ground truth
            print("Starting evaluation...")
            evaluation_results = evaluate_responses(llm_results, ground_truth_results, document_text)
            print("Evaluation complete.")

    except Exception as e:
        print(f"\nAn error occurred during document processing: {e}")
        llm_results = {"Error": f"Processing failed: {e}"}
        ground_truth_results = {"Error": f"Processing failed: {e}"}
        # Ensure evaluation_results is a DataFrame even on error for Gradio
        evaluation_results = pd.DataFrame({'Error': [f"Processing failed: {e}"]})

    return llm_results, ground_truth_results, evaluation_results

### `display_results(contract_file, ground_truth_file=None)`

| **Aspect**         | **Details**                                                                                                                                                                |
|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Description**    | Processes input files using `process_document()` and formats outputs for display in Gradio: LLM results, ground truth, and evaluation.                                      |
| **Parameters**     | `contract_file` (`str` or `file`): Contract document to analyze.<br>`ground_truth_file` (`str` or `file`, optional): Ground truth reference file. Defaults to `None`.      |
| **Returns**        | Tuple of: <br>• `llm_output` (`str`): Markdown-formatted LLM results<br>• `ground_truth_output` (`str`): Markdown-formatted ground truth<br>• `evaluation_df_for_gradio` (`pd.DataFrame`) |
| **LLM Output**     | Displayed with term-wise markdown formatting. Shows extracted answers for each key term.                                                                                     |
| **Ground Truth**   | If provided, shows extracted ground truth for each key term. Otherwise, outputs a message that no file was provided.                                                       |
| **Evaluation**     | If evaluation is successful, returns a filtered and formatted `DataFrame` with the following columns:<br>• `Key Term Name`, `Query`, `Ground Truth`, `LLM Response`, `Groundedness Score`, `Coherence Score`, `Relevance Score`, `Fluency Score` |
| **Error Handling** | If evaluation fails or input is missing, returns a default empty `DataFrame` or an error DataFrame for Gradio compatibility.                                               |

---

### `create_interface()`

| **Aspect**         | **Details**                                                                                                                                                     |
|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Description**    | Builds the Gradio user interface for uploading legal contracts and optional ground truth files, analyzing them, and displaying the results.                      |
| **Returns**        | `interface` (`gr.Blocks`): A configured Gradio Blocks interface ready to launch or integrate into a `launch()` call.                                              |
| **Layout**         | - Title and instructions using `Markdown`<br>- Two file inputs (`contract_file`, `ground_truth_file`)<br>- `Analyze` button<br>- Output: 2 `Markdown` and 1 `DataFrame` |
| **Output Sections**| - LLM Extraction Results (Markdown)<br> - Ground Truth Results (Markdown)<br> - Evaluation Results (Gradio `DataFrame`)                                           |
| **Interaction**    | On click of the Analyze button, the `display_results()` function is called, and its outputs are fed into the respective display sections.                        |
| **File Support**   | Accepts `.pdf`, `.docx`, `.doc`, `.txt`, `.csv` for both contract and ground truth documents.                                                                    |
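
The extension list in the table can also be checked in code before a path is handed to `process_document()`, for example when the function is called outside the Gradio UI. A minimal sketch, assuming the same five extensions; `SUPPORTED_EXTENSIONS` and `is_supported_file` are hypothetical names.

```python
from pathlib import Path

# Mirrors the file types accepted by the upload widgets.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".csv"}


def is_supported_file(path):
    """Return True if the path has one of the accepted extensions (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_file("contract.PDF"))  # prints "True"
print(is_supported_file("notes.md"))      # prints "False"
```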


In [13]:
def display_results(contract_file, ground_truth_file=None):
    """
    Display results in a formatted way for Gradio interface
    """
    llm_results, ground_truth_results, evaluation_results = process_document(contract_file, ground_truth_file)

    # Format LLM Results
    llm_output = "## LLM Extraction Results\n\n"
    for term, result in llm_results.items():
        if isinstance(result, dict) and 'answer' in result:
            llm_output += f"**{term}:**\n"
            llm_output += f"- Answer: {result['answer']}\n\n"
        else:
            llm_output += f"**{term}:** {result}\n\n"

    # Format Ground Truth Results
    ground_truth_output = "## Ground Truth Results\n\n"
    if ground_truth_results:
        for term, result in ground_truth_results.items():
            if isinstance(result, dict) and 'ground_truth_answer' in result:
                ground_truth_output += f"**{term}:**\n"
                ground_truth_output += f"- Answer: {result['ground_truth_answer']}\n\n"
            else:
                ground_truth_output += f"**{term}:** {result}\n\n"
    else:
        ground_truth_output = "No ground truth file provided."

    # Prepare Evaluation Results for Gradio DataFrame
    if evaluation_results is not None and not evaluation_results.empty:
        if 'Error' in evaluation_results.columns:
            # If there's an error, just return a simple DataFrame with the error message
            # Gradio DataFrame can display this, but it won't be in the desired evaluation format.
            # You might want to handle this error display differently in the UI if needed.
            # For now, it will show a table with one column 'Error' and the message.
            evaluation_df_for_gradio = evaluation_results
        else:
            # Define the desired column order for the final table
            desired_columns = [
                'Key Term Name', 'Query', 'Ground Truth', 'LLM Response',
                'Groundedness Score', 'Coherence Score', 'Relevance Score', 'Fluency Score'
            ]

            # Filter and reorder columns
            # Drop 'Context' as it's not requested in the final output format.
            display_df = evaluation_results.copy()
            if 'Context' in display_df.columns:
                display_df = display_df.drop(columns=['Context'])

            final_columns = [col for col in desired_columns if col in display_df.columns]
            evaluation_df_for_gradio = display_df[final_columns]
    else:
        # Return an empty DataFrame with the desired columns if no evaluation is performed
        # This prevents Gradio from throwing an error about unexpected output type.
        evaluation_df_for_gradio = pd.DataFrame(columns=[
            'Key Term Name', 'Query', 'Ground Truth', 'LLM Response',
            'Groundedness Score', 'Coherence Score', 'Relevance Score', 'Fluency Score'
        ])
        # You could also add a row indicating "No evaluation performed"
        # evaluation_df_for_gradio.loc[0] = ["N/A"] * len(evaluation_df_for_gradio.columns)
        # evaluation_df_for_gradio.loc[0, 'Key Term Name'] = "No evaluation performed (ground truth file required)."

    return llm_output, ground_truth_output, evaluation_df_for_gradio

# Gradio Interface
def create_interface():
    """Create Gradio interface for the legal document analyzer"""

    with gr.Blocks(title="Legal Document Analyzer with Azure Evaluation") as interface:
        gr.Markdown("# Legal Document Analyzer with Azure Evaluation")
        gr.Markdown("Upload a legal contract document and optionally a ground truth file to extract key terms and evaluate the results.")

        with gr.Row():
            with gr.Column():
                contract_file = gr.File(
                    label="Upload Contract Document",
                    file_types=[".pdf", ".docx", ".doc", ".txt", ".csv"]
                )
                ground_truth_file = gr.File(
                    label="Upload Ground Truth Document (Optional)",
                    file_types=[".pdf", ".docx", ".doc", ".txt", ".csv"]
                )
                analyze_btn = gr.Button("Analyze Document", variant="primary")

        with gr.Row():
            with gr.Column():
                llm_output = gr.Markdown(label="LLM Results")
            with gr.Column():
                ground_truth_output = gr.Markdown(label="Ground Truth Results")

        with gr.Row():
            # Gradio infers the column headers from the returned DataFrame;
            # wrap=True wraps long text inside cells.
            evaluation_output = gr.DataFrame(label="Evaluation Results", wrap=True)

        analyze_btn.click(
            fn=display_results,
            inputs=[contract_file, ground_truth_file],
            outputs=[llm_output, ground_truth_output, evaluation_output]
        )

    return interface


### `if __name__ == "__main__":` Block Overview

| **Aspect**         | **Details**                                                                                           |
|--------------------|--------------------------------------------------------------------------------------------------------|
| **Purpose**        | Entry point for running the app as a standalone script. Ensures the Gradio interface launches only when the script is executed directly. |
| **Functionality**  | - Calls the function to create the Gradio UI interface.<br>- Launches the app with debug and public sharing enabled. |
| **Parameters**     | `debug=True`: Enables debugging output in the console.<br>`share=True`: Generates a public Gradio link. |
| **Usage**          | Run this Python script directly to open the interface locally and via a public URL (if online).       |


In [None]:
if __name__ == "__main__":
    interface = create_interface()
    interface.launch(debug=True, share=True)