# LLM-as-a-Judge Simply Explained: A Complete Guide to Run LLM Evals

Recently, the concept of "LLM as a Judge" has been gaining significant traction in the AI and NLP communities. As someone deeply involved in the field of LLM evaluation, I've seen firsthand how LLM judges are rapidly becoming the preferred method for evaluating language models. The reasons are clear: compared to traditional human evaluators, LLM judges offer faster, more scalable, and cost-effective assessments, eliminating much of the slow, expensive, and labor-intensive work that comes with manual review.

However, it's important to recognize that LLM judges have their own challenges and limitations. Blindly relying on them can lead to misleading results and unnecessary frustration. That's why, in this guide, I'll share everything I've learned about leveraging LLM judges for system evaluation, including:

- The core principles behind LLM-as-a-Judge
- The practical benefits and pitfalls of automated evaluation
- Step-by-step instructions for setting up and running LLM-based evals 

---

## What exactly is "LLM as a Judge"?

"LLM-as-a-Judge" refers to the process of using Large Language Models (LLMs) to evaluate the outputs of other LLM systems. Instead of relying on human evaluators, who can be slow, expensive, and inconsistent, this approach leverages the reasoning and language understanding capabilities of LLMs to provide automated, scalable assessments.

The process typically works as follows:
1. **Define Evaluation Criteria:** You start by crafting an evaluation prompt that clearly specifies the criteria you want to assess (such as accuracy, relevance, faithfulness, bias, or any custom metric).
2. **Present Inputs and Outputs:** The LLM judge is given the original input (e.g., a question or task) and the output generated by the LLM system under evaluation.
3. **Automated Scoring:** The LLM judge reviews the information and assigns a score or rating based on the defined criteria.

LLM judges are commonly used to power advanced evaluation metrics like G-Eval, answer relevancy, faithfulness, and bias detection. By automating the evaluation process, LLM-as-a-Judge enables faster, more consistent, and more scalable assessments, making it an increasingly popular choice for both research and production environments.

---
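The three-step process above can be sketched in a few lines. This is a minimal illustration, not the prompt used later in this guide: the function name `build_judge_prompt` and the exact wording of the prompt are assumptions made for the example.

```python
def build_judge_prompt(criteria: str, task_input: str, system_output: str) -> str:
    """Assemble a judge prompt from the criteria, the input, and the output under review."""
    return (
        "You are an impartial evaluator.\n"
        f"Evaluation criteria: {criteria}\n\n"           # Step 1: define the criteria
        f"Input given to the system:\n{task_input}\n\n"  # Step 2: present the input...
        f"Output produced by the system:\n{system_output}\n\n"  # ...and the output
        "Rate the output from 1 (poor) to 5 (excellent).\n"     # Step 3: ask for a score
        "Respond in the format: Score: <number>"
    )


prompt = build_judge_prompt(
    criteria="Is the answer factually correct?",
    task_input="What is the capital of France?",
    system_output="The capital of France is Paris.",
)
print(prompt)
```

The resulting string is what would be sent to the judge model; the judge's reply is then parsed for the score.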

## Prerequisites

Before you get started, please make sure you have the following ready:

---

### 1. Sample Contract File for Testing

To try out the contract analysis workflow, download the sample contract file provided below:

- [Download Sample Contract (Google Drive)](https://drive.google.com/file/d/11dCpPvkt1MJqaiFoNQ67ujZ3qdSMmIGw/view?usp=sharing)

### 2. OpenAI API Key

You'll need your own OpenAI API key to access the language models used for contract evaluation. If you don't have one yet, follow this step-by-step guide to generate your API key:

- [How to get your own OpenAI API key (Medium article)](https://medium.com/@lorenzozar/how-to-get-your-own-openai-api-key-f4d44e60c327)

---

## Step 1: Install the Dependencies

Run the following command in your terminal or Jupyter notebook to install all required packages:

```python
!pip install gradio langchain openai python-docx PyPDF2 pandas
```

---


| Package       | Purpose / Use in Project                                                                 |
|---------------|-----------------------------------------------------------------------------------------|
| **gradio**    | Build interactive web UIs for machine learning and data apps. Lets users upload files, view results, and interact with your tool in a browser. |
| **langchain** | Framework for building applications powered by large language models (LLMs). Helps with document loading, processing, and LLM integration.      |
| **openai**    | Official Python client for OpenAI's API. Allows your code to send prompts and receive responses from models like GPT-4.                         |
| **python-docx** | Read, write, and extract text from Microsoft Word (.docx) files. Used to process contract documents in Word format.                        |
| **PyPDF2**    | Read and extract text from PDF files. Enables your tool to analyze contracts provided as PDFs.                                                  |
| **pandas**    | Powerful data analysis and manipulation library. Used to organize, process, and display results in tables (dataframes).                        |


## After Installing Dependencies: Let's Start Importing!

Now that you've installed all the necessary libraries, let's import them into your Python script or notebook. Here's a summary of each import and its purpose:

| Import Statement                                                                 | Purpose / Usage                                                                                                 |
|----------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
| `import gradio as gr`                                                            | Imports Gradio for building interactive web interfaces for your app.                                            |
| `from langchain.document_loaders import PyPDFLoader, Docx2txtLoader, TextLoader` | Imports document loaders from LangChain to extract text from PDF, DOCX, and TXT files.                         |
| `from openai import OpenAI`                                                      | Imports the OpenAI client to interact with language models like GPT-4 for contract analysis.                    |
| `import pandas as pd`                                                            | Imports Pandas for organizing, processing, and displaying results in tables (dataframes).                      |
| `import os`                                                                     | Imports Python's built-in OS module for handling file paths and interacting with the operating system.          |
| `import tempfile`                                                               | Imports the tempfile module to safely create and manage temporary files and directories during file processing. |


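```python
import gradio as gr
# Note: newer LangChain releases may expect these loaders to be imported
# from langchain_community.document_loaders instead.
from langchain.document_loaders import PyPDFLoader, Docx2txtLoader, TextLoader
from openai import OpenAI
import pandas as pd
import os
import tempfile
```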

## Key Terms and Evaluation Metrics

In this section, we focus on extracting specific key terms from contract documents and then evaluating the quality of those extractions using a set of well-defined metrics. Here's a clear overview:

---

### Key Terms

These are the important contract clauses or topics that the tool will automatically search for and extract from uploaded documents:

| Key Term                     | Description (Typical Focus in Contracts)                |
|------------------------------|--------------------------------------------------------|
| Service Warranty             | Guarantees and conditions for services provided        |
| Limitation of Liability      | Limits on legal responsibility for damages or losses   |
| Governing Law                | Specifies which jurisdiction's laws apply              |
| Termination for Cause        | Conditions under which the contract can be ended early |
| Payment Terms                | Details about payment amounts, schedules, and methods  |
| Confidentiality Obligations  | Rules about keeping information private                |

---

### Evaluation Metrics

After extracting the key terms, the tool evaluates the quality and accuracy of each extraction using the following metrics:

| Evaluation Metric                                               | What It Measures                                                                 |
|-----------------------------------------------------------------|----------------------------------------------------------------------------------|
| Answer Accuracy                                                 | Is the extracted answer factually correct?                                       |
| Citation Accuracy                                               | Are references (like page numbers) correct?                                      |
| Was the information extracted as per the question asked?        | Does the answer directly address the key term?                                   |
| Was the information complete?                                   | Is all relevant information included?                                            |
| Was the information enough to make a conclusive decision?       | Is the answer sufficient for decision-making?                                    |
| Were associated red flags covered in the extracted output?      | Are potential issues or risks mentioned?                                         |
| Was all information related to the relevant clause captured?    | Is the extraction thorough for the clause?                                       |
| Was the extracted information correctly favoring one party?     | Does the answer reflect the contract's intent regarding parties?                 |
| Is the AI reasoning covering all aspects of the key term?       | Is the explanation comprehensive?                                                |
| Is the Contract display area highlighting the relevant parts?   | Are important sections visually highlighted?                                     |
| Was the information extracted from all relevant clauses?        | Are multiple relevant sections included if needed?                               |
| Was the page number of extracted information correct?           | Are page references accurate?                                                    |
| Was the AI reasoning discussing the relevant clause?            | Is the explanation focused on the right part?                                    |
| Were related provisions highlighted on clicking page numbers?   | Is navigation to related sections effective?                                     |
| Was the information extracted from the correct part of a stitched document? | Is the answer from the right section in merged documents?            |
| Does the information stay within document scope?                | Is the answer limited to the uploaded contract?                                  |
| Were results free from misleading claims?                       | Are there any false or misleading statements?                                    |
| Does the tool avoid generic/non-contract answers?               | Is the answer specific to the contract, not generic?                             |
| Did the AI avoid illegal or insensitive justifications?         | Are explanations appropriate and lawful?                                         |
| Did the tool prevent false claims about people/entities?        | Are there any incorrect statements about parties?                                |
| Did the tool avoid hateful/profane content?                     | Is the output free from inappropriate language?                                  |

---

**In summary:**  
- The **key terms** guide what information is extracted from contracts.
- The **evaluation metrics** ensure that the extracted information is accurate, complete, relevant, and presented in a user-friendly and responsible way.

```python
# Contract clauses the tool looks for
KEY_TERMS = [
    "Service Warranty",
    "Limitation of Liability",
    "Governing Law",
    "Termination for Cause",
    "Payment Terms",
    "Confidentiality Obligations"
]

# Each metric is passed verbatim to the LLM judge
EVALUATION_METRICS = [
    "Answer Accuracy",
    "Citation Accuracy",
    "Was the information extracted as per the question asked in the key term?",
    "Was the information complete?",
    "Was the information enough to make a conclusive decision?",
    "Were associated red flags covered in the extracted output?",
    "Was all information related to the relevant clause captured?",
    "Was the extracted information correctly favoring one party?",
    "Is the AI reasoning covering all aspects of the key term?",
    "Is the Contract display area highlighting the relevant parts?",
    "Was the information extracted from all relevant clauses?",
    "Was the page number of extracted information correct?",
    "Was the AI reasoning discussing the relevant clause?",
    "Were related provisions highlighted on clicking page numbers?",
    "Was the information extracted from the correct part of a stitched document?",
    "Does the information stay within document scope?",
    "Were results free from misleading claims?",
    "Does the tool avoid generic/non-contract answers?",
    "Did the AI avoid illegal or insensitive justifications?",
    "Did the tool prevent false claims about people/entities?",
    "Did the tool avoid hateful/profane content?"
]
```

## Function: `extract_text_from_file`

This function is designed to handle the extraction of text from various types of contract files. It supports PDF, Word (DOCX/DOC), and plain text (TXT) formats, making your tool flexible for different document types.

---

### How It Works

| Step | What Happens                                                                                      | Why It's Important                                  |
|------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------|
| 1    | Determines the file extension using `os.path.splitext(file_path)[1].lower()`                      | Identifies the type of document being processed     |
| 2    | Selects the appropriate loader:                                                                   | Ensures correct extraction method for each format   |
|      | - `PyPDFLoader` for PDFs                                                                          |                                                     |
|      | - `Docx2txtLoader` for Word documents (`.docx`, `.doc`)                                           |                                                     |
|      | - `TextLoader` for plain text files (`.txt`)                                                      |                                                     |
| 3    | Raises a `ValueError` if the file type is not supported                                           | Prevents errors from unsupported file formats       |
| 4    | Loads the document(s) using the selected loader (`loader.load()`)                                 | Reads the content into a list of document objects   |
| 5    | Combines the text from all pages/sections into a single string using `"\n".join([...])`           | Provides a unified text block for further analysis  |
| 6    | Returns both the combined text and the list of document objects (`docs`)                          | The text is used for analysis; `docs` can provide page numbers or metadata if needed |

---
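Step 6 notes that `docs` can supply page numbers. One way to use that is to prefix each page's text with a marker before sending it to the model, so the model has something concrete to cite. The sketch below assumes each document object carries a zero-based `page` entry in its `metadata`, which LangChain's `PyPDFLoader` usually provides; the helper name `tag_pages` is hypothetical, and `SimpleNamespace` objects stand in for real loaded documents so the example runs on its own.

```python
from types import SimpleNamespace


def tag_pages(docs):
    """Prefix each page's text with a 1-based page marker (hypothetical helper)."""
    parts = []
    for index, doc in enumerate(docs):
        # Assumption: metadata["page"] is zero-based; fall back to list position.
        page = doc.metadata.get("page", index) + 1
        parts.append(f"[Page {page}]\n{doc.page_content}")
    return "\n".join(parts)


# Stand-ins for loaded document objects
docs = [
    SimpleNamespace(page_content="Payment is due in 30 days.", metadata={"page": 0}),
    SimpleNamespace(page_content="This agreement is governed by the laws of Ohio.", metadata={"page": 1}),
]
tagged = tag_pages(docs)
print(tagged)
```

Without markers like these, the combined text contains no page information, so any page number the model reports is a guess.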

### Example Usage

Suppose you have a contract file called `contract.pdf`:

```python
text, docs = extract_text_from_file("contract.pdf")
print(text)  # Prints the full extracted text
```

---

**In summary:**  
This function is a robust utility for extracting and preparing contract text from various file formats, setting the stage for further analysis and evaluation in your workflow.

```python
def extract_text_from_file(file_path):
    """Load a contract file and return (combined_text, docs)."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".pdf":
        loader = PyPDFLoader(file_path)
    elif ext in [".docx", ".doc"]:
        # Docx2txtLoader targets .docx; legacy .doc files may fail to load
        loader = Docx2txtLoader(file_path)
    elif ext == ".txt":
        loader = TextLoader(file_path)
    else:
        raise ValueError("Unsupported file type")
    docs = loader.load()
    # Combine all pages/sections into a single string
    text = "\n".join(doc.page_content for doc in docs)
    return text, docs  # docs for page numbers if needed
```

## Setting Up the OpenAI Client

To interact with OpenAI's language models (such as GPT-4), you need to create a client object using your own API key. This allows your application to send prompts and receive responses from OpenAI's servers.

---

### Example Code

```python
client = OpenAI(api_key='sk-...your-own-api-key-here...')
```
---

### For a Step-by-Step Guide

You can follow this detailed tutorial:  
[How to get your own OpenAI API key (Medium article)](https://medium.com/@lorenzozar/how-to-get-your-own-openai-api-key-f4d44e60c327)

---

### Important Note About API Keys

- **Security:** Never share your OpenAI API key publicly or commit it to version control (like GitHub). Treat it like a password.
- **Personal Key Required:** The API key in the example above is for demonstration only. You must use your own unique API key to access OpenAI services.

---
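One common way to follow the security advice above is to keep the key out of the source code and read it from an environment variable. The sketch below assumes the key is stored in a variable named `OPENAI_API_KEY`; the helper name `get_openai_api_key` is hypothetical.

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Usage, once the variable is set in your shell or notebook environment:
# client = OpenAI(api_key=get_openai_api_key())
```

Failing early with an explicit error is a design choice here: it avoids a confusing authentication failure later, on the first API call.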

**In summary:**  
You need your own OpenAI API key to use the language models. Never share your key, and always keep it secure!

```python
client = OpenAI(api_key='Place Your API Key')
```

## Function: `extract_key_terms`

This function uses an AI language model to automatically extract specific key terms or clauses from a contract document. It is designed to work with a list of key terms (such as "Service Warranty" or "Payment Terms") and return the relevant sections of the contract for each term.

---

### How It Works

| Step | What Happens                                                                                                    | Why It's Important                                  |
|------|----------------------------------------------------------------------------------------------------------------|-----------------------------------------------------|
| 1    | Loops through each key term in the provided list.                                                              | Ensures all important contract clauses are checked.  |
| 2    | For each term, constructs a prompt asking the AI to extract relevant sections from the contract text.           | Guides the AI to focus on the specific clause.       |
| 3    | Sends the prompt to the OpenAI language model (e.g., GPT-4) for analysis.                                       | Leverages advanced AI for accurate extraction.       |
| 4    | Receives the AI's answer, which should include the relevant text and, if possible, page numbers.                | Provides both the content and its location.          |
| 5    | Uses a regular expression to try to extract the page number from the AI‚Äôs answer, if mentioned.                 | Helps with citation and navigation in the document.  |
| 6    | Stores the answer and page number for each key term in a results dictionary.                                    | Organizes results for easy access and further use.   |
| 7    | Returns the dictionary mapping each key term to its extracted answer and page number.                           | Makes the output easy to use in later steps.         |

---
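To make step 5 concrete, here is the same regular expression applied to a few sample answers. The sample strings are invented for illustration, and the wrapper name `first_page_number` is hypothetical.

```python
import re

# Same pattern the function uses: "page" or "pages", optional whitespace, then digits
PAGE_PATTERN = re.compile(r"page[s]?\s*(\d+)", re.IGNORECASE)


def first_page_number(answer: str):
    """Return the first page number mentioned in the answer as a string, else None."""
    match = PAGE_PATTERN.search(answer)
    return match.group(1) if match else None


print(first_page_number("The clause appears on Page 3."))   # 3
print(first_page_number("See pages 4 and 7 for details."))  # 4 (only the first number)
print(first_page_number("Found on page three."))            # None (spelled-out numbers are missed)
print(first_page_number("Not found"))                       # None
```

The pattern captures only the first number and returns it as a string, so answers that cite several pages or spell the number out are recorded incompletely or not at all.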

### Example Usage

Suppose you have extracted the text from a contract and want to find the relevant sections for a list of key terms:

```python
key_terms = ["Service Warranty", "Payment Terms"]
results = extract_key_terms(contract_text, key_terms)
print(results["Service Warranty"]["answer"])      # Shows the extracted section for Service Warranty
print(results["Service Warranty"]["page_number"]) # Shows the page number if found
```

---

### Why This Matters

- **Automates tedious work:** Saves time by letting AI scan and extract key clauses from lengthy contracts.
- **Consistent and thorough:** Ensures every key term is checked in the same way, reducing human error.
- **Prepares for evaluation:** The extracted answers can be further evaluated for accuracy and completeness using your evaluation metrics.

---
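One caveat: the function sends only the first 4,000 characters of the contract (`text[:4000]`), so clauses that appear later are never seen by the model. A possible workaround, sketched below, is to split the text into overlapping chunks and query each one. The helper `chunk_text` and its default sizes are assumptions for illustration, not part of the original workflow.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200):
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "x" * 9000
chunks = chunk_text(sample)
print(len(chunks))               # 3
print([len(c) for c in chunks])  # [4000, 4000, 1400]
```

The overlap reduces the chance that a clause is cut in half at a chunk boundary. The trade-off is one extraction call per chunk per key term, which raises cost.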

**In summary:**  
This function is a core part of your contract analysis workflow, using AI to quickly and accurately extract the most important sections from any contract document, ready for further review or evaluation.

```python
import re

def extract_key_terms(text, key_terms):
    """Ask the LLM to extract the contract section(s) for each key term."""
    results = {}
    for term in key_terms:
        prompt = (
            f"You are a legal document analysis assistant. "
            f"Extract the section(s) of the following contract that pertain to '{term}'. "
            f"Return the relevant text verbatim, and if possible, the page number(s) where it appears. "
            f"If not found, say 'Not found'.\n\n"
            f"Document:\n{text[:4000]}..."  # Truncate for token limit
        )
        completion = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[
                {"role": "system", "content": "You are a legal contract analysis assistant."},
                {"role": "user", "content": prompt}
            ]
        )
        answer = completion.choices[0].message.content
        # Try to extract the first page number if one is mentioned
        page_number = None
        match = re.search(r'page[s]?\s*(\d+)', answer, re.IGNORECASE)
        if match:
            page_number = match.group(1)
        results[term] = {"answer": answer, "page_number": page_number}
    return results
```

## Function: `judge_key_term`

This function evaluates how well an extracted answer from a contract addresses a specific key term, using a set of evaluation metrics. It leverages an AI model to provide both a numerical score and a brief justification for each metric.

---

| Step                        | Description                                                                                                                      | Why It's Important                                                                                 |
|-----------------------------|----------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| **1. Iterate Over Metrics** | For each evaluation metric (e.g., "Answer Accuracy", "Citation Accuracy"), the function performs an assessment.                  | Ensures every aspect of the answer is evaluated systematically.                                    |
| **2. Prompt Construction**  | Builds a prompt for the AI model, asking it to evaluate the extracted answer for the key term against the current metric.        | Guides the AI to focus on the specific evaluation criteria.                                        |
|                             | If a ground truth answer is available, it is included in the prompt for comparison.                                              | Allows for more rigorous and objective evaluation.                                                 |
| **3. AI Evaluation**        | Sends the prompt to the OpenAI language model (e.g., GPT-4) and receives a response with:                                       | Leverages advanced AI for consistent and expert-like evaluation.                                   |
|                             | - A score from 1 (poor) to 5 (excellent)                                                                                        |                                                                                                    |
|                             | - A brief justification (1-2 sentences) explaining the score                                                                    |                                                                                                    |
| **4. Parse Response**       | Extracts the score and justification from the AI's response using regular expressions.                                          | Converts the AI's output into structured data for further use.                                     |
| **5. Compile Results**      | For each metric, creates a result entry containing:                                                                             | Organizes evaluation data for easy analysis and display.                                           |
|                             | - Key term name                                                                                                                 |                                                                                                    |
|                             | - Extracted answer from the document                                                                                            |                                                                                                    |
|                             | - Evaluation metric name                                                                                                        |                                                                                                    |
|                             | - Score                                                                                                                         |                                                                                                    |
|                             | - Justification                                                                                                                 |                                                                                                    |
|                             | - (Page number is included as a placeholder and can be filled in later)                                                         |                                                                                                    |
| **6. Return Results**       | After evaluating all metrics, returns a list of these result entries.                                                           | Provides a comprehensive evaluation report for further analysis or display.                        |

---
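The parsing in step 4 can be tried in isolation. The sketch below uses the same two regular expressions as the function. The range check that discards scores outside 1 to 5 is an addition for illustration, not part of the original code, and the name `parse_judgment` is hypothetical.

```python
import re


def parse_judgment(content: str, low: int = 1, high: int = 5):
    """Split a judge reply into (score, justification)."""
    score_match = re.search(r"Score:\s*(\d+)", content)
    just_match = re.search(r"Justification:\s*(.*)", content, re.DOTALL)
    score = int(score_match.group(1)) if score_match else None
    # Added assumption: treat out-of-range scores as unparseable
    if score is not None and not low <= score <= high:
        score = None
    # Fall back to the whole reply when no justification label is present
    justification = just_match.group(1).strip() if just_match else content.strip()
    return score, justification


print(parse_judgment("Score: 4\nJustification: Covers the clause but omits the cure period."))
# (4, 'Covers the clause but omits the cure period.')
print(parse_judgment("Score: 9\nJustification: Off the scale."))
# (None, 'Off the scale.')
print(parse_judgment("I cannot rate this."))
# (None, 'I cannot rate this.')
```

A `None` score signals that the judge did not follow the requested format, which is worth counting separately instead of silently averaging over it.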

### Why This Matters

| Benefit                    | Explanation                                                                                      |
|----------------------------|--------------------------------------------------------------------------------------------------|
| **Objective Assessment**   | Automates the evaluation process, reducing human bias and increasing consistency.                |
| **Detailed Feedback**      | Provides both a score and a justification, helping users understand the strengths and weaknesses.|
| **Ground Truth Comparison**| If a correct answer is known, enables direct comparison for more rigorous evaluation.            |

---

**In summary:**  
This function is essential for systematically and transparently evaluating the quality of information extracted from contracts, ensuring reliable and actionable results.

```python
import re

def judge_key_term(term, llm_answer, metrics, ground_truth=None):
    """Score one extracted answer against every metric using the LLM judge."""
    results = []
    for metric in metrics:
        prompt = (
            f"You are an expert contract evaluator. "
            f"Evaluate the following extracted answer for the key term '{term}' "
            f"against the evaluation metric: '{metric}'.\n"
            f"Extracted Answer: {llm_answer}\n"
        )
        if ground_truth:
            prompt += f"Ground Truth Answer: {ground_truth}\n"
        prompt += (
            "For this metric, provide:\n"
            "- A score from 1 (poor) to 5 (excellent)\n"
            "- A brief justification (1-2 sentences)\n"
            "Respond in the format: Score: <number>\nJustification: <text>"
        )
        completion = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[
                {"role": "system", "content": "You are a contract evaluation expert."},
                {"role": "user", "content": prompt}
            ]
        )
        content = completion.choices[0].message.content
        # Parse the structured reply; fall back to the raw text if the format is off
        score_match = re.search(r"Score:\s*(\d+)", content)
        justification_match = re.search(r"Justification:\s*(.*)", content, re.DOTALL)
        score = int(score_match.group(1)) if score_match else None
        justification = justification_match.group(1).strip() if justification_match else content
        results.append({
            "key_term_name": term,
            "llm_extracted_ans_from_doc": llm_answer,
            "page_number": None,  # Will fill later
            "evulation_metric_name": metric,
            "score": score,
            "justification": justification
        })
    return results
```

## Function: `process_documents` (Version Without Ground Truth)

This function manages the main workflow for analyzing a contract document, from extracting its text to evaluating key terms and preparing the results for display. In this version, ground truth comparison is not included.

---

| Step | Description                                                                                                   | Why It's Important                                                      |
|------|---------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| 1    | Extracts text and document objects from the uploaded contract file using `extract_text_from_file`.             | Converts the contract into a format suitable for further analysis.       |
| 2    | Extracts key terms from the contract text using `extract_key_terms` and the predefined `KEY_TERMS` list.       | Identifies and isolates the most important clauses in the contract.      |
| 3    | (Ground truth loading is commented out in this version.)                                                      | This version does not compare to reference answers.                      |
| 4    | For each key term:                                                                                             | Ensures every key term is systematically evaluated.                      |
|      | - Retrieves the LLM-extracted answer and page number.                                                          |                                                                         |
|      | - Evaluates the extracted answer using `judge_key_term` and the `EVALUATION_METRICS` list.                     | Produces a set of scores and justifications for each metric.             |
|      | - Updates the evaluation results with the correct page number.                                                 | Keeps results organized and traceable.                                   |
|      | - Collects all evaluation results into a single list.                                                          |                                                                         |
| 5    | Converts the results into a Pandas DataFrame for easy display and further analysis.                            | Makes it simple to present and analyze the results in tabular form.      |
| 6    | Returns the extracted contract text and the DataFrame of evaluation results.                                   | Provides all necessary outputs for downstream use (e.g., UI display).    |

---
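The commented-out step refers to a `load_ground_truth` helper that this guide does not define. If you want to enable ground truth comparison, one possible shape for it is sketched below. Everything about it is an assumption: the CSV format, the column names `key_term` and `answer`, and the choice to accept an open text file.

```python
import csv
import io


def load_ground_truth(file_obj):
    """Read (key_term, answer) rows from CSV text into a dict (hypothetical format)."""
    reader = csv.DictReader(file_obj)
    return {row["key_term"].strip(): row["answer"].strip() for row in reader}


# In-memory CSV standing in for a real ground-truth file
sample_csv = io.StringIO(
    "key_term,answer\n"
    "Governing Law,State of Delaware\n"
    'Payment Terms,"Net 30, invoiced monthly"\n'
)
ground_truth = load_ground_truth(sample_csv)
print(ground_truth["Payment Terms"])  # Net 30, invoiced monthly
```

With a dictionary like this, `ground_truth.get(term)` could be passed as the `ground_truth` argument of `judge_key_term`, which already accepts it.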

### Why This Matters

- **End-to-End Automation:**  
  This function ties together all the core steps of the contract analysis pipeline, making the process seamless for the user.

- **Simplicity:**  
  By omitting ground truth comparison, this version is ideal for real-world scenarios where reference answers may not be available.

- **Structured Output:**  
  Returns results in a DataFrame, which is ideal for visualization, reporting, or further processing.

---

**In summary:**  
This version of `process_documents` is the main driver function for contract analysis, handling everything from document ingestion to evaluation, and returning results ready for display or further use.

In [8]:
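def process_documents(contract_file, ground_truth_file=None):
    # Step 1: Extract text. contract_file is a path (str), not a file-like object.
    # ground_truth_file is accepted for compatibility but is unused in this version.
    text, docs = extract_text_from_file(contract_file)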
    
    # Step 2: Extract key terms
    key_term_results = extract_key_terms(text, KEY_TERMS)
    
    # # Step 3: Load ground truth if provided
    # ground_truth = None
    # if ground_truth_file is not None:
    #     ground_truth = load_ground_truth(ground_truth_file)
    
    # Step 4: Judge each key term
    all_results = []
    for term in KEY_TERMS:
        llm_ans = key_term_results[term]["answer"]
        page_number = key_term_results[term]["page_number"]
        # gt_ans = ground_truth[term] if ground_truth and term in ground_truth else None
        evals = judge_key_term(term, llm_ans, EVALUATION_METRICS)
        for e in evals:
            e["page_number"] = page_number
        all_results.extend(evals)
    
    # Step 5: Prepare DataFrame for display
    df = pd.DataFrame(all_results)
    return text, df
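
Because the results come back as a DataFrame, a few lines of pandas are enough to summarize them. The sketch below is illustrative only: it assumes the result rows use the same column names as the Gradio table headers (`key_term_name`, `evulation_metric_name`, `score`, and so on) and that `score` is numeric. The sample rows, the metric names, and the `REVIEW_THRESHOLD` value are made up for the example and are not output from the pipeline.

```python
import pandas as pd

# Hypothetical rows in the shape process_documents is assumed to collect.
# Values and metric names are placeholders, not real evaluation output.
sample_results = [
    {"key_term_name": "Termination", "evulation_metric_name": "accuracy",
     "score": 4, "page_number": 3},
    {"key_term_name": "Termination", "evulation_metric_name": "completeness",
     "score": 2, "page_number": 3},
    {"key_term_name": "Governing Law", "evulation_metric_name": "accuracy",
     "score": 5, "page_number": 7},
    {"key_term_name": "Governing Law", "evulation_metric_name": "completeness",
     "score": 4, "page_number": 7},
]
df = pd.DataFrame(sample_results)

# Average score per metric across all key terms.
mean_by_metric = df.groupby("evulation_metric_name")["score"].mean()

# Rows scoring below an arbitrary threshold, flagged for manual review.
REVIEW_THRESHOLD = 3
needs_review = df[df["score"] < REVIEW_THRESHOLD]

print(mean_by_metric)
print(needs_review[["key_term_name", "evulation_metric_name", "score"]])
```

A summary like this is one way to spot metrics or key terms where the judge consistently scores the extraction low, which is where a human spot-check is most useful.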

## Gradio App Interface: LLM Contract Judge

This section defines the interactive web interface for the contract analysis tool using Gradio. The interface allows users to upload contract files, extract key terms, evaluate them using an LLM, and view the results in a user-friendly format.

---

| UI Element / Step         | Description                                                                                                   | Why It’s Important                                                       |
|---------------------------|---------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| **App Container**         | Uses `gr.Blocks()` to create a modular, flexible Gradio app.                                                  | Allows for a clean, organized, and interactive user interface.           |
| **Title Markdown**        | Displays a title and brief instructions at the top of the app.                                                | Helps users understand the app’s purpose and how to use it.              |
| **Upload & Extract Tab**  | Provides a tab for uploading contract files (PDF, DOCX, TXT).                                                 | Lets users easily provide the documents they want to analyze.            |
| **File Upload Widget**    | Allows users to upload a contract file.                                                                       | Supports multiple file formats for flexibility.                          |
| **Extract Button**        | A button labeled "Extract & Evaluate" to start the analysis process.                                          | Gives users control over when to begin processing.                       |
| **Extracted Text Box**    | Displays the extracted text from the uploaded contract.                                                       | Offers transparency and lets users review what was extracted.             |
| **Results Table Tab**     | Provides a separate tab to display the evaluation results in a table format.                                  | Organizes results for easy review and comparison.                        |
| **Results Dataframe**     | Shows a table with columns for key term, extracted answer, page number, evaluation metric, score, and justification. | Presents detailed evaluation results in a structured, readable way.      |
| **run_all Function**      | Defines the function that runs the full analysis pipeline when the button is clicked.                         | Connects the UI to the backend logic for seamless operation.             |
| **Button Click Event**    | Links the "Extract & Evaluate" button to the `run_all` function, passing the uploaded file as input.          | Ensures user actions trigger the correct processing workflow.            |
| **App Launch**            | Calls `demo.launch()` to start the Gradio app and make it accessible in the browser.                          | Makes the tool available for interactive use.                            |

---

### Why This Matters

- **User-Friendly:**  
  The Gradio interface makes it easy for users to interact with complex AI-powered contract analysis tools without needing to write code.

- **Transparency:**  
  Users can see both the raw extracted text and the detailed evaluation results, increasing trust in the tool.

- **Efficiency:**  
  The app streamlines the workflow from document upload to actionable insights, all in one place.

---

**In summary:**  
This Gradio app provides an accessible, interactive front-end for your contract analysis pipeline, allowing users to upload documents, trigger analysis, and review results with ease.
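
One detail worth checking when wiring the upload widget to `process_documents`: the comment in that function says `contract_file` must be a path string, yet depending on the installed Gradio version and the `type` setting of `gr.File`, the callback may receive either a path string or a temporary-file object that exposes the path as `.name`. The sketch below is a minimal, hypothetical helper (`to_path` is not part of the original notebook or of Gradio) that normalizes both cases. Treat the assumption about what the widget passes as something to verify against your own Gradio version.

```python
import os
from types import SimpleNamespace


def to_path(uploaded):
    """Return a filesystem path for a value received from a file-upload widget.

    Hypothetical helper: accepts a path string / PathLike, or any object
    whose `.name` attribute holds the path (as temporary-file wrappers do).
    """
    if uploaded is None:
        raise ValueError("No file was uploaded.")
    if isinstance(uploaded, (str, os.PathLike)):
        return os.fspath(uploaded)
    name = getattr(uploaded, "name", None)
    if isinstance(name, str):
        return name
    raise TypeError(f"Unsupported upload value: {type(uploaded).__name__}")


# Both shapes resolve to a plain string path.
print(to_path("contract.pdf"))
print(to_path(SimpleNamespace(name="/tmp/contract.pdf")))
```

Inside `run_all`, this could be applied as `process_documents(to_path(contract_file))`, which also gives a clear error when the button is clicked before a file has been uploaded.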

When you run the last cell in your notebook, you’ll see a message like the one shown in the image below. Click the "Running on local URL" link and you will be taken to a new screen where you can interact with the LLM Contract Judge app.

![Gradio Local URL Example](Images//img-1.png)

In [9]:
with gr.Blocks() as demo:
    gr.Markdown("# 📄 LLM Contract Judge\nUpload a contract, extract key terms, and evaluate with LLM.")
    with gr.Tab("Upload & Extract"):
        contract_file = gr.File(label="Upload Contract (PDF, DOCX, TXT)")
        # ground_truth_file = gr.File(label="Upload Ground Truth CSV (optional)", optional=True)
        extract_btn = gr.Button("Extract & Evaluate")
        extracted_text = gr.Textbox(label="Extracted Text", lines=10)
    with gr.Tab("Results Table"):
        results_table = gr.Dataframe(headers=[
            "key_term_name", "llm_extracted_ans_from_doc", "page_number", "evulation_metric_name", "score", "justification"
        ], label="Evaluation Results")
    
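    def run_all(contract_file, ground_truth_file=None):
        # Only contract_file is wired to the button below, so ground_truth_file
        # needs a default; without it the click would call run_all with too few arguments.
        text, df = process_documents(contract_file, ground_truth_file)
        return text, gr.update(value=df)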
    
    extract_btn.click(
        run_all,
        inputs=[contract_file],
        outputs=[extracted_text, results_table]
    )

demo.launch()



Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




