## Installation

In [1]:
# Install required packages with compatible versions
print("Installing/upgrading required packages...")
print("="*60)

# Force upgrade transformers and huggingface_hub together for compatibility
!pip install --upgrade --force-reinstall --no-cache-dir transformers huggingface_hub

print("\nInstalling other required packages...")
!pip install --upgrade --quiet peft accelerate bitsandbytes gradio sentencepiece protobuf

print("\n" + "="*60)
print("✓ All packages installed")
print("="*60)
print()
print("⚠️  CRITICAL: RESTART KERNEL NOW!")
print("   Kaggle: Click ⋮ (three dots) → Restart Session")
print("   Or use: Session → Restart Session")
print("   Then continue from Cell 3")
print("="*60)

Installing/upgrading required packages...
Collecting transformers
  Downloading transformers-5.2.0-py3-none-any.whl.metadata (32 kB)
Collecting huggingface_hub
  Downloading huggingface_hub-1.4.1-py3-none-any.whl.metadata (13 kB)
Collecting numpy>=1.17 (from transformers)
  Downloading numpy-2.4.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (6.6 kB)
Collecting packaging>=20.0 (from transformers)
  Downloading packaging-26.0-py3-none-any.whl.metadata (3.3 kB)
Collecting pyyaml>=5.1 (from transformers)
  Downloading pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (2.4 kB)
Collecting regex!=2019.12.17 (from transformers)
  Downloading regex-2026.1.15-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (40 kB)


Installing other required packages...

✓ All packages installed

⚠️  CRITICAL: RESTART KERNEL NOW!
   Kaggle: Click ⋮ (three dots) → Restart Session
   Or use: Session → Restart Session
   Then continue from Cell 3


# 🚀 Gradio Deployment - Kaggle & Colab Compatible

## Run this AFTER training your model

This notebook deploys your trained Legal Case Summarization model with Gradio.

**Compatible with:**
- ✅ Kaggle Notebooks
- ✅ Google Colab
- ✅ Local Jupyter (with GPU)

**Steps:**
1. ✅ **Run Cell 2** (Installation - installs all required packages)
2. ⚠️ **RESTART KERNEL** (Critical!)
   - **Kaggle**: Click ⋮ → Restart Session
   - **Colab**: Runtime → Restart Runtime
3. ✅ **Skip Cell 2**, start from Cell 4 onwards
4. ✅ **Check Cell 7** (Version verification)
5. ✅ Continue through all cells
6. ✅ **Final cell creates a public Gradio link** (click to access interface)

**Prerequisites:**
- Model must be trained (adapter files saved) OR attached as dataset
- Internet must be enabled (Kaggle: Add-ons → Internet → ON)
- **Gemma license accepted**: Visit https://huggingface.co/google/gemma-2b
- Hugging Face token configured (already set in Cell 9)
- GPU accelerator recommended (but not required for inference)

**Common Issues:**
- If you see ImportError → Restart the kernel after Cell 2
- If you can't see the Gradio interface → The public link appears at the bottom (click it)
- If you see PackageNotFoundError → Run Cell 2 again, then restart the kernel
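
The ImportError and PackageNotFoundError cases listed above can be caught early with a quick preflight check. The following is a minimal sketch, not part of the original notebook: `REQUIRED_MODULES` and `find_missing_modules` are hypothetical names, and the module list is assumed from the install cell.

```python
import importlib.util

# Import names assumed from the install cell (hypothetical list, adjust as needed).
REQUIRED_MODULES = ["transformers", "huggingface_hub", "peft",
                    "accelerate", "bitsandbytes", "gradio", "sentencepiece"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent, or modules without a spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing packages: {', '.join(missing)}. Re-run the install cell, then restart.")
else:
    print("All required packages can be located.")
```

This only confirms that each package can be found; it does not verify versions, which the version-check cell handles.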

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
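
The `google.colab` package is only available on Colab, so the mount cell above raises an import error on Kaggle or local Jupyter. One way to keep the notebook portable is to guard the mount, as in this sketch (`running_in_colab` is a hypothetical helper, not part of the original notebook):

```python
import importlib.util


def running_in_colab():
    """Return True when the google.colab package can be located."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # The parent package `google` is missing, or it has no usable spec.
        return False


if running_in_colab():
    from google.colab import drive
    drive.mount('/content/drive')
else:
    print("Not running on Colab; skipping Google Drive mount.")
```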


In [1]:
# Check if model files exist
import os

# Update MODEL_DIR based on your situation:
# Option 1: Google Drive on Colab (used in this run):
MODEL_DIR = "/content/drive/MyDrive/Colab Notebooks/Domain-Specific-FIned-Tuned-Legal-AI/legal-summarization-lora"

# Option 2: Same session that just trained the model:
# MODEL_DIR = "./legal-summarization-lora"
#
# Option 3: Attached as a Kaggle dataset:
# Check the "Input" or "Data" tab in the right sidebar
# for the exact dataset path. It is usually:
# /kaggle/input/{dataset-name}/

print("="*60)
print("CHECKING MODEL FILES")
print("="*60)

required_files = [
    "adapter_model.safetensors",  # or adapter_model.bin
    "adapter_config.json",
    "tokenizer.json",
    "tokenizer_config.json"
]

if not os.path.exists(MODEL_DIR):
    print(f"❌ Model directory not found: {MODEL_DIR}")
    print("\nYou need to either:")
    print("1. Train the model first (run the training notebook)")
    print("2. OR attach a Kaggle dataset with your trained model")
    print("   - Click 'Add Data' (right sidebar)")
    print("   - Select your saved model dataset")
    print("   - Update MODEL_DIR to /kaggle/input/...")
    raise FileNotFoundError(f"Model not found at {MODEL_DIR}")

print(f"✓ Model directory found: {MODEL_DIR}\n")

missing_files = []
for file in required_files:
    file_path = os.path.join(MODEL_DIR, file)
    # Check for both .safetensors and .bin versions
    if file == "adapter_model.safetensors":
        safetensors_path = os.path.join(MODEL_DIR, "adapter_model.safetensors")
        bin_path = os.path.join(MODEL_DIR, "adapter_model.bin")
        if os.path.exists(safetensors_path):
            size = os.path.getsize(safetensors_path) / (1024 * 1024)
            print(f"✓ adapter_model.safetensors ({size:.2f} MB)")
        elif os.path.exists(bin_path):
            size = os.path.getsize(bin_path) / (1024 * 1024)
            print(f"✓ adapter_model.bin ({size:.2f} MB)")
        else:
            print("❌ Missing: adapter_model.safetensors or adapter_model.bin")
            missing_files.append(file)
    else:
        if os.path.exists(file_path):
            size = os.path.getsize(file_path) / (1024 * 1024)
            print(f"✓ {file} ({size:.2f} MB)")
        else:
            print(f"❌ Missing: {file}")
            missing_files.append(file)

if missing_files:
    print(f"\n❌ Missing {len(missing_files)} required file(s)")
    raise FileNotFoundError("Model files incomplete")

print("\n" + "="*60)
print("✓ ALL MODEL FILES PRESENT")
print("="*60)

CHECKING MODEL FILES
✓ Model directory found: /content/drive/MyDrive/Colab Notebooks/Domain-Specific-FIned-Tuned-Legal-AI/legal-summarization-lora

✓ adapter_model.safetensors (7.04 MB)
✓ adapter_config.json (0.00 MB)
✓ tokenizer.json (32.76 MB)
✓ tokenizer_config.json (0.00 MB)

✓ ALL MODEL FILES PRESENT
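
The same check can be written more compactly by treating each requirement as a group of acceptable file names, so the `.safetensors`/`.bin` alternative needs no special case. This is a sketch under that assumption; `find_missing_files` and `REQUIRED_GROUPS` are hypothetical names, not part of the original notebook.

```python
from pathlib import Path


def find_missing_files(model_dir, required_groups):
    """Return the groups for which no candidate file exists in model_dir.

    Each group is a tuple of acceptable file names; one match is enough.
    """
    model_dir = Path(model_dir)
    return [
        group for group in required_groups
        if not any((model_dir / name).is_file() for name in group)
    ]


# Same files as the cell above; the adapter weights may be in either format.
REQUIRED_GROUPS = [
    ("adapter_model.safetensors", "adapter_model.bin"),
    ("adapter_config.json",),
    ("tokenizer.json",),
    ("tokenizer_config.json",),
]
```

Calling `find_missing_files(MODEL_DIR, REQUIRED_GROUPS)` returns an empty list when every group is satisfied.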


In [3]:
# Import required libraries
import torch
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


In [4]:
# Verify package versions (run after kernel restart)
import transformers
import peft
import bitsandbytes as bnb

print("="*60)
print("PACKAGE VERSION CHECK")
print("="*60)
print(f"Transformers: {transformers.__version__}")
print(f"PEFT: {peft.__version__}")
print(f"Bitsandbytes: {bnb.__version__}")

# Check if transformers version is compatible with Gemma
version_parts = transformers.__version__.split('.')
major = int(version_parts[0])
minor = int(version_parts[1]) if len(version_parts) > 1 else 0

if major > 4 or (major == 4 and minor >= 38):
    print("\n✓ Transformers version is compatible with Gemma (requires 4.38.0+)")
else:
    print(f"\n❌ Transformers version too old: {transformers.__version__}")
    print("   Required: 4.38.0+")
    print("   Go back to Cell 2, run installation, and restart kernel again")

print("="*60)

PACKAGE VERSION CHECK
Transformers: 5.2.0
PEFT: 0.18.1
Bitsandbytes: 0.49.2

✓ Transformers version is compatible with Gemma (requires 4.38.0+)
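
The `int(version_parts[1])` call in the cell above would raise `ValueError` on a pre-release string such as `5.0rc1`, because the second part is then `0rc1`. A more tolerant parse, using only the standard library, might look like the sketch below (`parse_major_minor` and `meets_minimum` are hypothetical helpers):

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '4.38.0.dev0' or '5.0rc1'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) else 0
    return major, minor


def meets_minimum(version, minimum=(4, 38)):
    """Return True when version is at least the (major, minor) minimum."""
    return parse_major_minor(version) >= minimum


print(meets_minimum("5.2.0"))   # True
print(meets_minimum("4.37.2"))  # False
print(meets_minimum("5.0rc1"))  # True
```

Comparing tuples avoids the nested `major`/`minor` condition while keeping the same 4.38 threshold.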


In [None]:
# Configuration
BASE_MODEL_NAME = "google/gemma-2b"

MODEL_DIR = "/content/drive/MyDrive/Colab Notebooks/Domain-Specific-FIned-Tuned-Legal-AI/legal-summarization-lora"

print("="*60)
print("LOADING MODEL FOR DEPLOYMENT")
print("="*60)

print(f"Base model: {BASE_MODEL_NAME}")
print(f"Adapter path: {MODEL_DIR}")
print("="*60)

LOADING MODEL FOR DEPLOYMENT
Base model: google/gemma-2b
Adapter path: /content/drive/MyDrive/Colab Notebooks/Domain-Specific-FIned-Tuned-Legal-AI/legal-summarization-lora


In [6]:
# Authenticate with Hugging Face (required for Gemma access)
from huggingface_hub import login

print("="*60)
print("HUGGING FACE AUTHENTICATION")
print("="*60)

# Your Hugging Face token (placeholder: never commit a real token to a shared notebook)
HF_TOKEN = "YOUR_HUGGINGFACE_TOKEN_HERE"

try:
    login(token=HF_TOKEN, add_to_git_credential=False)
    print("✓ Authentication successful!")
    print()
    print("IMPORTANT: Make sure you've accepted the Gemma license:")
    print("Visit: https://huggingface.co/google/gemma-2b")
    print("Click 'Agree and access repository'")
except Exception as e:
    print(f"❌ Authentication failed: {e}")
    print()
    print("To fix:")
    print("1. Get your token from: https://huggingface.co/settings/tokens")
    print("2. Update HF_TOKEN value above")
    print("3. Accept Gemma license: https://huggingface.co/google/gemma-2b")
    raise

print("="*60)

HUGGING FACE AUTHENTICATION
✓ Authentication successful!

IMPORTANT: Make sure you've accepted the Gemma license:
Visit: https://huggingface.co/google/gemma-2b
Click 'Agree and access repository'
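
Hard-coding the token in a cell makes it easy to leak when the notebook is shared. As an alternative, the token can be read from the `HF_TOKEN` environment variable. The sketch below assumes that variable has been set beforehand; `resolve_hf_token` and `PLACEHOLDER_TOKEN` are hypothetical names.

```python
import os

PLACEHOLDER_TOKEN = "YOUR_HUGGINGFACE_TOKEN_HERE"


def resolve_hf_token(env=None, fallback=PLACEHOLDER_TOKEN):
    """Return a token from the HF_TOKEN environment variable, else the fallback.

    Raises ValueError when only the unedited placeholder is available.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip() or fallback
    if token == PLACEHOLDER_TOKEN:
        raise ValueError("No Hugging Face token found: set HF_TOKEN or edit the fallback.")
    return token
```

The result could then be passed to the existing call, for example `login(token=resolve_hf_token(), add_to_git_credential=False)`.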


In [7]:
# Step 1: Configure 4-bit quantization
print("\n1. Configuring quantization...")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
print("✓ Quantization configured (4-bit NF4)")


1. Configuring quantization...
✓ Quantization configured (4-bit NF4)
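
To see why 4-bit loading matters, a back-of-the-envelope estimate of weight storage helps. The sketch below uses the 2.51B parameter count that this notebook prints after loading the adapters. It is only a rough figure: it ignores quantization constants, any layers left unquantized, activations, and the KV cache. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 2.51e9  # parameter count reported by the adapter-loading cell

print(f"16-bit weights: {weight_memory_gb(NUM_PARAMS, 16):.2f} GB")
print(f" 4-bit weights: {weight_memory_gb(NUM_PARAMS, 4):.2f} GB")
```

The 16-bit figure is in line with the roughly 5GB download mentioned in the base-model cell, and the 4-bit figure is about a quarter of that.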


In [8]:
# Step 2: Load base Gemma-2B model
print("\n2. Loading base model...")
print("   (This downloads ~5GB on first run - takes 2-3 minutes)")

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

print("✓ Base model loaded successfully")


2. Loading base model...
   (This downloads ~5GB on first run - takes 2-3 minutes)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


✓ Base model loaded successfully


In [9]:
# Step 3: Load YOUR trained LoRA adapters
print("\n3. Loading your fine-tuned LoRA adapters...")
model = PeftModel.from_pretrained(base_model, MODEL_DIR)
print("✓ LoRA adapters loaded and applied to base model")

# Get model info
device = next(model.parameters()).device
num_params = model.num_parameters() / 1e9

print(f"\nModel Info:")
print(f"  Device: {device}")
print(f"  Parameters: {num_params:.2f}B")


3. Loading your fine-tuned LoRA adapters...
✓ LoRA adapters loaded and applied to base model

Model Info:
  Device: cuda:0
  Parameters: 2.51B


In [10]:
# Step 4: Load tokenizer
print("\n4. Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print("✓ Tokenizer loaded")
print(f"  Vocabulary size: {len(tokenizer):,}")

print("\n" + "="*60)
print("✅ MODEL READY FOR INFERENCE!")
print("="*60)


4. Loading tokenizer...
✓ Tokenizer loaded
  Vocabulary size: 256,000

✅ MODEL READY FOR INFERENCE!


In [11]:
# Define inference function
def summarize_case(judgment_text, max_length=256, temperature=0.7):
    """
    Generate a legal case summary from input judgment text.

    Args:
        judgment_text (str): Legal court judgment text
        max_length (int): Maximum summary length in tokens
        temperature (float): Sampling temperature (0.1 = focused, 1.0 = creative)

    Returns:
        str: Generated summary
    """
    if not judgment_text.strip():
        return "⚠️ Please enter a legal judgment to summarize."

    # Format prompt (matches training format)
    prompt = f"""Instruction:
Summarize the following legal court judgment.

Input:
{judgment_text[:3000]}

Response:
"""

    # Tokenize
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=1280
    ).to(model.device)

    # Generate summary
    try:
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_length,
                temperature=temperature,
                top_p=0.9,
                do_sample=True,
                pad_token_id=tokenizer.pad_token_id,
                eos_token_id=tokenizer.eos_token_id,
            )

        # Decode only the newly generated tokens (everything after the prompt),
        # so a "Response:" string inside the judgment or the output cannot
        # confuse the extraction
        prompt_length = inputs["input_ids"].shape[1]
        summary = tokenizer.decode(
            outputs[0][prompt_length:], skip_special_tokens=True
        ).strip()

        return summary

    except Exception as e:
        return f"❌ Error generating summary: {str(e)}"

print("✓ Inference function defined")

✓ Inference function defined
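
Because the prompt has to match the training format exactly, it can help to keep the template in one place and test it without loading the model. The sketch below copies the template from the function above; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the 3000-character cap mirrors the slice used in `summarize_case`.

```python
PROMPT_TEMPLATE = """Instruction:
Summarize the following legal court judgment.

Input:
{judgment}

Response:
"""


def build_prompt(judgment_text, max_chars=3000):
    """Fill the prompt template, keeping at most max_chars of the judgment."""
    return PROMPT_TEMPLATE.format(judgment=judgment_text[:max_chars])


prompt = build_prompt("The court rules in favor of the plaintiff.")
print(prompt)
```

Note that the tokenizer call in `summarize_case` also truncates at 1280 tokens. If a prompt ever exceeded that limit, the trailing `Response:` marker would presumably be the part cut off, so the character cap is what keeps the marker intact in practice.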


In [12]:
# Test the inference function
test_judgment = """The plaintiff filed a breach of contract claim against the defendant
alleging failure to deliver goods as per the agreement dated January 15, 2024.
The court examined the contract terms, delivery schedules, and evidence of communications
between parties. The defendant argued force majeure due to supply chain disruptions.
After reviewing all evidence, the court finds that the defendant failed to provide
adequate notice and documentation of the alleged force majeure event. The contract
clearly states that any delay must be communicated within 48 hours with supporting
evidence. The defendant's notification came 3 weeks after the missed delivery date.
Therefore, the court rules in favor of the plaintiff and awards damages for breach of contract."""

print("Testing inference...\n")
print("Input (sample judgment):")
print(test_judgment[:200] + "...\n")
print("="*60)
print("Generated Summary:")
print("="*60)

test_summary = summarize_case(test_judgment, max_length=128, temperature=0.7)
print(test_summary)
print("\n" + "="*60)
print("✓ Inference test successful!")
print("="*60)

Testing inference...

Input (sample judgment):
The plaintiff filed a breach of contract claim against the defendant 
alleging failure to deliver goods as per the agreement dated January 15, 2024. 
The court examined the contract terms, delivery sc...

Generated Summary:
The defendant argued force majeure due to supply chain disruptions. 
The court finds that the defendant failed to provide adequate notice and documentation of the alleged force majeure event. 
The contract clearly states that any delay must be communicated within 48 hours with supporting evidence. 
The defendant's notification came 3 weeks after the missed delivery date. 
Therefore, the court rules in favor of the plaintiff and awards damages for breach of contract.

✓ Inference test successful!


In [13]:
# Create Gradio Interface
print("\nCreating Gradio interface...\n")

demo = gr.Interface(
    fn=summarize_case,
    inputs=[
        gr.Textbox(
            lines=15,
            placeholder="Paste legal court judgment here...",
            label="📄 Legal Court Judgment",
            info="Enter the full text of a legal court judgment or case document"
        ),
        gr.Slider(
            minimum=128,
            maximum=512,
            value=256,
            step=32,
            label="📏 Max Summary Length (tokens)",
            info="Longer = more detailed, shorter = more concise"
        ),
        gr.Slider(
            minimum=0.1,
            maximum=1.0,
            value=0.7,
            step=0.1,
            label="🌡️ Temperature",
            info="Lower = more focused, higher = more creative"
        )
    ],
    outputs=gr.Textbox(
        lines=10,
        label="📝 Generated Summary"
    ),
    title="⚖️ Legal Case Summarization Assistant",
    description="""
## AI-powered legal case summarization using domain-specific fine-tuned LLM

**Model Details:**
- Base: Google Gemma-2B (2 billion parameters)
- Fine-tuning: LoRA on 4,000 legal case judgments
- Specialization: Court judgments, legal opinions, case law

**Instructions:**
1. Paste a legal court judgment in the input box
2. Adjust summary length and creativity (optional)
3. Click "Submit" to generate summary
4. Review the AI-generated summary

**Best results with:** Formal legal judgments, court opinions, case documents

⚠️ **Disclaimer:** This is an AI assistant. Generated summaries should be
reviewed by qualified legal professionals. Not a substitute for legal advice.
    """,
    examples=[
        [
            "The plaintiff filed a breach of contract claim against the defendant alleging failure to deliver goods as per the agreement dated January 15, 2024. The court examined the contract terms, delivery schedules, and evidence of communications between parties. The defendant argued force majeure due to supply chain disruptions. After reviewing all evidence, the court finds that...",
            256,
            0.7
        ],
        [
            "In this employment discrimination case, the plaintiff alleges discriminatory treatment based on age in violation of federal employment laws. The defendant company contests these allegations claiming the termination was part of a broader restructuring initiative. Evidence presented includes performance reviews, internal emails, and testimony from multiple witnesses...",
            256,
            0.7
        ]
    ],
    theme=gr.themes.Soft(),
    cache_examples=False
)

print("✓ Gradio interface created")


Creating Gradio interface...

✓ Gradio interface created
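
Depending on the Gradio version, slider values may reach the function as floats (an assumption worth checking in your environment), whereas `max_new_tokens` expects an integer. A small guard that coerces and clamps the values to the slider ranges defined above could look like this sketch; `clamp_generation_params` is a hypothetical helper.

```python
def clamp_generation_params(max_length, temperature):
    """Coerce slider values to the types and ranges the sliders advertise."""
    max_length = int(round(max_length))
    max_length = max(128, min(512, max_length))
    temperature = max(0.1, min(1.0, float(temperature)))
    return max_length, temperature


print(clamp_generation_params(256.0, 0.7))
```

A thin wrapper around `summarize_case` could call this helper first and then be passed as `fn` to `gr.Interface`.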


In [14]:
# Launch Gradio interface
print("\n" + "="*60)
print("🚀 LAUNCHING GRADIO INTERFACE")
print("="*60)
print("\nThis will create a public link that works in Kaggle and Colab.")
print("The link is temporary; Gradio reports its expiry time below.\n")

# For both Kaggle and Colab: use share=True to get a public link
# This creates a temporary URL that you can access
demo.launch(
    share=True,  # Creates public link (works on Kaggle AND Colab)
    debug=False,
    show_error=True
)

print("\n✅ Gradio interface launched!")
print("Click the public URL above to access the interface.")


🚀 LAUNCHING GRADIO INTERFACE

This will create a public link that works in Kaggle and Colab.
The link is temporary; Gradio reports its expiry time below.

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://5ec79190144a7805d8.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)



✅ Gradio interface launched!
Click the public URL above to access the interface.
