# GPT-OSS Legal Contract Summarizer

This notebook demonstrates how to run OpenAI's GPT‑OSS models (20B or 120B) for summarizing legal contracts and flagging compliance risks.

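Contracts can be supplied as TXT, CSV, or PDF files; `pandas` and `pypdf` are installed for the latter two. The example below uses an inline contract string.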
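
As a minimal sketch of how the three input formats could be read into a single string: the helper name `load_contract_text` is hypothetical, and the CSV branch assumes every cell holds contract text. It uses the standard-library `csv` module to stay dependency-free; `pandas.read_csv` would work just as well. The PDF branch relies on `pypdf`'s `PdfReader` and `extract_text()`.

```python
import csv
from pathlib import Path


def load_contract_text(path):
    """Return the text of a TXT, CSV, or PDF contract as one string."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix == ".csv":
        # Assumption: every cell holds contract text; cells are joined by spaces, rows by newlines.
        with path.open(newline="", encoding="utf-8") as handle:
            return "\n".join(" ".join(row) for row in csv.reader(handle))
    if suffix == ".pdf":
        # Imported lazily so TXT and CSV inputs work without pypdf installed.
        from pypdf import PdfReader
        reader = PdfReader(str(path))
        # extract_text() can come back empty for scanned (image-only) pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported contract format: {suffix}")
```

The returned string could then be assigned to `contract_text` in place of the inline example.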

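⚠️ **Note:** GPT‑OSS‑120B requires substantial GPU memory: OpenAI's model card targets a single 80 GB GPU (for example an H100) with the released MXFP4-quantized weights, and considerably more if the weights are loaded in 16-bit. For Colab, use GPT‑OSS‑20B. Because the published checkpoints are already quantized, `load_in_8bit=True` may not resolve out-of-memory errors; shortening the input or switching to a larger GPU is the more reliable fix.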
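
To get a feel for the memory figures, here is a back-of-the-envelope sketch of the space taken by the weights alone at different precisions. The function name is hypothetical, and the parameter counts are the nominal ones from the model names rather than exact figures. The estimate ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
def estimate_weight_memory_gb(num_params, bits_per_param):
    """Rough size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter counts taken from the model names (assumption, not exact figures).
for name, params in [("gpt-oss-20b", 20e9), ("gpt-oss-120b", 120e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{estimate_weight_memory_gb(params, bits):.0f} GB")
```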

In [None]:
!pip install -q torch transformers accelerate bitsandbytes sentencepiece pandas pypdf
!git lfs install

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import pandas as pd
from pypdf import PdfReader

In [None]:
contract_text = """
Section 14: Data Protection Obligations
The Vendor shall comply with all applicable data-protection laws, including GDPR and CCPA.
The Vendor must notify the Client within 72 hours of any data breach.
The Vendor shall not transfer Client Data outside approved jurisdictions without prior written consent.

Section 18: Termination Rights
The Client may terminate this Agreement with 30 days' written notice if the Vendor breaches compliance obligations.
"""

In [None]:
model_name = "openai/gpt-oss-20b"  # Change to "openai/gpt-oss-120b" for higher capacity

In [None]:
print(f"Loading model: {model_name} ... This may take a few minutes.")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",    # let accelerate place layers on the available device(s)
    torch_dtype="auto",   # use the dtype stored in the checkpoint instead of forcing float16
)

In [None]:
prompt = f"""
You are a legal compliance assistant.
Summarize the key obligations and termination rights from the following contract.
Also list any potential compliance risks.

Contract:
{contract_text}
"""

In [None]:
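inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding: temperature has no effect when do_sample=False, so it is omitted.
outputs = model.generate(
    **inputs,
    max_new_tokens=400,
    do_sample=False
)
# Keep only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print("\n" + "="*30 + " MODEL OUTPUT " + "="*30)
print(tokenizer.decode(new_tokens, skip_special_tokens=True))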