In [27]:
!pip install transformers datasets sentencepiece torch



In [28]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

In [29]:
model = "t5-small"

In [30]:
tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForSeq2SeqLM.from_pretrained(model)

In [31]:
text="""Judge It Well â€” Fine-Tuning LLMs for Legal Intelligence
Transforming general AI into specialized legal expertise.

Abstract
Judge It Well is a project aimed at transforming a general-purpose large language model (LLM) into a specialized legal expert through structured fine-tuning. The goal is to build an AI system capable of interpreting, analyzing, and reasoning through complex legal documents with high accuracy and efficiency.

Introduction
Artificial Intelligence has advanced rapidly, yet legal text remains one of the most challenging domains for AI. Legal language is:

Highly technical
Context-dependent
Structured but linguistically unique
General LLMs cannot fully understand or reason through legal documents without domain-specific training. Judge It Well bridges this gap by fine-tuning LLMs on legal data, enabling them to process statutes, contracts, judgments, and legal queries more effectively.

Project Overview
Imagine taking a general AI and training it to speak the language of law. This project focuses on:

Fine-tuning an LLM for legal interpretation
Teaching the model to handle legal reasoning
Building tools for preprocessing, training, evaluation, and deployment
Objectives
Develop a fine-tuned LLM specialized for legal tasks
Prepare and preprocess diverse legal datasets
Support tasks like summarization, Q&A, clause extraction, classification
Evaluate model performance with legal benchmarks
Why Legal AI?
Legal documents are:

Dense and complex
Time-consuming to analyze manually
Filled with technical language and long reasoning chains
A specialized model can assist:

Lawyers
Students
Researchers
Legal-tech applications
By automating repetitive tasks and improving decision-making efficiency.

Methodology
1. Dataset Preparation
The process includes:

Collecting corpora (judgments, statutes, case summaries, contracts)
Cleaning and preprocessing text
Chunking into model-friendly segments
Tokenization and vocabulary optimization
2. Model Fine-Tuning
Techniques used:

LoRA / QLoRA parameter-efficient fine-tuning
Supervised Fine-Tuning (SFT)
Instruction tuning for legal reasoning tasks
3. Evaluation
The model is evaluated on:

Legal reasoning tasks
Summarization quality
Text classification
Domain-specific metrics
Applications
A fine-tuned legal LLM can be used for:

AI legal assistants
Automated contract review
Case law summarization
Compliance analysis
Research support for students
Structured Q&A on legal documents
Conclusion
Judge It Well is a foundational step toward building AI that truly understands legal text. By fine-tuning general-purpose LLMs on legal documents, we create tools that are more:

Accurate
Efficient
Context-aware
Explainable
This project moves us closer to deploying specialized AI"""

prompt = "summarize the text: " + text

In [32]:
result = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True)
result

{'input_ids': tensor([[21603,     8,  1499,    10, 12330,    94,  1548,     3,   318, 11456,
            18,   382,   202,    53,   301, 11160,     7,    21, 11281,  5869,
          2825,  1433,  4946, 10454,   879,  7833,   139,     3,  8689,  1281,
          2980,     5, 20114, 12330,    94,  1548,    19,     3,     9,   516,
             3,  8287,    44,     3, 21139,     3,     9,   879,    18, 19681,
           508,  1612,   825,    41, 10376,   329,    61,   139,     3,     9,
             3,  8689,  1281,  2205,   190, 14039,  1399,    18,    17,   202,
            53,     5,    37,  1288,    19,    12,   918,    46,  7833,   358,
          3919,    13,     3, 29490,     6,     3, 19175,     6,    11, 20893,
           190,  1561,  1281,  2691,    28,   306,  7452,    11,  3949,     5,
         18921, 24714,  5869,  2825,  1433,    65,  2496,  7313,     6,   780,
          1281,  1499,  3048,    80,    13,     8,   167,  4421,  3303,     7,
            21,  7833,     5, 11281,  

In [33]:
tokenized_summary = model.generate(result["input_ids"], max_length=120, min_length=40, length_penalty=2.0, num_beams=4)
tokenized_summary

tensor([[    0, 12330,    94,  1548,    19,     3,     9,   516,     3,  8287,
            44,     3, 21139,     3,     9,   879,    18, 19681,   508,  1612,
           825,   139,     3,     9,     3,  8689,  1281,  2205,   190, 14039,
          1399,    18,    17,   202,    53,     3,     5,     8,  1288,    19,
            12,   918,    46,  7833,   358,  3919,    13,     3, 29490,     6,
             3, 19175,     6,    11, 20893,   190,  1561,  1281,  2691,    28,
           306,  7452,    11,  3949,     3,     5,     1]])

In [34]:
tokenized_summary.shape

torch.Size([1, 67])

In [35]:
summary = tokenizer.decode(tokenized_summary[0], skip_special_tokens=True)
summary

'Judge It Well is a project aimed at transforming a general-purpose large language model into a specialized legal expert through structured fine-tuning. the goal is to build an AI system capable of interpreting, analyzing, and reasoning through complex legal documents with high accuracy and efficiency.'