In [1]:
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load BART model and tokenizer
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)


Using device: cuda


Some weights of BartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-large-cnn and are newly initialized: ['model.shared.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [2]:
def chunk_text(text, tokenizer, max_length=1024, overlap=200):
    """
    Splits the input text into chunks of max_length tokens with overlapping context.
    """
    tokens = tokenizer.encode(text)
    chunks = []
    
    for i in range(0, len(tokens), max_length - overlap):
        chunk = tokens[i : i + max_length]
        chunks.append(tokenizer.decode(chunk))
    
    return chunks


In [3]:
def summarize_chunks(chunks, model, tokenizer):
    """
    Generates summaries for each chunk and returns the combined summary.
    """
    summaries = []
    
    for chunk in chunks:
        inputs = tokenizer(chunk, return_tensors="pt", truncation=True, max_length=1024).to(device)
        
        with torch.no_grad():  # Disable gradient calculations for inference
            summary_ids = model.generate(**inputs, max_length=256)
        
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        summaries.append(summary)
    
    return summaries


In [4]:
import pandas as pd
data=pd.read_csv("legal_summaries.csv")

In [5]:
legal_text = data['input_text'][7800]
chunks = chunk_text(legal_text, tokenizer)

# Summarize each chunk
chunk_summaries = summarize_chunks(chunks, model, tokenizer)

# Combine mini-summaries into a final summary
final_summary = " ".join(chunk_summaries)
print(final_summary)

Token indices sequence length is longer than the specified maximum sequence length for this model (33029 > 1024). Running this sequence through the model will result in indexing errors



These appeals raise several matters which are important to the international market in telecommunications. There is a dispute (in the Conversant appeals: para 17 below) whether England is the appropriate forum to determine those matters. A question as to the nature of the requirement that the licence, which the owner of a Standard Essential Patent (SEP) must offer to an implementer, be non discriminatory. European Telecommunications Standards Institute (ETSI) is a French association formed in 1988. ETSI is recognised as the SSO in the European Union telecommunications sector. The patents which are the subject of these appeals are the UK designations of European patents (UK patents) which have been declared to ETSi as essential. The relevant standards are telecommunications standards for 2G (GSM), 3G (UMTS) and 4G (LTE) The ETSI IPR Policy is a contractual document, governed by French law. It binds the members of EtsI and their affiliates. It speaks (clause 15(6) of patents which are i