The code installs and prepares the libraries used for natural language processing (NLP) model development and evaluation. The command !pip install --upgrade transformers datasets evaluate rouge_score bert_score nltk installs or upgrades the Transformers library (for loading and fine-tuning pre-trained models such as BART and T5), Datasets (for handling large text datasets efficiently), and Evaluate (for computing evaluation metrics). The rouge_score and bert_score packages implement two common text-generation metrics, ROUGE and BERTScore, which compare generated summaries with reference texts. The NLTK (Natural Language Toolkit) library supports further evaluations such as the METEOR score. The subsequent nltk.download() calls fetch the WordNet lexical database, which METEOR uses for synonym matching, and the Punkt tokenizer data used for text preprocessing and evaluation.

In [None]:
!pip install --upgrade transformers datasets evaluate rouge_score bert_score nltk

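import nltk
# Download required NLTK data: WordNet for METEOR, Punkt for tokenization
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('punkt_tab')  # NLTK >= 3.8.2 tokenizers load 'punkt_tab' instead of the pickled 'punkt'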

Collecting datasets
  Downloading datasets-4.3.0-py3-none-any.whl.metadata (18 kB)
Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Collecting bert_score
  Downloading bert_score-0.3.13-py3-none-any.whl.metadata (15 kB)
Collecting nltk
  Downloading nltk-3.9.2-py3-none-any.whl.metadata (3.2 kB)
Collecting pyarrow>=21.0.0 (from datasets)
  Downloading pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.2 kB)
Downloading datasets-4.3.0-py3-none-any.whl (506 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True
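Because the cell above upgrades packages that Colab may already have imported, it can be worth confirming which versions are actually installed. The following is a minimal sketch using only the standard library's importlib.metadata; the package list is an assumption that simply mirrors the pip commands in this notebook. Keep in mind that a module imported before the upgrade keeps running its old version until the runtime is restarted, even though the version reported here is the newly installed one.

```python
from importlib import metadata

# Distribution names as passed to pip in this notebook (assumed list)
PACKAGES = ["transformers", "datasets", "evaluate", "rouge_score", "bert_score", "nltk", "textstat"]

def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<12} {version or 'NOT INSTALLED'}")
```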

The command !pip install textstat installs the TextStat library, a Python package for measuring the readability of text. It provides metrics such as the Flesch Reading Ease score, the Gunning Fog Index, and the SMOG Index, which estimate how easily a human reader can understand a given passage. In this study, TextStat was used to measure the readability of the automatically generated headlines, to check that the outputs were not only accurate and contextually relevant but also clear and easy to read.
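For reference, the Flesch Reading Ease score combines average sentence length and average syllables per word: FRE = 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), where higher scores mean easier text. The sketch below implements that formula with a crude vowel-group syllable counter. The counter is an assumption made for illustration; TextStat counts syllables differently, so these scores will not match textstat.flesch_reading_ease() exactly.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: number of vowel groups (a, e, i, o, u, y), at least 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

print(round(flesch_reading_ease("The cat sat."), 2))  # 119.19
```

Short words and short sentences push the score up, which is why very short headlines can score above 100 on this scale.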

In [None]:
!pip install textstat

Collecting textstat
  Downloading textstat-0.7.10-py3-none-any.whl.metadata (15 kB)
Collecting pyphen (from textstat)
  Downloading pyphen-0.17.2-py3-none-any.whl.metadata (3.2 kB)
Downloading textstat-0.7.10-py3-none-any.whl (239 kB)
Downloading pyphen-0.17.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pyphen, textstat
Successfully installed pyphen-0.17.2 textstat-0.7.10


The command !pip install textstat datasets transformers installs three Python libraries used throughout this notebook. All three were already installed by the cells above, so this cell is redundant; pip will typically just report that the requirements are already satisfied. The Transformers library, developed by Hugging Face, provides pre-trained models such as BART, T5, and BERT; the encoder-decoder models (BART and T5) can be fine-tuned for text generation tasks such as summarization. The Datasets library handles efficient loading, preprocessing, and management of the text data used for training and evaluation. TextStat computes readability metrics, such as the Flesch Reading Ease score, to assess the clarity of generated summaries. Together, these libraries support both fine-tuning Transformer-based models and evaluating their outputs quantitatively and for readability.

In [None]:
!pip install textstat datasets transformers



# ***Setting up the Environment and Loading Data***

This code segment loads and preprocesses a news dataset and converts it into a format suitable for fine-tuning a Transformer model. It first imports the necessary libraries: pandas for tabular data manipulation, Hugging Face datasets for model-ready data formatting, and re for regular-expression-based text cleaning. The clean_text() function standardizes the text by lowercasing it, removing HTML tags and URLs, and collapsing runs of whitespace into single spaces, so that the inputs are consistent and free of markup noise. Because the headlines are lowercased as well, the model learns to generate lowercase headlines. The script then loads news-article-categories.csv with UTF-8 encoding, the usual encoding for CSV files from sources such as Kaggle, and prints an error message instead of crashing if the file is missing.
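The cleaning rules can be checked in isolation before they are applied to the whole dataset. The snippet below repeats clean_text() unchanged and runs it on a small made-up string. It also exposes one limitation: HTML entities such as &amp; are not decoded (the standard library's html.unescape() could be applied first if that matters).

```python
import re

def clean_text(text):
    if not isinstance(text, str):  # NaN and other non-strings become empty strings
        return ""
    text = text.lower()
    text = re.sub(r'<.*?>', '', text)                  # strip HTML tags (non-greedy match)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # strip URLs
    text = re.sub(r'\s+', ' ', text).strip()           # collapse whitespace
    return text

sample = "<p>Visit https://example.com   NOW for AT&amp;T news</p>"
print(clean_text(sample))  # visit now for at&amp;t news
```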

Once the dataset is loaded, the script selects only the relevant columns (body and title), renames them to text and summary, and drops rows with missing values. The cleaning function is then applied to both columns. The cleaned DataFrame is converted into a Hugging Face Dataset object, which plugs directly into the tokenization step that follows. Because dropna() leaves gaps in the pandas index, Dataset.from_pandas() keeps that index as an extra __index_level_0__ column, which appears in the output but does no harm. Finally, the data is split at random into training and test subsets in an 80/20 ratio and stored in a DatasetDict. No seed is passed to train_test_split(), so the exact split changes from run to run. The example article printed after cleaning also appears blank in the recorded output, which suggests that some article bodies are empty after cleaning; such rows survive dropna() because an empty string is not a missing value.

In [None]:
import pandas as pd
from datasets import Dataset, DatasetDict
import re # Import the regular expression library

# --- (A) CREATE A CLEANING FUNCTION ---
def clean_text(text):
    if not isinstance(text, str): # Handle potential non-string data
        return ""
    text = text.lower()
    text = re.sub(r'<.*?>', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

# --- 1. Load Your Custom Dataset ---
try:
    # Changed encoding to 'utf-8', which is standard for Kaggle datasets
    df = pd.read_csv('news-article-categories.csv', encoding='utf-8')
    print("Successfully loaded 'news-article-categories.csv'")

except FileNotFoundError:
    print("Error: 'news-article-categories.csv' not found.")
    df = None # Set df to None if file not found

if df is not None:
    # --- 2. Preprocess and Prepare the Dataset ---
    # Keep the article body and headline, renamed to the column names the rest
    # of the script expects; .rename() returns a new DataFrame, which avoids
    # pandas' SettingWithCopyWarning on the in-place dropna() below
    df = df[['body', 'title']].rename(columns={'body': 'text', 'title': 'summary'})

    # Handle potential missing values in the new dataset
    df.dropna(inplace=True)

    # --- (B) APPLY THE CLEANING FUNCTION TO YOUR DATA ---
    print("\n--- Applying preprocessing to the dataset ---")
    df['text'] = df['text'].apply(clean_text)
    df['summary'] = df['summary'].apply(clean_text)
    print("Preprocessing complete. Example of cleaned article:")
    print(df.iloc[0]['text'])

    # --- 3. Convert to a Hugging Face Dataset ---
    hg_dataset = Dataset.from_pandas(df)

    # --- 4. Split into Training and Test Sets (80/20) ---
    train_test_split = hg_dataset.train_test_split(test_size=0.2)
    dataset = DatasetDict({
        'train': train_test_split['train'],
        'test': train_test_split['test']
    })

    print("\nDataset structure:")
    print(dataset)

Successfully loaded 'news-article-categories.csv'

--- Applying preprocessing to the dataset ---
Preprocessing complete. Example of cleaned article:

Dataset structure:
DatasetDict({
    train: Dataset({
        features: ['text', 'summary', '__index_level_0__'],
        num_rows: 5497
    })
    test: Dataset({
        features: ['text', 'summary', '__index_level_0__'],
        num_rows: 1375
    })
})
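The row counts above follow from rounding the test share up: 6,872 cleaned rows × 0.2 = 1,374.4, giving 1,375 test and 5,497 training rows. Because train_test_split() is called without a seed, which rows land in each split changes between runs. Below is a minimal, library-free sketch of a seeded 80/20 split; it will not pick the same rows as the Datasets library does. Within Datasets itself the equivalent would be hg_dataset.train_test_split(test_size=0.2, seed=42), and Dataset.from_pandas(df, preserve_index=False) avoids the extra __index_level_0__ column. The seed value 42 is an arbitrary choice.

```python
import math
import random

def split_indices(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then take ceil(test_size * n_rows) as the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded RNG -> same split on every run
    n_test = math.ceil(test_size * n_rows)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(6872)
print(len(train_idx), len(test_idx))  # 5497 1375
```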


# ***Tokenization***

This section of the code focuses on tokenization and data preparation for fine-tuning the facebook/bart-base model. It begins by importing the AutoTokenizer class from the Hugging Face Transformers library and defining the model checkpoint. The BART model was selected for its strong performance in text summarization and sequence-to-sequence tasks. The tokenizer corresponding to this checkpoint is loaded using AutoTokenizer.from_pretrained(model_checkpoint), ensuring that the tokenization process aligns with the model’s pre-training configuration. This step converts raw text into a sequence of numerical tokens that the model can understand while maintaining vocabulary consistency with BART’s architecture.

A custom preprocessing function, preprocess_function(), tokenizes both the input articles and their reference headlines (the labels). Inputs are truncated to 1,024 tokens, the maximum that BART's position embeddings support, and headlines to 128 tokens. Before tokenization, a filter keeps only articles shorter than 500 words, which lowers memory use and training time; in the recorded run it reduced the training split from 5,497 to 2,801 examples and the test split from 1,375 to 726. The map() method then applies the tokenizer in batches, adding input_ids, attention_mask, and labels columns alongside the original ones. The resulting dataset is ready for fine-tuning BART's encoder-decoder architecture for headline generation.

In [None]:
from transformers import AutoTokenizer

# --- 4. Define the Model Checkpoint (BART) ---
model_checkpoint = "facebook/bart-base"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# --- 5. Create a BART-Specific Preprocessing Function ---
def preprocess_function(examples):
    # Tokenize the input articles, truncating to BART's 1,024-token limit
    model_inputs = tokenizer(examples["text"], max_length=1024, truncation=True)

    # Tokenize the target headlines (labels); the `text_target` argument replaces
    # the deprecated `as_target_tokenizer()` context manager
    labels = tokenizer(text_target=examples["summary"], max_length=128, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# --- 6. Apply the Tokenization ---
# Keep only articles shorter than 500 words, then tokenize in batches
dataset = dataset.filter(lambda x: len(x["text"].split()) < 500)
tokenized_datasets = dataset.map(preprocess_function, batched=True)
print("\nSample of tokenized data prepared for BART:")
print(tokenized_datasets['train'][0].keys())

Filter:   0%|          | 0/5497 [00:00<?, ? examples/s]

Filter:   0%|          | 0/1375 [00:00<?, ? examples/s]

Map:   0%|          | 0/2801 [00:00<?, ? examples/s]



Map:   0%|          | 0/726 [00:00<?, ? examples/s]


Sample of tokenized data prepared for BART:
dict_keys(['text', 'summary', '__index_level_0__', 'input_ids', 'attention_mask', 'labels'])
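The word-count filter is fairly aggressive: in the recorded run it kept 2,801 of 5,497 training articles (about 51%) and 726 of 1,375 test articles. If more data is wanted, one quick way to choose the threshold is to measure how many articles each candidate cut-off would keep. The helper below is a hypothetical, library-free sketch; it assumes the texts are available as any iterable of strings.

```python
def retention_by_threshold(texts, thresholds):
    """Fraction of texts kept by a `len(text.split()) < threshold` filter, per threshold."""
    lengths = [len(t.split()) for t in texts]  # whitespace word counts, as in the filter above
    total = len(lengths)
    return {
        t: (sum(1 for n in lengths if n < t) / total if total else 0.0)
        for t in thresholds
    }

toy_texts = ["a b c", "a b c d e", "a"]
print(retention_by_threshold(toy_texts, [2, 4, 10]))
```

On the real data it could be called as retention_by_threshold(dataset["train"]["text"], [250, 500, 750, 1000]), provided this runs before the filter line reassigns dataset.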


# ***Fine-Tuning the Model***

This section defines the fine-tuning setup for the facebook/bart-base model using the Hugging Face Transformers library. After importing the necessary modules, nine hyperparameters (learning rate, training and evaluation batch sizes, number of epochs, weight decay, warmup steps, logging frequency, maximum generation length, and gradient accumulation steps) are defined explicitly to control training. The compute_metrics() function scores generated summaries by their average Flesch Reading Ease (via TextStat) and average length in words, as a readability-oriented complement to overlap metrics such as ROUGE. Note, however, that no eval_strategy is set in the training arguments, so the Trainer never evaluates during training and compute_metrics() is not actually called in this cell; the training log below accordingly shows training loss only. Setting eval_strategy="epoch" would run it at the end of each epoch.

The BART model is then loaded with AutoModelForSeq2SeqLM.from_pretrained() for supervised fine-tuning on the tokenized dataset. DataCollatorForSeq2Seq pads each batch dynamically and pads the labels with -100 so that padded positions are ignored by the loss. Seq2SeqTrainingArguments collects the settings for training, logging, and checkpoint saving, and Seq2SeqTrainer combines the model, data, tokenizer, data collator, and metric function and runs the training loop. After training, the fine-tuned model and its tokenizer are saved locally so that the evaluation and inference sections below can load them.
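Two details of this setup revolve around the value -100. DataCollatorForSeq2Seq pads labels with -100 by default (its label_pad_token_id), which tells the loss to skip those positions, and the Trainer also uses -100 when it pads generated predictions from different evaluation batches to a common length. The tokenizer cannot decode -100, so it has to be mapped back to the pad token before batch_decode(). The pure-Python sketch below mimics both steps on plain lists; the helper names are hypothetical, right-padding is assumed (BART's tokenizer pads on the right), and pad id 1 is BART's <pad> token.

```python
IGNORE_INDEX = -100  # default label_pad_token_id of DataCollatorForSeq2Seq

def pad_labels(label_rows, ignore_index=IGNORE_INDEX):
    """Right-pad label sequences to equal length with ignore_index, as the collator does."""
    width = max((len(row) for row in label_rows), default=0)
    return [list(row) + [ignore_index] * (width - len(row)) for row in label_rows]

def restore_pad_ids(rows, pad_token_id, ignore_index=IGNORE_INDEX):
    """Replace ignore_index with the tokenizer's pad id so the rows can be decoded."""
    return [[pad_token_id if tok == ignore_index else tok for tok in row] for row in rows]

BART_PAD_ID = 1  # <pad> in the BART vocabulary
batch = pad_labels([[0, 713, 2], [0, 2]])
print(batch)                                # [[0, 713, 2], [0, 2, -100]]
print(restore_pad_ids(batch, BART_PAD_ID))  # [[0, 713, 2], [0, 2, 1]]
```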

In [None]:
import transformers
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer
import numpy as np
import textstat

print("Transformers library version:", transformers.__version__)

# --- 9 ADJUSTABLE HYPERPARAMETERS ---
learning_rate = 1e-6                        # 1. Learning rate
train_batch_size = 8                         # 2. Training batch size
eval_batch_size = 8                          # 3. Evaluation batch size
num_train_epochs = 2                         # 4. Number of epochs
weight_decay = 0.00                          # 5. Weight decay
warmup_steps = 0                           # 6. Warmup steps
logging_steps = 50                           # 7. Logging frequency
generation_max_length = 128                  # 8. Max length for generated text
gradient_accumulation_steps = 2              # 9. Gradient accumulation steps

model_checkpoint = "facebook/bart-base"

# --- Compute metrics ---
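def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    if isinstance(predictions, tuple):
        predictions = predictions[0]
    # The Trainer pads generated sequences with -100 when it concatenates batches;
    # map those positions to the pad token so batch_decode() can handle them
    predictions = np.where(predictions != -100, predictions, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)

    # Average Flesch Reading Ease over non-empty predictions (higher = easier to read)
    readability_scores = [textstat.flesch_reading_ease(pred) for pred in decoded_preds if pred]
    avg_readability = np.mean(readability_scores) if readability_scores else 0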

    prediction_lens = [len(pred.split()) for pred in decoded_preds if pred]
    avg_length = np.mean(prediction_lens) if prediction_lens else 0

    return {
        "avg_readability": round(avg_readability, 2),
        "avg_length": round(avg_length, 2),
    }

# --- Load Pre-trained Model ---
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

# --- Prepare Data Collator ---
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

# --- Define Training Arguments ---
training_args = Seq2SeqTrainingArguments(
    output_dir="./bart_base_finetuned_intrinsic",
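    # do_eval alone does not schedule evaluation during training;
    # add eval_strategy="epoch" to run compute_metrics after each epoch
    do_eval=True,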
    logging_strategy="steps",
    logging_steps=logging_steps,
    save_strategy="epoch",
    learning_rate=learning_rate,
    per_device_train_batch_size=train_batch_size,
    per_device_eval_batch_size=eval_batch_size,
    weight_decay=weight_decay,
    warmup_steps=warmup_steps,
    save_total_limit=3,
    num_train_epochs=num_train_epochs,
    predict_with_generate=True,
    generation_max_length=generation_max_length,
    gradient_accumulation_steps=gradient_accumulation_steps,
    fp16=True,
    report_to="none",
)

# --- Initialize Trainer ---
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

# --- Fine-tune Model ---
print("\nStarting model fine-tuning...")
trainer.train()

# --- Save Model ---
model_save_path = "./my_finetuned_bart_summarizer_intrinsic"
trainer.save_model(model_save_path)
print(f"Model saved to {model_save_path}")


Transformers library version: 4.57.1


Starting model fine-tuning...


| Step | Training Loss |
|-----:|--------------:|
| 50 | 3.8759 |
| 100 | 2.6914 |
| 150 | 2.4474 |
| 200 | 2.3324 |
| 250 | 2.2366 |
| 300 | 2.1619 |
| 350 | 2.2047 |

Model saved to ./my_finetuned_bart_summarizer_intrinsic
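As a sanity check on the training log, the number of optimizer steps follows from the dataset size and batch settings. With 2,801 training examples, a per-device batch size of 8, and 2 gradient-accumulation steps, each epoch has 351 batches and about 175 optimizer updates at an effective batch size of 16, so two epochs give roughly 350 steps, which matches the last logged step. The sketch below assumes a single GPU and that the last incomplete batch is kept; depending on the Transformers version, a final incomplete accumulation window may add one more update per epoch.

```python
import math

def optimizer_steps(n_examples, per_device_batch, grad_accum, epochs, n_devices=1):
    """Return (effective batch size, updates per epoch, total updates) for a Trainer-style loop."""
    batches_per_epoch = math.ceil(n_examples / (per_device_batch * n_devices))  # last batch kept
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)  # floor: ignores a partial window
    effective_batch = per_device_batch * n_devices * grad_accum
    return effective_batch, updates_per_epoch, updates_per_epoch * epochs

print(optimizer_steps(2801, 8, 2, 2))  # (16, 175, 350)
```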


## ***Metrics of the Fine-Tuned Model***

This section evaluates the fine-tuned BART model with both overlap-based and readability metrics. The fine-tuned model and its tokenizer are loaded from the saved directory ./my_finetuned_bart_summarizer_intrinsic. The Hugging Face Evaluate library loads the ROUGE metric, a standard summarization measure that quantifies the overlap of words and phrases between generated summaries and the reference headlines. A custom safe_decode() helper converts predicted token IDs back into text; its main job is to map out-of-range IDs, chiefly the -100 values used to pad predictions and labels, to valid IDs so that decoding does not fail.

The compute_metrics() function computes both ROUGE-based and intrinsic text-quality metrics. ROUGE-1 and ROUGE-2 measure unigram and bigram overlap with the reference headlines, while ROUGE-L and ROUGE-Lsum are based on the longest common subsequence and therefore reward matches that appear in the same order. The function also computes the average Flesch Reading Ease score of the generated headlines with TextStat and their average length in words. ROUGE measures lexical overlap rather than factual accuracy, so together these scores indicate how closely the outputs match the reference wording and how readable and concise they are. The model is evaluated on the test split with Seq2SeqTrainer and predict_with_generate=True, so the scores are computed on generated text. In the recorded run the model reached ROUGE-1 39.32, ROUGE-2 19.70, ROUGE-L 35.75, and ROUGE-Lsum 35.79, with an average Flesch Reading Ease of 56.2 and an average headline length of 10.18 words.
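To make the ROUGE numbers more concrete, ROUGE-1 is essentially an F1 score over the unigrams shared by a generated headline and its reference. The simplified sketch below computes it on lowercase whitespace tokens. The rouge_score package behind evaluate additionally replaces punctuation with spaces and, with use_stemmer=True, applies Porter stemming, so its values will differ somewhat. ROUGE-2 applies the same idea to bigrams, and ROUGE-L uses the longest common subsequence instead of n-gram counts.

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 (the idea behind ROUGE-1) on lowercase whitespace tokens."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # clipped counts of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat", "the cat ran"), 4))  # 0.6667
```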

In [None]:
import numpy as np
import torch
import textstat
import evaluate
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    DataCollatorForSeq2Seq,
)

# --- Load Model & Tokenizer ---
model_path = "./my_finetuned_bart_summarizer_intrinsic"
print(f"Loading fine-tuned BART model from: {model_path}")

model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

# --- Load ROUGE metric ---
rouge = evaluate.load("rouge")

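# --- Safe Decode ---
def safe_decode(sequences):
    # The Trainer pads predictions (and the collator pads labels) with -100,
    # which the tokenizer cannot decode; map it to the pad token first.
    # Special tokens, including padding, are dropped from the decoded text.
    sequences = np.where(sequences != -100, sequences, tokenizer.pad_token_id)
    return tokenizer.batch_decode(sequences, skip_special_tokens=True)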

# --- Compute Metrics ---
def compute_metrics(eval_pred):
    predictions, labels = eval_pred

    if isinstance(predictions, tuple):
        predictions = predictions[0]

    # Decode predictions
    decoded_preds = safe_decode(predictions)

    # Replace -100s in labels before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = safe_decode(labels)

    # --- ROUGE scores ---
    rouge_scores = rouge.compute(
        predictions=decoded_preds,
        references=decoded_labels,
        use_stemmer=True
    )
    rouge1 = rouge_scores["rouge1"] * 100
    rouge2 = rouge_scores["rouge2"] * 100
    rougeL = rouge_scores["rougeL"] * 100
    rougeLsum = rouge_scores["rougeLsum"] * 100

    # --- Readability & Length ---
    readability_scores = [textstat.flesch_reading_ease(pred) for pred in decoded_preds if pred]
    avg_readability = np.mean(readability_scores) if readability_scores else 0

    prediction_lens = [len(pred.split()) for pred in decoded_preds if pred]
    avg_length = np.mean(prediction_lens) if prediction_lens else 0

    return {
        "rouge1": round(rouge1, 4),
        "rouge2": round(rouge2, 4),
        "rougeL": round(rougeL, 4),
        "rougeLsum": round(rougeLsum, 4),
        "avg_readability": round(avg_readability, 2),
        "avg_length": round(avg_length, 2),
    }

# --- Evaluation Args ---
eval_args = Seq2SeqTrainingArguments(
    output_dir="./bart_eval_results",
    per_device_eval_batch_size=4,
    predict_with_generate=True,
    report_to="none",
)

# --- Initialize Trainer ---
trainer = Seq2SeqTrainer(
    model=model,
    args=eval_args,
    eval_dataset=tokenized_datasets["test"],
    processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

# --- Run Evaluation ---
print("\n🔎 Evaluating fine-tuned BART model...")
metrics = trainer.evaluate()

# --- Print Results ---
print("\n✅ BART Evaluation Results:")
for k, v in metrics.items():
    print(f"• {k}: {v:.4f}" if isinstance(v, (int, float)) else f"• {k}: {v}")


Loading fine-tuned BART model from: ./my_finetuned_bart_summarizer_intrinsic


🔎 Evaluating fine-tuned BART model...



✅ BART Evaluation Results:
• eval_loss: 1.9045
• eval_model_preparation_time: 0.0031
• eval_rouge1: 39.3211
• eval_rouge2: 19.6991
• eval_rougeL: 35.7522
• eval_rougeLsum: 35.7852
• eval_avg_readability: 56.2000
• eval_avg_length: 10.1800
• eval_runtime: 74.6691
• eval_samples_per_second: 9.7230
• eval_steps_per_second: 2.4370


# ***Using the Fine-Tuned Model***

This section runs interactive inference with the fine-tuned facebook/bart-base model. It first checks for a GPU with PyTorch so that inference runs on the GPU when one is available. The fine-tuned model and tokenizer are loaded through the Hugging Face pipeline() function with the "summarization" task, which handles tokenization, generation, and decoding in a single call. The script then enters an interactive loop in which users paste news articles to summarize. It guards against common problems: empty input is rejected, the input is capped at its first 4,000 characters, and errors raised during generation (including GPU errors) are caught and reported instead of ending the loop. This makes it easy to try the model on arbitrary articles.

Each generated summary is followed by a set of efficiency and readability metrics. Generation time, a generation rate (approximated as summary words per second), compression ratio (summary words divided by article words), redundancy ratio (the share of repeated words in the summary), and average sentence length describe the model's speed and conciseness. Readability indices computed with TextStat, namely the Flesch Reading Ease, Gunning Fog Index, SMOG Index, and Automated Readability Index (ARI), estimate how easy the summary is to read. In the recorded example the model produced a 20-word headline in 1.08 s with a Flesch score of 72.33. Note that the compression ratio is computed against the full pasted article (3,426 words in that example), even though only its first 4,000 characters are passed to the model, so the ratio looks smaller than it would if measured against the text the model actually saw.
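The fine-tuned and baseline sections compute the same length and efficiency statistics with duplicated code, so they could share one helper. The sketch below is a hypothetical refactoring that reproduces those formulas (whitespace word counts, '.'-based sentence splitting, and words per second as the generation rate); the TextStat readability indices could still be added on top of it.

```python
def summary_stats(article, summary, seconds):
    """Length and efficiency statistics for one generated summary (hypothetical helper)."""
    input_words = len(article.split())
    words = summary.split()
    sentences = [s.strip() for s in summary.split(".") if s.strip()]  # rough sentence split
    return {
        "summary_words": len(words),
        "input_words": input_words,
        "compression_ratio": len(words) / input_words if input_words else 0.0,
        "words_per_second": len(words) / seconds if seconds else 0.0,
        "redundancy_ratio": 1 - len(set(words)) / len(words) if words else 0.0,
        "avg_sentence_length": (
            sum(len(s.split()) for s in sentences) / len(sentences) if sentences else 0.0
        ),
    }

stats = summary_stats("one two three four five six seven eight nine ten", "the cat. the dog.", 2.0)
print(stats)
```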

In [None]:
from transformers import pipeline
import torch
import time
import textstat

# --- 1. Verify GPU availability ---
if torch.cuda.is_available():
    device = 0
    print(f"✅ Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = -1
    print("⚠️ GPU not available — using CPU instead.")

# --- 2. Load Your Fine-Tuned BART Model ---
try:
    model_path = "./my_finetuned_bart_summarizer_intrinsic"

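    fine_tuned_summarizer = pipeline(
        "summarization",
        model=model_path,
        tokenizer=model_path,
        device=device  # GPU if available, else CPU
    )
    # The saved tokenizer defines no maximum length (hence the truncation warning
    # in the output), so set BART's 1,024-token limit for truncation=True to use
    fine_tuned_summarizer.tokenizer.model_max_length = 1024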
    print("\n✅ Fine-Tuned BART Summarization Model Loaded")
    print(f"Loaded from: {model_path}")

    # --- 3. Interactive Inference Loop ---
    while True:
        article_text = input("\nEnter an article to summarize (or 'quit' to exit): ")

        if article_text.lower().strip() == "quit":
            print("👋 Exiting fine-tuned summarizer.")
            break

        if not article_text.strip():
            print("⚠️ Please enter some text.")
            continue

        try:
            start_time = time.time()

            # --- 🧠 Summarization with safety checks ---
            result = fine_tuned_summarizer(
                article_text[:4000],    # Keep only the first 4,000 characters (not tokens)
                max_new_tokens=150,     # a max_length value is ignored when max_new_tokens is also set
                min_length=30,
                do_sample=False,
                truncation=True
            )
            summary_text = result[0]["summary_text"]

            end_time = time.time()

            # --- 4. Metric Computation ---
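            generation_time = end_time - start_time
            # Counted on the full pasted article, although only its first 4,000 characters reach the model
            input_words = len(article_text.split())
            summary_words = len(summary_text.split())
            compression_ratio = summary_words / input_words if input_words else 0
            # Whitespace-separated words per second, used as a proxy for tokens per second
            tokens_per_second = summary_words / generation_time if generation_time else 0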

            words = summary_text.split()
            redundancy_ratio = 1 - len(set(words)) / len(words) if words else 0

            sentences = [s.strip() for s in summary_text.split('.') if s.strip()]
            avg_sentence_length = sum(len(s.split()) for s in sentences) / len(sentences) if sentences else 0

            flesch = textstat.flesch_reading_ease(summary_text)
            gunning_fog = textstat.gunning_fog(summary_text)
            smog = textstat.smog_index(summary_text)
            ari = textstat.automated_readability_index(summary_text)

            # --- 5. Display Results ---
            print("\n🧾 --- Summary from Fine-Tuned BART Model ---")
            print(summary_text)
            print("-" * 20)
            print("📊 --- METRICS ---")
            print(f"• Generation Time: {generation_time:.2f} s")
            print(f"• Tokens per Second: {tokens_per_second:.2f}")
            print(f"• Word Count: {summary_words} (from {input_words} original)")
            print(f"• Compression Ratio: {compression_ratio:.2%}")
            print(f"• Avg Sentence Length: {avg_sentence_length:.2f} words")
            print(f"• Redundancy Ratio: {redundancy_ratio:.2%}")
            print(f"• Readability (Flesch): {flesch:.2f}")
            print(f"• Gunning Fog Index: {gunning_fog:.2f}")
            print(f"• SMOG Index: {smog:.2f}")
            print(f"• ARI: {ari:.2f}")
            print("-" * 60)

        except torch.cuda.OutOfMemoryError as oom_err:
            torch.cuda.empty_cache()  # release cached blocks before the next attempt
            print(f"⚠️ GPU out of memory: {oom_err}")
            print("💡 Tip: Try shorter text or restart runtime to clear GPU memory.")
        except Exception as e:
            print(f"⚠️ Unexpected error: {e}")

except OSError:
    print(f"⚠️ Model not found at '{model_path}'. Ensure it's fine-tuned and saved correctly.")
except Exception as e:
    print(f"⚠️ Initialization error: {e}")


✅ Using GPU: Tesla T4


Device set to use cuda:0



✅ Fine-Tuned BART Summarization Model Loaded
Loaded from: ./my_finetuned_bart_summarizer_intrinsic


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



🧾 --- Summary from Fine-Tuned BART Model ---
model says photographer raped her when she was 16: 'I didn't realize [at the time] that he was raping girls.’
--------------------
📊 --- METRICS ---
• Generation Time: 1.08 s
• Tokens per Second: 18.59
• Word Count: 20 (from 3426 original)
• Compression Ratio: 0.58%
• Avg Sentence Length: 10.50 words
• Redundancy Ratio: 5.00%
• Readability (Flesch): 72.33
• Gunning Fog Index: 10.00
• SMOG Index: 8.84
• ARI: 9.77
------------------------------------------------------------


# ***Using the Model without Fine-tuning***

This section runs the baseline: the generic, pre-trained facebook/bart-base model without fine-tuning. The script loads it with the Hugging Face Transformers summarization pipeline, which wraps the model and tokenizer in a single interface, and suppresses informational log messages to keep the console output clean. An interactive loop lets users enter news articles to summarize or type "quit" to exit. Summaries are generated with fixed minimum and maximum lengths, and the script records the generation time for each one. Note that facebook/bart-base was pre-trained only as a denoising autoencoder and has not been fine-tuned for summarization, so its outputs tend to copy text from the input rather than condense it; this is what makes it a useful lower bound for measuring the effect of fine-tuning.

After each summary is generated, the same efficiency and readability metrics as in the fine-tuned section are computed: compression ratio, generation rate, redundancy ratio, average sentence length, and the Flesch Reading Ease, Gunning Fog, SMOG, and ARI readability indices from TextStat. Comparing these baseline values with those of the fine-tuned model on the same articles shows how much fine-tuning changes output length, conciseness, and readability. In the recorded run, however, this cell stopped with a CUDA device-side assert before producing a summary, so no baseline metrics are available from it. The most likely cause is an article longer than BART's 1,024 position embeddings, since the full text was passed to the model without truncation. A device-side assert leaves the CUDA context unusable, so the Colab runtime must be restarted before any further GPU cells can run.
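Once both models have produced metrics for the same article, the comparison can be made explicit by differencing the two sets of values. The hypothetical helper below subtracts the baseline value from the fine-tuned value for every numeric metric the two dictionaries share. When reading the result, mind the direction of each index: a higher Flesch Reading Ease means easier text, whereas lower Gunning Fog, SMOG, and ARI values mean easier text.

```python
def metric_deltas(baseline, fine_tuned):
    """fine_tuned - baseline for every numeric metric present in both dictionaries."""
    shared = baseline.keys() & fine_tuned.keys()
    return {
        name: fine_tuned[name] - baseline[name]
        for name in sorted(shared)
        if isinstance(baseline[name], (int, float)) and isinstance(fine_tuned[name], (int, float))
    }
```

For example, metric_deltas(baseline_metrics, fine_tuned_metrics), where both placeholder dicts hold values such as "flesch" or "compression_ratio" collected from the two sections.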

In [None]:
from transformers import pipeline
import time
import textstat
from transformers.utils import logging

# Suppress informational messages from transformers
logging.set_verbosity_error()

try:
    # --- 1. Load the Generic, Pre-trained BART Model ---
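    summarizer = pipeline("summarization", model="facebook/bart-base", tokenizer="facebook/bart-base")
    # The bart-base tokenizer may define no maximum length, so set BART's
    # 1,024-position limit explicitly so that truncation=True takes effect
    summarizer.tokenizer.model_max_length = 1024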
    print("\n✅ Generic Pre-trained Summarization Model Loaded (facebook/bart-base)")

    # --- 2. Create an Interactive Loop ---
    while True:
        article_text = input("\nEnter an article to summarize (or 'quit' to exit): ")
        if article_text.lower().strip() == "quit":
            print("👋 Exiting generic summarizer.")
            break
        if not article_text.strip():
            continue

        start_time = time.time()

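        # --- Generate summary (BART does not use a prefix) ---
        # truncation=True keeps the input within BART's 1,024 positions; longer inputs
        # index past the position embeddings and can trigger a CUDA device-side assert
        result = summarizer(article_text, max_new_tokens=150, min_length=30, do_sample=False, truncation=True)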
        end_time = time.time()

        summary_text = result[0]["summary_text"]

        # --- Metrics Calculation ---
        generation_time = end_time - start_time
        input_words = len(article_text.split())
        summary_words = len(summary_text.split())
        compression_ratio = summary_words / input_words if input_words else 0
        tokens_per_second = summary_words / generation_time if generation_time else 0

        words = summary_text.split()
        redundancy_ratio = 1 - len(set(words)) / len(words) if words else 0

        sentences = [s.strip() for s in summary_text.split('.') if s.strip()]
        avg_sentence_length = sum(len(s.split()) for s in sentences) / len(sentences) if sentences else 0

        # --- Readability Scores ---
        flesch = textstat.flesch_reading_ease(summary_text)
        gunning_fog = textstat.gunning_fog(summary_text)
        smog = textstat.smog_index(summary_text)
        ari = textstat.automated_readability_index(summary_text)

        # --- Output ---
        print("\n🧾 --- Summary from Generic BART Model ---")
        print(summary_text)
        print("-" * 20)
        print("📊 --- METRICS ---")
        print(f"• Generation Time: {generation_time:.2f} s")
        print(f"• Tokens per Second: {tokens_per_second:.2f}")
        print(f"• Word Count: {summary_words} (from {input_words} original)")
        print(f"• Compression Ratio: {compression_ratio:.2%}")
        print(f"• Avg Sentence Length: {avg_sentence_length:.2f} words")
        print(f"• Redundancy Ratio: {redundancy_ratio:.2%}")
        print(f"• Readability (Flesch): {flesch:.2f}")
        print(f"• Gunning Fog Index: {gunning_fog:.2f}")
        print(f"• SMOG Index: {smog:.2f}")
        print(f"• ARI: {ari:.2f}")
        print("-" * 60)

except Exception as e:
    print(f"⚠️ An error occurred: {e}")

⚠️ An error occurred: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

