### **<center>Transformer-Based Summarization for Cleaning YouTube Video Descriptions</center>**

<center><em>
Leverage the power of transformer-based text summarization to automatically remove irrelevant information from YouTube video descriptions, ensuring they're concise and informative.
</em></center>

#### Intro:

YouTube video descriptions are vital for attracting viewers, but often contain extraneous information that hinders understanding. This project utilizes transformer-based text summarization models (like BERT and GPT) to automatically clean these descriptions.

By training a summarization model on a dataset of YouTube descriptions paired with their human-refined counterparts, the model learns to identify and remove irrelevant content while preserving key points. This leads to concise, informative descriptions.

The project will explore the fine-tuning and evaluation of transformer models for this specific summarization task, focusing on their ability to remove extraneous information and produce distilled video descriptions.

**Key Points:**
- Problem: YouTube descriptions often contain excessive tags, promotions, and irrelevant details.
- Solution: Transformer-based text summarization models trained to clean descriptions.
- Approach: Fine-tune models on a dataset of original and human-cleaned descriptions.
- Goal: Produce concise, informative descriptions that enhance user experience.
- Evaluation: Focus on the models' ability to remove extraneous information effectively.

In [47]:
%%shell
sudo apt -y update
sudo apt install -y wget curl unzip
wget http://archive.ubuntu.com/ubuntu/pool/main/libu/libu2f-host/libu2f-udev_1.1.4-1_all.deb
dpkg -i libu2f-udev_1.1.4-1_all.deb
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb

wget -N https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/118.0.5993.70/linux64/chromedriver-linux64.zip -P /tmp/
unzip -o /tmp/chromedriver-linux64.zip -d /tmp/
chmod +x /tmp/chromedriver-linux64/chromedriver
mv /tmp/chromedriver-linux64/chromedriver /usr/local/bin/chromedriver
pip install selenium chromedriver_autoinstaller

[33m0% [Working][0m            Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Ign:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:9 https://r2u.stat.illinois.edu/ubuntu jammy Release
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Hit:12 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Fetched 257 kB in 2s (148 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
50 packages can be upgraded. Run 'apt list --upg



In [48]:
!pip install peft
!pip install datasets
!pip install rouge-score

Collecting rouge-score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: rouge-score
  Building wheel for rouge-score (setup.py) ... [?25l[?25hdone
  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=088051753549d0f7b2ee09ea2585bbe98a0ece9ce5f9aa0c65177f4839e111c6
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge-score
Installing collected packages: rouge-score
Successfully installed rouge-score-0.1.2


In [49]:
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')


from selenium import webdriver
import chromedriver_autoinstaller
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import TimeoutException, ElementNotInteractableException
from selenium.webdriver.common.action_chains import ActionChains

import pandas as pd
import numpy as np

import time
import random
import re
import json
import random

In [50]:
def init_webdriver():
    """Initializes and returns a Chrome WebDriver instance with options.

    Returns:
        webdriver.Chrome: A configured Chrome WebDriver instance.

    Raises:
        Exception: If the WebDriver fails to initialize.
    """
    try:
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        chromedriver_autoinstaller.install()
        driver = webdriver.Chrome(options=chrome_options)

        print("WebDriver initialized successfully")
        return driver
    except Exception as e:
        print(f"Failed to initialize WebDriver: {e}")
        raise


def close_webdriver(driver):
    """Closes the provided WebDriver instance.

    Args:
        driver (webdriver.Chrome): The WebDriver instance to close.

    Prints:
        str: Confirmation message that the WebDriver has been closed.
    """
    print("WebDriver successfully closed")
    driver.quit()


In [51]:
def get_video_data(video_id):
    """Fetches video data from YouTube given a video ID.

    Args:
        video_id (str): The ID of the YouTube video to fetch data for.

    Returns:
        dict: A dictionary containing the video data with the following keys:
            - 'channel_name': The name of the channel that uploaded the video.
            - 'video_title': The title of the video.
            - 'video_description': The description of the video.

    Raises:
        Exception: If there is an error accessing or processing the video data.
    """
    driver = init_webdriver()
    video_url = f"https://www.youtube.com/watch?v={video_id}"
    video_data = {}

    try:
        driver.get(video_url)

        try:
            # Wait for the bottom-row element to be present
            bottom_row = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH, '//*[@id="bottom-row"]'))
            )

            # Locate and click the expand button if it exists
            try:
                expand_button = WebDriverWait(driver, 10).until(
                    EC.element_to_be_clickable((By.XPATH, '/html/body/ytd-app/div[1]/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/div[2]/ytd-watch-metadata/div/div[4]/div[1]/div/ytd-text-inline-expander/tp-yt-paper-button[1]'))
                )
                expand_button.click()
            except TimeoutException:
                pass  # Ignore if the expand button is not found

            # Wait for elements to be visible and extract data
            expanded_description = WebDriverWait(driver, 10).until(
                EC.visibility_of_element_located((By.ID, 'description-inline-expander'))
            )
            title_element = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.XPATH, '//h1[@class="style-scope ytd-watch-metadata"]//yt-formatted-string'))
            )
            channel_name_element = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.XPATH, '//ytd-channel-name[@id="channel-name"]//yt-formatted-string//a'))
            )

            video_data = {
                'channel_name': channel_name_element.text,
                'video_title': title_element.text,
                'video_description': expanded_description.text
            }

        except TimeoutException:
            print(f"Error processing {video_url}: Elements not found within timeout.")

    except Exception as e:
        print(f"Error processing {video_url}: {e}")

    finally:
        # Close the browser when done
        close_webdriver(driver)

    return video_data


In [52]:
#Test:
video_id = "CETSlLO_jio"
# Get video data
video_data = get_video_data(video_id)

# Print the data
video_data


WebDriver initialized successfully
WebDriver successfully closed


{'channel_name': 'truTV',
 'video_title': 'Funniest If You Laugh You Lose Moments (Mashup) | Impractical Jokers | truTV',
 'video_description': 'It\'s impossible not to laugh at the way Murr rides away on his luggage. We can\'t get enough of these "If You Laugh, You Lose" challenges. Watch Impractical Jokers on truTV.\n\n#ImpracticalJokers  #truTV #BrianQuinn #JamesMurray #SalVulcano\n\nSubscribe: http://bit.ly/truTVSubscribe\nWatch More Impractical Jokers: http://bit.ly/2p59m19\nWatch full episodes for Free: http://bit.ly/ImpracticalJokersTruTV\n\nAbout Impractical Jokers:\nThree comedians and lifelong friends compete to embarrass each other amongst the general public with a series of hilarious and outrageous dares. When Sal, Q, and Murr challenge each other to say or do something, they have to do it… if they refuse, they lose! At the end of every episode - with the help of a celebrity guest - the episode\'s loser must endure a punishment of epic proportions.\n\nDownload the Jokers Wh

**Data Collection - Overview**
- The data collection process involves gathering YouTube video descriptions along with additional metadata, such as the channel name and video title. We are going to use the above functions for this. This data will be used to train and evaluate our transformer-based summarization model.

**Steps:**

- Fetch Video Data:
Iterate through a predefined list of YouTube video IDs.
> For each video ID, use a custom function to retrieve the video data.
- The function fetches:
> - Channel Name: The name of the channel where the video was uploaded.
> - Video Title: The title of the video.
> - Video Description: The description text provided by the video uploader.
- Store Data:
> - Append the retrieved data, formatted as a dictionary, to the list.
> - Store the collected data in a file (e.g., JSON or CSV) to facilitate access and further processing.

**Example Output**
> - The collected data will be a list of dictionaries, each containing the following keys:

> - ```yaml
channel_name: The name of the YouTube channel.
video_title: The title of the video.
video_description: The description text of the video.


In [53]:
from google.colab import drive
import os
#mounting google drive
drive.mount('/content/drive')
########################################
#changing the working directory
os.chdir("/content/drive/MyDrive/NLP_Data")

!pwd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/NLP_Data


In [54]:
# below are functions for reading a writting json file for the current working directory

def save_to_json(data, filename):
    with open(filename, 'w') as json_file:
        json.dump(data, json_file, indent=4)

def load_from_json(filename):
    with open(filename, 'r') as json_file:
        comments = json.load(json_file)
    return comments

In [55]:
df = pd.read_csv('video_data.csv')
df.head()


Unnamed: 0,channel_name,video_title,video_description,clean_video_description
0,LastWeekTonight,Miss America Pageant: Last Week Tonight with J...,The Miss America Pageant…how is this still a t...,John Oliver criticizes the Miss America Pagean...
1,ESPN,"Smooth 🔥 (via @dariusgaddy2, @d.looo_/TT) #shorts",✔️ Subscribe to ESPN+ http://espnplus.com/yout...,This is a short video showcasing smooth moves ...
2,PowerfulJRE,Joe Rogan Experience #1227 - Mike Tyson,Mike Tyson is the former undisputed heavyweigh...,"Mike Tyson, the former undisputed heavyweight ..."
3,PowerfulJRE,Joe Rogan Experience #872 - Graham Hancock & R...,Graham Hancock is an English author and journa...,Graham Hancock and Randall Carlson discuss cro...
4,Mentour Pilot,HOW was THIS Allowed to HAPPEN?!,Go to https://curiositystream.thld.co/mento......,This video explores a close call between two A...


In [56]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 233 entries, 0 to 232
Data columns (total 4 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   channel_name             233 non-null    object
 1   video_title              233 non-null    object
 2   video_description        233 non-null    object
 3   clean_video_description  233 non-null    object
dtypes: object(4)
memory usage: 7.4+ KB


In [57]:
from sklearn.model_selection import train_test_split
# Split the dataset into training and validation sets (80-20 split)
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)


###### **Data Preparation for Fine-Tuning**


1. **Data Formatting:**
- Each row is formatted to include a combined input of `channel_name`, `video_title`, and `video_description`.
- The target output is `clean_video_descriptions`.
2. **Converting to Dataset Object:**
- `Dataset.from_list(formatted_data)` converts the list of formatted `input-output` pairs into a `Hugging Face Dataset` object.
3. **Tokenization:**
- The `tokenize_data` function tokenizes both the input text and the target text.
- The tokenized target is added to the input dictionary under `"labels"`, as required for `seq2seq` training.
4. **Tokenized Dataset:**
- The tokenized dataset, `tokenized_datasets`, is now ready for `fine-tuning` the `BART` model using `LoRA`.


In [58]:
from datasets import Dataset
from transformers import BartTokenizer

In [59]:
# Loading the tokenizer
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)


# Data Preparation
# Format the data for training
formatted_train_data = [
    {
        "input": f"Channel: {row['channel_name']}, Title: {row['video_title']}, Description: {row['video_description']}",
        "output": row['clean_video_description']
    }
    for _, row in train_df.iterrows()
]

# Format the data for validation
formatted_val_data = [
    {
        "input": f"Channel: {row['channel_name']}, Title: {row['video_title']}, Description: {row['video_description']}",
        "output": row['clean_video_description']
    }
    for _, row in val_df.iterrows()
]

# Convert formatted data to Dataset objects
train_dataset = Dataset.from_list(formatted_train_data)
val_dataset = Dataset.from_list(formatted_val_data)

# Tokenization

# Load the tokenizer
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)

# Tokenization function for training and validation datasets
def tokenize_data(example):
    model_inputs = tokenizer(
        example["input"],
        max_length=512,
        padding="max_length",
        truncation=True
    )
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            example["output"],
            max_length=128,
            padding="max_length",
            truncation=True
        )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Tokenize datasets
tokenized_train_dataset = train_dataset.map(tokenize_data, batched=True)
tokenized_val_dataset = val_dataset.map(tokenize_data, batched=True)





Map:   0%|          | 0/186 [00:00<?, ? examples/s]



Map:   0%|          | 0/47 [00:00<?, ? examples/s]

###### **Fine-Tuning the BART Model with LoRA**

**Who is LoRA?**

LoRA, which stands for Low-Rank Adaptation, is a technique used in fine-tuning large language models (LLMs) to make them more efficient and less computationally expensive.

Key points about LoRA:

- Reduced parameter updates: Instead of updating all the parameters of a pre-trained LLM during fine-tuning, LoRA focuses on updating a smaller set of parameters, specifically low-rank matrices that are added to the existing model weights.
- Efficiency: This approach significantly reduces the number of trainable parameters, leading to faster training times and lower memory requirements compared to traditional fine-tuning methods.
- Preserved performance: Despite the reduction in updated parameters, LoRA has been shown to achieve comparable or even better performance than full fine-tuning in many cases.
- Adaptability: It can be easily integrated with various LLM architectures and fine-tuning tasks.

LoRA offers a practical and effective solution to fine-tune large language models for specific tasks without incurring the high computational costs associated with full fine-tuning.



**To fine-tune the BART model with LoRA, we will follow these steps:**


1. **Set Up LoRA Configuration:** Defining the LoRA parameters such as rank `(r)`, scaling factor `(lora_alpha)`, target modules `(q_proj and v_proj)`, dropout rate `(lora_dropout)`, etc.
2. **Wrap the BART Model with LoRA:** We use the peft library to apply LoRA to the original BART model, which allows for efficient fine-tuning with fewer trainable parameters.
3. **Define Training Arguments:** Configuring the training parameters like `batch size`, `number of epochs`, `learning rate`, `logging steps`, `evaluation strategy`, and `saving intervals` using the `TrainingArguments` class from Hugging Face.
4. **Define the Compute Metrics Function:** Setting up a function to compute evaluation metrics such as `ROUGE scores`, which measure the quality of the generated summaries against the reference summaries.
5. **Train the Model:** We use the Hugging Face Trainer to fine-tune the LoRA-wrapped model on the training dataset while evaluating it on a validation dataset during training to monitor the model's performance.
6. **Evaluate the Fine-Tuned Model:** After training,we evaluate the model's performance on the validation dataset using the `ROUGE metric` to understand how well the model generates summaries.
7. **Save the Fine-Tuned Model:** Lastly we save the fine-tuned model and tokenizer for future use in generating summaries or further fine-tuning.


In [60]:
from sklearn.model_selection import train_test_split
from datasets import Dataset, load_metric
from transformers import BartTokenizer, BartForConditionalGeneration, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model  # PEFT LoRA utilities



# Loading the base BART model
model = BartForConditionalGeneration.from_pretrained(model_name)

# Configure LoRA
lora_config = LoraConfig(
    r=16,  # Rank of the LoRA matrix
    lora_alpha=32,  # Scaling factor for LoRA
    target_modules=["q_proj", "v_proj"],  # Target attention layers to apply LoRA
    lora_dropout=0.05,  # Dropout rate for LoRA
    bias="none",  # No bias
)

# Wrap the original model with LoRA
lora_model = get_peft_model(model, lora_config)

# Defining the Training Arguments

training_args = TrainingArguments(
    output_dir="./results",          # Output directory for model and checkpoints
    num_train_epochs=3,              # Number of training epochs
    per_device_train_batch_size=4,   # Batch size per device during training
    per_device_eval_batch_size=4,    # Batch size per device during evaluation
    warmup_steps=500,                # Number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # Strength of weight decay
    logging_dir="./logs",            # Directory for storing logs
    logging_steps=10,                # Frequency of logging
    save_steps=1000,                 # Number of steps before saving model checkpoint
    evaluation_strategy="steps",     # Evaluation strategy
    eval_steps=500,                  # Frequency of evaluation steps
)

# Defining Evaluation Metric and Compute Function

# Loading the evaluation metric
rouge_metric = load_metric("rouge")

# Defining the compute_metrics function
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # ROUGE expects a newline after each sentence
    decoded_preds = ["\n".join(pred.strip().split(". ")) for pred in decoded_preds]
    decoded_labels = ["\n".join(label.strip().split(". ")) for label in decoded_labels]

    # Compute ROUGE scores
    result = rouge_metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    return result

# Initializing the Trainer and then we Train the Model

trainer = Trainer(
    model=lora_model,                # The LoRA model to fine-tune
    args=training_args,              # Training arguments
    train_dataset=tokenized_train_dataset,  # The tokenized training dataset
    eval_dataset=tokenized_val_dataset,  # The tokenized validation dataset
    compute_metrics=compute_metrics  # Metrics computation function
)

# Training the model with LoRA
trainer.train()

# Evaluating the model on the validation dataset

eval_results = trainer.evaluate()

print(f"Evaluation Results: {eval_results}")

# Lastly we save the Fine-Tuned Model

lora_model.save_pretrained("./fine-tuned-lora-model")
tokenizer.save_pretrained("./fine-tuned-lora-model")




Step,Training Loss,Validation Loss


Evaluation Results: {'eval_runtime': 4.819, 'eval_samples_per_second': 9.753, 'eval_steps_per_second': 2.49, 'epoch': 3.0}


('./fine-tuned-lora-model/tokenizer_config.json',
 './fine-tuned-lora-model/special_tokens_map.json',
 './fine-tuned-lora-model/vocab.json',
 './fine-tuned-lora-model/merges.txt',
 './fine-tuned-lora-model/added_tokens.json')


1. **LoRA Configuration:**
- The `LoraConfig` class is used to define the configuration for Low-Rank Adaptation.
- Key parameters include:
> - `r`: The rank of the LoRA matrix.
> - `lora_alpha`: Scaling factor for LoRA.
> - `target_modules`: Specifies which modules in the model should have LoRA applied (usually the attention layers).
> - `lora_dropout`: Dropout rate to be applied to LoRA.
> - `bias`: Specifies how to handle biases; in this case, no bias is applied ("none").
2. **Wrap the BART Model with LoRA:**
- The `get_peft_model` function from the peft library wraps the original BART model with LoRA, making it suitable for `parameter-efficient fine-tuning`.
3. **Defining Training Arguments:**
- TrainingArguments defines various parameters for the training process:
> - `num_train_epochs`: Number of epochs for training.
> - `per_device_train_batch_size` and `per_device_eval_batch_size`: Batch sizes for training and evaluation.
> - `logging_steps`: Frequency of logging training metrics.
> - `eval_steps`: Frequency of evaluation during training.
> - `save_steps`: Frequency of saving model checkpoints.
4. **Trainer Setup and Training:**
- The Trainer class handles the training loop, evaluation, and checkpointing. It takes the LoRA model and training arguments as input.
5. **Save the Fine-Tuned Model:**
- After training, the fine-tuned model and tokenizer are saved using the save_pretrained method.


In [80]:
test_df = pd.read_csv('test_video_data.csv')
test_df.head()


Unnamed: 0,channel_name,video_title,video_description
0,Cracked,4 Awful Ways Our Ancestors Got High (That We T...,Chapters\nView all\nIntro\n0:00\nNose Pipe\n0:...
1,Science Channel,"Could This Be The Legendary ""Magic Bridge"" Con...",What on Earth? | Tuesdays 9p\nAncient Hindu lo...
2,StevenCrowder,Biological Males Should Not Compete in Women's...,"In this edition of Change My Mind, Steven Crow..."
3,PragerU,"Fund the Children, Not the Schools | 5 Minute ...",Why is it that parents have so little control ...
4,Dr. Todd Grande,Elliot Rodger (King of the INCELS) | Mental He...,This video answers the question: Can I analyze...


In [88]:
# Select a subset of the test data
new_data = test_df.sample(n=1)
new_data = new_data.reset_index(drop=True)

new_data['video_description'][0]

"Bob Ross painted more than 1,000 landscapes for his television show — so why are they so hard to find? Solving one of the internet’s favorite little mysteries. \n\nRead more about Bob Ross: https://nyti.ms/2xIsshb\nWatched the video? Here are a few more details on The Times website. https://www.nytimes.com/2019/07/12/ar...\n\nSubscribe: http://bit.ly/U8Ys7n\nMore from The New York Times Video:  http://nytimes.com/video\n----------\nWhether it's reporting on conflicts abroad and political divisions at home, or covering the latest style trends and scientific developments, New York Times video journalists provide a revealing and unforgettable view of the world. It's all the news that's fit to watch.\nKey moments\nView all\nCHAPTER ONE\n0:39\nAnnette Kowalski\n2:47\nCHAPTER TWO\n2:54\nWalt Kowalski\n3:10\nCHAPTER THREE\n4:44\nTranscript\nFollow along using the transcript.\nShow transcript\nThe New York Times\n4.53M subscribers\nVideos\nAbout\nSecrets of ‘Sesame Street’ Songwriting (Featur

In [89]:
# Convert the DataFrame to a list of dictionaries
new_data = new_data.to_dict(orient='records')


###### **Inference with the Fine-Tuned Model**


1. **Prepare Input for Inference:** The new input data is formatted similarly to the training data.
2. **Tokenization:** The `formatted_inputs` are tokenized using the BART tokenizer.
3. **Generate Summaries:** The `generate()` method is called on the `LoRA fine-tuned model` to generate summaries.
4. **Output Summaries:** The generated summaries are decoded and printed.

In [90]:
import torch


In [91]:
# Preparing input for inference
formatted_inputs = [
    f"Channel: {item['channel_name']}, Title: {item['video_title']}, Description: {item['video_description']}"
    for item in new_data
]

# # Tokenize input
# tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
# inputs = tokenizer(formatted_inputs, max_length=512, return_tensors="pt", truncation=True, padding="max_length")

# Move inputs to the same device as the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = {key: value.to(device) for key, value in inputs.items()}

# Loading the fine-tuned LoRA model and tokenizer
model_path = "./fine-tuned-lora-model"
tokenizer = BartTokenizer.from_pretrained(model_path)
model = BartForConditionalGeneration.from_pretrained(model_path)

# LoRA configuration applied to the model
lora_model = get_peft_model(model, lora_config)
lora_model.to(device)

# Generate summaries
with torch.no_grad():
    outputs = lora_model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=128,
        num_beams=4,
        early_stopping=True
    )

# Decode and print summaries
generated_summaries = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(f"Generated Summaries: {generated_summaries}")

Generated Summaries: ['The Fast Lane Car takes a look at the top 5 cars we think are great, but have low sales numbers compared to their competition. The Mazda 6, Chevy SS, Jaguar FType, Cadillac ELR, and Ford F-150 are among the cars we look at. The Top 5 Great Cars That Few Buy: Surprising Overlooked Automotive Gems.']
