<h3> Using BART for summarization</h3>

BART is pretrained on a large text corpus with a denoising autoencoder objective: the input is corrupted (for example, spans of tokens are masked or deleted, or sentences are shuffled) and the model learns to reconstruct the original text. <br> In the code, I loaded a pretrained BART checkpoint from Hugging Face's transformers library, specifically the "facebook/bart-base" variant.<br><br>I also tried the bart-large version, but my laptop could not handle it because it requires more RAM.
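To make the denoising idea concrete, here is a toy sketch of one corruption step, text infilling, where a contiguous span of tokens is replaced by a single mask token. It is only an illustration: the function name `infill_span`, the whitespace tokenization, and the fixed span length are assumptions made for this example, not BART's actual preprocessing code.

```python
import random

MASK = "<mask>"  # placeholder mask token, used here for illustration only


def infill_span(tokens, span_len, rng):
    """Replace one contiguous span of `span_len` tokens with a single mask token."""
    if span_len <= 0 or span_len > len(tokens):
        return list(tokens)
    start = rng.randrange(len(tokens) - span_len + 1)
    return tokens[:start] + [MASK] + tokens[start + span_len:]


rng = random.Random(0)  # seeded so the example is repeatable
original = "I bought this charger in Jul 2003 and it worked".split()
corrupted = infill_span(original, span_len=3, rng=rng)

# During pretraining, the model would see `corrupted` and be trained to output `original`.
print(" ".join(corrupted))
```

Because the whole span collapses into one mask token, the model also has to work out how many tokens are missing, not just which ones.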

In [2]:
!pip install transformers





In [5]:
import pandas as pd
df = pd.read_csv('Augmented_data.csv')
df_copy = df.copy()  # real copy; plain assignment (df_copy = df) would only alias df

In [7]:
df_copy.drop(df_copy.columns[0], axis=1, inplace=True)

In [8]:
df_copy

,label,review
0,__label__2,"\t""Great CD: My lovely Pat has one of the GREA..."
1,__label__2,"\t""One of the best game music soundtracks - fo..."
2,__label__1,"\t""Batteries died within a year ...: I bought ..."
3,__label__2,"\t""works fine, but Maha Energy is better: Chec..."
4,__label__2,"\t""Great for the non-audiophile: Reviewed quit..."
...,...,...
2095,__label__1,"\t""Can I shoot myself now??: A few years back ..."
2096,__label__1,\tAwful: This was one of the worst books I hav...
2097,__label__1,\tAwful: This was one of the worst books I hav...
2098,__label__1,\tCruel and Depressing Story: Hardy paints a c...
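The reviews shown above start with a tab character, and many are wrapped in double quotes with inner quotes doubled. A small cleaning step before tokenization may help. The sketch below is one possible way to do it; `clean_review` is a hypothetical helper, and the sample strings are made up to mimic the rows above.

```python
def clean_review(text):
    """Strip surrounding whitespace (including the leading tab) and one pair of wrapping double quotes."""
    cleaned = text.strip()
    if len(cleaned) >= 2 and cleaned.startswith('"') and cleaned.endswith('"'):
        cleaned = cleaned[1:-1]
    # Collapse CSV-style doubled quotes ("") into a single quote character.
    return cleaned.replace('""', '"').strip()


# Made-up samples that mimic the quoted and unquoted rows in the data.
print(clean_review('\t"Not an ""ultimate guide"": sample text"'))
print(clean_review('\tAwful: sample text'))
```

With pandas, this could be applied to the whole column with `df_copy['review'].map(clean_review)`.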


In [21]:
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

def summarize_reviews(reviews, max_length=150):
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base") #bart-base used instead of bart-large because bart-large needs more RAM than my laptop has
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    summaries = []

    for review in reviews:
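        # Note: the "summarize: " prefix is a T5 convention. BART does not use task prefixes,
        # so the prefix is treated as ordinary input text and shows up in the generated output.
        # max_length here (150 by default) truncates the input review to that many tokens.
        inputs = tokenizer.encode("summarize: " + review, return_tensors="pt", max_length=max_length, truncation=True)

        # The summary length can be changed as needed. I used max_length=30 because reviews are generally not very long.

        summary_ids = model.generate(inputs, max_length=30, min_length=10, length_penalty=2.0, num_beams=4, early_stopping=True)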
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        summaries.append(summary)

    return summaries

# Run the test below only when this script is executed directly, not when it is imported as a module
if __name__ == "__main__":
    reviews = df_copy.loc[:30, 'review']   # .loc slicing is inclusive, so this takes rows 0 to 30 (31 reviews); a small sample because my laptop can't handle the full dataset at once

    summary_list = summarize_reviews(reviews)

    for i, summary in enumerate(summary_list):
        print(f"Review {i + 1} Summary: {summary}")


Review 1 Summary: summarize: _________________________"Great CD: My lovely Pat has one of the GREAT voices of her generation. I have listened
Review 2 Summary: summarize: _________________"One of the best game music soundtracks - for a game I didn't really play: Despite
Review 3 Summary: summarize: _________________"Batteries died within a year...: I bought this charger in Jul 2003 and it worked
Review 4 Summary: summarize: __________________"works fine, but Maha Energy is better: Check out Maha's website. Their Pow
Review 5 Summary: summarize: _________________"Great for the non-audiophile: Reviewed quite a bit of the combo players and was
Review 6 Summary: summarize: _________________"DVD Player crapped out after one year: I also began having the incorrect disc problems that I
Review 7 Summary: summarize: _________________"Incorrect Disc: I love the style of this, but after a couple years, the DVD
Review 8 Summary: summarize: _________________"DVD menu select problems: I cannot scrol
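The summaries above mostly echo the start of each review. A likely reason is that "facebook/bart-base" is only pretrained to reconstruct text and has not been fine-tuned for summarization. The sketch below shows one possible alternative, not the method used in this notebook: it swaps in a checkpoint fine-tuned for summarization and processes reviews in small batches to limit memory use. The checkpoint name "sshleifer/distilbart-cnn-12-6", the batch size, and the helper names `batched` and `summarize_in_batches` are assumptions made for this example.

```python
def batched(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    items = list(items)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def summarize_in_batches(reviews, checkpoint="sshleifer/distilbart-cnn-12-6", batch_size=4):
    """Summarize reviews a few at a time with a summarization-tuned checkpoint (assumed name)."""
    # Imported lazily so the batching helper works even without transformers installed.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    model.eval()

    summaries = []
    for batch in batched(reviews, batch_size):
        # Pad to the longest review in the batch and truncate long inputs to 150 tokens.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=150)
        with torch.no_grad():  # inference only, so gradients are not needed
            summary_ids = model.generate(
                inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                num_beams=4,
                max_length=30,
                min_length=10,
            )
        summaries.extend(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
    return summaries
```

It could be called as `summarize_in_batches(df_copy.loc[:30, 'review'].tolist())`. The model is loaded once and reused for every batch, and a smaller `batch_size` trades speed for lower peak memory.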

In [19]:
df_copy.loc[:30,'review']

0     \t"Great CD: My lovely Pat has one of the GREA...
1     \t"One of the best game music soundtracks - fo...
2     \t"Batteries died within a year ...: I bought ...
3     \t"works fine, but Maha Energy is better: Chec...
4     \t"Great for the non-audiophile: Reviewed quit...
5     \t"DVD Player crapped out after one year: I al...
6     \t"Incorrect Disc: I love the style of this, b...
7     \t"DVD menu select problems: I cannot scroll t...
8     \t"Unique Weird Orientalia from the 1930's: Ex...
9     \t"Not an ""ultimate guide"": Firstly,I enjoye...
10    \t"Great book for travelling Europe: I current...
11    \t"Not!: If you want to listen to El Duke , th...
12    \t"A complete Bust: This game requires quickti...
13    \tTRULY MADE A DIFFERENCE!: I have been using ...
14    \t"didn't run off of USB bus power: Was hoping...
15    \t"Don't buy!: First of all, the company took ...
16    \t"Simple, Durable, Fun game for all ages: Thi...
17    \t"Review of Kelly Club for Toddlers: For 