In [None]:
import numpy as np
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
path = '/content/drive/MyDrive/NLP_spoiler'

Mounted at /content/drive


In [None]:
%%capture
! pip install bert_score

In [None]:
%%capture
! pip install -U sentence-transformers

# OpenAI Reviews - Evaluation

In this section, we explore different scores and metrics that evaluate the rephrasing of the spoiler-containing reviews. The elements we consider for evaluation are the original review, the rephrased review, the plot summary and the plot synopsis.

## OpenAI prepare

In [None]:
openai_reviews = pd.read_csv(f"{path}/data/openai_reviews.csv").reset_index(drop=True)
openai_reviews.dropna(inplace=True)
openai_reviews.reset_index(inplace = True, drop = True)
openai_reviews = openai_reviews.loc[:79, :]
openai_reviews.head()

Unnamed: 0,movie_id,plot_summary,genre,release_date,plot_synopsis,review_date,user_id,is_spoiler,review_text,review_text_len,rephrased_review
0,tt0105112,"Former CIA analyst, Jack Ryan is in England wi...","['Action', 'Thriller']",1992-06-05,"Jack Ryan (Ford) is on a ""working vacation"" in...",5 March 2008,ur16517420,True,The second Tom Clancy novel made into a film (...,341,In the second film adaptation of a Tom Clancy ...
1,tt1204975,"Billy (Michael Douglas), Paddy (Robert De Niro...",['Comedy'],2013-11-01,Four boys around the age of 10 are friends in ...,2 June 2014,ur5291991,True,Last Vegas is a comedy that features an ensemb...,220,Last Vegas is a comedy featuring a star-studde...
2,tt0040897,"Fred C. Dobbs and Bob Curtin, both down on the...","['Adventure', 'Drama', 'Western']",1948-01-24,Fred Dobbs (Humphrey Bogart) and Bob Curtin (T...,27 June 2004,ur1406078,True,John Huston's genius as a director is undeniab...,337,John Huston's talent as a director shines thro...
3,tt0126886,Tracy Flick is running unopposed for this year...,"['Comedy', 'Drama', 'Romance']",1999-05-07,Jim McAllister (Matthew Broderick) is a much-a...,16 January 2009,ur8239592,True,"Popular, but frustrated high school civics tea...",273,"In this sharp and witty high school comedy, de..."
4,tt0286716,"Bruce Banner, a brilliant scientist with a clo...","['Action', 'Sci-Fi']",2003-06-20,Bruce Banner (Eric Bana) is a research scienti...,1 December 2011,ur24340247,True,its sad that such an underrated film hulk 2003...,150,"It's disappointing that the 2003 film ""Hulk"" i..."


## BERTScore

BERTScore is an automatic evaluation metric for generated text, proposed by [Zhang et al. (2020)](https://arxiv.org/pdf/1904.09675) as an alternative to existing metrics such as BLEU or METEOR. BERTScore uses pre-trained contextual embeddings from models like BERT to compute the cosine similarity between the tokens of the original and the generated texts. Unlike methods that focus on n-gram matches, BERTScore can capture similarity in context and meaning. This makes it a well-suited metric for assessing whether a text has been paraphrased faithfully.

A core goal of our research is rephrasing spoiler-containing reviews. The rephrased review must match the style and opinions of the original reviewer as closely as possible. We compute the BERTScore between the paraphrased review and the original one, as well as between the paraphrased review and the spoiler-masked version of the original review.

The output of the BERTScore function is composed of precision, recall and F1 score. In this context, precision measures how many tokens in the generated (candidate) text have semantically similar counterparts in the original one. Recall measures how many tokens in the original text have semantically similar counterparts in the generated text. Lastly, F1 is the harmonic mean of the previous two measures.
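
The precision, recall and F1 described above can be sketched with greedy token matching over a cosine-similarity matrix. This is a simplified illustration, not the `bert_score` implementation: the function name `greedy_bertscore` and the 2-D "embeddings" are hypothetical, and we assume no IDF weighting or baseline rescaling.

```python
import numpy as np

def greedy_bertscore(cand_emb: np.ndarray, ref_emb: np.ndarray):
    """Toy BERTScore from token embeddings (one row per token), without IDF weighting."""
    # Normalise rows so that dot products are cosine similarities.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T  # shape: (candidate tokens, reference tokens)

    precision = sim.max(axis=1).mean()  # best reference match for each candidate token
    recall = sim.max(axis=0).mean()     # best candidate match for each reference token
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical embeddings: 2 candidate tokens, 3 reference tokens
cand = np.array([[1.0, 0.0], [0.0, 1.0]])
ref = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
p, r, f1 = greedy_bertscore(cand, ref)
print(p, r, f1)
```

In this toy case every candidate token has an exact counterpart in the reference, so precision is 1, while the third reference token is only partially covered, which lowers recall and therefore F1.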


In [None]:
openai_bertscore_df = openai_reviews[['plot_summary', 'plot_synopsis', 'rephrased_review', 'review_text']].dropna()
# Keep the first 79 reviews (labels 0..78); .copy() avoids pandas' SettingWithCopyWarning on later column assignments
openai_bertscore_df = openai_bertscore_df.loc[:78, :].copy()
openai_bertscore_df.shape


(79, 4)

In [None]:
openai_bertscore_df.head()

Unnamed: 0,plot_summary,plot_synopsis,rephrased_review,review_text
0,"Former CIA analyst, Jack Ryan is in England wi...","Jack Ryan (Ford) is on a ""working vacation"" in...",In the second film adaptation of a Tom Clancy ...,The second Tom Clancy novel made into a film (...
1,"Billy (Michael Douglas), Paddy (Robert De Niro...",Four boys around the age of 10 are friends in ...,Last Vegas is a comedy featuring a star-studde...,Last Vegas is a comedy that features an ensemb...
2,"Fred C. Dobbs and Bob Curtin, both down on the...",Fred Dobbs (Humphrey Bogart) and Bob Curtin (T...,John Huston's talent as a director shines thro...,John Huston's genius as a director is undeniab...
3,Tracy Flick is running unopposed for this year...,Jim McAllister (Matthew Broderick) is a much-a...,"In this sharp and witty high school comedy, de...","Popular, but frustrated high school civics tea..."
4,"Bruce Banner, a brilliant scientist with a clo...",Bruce Banner (Eric Bana) is a research scienti...,"It's disappointing that the 2003 film ""Hulk"" i...",its sad that such an underrated film hulk 2003...


In [None]:
from bert_score import score

# Score all pairs in one call so the model is loaded once instead of once per row.
# Candidates are the rephrased reviews; references are the original reviews.
P, R, F1 = score(
    openai_bertscore_df['rephrased_review'].tolist(),
    openai_bertscore_df['review_text'].tolist(),
    lang='en',
    verbose=False,
)
openai_bertscore_df['BertScore_P'] = P.tolist()
openai_bertscore_df['BertScore_R'] = R.tolist()
openai_bertscore_df['BertScore_F1'] = F1.tolist()

openai_bertscore_df.head()


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Unnamed: 0,plot_summary,plot_synopsis,rephrased_review,review_text,BertScore_P,BertScore_R,BertScore_F1
0,"Former CIA analyst, Jack Ryan is in England wi...","Jack Ryan (Ford) is on a ""working vacation"" in...",In the second film adaptation of a Tom Clancy ...,The second Tom Clancy novel made into a film (...,0.889449,0.822509,0.85467
1,"Billy (Michael Douglas), Paddy (Robert De Niro...",Four boys around the age of 10 are friends in ...,Last Vegas is a comedy featuring a star-studde...,Last Vegas is a comedy that features an ensemb...,0.909648,0.878852,0.893985
2,"Fred C. Dobbs and Bob Curtin, both down on the...",Fred Dobbs (Humphrey Bogart) and Bob Curtin (T...,John Huston's talent as a director shines thro...,John Huston's genius as a director is undeniab...,0.902205,0.863175,0.882259
3,Tracy Flick is running unopposed for this year...,Jim McAllister (Matthew Broderick) is a much-a...,"In this sharp and witty high school comedy, de...","Popular, but frustrated high school civics tea...",0.899278,0.829411,0.862933
4,"Bruce Banner, a brilliant scientist with a clo...",Bruce Banner (Eric Bana) is a research scienti...,"It's disappointing that the 2003 film ""Hulk"" i...",its sad that such an underrated film hulk 2003...,0.897287,0.879418,0.888262


In [None]:
print(f"Average precision: {round(openai_bertscore_df['BertScore_P'].mean(), 4)}")
print(f"Average recall: {round(openai_bertscore_df['BertScore_R'].mean(), 4)}")
print(f"Average F1: {round(openai_bertscore_df['BertScore_F1'].mean(), 4)}")

Average precision: 0.8919
Average recall: 0.8545
Average F1: 0.8727


## METEOR Score

As a second evaluation metric, we use the METEOR score. Unlike purely n-gram-based metrics such as BLEU, it also matches stems, synonyms and paraphrases. After aligning the words of the candidate and reference sentences, it computes precision and recall from these alignments. It then combines them into a recall-weighted harmonic mean and applies a fragmentation penalty that penalizes disjoint matches.

Although BERTScore has the advantage of analyzing contextual meaning, METEOR makes the evaluation more comprehensive. Specifically, this measure accounts for synonym and paraphrase flexibility through explicit word alignments.
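
To make the combination step concrete, the sketch below computes a METEOR-style score from alignment counts only. The helper `meteor_from_counts` and the example counts are hypothetical, and the defaults `alpha=0.9`, `beta=3.0`, `gamma=0.5` are our assumption about the parameterization used by NLTK's `meteor_score`; the alignment itself (exact, stem and synonym matching) is not reproduced here.

```python
def meteor_from_counts(matches, cand_len, ref_len, chunks,
                       alpha=0.9, beta=3.0, gamma=0.5):
    """METEOR-style score from alignment counts (assumed parameter defaults)."""
    if matches == 0:
        return 0.0
    precision = matches / cand_len  # matched words / candidate length
    recall = matches / ref_len      # matched words / reference length
    # Weighted harmonic mean; alpha close to 1 puts most of the weight on recall.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: more, shorter chunks of contiguous matches cost more.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean

# Same number of matched words, different word order in the candidate
contiguous = meteor_from_counts(matches=6, cand_len=8, ref_len=10, chunks=1)
scattered = meteor_from_counts(matches=6, cand_len=8, ref_len=10, chunks=6)
print(contiguous, scattered)
```

With identical precision and recall, the scattered alignment scores lower than the contiguous one, which is how METEOR penalizes rephrasings that reuse the original words in a different order.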

In [None]:
meteor_df = openai_reviews[['rephrased_review', 'review_text']].dropna()
meteor_df.head()


Unnamed: 0,rephrased_review,review_text
0,In the second film adaptation of a Tom Clancy ...,The second Tom Clancy novel made into a film (...
1,Last Vegas is a comedy featuring a star-studde...,Last Vegas is a comedy that features an ensemb...
2,John Huston's talent as a director shines thro...,John Huston's genius as a director is undeniab...
3,"In this sharp and witty high school comedy, de...","Popular, but frustrated high school civics tea..."
4,"It's disappointing that the 2003 film ""Hulk"" i...",its sad that such an underrated film hulk 2003...


In [None]:
import pandas as pd
from nltk.translate.meteor_score import meteor_score
from nltk.tokenize import word_tokenize
import nltk

nltk.download('punkt')
nltk.download('wordnet')

def calculate_meteor(row):
    tokenized_rephrase = word_tokenize(row['rephrased_review'])
    tokenized_original = word_tokenize(row['review_text'])

    return meteor_score([tokenized_original], tokenized_rephrase)

openai_bertscore_df['meteor'] = meteor_df.apply(calculate_meteor, axis=1)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


In [None]:
print(f"Average METEOR score: {round(openai_bertscore_df['meteor'].mean(), 4)}")

Average METEOR score: 0.2596


## Cosine similarities for S5

In [None]:
openai_bertscore_df.reset_index(drop = True, inplace = True)

In [None]:
from sentence_transformers import SentenceTransformer, util
import numpy as np

model = SentenceTransformer('sentence-transformers/sentence-t5-base')



def spoiler_penalty_score(row):
  ''' Calculate the spoiler similarity scores for a given row in the dataframe. Used to calculate the spoiler Adjustment factor.
  '''
  original_embedding = model.encode(row['review_text'])
  rephrase_embedding = model.encode(row['rephrased_review'])
  synopsis_embedding = model.encode(row['plot_synopsis'])
  summary_embedding = model.encode(row['plot_summary'])

  original_synopsis_similarity = util.cos_sim(original_embedding, synopsis_embedding).numpy()[0][0]
  rephrase_synopsis_similarity = util.cos_sim(rephrase_embedding, synopsis_embedding).numpy()[0][0]
  original_summary_similarity = util.cos_sim(original_embedding, summary_embedding).numpy()[0][0]
  rephrase_summary_similarity = util.cos_sim(rephrase_embedding, summary_embedding).numpy()[0][0]
  synosum_similarity = util.cos_sim(synopsis_embedding, summary_embedding).numpy()[0][0]
  orireph_similarity = util.cos_sim(original_embedding, rephrase_embedding).numpy()[0][0]

  return  original_summary_similarity, rephrase_summary_similarity, original_synopsis_similarity, rephrase_synopsis_similarity, synosum_similarity, orireph_similarity

for row in range(len(openai_bertscore_df)):
  original_summary_similarity, rephrase_summary_similarity, original_synopsis_similarity, rephrase_synopsis_similarity, synosum_similarity,orireph_similarity = spoiler_penalty_score(openai_bertscore_df.loc[row,:])
  openai_bertscore_df.loc[row,['original_summary_score', 'rephrase_summary_score', 'original_synopsis_score', 'rephrase_synopsis_score', 'synosum_score', 'orireph_similarity']] = original_summary_similarity, rephrase_summary_similarity, original_synopsis_similarity, rephrase_synopsis_similarity, synosum_similarity, orireph_similarity





In [None]:
print(f"Average BERT score: {openai_bertscore_df['BertScore_F1'].mean()}")


# Llama Reviews - Evaluation

## Llama prepare

In [None]:
reviews = pd.read_csv(f"{path}/data/llama_reviews.csv").reset_index(drop=True)
reviews.dropna(inplace=True)
reviews.reset_index(inplace = True, drop = True)

In [None]:
import re

# Rows whose rephrasing contains a blank line; the patterns below target text around such separators
matched_rows = reviews[reviews['rephrased'].str.contains('\n\n')]
print(len(matched_rows))

# (pattern, flags) pairs, applied in order to strip LLM preambles such as "Here is ..." or "REVIEW:"
preamble_patterns = [
    (r"I\'ve rephrased.*?\n\n", re.DOTALL),
    (r"I\'ve rewritten.*?\n\n", re.DOTALL),
    (r"Here is.*?\n\n", re.DOTALL),
    (r"I removed\sany.*?\n\n", re.DOTALL),
    (r"(\"*\w*\'*)*\s*\:*\n\n", 0),
    (r"\"*REVIEW\'*\s*\:\s*", 0),
]
rephrased = reviews.loc[matched_rows.index, 'rephrased']
for pattern, flags in preamble_patterns:
    rephrased = rephrased.apply(lambda x: re.sub(pattern, '', x, flags=flags))
reviews.loc[matched_rows.index, 'rephrased'] = rephrased

72


In [None]:
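# .copy() makes this an independent DataFrame, so later column assignments do not trigger pandas' SettingWithCopyWarning
llama_bertscore_df = reviews[['plot_summary', 'plot_synopsis', 'rephrased', 'review_text']]
llama_bertscore_df = llama_bertscore_df.loc[:78,:].copy()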
llama_bertscore_df.shape


(79, 4)

In [None]:
llama_bertscore_df.rephrased.head()

0    The second Tom Clancy novel made into a film, ...
1    "Last Vegas is a comedy that features an ensem...
2    John Huston's genius as a director is undeniab...
3    Popular, but frustrated high school civics tea...
4    It's sad that such an underrated film, Hulk 20...
Name: rephrased, dtype: object

## BERTScore

In [None]:
from bert_score import score

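# Score all pairs in one call so the model is loaded once instead of once per row.
# Candidates are the rephrased reviews; references are the original reviews.
P, R, F1 = score(
    llama_bertscore_df['rephrased'].tolist(),
    llama_bertscore_df['review_text'].tolist(),
    lang='en',
    verbose=False,
)
llama_bertscore_df['BertScore_P'] = P.tolist()
llama_bertscore_df['BertScore_R'] = R.tolist()
llama_bertscore_df['BertScore_F1'] = F1.tolist()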

llama_bertscore_df.head()

In [None]:
print(f"Average precision: {round(llama_bertscore_df['BertScore_P'].mean(), 4)}")
print(f"Average recall: {round(llama_bertscore_df['BertScore_R'].mean(), 4)}")
print(f"Average F1: {round(llama_bertscore_df['BertScore_F1'].mean(), 4)}")

Average precision: 0.9203
Average recall: 0.8973
Average F1: 0.9084


## METEOR Score

In [None]:
meteor_df = reviews[['rephrased', 'review_text']]
meteor_df.head()

Unnamed: 0,rephrased,review_text
0,"The second Tom Clancy novel made into a film, ...",The second Tom Clancy novel made into a film (...
1,"""Last Vegas is a comedy that features an ensem...",Last Vegas is a comedy that features an ensemb...
2,John Huston's genius as a director is undeniab...,John Huston's genius as a director is undeniab...
3,"Popular, but frustrated high school civics tea...","Popular, but frustrated high school civics tea..."
4,"It's sad that such an underrated film, Hulk 20...",its sad that such an underrated film hulk 2003...


In [None]:
import pandas as pd
from nltk.translate.meteor_score import meteor_score
from nltk.tokenize import word_tokenize
import nltk

nltk.download('punkt')
nltk.download('wordnet')

def calculate_meteor(row):
    tokenized_rephrase = word_tokenize(row['rephrased'])
    tokenized_original = word_tokenize(row['review_text'])

    return meteor_score([tokenized_original], tokenized_rephrase)

llama_bertscore_df['meteor'] = meteor_df.apply(calculate_meteor, axis=1)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [None]:
print(f"Average METEOR score: {round(llama_bertscore_df['meteor'].mean(), 4)}")

Average METEOR score: 0.5208


## Cosine similarities for S5

In [None]:
from sentence_transformers import SentenceTransformer, util
import numpy as np

model = SentenceTransformer('sentence-transformers/sentence-t5-base')



def spoiler_penalty_score_sigm(row):
  ''' Calculate the spoiler similarity scores for a given row in the dataframe. Used to calculate the spoiler Adjustment factor.'''

  original_embedding = model.encode(row['review_text'])
  rephrase_embedding = model.encode(row['rephrased'])
  synopsis_embedding = model.encode(row['plot_synopsis'])
  summary_embedding = model.encode(row['plot_summary'])

  original_synopsis_similarity = util.cos_sim(original_embedding, synopsis_embedding).numpy()[0][0]
  rephrase_synopsis_similarity = util.cos_sim(rephrase_embedding, synopsis_embedding).numpy()[0][0]
  original_summary_similarity = util.cos_sim(original_embedding, summary_embedding).numpy()[0][0]
  rephrase_summary_similarity = util.cos_sim(rephrase_embedding, summary_embedding).numpy()[0][0]
  synosum_similarity = util.cos_sim(synopsis_embedding, summary_embedding).numpy()[0][0]
  orireph_similarity = util.cos_sim(original_embedding, rephrase_embedding).numpy()[0][0]

  return  original_summary_similarity, rephrase_summary_similarity, original_synopsis_similarity, rephrase_synopsis_similarity, synosum_similarity, orireph_similarity



for row in range(len(llama_bertscore_df)):
  original_summary_similarity, rephrase_summary_similarity, original_synopsis_similarity, rephrase_synopsis_similarity, synosum_similarity, orireph_similarity = spoiler_penalty_score_sigm(llama_bertscore_df.loc[row,:])
  llama_bertscore_df.loc[row,['original_summary_score', 'rephrase_summary_score', 'original_synopsis_score', 'rephrase_synopsis_score', 'synosum_score','orireph_similarity']] =  original_summary_similarity, rephrase_summary_similarity, original_synopsis_similarity, rephrase_synopsis_similarity, synosum_similarity,orireph_similarity




# Save data into external datasets

In [None]:
openai_bertscore_df.to_csv(f'{path}/data/openai_evaluation_final.csv', index=False)
llama_bertscore_df.to_csv(f'{path}/data/llama_evaluation_final.csv', index = False)

# LLMs Comparison

In [None]:
openai_bertscore_df = pd.read_csv(f'{path}/data/openai_evaluation_final.csv')
llama_bertscore_df = pd.read_csv(f'{path}/data/llama_evaluation_final.csv')

# We drop the last review: due to issues with OpenAI, its 80th rephrased_review was misplaced
openai_bertscore_df = openai_bertscore_df.loc[:78,:]
llama_bertscore_df = llama_bertscore_df.loc[:78,:]
openai_bertscore_df.shape

(79, 16)

In [None]:
openai_bertscore_df.columns

Index(['plot_summary', 'plot_synopsis', 'rephrased_review', 'review_text',
       'BertScore_P', 'BertScore_R', 'BertScore_F1', 'meteor',
       'spoiler_penalty', 'original_summary_score', 'rephrase_summary_score',
       'original_synopsis_score', 'rephrase_synopsis_score', 'synosum_score',
       'orireph_similarity', 'S5'],
      dtype='object')

In [None]:
# Relative drop in similarity with the synopsis when going from the original review to its rephrased version.
# A higher gain_over_syno means the rephrased review is less similar to the spoiler-containing synopsis than the original; this raises the adjustment factor.
gain_over_syno = (openai_bertscore_df.original_synopsis_score - openai_bertscore_df.rephrase_synopsis_score)/openai_bertscore_df.original_synopsis_score

# Relative rise in similarity with the summary when going from the original review to its rephrased version.
# A higher gain_over_sum means the rephrased review is more similar to the spoiler-free summary than the original; this raises the adjustment factor.
gain_over_sum = (openai_bertscore_df.rephrase_summary_score - openai_bertscore_df.original_summary_score)/openai_bertscore_df.original_summary_score

# Combine the two gains into the adjustment factor, then scale BERTScore F1 by it to obtain S5
openai_bertscore_df['adjustment_factor'] = gain_over_syno + gain_over_sum
openai_bertscore_df['S5'] = openai_bertscore_df['BertScore_F1'] * (1 + openai_bertscore_df['adjustment_factor'])

In [None]:
openai_bertscore_df[['BertScore_F1', 'S5', 'adjustment_factor', 'meteor']].describe()

Unnamed: 0,BertScore_F1,S5,adjustment_factor,meteor
count,79.0,79.0,79.0,79.0
mean,0.872728,0.873681,0.001061,0.2596
std,0.031614,0.033894,0.011564,0.155132
min,0.829782,0.813078,-0.030933,0.066647
25%,0.850186,0.847224,-0.003296,0.166751
50%,0.864654,0.870297,0.000402,0.224665
75%,0.888103,0.891355,0.004734,0.306165
max,0.990578,0.990342,0.042963,0.950046
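
The S5 computation can also be checked on a single review. The sketch below wraps the same arithmetic in a hypothetical helper, `s5_score`, and feeds it the rounded values printed for the OpenAI rephrasing of review 3 in the test case at the end of this notebook; because the inputs are rounded, the result agrees with the reported `S5` and `adjustment_factor` only up to rounding.

```python
def s5_score(bert_f1, original_synopsis, rephrase_synopsis,
             original_summary, rephrase_summary):
    """Scale BERTScore F1 by the spoiler adjustment factor (same arithmetic as the cell above)."""
    # Relative drop in similarity with the spoiler-containing synopsis
    gain_over_syno = (original_synopsis - rephrase_synopsis) / original_synopsis
    # Relative rise in similarity with the spoiler-free summary
    gain_over_sum = (rephrase_summary - original_summary) / original_summary
    adjustment_factor = gain_over_syno + gain_over_sum
    return bert_f1 * (1 + adjustment_factor), adjustment_factor

# Rounded values for the OpenAI rephrasing of review 3
s5, adjustment = s5_score(
    bert_f1=0.862933,
    original_synopsis=0.954373,
    rephrase_synopsis=0.941717,
    original_summary=0.906419,
    rephrase_summary=0.899777,
)
print(round(s5, 4), round(adjustment, 4))
```

Here the rephrasing moves away from the synopsis more than it moves away from the summary, so the adjustment factor is slightly positive and S5 ends up marginally above the plain F1.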


In [None]:
# We repeat the previous calculation on Llama's rephrased reviews
gain_over_syno = (llama_bertscore_df.original_synopsis_score- llama_bertscore_df.rephrase_synopsis_score)/llama_bertscore_df.original_synopsis_score
gain_over_sum = (llama_bertscore_df.rephrase_summary_score - llama_bertscore_df.original_summary_score)/llama_bertscore_df.original_summary_score

llama_bertscore_df['adjustment_factor'] = gain_over_syno + gain_over_sum
llama_bertscore_df['S5'] = llama_bertscore_df['BertScore_F1'] * ( 1 + llama_bertscore_df['adjustment_factor'])

In [None]:
llama_bertscore_df[['BertScore_F1', 'S5', 'adjustment_factor', 'meteor']].describe()

Unnamed: 0,BertScore_F1,S5,adjustment_factor,meteor
count,79.0,79.0,79.0,79.0
mean,0.908433,0.90845,6e-05,0.520772
std,0.042578,0.042494,0.008482,0.247964
min,0.794113,0.794554,-0.021955,0.029077
25%,0.877811,0.875364,-0.004964,0.343455
50%,0.909101,0.912643,-0.000332,0.492983
75%,0.937926,0.937676,0.003908,0.72056
max,0.994772,0.993995,0.025582,0.978428


In [None]:
# As a test case, we take the review whose Llama rephrasing has the highest similarity with the plot synopsis
idx = llama_bertscore_df['rephrase_synopsis_score'].idxmax()  # label-based, so it is safe to use with .loc
score_cols = ['BertScore_F1', 'S5', 'adjustment_factor', 'meteor', 'original_synopsis_score', 'rephrase_synopsis_score', 'original_summary_score', 'rephrase_summary_score']

print('CHAT')
print(openai_bertscore_df.loc[idx, score_cols])
print()
print('LLAMA')
print(llama_bertscore_df.loc[idx, score_cols])
print()
print('CHAT')
display(openai_bertscore_df.loc[idx, 'rephrased_review'])
print()
print('LLAMA')
display(llama_bertscore_df.loc[idx, 'rephrased'])
print()
print('ORIGINAL REVIEW')
display(llama_bertscore_df.loc[idx, 'review_text'])

CHAT
BertScore_F1               0.862933
S5                         0.868052
adjustment_factor          0.005932
meteor                     0.198712
original_synopsis_score    0.954373
rephrase_synopsis_score    0.941717
original_summary_score     0.906419
rephrase_summary_score     0.899777
Name: 3, dtype: object

LLAMA
BertScore_F1               0.908091
S5                         0.908199
adjustment_factor          0.000119
meteor                     0.467202
original_synopsis_score    0.954373
rephrase_synopsis_score    0.949229
original_summary_score     0.906419
rephrase_summary_score     0.901642
Name: 3, dtype: object

CHAT


"In this sharp and witty high school comedy, dedicated civics teacher Jim McAllister clashes with the determined and ambitious student Tracy Flick, played brilliantly by Reese Witherspoon. McAllister persuades affable athlete Paul Metzler to run for class president, leading to unexpected complications when Paul's spirited sister Tammy also enters the race. Director Alexander Payne and writer Jim Taylor craft a biting satire that cleverly mirrors broader societal issues through the lens of high school dynamics. The film skillfully tackles themes like ethics, politics, relationships, and the consequences of one's actions. The stellar cast, including standout performances from Matthew Broderick, Chris Klein, and Jessica Campbell, delivers a solid portrayal of the characters. With engaging cinematography and a lively score, this film is a delightful and thought-provoking watch."


LLAMA


'Popular, but frustrated high school civics teacher Jim McAllister (an excellent performance by Matthew Broderick) clashes with a driven and ambitious student, Tracy Flick (Reese Witherspoon in peak form), who\'s determined to become president of Carver High\'s student body. McAllister convinces a likable student, Paul Metzler (played with endearingly goofy charm by Chris Klein), to run for class president. Complications arise when Paul\'s sister Tammy (the adorable Jessica Campbell) decides to join the presidential race. Meanwhile, McAllister\'s personal life begins to unravel. Director/co-writer Alexander Payne and co-writer Jim Taylor craft a hilariously savage, cynical, and unsentimental satire on American society, using high school as a microcosm of the world at large. The film tackles a range of topics, including morals, politics, teen sexuality, and the consequences of one\'s actions. The cast delivers uniformly fine performances, with Broderick and Witherspoon engaging in a del


ORIGINAL REVIEW


"Popular, but frustrated high school civics teacher Jim McAllister (an excellent performance by Matthew Broderick) locks horns with ruthlessly driven and ambitious overachiever Tracy Flick (Reese Witherspoon in peak aggressively obnoxious form), who's determined to become president of Carver High's student body. McAllister convinces amiable dumb jock Paul Metzler (played with endearingly goofy charm by Chris Klein) to run for class president. Complications ensue when Paul's sassy lesbian sister Tammy (the adorable Jessica Campbell) decides to join the presidential race. Plus McAllister's personal life is starting to unravel. Director/co-writer Alexander Payne and co-writer Jim Taylor concoct a hilariously savage, cynical and unsentimental no-holds-barred satire on American society as a general whole which ingeniously uses high school as an apt microcosm of the world at large: we've got fiercely barbed commentary on such worthy topics as morals, politics, teen sexuality, marital infidel