<a href="https://colab.research.google.com/github/mdurgasrikari/INFO_5731_Group_3_Project/blob/main/INFO_5731_Group_3_Project_Code_Final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Import packages

In [None]:
!pip install bert-score
!pip install -U spacy
!pip install gensim

# Download spaCy model for word embeddings
!python -m spacy download en_core_web_md

import spacy
# Load spaCy model with word vectors
nlp = spacy.load('en_core_web_md')

Collecting bert-score
  Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)
Installing collected packages: bert-score
Successfully installed bert-score-0.3.13
Collecting en-core-web-md==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl (42.8 MB)
Installing collected packages: en-core-web-md
Successfully installed en-core-web-md-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kern

In [None]:
# Installing Gemini API using pip.

!pip install -q -U google-generativeai

# Import the required packages and define a helper that renders model output as Markdown

import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

### Set up your API key

Before we use the Gemini API, we must first obtain an API key. We created the key using Google AI Studio.

<a class="button button-primary" href="https://makersuite.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In [None]:
# Used to securely store the API key
from google.colab import userdata

# An API key was created with a Google account and stored as a Colab secret.
# Alternatively, use `os.getenv('GOOGLE_API_KEY')` to read it from an environment variable.

GOOGLE_API_KEY=userdata.get('gemini_key')

genai.configure(api_key=GOOGLE_API_KEY)

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `gemini_key`, which is the name the cell above reads with `userdata.get`.
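
The comment in the cell above mentions `os.getenv('GOOGLE_API_KEY')` as an alternative to Colab secrets. A minimal sketch of a lookup that tries Colab first and then falls back to the environment could look like the following. The helper name `load_api_key` and its parameters are illustrative and not part of the original notebook.

```python
import os


def load_api_key(secret_name="gemini_key", env_var="GOOGLE_API_KEY"):
    """Return an API key from Colab secrets if available, else from the environment.

    Hypothetical helper: the default names mirror the ones used in this notebook.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(secret_name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook.
        pass

    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: add a Colab secret named {secret_name!r} "
            f"or set the {env_var} environment variable."
        )
    return key
```

The result can then be passed to `genai.configure(api_key=load_api_key())`.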

## List models

Now we'll call the Gemini API. We will use `list_models` to see the available Gemini models:

* `gemini-pro`: optimized for text-only prompts.
* `gemini-pro-vision`: optimized for text-and-images prompts.

In [None]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro-vision-latest
models/gemini-1.5-pro-latest
models/gemini-pro
models/gemini-pro-vision


## Generate text from image and text inputs

Gemini provides a multimodal model (`gemini-pro-vision`) that accepts both text and images as inputs. The `GenerativeModel.generate_content` API is designed to handle multimodal prompts and returns a text output.
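
As a rough sketch of what a multimodal prompt looks like, the helper below sends a text instruction together with an image in a single list. The name `summarize_image` and the instruction wording are assumptions made for illustration; the notebook itself sends only the image.

```python
def summarize_image(model, image, instruction="Summarize this figure in a few sentences."):
    """Send a text instruction together with an image and return the reply text.

    `model` is expected to behave like genai.GenerativeModel('gemini-pro-vision')
    and `image` can be a PIL image. The default instruction is an assumption.
    """
    # A list combines text and image parts into one multimodal prompt.
    response = model.generate_content([instruction, image])
    # Mirror the notebook's check: only read .text when the response carries parts.
    if response.parts:
        return response.text
    return None
```

In the notebook this would be called as `summarize_image(genai.GenerativeModel('gemini-pro-vision'), Image.open('1.jpg'))`, where `1.jpg` stands for any local image file.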

In [None]:
# Main source code to generate AI summary for each image

import os
import pandas as pd
import zipfile
from PIL import Image
import google.generativeai as genai

def generate_image_summaries_to_dataframe(zip_file_path):
  """Generates summaries for images within a zip folder and creates a Pandas DataFrame.

  Args:
      zip_file_path (str): Path to the zip folder containing images.

  Returns:
      pd.DataFrame: DataFrame with columns 'Image Name' and 'AI generated Summary'.
  """

  data = []
  # Create the model once and reuse it for every image
  model = genai.GenerativeModel('gemini-pro-vision')
  with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    for image_file in zip_ref.namelist():
      if image_file.lower().endswith(('.jpg', '.jpeg', '.png')):
        # Extract image from zip
        zip_ref.extract(image_file)

        # Load image
        img = Image.open(image_file)

        # Generate a summary with the Gemini vision model
        response = model.generate_content(img)
        if response.parts:
          summary = response.text
        else:
          # Handle a response that carries no content parts
          summary = "Error: Invalid response from model"

        # Append data for DataFrame
        data.append({'Image Name': image_file, 'AI generated Summary': summary})

        # Delete extracted image
        os.remove(image_file)

  # Create DataFrame
  df = pd.DataFrame(data)
  return df

# Example usage
zip_file_path = 'images.zip'
# Replace with your zip file path
images_ex = generate_image_summaries_to_dataframe(zip_file_path)
images_ex.head(6)

Unnamed: 0,Image Name,AI generated Summary
0,1.jpg,The diagram shows the relationships between d...
1,10.jpg,The diagram shows the relationships between d...
2,100.png,This table shows the ablation study results. ...
3,101.png,"As can be seen from the table, when the numbe..."
4,102.png,The table shows the effectiveness of the prop...
5,103.jpg,The diagram shows different types of web anal...
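
The function above extracts each image to disk and deletes it afterwards. An alternative sketch, using only the standard library, reads each image straight from the archive into memory. The helper name `iter_image_files` is hypothetical.

```python
import io
import zipfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def iter_image_files(zip_file_path):
    """Yield (member_name, in_memory_file) for each image in the archive."""
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        for name in zip_ref.namelist():
            if name.lower().endswith(IMAGE_EXTENSIONS):
                # read() returns the member's bytes without writing anything to disk;
                # BytesIO wraps them in a file-like object.
                yield name, io.BytesIO(zip_ref.read(name))
```

Each yielded file object can be handed to `Image.open(...)` in place of a path, which removes the need for the extract and `os.remove` steps.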


In [None]:
# Read the author summary example file
author_summary_ex = pd.read_csv('/content/Authour_Summary_example.csv')
# Join the author summaries with the AI generated summaries on the row index
images_ex_1 = images_ex.reset_index()
final_data = images_ex_1.merge(author_summary_ex, on='index')
final_data.head(4)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member
0,0,1.jpg,The diagram shows the relationships between d...,Studies based on bibliometric analysis general...,Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1VuXbnOJISyoGF...,Fig. 1.Main process of bibliometric analysis.,Flowdiagram,Durga Srikari Maguluri
1,1,10.jpg,The diagram shows the relationships between d...,"In Fig. 3, the available extension options are...",Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1sf1PXV5FpvL7a...,Fig. 3. Diagram of extensions through identifi...,Diagram,Durga Srikari Maguluri
2,2,100.png,This table shows the ablation study results. ...,"As shown in Fig. 1, we take a two-step approac...",Information Processing & Management,Do you see what I see? Images of the COVID-19 ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1dDh19kgXigFOE...,Fig. 1. Overview of the data collection and an...,Flowdiagram,Durga Srikari Maguluri
3,3,101.png,"As can be seen from the table, when the numbe...",We first analyzed how similar our datasets are...,Information Processing & Management,Is my stance the same as your stance? A cross ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1Ww4A9RYbwTE4n...,Fig. 1. Similarity across dataset through bag-...,Diagram,Durga Srikari Maguluri
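
The merge above pairs rows by position, so it relies on the images coming out of the archive in the same order as the rows of the author-summary CSV. The image names in the output appear in lexicographic order (`1.jpg`, `10.jpg`, `100.png`), which differs from numeric order. Whether that matters depends on how the CSV was prepared, which the notebook does not show. If the two orders ever disagree, a sort key such as the hypothetical `natural_key` below is one way to align them.

```python
import re


def natural_key(file_name):
    """Split a name into text and integer chunks so '2.jpg' sorts before '10.jpg'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r'(\d+)', file_name)]


names = ['1.jpg', '10.jpg', '100.png', '2.jpg']
lexicographic = sorted(names)                 # string comparison, character by character
numeric = sorted(names, key=natural_key)      # digit runs compared as integers
print(lexicographic)
print(numeric)
```

A more robust option is to merge on the image name itself, provided the CSV carries a matching column.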


### Basic Cosine Similarity

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

def preprocess_text(text):
  """Preprocesses text for similarity comparison."""
  text = text.lower()
  text = ''.join([c for c in text if c.isalnum() or c.isspace()])  # Remove punctuation
  stop_words = stopwords.words('english')
  words = [w for w in text.split() if w not in stop_words]
  stemmer = PorterStemmer()
  stemmed_words = [stemmer.stem(w) for w in words]
  return stemmed_words

def calculate_similarity(summary1, summary2, metric='cosine'):
  """Calculates similarity between summaries using chosen metric."""
  if metric == 'cosine':
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([summary1, summary2])
    # TF-IDF rows are L2-normalized by default, so their dot product is the cosine similarity
    dense = vectors.toarray()
    return dense.dot(dense.T)[0, 1]
  elif metric == 'jaccard':
    summary1_words = set(preprocess_text(summary1))
    summary2_words = set(preprocess_text(summary2))
    intersection = len(summary1_words.intersection(summary2_words))
    union = len(summary1_words.union(summary2_words))
    return intersection / union if union else 0
  else:
    raise ValueError("Invalid metric. Choose 'cosine' or 'jaccard'.")

# Choose similarity metric (cosine or jaccard)
metric = 'cosine'

final_data['Cosine_Similarity Score'] = final_data.apply(lambda row: calculate_similarity(row['AI generated Summary'], row['Author Summary'], metric), axis=1)

final_data.head(5)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score
0,0,1.jpg,The diagram shows the relationships between d...,Studies based on bibliometric analysis general...,Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1VuXbnOJISyoGF...,Fig. 1.Main process of bibliometric analysis.,Flowdiagram,Durga Srikari Maguluri,0.346813
1,1,10.jpg,The diagram shows the relationships between d...,"In Fig. 3, the available extension options are...",Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1sf1PXV5FpvL7a...,Fig. 3. Diagram of extensions through identifi...,Diagram,Durga Srikari Maguluri,0.32848
2,2,100.png,This table shows the ablation study results. ...,"As shown in Fig. 1, we take a two-step approac...",Information Processing & Management,Do you see what I see? Images of the COVID-19 ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1dDh19kgXigFOE...,Fig. 1. Overview of the data collection and an...,Flowdiagram,Durga Srikari Maguluri,0.394983
3,3,101.png,"As can be seen from the table, when the numbe...",We first analyzed how similar our datasets are...,Information Processing & Management,Is my stance the same as your stance? A cross ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1Ww4A9RYbwTE4n...,Fig. 1. Similarity across dataset through bag-...,Diagram,Durga Srikari Maguluri,0.323956
4,4,102.png,The table shows the effectiveness of the prop...,"Information anxiety, being a cluster of negati...",Information Processing & Management,From information seeking to information avoida...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1qT4mcyKvdEhVd...,Fig. 3. Proposed Research Model based on the S...,Flow Diagram,Durga Srikari Maguluri,0.168728
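
To make the two metrics in `calculate_similarity` concrete, here is a small standard-library sketch. It uses raw term counts instead of TF-IDF weights, so its numbers will not match the column above. The function names are illustrative.

```python
import math
from collections import Counter


def cosine_from_counts(text_a, text_b):
    """Cosine similarity between raw term-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


a = "the table shows results"
b = "the table shows accuracy"
print(cosine_from_counts(a, b))  # 3 shared terms, both norms equal 2
print(jaccard(a, b))             # 3 shared terms out of 5 distinct terms
```

Cosine similarity rewards repeated shared terms, while Jaccard only looks at whether a term occurs at all.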


### Rouge-WE

In [None]:
import spacy
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def calculate_rouge_we(ai_summary, author_summary):
    """Cosine similarity between the mean word vectors of two summaries.

    This is an embedding-based approximation in the spirit of ROUGE-WE,
    not the n-gram matching procedure of the original metric.
    """
    # Tokenize both summaries and drop stop words
    ai_tokens = [token for token in nlp(ai_summary) if not token.is_stop]
    author_tokens = [token for token in nlp(author_summary) if not token.is_stop]

    # Without any content tokens there is nothing to average
    if not ai_tokens or not author_tokens:
        return 0.0

    # Average the word vectors of each summary into a single vector
    ai_vec = np.mean([token.vector for token in ai_tokens], axis=0).reshape(1, -1)
    author_vec = np.mean([token.vector for token in author_tokens], axis=0).reshape(1, -1)

    # Compute cosine similarity between the two averaged vectors
    return cosine_similarity(ai_vec, author_vec)[0][0]

# Apply calculate_rouge_we function to DataFrame
final_data['ROUGE-WE'] = final_data.apply(lambda row: calculate_rouge_we(row['AI generated Summary'], row['Author Summary']), axis=1)

# Display DataFrame with ROUGE-WE scores
final_data.head(6)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE
0,0,1.jpg,The diagram shows the relationships between d...,Studies based on bibliometric analysis general...,Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1VuXbnOJISyoGF...,Fig. 1.Main process of bibliometric analysis.,Flowdiagram,Durga Srikari Maguluri,0.346813,0.746439
1,1,10.jpg,The diagram shows the relationships between d...,"In Fig. 3, the available extension options are...",Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1sf1PXV5FpvL7a...,Fig. 3. Diagram of extensions through identifi...,Diagram,Durga Srikari Maguluri,0.32848,0.801316
2,2,100.png,This table shows the ablation study results. ...,"As shown in Fig. 1, we take a two-step approac...",Information Processing & Management,Do you see what I see? Images of the COVID-19 ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1dDh19kgXigFOE...,Fig. 1. Overview of the data collection and an...,Flowdiagram,Durga Srikari Maguluri,0.394983,0.777171
3,3,101.png,"As can be seen from the table, when the numbe...",We first analyzed how similar our datasets are...,Information Processing & Management,Is my stance the same as your stance? A cross ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1Ww4A9RYbwTE4n...,Fig. 1. Similarity across dataset through bag-...,Diagram,Durga Srikari Maguluri,0.323956,0.832563
4,4,102.png,The table shows the effectiveness of the prop...,"Information anxiety, being a cluster of negati...",Information Processing & Management,From information seeking to information avoida...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1qT4mcyKvdEhVd...,Fig. 3. Proposed Research Model based on the S...,Flow Diagram,Durga Srikari Maguluri,0.168728,0.796254
5,5,103.jpg,The diagram shows different types of web anal...,"Overall, our approach provides a comprehensive...",Information Processing & Management,Unveiling the dynamics of crisis events: Senti...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1K53xFNW4nb9zc...,Fig. 1. Workflow diagram of the proposed appro...,Flowchart,Durga Srikari Maguluri,0.02902,0.85882
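
The score above is the cosine similarity between the averaged word vectors of each summary. The sketch below reproduces that idea with a handful of invented 2-D vectors, so it runs without spaCy. `toy_vectors` and the function names are hypothetical and exist only for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two vectors, or 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical 2-D embeddings, invented for illustration only.
toy_vectors = {
    'table': [1.0, 0.0],
    'chart': [0.8, 0.6],
    'model': [0.0, 1.0],
}


def embedding_similarity(tokens_a, tokens_b, vectors=toy_vectors):
    """Average the known token vectors on each side, then compare the averages."""
    vecs_a = [vectors[t] for t in tokens_a if t in vectors]
    vecs_b = [vectors[t] for t in tokens_b if t in vectors]
    if not vecs_a or not vecs_b:
        return 0.0
    return cosine(mean_vector(vecs_a), mean_vector(vecs_b))


print(embedding_similarity(['table'], ['chart']))
print(embedding_similarity(['table', 'model'], ['chart']))
```

Averaging tends to pull vectors for different texts toward each other, which may be one reason the ROUGE-WE column stays high (for example in row 5) even where the TF-IDF cosine score is close to zero.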


### Bert Score

In [None]:
import pandas as pd
from bert_score import score

def calculate_bert_score(ai_summary, author_summary):
    # Compute BERTScore for the summaries
    P, R, F1 = score([ai_summary], [author_summary], lang='en', verbose=False)

    # Extract the F1 score (you can use other scores like P or R as needed)
    bert_score = F1.item()

    return bert_score

# Apply calculate_bert_score function to DataFrame
final_data['BertScore'] = final_data.apply(lambda row: calculate_bert_score(row['AI generated Summary'], row['Author Summary']), axis=1)

# Display DataFrame with BertScore values
final_data.head(5)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE,BertScore
0,0,1.jpg,The diagram shows the relationships between d...,Studies based on bibliometric analysis general...,Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1VuXbnOJISyoGF...,Fig. 1.Main process of bibliometric analysis.,Flowdiagram,Durga Srikari Maguluri,0.346813,0.746439,0.797696
1,1,10.jpg,The diagram shows the relationships between d...,"In Fig. 3, the available extension options are...",Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1sf1PXV5FpvL7a...,Fig. 3. Diagram of extensions through identifi...,Diagram,Durga Srikari Maguluri,0.32848,0.801316,0.80691
2,2,100.png,This table shows the ablation study results. ...,"As shown in Fig. 1, we take a two-step approac...",Information Processing & Management,Do you see what I see? Images of the COVID-19 ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1dDh19kgXigFOE...,Fig. 1. Overview of the data collection and an...,Flowdiagram,Durga Srikari Maguluri,0.394983,0.777171,0.798874
3,3,101.png,"As can be seen from the table, when the numbe...",We first analyzed how similar our datasets are...,Information Processing & Management,Is my stance the same as your stance? A cross ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1Ww4A9RYbwTE4n...,Fig. 1. Similarity across dataset through bag-...,Diagram,Durga Srikari Maguluri,0.323956,0.832563,0.814086
4,4,102.png,The table shows the effectiveness of the prop...,"Information anxiety, being a cluster of negati...",Information Processing & Management,From information seeking to information avoida...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1qT4mcyKvdEhVd...,Fig. 3. Proposed Research Model based on the S...,Flow Diagram,Durga Srikari Maguluri,0.168728,0.796254,0.813785


### Bleu Score

In [None]:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import pandas as pd

def calculate_bleu_score(ai_summary, author_summary):
    # Tokenize the summaries into lists of words
    ai_tokens = ai_summary.split()
    author_tokens = author_summary.split()

    # Check if both summaries are non-empty
    if not ai_tokens or not author_tokens:
        return 0.0  # Return zero BLEU score for empty summaries

    # Use NLTK's SmoothingFunction, which implements the smoothing techniques of Chen and Cherry (2014)
    smoothing = SmoothingFunction()
    # Calculate BLEU with unigram (1-gram) precision only, smoothed with method7
    bleu_score = sentence_bleu([author_tokens], ai_tokens, weights=(1,), smoothing_function=smoothing.method7)

    return bleu_score

# Apply calculate_bleu_score function to DataFrame
final_data['BLEU Score'] = final_data.apply(lambda row: calculate_bleu_score(row['AI generated Summary'], row['Author Summary']), axis=1)

# Display DataFrame with BLEU Score values
final_data.head(5)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE,BertScore,BLEU Score
0,0,1.jpg,The diagram shows the relationships between d...,Studies based on bibliometric analysis general...,Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1VuXbnOJISyoGF...,Fig. 1.Main process of bibliometric analysis.,Flowdiagram,Durga Srikari Maguluri,0.346813,0.746439,0.797696,0.243913
1,1,10.jpg,The diagram shows the relationships between d...,"In Fig. 3, the available extension options are...",Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1sf1PXV5FpvL7a...,Fig. 3. Diagram of extensions through identifi...,Diagram,Durga Srikari Maguluri,0.32848,0.801316,0.80691,0.459963
2,2,100.png,This table shows the ablation study results. ...,"As shown in Fig. 1, we take a two-step approac...",Information Processing & Management,Do you see what I see? Images of the COVID-19 ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1dDh19kgXigFOE...,Fig. 1. Overview of the data collection and an...,Flowdiagram,Durga Srikari Maguluri,0.394983,0.777171,0.798874,0.059817
3,3,101.png,"As can be seen from the table, when the numbe...",We first analyzed how similar our datasets are...,Information Processing & Management,Is my stance the same as your stance? A cross ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1Ww4A9RYbwTE4n...,Fig. 1. Similarity across dataset through bag-...,Diagram,Durga Srikari Maguluri,0.323956,0.832563,0.814086,0.002991
4,4,102.png,The table shows the effectiveness of the prop...,"Information anxiety, being a cluster of negati...",Information Processing & Management,From information seeking to information avoida...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1qT4mcyKvdEhVd...,Fig. 3. Proposed Research Model based on the S...,Flow Diagram,Durga Srikari Maguluri,0.168728,0.796254,0.813785,0.013769
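
With `weights=(1,)`, the BLEU score above reduces to clipped unigram precision multiplied by a brevity penalty, plus smoothing. The following standard-library sketch shows that core computation without smoothing, so its values will differ from the NLTK `method7` results in the table. The function name `unigram_bleu` is illustrative.

```python
import math
from collections import Counter


def unigram_bleu(candidate_tokens, reference_tokens):
    """Unsmoothed BLEU-1: clipped unigram precision times the brevity penalty."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    cand_counts = Counter(candidate_tokens)
    ref_counts = Counter(reference_tokens)
    # Each candidate word is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(candidate_tokens)
    c, r = len(candidate_tokens), len(reference_tokens)
    # Candidates shorter than the reference are penalized exponentially.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * precision


print(unigram_bleu("the the the".split(), "the cat".split()))
print(unigram_bleu("the cat".split(), "the cat sat on mat".split()))
```

The brevity penalty shrinks quickly when the candidate is much shorter than the reference, which may contribute to the very low BLEU values in some rows where the other metrics are moderate.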


### Meteor Score

In [None]:
import pandas as pd
import re

def calculate_meteor_score(generated_summary, reference_summary):
    """
    Calculates a METEOR-style overlap score between a generated summary and a reference summary.

    Note: this is a simplified heuristic based on the characters that words
    share. It is not the standard METEOR metric (see nltk.translate.meteor_score).

    Args:
        generated_summary (str): The generated summary.
        reference_summary (str): The reference summary.

    Returns:
        float: The overlap score between 0 and 1.
    """
    # Preprocess text by removing punctuation and converting to lowercase
    generated_summary = re.sub(r"[^\w\s]", "", generated_summary.lower())
    reference_summary = re.sub(r"[^\w\s]", "", reference_summary.lower())

    # Split the summaries into word lists
    generated_words = generated_summary.split()
    reference_words = reference_summary.split()

    # Avoid division by zero when the generated summary has no words
    if not generated_words:
        return 0.0

    # For each generated word, find its best character overlap with any reference word
    total_score = 0
    for generated_word in generated_words:
        max_overlap = 0
        for reference_word in reference_words:
            overlap = min(len(generated_word), len(reference_word)) - (
                len(generated_word) - len(set(generated_word).intersection(reference_word))
            )
            max_overlap = max(max_overlap, overlap)
        total_score += max_overlap / len(generated_word)

    # Average the per-word scores
    return total_score / len(generated_words)

# Add new column to store METEOR scores
final_data["METEOR Score"] = None

# Calculate METEOR score for each pair of summaries
for index, row in final_data.iterrows():
    generated_summary = row["AI generated Summary"]
    reference_summary = row["Author Summary"]
    meteor_score_value = calculate_meteor_score(generated_summary, reference_summary)
    final_data.at[index, "METEOR Score"] = meteor_score_value


# Print the dataframe with the new column
final_data.head(4)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE,BertScore,BLEU Score,METEOR Score
0,0,1.jpg,The diagram shows the relationships between d...,Studies based on bibliometric analysis general...,Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1VuXbnOJISyoGF...,Fig. 1.Main process of bibliometric analysis.,Flowdiagram,Durga Srikari Maguluri,0.346813,0.746439,0.797696,0.243913,0.84486
1,1,10.jpg,The diagram shows the relationships between d...,"In Fig. 3, the available extension options are...",Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1sf1PXV5FpvL7a...,Fig. 3. Diagram of extensions through identifi...,Diagram,Durga Srikari Maguluri,0.32848,0.801316,0.80691,0.459963,0.791617
2,2,100.png,This table shows the ablation study results. ...,"As shown in Fig. 1, we take a two-step approac...",Information Processing & Management,Do you see what I see? Images of the COVID-19 ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1dDh19kgXigFOE...,Fig. 1. Overview of the data collection and an...,Flowdiagram,Durga Srikari Maguluri,0.394983,0.777171,0.798874,0.059817,0.831614
3,3,101.png,"As can be seen from the table, when the numbe...",We first analyzed how similar our datasets are...,Information Processing & Management,Is my stance the same as your stance? A cross ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1Ww4A9RYbwTE4n...,Fig. 1. Similarity across dataset through bag-...,Diagram,Durga Srikari Maguluri,0.323956,0.832563,0.814086,0.002991,0.907764
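
The `calculate_meteor_score` function above scores words by the characters they share, which differs from the standard METEOR definition. For comparison, here is a hedged sketch of the core of the original metric: a harmonic mean of unigram precision and recall that weights recall nine times as heavily. It uses exact word matches only and leaves out METEOR's stemming and synonym stages and its fragmentation penalty. The name `meteor_fmean` is illustrative.

```python
from collections import Counter


def meteor_fmean(candidate_tokens, reference_tokens):
    """Recall-weighted harmonic mean from exact unigram matches (no stemming, synonyms, or penalty)."""
    cand_counts = Counter(candidate_tokens)
    ref_counts = Counter(reference_tokens)
    # Count each candidate word at most as often as it appears in the reference.
    matches = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    if matches == 0:
        return 0.0
    precision = matches / len(candidate_tokens)
    recall = matches / len(reference_tokens)
    # Original METEOR weighting: F-mean = 10PR / (R + 9P).
    return 10 * precision * recall / (recall + 9 * precision)


print(meteor_fmean("the cat sat".split(), "the cat sat on the mat".split()))
```

For a full implementation, NLTK provides `nltk.translate.meteor_score.meteor_score`, which expects pre-tokenized input and the WordNet corpus.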


In [None]:
# Convert the five score columns into percentage strings with two decimal places
score_columns = ['Cosine_Similarity Score', 'ROUGE-WE', 'BertScore', 'BLEU Score', 'METEOR Score']
for column in score_columns:
    final_data[column] = final_data[column].map(lambda x: '{:.2f}%'.format(x * 100))

# Display the updated DataFrame
final_data.head(4)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE,BertScore,BLEU Score,METEOR Score
0,0,1.jpg,The diagram shows the relationships between d...,Studies based on bibliometric analysis general...,Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1VuXbnOJISyoGF...,Fig. 1.Main process of bibliometric analysis.,Flowdiagram,Durga Srikari Maguluri,34.68%,74.64%,79.77%,24.39%,84.49%
1,1,10.jpg,The diagram shows the relationships between d...,"In Fig. 3, the available extension options are...",Information Processing & Management,New trends in bibliometric APIs: A comparative...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1sf1PXV5FpvL7a...,Fig. 3. Diagram of extensions through identifi...,Diagram,Durga Srikari Maguluri,32.85%,80.13%,80.69%,46.00%,79.16%
2,2,100.png,This table shows the ablation study results. ...,"As shown in Fig. 1, we take a two-step approac...",Information Processing & Management,Do you see what I see? Images of the COVID-19 ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1dDh19kgXigFOE...,Fig. 1. Overview of the data collection and an...,Flowdiagram,Durga Srikari Maguluri,39.50%,77.72%,79.89%,5.98%,83.16%
3,3,101.png,"As can be seen from the table, when the numbe...",We first analyzed how similar our datasets are...,Information Processing & Management,Is my stance the same as your stance? A cross ...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1Ww4A9RYbwTE4n...,Fig. 1. Similarity across dataset through bag-...,Diagram,Durga Srikari Maguluri,32.40%,83.26%,81.41%,0.30%,90.78%
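
After this step the score columns hold strings rather than numbers, so operations such as averaging or sorting by score no longer work on them directly. The sketch below shows the formatting and a way to parse the values back. The helper names are hypothetical.

```python
def to_percent(value, decimals=2):
    """Format a 0-1 score as a percentage string."""
    return f"{value * 100:.{decimals}f}%"


def from_percent(text):
    """Parse a percentage string such as '34.68%' back into a 0-1 float."""
    return float(text.rstrip('%')) / 100


print(to_percent(0.346813))
print(from_percent('34.68%'))
```

Keeping the numeric columns and formatting only for display is a common alternative when further analysis is planned.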


In [None]:
# Save the DataFrame to a CSV file
final_data.to_csv('journal_images_summary_200_durga_srikari.csv', index=False)

### Table with Statistics vs. Table with Text Summary

In [None]:
# Main source code to generate AI summary for each image

import os
import pandas as pd
import zipfile
from PIL import Image
import google.generativeai as genai

def generate_image_summaries_to_dataframe(zip_file_path):
  """Generates summaries for images within a zip folder and creates a Pandas DataFrame.

  Args:
      zip_file_path (str): Path to the zip folder containing images.

  Returns:
      pd.DataFrame: DataFrame with columns 'Image Name' and 'AI generated Summary'.
  """

  data = []
  # Create the model once and reuse it for every image
  model = genai.GenerativeModel('gemini-pro-vision')
  with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    for image_file in zip_ref.namelist():
      if image_file.lower().endswith(('.jpg', '.jpeg', '.png')):
        # Extract image from zip
        zip_ref.extract(image_file)

        # Load image
        img = Image.open(image_file)

        # Generate a summary with the Gemini vision model
        response = model.generate_content(img)
        if response.parts:
          summary = response.text
        else:
          # Handle a response that carries no content parts
          summary = "Error: Invalid response from model"

        # Append data for DataFrame
        data.append({'Image Name': image_file, 'AI generated Summary': summary})

        # Delete extracted image
        os.remove(image_file)

  # Create DataFrame
  df = pd.DataFrame(data)
  return df

# Example usage
zip_file_path = 'stats_text_images.zip'
# Replace with your zip file path
text_stat_comp = generate_image_summaries_to_dataframe(zip_file_path)
text_stat_comp.head(6)

Unnamed: 0,Image Name,AI generated Summary
0,1.jpg,The table shows the results of a sentiment an...
1,10.png,The table above shows the different emotions ...
2,100.png,The table shows the results of different meth...
3,101.png,The table shows the performance of different ...
4,102.png,| Dataset | Type | Instances | Dimensions | C...
5,103.png,| Variable|BOC|COM|CVI|IVL|KA|KC|MEV|PSI|REF|...


In [None]:
# Read the author summary file for the statistics vs. text comparison
Stats_text_comp = pd.read_csv('Stats_text_comp.csv')

# Join the author summaries with the AI generated summaries on the row index
text_stat_comp_1 = text_stat_comp.reset_index()
Stats_text_comp_final = text_stat_comp_1.merge(Stats_text_comp, on='index')
Stats_text_comp_final.head(4)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member
0,0,1.jpg,The table shows the results of a sentiment an...,We observe that the model exhibits the capabil...,Information Processing & Management,Unveiling the dynamics of crisis events: Senti...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1vpPr6w67B07k_...,Table 9. Visualization of attention words from...,Table,Durga Srikari Maguluri
1,1,10.png,The table above shows the different emotions ...,"For the first two groups, participants were as...",Information Processing & Management,The rationality of explanation or human capaci...,The rationality of explanation or human capaci...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Table 3–1. Method for hypotheses testing.,Table,Durga Srikari Maguluri
2,2,100.png,The table shows the results of different meth...,To distinguish the data analysis capacity amon...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Fig. 3–4. : A screenshot of the sales plot.,Diagram,Durga Srikari Maguluri
3,3,101.png,The table shows the performance of different ...,Before the data analysis for checking the effe...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1ferxa9PrgMIkT...,"Table 3–2. Differences in trust, reliance, tim...",Table,Durga Srikari Maguluri


### Basic Cosine Similarity

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

def preprocess_text(text):
  """Lowercases, strips punctuation, removes stop words, and stems; returns a list of tokens."""
  text = text.lower()
  text = ''.join([c for c in text if c.isalnum() or c.isspace()])  # Remove punctuation
  stop_words = set(stopwords.words('english'))  # A set gives fast membership tests
  words = [w for w in text.split() if w not in stop_words]
  stemmer = PorterStemmer()
  stemmed_words = [stemmer.stem(w) for w in words]
  return stemmed_words

def calculate_similarity(summary1, summary2, metric='cosine'):
  """Calculates similarity between summaries using chosen metric."""
  if metric == 'cosine':
    # TfidfVectorizer L2-normalizes each row by default, so the dot product
    # of the two rows is their cosine similarity. This branch works on the
    # raw text; preprocess_text is only applied in the Jaccard branch.
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([summary1, summary2]).toarray()
    return vectors.dot(vectors.T)[0, 1]
  elif metric == 'jaccard':
    summary1_words = set(preprocess_text(summary1))
    summary2_words = set(preprocess_text(summary2))
    intersection = len(summary1_words.intersection(summary2_words))
    union = len(summary1_words.union(summary2_words))
    return intersection / union if union else 0
  else:
    raise ValueError("Invalid metric. Choose 'cosine' or 'jaccard'.")

# Choose similarity metric (cosine or jaccard)
metric = 'cosine'

Stats_text_comp_final['Cosine_Similarity Score'] = Stats_text_comp_final.apply(lambda row: calculate_similarity(row['AI generated Summary'], row['Author Summary'], metric), axis=1)

Stats_text_comp_final.head(5)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score
0,0,1.jpg,The table shows the results of a sentiment an...,We observe that the model exhibits the capabil...,Information Processing & Management,Unveiling the dynamics of crisis events: Senti...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1vpPr6w67B07k_...,Table 9. Visualization of attention words from...,Table,Durga Srikari Maguluri,0.633946
1,1,10.png,The table above shows the different emotions ...,"For the first two groups, participants were as...",Information Processing & Management,The rationality of explanation or human capaci...,The rationality of explanation or human capaci...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Table 3–1. Method for hypotheses testing.,Table,Durga Srikari Maguluri,0.219453
2,2,100.png,The table shows the results of different meth...,To distinguish the data analysis capacity amon...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Fig. 3–4. : A screenshot of the sales plot.,Diagram,Durga Srikari Maguluri,0.359896
3,3,101.png,The table shows the performance of different ...,Before the data analysis for checking the effe...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1ferxa9PrgMIkT...,"Table 3–2. Differences in trust, reliance, tim...",Table,Durga Srikari Maguluri,0.294083
4,4,102.png,| Dataset | Type | Instances | Dimensions | C...,"is a matrix. , , , and denote its th row, ()t...",Information Processing & Management,Adaptive orthogonal semi-supervised feature se...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1YkTXX9ZpDD7nT...,Table 2 shows the notations used in this paper,Table,Durga Srikari Maguluri,0.0
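
To make the two metrics in `calculate_similarity` concrete, here is a minimal standard-library sketch. It is an assumption-laden simplification: it uses raw term counts rather than TF-IDF weights, and skips stop-word removal and stemming, so its numbers will not match the column above. The helper names `tokenize`, `count_cosine`, and `jaccard` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace."""
    cleaned = ''.join(c for c in text.lower() if c.isalnum() or c.isspace())
    return cleaned.split()

def count_cosine(text_a, text_b):
    """Cosine similarity of raw term-count vectors (not TF-IDF weighted)."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokenize(text_a)), set(tokenize(text_b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

a = "The table shows accuracy."
b = "The table shows recall."
print(count_cosine(a, b))   # 3 shared terms, each vector has norm 2 -> 0.75
print(jaccard(a, b))        # 3 shared terms out of 5 distinct -> 0.6
print(count_cosine("alpha beta", "gamma delta"))  # no shared terms -> 0.0
```

The last call illustrates why a row such as 102.png can score exactly 0.0: a lexical cosine is zero whenever the two texts share no terms, regardless of meaning.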


### Rouge-WE

In [None]:
import spacy
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def calculate_rouge_we(ai_summary, author_summary):
    """Approximates ROUGE-WE as the cosine similarity of mean word vectors.

    Note: this compares one averaged spaCy vector per summary. It does not
    perform the n-gram level soft matching of the published ROUGE-WE metric.
    """
    # Tokenize both summaries and drop stop words
    ai_tokens = [token for token in nlp(ai_summary) if not token.is_stop]
    author_tokens = [token for token in nlp(author_summary) if not token.is_stop]

    # Guard against summaries with no remaining tokens (np.mean would return NaN)
    if not ai_tokens or not author_tokens:
        return 0.0

    # Average the token vectors of each summary into a single row vector
    ai_vec = np.mean([token.vector for token in ai_tokens], axis=0).reshape(1, -1)
    author_vec = np.mean([token.vector for token in author_tokens], axis=0).reshape(1, -1)

    # Cosine similarity lies in [-1, 1], so slightly negative scores are possible
    similarity_score = cosine_similarity(ai_vec, author_vec)[0][0]

    return similarity_score

# Apply calculate_rouge_we function to DataFrame
Stats_text_comp_final['ROUGE-WE'] = Stats_text_comp_final.apply(lambda row: calculate_rouge_we(row['AI generated Summary'], row['Author Summary']), axis=1)

# Display DataFrame with ROUGE-WE scores
Stats_text_comp_final.head(6)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE
0,0,1.jpg,The table shows the results of a sentiment an...,We observe that the model exhibits the capabil...,Information Processing & Management,Unveiling the dynamics of crisis events: Senti...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1vpPr6w67B07k_...,Table 9. Visualization of attention words from...,Table,Durga Srikari Maguluri,0.633946,0.738772
1,1,10.png,The table above shows the different emotions ...,"For the first two groups, participants were as...",Information Processing & Management,The rationality of explanation or human capaci...,The rationality of explanation or human capaci...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Table 3–1. Method for hypotheses testing.,Table,Durga Srikari Maguluri,0.219453,0.518655
2,2,100.png,The table shows the results of different meth...,To distinguish the data analysis capacity amon...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Fig. 3–4. : A screenshot of the sales plot.,Diagram,Durga Srikari Maguluri,0.359896,0.747077
3,3,101.png,The table shows the performance of different ...,Before the data analysis for checking the effe...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1ferxa9PrgMIkT...,"Table 3–2. Differences in trust, reliance, tim...",Table,Durga Srikari Maguluri,0.294083,0.803718
4,4,102.png,| Dataset | Type | Instances | Dimensions | C...,"is a matrix. , , , and denote its th row, ()t...",Information Processing & Management,Adaptive orthogonal semi-supervised feature se...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1YkTXX9ZpDD7nT...,Table 2 shows the notations used in this paper,Table,Durga Srikari Maguluri,0.0,-0.082298
5,5,103.png,| Variable|BOC|COM|CVI|IVL|KA|KC|MEV|PSI|REF|...,Step A: Set the uncertainty interval of the pr...,Information Processing & Management,Inconsistency elimination of multi-source info...,Inconsistency elimination of multi-source info...,https://drive.google.com/file/d/1v5TPVhG7kLOzO...,Algorithm 2. The CoP indicator calculation alg...,Table,Durga Srikari Maguluri,0.0,0.116254
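
The score above is the cosine between one averaged vector per summary, which is why it can be negative (as for 102.png). The following is a minimal sketch of that computation using plain lists; the three-dimensional `TOY_VECTORS` table and the helper names are hypothetical stand-ins for the spaCy vectors, chosen only so the arithmetic can be checked by hand.

```python
import math

# Hypothetical 3-dimensional toy embeddings, for illustration only
TOY_VECTORS = {
    'table': [1.0, 0.0, 0.0],
    'results': [0.0, 1.0, 0.0],
    'matrix': [-1.0, 0.0, 0.0],
}

def mean_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]

def mean_vector_cosine(tokens_a, tokens_b, vectors):
    """Cosine similarity between the mean vectors of two token lists."""
    vec_a = mean_vector(tokens_a, vectors)
    vec_b = mean_vector(tokens_b, vectors)
    if vec_a is None or vec_b is None:
        return 0.0  # nothing to compare
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(mean_vector_cosine(['table', 'results'], ['table'], TOY_VECTORS))
print(mean_vector_cosine(['table'], ['matrix'], TOY_VECTORS))  # opposite vectors
print(mean_vector_cosine(['table'], [], TOY_VECTORS))          # empty input
```

Because averaging discards word order and per-word alignment, two summaries with different content but similar overall vocabulary can still score high under this measure.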


### Bert Score

In [None]:
import pandas as pd
from bert_score import score

def calculate_bert_score(ai_summary, author_summary):
    # Compute BERTScore for the summaries
    P, R, F1 = score([ai_summary], [author_summary], lang='en', verbose=False)

    # Extract the F1 score (you can use other scores like P or R as needed)
    bert_score = F1.item()

    return bert_score

# Apply calculate_bert_score function to DataFrame
Stats_text_comp_final['BertScore'] = Stats_text_comp_final.apply(lambda row: calculate_bert_score(row['AI generated Summary'], row['Author Summary']), axis=1)

# Display DataFrame with BertScore values
Stats_text_comp_final.head(5)

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE,BertScore
0,0,1.jpg,The table shows the results of a sentiment an...,We observe that the model exhibits the capabil...,Information Processing & Management,Unveiling the dynamics of crisis events: Senti...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1vpPr6w67B07k_...,Table 9. Visualization of attention words from...,Table,Durga Srikari Maguluri,0.633946,0.738772,0.843621
1,1,10.png,The table above shows the different emotions ...,"For the first two groups, participants were as...",Information Processing & Management,The rationality of explanation or human capaci...,The rationality of explanation or human capaci...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Table 3–1. Method for hypotheses testing.,Table,Durga Srikari Maguluri,0.219453,0.518655,0.810113
2,2,100.png,The table shows the results of different meth...,To distinguish the data analysis capacity amon...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Fig. 3–4. : A screenshot of the sales plot.,Diagram,Durga Srikari Maguluri,0.359896,0.747077,0.814545
3,3,101.png,The table shows the performance of different ...,Before the data analysis for checking the effe...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1ferxa9PrgMIkT...,"Table 3–2. Differences in trust, reliance, tim...",Table,Durga Srikari Maguluri,0.294083,0.803718,0.814818
4,4,102.png,| Dataset | Type | Instances | Dimensions | C...,"is a matrix. , , , and denote its th row, ()t...",Information Processing & Management,Adaptive orthogonal semi-supervised feature se...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1YkTXX9ZpDD7nT...,Table 2 shows the notations used in this paper,Table,Durga Srikari Maguluri,0.0,-0.082298,0.751198


### Bleu

In [None]:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import pandas as pd

def calculate_bleu_score(ai_summary, author_summary):
    # Tokenize the summaries into lists of words
    ai_tokens = ai_summary.split()
    author_tokens = author_summary.split()

    # Check if both summaries are non-empty
    if not ai_tokens or not author_tokens:
        return 0.0  # Return zero BLEU score for empty summaries

    # Use SmoothingFunction with Chen-Cherry method for BLEU score calculation
    smoothing = SmoothingFunction()
    # Calculate BLEU score with unigram (1-gram) precision, and Chen-Cherry smoothing
    bleu_score = sentence_bleu([author_tokens], ai_tokens, weights=(1,), smoothing_function=smoothing.method7)

    return bleu_score

# Apply calculate_bleu_score function to DataFrame
Stats_text_comp_final['BLEU Score'] = Stats_text_comp_final.apply(lambda row: calculate_bleu_score(row['AI generated Summary'], row['Author Summary']), axis=1)

# Display DataFrame with BLEU Score values
Stats_text_comp_final.head(5)

Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE,BertScore,BLEU Score
0,0,1.jpg,The table shows the results of a sentiment an...,We observe that the model exhibits the capabil...,Information Processing & Management,Unveiling the dynamics of crisis events: Senti...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1vpPr6w67B07k_...,Table 9. Visualization of attention words from...,Table,Durga Srikari Maguluri,0.633946,0.738772,0.843621,0.479053
1,1,10.png,The table above shows the different emotions ...,"For the first two groups, participants were as...",Information Processing & Management,The rationality of explanation or human capaci...,The rationality of explanation or human capaci...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Table 3–1. Method for hypotheses testing.,Table,Durga Srikari Maguluri,0.219453,0.518655,0.810113,0.053154
2,2,100.png,The table shows the results of different meth...,To distinguish the data analysis capacity amon...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Fig. 3–4. : A screenshot of the sales plot.,Diagram,Durga Srikari Maguluri,0.359896,0.747077,0.814545,0.003248
3,3,101.png,The table shows the performance of different ...,Before the data analysis for checking the effe...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1ferxa9PrgMIkT...,"Table 3–2. Differences in trust, reliance, tim...",Table,Durga Srikari Maguluri,0.294083,0.803718,0.814818,0.455285
4,4,102.png,| Dataset | Type | Instances | Dimensions | C...,"is a matrix. , , , and denote its th row, ()t...",Information Processing & Management,Adaptive orthogonal semi-supervised feature se...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1YkTXX9ZpDD7nT...,Table 2 shows the notations used in this paper,Table,Durga Srikari Maguluri,0.0,-0.082298,0.751198,0.0
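
For intuition about what the unigram BLEU call measures, here is a minimal sketch of BLEU-1 without smoothing: clipped unigram precision multiplied by the brevity penalty. It is not the NLTK implementation, and because the notebook applies `method7` smoothing, its values will differ from the column above. The function name `bleu1` is hypothetical.

```python
import math
from collections import Counter

def bleu1(candidate_tokens, reference_tokens):
    """Unsmoothed BLEU-1: clipped unigram precision times the brevity penalty."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    cand_counts = Counter(candidate_tokens)
    ref_counts = Counter(reference_tokens)
    # Each candidate word is credited at most as often as it occurs in the reference
    clipped = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    precision = clipped / len(candidate_tokens)
    c, r = len(candidate_tokens), len(reference_tokens)
    # Candidates shorter than the reference are penalized
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * precision

reference = "the table shows results".split()
print(bleu1("the the the table".split(), reference))  # clipping: 2 of 4 words credited
print(bleu1("the table".split(), reference))          # perfect precision, but too short
```

The first call shows clipping (repeating "the" earns no extra credit), and the second shows the brevity penalty reducing a short candidate that would otherwise have a precision of 1.0.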


### Meteor Score

In [None]:
import pandas as pd
import re

def calculate_meteor_score(generated_summary, reference_summary):
    """
    Calculates a METEOR-style overlap score between a generated summary and a reference summary.

    Note: this is a custom character-overlap heuristic computed word by word,
    not the standard METEOR metric (no stemming, synonym matching, or
    fragmentation penalty).

    Args:
        generated_summary (str): The generated summary.
        reference_summary (str): The reference summary.

    Returns:
        float: The averaged overlap score between 0 and 1.
    """
    # Preprocess text by removing punctuation and converting to lowercase
    generated_summary = re.sub(r"[^\w\s]", "", generated_summary.lower())
    reference_summary = re.sub(r"[^\w\s]", "", reference_summary.lower())

    # Split summaries into word lists
    generated_words = generated_summary.split()
    reference_words = reference_summary.split()

    # Avoid division by zero when the generated summary has no words
    if not generated_words:
        return 0.0

    # For each generated word, find its best character overlap with any reference word
    total_score = 0
    for generated_word in generated_words:
        max_overlap = 0
        for reference_word in reference_words:
            shared_chars = len(set(generated_word).intersection(reference_word))
            overlap = min(len(generated_word), len(reference_word)) - (
                len(generated_word) - shared_chars
            )
            max_overlap = max(max_overlap, overlap)
        total_score += max_overlap / len(generated_word)

    # Average the per-word scores across the generated summary
    return total_score / len(generated_words)

# Add new column to store METEOR scores
Stats_text_comp_final["METEOR Score"] = None

# Calculate METEOR score for each pair of summaries
for index, row in Stats_text_comp_final.iterrows():
    generated_summary = row["AI generated Summary"]
    reference_summary = row["Author Summary"]
    meteor_score_value = calculate_meteor_score(generated_summary, reference_summary)
    Stats_text_comp_final.at[index, "METEOR Score"] = meteor_score_value


# Print the dataframe with the new column
Stats_text_comp_final.head(4)


Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.




Unnamed: 0,index,Image Name,AI generated Summary,Author Summary,Journal,Research Paper Name,Research Paper Link,Image Link,Image Caption,Diagram(Flowdiagram/ Table),Team Member,Cosine_Similarity Score,ROUGE-WE,BertScore,BLEU Score,METEOR Score
0,0,1.jpg,The table shows the results of a sentiment an...,We observe that the model exhibits the capabil...,Information Processing & Management,Unveiling the dynamics of crisis events: Senti...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1vpPr6w67B07k_...,Table 9. Visualization of attention words from...,Table,Durga Srikari Maguluri,0.633946,0.738772,0.843621,0.479053,0.871839
1,1,10.png,The table above shows the different emotions ...,"For the first two groups, participants were as...",Information Processing & Management,The rationality of explanation or human capaci...,The rationality of explanation or human capaci...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Table 3–1. Method for hypotheses testing.,Table,Durga Srikari Maguluri,0.219453,0.518655,0.810113,0.053154,0.82769
2,2,100.png,The table shows the results of different meth...,To distinguish the data analysis capacity amon...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1UXungZ6VgBgoL...,Fig. 3–4. : A screenshot of the sales plot.,Diagram,Durga Srikari Maguluri,0.359896,0.747077,0.814545,0.003248,0.7967
3,3,101.png,The table shows the performance of different ...,Before the data analysis for checking the effe...,Information Processing & Management,The rationality of explanation or human capaci...,https://www.sciencedirect.com/science/article/...,https://drive.google.com/file/d/1ferxa9PrgMIkT...,"Table 3–2. Differences in trust, reliance, tim...",Table,Durga Srikari Maguluri,0.294083,0.803718,0.814818,0.455285,0.665332
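
The function above scores character overlap between individual words, so its values are not comparable to published METEOR results. For reference, the following is a minimal sketch of the core of the original METEOR formulation: unigram precision and recall combined into a recall-weighted harmonic mean, `Fmean = 10PR / (R + 9P)`. It is deliberately simplified: it assumes exact word matches only and omits stemming, synonym matching, and the fragmentation penalty. The function name `meteor_fmean` is hypothetical.

```python
from collections import Counter

def meteor_fmean(generated_tokens, reference_tokens):
    """Recall-weighted harmonic mean of unigram precision and recall.

    Simplified: exact matches only, no fragmentation penalty.
    """
    if not generated_tokens or not reference_tokens:
        return 0.0
    gen_counts = Counter(generated_tokens)
    ref_counts = Counter(reference_tokens)
    # Number of unigrams that can be matched one-to-one between the two texts
    matches = sum(min(count, ref_counts[tok]) for tok, count in gen_counts.items())
    if matches == 0:
        return 0.0
    precision = matches / len(generated_tokens)
    recall = matches / len(reference_tokens)
    return 10 * precision * recall / (recall + 9 * precision)

generated = "the table shows results".split()
reference = "the table shows the main results".split()
# 4 matches: P = 4/4, R = 4/6, Fmean = 20/29
print(meteor_fmean(generated, reference))
```

A full implementation such as `nltk.translate.meteor_score` additionally aligns stems and WordNet synonyms and penalizes fragmented matches, which this sketch does not attempt.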


In [None]:
# Save the DataFrame to a CSV file
Stats_text_comp_final.to_csv('tables_stats_comparision_115_durga_srikari.csv', index=False)
