# 1-Installation

In [414]:
!pip install bert-score



In [415]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Change the current directory to the specified folder
folder_path = '/content/drive/My Drive/Colab Notebooks/BERTScore'
os.chdir(folder_path)

# Verify the current directory
print(f"Current working directory: {os.getcwd()}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Current working directory: /content/drive/My Drive/Colab Notebooks/BERTScore


# 1-Task
Merge the text files located in the "My Drive/Colab Notebooks/BERTScore/CS1/I1/PRI", "My Drive/Colab Notebooks/BERTScore/CS1/I1/RIDA", "My Drive/Colab Notebooks/BERTScore/CS1/I2/PRI", and "My Drive/Colab Notebooks/BERTScore/CS1/I2/RIDA" folders, keeping track of the original folder for each merged section.

## Define base path and subfolders

### Subtask:
Define the base path to the BERTScore folder and the names of the case study and iteration subfolders.


**3-Reasoning**:
Define the base path and the case study, iteration, and subfolder names as instructed.



In [416]:
base_path = '/content/drive/My Drive/Colab Notebooks/BERTScore'
cs_folder = 'CS1'
iterations = ['I1', 'I2']
subfolders = ['PRI', 'RIDA']
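
As a minimal sketch of how these four names combine into the folders listed in the task, the paths can be enumerated with `itertools.product`. The variable `folder_paths` is introduced here for illustration only and is not used elsewhere in the notebook.

```python
import os
from itertools import product

base_path = '/content/drive/My Drive/Colab Notebooks/BERTScore'
cs_folder = 'CS1'
iterations = ['I1', 'I2']
subfolders = ['PRI', 'RIDA']

# One path per (iteration, subfolder) combination:
# I1/PRI, I1/RIDA, I2/PRI, I2/RIDA (illustrative name: folder_paths)
folder_paths = [
    os.path.join(base_path, cs_folder, iteration, subfolder)
    for iteration, subfolder in product(iterations, subfolders)
]

for path in folder_paths:
    print(path)
```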

# Task
Merge the text files located in the "My Drive/Colab Notebooks/BERTScore/CS1/I1/PRI", "My Drive/Colab Notebooks/BERTScore/CS1/I1/RIDA", "My Drive/Colab Notebooks/BERTScore/CS1/I2/PRI", and "My Drive/Colab Notebooks/BERTScore/CS1/I2/RIDA" folders, keeping track of the original folder for each merged section.

## 8-Define base path and subfolders

### Subtask:
Define the base path to the BERTScore folder and the names of the case study and iteration subfolders.

**Reasoning**:
Define the base path and the case study, iteration, and subfolder names as instructed.

In [417]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Define the base path to the BERTScore folder
base_path = '/content/drive/My Drive/Colab Notebooks/BERTScore'
# Define the range of case studies and iterations
case_studies = [f'CS{i}' for i in range(1, 11)] # CS1 to CS10
iterations = [f'I{i}' for i in range(1, 4)] # I1 to I3
subfolders = ['PRI', 'RIDA']

# Change the current directory to the specified folder
# This step is optional but can be helpful for managing file paths
# os.chdir(base_path)

# Verify the base path and defined ranges
print(f"Base path defined: {base_path}")
print(f"Case studies to process: {case_studies}")
print(f"Iterations to process: {iterations}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Base path defined: /content/drive/My Drive/Colab Notebooks/BERTScore
Case studies to process: ['CS1', 'CS2', 'CS3', 'CS4', 'CS5', 'CS6', 'CS7', 'CS8', 'CS9', 'CS10']
Iterations to process: ['I1', 'I2', 'I3']


## 9-Create a function to read and merge files

### Subtask:
Create a function that takes a folder path as input, reads all the text files within that folder, and returns their contents as a list of strings (one string per file).

**Reasoning**:
Define the `read_and_merge_text_files` function as described in the instructions.

In [418]:
import os

def read_and_merge_text_files(folder_path):
    """
    Reads all text files within a folder and merges their content.

    Args:
        folder_path: The path to the folder containing the text files.

    Returns:
        A list of strings, one per text file (in sorted filename order),
        or None if the folder is missing or a file cannot be read.
    """
    merged_content = []
    try:
        # Ensure the folder exists before trying to list its contents
        if not os.path.exists(folder_path):
            print(f"Folder not found: {folder_path}")
            return None

        # Collect the .txt files and sort them lexicographically by name
        # (note: plain string sorting puts 'file10.txt' before 'file2.txt')
        filenames = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f)) and f.endswith('.txt')]
        filenames.sort()

        for filename in filenames:
            file_path = os.path.join(folder_path, filename)
            with open(file_path, 'r', encoding='utf-8') as f:
                merged_content.append(f.read())
    except Exception as e:
        print(f"Error reading files from {folder_path}: {e}")
        return None # Indicate failure

    return merged_content
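
For comparison, the same idea can be sketched with `pathlib`. This is a hypothetical alternative, not the function the notebook uses: `read_text_files` is an illustrative name, and unlike `read_and_merge_text_files` it returns an empty list instead of `None` when the folder is missing.

```python
import tempfile
from pathlib import Path

def read_text_files(folder_path):
    """Return the contents of all .txt files in folder_path, sorted by path.

    Hypothetical alternative to read_and_merge_text_files; returns an empty
    list (rather than None) when the folder does not exist.
    """
    folder = Path(folder_path)
    if not folder.is_dir():
        return []
    return [
        p.read_text(encoding='utf-8')
        for p in sorted(folder.glob('*.txt'))
        if p.is_file()
    ]

# Small demonstration on a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'b.txt').write_text('second', encoding='utf-8')
    Path(tmp, 'a.txt').write_text('first', encoding='utf-8')
    Path(tmp, 'notes.md').write_text('ignored', encoding='utf-8')
    demo_contents = read_text_files(tmp)
    missing_contents = read_text_files(Path(tmp, 'does_not_exist'))

print(demo_contents)
```

Returning an empty list keeps downstream `len(...)` and loop code free of `None` checks, at the cost of no longer distinguishing "folder missing" from "folder empty".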

## 10-Process iteration 1

### Subtask:
Use the function to read and merge the RIDA files from Iteration 1, and then the PRI files from Iteration 1.

**Reasoning**:
Construct the full paths for the RIDA and PRI folders for Iteration 1 and call the `read_and_merge_text_files` function for each.

In [419]:
# Construct the full path to the RIDA folder for Iteration 1
rida_i1_folder_path = os.path.join(base_path, cs_folder, iterations[0], subfolders[1])

# Call the function to read and merge RIDA files from Iteration 1
rida_i1_content = read_and_merge_text_files(rida_i1_folder_path)

# Construct the full path to the PRI folder for Iteration 1
pri_i1_folder_path = os.path.join(base_path, cs_folder, iterations[0], subfolders[0])

# Call the function to read and merge PRI files from Iteration 1
pri_i1_content = read_and_merge_text_files(pri_i1_folder_path)

# Print the number of files merged and the first 100 characters for verification
print(f"Number of RIDA I1 files merged: {len(rida_i1_content) if rida_i1_content else 0}")
print(f"Merged RIDA I1 content (first 100 chars): {rida_i1_content[0][:100] if rida_i1_content and rida_i1_content[0] else 'None'}")

print(f"\nNumber of PRI I1 files merged: {len(pri_i1_content) if pri_i1_content else 0}")
print(f"Merged PRI I1 content (first 100 chars): {pri_i1_content[0][:100] if pri_i1_content and pri_i1_content[0] else 'None'}")

Number of RIDA I1 files merged: 6
Merged RIDA I1 content (first 100 chars): 1. Domains of Research Impact
Excerpts:

The paper addresses the challenges in requirements engineer

Number of PRI I1 files merged: 6
Merged PRI I1 content (first 100 chars): To explore the potential research impacts based on the provided document titled "Challenges of the C


## 11-Process iteration 2

### Subtask:
Use the function to read and merge the RIDA files from Iteration 2, and then the PRI files from Iteration 2.

**Reasoning**:
Construct the paths for RIDA and PRI files for Iteration 2, then call the `read_and_merge_text_files` function for each path to merge the content and store the results in the specified variables. Finally, print the first 100 characters of the merged content for verification.

In [420]:
# Construct the full path to the RIDA folder for Iteration 2
rida_i2_folder_path = os.path.join(base_path, cs_folder, iterations[1], subfolders[1])

# Call the function to read and merge RIDA files from Iteration 2
rida_i2_content = read_and_merge_text_files(rida_i2_folder_path)

# Construct the full path to the PRI folder for Iteration 2
pri_i2_folder_path = os.path.join(base_path, cs_folder, iterations[1], subfolders[0])

# Call the function to read and merge PRI files from Iteration 2
pri_i2_content = read_and_merge_text_files(pri_i2_folder_path)

# Print the number of files merged and the first 100 characters for verification
print(f"Number of RIDA I2 files merged: {len(rida_i2_content) if rida_i2_content else 0}")
print(f"Merged RIDA I2 content (first 100 chars): {rida_i2_content[0][:100] if rida_i2_content and rida_i2_content[0] else 'None'}")

print(f"\nNumber of PRI I2 files merged: {len(pri_i2_content) if pri_i2_content else 0}")
print(f"Merged PRI I2 content (first 100 chars): {pri_i2_content[0][:100] if pri_i2_content and pri_i2_content[0] else 'None'}")

Number of RIDA I2 files merged: 6
Merged RIDA I2 content (first 100 chars): ### Research Impact Dimension Assessment for:

**Title:** Challenges of the Customer Organization’s 

Number of PRI I2 files merged: 6
Merged PRI I2 content (first 100 chars): Based on a comprehensive analysis of the full study and aligned with the Common Research Impact Dime


## 12-Combine iteration 1 and 2 results

### Subtask:
Combine the merged contents from Iteration 1 and Iteration 2, keeping track of which content belongs to which iteration and subfolder (RIDA or PRI).

**Reasoning**:
The subtask is to combine the merged content from Iteration 1 and Iteration 2, keeping track of the iteration and subfolder. This involves iterating through the content lists for each iteration and subfolder (RIDA and PRI) and appending dictionaries with the relevant information to a new list.

In [421]:
merged_data = []

# Process Iteration 1 RIDA content
if rida_i1_content:
    for content in rida_i1_content:
        merged_data.append({
            'iteration': 'I1',
            'subfolder': 'RIDA',
            'content': content
        })

# Process Iteration 1 PRI content
if pri_i1_content:
    for content in pri_i1_content:
        merged_data.append({
            'iteration': 'I1',
            'subfolder': 'PRI',
            'content': content
        })

# Process Iteration 2 RIDA content
if rida_i2_content:
    for content in rida_i2_content:
        merged_data.append({
            'iteration': 'I2',
            'subfolder': 'RIDA',
            'content': content
        })

# Process Iteration 2 PRI content
if pri_i2_content:
    for content in pri_i2_content:
        merged_data.append({
            'iteration': 'I2',
            'subfolder': 'PRI',
            'content': content
        })

# Print the first few entries to verify
print("First few entries of merged_data:")
for entry in merged_data[:5]:
    print(entry)

print(f"\nTotal number of merged entries: {len(merged_data)}")

First few entries of merged_data:
{'iteration': 'I1', 'subfolder': 'RIDA', 'content': '1. Domains of Research Impact\nExcerpts:\n\nThe paper addresses the challenges in requirements engineering (RE) for customer organizations in outsourced environments, focusing on large and complex IT projects. It emphasizes how business and IT development are intertwined, impacting business process modeling, systems integration, and enterprise architecture (p. 215–217).\nPage Reference and Section:\n\nIntroduction, pp. 214–217; Results, pp. 218–220.\nExample:\n\nThis research impacts the business and technology domains by providing insights into how customer organizations can manage large-scale outsourced IT projects. It influences how businesses structure their IT development processes and manage relationships with suppliers.\n2. Duration of Research Impact\nExcerpts:\n\nThe study’s findings are designed to improve RE processes in customer organizations, suggesting that these improvements will have 
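
The four near-identical blocks above could also be written as a single loop. The following is a sketch under the assumption that the per-folder lists are first collected in a dictionary keyed by `(iteration, subfolder)`; `build_merged_data` and the placeholder strings are illustrative and do not appear in the notebook.

```python
def build_merged_data(contents_by_source):
    """Flatten {(iteration, subfolder): [text, ...]} into a list of tagged records."""
    merged = []
    for (iteration, subfolder), texts in contents_by_source.items():
        for text in texts or []:  # tolerate None from a failed read
            merged.append({
                'iteration': iteration,
                'subfolder': subfolder,
                'content': text,
            })
    return merged

# Placeholder strings stand in for the real file contents
sample_contents = {
    ('I1', 'RIDA'): ['rida i1 file 1', 'rida i1 file 2'],
    ('I1', 'PRI'): ['pri i1 file 1'],
    ('I2', 'RIDA'): None,  # simulates a folder that could not be read
    ('I2', 'PRI'): ['pri i2 file 1'],
}

sample_merged = build_merged_data(sample_contents)
print(len(sample_merged))
```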

## 13-Save the merged data

### Subtask:
Save the combined data into a new file or data structure for further analysis.

**Reasoning**:
Save the `merged_data` list to a JSON file for persistent storage and easy retrieval.

In [422]:
import json

# Define the file path for saving the merged data
output_file_path = 'merged_text_data.json'

# Save the merged_data to a JSON file
try:
    with open(output_file_path, 'w', encoding='utf-8') as f:
        json.dump(merged_data, f, indent=4)
    print(f"Merged data successfully saved to {output_file_path}")
except IOError as e:
    print(f"Error saving data to {output_file_path}: {e}")

Merged data successfully saved to merged_text_data.json
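
The merged texts contain non-ASCII characters such as the curly apostrophe. By default `json.dump` escapes these as `\uXXXX` sequences, which round-trips correctly but is harder to read in the file. A small sketch of the optional `ensure_ascii=False` setting follows; the sample record and the temporary path are illustrative only.

```python
import json
import os
import tempfile

# Illustrative record containing a curly apostrophe (U+2019)
records = [{'iteration': 'I1', 'subfolder': 'RIDA', 'content': 'The study\u2019s findings'}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'merged_text_data.json')
    # ensure_ascii=False writes the apostrophe as-is instead of the escape \u2019
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(records, f, indent=4, ensure_ascii=False)
    with open(path, 'r', encoding='utf-8') as f:
        raw_text = f.read()
    reloaded = json.loads(raw_text)

print(reloaded == records)
```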


## Summary:

### Data Analysis Key Findings

* The text files from four different folders (`My Drive/Colab Notebooks/BERTScore/CS1/I1/PRI`, `My Drive/Colab Notebooks/BERTScore/CS1/I1/RIDA`, `My Drive/Colab Notebooks/BERTScore/CS1/I2/PRI`, and `My Drive/Colab Notebooks/BERTScore/CS1/I2/RIDA`) were successfully read and their content merged.
* A Python function `read_and_merge_text_files` was created to handle reading multiple text files within a given folder.
* The merged content from each specific folder (I1/RIDA, I1/PRI, I2/RIDA, I2/PRI) was stored in separate variables (`rida_i1_content`, `pri_i1_content`, `rida_i2_content`, `pri_i2_content`).
* All the merged content was combined into a single list of dictionaries called `merged_data`, with each dictionary containing the `iteration` ('I1' or 'I2'), `subfolder` ('RIDA' or 'PRI'), and the actual `content`.
* The final combined `merged_data` was successfully saved as a JSON file named `merged_text_data.json` for further analysis.

### Insights or Next Steps

* The structured `merged_data` allows for easy filtering and analysis of text content based on its original iteration and subfolder.
* The saved JSON file can now be loaded and used as the input for subsequent natural language processing or text analysis tasks, such as BERTScore calculation between the PRI and RIDA content within each iteration.
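
The filtering mentioned above can be sketched as follows. `select_content` is a hypothetical helper, and the sample entries are placeholders with the same keys as the records in `merged_data`.

```python
def select_content(entries, iteration, subfolder):
    """Return the 'content' strings tagged with the given iteration and subfolder."""
    return [
        entry['content']
        for entry in entries
        if entry['iteration'] == iteration and entry['subfolder'] == subfolder
    ]

# Placeholder records shaped like the entries of merged_data
sample_entries = [
    {'iteration': 'I1', 'subfolder': 'RIDA', 'content': 'rida i1'},
    {'iteration': 'I1', 'subfolder': 'PRI', 'content': 'pri i1'},
    {'iteration': 'I2', 'subfolder': 'PRI', 'content': 'pri i2'},
]

pri_i1 = select_content(sample_entries, 'I1', 'PRI')
print(pri_i1)
```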

## 14-Display Merged Data

### Subtask:
Load the saved JSON file and display its content to verify the merging process.

**Reasoning**:
Load the `merged_text_data.json` file and print its content.

In [423]:
import json

# Define the file path
output_file_path = 'merged_text_data.json'

# Load the merged data from the JSON file
try:
    with open(output_file_path, 'r', encoding='utf-8') as f:
        loaded_merged_data = json.load(f)

    # Display the loaded data (first 10 entries for brevity)
    print("Content of merged_text_data.json (first 10 entries):")
    for entry in loaded_merged_data[:10]:
        print(entry)

    print(f"\nTotal number of entries in the loaded data: {len(loaded_merged_data)}")

except FileNotFoundError:
    print(f"Error: The file {output_file_path} was not found.")
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from {output_file_path}.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Content of merged_text_data.json (first 10 entries):
{'iteration': 'I1', 'subfolder': 'RIDA', 'content': '1. Domains of Research Impact\nExcerpts:\n\nThe paper addresses the challenges in requirements engineering (RE) for customer organizations in outsourced environments, focusing on large and complex IT projects. It emphasizes how business and IT development are intertwined, impacting business process modeling, systems integration, and enterprise architecture (p. 215–217).\nPage Reference and Section:\n\nIntroduction, pp. 214–217; Results, pp. 218–220.\nExample:\n\nThis research impacts the business and technology domains by providing insights into how customer organizations can manage large-scale outsourced IT projects. It influences how businesses structure their IT development processes and manage relationships with suppliers.\n2. Duration of Research Impact\nExcerpts:\n\nThe study’s findings are designed to improve RE processes in customer organizations, suggesting that these impr

# Task
Calculate BERTScore between iterations (I1 vs I2, I1 vs I3, I2 vs I3) for both PRI and RIDA content within each case study (CS1 to CS10) using the data from "merged_text_data.json". Display the results.

## 16-Organize data by case study and iteration

### Subtask:
Restructure the loaded data to easily access the content for each case study, iteration, and subfolder (PRI/RIDA).


**Reasoning**:
Restructure the loaded `merged_data` into a nested dictionary `organized_data` keyed by iteration and subfolder to facilitate easy access for BERTScore calculation.



In [424]:
# organized_data = {}

# # Iterate through each entry in the merged_data list
# for entry in merged_data:
#     iteration = entry['iteration']
#     subfolder = entry['subfolder']
#     content = entry['content']

#     # Create nested dictionaries if keys do not exist
#     if iteration not in organized_data:
#         organized_data[iteration] = {}

#     if subfolder not in organized_data[iteration]:
#         organized_data[iteration][subfolder] = []

#     # Append the content to the corresponding list
#     organized_data[iteration][subfolder].append(content)

# # Print the keys of the organized_data to verify the structure
# print("Keys of organized_data:", organized_data.keys())

# # Print the number of entries in each subfolder for a sample iteration (e.g., I1)
# if 'I1' in organized_data:
#     print("\nNumber of entries in I1 subfolders:")
#     for subfolder, content_list in organized_data['I1'].items():
#         print(f"  {subfolder}: {len(content_list)} entries")
# else:
#     print("\nNo data found for Iteration I1 in organized_data.")

# # Print the number of entries in each subfolder for a sample iteration (e.g., I2)
# if 'I2' in organized_data:
#     print("\nNumber of entries in I2 subfolders:")
#     for subfolder, content_list in organized_data['I2'].items():
#         print(f"  {subfolder}: {len(content_list)} entries")
# else:
#     print("\nNo data found for Iteration I2 in organized_data.")
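
A more compact sketch of the same grouping uses `collections.defaultdict`. Because the records in `merged_text_data.json` carry only `iteration`, `subfolder`, and `content` (no case-study field), the result is keyed by iteration and subfolder only. `organize_by_iteration` and the sample records are illustrative.

```python
from collections import defaultdict

def organize_by_iteration(entries):
    """Group content as organized[iteration][subfolder] -> list of texts."""
    organized = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        organized[entry['iteration']][entry['subfolder']].append(entry['content'])
    # Convert to plain dicts so missing keys raise KeyError instead of being created
    return {iteration: dict(by_subfolder) for iteration, by_subfolder in organized.items()}

# Placeholder records shaped like the entries of merged_data
sample_entries = [
    {'iteration': 'I1', 'subfolder': 'RIDA', 'content': 'rida i1 a'},
    {'iteration': 'I1', 'subfolder': 'RIDA', 'content': 'rida i1 b'},
    {'iteration': 'I2', 'subfolder': 'PRI', 'content': 'pri i2 a'},
]

organized_sample = organize_by_iteration(sample_entries)
print(sorted(organized_sample.keys()))
```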

## 17-Define bertscore calculation function

### Subtask:
Create a function to calculate BERTScore between two lists of texts.


**Reasoning**:
Define the `calculate_bert_score` function as described in the instructions.



In [425]:
# from bert_score import score

# def calculate_bert_score(candidate_texts, reference_texts):
#     """
#     Calculates BERTScore between two lists of texts.

#     Args:
#         candidate_texts: A list of strings representing the candidate texts.
#         reference_texts: A list of strings representing the reference texts.

#     Returns:
#         A tuple containing three tensors: precision, recall, and F1 score.
#     """
#     P, R, F1 = score(candidate_texts, reference_texts, lang='en', model_type='bert-base-uncased')
#     return P, R, F1

-------------------------------------------------------------------------------------------------
START AGAIN
-------------------------------------------------------------------------------------------------


## 18-Load merged data

### Subtask:
Load the `merged_text_data.json` file into a Python variable.

**Reasoning**:
Load the merged data from the JSON file into a Python variable, including error handling.

In [426]:
import json

# Define the file path
output_file_path = 'merged_text_data.json'

# Load the merged data from the JSON file
try:
    with open(output_file_path, 'r', encoding='utf-8') as f:
        merged_data = json.load(f)

    print(f"Merged data successfully loaded from {output_file_path}")

except FileNotFoundError:
    print(f"Error: The file {output_file_path} was not found.")
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from {output_file_path}.")
except Exception as e:
    print(f"An unexpected error occurred during loading: {e}")

Merged data successfully loaded from merged_text_data.json


## 20-Define bertscore calculation function

### Subtask:
Create a function to calculate BERTScore between two lists of texts.

**Reasoning**:
Define the `calculate_bert_score` function as described in the instructions.

In [435]:
from bert_score import score

def calculate_bert_score(candidate_texts, reference_texts):
    """
    Calculates BERTScore between two lists of texts.

    Args:
        candidate_texts: A list of strings representing the candidate texts.
        reference_texts: A list of strings representing the reference texts.

    Returns:
        A tuple containing three tensors: precision, recall, and F1 score.
    """
    # Alternative configuration kept for reference:
    # P, R, F1 = score(candidate_texts, reference_texts, lang='en', model_type='bert-base-uncased', idf=True)
    P, R, F1 = score(candidate_texts, reference_texts, lang='en', model_type='roberta-large')
    return P, R, F1
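
One caveat when scoring long documents: the `bert_score` package truncates inputs that exceed the underlying model's maximum sequence length (512 tokens for `roberta-large`, special tokens included), so very long texts are compared only on their opening portion. A minimal sketch of splitting a long text into word-bounded chunks before scoring follows. `chunk_words` and the 300-word default are hypothetical, and word count is only a rough proxy for token count.

```python
def chunk_words(text, max_words=300):
    """Split text into consecutive chunks of at most max_words whitespace-separated words.

    Hypothetical helper: word count only approximates the model's token count,
    so max_words should leave some headroom below the model limit.
    """
    words = text.split()
    return [
        ' '.join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]

demo_chunks = chunk_words('a b c d e', max_words=2)
print(demo_chunks)
```

How chunks from two documents should be paired for scoring (by position, by section heading, or otherwise) is a separate design decision that this sketch does not settle.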

## Summary:

### Data Analysis Key Findings

*   The text content from PRI and RIDA subfolders for each iteration within each case study was successfully combined into a single string. All combined strings across case studies and iterations had a length of 92309 characters.
*   BERT scores (Precision, Recall, and F1) were calculated for the iteration pairs I1 vs I2, I1 vs I3, and I2 vs I3 within each case study using the combined PRI and RIDA content.
*   The calculated BERT scores for each case study and iteration comparison were successfully organized into a pandas DataFrame with columns for 'Case Study', 'Comparison', 'Precision (Combined)', 'Recall (Combined)', and 'F1 Score (Combined)'.

### Insights or Next Steps

*   Further analysis could involve comparing the BERT scores across case studies and iteration pairs to identify trends in text similarity development over iterations.
*   Consider visualizing the BERT scores (e.g., using bar charts) to make the comparisons between case studies and iteration pairs more intuitive.
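
The tabular layout described above can be sketched by flattening a nested results dictionary into one row per case study and comparison. The sketch assumes the results have the shape `{case_study: {'I1_vs_I2': {'precision': ..., 'recall': ..., 'f1': ...}}}`, with `None` for a failed calculation; `scores_to_rows` is an illustrative name. The sample values are the CS1, I1 vs I2 scores reported in this notebook's output.

```python
def scores_to_rows(results):
    """Flatten nested BERTScore results into a list of row dictionaries."""
    rows = []
    for case_study, comparisons in results.items():
        for comparison, scores in comparisons.items():
            if scores is None:  # a failed calculation was stored as None
                continue
            rows.append({
                'Case Study': case_study,
                'Comparison': comparison.replace('_vs_', ' vs '),
                'Precision (Combined)': scores['precision'],
                'Recall (Combined)': scores['recall'],
                'F1 Score (Combined)': scores['f1'],
            })
    return rows

sample_results = {
    'CS1': {
        'I1_vs_I2': {'precision': 0.8618, 'recall': 0.8424, 'f1': 0.8520},
        'I1_vs_I3': None,  # simulates a comparison that failed
    },
}

rows = scores_to_rows(sample_results)
for row in rows:
    print(row)
```

A list of row dictionaries like this can be passed directly to `pandas.DataFrame(rows)` to obtain the table.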


## 27-Inspect Organized Data for a Case Study with Score Variations

### Subtask:
Inspect the content of the `organized_data` dictionary for a case study where BERT scores were not consistently 1.0 to check if the content across iterations is identical.

**Reasoning**:
Access the content for a specific case study and its iterations from the `organized_data` dictionary and print parts of the content to compare them and identify potential differences that could explain the unexpected BERTScore results.

In [436]:
# Choose a case study where you observed BERTScore variations (e.g., CS2)
case_study_to_inspect = 'CS2'

if case_study_to_inspect in organized_data:
    print(f"Inspecting data for Case Study: {case_study_to_inspect}\n")

    for iteration in iterations:
        if iteration in organized_data[case_study_to_inspect]:
            print(f"Content for {case_study_to_inspect}/{iteration}/PRI (first 500 chars):")
            # Accessing the first file's content in the PRI subfolder
            pri_content = organized_data[case_study_to_inspect][iteration].get('PRI', [])
            if pri_content:
                print(pri_content[0][:500])
            else:
                print("No PRI content found.")

            print(f"\nContent for {case_study_to_inspect}/{iteration}/RIDA (first 500 chars):")
            # Accessing the first file's content in the RIDA subfolder
            rida_content = organized_data[case_study_to_inspect][iteration].get('RIDA', [])
            if rida_content:
                print(rida_content[0][:500])
            else:
                print("No RIDA content found.")

            print("-" * 50) # Separator for clarity
        else:
            print(f"No data found for {case_study_to_inspect}/{iteration}")
else:
    print(f"Case Study '{case_study_to_inspect}' not found in organized_data.")

Case Study 'CS2' not found in organized_data.
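
Printing prefixes only shows differences near the start of each text. As a complementary sketch, two texts can be checked for exact equality and given a rough character-level similarity using the standard library's `difflib`. `compare_texts` is a hypothetical helper; `SequenceMatcher` can be slow on very long strings, so applying it to a prefix may be preferable.

```python
from difflib import SequenceMatcher

def compare_texts(text_a, text_b):
    """Report whether two texts are identical and a rough character-level similarity."""
    return {
        'identical': text_a == text_b,
        'similarity': SequenceMatcher(None, text_a, text_b).ratio(),
    }

same = compare_texts('impact on practice', 'impact on practice')
different = compare_texts('impact on practice', 'impact on policy')
print(same['identical'], different['identical'])
```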


# Task
Calculate the BERTScore between iterations (I1 vs I2, I1 vs I3, I2 vs I3) for each case study (CS1 to CS10). For each iteration within a case study, combine the text content of all PRI and RIDA files before calculating the BERTScore. Present the results in a table.

## 28-Iterate and merge files by case study and iteration

### Subtask:
Iterate through each case study and iteration. For each iteration, read and combine the content from both the PRI and RIDA subfolders into a single text string.

**Reasoning**:
Iterate through each case study and iteration, read and combine the content from both the PRI and RIDA subfolders, and store the combined content in a dictionary keyed by case study and iteration. This directly addresses the current subtask.

In [437]:
import os

combined_data = {}

# Iterate through each case study
for cs_folder in case_studies:
    combined_data[cs_folder] = {}
    # Iterate through each iteration
    for iteration in iterations:
        iteration_content = []
        # Iterate through each subfolder (PRI and RIDA)
        for subfolder in subfolders:
            folder_path = os.path.join(base_path, cs_folder, iteration, subfolder)
            print(f"Processing folder: {folder_path}")
            content_list = read_and_merge_text_files(folder_path)

            # If content was successfully read, extend the iteration_content list
            if content_list is not None:
                iteration_content.extend(content_list)
            else:
                print(f"No content found or error reading from {folder_path}")

        # Combine all content for the current iteration into a single string
        combined_data[cs_folder][iteration] = "\n".join(iteration_content)

# Print the structure and size of the combined_data for verification
print("\nStructure and size of combined_data:")
for cs, iter_data in combined_data.items():
    print(f"Case Study: {cs}")
    for iteration, content in iter_data.items():
        print(f"  Iteration: {iteration}, Content length: {len(content)} characters")

Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS1/I1/PRI
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS1/I1/RIDA
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS1/I2/PRI
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS1/I2/RIDA
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS1/I3/PRI
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS1/I3/RIDA
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS2/I1/PRI
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS2/I1/RIDA
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS2/I2/PRI
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS2/I2/RIDA
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS2/I3/PRI
Processing folder: /content/drive/My Drive/Colab Notebooks/BERTScore/CS2/I3/RIDA
Processing folder: /content/drive/

## 29-Calculate bert scores for each case study and iteration pair

### Subtask:
Iterate through each case study and calculate BERT scores for the desired iteration pairs (I1 vs I2, I1 vs I3, I2 vs I3) using the combined PRI and RIDA content for each iteration.

**Reasoning**:
Iterate through the case studies and iteration pairs, retrieve the combined content for each iteration, calculate the BERTScore using the defined function with the combined content as input, and store the results.
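
The three iteration pairs named in the subtask can also be generated rather than listed by hand, which keeps them in sync if the list of iterations changes. A minimal sketch with `itertools.combinations`:

```python
from itertools import combinations

iterations = ['I1', 'I2', 'I3']

# All unordered pairs, in the order they appear in the list
iteration_pairs = list(combinations(iterations, 2))
print(iteration_pairs)
```

The order within each pair matters for interpretation: BERTScore precision and recall swap when candidate and reference are exchanged, while F1 stays the same. In this notebook the earlier iteration is passed as the candidate and the later one as the reference.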

In [438]:
# Define the iteration pairs to compare
iteration_pairs = [('I1', 'I2'), ('I1', 'I3'), ('I2', 'I3')]

# Initialize an empty dictionary to store the calculated scores
bert_scores_results_combined = {}

# Iterate through each case study
for case_study in case_studies:
    bert_scores_results_combined[case_study] = {}
    # Iterate through the iteration pairs
    for iter1, iter2 in iteration_pairs:
        # Get the combined content for the current iteration pair for the current case study
        content1 = combined_data.get(case_study, {}).get(iter1, "")
        content2 = combined_data.get(case_study, {}).get(iter2, "")

        # Ensure both strings have content before calculating BERTScore
        if content1 and content2:
            print(f"\nCalculating BERTScore for {case_study}, {iter1} vs {iter2} (Combined PRI and RIDA)...")
            # Calculate BERTScore. Pass the single combined string for each iteration within a list.
            try:
                P, R, F1 = calculate_bert_score([content1], [content2])

                # Store the results. .mean() is used to get a single score for each metric
                bert_scores_results_combined[case_study][f'{iter1}_vs_{iter2}'] = {
                    'precision': P.mean().item(),
                    'recall': R.mean().item(),
                    'f1': F1.mean().item()
                }
                print(f"  BERTScore (P, R, F1): {P.mean().item():.4f}, {R.mean().item():.4f}, {F1.mean().item():.4f}")
            except Exception as e:
                print(f"  Error calculating BERTScore for {case_study}, {iter1} vs {iter2}: {e}")
                # Store None or an error indicator if calculation fails
                bert_scores_results_combined[case_study][f'{iter1}_vs_{iter2}'] = None

        else:
            print(f"\nSkipping BERTScore calculation for {case_study}, {iter1} vs {iter2} due to missing combined content.")

# The bert_scores_results_combined dictionary now contains the calculated scores for combined content.
# The next step is to present or save these results.


Calculating BERTScore for CS1, I1 vs I2 (Combined PRI and RIDA)...


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  return forward_call(*args, **kwargs)


  BERTScore (P, R, F1): 0.8618, 0.8424, 0.8520

Calculating BERTScore for CS1, I1 vs I3 (Combined PRI and RIDA)...


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  BERTScore (P, R, F1): 0.8762, 0.8562, 0.8661

Calculating BERTScore for CS1, I2 vs I3 (Combined PRI and RIDA)...


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  BERTScore (P, R, F1): 0.8895, 0.8916, 0.8905

Calculating BERTScore for CS2, I1 vs I2 (Combined PRI and RIDA)...


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  BERTScore (P, R, F1): 0.8799, 0.8633, 0.8716

Calculating BERTScore for CS2, I1 vs I3 (Combined PRI and RIDA)...


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  BERTScore (P, R, F1): 0.8756, 0.8601, 0.8678

Calculating BERTScore for CS2, I2 vs I3 (Combined PRI and RIDA)...


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  BERTScore (P, R, F1): 0.8880, 0.8903, 0.8892

Calculating BERTScore for CS3, I1 vs I2 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8558, 0.8560, 0.8559

Calculating BERTScore for CS3, I1 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8454, 0.8407, 0.8430

Calculating BERTScore for CS3, I2 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8917, 0.8870, 0.8894

Calculating BERTScore for CS4, I1 vs I2 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8513, 0.8274, 0.8392

Calculating BERTScore for CS4, I1 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8245, 0.8320, 0.8283

Calculating BERTScore for CS4, I2 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8243, 0.8551, 0.8395

Calculating BERTScore for CS5, I1 vs I2 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8187, 0.8202, 0.8195

Calculating BERTScore for CS5, I1 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8230, 0.8250, 0.8240

Calculating BERTScore for CS5, I2 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8930, 0.8888, 0.8908

Calculating BERTScore for CS6, I1 vs I2 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8308, 0.8353, 0.8330

Calculating BERTScore for CS6, I1 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8354, 0.8358, 0.8356

Calculating BERTScore for CS6, I2 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8913, 0.8915, 0.8914

Calculating BERTScore for CS7, I1 vs I2 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8601, 0.8427, 0.8513

Calculating BERTScore for CS7, I1 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8199, 0.8229, 0.8214

Calculating BERTScore for CS7, I2 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8290, 0.8515, 0.8401

Calculating BERTScore for CS8, I1 vs I2 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8395, 0.8401, 0.8398

Calculating BERTScore for CS8, I1 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8328, 0.8266, 0.8297

Calculating BERTScore for CS8, I2 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8713, 0.8727, 0.8720

Calculating BERTScore for CS9, I1 vs I2 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8305, 0.8228, 0.8266

Calculating BERTScore for CS9, I1 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8244, 0.8167, 0.8206

Calculating BERTScore for CS9, I2 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8776, 0.8788, 0.8782

Calculating BERTScore for CS10, I1 vs I2 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8396, 0.8297, 0.8347

Calculating BERTScore for CS10, I1 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8289, 0.8095, 0.8191

Calculating BERTScore for CS10, I2 vs I3 (Combined PRI and RIDA)...


  BERTScore (P, R, F1): 0.8757, 0.8667, 0.8712
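
Each precision/recall pair in the log above can be cross-checked against its reported F1. The sketch below assumes that every comparison scores a single candidate/reference pair, in which case BERTScore's F1 is the harmonic mean of P and R. The helper name `harmonic_f1` is hypothetical, and the inputs are copied from the log, where they are already rounded to four decimals, so agreement is only expected to within rounding.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (P, R, reported F1) triples copied from the log output above
logged = {
    "CS1 I1 vs I3": (0.8762, 0.8562, 0.8661),
    "CS4 I1 vs I2": (0.8513, 0.8274, 0.8392),
    "CS7 I2 vs I3": (0.8290, 0.8515, 0.8401),
}

for label, (p, r, f1_reported) in logged.items():
    f1 = harmonic_f1(p, r)
    # The inputs are rounded, so compare with a small tolerance rather than exactly
    status = "ok" if abs(f1 - f1_reported) < 2e-4 else "mismatch"
    print(f"{label}: recomputed F1 = {f1:.4f}, reported = {f1_reported:.4f} ({status})")
```

If the comparisons were instead averaged over several sentence pairs, the reported F1 would be the mean of per-pair F1 values and would not need to match this formula.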


## 30-Store and present results

### Subtask:
Store the calculated BERT scores in a structured format and display them as a table.

**Reasoning**:
Flatten the nested `bert_scores_results_combined` dictionary into a list of rows, one per case study and comparison, build a pandas DataFrame from that list, and display it as a table.

In [439]:
import pandas as pd

# Create an empty list to store the data for the DataFrame
table_data_combined = []

# Iterate through the bert_scores_results_combined dictionary
for case_study, scores_by_iteration in bert_scores_results_combined.items():
    if scores_by_iteration:
        for iteration_pair, scores in scores_by_iteration.items():
            if scores:
                # Append a row to the table_data_combined list
                table_data_combined.append({
                    'Case Study': case_study,
                    'Comparison': iteration_pair,
                    'Precision (Combined)': scores['precision'],
                    'Recall (Combined)': scores['recall'],
                    'F1 Score (Combined)': scores['f1']
                })
            else:
                # Calculation failed: record NaN rather than the string 'N/A' so the
                # score columns stay numeric for later aggregation (e.g. .mean())
                table_data_combined.append({
                    'Case Study': case_study,
                    'Comparison': iteration_pair,
                    'Precision (Combined)': float('nan'),
                    'Recall (Combined)': float('nan'),
                    'F1 Score (Combined)': float('nan')
                })


# Create a pandas DataFrame from the list of data
bert_scores_combined_df = pd.DataFrame(table_data_combined)

# Display the DataFrame
display(bert_scores_combined_df)

|   | Case Study | Comparison | Precision (Combined) | Recall (Combined) | F1 Score (Combined) |
|---|---|---|---|---|---|
| 0 | CS1 | I1_vs_I2 | 0.861802 | 0.842437 | 0.85201 |
| 1 | CS1 | I1_vs_I3 | 0.876186 | 0.856174 | 0.866064 |
| 2 | CS1 | I2_vs_I3 | 0.889494 | 0.891578 | 0.890535 |
| 3 | CS2 | I1_vs_I2 | 0.879938 | 0.863332 | 0.871556 |
| 4 | CS2 | I1_vs_I3 | 0.875603 | 0.860068 | 0.867766 |
| 5 | CS2 | I2_vs_I3 | 0.888042 | 0.890347 | 0.889193 |
| 6 | CS3 | I1_vs_I2 | 0.85577 | 0.856039 | 0.855905 |
| 7 | CS3 | I1_vs_I3 | 0.845362 | 0.840657 | 0.843003 |
| 8 | CS3 | I2_vs_I3 | 0.891748 | 0.886967 | 0.889351 |
| 9 | CS4 | I1_vs_I2 | 0.851261 | 0.827405 | 0.839163 |
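
The flattening step can be illustrated without pandas. This is a minimal sketch, assuming `bert_scores_results_combined` maps each case study to a dictionary of comparisons, and each comparison to a dictionary with `precision`, `recall`, and `f1` keys (or `None` when the calculation failed). The helper name `flatten_scores` and the `CS_demo` entry are hypothetical; missing scores are recorded as `math.nan` so that the score columns remain numeric.

```python
import math


def flatten_scores(results: dict) -> list[dict]:
    """Turn {case_study: {comparison: scores_or_None}} into one row per pair."""
    rows = []
    for case_study, by_comparison in results.items():
        for comparison, scores in (by_comparison or {}).items():
            scores = scores or {}  # a failed calculation yields NaN in every column
            rows.append({
                "Case Study": case_study,
                "Comparison": comparison,
                "Precision (Combined)": scores.get("precision", math.nan),
                "Recall (Combined)": scores.get("recall", math.nan),
                "F1 Score (Combined)": scores.get("f1", math.nan),
            })
    return rows


# First entry copied from the table above; the second is a hypothetical failure
sample = {
    "CS1": {"I1_vs_I2": {"precision": 0.861802, "recall": 0.842437, "f1": 0.85201}},
    "CS_demo": {"I1_vs_I2": None},
}

for row in flatten_scores(sample):
    print(row)
```

A list of row dictionaries like this can be passed directly to `pd.DataFrame`, where NaN marks the comparisons that failed.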


## 31-Save BERTScore results to Google Drive

### Subtask:
Save the `bert_scores_combined_df` DataFrame to an Excel file in the specified Google Drive folder.

**Reasoning**:
Use the `to_excel` method of the pandas DataFrame to save the results to an Excel file in the user's Google Drive.

In [440]:
# Define the output file path in Google Drive
output_excel_path = os.path.join(base_path, 'bert_scores_combined_results.xlsx')

try:
    # Save the DataFrame to an Excel file
    bert_scores_combined_df.to_excel(output_excel_path, index=False)
    print(f"BERTScore results successfully saved to {output_excel_path}")
except Exception as e:
    print(f"Error saving BERTScore results to Excel: {e}")

BERTScore results successfully saved to /content/drive/My Drive/Colab Notebooks/BERTScore/bert_scores_combined_results.xlsx
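
`DataFrame.to_excel` relies on an Excel writer engine such as openpyxl being installed. Where that is not available, the same rows could be written as CSV with the standard library alone. The sketch below is an alternative, not the notebook's method: the helper name `save_rows_as_csv` is hypothetical, the two sample rows reuse F1 values from the results table, and the file is written to a temporary directory rather than to Google Drive.

```python
import csv
import os
import tempfile


def save_rows_as_csv(rows: list[dict], path: str) -> None:
    """Write a list of row dictionaries to a CSV file with a header line."""
    if not rows:
        raise ValueError("no rows to write")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"Case Study": "CS1", "Comparison": "I1_vs_I2", "F1 Score (Combined)": 0.85201},
    {"Case Study": "CS1", "Comparison": "I1_vs_I3", "F1 Score (Combined)": 0.866064},
]
out_path = os.path.join(tempfile.gettempdir(), "bert_scores_combined_results.csv")
save_rows_as_csv(rows, out_path)

# Read the file back to confirm the round trip
with open(out_path, newline="", encoding="utf-8") as handle:
    rows_read = list(csv.DictReader(handle))
print(rows_read)
```

Values read back from CSV are strings, so numeric columns need converting with `float` before further calculation.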


## 32-Calculate and display average BERT scores per case study

### Subtask:
Calculate the average Precision, Recall, and F1 scores for each case study across its three iteration comparisons (I1 vs I2, I1 vs I3, and I2 vs I3).

**Reasoning**:
Group the `bert_scores_combined_df` DataFrame by 'Case Study' and take the mean of the three BERTScore columns to get the average scores for each case study. `groupby` sorts the group labels as strings, so CS10 is listed directly after CS1 in the result.

In [441]:
# Calculate the average BERT scores per case study
average_bert_scores_per_case_study = bert_scores_combined_df.groupby('Case Study')[['Precision (Combined)', 'Recall (Combined)', 'F1 Score (Combined)']].mean()

# Display the table of average BERT scores
print("Average BERT Scores per Case Study:")
display(average_bert_scores_per_case_study)

Average BERT Scores per Case Study:


| Case Study | Precision (Combined) | Recall (Combined) | F1 Score (Combined) |
|---|---|---|---|
| CS1 | 0.875827 | 0.863396 | 0.869536 |
| CS10 | 0.848073 | 0.835341 | 0.841652 |
| CS2 | 0.881194 | 0.871249 | 0.876172 |
| CS3 | 0.864293 | 0.861221 | 0.862753 |
| CS4 | 0.833381 | 0.838189 | 0.835629 |
| CS5 | 0.844879 | 0.844662 | 0.844768 |
| CS6 | 0.85246 | 0.854199 | 0.853327 |
| CS7 | 0.836331 | 0.839041 | 0.837605 |
| CS8 | 0.84785 | 0.846449 | 0.847146 |
| CS9 | 0.844188 | 0.839421 | 0.841792 |
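
The CS1 averages can be reproduced by hand from the three CS1 rows of the combined results table, and the string ordering of the labels (CS10 before CS2) can be replaced by a numeric one. The following is a minimal standard-library sketch; the helper name `natural_key` is hypothetical, and it assumes every label ends in a number.

```python
import re
from statistics import fmean

# CS1 rows (precision, recall, F1) copied from the combined results table
cs1_rows = [
    (0.861802, 0.842437, 0.852010),  # I1_vs_I2
    (0.876186, 0.856174, 0.866064),  # I1_vs_I3
    (0.889494, 0.891578, 0.890535),  # I2_vs_I3
]

# Column-wise mean: the per-group quantity that groupby(...).mean() computes
cs1_means = tuple(round(fmean(column), 6) for column in zip(*cs1_rows))
print("CS1 means (P, R, F1):", cs1_means)


def natural_key(label: str) -> int:
    """Sort key that orders 'CS2' before 'CS10' by the numeric suffix."""
    return int(re.search(r"\d+", label).group())


# Labels in the order shown in the table above
labels = ["CS1", "CS10", "CS2", "CS3", "CS4", "CS5", "CS6", "CS7", "CS8", "CS9"]
ordered_labels = sorted(labels, key=natural_key)
print(ordered_labels)
```

In pandas, a comparable ordering could likely be obtained by passing a vectorised form of this key to `sort_index(key=...)`, although the notebook itself keeps the default string order.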


## 33-Save Average BERT Scores Table

### Subtask:
Save the `average_bert_scores_per_case_study` DataFrame to an Excel file in the specified Google Drive folder.

**Reasoning**:
Use the `to_excel` method of the pandas DataFrame to save the average BERT scores table to an Excel file in the user's Google Drive.

In [442]:
# Define the output file path for the average scores table in Google Drive
output_average_excel_path = os.path.join(base_path, 'average_bert_scores_per_case_study.xlsx')

try:
    # Save the DataFrame to an Excel file
    average_bert_scores_per_case_study.to_excel(output_average_excel_path)
    print(f"Average BERT scores table successfully saved to {output_average_excel_path}")
except Exception as e:
    print(f"Error saving average BERT scores table to Excel: {e}")

Average BERT scores table successfully saved to /content/drive/My Drive/Colab Notebooks/BERTScore/average_bert_scores_per_case_study.xlsx
