## Function to Add Relevance Scores to Feature Files

This function reads an evaluation file into a dataframe and adds its relevance scores to feature files. It takes three arguments:

- `evaluation_file_path`: the path to the evaluation file containing the relevance scores.
- `features_dir_path`: the directory path containing the feature files.
- `output_dir_path`: the path to the directory where the updated feature files with relevance scores will be saved.
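As a rough illustration of the evaluation file that the first argument points to, the sketch below parses a few made-up qrels lines with the same `pd.read_csv` settings the function uses (space-delimited, no header, four columns) and sets negative scores to 0. The sample lines and document IDs are invented for illustration and are not taken from the Touché files.

```python
import io

import pandas as pd

# Hypothetical qrels content: topic id, placeholder, document id, relevance score
sample_qrels = io.StringIO(
    "51 0 doc-a 2\n"
    "51 0 doc-b -2\n"
    "52 0 doc-c 1\n"
)

evaluation_df = pd.read_csv(
    sample_qrels,
    header=None,
    delimiter=" ",
    names=["qid", "placeholder", "docno", "relevance_score"],
)

# Turn all negative relevance scores into 0, as the function does
evaluation_df.loc[evaluation_df["relevance_score"] < 0, "relevance_score"] = 0

print(evaluation_df)
```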

The function loops through every file in the directory whose name ends in `_features.csv`, reads it, and merges the relevance scores from the evaluation file into it by matching document number (`docno`). Negative relevance scores are set to 0 before merging. Rows for which no relevance score is found are dropped.
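The merge-and-drop step can be sketched on toy data. The column `feature_1` and all values are placeholders; the real feature files are only assumed to contain a `docno` column, since that is the key the function merges on.

```python
import pandas as pd

# Hypothetical feature rows; only the `docno` column is required by the merge
features_df = pd.DataFrame(
    {"docno": ["doc-a", "doc-b", "doc-x"], "feature_1": [0.1, 0.2, 0.3]}
)

# Hypothetical relevance judgements; doc-x has none
evaluation_df = pd.DataFrame(
    {"docno": ["doc-a", "doc-b"], "relevance_score": [2, 0]}
)

# Left merge keeps every feature row; unmatched rows get NaN as relevance score
merged_df = pd.merge(features_df, evaluation_df, on=["docno"], how="left")

# Dropping the NaN rows removes doc-x, which has no judgement
merged_df = merged_df.dropna(subset=["relevance_score"])

print(merged_df)
```

As long as the evaluation file itself has no missing scores, a left merge followed by `dropna` on the score column keeps the same rows as `how="inner"` would. The intermediate NaN also turns the relevance column into floats.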

Each updated feature file is saved as a new CSV file, named after the original file with the suffix `_with_relevance.csv`, in the specified output directory.

The `docno` column is kept in each row of the output files.

Once all the feature files have been processed, the function prints a message indicating that the process is finished.

### Output
The function outputs updated feature files with relevance scores for each feature file in the input directory. The updated feature files are saved in the specified output directory.

### Required Libraries
The function requires the following Python libraries:

- pandas
- os

In [1]:
#Imports
import os
import pandas as pd

In [2]:
#Global variables
data_path = '/Users/balazs/Desktop/dissertationProjectCode/dissertationCodeBase/'

In [3]:
def add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path):
    # Read the evaluation file and create a dataframe
    evaluation_df = pd.read_csv(evaluation_file_path, header=None, delimiter=' ', names=["qid", "placeholder", "docno", "relevance_score"])
    
    # Turn all negative relevance scores into 0
    evaluation_df.loc[evaluation_df['relevance_score'] < 0, 'relevance_score'] = 0
    
    # Loop through all the feature extraction files
    features_files_path = [os.path.join(features_dir_path, filename) for filename in os.listdir(features_dir_path) if filename.endswith('_features.csv')]
    
    for features_file_path in features_files_path:
        
        # Read the current features file
        features_df = pd.read_csv(features_file_path, header=0, sep=',')

        # Merge the relevance scores into the features dataframe based on matching document number
        merged_df = pd.merge(features_df, evaluation_df[["docno", "relevance_score"]], on=["docno"], how="left")
        merged_df.dropna(subset=["relevance_score"], inplace=True)  # Drop rows where relevance score is not found
        
        # Save the merged dataframe to a new CSV file (with a header row) in the specified output directory
        file_name = os.path.basename(os.path.splitext(features_file_path)[0]) + "_with_relevance.csv"
        updated_features_file_path = os.path.join(output_dir_path, file_name)
        merged_df.to_csv(updated_features_file_path, sep=',', index=False, header=True)

    print("Relevance scores added to all feature extraction files!")
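One thing to watch: the merge key is `docno` alone. If the same document is judged for more than one topic in the evaluation file, a `docno`-only merge repeats the feature row once per judgement. The sketch below shows this on invented data and, assuming the feature files also carry a `qid` column (not confirmed by the function itself), how merging on both keys avoids it.

```python
import pandas as pd

# Hypothetical feature file for topic 51; the `qid` column is an assumption
features_df = pd.DataFrame({"qid": [51], "docno": ["doc-a"], "feature_1": [0.1]})

# Hypothetical judgements: doc-a was judged for two different topics
evaluation_df = pd.DataFrame(
    {"qid": [51, 52], "docno": ["doc-a", "doc-a"], "relevance_score": [2, 0]}
)

# Merging on docno only yields one output row per judgement of doc-a
by_docno = pd.merge(
    features_df, evaluation_df[["docno", "relevance_score"]], on=["docno"], how="left"
)

# Merging on (qid, docno) keeps only the judgement for the matching topic
by_qid_docno = pd.merge(
    features_df,
    evaluation_df[["qid", "docno", "relevance_score"]],
    on=["qid", "docno"],
    how="left",
)

print(len(by_docno), len(by_qid_docno))
```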

In [4]:
# Paths for adding relevance scores
evaluation_file_path = data_path + 'Evaluation/ev_files_21/touche-task1-51-100-relevance.qrels'
features_dir_path = data_path + 'Data/features_extracted_2021/lingustic_sentiment_features'
output_dir_path = data_path + 'Data/merged_features_relevance_2021/linguistic_sentiment_features_relevance'

# Call the function to add relevance scores to the feature extraction files
add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path)

Relevance scores added to all feature extraction files!


In [5]:
# Paths for adding relevance scores
evaluation_file_path = data_path + 'Evaluation/ev_files_21/touche-task1-51-100-relevance.qrels'
features_dir_path = data_path + 'Data/features_extracted_2021/sbert_features'
output_dir_path = data_path + "Data/merged_features_relevance_2021/sbert_features_relevance"

# Call the function to add relevance scores to the feature extraction files
add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path)

Relevance scores added to all feature extraction files!


In [6]:
# Paths for adding relevance scores
evaluation_file_path = data_path + 'Evaluation/ev_files_21/touche-task1-51-100-relevance.qrels'
features_dir_path = data_path + 'Data/features_extracted_2021/sentiment_sarcasm_features'
output_dir_path = data_path + "Data/merged_features_relevance_2021/sentiment_sarcasm_features_relevance"

# Call the function to add relevance scores to the feature extraction files
add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path)

Relevance scores added to all feature extraction files!


In [7]:
# Paths for adding relevance scores
evaluation_file_path = data_path + 'Evaluation/ev_files_21/touche-task1-51-100-relevance.qrels'
features_dir_path = data_path + 'Data/features_extracted_2021/tf_idf_features'
output_dir_path = data_path + "Data/merged_features_relevance_2021/tf_idf_features_relevance"

# Call the function to add relevance scores to the feature extraction files
add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path)

Relevance scores added to all feature extraction files!


In [None]:
# Paths for adding relevance scores
evaluation_file_path = data_path + 'Evaluation/ev_files_20/touche2020-task1-relevance-args-me-corpus-version-2020-04-01.qrels'
features_dir_path = data_path + 'Data/features_extracted_2020/lingustic_sentiment_features'
output_dir_path = data_path + "Data/merged_features_relevance_2020/linguistic_sentiment_features_relevance"

# Call the function to add relevance scores to the feature extraction files
#add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path)

In [8]:
# Paths for adding relevance scores
evaluation_file_path = data_path + 'Evaluation/ev_files_20/touche2020-task1-relevance-args-me-corpus-version-2020-04-01.qrels'
features_dir_path = data_path + 'Data/features_extracted_2020/sbert_features'
output_dir_path = data_path + "Data/merged_features_relevance_2020/sbert_features_relevance"

# Call the function to add relevance scores to the feature extraction files
add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path)

Relevance scores added to all feature extraction files!


In [9]:
# Paths for adding relevance scores
evaluation_file_path = data_path + 'Evaluation/ev_files_20/touche2020-task1-relevance-args-me-corpus-version-2020-04-01.qrels'
features_dir_path = data_path + 'Data/features_extracted_2020/sentiment_sarcasm_features'
output_dir_path = data_path + "Data/merged_features_relevance_2020/sentiment_sarcasm_features_relevance"

# Call the function to add relevance scores to the feature extraction files
add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path)

Relevance scores added to all feature extraction files!


In [10]:
# Paths for adding relevance scores
evaluation_file_path = data_path + 'Evaluation/ev_files_20/touche2020-task1-relevance-args-me-corpus-version-2020-04-01.qrels'
features_dir_path = data_path + 'Data/features_extracted_2020/tf_idf_features'
output_dir_path = data_path + "Data/merged_features_relevance_2020/tf_idf_features_relevance"

# Call the function to add relevance scores to the feature extraction files
add_relevance_score_to_features(evaluation_file_path, features_dir_path, output_dir_path)

Relevance scores added to all feature extraction files!
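
The calls above differ only in year and feature set, so the path pairs could be generated rather than typed out. The following sketch builds them; `build_paths` and `FEATURE_SETS` are hypothetical names, and the input directory spelling `lingustic_sentiment_features` is copied as it appears in the calls above.

```python
import os

# Input directory name -> output directory name, copied from the calls above
FEATURE_SETS = {
    "lingustic_sentiment_features": "linguistic_sentiment_features_relevance",
    "sbert_features": "sbert_features_relevance",
    "sentiment_sarcasm_features": "sentiment_sarcasm_features_relevance",
    "tf_idf_features": "tf_idf_features_relevance",
}


def build_paths(data_path, year):
    """Return (features_dir_path, output_dir_path) pairs for one year (hypothetical helper)."""
    pairs = []
    for input_name, output_name in FEATURE_SETS.items():
        features_dir = os.path.join(data_path, "Data", f"features_extracted_{year}", input_name)
        output_dir = os.path.join(data_path, "Data", f"merged_features_relevance_{year}", output_name)
        pairs.append((features_dir, output_dir))
    return pairs


# Placeholder base path; substitute the real project directory
for features_dir, output_dir in build_paths("/path/to/project", 2021):
    print(features_dir, "->", output_dir)
```

Each pair could then be passed to `add_relevance_score_to_features` together with the evaluation file for that year.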
