# Multiple Variable Method for obtaining Representative Years
## Normalizing the files
After you obtain the representative-year outputs from the SVM script, you can combine and normalize them for the first phase of the MVM method. Depending on the number of variables, their weights, and the mode you used in the SVM method (CY or 12M), you may have to modify the code. Make sure the folder(s) you want from the SVM method are placed in the same folder as the MVM IPYNB. The code below accesses those folders, reads and normalizes the files, and saves the results in a new folder named 'Normalized'.
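
As a quick sanity check before running the normalization cell, the sketch below builds the folder names expected under the naming pattern used in this notebook (`{location}_{variable}_{scenario}_{mode}_{startyear}_{endyear}`) and reports any that are missing. The helper names `expected_folders` and `missing_folders` are hypothetical and not part of the original script, and the demonstration uses a temporary directory as a stand-in for your real `base_dir`.

```python
import itertools
import os
import tempfile

def expected_folders(year_ranges, loc, var, ssp, mode):
    """Return the SVM output folder names for every input combination."""
    return [
        f'{location}_{variable}_{scenario}_{mode}_{startyear}_{endyear}'
        for (startyear, endyear), location, variable, scenario
        in itertools.product(year_ranges, loc, var, ssp)
    ]

def missing_folders(base_dir, folder_names):
    """Return the folder names that do not exist inside base_dir."""
    return [name for name in folder_names
            if not os.path.isdir(os.path.join(base_dir, name))]

# Demonstration in a temporary directory (hypothetical stand-in for base_dir)
with tempfile.TemporaryDirectory() as demo_dir:
    names = expected_folders([(2030, 2040)], ['FMBORS'], ['THETAO', 'SO'], [585], 'CY')
    os.makedirs(os.path.join(demo_dir, names[0]))  # only the first folder exists
    missing = missing_folders(demo_dir, names)
    print('Expected:', names)
    print('Missing:', missing)
```

If the list of missing folders is not empty, copy those SVM output folders next to the notebook before continuing.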


In [1]:
import pandas as pd
import os

# Start/end year ranges, locations, variables, and scenarios to process.
# Note: the loops below handle multiple year_ranges tuples and multiple entries in the loc, var, and ssp lists.
# This is an example run of the code.

year_ranges = [(2030, 2040)]
loc = ['FMBORS']
var = ['THETAO', 'SO']
ssp = [585]
mode = 'CY'  # use '12M' if you ran SVM for rolling years

# Base directory for saving results
base_dir = 'H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data'
file_endings = ['_jsd', '_symn', '_symd', '_symp', '_representative_years']

## Functions

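def normalize_metric(df, metric_col):
    """Rescale metric_col to 0-1 and invert it, so the smallest metric becomes 1."""
    min_val = df[metric_col].min()
    max_val = df[metric_col].max()
    # Note: if every value is identical (max == min), this division yields NaN
    df[metric_col] = 1 - (df[metric_col] - min_val) / (max_val - min_val)
    return df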

normalized_base_dir = os.path.join(base_dir, 'Normalized')
# Create the output folder if it does not exist yet
os.makedirs(normalized_base_dir, exist_ok=True)

# Normalize each metric to the range 0-1 (inverted scale): a value of 1 marks the most representative year.

for startyear, endyear in year_ranges:
    for location in loc:
        for variable in var:
            for scenario in ssp:
                folder_name = f'{location}_{variable}_{scenario}_{mode}_{startyear}_{endyear}'
                folder_path = os.path.join(base_dir, folder_name)
                # Process each file in the folder
                for ending in file_endings:
                    file_name = f'{folder_name}{ending}.csv'
                    file_path = os.path.join(folder_path, file_name)
                    
                    # Skip the _representative_years file as it doesn't need normalization
                    if ending == '_representative_years':
                        continue
                    
                    # Read the CSV file
                    df = pd.read_csv(file_path)

                    # Determine the metric column name based on the file ending
                    metric_col = 'JSD' if ending == '_jsd' else 'RMSE'
                    
                    # Normalize the metric column
                    df = normalize_metric(df, metric_col)

                    # Save the normalized dataframe to the new folder
                    normalized_file_path = os.path.join(normalized_base_dir, file_name)
                    df.to_csv(normalized_file_path, index=False)
                    print(f'Normalized file saved: {normalized_file_path}')


Normalized file saved: H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data\Normalized\FMBORS_THETAO_585_CY_2030_2040_jsd.csv
Normalized file saved: H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data\Normalized\FMBORS_THETAO_585_CY_2030_2040_symn.csv
Normalized file saved: H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data\Normalized\FMBORS_THETAO_585_CY_2030_2040_symd.csv
Normalized file saved: H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data\Normalized\FMBORS_THETAO_585_CY_2030_2040_symp.csv
Normalized file saved: H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data\Normalized\FMBORS_SO_585_CY_2030_2040_jsd.csv
Normalized file saved: H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data\Normalized\FMBORS_SO_585_CY_2030_2040_symn.csv
Normalized file saved: H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data\Normalized\FMBORS_SO_585_CY_2030_2040_symd.csv
Normalized file saved: H:/M.Sc Thesis - Data, Me
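
To make the inverted min-max scaling concrete, here is a minimal sketch on a toy table. The function name `normalize_metric_safe` is hypothetical, the toy values are illustrative only, and the handling of a constant column (setting every row to 1) is an assumption rather than something the original script does. It also works on a copy instead of modifying the input in place.

```python
import pandas as pd

def normalize_metric_safe(df, metric_col):
    """Inverted min-max scaling: smallest metric -> 1, largest -> 0."""
    out = df.copy()  # leave the caller's DataFrame untouched
    min_val = out[metric_col].min()
    max_val = out[metric_col].max()
    if max_val == min_val:
        # Assumption: with no spread, every year is treated as equally representative
        out[metric_col] = 1.0
    else:
        out[metric_col] = 1 - (out[metric_col] - min_val) / (max_val - min_val)
    return out

# Toy data (illustrative values, not taken from any real run)
toy = pd.DataFrame({'Year': ['sub_ts_2030', 'sub_ts_2031', 'sub_ts_2032'],
                    'JSD': [0.10, 0.30, 0.20]})
toy_norm = normalize_metric_safe(toy, 'JSD')
print(toy_norm)
```

The year with the lowest JSD ends up at 1, the year with the highest at 0, and the year halfway between them at roughly 0.5.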

## Composite Representative Year
The following snippets read the normalized files you choose, so you have to modify the directory, file names, and variable column headers in the script to match your data. The final snippet gives you a representative year that is balanced across the variables you provide.

In [21]:
# Read the normalized CSV files into DataFrames

directory = 'H:/M.Sc Thesis - Data, Methodology, Results/MVM Script and Data/Normalized'

# Define the file names
file_so_jsd = 'FMBORS_SO_585_CY_2030_2040_jsd.csv'
file_thetao_jsd = 'FMBORS_THETAO_585_CY_2030_2040_jsd.csv'

# Read the CSV files
df_so_jsd = pd.read_csv(os.path.join(directory, file_so_jsd))
df_thetao_jsd = pd.read_csv(os.path.join(directory, file_thetao_jsd))

# Create the new DataFrame (rows are paired by position, so both files must list the years in the same order)
new_df = pd.DataFrame()
new_df['Year'] = df_so_jsd.iloc[:, 0]
new_df['SO'] = df_so_jsd.iloc[:, 1]
new_df['THETAO'] = df_thetao_jsd.iloc[:, 1]

# Display the new DataFrame
print(new_df)

           Year        SO    THETAO
0   sub_ts_2030  0.501838  0.643940
1   sub_ts_2031  0.477124  0.769654
2   sub_ts_2032  0.478511  0.564155
3   sub_ts_2033  0.743861  0.453186
4   sub_ts_2034  0.000000  0.293148
5   sub_ts_2035  0.522892  0.691305
6   sub_ts_2036  0.687675  1.000000
7   sub_ts_2037  0.503399  0.811489
8   sub_ts_2038  0.445564  0.000000
9   sub_ts_2039  0.902209  0.872088
10  sub_ts_2040  1.000000  0.247181
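
The cell above pairs rows by position, which is only correct when every file lists the years in the same order. As a hedged alternative, the sketch below aligns the files on their first column with `DataFrame.merge`. The helper `combine_on_year` is hypothetical, the toy frames stand in for the normalized files, and the column names `Year` and `JSD` in them are assumptions.

```python
import pandas as pd

# Toy stand-ins for two normalized files (values are illustrative only);
# the second one lists the years in a different order.
df_a = pd.DataFrame({'Year': ['sub_ts_2030', 'sub_ts_2031', 'sub_ts_2032'],
                     'JSD': [0.5, 1.0, 0.0]})
df_b = pd.DataFrame({'Year': ['sub_ts_2032', 'sub_ts_2030', 'sub_ts_2031'],
                     'JSD': [0.2, 0.0, 1.0]})

def combine_on_year(frames):
    """Merge {variable_name: DataFrame} on the first column, keeping each second column."""
    combined = None
    for name, frame in frames.items():
        part = frame.iloc[:, :2].copy()
        part.columns = ['Year', name]  # rename the metric column after its variable
        if combined is None:
            combined = part
        else:
            # validate raises an error if a year appears more than once in either frame
            combined = combined.merge(part, on='Year', how='inner',
                                      validate='one_to_one')
    return combined

merged = combine_on_year({'SO': df_a, 'THETAO': df_b})
print(merged)
```

Because the merge is keyed on the year label, a reordered or partially missing file cannot silently shift values onto the wrong year; years missing from any file are dropped by the inner join.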


In [20]:
# Set weights for the variables.
# Either use the lines below to weight all variables equally,
# or ignore no_of_var and equal_wt and assign the wt_<variable> values (here wt_so, wt_thetao) directly.
# The weights must sum to 1, e.g. for 4 equally weighted variables: 0.25 + 0.25 + 0.25 + 0.25 = 1
no_of_var = 2
equal_wt = 1 / no_of_var
wt_so = equal_wt
wt_thetao = equal_wt
# Calculate the weighted sum
new_df['weighted_sum'] = new_df['SO'] * wt_so + new_df['THETAO'] * wt_thetao  # Change here if you have more variables
# Distance from the ideal score of 1; smaller means more representative
new_df['abs_diff'] = (1 - new_df['weighted_sum']).abs()
print(new_df)

# Find the 'Year' with the minimum abs_diff
best_idx = new_df['abs_diff'].idxmin()
min_abs_diff_year = new_df.loc[best_idx, 'Year']
residual = new_df.loc[best_idx, 'abs_diff']
print(f'The representative year for your input combination of variables is {min_abs_diff_year} with a residual of {residual}.')

           Year        SO    THETAO  weighted_sum  abs_diff
0   sub_ts_2030  0.501838  0.643940      0.572889  0.427111
1   sub_ts_2031  0.477124  0.769654      0.623389  0.376611
2   sub_ts_2032  0.478511  0.564155      0.521333  0.478667
3   sub_ts_2033  0.743861  0.453186      0.598524  0.401476
4   sub_ts_2034  0.000000  0.293148      0.146574  0.853426
5   sub_ts_2035  0.522892  0.691305      0.607099  0.392901
6   sub_ts_2036  0.687675  1.000000      0.843837  0.156163
7   sub_ts_2037  0.503399  0.811489      0.657444  0.342556
8   sub_ts_2038  0.445564  0.000000      0.222782  0.777218
9   sub_ts_2039  0.902209  0.872088      0.887149  0.112851
10  sub_ts_2040  1.000000  0.247181      0.623591  0.376409
The representative year for your input combination of variables is sub_ts_2039 with a residual of 0.11285147723030597.
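
If you have more than two variables, the weighting step can be wrapped in a small function that takes a mapping of column name to weight. This is a minimal sketch, not part of the original notebook: the function name `composite_representative_year` is hypothetical, and the check that the weights sum to 1 is added here as a safeguard. The demo uses three rows copied from the table printed above.

```python
import math
import pandas as pd

def composite_representative_year(df, weights):
    """Return (year, residual, scored DataFrame) for a {column: weight} mapping."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f'Weights must sum to 1, got {total}')
    scored = df.copy()
    # Weighted sum of the normalized metrics across all listed variables
    scored['weighted_sum'] = sum(scored[col] * w for col, w in weights.items())
    # Distance from the ideal score of 1
    scored['abs_diff'] = (1 - scored['weighted_sum']).abs()
    best = scored['abs_diff'].idxmin()
    return scored.loc[best, 'Year'], scored.loc[best, 'abs_diff'], scored

# Three rows copied from the table printed above
demo = pd.DataFrame({'Year': ['sub_ts_2036', 'sub_ts_2039', 'sub_ts_2040'],
                     'SO': [0.687675, 0.902209, 1.000000],
                     'THETAO': [1.000000, 0.872088, 0.247181]})
year, residual, scored = composite_representative_year(demo, {'SO': 0.5, 'THETAO': 0.5})
print(year, residual)
```

With equal weights this selects sub_ts_2039, matching the result above. To weight the variables unequally, pass a different mapping, as long as the weights still sum to 1.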
