In this notebook, you will find the protocol used to create the observed sequences in the testing dataset. In the folder `training_data/degradation_data`, you will find the $50$ complete run-to-failure sequences. These sequences are generated from a theoretical model which contains three failure modes:
- infant mortality
- control board failure
- crack growth failure
The three failure modes are "competing", i.e., whichever failure mode occurs first is the one that causes the failure. The sequences are "run-to-failure" because the crack growth process is monitored until a failure occurs (although the failure can be caused by any one of the three failure modes).
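
To make the notion of competing failure modes concrete, here is a minimal sketch. The failure times are made-up numbers for illustration only and are not taken from the theoretical model; `first_failure` is a hypothetical helper name.

```python
def first_failure(failure_times):
    """Return (mode, time) of the failure mode that occurs first.

    `failure_times` maps a failure-mode name to the hypothetical time at
    which that mode alone would cause a failure.
    """
    mode = min(failure_times, key=failure_times.get)
    return mode, failure_times[mode]


# Illustrative values only (in months); not taken from the actual model.
example = {
    "infant mortality": 40,
    "control board failure": 18,
    "crack growth failure": 25,
}
print(first_failure(example))  # prints ('control board failure', 18)
```

The observed time-to-failure is simply the minimum over the three modes, which is why a sequence can end before the crack growth itself becomes critical.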

The testing dataset in the folder `testing_data/group_0` is created from the complete run-to-failure sequences by randomly truncating them at a time point $t_{end}$; you need to predict the remaining useful life from this time point $t_{end}$. The truncation is done as follows:
- If the time-to-failure is less than or equal to $6$, we keep the sequence as it is.
- If the time-to-failure is greater than $6$, the sequence is truncated at a random time point $t_{end}$, which is drawn from a discrete uniform distribution over $[1, ttf-1]$, where $ttf$ denotes the time-to-failure.
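
The truncation rule can be sketched on a plain list of RUL values. This is only an illustration: it assumes one observation per month, with the RUL counting down from the time-to-failure to 0, and `truncate_rul_sequence` is a hypothetical helper, not part of the notebook.

```python
import random


def truncate_rul_sequence(rul_values, rng=random):
    """Truncate a complete run-to-failure RUL sequence.

    Assumes `rul_values` counts down by one per time step from the
    time-to-failure to 0, e.g. [10, 9, ..., 1, 0].
    """
    ttf = rul_values[0]
    if ttf <= 6:
        # Short sequence: kept as it is.
        return list(rul_values)
    # Draw t_end uniformly from the integers 1..ttf-1 and keep the
    # observations at times 0..t_end; the last RUL is then ttf - t_end.
    t_end = rng.randint(1, ttf - 1)
    return list(rul_values[: t_end + 1])


complete = list(range(10, -1, -1))  # time-to-failure of 10 months
observed = truncate_rul_sequence(complete, random.Random(0))
print(observed, "-> RUL to predict:", observed[-1])
```

Because $t_{end}$ never exceeds $ttf-1$, the RUL to predict at the truncation point is always at least $1$.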

The following code generates a "pseudo" testing dataset from the training dataset, which you can use to evaluate the performance of the model you developed:

In [1]:
import os
import pandas as pd
import random

group = 0

directory = 'pseudo_testing_data_with_truth'
directory_truth = 'degradation_data'

if not os.path.exists(directory):
    os.makedirs(directory)

# List all CSV files in the directory
csv_files = [f for f in os.listdir(directory_truth) if f.endswith('.csv')]

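# Truncate each run-to-failure sequence and save it as pseudo testing data
for file_name in csv_files:
    df = pd.read_csv(os.path.join(directory_truth, file_name))
    # The first row holds the time-to-failure; cast to int because random.randint expects integers
    ttf = int(df.iloc[0]['rul (months)'])
    if ttf > 6:
        # Truncate at a random point: the last retained row has an RUL drawn uniformly from [1, ttf-1]
        random_integer = random.randint(1, ttf - 1)
        df = df[df['rul (months)'] >= random_integer]
    else:
        # Time-to-failure of at most 6: no random truncation, only rows with RUL 0 are dropped
        df = df[df['rul (months)'] > 0]
    df.to_csv(os.path.join(directory, file_name), index=False)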

In [2]:
import os
import pandas as pd

directory = 'pseudo_testing_data_with_truth'
directory_student = 'pseudo_testing_data'

if not os.path.exists(directory_student):
    os.makedirs(directory_student)

# Build the ground-truth file and remove the RUL column from the data used for prediction
rows = []
for i in range(50):
    file_name = 'item_' + str(i) + '.csv'
    df = pd.read_csv(os.path.join(directory, file_name))

    # The RUL in the last retained row is the ground truth to predict
    true_rul = df.iloc[-1]['rul (months)']
    rows.append({'item_index': 'item_{}'.format(i),
                 'label': 1 if true_rul <= 6 else 0,
                 'true_rul': true_rul})

    df = df.drop(columns=['rul (months)'])
    df.to_csv(os.path.join(directory_student, file_name), index=False)

solution = pd.DataFrame(rows)
solution.to_csv(os.path.join(directory, 'Solution.csv'), index=False)

After running this script, you will find two folders in the directory:
- `pseudo_testing_data_with_truth`: contains the generated pseudo testing data with the true RUL. In particular, this folder contains a file `Solution.csv` with the ground truth, which you can use directly to evaluate your model.
- `pseudo_testing_data`: contains the testing data without the true RUL.

Below, you will find a script that allows you to evaluate your predictions on the pseudo testing data.

In [None]:
import pandas as pd
import pandas.api.types


class ParticipantVisibleError(Exception):
    # If you want an error message to be shown to participants, you must raise the error as a ParticipantVisibleError
    # All other errors will only be shown to the competition host. This helps prevent unintentional leakage of solution data.
    pass


def score(solution: pd.DataFrame, submission: pd.DataFrame, row_id_column_name: str) -> float:
    '''
    This metric is customized to measure the performance of remaining useful life prediction.
    The participant is asked to predict whether the RUL of an item is at most 6 months: 1 if RUL<=6 and 0 otherwise.
    In the ground truth file "Solution.csv", there is a column "true_rul" as well as a column "label".
    If the predicted label matches the ground truth, a reward of 2 is given.
    If it does not match, then,
    - A penalty of -4 is given, if truth is 1 and prediction is 0;
    - A penalty of -1/60*true_rul is given, if truth is 0 and prediction is 1.

    TODO: Add unit tests. We recommend using doctests so your tests double as usage demonstrations for competition hosts.
    https://docs.python.org/3/library/doctest.html
    # Example doctest for this metric (three correct labels and one false negative):
    >>> import pandas as pd
    >>> row_id_column_name = "item_index"
    >>> solution_data = {'item_index': [0, 1, 2, 3], 'label': [1, 0, 1, 0], 'true_rul': [5, 20, 1, 6]}
    >>> submission_data = {'item_index': [0, 1, 2, 3], 'label': [1, 0, 0, 0]}
    >>> solution = pd.DataFrame(solution_data)
    >>> submission = pd.DataFrame(submission_data)
    >>> score(solution.copy(), submission.copy(), row_id_column_name)
    2
    '''

    # Initialize rewards and penalties
    reward = 2
    penalty_false_positive = -1/60
    penalty_false_negative = -4

    # Compare labels and calculate rewards/penalties.
    # Rows are compared by position, so the submission must list the items in the same order as the solution.
    rewards_penalties = []
    for sol_label, sub_label, true_rul in zip(solution['label'], submission['label'], solution['true_rul']):
        if sol_label == sub_label:
            rewards_penalties.append(reward)
        elif sol_label == 1 and sub_label == 0:
            rewards_penalties.append(penalty_false_negative)
        elif sol_label == 0 and sub_label == 1:
            rewards_penalties.append(penalty_false_positive * true_rul)
        else:
            rewards_penalties.append(0)  # No reward or penalty for unexpected label values (neither 0 nor 1)
    
    return sum(rewards_penalties)


row_id_column_name = "item_index"
solution = pd.read_csv('pseudo_testing_data_with_truth/Solution.csv')

# Put the path to your prediction result here:
submission = 

print(score(solution.copy(), submission.copy(), row_id_column_name))
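
As a sanity check of the metric, the following sketch recomputes the score item by item on a toy example that includes a false positive. It uses the constants set in the body of `score` (reward `2`, false-negative penalty `-4`, false-positive penalty `-1/60` per month of true RUL). The toy data and the helper name `item_score` are hypothetical and only serve as an illustration.

```python
REWARD = 2
PENALTY_FALSE_NEGATIVE = -4
PENALTY_FALSE_POSITIVE_PER_MONTH = -1 / 60


def item_score(true_label, predicted_label, true_rul):
    """Score a single item; labels are assumed to be 0 or 1."""
    if true_label == predicted_label:
        return REWARD
    if true_label == 1 and predicted_label == 0:
        return PENALTY_FALSE_NEGATIVE
    # Remaining case under the 0/1 assumption: truth is 0, prediction is 1.
    return PENALTY_FALSE_POSITIVE_PER_MONTH * true_rul


# Hypothetical toy data: (true_label, predicted_label, true_rul)
items = [
    (1, 1, 5),    # correct: +2
    (0, 1, 30),   # false positive: -30/60
    (1, 0, 2),    # false negative: -4
    (0, 0, 12),   # correct: +2
]
contributions = [item_score(*item) for item in items]
print(contributions, sum(contributions))
```

A missed failure costs as much as two correct predictions earn, while a false alarm on an item with a long remaining life costs comparatively little, so with these constants the metric favors predicting label `1` when in doubt.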