<a href="https://www.kaggle.com/code/shyamgupta196/explaining-classification-metric-rsna2024?scriptVersionId=179642821" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# RSNA 2024 Lumbar Spine Degenerative Classification Metric Explanation

This metric has been provided by the host of the competition. Below is the explanation of the code ✨

## ParticipantVisibleError Class

The **ParticipantVisibleError** class is a custom exception class that inherits from the built-in **Exception** class. **score** raises it when an input check fails (non-numeric, non-finite, or negative predictions, or negative labels), so that the error message is shown to the participant.
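
A minimal sketch of how the exception is used. `check_non_negative` is a hypothetical helper that mirrors the "at least zero" check inside **score**. It is not part of the host's code.

```python
from typing import Sequence


class ParticipantVisibleError(Exception):
    """Error whose message is meant to be shown to the participant."""


def check_non_negative(predictions: Sequence[float]) -> None:
    # Hypothetical helper mirroring the "at least zero" check in score()
    if min(predictions) < 0:
        raise ParticipantVisibleError('All predictions must be at least zero')


message = ''
try:
    check_non_negative([0.7, 0.4, -0.1])
except ParticipantVisibleError as err:
    # This message is what the participant would see
    message = str(err)

print(message)
```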

## get_condition Function

The **get_condition** function takes a string **full_location**, a row identifier such as **spinal_canal_stenosis_l1_l2**. Inside **score** it is applied to the full row id, which also carries the study id as a prefix. The function checks, in order, whether any of the predefined conditions (**'spinal'**, **'foraminal'**, **'subarticular'**) occurs as a substring of **full_location**, and returns the first one it finds. If none of the conditions are found, it raises a **ValueError**.
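
The parsing that **score** applies to each row id can be sketched as follows. The study id `12345` is made up, the row names assume the naming pattern from the code comment, and `split_row_id` is a hypothetical helper that reproduces the three `apply` calls in **score**.

```python
from typing import Tuple


def get_condition(full_location: str) -> str:
    # Return the first known condition name that appears as a substring
    for injury_condition in ['spinal', 'foraminal', 'subarticular']:
        if injury_condition in full_location:
            return injury_condition
    raise ValueError(f'condition not found in {full_location}')


def split_row_id(row_id: str) -> Tuple[str, str, str]:
    # Hypothetical helper: the first token is the study id, and everything
    # after it (condition + level) is the "location", as in score()
    parts = row_id.split('_')
    study_id = parts[0]
    location = '_'.join(parts[1:])
    return study_id, location, get_condition(row_id)


for row_id in ['12345_spinal_canal_stenosis_l1_l2',
               '12345_left_neural_foraminal_narrowing_l4_l5',
               '12345_right_subarticular_stenosis_l5_s1']:
    print(split_row_id(row_id))
```

Because the check is a plain substring test, the order of the list would only matter for a name containing more than one keyword. Each of these names matches exactly one.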

## score Function

The **score** function is more complex. It takes four arguments: two pandas DataFrames, **solution** and **submission**, a string **row_id_column_name**, and a float **any_severe_scalar**. The function performs four steps:

- It computes the sample-weighted log loss for each of the three conditions.
- It derives a per-study '**any_severe**' label from the spinal rows. The label is 1 if any spinal level in the study is labelled severe.
- It computes the sample-weighted log loss for that new label.
- It returns a weighted average of the four losses as the final score. Each condition is weighted by 1 divided by its number of distinct locations, and the '**any_severe**' loss is weighted by **any_severe_scalar**.

This weighting normalizes for the number of columns in each group, which mitigates the impact of '**spinal stenosis**' having only half as many columns as the other two conditions.
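
To make the weighting concrete, here is a small numpy sketch with made-up numbers. `weighted_log_loss` is a hypothetical stand-in for `sklearn.metrics.log_loss` with `sample_weight`. Unlike sklearn, it does not clip probabilities. The location counts assume five spinal levels and ten left/right locations for each of the other two conditions.

```python
import numpy as np


def weighted_log_loss(y_true, y_pred, sample_weight) -> float:
    """Sample-weighted multi-class log loss for one-hot labels.

    Unlike sklearn.metrics.log_loss, this sketch does not clip
    probabilities, so every predicted probability must be > 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Per-row cross-entropy: -sum_k y_ik * log(p_ik)
    per_row = -(y_true * np.log(y_pred)).sum(axis=1)
    # Weighted mean over rows: sum_i w_i * loss_i / sum_i w_i
    return float(np.average(per_row, weights=sample_weight))


# Toy data: columns are normal_mild, moderate, severe (illustrative only)
y_true = [[1, 0, 0], [0, 0, 1]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.3, 0.5]]
spinal_loss = weighted_log_loss(y_true, y_pred, sample_weight=[1, 4])

# Combine group losses as score() does: each condition is weighted by
# 1 / (number of distinct locations), and the any_severe loss by
# any_severe_scalar. All other numbers below are made up.
n_locations = {'spinal': 5, 'foraminal': 10, 'subarticular': 10}
group_losses = {'spinal': spinal_loss, 'foraminal': 0.8, 'subarticular': 0.9}
any_severe_loss = 0.4
any_severe_scalar = 1.0  # assumed value for illustration

losses = [group_losses[c] for c in n_locations] + [any_severe_loss]
weights = [1 / n_locations[c] for c in n_locations] + [any_severe_scalar]
final_score = float(np.average(losses, weights=weights))
print(round(spinal_loss, 4), round(final_score, 4))
```

Since `np.average` divides by the sum of the weights, the final score is a convex combination of the four losses. Raising **any_severe_scalar** therefore pulls the score toward the any_severe loss.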


```python
import numpy as np
import pandas as pd
import pandas.api.types
import sklearn.metrics


class ParticipantVisibleError(Exception):
    """Raised for input errors whose message should be shown to the participant."""


def get_condition(full_location: str) -> str:
    # Given an input like spinal_canal_stenosis_l1_l2 extracts 'spinal'
    for injury_condition in ['spinal', 'foraminal', 'subarticular']:
        if injury_condition in full_location:
            return injury_condition
    raise ValueError(f'condition not found in {full_location}')


def score(
        solution: pd.DataFrame,
        submission: pd.DataFrame,
        row_id_column_name: str,
        any_severe_scalar: float
    ) -> float:
    '''
    Pseudocode:
    1. Calculate the sample weighted log loss for each medical condition.
    2. Derive a new any_severe label.
    3. Calculate the sample weighted log loss for the new any_severe label.
    4. Return the average of all of the label group log losses as the final score, normalized for the number of columns in each group.
       This mitigates the impact of spinal stenosis having only half as many columns as the other two conditions.
    '''

    target_levels = ['normal_mild', 'moderate', 'severe']

    # Work on copies so the caller's DataFrames are not modified in place
    solution = solution.copy()
    submission = submission.copy()

    # Run basic QC checks on the inputs
    if not pandas.api.types.is_numeric_dtype(submission[target_levels].values):
        raise ParticipantVisibleError('All submission values must be numeric')

    if not np.isfinite(submission[target_levels].values).all():
        raise ParticipantVisibleError('All submission values must be finite')

    if solution[target_levels].min().min() < 0:
        raise ParticipantVisibleError('All labels must be at least zero')
    if submission[target_levels].min().min() < 0:
        raise ParticipantVisibleError('All predictions must be at least zero')

    # Row ids look like '<study_id>_<condition>_<level>'; use the configured column name
    row_ids = solution[row_id_column_name]
    solution['study_id'] = row_ids.apply(lambda x: x.split('_')[0])
    solution['location'] = row_ids.apply(lambda x: '_'.join(x.split('_')[1:]))
    solution['condition'] = row_ids.apply(get_condition)

    del solution[row_id_column_name]
    del submission[row_id_column_name]
    assert sorted(submission.columns) == sorted(target_levels)

    # Assumes solution and submission rows share the same index (same row order)
    submission['study_id'] = solution['study_id']
    submission['location'] = solution['location']
    submission['condition'] = solution['condition']

    condition_losses = []
    condition_weights = []
    for condition in ['spinal', 'foraminal', 'subarticular']:
        condition_indices = solution.loc[solution['condition'] == condition].index.values
        condition_loss = sklearn.metrics.log_loss(
            y_true=solution.loc[condition_indices, target_levels].values,
            y_pred=submission.loc[condition_indices, target_levels].values,
            sample_weight=solution.loc[condition_indices, 'sample_weight'].values
        )
        condition_losses.append(condition_loss)
        # Weight each condition by 1 / (number of distinct locations it covers)
        condition_weights.append(1 / solution.loc[condition_indices, 'location'].nunique())

    # any_severe_spinal: per study, 1 if any spinal level is labelled severe
    spinal_solution = solution.loc[solution['condition'] == 'spinal']
    spinal_submission = submission.loc[submission['condition'] == 'spinal']
    any_severe_spinal_labels = spinal_solution.groupby('study_id')['severe'].max()
    any_severe_spinal_weights = spinal_solution.groupby('study_id')['sample_weight'].max()
    # Predictions must come from the submission, not from the solution labels
    any_severe_spinal_predictions = spinal_submission.groupby('study_id')['severe'].max()
    any_severe_spinal_loss = sklearn.metrics.log_loss(
        y_true=any_severe_spinal_labels,
        y_pred=any_severe_spinal_predictions,
        sample_weight=any_severe_spinal_weights,
        labels=[0, 1]  # avoids an error when every study has the same label
    )
    condition_losses.append(any_severe_spinal_loss)
    condition_weights.append(any_severe_scalar)
    return np.average(condition_losses, weights=condition_weights)
```
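
The any_severe step can be illustrated on a toy frame of spinal rows. The study ids, labels, weights, and probabilities below are invented. The frames are assumed to already carry the `study_id` column that **score** derives from the row ids. The binary log loss is computed by hand here rather than with `sklearn.metrics.log_loss`.

```python
import numpy as np
import pandas as pd

# Toy spinal rows for two studies (illustrative values, not competition data)
solution = pd.DataFrame({
    'study_id': ['A', 'A', 'B', 'B'],
    'severe': [0, 1, 0, 0],
    'sample_weight': [1.0, 4.0, 1.0, 2.0],
})
submission = pd.DataFrame({
    'study_id': ['A', 'A', 'B', 'B'],
    'severe': [0.1, 0.7, 0.2, 0.05],
})

# A study is "any severe" if at least one spinal level is labelled severe
labels = solution.groupby('study_id')['severe'].max()
# Use the largest sample weight among the study's spinal rows
weights = solution.groupby('study_id')['sample_weight'].max()
# Study-level prediction: the highest severe probability across levels
predictions = submission.groupby('study_id')['severe'].max()

# groupby sorts by study_id, so the three Series are aligned row by row
y = labels.to_numpy()
p = predictions.to_numpy()
w = weights.to_numpy()
per_study = -(y * np.log(p) + (1 - y) * np.log(1 - p))
any_severe_loss = float(np.average(per_study, weights=w))

print(pd.DataFrame({'label': labels, 'weight': weights, 'prediction': predictions}))
print(round(any_severe_loss, 4))
```

Taking the maximum `severe` value per study is how **score** turns per-level values into study-level ones. If the predictions were grouped from the solution instead of the submission, they would simply equal the labels.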