# Brier Score

The Brier score is a metric used to measure the accuracy of probabilistic predictions. 
In the context of survival analysis, it quantifies how close the predicted survival probabilities are to the actual outcomes.

The Brier score ranges from 0 to 1, where 0 indicates perfect accuracy and 1 indicates the worst possible prediction.
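To make the scale concrete, the sketch below evaluates three reference cases on a toy set of binary outcomes: perfect predictions, an uninformative constant prediction of 0.5, and predictions that are always confidently wrong. The helper `squared_error_score` and the toy data are hypothetical and only serve this illustration.

```python
import numpy as np

def squared_error_score(outcomes, predicted_probs):
    """Mean squared difference between binary outcomes and predicted probabilities."""
    outcomes = np.asarray(outcomes, dtype=float)
    predicted_probs = np.asarray(predicted_probs, dtype=float)
    return np.mean((outcomes - predicted_probs) ** 2)

# Hypothetical binary outcomes (1 = event occurred, 0 = no event)
outcomes = np.array([1, 0, 1, 1])

perfect = squared_error_score(outcomes, outcomes)                # predictions equal the outcomes
uninformative = squared_error_score(outcomes, np.full(4, 0.5))   # always predict 0.5
worst = squared_error_score(outcomes, 1 - outcomes)              # always confidently wrong

print(perfect, uninformative, worst)  # 0.0 0.25 1.0
```

A constant prediction of 0.5 always scores 0.25, so a model scoring above that is doing worse than one that never commits either way.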

In [2]:
import numpy as np

def brier_score(y_true, y_pred, event_observed):
    """
    Compute a naive Brier score for survival analysis (no censoring adjustment).

    Parameters:
    -----------
    y_true : array-like, shape (n_samples,)
        Actual survival/censoring times. Not used by this naive version.
    y_pred : array-like, shape (n_samples,)
        Predicted probabilities that the event has occurred by a specific time
        (i.e. 1 minus the predicted survival probability at that time).
    event_observed : array-like, shape (n_samples,)
        Event indicator: 1 if event occurred, 0 if censored.

    Returns:
    --------
    float
        The Brier score.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    event_observed = np.asarray(event_observed)

    # A proper treatment of censored data uses inverse probability of
    # censoring weighting (IPCW). For simplicity, IPCW is ignored here and
    # censored subjects are treated as "no event", which is only appropriate
    # for uncensored data or as an illustrative example.
    return np.mean((event_observed - y_pred) ** 2)

# Example usage:
y_true = [5, 6, 7, 10]
y_pred = [0.9, 0.9, 0.1, 0.9]  # predicted event probabilities at a given time
event_observed = [1, 1, 0, 1]  # 1=event occurred, 0=censored
score = brier_score(y_true, y_pred, event_observed)
print("Brier score:", score)


Brier score: 0.009999999999999997


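The lower the score, the better the prediction. Here the score is about 0.01 because every predicted value is within 0.1 of the observed indicator.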

#### Pros and Cons of the Brier Score

 **Pros:**
 - Simple to compute and interpret.
 - Provides a direct measure of the accuracy of probabilistic predictions.
 - Can be used for both binary and survival analysis settings.

 **Cons:**
 - Sensitive to censoring; naive computation does not account for censored data properly.
 - Requires adjustment (e.g., IPCW) for use with censored survival data, which can complicate implementation.
 - Combines calibration and discrimination into a single number, so it does not show which of the two a model is weak on.
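The IPCW adjustment mentioned in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names `km_censoring_survival` and `ipcw_brier_score` are hypothetical, `surv_prob` is assumed to hold predicted survival probabilities at the evaluation time `t`, the weights come from a Kaplan-Meier estimate of the censoring distribution evaluated at each subject's own time (some references use the left limit instead), and ties between event and censoring times get no special treatment.

```python
import numpy as np

def km_censoring_survival(times, event_observed):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Censoring is treated as the "event" of interest here, so the roles of the
    event indicator are flipped.
    """
    times = np.asarray(times, dtype=float)
    censored = 1 - np.asarray(event_observed, dtype=int)
    uniq = np.unique(times)  # sorted distinct times
    surv = np.empty(len(uniq))
    s = 1.0
    for k, u in enumerate(uniq):
        at_risk = np.sum(times >= u)
        n_censored = np.sum((times == u) & (censored == 1))
        s *= 1.0 - n_censored / at_risk
        surv[k] = s

    def G(t):
        # Step function: value at the last distinct time <= t, or 1 before any time
        idx = np.searchsorted(uniq, t, side="right") - 1
        return 1.0 if idx < 0 else float(surv[idx])

    return G

def ipcw_brier_score(times, event_observed, surv_prob, t):
    """IPCW-weighted Brier score at time t (sketch; assumes all weights are > 0)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(event_observed, dtype=int)
    surv_prob = np.asarray(surv_prob, dtype=float)
    G = km_censoring_survival(times, events)

    total = 0.0
    for T, d, s in zip(times, events, surv_prob):
        if T <= t and d == 1:
            # Event observed by time t: true survival status is 0
            total += (0.0 - s) ** 2 / G(T)
        elif T > t:
            # Still under observation at time t: true survival status is 1
            total += (1.0 - s) ** 2 / G(t)
        # Censored before t: status unknown, contributes weight 0
    return total / len(times)

# Hypothetical data: the second subject is censored at time 3
times = [2, 3, 6, 8]
event_observed = [1, 0, 1, 1]
surv_prob = [0.1, 0.5, 0.8, 0.9]  # predicted P(T > 5) for each subject
print("IPCW Brier score at t=5:", ipcw_brier_score(times, event_observed, surv_prob, t=5))
```

When no subject is censored, every weight equals 1 and the function reduces to the plain mean squared error between the survival status at `t` and the predicted survival probability.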


# Concordance Index

The concordance index (C-index) is a metric used to evaluate the predictive accuracy of risk scores in survival analysis.
It measures the agreement between the predicted risk scores and the actual order of observed survival times.

A higher C-index indicates better model discrimination, with a value of 1.0 representing perfect prediction and 0.5 indicating random chance.
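One common way to write this down, given here as a sketch, counts comparable pairs and gives half credit to tied scores. The symbols are assumptions introduced for this formula: $T_i$ is the observed time of subject $i$, $\delta_i$ is its event indicator (1 if the event occurred, 0 if censored), and $\eta_i$ is its predicted risk score.

```latex
C = \frac{\sum_{i \neq j} \delta_i \, \mathbb{1}[T_i < T_j]
          \left( \mathbb{1}[\eta_i > \eta_j] + \tfrac{1}{2}\, \mathbb{1}[\eta_i = \eta_j] \right)}
         {\sum_{i \neq j} \delta_i \, \mathbb{1}[T_i < T_j]}
```

The denominator counts pairs in which subject $i$ has an observed event strictly before $T_j$; the numerator counts those pairs in which subject $i$ also received the higher risk score, plus half of the pairs with equal scores.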


In [None]:
import numpy as np  # needed for np.nan below

def concordance_index(event_times, predicted_scores, event_observed):
    """
    Compute the concordance index (C-index) for survival predictions.

    Parameters
    ----------
    event_times : array-like
        Actual observed survival times.
    predicted_scores : array-like
        Predicted risk scores (higher means higher risk).
    event_observed : array-like
        Event indicator (1 if event occurred, 0 if censored).

    Returns
    -------
    c_index : float
        Concordance index value between 0 and 1.
    """
    n = 0  # number of comparable pairs
    n_concordant = 0
    n_tied = 0

    for i in range(len(event_times)):
        for j in range(len(event_times)):
            if i == j:
                continue
            # A pair (i, j) is comparable only if subject i had an observed event
            # strictly before subject j's observed time
            if event_times[i] < event_times[j] and event_observed[i] == 1:
                n += 1
                if predicted_scores[i] > predicted_scores[j]:
                    n_concordant += 1
                elif predicted_scores[i] == predicted_scores[j]:
                    n_tied += 1
    if n == 0:
        return np.nan
    return (n_concordant + 0.5 * n_tied) / n

# Example usage:
event_times = [5, 6, 7, 10]
predicted_scores = [0.9, 0.9, 0.1, 0.9]  # higher means higher risk
event_observed = [1, 1, 0, 1]
c_index = concordance_index(event_times, predicted_scores, event_observed)
print("Concordance index:", c_index)


Concordance index: 0.7


 
#### Pros and Cons of the Concordance Index

 **Pros:**
 - Intuitive interpretation: Measures the probability that, for a randomly selected comparable pair, the model correctly predicts which subject experiences the event first.
 - Handles censored data: Can be used even when some survival times are censored.
 - Model-agnostic: Does not require assumptions about the underlying survival distribution or model type.
 - Widely used: Standard metric for comparing survival models.

 **Cons:**
 - Ignores calibration: Only assesses ranking, not the absolute accuracy of predicted survival times.
 - Sensitive to ties: May be less informative when there are many tied predicted scores or event times.
 - Pairwise computation: A naive implementation compares every pair of subjects, so its cost grows quadratically with the number of samples and can be expensive for large datasets.
 - Static: Does not account for time-dependent covariates or dynamic predictions.
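The cost of the pairwise loop can be reduced in practice, though not in asymptotic complexity, by letting NumPy compare all pairs at once. The sketch below assumes the same pair-counting rule as the loop-based function (comparable pairs need an observed event strictly before the other subject's time, and tied scores count as half). The name `concordance_index_vectorized` is hypothetical, and the approach still needs memory for an n-by-n boolean matrix.

```python
import numpy as np

def concordance_index_vectorized(event_times, predicted_scores, event_observed):
    """C-index via NumPy broadcasting (still quadratic in time and memory)."""
    t = np.asarray(event_times, dtype=float)
    s = np.asarray(predicted_scores, dtype=float)
    e = np.asarray(event_observed, dtype=bool)

    # comparable[i, j] is True when subject i had an observed event
    # strictly before subject j's observed time
    comparable = (t[:, None] < t[None, :]) & e[:, None]
    n = comparable.sum()
    if n == 0:
        return np.nan

    # Among comparable pairs, count those where the earlier subject has the
    # higher risk score, and those where the two scores are equal
    concordant = (comparable & (s[:, None] > s[None, :])).sum()
    tied = (comparable & (s[:, None] == s[None, :])).sum()
    return (concordant + 0.5 * tied) / n

# Example usage with the same data as the loop-based version
event_times = [5, 6, 7, 10]
predicted_scores = [0.9, 0.9, 0.1, 0.9]  # higher means higher risk
event_observed = [1, 1, 0, 1]
print("Concordance index:", concordance_index_vectorized(event_times, predicted_scores, event_observed))
```

For datasets too large to hold an n-by-n matrix, the same comparison can be applied to blocks of rows at a time.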
