<a href="https://colab.research.google.com/github/sevenjunebaby/AiModels/blob/main/likelihoodratios.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Pre-test vs. post-test analysis**

In disease diagnosis, pre-test vs. post-test analysis evaluates a machine learning model by how much its prediction changes the estimated probability that a person has the disease.

**Pre-test probability:** the probability that a person has the disease before taking the test, i.e. the prevalence. In this example it is about 10.37% (the dataset is generated with a nominal positive-class weight of 10%).

**Post-test probability:** the probability that a person has the disease once the test result is known. For a positive result it is obtained from the positive likelihood ratio (LR+): post-test odds = pre-test odds × LR+.

**EXAMPLE**

In [6]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import class_likelihood_ratios

# Generate a classification dataset
X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a logistic regression model
estimator = LogisticRegression().fit(X_train, y_train)

# Predict the test set
y_pred = estimator.predict(X_test)

# Compute both likelihood ratios and report LR+
pos_LR, neg_LR = class_likelihood_ratios(y_test, y_pred)
print(f"LR+: {pos_LR:.3f}")

LR+: 12.617
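
The LR+ printed above can be converted into a post-test probability by working in odds. The following is a minimal sketch, not part of the original notebook: the helper name `post_test_probability` is hypothetical, and the prevalence of 10.37% and LR+ of 12.617 are simply the values quoted in this section.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio (hypothetical helper)."""
    # Convert the probability to odds: p / (1 - p)
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)
    # A likelihood ratio multiplies the odds, not the probability
    post_test_odds = pre_test_odds * likelihood_ratio
    # Convert the odds back to a probability: o / (1 + o)
    return post_test_odds / (1.0 + post_test_odds)


prevalence = 0.1037  # pre-test probability quoted in this section
lr_plus = 12.617     # LR+ printed by the cell above

post = post_test_probability(prevalence, lr_plus)
print(f"Post-test probability after a positive prediction: {post:.1%}")
```

With these inputs, a positive prediction raises the probability of disease from about 10% to roughly 59%, which is what "the test is useful" means in practice.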


## What are likelihood ratios?

In diagnostic testing:

Positive Likelihood Ratio (LR+) = Sensitivity / (1 - Specificity)

→ How much more likely a positive result is in someone with the disease compared to someone without.

Negative Likelihood Ratio (LR−) = (1 - Sensitivity) / Specificity

→ How much less likely a negative result is in someone with the disease compared to someone without.

An LR+ above 10 and an LR− below 0.1 are usually taken to indicate a useful diagnostic test; a ratio close to 1 means the test result carries almost no information.
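
To make the two definitions concrete, here is a small sketch that computes both ratios directly from confusion-matrix counts. The function name `likelihood_ratios` and the counts are illustrative assumptions, not taken from the notebook, and returning NaN when a denominator is zero is a choice made for this sketch. It also assumes both classes are present in the data.

```python
import math


def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) from confusion-matrix counts (illustrative helper)."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    false_positive_rate = fp / (tn + fp)  # equals 1 - specificity
    false_negative_rate = fn / (tp + fn)  # equals 1 - sensitivity

    # A ratio is undefined when its denominator is zero; use NaN in that case
    lr_pos = sensitivity / false_positive_rate if false_positive_rate > 0 else math.nan
    lr_neg = false_negative_rate / specificity if specificity > 0 else math.nan
    return lr_pos, lr_neg


# Made-up counts: 100 sick people (80 detected), 900 healthy people (90 false alarms)
lr_pos, lr_neg = likelihood_ratios(tp=80, fp=90, fn=20, tn=810)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Here the sensitivity is 0.8 and the specificity is 0.9, so LR+ is 0.8 / 0.1 and LR− is 0.2 / 0.9.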

In [7]:
import pandas as pd


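def scoring(estimator, X, y):
    """Scorer for cross_validate: return both likelihood ratios as a dict."""
    y_pred = estimator.predict(X)
    # raise_warning=False silences the warning issued when a ratio is undefined
    pos_lr, neg_lr = class_likelihood_ratios(y, y_pred, raise_warning=False)
    return {"positive_likelihood_ratio": pos_lr, "negative_likelihood_ratio": neg_lr}


def extract_score(cv_results):
    """Summarize the per-fold likelihood ratios by their mean and standard deviation."""
    lr = pd.DataFrame(
        {
            "positive": cv_results["test_positive_likelihood_ratio"],
            "negative": cv_results["test_negative_likelihood_ratio"],
        }
    )
    return lr.aggregate(["mean", "std"])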

In [8]:
from sklearn.model_selection import cross_validate
from sklearn.dummy import DummyClassifier

# Baseline that always predicts the most frequent class (here, no disease)
estimator = DummyClassifier(strategy="most_frequent")
extract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10))

|      | positive | negative |
|------|----------|----------|
| mean | NaN      | 1.0      |
| std  | NaN      | 0.0      |
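
The missing (NaN) LR+ and the LR− of exactly 1.0 in the output above follow from the definitions: a classifier that never predicts the positive class has sensitivity 0 and specificity 1. The sketch below reproduces this by hand on a made-up fold of 1,000 people with 100 positives. The fold size and the plain-Python counting are assumptions for illustration, not the notebook's data.

```python
import math

# Hypothetical fold: 1000 people, 100 of them sick (label 1)
y_true = [1] * 100 + [0] * 900
# A "most frequent" baseline always predicts the majority class: healthy (0)
y_pred = [0] * len(y_true)

# Count the four confusion-matrix cells
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

sensitivity = tp / (tp + fn)  # 0 / 100: no sick person is ever flagged
specificity = tn / (tn + fp)  # 900 / 900: no healthy person is ever flagged

# LR+ would be 0 / 0, which is undefined, so it is reported as NaN here
lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.nan
# LR- is (1 - 0) / 1: a negative result does not change the odds at all
lr_neg = (1 - sensitivity) / specificity

print(lr_pos, lr_neg)
```

An LR− of 1 with zero spread across folds is the signature of a test that conveys no information, which is why the dummy classifier serves as a baseline.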


## Invariance with respect to prevalence
