<a href="https://colab.research.google.com/github/foxtrotmike/CS909/blob/master/CPHHR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

📌 **Interpretation hazards in Hazard Ratios**

(Fayyaz Minhas)

In survival analysis, the *hazard ratio* is the multiplicative factor by which the hazard (i.e., the instantaneous event rate) changes for a one-unit increase in a covariate, holding any other covariates fixed. For example, a hazard ratio of 1.5 for age means that each additional unit (say, 1 year) multiplies the hazard by 1.5, a 50% increase.
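
For reference, this definition can be written out under the standard Cox proportional hazards form. The symbols $h_0$ (baseline hazard), $\beta$ (coefficient), and $x$ (covariate) are notation introduced here for illustration:

$$
h(t \mid x) = h_0(t)\, e^{\beta x}, \qquad \mathrm{HR} = \frac{h(t \mid x + 1)}{h(t \mid x)} = e^{\beta}.
$$

With $\mathrm{HR} = 1.5$, the hazard at $x + 1$ is 1.5 times the hazard at $x$ at every time $t$, which is the 50% increase described above.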

---

❗ **Important Warning:**  
Hazard ratios are *scale-dependent*. Changing the unit of a covariate (e.g., measuring age in years, decades, or centuries) changes the numeric value of its hazard ratio, even though the underlying model and its predictions remain exactly the same.

🚫 **Therefore: You should never compare models based on the magnitude of their hazard ratios.**  
Saying *"my hazard ratio is higher than yours"* makes no logical sense for evaluating model performance.  
It's not a scoring metric — it's a parameter estimate that reflects the scale of the covariate.
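
The scale dependence follows directly from the model form. A short sketch, assuming the standard Cox linear predictor $\beta x$ and writing $c$ for a hypothetical unit-conversion factor (for example $c = 10$ when moving from years to decades):

$$
x' = \frac{x}{c} \;\Rightarrow\; \beta x = (c\,\beta)\, x' \;\Rightarrow\; \beta' = c\,\beta, \qquad \mathrm{HR}' = e^{\beta'} = \left(e^{\beta}\right)^{c} = \mathrm{HR}^{\,c}.
$$

The linear predictor $\beta x = \beta' x'$ is unchanged for every subject, so the partial likelihood and the risk ranking are identical under both parameterisations. Only the reported number changes.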

---

🎯 **Goal of this experiment:**  
To illustrate that the same Cox model fitted to the same data will produce different hazard ratios depending only on how we scale the covariate — *not* because one model is better than another.

---

🧠 **Machine learning researchers, take note:**  
Use proper evaluation metrics like **concordance index** to assess survival model performance — **not hazard ratios!**
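
To make the contrast concrete, here is a minimal sketch of a concordance computation showing that rescaling the covariate leaves the score untouched. `harrell_c` is a simplified, hypothetical helper written for this illustration (it skips tied times) and is not the lifelines implementation. The toy data and the coefficient `beta_per_year` are assumptions as well.

```python
from itertools import combinations

def harrell_c(times, events, risk):
    """Simplified Harrell's C: share of comparable pairs that are ranked correctly.

    A pair is comparable when the subject with the shorter time had an observed
    event. Pairs with tied times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true ordering is unknown
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable

# Hypothetical toy data: ages, observed times, and event indicators (1 = event).
age_years = [42.0, 55.0, 61.0, 48.0, 70.0, 53.0]
times = [9.0, 4.0, 6.0, 7.0, 2.0, 8.0]
events = [1, 1, 0, 1, 1, 0]

beta_per_year = 0.03  # assumed log hazard ratio per year
risk_years = [beta_per_year * a for a in age_years]
# Same model with age in decades: covariate divided by 10, coefficient multiplied by 10.
risk_decades = [(10 * beta_per_year) * (a / 10) for a in age_years]

c_years = harrell_c(times, events, risk_years)
c_decades = harrell_c(times, events, risk_decades)
print(c_years, c_decades)
```

Both calls rank the subjects identically, so the two concordance values coincide even though the per-decade hazard ratio is numerically larger than the per-year one.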


In [6]:
!pip install lifelines matplotlib



In [7]:
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Set random seed for reproducibility
np.random.seed(42)

# Generate simulated data
n = 200  # Number of observations
age = np.random.normal(50, 10, n)  # Age in years
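# Simulate exponential event times whose hazard rises with age:
# hazard = baseline_hazard * exp(0.03 * age), so the true log hazard ratio is 0.03 per year
baseline_hazard = 0.01
time_to_event = np.random.exponential(1 / (baseline_hazard * np.exp(0.03 * age)))  # scale = 1 / hazard
# Randomly flag about 20% of observations as censored. This is a simplification:
# the recorded time is still the simulated event time, not an earlier censoring time.
event_occurred = np.random.binomial(1, 0.8, n)  # 1 = event observed, 0 = censored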

# Create DataFrame
df = pd.DataFrame({
    'age': age,
    'time_to_event': time_to_event,
    'event_occurred': event_occurred
})

# Fit the Cox model with age in years
cph_years = CoxPHFitter()
cph_years.fit(df, duration_col='time_to_event', event_col='event_occurred', formula="age")
print("Cox Model with Age in Years:")
cph_years.print_summary()
print(f"Hazard Ratio for age (per year): {np.exp(cph_years.params_['age']):.4f}\n")

# Rescale age to decades
df['age_in_decades'] = df['age'] / 10

# Fit the Cox model with age in decades
cph_decades = CoxPHFitter()
cph_decades.fit(df, duration_col='time_to_event', event_col='event_occurred', formula="age_in_decades")
print("Cox Model with Age in Decades:")
cph_decades.print_summary()
print(f"Hazard Ratio for age (per decade): {np.exp(cph_decades.params_['age_in_decades']):.4f}")

# Rescale age to centuries
df['age_in_centuries'] = df['age'] / 100

# Fit the Cox model with age in centuries
cph_centuries = CoxPHFitter()
cph_centuries.fit(df, duration_col='time_to_event', event_col='event_occurred', formula="age_in_centuries")
print("Cox Model with Age in centuries:")
cph_centuries.print_summary()
print(f"Hazard Ratio for age (per century): {np.exp(cph_centuries.params_['age_in_centuries']):.4f}")


Cox Model with Age in Years:


| | |
|---|---|
| model | lifelines.CoxPHFitter |
| duration col | 'time_to_event' |
| event col | 'event_occurred' |
| baseline estimation | breslow |
| number of observations | 200 |
| number of events observed | 156 |
| partial log-likelihood | -664.12 |
| time fit was run | 2025-03-24 06:10:57 UTC |

| covariate | coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 0.03 | 1.04 | 0.01 | 0.02 | 0.05 | 1.02 | 1.05 | 0.0 | 3.81 | <0.005 | 12.83 |

| | |
|---|---|
| Concordance | 0.58 |
| Partial AIC | 1330.25 |
| log-likelihood ratio test | 14.41 on 1 df |
| -log2(p) of ll-ratio test | 12.74 |


Hazard Ratio for age (per year): 1.0355

Cox Model with Age in Decades:


| | |
|---|---|
| model | lifelines.CoxPHFitter |
| duration col | 'time_to_event' |
| event col | 'event_occurred' |
| baseline estimation | breslow |
| number of observations | 200 |
| number of events observed | 156 |
| partial log-likelihood | -664.12 |
| time fit was run | 2025-03-24 06:10:57 UTC |

| covariate | coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| age_in_decades | 0.35 | 1.42 | 0.09 | 0.17 | 0.53 | 1.18 | 1.69 | 0.0 | 3.81 | <0.005 | 12.83 |

| | |
|---|---|
| Concordance | 0.58 |
| Partial AIC | 1330.25 |
| log-likelihood ratio test | 14.41 on 1 df |
| -log2(p) of ll-ratio test | 12.74 |


Hazard Ratio for age (per decade): 1.4170
Cox Model with Age in centuries:


| | |
|---|---|
| model | lifelines.CoxPHFitter |
| duration col | 'time_to_event' |
| event col | 'event_occurred' |
| baseline estimation | breslow |
| number of observations | 200 |
| number of events observed | 156 |
| partial log-likelihood | -664.12 |
| time fit was run | 2025-03-24 06:10:57 UTC |

| covariate | coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| age_in_centuries | 3.49 | 32.63 | 0.91 | 1.69 | 5.28 | 5.44 | 195.67 | 0.0 | 3.81 | <0.005 | 12.83 |

| | |
|---|---|
| Concordance | 0.58 |
| Partial AIC | 1330.25 |
| log-likelihood ratio test | 14.41 on 1 df |
| -log2(p) of ll-ratio test | 12.74 |


Hazard Ratio for age (per century): 32.6314
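
The three printed hazard ratios describe one and the same fit. As a quick sanity check, the sketch below converts the per-century value back to per-decade and per-year units using only the numbers printed above. `rescale_hazard_ratio` is a hypothetical helper introduced for this illustration, and small discrepancies are expected because the printed values are rounded to four decimals.

```python
import math

# Hazard ratios copied from the printed output above.
hr_per_year = 1.0355
hr_per_decade = 1.4170
hr_per_century = 32.6314

def rescale_hazard_ratio(hr, factor):
    """Convert a hazard ratio to a unit `factor` times larger (hypothetical helper).

    Multiplying the unit by `factor` multiplies the coefficient by `factor`,
    which raises the hazard ratio to the power `factor`.
    """
    return math.exp(factor * math.log(hr))

# Start from the per-century value, which carries the most significant digits.
year_from_century = rescale_hazard_ratio(hr_per_century, 1 / 100)
decade_from_century = rescale_hazard_ratio(hr_per_century, 1 / 10)

print(f"per year from per century:   {year_from_century:.4f} (reported {hr_per_year})")
print(f"per decade from per century: {decade_from_century:.4f} (reported {hr_per_decade})")
```

The concordance (0.58), partial log-likelihood (-664.12), and z statistic (3.81) are identical across the three summaries, which is what one would expect if only the unit of the covariate changed.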
