# 🧬 Gene Therapy Simulation

This notebook screens simulated patients for gene therapy eligibility based on RA severity, symptom volatility, and consent status. The screening follows the inclusion-criteria logic of a clinical trial, and every step is documented for ethics compliance. All data is simulated.
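Stated compactly, the rule applied by the code cell is shown below. The notation is introduced here for illustration and is not part of the original notebook: $s_{i,t}$ is the RA severity score and $c_{i,t} \in \{0, 1, 2\}$ the symptom code on patient $i$'s $t$-th record, and $\text{Consent}_i$ is 1 when the patient's first record says `Yes`.

$$
\begin{aligned}
\text{Severity\_Max}_i &= \max_t s_{i,t}, \\
\text{Symptom\_Volatility}_i &= \bigl|\{\, c_{i,t} \,\}_t\bigr|, \\
\text{Eligible}_i &\iff \text{Severity\_Max}_i \ge 9 \;\wedge\; \text{Symptom\_Volatility}_i \ge 2 \;\wedge\; \text{Consent}_i = 1.
\end{aligned}
$$

Volatility therefore counts distinct symptom states, not the number of changes between records.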


```python
import pandas as pd

# Load the simulated case report form (CRF) data: several records per patient
df = pd.read_csv('../data/crf_simulated.csv')

# Map symptom notes to numeric codes (unrecognised notes become NaN)
symptom_map = {'Stable': 0, 'Improved': 1, 'Worsened': 2}
df['Symptom_Code'] = df['Symptom_Notes'].map(symptom_map)

# Aggregate patient-level features with named aggregation,
# so column names do not depend on positional renaming
agg_df = df.groupby('Patient_ID').agg(
    Severity_Mean=('RA_Severity_Score', 'mean'),
    Severity_Max=('RA_Severity_Score', 'max'),
    # Volatility = number of distinct symptom states recorded (NaN ignored)
    Symptom_Volatility=('Symptom_Code', 'nunique'),
    # Consent is taken from the patient's first record in file order
    Consent_Binary=('Consent_Status', lambda x: 1 if x.iloc[0] == 'Yes' else 0),
).reset_index()

# Eligibility: all three inclusion criteria must hold
def check_eligibility(row):
    if row['Severity_Max'] >= 9 and row['Symptom_Volatility'] >= 2 and row['Consent_Binary'] == 1:
        return 'Eligible'
    return 'Not Eligible'

agg_df['Gene_Therapy_Eligibility'] = agg_df.apply(check_eligibility, axis=1)

# Preview results
agg_df[['Patient_ID', 'Severity_Max', 'Symptom_Volatility', 'Consent_Binary', 'Gene_Therapy_Eligibility']]
```
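To check the feature definitions on data small enough to verify by hand, the sketch below runs the same aggregation on a hypothetical two-patient extract (the IDs `P1`/`P2` and all values are invented) and uses a vectorised form of the eligibility rule instead of a row-wise `apply`. `P1` passes through three distinct symptom states and peaks at severity 9, so it should be eligible; `P2` reaches severity 10 but stays `Stable` throughout, so its volatility is 1 and it should be excluded.

```python
import pandas as pd

# Hypothetical toy CRF extract: two patients, three records each.
# Column names follow the notebook; the values are invented for illustration.
toy = pd.DataFrame({
    "Patient_ID": ["P1", "P1", "P1", "P2", "P2", "P2"],
    "RA_Severity_Score": [7, 9, 8, 10, 10, 9],
    "Symptom_Notes": ["Stable", "Worsened", "Improved", "Stable", "Stable", "Stable"],
    "Consent_Status": ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes"],
})

symptom_map = {"Stable": 0, "Improved": 1, "Worsened": 2}
toy["Symptom_Code"] = toy["Symptom_Notes"].map(symptom_map)

# Same patient-level features as the notebook, via named aggregation
features = toy.groupby("Patient_ID").agg(
    Severity_Mean=("RA_Severity_Score", "mean"),
    Severity_Max=("RA_Severity_Score", "max"),
    Symptom_Volatility=("Symptom_Code", "nunique"),
    Consent_Binary=("Consent_Status", lambda s: int(s.iloc[0] == "Yes")),
).reset_index()

# Vectorised equivalent of check_eligibility: boolean masks combined with &
eligible = (
    (features["Severity_Max"] >= 9)
    & (features["Symptom_Volatility"] >= 2)
    & (features["Consent_Binary"] == 1)
)
features["Gene_Therapy_Eligibility"] = eligible.map({True: "Eligible", False: "Not Eligible"})

print(features)
```

Because `nunique` counts distinct states, a sequence such as Stable → Worsened → Stable also scores 2. Counting transitions instead would require sorting each patient's records by visit date, and the notebook does not show such a column.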


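|     | Patient_ID | Severity_Max | Symptom_Volatility | Consent_Binary | Gene_Therapy_Eligibility |
|-----|------------|--------------|--------------------|----------------|--------------------------|
| 0   | RA000      | 8            | 1                  | 1              | Not Eligible             |
| 1   | RA001      | 7            | 2                  | 1              | Not Eligible             |
| 2   | RA002      | 9            | 1                  | 0              | Not Eligible             |
| 3   | RA003      | 10           | 2                  | 0              | Not Eligible             |
| 4   | RA004      | 8            | 3                  | 1              | Not Eligible             |
| ... | ...        | ...          | ...                | ...            | ...                      |
| 95  | RA095      | 8            | 3                  | 1              | Not Eligible             |
| 96  | RA096      | 5            | 3                  | 1              | Not Eligible             |
| 97  | RA097      | 7            | 2                  | 1              | Not Eligible             |
| 98  | RA098      | 9            | 2                  | 1              | Eligible                 |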


**Eligibility Summary:**  
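Out of 100 simulated patients, only one (RA098) met all three criteria for gene therapy eligibility: a peak RA severity score of at least 9, symptom volatility of at least 2 (two or more distinct symptom states across records), and recorded consent. Failing any single criterion excludes a patient; for example, RA003 reached a peak severity of 10 with volatility 2 but was excluded because consent was not recorded. This all-criteria-must-hold rule mirrors how inclusion criteria are applied in clinical trials. All data is simulated and documented for ethics compliance.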
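Because the notebook stores only the final label, the summary cannot show why the other patients were excluded. A minimal sketch of a per-criterion audit follows, assuming the column names and thresholds used by `check_eligibility`. The four rows are copied from the preview output above; the `preview` frame and the `Failed_Criteria` and `Eligible` columns are hypothetical additions, not part of the original notebook. In the notebook itself, the same `criteria` frame could be built directly from `agg_df`.

```python
import pandas as pd

# Four patient-level rows copied from the preview output
preview = pd.DataFrame({
    "Patient_ID": ["RA000", "RA002", "RA003", "RA098"],
    "Severity_Max": [8, 9, 10, 9],
    "Symptom_Volatility": [1, 1, 2, 2],
    "Consent_Binary": [1, 0, 0, 1],
})

# One boolean column per inclusion criterion (same thresholds as check_eligibility)
criteria = pd.DataFrame({
    "severity": preview["Severity_Max"] >= 9,
    "volatility": preview["Symptom_Volatility"] >= 2,
    "consent": preview["Consent_Binary"] == 1,
})

# List the criteria each patient failed, for the ethics audit trail
preview["Failed_Criteria"] = criteria.apply(
    lambda row: ", ".join(name for name, ok in row.items() if not ok) or "none",
    axis=1,
)
# A patient is eligible only if every criterion holds
preview["Eligible"] = criteria.all(axis=1)

print(preview)
print("Eligible patients:", int(preview["Eligible"].sum()))
```

Keeping one column per criterion makes the exclusion reason explicit for each patient, which is easier to review than a single `Eligible`/`Not Eligible` label.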
