In [2]:
import pandas as pd

mic = pd.read_csv('./data/MIC.csv')
mic.head()

Unnamed: 0,ID,AGE,SEX,INF_ANAM,STENOK_AN,FK_STENOK,IBS_POST,IBS_NASL,GB,SIM_GIPERT,...,JELUD_TAH,FIBR_JELUD,A_V_BLOK,OTEK_LANC,RAZRIV,DRESSLER,ZSN,REC_IM,P_IM_STEN,LET_IS
0,1,77.0,1,2.0,1.0,1.0,2.0,,3.0,0.0,...,0,0,0,0,0,0,0,0,0,0
1,2,55.0,1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0,0,0,0,0,0,0,0,0
2,3,52.0,1,0.0,0.0,0.0,2.0,,2.0,0.0,...,0,0,0,0,0,0,0,0,0,0
3,4,68.0,0,0.0,0.0,0.0,2.0,,2.0,0.0,...,0,0,0,0,0,0,1,0,0,0
4,5,60.0,1,0.0,0.0,0.0,2.0,,3.0,0.0,...,0,0,0,0,0,0,0,0,0,0


In [6]:
mic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1700 entries, 0 to 1699
Columns: 124 entries, ID to LET_IS
dtypes: float64(110), int64(14)
memory usage: 1.6 MB


In [10]:
# Count missing values per column and list the 20 columns with the most gaps
mic.isnull().sum().sort_values(ascending=False).head(20)

KFK_BLOOD        1696
IBS_NASL         1628
S_AD_KBRIG       1076
D_AD_KBRIG       1076
NOT_NA_KB         686
LID_KB            677
NA_KB             657
GIPER_NA          375
NA_BLOOD          375
K_BLOOD           371
GIPO_K            369
AST_BLOOD         285
ALT_BLOOD         284
S_AD_ORIT         267
D_AD_ORIT         267
DLIT_AG           248
ROE               203
ritm_ecg_p_06     152
ritm_ecg_p_08     152
ritm_ecg_p_01     152
dtype: int64
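
The counts above show that a few columns are almost entirely empty (KFK_BLOOD is missing in 1696 of 1700 rows). One way to act on that is to drop columns whose missing fraction exceeds a cutoff. The sketch below is only an illustration: the 0.5 threshold, the helper name `drop_sparse_columns`, and the toy frame are assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_missing_frac=0.5):
    """Return (reduced frame, list of dropped column names).

    A column is dropped when its share of missing values exceeds
    max_missing_frac. The 0.5 default is an arbitrary, illustrative cutoff.
    """
    missing_frac = df.isnull().mean()  # fraction of NaN per column
    to_drop = missing_frac[missing_frac > max_missing_frac].index.tolist()
    return df.drop(columns=to_drop), to_drop


# Toy stand-in for `mic`; the values are made up for the demo.
toy = pd.DataFrame({
    "AGE": [77.0, 55.0, 52.0, 68.0],
    "KFK_BLOOD": [np.nan, np.nan, np.nan, 1.2],  # 75% missing
    "K_BLOOD": [4.7, np.nan, 4.0, 3.9],          # 25% missing
})
reduced, dropped = drop_sparse_columns(toy, max_missing_frac=0.5)
print(dropped)
```

On the real data the call would be `drop_sparse_columns(mic)`. Given the counts listed above, a 0.5 cutoff would remove KFK_BLOOD, IBS_NASL, S_AD_KBRIG and D_AD_KBRIG (1076 / 1700 ≈ 0.63), while NOT_NA_KB (686 / 1700 ≈ 0.40) would stay.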

Prediction task: predicting complications of MI
- The complications are in columns 113-124 (12 target variables, the last of which is LET_IS)
- Each complication can be modeled as a separate binary classification problem, or all of them can be modeled jointly as one multi-label classification problem
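
A minimal sketch of separating predictors from the complication columns by position, assuming the layout described above (an ID column first, the targets last). The helper name `split_features_targets` and the toy frame are illustrative, not taken from the notebook.

```python
import pandas as pd


def split_features_targets(df, n_targets=12, id_col="ID"):
    """Split a frame into predictors X and a target block Y.

    Assumes the last n_targets columns are the complications and that
    id_col is a row identifier with no predictive meaning.
    """
    target_cols = list(df.columns[-n_targets:])
    X = df.drop(columns=target_cols + [id_col], errors="ignore")
    Y = df[target_cols]
    return X, Y


# Toy frame with 2 targets instead of 12; values are made up.
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "AGE": [77.0, 55.0, 52.0],
    "SEX": [1, 1, 0],
    "ZSN": [0, 1, 0],
    "REC_IM": [0, 0, 1],
})
X, Y = split_features_targets(toy, n_targets=2)

# Multi-label view: the whole Y block is the label matrix.
# Single-task view: pick one column as the target.
y_single = Y["ZSN"]
print(X.columns.tolist(), Y.columns.tolist())
```

For the full data this would be `X, Y = split_features_targets(mic)`, after checking that the last 12 columns really are the complication columns.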

Clinical risk factors
- Age
- Sex
- Hypertension
- Diabetes
- Heart failure

MI history (Past events often predict future ones)
- INF_ANAM
- STENOK_AN
- CHD in recent weeks

ECG findings (help localize and classify MI type)
- ant_im
- lat_im
- inf_im
- columns prefixed ritm_ecg_ and np_

Bloodwork (reflect organ function / inflammation)
- K_BLOOD
- NA_BLOOD
- L_BLOOD
- ROE
- AST_BLOOD / ALT_BLOOD / KFK_BLOOD (CPK)

Symptoms on admission
- Pulmonary edema
- Cardiogenic shock
- Arrhythmia

Time to hospital (delayed treatment = worse outcomes)
- TIME_B_S

Therapies applied
- Fibrinolytics
- Opioids
- NSAIDs
- Nitrates

**Demographics & History**

- AGE: Older age is associated with worse cardiac outcomes, reduced physiological reserve, and higher complication rates after MI (e.g., arrhythmia, heart failure)
- SEX: Sex differences exist in cardiac presentation and outcomes; for example, women may present atypically and are often undertreated, while men tend to develop CHD earlier
- INF_ANAM – Previous MI events: A history of prior myocardial infarctions suggests underlying coronary artery disease, increasing the risk of reinfarction, heart failure, and arrhythmias
- STENOK_AN – History of angina: Indicates pre-existing ischemic burden. Chronic angina suggests advanced coronary disease and reduced myocardial reserve
- FK_STENOK – Functional class of angina: The severity of angina symptoms before MI helps predict cardiac reserve and tolerance to ischemia. Higher classes correlate with worse prognosis
- IBS_POST – CHD in recent weeks: Recent ischemic episodes (like unstable angina or recent MI) suggest an unstable plaque or active coronary syndrome, which can lead to complications
- ZSN_A – Chronic heart failure: Existing heart failure significantly raises the risk of acute decompensation, arrhythmias, and mortality post-MI
- GB – Hypertension stage: Hypertension is a major risk factor for CHD and contributes to left ventricular hypertrophy, increased afterload, and worse outcomes
- SIM_GIPERT – Symptomatic hypertension: Reflects poorly controlled hypertension; may exacerbate ischemia or trigger acute complications like stroke or heart failure
- endocr_01 – Diabetes: Diabetics have higher risk of silent ischemia, worse atherosclerosis, impaired healing, and higher complication/mortality rates post-MI
- endocr_02 – Obesity: Obesity is associated with metabolic syndrome, inflammation, and increased cardiac workload. Some studies report an "obesity paradox," but overall it is treated as a risk enhancer

**Admission Condition**
- S_AD_KBRIG / D_AD_KBRIG – Blood pressure on emergency arrival: Low BP may indicate cardiogenic shock or poor perfusion. High BP may reflect acute stress or hypertensive crisis. Hemodynamic status is critical in risk stratification
- K_SH_POST – Cardiogenic shock: A key marker of severe LV dysfunction. Strong predictor of mortality and multiple complications (e.g., AKI, arrhythmias, multi-organ failure)
- O_L_POST – Pulmonary edema: Suggests left-sided heart failure or high LV filling pressures; often indicates acute decompensation or significant myocardial injury
- MP_TP_POST – Atrial fibrillation on arrival: AF on admission may result from atrial strain or existing structural heart disease, increasing stroke risk and complicating management
- SVT_POST – Supraventricular tachycardia: Can worsen ischemia and hemodynamics; indicates electrical instability or hyperadrenergic state
- FIB_G_POST – Ventricular fibrillation: A life-threatening arrhythmia usually reflecting large infarct size or ischemic myocardium; critical predictor of poor outcome

**ECG Findings**
- ant_im – Anterior infarct: Anterior wall infarctions typically involve more myocardium and are associated with worse LV dysfunction and outcomes
- inf_im – Inferior infarct: Often involves the RCA, which can affect the conduction system and cause bradyarrhythmias. Outcomes are generally better than for anterior MI
- post_im – Posterior infarct: Often missed due to subtle ECG changes. Associated with high risk of arrhythmias and larger infarct size if extensive
- ritm_ecg_p_01 – Sinus rhythm: Normal rhythm; absence of arrhythmias is generally favorable. Baseline for comparing risk if abnormal rhythms appear later
- ritm_ecg_p_07 – Tachycardia: May be a response to pain, hypoxia, or low cardiac output. Persistent tachycardia increases myocardial oxygen demand and worsens ischemia
- n_r_ecg_p_03 – Ventricular ectopy: Early sign of ventricular irritability that can precede ventricular tachycardia/fibrillation. Important marker of electrical instability

**Blood Tests**
- K_BLOOD – Potassium levels: Both hypokalemia and hyperkalemia increase arrhythmia risk. Potassium disturbances are common in MI due to stress, renal dysfunction, or medication
- KFK_BLOOD – Creatine phosphokinase (CPK) level: A marker of muscle damage. High levels usually indicate larger infarct size and correlate with the severity of myocardial injury. Note that this column is missing in 1696 of the 1700 rows, so it is unlikely to be usable as a feature

I think we should also restrict the feature set to variables known at admission, so the model does not see information recorded later in the hospital stay.
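
A minimal sketch of that restriction, assuming a hand-maintained list of columns recorded after admission. The names in `POST_ADMISSION_COLS` are an assumption about which columns describe the later hospital period and should be checked against the dataset's data dictionary; the helper `admission_only` and the toy frame are likewise illustrative.

```python
import pandas as pd

# ASSUMPTION: columns believed to be recorded after admission (pain relapse
# and drug use during the hospital period). Verify these names against the
# dataset's data dictionary before relying on them.
POST_ADMISSION_COLS = [
    "R_AB_1_n", "R_AB_2_n", "R_AB_3_n",
    "NA_R_1_n", "NA_R_2_n", "NA_R_3_n",
    "NOT_NA_1_n", "NOT_NA_2_n", "NOT_NA_3_n",
]


def admission_only(X, later_cols=POST_ADMISSION_COLS):
    """Drop columns recorded after admission, ignoring names that are absent."""
    present = [c for c in later_cols if c in X.columns]
    return X.drop(columns=present)


# Toy frame; values are made up for the demo.
toy = pd.DataFrame({
    "AGE": [77.0, 55.0],
    "K_SH_POST": [0, 0],
    "R_AB_1_n": [0.0, 1.0],
    "NA_R_2_n": [0.0, 0.0],
})
X_adm = admission_only(toy)
print(X_adm.columns.tolist())
```

Keeping the list explicit makes the admission/later split easy to review and adjust if a column turns out to be misclassified.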