
# Data Cleaning for ICU Measurements

This notebook filters and cleans ICU chart measurements (heart rate, SpO2,
systolic, diastolic, and mean arterial pressure, respiration rate, and temperature)
to prepare them for machine learning model training.

## Steps
1. Load the data.
2. Filter by relevant item IDs.
3. Pivot the data so that each patient timestamp is one row with one column per measurement.
4. Handle missing data by dropping any incomplete records.
5. Output the clean dataset.


In [6]:

import pandas as pd

# Load the data
# chartevents_path = "~/Desktop/Fluid-Solutions-ML/data/raw/chartevents.csv"
chartevents_path = "~/Fluid-Solutions-ML/data/raw/chartevents.csv"
chart_df = pd.read_csv(chartevents_path)
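
Chart event exports can be large, so reading only the columns the later cells use may reduce memory. The following is a minimal sketch, not the notebook's actual loading code. It uses a tiny in-memory CSV with made-up rows in place of the real file, and it assumes the file has `subject_id`, `hadm_id`, `charttime`, `itemid`, and a numeric `valuenum` column (the first four are referenced elsewhere in this notebook; `valuenum` is an assumption about the schema).

```python
import io

import pandas as pd

# Tiny in-memory stand-in for chartevents.csv (hypothetical rows).
sample_csv = io.StringIO(
    "subject_id,hadm_id,charttime,itemid,value,valuenum\n"
    "1,100,2020-01-01 08:00:00,220045,80,80\n"
    "1,100,2020-01-01 08:00:00,220050,120,120\n"
)

# Columns assumed to exist in the real file; adjust to the actual schema.
needed_cols = ["subject_id", "hadm_id", "charttime", "itemid", "valuenum"]

sample_df = pd.read_csv(
    sample_csv,
    usecols=needed_cols,        # skip columns the pipeline never touches
    parse_dates=["charttime"],  # parse timestamps while loading
)
```

With the real file, the same `usecols` and `parse_dates` arguments could be passed alongside `chartevents_path`.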
    

In [7]:

# Define the item ID for each measurement.
# These IDs are assumptions; confirm each one against the dataset's item dictionary.
item_ids = {
    'Heart Rate': 220045,
    'SpO2': 226253,
    'Systolic BP': 220050,
    'Diastolic BP': 220051,
    'MAP': 220052,
    'Respiration Rate': 227539,
    'Temperature': 223762  # Assuming this is the core temperature
}
    

In [8]:

# Filter chartevents for relevant item IDs.
# .copy() makes filtered_chart an independent DataFrame, so the column
# assignment below does not trigger SettingWithCopyWarning.
filtered_chart = chart_df[chart_df['itemid'].isin(item_ids.values())].copy()
filtered_chart['charttime'] = pd.to_datetime(filtered_chart['charttime'])
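
Assigning a column to a DataFrame that was produced by boolean filtering can trigger pandas' SettingWithCopyWarning, because pandas cannot tell whether the assignment was meant for the filtered result or for the original frame. Taking an explicit copy removes the ambiguity. A minimal sketch on a hypothetical toy frame (the names `toy_chart`, `wanted_ids`, and `toy_filtered` are illustrative only):

```python
import pandas as pd

# Hypothetical toy frame standing in for chart_df.
toy_chart = pd.DataFrame({
    "itemid": [220045, 999999, 220050],
    "charttime": [
        "2020-01-01 08:00:00",
        "2020-01-01 08:05:00",
        "2020-01-01 08:00:00",
    ],
})
wanted_ids = [220045, 220050]

# .copy() makes the filtered frame independent of toy_chart,
# so the column assignment below is unambiguous.
toy_filtered = toy_chart[toy_chart["itemid"].isin(wanted_ids)].copy()
toy_filtered["charttime"] = pd.to_datetime(toy_filtered["charttime"])
```

The original frame is left untouched: only `toy_filtered` ends up with a datetime column.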


In [None]:

# Pivot to one row per patient, admission, and timestamp, with one column per measurement.
# Assumes the numeric reading is stored in a 'valuenum' column; repeated readings
# of the same item at the same timestamp are averaged.
pivot_chart = pd.pivot_table(
    data=filtered_chart,
    index=['subject_id', 'hadm_id', 'charttime'],
    columns='itemid',
    values='valuenum',
    aggfunc='mean',
)
# Rename the item ID columns to measurement names for clarity
pivot_chart = pivot_chart.rename(columns={v: k for k, v in item_ids.items()})
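
`pd.pivot_table` aggregates with the mean by default, which fails when it is applied to non-numeric (object) columns. Naming a single numeric `values` column and putting the item ID in `columns` avoids this and yields one column per measurement. The sketch below illustrates the reshaping on made-up data. The `valuenum` column name and the `toy_*` names are assumptions for illustration, not confirmed parts of the dataset.

```python
import pandas as pd

# Hypothetical long-format readings: one row per (timestamp, item).
toy_long = pd.DataFrame({
    "subject_id": [1, 1, 1, 1],
    "hadm_id": [100, 100, 100, 100],
    "charttime": pd.to_datetime([
        "2020-01-01 08:00:00",
        "2020-01-01 08:00:00",
        "2020-01-01 09:00:00",
        "2020-01-01 09:00:00",
    ]),
    "itemid": [220045, 220050, 220045, 220045],
    "valuenum": [80.0, 120.0, 90.0, 100.0],
})
toy_names = {220045: "Heart Rate", 220050: "Systolic BP"}

# One row per (patient, admission, timestamp), one column per item ID.
# Duplicate readings for the same item and timestamp are averaged.
toy_wide = pd.pivot_table(
    toy_long,
    index=["subject_id", "hadm_id", "charttime"],
    columns="itemid",
    values="valuenum",
    aggfunc="mean",
).rename(columns=toy_names)
```

In this toy data the 09:00 row has two heart rate readings, which are averaged, and no systolic reading, which leaves a missing value for a later `dropna()` to handle.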

In [None]:

# Drop rows with any missing values to ensure completeness for model training
complete_records = pivot_chart.dropna()
# Save the clean dataset
clean_data_path = "~/Fluid-Solutions-ML/data/processed/clean_data.csv"
complete_records.to_csv(clean_data_path)
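
Dropping every incomplete row can discard a large share of the data when one measurement is recorded less often than the others, so it may be worth checking how many rows are lost and which column is responsible. A minimal sketch on a hypothetical wide-format frame (the values and `toy_*` names are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format frame with some missing readings.
toy_wide = pd.DataFrame({
    "Heart Rate": [80.0, 95.0, np.nan],
    "Systolic BP": [120.0, np.nan, 118.0],
})

# Missing values per column show which measurement drives the row loss.
missing_per_column = toy_wide.isna().sum()

# Keep only rows where every measurement is present.
toy_complete = toy_wide.dropna()
rows_dropped = len(toy_wide) - len(toy_complete)
print(f"Kept {len(toy_complete)} of {len(toy_wide)} rows; dropped {rows_dropped}")
```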
    

In [None]:

print("Data cleaning complete. Output saved to:", clean_data_path)
    